
Showing papers in "Journal of The Audio Engineering Society in 2005"


Journal Article
TL;DR: The theory of recording and reproduction of three-dimensional sound fields based on spherical harmonics is reviewed and extended in this paper, where mode-matching and simple source approaches to sound reproduction in anechoic environments are discussed.
Abstract: The theory of recording and reproduction of three-dimensional sound fields based on spherical harmonics is reviewed and extended. Free-field, sphere, and general recording arrays are reviewed, and the mode-matching and simple source approaches to sound reproduction in anechoic environments are discussed. Both methods avoid the need for both monopole and dipole loudspeakers—as required by the Kirchhoff–Helmholtz integral. An error analysis is presented and simulation examples are given. It is also shown that the theory can be extended to sound reproduction in reverberant environments.

467 citations
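As a rough, self-contained illustration of the expansion such systems rely on (a sketch, not code or a method from the paper), the plane-wave mode expansion below uses the Legendre addition theorem to collapse the sum over spherical-harmonic orders m, and checks the usual rule of thumb that a truncation order of roughly N ≈ ⌈kr⌉ suffices inside a radius r:

```python
import numpy as np
from scipy.special import eval_legendre, spherical_jn

def plane_wave_modes(kr, cos_gamma, order):
    """Truncated modal expansion of a unit plane wave,
    e^{i k.r} = sum_n (2n+1) i^n j_n(kr) P_n(cos gamma),
    where gamma is the angle between the arrival direction and the
    field point (the sum over spherical-harmonic orders m has been
    collapsed with the Legendre addition theorem)."""
    n = np.arange(order + 1)
    terms = (2 * n + 1) * (1j ** n) * spherical_jn(n, kr) \
            * eval_legendre(n, cos_gamma)
    return np.sum(terms)

k = 2 * np.pi * 1000 / 343       # wavenumber at 1 kHz in air
r = 0.1                          # 10 cm reproduction radius
kr = k * r
order = int(np.ceil(kr)) + 8     # rule of thumb N ~ kr, plus a margin
approx = plane_wave_modes(kr, 0.5, order)
exact = np.exp(1j * kr * 0.5)    # the plane wave itself
```

At this order the truncated series reproduces the plane wave to numerical precision inside the chosen radius; lowering the order below ⌈kr⌉ makes the error grow rapidly.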


Journal Article
Abstract: Reference LCAV-CONF-2005-031 URL: www.aes.org Record created on 2005-10-07, modified on 2016-08-08

212 citations


Journal Article
Abstract: Reference LCAV-CONF-2005-029 URL: www.aes.org Record created on 2005-10-07, modified on 2017-05-12

182 citations


Journal Article
TL;DR: Spatial impulse response rendering (SIRR) analyzes the time-dependent direction of arrival and diffuseness of measured room responses within frequency bands to synthesize a multichannel response suitable for reproduction with any chosen surround loudspeaker setup.
Abstract: Spatial impulse response rendering (SIRR) is a recent technique for the reproduction of room acoustics with a multichannel loudspeaker system. SIRR analyzes the time-dependent direction of arrival and diffuseness of measured room responses within frequency bands. Based on the analysis data, a multichannel response suitable for reproduction with any chosen surround loudspeaker setup is synthesized. When loaded into a convolving reverberator, the synthesized responses create a very natural perception of space corresponding to the measured room. A technical description of the analysis-synthesis method is provided. Results of formal subjective evaluation and further analysis of SIRR are presented in a companion paper to be published in JAES in 2006 Jan./Feb.

166 citations
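The diffuseness measure this style of analysis relies on can be sketched as follows. This assumes normalized pressure and particle-velocity signals and a single broadband average, rather than the per-band STFT processing of measured responses the paper describes:

```python
import numpy as np

def diffuseness(p, vx, vy, vz, eps=1e-12):
    """Diffuseness estimate of the kind used per time-frequency bin:
    psi = 1 - |<I>| / (c <E>), here with pressure and particle velocity
    in normalized units so the speed of sound c drops out. <I> is the
    mean intensity vector and <E> the mean energy density."""
    I = np.array([np.mean(p * vx), np.mean(p * vy), np.mean(p * vz)])
    E = 0.5 * np.mean(p ** 2 + vx ** 2 + vy ** 2 + vz ** 2)
    return 1.0 - np.linalg.norm(I) / (E + eps)

# A single plane wave traveling along x (vx = p) is fully directional,
# so its diffuseness is ~0; a diffuse field would give values near 1.
t = np.arange(1024)
p = np.sin(2 * np.pi * t / 32)
zero = np.zeros_like(p)
psi = diffuseness(p, p, zero, zero)
```

The analysis then reproduces the directional part from the estimated direction of arrival and spreads the diffuse part across the loudspeakers.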



Journal Article
TL;DR: It is contended that the NPP case is effectively solved by fast intensity change discrimination processes, but that stable pitch cues may provide a better tactic for the PNP case, for which onset detection is substantially worse overall.
Abstract: Whilst many onset detection algorithms for musical events in audio signals have been proposed, comparative studies of their efficacy for segmentation tasks are much rarer. This paper follows the lead of Bello et al. 04, using the same hand marked test database as a benchmark for comparison. That previous paper did not include in the comparison a psychoacoustically motivated algorithm originally proposed by Klapuri in 1999, an oversight which is corrected herein with respect to a number of variants of that model. Primary test domains are formed of non-pitched percussive (NPP) and pitched non-percussive (PNP) sound events. 16 detection functions are investigated, including a number of novel and recently published models. Different detection functions are seen to perform well in each case, with substantially worse onset detection overall for the PNP case. It is contended that the NPP case is effectively solved by fast intensity change discrimination processes, but that stable pitch cues may provide a better tactic for the latter.

97 citations
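A minimal example of one family of detection functions compared in such studies is half-wave-rectified spectral flux. This is a generic sketch, not the Klapuri model or any specific function from the paper:

```python
import numpy as np

def spectral_flux(x, frame=1024, hop=512):
    """Half-wave-rectified spectral flux: sum the positive magnitude
    increases between consecutive STFT frames. Large values mark
    candidate onsets."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    mags = np.array([np.abs(np.fft.rfft(win * x[i * hop:i * hop + frame]))
                     for i in range(n_frames)])
    diff = np.diff(mags, axis=0)
    return np.sum(np.maximum(diff, 0.0), axis=1)

# A sharp transient buried in weak noise yields a clear peak in the
# detection function near the transient's position (around sample 8000).
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(16384)
x[8000:8064] += 0.5
odf = spectral_flux(x)
onset_frame = int(np.argmax(odf))
```

Energy-based functions of this kind work well for percussive (NPP) material; the paper's point is that they degrade on pitched non-percussive (PNP) material, where pitch-stability cues help.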


Journal Article
TL;DR: A theory and a system for capturing an audio scene and then rendering it remotely are developed and presented, and rigorous error bounds and a Nyquist-like sampling criterion for the representation of the sound field are presented and verified.
Abstract: A theory and a system for capturing an audio scene and then rendering it remotely are developed and presented. The sound capture is performed with a spherical microphone array. The sound field at the location of the array is deduced from the captured sound and is represented using either spherical wave-functions or plane-wave expansions. The sound field representation is then transmitted to a remote location for immediate rendering or stored for later use. The sound renderer, coupled with the head tracker, reconstructs the acoustic field using individualized head-related transfer functions to preserve the perceptual spatial structure of the audio scene. Rigorous error bounds and a Nyquist-like sampling criterion for the representation of the sound field are presented and verified.

90 citations


Journal Article
Abstract: Reference LCAV-CONF-2005-033 URL: www.aes.org Record created on 2005-10-07, modified on 2017-05-12

79 citations


Journal Article
TL;DR: The basic elements of the ALS codec are described with a focus on prediction, entropy coding, and related tools and the most important applications of this new lossless audio format are pointed out.
Abstract: MPEG-4 Audio Lossless Coding (ALS) is a new extension of the MPEG-4 audio coding family. The ALS core codec is based on forward-adaptive linear prediction, which offers remarkable compression together with low complexity. Additional features include long-term prediction, multichannel coding, and compression of floating-point audio material. In this paper authors who have actively contributed to the standard describe the basic elements of the ALS codec with a focus on prediction, entropy coding, and related tools. We also present latest developments in the standardization process and point out the most important applications of this new lossless audio format.

67 citations
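The forward-adaptive prediction stage at the core of ALS can be sketched as follows. This is a simplified illustration: the real codec additionally quantizes and transmits the predictor coefficients and entropy-codes the residual:

```python
import numpy as np

def lpc_residual(x, order):
    """Forward-adaptive linear prediction via the autocorrelation
    method: solve the Yule-Walker equations R a = r for the predictor,
    then return the prediction residual (the part a lossless codec
    entropy-codes)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    pred = np.zeros_like(x)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * x[:-k]
    return x - pred

# A strongly correlated signal leaves a much smaller residual, which is
# what makes the subsequent entropy coding effective.
rng = np.random.default_rng(1)
t = np.arange(4096)
x = np.sin(2 * np.pi * t / 64) + 0.01 * rng.standard_normal(4096)
res = lpc_residual(x, order=8)
prediction_gain = np.sum(x ** 2) / np.sum(res ** 2)
```

Since the decoder receives the same coefficients, it can rebuild the signal exactly from the residual, which is what makes the scheme lossless.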


Journal Article
TL;DR: In this article, a listening experiment was done to estimate the lowest directional resolution with which HRTFs have to be measured to ensure that interpolations between them do not introduce audible errors, and the measurements were used to create HRTF data sets with low resolution from which interpolations were made in the horizontal, frontal, and median planes.
Abstract: In binaural synthesis a virtual sound source is implemented by convolving an anechoic signal with a pair of head-related transfer functions (HRTFs). In order to represent all possible directions of the sound source with respect to the listener a discrete number of HRTFs are measured and interpolations are made in between. A listening experiment was done to estimate the lowest directional resolution with which HRTFs have to be measured to ensure that interpolations between them do not introduce audible errors. In order to make this study the HRTFs of an artificial head were measured with a directional resolution of 2°. The measurements were used to create HRTF data sets with low resolution from which interpolations were made in the horizontal, frontal, and median planes. Measured and interpolated HRTFs were compared in a three-alternative forced-choice listening experiment for both stationary and moving sound sources. A criterion was found that predicts the experimental results. This criterion was used to estimate the directional resolution required in binaural synthesis for all directions on the sphere around the head.

62 citations
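Linear interpolation between neighboring HRIRs, together with a simple objective error measure (an assumed stand-in, not the paper's perceptual criterion), can be sketched like this:

```python
import numpy as np

def interp_hrir(h1, h2, frac):
    """Linear interpolation between two measured HRIRs, the kind of
    scheme whose audible error the listening test bounds (a sketch;
    practical systems often time-align the responses first)."""
    return (1.0 - frac) * h1 + frac * h2

def log_spectral_distortion(h_ref, h_int, nfft=256):
    """RMS log-spectral difference in dB between a reference and an
    interpolated response."""
    H1 = np.abs(np.fft.rfft(h_ref, nfft)) + 1e-12
    H2 = np.abs(np.fft.rfft(h_int, nfft)) + 1e-12
    return np.sqrt(np.mean((20 * np.log10(H1 / H2)) ** 2))

# Toy HRIRs: pure delays two samples apart. Averaging the impulse
# responses splits the pulse in two and produces a comb-like spectral
# error relative to the true intermediate delay, which is why the
# measurement grid must be fine enough.
h1, h2, h_mid = np.zeros(64), np.zeros(64), np.zeros(64)
h1[10], h2[12], h_mid[11] = 1.0, 1.0, 1.0
h_int = interp_hrir(h1, h2, 0.5)
lsd = log_spectral_distortion(h_mid, h_int)
```

The paper's contribution is the perceptual side of this question: how coarse the grid may be before such interpolation errors become audible.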



Journal Article
TL;DR: In this article, a model for predicting the audibility of time-varying signals in background sounds is described, which requires the calculation of time varying excitation patterns for the signal and background, using the methods described elsewhere.
Abstract: A model for predicting the audibility of time-varying signals in background sounds is described. The model requires the calculation of time-varying excitation patterns for the signal and background, using the methods described elsewhere. A quantity called instantaneous partial loudness (IPL) is calculated from the excitation patterns. The estimates of IPL, which are updated every 1 ms, are used to calculate the short-term partial loudness (STPL) using a form of running average similar to an automatic gain control system. It is assumed that the audibility of the signal is monotonically related to the average value of the STPL over the duration of the signal. In experiment 1 thresholds were measured for detecting a 1-kHz sinusoid in four different samples each of white and pink frozen noise. The results were used to determine the average value of the STPL required for threshold. In experiment 2 the model was evaluated by measuring detection thresholds for nine signal types in six backgrounds (54 combinations), using a two-alternative forced-choice task. The backgrounds were chosen to be relatively steady (such as traffic noise). The correlation between the measured masked thresholds and those predicted by the model was 0.94. The root-mean-square difference between the thresholds obtained and those predicted was 3 dB. In experiment 3 psychometric functions were measured for the detection of five signals in five backgrounds (five pairs), using a two-alternative forced-choice task. Experiment 4 used the same signals and backgrounds, but psychometric functions were measured using a single-interval yes-no task. The results of experiments 3 and 4 were used to construct functions relating signal detectability d' to the average value of the STPL.
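The IPL-to-STPL smoothing step, described as a running average similar to an automatic gain control, might be sketched as a one-pole filter with separate attack and release time constants. The constants below are illustrative placeholders, not the paper's fitted values:

```python
import numpy as np

def short_term_average(ipl, dt=0.001, t_attack=0.045, t_release=0.02):
    """One-pole attack/release smoothing of 1-ms instantaneous partial
    loudness (IPL) samples into a short-term value (STPL). The time
    constants here are illustrative, not the paper's."""
    a_att = np.exp(-dt / t_attack)    # coefficient while input rises
    a_rel = np.exp(-dt / t_release)   # coefficient while input falls
    out = np.zeros_like(ipl)
    s = 0.0
    for i, v in enumerate(ipl):
        a = a_att if v > s else a_rel
        s = a * s + (1 - a) * v
        out[i] = s
    return out

# A constant IPL is approached smoothly and reached asymptotically.
stpl = short_term_average(np.ones(1000))
```

Averaging such an STPL track over the signal duration then gives the single quantity the model relates to audibility.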

Journal Article
TL;DR: In this article, autoregressive modeling of the amplitude and frequency parameters of sinusoidal components is shown to allow realistic interpolation of missing audio data, especially in the case of musical modulations such as vibrato or tremolo.
Abstract: Within the context of sinusoidal modeling, a new method for the interpolation of sinusoidal components is proposed. It is shown that autoregressive modeling of the amplitude and frequency parameters of these components allows us to interpolate missing audio data realistically, especially in the case of musical modulations such as vibrato or tremolo. The problem of phase discontinuity at the gap boundaries is also addressed. Finally, an original algorithm for the interpolation of a missing region of a whole set of sinusoids is presented. Objective and subjective tests show that the quality is improved significantly compared to common sinusoidal and temporal interpolation techniques of missing audio data.
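The core idea of modeling a parameter track autoregressively and extrapolating across a gap can be sketched as follows. This is a one-sided least-squares sketch; the paper interpolates from both gap edges and also handles phase continuity:

```python
import numpy as np

def ar_fit(x, order):
    """Least-squares AR fit of x[n] from its `order` previous samples."""
    X = np.column_stack([x[order - k - 1:len(x) - k - 1]
                         for k in range(order)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

def ar_extrapolate(x, a, steps):
    """Run the fitted recursion forward to fill the missing region."""
    out = list(x)
    for _ in range(steps):
        out.append(sum(a[k] * out[-k - 1] for k in range(len(a))))
    return np.array(out[len(x):])

# A vibrato-like frequency track is followed accurately across the gap,
# whereas plain linear interpolation of the track would flatten the
# modulation out.
t = np.arange(200)
f = 440 + 5 * np.sin(2 * np.pi * t / 50)
a = ar_fit(f[:150], order=4)
pred = ar_extrapolate(f[:150], a, steps=50)
err = np.max(np.abs(pred - f[150:]))
```

Because the AR model captures the oscillatory structure of the parameter track, the extrapolation preserves the vibrato instead of bridging the gap with a straight line.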

Journal Article
TL;DR: In this paper, a stable and load-invariant self-oscillation condition is developed for a class D amplifier employing only one single voltage feedback loop taking off after the output filter.
Abstract: A stable and load-invariant self-oscillation condition is developed for a class D amplifier employing only one single voltage feedback loop taking off after the output filter. The resulting control method is shown to effectively remove the output filter from the closed loop response. Practical discrete implementations of a comparator and gate-drive circuit are presented. A high-performance class D amplifier employing only 14 discrete transistors is constructed. Higher-order extensions of the control circuit are demonstrated which produce extremely low levels of distortion.


Journal Article
TL;DR: The relation of this low-level feature to semantic music descriptions was computed on 7750 tracks with manually annotated semantic labels, supporting the hypothesis that it can be linked to a musical attribute which might be described as “danceability”.
Abstract: Detrended fluctuation analysis (DFA) has been proposed by Peng et al. [1] to be used on biomedical data. It originates from fractal analysis and reveals correlations within data series across different time scales. Jennings et al. [2] used a DFA-derived feature, the detrended variance fluctuation exponent, for musical genre classification introducing the method to the music analysis field. In this paper we further exploit the relation of this low-level feature to semantic music descriptions. It was computed on 7750 tracks with manually annotated semantic labels like “Energetic” or “Melancholic”. We found statistically strong associations between some of these labels and this feature supporting the hypothesis that it can be linked to a musical attribute which might be described as “danceability”.
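DFA itself is compact enough to sketch directly: integrate the series, detrend it linearly within windows of each scale, and read the fluctuation exponent off the log-log slope. This is a generic textbook implementation, not the authors' feature extraction pipeline:

```python
import numpy as np

def dfa(x, scales):
    """Detrended fluctuation analysis: integrate the series, remove a
    linear trend inside non-overlapping windows of each scale, and
    return the RMS fluctuation per scale. The log-log slope across
    scales is the fluctuation exponent used as the feature."""
    y = np.cumsum(x - np.mean(x))
    F = []
    for s in scales:
        t = np.arange(s)
        f2 = []
        for i in range(len(y) // s):
            seg = y[i * s:(i + 1) * s]
            trend = np.polyval(np.polyfit(t, seg, 1), t)
            f2.append(np.mean((seg - trend) ** 2))
        F.append(np.sqrt(np.mean(f2)))
    return np.array(F)

rng = np.random.default_rng(2)
x = rng.standard_normal(8192)                 # uncorrelated input
scales = np.array([16, 32, 64, 128, 256])
F = dfa(x, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]  # ~0.5 for white noise
```

White noise gives an exponent near 0.5; correlated, beat-driven signals deviate from it, which is what makes the exponent usable as a "danceability" feature.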

Journal Article
TL;DR: The paper describes the basic elements of the ALS codec and presents the latest developments in the standardization process and describes several important applications of this new lossless audio format in practice.
Abstract: MPEG-4 Audio Lossless Coding (ALS) is a new addition to the suite of MPEG-4 audio coding standards. The ALS codec is based on forward-adaptive linear prediction, which offers remarkable compression even with low predictor orders. Nevertheless, performance can be significantly improved by using higher predictor orders, more efficient quantization and encoding of the predictor coefficients, and adaptive block length switching. The paper describes the basic elements of the ALS codec with a focus on these recent improvements. It also presents the latest developments in the standardization process and describes several important applications of this new lossless audio format in practice.




Journal Article
TL;DR: A family of digital parametric audio equalizers based on high-order Butterworth, Chebyshev, and elliptic analog prototype filters is derived that generalizes the conventional biquadratic designs and provides flatter passbands and sharper bandedges.
Abstract: A family of digital parametric audio equalizers based on high-order Butterworth, Chebyshev, and elliptic analog prototype filters is derived that generalizes the conventional biquadratic designs and provides flatter passbands and sharper bandedges. The equalizer filter coefficients are computable in terms of the center frequency, peak gain, bandwidth, and bandwidth gain. We consider the issues of filter order and bandwidth selection, and discuss frequency-shifted transposed, normalized-lattice, and minimum roundoff-noise state-space realization structures. The design equations apply equally well to lowpass and highpass shelving filters, and to ordinary bandpass and bandstop filters.
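The conventional biquadratic design that the paper generalizes can be sketched with the widely used RBJ cookbook peaking-EQ formulas (this is the standard baseline, not the paper's high-order construction). The gain at the center frequency comes out exactly at the requested peak gain:

```python
import numpy as np

def peaking_biquad(fs, f0, gain_db, Q):
    """Conventional biquad peaking equalizer (RBJ cookbook form): the
    second-order baseline that high-order parametric designs
    generalize. Returns normalized (b, a) coefficients."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * Q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

b, a = peaking_biquad(fs=48000, f0=1000, gain_db=6.0, Q=1.0)

# Evaluate H(z) on the unit circle at f0: the peak gain is hit exactly.
z = np.exp(1j * 2 * np.pi * 1000 / 48000)
H = np.polyval(b[::-1], 1 / z) / np.polyval(a[::-1], 1 / z)
gain_at_f0 = 20 * np.log10(abs(H))
```

The higher-order designs in the paper keep this parametrization by center frequency, gain, and bandwidth while providing the flatter passbands and sharper band edges a single biquad cannot.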

Journal Article
TL;DR: It can be concluded that in the case of broadcasting multichannel audio under highly restricted transmission conditions, it is better, in terms of basic audio quality, to sacrifice spatial fidelity by down-mixing original multichannel audio material to a lower number of broadcast audio channels than to sacrifice the timbral fidelity by transmitting all channels with limited bandwidths.
Abstract: The effect on audio quality of controlled multichannel audio bandwidth limitation and selected down-mix algorithms was quantified using one generic attribute (basic audio quality) and three specific attributes (timbral fidelity, frontal spatial fidelity, and surround spatial fidelity). The investigation was focused on the standard 5.1 multichannel audio setup (ITU-R BS.775-1) and was limited to the optimum listening position. The results obtained from a panel of experienced listeners indicate that the basic audio quality of multichannel recordings is more affected by timbral fidelity than by spatial fidelities. Therefore it can be concluded that in the case of broadcasting multichannel audio under highly restricted transmission conditions, it is better, in terms of basic audio quality, to sacrifice spatial fidelity by down-mixing original multichannel audio material to a lower number of broadcast audio channels than to sacrifice the timbral fidelity by transmitting all channels with limited bandwidths.
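A typical down-mix of the kind evaluated here is the ITU-style 5.1-to-stereo fold-down with −3 dB weights on the center and surround channels. This is one common coefficient choice; the paper compares several algorithms, and the LFE channel is simply dropped in this sketch:

```python
import numpy as np

def downmix_51_to_stereo(ch):
    """Fold 5.1 channels (L, R, C, LFE, Ls, Rs) to stereo with -3 dB
    weights on center and surrounds, one common coefficient choice;
    the LFE channel is dropped in this sketch."""
    L, R, C, LFE, Ls, Rs = ch
    g = 10 ** (-3 / 20)
    return np.stack([L + g * C + g * Ls,
                     R + g * C + g * Rs])

# A center-only signal lands equally in both outputs, attenuated 3 dB.
ch = np.zeros((6, 8))
ch[2] = 1.0                      # center channel only
out = downmix_51_to_stereo(ch)
```

The study's conclusion is that applying such a fold-down and spending the saved bit rate on full-bandwidth channels degrades basic audio quality less than band-limiting all six channels.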

Journal Article
TL;DR: In this paper, a study of how the directivity characteristics of artificial mouths correspond to the directivity of a real speaker is presented; the basic mechanisms that produce the directivity patterns are discussed, and the contribution of the speech content is shown.
Abstract: A study of how the directivity characteristics of artificial mouths correspond to the directivity of a real speaker is presented. The primary motivation for the research was the measurement methods applied in the telecommunications industry for the microphones used in telephones and their accessories. The responses of three artificial mouth simulators were measured in several positions. The same measurements were repeated for a group of test subjects. The measurement positions corresponded to the same positions where the microphones of telephones and their accessories, so-called headsets, would be situated. The basic mechanisms that produce the directivity patterns are discussed, and the contribution of the speech content is shown. The main contributor to the directivity is the aperture size of the mouth. The acoustical characteristics of the upper body are also a significant factor if the microphone position is not directly in front of the mouth. A greater than 10-dB difference with wide-band speech was found between artificial mouths and test subjects. It appears that the directivities of the artificial mouths are too narrow at high frequencies. To improve the correspondence of telephonometry and real speakers, a simple equalization procedure and two structural improvements are proposed.

Journal Article
TL;DR: A new immersive multisensory environment recently constructed at McGill University, designed for network-based communication for music performance coordinated between remote sites, potentially over great distance is described.
Abstract: Broadband Internet (transmission rates more than a gigabit per second) enables bidirectional real-time transmission of multiple streams of audio, video, and motion data with latency dependent on distance plus network and processing delays. In this article we describe a new immersive multisensory environment recently constructed at McGill University, designed for network-based communication for music performance coordinated between remote sites, potentially over great distance. The system's architecture allows participants to experience the music with greatly enhanced presence through the use of multiple sensors and effectors and high-resolution multimodal transmission channels. Up to 24 channels of audio, digital video, and four channels of vibration can be sent and received over the network simultaneously, allowing a number of diverse applications such as remote music teaching, student auditions, jam sessions and concerts, recording sessions, and postproduction for remotely-captured live events. The technical and operational challenges of this undertaking are described, as well as potential future applications.

Journal Article
TL;DR: A multichannel audio quality expert system is developed to predict audio quality as a function of individual channel bandwidth and to find the optimum band-limitation algorithm or down-mix algorithm for a given total transmission bandwidth of a multichannel audio signal.
Abstract: The basic audio quality of 5.1 multichannel audio reproduction was evaluated subjectively under different technical conditions. The resulting database of subjective responses was used to develop a multichannel audio quality expert system. There are three aims of this development: 1) to predict audio quality as a function of individual channel bandwidth; 2) to predict audio quality as a function of selected down-mix algorithms; and 3) to find the optimum band-limitation algorithm or down-mix algorithm for a given total transmission bandwidth of a multichannel audio signal. Results indicate a close correspondence between predicted and actual quality ratings. It is intended that the final version of the quality expert system will be suitable as a decision-making aid for broadcasters and codec designers.


Journal Article
TL;DR: A new cognitive model is proposed that aims at overcoming some of the limitations of current psychoacoustic models; it extracts a set of parameters from the audio signal which provide high-quality information about the signal.
Abstract: The objective assessment of audio quality has made great progress with the introduction of methods based on improved psychoacoustic models. Among these, PEAQ, an ITU standard, deserves special attention. However, the performance of such methods is still poor for a large number of situations. A new cognitive model is proposed that aims at overcoming some of the limitations. The new model extracts a set of parameters from the audio signal which provide high-quality information about the signal. An alternative mapping technique, based on Kohonen self-organizing maps, maps these parameters into an estimate of the subjective quality. The performance of this new approach is compared to that achieved by PEAQ.
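A minimal 1-D Kohonen self-organizing map, the mapping device the model uses, can be sketched as follows. This is a generic SOM on toy data, not the paper's parameter-to-quality mapping:

```python
import numpy as np

def train_som(data, n_nodes=10, epochs=200, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal 1-D Kohonen self-organizing map: each sample pulls its
    best-matching node, and that node's neighbors on the 1-D map, a
    little closer. Learning rate and neighborhood shrink over time."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_nodes, data.shape[1]))
    idx = np.arange(n_nodes)
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)
        sigma = max(sigma0 * (1 - e / epochs), 0.5)
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.sum((w - x) ** 2, axis=1))
            h = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))
            w += lr * h[:, None] * (x - w)
    return w

# Two well-separated clusters end up owned by different map nodes.
data = np.vstack([np.zeros((20, 2)), 5.0 * np.ones((20, 2))])
w = train_som(data)
bmu_low = int(np.argmin(np.sum((w - data[0]) ** 2, axis=1)))
bmu_high = int(np.argmin(np.sum((w - data[-1]) ** 2, axis=1)))
```

In the paper's setting, the map positions are calibrated against subjective scores, so the best-matching node for a new signal's parameter vector yields a quality estimate.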

Journal Article
TL;DR: In this paper, the effect of the signal frequency on the vertical position of low-and high-frequency auditory image pairs was investigated for the frequency bands characteristic of woofers and tweeters in loudspeakers.
Abstract: Practical wide-range loudspeakers are usually implemented with multiple drivers, but the systematic effect of the signal frequency upon the vertical localization of sound is scarcely used for loudspeaker enclosure design. Tendencies in vertical localization for the frequency bands characteristic of woofers and tweeters in loudspeakers are shown. Using vertical arrays of individually controlled loudspeakers, synchronous and asynchronous bands of noise were presented to subjects. The frequency of the source affected the vertical position of the low- and high-frequency auditory image pairs significantly and systematically, in a manner broadly consistent with previous studies concerned with single auditory images. Lower frequency sources are localized below their physical positions whereas high-frequency sources are localized at their true positions. This effect is also shown to occur for musical signals. It is demonstrated that low-frequency sources are not localized well when presented in exact synchrony with high-frequency sources, or when they only include energy below 500 Hz.

Journal Article
TL;DR: This paper addresses the issue of causal rhythmic analysis, primarily towards predicting the locations of musical beats such that they are consistent with a musical audio input.
Abstract: In this paper we address the issue of causal rhythmic analysis, primarily towards predicting the locations of musical beats such that they are consistent with a musical audio input. This will be a key component required for a system capable of automatic accompaniment with a live musician. We are implementing our approach as part of the aubio real-time audio library. While performance for this causal system is reduced in comparison to our previous non-causal system, it is still suitable for our intended purpose.
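A toy stand-in for causal beat prediction (far simpler than the aubio-based tracker the paper describes) estimates the beat period from past inter-beat intervals and extrapolates one period past the most recent beat:

```python
import numpy as np

def predict_next_beat(beat_times, min_period=0.3, max_period=1.0):
    """Causal next-beat prediction: take the median inter-beat interval
    within a plausible tempo range as the period and extrapolate one
    period past the most recent beat."""
    intervals = np.diff(beat_times)
    intervals = intervals[(intervals >= min_period)
                          & (intervals <= max_period)]
    return beat_times[-1] + float(np.median(intervals))

# Slightly jittered beats at ~120 BPM: the next beat lands near 3.0 s.
beats = np.array([0.0, 0.5, 1.0, 1.52, 2.0, 2.49])
nxt = predict_next_beat(beats)
```

Predicting forward like this, using only past observations, is what distinguishes the causal setting (needed for live accompaniment) from offline beat tracking.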

Journal Article
Ronald M. Aarts1
TL;DR: In this paper, the authors define a practically relevant and analytically tractable optimality criterion, involving the loudspeaker parameters, for low-force-factor (low-Bl) drivers.
Abstract: Normally, low-frequency sound reproduction with small transducers is quite inefficient. This is shown by calculating the efficiency and voltage sensitivity for loudspeakers with high, medium, and, in particular, low force factors. For these low-force-factor loudspeakers a practically relevant and analytically tractable optimality criterion, involving the loudspeaker parameters, will be defined. Actual prototype bass drivers are assessed according to this criterion. Because the magnet can be considerably smaller than usual, the loudspeaker can be of the moving-magnet type with a stationary coil. These so-called low-Bl drivers have a high efficiency, however, only in a limited frequency region. To deal with that, nonlinear processing essentially compresses the bandwidth of a 20–120-Hz bass signal down to a much more narrow span. This span is centered at the resonance of the low-Bl driver, where its efficiency is maximum. The signal processing preserves the temporal envelope modulations of the original bass signal. The compression is at the expense of a decreased sound quality and requires some additional electronics. This new, optimal design has a much higher power efficiency as well as a higher voltage sensitivity than current bass drivers, while the cabinet may be much smaller.