Topic
Audio signal processing
About: Audio signal processing is a research topic. Over its lifetime, 21463 publications have been published within this topic, receiving 319597 citations. The topic is also known as: audio processing & acoustic signal processing.
Papers published on a yearly basis
Papers
20 Nov 1987
TL;DR: In this article, the quantizing of the sample values in the sub-bands, e.g. 24 sub-bands, is controlled so that the quantizing noise levels of the individual sub-band signals sit at approximately the same level difference from the masking threshold of the human auditory system resulting from the individual sub-band signals.
Abstract: In the transmission of audio signals, the audio signal is digitally represented, by use of quadrature mirror filtering, in the form of a plurality of spectral sub-band signals. The quantizing of the sample values in the sub-bands, e.g. 24 sub-bands, is controlled to the extent that the quantizing noise levels of the individual sub-band signals are at approximately the same level difference from the masking threshold of the human auditory system resulting from the individual sub-band signals. The differences of the quantizing noise levels of the sub-band signals with respect to the resulting masking threshold are set by the difference between the total information flow required for coding and the total information flow available for coding. The available total information flow is set and may then fluctuate as a function of the signal.
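The bit-allocation idea in this abstract, giving each sub-band just enough bits that its quantizing noise ends up at roughly the same distance from the masking threshold, can be sketched as a greedy loop. The ~6 dB-per-bit noise reduction and the example band powers and thresholds below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def allocate_bits(band_power_db, mask_db, total_bits):
    """Greedy bit allocation: repeatedly give one bit to the sub-band whose
    quantizing noise sits highest above its masking threshold. Each bit
    lowers that band's noise floor by ~6.02 dB, so all bands converge to
    roughly the same noise-to-mask distance (illustrative sketch)."""
    bits = np.zeros(len(band_power_db), dtype=int)
    for _ in range(total_bits):
        # noise-to-mask ratio: noise floor relative to the masking threshold
        nmr = (band_power_db - 6.02 * bits) - mask_db
        bits[int(np.argmax(nmr))] += 1
    return bits

# e.g. 4 of 24 sub-bands shown, with a fixed total information flow of 12 bits
power = np.array([60.0, 45.0, 30.0, 20.0])   # signal power per band (dB)
mask = np.array([30.0, 25.0, 20.0, 18.0])    # masking threshold per band (dB)
print(allocate_bits(power, mask, 12))
```

Louder bands above a low masking threshold receive more bits; raising or lowering the total bit budget shifts all bands' noise floors together relative to their thresholds, matching the abstract's description of a fluctuating available information flow.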
234 citations
01 Jan 1992
233 citations
TL;DR: The goal was first to develop a system for segmentation of the audio signal, and then classification into one of two main categories, speech or music; experimental results show that efficiency is exceptionally good without sacrificing performance.
Abstract: Over the last several years, major efforts have been made to develop methods for extracting information from audiovisual media, in order that they may be stored and retrieved in databases automatically, based on their content. In this work we deal with the characterization of an audio signal, which may be part of a larger audiovisual system or may be autonomous, as for example in the case of an audio recording stored digitally on disk. Our goal was to first develop a system for segmentation of the audio signal, and then classification into one of two main categories: speech or music. Among the system's requirements are its processing speed and its ability to function in a real-time environment with a small responding delay. Because of the restriction to two classes, the characteristics that are extracted are considerably reduced and moreover the required computations are straightforward. Experimental results show that efficiency is exceptionally good, without sacrificing performance. Segmentation is based on mean signal amplitude distribution, whereas classification utilizes an additional characteristic related to the frequency. The classification algorithm may be used either in conjunction with the segmentation algorithm, in which case it verifies or refutes a music-speech or speech-music change, or autonomously, with given audio segments. The basic characteristics are computed in 20 ms intervals, resulting in the segments' limits being specified within an accuracy of 20 ms. The smallest segment length is one second. The segmentation and classification algorithms were benchmarked on a large data set, with correct segmentation about 97% of the time and correct classification about 95%.
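The per-frame characteristics described above can be sketched as follows. The 20 ms frame size is taken from the abstract; the zero-crossing rate stands in for the unspecified frequency-related feature, so it is an assumption rather than the paper's exact definition:

```python
import numpy as np

def frame_features(signal, sr, frame_ms=20):
    """Compute two simple per-frame characteristics in 20 ms frames:
    mean absolute amplitude (used for segmentation in the paper) and
    zero-crossing rate (a stand-in for the frequency-related feature
    used in classification -- an assumption, not the paper's definition)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    amps, zcrs = [], []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        amps.append(np.mean(np.abs(frame)))
        # fraction of sample pairs whose sign changes
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return np.array(amps), np.array(zcrs)

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)   # steady tone as a music-like test signal
amps, zcrs = frame_features(tone, sr)
print(len(amps))   # 1 s of audio in 20 ms frames -> 50 frames
```

Because the features are computed on fixed 20 ms frames, any detected segment boundary is necessarily quantized to 20 ms accuracy, which is exactly the resolution the abstract reports.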
232 citations
25 Jul 1974
TL;DR: In this article, a video system has been proposed in which a composite signal is formed by combining digital information with scanlines of an analog video signal generated by a line scanning device, and a receiver in which the digital information is recovered.
Abstract: A video system having a transmitter in which a composite signal is formed by combining digital information with scanlines of an analog video signal generated by a line scanning device, and a receiver in which the digital information is recovered. The digital information is combined with the analog video signal at predetermined locations along scanlines of the video signal, and these predetermined locations are varied in order to prevent visible deterioration of the video image. In the video receiver, the digital information is recovered by examining the composite signal at the predetermined locations to extract the digital information. Each bit of digital information to be conveyed is represented by a first pseudo-random digital pulse sequence (or its complement, depending on whether the data bit is 1 or 0) which is superimposed on a selected scanline of the analog video signal to form the composite signal. The digital information is recovered at the receiver by generating a second pseudo-random digital pulse sequence in synchronism with the first sequence, and by examining the composite signal at locations determined by the second digital pulse sequence to extract the digital information contained in the composite signal.
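The embed/recover scheme in this abstract can be sketched numerically: a bit is represented by a pseudo-random pulse sequence (or its complement) superimposed at low amplitude, and recovered by correlating against a synchronized copy of the same sequence. The sequence length, amplitude, and the receiver's subtraction of a clean scanline estimate before correlating are simplifying assumptions, not details from the patent:

```python
import numpy as np

rng = np.random.default_rng(seed=7)       # shared seed = transmitter and
pn = rng.choice([-1.0, 1.0], size=64)     # receiver generate the same sequence

def embed(scanline, bit, amp=0.05):
    """Superimpose the PN sequence (bit 1) or its complement (bit 0) on a
    scanline at low amplitude to form the composite signal
    (illustrative sequence length and amplitude)."""
    chips = pn if bit else -pn
    out = scanline.copy()
    out[:len(pn)] += amp * chips
    return out

def recover(composite, scanline_est):
    """Correlate the residual against the synchronized PN sequence;
    the sign of the correlation gives the data bit. Subtracting a clean
    scanline estimate is a simplification of examining the composite
    only at the predetermined locations."""
    residual = composite[:len(pn)] - scanline_est[:len(pn)]
    return 1 if np.dot(residual, pn) > 0 else 0

video = np.sin(np.linspace(0, 3, 256))    # stand-in for one analog scanline
bit = recover(embed(video, 1), video)
```

The low amplitude keeps the superimposed pulses visually unobtrusive, while correlation over the whole sequence accumulates enough energy to decide the bit reliably, the same trade-off the patent manages by varying the embedding locations.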
232 citations
TL;DR: This paper discusses the most relevant binaural perception phenomena exploited by BCC and presents a psychoacoustically motivated approach for designing a BCC analyzer and synthesizer and suggests that the performance given by the reference synthesizer is not significantly compromised when using a low-complexity FFT-based synthesizer.
Abstract: Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and it is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction since only a single down-mixed audio channel needs to be transmitted. An alternative BCC scheme for efficient joint transmission of independent source signals supports flexible spatial rendering at the decoder. This paper (Part I) discusses the most relevant binaural perception phenomena exploited by BCC. Based on that, it presents a psychoacoustically motivated approach for designing a BCC analyzer and synthesizer. This leads to a reference implementation for analysis and synthesis of stereophonic audio signals based on a Cochlear Filter Bank. BCC synthesizer implementations based on the FFT are presented as low-complexity alternatives. A subjective audio quality assessment of these implementations shows the robust performance of BCC for critical speech and audio material. Moreover, the results suggest that the performance given by the reference synthesizer is not significantly compromised when using a low-complexity FFT-based synthesizer. The companion paper (Part II) generalizes BCC analysis and synthesis for multichannel audio and proposes complete BCC schemes including quantization and coding. Part II also describes an alternative BCC scheme with flexible rendering capability at the decoder and proposes several applications for both BCC schemes.
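The core BCC analysis step, producing one down-mixed channel plus low-rate spatial side information, can be sketched with an FFT-based band split in the spirit of the low-complexity variant. The uniform 8-band split and the inter-channel level difference (ICLD) as the only extracted cue are illustrative simplifications, not the paper's cochlear filter bank or full cue set:

```python
import numpy as np

def bcc_analyze(left, right, n_bands=8):
    """Down-mix a stereo pair to one channel and extract per-band
    inter-channel level differences (ICLD, in dB) via the FFT.
    Uniform bands and ICLD-only side info are simplifying assumptions."""
    downmix = 0.5 * (left + right)                  # single transmitted channel
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    bands = np.array_split(np.arange(len(L)), n_bands)
    icld_db = np.array([
        10 * np.log10((np.abs(L[b]) ** 2).sum() + 1e-12)
        - 10 * np.log10((np.abs(R[b]) ** 2).sum() + 1e-12)
        for b in bands
    ])
    return downmix, icld_db                         # audio + low-rate side info

sr = 8000
t = np.arange(sr) / sr
src = np.sin(2 * np.pi * 300 * t)
left, right = 1.0 * src, 0.5 * src                  # source panned toward the left
downmix, icld = bcc_analyze(left, right)
```

The side information here is just a handful of dB values per frame, which illustrates why BCC achieves data-rate reduction: only the single down-mix needs full-bandwidth coding, and a synthesizer can re-impose the level differences per band to restore the spatial image.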
231 citations