Topic

Audio signal processing

About: Audio signal processing is a research topic. Over its lifetime, 21,463 publications have been published within this topic, receiving 319,597 citations. The topic is also known as audio processing and acoustic signal processing.


Papers
Journal Article
TL;DR: This framework can be used to consider the cortical basis of complex sound processing in humans, including implications for speech perception, spatial auditory processing and auditory scene segregation.

125 citations

Journal Article
TL;DR: The combination of various deformation- and fault-tolerance mechanisms allows us to employ standard indexing techniques to obtain an efficient, index-based matching procedure, thus providing an important step towards semantically searching large-scale real-world music collections.
Abstract: Given a large audio database of music recordings, the goal of classical audio identification is to identify a particular audio recording by means of a short audio fragment. Even though recent identification algorithms show a significant degree of robustness towards noise, MP3 compression artifacts, and uniform temporal distortions, the notion of similarity is rather close to the identity. In this paper, we address a higher level retrieval problem, which we refer to as audio matching: given a short query audio clip, the goal is to automatically retrieve all excerpts from all recordings within the database that musically correspond to the query. In our matching scenario, as opposed to classical audio identification, we allow semantically motivated variations as they typically occur in different interpretations of a piece of music. To this end, this paper presents an efficient and robust audio matching procedure that works even in the presence of significant variations, such as nonlinear temporal, dynamical, and spectral deviations, where existing algorithms for audio identification would fail. Furthermore, the combination of various deformation- and fault-tolerance mechanisms allows us to employ standard indexing techniques to obtain an efficient, index-based matching procedure, thus providing an important step towards semantically searching large-scale real-world music collections.

125 citations

Journal Article
TL;DR: The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification with over 10% accuracy-rate improvement compared to the MFCC features.
Abstract: Audio feature extraction and classification are important tools for audio signal analysis in many applications, such as multimedia indexing and retrieval, and auditory scene analysis. However, due to the nonstationarities and discontinuities that exist in these signals, their quantification and classification remain a formidable challenge. In this paper, we develop a new approach for audio feature extraction to effectively quantify these nonstationarities in an attempt to achieve high classification accuracy for environmental audio signals. Our approach consists of three stages: first, we propose to construct the time-frequency matrix (TFM) of audio signals using the matching-pursuit time-frequency distribution (MP-TFD) technique, and then apply the non-negative matrix decomposition (NMF) technique to decompose the TFM into its significant components. Finally, we propose seven novel features from the spectral and temporal structures of the decomposed vectors in a way that they successfully represent the joint TF structure of the audio signal, and combine them with the Mel-frequency cepstral coefficient (MFCC) features. These features are examined using a database of 192 environmental audio signals, which include 20 aircraft, 17 helicopter, 20 drum, 15 flute, 20 piano, 20 animal, 20 bird, and 20 insect sounds, and the speech of 20 males and 20 females. The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification with over 10% accuracy-rate improvement compared to the MFCC features.
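The decomposition stage described in this abstract can be illustrated with a minimal sketch. It covers only the NMF step, not the MP-TFD construction or the seven proposed features. The abstract does not say which NMF algorithm the authors used, so the multiplicative-update rule below, the toy matrix `tfm`, and all function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V into W (spectral) and H (temporal) parts.

    Uses multiplicative updates for the squared-error objective
    (an assumed choice; the abstract does not name the algorithm).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy time-frequency matrix (rows: frequency bins, columns: time frames).
# Hypothetical data: one low-frequency event followed by one high-frequency event.
tfm = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.8, 0.8],
    [0.0, 0.0, 0.0, 0.4, 0.4, 0.4],
]

w, h = nmf(tfm, rank=2)
print("reconstruction error:", frobenius_error(tfm, w, h))
```

In this sketch the columns of `w` play the role of spectral vectors and the rows of `h` the role of temporal vectors; features of the kind the abstract mentions would be computed from those vectors.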

124 citations

Patent
09 Mar 1994
TL;DR: This patent describes an audio/video input/output (I/O) port apparatus that acquires digital audio samples from one or more channels of input audio and synthesizes digital audio samples into one or more channels of output audio.
Abstract: An audio/video input/output (I/O) port apparatus for acquiring digital audio samples from one or multiple channels of input audio and synthesizing digital audio samples into one or multiple channels of output audio. The apparatus comprises a video I/O port, a frequency synthesizer, and an audio I/O port. The video I/O port generates a video-rate clock, and is configured to digitize input video into digital video, and to synthesize output video from digital video. The frequency synthesizer is configured to derive an audio sampling clock based on the video-rate clock. The audio I/O port is configured to sample input audio and convert it into digital audio samples according to the sampling clock, and to synthesize digital audio samples into output audio according to the sampling clock. The apparatus ensures that the video and audio data track together, both when inputting the information from an external source and when outputting the audio/video data streams. The technique is particularly valuable in video editing, where it is critical to establish and maintain synchronization between the video of a speaking person and the audio representing the spoken material.
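The idea of deriving the audio sampling clock from the video-rate clock can be sketched with exact rational arithmetic. The patent abstract gives no concrete rates, so the 13.5 MHz video clock, the 48 kHz audio rate, and the 30000/1001 frame rate below are illustrative assumptions, as are all names in the code.

```python
import math
from fractions import Fraction

# Assumed values for illustration only; none of them come from the abstract.
VIDEO_CLOCK_HZ = 13_500_000            # hypothetical video-rate clock
AUDIO_RATE_HZ = 48_000                 # hypothetical audio sampling rate
FRAME_RATE = Fraction(30_000, 1_001)   # hypothetical video frame rate

# Ratio a frequency synthesizer would need to realise:
# audio_clock = video_clock * ratio.
ratio = Fraction(AUDIO_RATE_HZ, VIDEO_CLOCK_HZ)

# Exact (possibly fractional) number of audio samples per video frame.
samples_per_frame = Fraction(AUDIO_RATE_HZ) / FRAME_RATE


def audio_samples_in_frame(frame_index):
    """Whole audio samples that fall inside one video frame.

    Because both clocks share one source, the running total never drifts
    away from frame_index * samples_per_frame.
    """
    start = math.floor(frame_index * samples_per_frame)
    end = math.floor((frame_index + 1) * samples_per_frame)
    return end - start


print("synthesizer ratio:", ratio)
print("samples per frame:", samples_per_frame)
print("first five frames:", [audio_samples_in_frame(i) for i in range(5)])
```

Under these assumed rates the per-frame sample count alternates between two adjacent integers, and the pattern repeats every five frames, so audio and video stay aligned over any length of material.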

124 citations

Patent
04 Oct 1993
TL;DR: In this patent, the analog inputs of the audio signals are filtered and converted to digital signals from which synthetic head related transfer functions are generated in the form of linear phase finite impulse response filters.
Abstract: Synthetic head related transfer functions (HRTFs) for imposing reprogrammable spatial cues to a plurality of audio input signals included, for example, in multiple narrow-band audio communications signals received simultaneously are generated and stored in interchangeable programmable read only memories (PROMs) which store both head related transfer function impulse response data and source positional information for a plurality of desired virtual source locations. The analog inputs of the audio signals are filtered and converted to digital signals from which synthetic head related transfer functions are generated in the form of linear phase finite impulse response filters. The outputs of the impulse response filters are subsequently reconverted to analog signals, filtered, mixed and fed to a pair of headphones.
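The filtering step in this abstract, where each digitized input passes through a pair of linear phase FIR filters before reaching the headphones, can be sketched as follows. The tap values are hypothetical placeholders rather than measured or patented HRTF data, and the function names are assumptions for illustration.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution: y[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal) + len(taps) - 1):
        acc = 0.0
        for k, tap in enumerate(taps):
            if 0 <= n - k < len(signal):
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def has_symmetric_taps(taps, tol=1e-12):
    """Symmetric coefficients are one sufficient condition for linear phase."""
    return all(abs(a - b) <= tol for a, b in zip(taps, reversed(taps)))


# Hypothetical impulse responses for one virtual source position.
# In the patent these would be read from a PROM; the values here are made up.
LEFT_TAPS = [0.1, 0.25, 0.3, 0.25, 0.1]
RIGHT_TAPS = [0.05, 0.15, 0.2, 0.15, 0.05]


def spatialize(mono):
    """Return (left, right) headphone signals for one mono input channel."""
    return fir_filter(mono, LEFT_TAPS), fir_filter(mono, RIGHT_TAPS)


left, right = spatialize([1.0, 0.0, 0.0, 0.5])
print("left :", left)
print("right:", right)
```

Feeding a unit impulse through `spatialize` returns the two tap lists themselves, which is a quick way to check that the filters are wired to the intended ears.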

123 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Robustness (computer science): 94.7K papers, 1.6M citations, 78% related
Noise: 110.4K papers, 1.3M citations, 77% related
Image segmentation: 79.6K papers, 1.8M citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    63
2021    217
2020    525
2019    659
2018    597