Topic

Audio signal processing

About: Audio signal processing is a research topic. Over its lifetime, 21,463 publications have been published within this topic, receiving 319,597 citations. The topic is also known as: audio processing and acoustic signal processing.


Papers
Patent
04 Sep 2001
TL;DR: In this article, an audio converter device and a method for using the same are presented, in which digital audio data is decompressed, converted into analog electrical data, and then transferred to an audio playback device.
Abstract: An audio converter device and a method for using the same are provided. In one embodiment, the audio converter device receives the digital audio data from a first device via a local area network. The audio converter device decompresses the digital audio data and converts the digital audio data into analog electrical data. The audio converter device transfers the analog electrical data to an audio playback device.

141 citations
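The pipeline in this abstract (receive compressed digital audio over the local network, decompress, hand the samples to a digital-to-analog stage) can be sketched compactly. The Python below is a hedged illustration, not the patented implementation: the UDP transport, the port number, zlib as the codec, and the dac_write callback are all stand-ins, since the abstract fixes neither a protocol nor a compression format.

```python
import socket
import zlib

PORT = 5004  # assumed port; the patent does not specify a transport

def run_converter(dac_write):
    """Receive compressed audio over the LAN, decompress, pass PCM onward.

    dac_write is a placeholder callable standing in for the digital-to-
    analog stage; zlib stands in for the unspecified audio codec.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", PORT))
        while True:
            packet, _addr = sock.recvfrom(65536)
            pcm = zlib.decompress(packet)  # decompress the digital audio data
            dac_write(pcm)                 # analog conversion happens downstream
```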

Journal Article (DOI)
TL;DR: The proposed method is based on detecting phase discontinuity of the power grid signal, referred to as electric network frequency (ENF), which is sometimes embedded in audio signals when the recording is carried out with the equipment connected to an electrical outlet or when certain microphones are in an ENF magnetic field.
Abstract: This paper addresses a forensic tool used to assess audio authenticity. The proposed method is based on detecting phase discontinuity of the power grid signal; this signal, referred to as electric network frequency (ENF), is sometimes embedded in audio signals when the recording is carried out with the equipment connected to an electrical outlet or when certain microphones are in an ENF magnetic field. After down-sampling and band-filtering the audio around the nominal value of the ENF, the result can be considered a single tone such that a high-precision Fourier analysis can be used to estimate its phase. The estimated phase provides a visual aid to locating editing points (signalled by abrupt phase changes) and inferring the type of audio editing (insertion or removal of audio segments). From the estimated values, a feature is used to quantify the discontinuity of the ENF phase, allowing an automatic decision concerning the authenticity of the audio evidence. The theoretical background is presented along with practical implementation issues related to the proposed technique, whose performance is evaluated on digitally edited audio signals.

141 citations
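The down-sample, band-pass, single-tone phase-estimation chain described above lends itself to a short sketch. The following Python is an illustration of the general technique, not the authors' code: the function name, the 400 Hz working rate, the 2 Hz band, and the one-second frame are assumed parameters.

```python
import numpy as np
from math import gcd
from scipy.signal import butter, sosfiltfilt, resample_poly

def enf_phase_jumps(x, fs, nominal=50.0, frame_s=1.0, target_fs=400):
    """Per-frame ENF phase estimate plus a discontinuity measure.

    A minimal sketch of the approach described above; frame length,
    filter bandwidth and the working sample rate are illustrative
    choices, not the paper's exact parameters.
    """
    # Down-sample so the ENF component dominates the band of interest.
    g = gcd(int(fs), target_fs)
    x = resample_poly(x, target_fs // g, int(fs) // g)
    fs = target_fs
    # Narrow band-pass around the nominal ENF value (50 or 60 Hz).
    sos = butter(4, [nominal - 1.0, nominal + 1.0], btype="bandpass",
                 fs=fs, output="sos")
    x = sosfiltfilt(sos, x)
    # Treat each frame as a single tone and estimate its phase with a
    # DFT evaluated exactly at the nominal frequency.
    n = int(frame_s * fs)
    probe = np.exp(-2j * np.pi * nominal * np.arange(n) / fs)
    phases = np.unwrap([np.angle(np.dot(x[i:i + n], probe))
                        for i in range(0, len(x) - n + 1, n)])
    # Abrupt changes in the phase increment flag candidate edit points.
    jumps = np.abs(np.diff(phases, 2))
    return phases, jumps
```

Large values in the returned jumps array mark frames where the ENF phase breaks, i.e. candidate insertion or removal points in the recording.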

Journal Article (DOI)
TL;DR: An acoustic chord transcription system that uses symbolic data to train hidden Markov models gives best-of-class frame-level recognition results; the work also demonstrates the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.
Abstract: We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of human annotation of chord names and boundaries, which must be done to provide machine learning models with ground truth, by performing automatic harmony analysis on symbolic music files. In parallel, we synthesize audio from the same symbolic files and extract acoustic feature vectors which are in perfect alignment with the labels. We, therefore, generate a large set of labeled training data with a minimal amount of human labor. This allows for richer models. Thus, we build 24 key-dependent HMMs, one for each key, using the key information derived from symbolic data. Each key model defines a unique state-transition characteristic and helps avoid confusions seen in the observation vector. Given acoustic input, we identify a musical key by choosing a key model with the maximum likelihood, and we obtain the chord sequence from the optimal state path of the corresponding key model, both of which are returned by a Viterbi decoder. This not only increases the chord recognition accuracy, but also gives key information. Experimental results show the models trained on synthesized data perform very well on real recordings, even though the labels automatically generated from symbolic data are not 100% accurate. We also demonstrate the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.

140 citations
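The decoding step the abstract describes (run all 24 key-dependent HMMs, keep the most likely one, and read the chord sequence off its Viterbi state path) can be sketched generically. This is not the authors' code: the model layout, names, and the use of the Viterbi score as the likelihood are illustrative, though the last matches the abstract's statement that both quantities come from a Viterbi decoder.

```python
import numpy as np

def viterbi(log_trans, log_init, log_obs):
    """Plain Viterbi decoder; returns the best state path and its log-score."""
    T, S = log_obs.shape
    dp = np.empty((T, S))
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_obs[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans       # prev state x next state
        back[t] = np.argmax(scores, axis=0)
        dp[t] = scores[back[t], np.arange(S)] + log_obs[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(dp[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path, float(dp[-1].max())

def decode_with_key_models(key_models, log_obs):
    """Run every key-dependent model and keep the most likely one.

    key_models: list of 24 (log_trans, log_init) pairs (illustrative layout);
    log_obs: per-frame log-likelihoods of each hidden chord state, assumed
    here to be shared across keys for simplicity.
    """
    decoded = [viterbi(tr, init, log_obs) for tr, init in key_models]
    best_key = max(range(len(decoded)), key=lambda k: decoded[k][1])
    chord_path, _ = decoded[best_key]
    return best_key, chord_path
```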

Posted Content
TL;DR: Madmom is an open-source audio processing and music information retrieval (MIR) library written in Python; it features a concise, NumPy-compatible, object-oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications.
Abstract: In this paper, we present madmom, an open-source audio processing and music information retrieval (MIR) library written in Python. madmom features a concise, NumPy-compatible, object-oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications. Prototypes can be seamlessly converted into callable processing pipelines through madmom's concept of Processors, callable objects that run transparently on multiple cores. Processors can also be serialised, saved, and re-run to allow results to be easily reproduced anywhere. Apart from low-level audio processing, madmom puts emphasis on musically meaningful high-level features. Many of these incorporate machine learning techniques, and madmom provides a module that implements some methods commonly used in MIR, such as hidden Markov models and neural networks. Additionally, madmom comes with several state-of-the-art MIR algorithms for onset detection, beat, downbeat and meter tracking, tempo estimation, and piano transcription. These can easily be incorporated into bigger MIR systems or run as stand-alone programs.

140 citations
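Since madmom is a real, pip-installable library, the Processor idiom the abstract describes can be shown directly. The snippet below follows the beat-tracking example from the madmom documentation; the audio file name is a placeholder.

```python
from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor

# First stage: an RNN Processor turns the audio into a beat-activation function.
activations = RNNBeatProcessor()('some_audio.wav')

# Second stage: a DBN Processor decodes beat times from the activations.
# Both stages are callable objects, so they chain naturally into pipelines.
beats = DBNBeatTrackingProcessor(fps=100)(activations)
print(beats)  # beat positions in seconds
```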

Patent (DOI)
Daniel N. Ozick
TL;DR: In this article, a song-matching system is proposed that provides real-time, dynamic recognition of a song being sung and an audio accompaniment signal in synchronism with it.
Abstract: A song-matching system provides real-time, dynamic recognition of a song being sung and an audio accompaniment signal in synchronism with it. The system includes a song database holding a repertoire of songs, each stored as a relative pitch template; an audio processing module that converts the song being sung into a digital signal; an analyzing module that determines, from the digital signal, a definition pattern representing the sequence of pitch intervals captured by the audio processing module; a matching module that compares this definition pattern with the relative pitch template of each song in the database to recognize one song as the song being sung, and that then causes the database to download the unmatched portion of the recognized song's relative pitch template as a digital accompaniment signal; and a synthesizer module that converts the digital accompaniment signal into the audio accompaniment signal, which is transmitted in synchronism with the song being sung.

140 citations
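The matching step (represent both query and templates as relative pitch intervals, find the closest template, and hand its unmatched remainder to the synthesizer) can be sketched as follows. This is a hedged illustration, not the patented method: the database layout, the absolute-difference mismatch score, and all names are assumptions.

```python
import numpy as np

def pitch_intervals(midi_pitches):
    """Relative-pitch representation: successive differences in semitones."""
    return np.diff(np.asarray(midi_pitches, dtype=float))

def match_and_accompany(sung_pitches, database):
    """Match a sung fragment against relative-pitch templates.

    database maps song title -> full interval template (a NumPy array).
    Returns the best-matching title and the unmatched remainder of its
    template, which the system would hand to the synthesizer stage.
    """
    pattern = pitch_intervals(sung_pitches)

    def mismatch(template):
        if len(template) < len(pattern):
            return np.inf                       # fragment longer than the song
        return float(np.abs(template[:len(pattern)] - pattern).sum())

    title = min(database, key=lambda k: mismatch(database[k]))
    remainder = database[title][len(pattern):]  # accompaniment still to play
    return title, remainder
```

Because the representation is relative (intervals rather than absolute pitches), the match is insensitive to the key the user happens to sing in, which is the point of storing songs as relative pitch templates.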


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations (81% related)
Feature (computer vision): 128.2K papers, 1.7M citations (79% related)
Robustness (computer science): 94.7K papers, 1.6M citations (78% related)
Noise: 110.4K papers, 1.3M citations (77% related)
Image segmentation: 79.6K papers, 1.8M citations (77% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023      19
2022      63
2021     217
2020     525
2019     659
2018     597