scispace - formally typeset
Search or ask a question
Topic

Audio signal processing

About: Audio signal processing is a research topic. Over the lifetime, 21463 publications have been published within this topic receiving 319597 citations. The topic is also known as: audio processing & Acoustic signal processing.


Papers
More filters
Journal ArticleDOI
TL;DR: In this article, a convolutional neural network based open-source pipeline was developed for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats.
Abstract: Passive acoustic sensing has emerged as a powerful tool for quantifying anthropogenic impacts on biodiversity, especially for echolocating bat species. To better assess bat population trends there is a critical need for accurate, reliable, and open source tools that allow the detection and classification of bat calls in large collections of audio recordings. The majority of existing tools are commercial or have focused on the species classification task, neglecting the important problem of first localizing echolocation calls in audio which is particularly problematic in noisy recordings. We developed a convolutional neural network based open-source pipeline for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. Our deep learning algorithms were trained on full-spectrum ultrasonic audio collected along road-transects across Europe and labelled by citizen scientists from www.batdetective.org. When compared to other existing algorithms and commercial systems, we show significantly higher detection performance of search-phase echolocation calls with our test sets. As an example application, we ran our detection pipeline on bat monitoring data collected over five years from Jersey (UK), and compared results to a widely-used commercial system. Our detection pipeline can be used for the automatic detection and monitoring of bat populations, and further facilitates their use as indicator species on a large scale. Our proposed pipeline makes only a small number of bat specific design decisions, and with appropriate training data it could be applied to detecting other species in audio. A crucial novelty of our work is showing that with careful, non-trivial, design and implementation considerations, state-of-the-art deep learning methods can be used for accurate and efficient monitoring in audio.

149 citations

Patent
Gilad Cohen1, Yossef Cohen1, Doron Hoffman1, Hagai Krupnik1, Aharon Satt1 
04 Mar 1998
TL;DR: In this paper, a method for adaptively switching between transform audio coder and CELP coder, which makes use of the superior performance of cELP coders for speech signal coding, while enjoying the benefits of transform coder for other audio signals.
Abstract: Apparatus is described for digitally encoding an input audio signal for storage or transmission. A distinguishing parameter is measure from the input signal. It is determined from the measured distinguishing parameter whether the input signal contains an audio signal of a first type or a second type. First and second coders are provided for digitally encoding the input signal using first and second coding methods respectively and a switching arrangement directs, at any particular time, the generation of an output signal by encoding the input signal using either the first or second coders according to whether the input signal contains an audio signal of the first type or the second type at that time. A method for adaptively switching between transform audio coder and CELP coder, is presented. In a preferred embodiment, the method makes use of the superior performance of CELP coders for speech signal coding, while enjoying the benefits of transform coder for other audio signals. The combined coder is designed to handle both speech and music and achieve an improved quality.

148 citations

Patent
Michael M. Lee1
02 Apr 2008
TL;DR: In this paper, the authors present a system for altering an audio output to sound as if a different person had recorded it when it was played back when the audio data file was sent to the system.
Abstract: Methods, systems and computer readable media for altering an audio output are provided. In some embodiments, the system may change the original frequency content of an audio data file to a second frequency content so that a recorded audio track will sound as if a different person had recorded it when it is played back. In other embodiments, the system may receive an audio data file and a voice signature, and it may apply the voice signature to the audio data file to alter the audio output of the audio data file. In that instance, the audio data file may be a textual representation of a recorded audio data file.

148 citations

Journal ArticleDOI
TL;DR: A transition from mere association in beginner readers to more automatic, but still not “adult-like,” integration in advanced readers is indicated and evidence for an extended development of letter–speech sound integration is provided.
Abstract: In transparent alphabetic languages, the expected standard for complete acquisition of letter-speech sound associations is within one year of reading instruction. The neural mechanisms underlying the acquisition of letter-speech sound associations have, however, hardly been investigated. The present article describes an ERP study with beginner and advanced readers in which the influence of letters on speech sound processing is investigated by comparing the MMN to speech sounds presented in isolation with the MMN to speech sounds accompanied by letters. Furthermore, SOA between letter and speech sound presentation was manipulated in order to investigate the development of the temporal window of integration for letter-speech sound processing. Beginner readers, despite one year of reading instruction, showed no early letter-speech sound integration, that is, no influence of the letter on the evocation of the MMN to the speech sound. Only later in the difference wave, at 650 msec, was an influence of the letter on speech sound processing revealed. Advanced readers, with 4 years of reading instruction, showed early and automatic letter-speech sound processing as revealed by an enhancement of the MMN amplitude, however, at a different temporal window of integration in comparison with experienced adult readers. The present results indicate a transition from mere association in beginner readers to more automatic, but still not "adult-like," integration in advanced readers. In contrast to general assumptions, the present study provides evidence for an extended development of letter-speech sound integration.

147 citations

Journal ArticleDOI
TL;DR: A new frequency domain approach to blind source separation (BSS) of audio signals mixed in a reverberant environment using a joint diagonalization procedure on the cross power spectral density matrices to identify the mixing system at each frequency bin up to a scale and permutation ambiguity.
Abstract: In this paper, we propose a new frequency domain approach to blind source separation (BSS) of audio signals mixed in a reverberant environment. We propose a joint diagonalization procedure on the cross power spectral density matrices of the signals at the output of the mixing system to identify the mixing system at each frequency bin up to a scale and permutation ambiguity. The frequency domain joint diagonalization is performed using a new and quickly converging algorithm which uses an alternating least-squares (ALS) optimization method. The inverse of the mixing system is then used to separate the sources. An efficient dyadic algorithm to resolve the frequency dependent permutation ambiguities that exploits the inherent nonstationarity of the sources is presented. The effect of the unknown scaling ambiguities is partially resolved using an initialization procedure for the ALS algorithm. The performance of the proposed algorithm is demonstrated by experiments conducted in real reverberant rooms. Performance comparisons are made with previous methods.

147 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
81% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Robustness (computer science)
94.7K papers, 1.6M citations
78% related
Noise
110.4K papers, 1.3M citations
77% related
Image segmentation
79.6K papers, 1.8M citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202319
202263
2021217
2020525
2019659
2018597