scispace - formally typeset
Topic

Audio signal processing

About: Audio signal processing is a research topic. Over its lifetime, 21,463 publications have been published on this topic, receiving 319,597 citations. The topic is also known as audio processing and acoustic signal processing.


Papers
Patent
11 Aug 2005
TL;DR: This patent describes a telephony device comprising a near-mouth microphone (M1) for picking up an input acoustic signal including the speaker's voice signal (S1) and an unwanted noise signal (N1,D1), a far-mouth microphone (M2) for picking up an unwanted noise signal (N2,D2) in addition to the near-end speaker's voice signal (S2), and an orientation sensor for measuring an orientation indication of the mobile device.
Abstract: The present invention relates to a telephony device comprising a near-mouth microphone (M1) for picking up an input acoustic signal including the speaker's voice signal (S1) and an unwanted noise signal (N1,D1); a far-mouth microphone (M2) for picking up an unwanted noise signal (N2,D2) in addition to the near-end speaker's voice signal (S2), the speaker's voice signal being at a lower level there than at the near-mouth microphone; and an orientation sensor for measuring an orientation indication of the mobile device. The telephony device further comprises an audio processing unit comprising an adaptive beamformer (BF) coupled to the near-mouth and far-mouth microphones, including spatial filters for spatially filtering the input signals (z1,z2) delivered by the two microphones, and a spectral post-processor (SPP) for post-processing the signal delivered by the beamformer in order to separate the desired voice signal from the unwanted noise signal and deliver the output signal (y).

106 citations
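The two-stage pipeline in the patent abstract above (a beamformer followed by a spectral post-processor) can be illustrated with a minimal pure-Python sketch. It assumes a fixed delay-and-sum beamformer in place of the patent's adaptive beamformer, and a generic Wiener-style per-bin gain in place of its spectral post-processor. The function names, the integer `delay`, and the `floor` value are hypothetical, and the noise power estimate is simply taken as an input rather than computed.

```python
def delay_and_sum(m1, m2, delay):
    """Fixed two-microphone beamformer (a stand-in for the patent's adaptive BF).

    Advances m2 by `delay` samples so the target voice lines up with m1, then
    averages the two channels; samples shifted past the end are treated as 0.
    """
    out = []
    for t in range(len(m1)):
        idx = t + delay
        aligned = m2[idx] if 0 <= idx < len(m2) else 0.0
        out.append(0.5 * (m1[t] + aligned))
    return out


def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Generic per-bin post-filter gain max(floor, 1 - noise/noisy).

    `noise_power` is assumed to come from a separate noise estimate (not shown).
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        g = 1.0 - n / y if y > 0 else 0.0
        gains.append(max(floor, g))
    return gains


# Usage: the same voice reaches the far-mouth microphone 2 samples later.
voice = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
far = [0.0, 0.0] + voice[:-2]
print(delay_and_sum(voice, far, 2))  # [0.0, 1.0, 0.0, -1.0, 0.0, 0.5]
print(wiener_gain([4.0, 1.0, 0.0], [1.0, 1.0, 1.0]))  # [0.75, 0.1, 0.1]
```

The floor on the gain limits how strongly any bin is attenuated, a common way to trade residual noise against processing artifacts in spectral post-filters.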

Proceedings ArticleDOI
Wu Chou, Liang Gu
07 May 2001
TL;DR: A new set of features derived from the harmonic coefficient and its 4 Hz modulation values is developed; these features provide additional, reliable cues for separating speech from singing, and a rule-based post-filtering scheme further improves speech/music discrimination.
Abstract: In this paper, an approach for robust singing signal detection in speech/music discrimination is proposed and applied to audio indexing. Conventional approaches to speech/music discrimination can provide reasonable performance with regular music signals but often perform poorly on singing segments. This is mainly because speech and singing signals are extremely close, and the traditional features used in speech recognition do not provide a reliable cue for discriminating between them. To improve the robustness of speech/music discrimination, a new set of features derived from the harmonic coefficient and its 4 Hz modulation values is developed in this paper; these new features provide additional, reliable cues for separating speech from singing. In addition, a rule-based post-filtering scheme is described, which leads to further improvements in speech/music discrimination. Source-independent audio indexing experiments on the PBS Skills database indicate that the proposed approach can greatly reduce the classification error rate on singing segments in the audio stream. Compared with existing approaches, the overall segmentation error rate is reduced by more than 30%, averaged over all shows in the database.

106 citations
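The 4 Hz modulation values mentioned in the abstract above can be approximated, as a sketch, by measuring how much of a per-frame feature trajectory's energy falls at 4 Hz, a rate often associated with the syllabic rhythm of speech. This does not compute the paper's harmonic coefficient: it assumes some per-frame feature trajectory is already available (for example, one value per 10 ms frame, i.e. a 100 Hz frame rate), and `modulation_ratio` is a hypothetical helper.

```python
import math


def modulation_ratio(trajectory, frame_rate_hz, mod_freq_hz=4.0):
    """Fraction of a feature trajectory's (mean-removed) energy at mod_freq_hz.

    Evaluates a single DFT term at mod_freq_hz and scales it so that a pure
    sinusoid at that frequency, spanning a whole number of cycles, gives 1.0.
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    y = [v - mean for v in trajectory]
    total = sum(v * v for v in y)
    if total == 0.0:
        return 0.0  # constant trajectory: no modulation at all
    w = 2 * math.pi * mod_freq_hz / frame_rate_hz
    re = sum(v * math.cos(w * t) for t, v in enumerate(y))
    im = sum(v * math.sin(w * t) for t, v in enumerate(y))
    # A real sinusoid splits its energy between +f and -f, hence the factor 2.
    return 2 * (re * re + im * im) / (n * total)


frame_rate = 100.0  # assumed: one feature value per 10 ms frame
speechlike = [math.cos(2 * math.pi * 4.0 * t / frame_rate) for t in range(100)]
fast = [math.cos(2 * math.pi * 10.0 * t / frame_rate) for t in range(100)]
print(round(modulation_ratio(speechlike, frame_rate), 3))  # 1.0
print(round(modulation_ratio(fast, frame_rate), 3))  # 0.0
```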

Journal ArticleDOI
TL;DR: An information-theoretic measure for the amount of randomness or stochasticity in a signal is presented, formulated in terms of the rate of growth of multi-information with every new sample of the signal observed over time.
Abstract: We present an information-theoretic measure for the amount of randomness or stochasticity that exists in a signal. This measure is formulated in terms of the rate of growth of multi-information with every new sample of the signal observed over time. For Gaussian statistics, it is shown that this measure is equivalent to the well-known spectral flatness measure commonly used in audio processing. For non-Gaussian linear processes, a generalized spectral flatness measure is developed, which estimates the excess structure present in the signal due to the non-Gaussianity of the innovation process. An estimator for this measure is developed using a negentropy approximation to the non-Gaussian signal and the innovation process statistics. Applications of this new measure are demonstrated for the problem of voiced/unvoiced determination, showing improved performance.

106 citations
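For the Gaussian case discussed in the abstract above, the classical spectral flatness measure is the ratio of the geometric mean to the arithmetic mean of the power spectrum: it approaches 1 for a flat (white) spectrum and 0 for a strongly peaked one. A minimal pure-Python sketch follows. It uses a naive O(N^2) DFT to stay self-contained, adds a small `eps` (an assumed value) to avoid log(0), and does not implement the paper's generalized, negentropy-based measure for non-Gaussian processes.

```python
import cmath
import math


def power_spectrum(x):
    """Naive DFT power spectrum |X[k]|^2 for k = 0..N-1 (O(N^2), fine for a demo)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]


def spectral_flatness(x, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum, in [0, 1]."""
    p = [v + eps for v in power_spectrum(x)]
    log_geo_mean = sum(math.log(v) for v in p) / len(p)  # log of geometric mean
    arith_mean = sum(p) / len(p)
    return math.exp(log_geo_mean) / arith_mean


impulse = [1.0] + [0.0] * 63  # flat magnitude spectrum
tone = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]  # energy in 2 bins
print(round(spectral_flatness(impulse), 6))  # 1.0
print(spectral_flatness(tone) < 1e-6)  # True
```

Computing the geometric mean in the log domain avoids underflow when many bins are tiny, which is exactly the peaked-spectrum case the measure is meant to flag.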

Journal ArticleDOI
TL;DR: This study discusses issues in audio-stream phrase recognition for information retrieval in a new National Gallery of the Spoken Word (NGSW), proposes a system diagram, and discusses critical tasks associated with effective audio information retrieval.
Abstract: Advances in formulating spoken document retrieval for a new National Gallery of the Spoken Word (NGSW) are addressed. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings from the 20th century. After presenting an overview of the audio stream content of the NGSW, with sample audio files from U.S. Presidents from 1893 to the present, an overall system diagram is proposed with a discussion of critical tasks associated with effective audio information retrieval. These include advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval using natural language processing for text query requests that include document and query expansion. For segmentation, a new evaluation criterion entitled fused error score (FES) is proposed, followed by application of the CompSeg segmentation scheme on DARPA Hub4 Broadcast News (30.5% relative improvement in FES) and NGSW data. Transcript generation is demonstrated for a six-decade portion of the NGSW corpus. Novel model adaptation using structured maximum likelihood eigenspace mapping shows a relative 21.7% improvement. Issues regarding copyright assessment and metadata construction are also addressed for the purposes of a sustainable audio collection of this magnitude. Advanced parameter-embedded watermarking is proposed, with evaluations showing robustness to correlated noise attacks. Our experimental online system entitled "SpeechFind" is presented, which allows for audio retrieval from a portion of the NGSW corpus. Finally, a number of research challenges, such as language modeling and lexicon for changing time periods and speaker trait and identification tracking, as well as new directions, are discussed in order to address the overall task of robust phrase searching in unrestricted audio corpora.

106 citations

Proceedings ArticleDOI
18 Jun 2004
TL;DR: The method described is both aurally transparent and robust and can be applied to both analog and digital audio signals, the latter including uncompressed as well as compressed audio file formats such as MP3.
Abstract: Data hiding in media (including images, video, and audio) and in data files is currently of great interest, both commercially, mainly for the protection of copyrighted digital media, and to government and law enforcement in the context of information system security and covert communications. We present a technique for inserting and recovering "hidden" data in audio files. The phase of chosen components of the host audio signal is manipulated in a way that may be detected by a receiver with the proper "key". Without the key, the hidden data is undetectable, both aurally and via blind digital signal processing attacks. The method described is both aurally transparent and robust and can be applied to both analog and digital audio signals, the latter including uncompressed as well as compressed audio file formats such as MP3.

106 citations
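The abstract above describes hiding data by manipulating the phase of chosen spectral components, recoverable only with a key. The sketch below is a generic phase-coding illustration, not the authors' technique. It assumes the key is simply a list of DFT bin indices, encodes each bit by setting that bin's phase to +pi/2 (bit 0) or -pi/2 (bit 1) while keeping its magnitude, and processes a single block. It makes no claim of the aural transparency, MP3 robustness, or resistance to blind attacks reported in the abstract; `embed_bits`, `extract_bits`, and the host signal are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n for t in range(n)]


def embed_bits(block, bits, key_bins):
    """Hide one bit per key bin by forcing its phase to +pi/2 (bit 0) or -pi/2 (bit 1)."""
    X = dft(block)
    n = len(X)
    for bit, k in zip(bits, key_bins):
        assert 0 < k < n // 2, "key bins must avoid DC and Nyquist"
        phase = -math.pi / 2 if bit else math.pi / 2
        X[k] = cmath.rect(abs(X[k]), phase)  # keep magnitude, replace phase
        X[n - k] = X[k].conjugate()  # Hermitian symmetry keeps the block real
    return idft(X)


def extract_bits(block, key_bins):
    """Read back each bit from the sign of the phase at the key bins."""
    X = dft(block)
    return [1 if cmath.phase(X[k]) < 0 else 0 for k in key_bins]


n = 32
# Hypothetical host block with energy in every bin 1..15.
host = [sum(math.cos(2 * math.pi * k * t / n + k) for k in range(1, 16)) for t in range(n)]
key = [3, 5, 7, 9]
stego = embed_bits(host, [1, 0, 1, 1], key)
print(extract_bits(stego, key))  # [1, 0, 1, 1]
```

Practical phase-coding schemes also have to handle phase continuity between successive blocks to avoid audible artifacts; this single-block sketch does not.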


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
81% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Robustness (computer science)
94.7K papers, 1.6M citations
78% related
Noise
110.4K papers, 1.3M citations
77% related
Image segmentation
79.6K papers, 1.8M citations
77% related
Performance Metrics
Number of papers in the topic in previous years
Year  Papers
2023  19
2022  63
2021  217
2020  525
2019  659
2018  597