Topic

Audio signal processing

About: Audio signal processing is a research topic. Over its lifetime, 21,463 publications have been published within this topic, receiving 319,597 citations. The topic is also known as audio processing and acoustic signal processing.


Papers
Patent
25 Jun 2002
TL;DR: This patent presents systems and methods for building distributed conversational applications using a Web services-based model in which speech engines (e.g., speech recognition) and audio I/O systems are programmable services that an application can program asynchronously using a standard, extensible SERCP (speech engine remote control protocol), providing scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways.
Abstract: Systems and methods for conversational computing and, in particular, systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

619 citations

Book
01 Jan 1969

612 citations

Journal Article
Lie Lu, Hong-Jiang Zhang, Hao Jiang
TL;DR: A robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence is proposed, and an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis is developed.
Abstract: We present our study of audio content analysis for classification and segmentation, in which an audio stream is segmented according to audio type or speaker identity. We propose a robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence. Audio classification is processed in two steps, which makes it suitable for different applications. The first step of the classification is speech and nonspeech discrimination. In this step, a novel algorithm based on K-nearest-neighbor (KNN) and linear spectral pairs-vector quantization (LSP-VQ) is developed. The second step further divides the nonspeech class into music, environment sounds, and silence with a rule-based classification scheme. A set of new features such as the noise frame ratio and band periodicity are introduced and discussed in detail. We also develop an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis. Without a priori knowledge, this algorithm can support open-set speakers, online speaker modeling, and real-time segmentation. Experimental results indicate that the proposed algorithms can produce very satisfactory results.

559 citations
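
The two-step classification described in the abstract above (speech/nonspeech discrimination first, then a rule-based split of the nonspeech class) can be illustrated with a minimal Python sketch. This is not the authors' implementation: step 1 uses plain KNN only (the LSP-VQ component is omitted), and the feature dictionary layout, the `TRAINING` data, and every threshold are hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def knn_label(sample, training, k=3):
    """Return the majority label among the k training points nearest to `sample`."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_segment(features, training, k=3):
    """Two-step classification of one audio segment.

    `features` is a hypothetical dict with these keys:
      "vector"            - feature vector used by the KNN step
      "energy"            - mean short-time energy of the segment
      "band_periodicity"  - periodicity measure in [0, 1]
      "noise_frame_ratio" - fraction of noise-like frames in [0, 1]
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Step 1: speech vs. nonspeech discrimination (plain KNN in this sketch).
    if knn_label(features["vector"], training, k) == "speech":
        return "speech"
    # Step 2: rule-based split of the nonspeech class.
    if features["energy"] < 0.01:
        return "silence"
    if features["band_periodicity"] > 0.5 and features["noise_frame_ratio"] < 0.3:
        return "music"
    return "environment sound"


# Hypothetical labelled examples for the KNN step.
TRAINING = [
    ((0.9, 0.8), "speech"), ((0.8, 0.9), "speech"), ((0.85, 0.7), "speech"),
    ((0.1, 0.2), "nonspeech"), ((0.2, 0.1), "nonspeech"), ((0.15, 0.3), "nonspeech"),
]

segment = {"vector": (0.1, 0.1), "energy": 0.5,
           "band_periodicity": 0.8, "noise_frame_ratio": 0.1}
label = classify_segment(segment, TRAINING)
```

Keeping the two steps separate means the first step can be used on its own wherever only speech/nonspeech discrimination is needed, which is one reading of the abstract's remark that the two-step design suits different applications.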

Patent
27 Mar 1995
TL;DR: A code frequency component in the encoded audio signal is detected based on an expected code amplitude or on a noise amplitude within a range of audio frequencies including the frequency of the code component.
Abstract: Apparatus and methods for including a code (68) having at least one code frequency component in an audio signal (60) are provided. The abilities of various frequency components in the audio signal to mask the code frequency component to human hearing are evaluated (64), and based on these evaluations an amplitude (76) is assigned to the code frequency component. Methods and apparatus for detecting a code in an encoded audio signal are also provided. A code frequency component in the encoded audio signal is detected based on an expected code amplitude or on a noise amplitude within a range of audio frequencies including the frequency of the code component.

554 citations
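
The detection idea in the abstract above (comparing the code frequency component against a noise amplitude measured in a surrounding frequency range) can be sketched in Python as follows. This is a simplified illustration and not the patented method: the function names, the `span_hz`, `step_hz`, and `ratio` parameters, and the choice to estimate noise from neighbouring frequencies only are all assumptions, and no psychoacoustic masking model is included.

```python
import cmath
import math


def dft_magnitude(samples, freq_hz, sample_rate):
    """Amplitude of the component of `samples` at freq_hz (single-frequency DFT)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
              for i, x in enumerate(samples))
    return abs(acc) * 2 / n


def detect_code(samples, code_hz, sample_rate,
                span_hz=200.0, step_hz=50.0, ratio=3.0):
    """Report a code tone when its amplitude exceeds `ratio` times the mean
    amplitude measured at nearby frequencies (a local noise estimate)."""
    code_amp = dft_magnitude(samples, code_hz, sample_rate)
    neighbours = []
    offset = step_hz
    while offset <= span_hz:
        neighbours.append(dft_magnitude(samples, code_hz - offset, sample_rate))
        neighbours.append(dft_magnitude(samples, code_hz + offset, sample_rate))
        offset += step_hz
    noise_amp = sum(neighbours) / len(neighbours)
    return code_amp > ratio * noise_amp


def tone(freq_hz, amplitude, sample_rate=8000, n=800):
    """Hypothetical helper: a sine tone as a list of samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


# Host audio with energy near the code frequency, with and without a 1 kHz code tone.
host = [a + b for a, b in zip(tone(900, 0.1), tone(1100, 0.1))]
coded = [h + c for h, c in zip(host, tone(1000, 0.2))]
found_in_coded = detect_code(coded, 1000.0, 8000)
found_in_host = detect_code(host, 1000.0, 8000)
```

Measuring the threshold relative to the local noise level, rather than using a fixed amplitude, lets the same detector work across loud and quiet passages; the ratio of 3 is an arbitrary illustrative choice.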

Journal Article
TL;DR: This work describes audio and visual features that can effectively characterize scene content, presents selected algorithms for segmentation and classification, and reviews some testbed systems for video archiving and retrieval.
Abstract: Multimedia content analysis refers to the computerized understanding of the semantic meanings of a multimedia document, such as a video sequence with an accompanying audio track. With a multimedia document, its semantics are embedded in multiple forms that are usually complementary to each other. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, texts that can be extracted from image frames, and spoken words that can be deciphered from the audio track. This usually involves segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, and indexing and summarizing the document for efficient retrieval and browsing. We review advances in using audio and visual information jointly for accomplishing the above tasks. We describe audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval. We also describe audio and visual descriptors and description schemes that are being considered by the MPEG-7 standard for multimedia content description.

552 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Robustness (computer science): 94.7K papers, 1.6M citations, 78% related
Noise: 110.4K papers, 1.3M citations, 77% related
Image segmentation: 79.6K papers, 1.8M citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    63
2021    217
2020    525
2019    659
2018    597