scispace - formally typeset

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Proceedings ArticleDOI
30 Oct 1999
TL;DR: A real-time audio segmentation and indexing scheme that can be applied to almost any content-based audio management system and achieves an accuracy rate of more than 90% for audio classification is presented.
Abstract: A real-time audio segmentation and indexing scheme is presented in this paper. Audio recordings are segmented and classified into basic audio types such as silence, speech, music, song, environmental sound, speech with a music background, environmental sound with a music background, etc. Simple audio features such as the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak track are adopted in this system to ensure on-line processing. Morphological and statistical analyses of the temporal curves of these features are performed to show differences among the audio types. A heuristic rule-based procedure is then developed to segment and classify audio signals by using these features. The proposed approach is generic and model free. It can be applied to almost any content-based audio management system. It is shown that the proposed scheme achieves an accuracy rate of more than 90% for audio classification. Examples for segmentation and indexing of accompanying audio signals in movies and video programs are also provided.

89 citations
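The simple frame-level features named in the abstract above (short-time energy and average zero-crossing rate) can be sketched as follows. The frame length, hop size, and 440 Hz test tone are illustrative choices, not the paper's actual settings.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Energy function: mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Average zero-crossing rate: fraction of adjacent-sample sign changes."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

# Toy check: a pure 440 Hz tone at 16 kHz crosses zero ~880 times/s,
# i.e. a per-sample ZCR of about 880/16000 = 0.055.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
frames = frame_signal(tone)
```

A rule-based classifier like the paper's would threshold such curves, e.g. near-zero energy marks silence, while a high and stable ZCR suggests noise-like content rather than voiced speech.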

Journal Article
TL;DR: All aspects of this standardization effort are outlined, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.
Abstract: In early 2012, the ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard. The new codec brings together the previously separated worlds of general audio coding and speech coding. It does so by integrating elements from audio coding and speech coding into a unified system. The present publication outlines all aspects of this standardization effort, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.

88 citations

Journal ArticleDOI
S. Ikeda1, A. Sugiyama
TL;DR: Computer simulation results using speech and diesel engine noise recorded in a special-purpose vehicle show that the proposed adaptive noise canceller reduces signal distortion in the output signal by up to 15 dB compared with a conventional ANC.
Abstract: This paper proposes an adaptive noise canceller (ANC) with low signal distortion for speech codecs. The proposed ANC has two adaptive filters: a main filter (MF) and a subfilter (SF). The signal-to-noise ratio (SNR) of input signals is estimated using the SF. To reduce signal distortion in the output signal of the ANC, a step size for coefficient update in the MF is controlled according to the estimated SNR. Computer simulation results using speech and diesel engine noise recorded in a special-purpose vehicle show that the proposed ANC reduces signal distortion in the output signal by up to 15 dB compared with a conventional ANC. Results of subjective listening tests show that the mean opinion scores (MOSs) for the proposed ANC with and without a speech codec are one point higher than the scores for the conventional ANC.

88 citations
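The core idea above, scaling the coefficient-update step size according to an estimated SNR so the filter adapts cautiously when speech dominates, can be illustrated with a minimal NLMS sketch. The `snr_to_step_size` mapping and all of its constants are hypothetical; the paper does not publish its control rule, and the two-filter (MF/SF) structure is omitted here.

```python
import numpy as np

def nlms_step(w, x_buf, d, mu, eps=1e-8):
    """One normalized-LMS update: e = d - w.x, then w += mu * e * x / ||x||^2."""
    e = d - w @ x_buf
    w += mu * e * x_buf / (x_buf @ x_buf + eps)
    return e

def snr_to_step_size(snr_db, mu_max=1.0, mu_min=0.05):
    """Hypothetical control rule: shrink the step size as estimated SNR rises,
    trading adaptation speed for lower distortion of the speech signal."""
    alpha = np.clip((snr_db + 10) / 40, 0.0, 1.0)  # map [-10, 30] dB -> [0, 1]
    return mu_max - alpha * (mu_max - mu_min)

# Demo: identify an unknown 3-tap noise path from a noise reference.
rng = np.random.default_rng(0)
ref = rng.standard_normal(20000)           # noise reference (e.g. engine pickup)
h = np.array([0.6, -0.3, 0.1])             # unknown acoustic path
primary = np.convolve(ref, h)[:len(ref)]   # noise as heard at the main mic
w = np.zeros(8)
mu = snr_to_step_size(snr_db=-5.0)         # low SNR -> aggressive adaptation
for n in range(7, len(ref)):
    nlms_step(w, ref[n - 7 : n + 1][::-1], primary[n], mu)
```

After convergence, `w[:3]` approximates `h`, so subtracting the filter output from the primary signal cancels the noise.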

BookDOI
01 Jan 1991

87 citations

Proceedings ArticleDOI
04 May 2014
TL;DR: It is demonstrated that distortion caused by reverberation is substantially attenuated by the DNN, whose outputs can be resynthesized to the dereverberated speech signal.
Abstract: Reverberation distorts human speech and usually has negative effects on speech intelligibility, especially for hearing-impaired listeners. It also causes performance degradation in automatic speech recognition and speaker identification systems. Therefore, the dereverberation problem must be dealt with in daily listening environments. We propose to use deep neural networks (DNNs) to learn a spectral mapping from the reverberant speech to the anechoic speech. The trained DNN produces the estimated spectral representation of the corresponding anechoic speech. We demonstrate that distortion caused by reverberation is substantially attenuated by the DNN, whose outputs can be resynthesized to the dereverberated speech signal. The proposed approach is simple, and our systematic evaluation shows promising dereverberation results, which are significantly better than those of related systems.

87 citations
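The spectral-mapping-and-resynthesis pipeline described above can be sketched as follows, with an identity stub standing in for the trained DNN. The STFT parameters and the reuse of the reverberant phase for resynthesis are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Windowed short-time Fourier transform, one frame per row."""
    win = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=512, hop=128):
    """Overlap-add inverse STFT with squared-window normalization."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(S, axis=1) * win
    out = np.zeros((len(S) - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + n_fft] += f
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def dnn_spectral_mapping(log_mag):
    """Placeholder for the trained DNN: maps reverberant log-magnitude
    frames to estimated anechoic log-magnitude frames. Identity here."""
    return log_mag

def dereverberate(x):
    S = stft(x)
    est_mag = np.exp(dnn_spectral_mapping(np.log(np.abs(S) + 1e-8)))
    # Resynthesize the estimated magnitude with the reverberant phase.
    return istft(est_mag * np.exp(1j * np.angle(S)))

# Round-trip sanity check: with the identity stub, output ~ input.
x = np.sin(2 * np.pi * 0.01 * np.arange(4000))
y = dereverberate(x)
```

In a real system the stub would be replaced by a network trained on paired reverberant/anechoic log-magnitude frames, typically with several frames of temporal context as input.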


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023      38
2022      84
2021      70
2020      62
2019      77
2018     108