scispace - formally typeset
Search or ask a question
Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime, 14245 publications have been published within this topic receiving 271964 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: In this paper, five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers.
Abstract: The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the glottal closure instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliabililty and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation.

241 citations

Journal ArticleDOI
01 Jun 2010
TL;DR: An overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the cocktail party problem.
Abstract: Sparse representations have proved a powerful tool in the analysis and processing of audio signals and already lie at the heart of popular coding standards such as MP3 and Dolby AAC. In this paper we give an overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the ?cocktail party problem.? In each case we will show how the prior assumption that the audio signals are approximately sparse in some time-frequency representation allows us to address the associated signal processing task.

239 citations

Patent
26 Feb 2001
TL;DR: In this paper, a speech manager interface allows the speech recognition process and the text-to-speech process to be accessed by other application processes in handheld electronic devices such as a personal digital assistant (PDA).
Abstract: A handheld electronic device such as a personal digital assistant (PDA) has multiple application processes. A speech recognition process takes input speech from a user and produces a recognition output representative of the input speech. A text-to-speech process takes output text and produces a representative speech output. A speech manager interface allows the speech recognition process and the text-to-speech process to be accessed by other application processes.

239 citations

Journal ArticleDOI
01 Apr 1975
TL;DR: This paper presents several digital signal processing methods for representing speech, including simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finaIly linear predictive coding techniques.
Abstract: This paper presents several digital signal processing methods for representing speech. Included among the representations are simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finaIly linear predictive coding techniques. The advantages and disadvantages of each of these representations for various speech processing applications are discussed.

238 citations

Journal ArticleDOI
TL;DR: The coder structure is described in detail and the reasons behind certain design choices are discussed and a summary of the subjective test results based on a real-time implementation of this version are presented.
Abstract: This paper describes the 8 kb/s speech coding algorithm G.729 which has been standardized by ITU-T. The algorithm is based on a conjugate-structure algebraic CELP (CS-ACELP) coding technique and uses 10 ms speech frames. The codec delivers toll-quality speech (equivalent to 32 kb/s ADPCM) for most operating conditions. This paper describes the coder structure in detail and discusses the reasons behind certain design choices. A 16-b fixed-point version has been developed as part of Recommendation G.729 and a summary of the subjective test results based on a real-time implementation of this version are presented.

236 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202338
202284
202170
202062
201977
2018108