Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime, 14245 publications have been published within this topic receiving 271964 citations.


Papers
Proceedings ArticleDOI
01 Dec 2007
TL;DR: Evaluations of a large vocabulary speech decoder under development at Tokyo Institute of Technology, including a technique that runs parts of the decoder on the graphics processor, which can lead to a very significant speed-up.
Abstract: In this paper we present evaluations of the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a fast, scalable, flexible decoder to operate on weighted finite state transducer (WFST) search spaces. Even though the development of the decoder is still in its infancy, we have already implemented an impressive feature set and are achieving good accuracy and speed on a large vocabulary spontaneous speech task. We have developed a technique to allow parts of the decoder to be run on the graphics processor, which can lead to a very significant speed-up.
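The core of WFST decoding described above is token passing: at each frame, every live hypothesis is extended along matching arcs of the transducer, and only the best-scoring token per state is kept. A minimal CPU sketch is below; all states, labels, and scores are illustrative placeholders, and the paper's GPU offloading (e.g. of acoustic scoring) is not shown.

```python
import math

# Toy WFST search space: arcs as (src_state, input_label, dst_state, weight).
# Weights are negative log probabilities; a lower total cost is better.
arcs = [
    (0, "a", 1, 0.5),
    (0, "b", 1, 1.5),
    (1, "a", 2, 0.2),
    (1, "b", 2, 0.9),
]

def viterbi_decode(arcs, start, final, n_frames, frame_scores):
    """Token passing: each frame, extend every live token along matching
    arcs, keeping only the best token per destination state.
    frame_scores[t][label] is the acoustic cost of label at frame t."""
    tokens = {start: (0.0, [])}              # state -> (cost, label path)
    for t in range(n_frames):
        new_tokens = {}
        for state, (cost, path) in tokens.items():
            for src, label, dst, w in arcs:
                if src != state:
                    continue
                c = cost + w + frame_scores[t].get(label, math.inf)
                if dst not in new_tokens or c < new_tokens[dst][0]:
                    new_tokens[dst] = (c, path + [label])
        tokens = new_tokens
    return tokens.get(final)                 # (total cost, best label path)

# Two frames of (hypothetical) acoustic costs per label:
frame_scores = [{"a": 0.1, "b": 1.0}, {"a": 0.3, "b": 0.1}]
best = viterbi_decode(arcs, start=0, final=2, n_frames=2,
                      frame_scores=frame_scores)
```

In a real decoder the per-state loop over arcs is what gets parallelized; keeping only one token per state is the Viterbi approximation that makes large search spaces tractable.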

66 citations

Proceedings ArticleDOI
B. Atal
01 Apr 1986
TL;DR: Two new speech coding algorithms - multi-pulse LPC and stochastic coding (code-excited linear prediction) - have been proposed recently to achieve high quality speech at bit rates below 10 kbits/sec.
Abstract: We will present in this paper some recent developments in low bit rate speech coding research. Two new speech coding algorithms - multi-pulse LPC and stochastic coding (code-excited linear prediction) - have been proposed recently to achieve high quality speech at bit rates below 10 kbits/sec. Both of these algorithms use a linear filter to synthesize speech at the receiver but they differ in the methods used to generate the excitation for the linear filter. The multi-pulse model assumes that the excitation can be represented with sufficient accuracy as a sequence of pulses (typically 4 to 8 pulses every 5 msec). In stochastic coders, the excitation is selected from a random codebook of white Gaussian sequences. The optimum excitation in both these coders is chosen to minimize a subjective error criterion based on properties of human auditory perception. Although these coding algorithms are complex requiring over 10 million multiply-add operations per second, new fast digital signal processor chips offer the possibility of their real-time implementation.
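The stochastic (code-excited) coder described above works by analysis-by-synthesis: each codebook entry is passed through the synthesis filter, and the entry (with an optimal gain) minimizing the error against the target frame is selected. The sketch below uses a toy random codebook and plain squared error; the perceptually weighted error criterion and LPC analysis from the paper are omitted, and all sizes and filter coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic codebook: white Gaussian excitation sequences.
frame_len = 40                    # 5 ms frame at 8 kHz sampling
codebook = rng.standard_normal((64, frame_len))

# Toy all-pole synthesis filter 1/A(z); coefficients are illustrative,
# not the result of actual LPC analysis.
a = np.array([1.0, -0.9])

def synthesize(excitation, a):
    """Run the excitation through the all-pole filter (direct recursion)."""
    out = np.zeros_like(excitation)
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out

def search_codebook(target, codebook, a):
    """Analysis-by-synthesis: pick the codebook entry (and scalar gain)
    whose synthesized output is closest to the target in squared error."""
    best_idx, best_err, best_gain = -1, np.inf, 0.0
    for i, c in enumerate(codebook):
        y = synthesize(c, a)
        gain = np.dot(target, y) / np.dot(y, y)   # optimal gain for entry i
        err = np.sum((target - gain * y) ** 2)
        if err < best_err:
            best_idx, best_err, best_gain = i, err, gain
    return best_idx, best_gain

# Encode a frame that was itself generated from codebook entry 7:
true_excitation = 0.5 * codebook[7]
target = synthesize(true_excitation, a)
idx, gain = search_codebook(target, codebook, a)
```

The exhaustive filter-then-compare loop is exactly why the paper cites a cost of over 10 million multiply-adds per second; practical CELP coders use algebraic tricks to avoid re-filtering every entry.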

66 citations

Proceedings Article
01 Sep 2003
TL;DR: This paper shows how the search-space explosion in large vocabulary example-based recognition can be tackled using a data-driven approach that selects appropriate speech examples as candidates for DTW alignment.
Abstract: The dominant acoustic modeling methodology based on Hidden Markov Models is known to have certain weaknesses. Partial solutions to these flaws have been presented, but the fundamental problem remains: compression of the data to a compact HMM discards useful information such as time dependencies and speaker information. In this paper, we look at pure example based recognition as a solution to this problem. By replacing the HMM with the underlying examples, all information in the training data is retained. We show how information about speaker and environment can be used, introducing a new interpretation of adaptation. The basis for the recognizer is the well-known DTW algorithm, which has often been used for small tasks. However, large vocabulary speech recognition introduces new demands, resulting in an explosion of the search space. We show how this problem can be tackled using a data-driven approach which selects appropriate speech examples as candidates for DTW alignment.
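The DTW algorithm underlying the recognizer above aligns two feature sequences of different lengths by dynamic programming over insertions, deletions, and matches. A minimal sketch, with Euclidean frame distance standing in for whatever local distance the paper actually uses:

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two feature sequences.
    x: (n, d) array and y: (m, d) array of frame-level feature vectors."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Two toy 1-D feature sequences of different lengths:
x = np.array([[0.0], [1.0], [2.0], [1.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0], [1.0]])
dist = dtw_distance(x, y)
```

This O(nm) recurrence per template pair is what explodes at large vocabulary scale, which is why the paper's data-driven preselection of candidate examples matters.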

66 citations

Journal ArticleDOI
TL;DR: A new stochastic model for generating speech signals suitable for coding at low bit rates is described, in which the speech waveform is represented as a zero mean Gaussian process with slowly-varying power spectrum.

66 citations

Journal ArticleDOI
TL;DR: Real-time synthesis of vowels and consonants was achieved with good intelligibility, opening the way to future speech BCI applications using such an articulatory-based speech synthesizer.
Abstract: Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
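The articulatory-to-acoustic mapping above is a frame-wise regression from sensor positions to acoustic parameters. A minimal sketch on synthetic stand-in data, fitting a linear map by least squares where the paper uses a DNN (the channel counts and data here are invented placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: 8 articulatory channels (e.g. x/y positions of
# tongue, jaw, velum, and lip sensors) mapped to 12 acoustic parameters
# (e.g. vocoder spectral features). All values are random placeholders.
n_frames, n_artic, n_acoustic = 500, 8, 12
artic = rng.standard_normal((n_frames, n_artic))
true_map = rng.standard_normal((n_artic, n_acoustic))
acoustic = artic @ true_map + 0.01 * rng.standard_normal((n_frames, n_acoustic))

# Fit the articulatory-to-acoustic mapping. A single least-squares matrix
# stands in here for the paper's DNN, which stacks nonlinear layers.
w, *_ = np.linalg.lstsq(artic, acoustic, rcond=None)

# Map articulatory frames to acoustic parameters and measure the fit:
pred = artic @ w
rmse = np.sqrt(np.mean((pred - acoustic) ** 2))
```

In the real system each predicted acoustic frame would then be handed to a vocoder to produce audio, and the short calibration step for new speakers amounts to re-estimating such a mapping (or a correction to it) from a small amount of data.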

66 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance Metrics
No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108