Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime, 14245 publications have been published within this topic receiving 271964 citations.


Papers
Proceedings ArticleDOI
01 Dec 2007
TL;DR: Evaluations of a large vocabulary speech decoder under development at Tokyo Institute of Technology, including a technique that runs parts of the decoder on the graphics processor, which can lead to a very significant speed-up.
Abstract: In this paper we present evaluations of the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a fast, scalable, flexible decoder to operate on weighted finite state transducer (WFST) search spaces. Even though the development of the decoder is still in its infancy, we have already implemented an impressive feature set and are achieving good accuracy and speed on a large vocabulary spontaneous speech task. We have developed a technique to allow parts of the decoder to be run on the graphics processor, which can lead to a very significant speed-up.
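The core of WFST decoding described above is token passing: at each frame, every live hypothesis is extended along matching arcs of the transducer, and only the best-scoring token per state is kept. A minimal CPU sketch is below; all states, labels, and scores are illustrative placeholders, and the paper's GPU offloading (e.g. of acoustic scoring) is not shown.

```python
import math

# Toy WFST search space: arcs as (src_state, input_label, dst_state, weight).
# Weights are negative log probabilities; a lower total cost is better.
arcs = [
    (0, "a", 1, 0.5),
    (0, "b", 1, 1.5),
    (1, "a", 2, 0.2),
    (1, "b", 2, 0.9),
]

def viterbi_decode(arcs, start, final, n_frames, frame_scores):
    """Token passing: each frame, extend every live token along matching
    arcs, keeping only the best token per destination state.
    frame_scores[t][label] is the acoustic cost of label at frame t."""
    tokens = {start: (0.0, [])}              # state -> (cost, label path)
    for t in range(n_frames):
        new_tokens = {}
        for state, (cost, path) in tokens.items():
            for src, label, dst, w in arcs:
                if src != state:
                    continue
                c = cost + w + frame_scores[t].get(label, math.inf)
                if dst not in new_tokens or c < new_tokens[dst][0]:
                    new_tokens[dst] = (c, path + [label])
        tokens = new_tokens
    return tokens.get(final)                 # (total cost, best label path)

# Two frames of (hypothetical) acoustic costs per label:
frame_scores = [{"a": 0.1, "b": 1.0}, {"a": 0.3, "b": 0.1}]
best = viterbi_decode(arcs, start=0, final=2, n_frames=2,
                      frame_scores=frame_scores)
```

In a real decoder the per-state loop over arcs is what gets parallelized; keeping only one token per state is the Viterbi approximation that makes large search spaces tractable.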

66 citations

Proceedings ArticleDOI
B. Atal
01 Apr 1986
TL;DR: Two new speech coding algorithms - multi-pulse LPC and stochastic coding (code-excited linear prediction) - have been proposed recently to achieve high quality speech at bit rates below 10 kbits/sec.
Abstract: We will present in this paper some recent developments in low bit rate speech coding research. Two new speech coding algorithms - multi-pulse LPC and stochastic coding (code-excited linear prediction) - have been proposed recently to achieve high quality speech at bit rates below 10 kbits/sec. Both of these algorithms use a linear filter to synthesize speech at the receiver but they differ in the methods used to generate the excitation for the linear filter. The multi-pulse model assumes that the excitation can be represented with sufficient accuracy as a sequence of pulses (typically 4 to 8 pulses every 5 msec). In stochastic coders, the excitation is selected from a random codebook of white Gaussian sequences. The optimum excitation in both these coders is chosen to minimize a subjective error criterion based on properties of human auditory perception. Although these coding algorithms are complex requiring over 10 million multiply-add operations per second, new fast digital signal processor chips offer the possibility of their real-time implementation.
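The stochastic (code-excited) coder described above works by analysis-by-synthesis: each codebook entry is passed through the synthesis filter, and the entry (with an optimal gain) minimizing the error against the target frame is selected. The sketch below uses a toy random codebook and plain squared error; the perceptually weighted error criterion and LPC analysis from the paper are omitted, and all sizes and filter coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic codebook: white Gaussian excitation sequences.
frame_len = 40                    # 5 ms frame at 8 kHz sampling
codebook = rng.standard_normal((64, frame_len))

# Toy all-pole synthesis filter 1/A(z); coefficients are illustrative,
# not the result of actual LPC analysis.
a = np.array([1.0, -0.9])

def synthesize(excitation, a):
    """Run the excitation through the all-pole filter (direct recursion)."""
    out = np.zeros_like(excitation)
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out

def search_codebook(target, codebook, a):
    """Analysis-by-synthesis: pick the codebook entry (and scalar gain)
    whose synthesized output is closest to the target in squared error."""
    best_idx, best_err, best_gain = -1, np.inf, 0.0
    for i, c in enumerate(codebook):
        y = synthesize(c, a)
        gain = np.dot(target, y) / np.dot(y, y)   # optimal gain for entry i
        err = np.sum((target - gain * y) ** 2)
        if err < best_err:
            best_idx, best_err, best_gain = i, err, gain
    return best_idx, best_gain

# Encode a frame that was itself generated from codebook entry 7:
true_excitation = 0.5 * codebook[7]
target = synthesize(true_excitation, a)
idx, gain = search_codebook(target, codebook, a)
```

The exhaustive filter-then-compare loop is exactly why the paper cites a cost of over 10 million multiply-adds per second; practical CELP coders use algebraic tricks to avoid re-filtering every entry.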

66 citations

Proceedings Article
01 Sep 2003
TL;DR: This paper shows how the search-space explosion in large vocabulary example-based recognition can be tackled using a data-driven approach that selects appropriate speech examples as candidates for DTW alignment.
Abstract: The dominant acoustic modeling methodology based on Hidden Markov Models is known to have certain weaknesses. Partial solutions to these flaws have been presented, but the fundamental problem remains: compression of the data to a compact HMM discards useful information such as time dependencies and speaker information. In this paper, we look at pure example based recognition as a solution to this problem. By replacing the HMM with the underlying examples, all information in the training data is retained. We show how information about speaker and environment can be used, introducing a new interpretation of adaptation. The basis for the recognizer is the well-known DTW algorithm, which has often been used for small tasks. However, large vocabulary speech recognition introduces new demands, resulting in an explosion of the search space. We show how this problem can be tackled using a data-driven approach which selects appropriate speech examples as candidates for DTW alignment.
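The DTW algorithm underlying the recognizer above aligns two feature sequences of different lengths by dynamic programming over insertions, deletions, and matches. A minimal sketch, with Euclidean frame distance standing in for whatever local distance the paper actually uses:

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two feature sequences.
    x: (n, d) array and y: (m, d) array of frame-level feature vectors."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Two toy 1-D feature sequences of different lengths:
x = np.array([[0.0], [1.0], [2.0], [1.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0], [1.0]])
dist = dtw_distance(x, y)
```

This O(nm) recurrence per template pair is what explodes at large vocabulary scale, which is why the paper's data-driven preselection of candidate examples matters.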

66 citations

Journal ArticleDOI
TL;DR: A new stochastic model for generating speech signals suitable for coding at low bit rates is described, in which the speech waveform is represented as a zero mean Gaussian process with slowly-varying power spectrum.

66 citations

Journal ArticleDOI
TL;DR: Real-time synthesis of vowels and consonants was achieved with good intelligibility, opening the way to future speech BCI applications using such an articulatory-based speech synthesizer.
Abstract: Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
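The articulatory-to-acoustic mapping above is a frame-wise regression from sensor positions to acoustic parameters. A minimal sketch on synthetic stand-in data, fitting a linear map by least squares where the paper uses a DNN (the channel counts and data here are invented placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: 8 articulatory channels (e.g. x/y positions of
# tongue, jaw, velum, and lip sensors) mapped to 12 acoustic parameters
# (e.g. vocoder spectral features). All values are random placeholders.
n_frames, n_artic, n_acoustic = 500, 8, 12
artic = rng.standard_normal((n_frames, n_artic))
true_map = rng.standard_normal((n_artic, n_acoustic))
acoustic = artic @ true_map + 0.01 * rng.standard_normal((n_frames, n_acoustic))

# Fit the articulatory-to-acoustic mapping. A single least-squares matrix
# stands in here for the paper's DNN, which stacks nonlinear layers.
w, *_ = np.linalg.lstsq(artic, acoustic, rcond=None)

# Map articulatory frames to acoustic parameters and measure the fit:
pred = artic @ w
rmse = np.sqrt(np.mean((pred - acoustic) ** 2))
```

In the real system each predicted acoustic frame would then be handed to a vocoder to produce audio, and the short calibration step for new speakers amounts to re-estimating such a mapping (or a correction to it) from a small amount of data.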

66 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance Metrics
No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108