Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.

Papers

Journal Article•DOI•

Speech analysis/Synthesis based on a sinusoidal representation

[...]

R.J. McAulay¹, Thomas F. Quatieri¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Aug 1986-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves, which forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.

Abstract: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding [8], [9].
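The peak-picking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes an 8 kHz sample rate, a 256-sample frame, no analysis window, and a synthetic two-sinusoid test frame whose components fall exactly on DFT bins; none of these choices come from the abstract, and the names `dft_magnitudes` and `pick_peaks` are hypothetical. Birth/death tracking and cubic phase interpolation are not shown.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed, not stated in the abstract)
FRAME_LEN = 256      # samples per analysis frame (assumed)

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) transform."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def pick_peaks(mags, floor):
    """Return the bin indices that are local maxima above `floor`."""
    return [
        k for k in range(1, len(mags) - 1)
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]

# Synthetic frame: two cosines placed exactly on bins 16 and 40 (assumption).
frame = [
    1.0 * math.cos(2 * math.pi * 16 * t / FRAME_LEN)
    + 0.5 * math.cos(2 * math.pi * 40 * t / FRAME_LEN + 0.3)
    for t in range(FRAME_LEN)
]

mags = dft_magnitudes(frame)
peaks = pick_peaks(mags, floor=0.1 * max(mags))
peak_freqs_hz = [k * SAMPLE_RATE / FRAME_LEN for k in peaks]
# For a real cosine on an exact bin, amplitude = 2 * |X[k]| / N.
peak_amps = [2 * mags[k] / FRAME_LEN for k in peaks]
print(peaks, peak_freqs_hz, peak_amps)
```

On real speech the components do not sit on exact bins, so a practical analyzer would apply a tapered window and refine each peak location; the sketch only shows the local-maximum selection itself.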

1,659 citations

Proceedings Article•DOI•

Code-excited linear prediction (CELP): High-quality speech at very low bit rates

[...]

Manfred R. Schroeder¹, B. S. Atal²•Institutions (2)

University of Göttingen¹, AT&T²

26 Apr 1985

TL;DR: A code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion, indicating that a random code book has a slight speech quality advantage at low bit rates.

Abstract: We describe in this paper a code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion. Each sample of the innovation sequence is filtered sequentially through two time-varying linear recursive filters, one with a long-delay (related to pitch period) predictor in the feedback loop and the other with a short-delay predictor (related to spectral envelope) in the feedback loop. We code speech, sampled at 8 kHz, in blocks of 5-msec duration. Each block consisting of 40 samples is produced from one of 1024 possible innovation sequences. The bit rate for the innovation sequence is thus 1/4 bit per sample. We compare in this paper several different random and deterministic code books for their effectiveness in providing the optimum innovation sequence in each block. Our results indicate that a random code book has a slight speech quality advantage at low bit rates. Examples of speech produced by the above method will be played at the conference.
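The bit-rate arithmetic and the idea of selecting an innovation sequence from a code book can be sketched as follows. The block size, sample rate, and code book size are taken from the abstract; everything else is an assumption made for illustration: the one-pole filter in `synthesize` is a hypothetical stand-in for the paper's two time-varying predictors, the fidelity criterion is plain squared error with an optimal gain, and the code book is filled with seeded Gaussian noise.

```python
import random

SAMPLE_RATE = 8000     # Hz, from the abstract
BLOCK_MS = 5           # block duration in ms, from the abstract
CODEBOOK_SIZE = 1024   # number of innovation sequences, from the abstract

block_len = SAMPLE_RATE * BLOCK_MS // 1000        # 40 samples per block
bits_per_block = CODEBOOK_SIZE.bit_length() - 1   # log2(1024) = 10 bits
bits_per_sample = bits_per_block / block_len      # 10 / 40 = 0.25

def synthesize(excitation, a=0.9):
    """Hypothetical stand-in filter: y[n] = x[n] + a * y[n-1]."""
    out, prev = [], 0.0
    for x in excitation:
        prev = x + a * prev
        out.append(prev)
    return out

def search_codebook(target, codebook):
    """Exhaustive search for the entry and gain minimizing squared error."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_error = None, 0.0, float("inf")
    for index, code in enumerate(codebook):
        y = synthesize(code)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        # With the optimal gain corr / energy, the residual error is:
        error = target_energy - corr * corr / energy
        if error < best_error:
            best_index, best_gain, best_error = index, corr / energy, error
    return best_index, best_gain

rng = random.Random(0)  # seeded so the sketch is repeatable
codebook = [[rng.gauss(0.0, 1.0) for _ in range(block_len)]
            for _ in range(CODEBOOK_SIZE)]

# Build a target from a known entry so the search result can be checked.
target = synthesize([2.0 * x for x in codebook[123]])
index, gain = search_codebook(target, codebook)
print(block_len, bits_per_block, bits_per_sample, index, round(gain, 6))
```

Ten bits per 40-sample block is the 1/4 bit per sample quoted in the abstract, or 2000 bit/s for the innovation sequence alone at 8 kHz.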

1,343 citations

Journal Article•DOI•

A statistical model-based voice activity detection

[...]

Jongseo Sohn¹, Nam Soo Kim, Wonyong Sung¹•Institutions (1)

Seoul National University¹

01 Jan 1999-IEEE Signal Processing Letters

TL;DR: An effective hang-over scheme is proposed that considers the previous observations by modeling speech occurrences as a first-order Markov process; the resulting VAD performs significantly better than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.

Abstract: In this letter, we develop a robust voice activity detector (VAD) for the application to variable-rate speech coding. The developed VAD employs the decision-directed parameter estimation method for the likelihood ratio test. In addition, we propose an effective hang-over scheme which considers the previous observations by a first-order Markov process modeling of speech occurrences. According to our simulation results, the proposed VAD shows significantly better performances than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
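A likelihood ratio test with decision-directed parameter estimation can be sketched as below. This is a simplified illustration, not the letter's exact detector: it assumes a complex Gaussian model per frequency bin, known noise variances, a Wiener-gain estimate of the previous clean power, and a hypothetical smoothing factor `alpha` and threshold. The Markov-based hang-over scheme is omitted.

```python
import math

def frame_log_likelihood_ratio(power_spectrum, noise_variance,
                               prev_clean_power, alpha=0.98):
    """Return (mean per-bin log likelihood ratio, clean power estimate).

    Assumes complex Gaussian speech and noise coefficients in each bin.
    """
    log_lrs, clean_power = [], []
    for p, lam_n, prev in zip(power_spectrum, noise_variance, prev_clean_power):
        gamma = p / lam_n  # a posteriori SNR
        # Decision-directed a priori SNR: blend the previous frame's
        # estimate with the current maximum-likelihood estimate.
        xi = alpha * prev / lam_n + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        log_lrs.append(gamma * xi / (1.0 + xi) - math.log(1.0 + xi))
        # Wiener-gain estimate of clean power (a simplification).
        clean_power.append((xi / (1.0 + xi)) ** 2 * p)
    return sum(log_lrs) / len(log_lrs), clean_power

THRESHOLD = 1.0  # hypothetical decision threshold

noise_variance = [1.0] * 8
silence = [0.0] * 8
noise_frame = [1.0] * 8      # power equal to the noise variance
speech_frame = [100.0] * 8   # power well above the noise variance

llr_noise, _ = frame_log_likelihood_ratio(noise_frame, noise_variance, silence)
llr_speech, _ = frame_log_likelihood_ratio(speech_frame, noise_variance, silence)
print(llr_noise > THRESHOLD, llr_speech > THRESHOLD)
```

A frame is declared speech when the mean log likelihood ratio exceeds the threshold; in practice the threshold and `alpha` would be tuned on data.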

1,341 citations

Journal Article•DOI•

Subband coding of images

[...]

John W. Woods¹, S. O'Neil¹•Institutions (1)

Rensselaer Polytechnic Institute¹

01 Oct 1986-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A simple yet efficient extension of subband coding to the source coding of images, obtained by specifying the constraints for a set of two-dimensional quadrature mirror filters for a particular frequency-domain partition and showing that these constraints are satisfied by a separable combination of one-dimensional QMF's.

Abstract: Subband coding has become quite popular for the source encoding of speech. This paper presents a simple yet efficient extension of this concept to the source coding of images. We specify the constraints for a set of two-dimensional quadrature mirror filters (QMF's) for a particular frequency-domain partition, and show that these constraints are satisfied by a separable combination of one-dimensional QMF's. Bits are then optimally allocated among the subbands to minimize the mean-squared error for DPCM coding of the subbands. Also, an adaptive technique is developed to allocate the bits within each subband by means of a local variance mask. Optimum quantization is employed with quantizers matched to the Laplacian distribution. Subband coded images are presented along with their signal-to-noise ratios (SNR's). The SNR performance of the subband coder is compared to that of the adaptive discrete cosine transform (DCT), vector quantization, and differential vector quantization for bit rates of 0.67, 1.0, and 2.0 bits per pixel for 256 × 256 monochrome images. The adaptive subband coder has the best SNR performance.
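Bit allocation among subbands to minimize mean-squared error can be illustrated with the classic high-resolution rule, in which each subband receives the average rate plus half the base-2 logarithm of its variance relative to the geometric mean of all variances. The abstract does not give the paper's exact procedure, so this is an assumed textbook formulation for equal-sized subbands; the function name and the example variances are hypothetical.

```python
import math

def allocate_bits(variances, average_bits):
    """High-resolution bit allocation for equal-sized subbands.

    b_k = average_bits + 0.5 * log2(var_k / geometric_mean(variances))
    Results may be fractional or negative; a practical coder would
    clamp and round them, which is not handled here.
    """
    log_vars = [math.log2(v) for v in variances]
    mean_log = sum(log_vars) / len(log_vars)
    return [average_bits + 0.5 * (lv - mean_log) for lv in log_vars]

# Hypothetical subband variances, chosen as powers of two.
variances = [64.0, 16.0, 4.0, 1.0]
bits = allocate_bits(variances, average_bits=2.0)
print(bits)
```

The allocations sum to the number of subbands times the average rate, so the total bit budget is preserved while higher-variance subbands receive more bits.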

1,181 citations

Journal Article•DOI•

WORLD: A vocoder-based high-quality speech synthesis system for real-time applications

[...]

Masanori Morise¹, Fumiya Yokomori¹, Kenji Ozawa¹•Institutions (1)

University of Yamanashi¹

01 Jul 2016-IEICE Transactions on Information and Systems

TL;DR: A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of realtime applications using speech and showed that it was superior to the other systems in terms of both sound quality and processing speed.

Abstract: A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of realtime applications using speech. Speech analysis, manipulation, and synthesis on the basis of vocoders are used in various kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. This new speech synthesis system has not only sound quality but also quick processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech including consonants. Its processing speed was also compared with those of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and the real time factor (RTF) indicated that it was fast enough for real-time processing.

Key words: speech analysis, speech synthesis, vocoder, sound quality, realtime processing
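The real time factor mentioned in the abstract is commonly defined as processing time divided by the duration of the audio processed, with values below 1 indicating faster-than-real-time operation. A minimal sketch under that assumed definition follows; the function name and the timing figures are hypothetical and are not results from the paper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds

# Hypothetical measurement: 0.5 s of computation for 5 s of audio.
rtf = real_time_factor(0.5, 5.0)
is_real_time = rtf < 1.0
print(rtf, is_real_time)
```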

1,025 citations

Performance

Metrics

Papers: 14,368

Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
