Topic
Speech coding
About: Speech coding is a research topic. Over its lifetime, 14,245 publications have appeared on this topic, receiving 271,964 citations.
Papers published on a yearly basis
Papers
TL;DR: In this book, an introduction to pitch estimation is given and a number of statistical methods for pitch estimation are presented, which include both single- and multi-pitch estimators based on statistical approaches, like maximum likelihood and maximum a posteriori methods.
Abstract: Periodic signals can be decomposed into sets of sinusoids having frequencies that are integer multiples of a fundamental frequency. The problem of finding such fundamental frequencies from noisy observations is important in many speech and audio applications, where it is commonly referred to as pitch estimation. These applications include analysis, compression, separation, enhancement, automatic transcription and many more. In this book, an introduction to pitch estimation is given and a number of statistical methods for pitch estimation are presented. The basic signal models and associated estimation-theoretical bounds are introduced, and the properties of speech and audio signals are discussed and illustrated. The presented methods include both single- and multi-pitch estimators based on statistical approaches, such as maximum likelihood and maximum a posteriori methods, filtering methods based on both static and optimal adaptive designs, and subspace methods based on the principles of subspace orthogonality and shift-invariance. The application of these methods to analysis of speech and audio signals is demonstrated using both real and synthetic signals, and their performance is assessed under various conditions and their properties discussed. Finally, the estimators are compared in terms of computational and statistical efficiency, generalizability and robustness.
72 citations
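The harmonic-summation idea behind such estimators can be illustrated in a few lines: under a white-Gaussian-noise model, an approximate maximum-likelihood single-pitch estimate is the candidate fundamental whose harmonics capture the most spectral power. A minimal NumPy sketch (function names, the candidate grid, and the synthetic signal are illustrative choices, not taken from the book):

```python
import numpy as np

def estimate_pitch(x, fs, f0_grid, num_harmonics=5):
    """Approximate ML single-pitch estimation via harmonic summation:
    pick the candidate f0 whose harmonics capture the most spectral power."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    scores = []
    for f0 in f0_grid:
        # Nearest FFT bin for each harmonic of this candidate fundamental.
        bins = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, num_harmonics + 1)]
        scores.append(spectrum[bins].sum())
    return f0_grid[int(np.argmax(scores))]

# Synthetic periodic signal: f0 = 200 Hz with three harmonics, plus noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
x += 0.1 * rng.standard_normal(len(t))

grid = np.arange(80.0, 400.0, 1.0)
print(estimate_pitch(x, fs, grid))  # close to 200 Hz
```

Summing power over several harmonics, rather than just picking the largest spectral peak, is what makes the estimator resistant to octave (subharmonic/harmonic) confusions.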
07 May 1996
TL;DR: This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.
Abstract: This paper presents a new technique for overcoming several types of speech recognition (SR) errors by post-processing the output of a continuous speech recognizer. The post-processor output contains fewer errors, thereby making interpretation by higher-level modules, such as a parser, in a speech understanding system more reliable. The primary advantage of the post-processing approach over existing approaches for overcoming SR errors lies in its ability to introduce options that are not available in the SR module's output. This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.
72 citations
IBM
TL;DR: In this paper, an Acoustic Processor is used to produce a Mel-Cepstrum Vector and Pitch, which is then recalibrated and encoded over a narrow-band channel.
Abstract: The device and method of the invention receives a digital speech signal, which is processed by an Acoustic Processor to produce a Mel-Cepstrum Vector and Pitch. This is recalibrated and encoded. The encoded signal is transmitted over a narrow-band Channel, then decoded, split and recalibrated. From the split signals, one signal feeds a Statistical Processor which produces Recognized Text. Another signal feeds a Regenerator, which produces Regenerated Speech. The device and method according to the invention achieve simultaneously very perceptive Automatic Speech Recognition and high quality VoCoding, using Speech communicated or stored via a Channel with narrow-bandwidth; very perceptive Automatic Speech Recognition on a Client & Server system without a need to store or to communicate wide-bandwidth Speech signals; very perceptive Automatic Speech Recognition with Deferred Review and Editing without storage of wide-bandwidth Speech signals; better feedback in a system for Automatic Speech Recognition particularly for Deferred Automatic Speech Recognition; and good usability for unified Automatic Speech Recognition and VoCoding.
72 citations
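The Mel-Cepstrum Vector at the heart of this pipeline is the DCT of log energies from a triangular, mel-spaced filterbank applied to a frame's power spectrum. A minimal single-frame sketch (filter counts and constants are conventional choices, not taken from the patent):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_cepstrum(frame, fs, n_filters=20, n_ceps=13):
    """Mel-cepstrum of one windowed frame:
    power spectrum -> triangular mel filterbank -> log -> DCT-II."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # Filter edges equally spaced on the mel scale from 0 Hz to Nyquist.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2))
    energies = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        up = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        down = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        energies[i] = np.sum(spec * np.minimum(up, down))
    log_e = np.log(energies + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    n = np.arange(n_filters)
    return np.array([np.sum(log_e * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)))
                     for k in range(n_ceps)])

t = np.arange(400) / 8000.0
frame = np.hanning(400) * np.sin(2 * np.pi * 440 * t)
ceps = mel_cepstrum(frame, 8000)
```

Because a 13-coefficient vector (plus pitch) is far smaller than the waveform itself, transmitting these features is what makes the narrow-band channel in the patent's architecture feasible.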
TL;DR: These experiments are concerned with the intelligibility of target speech in the presence of a background talker using a noise vocoder and showed that intelligibility was lower when fast single-channel compression was applied to the target and background after mixing rather than before.
Abstract: These experiments are concerned with the intelligibility of target speech in the presence of a background talker. Using a noise vocoder, Stone and Moore [J. Acoust. Soc. Am. 114, 1023-1034 (2003)] showed that single-channel fast-acting compression degraded intelligibility, but slow compression did not. Stone and Moore [J. Acoust. Soc. Am. 116, 2311-2323 (2004)] showed that intelligibility was lower when fast single-channel compression was applied to the target and background after mixing rather than before, and suggested that this was partly due to compression after mixing introducing "comodulation" between the target and background talkers. Experiment 1 here showed a similar effect for multi-channel compression. In experiment 2, intelligibility was measured as a function of the speed of multi-channel compression applied after mixing. For both eight- and 12-channel vocoders with one compressor per channel, intelligibility decreased as compression speed increased. For the eight-channel vocoder, a compressor that only affected modulation depth for rates below 2 Hz still reduced intelligibility. Experiment 3 used 12- or 18-channel vocoders. There were between 1 and 12 compression channels, and four speeds of compression. Intelligibility decreased as the number and speed of compression channels increased. The results are interpreted using several measures of the effects of compression, especially "across-source modulation correlation."
72 citations
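A noise vocoder of the kind used in these experiments replaces the fine structure in each frequency band with noise modulated by that band's temporal envelope. A minimal sketch, assuming crude FFT-domain band-pass filtering and a Hilbert-transform envelope (function names and band edges are illustrative):

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude FFT-domain band-pass filter (sketch only)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, len(x))

def envelope(x):
    """Temporal envelope as the magnitude of the analytic (Hilbert) signal."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

def noise_vocode(x, fs, edges):
    """Replace each band's fine structure with envelope-modulated noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi))
        noise = bandpass_fft(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * noise
    return out

fs = 8000
t = np.arange(0, 0.25, 1.0 / fs)
x = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(x, fs, [100, 500, 1000, 2000, 4000])
```

Because only the band envelopes survive, such a vocoder isolates envelope cues, which is exactly why compressing those envelopes (fast-acting compression) degrades intelligibility in the experiments above.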
19 Apr 1994
TL;DR: The authors use a Gaussian classifier for estimation of the coding condition of a test utterance and the combination of this classifier and coder specific word models yields a high overall recognition performance.
Abstract: Examines the influence of different coders in the range from 64 kbit/sec to 4.8 kbit/sec on both a speaker-independent isolated word recognizer and a speaker verification system. Applying systems trained with 64 kbit/sec data to, for example, the 4.8 kbit/sec data increases the error rate of the word recognizer by a factor of three. For rates below 13 kbit/sec the speaker verification is more affected than the word recognition. The performance improves significantly if word models are provided for the individual coding conditions. Therefore, the authors use a Gaussian classifier for estimating the coding condition of a test utterance. The combination of this classifier and coder-specific word models yields a high overall recognition performance.
72 citations
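The coding-condition classifier can be pictured as one Gaussian per coder, fitted to training feature vectors, with a test utterance assigned to the coder of maximum total log-likelihood. A minimal sketch with diagonal covariances and synthetic features (class and parameter names are illustrative, not from the paper):

```python
import numpy as np

class GaussianCoderClassifier:
    """One diagonal-covariance Gaussian per coding condition,
    chosen by maximum summed log-likelihood over an utterance."""

    def fit(self, feats_by_coder):
        self.labels = list(feats_by_coder)
        self.means = {c: f.mean(axis=0) for c, f in feats_by_coder.items()}
        self.vars_ = {c: f.var(axis=0) + 1e-6 for c, f in feats_by_coder.items()}
        return self

    def log_likelihood(self, x, c):
        m, v = self.means[c], self.vars_[c]
        return -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v, axis=-1)

    def predict(self, frames):
        # Sum per-frame log-likelihoods over the utterance, pick the best coder.
        totals = {c: self.log_likelihood(frames, c).sum() for c in self.labels}
        return max(totals, key=totals.get)

# Synthetic stand-ins for features extracted from two coding conditions.
rng = np.random.default_rng(1)
train = {"64k": rng.normal(0.0, 1.0, (200, 8)),
         "4.8k": rng.normal(2.0, 1.0, (200, 8))}
clf = GaussianCoderClassifier().fit(train)
test_utt = rng.normal(2.0, 1.0, (50, 8))
print(clf.predict(test_utt))  # "4.8k"
```

Once the coding condition is identified this way, the recognizer can switch to the word models trained on that condition, which is the combination the paper credits for its high overall performance.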