scispace - formally typeset
Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Book ChapterDOI
15 Apr 1996
TL;DR: Tests on small isolated-word vocabularies using a dynamic time warping based audio-visual recogniser demonstrate that real-time, contour-based lip tracking can be used to supplement acoustic-only speech recognisers, enabling robust recognition of speech in the presence of acoustic noise.
Abstract: Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations it is now possible to track human faces and parts of faces in real-time without special hardware. This paper describes a real-time lip tracker that uses a Kalman filter based dynamic contour to track the outline of the lips. Two alternative lip trackers, one that tracks lips from a profile view and the other from a frontal view, were developed to extract visual speech recognition features from the lip contour. In both cases, visual features have been incorporated into an acoustic automatic speech recogniser. Tests on small isolated-word vocabularies using a dynamic time warping based audio-visual recogniser demonstrate that real-time, contour-based lip tracking can be used to supplement acoustic-only speech recognisers, enabling robust recognition of speech in the presence of acoustic noise.
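The Kalman-filter tracking idea in the abstract above can be sketched in miniature. The snippet below is a hypothetical, minimal constant-velocity Kalman filter tracking a single lip-contour point in one dimension; the actual system tracks a full spline contour, and all matrices, noise values, and the `kalman_track` function are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over noisy 1-D position measurements."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (position, velocity)
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise covariance (assumed)
    R = np.array([[r]])                     # measurement noise covariance (assumed)
    x = np.array([measurements[0], 0.0])    # initial state: first measurement, zero velocity
    P = np.eye(2)                           # initial state covariance
    estimates = []
    for z in measurements:
        # predict step
        x = F @ x
        P = F @ P @ F.T + Q
        # update step with the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)

# Noisy observations of a point moving at constant velocity
rng = np.random.default_rng(0)
true_pos = np.arange(50) * 0.5
noisy = true_pos + rng.normal(0, 0.5, size=50)
est = kalman_track(noisy)
```

Because the motion model matches the true dynamics, the filtered estimates should track the true trajectory more closely than the raw noisy measurements once the filter has converged.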

96 citations

PatentDOI
TL;DR: A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned against speech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group.
Abstract: A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned against speech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group. A decision tree classifies speech sounds into such groups, and related speech sound groups descend from common tree nodes. New speech samples time aligned against a given speech sound group's model update models of related speech sound groups, decreasing the training data required to adapt the system. The phonetic context classifications can be based on knowledge of which contextual features are associated with acoustic similarity. The computerized system samples speech sounds using a first, larger, parameter set; automatically selects combinations of phonetic context classifications which divide the speech sounds into groups whose frames are acoustically similar, such as by use of a decision tree; selects a second, smaller, set of parameters based on that set's ability to separate the frames aligned with each speech sound group, such as by use of linear discriminant analysis; and then uses these new parameters to represent frames and speech sound models. Then, using the new parameters, a decision tree classifier can be used to re-classify the speech sounds and to calculate new acoustic models for the resulting groups of speech sounds.
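The core tree-building step described above, choosing the phonetic-context question that splits a phone's training frames into the two acoustically most homogeneous groups, can be illustrated with a toy example. Everything below (the scalar frame feature, the candidate questions, the variance-based cost) is an invented sketch of that idea, not the patent's actual algorithm.

```python
import numpy as np

def within_group_cost(frames):
    """Summed squared deviation from the group mean (0 for an empty group)."""
    return float(np.sum((frames - frames.mean()) ** 2)) if len(frames) else 0.0

def best_question(samples, questions):
    """samples: list of (left_context, frame_feature) pairs.
    questions: dict mapping question name -> set of left contexts answering 'yes'.
    Returns the question giving the lowest summed within-group cost."""
    best, best_cost = None, np.inf
    for name, yes_set in questions.items():
        yes = np.array([f for c, f in samples if c in yes_set])
        no = np.array([f for c, f in samples if c not in yes_set])
        cost = within_group_cost(yes) + within_group_cost(no)
        if cost < best_cost:
            best, best_cost = name, cost
    return best, best_cost

# Toy frames of one phone: nasal left-contexts shift the feature, vowels do not
samples = [('m', 5.1), ('n', 5.0), ('m', 4.9), ('a', 1.0), ('i', 1.2), ('e', 0.9)]
questions = {
    'left_is_nasal': {'m', 'n'},
    'left_is_front_vowel': {'i', 'e'},
}
q, cost = best_question(samples, questions)
```

Splitting on the nasal-context question separates the two acoustic clusters cleanly, so it wins; applying this selection recursively at each node is what grows the decision tree.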

95 citations

Journal ArticleDOI
P. Mermelstein
TL;DR: A tutorial discussion is provided of the adaptive differential PCM (pulse-code modulation) coding method recommended by the group, which covers the subjective performance tests performed, mode initialization and mode switching, data-speed multiplexing, and communication between narrowband and wideband terminals.
Abstract: CCITT Study Group XVIII recognized the need for a new international coding standard on high-quality audio to allow interconnection of diverse switching, transmission, and terminal equipment and organized an expert group in 1983 to recommend an appropriate coding technique. A tutorial discussion is provided of the adaptive differential PCM (pulse-code modulation) coding method recommended by the group. The discussion covers the subjective performance tests performed, mode initialization and mode switching, data-speed multiplexing, and communication between narrowband and wideband terminals.
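The predict/quantize/adapt loop at the heart of ADPCM can be illustrated with an even simpler relative: 1-bit adaptive delta modulation with a CVSD-style step-size rule. This is a hedged sketch of the family of techniques, not the CCITT-recommended algorithm, and all constants (`step0`, growth and shrink factors, step limits) are invented for clarity.

```python
import numpy as np

def adm_codec(x, step0=0.1, grow=1.5, shrink=0.5, lo=0.01, hi=1.0):
    """Encode then decode with 1-bit adaptive delta modulation.
    Returns (bit stream, reconstructed signal)."""
    bits, recon = [], []
    pred, step, prev_bit = 0.0, step0, 0
    for s in x:
        bit = 1 if s >= pred else -1       # 1-bit "quantizer": sign of prediction error
        # CVSD-style adaptation: runs of identical bits mean slope overload,
        # so grow the step; alternating bits mean granular noise, so shrink it
        step = min(max(step * (grow if bit == prev_bit else shrink), lo), hi)
        pred += bit * step                 # the decoder tracks this same predictor
        bits.append(bit)
        recon.append(pred)
        prev_bit = bit
    return bits, np.array(recon)

t = np.linspace(0, 1, 400)
x = np.sin(2 * np.pi * 5 * t)              # smooth test tone standing in for speech
bits, y = adm_codec(x)
```

The single bit per sample is enough because the adaptive step size lets the coder speed up on steep signal segments and settle down on flat ones; full ADPCM refines the same idea with a multi-bit quantizer and a better predictor.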

95 citations

Proceedings ArticleDOI
26 Nov 1996
TL;DR: It is observed that wavelets concentrate speech energy into bands which differentiate between voiced and unvoiced speech, and it is shown that the Battle-Lemarie wavelet concentrates more than 97.5% of the signal energy into the approximation part of the coefficients.
Abstract: The trend towards real-time, low-bit-rate speech coders dictates current research efforts in speech compression. A method being evaluated uses wavelets for speech analysis and synthesis. Distinguishing between voiced and unvoiced speech, determining pitch, and methods for choosing optimum wavelets for speech compression are discussed. We observe that wavelets concentrate speech energy into bands which differentiate between voiced and unvoiced speech. Optimum wavelets are selected based on energy conservation properties in the approximation part of the wavelet coefficients. It is shown that the Battle-Lemarie wavelet concentrates more than 97.5% of the signal energy into the approximation part of the coefficients, followed closely by the Daubechies D20, D12, D10 or D8 wavelets. The Haar wavelets are the worst. Listening tests show that the Daubechies 10 preserves perceptual information better than other Daubechies wavelets and, indeed, a host of other orthogonal wavelets. Pitch periods and evolution can be identified from contour plots of coefficients obtained at several scales.
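The energy-concentration measure used to rank wavelets above can be sketched with a one-level Haar transform implemented directly in numpy (the paper compares Battle-Lemarie and Daubechies wavelets, which need a full wavelet toolbox; the `approx_energy_fraction` helper and the voiced/unvoiced stand-in signals are assumptions for illustration).

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: orthonormal low-pass / high-pass split."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                                 # pad odd-length input
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)    # approximation (low-pass)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)    # detail (high-pass)
    return approx, detail

def approx_energy_fraction(x):
    """Fraction of total signal energy captured by the approximation part."""
    a, d = haar_dwt(x)
    ea, ed = np.sum(a ** 2), np.sum(d ** 2)
    return ea / (ea + ed)

t = np.linspace(0, 1, 1024, endpoint=False)
voiced_like = np.sin(2 * np.pi * 8 * t)            # smooth, low-frequency stand-in
rng = np.random.default_rng(1)
unvoiced_like = rng.normal(size=1024)              # noise-like stand-in
frac_voiced = approx_energy_fraction(voiced_like)
frac_unvoiced = approx_energy_fraction(unvoiced_like)
```

Because the transform is orthonormal, the approximation and detail energies sum to the signal energy; a smooth voiced-like signal pushes nearly all of it into the approximation band, while noise-like unvoiced material splits roughly evenly, which is exactly the voiced/unvoiced contrast the abstract describes.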

95 citations

Patent
09 Jan 2004
TL;DR: In this article, an audio-visual speech activity recognition system (200b/c) of a video-enabled telecommunication device is described; it runs a real-time lip tracking application that can advantageously be used for a near-speaker detection algorithm in an environment where a speaker's voice is interfered with by statistically distributed background noise (n'(t)) including both environmental noise and surrounding persons' voices.
Abstract: The present invention generally relates to the field of noise reduction systems which are equipped with an audio-visual user interface, in particular to an audio-visual speech activity recognition system (200b/c) of a video-enabled telecommunication device which runs a real-time lip tracking application that can advantageously be used for a near-speaker detection algorithm in an environment where a speaker's voice is interfered with by statistically distributed background noise (n'(t)) including both environmental noise (n(t)) and surrounding persons' voices.
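The near-speaker idea in this patent, rejecting audio that is not accompanied by local lip motion, can be caricatured in a few lines. The frame features, thresholds, and the `near_speaker_frames` function below are all hypothetical illustrations of the fusion principle, not the patented algorithm.

```python
import numpy as np

def near_speaker_frames(audio_energy, lip_motion, e_thresh=0.5, m_thresh=0.2):
    """Declare speech activity only when audio energy AND tracked lip motion
    are both high, so background voices (energy without local lip motion)
    are rejected. Thresholds are invented for illustration."""
    audio_energy = np.asarray(audio_energy, dtype=float)
    lip_motion = np.asarray(lip_motion, dtype=float)
    return (audio_energy > e_thresh) & (lip_motion > m_thresh)

# Four frames: silence, near speaker talking, background voice only,
# lips moving silently (e.g. chewing)
energy = [0.1, 0.9, 0.8, 0.2]
motion = [0.0, 0.6, 0.0, 0.5]
active = near_speaker_frames(energy, motion)
```

Only the frame where both cues agree is flagged as near-speaker speech; either cue alone, loud background audio or silent lip motion, is rejected.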

95 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations (86% related)
Decoding methods: 65.7K papers, 900K citations (84% related)
Fading: 55.4K papers, 1M citations (80% related)
Feature vector: 48.8K papers, 954.4K citations (80% related)
Feature extraction: 111.8K papers, 2.1M citations (80% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108