Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Proceedings ArticleDOI
21 Apr 1997
TL;DR: The GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system provides wireline quality not only for error-free conditions but also for the most typical error conditions.
Abstract: This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec has been jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for error protection. Speech coding is based on the ACELP algorithm (algebraic code excited linear prediction). The codec provides substantial quality improvement compared to the existing GSM full rate and half rate codecs. The old GSM codecs lack wireline quality even in error-free channel conditions, while the EFR codec provides wireline quality not only for error-free conditions but also for the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile-to-mobile calls).
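The bit allocation quoted above works out neatly per frame. A minimal arithmetic sketch, assuming the standard 20 ms GSM speech frame (the frame duration is not stated in the abstract):

```python
# Sketch: bit budget of the GSM EFR codec per 20 ms speech frame,
# using the rates quoted in the abstract (12.2 kbit/s source + 10.6 kbit/s protection).
FRAME_SEC = 0.020              # assumed GSM speech frame duration: 20 ms

speech_rate = 12_200           # bit/s used for speech (source) coding
protection_rate = 10_600       # bit/s used for channel error protection

speech_bits = speech_rate * FRAME_SEC          # 244 bits of ACELP parameters per frame
protection_bits = protection_rate * FRAME_SEC  # 212 bits of redundancy per frame
gross_bits = speech_bits + protection_bits     # 456 bits -> 22.8 kbit/s gross channel rate

print(f"{speech_bits:.0f} source bits + {protection_bits:.0f} protection bits "
      f"= {gross_bits:.0f} bits per frame ({(gross_bits / FRAME_SEC) / 1000:.1f} kbit/s)")
```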

84 citations

Proceedings ArticleDOI
15 Apr 2007
TL;DR: A supervised learning approach to monaural segregation of reverberant voiced speech is proposed, which learns to map from a set of pitch-based auditory features to a grouping cue encoding the posterior probability of a time-frequency (T-F) unit being target dominant given observed features.
Abstract: Room reverberation degrades speech signals and poses a major challenge to current monaural speech segregation systems. Previous research relies on inverse filtering as a front-end for partially restoring the harmonicity of the reverberant signal. We show that the inverse filtering approach is sensitive to different room configurations, and hence undesirable in general reverberation conditions. We propose a supervised learning approach to map a set of harmonic features into a pitch-based grouping cue for each time-frequency (T-F) unit. We use a speech segregation method to estimate an ideal binary T-F mask, which retains the reverberant mixture in a local T-F unit if and only if the target energy is stronger than the interference energy. Results show that our approach improves the segregation performance considerably.
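The mask criterion in the last sentences can be written down directly. A minimal numpy sketch of the ideal binary T-F mask rule (keep target-dominant units, discard the rest); the auditory filterbank front-end, the pitch-based features, and the learned posterior mapping from the paper are omitted, and the array shapes are illustrative assumptions:

```python
import numpy as np

def ideal_binary_mask(target_energy, interference_energy):
    """Ideal binary T-F mask: 1 where target energy exceeds interference energy.

    Both inputs are (frequency_channels x time_frames) energy arrays, e.g. from a
    gammatone filterbank or an STFT; the exact front-end is not specified here.
    """
    return (target_energy > interference_energy).astype(float)

# Toy usage with random "energies" standing in for a real T-F decomposition.
rng = np.random.default_rng(0)
target = rng.random((64, 100))          # 64 channels x 100 frames (assumed shapes)
interference = rng.random((64, 100))
mixture = target + interference          # energies of an (idealised) additive mixture

mask = ideal_binary_mask(target, interference)
segregated = mask * mixture              # keep mixture energy only in target-dominant units
```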

84 citations

Book
02 Jan 2008
TL;DR: This book is a thorough reference to the 3GPP and MPEG Parametric Stereo standards and to the MPEG Surround multi-channel audio coding standard, and it describes key developments in coding techniques, which are an important factor in the optimization of advanced entertainment, communications and signal processing applications.
Abstract: This book collects a wealth of information about spatial audio coding into one comprehensible volume. It is a thorough reference to the 3GPP and MPEG Parametric Stereo standards and the MPEG Surround multi-channel audio coding standard. It describes key developments in coding techniques, which are an important factor in the optimization of advanced entertainment, communications and signal processing applications. Until recently, technologies for coding audio signals, such as redundancy reduction and sophisticated source and receiver models, did not incorporate spatial characteristics of source and receiving ends. Spatial audio coding achieves much higher compression ratios than conventional coders. It does this by representing multi-channel audio signals as a downmix signal plus side information that describes the perceptually relevant spatial information. Written by experts in spatial audio coding, Spatial Audio Processing: reviews psychoacoustics (the relationship between physical measures of sound and the corresponding percepts) and spatial audio sound formats and reproduction systems; brings together the processing, acquisition, mixing, playback, and perception of spatial audio, with the latest coding techniques; analyses algorithms for the efficient manipulation of multiple, discrete and combined spatial audio channels, including both MP3 and MPEG Surround; shows how the same insights on source and receiver models can also be applied for manipulation of audio signals, such as the synthesis of virtual auditory scenes employing head-related transfer function (HRTF) processing and stereo to N-channel audio upmix. Audio processing research engineers and audio coding research and implementation engineers will find this an insightful guide. Academic audio and psychoacoustic researchers, including post-graduate and third/fourth year students taking courses in signal processing, audio and speech processing, and telecommunications, will also benefit from the information inside.
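To illustrate the downmix-plus-side-information principle mentioned in the abstract, here is a toy sketch: a mono downmix plus one inter-channel level difference (ILD) per frequency band as side information. This is not the 3GPP Parametric Stereo or MPEG Surround algorithm; the band splitting and parameter choices are arbitrary assumptions for illustration only:

```python
import numpy as np

def parametric_downmix(left, right, n_bands=8):
    """Toy downmix-plus-side-information encoder: mono downmix + per-band ILD in dB."""
    downmix = 0.5 * (left + right)                       # mono downmix signal
    L = np.abs(np.fft.rfft(left))
    R = np.abs(np.fft.rfft(right))
    bands = np.array_split(np.arange(L.size), n_bands)   # crude uniform band split
    eps = 1e-12
    ild_db = np.array([
        10.0 * np.log10((L[b] ** 2).sum() + eps) - 10.0 * np.log10((R[b] ** 2).sum() + eps)
        for b in bands
    ])                                                    # side information: one ILD per band
    return downmix, ild_db

# Toy usage: a 1 kHz tone panned harder to the left channel.
t = np.arange(0, 1.0, 1 / 8000.0)
left, right = np.sin(2 * np.pi * 1000 * t), 0.5 * np.sin(2 * np.pi * 1000 * t)
downmix, ild_db = parametric_downmix(left, right)
```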

84 citations

Proceedings ArticleDOI
21 Oct 2001
TL;DR: Techniques for estimating the tempo and the swing, and locating the beats in audio recordings, under the assumption that the tempo is constant are presented.
Abstract: The problem of estimating the tempo of audio recordings (the number of beats per minute, or BPM) has received an increasing amount of attention in the past few years. Applications include the synchronization of multiple audio tracks for simultaneous playback, "tempo-synchronous" audio effects, automatic looping of audio tracks etc. This article presents techniques for estimating the tempo and the swing, and locating the beats in audio recordings, under the assumption that the tempo is constant. The techniques rely on a preliminary transient detection stage where note onsets/offsets, percussion hits and other time-localized events are detected. This first step is followed by a maximum likelihood estimation of the tempo, swing and downbeat. Suggestions are given to minimize the computation load of the methods.
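As a rough illustration of the estimation step, the sketch below grid-searches candidate tempos and scores how closely detected onsets fall to a constant beat grid. It stands in for the paper's maximum-likelihood formulation and ignores swing and downbeat estimation; the scoring function and BPM range are assumptions:

```python
import numpy as np

def estimate_tempo(onset_times, bpm_range=(60, 200)):
    """Toy grid search for a constant tempo from detected onset times (in seconds)."""
    onsets = np.asarray(onset_times, dtype=float)
    best_bpm, best_score = None, -np.inf
    for bpm in np.arange(bpm_range[0], bpm_range[1] + 1, 0.5):
        period = 60.0 / bpm
        # Distance from each onset to the nearest beat, as a fraction of the period.
        phase = (onsets % period) / period
        dist = np.minimum(phase, 1.0 - phase)
        score = -np.sum(dist ** 2)          # higher is better: onsets sit on the grid
        if score > best_score:
            best_bpm, best_score = bpm, score
    return best_bpm

# Example: onsets exactly on a 120 BPM grid; expected to print 120.0.
print(estimate_tempo([0.0, 0.5, 1.0, 1.5, 2.0, 2.5]))
```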

84 citations

Patent
TL;DR: This patent discusses speaker-dependent and speaker-independent speech recognition in a voice-controlled multi-station network; a fallback procedure to speaker-independent recognition is maintained for any particular station to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
Abstract: A voice-controlled multi-station network has both speaker-dependent and speaker-independent speech recognition. On recognizing items of an applicable vocabulary, the network executes a particular function. The method receives a call from a particular origin and executes speaker-independent speech recognition on the call. In an improvement procedure, in case of successful determination of what has been said, a template associated with the recognized speech items is stored and assigned to the origin. Next, speaker-dependent recognition is applied, if feasible, to speech received from the same origin, using one or more templates associated with that station. Further, a fallback procedure to speaker-independent recognition is maintained for any particular station in order to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
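The call-handling logic described above can be sketched as simple control flow. The recogniser objects, their methods, and the template store below are hypothetical placeholders for illustration, not an API from the patent:

```python
# Hypothetical sketch of the per-call logic described in the patent abstract.
templates = {}   # per-origin (e.g. per calling station) speaker-dependent templates

def handle_utterance(origin, speech, sd_recogniser, si_recogniser):
    """Speaker-dependent recognition with speaker-independent fallback."""
    if origin in templates:
        # Templates exist for this origin: try speaker-dependent recognition first.
        result = sd_recogniser.recognise(speech, templates[origin])
        if result is not None:
            return result                     # speaker-dependent recognition succeeded
        # Otherwise fall back to speaker-independent recognition below.
    # Speaker-independent recognition (also the first-call / improvement path).
    result = si_recogniser.recognise(speech)
    if result is not None:
        # Improvement procedure: store a template for the recognised item and
        # assign it to the origin, so later calls from the same origin can use
        # speaker-dependent matching.
        templates.setdefault(origin, {})[result] = sd_recogniser.make_template(speech)
    return result
```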

84 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations (86% related)
Decoding methods: 65.7K papers, 900K citations (84% related)
Fading: 55.4K papers, 1M citations (80% related)
Feature vector: 48.8K papers, 954.4K citations (80% related)
Feature extraction: 111.8K papers, 2.1M citations (80% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108