Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Proceedings ArticleDOI
21 Apr 1997
TL;DR: The GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system provides wireline quality not only for error-free conditions but also for the most typical error conditions.
Abstract: This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec has been jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for error protection. Speech coding is based on the ACELP algorithm (algebraic code excited linear prediction). The codec provides substantial quality improvement compared to the existing GSM full rate and half rate codecs. The old GSM codecs lack wireline quality even in error-free channel conditions, while the EFR codec provides wireline quality not only for error-free conditions but also for the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile-to-mobile calls).
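The bit allocation quoted above works out neatly per frame. A minimal arithmetic sketch, assuming the standard 20 ms GSM speech frame (the frame duration is not stated in the abstract):

```python
# Sketch: bit budget of the GSM EFR codec per 20 ms speech frame,
# using the rates quoted in the abstract (12.2 kbit/s source + 10.6 kbit/s protection).
FRAME_SEC = 0.020              # assumed GSM speech frame duration: 20 ms

speech_rate = 12_200           # bit/s used for speech (source) coding
protection_rate = 10_600       # bit/s used for channel error protection

speech_bits = speech_rate * FRAME_SEC          # 244 bits of ACELP parameters per frame
protection_bits = protection_rate * FRAME_SEC  # 212 bits of redundancy per frame
gross_bits = speech_bits + protection_bits     # 456 bits -> 22.8 kbit/s gross channel rate

print(f"{speech_bits:.0f} source bits + {protection_bits:.0f} protection bits "
      f"= {gross_bits:.0f} bits per frame ({(gross_bits / FRAME_SEC) / 1000:.1f} kbit/s)")
```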

84 citations

Proceedings ArticleDOI
15 Apr 2007
TL;DR: A supervised learning approach to monaural segregation of reverberant voiced speech is proposed, which learns to map from a set of pitch-based auditory features to a grouping cue encoding the posterior probability of a time-frequency (T-F) unit being target dominant given observed features.
Abstract: Room reverberation degrades speech signals and poses a major challenge to current monaural speech segregation systems. Previous research relies on inverse filtering as a front-end for partially restoring the harmonicity of the reverberant signal. We show that the inverse filtering approach is sensitive to different room configurations, and hence undesirable in general reverberation conditions. We propose a supervised learning approach to map a set of harmonic features into a pitch-based grouping cue for each time-frequency (T-F) unit. We use a speech segregation method to estimate an ideal binary T-F mask, which retains the reverberant mixture in a local T-F unit if and only if the target energy is stronger than the interference energy. Results show that our approach improves the segregation performance considerably.
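The mask criterion in the last sentences can be written down directly. A minimal numpy sketch of the ideal binary T-F mask rule (keep target-dominant units, discard the rest); the auditory filterbank front-end, the pitch-based features, and the learned posterior mapping from the paper are omitted, and the array shapes are illustrative assumptions:

```python
import numpy as np

def ideal_binary_mask(target_energy, interference_energy):
    """Ideal binary T-F mask: 1 where target energy exceeds interference energy.

    Both inputs are (frequency_channels x time_frames) energy arrays, e.g. from a
    gammatone filterbank or an STFT; the exact front-end is not specified here.
    """
    return (target_energy > interference_energy).astype(float)

# Toy usage with random "energies" standing in for a real T-F decomposition.
rng = np.random.default_rng(0)
target = rng.random((64, 100))          # 64 channels x 100 frames (assumed shapes)
interference = rng.random((64, 100))
mixture = target + interference          # energies of an (idealised) additive mixture

mask = ideal_binary_mask(target, interference)
segregated = mask * mixture              # keep mixture energy only in target-dominant units
```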

84 citations

Book
02 Jan 2008
TL;DR: This book is a thorough reference to the 3GPP and MPEG Parametric Stereo standards and to the MPEG Surround multi-channel audio coding standard, and it describes key developments in coding techniques, which are an important factor in the optimization of advanced entertainment, communications and signal processing applications.
Abstract: This book collects a wealth of information about spatial audio coding into one comprehensible volume. It is a thorough reference to the 3GPP and MPEG Parametric Stereo standards and the MPEG Surround multi-channel audio coding standard. It describes key developments in coding techniques, which are an important factor in the optimization of advanced entertainment, communications and signal processing applications. Until recently, technologies for coding audio signals, such as redundancy reduction and sophisticated source and receiver models, did not incorporate spatial characteristics of source and receiving ends. Spatial audio coding achieves much higher compression ratios than conventional coders. It does this by representing multi-channel audio signals as a downmix signal plus side information that describes the perceptually relevant spatial information. Written by experts in spatial audio coding, Spatial Audio Processing: reviews psychoacoustics (the relationship between physical measures of sound and the corresponding percepts) and spatial audio sound formats and reproduction systems; brings together the processing, acquisition, mixing, playback, and perception of spatial audio, with the latest coding techniques; analyses algorithms for the efficient manipulation of multiple, discrete and combined spatial audio channels, including both MP3 and MPEG Surround; shows how the same insights on source and receiver models can also be applied for manipulation of audio signals, such as the synthesis of virtual auditory scenes employing head-related transfer function (HRTF) processing and stereo to N-channel audio upmix. Audio processing research engineers and audio coding research and implementation engineers will find this an insightful guide. Academic audio and psychoacoustic researchers, including post-graduate and third/fourth year students taking courses in signal processing, audio and speech processing, and telecommunications, will also benefit from the information inside.
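To illustrate the downmix-plus-side-information principle mentioned in the abstract, here is a toy sketch: a mono downmix plus one inter-channel level difference (ILD) per frequency band as side information. This is not the 3GPP Parametric Stereo or MPEG Surround algorithm; the band splitting and parameter choices are arbitrary assumptions for illustration only:

```python
import numpy as np

def parametric_downmix(left, right, n_bands=8):
    """Toy downmix-plus-side-information encoder: mono downmix + per-band ILD in dB."""
    downmix = 0.5 * (left + right)                       # mono downmix signal
    L = np.abs(np.fft.rfft(left))
    R = np.abs(np.fft.rfft(right))
    bands = np.array_split(np.arange(L.size), n_bands)   # crude uniform band split
    eps = 1e-12
    ild_db = np.array([
        10.0 * np.log10((L[b] ** 2).sum() + eps) - 10.0 * np.log10((R[b] ** 2).sum() + eps)
        for b in bands
    ])                                                    # side information: one ILD per band
    return downmix, ild_db

# Toy usage: a 1 kHz tone panned harder to the left channel.
t = np.arange(0, 1.0, 1 / 8000.0)
left, right = np.sin(2 * np.pi * 1000 * t), 0.5 * np.sin(2 * np.pi * 1000 * t)
downmix, ild_db = parametric_downmix(left, right)
```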

84 citations

Proceedings ArticleDOI
21 Oct 2001
TL;DR: Techniques for estimating the tempo and the swing, and locating the beats in audio recordings, under the assumption that the tempo is constant are presented.
Abstract: The problem of estimating the tempo of audio recordings (the number of beats per minute, or BPM) has received an increasing amount of attention in the past few years. Applications include the synchronization of multiple audio tracks for simultaneous playback, "tempo-synchronous" audio effects, automatic looping of audio tracks etc. This article presents techniques for estimating the tempo and the swing, and locating the beats in audio recordings, under the assumption that the tempo is constant. The techniques rely on a preliminary transient detection stage where note onsets/offsets, percussion hits and other time-localized events are detected. This first step is followed by a maximum likelihood estimation of the tempo, swing and downbeat. Suggestions are given to minimize the computation load of the methods.
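As a rough illustration of the estimation step, the sketch below grid-searches candidate tempos and scores how closely detected onsets fall to a constant beat grid. It stands in for the paper's maximum-likelihood formulation and ignores swing and downbeat estimation; the scoring function and BPM range are assumptions:

```python
import numpy as np

def estimate_tempo(onset_times, bpm_range=(60, 200)):
    """Toy grid search for a constant tempo from detected onset times (in seconds)."""
    onsets = np.asarray(onset_times, dtype=float)
    best_bpm, best_score = None, -np.inf
    for bpm in np.arange(bpm_range[0], bpm_range[1] + 1, 0.5):
        period = 60.0 / bpm
        # Distance from each onset to the nearest beat, as a fraction of the period.
        phase = (onsets % period) / period
        dist = np.minimum(phase, 1.0 - phase)
        score = -np.sum(dist ** 2)          # higher is better: onsets sit on the grid
        if score > best_score:
            best_bpm, best_score = bpm, score
    return best_bpm

# Example: onsets exactly on a 120 BPM grid; expected to print 120.0.
print(estimate_tempo([0.0, 0.5, 1.0, 1.5, 2.0, 2.5]))
```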

84 citations

Patent
TL;DR: This patent discusses speaker-dependent and speaker-independent speech recognition in a voice-controlled multi-station network; a fallback procedure to speaker-independent recognition is maintained for any particular station to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
Abstract: A voice-controlled multi-station network has both speaker-dependent and speaker-independent speech recognition. On recognizing items of an applicable vocabulary, the network executes a particular function. The method receives a call from a particular origin and executes speaker-independent speech recognition on the call. In an improvement procedure, in case of successful determination of what has been said, a template associated with the recognized speech items is stored and assigned to the origin. Next, speaker-dependent recognition is applied, if feasible, to speech received from the same origin, using one or more templates associated with that station. Further, a fallback procedure to speaker-independent recognition is maintained for any particular station in order to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
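The call-handling logic described above can be sketched as simple control flow. The recogniser objects, their methods, and the template store below are hypothetical placeholders for illustration, not an API from the patent:

```python
# Hypothetical sketch of the per-call logic described in the patent abstract.
templates = {}   # per-origin (e.g. per calling station) speaker-dependent templates

def handle_utterance(origin, speech, sd_recogniser, si_recogniser):
    """Speaker-dependent recognition with speaker-independent fallback."""
    if origin in templates:
        # Templates exist for this origin: try speaker-dependent recognition first.
        result = sd_recogniser.recognise(speech, templates[origin])
        if result is not None:
            return result                     # speaker-dependent recognition succeeded
        # Otherwise fall back to speaker-independent recognition below.
    # Speaker-independent recognition (also the first-call / improvement path).
    result = si_recogniser.recognise(speech)
    if result is not None:
        # Improvement procedure: store a template for the recognised item and
        # assign it to the origin, so later calls from the same origin can use
        # speaker-dependent matching.
        templates.setdefault(origin, {})[result] = sd_recogniser.make_template(speech)
    return result
```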

84 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations (86% related)
Decoding methods: 65.7K papers, 900K citations (84% related)
Fading: 55.4K papers, 1M citations (80% related)
Feature vector: 48.8K papers, 954.4K citations (80% related)
Feature extraction: 111.8K papers, 2.1M citations (80% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108