Author

Roch Lefebvre

Other affiliations: Philips
Bio: Roch Lefebvre is an academic researcher from Université de Sherbrooke. The author has contributed to research on speech coding and codecs. The author has an h-index of 25 and has co-authored 71 publications receiving 1,921 citations. Previous affiliations of Roch Lefebvre include Philips.


Papers
Journal ArticleDOI
TL;DR: In this paper, the adaptive multirate wideband (AMR-WB) speech codec was selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services.
Abstract: This paper describes the adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000 and the corresponding specifications were approved in March 2001. The AMR-WB codec was also selected by the International Telecommunication Union-Telecommunication Sector (ITU-T) in July 2001 in the standardization activity for wideband speech coding around 16 kb/s and was approved in January 2002 as Recommendation G.722.2. The adoption of AMR-WB by ITU-T is of significant importance since for the first time the same codec is adopted for wireless as well as wireline services. AMR-WB uses an extended audio bandwidth from 50 Hz to 7 kHz and gives superior speech quality and voice naturalness compared to existing second- and third-generation mobile communication systems. The wideband speech service provided by the AMR-WB codec will give mobile communication speech quality that also substantially exceeds (narrowband) wireline quality. The paper details AMR-WB standardization history, algorithmic description including novel techniques for efficient ACELP wideband speech coding and subjective quality performance of the codec.
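As an illustration of the codec's multi-rate operation, the sketch below picks the highest of the nine standardized AMR-WB bitrates that fits a given channel budget. The mode list follows Recommendation G.722.2; the function name and the simple selection policy are illustrative, not part of the standard.

```python
# The nine AMR-WB codec modes (kbit/s) standardized in ITU-T G.722.2 / 3GPP AMR-WB.
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]

def best_mode(available_kbps: float) -> float:
    """Return the highest codec mode that fits the available channel rate."""
    fitting = [m for m in AMR_WB_MODES_KBPS if m <= available_kbps]
    if not fitting:
        raise ValueError("channel rate below the lowest AMR-WB mode (6.60 kbit/s)")
    return max(fitting)

print(best_mode(16.0))  # 15.85
```

In a real adaptive multi-rate link, this decision is driven by channel-state feedback rather than a fixed budget, and the mode can change from frame to frame.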

312 citations

Proceedings ArticleDOI
19 Apr 2009
TL;DR: This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding, which results in a codec that exhibits consistently high quality for speech, music and mixed audio content.
Abstract: Traditionally, speech coding and audio coding were separate worlds. Based on different technical approaches and different assumptions about the source signal, neither of the two coding schemes could efficiently represent both speech and music at low bitrates. This paper presents a unified speech and audio codec, which efficiently combines techniques from both worlds. This results in a codec that exhibits consistently high quality for speech, music and mixed audio content. The paper gives an overview of the codec architecture and presents results of formal listening tests comparing this new codec with HE-AAC(v2) and AMR-WB+. This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.

108 citations

Proceedings ArticleDOI
18 Mar 2005
TL;DR: This paper presents a hybrid audio coding algorithm integrating an LP-based coding technique and a more general transform coding technique, which has consistently high performance for both speech and music signals.
Abstract: This paper presents a hybrid audio coding algorithm integrating an LP-based coding technique and a more general transform coding technique. ACELP is used in LP-based coding mode, whereas algebraic TCX is used in transform coding mode. The algorithm extends previously published work on ACELP/TCX coding in several ways. The frame length is increased to 80 ms, adaptive multi-length sub-frames are used with overlapping windowing, an extended multi-rate algebraic VQ is applied to the TCX spectrum to avoid quantizer saturation, and noise shaping is improved. Results show that the proposed hybrid coder has consistently high performance for both speech and music signals.
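One common way such a hybrid coder decides between its two modes is closed-loop selection: encode a frame both ways and keep the better reconstruction. The sketch below shows that idea only; the encoder callables are placeholders, and the published algorithm's actual decision logic is more elaborate.

```python
import numpy as np

def select_mode(frame, encode_acelp, encode_tcx):
    """Hypothetical closed-loop mode choice for a hybrid ACELP/TCX coder:
    run both candidate encoders on the frame and keep whichever yields
    the smaller squared reconstruction error."""
    rec_a = encode_acelp(frame)
    rec_t = encode_tcx(frame)
    err_a = np.sum((frame - rec_a) ** 2)
    err_t = np.sum((frame - rec_t) ** 2)
    return ("ACELP", rec_a) if err_a <= err_t else ("TCX", rec_t)
```

The intuition matches the paper's premise: LP-based (ACELP) coding tends to win on speech-like frames, transform (TCX) coding on music-like frames, and per-frame selection yields consistently high quality on both.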

105 citations

Proceedings ArticleDOI
19 Apr 1994
TL;DR: This paper describes the application of transform coded excitation (TCX) coding to encoding wideband speech and audio signals in the bit-rate range of 16 kbits/s to 32 kbits/s and proposes novel quantization procedures including inter-frame prediction in the frequency domain.
Abstract: This paper describes the application of transform coded excitation (TCX) coding to encoding wideband speech and audio signals in the bit rate range of 16 kbits/s to 32 kbits/s. The approach uses a combination of time domain (linear prediction; pitch prediction) and frequency domain (transform coding; dynamic bit allocation) techniques, and utilizes a synthesis model similar to that of linear prediction coders such as CELP. However, at the encoder, the high complexity analysis-by-synthesis technique is bypassed by directly quantizing the so-called target signal in the frequency domain. The innovative excitation is derived at the decoder by inverse filtering the quantized target signal. The algorithm is intended for applications whereby a large number of bits is available for the innovative excitation. The TCX algorithm is utilized to encode wideband speech and audio signals with a 50-7000 Hz bandwidth. Novel quantization procedures including inter-frame prediction in the frequency domain are proposed to encode the target signal. The proposed algorithm achieves very high quality for speech at 16 kbits/s, and for music at 24 kbits/s.
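A minimal sketch of this encode/decode split, assuming a known LP filter and a crude uniform scalar quantizer (the published scheme uses pitch prediction, dynamic bit allocation, and inter-frame prediction, all omitted here):

```python
import numpy as np

def lp_analysis(x, a):
    # Whitening filter A(z) = 1 + a1*z^-1 + ...; output is the LP residual ("target").
    e = np.copy(x)
    for k, ak in enumerate(a, start=1):
        e[k:] += ak * x[:-k]
    return e

def lp_synthesis(e, a):
    # Inverse filter 1/A(z): reconstruct the signal from the excitation.
    y = np.zeros_like(e)
    for n in range(len(e)):
        y[n] = e[n] - sum(ak * y[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
    return y

def tcx_roundtrip(x, a, step=0.05):
    target = lp_analysis(x, a)                # encoder: target = LP residual
    X = np.fft.rfft(target)                   # move the target to the frequency domain
    Xq = step * (np.round(X.real / step)      # crude uniform quantization of the
                 + 1j * np.round(X.imag / step))  # real and imaginary parts
    excitation = np.fft.irfft(Xq, n=len(x))   # decoder: excitation via inverse transform
    return lp_synthesis(excitation, a)        # LP synthesis filtering reconstructs x
```

The point the abstract makes is visible in the structure: no analysis-by-synthesis search appears anywhere; the encoder quantizes the target directly, and only the decoder runs the synthesis filter.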

93 citations

Journal Article
TL;DR: All aspects of this standardization effort are outlined, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.
Abstract: In early 2012 the ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard. The new codec brings together the previously separated worlds of general audio coding and speech coding. It does so by integrating elements from audio coding and speech coding into a unified system. The present publication outlines all aspects of this standardization effort, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.

88 citations


Cited by
Journal ArticleDOI

1,008 citations

Journal Article
TL;DR: This paper surveys and compares several commonly used techniques for edge detection in image processing, motivated by the observation that accurately identified edges allow objects and their basic properties to be located and measured.
Abstract: Edge detection is one of the most commonly used operations in image analysis, and there are probably more algorithms in the literature for enhancing and detecting edges than for any other single subject. The reason for this is that edges form the outline of an object. An edge is the boundary between an object and the background, and indicates the boundary between overlapping objects. This means that if the edges in an image can be identified accurately, all of the objects can be located and basic properties such as area, perimeter, and shape can be measured. Since computer vision involves the identification and classification of objects in an image, edge detection is an essential tool. In this paper, we have compared several techniques for edge detection in image processing.

603 citations

Journal ArticleDOI
TL;DR: Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.
Abstract: This paper presents a new feature extraction algorithm called power normalized Cepstral coefficients (PNCC) that is motivated by auditory processing. Major new features of PNCC processing include the use of a power-law nonlinearity that replaces the traditional log nonlinearity used in MFCC coefficients, a noise-suppression algorithm based on asymmetric filtering that suppresses background excitation, and a module that accomplishes temporal masking. We also propose the use of medium-time power analysis in which environmental parameters are estimated over a longer duration than is commonly used for speech, as well as frequency smoothing. Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing, and without degrading the recognition accuracy that is observed while training and testing using clean speech. PNCC processing also provides better recognition accuracy in noisy environments than techniques such as vector Taylor series (VTS) and the ETSI advanced front end (AFE) while requiring much less computation. We describe an implementation of PNCC using "online processing" that does not require future knowledge of the input.
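The core substitution the abstract describes can be shown in a few lines: MFCC-style log compression versus a power-law nonlinearity (the paper's exponent is roughly 1/15; the helper names below are illustrative, not the paper's API):

```python
import numpy as np

def log_compress(power, floor=1e-20):
    # MFCC-style compression; needs a floor because log diverges at zero power.
    return np.log(np.maximum(power, floor))

def power_law_compress(power, exponent=1.0 / 15.0):
    # PNCC-style compression: bounded and well behaved as power -> 0.
    return np.power(power, exponent)

p = np.array([0.0, 1e-8, 1e-4, 1.0, 1e4])
print(power_law_compress(p))   # stays finite at zero power
print(log_compress(p))         # pinned to the floor at zero power
```

The practical consequence is the one the experiments report: in low-energy, noise-dominated channels the log blows small fluctuations up into large feature changes, while the power law keeps them bounded.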

420 citations

Patent
06 Feb 2006
TL;DR: In this article, a headset terminal for speech applications comprising a headband assembly, an earcup assembly and a power source assembly is described; a rotatable microphone boom assembly includes controls mounted on opposite sides of a rotation axis to maintain a consistent orientation of the boom assembly with respect to the user.
Abstract: A headset terminal for speech applications includes a headband assembly, an earcup assembly and a power source assembly. Processing circuitry is positioned in at least one of the earcup assembly and the power source assembly and includes speech processing circuitry for recognizing and synthesizing speech. A radio communicates with a central system to process the activity information of the headset terminal user. A rotatable microphone boom assembly includes controls mounted on opposite sides of a rotation axis to maintain a consistent orientation on the boom assembly with respect to the head of a user. The boom assembly snaps together with the earcup assembly to rotate. The headband assembly includes at least one transverse band and a sliding arm coupled to the earcup assembly for dynamically adjusting the position of the earcup assembly. A latch of the power source assembly snaps into position to secure it with the assembly and slides between latched and unlatched positions to secure a battery.

331 citations