Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime, 14245 publications have been published within this topic receiving 271964 citations.


Papers
Journal Article
TL;DR: A structured overview of several established VAD features that target different properties of speech. The features are categorized by the property they exploit, such as power, harmonicity, or modulation, and the performance of selected features is evaluated.
Abstract: In many speech signal processing applications, voice activity detection (VAD) plays an essential role in separating an audio stream into time intervals that contain speech activity and time intervals where speech is absent. Many features that reflect the presence of speech have been introduced in the literature. However, to our knowledge, no extensive comparison has been provided yet. In this article, we therefore present a structured overview of several established VAD features that target different properties of speech. We categorize the features with respect to properties that are exploited, such as power, harmonicity, or modulation, and evaluate the performance of some dedicated features. The importance of temporal context is discussed in relation to latency restrictions imposed by different applications. Our analyses allow for selecting promising VAD features and finding a reasonable trade-off between performance and complexity.

66 citations
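
The power-based feature category named in the abstract above can be illustrated with a very small sketch. This is not the method evaluated in the article; it is a generic frame-energy detector, and the names `frame_energy_db` and `energy_vad`, the frame length of 160 samples, and the -30 dB threshold are all assumptions chosen for illustration.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB (a small floor avoids log of zero)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def energy_vad(signal, frame_len=160, threshold_db=-30.0):
    """Return one boolean per full frame: True if frame power exceeds threshold_db.

    frame_len and threshold_db are illustrative assumptions, not values
    taken from the article.
    """
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        decisions.append(frame_energy_db(frame) > threshold_db)
    return decisions

# Toy input: one frame of silence followed by one frame of a 200 Hz tone at 8 kHz.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
print(energy_vad(silence + tone))
```

A detector like this uses no temporal context, so it adds no latency beyond one frame, but it cannot distinguish speech from any other loud sound; features based on harmonicity or modulation address that at higher cost.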

Journal Article
TL;DR: A new set of cross-fade windows is presented, designed to provide an adequate trade-off between overlap duration and time/frequency resolution and to maintain the benefits of critical sampling through all coding modes.
Abstract: The reference model selected by MPEG for the forthcoming unified speech and audio codec (USAC) switches between a non-LPC based coding mode (based on AAC) operating in the transform domain and an LPC-based coding mode (derived from AMR-WB+) operating either in the time domain (ACELP) or in the frequency domain (wLPT). Seamlessly switching between these different coding modes required the design of a new set of cross-fade windows optimized to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC based coding. This paper presents the new set of windows which was designed in order to provide an adequate trade-off between overlap duration and time/frequency resolution, and to maintain the benefits of critical sampling through all coding modes.

66 citations
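
The abstract above does not give the window shapes, so the following is only a generic illustration of one property commonly required of windows in critically sampled lapped transforms: the overlapping halves must be power-complementary (the Princen-Bradley condition). The sine window used here is a standard textbook choice, not the USAC cross-fade window, and the function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window of even length, a standard power-complementary window."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def princen_bradley_error(window):
    """Largest deviation of w[n]^2 + w[n + N/2]^2 from 1 over the overlap region."""
    half = len(window) // 2
    return max(
        abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) for n in range(half)
    )

w = sine_window(64)
# The deviation is zero up to floating-point rounding.
print(princen_bradley_error(w) < 1e-12)
```

A shorter overlap region reduces the amount of signal shared between adjacent frames at a mode switch, while a longer one gives better frequency resolution; that is the trade-off the abstract refers to.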

Patent
Amitava Das
26 Feb 1999
TL;DR: A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism that selects a coding mode based on the speech content of the frames input to the coder.
Abstract: A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism for selecting a coding mode for the coder based upon the speech content of frames input to the coder. Transition speech (i.e., from unvoiced speech to voiced speech, or vice versa) frames are encoded with the high-rate, time-domain coding mode, which may be a CELP coding mode. Voiced speech frames are encoded with the low-rate, frequency-domain coding mode, which may be a harmonic coding mode. Phase parameters are not encoded by the frequency-domain coding mode, and are instead modeled in accordance with, e.g., a quadratic phase model. For each speech frame encoded with the frequency-domain coding mode, the initial phase value is taken to be the initial phase value of the immediately preceding speech frame encoded with the frequency-domain coding mode. If the immediately preceding speech frame was encoded with the time-domain coding mode, the initial phase value of the current speech frame is computed from the decoded speech frame information of the immediately preceding, time-domain-encoded speech frame. Each speech frame encoded with the frequency-domain coding mode may be compared with the corresponding input speech frame to obtain a performance measure. If the performance measure falls below a predefined threshold value, the input speech frame is encoded with the time-domain coding mode.

66 citations
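
The closed-loop decision described at the end of the abstract above (encode with the frequency-domain mode, measure the result against the input, fall back to the time-domain mode if the measure is below a threshold) can be sketched as follows. The patent abstract does not specify the performance measure, so SNR is used here as an assumption; `select_mode`, `coarse_codec`, `fine_codec`, and the 10 dB threshold are hypothetical stand-ins, not the patented coders.

```python
import math

def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB; used here as an assumed performance measure."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0.0:
        return float("inf")
    return 10.0 * math.log10(max(signal, 1e-12) / noise)

def select_mode(frame, freq_codec, time_codec, threshold_db=10.0):
    """Try the low-rate mode first; fall back to the high-rate mode if it performs poorly."""
    candidate = freq_codec(frame)
    if snr_db(frame, candidate) >= threshold_db:
        return "frequency", candidate
    return "time", time_codec(frame)

def coarse_codec(frame):
    """Toy stand-in for a low-rate coder: quantize to steps of 0.5."""
    return [round(x * 2) / 2 for x in frame]

def fine_codec(frame):
    """Toy stand-in for a high-rate coder: quantize to steps of 0.01."""
    return [round(x * 100) / 100 for x in frame]

easy = [1.0, -1.0, 0.5, -0.5]   # represented exactly by the coarse quantizer
hard = [0.2, -0.2, 0.2, -0.2]   # collapses to zero under the coarse quantizer
print(select_mode(easy, coarse_codec, fine_codec)[0])
print(select_mode(hard, coarse_codec, fine_codec)[0])
```

The decision is closed-loop because it depends on the decoded output of the candidate mode rather than on a classification of the input alone.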

Patent
13 Sep 2012
TL;DR: A signal processing apparatus couples a voice activity detection unit to a speech recognition system. The unit detects whether an audio signal is a voice signal and generates a voice activity detection result that controls whether the system performs speech recognition on the audio signal.
Abstract: A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.

66 citations

Patent
TL;DR: A CELP-type speech coding system is provided with an arithmetic processing unit that transforms a perceptually weighted input speech signal vector AX into a vector tAAX, a sparse adaptive codebook that stores a plurality of pitch prediction residual vectors P made sparse by a sparse unit, and a multiplying unit that multiplies the successively read-out vectors P and the output tAAX from the arithmetic processing unit.
Abstract: A speech coding and decoding system operates under a known code-excited linear prediction (CELP) coding method. The CELP coding is achieved by selecting an optimum pitch vector P from an adaptive codebook and the corresponding first gain, and at the same time, selecting an optimum code vector from a stochastic codebook and the corresponding second gain. The system of the present invention is characterized by a weighted orthogonalization transforming unit. Instead of using the perceptually weighted code vector AC as is, the system first transforms it into a perceptually weighted code vector AC' that is orthogonal to the optimum perceptually weighted pitch vector AP.

65 citations
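
The abstract above states only that AC' is made orthogonal to AP and does not say how the transformation is computed. One simple way to obtain such a vector is to subtract from AC its projection onto AP (a Gram-Schmidt step); the sketch below assumes that approach, and the function names and example vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(ac, ap):
    """Return AC' = AC - (<AP, AC> / <AP, AP>) * AP, which is orthogonal to AP.

    This Gram-Schmidt projection is an assumed illustration, not necessarily
    the transformation used in the patent.
    """
    energy = dot(ap, ap)
    if energy == 0.0:
        return list(ac)  # nothing to remove if the pitch vector is zero
    scale = dot(ap, ac) / energy
    return [c - scale * p for c, p in zip(ac, ap)]

ap = [1.0, 2.0, 2.0]   # stand-in for the weighted pitch vector AP
ac = [3.0, 0.0, 1.0]   # stand-in for the weighted code vector AC
ac_prime = orthogonalize(ac, ap)
# The inner product with AP vanishes up to floating-point rounding.
print(abs(dot(ac_prime, ap)) < 1e-12)
```

With AC' orthogonal to AP, the contribution of the stochastic codebook no longer overlaps with that of the adaptive codebook, so the two gains can be determined without interfering with each other.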


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108