Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers


Journal Article

Features for voice activity detection: a comparative analysis


Simon Graf¹ ², Tobias Herbig¹, Markus Buck¹, Gerhard Schmidt²

Nuance Communications¹, University of Kiel²

11 Nov 2015 - EURASIP Journal on Advances in Signal Processing

TL;DR: The article gives a structured overview of several established VAD features that target different properties of speech, categorizes the features by the property they exploit (such as power, harmonicity, or modulation), and evaluates the performance of selected features.


Abstract: In many speech signal processing applications, voice activity detection (VAD) plays an essential role in separating an audio stream into time intervals that contain speech activity and time intervals where speech is absent. Many features that reflect the presence of speech have been introduced in the literature. However, to our knowledge, no extensive comparison has been provided yet. In this article, we therefore present a structured overview of several established VAD features that target different properties of speech. We categorize the features with respect to the properties they exploit, such as power, harmonicity, or modulation, and evaluate the performance of some dedicated features. The importance of temporal context is discussed in relation to the latency restrictions imposed by different applications. Our analyses allow for selecting promising VAD features and finding a reasonable trade-off between performance and complexity.
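As an illustration of two of the feature categories named above (power and harmonicity), here is a minimal sketch in Python. The formulas are assumptions chosen for clarity: frame power in dB, and harmonicity as the largest normalized autocorrelation over a pitch-lag range. They are common textbook choices, not necessarily the exact features evaluated in the article; the names `frame_power_db` and `harmonicity` are hypothetical.

```python
import math
import random

def frame_power_db(frame):
    """Power-based feature: mean power of one frame in dB."""
    power = sum(x * x for x in frame) / len(frame)
    # Small floor avoids log10(0) on digital silence.
    return 10.0 * math.log10(power + 1e-12)

def harmonicity(frame, min_lag, max_lag):
    """Harmonicity feature: largest normalized autocorrelation over pitch lags.

    Values near 1 indicate a strongly periodic (voiced) frame;
    noise-like frames give small values.
    """
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        head = frame[lag:]                 # x[n]
        tail = frame[:len(frame) - lag]    # x[n - lag]
        r = sum(a * b for a, b in zip(head, tail))
        norm = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if norm > 0.0:
            best = max(best, r / norm)
    return best

# Hypothetical setup: 8 kHz sampling, 32 ms frames, pitch range 62.5-400 Hz.
fs, n = 8000, 256
voiced = [0.5 * math.sin(2 * math.pi * 200 * k / fs) for k in range(n)]
rng = random.Random(0)
noise = [rng.uniform(-0.5, 0.5) for _ in range(n)]
print(frame_power_db(voiced), harmonicity(voiced, 20, 128))
print(frame_power_db(noise), harmonicity(noise, 20, 128))
```

Both frames have similar power, so a power feature alone cannot tell them apart, while the harmonicity feature separates the periodic tone from the noise. This is one way to see why the article compares features that exploit different properties of speech.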


66 citations

Journal Article

Efficient Cross-Fade Windows for Transitions between LPC-Based and Non-LPC Based Audio Coding


Jeremie Lecomte, Philippe Gournay, Ralf Geiger, Bruno Bessette, Max Neuendorf

01 May 2009 - Journal of the Audio Engineering Society

TL;DR: The paper presents a new set of cross-fade windows designed to provide an adequate trade-off between overlap duration and time/frequency resolution while maintaining the benefits of critical sampling across all coding modes.


Abstract: The reference model selected by MPEG for the forthcoming unified speech and audio codec (USAC) switches between a non-LPC-based coding mode (based on AAC) operating in the transform domain and an LPC-based coding mode (derived from AMR-WB+) operating either in the time domain (ACELP) or in the frequency domain (wLPT). Seamlessly switching between these coding modes required the design of a new set of cross-fade windows optimized to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC-based coding. This paper presents the new set of windows, which was designed to provide an adequate trade-off between overlap duration and time/frequency resolution and to maintain the benefits of critical sampling across all coding modes.
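The core property a cross-fade window pair needs in an MDCT-style coder is power complementarity (the Princen-Bradley condition): the squared fade-out and fade-in windows sum to one over the overlap. The sketch below assumes plain sine windows and a hypothetical `crossfade` helper; these are not the USAC windows designed in the paper, and the `overlap` length stands in for the overlap-duration side of the trade-off the abstract mentions.

```python
import math

def sine_crossfade_windows(overlap):
    """Return (fade_out, fade_in) sine windows of length `overlap`.

    The pair is power-complementary: fade_out[n]**2 + fade_in[n]**2 == 1.
    """
    fade_in = [math.sin(math.pi / 2 * (n + 0.5) / overlap) for n in range(overlap)]
    fade_out = [math.cos(math.pi / 2 * (n + 0.5) / overlap) for n in range(overlap)]
    return fade_out, fade_in

def crossfade(prev_tail, next_head):
    """Blend the end of one coding mode's output into the start of the next.

    Assumes each segment is windowed twice (analysis and synthesis), so the
    effective weights are the squared windows, which sum to one per sample.
    """
    if len(prev_tail) != len(next_head):
        raise ValueError("overlap segments must have equal length")
    fade_out, fade_in = sine_crossfade_windows(len(prev_tail))
    return [a * wo * wo + b * wi * wi
            for a, b, wo, wi in zip(prev_tail, next_head, fade_out, fade_in)]

# Identical inputs pass through unchanged; different inputs are blended smoothly.
print(crossfade([1.0] * 8, [1.0] * 8))
print(crossfade([1.0] * 8, [0.0] * 8))
```

A longer `overlap` gives a smoother transition but spreads each block over more time, which is the time/frequency resolution cost the paper's windows are designed to balance.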


66 citations

Patent

Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder


Amitava Das¹

Qualcomm¹

26 Feb 1999

TL;DR: A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism that selects a coding mode based on the speech content of the frames input to the coder.


Abstract: A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism for selecting a coding mode for the coder based upon the speech content of frames input to the coder. Transition speech frames (i.e., frames going from unvoiced to voiced speech, or vice versa) are encoded with the high-rate, time-domain coding mode, which may be a CELP coding mode. Voiced speech frames are encoded with the low-rate, frequency-domain coding mode, which may be a harmonic coding mode. Phase parameters are not encoded by the frequency-domain coding mode and are instead modeled in accordance with, e.g., a quadratic phase model. For each speech frame encoded with the frequency-domain coding mode, the initial phase value is taken to be the initial phase value of the immediately preceding speech frame encoded with the frequency-domain coding mode. If the immediately preceding speech frame was encoded with the time-domain coding mode, the initial phase value of the current speech frame is computed from the decoded speech frame information of the immediately preceding, time-domain-encoded speech frame. Each speech frame encoded with the frequency-domain coding mode may be compared with the corresponding input speech frame to obtain a performance measure. If the performance measure falls below a predefined threshold value, the input speech frame is encoded with the time-domain coding mode.
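The closed-loop decision described above can be sketched as follows. Several parts are assumptions: the abstract does not name its performance measure, so SNR in dB is used here; the 10 dB threshold is an arbitrary placeholder; `freq_domain_codec` is a hypothetical callable that encodes and decodes one frame; and frame classification (`"transition"` or `"voiced"`) is taken as given. Frame classes the abstract does not describe are rejected rather than guessed.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]
Codec = Callable[[Frame], List[float]]  # hypothetical: encode + decode one frame

def snr_db(reference: Frame, decoded: Frame) -> float:
    """SNR of a decoded frame against its input frame, in dB."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def select_mode(frame: Frame, frame_class: str, freq_domain_codec: Codec,
                snr_threshold_db: float = 10.0) -> str:
    """Closed-loop choice between the 'time' and 'frequency' coding modes."""
    if frame_class == "transition":
        # Unvoiced <-> voiced transitions use the high-rate time-domain mode.
        return "time"
    if frame_class == "voiced":
        # Closed loop: run the low-rate frequency-domain mode, then check quality.
        decoded = freq_domain_codec(frame)
        if snr_db(frame, decoded) < snr_threshold_db:
            return "time"  # performance measure below threshold: fall back
        return "frequency"
    raise ValueError("this sketch only models 'transition' and 'voiced' frames")

frame = [math.sin(0.3 * k) for k in range(160)]
print(select_mode(frame, "voiced", lambda f: list(f)))         # lossless toy codec
print(select_mode(frame, "voiced", lambda f: [0.0] * len(f)))  # useless toy codec
```

The "closed-loop" aspect is that the decision depends on the actual decoded output of the cheaper mode, not only on a classification of the input.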


66 citations

Patent

Signal processing apparatus having voice activity detection unit and related signal processing methods


Hung Chia-Yu¹, Yeh Tsung-Li¹, Tu Yi-Chang¹

Realtek¹

13 Sep 2012

TL;DR: A voice activity detection unit (VADU) coupled to a speech recognition system detects whether an audio signal is a voice signal and generates a voice activity detection result that controls whether the system performs speech recognition on the audio signal.


Abstract: A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system and is arranged to detect whether an audio signal is a voice signal and, accordingly, to provide a voice activity detection result to the speech recognition system that controls whether the speech recognition system should perform speech recognition upon the audio signal.
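The control flow in this abstract, where a VAD result gates whether recognition runs at all, can be sketched as below. The abstract does not say how the VAD decides, so a simple mean-power threshold is assumed here; `energy_vad`, `gated_recognition`, the `1e-3` threshold, and the per-frame `recognize` callable are all hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]

def energy_vad(frame: Frame, threshold: float) -> bool:
    """Toy VAD decision: True if the frame's mean power exceeds `threshold`."""
    return sum(x * x for x in frame) / len(frame) > threshold

def gated_recognition(frames: Sequence[Frame],
                      recognize: Callable[[Frame], str],
                      threshold: float = 1e-3) -> List[Optional[str]]:
    """Invoke the (hypothetical) recognizer only on frames flagged as speech."""
    results: List[Optional[str]] = []
    for frame in frames:
        if energy_vad(frame, threshold):
            results.append(recognize(frame))
        else:
            results.append(None)  # recognizer skipped for non-speech input
    return results

def fake_recognizer(frame: Frame) -> str:
    return "speech"

frames = [[0.0] * 160, [0.1] * 160, [0.0] * 160]
print(gated_recognition(frames, fake_recognizer))  # [None, 'speech', None]
```

The benefit of this arrangement is that the comparatively expensive recognizer is not run on silence; a practical system would more likely gate whole utterances than individual frames.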


66 citations

Patent

Speech coding and decoding system


Tomohiko Taniguchi¹, Mark Johnson¹

Fujitsu¹

18 Sep 1991 - Journal of the Acoustical Society of America

TL;DR: A CELP-type speech coding system is provided with an arithmetic processing unit that transforms a perceptually weighted input speech signal vector AX into a vector AᵀAX, a sparse adaptive codebook that stores a plurality of pitch prediction residual vectors P sparsed by a sparse unit, and a multiplying unit that multiplies the successively read-out vectors P by the output AᵀAX of the arithmetic processing unit.


Abstract: A speech coding and decoding system operates under the known code-excited linear prediction (CELP) coding method. CELP coding is achieved by selecting an optimum pitch vector P from an adaptive codebook together with a corresponding first gain and, at the same time, selecting an optimum code vector from a stochastic codebook together with a corresponding second gain. The system of the present invention features a weighted orthogonalization transforming unit. Rather than using the perceptually weighted code vector AC directly, as is usual, the system uses a perceptually weighted code vector AC' obtained by passing AC through this unit; AC' is orthogonal to the optimum perceptually weighted pitch vector AP.
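The abstract states that AC' is orthogonal to AP but not how it is computed. A minimal sketch, assuming the standard Gram-Schmidt projection AC' = AC - (⟨AC, AP⟩ / ⟨AP, AP⟩) AP, is shown below; the patent's unit may compute the same result differently (for example via AᵀA). The helper names and the sample vectors are hypothetical. One practical consequence of orthogonality is that the two least-squares gains can be computed independently.

```python
from typing import List, Sequence

Vec = Sequence[float]

def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(ac: Vec, ap: Vec) -> List[float]:
    """Return AC' = AC - (<AC, AP> / <AP, AP>) * AP, which is orthogonal to AP."""
    energy = dot(ap, ap)
    if energy == 0.0:
        return list(ac)  # nothing to project out
    coeff = dot(ac, ap) / energy
    return [c - coeff * p for c, p in zip(ac, ap)]

def optimal_gain(target: Vec, v: Vec) -> float:
    """Least-squares gain g minimizing ||target - g * v||^2."""
    energy = dot(v, v)
    return dot(target, v) / energy if energy > 0.0 else 0.0

# Made-up perceptually weighted vectors for illustration only.
ax = [1.0, 2.0, 0.5, -1.0]   # target AX
ap = [0.8, 1.5, 0.0, -0.5]   # optimum pitch vector AP
ac = [0.2, 0.1, 1.0, 0.3]    # code vector AC
ac_orth = orthogonalize(ac, ap)
g1 = optimal_gain(ax, ap)        # first (pitch) gain
g2 = optimal_gain(ax, ac_orth)   # second (code) gain, independent of g1
residual = [x - g1 * p - g2 * c for x, p, c in zip(ax, ap, ac_orth)]
print(dot(ac_orth, ap), g1, g2)
```

Because ⟨AC', AP⟩ = 0, the joint two-gain least-squares problem splits into two one-dimensional problems, and the residual ends up orthogonal to both AP and AC'.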


65 citations


Metrics

Papers: 14,368

Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
