Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Patent
01 Sep 2005
TL;DR: A method and apparatus for obtaining complete speech signals for speech recognition applications: an audio stream is recorded continuously as a sequence of frames to a circular buffer, and at least one speech endpoint is located using a Hidden Markov Model (HMM).
Abstract: The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
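The circular-buffer recording described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `PreRollRecorder`, its methods, and the buffer capacity are hypothetical, and frames are stood in for by plain integers.

```python
from collections import deque


class PreRollRecorder:
    """Keeps only the most recent audio frames (hypothetical helper)."""

    def __init__(self, capacity_frames):
        # A deque with maxlen behaves as a circular buffer:
        # appending to a full deque silently drops the oldest frame.
        self._buffer = deque(maxlen=capacity_frames)

    def push(self, frame):
        """Record one frame of the continuously running audio stream."""
        self._buffer.append(frame)

    def frames_before_command(self, n_frames):
        """Return up to n_frames recorded just before the user command."""
        if n_frames <= 0:
            return []
        return list(self._buffer)[-n_frames:]


recorder = PreRollRecorder(capacity_frames=4)
for frame_id in range(10):      # frames 0..9 arrive; only 6..9 are retained
    recorder.push(frame_id)

# On a "start recognition" command, prepend the frames that preceded it.
pre_roll = recorder.frames_before_command(3)
```

Prepending `pre_roll` to the frames captured after the command yields an augmented signal in the sense of the abstract, so speech that began slightly before the command is not clipped.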

78 citations

Patent
Jie Su1, Samuel Oyetunji1
08 Mar 2013
TL;DR: A controller configured to be coupled to an audio speaker receives an audio input signal and, based on a displacement transfer function associated with the audio speaker, processes the audio input signal to generate an output audio signal communicated to the speaker.
Abstract: In accordance with these and other embodiments of the present disclosure, systems and methods may include a controller configured to be coupled to an audio speaker, wherein the controller receives an audio input signal, and based on a displacement transfer function associated with the audio speaker, processes the audio input signal to generate an output audio signal communicated to the audio speaker, wherein the displacement transfer function correlates an amplitude and a frequency of the audio input signal to an expected displacement of the audio speaker in response to the amplitude and the frequency of the audio input signal.

78 citations

Journal ArticleDOI
TL;DR: With a hidden Markov model (HMM) tracking the evolution of speech signal parameters, it is demonstrated how PLC is performed within a statistical signal processing framework and how the HMM is used to index a specially designed PLC module for the particular signal context, leading to signal-contingent PLC.
Abstract: As voice over IP proliferates, packet loss concealment (PLC) at the receiver has emerged as an important factor in determining voice quality of service. Through the use of heuristic variations of signal and parameter repetition and overlap-add interpolation to handle packet loss, conventional PLC systems largely ignore the dynamics of the statistical evolution of the speech signal, possibly leading to perceptually annoying artifacts. To address this problem, we propose the use of hidden Markov models for PLC. With a hidden Markov model (HMM) tracking the evolution of speech signal parameters, we demonstrate how PLC is performed within a statistical signal processing framework. Moreover, we show how the HMM is used to index a specially designed PLC module for the particular signal context, leading to signal-contingent PLC. Simulation examples, objective tests, and subjective listening tests are provided showing the ability of an HMM-based PLC built with a sinusoidal analysis/synthesis model to provide better loss concealment than a conventional PLC based on the same sinusoidal model for all types of speech signals, including onsets and signal transitions.

78 citations

Journal ArticleDOI
TL;DR: The proposed method for time-delay estimation is found to perform better than the generalized cross-correlation (GCC) approach, and a method for enhancement of speech is also proposed using the knowledge of the time-delay and the information of the excitation source.
Abstract: In this paper, we present a method of extracting the time-delay between speech signals collected at two microphone locations. Time-delay estimation from microphone outputs is the first step for many sound localization algorithms, and also for enhancement of speech. For time-delay estimation, speech signals are normally processed using short-time spectral information (either magnitude or phase or both). The spectral features are affected by degradations in speech caused by noise and reverberation. Features corresponding to the excitation source of the speech production mechanism are robust to such degradations. We show that these source features can be extracted reliably from the speech signal. The time-delay estimate can be obtained using the features extracted even from short segments (50-100 ms) of speech from a pair of microphones. The proposed method for time-delay estimation is found to perform better than the generalized cross-correlation (GCC) approach. A method for enhancement of speech is also proposed using the knowledge of the time-delay and the information of the excitation source.
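For orientation, the kind of baseline this abstract compares against can be sketched with plain time-domain cross-correlation, which is the unweighted special case of GCC. This is not the authors' excitation-source method; the function name `estimate_delay`, the `max_lag` parameter, and the toy signals are assumptions made for illustration.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag d (in samples) that maximizes sum_n x[n] * y[n + d].

    A positive result means y is a delayed copy of x.
    Plain cross-correlation only; no spectral weighting is applied.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):      # skip samples that fall outside y
                score += xn * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the second "microphone" receives the same pulse 3 samples later.
mic_a = [0, 0, 1.0, 0.5, -0.3, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1.0, 0.5, -0.3, 0, 0]
delay = estimate_delay(mic_a, mic_b, max_lag=5)
```

Dividing the lag by the sampling rate converts it to seconds. On noisy or reverberant signals this simple peak-picking degrades, which is the situation the abstract's source-feature approach is meant to address.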

78 citations

PatentDOI
TL;DR: The effect of uncorrectable bit errors is reduced by adaptively smoothing the spectral parameters in a speech decoder, with the amount of smoothing depending on the number of errors detected during the error control decoding of the received data.
Abstract: The performance of digital communication over a noisy communication channel is improved. An encoder combines bit modulation with error control encoding to allow the decoder to use the redundancy in the error control codes to detect uncorrectable bit errors. This method improves the efficiency of the communication system since fewer bits are required for error control, leaving more bits available for data. In the context of a speech coding system, speech quality is improved without sacrificing robustness to bit errors. A bit prioritization method further improves performance over noisy channels. Individual bits in a set of quantizer values are arranged according to their sensitivity to bit errors. Error control codes having higher levels of redundancy are used to protect the most sensitive (highest priority) bits, while lower levels of redundancy are used to protect less sensitive bits. This method improves efficiency of the error control system, since only the highest priority data is encoded with the highest levels of redundancy. The effect of uncorrectable bit errors is reduced by adaptively smoothing the spectral parameters in a speech decoder. The amount of smoothing is varied depending upon the number of errors detected during the error control decoding of the received data. More smoothing is used when a large number of errors are detected, thereby reducing the perceived effect of any uncorrectable bit errors which may be present.
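The error-dependent smoothing in the last sentences of this abstract might look roughly like the sketch below. The abstract does not give the actual rule, so the linear mapping from error count to smoothing weight, the function names, and the constants `full_smoothing_errors` and `max_weight` are all hypothetical.

```python
def smoothing_weight(error_count, full_smoothing_errors=8, max_weight=0.9):
    """Map a detected-error count to a weight in [0, max_weight].

    Hypothetical linear rule: no errors gives no smoothing, and the
    weight saturates once error_count reaches full_smoothing_errors.
    """
    if error_count <= 0:
        return 0.0
    return min(max_weight, max_weight * error_count / full_smoothing_errors)


def smooth_spectral_params(previous, current, error_count):
    """Blend the current frame's parameters toward the previous frame's.

    More detected errors put more trust in the previous frame.
    """
    w = smoothing_weight(error_count)
    return [w * p + (1.0 - w) * c for p, c in zip(previous, current)]


previous_frame = [1.0, 2.0, 3.0]
current_frame = [2.0, 2.0, 0.0]
clean = smooth_spectral_params(previous_frame, current_frame, error_count=0)
noisy = smooth_spectral_params(previous_frame, current_frame, error_count=8)
```

With no detected errors the current parameters pass through unchanged; with many detected errors the output stays close to the previous frame, which limits audible jumps caused by corrupted parameters.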

78 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108