SciSpace (formerly Typeset)
Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 papers have been published within this topic, receiving 271,964 citations.


Papers
Journal Article · DOI
TL;DR: A point process-based computational framework for the task of spotting keywords in continuous speech is formulated and it is found that even with a noisy and extremely sparse phonetic landmark-based point process representation, keywords can be spotted with accuracy levels comparable to recently studied hidden Markov model-based keyword spotting systems.
Abstract: We investigate the hypothesis that the linguistic content underlying human speech may be coded in the pattern of timings of various acoustic "events" (landmarks) in the speech signal. This hypothesis is supported by several strands of research in the fields of linguistics, speech perception, and neuroscience. In this paper, we put these scientific motivations to the test by formulating a point process-based computational framework for the task of spotting keywords in continuous speech. We find that even with a noisy and extremely sparse phonetic landmark-based point process representation, keywords can be spotted with accuracy levels comparable to recently studied hidden Markov model-based keyword spotting systems. We show that the performance of our keyword spotting system in the high-precision regime is better predicted by the median duration of the keyword rather than simply the number of its constituent syllables or phonemes. When we are confronted with very few (in the extreme case, zero) examples of the keyword in question, we find that constructing a keyword detector from its component syllable detectors provides a viable approach.

57 citations
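The abstract describes representing speech by the timings of landmark events and spotting keywords with a point process model. As a minimal sketch, assuming a single landmark type, an inhomogeneous Poisson process whose rate is piecewise constant over a fixed-length candidate window, and a homogeneous background process (all simplifying assumptions, not necessarily the paper's exact model), a detector can score a window by a log-likelihood ratio. The function names and rate values below are hypothetical.

```python
import math
from typing import Sequence


def poisson_loglik(events: Sequence[float], rates: Sequence[float], duration: float) -> float:
    """Log-likelihood of event times in [0, duration) under an inhomogeneous
    Poisson process whose rate is piecewise constant over len(rates) equal bins."""
    n_bins = len(rates)
    width = duration / n_bins
    # Integral of the rate function over the whole window.
    integral = sum(r * width for r in rates)
    # Sum of log-rates evaluated at each event time.
    log_rates = 0.0
    for t in events:
        k = min(int(t / width), n_bins - 1)  # bin that contains event time t
        log_rates += math.log(rates[k])
    return log_rates - integral


def keyword_score(events: Sequence[float], keyword_rates: Sequence[float],
                  background_rate: float, duration: float) -> float:
    """Log-likelihood ratio of a keyword-specific rate profile against a
    homogeneous background process; positive values favour the keyword."""
    return (poisson_loglik(events, keyword_rates, duration)
            - poisson_loglik(events, [background_rate], duration))


# Hypothetical landmark times (seconds) inside a 0.5 s candidate window.
print(keyword_score([0.05, 0.06, 0.30, 0.31], [20.0, 1.0, 20.0, 1.0], 5.0, 0.5))
```

With several landmark types, one would, under an independence assumption, sum the per-type log-likelihood ratios; the keyword rate profiles themselves would be estimated from training examples of the keyword.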

Proceedings Article · DOI
15 Mar 1999
TL;DR: The results indicate that the new paradigm in general and the auditory model in particular form a promising basis for the coding of both speech and audio at low bit rates.
Abstract: For speech coders which fall within the class of waveform coders, the reconstructed signal approaches the original with increasing bit rate. In such coders, the distortion criterion generally operates on the speech signal or a signal obtained by adaptive linear filtering of the speech signal. To satisfy computational and delay constraints, the distortion criterion must be reduced to a very simple approximation of the auditory system. This drawback of conventional approaches motivates a new speech coding paradigm in which the coding is performed in a domain where the single-letter squared-error criterion forms an accurate representation of perception. The new paradigm requires a model of the auditory periphery which is accurate, can be inverted with relatively low computational effort, and which represents the signal with relatively few parameters. We develop such a model of the auditory periphery and discuss its suitability for speech coding. The results indicate that the new paradigm in general and our auditory model in particular form a promising basis for the coding of both speech and audio at low bit rates.

57 citations
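The central move in this abstract is to apply a plain squared-error criterion in a domain where it tracks perception, then invert back to the waveform. The sketch below illustrates only that structure. It assumes a memoryless power-law compression as a crude, hypothetical stand-in for the auditory-periphery model (the paper's model is not reproduced here); the exponent `p` and the quantizer `step` are illustrative values.

```python
import math
from typing import List, Sequence, Tuple


def compress(x: float, p: float = 0.3) -> float:
    """Hypothetical stand-in for an invertible perceptual mapping:
    memoryless power-law compression (not the paper's auditory model)."""
    return math.copysign(abs(x) ** p, x)


def expand(y: float, p: float = 0.3) -> float:
    """Exact inverse of compress()."""
    return math.copysign(abs(y) ** (1.0 / p), y)


def code_in_transformed_domain(signal: Sequence[float], step: float = 0.05,
                               p: float = 0.3) -> Tuple[List[int], List[float]]:
    """Quantize uniformly in the transformed domain, where a plain squared-error
    criterion applies, then invert the mapping to obtain the decoded signal."""
    indices = [round(compress(s, p) / step) for s in signal]  # values to transmit
    decoded = [expand(i * step, p) for i in indices]
    return indices, decoded


signal = [0.01, -0.2, 0.9]
indices, decoded = code_in_transformed_domain(signal)
print(indices, [round(d, 4) for d in decoded])
```

Because the compression expands small amplitudes, a uniform quantizer in the transformed domain spends relatively more resolution on quiet samples, so their absolute waveform error is smaller than that of loud samples. A realistic auditory model would also have memory across time and frequency, which this sketch omits.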

Journal Article
TL;DR: This paper presents the structure of SLS and its latest developments during the MPEG standardization process; SLS provides a universal digital audio format for a variety of application domains including professional audio, Internet music, consumer electronics, broadcasting and others.
Abstract: As the latest extension of MPEG-4 Audio coding, MPEG-4 Lossless Audio Coding includes a scalable audio coding solution (SLS) that integrates the functionalities of lossless audio coding, perceptual audio coding, and fine granular scalable audio coding into a single coder framework while providing backward compatibility to MPEG Advanced Audio Coding (AAC) at the bit-stream level. Despite its abundant functionalities, SLS still achieves a compression performance that is comparable to state-of-the-art non-scalable lossless audio coding algorithms. As a result, SLS provides a universal digital audio format for a variety of application domains including professional audio, Internet music, consumer electronics, broadcasting and others. This paper presents the structure of SLS and its latest developments during the MPEG standardization process.

57 citations
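SLS combines an AAC-compatible lossy core with enhancement layers that refine the signal, with fine granularity, all the way to lossless reconstruction. The sketch below shows only that layering idea on integer coefficients. It assumes a plain floor-division quantizer in place of the AAC core and plain bit-plane coding of the residual; as we understand it, the standard operates on integer transform (IntMDCT) coefficients with more elaborate residual mapping and entropy coding, none of which is reproduced. All names are hypothetical.

```python
from typing import List, Tuple


def encode(coeffs: List[int], step: int) -> Tuple[List[int], List[List[int]]]:
    """Split integer coefficients into a coarse core layer plus bit planes of the residual."""
    core = [c // step for c in coeffs]  # hypothetical stand-in for the AAC core quantizer
    residual = [c - q * step for c, q in zip(coeffs, core)]  # floor division gives 0 <= r < step
    n_planes = max(1, (step - 1).bit_length())
    # Bit planes ordered from most significant to least significant.
    planes = [[(r >> b) & 1 for r in residual] for b in range(n_planes - 1, -1, -1)]
    return core, planes


def decode(core: List[int], planes: List[List[int]], step: int, n_used: int) -> List[int]:
    """Reconstruct from the core plus the first n_used residual bit planes."""
    n_planes = len(planes)
    out = []
    for i, q in enumerate(core):
        r = 0
        for j in range(n_used):
            r |= planes[j][i] << (n_planes - 1 - j)
        out.append(q * step + r)
    return out


coeffs = [37, -5, 0, 120, -64]  # hypothetical integer transform coefficients
core, planes = encode(coeffs, step=16)
for used in range(len(planes) + 1):
    print(used, decode(core, planes, 16, used))
```

Truncating the stream after any bit plane still yields a valid, coarser reconstruction; decoding every plane restores the input exactly, which is the sense in which a single bit stream can serve both lossy and lossless use.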

Journal Article · DOI
TL;DR: This work presents a novel technique, called variable-dimension vector quantization (VDVQ), in which the input variable-dimension vector is directly quantized with a single universal codebook; applied to low bit-rate speech coding, it demonstrates significant gains in subjective quality as well as in rate-distortion performance over prior indirect methods.
Abstract: In many signal compression applications, the evolution of the signal over time can be represented by a sequence of random vectors with varying dimensionality. Frequently, the generation of such variable-dimension vectors can be modeled as a random sampling of another signal vector with a large but fixed dimension. Efficient quantization of these variable-dimension vectors is a challenging task and a critical issue in speech coding algorithms based on harmonic spectral modeling. We introduce a simple and effective formulation of the problem and present a novel technique, called variable-dimension vector quantization (VDVQ), where the input variable-dimension vector is directly quantized with a single universal codebook. The application of VDVQ to low bit-rate speech coding demonstrates significant gain in subjective quality as well as in rate-distortion performance over prior indirect methods.

57 citations
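The key idea in this abstract is that a variable-dimension vector (for example, harmonic amplitudes whose count depends on pitch) can be treated as a sampling of a fixed, larger-dimension vector, so one universal codebook serves every dimension. The sketch below is a minimal illustration under stated assumptions: codevectors are taken to represent an envelope on a uniform frequency grid over [0, π], each harmonic is mapped to the nearest grid point, and selection uses unweighted squared error. The grid size, index mapping, and codebook contents are hypothetical, not the paper's design.

```python
import math
from typing import List, Sequence


def harmonic_indices(w0: float, n_harmonics: int, dim: int) -> List[int]:
    """Map harmonic frequencies k*w0 (radians/sample, up to pi) to the nearest
    point of a dim-point frequency grid spanning [0, pi]."""
    return [round(k * w0 / math.pi * (dim - 1)) for k in range(1, n_harmonics + 1)]


def vdvq_encode(amplitudes: Sequence[float], w0: float,
                codebook: Sequence[Sequence[float]]) -> int:
    """Pick the universal codevector whose sampled entries best match the
    variable-dimension input under squared error."""
    idx = harmonic_indices(w0, len(amplitudes), len(codebook[0]))
    best, best_err = 0, math.inf
    for n, cw in enumerate(codebook):
        err = sum((a - cw[j]) ** 2 for a, j in zip(amplitudes, idx))
        if err < best_err:
            best, best_err = n, err
    return best


def vdvq_decode(index: int, w0: float, n_harmonics: int,
                codebook: Sequence[Sequence[float]]) -> List[float]:
    """Sample the chosen codevector at the harmonic positions for this pitch."""
    idx = harmonic_indices(w0, n_harmonics, len(codebook[0]))
    return [codebook[index][j] for j in idx]


# Hypothetical 65-point universal codebook: flat, falling, and rising envelopes.
dim = 65
codebook = [[1.0] * dim,
            [1.0 - j / (dim - 1) for j in range(dim)],
            [j / (dim - 1) for j in range(dim)]]
for w0 in (0.2, 0.35):  # two pitches give two different vector dimensions
    n_harm = int(math.pi // w0)
    amps = vdvq_decode(1, w0, n_harm, codebook)
    print(n_harm, vdvq_encode(amps, w0, codebook))
```

Encoder and decoder derive the same sampling positions from the pitch, so for the amplitudes only the codevector index needs to be transmitted.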

Proceedings Article · DOI
26 May 2013
TL;DR: A novel non-negative dynamical system (NDS) for sequences of non-negative data is described, together with its application to modeling speech and audio power spectra.
Abstract: Non-negative data arise in a variety of important signal processing domains, such as power spectra of signals, pixels in images, and count data. This paper introduces a novel non-negative dynamical system (NDS) for sequences of such data, and describes its application to modeling speech and audio power spectra. The NDS model can be interpreted both as an adaptation of linear dynamical systems (LDS) to non-negative data, and as an extension of non-negative matrix factorization (NMF) to support Markovian dynamics. Learning and inference algorithms were derived and experiments on speech enhancement were conducted by training sparse non-negative dynamical systems on speech data and adapting a noise model to the unknown noise condition. Results show that the model can capture the dynamics of speech in a useful way.

57 citations
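The abstract frames NDS as NMF extended with Markovian dynamics: non-negative activations evolve over time and combine with a non-negative dictionary to explain each power-spectrum frame. The generative sampler below is a sketch of one plausible instantiation, assuming Gamma-distributed state transitions with mean A h_{t-1} and exponential observation noise with mean W h_t. The exact distributions and parameters used in the paper are not reproduced, and `W`, `A`, and `alpha` are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(M: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product for small lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sample_nds(W: Matrix, A: Matrix, h0: List[float], steps: int,
               alpha: float = 10.0, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Draw activations H and observations V from an assumed NDS form:
    h_t ~ Gamma(shape alpha, mean A h_{t-1}), v_t ~ Exponential(mean W h_t)."""
    rng = random.Random(seed)
    h = h0
    H, V = [], []
    for _ in range(steps):
        mean_h = matvec(A, h)
        # Gamma with shape alpha and scale mean/alpha has the requested mean and stays >= 0.
        h = [rng.gammavariate(alpha, m / alpha) if m > 0 else 0.0 for m in mean_h]
        mean_v = matvec(W, h)
        # Exponential noise with mean W h_t, a common model for power-spectrum bins.
        v = [rng.expovariate(1.0 / m) if m > 0 else 0.0 for m in mean_v]
        H.append(h)
        V.append(v)
    return H, V


W = [[1.0, 0.1], [0.5, 0.5], [0.1, 1.0]]  # hypothetical 3-bin spectral dictionary
A = [[0.9, 0.1], [0.1, 0.9]]              # hypothetical non-negative transition matrix
H, V = sample_nds(W, A, h0=[1.0, 0.5], steps=5)
```

Dropping the transition step and modeling each frame independently as v_t ≈ W h_t recovers an ordinary NMF-style model. Learning and inference, which the abstract says were derived in the paper, run in the opposite direction, estimating activations and parameters from observed spectra; that part is not sketched here.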


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108