Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.

Papers

Journal Article•DOI•

Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion

Bowen Zhou¹, John H. L. Hansen¹•Institutions (1)

University of Colorado Boulder¹

20 Jun 2005-IEEE Transactions on Speech and Audio Processing

TL;DR: An efficient approach for unsupervised audio stream segmentation and clustering via the Bayesian Information Criterion (BIC) is proposed, which is particularly successful for short segment turns of less than 2 s in duration.

Abstract: In many speech and audio applications, it is first necessary to partition and classify acoustic events prior to voice coding for communication or speech recognition for spoken document retrieval. In this paper, we propose an efficient approach for unsupervised audio stream segmentation and clustering via the Bayesian Information Criterion (BIC). The proposed method extends an earlier formulation by Chen and Gopalakrishnan. In our formulation, Hotelling's T²-statistic is used to pre-select candidate segmentation boundaries, followed by BIC to perform the segmentation decision. The proposed algorithm also incorporates a variable-size increasing window scheme and a skip-frame test. Our experiments show that we can improve the final algorithm speed by a factor of 100 compared with Chen and Gopalakrishnan's algorithm while achieving a 6.7% reduction in the acoustic boundary miss rate at the expense of a 5.7% increase in false alarm rate using DARPA Hub4 1997 evaluation data. The approach is particularly successful for short segment turns of less than 2 s in duration. The results suggest that the proposed algorithm is sufficiently effective and efficient for audio stream segmentation applications.

97 citations
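
The two-stage idea in the Zhou and Hansen abstract above (a cheap T² scan to pick a candidate boundary, then a BIC test to accept or reject it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes full-covariance Gaussian segment models, a pooled covariance in the T² statistic, and a single fixed analysis window, and it leaves out the variable-size window and skip-frame test. The names delta_bic, hotelling_t2, find_boundary, margin and lam are hypothetical.

```python
import numpy as np

def _logdet_cov(x):
    # Log-determinant of the maximum-likelihood covariance of the rows of x.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(x, i, lam=1.0):
    # BIC gain of modelling x[:i] and x[i:] with two Gaussians instead of one.
    # Sign convention assumed here: a positive value favours a boundary at i.
    n, d = x.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(x)
            - 0.5 * i * _logdet_cov(x[:i])
            - 0.5 * (n - i) * _logdet_cov(x[i:])
            - lam * penalty)

def hotelling_t2(x, i):
    # Two-sample Hotelling T² for a split at frame i (pooled covariance assumed).
    n = x.shape[0]
    a, b = x[:i], x[i:]
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (n - 2)
    pooled = np.atleast_2d(pooled)
    return (i * (n - i) / n) * float(diff @ np.linalg.solve(pooled, diff))

def find_boundary(x, margin=10, lam=1.0):
    # Stage 1: pick the frame with the largest T² as the only candidate.
    # Stage 2: keep it only if the BIC test agrees.
    n = x.shape[0]
    best = max(range(margin, n - margin), key=lambda i: hotelling_t2(x, i))
    return best if delta_bic(x, best, lam) > 0 else None
```

The point of the first stage is that the T² scan needs only means and one covariance solve per candidate, so the costlier determinant-based BIC test runs once per window rather than at every frame.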

Proceedings Article•DOI•

Perceptually based linear predictive analysis of speech

Hynek Hermansky, Brian A. Hanson, Hisashi Wakita

26 Apr 1985

TL;DR: It is demonstrated through analysis of both synthetic and natural speech that psychoacoustic concepts of spectral auditory integration in vowel perception, namely the F1, F2' concept of Carlson and Fant and the 3.5 Bark auditory integration concept of Chistovich, are well modeled by the PLP method.

Abstract: A novel speech analysis method which uses several established psychoacoustic concepts, the perceptually based linear predictive analysis (PLP), models the auditory spectrum by the spectrum of the low-order all-pole model. The auditory spectrum is derived from the speech waveform by critical-band filtering, equal-loudness curve pre-emphasis, and intensity-loudness root compression. We demonstrate through analysis of both synthetic and natural speech that psychoacoustic concepts of spectral auditory integration in vowel perception, namely the F1, F2' concept of Carlson and Fant and the 3.5 Bark auditory integration concept of Chistovich, are well modeled by the PLP method. A complete speech analysis-synthesis system based on the PLP method is also described in the paper.

97 citations
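
As a rough illustration of the PLP chain described in the Hermansky, Hanson and Wakita abstract above, the sketch below warps a power spectrum onto a Bark axis, applies root compression, and fits a low-order all-pole model by the autocorrelation method. Several parts are assumptions rather than details from the abstract: the Bark formula 6·asinh(f/600) and the cube-root exponent follow the later published PLP formulation, plain interpolation stands in for proper critical-band filtering, and equal-loudness pre-emphasis is omitted. The function names and the order and n_bands defaults are hypothetical.

```python
import numpy as np

def hz_to_bark(freq_hz):
    # Bark warping (assumed form): 6 * asinh(f / 600), f in Hz.
    return 6.0 * np.arcsinh(np.asarray(freq_hz, dtype=float) / 600.0)

def all_pole_fit(power, order):
    # Autocorrelation method: the inverse FFT of a one-sided power spectrum
    # gives autocorrelation lags, and the Yule-Walker equations give the
    # predictor polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    r = np.fft.irfft(power)[: order + 1]
    toeplitz = np.array([[r[abs(i - j)] for j in range(order)]
                         for i in range(order)])
    coeffs = np.linalg.solve(toeplitz, -r[1:])
    gain = r[0] + coeffs @ r[1:]  # prediction error power
    return np.concatenate(([1.0], coeffs)), gain

def plp_like_model(power, sample_rate, order=5, n_bands=64):
    # power: one-sided power spectrum sampled uniformly from 0 to sample_rate/2.
    freqs = np.linspace(0.0, sample_rate / 2.0, len(power))
    bark = hz_to_bark(freqs)
    grid = np.linspace(0.0, bark[-1], n_bands)  # uniform grid on the Bark axis
    warped = np.interp(grid, bark, power)       # crude stand-in for critical bands
    loudness = np.cbrt(warped)                  # intensity-to-loudness compression
    return all_pole_fit(loudness, order)
```

Keeping the model order low is what forces neighbouring spectral peaks to merge on the warped axis, which is the behaviour the abstract relates to auditory integration in vowel perception.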

Proceedings Article•DOI•

An Approach to Information Hiding in Low Bit-Rate Speech Stream

Bo Xiao¹, Yongfeng Huang¹, Shanyu Tang²•Institutions (2)

Tsinghua University¹, London Metropolitan University²

08 Dec 2008

TL;DR: Described by its authors as the first work to use graph theory to improve the codebook partition for QIM in low bit-rate streaming media; the proposed partition guarantees that every codeword is in the opposite part to its nearest neighbor, and the distortion is limited by a bound.

Abstract: In this paper we introduce a novel codebook partition algorithm for quantization index modulation (QIM), which is applied to information hiding in instant low bit-rate speech streams. The QIM method divides the codebook into two parts, representing '0' and '1' respectively. Instead of randomly partitioning the codebook, the relationship between codewords is considered. The proposed algorithm, complementary neighbor vertices (CNV), guarantees that every codeword is in the opposite part to its nearest neighbor, and the distortion is limited by a bound. The feasibility of CNV is proved with graph theory. Moreover, in our work the secret message is embedded in the field of vector quantization index of LPC coefficients, with the benefit that the distortion due to QIM is lightened adaptively by the rest of the encoding procedure. Experiments on iLBC and G.723.1 verify the effectiveness of the proposed method. Both objective and subjective assessments show the proposed method only slightly decreases the speech quality, to an indistinguishable degree. The hiding capacity is no less than 100 bps. To the best of our knowledge, this is the first work adopting graph theory to improve the codebook partition while using QIM in low bit-rate streaming media.

96 citations
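
The partition property stated in the Xiao, Huang and Tang abstract above (every codeword lies in the opposite part to its nearest neighbor) can be illustrated with a small sketch. It assumes Euclidean distance and builds the partition by two-coloring the nearest-neighbor graph with a breadth-first search; this is one plausible reading, not the paper's CNV procedure. The embedding rule shown, which swaps an index for its nearest neighbor when the label does not match the bit, is likewise an assumption, and all function names are hypothetical.

```python
from collections import deque

import numpy as np

def nearest_neighbor_partition(codebook):
    # Link each codeword to its nearest neighbor, then two-color the graph
    # so that both ends of every link carry different labels.
    n = len(codebook)
    dist = np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)

    adjacency = [set() for _ in range(n)]
    for i, j in enumerate(nearest):
        adjacency[i].add(int(j))
        adjacency[int(j)].add(i)

    labels = [-1] * n
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if labels[v] == -1:
                    labels[v] = 1 - labels[u]
                    queue.append(v)
                elif labels[v] == labels[u]:
                    raise ValueError("nearest-neighbor graph is not two-colorable")
    return np.array(labels), nearest

def embed_bit(index, bit, labels, nearest):
    # Keep the encoder's index if its label already equals the bit;
    # otherwise substitute the nearest neighbor, which has the other label.
    return int(index) if labels[index] == bit else int(nearest[index])

def extract_bit(index, labels):
    # The receiver reads the hidden bit from the label of the received index.
    return int(labels[index])
```

Because a mismatched index is only ever replaced by its nearest neighbor, the extra quantization error per hidden bit is at most the distance between those two codewords, which is one way to see why such a partition bounds the distortion.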

Proceedings Article•DOI•

Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder

Cristina Garbacea¹, Aaron van den Oord, Yazhe Li, Felicia S. C. Lim², Alejandro Luebs², Oriol Vinyals, Thomas C. Walters•Institutions (2)

University of Michigan¹, Google²

12 May 2019

TL;DR: This work demonstrates that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.

Abstract: In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.

96 citations
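
For readers unfamiliar with how a vector-quantized bottleneck, as used in the VQ-VAE entry above, translates into a bit rate, the sketch below shows nearest-codeword quantization and the resulting rate arithmetic. The function names, codebook sizes and frame rates are purely illustrative assumptions; the abstract does not state the model's actual configuration, and nothing here reproduces the WaveNet decoder.

```python
import math

import numpy as np

def quantize(latents, codebook):
    # Map each latent vector to the index of its nearest codeword
    # (squared Euclidean distance).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def bitrate_bps(frames_per_second, codebook_size, n_codebooks=1):
    # Each frame sends one index per codebook, costing log2(codebook_size)
    # bits when indices are stored without entropy coding.
    return frames_per_second * n_codebooks * math.log2(codebook_size)
```

For example, under these hypothetical settings, one 256-entry codebook at 200 frames per second and two such codebooks at 100 frames per second both cost 1600 bits per second, which shows how frame rate and codebook size trade off against each other at a fixed rate.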

Patent•

Scalable coding method for high quality audio

Louis Dunn Fielder¹, Stephen Decker Vernon¹•Institutions (1)

Dolby Laboratories¹

04 Aug 2000

TL;DR: A scalable data structure for audio transmission includes core and augmentation layers, the former for carrying a first coding of an audio signal that places post decode noise beneath a desired noise spectrum, the latter for carrying offset data regarding the desired noise spectrum and data about coding of the audio signal.

Abstract: Scalable coding of audio into a core layer in response to a desired noise spectrum established according to psychoacoustic principles supports coding augmentation data into augmentation layers in response to various criteria including offset of such desired noise spectrum. Compatible decoding provides a plurality of decoded resolutions from a single signal. Coding is preferably performed on subband signals generated according to spectral transform, quadrature mirror filtering, or other conventional processing of audio input. A scalable data structure for audio transmission includes core and augmentation layers, the former for carrying a first coding of an audio signal that places post decode noise beneath a desired noise spectrum, the latter for carrying offset data regarding the desired noise spectrum and data about coding of the audio signal that places post decode noise beneath the desired noise spectrum shifted by the offset data.

96 citations

Performance

Metrics

Papers: 14,368

Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108