Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.

Papers

Journal Article

Adaptive transform coding of speech signals

R. Zelinski, P. Noll¹

Bell Labs¹

01 Aug 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The main result is that this adaptive transform coder performs better than all known nonpitch-tracking coding schemes; it extends the range of speech waveform coding to lower bit rates and closes the gap between vocoders and predictive waveform coders.

Abstract: This paper discusses speech coding systems based upon transform coding (TC). It compares several transforms and shows that the cosine transform leads to nearly optimum performance for almost all speech sounds. Various adaptive coding strategies are then investigated, and a coding scheme is proposed that is based on a nonadaptive discrete cosine transform (DCT), on an adaptive bit assignment, and on adaptive quantization. The adaptation is controlled by a short-term basis spectrum that is derived from the transform coefficients prior to coding and transmission and that is transmitted as side information to the receiver. The main result is that this adaptive transform coder performs better than all known nonpitch-tracking coding schemes; it extends the range of speech waveform coding to lower bit rates and closes the gap between vocoders and predictive waveform coders.

340 citations
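A minimal sketch of one frame of the scheme described in the Zelinski and Noll abstract (a fixed DCT, a bit assignment driven by a spectral estimate, and quantizer step sizes scaled by that estimate), in Python with NumPy. The band-averaged variance estimate stands in for the paper's basis spectrum; it, the greedy bit allocator, the helper names, and the `band`, `load`, and `total_bits` parameters are hypothetical choices for illustration, not the authors' method. In a real coder the variance estimate would itself be quantized and sent as side information so that the decoder can repeat the same bit assignment.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: X = C @ x, and C.T @ X recovers x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def band_variances(coeffs: np.ndarray, band: int = 8, floor: float = 1e-8) -> np.ndarray:
    """Hypothetical stand-in for the basis spectrum: mean coefficient energy
    in blocks of `band` bins, held constant across each block."""
    var = np.empty(len(coeffs))
    for start in range(0, len(coeffs), band):
        var[start:start + band] = max(float(np.mean(coeffs[start:start + band] ** 2)), floor)
    return var


def allocate_bits(variances: np.ndarray, total_bits: int) -> np.ndarray:
    """Greedy bit assignment: each bit goes to the coefficient whose estimated
    distortion is largest; one extra bit divides that estimate by 4 (~6 dB)."""
    bits = np.zeros(len(variances), dtype=int)
    dist = np.asarray(variances, dtype=float).copy()
    for _ in range(total_bits):
        k = int(np.argmax(dist))
        bits[k] += 1
        dist[k] /= 4.0
    return bits


def quantize(coeffs, bits, variances, load=3.0):
    """Midrise uniform quantizer per coefficient over +/- load * sigma_k.
    With the default band=8, |X_k| <= sqrt(8) * sigma_k < 3 * sigma_k, so nothing clips."""
    out = np.zeros(len(coeffs))
    for k, b in enumerate(bits):
        if b == 0:
            continue  # coefficients that receive no bits are reconstructed as zero
        levels = 2 ** int(b)
        step = 2.0 * load * np.sqrt(variances[k]) / levels
        idx = np.clip(np.floor(coeffs[k] / step), -levels // 2, levels // 2 - 1)
        out[k] = (idx + 0.5) * step
    return out


def code_frame(frame: np.ndarray, total_bits: int):
    """Encode and decode one frame; returns (reconstruction, bit assignment)."""
    c = dct_matrix(len(frame))
    coeffs = c @ frame
    var = band_variances(coeffs)  # in a real coder: quantized side information
    bits = allocate_bits(var, total_bits)
    return c.T @ quantize(coeffs, bits, var), bits
```

The greedy allocator approximates the classic log-variance rule (roughly half a bit more per doubling of a coefficient's variance) while keeping every allocation an integer and the total exactly equal to `total_bits`.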

Journal Article

Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm

Patrick A. Naylor¹, A. Kounoudes², Jon Gudnason¹, Mike Brookes¹

Imperial College London¹, Philips²

01 Jan 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) automatically estimates glottal closure instants (GCIs) in voiced speech from the speech signal alone, without the need for an EGG signal.

Abstract: We present the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) for automatic estimation of glottal closure instants (GCIs) in voiced speech. Accurate estimation of GCIs is an important tool that can be applied to a wide range of speech processing tasks, including speech analysis, synthesis and coding. DYPSA is automatic and operates using the speech signal alone, without the need for an EGG signal. The algorithm employs the phase-slope function and a novel phase-slope projection technique for estimating GCI candidates from the speech signal. The most likely candidates are then selected using a dynamic programming technique to minimize a cost function that we define. We review and evaluate three existing methods of GCI estimation and compare the new DYPSA algorithm to them. Results are presented for the APLAWD and SAM databases, for which 95.7% and 93.1% of GCIs, respectively, are correctly identified.

337 citations
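The candidate-generation idea in the DYPSA abstract (GCI candidates taken from the phase-slope function) can be illustrated with a phase-slope function built from the energy-weighted group delay, one common definition of average group delay. This is a simplified sketch under stated assumptions: the paper's exact phase-slope definition, its application to the LPC residual rather than raw samples, the phase-slope projection step, and the dynamic-programming selection are all omitted, and the window half-length `half` and the function names are hypothetical. An idealized impulse train stands in for the excitation in the test.

```python
import numpy as np


def phase_slope(x: np.ndarray, half: int) -> np.ndarray:
    """Energy-weighted phase slope over a (2*half+1)-sample Hamming window.

    The energy-weighted group delay of a windowed segment equals the offset
    of its energy centroid from the window centre; the phase slope is the
    negative of that delay, so an isolated impulse at n0 gives f[m] = m - n0.
    """
    w = np.hamming(2 * half + 1)
    offsets = np.arange(-half, half + 1)
    f = np.zeros(len(x))
    for m in range(half, len(x) - half):
        e = (x[m - half:m + half + 1] * w) ** 2
        total = e.sum()
        if total > 0:  # windows with no energy are left at zero
            f[m] = -np.dot(offsets, e) / total
    return f


def gci_candidates(f: np.ndarray) -> np.ndarray:
    """Indices of positive-going zero crossings of the phase-slope function."""
    return np.flatnonzero((f[:-1] < 0) & (f[1:] >= 0)) + 1
```

Because the sign of the excitation is squared away, impulses of either polarity produce the same crossing pattern; on real speech, noise and formant ringing add spurious crossings, which is what the projection and dynamic-programming stages in the paper are meant to handle.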

Journal Article

Experimental evaluation of features for robust speaker identification

Douglas A. Reynolds¹

Massachusetts Institute of Technology¹

01 Oct 1994-IEEE Transactions on Speech and Audio Processing

TL;DR: This correspondence presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification, and shows that performance differences between the basic features are small and that the major gains are due to the channel compensation techniques.

Abstract: This correspondence presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification. The goal is to keep all processing and classification steps constant and to vary only the features and compensations used, to allow a controlled comparison. A general, maximum-likelihood classifier based on Gaussian mixture densities is used as the classifier, and experiments are conducted on the King speech database, a conversational, telephone-speech database. The features examined are mel-frequency and linear-frequency filterbank cepstral coefficients, linear prediction cepstral coefficients, and perceptual linear prediction (PLP) cepstral coefficients. The channel compensation techniques examined are cepstral mean removal, RASTA processing, and a quadratic trend removal technique. It is shown for this database that performance differences between the basic features are small, and the major gains are due to the channel compensation techniques. The best "across-the-divide" recognition accuracy of 92% is obtained for both high-order LPC features and band-limited filterbank features.

336 citations
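Cepstral mean removal, one of the channel compensation techniques in the Reynolds abstract, works because a fixed linear channel multiplies every frame's spectrum by the same H, which becomes the same vector added to every frame in the log-spectral and cepstral domains; subtracting the per-utterance mean removes it. The sketch below uses FFT-based linear-frequency cepstra with hypothetical frame size, hop, and coefficient count; it is not the feature pipeline from the paper, which also evaluates mel and LPC-derived cepstra, RASTA, and quadratic trend removal.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split x into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def linear_cepstra(x: np.ndarray, n_ceps: int = 20, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Real cepstrum per frame: inverse FFT of the log power spectrum.
    A fixed channel H adds log|H|^2 to every frame's log spectrum, i.e. the
    same vector to every frame's cepstrum."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    log_pow = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12)
    return np.fft.irfft(log_pow, n=frame_len, axis=1)[:, :n_ceps]


def cepstral_mean_removal(ceps: np.ndarray) -> np.ndarray:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    return ceps - ceps.mean(axis=0, keepdims=True)
```

For white noise passed through H(z) = 1 + 0.5z^-1, coefficient c1 of the filtered signal should sit about 0.5 above that of the clean signal on average, which is the first cepstral coefficient of log|H|^2; after mean removal that constant offset is gone.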

Proceedings Article

Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition

Jun Deng, Zixing Zhang, Erik Marchi, Björn Schuller¹

University of Passau¹

02 Sep 2013

TL;DR: A sparse autoencoder method for feature transfer learning in speech emotion recognition learns a common emotion-specific mapping rule from a small set of labelled data in a target domain, improving performance relative to learning each source domain independently.

Abstract: In speech emotion recognition, training and test data used for system development usually tend to fit each other perfectly, but further 'similar' data may be available. Transfer learning helps to exploit such similar data for training despite the inherent dissimilarities in order to boost a recogniser's performance. In this context, this paper presents a sparse autoencoder method for feature transfer learning for speech emotion recognition. In our proposed method, a common emotion-specific mapping rule is learnt from a small set of labelled data in a target domain. Then, newly reconstructed data are obtained by applying this rule to the emotion-specific data in a different domain. The experimental results, evaluated on six standard databases, show that our approach significantly improves performance relative to learning each source domain independently.

335 citations
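The Deng et al. abstract describes learning a mapping on a small labelled target-domain set and then applying it to source-domain data of the same emotion to obtain reconstructed training data. A minimal sketch of that pattern with a generic one-hidden-layer sparse autoencoder (sigmoid encoder, linear decoder, KL-divergence sparsity penalty, L2 weight decay) trained by full-batch gradient descent is shown below. The architecture, the hyperparameters `rho`, `beta`, `lam`, `n_hidden`, `lr`, and `epochs`, and all function names are assumptions for illustration; the paper's actual network and training details are not given here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sae_loss_and_grads(params, X, rho=0.05, beta=3.0, lam=1e-4):
    """Loss and gradients of a sparse autoencoder: mean squared reconstruction
    error + L2 weight decay + beta * sum_j KL(rho || rho_hat_j)."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    H = sigmoid(X @ W1 + b1)   # hidden code
    Xhat = H @ W2 + b2         # linear decoder
    rho_hat = H.mean(axis=0)   # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    loss = (0.5 / m) * np.sum((Xhat - X) ** 2) \
        + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + beta * kl

    # Backpropagation through decoder, sparsity penalty, and sigmoid encoder.
    dXhat = (Xhat - X) / m
    dW2 = H.T @ dXhat + lam * W2
    db2 = dXhat.sum(axis=0)
    dH = dXhat @ W2.T + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
    dZ = dH * H * (1 - H)
    dW1 = X.T @ dZ + lam * W1
    db1 = dZ.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)


def train_sae(X, n_hidden=16, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent; returns (params, loss history)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    params = [rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)]
    history = []
    for _ in range(epochs):
        loss, grads = sae_loss_and_grads(params, X)
        history.append(loss)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history


def reconstruct(params, X):
    """Pass (e.g. source-domain) features through a trained autoencoder."""
    W1, b1, W2, b2 = params
    return sigmoid(X @ W1 + b1) @ W2 + b2
```

Following the abstract's description, one such autoencoder would be trained per emotion class on the target-domain data, and `reconstruct` would then map that class's source-domain features to new training data for the recogniser.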

Journal Article

ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications

A. Benyassine¹, E. Shlomot, Huan-Yu Su, D. Massaloux, Claude Lamblin, J.-P. Petit

Rockwell Automation¹

01 Sep 1997-IEEE Communications Magazine

TL;DR: Annex B defines a low-bit-rate silence compression scheme designed and optimized to work in conjunction with both the full version of G.729 and its low-complexity Annex A. The scheme enables bit-rate savings, with coded speech at average rates as low as 4 kb/s during normal speech conversation, while maintaining reproduction quality.

Abstract: This article describes Annex B to ITU-T Recommendation G.729. Annex B defines a low-bit-rate silence compression scheme designed and optimized to work in conjunction with both the full version of G.729 and its low-complexity Annex A. To achieve good-quality low-bit-rate silence compression, a robust frame-based voice activity detector module is essential to detect inactive voice frames, also called silence or background noise frames. For these detected inactive voice frames, a discontinuous transmission module measures the changes over time of the inactive voice signal characteristics and decides whether a new silence information descriptor frame should be sent to maintain the reproduction quality of the background noise at the receiving end. If such a frame is needed, the spectrum and energy parameters describing the perceptual characteristics of the background noise are efficiently coded and transmitted using 15 b/frame. At the receiving end, the comfort noise generation module regenerates the output background noise using transmitted updates or previously available parameters. The synthesized background noise is obtained by linear predictive filtering of a locally generated pseudo-white excitation signal of a controlled level. This method of coding the background noise yields bit-rate savings, with coded speech at average rates as low as 4 kb/s during normal speech conversation, while maintaining reproduction quality.

332 citations
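Two pieces of the Annex B abstract lend themselves to a short sketch: the average-rate arithmetic when inactive frames send either a 15-bit SID frame or nothing, and comfort noise made by linear predictive (all-pole) filtering of a pseudo-white excitation. The sketch assumes G.729's 8 kb/s rate and 10 ms frames (80 bits per active frame) as background; the Gaussian excitation, the output rescaling used for level control, and all names are simplifying assumptions, not the Annex's actual excitation or gain computation.

```python
import numpy as np

FRAME_MS = 10      # G.729 frame length in milliseconds (background assumption)
SPEECH_BITS = 80   # 8 kb/s x 10 ms per active-speech frame
SID_BITS = 15      # silence information descriptor frame, per the abstract


def average_rate_kbps(n_speech: int, n_sid: int, n_frames: int) -> float:
    """Average rate when each inactive frame sends either an SID frame or nothing."""
    bits = n_speech * SPEECH_BITS + n_sid * SID_BITS
    return bits / (n_frames * FRAME_MS)  # bits per millisecond == kb/s


def comfort_noise(lpc, target_rms, n, rng):
    """Pseudo-white (Gaussian) excitation filtered by the all-pole synthesis
    filter 1/A(z), with A(z) = 1 + lpc[0] z^-1 + ... + lpc[p-1] z^-p.
    Simplification: the level is set by rescaling the output, not via an excitation gain."""
    lpc = np.asarray(lpc, dtype=float)
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for i in range(n):
        acc = e[i]
        for k, a in enumerate(lpc, start=1):
            if i - k >= 0:
                acc -= a * y[i - k]  # y[i] = e[i] - sum_k a_k y[i-k]
        y[i] = acc
    return y * (target_rms / np.sqrt(np.mean(y ** 2)))
```

For instance, with made-up counts of 50 active frames and 5 SID frames in a 100-frame stretch, `average_rate_kbps(50, 5, 100)` gives 4.075 kb/s; these counts are illustrative only and are not taken from the article.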

Metrics

Papers: 14,368
Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
