Reference Entry

Speech Coding: Fundamentals and Applications

TLDR
An overview of the most widely used algorithms, standards, and applications of wideband and narrowband speech coding, including perceptually transparent multi-rate and embedded coding, is presented.
Abstract
In this chapter, we present an overview of the most widely used algorithms, standards, and applications of wideband and narrowband speech coding. Algorithms for speech coding are classified under four broad headings: (1) waveform coding techniques (including PCM, companded PCM, and DPCM), which are typically used for landline telephony, internet telephony, and secure military communications; (2) subband coding, including perceptually transparent multi-rate and embedded coding, which is mainly used for internet and digital audio applications; (3) linear predictive analysis-by-synthesis coding (LPC-AS) algorithms, including multipulse LPC, CELP, SELP, VSELP, and low-delay CELP, which are typically used for digital cellular and telephony; and (4) LPC vocoders, including advanced vocoder algorithms (e.g., MELP, MBE, and PWI), which are used for applications such as secure telephony and satellite telephony. Applications in areas such as voice over IP (VoIP) and digital cellular are emerging and require a speech coder to adapt gracefully to rapidly changing channel conditions, a need that is met by embedded and multi-rate speech coders combined with joint source-channel coding algorithms. Measures of speech coder perceptual quality include subjective measures of intelligibility (DRT and DALT) and naturalness (MOS and DAM), as well as objective measures such as segmental SNR, Bark spectral distortion, PSQM, and PESQ. Speech coding standards are set by organizations including the ITU (for landline telephony), MPEG (for multimedia applications), ETSI (for European digital cellular), TIA (for U.S. digital cellular), and DDVPC (for U.S. military applications).

Keywords: speech coding; PCM; subband coding; CELP; LPC; digital cellular; multimedia; voice over IP; mean opinion score (MOS)
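As a rough illustration of two items named in the abstract, companded PCM and segmental SNR, the following Python sketch compresses samples with a continuous mu-law curve, quantizes them uniformly, expands them again, and scores the result frame by frame. It is not taken from the chapter. The function names, the value mu = 255, the 8-bit word length, and the 160-sample frame (20 ms at an assumed 8 kHz sampling rate) are all assumptions chosen for illustration, and the continuous curve is a simplification of the segmented companding used in deployed telephony codecs.

```python
import numpy as np

MU = 255.0  # assumed companding constant (illustrative choice)


def mu_law_compress(x, mu=MU):
    """Map samples in [-1, 1] through the continuous mu-law curve."""
    x = np.clip(x, -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


def quantize_uniform(y, bits=8):
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = np.clip(np.floor((y + 1.0) / step), 0, levels - 1)
    return (idx + 0.5) * step - 1.0


def companded_pcm(x, bits=8):
    """Compress, quantize uniformly, then expand (companded PCM sketch)."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x), bits))


def segmental_snr(ref, test, frame_len=160, eps=1e-12):
    """Mean of per-frame SNRs in dB (no clamping or silence removal)."""
    n_frames = len(ref) // frame_len
    snrs = []
    for i in range(n_frames):
        r = ref[i * frame_len:(i + 1) * frame_len]
        e = r - test[i * frame_len:(i + 1) * frame_len]
        snrs.append(10.0 * np.log10((np.sum(r ** 2) + eps) / (np.sum(e ** 2) + eps)))
    return float(np.mean(snrs))


if __name__ == "__main__":
    # Low-level 440 Hz tone, 0.2 s at an assumed 8 kHz sampling rate.
    t = np.arange(1600) / 8000.0
    x = 0.01 * np.sin(2 * np.pi * 440 * t)
    print("uniform 8-bit segSNR (dB):  ", segmental_snr(x, quantize_uniform(x)))
    print("companded 8-bit segSNR (dB):", segmental_snr(x, companded_pcm(x)))
```

For a low-amplitude input such as the tone above, the companded path should score a higher segmental SNR than plain uniform quantization at the same word length, because the mu-law curve spends more quantizer levels near zero.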
