Author

R.V. Cox

Bio: R.V. Cox is an academic researcher from AT&T Labs. The author has contributed to research in the topics Speech coding and Codec. The author has an h-index of 2 and has co-authored 2 publications receiving 196 citations.

Papers
Journal Article
R.V. Cox, P. Kroon
TL;DR: The attributes of speech coders such as bit rate, complexity, delay, and quality are described, which are applicable to low-bit-rate multimedia communications.
Abstract: The International Telecommunication Union (ITU) has standardized three speech coders which are applicable to low-bit-rate multimedia communications. ITU Rec. G.729 8 kb/s CS-ACELP has a 15 ms algorithmic codec delay and provides network-quality speech. It was originally designed for wireless applications, but is applicable to multimedia communications as well. Annex A of Rec. G.729 is a reduced-complexity version of the CS-ACELP coder. It was designed explicitly for simultaneous voice and data applications that are prevalent in low-bit-rate multimedia communications. These two coders use the same bitstream format and can interoperate. The ITU Rec. G.723.1 6.3 and 5.3 kb/s speech coder for multimedia communications was designed originally for low-bit-rate videophones. Its frame size of 30 ms and one-way algorithmic codec delay of 37.5 ms allow for a further reduction in bit rate compared to the G.729 coder. In applications where low delay is important, the delay of G.723.1 may be too large. However, if the delay is acceptable, G.723.1 provides a lower-complexity alternative to G.729 at the expense of a slight degradation in quality. This article describes the attributes of speech coders such as bit rate, complexity, delay, and quality. Then it discusses the basic concepts of the three new ITU coders by comparing their specific attributes. The second part of this article describes the standardization process for each of these coders.

123 citations
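The rate and delay figures in the Cox and Kroon abstract above are tied together by simple arithmetic: bits per frame equal bit rate times frame duration, and the algorithmic delay is roughly one frame of buffering plus look-ahead. A minimal sketch, assuming the 10 ms G.729 frame length from the G.729 specification (not stated in the abstract) and treating algorithmic delay as frame length plus look-ahead; the `Codec` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    rate_bps: int     # bit rate in bits per second
    frame_ms: float   # frame length in milliseconds
    delay_ms: float   # one-way algorithmic delay in milliseconds


def bits_per_frame(c: Codec) -> float:
    # bit rate (bit/s) multiplied by frame duration (s)
    return c.rate_bps * c.frame_ms / 1000.0


def lookahead_ms(c: Codec) -> float:
    # Assumption: algorithmic delay = one frame of buffering + look-ahead.
    return c.delay_ms - c.frame_ms


CODECS = [
    Codec("G.729 (CS-ACELP)", 8000, 10.0, 15.0),  # 10 ms frame taken from the G.729 spec
    Codec("G.723.1 high rate", 6300, 30.0, 37.5),
    Codec("G.723.1 low rate", 5300, 30.0, 37.5),
]

if __name__ == "__main__":
    for c in CODECS:
        print(f"{c.name}: {bits_per_frame(c):.0f} bits/frame, "
              f"{lookahead_ms(c):.1f} ms look-ahead")
```

Under these assumptions G.729 carries 80 bits per 10 ms frame with 5 ms of look-ahead, while G.723.1 spends 7.5 ms of look-ahead on top of a 30 ms frame, which is why its delay is so much larger.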

Journal Article
R.V. Cox
TL;DR: Three new speech coding recommendations from the ITU-T provide good coverage for a wide range of applications that have low bit rate requirements (i.e., from 5.3 to 8 kb/s).
Abstract: Many new speech coding standards have been created in the 10-year period 1987-1996. The author reviews the key attributes that determine what coder to select for different applications. The article then focuses on three new speech coding recommendations from the ITU-T, namely G.723.1, G.729, and Annex A of G.729. They provide good coverage for a wide range of applications that have low bit rate requirements (i.e., from 5.3 to 8 kb/s). In addition to bit rate, the article reviews their delay, complexity, and performance. Also reviewed are the history of these standards, and what considerations influenced the requirements each of these coders had to meet.

79 citations


Cited by
01 Aug 2002
TL;DR: The purpose of this document is to summarise the current understanding of the biometrics community of the best scientific practices for conducting technical performance testing towards the end of field performance estimation.
Abstract: The purpose of this document is to summarise the current understanding of the biometrics community of the best scientific practices for conducting technical performance testing towards the end of field performance estimation.

402 citations

Journal Article
TL;DR: This work examines possible architectures for voice over IP and discusses measured Internet delay and loss characteristics, and considers the feasibility and expected quality of service of audio applications over IP networks such as the Internet.
Abstract: We discuss the architecture and technical viability of transporting real-time voice over packet-switched networks such as the Internet. The value of integrating voice and data networks onto a common platform is well known. The telephony industry has proposed the ATM standard as a means of upgrading the Internet to provide both real-time and data services. In contrast, voice services may be added to traditional IP networks that were originally designed for data transmission alone. We consider the feasibility and expected quality of service of audio applications over IP networks such as the Internet. In particular, we examine possible architectures for voice over IP and discuss measured Internet delay and loss characteristics.

242 citations
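The voice-over-IP study above rests on measured delay and loss characteristics. A minimal sketch of how such statistics can be summarized from a packet trace, assuming the trace provides sequence numbers and (send, receive) timestamps in the same units; the jitter estimator is the interarrival-jitter formula from RFC 3550 (RTP), swapped in for illustration and not necessarily the measurement method the authors used. Function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def loss_rate(received_seq: Iterable[int], first: int, last: int) -> float:
    """Fraction of packets numbered first..last (inclusive) that never arrived."""
    expected = last - first + 1
    got = len({s for s in received_seq if first <= s <= last})  # ignore duplicates
    return (expected - got) / expected


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """RFC 3550-style jitter from (send_time, recv_time) pairs in arrival order.

    D is the change in one-way transit time between consecutive packets;
    the estimate J moves toward |D| with a gain of 1/16 per packet.
    """
    jitter = 0.0
    prev_transit: Optional[float] = None
    for sent, recv in packets:
        transit = recv - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter
```

For example, `loss_rate([1, 2, 4, 5], 1, 5)` reports one lost packet in five, and a trace whose transit time never changes yields zero jitter. The 1/16 gain smooths out single outliers while still tracking sustained changes in delay variability.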

Journal Article
TL;DR: The findings indicate that although voice services can be adequately provided by some ISPs, a significant number of Internet backbone paths lead to poor performance.
Abstract: As the Internet evolves into a ubiquitous communication infrastructure and provides various services including telephony, it will be expected to meet the quality standards achieved in the public switched telephone network. Our objective in this paper is to assess to what extent today's Internet meets this expectation. Our assessment is based on delay and loss measurements taken over wide-area backbone networks and uses subjective voice quality measures capturing the various impairments incurred. First, we compile the results of various studies into a single model for assessing the voice-over-IP (VoIP) quality. Then, we identify different types of typical Internet paths and study their VoIP performance. For each type of path, we identify those characteristics that affect the VoIP perceived quality. Such characteristics include the network loss and the delay variability that should be appropriately handled by the playout scheduling at the receiver. Our findings indicate that although voice services can be adequately provided by some ISPs, a significant number of Internet backbone paths lead to poor performance.

220 citations
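The abstract above notes that delay variability must be handled by playout scheduling at the receiver. A minimal sketch of one common adaptive-playout approach from the literature, an exponentially weighted estimate of network delay and its deviation, with playout delay set to the mean plus a safety multiple of the deviation; this is a swapped-in illustration, not the authors' method, and the smoothing factor `alpha` and the `safety` multiplier are assumed values.

```python
from typing import Iterable, List, Optional


def playout_delays(network_delays_ms: Iterable[float],
                   alpha: float = 0.998002,
                   safety: float = 4.0) -> List[float]:
    """Return the playout delay the receiver would choose after each packet.

    d tracks the smoothed network delay and v its smoothed absolute deviation;
    the chosen playout delay is d + safety * v. In practice it is applied at
    the start of the next talkspurt so that speech within a talkspurt stays smooth.
    """
    out: List[float] = []
    d: Optional[float] = None
    v = 0.0
    for n in network_delays_ms:
        if d is None:
            d = n  # initialize the delay estimate from the first sample
        else:
            d = alpha * d + (1 - alpha) * n
            v = alpha * v + (1 - alpha) * abs(d - n)
        out.append(d + safety * v)
    return out
```

A larger `safety` factor lowers late-packet loss at the cost of extra mouth-to-ear delay, which is the same delay-versus-loss trade-off the study measures across Internet paths.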

Journal Article
TL;DR: This work chronicles the development of rate-distortion theory and provides an overview of its influence on the practice of lossy source coding.
Abstract: Lossy coding of speech, high-quality audio, still images, and video is commonplace today. However, in 1948, few lossy compression systems were in service. Shannon introduced and developed the theory of source coding with a fidelity criterion, also called rate-distortion theory. For the first 25 years of its existence, rate-distortion theory had relatively little impact on the methods and systems actually used to compress real sources. Today, however, rate-distortion theoretic concepts are an important component of many lossy compression techniques and standards. We chronicle the development of rate-distortion theory and provide an overview of its influence on the practice of lossy source coding.

213 citations
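As a concrete instance of the rate-distortion theory chronicled above: for a memoryless Gaussian source with variance σ² under mean-squared error, the rate-distortion function has the closed form R(D) = max(0, ½ log2(σ²/D)) bits per sample, or equivalently D(R) = σ² · 2^(-2R). A minimal sketch; the Gaussian/MSE setting is an illustrative choice, not something the abstract specifies, and the function names are hypothetical.

```python
import math


def rate_distortion_gaussian(variance: float, distortion: float) -> float:
    """R(D) in bits/sample for an i.i.d. Gaussian source under mean-squared error."""
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    # Any distortion at or above the variance is reachable at zero rate
    # (always output the mean), hence the clamp at 0.
    return max(0.0, 0.5 * math.log2(variance / distortion))


def distortion_rate_gaussian(variance: float, rate: float) -> float:
    """Inverse form D(R) = variance * 2^(-2R)."""
    return variance * 2.0 ** (-2.0 * rate)


if __name__ == "__main__":
    for r in (1, 2, 4):
        d = distortion_rate_gaussian(1.0, r)
        snr_db = 10 * math.log10(1.0 / d)
        print(f"R = {r} bit/sample -> D = {d:.6f}, SNR = {snr_db:.2f} dB")
```

Each additional bit per sample divides the achievable distortion by four, i.e. about 6.02 dB of signal-to-noise ratio per bit, a benchmark against which practical lossy coders are often compared.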

Journal Article
TL;DR: This paper designs a deep learning (DL)-enabled semantic communication system for speech signals, named DeepSC-S, built on an attention mechanism that uses a squeeze-and-excitation (SE) network; it outperforms traditional communication systems on speech signal metrics in both telephone and multimedia transmission settings.
Abstract: Semantic communications could significantly improve transmission efficiency by exploiting semantic information. In this paper, we make an effort to recover the transmitted speech signals in semantic communication systems, minimizing the error at the semantic level rather than at the bit or symbol level. In particular, we design a deep learning (DL)-enabled semantic communication system for speech signals, named DeepSC-S. To improve the recovery accuracy of speech signals, especially for the essential information, DeepSC-S is built on an attention mechanism that utilizes a squeeze-and-excitation (SE) network. The motivation behind the attention mechanism is to identify the essential speech information by assigning it higher weights when training the neural network. Moreover, to adapt DeepSC-S to dynamic channel environments, we find a general model that copes with various channel conditions without retraining. Furthermore, we investigate DeepSC-S in telephone systems as well as multimedia transmission systems to verify the model's adaptability in practice. The simulation results demonstrate that the proposed DeepSC-S outperforms traditional communication systems in both cases in terms of speech signal metrics, such as signal-to-distortion ratio and perceptual evaluation of speech quality. In addition, DeepSC-S is more robust to channel variations, especially in the low signal-to-noise ratio (SNR) regime.

195 citations
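The DeepSC-S abstract above attributes its attention mechanism to a squeeze-and-excitation (SE) network. A minimal pure-Python sketch of a generic SE block (global average pooling per channel, a bottleneck layer with ReLU, then a sigmoid gate that rescales each channel); it follows the general SE design rather than DeepSC-S's exact configuration, which the abstract does not give, and the (channels × time) layout, weight shapes, and omitted biases are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(features: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Generic squeeze-and-excitation over a (channels x time) feature map.

    features: C rows, each holding T values for one channel.
    w1: bottleneck weights with shape (C/r x C); w2: expansion weights (C x C/r).
    Biases are omitted for brevity.
    """
    # Squeeze: average each channel over time into a single descriptor.
    z = [sum(row) / len(row) for row in features]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid,
    # giving one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(wrow, z))) for wrow in w1]
    scale = [_sigmoid(sum(w * h for w, h in zip(wrow, hidden))) for wrow in w2]
    # Scale: reweight every channel by its learned importance.
    return [[s * x for x in row] for s, row in zip(scale, features)]
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each channel is simply halved; trained weights instead push gates toward 1 for informative channels and toward 0 for the rest, which is how such a block can emphasize essential speech information.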