Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.

Papers

Journal Article•DOI•

Speech coding methods, standards, and applications

Jerry D. Gibson¹•Institutions (1)

University of California, Santa Barbara¹

05 Dec 2005-IEEE Circuits and Systems Magazine

TL;DR: This article presents the most prominent speech coding standards, analyzes their properties such as performance, complexity, and coding delay, and includes specific networks and applications for each standard.

Abstract: Voice is the preferred method of human communication. Although there have been times when it seemed that the voice communications problem was solved, such as when the PSTN was our primary network or later when digital cellular networks reached maturity, such is not the case today. This paper addresses the challenges and opportunities starting from the basic issues in speech coder design, developing the important speech coding techniques and standards, discussing current and future applications, outlining techniques for evaluating speech coder performance, and identifying research directions. The most prominent speech coding standards are presented and their properties, such as performance, complexity, and coding delay, analyzed. Particular networks and applications for each standard are included. Further, reflecting upon the issues and developments highlighted in this paper, it becomes evident that there is a diverse set of challenges and opportunities for research and innovation in speech coding and voice communications.

80 citations

Patent•DOI•

System and method for locating program boundaries and commercial boundaries using audio categories

Serhan Dagtas¹, Nevenka Dimitrova¹•Institutions (1)

Philips¹

22 Dec 2000-Journal of the Acoustical Society of America

TL;DR: A system and method, for use in a video signal processor, locate program boundaries and commercial boundaries using audio categories. An audio classifier controller determines the rates of change of the audio categories and compares each rate with a threshold value.

Abstract: For use in a video signal processor, there is disclosed a system and method for locating program boundaries and commercial boundaries using audio categories. The system comprises an audio classifier controller that obtains information concerning the audio categories of the segments of an audio signal. Audio categories include such categories as silence, music, noise and speech. The audio classifier controller determines the rates of change of the audio categories. The audio classifier controller then compares each rate of change of the audio categories with a threshold value to locate the boundaries of the programs and commercials. The audio classifier controller is also capable of classifying at least one feature of an audio category change rate using a multifeature classifier to locate the boundaries of the programs and commercials.
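
The thresholding step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the patented method: the per-segment labels, the sliding-window definition of "rate of change", and the names `category_change_rate` and `candidate_boundaries` are all assumptions made for the example.

```python
from typing import List


def category_change_rate(labels: List[str], window: int) -> List[float]:
    """Fraction of adjacent-segment transitions that change category,
    computed over a sliding window of `window` transitions (assumed definition)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # 1 where the category changes between neighbouring segments, else 0.
    changes = [int(a != b) for a, b in zip(labels, labels[1:])]
    return [
        sum(changes[i:i + window]) / window
        for i in range(len(changes) - window + 1)
    ]


def candidate_boundaries(labels: List[str], window: int = 4,
                         threshold: float = 0.5) -> List[int]:
    """Window start indices whose change rate exceeds the threshold."""
    rates = category_change_rate(labels, window)
    return [i for i, rate in enumerate(rates) if rate > threshold]


if __name__ == "__main__":
    # Hypothetical label sequence: steady speech, a burst of rapid
    # category changes, then steady speech again.
    labels = (["speech"] * 6
              + ["music", "silence", "speech", "music", "silence"]
              + ["speech"] * 5)
    print(candidate_boundaries(labels))
```

Windows with a high change rate are only candidates here; the abstract also mentions a multifeature classifier, which this sketch does not attempt to model.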

79 citations

Patent•

Digital speech coding circuit

John P. Bertrand

24 Dec 1985

TL;DR: A digital speech coding circuit uses linear predictive coding, vector quantization and difference, Huffman coding, and excitation estimation to produce digital representations of human speech at bit rates low enough for transmission over channels such as telephone lines, while still allowing the receiver portion of the circuit to synthesize analog speech of high intelligibility and quality.

Abstract: A digital speech coding circuit makes use of linear predictive coding, vector quantization and difference, Huffman coding, and excitation estimation to produce digital representations of human speech having bit rates low enough to be transmitted over such channels as telephone lines and at the same time being capable of being synthesized in the receiver portion of the circuit to produce analog speech of high intelligibility and quality. The transmitter portion of the circuit comprises a series connection of a low pass filter, analog to digital converter, linear predictive coding module comprising five resonators for establishing five center frequencies and bandwidths of the analog speech, vector quantization module comprising binary representation of the likely combinations of resonances found in human speech, Huffman coding module, a variable bit rate to fixed bit rate converter, and optionally, an encryption module. Another branch of the transmitter circuit extends from the output of the analog to digital converter to the bit rate converter and comprises a series combination of an inverse filter and an excitation estimation module having parallel outputs respectively representative of a voiced/unvoiced signal, the excitation amplitude, and the excitation pulse position. The receiver portion of the circuit comprises a series connection of a fixed bit rate to variable bit rate converter, a bit unmapping module which produces separate outputs representative of the reflection coefficients and excitation of the speech, a synthesis filter which receives these outputs and produces a digital signal representative of the analog speech, a digital to analog converter, and a low pass filter.
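
For readers unfamiliar with the terms, the link between linear predictive coding and the reflection coefficients mentioned above can be illustrated with the textbook autocorrelation method and the Levinson-Durbin recursion. This is a generic sketch, not the resonator-based circuit the patent describes; the function names and the sign convention (A(z) = 1 + a1 z^-1 + ...) are choices made for the example.

```python
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int
                    ) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations by Levinson-Durbin recursion.

    Returns (a, k, err): predictor polynomial a with a[0] == 1,
    reflection coefficients k[0..order-1], and the final
    prediction-error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    ks: List[float] = []
    for i in range(1, order + 1):
        # Correlation between the current residual and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, ks, err


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order process with pole 0.9.
    a, ks, err = levinson_durbin([1.0, 0.9, 0.81], order=2)
    print(a, ks, err)
```

In a coder of this general family, the reflection coefficients (or a quantized form of them) are what gets transmitted, and the receiver rebuilds the synthesis filter from them.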

79 citations

Patent•

Directed audio for speech recognition

Ramy Sadek¹, Edward Dietz Crump¹, Joshua Pollack¹•Institutions (1)

Amazon.com¹

21 Sep 2012

TL;DR: Techniques are described for selecting audio from locations that are most likely to be sources of spoken commands or words, using directional audio signals generated to emphasize sounds from different regions of an environment.

Abstract: Techniques are described for selecting audio from locations that are most likely to be sources of spoken commands or words. Directional audio signals are generated to emphasize sounds from different regions of an environment. The directional audio signals are processed by an automated speech recognizer to generate recognition confidence values corresponding to each of the different regions, and the region resulting in the highest recognition confidence value is selected as the region most likely to contain a user who is speaking commands.
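
The selection step in this abstract amounts to running a recognizer on each directional signal and keeping the region with the highest confidence. A minimal sketch follows, assuming a hypothetical `recognize` callable that returns a transcript and a confidence in [0, 1]; no real speech recognition API is implied.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical recognizer interface: signal -> (transcript, confidence).
Recognizer = Callable[[Sequence[float]], Tuple[str, float]]


def select_region(directional_signals: Dict[str, Sequence[float]],
                  recognize: Recognizer) -> Tuple[str, str, float]:
    """Return (region, transcript, confidence) for the directional signal
    that the recognizer scores with the highest confidence."""
    if not directional_signals:
        raise ValueError("no directional signals supplied")
    best_region, best_text, best_conf = "", "", float("-inf")
    for region, signal in directional_signals.items():
        text, conf = recognize(signal)
        if conf > best_conf:
            best_region, best_text, best_conf = region, text, conf
    return best_region, best_text, best_conf


if __name__ == "__main__":
    # Stand-in recognizer for demonstration only: confidence is the mean
    # absolute amplitude, clipped to 1.0.
    def toy_recognizer(signal: Sequence[float]) -> Tuple[str, float]:
        level = sum(abs(s) for s in signal) / len(signal)
        return "turn on the lights", min(1.0, level)

    beams = {"left": [0.1, 0.1], "right": [0.8, 0.6]}
    print(select_region(beams, toy_recognizer))
```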

79 citations

Proceedings Article•DOI•

Content-based classification and retrieval of audio

Tong Zhang¹, C.-C. Jay Kuo¹•Institutions (1)

University of Southern California¹

02 Oct 1998

TL;DR: An on-line audio classification and segmentation system is presented, where audio recordings are classified and segmented into speech, music, several types of environmental sounds and silence based on audio content analysis.

Abstract: An on-line audio classification and segmentation system is presented in this research, where audio recordings are classified and segmented into speech, music, several types of environmental sounds and silence based on audio content analysis. This is the first step of our continuing work towards a general content-based audio classification and retrieval system. The extracted audio features include temporal curves of the energy function, the average zero-crossing rate, the fundamental frequency of audio signals, as well as statistical and morphological features of these curves. The classification result is achieved through a threshold-based heuristic procedure. The audio database that we have built, details of feature extraction, classification and segmentation procedures, and experimental results are described. It is shown that, with the proposed new system, audio recordings can be automatically segmented and classified into basic types in real time with an accuracy of over 90 percent. Outlines of further classification of audio into finer types and a query-by-example audio retrieval system on top of the coarse classification are also introduced.
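
Two of the features named in this abstract, short-time energy and zero-crossing rate, are simple to compute per frame. The sketch below shows them with a toy threshold rule; the threshold values, the three output labels, and the function names are assumptions for illustration and are not the paper's actual heuristic or class set.

```python
import math
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: List[float], energy_floor: float = 1e-4,
                   zcr_split: float = 0.25) -> str:
    """Toy threshold rule (assumed values): low energy means silence,
    otherwise the zero-crossing rate separates noise-like frames from
    tonal or voiced ones."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    if zero_crossing_rate(frame) > zcr_split:
        return "noise-like"
    return "tonal-or-voiced"


if __name__ == "__main__":
    sample_rate = 8000
    # 20 ms of a 100 Hz sine, standing in for a low-pitched voiced frame.
    tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(160)]
    print(classify_frame(tone), classify_frame([0.0] * 160))
```

A real system of the kind described would track these features as curves over time and combine them with pitch and statistical features rather than deciding frame by frame.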

79 citations

Network Information

Metrics

Papers: 14,368

Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
