Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.


Papers
Journal Article
TL;DR: This article presents the most prominent speech coding standards, analyzes their properties (performance, complexity, and coding delay), and identifies the particular networks and applications for each standard.
Abstract: Voice is the preferred method of human communication. Although there have been times when it seemed that the voice communications problem was solved, such as when the PSTN was our primary network or later when digital cellular networks reached maturity, such is not the case today. This paper addresses the challenges and opportunities starting from the basic issues in speech coder design, developing the important speech coding techniques and standards, discussing current and future applications, outlining techniques for evaluating speech coder performance, and identifying research directions. The most prominent speech coding standards are presented and their properties, such as performance, complexity, and coding delay, analyzed. Particular networks and applications for each standard are included. Further, reflecting upon the issues and developments highlighted in this paper, it becomes evident that there is a diverse set of challenges and opportunities for research and innovation in speech coding and voice communications.

80 citations

Patent
TL;DR: A system and method for locating program boundaries and commercial boundaries using audio categories is described, for use in a video signal processor. An audio classifier controller determines the rates of change of the audio categories and compares each rate with a threshold value to locate the boundaries.
Abstract: For use in a video signal processor, there is disclosed a system and method for locating program boundaries and commercial boundaries using audio categories. The system comprises an audio classifier controller that obtains information concerning the audio categories of the segments of an audio signal. Audio categories include such categories as silence, music, noise and speech. The audio classifier controller determines the rates of change of the audio categories. The audio classifier controller then compares each rate of change of the audio categories with a threshold value to locate the boundaries of the programs and commercials. The audio classifier controller is also capable of classifying at least one feature of an audio category change rate using a multifeature classifier to locate the boundaries of the programs and commercials.
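The thresholding step in this abstract can be illustrated with a minimal sketch. It assumes each audio segment has already been given a category label, and it measures the rate of change as the fraction of adjacent segments whose label differs inside a sliding window. The function names, the window length, and the threshold value are hypothetical choices for illustration and are not taken from the patent.

```python
from typing import List


def category_change_rate(labels: List[str], window: int) -> List[float]:
    """Fraction of adjacent-segment pairs per sliding window whose category differs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # 1 where the category changes between consecutive segments, else 0.
    changes = [1 if a != b else 0 for a, b in zip(labels, labels[1:])]
    return [
        sum(changes[start:start + window]) / window
        for start in range(len(changes) - window + 1)
    ]


def candidate_boundaries(labels: List[str], window: int, threshold: float) -> List[int]:
    """Window start indices whose change rate exceeds the (assumed) threshold."""
    rates = category_change_rate(labels, window)
    return [i for i, rate in enumerate(rates) if rate > threshold]


if __name__ == "__main__":
    # Hypothetical label stream: steady speech, a burst of changes, steady music.
    stream = ["speech"] * 6 + ["silence", "music", "silence", "speech"] + ["music"] * 6
    print(candidate_boundaries(stream, window=4, threshold=0.7))
```

A run of rapid category changes produces a cluster of flagged windows, which a later stage could merge into a single boundary estimate.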

79 citations

Patent
24 Dec 1985
TL;DR: This patent describes a digital speech coding circuit that uses linear predictive coding, vector quantization and difference, Huffman coding, and excitation estimation to produce digital representations of human speech at bit rates low enough for channels such as telephone lines, while still allowing the receiver to synthesize analog speech of high intelligibility and quality.
Abstract: A digital speech coding circuit makes use of linear predictive coding, vector quantization and difference, Huffman coding, and excitation estimation to produce digital representations of human speech having bit rates low enough to be transmitted over such channels as telephone lines and at the same time being capable of being synthesized in the receiver portion of the circuit to produce analog speech of high intelligibility and quality. The transmitter portion of the circuit comprises a series connection of a low pass filter, analog to digital converter, linear predictive coding module comprising five resonators for establishing five center frequencies and bandwidths of the analog speech, vector quantization module comprising binary representation of the likely combinations of resonances found in human speech, Huffman coding module, a variable bit rate to fixed bit rate converter, and optionally, an encryption module. Another branch of the transmitter circuit extends from the output of the analog to digital converter to the bit rate converter and comprises a series combination of an inverse filter and an excitation estimation module having parallel outputs respectively representative of a voiced/unvoiced signal, the excitation amplitude, and the excitation pulse position. The receiver portion of the circuit comprises a series connection of a fixed bit rate to variable bit rate converter, a bit unmapping module which produces separate outputs representative of the reflection coefficients and excitation of the speech, a synthesis filter which receives these outputs and produces a digital signal representative of the analog speech, a digital to analog converter, and a low pass filter.
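As general background on the linear predictive coding and reflection coefficients mentioned in this abstract, the following is a minimal sketch of the textbook Levinson-Durbin recursion, which derives predictor and reflection coefficients from a frame's autocorrelation. It is not the patented circuit, which the abstract describes in terms of five resonators; the function names are hypothetical, and the sign convention for reflection coefficients varies between references.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], List[float], float]:
    """Return (a, k, err) for the error filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    k holds one reflection coefficient per stage (sign convention assumed here),
    and err is the final prediction-error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has no energy; prediction is undefined")
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        reflection.append(k)
        prev = a[:]
        # Update the lower-order coefficients, then append the new one.
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, reflection, err


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order process with coefficient 0.5.
    coeffs, ks, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, ks, residual)
```

For the first-order example, the second-stage coefficient comes out as zero, reflecting that one predictor tap already captures the signal's correlation.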

79 citations

Patent
21 Sep 2012
TL;DR: This patent describes techniques for selecting audio from locations that are most likely to be sources of spoken commands or words, using directional audio signals generated to emphasize sounds from different regions of an environment.
Abstract: Techniques are described for selecting audio from locations that are most likely to be sources of spoken commands or words. Directional audio signals are generated to emphasize sounds from different regions of an environment. The directional audio signals are processed by an automated speech recognizer to generate recognition confidence values corresponding to each of the different regions, and the region resulting in the highest recognition confidence value is selected as the region most likely to contain a user who is speaking commands.
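The selection rule in this abstract amounts to picking the region with the highest recognition confidence. A minimal sketch follows, assuming the recognizer has already produced one confidence value per region; the region names and the `select_region` helper are hypothetical and do not come from the patent.

```python
from typing import Dict


def select_region(confidences: Dict[str, float]) -> str:
    """Return the region whose directional signal scored the highest confidence."""
    if not confidences:
        raise ValueError("no regions to choose from")
    return max(confidences, key=confidences.get)


if __name__ == "__main__":
    # Hypothetical confidence values for three directional signals.
    scores = {"left": 0.42, "center": 0.91, "right": 0.10}
    print(select_region(scores))
```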

79 citations

Proceedings Article
02 Oct 1998
TL;DR: An on-line audio classification and segmentation system is presented, where audio recordings are classified and segmented into speech, music, several types of environmental sounds and silence based on audio content analysis.
Abstract: An on-line audio classification and segmentation system is presented in this research, where audio recordings are classified and segmented into speech, music, several types of environmental sounds and silence based on audio content analysis. This is the first step of our continuing work towards a general content-based audio classification and retrieval system. The extracted audio features include temporal curves of the energy function, the average zero-crossing rate, the fundamental frequency of audio signals, as well as statistical and morphological features of these curves. The classification result is achieved through a threshold-based heuristic procedure. The audio database that we have built, details of feature extraction, classification and segmentation procedures, and experimental results are described. It is shown that, with the proposed new system, audio recordings can be automatically segmented and classified into basic types in real time with an accuracy of over 90 percent. Outlines of further classification of audio into finer types and a query-by-example audio retrieval system on top of the coarse classification are also introduced.
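Two of the features named in this abstract, short-time energy and zero-crossing rate, are simple to compute per frame. The sketch below pairs them with a toy threshold rule to show the general shape of a threshold-based heuristic. The thresholds, the output labels, and the function names are assumptions made for illustration; the paper's actual procedure, which also uses fundamental frequency and curve statistics, is not reproduced here.

```python
import math
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: List[float], energy_floor: float = 1e-4,
                   zcr_split: float = 0.3) -> str:
    """Toy rule: very low energy is silence; otherwise split on zero-crossing rate."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    return "high-zcr" if zero_crossing_rate(frame) > zcr_split else "low-zcr"


if __name__ == "__main__":
    quiet = [0.0] * 200
    tone = [math.sin(2 * math.pi * n / 200) for n in range(200)]  # one slow cycle
    buzz = [1.0, -1.0] * 100  # sign flips on every sample
    print(classify_frame(quiet), classify_frame(tone), classify_frame(buzz))
```

A real system would track these values over time as curves and apply its rules to the curves' statistics rather than to isolated frames.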

79 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108