Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.


Papers
Journal Article
Hisashi Kobayashi, L. R. Bahl
TL;DR: Predictive coding techniques for efficient transmission or storage of two-level (black and white) digital images and techniques for encoding the prediction error pattern to achieve compression of data are presented.
Abstract: This paper deals with predictive coding techniques for efficient transmission or storage of two-level (black and white) digital images. Part I discusses algorithms for prediction. A predictor transforms the two-dimensional dependence in the original data into a form which can be handled by coding techniques for one-dimensional data. The implementation and performance of a fixed predictor, an adaptive predictor with finite memory, and an adaptive linear predictor are discussed. Results of experiments performed on various types of scanned images are also presented. Part II deals with techniques for encoding the prediction error pattern to achieve compression of data.

61 citations
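
The predict-then-encode-the-error idea in the Kobayashi and Bahl abstract can be illustrated with a minimal sketch. It assumes a fixed predictor that takes a majority vote over the left, upper, and upper-left neighbours, and a plain run-length code for the error pattern. Both choices are illustrative assumptions, not the predictors or codes evaluated in the paper.

```python
def predict(img, r, c):
    """Fixed predictor (assumed): majority vote over three causal neighbours.

    Pixels outside the image are treated as 0 (white).
    """
    left = img[r][c - 1] if c > 0 else 0
    up = img[r - 1][c] if r > 0 else 0
    up_left = img[r - 1][c - 1] if r > 0 and c > 0 else 0
    return 1 if left + up + up_left >= 2 else 0


def encode(img):
    """Return the prediction error pattern: 1 where the predictor was wrong."""
    return [[img[r][c] ^ predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]


def decode(err):
    """Rebuild the image in raster order from the error pattern."""
    rows, cols = len(err), len(err[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # All neighbours used by predict() are already reconstructed.
            out[r][c] = err[r][c] ^ predict(out, r, c)
    return out


def run_lengths(bits):
    """Lengths of alternating runs, starting with a (possibly empty) run of 0s."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs


image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
error = encode(image)
assert decode(error) == image
```

Because the predictor only looks at pixels that precede the current one in raster order, the decoder can recompute every prediction from what it has already reconstructed, so only the error pattern needs to be stored or transmitted.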

Patent
TL;DR: An excitation vector of the previous frame stored in an adaptive codebook is cut out with a selected pitch period and is repeated until one frame is formed, by which a periodic component codevector is generated.
Abstract: An excitation vector of the previous frame stored in an adaptive codebook is cut out with a selected pitch period. The excitation vector thus cut out is repeated until one frame is formed, by which a periodic component codevector is generated. An optimum pitch period is searched for so that distortion of a reconstructed speech obtained by exciting a linear predictive synthesis filter with the periodic component codevector is minimized. Thereafter, a random codevector selected from a random codebook is cut out with the optimum pitch period and is repeated until one frame is formed, by which a repetitious random codevector is generated. The random codebook is searched for a random codevector which minimizes the distortion of the reconstructed speech which is provided by exciting the synthesis filter with the repetitious random codevector.

61 citations
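
The adaptive-codebook step in the patent abstract above (cut the previous excitation at a candidate pitch period, repeat it to fill a frame, and keep the period that minimizes distortion after synthesis filtering) might be sketched as follows. The function names, the zero initial filter state, the plain squared-error distortion, and the omission of gain optimization are simplifying assumptions made for illustration; they are not taken from the patent.

```python
def periodic_codevector(past_excitation, pitch, frame_len):
    """Cut the last `pitch` samples and repeat them until one frame is formed."""
    segment = past_excitation[-pitch:]
    return [segment[i % pitch] for i in range(frame_len)]


def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = x[n] - sum_k lpc[k] * y[n-1-k].

    Assumes zero initial filter state (a simplification).
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_pitch(target, past_excitation, lpc, pitch_candidates):
    """Return (pitch, distortion) for the candidate with the least squared error."""
    best = None
    for pitch in pitch_candidates:
        codevector = periodic_codevector(past_excitation, pitch, len(target))
        synth = synthesize(codevector, lpc)
        distortion = sum((t - s) ** 2 for t, s in zip(target, synth))
        if best is None or distortion < best[1]:
            best = (pitch, distortion)
    return best


# Toy data: a past excitation with period 3 and a one-tap synthesis filter.
past = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
lpc = [-0.5]
target = synthesize(periodic_codevector(past, 3, 8), lpc)
best_pitch, best_distortion = search_pitch(target, past, lpc, range(2, 6))
```

The same repeat-to-fill-a-frame operation would then be applied to each random codevector at the chosen pitch period, as the abstract describes.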

Dissertation
01 Jan 2006
TL;DR: This thesis investigated the role of F0 and speech rate (word duration) in age perception and found that while these cues may be less important than spectral ones (e.g. formant frequencies), they still correlate with chronological as well as perceived age.
Abstract: Speaker age is an important paralinguistic feature in speech which has to be considered in the study of phonetic variation. Knowledge about this feature may be used to improve speech technology applications, e.g. automatic speech recognition and speech synthesis. The present thesis describes six studies of several phonetic aspects of age-related variation in speech. As the speech production mechanism changes from young adulthood to old age, speech is affected in numerous ways. Human perception of speaker age is based on cues such as pitch, speech rate and voice quality, and is fairly accurate. However, it is still unclear which cues are the most important ones. The first study included in this thesis investigated the role of F0 and speech rate (word duration) in age perception. It was found that while these cues may be less important than spectral ones (e.g. formant frequencies), they still correlate with chronological as well as perceived age. In the second study, two stimulus types of various lengths were compared. Results indicated that while longer stimulus duration (regardless of speech type) seems to improve the age estimation of females, spontaneous speech (regardless of duration) appears to contain more important cues for perception of male speaker age. In the next two studies, several automatic estimators of speaker age were built, none of which reached the same accuracy as humans. Important features in machine perception of age were also investigated. It was found that prosodic features seem to be more important in the estimation of female age, while spectral features (e.g. F2) appear to be more important for male age. Although several acoustic correlates of speaker age are known, their relative importance has not yet been established. The next study analysed 161 features, automatically extracted from segments in six words produced by 527 speakers. Normalised means were used to ensure that the features could be compared directly.
The most important acoustic correlates of speaker age were identified to be speech rate (segment duration) and intensity range. However, F0 and some spectral measures (e.g. F1 and F2) may also, if used in combination with other features, be important correlates of age. Synthetic speech may sound more natural if speaker age is included as a parameter. The final study developed a research tool which used data-driven formant synthesis and age-weighted linear interpolation to simulate an age between the ages of any two of four female differently aged reference speakers. Evaluation of the tool showed that speaker age may in fact be simulated using formant synthesis. The tool will be used in further studies of analysis by synthesis of speaker age.

61 citations
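
The age-weighted linear interpolation mentioned in the thesis abstract above can be pictured with a small sketch. It assumes each reference speaker is described by a dictionary of numeric synthesis parameters and that the weight is the target age's relative position between the two reference ages. The parameter names and values below are hypothetical placeholders, not data from the thesis.

```python
def interpolate_age(params_a, age_a, params_b, age_b, target_age):
    """Linearly interpolate synthesis parameters between two reference speakers.

    The weight w is 0 at age_a and 1 at age_b (assumed weighting scheme).
    """
    if not min(age_a, age_b) <= target_age <= max(age_a, age_b):
        raise ValueError("target_age must lie between the two reference ages")
    w = (target_age - age_a) / (age_b - age_a)
    return {name: (1 - w) * params_a[name] + w * params_b[name]
            for name in params_a}


# Hypothetical formant values in Hz for two reference speakers.
younger = {"F1": 500.0, "F2": 1500.0}
older = {"F1": 460.0, "F2": 1420.0}
simulated = interpolate_age(younger, 20, older, 60, target_age=30)
```

With these placeholder values the target age of 30 sits a quarter of the way from the younger to the older speaker, so each parameter moves a quarter of the distance between the two reference values.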

Proceedings Article
22 Apr 2001
TL;DR: This work develops a recovery method, called DSPWR (Double Sided Pitch Waveform Replication), which is able to tolerate a much higher packet loss rate and develops an adaptive mechanism that can select the recovery method with the minimal complexity in accordance with different packet loss rates encountered.
Abstract: There are a number of packet-loss recovery techniques proposed for streaming audio applications. However, there are few works that are able to exploit the tradeoff between the recovery quality and the computational complexity. We develop a recovery method, called DSPWR (Double Sided Pitch Waveform Replication) which is able to tolerate a much higher packet loss rate. In essence, DSPWR is composed of several procedures devised to improve the quality of the reconstructed speech. It is noted that a more sophisticated recovery scheme that can tolerate a higher degree of packet loss in general requires a larger computational cost. In view of this, we evaluate the quality of the reconstructed speech under different packet loss rates for various receiver-based recovery methods, and compare the computational complexity among these methods. Under the acceptable speech quality whose MOS (Mean Opinion Score) is above 3.5, we develop an adaptive mechanism that can select the recovery method with the minimal complexity in accordance with different packet loss rates encountered. To conduct real experiments in the networks, we implement these recovery methods and evaluate the performance of DSPWR devised and the adaptive recovery techniques empirically. As validated by our experimental results, the adaptive mechanism is able to strike a compromise between the computational overhead and the quality of the speech desired.

61 citations

Patent
TL;DR: A transcoder (TRCU1, TRCU2) is proposed that prevents tandem coding of speech in a mobile-to-mobile (MS1, MS2) call within a mobile communication system which employs a speech coding method for reducing the transmission rate on the radio path.
Abstract: The invention relates to a transcoder (TRCU1, TRCU2) having means for preventing tandem coding of speech in a mobile to mobile (MS1, MS2) call within a mobile communication system which employs a speech coding method for reducing transmission rate on the radio path. The transcoder (TRCU1, TRCU2) comprises a speech coder (52, 73) which encodes the speech signal into speech parameters for transmission to a mobile station, and decodes the speech parameters received from the mobile station into a speech signal according to said speech coding method, as well as a PCM coder (54, 72) for transmitting an uplink speech signal to and for receiving a downlink speech signal from a PCM interface in the form of PCM speech samples. In addition to the normal operation, the transcoder transmits and receives speech parameters through a PCM interface in a subchannel formed by least significant bits of the PCM speech samples. Thus, it is possible to prevent tandem coding but at the same time maintain the standard PCM interface and the signallings and services associated thereto.

61 citations
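
The subchannel described in the transcoder abstract above, in which speech parameters ride in the least significant bits of PCM speech samples, can be sketched in a few lines. The sketch assumes 8-bit unsigned sample values and one small symbol per sample; the function names and the `n_lsb` parameter are hypothetical, and framing, synchronization, and signalling are left out entirely.

```python
def embed(samples, symbols, n_lsb=1):
    """Overwrite the n_lsb least significant bits of each sample with a symbol."""
    if len(symbols) > len(samples):
        raise ValueError("not enough PCM samples to carry the payload")
    mask = (1 << n_lsb) - 1
    out = list(samples)
    for i, symbol in enumerate(symbols):
        out[i] = (out[i] & ~mask) | (symbol & mask)
    return out


def extract(samples, count, n_lsb=1):
    """Read back `count` symbols from the least significant bits."""
    mask = (1 << n_lsb) - 1
    return [s & mask for s in samples[:count]]


pcm = [0b10101010, 0b11111111, 0b00000001]
payload = [1, 0, 1]
carried = embed(pcm, payload)
assert extract(carried, len(payload)) == payload
```

Only the lowest bits of each sample change, so equipment that treats the stream as ordinary PCM still sees a valid signal, while a peer transcoder that knows about the subchannel can recover the parameters and skip a second encode and decode pass.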


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108