Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime of the topic, 14,245 publications have been published, receiving 271,964 citations.



Papers


Journal Article•DOI•

Image data compression by predictive coding I: prediction algorithms


Hisashi Kobayashi¹, L. R. Bahl¹•Institutions (1)

IBM¹

01 Mar 1974-IBM Journal of Research and Development

TL;DR: Predictive coding techniques for efficient transmission or storage of two-level (black and white) digital images and techniques for encoding the prediction error pattern to achieve compression of data are presented.


Abstract: This paper deals with predictive coding techniques for efficient transmission or storage of two-level (black and white) digital images. Part I discusses algorithms for prediction. A predictor transforms the two-dimensional dependence in the original data into a form which can be handled by coding techniques for one-dimensional data. The implementation and performance of a fixed predictor, an adaptive predictor with finite memory, and an adaptive linear predictor are discussed. Results of experiments performed on various types of scanned images are also presented. Part II deals with techniques for encoding the prediction error pattern to achieve compression of data.
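
The prediction step described in this abstract can be illustrated with a minimal Python sketch. The majority-vote predictor over three already-scanned neighbours and the run-length step are assumptions chosen for illustration; they are not the predictors or codes evaluated in the paper.

```python
from itertools import groupby


def predict(img, r, c):
    """Guess pixel (r, c) from its west, north and north-west neighbours.

    Hypothetical fixed predictor: majority vote, with pixels outside the
    image treated as 0 (white).
    """
    w = img[r][c - 1] if c > 0 else 0
    n = img[r - 1][c] if r > 0 else 0
    nw = img[r - 1][c - 1] if r > 0 and c > 0 else 0
    return 1 if w + n + nw >= 2 else 0


def prediction_error(img):
    """Error pattern: 1 where the predictor was wrong, 0 where it was right."""
    return [[img[r][c] ^ predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]


def reconstruct(err):
    """Invert prediction_error by rebuilding the image in raster order."""
    rows, cols = len(err), len(err[0])
    img = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Neighbours used by predict() are already reconstructed here.
            img[r][c] = err[r][c] ^ predict(img, r, c)
    return img


def run_lengths(err):
    """Flatten the 2-D error pattern and describe it as (bit, run length) pairs."""
    flat = [bit for row in err for bit in row]
    return [(bit, len(list(run))) for bit, run in groupby(flat)]
```

Because the decoder only needs pixels it has already rebuilt, the two-dimensional dependence is absorbed by the predictor and the remaining error pattern can be handed to a one-dimensional coder such as the run-length step shown here.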


61 citations

Patent•DOI•

Speech coding and decoding methods using adaptive and random code books


Satoshi Miki¹, Takehiro Moriya¹, Kazunori Mano¹, Hitoshi Ohmuro¹, Hirohito Suda¹, +1 more•Institutions (1)

Nippon Telegraph and Telephone¹

20 May 1992-Journal of the Acoustical Society of America

TL;DR: An excitation vector of the previous frame stored in an adaptive codebook is cut out with a selected pitch period and is repeated until one frame is formed, by which a periodic component codevector is generated.


Abstract: An excitation vector of the previous frame stored in an adaptive codebook is cut out with a selected pitch period. The excitation vector thus cut out is repeated until one frame is formed, by which a periodic component codevector is generated. An optimum pitch period is searched for so that distortion of a reconstructed speech obtained by exciting a linear predictive synthesis filter with the periodic component codevector is minimized. Thereafter, a random codevector selected from a random codebook is cut out with the optimum pitch period and is repeated until one frame is formed, by which a repetitious random codevector is generated. The random codebook is searched for a random codevector which minimizes the distortion of the reconstructed speech which is provided by exciting the synthesis filter with the repetitious random codevector.
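
The adaptive-codebook step in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names, the zero-state all-pole filter, and the use of an optimal gain in the distortion measure are assumptions, since the abstract does not state how gain is handled.

```python
def periodic_codevector(past_excitation, pitch, frame_len):
    """Cut the last `pitch` samples of the past excitation and repeat them
    until one frame of `frame_len` samples is formed."""
    segment = past_excitation[-pitch:]
    return [segment[i % pitch] for i in range(frame_len)]


def synthesize(excitation, lpc):
    """Zero-state all-pole synthesis filter 1/A(z),
    with A(z) = 1 + sum_k lpc[k-1] * z^-k (sign convention assumed)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def search_pitch(target, past_excitation, lpc, pitch_range):
    """Return (distortion, pitch, gain) for the pitch period whose
    synthesized periodic codevector is closest to `target`."""
    best = None
    for pitch in pitch_range:
        code = periodic_codevector(past_excitation, pitch, len(target))
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0:
            continue  # an all-zero candidate cannot match a non-zero target
        gain = sum(t * v for t, v in zip(target, y)) / energy
        dist = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or dist < best[0]:
            best = (dist, pitch, gain)
    return best
```

The random-codebook search described next in the abstract would reuse `periodic_codevector` with the optimum pitch, applied to each random codevector instead of the past excitation.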


61 citations

Dissertation•

Perception, Analysis and Synthesis of Speaker Age


Susanne Schötz

01 Jan 2006

TL;DR: The first study in this thesis investigates the role of F0 and speech rate (word duration) in age perception and finds that, while these cues may be less important than spectral ones (e.g. formant frequencies), they still correlate with chronological as well as perceived age.


Abstract: Speaker age is an important paralinguistic feature in speech which has to be considered in the study of phonetic variation. Knowledge about this feature may be used to improve speech technology applications, e.g. automatic speech recognition and speech synthesis. The present thesis describes six studies of several phonetic aspects of age-related variation in speech. As the speech production mechanism changes from young adulthood to old age, speech is affected in numerous ways. Human perception of speaker age is based on cues such as pitch, speech rate and voice quality, and is fairly accurate. However, it is still unclear which cues are the most important ones. The first study included in this thesis investigated the role of F0 and speech rate (word duration) in age perception. It was found that while these cues may be less important than spectral ones (e.g. formant frequencies), they still correlate with chronological as well as perceived age. In the second study, two stimulus types of various lengths were compared. Results indicated that while longer stimulus duration (regardless of speech type) seems to improve the age estimation of females, spontaneous speech (regardless of duration) appears to contain more important cues for perception of male speaker age. In the next two studies, several automatic estimators of speaker age were built, none of which reached the same accuracy as humans. Important features in machine perception of age were also investigated. It was found that prosodic features seem to be more important in the estimation of female age, while spectral features (e.g. F2) appear to be more important for male age. Although several acoustic correlates of speaker age are known, their relative importance has not yet been established. The next study analysed 161 features, automatically extracted from segments in six words produced by 527 speakers. Normalised means were used to ensure that the features could be compared directly. The most important acoustic correlates of speaker age were identified to be speech rate (segment duration) and intensity range. However, F0 and some spectral measures (e.g. F1 and F2) may also, if used in combination with other features, be important correlates of age. Synthetic speech may sound more natural if speaker age is included as a parameter. The final study developed a research tool which used data-driven formant synthesis and age-weighted linear interpolation to simulate an age between the ages of any two of four female differently aged reference speakers. Evaluation of the tool showed that speaker age may in fact be simulated using formant synthesis. The tool will be used in further studies of analysis by synthesis of speaker age.
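
The age-weighted linear interpolation mentioned for the final study could look roughly like the sketch below. The function name, the `(age, parameters)` tuple layout, and the idea of interpolating a dictionary of synthesis parameters are assumptions made for illustration; the abstract does not describe the tool's actual data structures.

```python
def interpolate_age(target_age, speaker_a, speaker_b):
    """Blend the synthesis parameters of two reference speakers.

    Each speaker is a hypothetical (age, parameters) pair, where parameters
    maps a parameter name to a number. The weight grows linearly from 0 at
    speaker_a's age to 1 at speaker_b's age; the two ages must differ.
    """
    age_a, params_a = speaker_a
    age_b, params_b = speaker_b
    w = (target_age - age_a) / (age_b - age_a)
    return {name: (1 - w) * params_a[name] + w * params_b[name]
            for name in params_a}
```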


61 citations

Proceedings Article•DOI•

Adaptive recovery techniques for real-time audio streams


Wen-Tsai Liao¹, Jeng-Chun Chen², Ming-Syan Chen•Institutions (2)

National Taiwan University¹, Philips²

22 Apr 2001

TL;DR: This work develops a recovery method, called DSPWR (Double Sided Pitch Waveform Replication), which is able to tolerate a much higher packet loss rate and develops an adaptive mechanism that can select the recovery method with the minimal complexity in accordance with different packet loss rates encountered.


Abstract: There are a number of packet-loss recovery techniques proposed for streaming audio applications. However, there are few works that are able to exploit the tradeoff between the recovery quality and the computational complexity. We develop a recovery method, called DSPWR (Double Sided Pitch Waveform Replication) which is able to tolerate a much higher packet loss rate. In essence, DSPWR is composed of several procedures devised to improve the quality of the reconstructed speech. It is noted that a more sophisticated recovery scheme that can tolerate a higher degree of packet loss in general requires a larger computational cost. In view of this, we evaluate the quality of the reconstructed speech under different packet loss rates for various receiver-based recovery methods, and compare the computational complexity among these methods. Under the acceptable speech quality whose MOS (Mean Opinion Score) is above 3.5, we develop an adaptive mechanism that can select the recovery method with the minimal complexity in accordance with different packet loss rates encountered. To conduct real experiments in the networks, we implement these recovery methods and evaluate the performance of DSPWR devised and the adaptive recovery techniques empirically. As validated by our experimental results, the adaptive mechanism is able to strike a compromise between the computational overhead and the quality of the speech desired.
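
The adaptive mechanism in this abstract amounts to picking the cheapest recovery method that still meets the quality target at the current loss rate. A minimal sketch, assuming each method has already been measured: the `(name, relative_cost, max_loss_rate)` tuple layout is hypothetical, with `max_loss_rate` standing for the highest packet loss rate at which that method kept MOS above 3.5.

```python
def select_recovery_method(loss_rate, methods):
    """Return the name of the least complex method that is still acceptable.

    methods: iterable of (name, relative_cost, max_loss_rate) tuples
    (hypothetical layout). Returns None when no method is acceptable
    at the given loss rate.
    """
    usable = [m for m in methods if loss_rate <= m[2]]
    if not usable:
        return None
    return min(usable, key=lambda m: m[1])[0]
```

A receiver would call this whenever its measured loss rate changes, so that a costlier scheme such as DSPWR is used only when simpler ones no longer reach the target quality.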


61 citations

Patent•DOI•

Transcoder with prevention of tandem coding of speech


Matti Lehtimäki

11 Apr 1996-Journal of the Acoustical Society of America

TL;DR: A transcoder (TRCU1, TRCU2) is described that prevents tandem coding of speech in a mobile-to-mobile (MS1, MS2) call within a mobile communication system which employs a speech coding method for reducing the transmission rate on the radio path.


Abstract: The invention relates to a transcoder (TRCU1, TRCU2) having means for preventing tandem coding of speech in a mobile to mobile (MS1, MS2) call within a mobile communication system which employs a speech coding method for reducing transmission rate on the radio path. The transcoder (TRCU1, TRCU2) comprises a speech coder (52, 73) which encodes the speech signal into speech parameters for transmission to a mobile station, and decodes the speech parameters received from the mobile station into a speech signal according to said speech coding method, as well as a PCM coder (54, 72) for transmitting an uplink speech signal to and for receiving a downlink speech signal from a PCM interface in the form of PCM speech samples. In addition to the normal operation, the transcoder transmits and receives speech parameters through a PCM interface in a subchannel formed by least significant bits of the PCM speech samples. Thus, it is possible to prevent tandem coding but at the same time maintain the standard PCM interface and the signallings and services associated thereto.
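
The subchannel idea in this abstract can be illustrated with a short Python sketch. It assumes 8-bit PCM samples held as integers and one parameter bit carried in the least significant bit of each sample; the abstract does not give the actual bit allocation or framing, so both function names and the layout are hypothetical.

```python
def embed_parameters(pcm_samples, parameter_bits):
    """Overwrite the least significant bit of each 8-bit PCM sample
    with one speech-parameter bit (assumed allocation: 1 bit per sample)."""
    if len(parameter_bits) > len(pcm_samples):
        raise ValueError("not enough PCM samples to carry the parameter bits")
    out = list(pcm_samples)
    for i, bit in enumerate(parameter_bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return out


def extract_parameters(pcm_samples, bit_count):
    """Read the parameter bits back from the least significant bits."""
    return [sample & 1 for sample in pcm_samples[:bit_count]]
```

Because only the lowest bit changes, equipment that expects ordinary PCM still receives usable samples, while a peer transcoder can recover the coded parameters and skip the second encode and decode pass.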


61 citations


Performance

Metrics

Papers: 14,368

Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
