Topic
Speech coding
About: Speech coding is a research topic. Over its lifetime, 14245 publications have been published within this topic, receiving 271964 citations.
Papers
•
25 Feb 2002
TL;DR: In this paper, a method for time aligning audio signals, wherein one signal has been derived from the other or both have been derived from another signal, comprises deriving reduced-information characterizations of the audio signals, for example by auditory scene analysis.
Abstract: A method for time aligning audio signals, wherein one signal has been derived from the other or both have been derived from another signal, comprises deriving reduced-information characterizations of the audio signals, for example by auditory scene analysis. The time offset of one characterization with respect to the other characterization is calculated, and the temporal relationship of the audio signals with respect to each other is modified in response to the time offset such that the audio signals are coincident with each other. These principles may also be applied to a method for time aligning a video signal and an audio signal that will be subjected to differential time offsets.
122 citations
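The offset calculation described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: auditory scene analysis is replaced here by a much simpler reduced-information characterization (energy per block of samples), and the offset is found by a plain cross-correlation search. The function names, the block size, and the lag range are all assumptions made for illustration.

```python
def block_energy(signal, block):
    """Reduce a signal to one energy value per block of `block` samples."""
    return [sum(x * x for x in signal[i:i + block])
            for i in range(0, len(signal) - block + 1, block)]


def best_lag(a, b, max_lag):
    """Lag (in blocks) of characterization b relative to a with the highest correlation."""
    def score(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)


def time_offset(sig_a, sig_b, block=4, max_lag=5):
    """Estimated delay of sig_b relative to sig_a, in samples (block resolution)."""
    lag = best_lag(block_energy(sig_a, block), block_energy(sig_b, block), max_lag)
    return lag * block


def align(sig_b, offset):
    """Shift sig_b earlier by `offset` samples, or later (zero padded) if negative."""
    if offset >= 0:
        return sig_b[offset:]
    return [0.0] * (-offset) + sig_b


# Toy signals: a short burst, and the same burst delayed by 12 samples.
a = [0.0] * 8 + [1.0] * 8 + [0.0] * 24
b = [0.0] * 12 + a[:-12]
offset = time_offset(a, b)   # 12
aligned_b = align(b, offset)
```

Because the search runs on the reduced characterizations rather than on the samples, the estimate is only as fine as the block size; a real system would refine it or use a richer signature.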
•
13 Mar 1998
TL;DR: In this paper, a method for coding or decoding an audio signal combining the advantages of TNS processing and noise substitution was proposed, where a time-discrete audio signal is initially transformed into the frequency domain in order to obtain spectral values of the temporal audio signal.
Abstract: The invention relates to a method for coding or decoding an audio signal combining the advantages of TNS processing and noise substitution. A time-discrete audio signal is initially transformed into the frequency domain in order to obtain spectral values of the temporal audio signal. A prediction of the spectral values in relation to frequency is subsequently made in order to obtain spectral residual values. Areas within the spectral values encompassing spectral values with noise properties are detected. The spectral residual values are noise substituted in the noise areas, whereupon data relating to the noise areas and noise substitution are incorporated into side information pertaining to a coded audio signal.
122 citations
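The sequence of steps in the abstract above (prediction over frequency, detection of noise areas, substitution in the residual, side information) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the first-order predictor, the peak-to-mean test used as a noise detector, the fixed band size, and the side-information layout `(start, length, energy)` are hypothetical and are not taken from the patent or from any codec standard.

```python
import random


def predict_over_frequency(spectrum):
    """First-order linear prediction along the frequency axis.

    Returns the predictor coefficient and the spectral residual values.
    """
    r0 = sum(x * x for x in spectrum)
    r1 = sum(spectrum[k] * spectrum[k - 1] for k in range(1, len(spectrum)))
    a = r1 / r0 if r0 else 0.0
    residual = [spectrum[0]] + [spectrum[k] - a * spectrum[k - 1]
                                for k in range(1, len(spectrum))]
    return a, residual


def is_noise_like(band, max_peak_ratio=3.0):
    """Hypothetical detector: no single value dominates the band's mean magnitude."""
    mags = [abs(v) for v in band]
    mean = sum(mags) / len(mags)
    return mean > 0 and max(mags) <= max_peak_ratio * mean


def encode(spectrum, band_size):
    """Zero the residual in noise-like bands and record their energy as side information."""
    a, residual = predict_over_frequency(spectrum)
    coded, side_info = [], []
    for start in range(0, len(spectrum), band_size):
        band = residual[start:start + band_size]
        if is_noise_like(spectrum[start:start + band_size]):
            side_info.append((start, len(band), sum(v * v for v in band)))
            coded.extend([0.0] * len(band))
        else:
            coded.extend(band)
    return a, coded, side_info


def decode(a, coded, side_info, seed=0):
    """Refill noise bands with random values of the stored energy, then undo the prediction."""
    rng = random.Random(seed)
    residual = list(coded)
    for start, length, energy in side_info:
        noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
        scale = (energy / sum(v * v for v in noise)) ** 0.5
        residual[start:start + length] = [v * scale for v in noise]
    spectrum = [residual[0]]
    for k in range(1, len(residual)):
        spectrum.append(residual[k] + a * spectrum[k - 1])
    return spectrum


# One tonal band (a single strong peak) followed by one flat, noise-like band.
spectrum = [0.1, 8.0, 0.1, 0.1, 1.0, -1.0, 1.0, -1.0]
a, coded, side_info = encode(spectrum, band_size=4)
decoded = decode(a, coded, side_info)
```

In this toy input only the second band is flagged, so its four residual values are replaced by a single energy value in the side information, while the tonal band is carried through unchanged.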
••
TL;DR: In this paper, a series of original sentences for messages is segmented and stored as audio files with search criteria, and the audio files of the segments whose examination shows that the natural speech rhythm is best maintained are combined and output for reproduction.
Abstract: A method of composing messages for speech output and of improving the reproduction quality of speech outputs. A series of original sentences for messages is segmented and stored as audio files with search criteria. The length, position, and transition values for the respective segments can be recorded and stored. A sentence to be reproduced is transmitted in a format corresponding to the format of the search criteria. It is determined whether the sentence to be reproduced can be fully reproduced by one segment or a succession of stored segments. The segments found in each case are examined, using their entries, as to how well the individual segments match with regard to speech rhythm. The audio files of the segments whose examination shows that the natural speech rhythm is best maintained are combined and output for reproduction.
122 citations
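The step in the abstract above that checks whether a sentence can be fully reproduced by one segment or a succession of stored segments can be sketched as a small dynamic program. This is only an illustration under assumptions: segments are represented as tuples of words, the function name is hypothetical, and the preference for the fewest segments (fewer joins) stands in for the rhythm examination, which the abstract does not specify in enough detail to reproduce.

```python
def cover_with_segments(words, stored):
    """Return stored segments that concatenate to `words`, or None if impossible.

    best[i] holds a cover of words[:i] that uses the fewest segments found so far.
    """
    n = len(words)
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = tuple(words[start:end])
            if best[start] is not None and piece in stored:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Hypothetical segment inventory and request.
stored = {
    ("the", "train"),
    ("the",),
    ("train", "to", "berlin"),
    ("to", "berlin"),
    ("is", "delayed"),
}
sentence = "the train to berlin is delayed".split()
cover = cover_with_segments(sentence, stored)
missing = cover_with_segments("the train is late".split(), stored)  # None
```

A fuller version would score each candidate succession with the stored length, position, and transition values instead of simply counting segments.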
•
05 Aug 1982
TL;DR: In this article, the authors proposed a means for simultaneous transmission of data and speech with only a minimal expansion of the bandwidth of the speech signal, where a Fourier transform is performed on the speech signal and a predetermined number of phase components are replaced with data (d(n)) in an appropriate form.
Abstract: The present invention relates to a means for achieving simultaneous transmission of data and speech with only a minimal expansion of the bandwidth of the speech signal. A Fourier transform (14) is performed on the speech signal and a predetermined number of phase components are replaced with data (d(n)) in an appropriate form. The number of phase components replaced with data is determined by approximately classifying the speech (16) as either "silence", no data inserted; "unvoiced" speech, M phase components convey data; and "voiced" speech, J phase components convey data; where J is less than M, and M is not greater than the number of phase components in the message band of the speech signal. An inverse Fourier transform (22) is subsequently performed on the combined data and speech signal. The combined message signal (G(t)) will comprise approximately the same bandwidth as the original speech signal, by virtue of the frequency domain insertion of the data into the speech. At the receiver the signal is inspected and a classifier (38) determines if data is embedded in the received signal. If data is deemed embedded, a Fourier transformation is performed, the data carrying phase components are inspected, and the data signal regenerated in an appropriate form. The phase components used for the conveyance of data are replaced by random phase components, and the inverse Fourier transformation performed. Median filtering is employed to mitigate the effects of end-of-block distortion and yield the recovered speech signal.
121 citations
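The core idea in the abstract above, replacing a class-dependent number of phase components with data while keeping the magnitudes, can be shown with a minimal sketch. The specifics are assumptions for illustration only: one bit per bin encoded as a phase of +π/2 or −π/2, data placed in the lowest non-DC bins, a small magnitude floor so the phase stays measurable, and a naive DFT. The receiver-side classifier, the random-phase replacement, and the median filtering are not covered.

```python
import cmath
import math

FLOOR = 1e-3  # assumed minimum magnitude so that an embedded phase stays measurable


def phase_slots(speech_class, j, m):
    """Number of phase components carrying data for a classified frame (J < M)."""
    return {"silence": 0, "voiced": j, "unvoiced": m}[speech_class]


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def embed(frame, bits):
    """Replace the phase of bins 1..len(bits) with +pi/2 (bit 1) or -pi/2 (bit 0)."""
    n = len(frame)
    assert len(bits) < n / 2, "data bins must stay below the Nyquist bin"
    spec = dft(frame)
    for k, bit in enumerate(bits, start=1):
        mag = max(abs(spec[k]), FLOOR)
        spec[k] = complex(0.0, mag if bit else -mag)
        spec[n - k] = spec[k].conjugate()  # keep the time signal real
    return idft(spec)


def extract(frame, count):
    """Read back `count` bits from the sign of the phase in bins 1..count."""
    spec = dft(frame)
    return [1 if spec[k].imag > 0 else 0 for k in range(1, count + 1)]


# Toy "unvoiced" frame carrying M = 4 bits; a "voiced" frame would carry J = 2.
frame = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(16)]
bits = [1, 0, 1, 1][:phase_slots("unvoiced", j=2, m=4)]
combined = embed(frame, bits)
recovered = extract(combined, len(bits))
```

Because only phases are changed inside the existing band, the combined frame has the same length and occupies the same bins as the original, which mirrors the abstract's point about leaving the bandwidth essentially unchanged.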
••
07 Apr 1986
TL;DR: The results indicate the importance of detailed modeling of the period of glottal closure for accurate analysis, and a method is described for simultaneously estimating the glottal source and vocal tract parameters.
Abstract: Speech analysis for high quality speech synthesis or high accuracy speech recognition requires realistic models not only for the vocal tract but also for the voice source. In the present paper, we investigate models for the glottal volume velocity waveform. Previously proposed models are reviewed and classified according to their level of elaboration in expressing the glottal characteristics. A new model is then proposed which possesses all the important features of previously proposed models. A method is also described for simultaneously estimating the glottal source and vocal tract parameters. Using this method, evaluation of glottal model parameters is carried out on real speech by varying the number of parameters in the proposed model. The results indicate the importance of detailed modeling of the period of glottal closure for accurate analysis.
121 citations