
Showing papers on "Speech coding published in 1995"


Journal ArticleDOI
TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

712 citations


Book
01 Nov 1995
TL;DR: An edited volume introducing speech coding and synthesis, with chapters on coding standards, analysis-by-synthesis and sinusoidal coding, waveform interpolation, low-delay and wideband coding, vector quantization, pitch tracking (RAPT), and text-to-speech synthesis.
Abstract: An introduction to speech coding, W.B. Kleijn and K.K. Paliwal; speech coding standards, R.V. Cox; linear-prediction based analysis-by-synthesis coding, P. Kroon and W.B. Kleijn; sinusoidal coding, R.J. McAulay and T.F. Quatieri; waveform interpolation for coding and synthesis, W.B. Kleijn and J. Haagen; low-delay coding of speech, J.-H. Chen; multimode and variable-rate coding of speech, A. Das et al; wideband speech coding, J.-P. Adoul and R. Lefebvre; vector quantization for speech transmission, P. Hedelin et al; theory for transmission of vector quantization data, P. Hedelin et al; waveform coding and auditory masking, R. Veldhuis and A. Kohlrausch; quantization of LPC parameters, K.K. Paliwal and W.B. Kleijn; evaluation of speech coders, P. Kroon; a robust algorithm for pitch tracking (RAPT), D. Talkin; time-domain and frequency-domain techniques for prosodic modification of speech, E. Moulines and W. Verhelst; nonlinear processing of speech, G. Kubin; an approach to text-to-speech synthesis, R. Sproat and J. Olive; the generation of prosodic structure and intonation in speech synthesis, J. Terken and R. Collier; computation of timing in text-to-speech synthesis, J.P.H. van Santen; objective optimization in algorithms for text-to-speech synthesis, Y. Sagisaka and N. Iwahashi; quality evaluation of synthesized speech, V.J. van Heuven and R. van Bezooijen.

621 citations


Patent
27 Mar 1995
TL;DR: A code frequency component in the encoded audio signal is detected based on an expected code amplitude or on a noise amplitude within a range of audio frequencies including the frequency of the code component as discussed by the authors.
Abstract: Apparatus and methods for including a code (68) having at least one code frequency component in an audio signal (60) are provided. The abilities of various frequency components in the audio signal to mask the code frequency component to human hearing are evaluated (64), and based on these evaluations an amplitude (76) is assigned to the code frequency component. Methods and apparatus for detecting a code in an encoded audio signal are also provided. A code frequency component in the encoded audio signal is detected based on an expected code amplitude or on a noise amplitude within a range of audio frequencies including the frequency of the code component.
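
A toy sketch of the embed/detect pair may help make the mechanism concrete; the band width, mask fraction, and detection threshold below are illustrative assumptions, not values from the patent, which evaluates masking far more carefully:

```python
import numpy as np

def embed_code_tone(audio, fs, code_freq, mask_fraction=0.01):
    # Crude proxy for the masking evaluation: measure spectral energy in
    # a band around code_freq and hide the tone at a small fraction of it.
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), 1.0 / fs)
    band = (freqs > code_freq - 200) & (freqs < code_freq + 200)
    band_rms = np.sqrt(np.mean(np.abs(spectrum[band]) ** 2))
    amplitude = mask_fraction * band_rms * 2.0 / len(audio)  # time-domain scale
    t = np.arange(len(audio)) / fs
    return audio + amplitude * np.sin(2.0 * np.pi * code_freq * t)

def detect_code_tone(audio, fs, code_freq, threshold=3.0):
    # Compare the code bin's magnitude against the noise amplitude of the
    # surrounding bins, echoing the detection claim.
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), 1.0 / fs)
    k = int(np.argmin(np.abs(freqs - code_freq)))
    noise = np.r_[spectrum[max(k - 20, 0):k - 2], spectrum[k + 3:k + 21]]
    return spectrum[k] > threshold * np.median(noise)
```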

554 citations


Journal ArticleDOI
TL;DR: A modification of the Viterbi decoding algorithm (VA) for binary trellises which uses a priori or a posteriori information about the source bit probability for better decoding in addition to soft inputs and channel state information is proposed.
Abstract: Source and channel coding have been treated separately in most cases. It can be observed that most source coding algorithms for voice, audio and images still have correlation in certain bits. Transmission errors in these bits usually account for the significant errors in the reconstructed source signal. This paper proposes a modification of the Viterbi decoding algorithm (VA) for binary trellises which uses a priori or a posteriori information about the source bit probability for better decoding, in addition to soft inputs and channel state information. Analytical upper bounds for the BER of convolutional codes under this modified VA (APRI-VA) are given. The algorithm is combined with the soft-output Viterbi algorithm (SOVA) and an estimator for the residual correlation of the source bits to achieve source-controlled channel decoding for framed source bits. The description is simplified by an algebra for the log-likelihood ratio L(u)=log(P(u=+1)/P(u=-1)), which allows a clear definition of the "soft" values of source, channel, and decoded bits as well as a simplified description of the traceback version of the SOVA. Applications are given for PCM transmission and the full-rate GSM speech codec. For a PCM-coded, oversampled, bandlimited Gaussian source transmitted over Gaussian and Rayleigh channels with convolutional codes, the decoding errors are reduced by a factor of 4 to 5 when the APRI-SOVA is used instead of the VA. A simple dynamic Markov correlation estimator is used. With these receiver-only modifications, the channel SNR in a bad mobile environment can be lowered by 2 to 4 dB while maintaining the same voice quality. Further applications are briefly discussed.
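
The log-likelihood algebra is compact enough to sketch. A minimal illustration assuming BPSK over an AWGN channel (function names are mine; in the full APRI-VA the a priori term attaches to the hypothesised information bit of each trellis branch):

```python
import math

def llr(p_plus):
    # Soft value L(u) = log(P(u=+1) / P(u=-1)).
    return math.log(p_plus / (1.0 - p_plus))

def branch_metric_increment(y, u, sigma2, L_apriori):
    # For BPSK over AWGN the channel LLR is 2*y/sigma^2; the APRI-VA adds
    # the a priori source-bit LLR before correlating with the hypothesised
    # bit u in {-1, +1}.
    L_channel = 2.0 * y / sigma2
    return 0.5 * u * (L_channel + L_apriori)
```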

476 citations


Book
01 Feb 1995
TL;DR: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems, including an in-depth examination of the important topic of code excited linear prediction (CELP).
Abstract: From the Publisher: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems. Discusses the variety of speech coders utilized in such new systems as the MBE-based INMARSAT-M. Includes an in-depth examination of the important topic of code excited linear prediction (CELP).

453 citations


Journal ArticleDOI
TL;DR: This paper presents the first complete description of the original postfiltering algorithm and the underlying ideas that motivated its development; the postfilter achieves noticeable noise reduction while introducing only minimal distortion in speech.
Abstract: An adaptive postfiltering algorithm for enhancing the perceptual quality of coded speech is presented. The postfilter consists of a long-term postfilter section in cascade with a short-term postfilter section and includes spectral tilt compensation and automatic gain control. The long-term section emphasizes pitch harmonics and attenuates the spectral valleys between pitch harmonics. The short-term section, on the other hand, emphasizes speech formants and attenuates the spectral valleys between formants. Both filter sections have poles and zeros. Unlike earlier postfilters that often introduced a substantial amount of muffling to the output speech, our postfilter significantly reduces this effect by minimizing the spectral tilt in its frequency response. As a result, this postfilter achieves noticeable noise reduction while introducing only minimal distortion in speech. The complexity of the postfilter is quite low. Variations of this postfilter are now being used in several national and international speech coding standards. This paper presents for the first time a complete description of our original postfiltering algorithm and the underlying ideas that motivated its development.
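
A rough sketch of the short-term section with tilt compensation and gain control is below; the long-term (pitch) section is omitted, and alpha, beta, mu, and the first-coefficient tilt estimate are commonly used choices rather than values taken from this paper:

```python
import numpy as np
from scipy.signal import lfilter

def short_term_postfilter(speech, lpc, alpha=0.8, beta=0.5, mu=0.5):
    # lpc holds a_1..a_p with the convention A(z) = 1 - sum_i a_i z^-i.
    lpc = np.asarray(lpc, dtype=float)
    p = len(lpc)
    num = np.r_[1.0, -lpc * beta ** np.arange(1, p + 1)]   # A(z/beta)
    den = np.r_[1.0, -lpc * alpha ** np.arange(1, p + 1)]  # A(z/alpha)
    y = lfilter(num, den, speech)                          # pole-zero section
    k1 = lpc[0]                      # crude spectral-tilt estimate (assumed)
    y = lfilter([1.0, -mu * k1], [1.0], y)                 # tilt compensation
    # Simple automatic gain control: restore the input RMS level.
    gain = np.sqrt(np.mean(np.square(speech)) / (np.mean(np.square(y)) + 1e-12))
    return gain * y
```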

278 citations


PatentDOI
Peter Kroon1
TL;DR: In this paper, a speech coding system employing an adaptive codebook model of periodicity is augmented with a pitch-predictive filter (PPF), which has a delay equal to the integer component of the pitch-period and a gain which is adaptive based on a measure of the periodicity of the speech signal.
Abstract: A speech coding system employing an adaptive codebook model of periodicity is augmented with a pitch-predictive filter (PPF). This PPF has a delay equal to the integer component of the pitch-period and a gain which is adaptive based on a measure of periodicity of the speech signal. In accordance with an embodiment of the present invention, speech processing systems which include a first portion comprising an adaptive codebook and corresponding adaptive codebook amplifier and a second portion comprising a fixed codebook coupled to a pitch filter, are adapted to delay the adaptive codebook gain; determine the pitch filter gain based on the delayed adaptive codebook gain; and amplify samples of a signal in the pitch filter based on said determined pitch filter gain. The adaptive codebook gain is delayed for one subframe. The pitch filter gain equals the delayed adaptive codebook gain, except when the adaptive codebook gain is either less than 0.2 or greater than 0.8, in which cases the pitch filter gain is set equal to 0.2 or 0.8, respectively.
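
The gain rule in the last two sentences reduces to a clamp, and the PPF to a one-tap filter; a sketch, where the FIR form of the filter is an assumption:

```python
import numpy as np

def pitch_filter_gain(delayed_adaptive_gain):
    # Gain rule from the claim: the one-subframe-delayed adaptive codebook
    # gain, clamped to the range [0.2, 0.8].
    return min(max(delayed_adaptive_gain, 0.2), 0.8)

def apply_ppf(fixed_codevector, pitch_lag, gain):
    # One-tap pitch-predictive filter over the fixed-codebook samples,
    # y[n] = x[n] + g * x[n - T] (the FIR form is an assumption here).
    assert 1 <= pitch_lag < len(fixed_codevector)
    x = np.asarray(fixed_codevector, dtype=float)
    y = x.copy()
    y[pitch_lag:] += gain * x[:-pitch_lag]
    return y
```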

271 citations


PatentDOI
TL;DR: In this paper, each audio signal is digitized and then transformed into a predefined visual image, which is displayed in a 3D space, and selected audio characteristics, such as frequency, amplitude, time and spatial placement, are correlated to selected visual characteristics of the visual image.
Abstract: A method and apparatus for mixing audio signals. Each audio signal is digitized and then transformed into a predefined visual image, which is displayed in a three-dimensional space. Selected audio characteristics of the audio signal, such as frequency, amplitude, time and spatial placement, are correlated to selected visual characteristics of the visual image, such as size, location, texture, density and color. Dynamic changes or adjustment to any one of these parameters causes a corresponding change in the correlated parameter.

218 citations


Journal ArticleDOI
TL;DR: A theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures is presented, and the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems is described.
Abstract: The paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that is an extension of the concept of the scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, which have been trained using other error measures, such as mean-squared error (MSE) or weighted mean-squared error (WMSE) measures of LPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measure on the LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results to support the theoretical performance estimates are provided.
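
The core approximation is a quadratically weighted error, which takes one line given a sensitivity matrix; a sketch of the general form and of the diagonal LSP special case (the NumPy framing and names are mine):

```python
import numpy as np

def weighted_distortion(x, x_hat, sensitivity):
    # High-rate approximation d(x, x_hat) ~ (x - x_hat)^T S (x - x_hat),
    # with S the sensitivity matrix of the target (LSD) measure.
    e = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    return float(e @ np.asarray(sensitivity) @ e)

def lsp_wmse(lsp, lsp_hat, diag_weights):
    # For LSP frequencies S is diagonal, so the measure reduces to a
    # per-component weighted mean-squared error.
    e = np.asarray(lsp, dtype=float) - np.asarray(lsp_hat, dtype=float)
    return float(np.sum(np.asarray(diag_weights) * e * e))
```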

182 citations


Patent
27 Mar 1995
TL;DR: A code frequency component in the encoded audio signal is detected based on an expected code amplitude or on a noise amplitude within a range of audio frequencies including the frequency of the code component as mentioned in this paper.
Abstract: Apparatus and methods for including a code having at least one code frequency component in an audio signal are provided. The abilities of various frequency components in the audio signal to mask the code frequency component to human hearing are evaluated and based on these evaluations an amplitude is assigned to the code frequency component. Methods and apparatus for detecting a code in an encoded audio signal are also provided. A code frequency component in the encoded audio signal is detected based on an expected code amplitude or on a noise amplitude within a range of audio frequencies including the frequency of the code component.

179 citations


PatentDOI
TL;DR: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination and the use of the system in the generation of a variety of voice effects.
Abstract: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment the encoder of the system computes the signal pitch and a parameter which is related to the relative content of voiced and unvoiced portions in the spectrum of the signal, which is expressed as a ratio Pv, defined as a voicing probability. The voiced portion of the signal spectrum, as determined by the parameter Pv, is encoded using a set of harmonically related amplitudes corresponding to the estimated pitch. The unvoiced portion of the signal is processed in a separate processing branch which uses a modified linear predictive coding algorithm. Parameters representing both the voiced and the unvoiced portions of a speech segment are combined in data packets for transmission. In the decoder, speech is synthesized from the transmitted parameters representing voiced and unvoiced portions of the speech in a reverse order. Boundary conditions between voiced and unvoiced segments are established to ensure amplitude and phase continuity for improved output speech quality. Perceptually smooth transition between frames is ensured by using an overlap and add method of synthesis. Also disclosed is the use of the system in the generation of a variety of voice effects.
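
A toy stand-in for the voicing probability Pv is the normalized autocorrelation at the pitch lag, sketched below; the patent's actual spectral computation of Pv is not reproduced:

```python
import numpy as np

def voicing_probability(frame, pitch_lag):
    # Normalized autocorrelation at the pitch lag, clipped to [0, 1];
    # a rough proxy for Pv, not the patent's spectral computation.
    x = np.asarray(frame, dtype=float) - np.mean(frame)
    a, b = x[pitch_lag:], x[:len(x) - pitch_lag]
    num = np.dot(a, b)
    den = np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12
    return float(np.clip(num / den, 0.0, 1.0))
```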

PatentDOI
Chen Juin-Hwey1
TL;DR: In this paper, a speech coding system robust to frame erasure (or packet loss) is described, where vectors of an excitation signal are synthesized based on previously stored excitation signals generated during non-erased frames.
Abstract: A speech coding system robust to frame erasure (or packet loss) is described. Illustrative embodiments are directed to a modified version of CCITT standard G.728. In the event of frame erasure, vectors of an excitation signal are synthesized based on previously stored excitation signal vectors generated during non-erased frames. This synthesis differs for voiced and non-voiced speech. During erased frames, linear prediction filter coefficients are synthesized as a weighted extrapolation of a set of linear prediction filter coefficients determined during non-erased frames. The weighting factor is a number less than 1. This weighting accomplishes a bandwidth-expansion of peaks in the frequency response of a linear predictive filter. Computational complexity during erased frames is reduced through the elimination of certain computations needed during non-erased frames only. This reduction in computational complexity offsets additional computation required for excitation signal synthesis and linear prediction filter coefficient generation during erased frames.
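
The coefficient extrapolation amounts to scaling each a_i by w^i with w < 1, which widens the spectral peaks of 1/A(z); a sketch, where w = 0.97 is an illustrative value (the patent states only that the factor is less than 1):

```python
import numpy as np

def bandwidth_expand(lpc, w=0.97):
    # Erased-frame extrapolation: scale a_i by w**i with w < 1, which
    # bandwidth-expands the spectral peaks of the synthesis filter 1/A(z).
    lpc = np.asarray(lpc, dtype=float)
    return lpc * w ** np.arange(1, len(lpc) + 1)
```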

Proceedings ArticleDOI
09 May 1995
TL;DR: It is suggested that phone rate is a more meaningful measure of speech rate than the more common word rate, and it is found that when data sets are clustered according to the phone rate metric, recognition errors increase when the phone rate is more than 1 standard deviation greater than the mean.
Abstract: It is well known that a higher-than-normal speech rate will cause the rate of recognition errors in large vocabulary automatic speech recognition (ASR) systems to increase. In this paper we attempt to identify and correct for errors due to fast speech. We first suggest that phone rate is a more meaningful measure of speech rate than the more common word rate. We find that when data sets are clustered according to the phone rate metric, recognition errors increase when the phone rate is more than 1 standard deviation greater than the mean. We propose three methods to improve the recognition accuracy of fast speech, each addressing different aspects of performance degradation. The first method is an implementation of Baum-Welch codebook adaptation. The second method is based on the adaptation of HMM state-transition probabilities. In the third method, the pronunciation dictionaries are modified using rule-based techniques and compound words are added. We compare improvements in recognition accuracy for each method using data sets clustered according to the phone rate metric. Adaptation of the HMM state-transition probabilities to fast speech improves recognition of fast speech by a relative amount of 4 to 6 percent.
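
The clustering criterion is easy to express; a sketch of the phone-rate flagging step, assuming per-utterance phone counts and durations are available:

```python
import numpy as np

def flag_fast_utterances(phone_counts, durations_s):
    # Phone rate = phones per second; flag utterances more than one
    # standard deviation above the mean rate, the region where the paper
    # reports recognition errors increasing.
    rates = np.asarray(phone_counts, dtype=float) / np.asarray(durations_s)
    return rates > rates.mean() + rates.std()
```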

PatentDOI
TL;DR: A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned againstspeech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group.
Abstract: A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned against speech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group. A decision tree classifies speech sounds into such groups, and related speech sound groups descend from common tree nodes. New speech samples time aligned against a given speech sound group's model update models of related speech sound groups, decreasing the training data required to adapt the system. The phonetic context classifications can be based on knowledge of which contextual features are associated with acoustic similarity. The computerized system samples speech sounds using a first, larger, parameter set; automatically selects combinations of phonetic context classifications which divide the speech sounds into groups whose frames are acoustically similar, such as by use of a decision tree; selects a second, smaller, set of parameters based on that set's ability to separate the frames aligned with each speech sound group, such as by use of linear discriminant analysis; and then uses these new parameters to represent frames and speech sound models. Then, using the new parameters, a decision tree classifier can be used to re-classify the speech sounds and to calculate new acoustic models for the resulting groups of speech sounds.

Proceedings ArticleDOI
09 May 1995
TL;DR: A 2.4 kb/s coder uses waveform interpolation principles to represent the speech signal as an evolving characteristic waveform (CW); a significant increase in coding efficiency is obtained by coding the voiced and unvoiced components of the signal separately.
Abstract: For low-rate speech coding it is advantageous to represent the speech signal as an evolving characteristic waveform (CW). The CW evolves slowly when the speech signal is clearly voiced and rapidly when the speech signal is clearly unvoiced. The voiced (periodic) and unvoiced (nonperiodic) components of the speech signal can be separated by a simple nonadaptive filter in the CW domain. Because of perceptual effects, a significant increase in coding efficiency is obtained by coding these two components separately. A 2.4 kb/s coder using these principles was developed. In an independent evaluation, the performance of the 2.4 kb/s waveform interpolation (WI) coder was found to be at least equivalent to that of the 4.8 kb/s FS1016 standard in all of the tests performed.
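
A sketch of the CW-domain separation: stacking successive characteristic waveforms as rows and lowpass filtering along the evolution axis yields the slowly evolving (voiced-like) component, and the residue is the rapidly evolving one; the 3-tap filter is a stand-in for the paper's nonadaptive filter:

```python
import numpy as np
from scipy.signal import lfilter

def split_cw_evolution(cw_surface, b=(0.25, 0.5, 0.25)):
    # cw_surface: rows are successive characteristic waveforms, columns
    # are phase positions. Lowpass filtering down each column (along the
    # evolution axis) keeps the slowly evolving waveform (SEW); the
    # residue is the rapidly evolving waveform (REW).
    cw_surface = np.asarray(cw_surface, dtype=float)
    sew = lfilter(list(b), [1.0], cw_surface, axis=0)
    rew = cw_surface - sew
    return sew, rew
```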

Patent
Philippe Ferriere1
11 Oct 1995
TL;DR: In this article, an audio data transmission system uses computing units which are designed to select an appropriate combination of block size and input sampling rate to maximize the available bandwidth of the receiving modem.
Abstract: An audio data transmission system encodes audio files into individual audio data blocks which contain a variable number of bits of digital audio data sampled at a selectable sample rate. The number of bits of digital data and the input sampling rate are scalable to produce an encoded bit stream bit rate that is less than or equal to an effective operational bit rate of a recipient's modem. The audio data transmission system uses computing units which are designed to select an appropriate combination of block size and input sampling rate to make full use of the available bandwidth of the receiving modem. For example, if the modem connection speed for one modem is 14.4 kbps, a version of the audio data compressed at 13000 bits/s might be sent to the recipient; if the modem connection speed for another modem is 28.8 kbps, a version of the audio data compressed at 24255 bits/s might be sent to the recipient. The audio data blocks are then transmitted at the encoded bit stream bit rate to the intended recipient's modem. The audio data blocks are decoded at the recipient to reconstruct the audio file and immediately play the audio file as it is received. The audio data transmission system can be implemented in online service systems, ITV systems, computer data network systems, and communication systems.
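
The selection rule reduces to picking the largest pre-encoded rate that fits the connection; a sketch matching the 14.4 kbps -> 13000 bit/s and 28.8 kbps -> 24255 bit/s examples above:

```python
def pick_encoded_rate(modem_bps, available_rates_bps):
    # Choose the highest pre-encoded bit rate not exceeding the recipient
    # modem's effective rate.
    usable = [r for r in available_rates_bps if r <= modem_bps]
    if not usable:
        raise ValueError("no encoded version fits this connection")
    return max(usable)

# e.g. pick_encoded_rate(14400, [13000, 24255]) -> 13000
#      pick_encoded_rate(28800, [13000, 24255]) -> 24255
```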

Journal ArticleDOI
TL;DR: This work shows that significant improvements in performance are obtained as compared to an earlier system proposed by Jayant and Christensen (1981) for packetized speech systems, and that for a first-order Gauss-Markov source significant performance improvements can be obtained by using a second-order predictor instead of a first-order predictor.
Abstract: Speech quality in packetized speech systems can degrade substantially when packets are lost. We consider the problem of DPCM system design for packetized speech systems. The problem is formulated as a multiple description problem and the problem of optimal selection of the encoder and decoder filters is addressed. We show that significant improvements in performance are obtained as compared to an earlier system proposed by Jayant and Christensen (1981). Further, we show that for a first-order Gauss-Markov source significant performance improvements can be obtained by using a second-order predictor instead of a first-order predictor.
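
For concreteness, a minimal DPCM encoder with the second-order predictor the paper advocates; the coefficients and rounding quantizer are placeholders, and the paper's optimal filter design is not reproduced:

```python
import numpy as np

def dpcm_encode(x, a1, a2, quantize=np.round):
    # DPCM with a second-order predictor x_hat[n] = a1*y[n-1] + a2*y[n-2],
    # where y tracks the decoder's reconstruction.
    y1 = y2 = 0.0
    codes = np.empty(len(x))
    for n, sample in enumerate(x):
        pred = a1 * y1 + a2 * y2
        codes[n] = quantize(sample - pred)   # quantized prediction residual
        y2, y1 = y1, pred + codes[n]         # decoder-tracked state
    return codes
```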

Patent
02 Mar 1995
TL;DR: In this paper, a method for providing described television services includes the steps of generating description data corresponding to an audiovisual program, converting the description data to a speech signal corresponding to the description data, synchronizing the speech signal with the audiovisual program using a time code signal from the audiovisual program, and mixing the synchronized speech signal with the audio track of the audiovisual program to create a combined audio signal.
Abstract: An apparatus for providing described television services includes a receiver for receiving description data corresponding to an audiovisual program; a text-to-speech converter for converting the description data into a speech signal corresponding to the description data; a memory device for receiving and storing the speech signal and a corresponding time code from the audiovisual program; a mixing circuit for retrieving the speech signal from the memory device and mixing the retrieved speech signal with the audio track of the audiovisual program to produce a combined audio signal; and a transmitter for simultaneously providing the combined speech signal and the audiovisual program to a viewer. The apparatus provides the combined speech signal to the viewer via the SAP channel. The apparatus may also include a translator for translating the description data into a foreign language prior to converting the description data into the speech signal. A method for providing described television services includes the steps of generating description data corresponding to an audiovisual program; converting the description data to a speech signal corresponding to the description data; synchronizing the speech signal with the audiovisual program using a time code signal from the audiovisual program; mixing the synchronized speech signal with the audio track of the audiovisual program to create a combined audio signal; and simultaneously transmitting the combined audio signal and the audiovisual program to the viewer.

PatentDOI
Willem Bastiaan Kleijn1
TL;DR: In this article, a plurality of sets of indexed parameters are generated based on samples of the speech signal; each set corresponds to a waveform characterizing the speech signal at a discrete point in time.
Abstract: A method of coding a speech signal is described. In accordance with the method, a plurality of sets of indexed parameters are generated based on samples of the speech signal. Each set of indexed parameters corresponds to a waveform characterizing the speech signal at a discrete point in time. Parameters of the plurality of sets are grouped based on index value to form a first set of signals which represents the evolution of characterizing waveform shape; the signals of the first set are filtered to remove low frequency components and thereby produce a second set of signals which represents relatively high rates of evolution of characterizing waveform shape. The speech signal is then coded based on the second set of signals representing high rates of characterizing waveform shape evolution. Coding of the speech signal may further be based on a set of smoothed first signals.

Patent
13 Dec 1995
TL;DR: In this paper, the authors proposed a TDMA mobile-to-mobile (M2M) communication protocol where the two digital signal processors are virtually connected at the channel codecs.
Abstract: In a TDMA mobile-to-mobile connection, the end-to-end audio signal quality as well as system performance can be improved by providing digital signal processors the capability to automatically switch configuration such that each digital signal processor in a mobile-to-mobile communication connection can automatically identify a TDMA mobile-to-mobile connection and bypass the speech encoding and decoding processes within the digital signal processors. The two digital signal processors are virtually connected at the channel codecs.

Patent
Toshiyuki Morii1
27 Nov 1995
TL;DR: In this article, a speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sampled characteristic parameters in each of a plurality of coding modules.
Abstract: A sample speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sample characteristic parameters in each of a plurality of coding modules. The sample characteristic parameters and the coding distortions are statistically processed by a statistical processing unit to obtain a coding module selecting rule. Thereafter, when a speech is analyzed by the speech analyzing unit to obtain characteristic parameters, an appropriate coding module is selected by a coding module selecting unit from the coding modules according to the coding module selecting rule on condition that a coding distortion for the characteristic parameters is minimized in the appropriate coding module. Thereafter, the characteristic parameters of the speech are coded in the appropriate coding module, and a coded speech is obtained. When the coded speech is decoded, a reproduced speech is obtained. Accordingly, because an appropriate coding module can be easily selected from a plurality of coding modules according to the coding module selecting rule, any allophone occurring in a reproduced speech can be prevented at a low calculation volume.
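
Once the statistically derived rule is available as a distortion predictor, the selection step is a one-liner; a sketch, with predict_distortion standing in for the rule produced by the statistical processing unit:

```python
def select_coding_module(params, modules, predict_distortion):
    # Pick the module whose predicted coding distortion for these
    # characteristic parameters is minimal.
    return min(modules, key=lambda m: predict_distortion(m, params))
```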

Patent
Peter Kroon1
28 Feb 1995
TL;DR: In this article, a speech coding system robust to frame erasure (or packet loss) is described, where vectors of an excitation signal are synthesized based on previously stored excitation signals generated during non-erased frames.
Abstract: A speech coding system robust to frame erasure (or packet loss) is described. Illustrative embodiments are directed to a modified version of CCITT standard G.728. In the event of frame erasure, vectors of an excitation signal are synthesized based on previously stored excitation signal vectors generated during non-erased frames. This synthesis differs for voiced and non-voiced speech. During erased frames, linear prediction filter coefficients are synthesized as a weighted extrapolation of a set of linear prediction filter coefficients determined during non-erased frames. The weighting factor is a number less than 1. This weighting accomplishes a bandwidth-expansion of peaks in the frequency response of a linear predictive filter. Computational complexity during erased frames is reduced through the elimination of certain computations needed during non-erased frames only. This reduction in computational complexity offsets additional computation required for excitation signal synthesis and linear prediction filter coefficient generation during erased frames.

Journal ArticleDOI
S. Vernon1
01 Aug 1995
TL;DR: The design and implementation of AC-3 coders are described, focusing on issues relevant to minimum cost solutions, and an overview of encoding and decoding strategies is presented.
Abstract: AC-3 is the perceptual coding technology used for HDTV audio compression. This paper describes the design and implementation of AC-3 coders, focusing on issues relevant to minimum cost solutions. AC-3 coding technology has been adopted by the Advanced Television Systems Committee (ATSC) as the audio service standard for high definition television (HDTV) in the United States. The AC-3 audio data compression system is described, and an overview of encoding and decoding strategies is presented.

Journal ArticleDOI
01 Jun 1995
TL;DR: Basic approaches to speech, wideband speech, and audio bit rate compression in audiovisual communications are explained; the use of knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit rate coding algorithms which achieve substantially more compression than was thought possible only a few years ago.
Abstract: Current and future visual communications for applications such as broadcasting, videotelephony, video- and audiographic-conferencing, and interactive multimedia services assume a substantial audio component. Even text, graphics, fax, still images, email documents, etc. will gain from voice annotation and audio clips. A wide range of speech, wideband speech, and wideband audio coders is available for such applications. In the context of audiovisual communications, the quality of telephone-bandwidth speech is acceptable for some videotelephony and videoconferencing services. Higher bandwidths (wideband speech) may be necessary to improve the intelligibility and naturalness of speech. High quality audio coding including multichannel audio will be necessary in advanced digital TV and multimedia services. This paper explains basic approaches to speech, wideband speech, and audio bit rate compression in audiovisual communications. These signal classes differ in bandwidth, dynamic range, and in listener expectation of offered quality. It will become obvious that the use of our knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit rate coding algorithms which can achieve substantially more compression than was thought possible only a few years ago. The paper concentrates on worldwide source coding standards beneficial for consumers, service providers, and manufacturers.

Proceedings ArticleDOI
14 Sep 1995
TL;DR: The first phase of the development of high quality audio coding for widespread use in broadcasting, telecommunication, computer and consumer applications has been finished with ISO/IEC 11172-3, but the finalisation of MPEG-1 is not the end of standardisation of high quality audio coding systems.
Abstract: The first phase of the development of high quality audio coding for widespread use in broadcasting, telecommunication, computer and consumer applications has been finished with ISO/IEC 11172-3, but the finalisation of MPEG-1 is not the end of standardisation of high quality audio coding systems. The MPEG-2 Audio multichannel coding system, which ensures forward and backward compatibility with ISO/IEC 11172-3 encoded audio signals, is designed for universal applications with and without accompanying picture. Envisaged applications besides DAB are digital television systems, digital video tape recorders and interactive storage media. Configurability with respect to the sound channel allocation and to the bit-rate offers useful combinations of various levels of multi-channel stereo performance and various numbers of channels in the composite and independent coding modes.


Patent
04 Apr 1995
TL;DR: In this paper, the authors exploit the synergy between operations performed by a speech rate modification system and those operations performed in a speech coding system to provide a speech-rate modification system with reduced hardware requirements.
Abstract: Synergy between operations performed by a speech-rate modification system and those operations performed in a speech coding system is exploited to provide a speech-rate modification system with reduced hardware requirements. The speech rate of an input signal is modified based on a signal representing a predetermined change in speech rate. The modified speech-rate signal is then filtered to generate a speech signal having increased short-term correlation. Modification of the input speech signal may be performed by inserting in the input speech signal a previous sequence of samples corresponding substantially to a pitch cycle. Alternatively, the input speech signal may be modified by removing from the input speech signal a sequence of samples corresponding substantially to a pitch cycle.
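
The two modification operations in the closing sentences are simple splices; a sketch assuming the pitch period and splice point are already known (pitch tracking is not shown):

```python
import numpy as np

def slow_down(speech, pitch_period, at):
    # Lengthen the signal by re-inserting the pitch cycle that ends at
    # sample index 'at' (the patent's insertion variant).
    assert at >= pitch_period
    cycle = speech[at - pitch_period:at]
    return np.concatenate([speech[:at], cycle, speech[at:]])

def speed_up(speech, pitch_period, at):
    # Shorten the signal by removing one pitch cycle starting at 'at'.
    return np.concatenate([speech[:at], speech[at + pitch_period:]])
```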

Journal ArticleDOI
T. Chen1, H.P. Graf1, Kuansan Wang2
TL;DR: The marriage of speech analysis and image processing can solve problems related to lip synchronization and speech information is utilized to improve the quality of audio-visual communications such as videotelephony and videoconferencing.
Abstract: We utilize speech information to improve the quality of audio-visual communications such as videotelephony and videoconferencing. In particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion. Demonstration sequences are presented. Other applications, including speech-assisted video coding, are outlined. >