scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Efficient coding of LPC parameters by temporal decomposition

Bishnu S. Atal1
14 Apr 1983-Vol. 8, pp 81-84
TL;DR: The aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality.
Abstract: This paper describes a method for efficient coding of LPC log area parameters. It is now well recognized that sample-by-sample quantization of LPC parameters is not very efficient in minimizing the bit rate needed to code these parameters. Recent methods for reducing the bit rate have used vector and segment quantization methods. Much of the past work in this area has focussed on efficient coding of LPC parameters in the context of vocoders which put a ceiling on achievable speech quality. The results from these studies cannot be directly applied to synthesis of high quality speech. This paper describes a different approach to efficient coding of log area parameters. Our aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality. Speech events occur generally at non-uniformly spaced time intervals. Moreover, some speech events are slow while others are fast. Uniform sampling of speech parameters is thus not efficient. We describe a non-uniform sampling and interpolation procedure for efficient coding of log area parameters. A temporal decomposition technique is used to represent the continuous variation of these parameters as a linearly-weighted sum of a number of discrete elementary components. The location and length of each component is automatically adapted to speech events. We find that each elementary component can be coded as a very low information rate signal.
Citations
More filters
Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

1,462 citations

Patent
19 Oct 2007
TL;DR: In this paper, various methods and devices described herein relate to devices which, in at least certain embodiments, may include one or more sensors for providing data relating to user activity and at least one processor for causing the device to respond based on the user activity which was determined, at least in part, through the sensors.
Abstract: The various methods and devices described herein relate to devices which, in at least certain embodiments, may include one or more sensors for providing data relating to user activity and at least one processor for causing the device to respond based on the user activity which was determined, at least in part, through the sensors. The response by the device may include a change of state of the device, and the response may be automatically performed after the user activity is determined.

844 citations

Patent
28 Sep 2012
TL;DR: In this article, a virtual assistant uses context information to supplement natural language or gestural input from a user, which helps to clarify the user's intent and reduce the number of candidate interpretations of user's input, and reduces the need for the user to provide excessive clarification input.
Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

593 citations

Journal ArticleDOI
TL;DR: Current advances related to automatic speech recognition (ASR) and spoken language systems and deficiencies in dealing with variation naturally present in speech are outlined.

507 citations


Cites background from "Efficient coding of LPC parameters ..."

  • ...…(ML) criterion, defining a QSS as the longest segment that has most probably been generated by the same AR process.3 Another approach is proposed in Atal (1983), which describes a temporal decomposition technique to represent the continuous variation of the LPC parameters as a linearly weighted…...

    [...]

  • ...Another approach is proposed in [10], which describes a temporal decomposition technique to represent the continu ous variation of the LPC parameters as a linearly weighted sum of a number of discrete elementary components....

    [...]

Patent
08 Sep 2006
TL;DR: In this paper, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontologies includes at least one active processing element that models a domain.
Abstract: A method and apparatus are provided for building an intelligent automated assistant. Embodiments of the present invention rely on the concept of “active ontologies” (e.g., execution environments constructed in an ontology-like manner) to build and run applications for use by intelligent automated assistants. In one specific embodiment, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontology includes at least one active processing element that models a domain. At least one of the remote services is then registered for use in the domain.

389 citations

References
More filters
Book
05 Sep 1978
TL;DR: This paper presents a meta-modelling framework for digital Speech Processing for Man-Machine Communication by Voice that automates the very labor-intensive and therefore time-heavy and expensive process of encoding and decoding speech.
Abstract: 1. Introduction. 2. Fundamentals of Digital Speech Processing. 3. Digital Models for the Speech Signal. 4. Time-Domain Models for Speech Processing. 5. Digital Representation of the Speech Waveform. 6. Short-Time Fourier Analysis. 7. Homomorphic Speech Processing. 8. Linear Predictive Coding of Speech. 9. Digital Speech Processing for Man-Machine Communication by Voice.

3,103 citations

Book
02 Dec 2011
TL;DR: Speech Analysis and Synthesis Models: Basic Physical Principles, Speech Synthesis Structures, and Considerations in Choice of Analysis.
Abstract: 1. Introduction.- 1.1 Basic Physical Principles.- 1.2 Acoustical Waveform Examples.- 1.3 Speech Analysis and Synthesis Models.- 1.4 The Linear Prediction Model.- 1.5 Organization of Book.- 2. Formulations.- 2.1 Historical Perspective.- 2.2 Maximum Likelihood.- 2.3 Minimum Variance.- 2.4 Prony's Method.- 2.5 Correlation Matching.- 2.6 PARCOR (Partial Correlation).- 2.6.1 Inner Products and an Orthogonality Principle.- 2.6.2 The PARCOR Lattice Structure.- 3. Solutions and Properties.- 3.1 Introduction.- 3.2 Vector Spaces and Inner Products.- 3.2.1 Filter or Polynomial Norms.- 3.2.2 Properties of Inner Products.- 3.2.3 Orthogonality Relations.- 3.3 Solution Algorithms.- 3.3.1 Correlation Matrix.- 3.3.2 Initialization.- 3.3.3 Gram-Schmidt Orthogonalization.- 3.3.4 Levinson Recursion.- 3.3.5 Updating Am(z).- 3.3.6 A Test Example.- 3.4 Matrix Forms.- 4. Acoustic Tube Modeling.- 4.1 Introduction.- 4.2 Acoustic Tube Derivation.- 4.2.1 Single Section Derivation.- 4.2.2 Continuity Conditions.- 4.2.3 Boundary Conditions.- 4.3 Relationship between Acoustic Tube and Linear Prediction.- 4.4 An Algorithm, Examples, and Evaluation.- 4.4.1 An Algorithm.- 4.4.2 Examples.- 4.4.3 Evaluation of the Procedure.- 4.5 Estimation of Lip Impedance.- 4.5.1 Lip Impedance Derivation.- 4.6 Further Topics.- 4.6.1 Losses in the Acoustic Tube Model.- 4.6.2 Acoustic Tube Stability.- 5. Speech Synthesis Structures.- 5.1 Introduction.- 5.2 Stability.- 5.2.1 Step-up Procedure.- 5.2.2 Step-down Procedure.- 5.2.3 Polynomial Properties.- 5.2.4 A Bound on |Fm(z)|.- 5.2.5 Necessary and Sufficient Stability Conditions.- 5.2.6 Application of Results.- 5.3 Recursive Parameter Evaluation.- 5.3.1 Inner Product Properties.- 5.3.2 Equation Summary with Program.- 5.4 A General Synthesis Structure.- 5.5 Specific Speech Synthesis Structures.- 5.5.1 The Direct Form.- 5.5.2 Two-Multiplier Lattice Model.- 5.5.3 Kelly-Lochbaum Model.- 5.5.4 One-Multiplier Models.- 5.5.5 Normalized Filter Model.- 5.5.6 A Test Example.- 6. Spectral Analysis.- 6.1 Introduction.- 6.2 Spectral Properties.- 6.2.1 Zero Mean All-Pole Model.- 6.2.2 Gain Factor for Spectral Matching.- 6.2.3 Limiting Spectral Match.- 6.2.4 Non-uniform Spectral Weighting.- 6.2.5 Minimax Spectral Matching.- 6.3 A Spectral Flatness Model.- 6.3.1 A Spectral Flatness Measure.- 6.3.2 Spectral Flatness Transformations.- 6.3.3 Numerical Evaluation.- 6.3.4 Experimental Results.- 6.3.5 Driving Function Models.- 6.4 Selective Linear Prediction.- 6.4.1 Selective Linear Prediction (SLP) Algorithm.- 6.4.2 A Selective Linear Prediction Program.- 6.4.3 Computational Considerations.- 6.5 Considerations in Choice of Analysis Conditions.- 6.5.1 Choice of Method.- 6.5.2 Sampling Rates.- 6.5.3 Order of Filter.- 6.5.4 Choice of Analysis Interval.- 6.5.5 Windowing.- 6.5.6 Pre-emphasis.- 6.6 Spectral Evaluation Techniques.- 6.7 Pole Enhancement.- 7. Automatic Formant Trajectory Estimation.- 7.1 Introduction.- 7.2 Formant Trajectory Estimation Procedure.- 7.2.1 Introduction.- 7.2.2 Raw Data from A(z).- 7.2.3 Examples of Raw Data.- 7.3 Comparison of Raw Data from Linear Prediction and Cepstral Smoothing.- 7.4 Algorithm 1.- 7.5 Algorithm 2.- 7.5.1 Definition of Anchor Points.- 7.5.2 Processing of Each Voiced Segment.- 7.5.3 Final Smoothing.- 7.5.4 Results and Discussion.- 7.6 Formant Estimation Accuracy.- 7.6.1 An Example of Synthetic Speech Analysis.- 7.6.2 An Example of Real Speech Analysis.- 7.6.3 Influence of Voice Periodicity.- 8. Fundamental Frequency Estimation.- 8.1 Introduction.- 8.2 Preprocessing by Spectral Flattening.- 8.2.1 Analysis of Voiced Speech with Spectral Regularity.- 8.2.2 Analysis of Voiced Speech with Spectral Irregularities.- 8.2.3 The STREAK Algorithm.- 8.3 Correlation Techniques.- 8.3.1 Autocorrelation Analysis.- 8.3.2 Modified Autocorrelation Analysis.- 8.3.3 Filtered Error Signal Autocorrelation Analysis.- 8.3.4 Practical Considerations.- 8.3.5 The SIFT Algorithm.- 9. Computational Considerations in Analysis.- 9.1 Introduction.- 9.2 Ill-Conditioning.- 9.2.1 A Measure of Ill-Conditioning.- 9.2.2 Pre-emphasis of Speech Data.- 9.2.3 Prefiltering before Sampling.- 9.3 Implementing Linear Prediction Analysis.- 9.3.1 Autocorrelation Method.- 9.3.2 Covariance Method.- 9.3.3 Computational Comparison.- 9.4 Finite Word Length Considerations.- 9.4.1 Finite Word Length Coefficient Computation.- 9.4.2 Finite Word Length Solution of Equations.- 9.4.3 Overall Finite Word Length Implementation.- 10. Vocoders.- 10.1 Introduction.- 10.2 Techniques.- 10.2.1 Coefficient Transformations.- 10.2.2 Encoding and Decoding.- 10.2.3 Variable Frame Rate Transmission.- 10.2.4 Excitation and Synthesis Gain Matching.- 10.2.5 A Linear Prediction Synthesizer Program.- 10.3 Low Bit Rate Pitch Excited Vocoders.- 10.3.1 Maximum Likelihood and PARCOR Vocoders.- 10.3.2 Autocorrelation Method Vocoders.- 10.3.3 Covariance Method Vocoders.- 10.4 Base-Band Excited Vocoders.- 11. Further Topics.- 11.1 Speaker Identification and Verification.- 11.2 Isolated Word Recognition.- 11.3 Acoustical Detection of Laryngeal Pathology.- 11.4 Pole-Zero Estimation.- 11.5 Summary and Future Directions.- References.

1,945 citations

Journal ArticleDOI
TL;DR: Application of this method for efficient transmission and storage of speech signals as well as procedures for determining other speechcharacteristics, such as formant frequencies and bandwidths, the spectral envelope, and the autocorrelation function, are discussed.
Abstract: A method of representing the speech signal by time‐varying parameters relating to the shape of the vocal tract and the glottal‐excitation function is described. The speech signal is first analyzed and then synthesized by representing it as the output of a discrete linear time‐varying filter, which is excited by a suitable combination of a quasiperiodic pulse train and white noise. The output of the linear filter at any sampling instant is a linear combination of the past output samples and the input. The optimum linear combination is obtained by minimizing the mean‐squared error between the actual values of the speech samples and their predicted values based on a fixed number of preceding samples. A 10th‐order linear predictor was found to represent the speech signal band‐limited to 5kHz with sufficient accuracy. The 10 coefficients of the predictor are shown to determine both the frequencies and bandwidths of the formants. Two parameters relating to the glottal‐excitation function and the pitch period are determined from the prediction error signal. Speech samples synthesized by this method will be demonstrated.

1,124 citations

01 Jan 1983

496 citations

Journal ArticleDOI
Bishnu S. Atal1
TL;DR: A new class of speech coders are described which allow one to realize the precise optimum noise spectrum which is crucial to achieving very low bit rates, but also represent the important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.
Abstract: Predictive coding is a promising approach for speech coding. In this paper, we review the recent work on adaptive predictive coding of speech signals, with particular emphasis on achieving high speech quality at low bit rates (less than 10 kbits/s). Efficient prediction of the redundant structure in speech signals is obviously important for proper functioning of a predictive coder. It is equally important to ensure that the distortion in the coded speech signal be perceptually small. The subjective loudness of quantization noise depends both on the short-time spectrum of the noise and its relation to the short-time spectrum of the Speech signal. The noise in the formant regions is partially masked by the speech signal itself. This masking of quantization noise by speech signal allows one to use low bit rates while maintaining high speech quality. This paper will present generalizations of predictive coding for minimizing subjective distortion in the reconstructed speech signal at the receiver. The quantizer in predictive coders quantizes its input on a sample-by-sample basis. Such sample-by-sample (instantaneous) quantization creates difficulty in realizing an arbitrary noise spectrum, particularly at low bit rates. We will describe a new class of speech coders in this paper which could be considered to be a generalization of the predictive coder. These new coders not only allow one to realize the precise optimum noise spectrum which is crucial to achieving very low bit rates, but also represent the important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.

316 citations