Author

B. Atal

Bio: B. Atal is an academic researcher from Bell Labs. The author has contributed to research in the topics of linear predictive coding and speech coding, has an h-index of 13, and has co-authored 17 publications receiving 2,031 citations.

Papers
Journal ArticleDOI
Kuldip K. Paliwal, B. Atal
TL;DR: It is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB.
Abstract: For low bit rate speech coding applications, it is important to quantize the LPC parameters accurately using as few bits as possible. Though vector quantizers are more efficient than scalar quantizers, their use for accurate quantization of linear predictive coding (LPC) information (using 24-26 bits/frame) is impeded by their prohibitively high complexity. A split vector quantization approach is used here to overcome the complexity problem. An LPC vector consisting of 10 line spectral frequencies (LSFs) is divided into two parts, and each part is quantized separately using vector quantization. Using the localized spectral sensitivity property of the LSF parameters, a weighted LSF distance measure is proposed. With this distance measure, it is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB. The effect of channel errors on the performance of this quantizer is also investigated and results are reported.
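The split-and-search idea in the abstract is compact enough to sketch. In the sketch below the 5+5 split, the codebook contents, and the flat weights are illustrative stand-ins; the paper derives the weights from the localized spectral sensitivity of each LSF, and its codebooks are trained on speech data:

```python
import numpy as np

def split_vq_quantize(lsf, codebooks, weights):
    """Quantize a 10-dimensional LSF vector by splitting it into two parts
    and searching a separate codebook for each part under a weighted
    squared-error LSF distance (a minimal sketch of split VQ)."""
    parts = (lsf[:5], lsf[5:])                 # illustrative 5+5 split
    out = []
    for part, cb, w in zip(parts, codebooks, weights):
        dists = np.sum(w * (cb - part) ** 2, axis=1)   # weighted LSF distance
        out.append(cb[int(np.argmin(dists))])          # nearest codeword
    return np.concatenate(out)

# Toy demo: two 4096-entry codebooks give 12 + 12 = 24 bits/frame
rng = np.random.default_rng(0)
cb_lo = np.sort(rng.uniform(0.0, 0.5 * np.pi, (4096, 5)), axis=1)
cb_hi = np.sort(rng.uniform(0.5 * np.pi, np.pi, (4096, 5)), axis=1)
w = np.ones(5)                                 # flat weights for the demo
lsf = np.sort(rng.uniform(0.0, np.pi, 10))     # a fake sorted LSF vector
q = split_vq_quantize(lsf, (cb_lo, cb_hi), (w, w))
print(q.shape)  # (10,)
```

Splitting turns one search over 2^24 codewords into two searches over 2^12 codewords each, which is the complexity reduction the abstract refers to.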

665 citations

Proceedings ArticleDOI
B. Atal, J. Remde
03 May 1982
TL;DR: This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period, and minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals.
Abstract: The excitation for LPC speech synthesis usually consists of two separate signals - a delta-function pulse once every pitch period for voiced speech and white noise for unvoiced speech. This manner of representing excitation requires that speech segments be classified accurately into voiced and unvoiced categories and the pitch period of voiced segments be known. It is now well recognized that such a rigid idealization of the vocal excitation is often responsible for the unnatural quality associated with synthesized speech. This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period. All classes of sounds are generated by exciting the LPC filter with a sequence of pulses; the amplitudes and locations of the pulses are determined using a non-iterative analysis-by-synthesis procedure. This procedure minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals. The distance metric takes account of the finite-frequency resolution as well as the differential sensitivity of the human ear to errors in the formant and inter-formant regions of the speech spectrum.
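The pulse-by-pulse analysis-by-synthesis search can be sketched as follows. This is a simplification: the paper minimizes a perceptually weighted error, whereas the sketch uses plain squared error; `h` stands for the impulse response of the LPC synthesis filter:

```python
import numpy as np

def multipulse_excitation(target, h, n_pulses):
    """Greedy multipulse search: at each step place the single pulse
    (location + amplitude) that most reduces the squared error between
    the synthetic signal (pulses convolved with h) and the target.
    Simplified sketch: no perceptual weighting of the error."""
    n = len(target)
    residual = target.astype(float).copy()
    locs, amps = [], []
    for _ in range(n_pulses):
        best_red, best_loc, best_amp = -1.0, 0, 0.0
        for i in range(n):
            seg = min(len(h), n - i)                       # handle frame edge
            c = float(np.dot(residual[i:i + seg], h[:seg]))
            e = float(np.dot(h[:seg], h[:seg]))
            red = c * c / e                # error reduction for a pulse at i
            if red > best_red:
                best_red, best_loc, best_amp = red, i, c / e
        seg = min(len(h), n - best_loc)
        residual[best_loc:best_loc + seg] -= best_amp * h[:seg]
        locs.append(best_loc)
        amps.append(best_amp)
    return locs, amps

# Toy check: a target that is exactly one pulse of amplitude 2 at sample 3
h = 0.8 ** np.arange(6)                    # decaying impulse response
target = np.zeros(20)
target[3:9] = 2.0 * h
locs, amps = multipulse_excitation(target, h, 1)
print(locs[0], round(amps[0], 6))          # 3 2.0
```

No voiced/unvoiced decision or pitch estimate appears anywhere in the search, which is the point of the paper's approach.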

600 citations

Journal ArticleDOI
Kuldip K. Paliwal, B. Atal
TL;DR: A split vector quantization approach is used to overcome the complexity problem: the LPC vector is divided into two parts and each part is vector-quantized separately.
Abstract: Linear prediction coding (LPC) parameters are widely used in various speech processing applications for representing the spectral envelope information of speech. For low-bit-rate speech coding applications, it is important to quantize these parameters accurately using as few bits as possible without sacrificing the speech quality. Though the vector quantizers are more efficient than the scalar quantizers, their use for fine quantization of LPC information (using 24–26 bits/frame) is impeded due to their prohibitively high complexity. In this paper, a split vector quantization approach is used to overcome the complexity problem. Here, the LPC vector is divided into two parts and each part is vector-quantized separately. The splitting of the LPC vector is studied in the following three domains: (1) line spectral-pair frequency (LSF), (2) arc-sine reflection coefficient, and (3) log area ratio. Splitting in the LSF domain is found to be the best. Using the localized spectral properties of the LSF parameters, a weigh...

211 citations

Proceedings ArticleDOI
Kuldip K. Paliwal, B. Atal
14 Apr 1991
TL;DR: It is shown that the split vector quantizer can quantize LPC information in 24 b/frame with 1-dB average spectral distortion and <2% outlier frames (having spectral distortion greater than 2 dB).
Abstract: Though vector quantizers are more efficient than scalar quantizers, their use for fine quantization of linear predictive coding (LPC) information (using 24-26 b/frame) is impeded due to their prohibitively high complexity. In the present work, a split vector quantization approach is used to overcome the complexity problem. The LPC vector, consisting of ten line spectral frequencies (LSFs), is divided into two parts and each part is quantized separately using vector quantization. Using the localized spectral sensitivity property of the LSF parameters, a weighted LSF distance measure is proposed. Using this distance measure, it is shown that the split vector quantizer can quantize LPC information in 24 b/frame with 1-dB average spectral distortion and less than 2% outlier frames (having spectral distortion greater than 2 dB).

137 citations

Proceedings ArticleDOI
B. Atal, M. Schroeder
01 Apr 1980
TL;DR: This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.
Abstract: Adaptive predictive coding of speech signals at bit rates lower than 10 kbits/sec often requires the use of 2-level (1 bit) quantization of the samples of the prediction residual. Such a coarse quantization of the prediction residual can produce audible quantizing noise in the reproduced speech signal at the receiver. This paper describes a new method of quantization for improving the speech quality. The improvement is obtained by center clipping the prediction residual and by fine quantization of the high-amplitude portions of the prediction residual. The threshold of center clipping is adjusted to provide encoding of the prediction residual at a specified bit rate. This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.
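The center-clip-then-quantize step can be sketched as follows. The threshold rule here (keep a fixed fraction of samples) and the uniform mid-rise quantizer are illustrative simplifications; the paper adjusts the threshold to meet a specified bit rate:

```python
import numpy as np

def center_clip_quantize(residual, keep_fraction, n_levels=8):
    """Center-clip the prediction residual: samples below an adaptive
    threshold are zeroed, and only the surviving high-amplitude samples
    are quantized (here with a uniform mid-rise quantizer).  Choosing
    the threshold so that roughly `keep_fraction` of samples survive is
    one simple way to steer the bit rate."""
    mag = np.abs(residual)
    thr = np.quantile(mag, 1.0 - keep_fraction)   # adaptive clipping threshold
    kept = mag > thr
    out = np.zeros_like(residual, dtype=float)
    if kept.any():
        peak = mag[kept].max()
        step = 2 * peak / n_levels
        # uniform mid-rise quantization of the surviving samples
        out[kept] = (np.floor(residual[kept] / step) + 0.5) * step
    return out, thr

rng = np.random.default_rng(1)
r = rng.laplace(size=1000)        # prediction residuals are roughly Laplacian
y, thr = center_clip_quantize(r, keep_fraction=0.1)
print(np.mean(y != 0))            # about one tenth of the samples survive
```

Because most samples are zeroed, the pulse positions plus finely quantized amplitudes can be coded well below 1 bit/sample on average, which is the regime the abstract describes.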

113 citations


Cited by
Book
01 Jan 1996
TL;DR: The author explains the development of the Huffman coding algorithm, techniques used in its implementation, and some of its applications, including bi-level image compression based on the JBIG standard.
Abstract: Preface 1 Introduction 1.1 Compression Techniques 1.1.1 Lossless Compression 1.1.2 Lossy Compression 1.1.3 Measures of Performance 1.2 Modeling and Coding 1.3 Organization of This Book 1.4 Summary 1.5 Projects and Problems 2 Mathematical Preliminaries 2.1 Overview 2.2 A Brief Introduction to Information Theory 2.3 Models 2.3.1 Physical Models 2.3.2 Probability Models 2.3.3. Markov Models 2.3.4 Summary 2.5 Projects and Problems 3 Huffman Coding 3.1 Overview 3.2 "Good" Codes 3.3. The Huffman Coding Algorithm 3.3.1 Minimum Variance Huffman Codes 3.3.2 Length of Huffman Codes 3.3.3 Extended Huffman Codes 3.4 Nonbinary Huffman Codes 3.5 Adaptive Huffman Coding 3.5.1 Update Procedure 3.5.2 Encoding Procedure 3.5.3 Decoding Procedure 3.6 Applications of Huffman Coding 3.6.1 Lossless Image Compression 3.6.2 Text Compression 3.6.3 Audio Compression 3.7 Summary 3.8 Projects and Problems 4 Arithmetic Coding 4.1 Overview 4.2 Introduction 4.3 Coding a Sequence 4.3.1 Generating a Tag 4.3.2 Deciphering the Tag 4.4 Generating a Binary Code 4.4.1 Uniqueness and Efficiency of the Arithmetic Code 4.4.2 Algorithm Implementation 4.4.3 Integer Implementation 4.5 Comparison of Huffman and Arithmetic Coding 4.6 Applications 4.6.1 Bi-Level Image Compression-The JBIG Standard 4.6.2 Image Compression 4.7 Summary 4.8 Projects and Problems 5 Dictionary Techniques 5.1 Overview 5.2 Introduction 5.3 Static Dictionary 5.3.1 Diagram Coding 5.4 Adaptive Dictionary 5.4.1 The LZ77 Approach 5.4.2 The LZ78 Approach 5.5 Applications 5.5.1 File Compression-UNIX COMPRESS 5.5.2 Image Compression-the Graphics Interchange Format (GIF) 5.5.3 Compression over Modems-V.42 bis 5.6 Summary 5.7 Projects and Problems 6 Lossless Image Compression 6.1 Overview 6.2 Introduction 6.3 Facsimile Encoding 6.3.1 Run-Length Coding 6.3.2 CCITT Group 3 and 4-Recommendations T.4 and T.6 6.3.3 Comparison of MH, MR, MMR, and JBIG 6.4 Progressive Image Transmission 6.5 Other Image Compression Approaches 6.5.1 Linear 
Prediction Models 6.5.2 Context Models 6.5.3 Multiresolution Models 6.5.4 Modeling Prediction Errors 6.6 Summary 6.7 Projects and Problems 7 Mathematical Preliminaries 7.1 Overview 7.2 Introduction 7.3 Distortion Criteria 7.3.1 The Human Visual System 7.3.2 Auditory Perception 7.4 Information Theory Revisted 7.4.1 Conditional Entropy 7.4.2 Average Mutual Information 7.4.3 Differential Entropy 7.5 Rate Distortion Theory 7.6 Models 7.6.1 Probability Models 7.6.2 Linear System Models 7.6.3 Physical Models 7.7 Summary 7.8 Projects and Problems 8 Scalar Quantization 8.1 Overview 8.2 Introduction 8.3 The Quantization Problem 8.4 Uniform Quantizer 8.5 Adaptive Quantization 8.5.1 Forward Adaptive Quantization 8.5.2 Backward Adaptive Quantization 8.6 Nonuniform Quantization 8.6.1 pdf-Optimized Quantization 8.6.2 Companded Quantization 8.7 Entropy-Coded Quantization 8.7.1 Entropy Coding of Lloyd-Max Quantizer Outputs 8.7.2 Entropy-Constrained Quantization 8.7.3 High-Rate Optimum Quantization 8.8 Summary 8.9 Projects and Problems 9 Vector Quantization 9.1 Overview 9.2 Introduction 9.3 Advantages of Vector Quantization over Scalar Quantization 9.4 The Linde-Buzo-Gray Algorithm 9.4.1 Initializing the LBG Algorithm 9.4.2 The Empty Cell Problem 9.4.3 Use of LBG for Image Compression 9.5 Tree-Structured Vector Quantizers 9.5.1 Design of Tree-Structured Vector Quantizers 9.6 Structured Vector Quantizers 9.6.1 Pyramid Vector Quantization 9.6.2 Polar and Spherical Vector Quantizers 9.6.3 Lattice Vector Quantizers 9.7 Variations on the Theme 9.7.1 Gain-Shape Vector Quantization 9.7.2 Mean-Removed Vector Quantization 9.7.3 Classified Vector Quantization 9.7.4 Multistage Vector Quantization 9.7.5 Adaptive Vector Quantization 9.8 Summary 9.9 Projects and Problems 10 Differential Encoding 10.1 Overview 10.2 Introduction 10.3 The Basic Algorithm 10.4 Prediction in DPCM 10.5 Adaptive DPCM (ADPCM) 10.5.1 Adaptive Quantization in DPCM 10.5.2 Adaptive Prediction in DPCM 10.6 Delta Modulation 
10.6.1 Constant Factor Adaptive Delta Modulation (CFDM) 10.6.2 Continuously Variable Slope Delta Modulation 10.7 Speech Coding 10.7.1 G.726 10.8 Summary 10.9 Projects and Problems 11 Subband Coding 11.1 Overview 11.2 Introduction 11.3 The Frequency Domain and Filtering 11.3.1 Filters 11.4 The Basic Subband Coding Algorithm 11.4.1 Bit Allocation 11.5 Application to Speech Coding-G.722 11.6 Application to Audio Coding-MPEG Audio 11.7 Application to Image Compression 11.7.1 Decomposing an Image 11.7.2 Coding the Subbands 11.8 Wavelets 11.8.1 Families of Wavelets 11.8.2 Wavelets and Image Compression 11.9 Summary 11.10 Projects and Problems 12 Transform Coding 12.1 Overview 12.2 Introduction 12.3 The Transform 12.4 Transforms of Interest 12.4.1 Karhunen-Loeve Transform 12.4.2 Discrete Cosine Transform 12.4.3 Discrete Sine Transform 12.4.4 Discrete Walsh-Hadamard Transform 12.5 Quantization and Coding of Transform Coefficients 12.6 Application to Image Compression-JPEG 12.6.1 The Transform 12.6.2 Quantization 12.6.3 Coding 12.7 Application to Audio Compression 12.8 Summary 12.9 Projects and Problems 13 Analysis/Synthesis Schemes 13.1 Overview 13.2 Introduction 13.3 Speech Compression 13.3.1 The Channel Vocoder 13.3.2 The Linear Predictive Coder (Gov.Std.LPC-10) 13.3.3 Code Excited Linear Prediction (CELP) 13.3.4 Sinusoidal Coders 13.4 Image Compression 13.4.1 Fractal Compression 13.5 Summary 13.6 Projects and Problems 14 Video Compression 14.1 Overview 14.2 Introduction 14.3 Motion Compensation 14.4 Video Signal Representation 14.5 Algorithms for Videoconferencing and Videophones 14.5.1 ITU_T Recommendation H.261 14.5.2 Model-Based Coding 14.6 Asymmetric Applications 14.6.1 The MPEG Video Standard 14.7 Packet Video 14.7.1 ATM Networks 14.7.2 Compression Issues in ATM Networks 14.7.3 Compression Algorithms for Packet Video 14.8 Summary 14.9 Projects and Problems A Probability and Random Processes A.1 Probability A.2 Random Variables A.3 Distribution Functions A.4 
Expectation A.5 Types of Distribution A.6 Stochastic Process A.7 Projects and Problems B A Brief Review of Matrix Concepts B.1 A Matrix B.2 Matrix Operations C Codes for Facsimile Encoding D The Root Lattices Bibliography Index

2,311 citations

Journal ArticleDOI
TL;DR: A sinusoidal model of the speech waveform is used to develop a new analysis/synthesis technique characterized by the amplitudes, frequencies, and phases of the component sine waves; it forms the basis for new approaches to speech transformations, including time-scale and pitch-scale modification, and mid-rate speech coding.
Abstract: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding [8], [9].
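The analysis half of the sinusoidal model can be sketched in a few lines: window a frame, take its Fourier transform, and pick spectral peaks as the sine-wave parameters. This is a minimal version of the paper's peak picking; the frame-to-frame "birth/death" track matching and cubic phase interpolation are omitted:

```python
import numpy as np

def pick_sine_params(frame, fs, n_fft=1024):
    """One analysis step of a sinusoidal model: window the frame, take
    the FFT, and report the amplitude, frequency, and phase at each
    local maximum of the magnitude spectrum."""
    w = np.hanning(len(frame))
    spec = np.fft.rfft(frame * w, n_fft)
    mag = np.abs(spec)
    # simple peak picking: interior local maxima of the magnitude spectrum
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    freqs = np.array(peaks) * fs / n_fft
    amps = mag[peaks] * 2 / np.sum(w)      # rough amplitude normalization
    phases = np.angle(spec[peaks])
    return amps, freqs, phases

# Toy demo: two sine waves at 440 Hz and 1000 Hz, sampled at 8 kHz
fs = 8000
t = np.arange(256) / fs
x = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
amps, freqs, phases = pick_sine_params(x, fs)
# the two strongest peaks land near 440 Hz and 1000 Hz
```

Synthesis then drives a bank of sine-wave generators with these triples; the smooth phase interpolation between frames is what the paper's cubic phase function provides.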

1,659 citations

Journal ArticleDOI
TL;DR: This work considers in depth the extension of two classes of algorithms-Matching Pursuit and FOCal Underdetermined System Solver-to the multiple measurement case so that they may be used in applications such as neuromagnetic imaging, where multiple measurement vectors are available, and solutions with a common sparsity structure must be computed.
Abstract: We address the problem of finding sparse solutions to an underdetermined system of equations when there are multiple measurement vectors having the same, but unknown, sparsity structure. The single measurement sparse solution problem has been extensively studied in the past. Although known to be NP-hard, many single-measurement suboptimal algorithms have been formulated that have found utility in many different applications. Here, we consider in depth the extension of two classes of algorithms-Matching Pursuit (MP) and FOCal Underdetermined System Solver (FOCUSS)-to the multiple measurement case so that they may be used in applications such as neuromagnetic imaging, where multiple measurement vectors are available, and solutions with a common sparsity structure must be computed. Cost functions appropriate to the multiple measurement problem are developed, and algorithms are derived based on their minimization. A simulation study is conducted on a test-case dictionary to show how the utilization of more than one measurement vector improves the performance of the MP and FOCUSS classes of algorithm, and their performances are compared.
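The multiple-measurement extension of Matching Pursuit is easy to sketch. Simultaneous Orthogonal Matching Pursuit (a basic member of the MP family the paper studies) scores each dictionary atom by its correlation energy summed over all measurement vectors, so the shared sparsity structure drives the selection; the dictionary and data below are synthetic:

```python
import numpy as np

def somp(A, B, n_atoms):
    """Simultaneous Orthogonal Matching Pursuit (MMV sparse recovery).
    Each step picks the dictionary column of A whose correlation energy,
    summed over all measurement vectors (columns of B), is largest, then
    re-fits all selected atoms to B by least squares."""
    R = B.astype(float).copy()
    support = []
    X = np.zeros((0, B.shape[1]))
    for _ in range(n_atoms):
        scores = np.sum((A.T @ R) ** 2, axis=1)    # energy per atom
        scores[support] = -np.inf                  # never re-pick an atom
        support.append(int(np.argmax(scores)))
        X, *_ = np.linalg.lstsq(A[:, support], B, rcond=None)
        R = B - A[:, support] @ X                  # joint residual update
    return support, X

# Toy demo: three measurement vectors sharing one unknown 2-atom support
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 50))
A /= np.linalg.norm(A, axis=0)                    # unit-norm atoms
X_true = np.zeros((50, 3))
X_true[[7, 31], :] = rng.standard_normal((2, 3))
B = A @ X_true
support, X = somp(A, B, 2)
print(sorted(support))
```

Pooling the correlations across measurement vectors is exactly what the paper's simulation study shows to improve recovery over running a single-measurement algorithm on each column of B separately.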

1,454 citations

Book
01 Jan 2001
TL;DR: This book presents a discrete-time speech signal processing framework, covering the speech communication pathway, homomorphic signal processing, filter-bank analysis/synthesis based on the FBS method, and their applications.
Abstract: (NOTE: Each chapter begins with an introduction and concludes with a Summary, Exercises and Bibliography.) 1. Introduction. Discrete-Time Speech Signal Processing. The Speech Communication Pathway. Analysis/Synthesis Based on Speech Production and Perception. Applications. Outline of Book. 2. A Discrete-Time Signal Processing Framework. Discrete-Time Signals. Discrete-Time Systems. Discrete-Time Fourier Transform. Uncertainty Principle. z-Transform. LTI Systems in the Frequency Domain. Properties of LTI Systems. Time-Varying Systems. Discrete-Fourier Transform. Conversion of Continuous Signals and Systems to Discrete Time. 3. Production and Classification of Speech Sounds. Anatomy and Physiology of Speech Production. Spectrographic Analysis of Speech. Categorization of Speech Sounds. Prosody: The Melody of Speech. Speech Perception. 4. Acoustics of Speech Production. Physics of Sound. Uniform Tube Model. A Discrete-Time Model Based on Tube Concatenation. Vocal Fold/Vocal Tract Interaction. 5. Analysis and Synthesis of Pole-Zero Speech Models. Time-Dependent Processing. All-Pole Modeling of Deterministic Signals. Linear Prediction Analysis of Stochastic Speech Sounds. Criterion of "Goodness". Synthesis Based on All-Pole Modeling. Pole-Zero Estimation. Decomposition of the Glottal Flow Derivative. Appendix 5.A: Properties of Stochastic Processes. Random Processes. Ensemble Averages. Stationary Random Process. Time Averages. Power Density Spectrum. Appendix 5.B: Derivation of the Lattice Filter in Linear Prediction Analysis. 6. Homomorphic Signal Processing. Concept. Homomorphic Systems for Convolution. Complex Cepstrum of Speech-Like Sequences. Spectral Root Homomorphic Filtering. Short-Time Homomorphic Analysis of Periodic Sequences. Short-Time Speech Analysis. Analysis/Synthesis Structures. Contrasting Linear Prediction and Homomorphic Filtering. 7. Short-Time Fourier Transform Analysis and Synthesis. Short-Time Analysis. Short-Time Synthesis. 
Short-Time Fourier Transform Magnitude. Signal Estimation from the Modified STFT or STFTM. Time-Scale Modification and Enhancement of Speech. Appendix 7.A: FBS Method with Multiplicative Modification. 8. Filter-Bank Analysis/Synthesis. Revisiting the FBS Method. Phase Vocoder. Phase Coherence in the Phase Vocoder. Constant-Q Analysis/Synthesis. Auditory Modeling. 9. Sinusoidal Analysis/Synthesis. Sinusoidal Speech Model. Estimation of Sinewave Parameters. Synthesis. Source/Filter Phase Model. Additive Deterministic-Stochastic Model. Appendix 9.A: Derivation of the Sinewave Model. Appendix 9.B: Derivation of Optimal Cubic Phase Parameters. 10. Frequency-Domain Pitch Estimation. A Correlation-Based Pitch Estimator. Pitch Estimation Based on a "Comb Filter". Pitch Estimation Based on a Harmonic Sinewave Model. Glottal Pulse Onset Estimation. Multi-Band Pitch and Voicing Estimation. 11. Nonlinear Measurement and Modeling Techniques. The STFT and Wavelet Transform Revisited. Bilinear Time-Frequency Distributions. Aeroacoustic Flow in the Vocal Tract. Instantaneous Teager Energy Operator. 12. Speech Coding. Statistical Models of Speech. Scalar Quantization. Vector Quantization (VQ). Frequency-Domain Coding. Model-Based Coding. LPC Residual Coding. 13. Speech Enhancement. Introduction. Preliminaries. Wiener Filtering. Model-Based Processing. Enhancement Based on Auditory Masking. Appendix 13.A: Stochastic-Theoretic Parameter Estimation. 14. Speaker Recognition. Introduction. Spectral Features for Speaker Recognition. Speaker Recognition Algorithms. Non-Spectral Features in Speaker Recognition. Signal Enhancement for the Mismatched Condition. Speaker Recognition from Coded Speech. Appendix 14.A: Expectation-Maximization (EM) Estimation. Glossary. Speech Signal Processing. Units. Databases. Index. About the Author.

984 citations

Journal ArticleDOI
01 Nov 1985
TL;DR: This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization, and focuses primarily on the coding of speech signals and parameters.
Abstract: Quantization, the process of approximating continuous-amplitude signals by digital (discrete-amplitude) signals, is an important aspect of data compression or coding, the field concerned with the reduction of the number of bits necessary to transmit or store analog data, subject to a distortion or fidelity criterion. The independent quantization of each signal value or parameter is termed scalar quantization, while the joint quantization of a block of parameters is termed block or vector quantization. This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization. Vector quantization is presented as a process of redundancy removal that makes effective use of four interrelated properties of vector parameters: linear dependency (correlation), nonlinear dependency, shape of the probability density function (pdf), and vector dimensionality itself. In contrast, scalar quantization can utilize effectively only linear dependency and pdf shape. The basic concepts are illustrated by means of simple examples and the theoretical limits of vector quantizer performance are reviewed, based on results from rate-distortion theory. Practical issues relating to quantizer design, implementation, and performance in actual applications are explored. While many of the methods presented are quite general and can be used for the coding of arbitrary signals, this paper focuses primarily on the coding of speech signals and parameters.
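The review's central point, that vector quantization can exploit linear dependency (correlation) which per-component scalar quantization cannot, is easy to demonstrate numerically. The source model, codebook sizes, and bare-bones LBG-style trainer below are illustrative:

```python
import numpy as np

def lbg(data, n_codewords, n_iter=30):
    """Bare-bones LBG/k-means codebook training: nearest-neighbour
    partition followed by centroid update, repeated n_iter times."""
    rng = np.random.default_rng(0)
    cb = data[rng.choice(len(data), n_codewords, replace=False)].copy()
    for _ in range(n_iter):
        d = ((data[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)
        for k in range(n_codewords):
            if np.any(idx == k):           # skip empty cells
                cb[k] = data[idx == k].mean(0)
    return cb

def quantize(data, cb):
    d = ((data[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
    return cb[d.argmin(1)]

# Strongly correlated 2-D Gaussian source (both components nearly equal)
rng = np.random.default_rng(3)
z = rng.standard_normal((4000, 1))
data = np.hstack([z, z]) + 0.1 * rng.standard_normal((4000, 2))

# 4 bits per vector either way: two 4-level scalar quantizers (2 + 2 bits),
# or one vector quantizer with a 16-entry codebook.
cb_s = lbg(data[:, :1], 4)                 # per-component scalar codebook
scalar_rec = np.hstack([quantize(data[:, :1], cb_s),
                        quantize(data[:, 1:], cb_s)])
vq_rec = quantize(data, lbg(data, 16))

mse_scalar = np.mean((data - scalar_rec) ** 2)
mse_vq = np.mean((data - vq_rec) ** 2)
print(mse_vq < mse_scalar)                 # VQ wins by exploiting correlation
```

At equal bit rate the VQ codewords concentrate along the diagonal where the data lives, while the scalar product quantizer wastes most of its 4x4 grid on cells the source never visits, which illustrates the redundancy-removal argument of the tutorial.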

961 citations