Topic

Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3346 publications have appeared within this topic, receiving 55742 citations.


Papers
Journal Article
TL;DR: A cepstrum analyzer, which takes the power spectrum of the logarithm of the short‐time spectrum, can function both as a pitch detector and as a voiced‐unvoiced detector, and cepstral techniques appear to be more reliable and efficient than visual methods for pitch detection.
Abstract: A spectrum analyzer based on a definition of short‐time power spectra has been designed and simulated on a digital computer. The analyzer is primarily intended for use in speech analysis. It has been designed to operate in real time, and to produce high‐resolution spectra without utilizing either heterodyning methods or bandpass filter banks. The logarithm of each consecutive amplitude spectrum thus obtained can be used as the input to a second similar spectrum analyzer. The output of this analyzer is then the “cepstrum” or power spectrum of the logarithm spectrum. The cepstrum of a speech signal has a peak corresponding to the fundamental period for voiced speech but no peak for unvoiced speech. Thus, a cepstrum analyzer can function both as a pitch and as a voiced‐unvoiced detector. Cepstral pitch detection has the important advantages that it is insensitive to phase distortion, and is also resistant to additive noise and amplitude distortion of the speech signal. The method does not require the presence of the fundamental frequency in the speech signal, and will give several separate cepstral peaks if several different pitch periods are present. Cepstral techniques appear to be even more reliable and efficient than visual methods for pitch detection. The short‐time spectrum and cepstrum analyzers described in this paper were simulated by a sampled‐data system on an IBM‐7090 digital computer. The simulation was programmed with the assistance of a special block‐diagram compiler.

219 citations
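
The cepstral pitch and voicing detection described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's analyzer: it uses the real cepstrum (inverse FFT of the log magnitude spectrum) rather than the power spectrum of the log spectrum, and the function name `cepstral_pitch`, the 80 to 400 Hz search range, the Hamming window, and the synthetic test signals are all assumptions.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate f0 from the largest cepstral peak in a plausible pitch range.

    Returns (f0_in_hz, peak_height). The peak height can serve as a rough
    voicing cue: voiced frames tend to give a much larger peak.
    """
    windowed = frame * np.hamming(len(frame))
    # Log magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag)
    # Quefrency (in samples) bounds for the assumed pitch range.
    q_lo = int(fs / fmax)
    q_hi = int(fs / fmin)
    peak_q = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / peak_q, cepstrum[peak_q]

# Synthetic "voiced" frame: 30 harmonics of 125 Hz plus weak noise.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(1024) / fs
voiced = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in range(1, 31))
voiced = voiced + 0.01 * rng.standard_normal(t.size)
# Synthetic "unvoiced" frame: white noise only.
unvoiced = rng.standard_normal(t.size)

f0_voiced, peak_voiced = cepstral_pitch(voiced, fs)
_, peak_unvoiced = cepstral_pitch(unvoiced, fs)
```

Because only the magnitude spectrum enters the computation, the estimate does not depend on the phases of the harmonics, which is consistent with the insensitivity to phase distortion noted in the abstract.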

Journal Article
TL;DR: Vocal tract normalization is shown to be a linear transformation in the cepstral domain; for three typical warping functions the transformation matrix is calculated analytically, and the matrices are diagonally dominant and can thus be approximated by quindiagonal matrices.
Abstract: Vocal tract normalization (VTN) is a widely used speaker normalization technique which reduces the effect of different lengths of the human vocal tract and results in an improved recognition accuracy of automatic speech recognition systems. We show that VTN results in a linear transformation in the cepstral domain; until now, VTN and linear cepstral transformations have been considered independent approaches to speaker normalization. We are now able to compute the Jacobian determinant of the transformation matrix, which allows the normalization of the probability distributions used in speaker normalization for automatic speech recognition. We show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). Consequently, we can explain previous experimental results showing that the improvements obtained by VTN and subsequent MLLR are not additive in some cases. For three typical warping functions the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and thus can be approximated by quindiagonal matrices.

217 citations
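
The role of the Jacobian determinant in the abstract above can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's derivation: the matrix `A` is a hypothetical diagonally dominant quindiagonal matrix standing in for a VTN warping matrix, and the Gaussian parameters are random. It shows that transforming the feature vector and adding log|det A| gives the same log-density as leaving the feature untouched and transforming the Gaussian's mean and covariance instead, which is the sense in which a linear feature transform can be read as a model-side transform.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, written with NumPy only."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def banded_transform(dim, bandwidth=2, decay=0.2):
    """Hypothetical quindiagonal, diagonally dominant matrix (not a real VTN matrix)."""
    A = np.zeros((dim, dim))
    for i in range(dim):
        for j in range(max(0, i - bandwidth), min(dim, i + bandwidth + 1)):
            A[i, j] = decay ** abs(i - j)
    return A

rng = np.random.default_rng(0)
dim = 6
A = banded_transform(dim)
mean = rng.normal(size=dim)
B = rng.normal(size=(dim, dim))
cov = B @ B.T + dim * np.eye(dim)   # symmetric positive definite
c = rng.normal(size=dim)            # an arbitrary "cepstral" vector

# Feature-side view: transform the vector, then add the log-Jacobian.
_, log_abs_det = np.linalg.slogdet(A)
feature_side = gaussian_logpdf(A @ c, mean, cov) + log_abs_det

# Model-side view: keep the vector, transform the Gaussian parameters.
A_inv = np.linalg.inv(A)
model_side = gaussian_logpdf(c, A_inv @ mean, A_inv @ cov @ A_inv.T)
```

The two values agree to numerical precision. Dropping the `log_abs_det` term would leave the transformed density unnormalized, which is why the determinant matters when likelihoods are compared across different warping factors.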

Book
01 Apr 1983
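TL;DR: This book surveys pitch determination algorithms (PDAs) and instruments for speech signals, covering time-domain methods, short-term analysis methods (autocorrelation, AMDF, cepstrum, frequency-domain, and maximum-likelihood), voicing determination, and the calibration and performance evaluation of PDAs.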
Abstract: 1. Introduction.- 1.1 Voice Source Parameter Measurement and the Speech Signal.- 1.2 A Short Look at the Areas of Application.- 1.3 Organization of the Book.
2. Basic Terminology. A Short Introduction to Digital Signal Processing.- 2.1 The Simplified Model of Speech Excitation.- 2.2 Digital Signal Processing 1: Signal Representation.- 2.3 Digital Signal Processing 2: Filters.- 2.4 Time-Variant Systems. The Principle of Short-Term Analysis.- 2.5 Definition of the Task. The Linear Model of Speech Production.- 2.6 A First Categorization of Pitch Determination Algorithms (PDAs).
3. The Human Voice Source.- 3.1 Mechanism of Sound Generation at the Larynx.- 3.2 Operational Modes of the Larynx. Registers.- 3.3 The Glottal Source (Excitation) Signal.- 3.4 The Influence of the Vocal Tract Upon Voice Source Parameters.- 3.5 The Voiceless and the Transient Sources.
4. Measuring Range, Accuracy, Pitch Perception.- 4.1 The Range of Fundamental Frequency.- 4.2 Pitch Perception. Toward a Redefinition of the Task.- 4.2.1 Pitch Perception: Spectral and Virtual Pitch.- 4.2.2 Toward a Redefinition of the Task.- 4.2.3 Difference Limens for Fundamental-Frequency Change.- 4.3 Measurement Accuracy.- 4.4 Representation of the Pitch Information in the Signal.- 4.5 Calibration and Performance Evaluation of a PDA.
5. Manual and Instrumental Pitch Determination, Voicing Determination.- 5.1 Manual Pitch Determination.- 5.1.1 Time-Domain Manual Pitch Determination.- 5.1.2 Frequency-Domain Manual Pitch Determination.- 5.2 Pitch Determination Instruments (PDIs).- 5.2.1 Clinical Methods for Larynx Inspection.- 5.2.2 Mechanic PDIs.- 5.2.3 Electric PDIs.- 5.2.4 Ultrasonic PDIs.- 5.2.5 Photoelectric PDIs (Transillumination of the Glottis).- 5.2.6 Comparative Evaluation of PDIs.- 5.3 Voicing Determination - Selected Examples.- 5.3.1 Voicing Determination: Parameters.- 5.3.2 Voicing Determination - Simple Voicing Determination Algorithms (VDAs) Combined VDA-PDA Systems.- 5.3.3 Multiparameter VDAs. Voicing Determination by Means of Pattern Recognition Methods.- 5.3.4 Summary and Conclusions.
6. Time-Domain Pitch Determination.- 6.1 Pitch Determination by Fundamental-Harmonic Extraction.- 6.1.1 The Basic Extractor.- 6.1.2 The Simplest Pitch Determination Device - Low-Pass Filter and Zero (or Threshold) Crossings Analysis Basic Extractor.- 6.1.3 Enhancement of the First Harmonic by Nonlinear Means.- 6.1.4 Manual Preset and Tunable (Adaptive) Filters.- 6.2 The Other Extreme - Temporal Structure Analysis.- 6.2.1 Envelope Modeling - the Analog Approach.- 6.2.2 Simple Peak Detector and Global Correction.- 6.2.3 Zero Crossings and Excursion Cycles.- 6.2.4 Mixed-Feature Algorithms.- 6.2.5 Other PDAs That Investigate the Temporal Structure of the Signal.- 6.3 The Intermediate Device: Temporal Structure Transformation and Simplification.- 6.3.1 Temporal Structure Simplification by Inverse Filtering.- 6.3.2 The Discontinuity in the Excitation Signal: Event Detection.- 6.4 Parallel Processing in Fundamental Period Determination. Multichannel PDAs.- 6.4.1 PDAs with Multichannel Preprocessor Filters.- 6.4.2 PDAs with Several Channels Applying Different Extraction Principles.- 6.5 Special-Purpose (High-Accuracy) Time-Domain PDAs.- 6.5.1 Glottal Inverse Filtering.- 6.5.2 Determining the Instant of Glottal Closure.- 6.6 The Postprocessor.- 6.6.1 Time-to-Frequency Conversion Display.- 6.6.2 f0 Determination With Basic Extractor Omitted.- 6.6.3 Global Error Correction Routines.- 6.6.4 Smoothing Pitch Contours.- 6.7 Final Comments.
7. Design and Implementation of a Time-Domain PDA for Undistorted and Band-Limited Signals.- 7.1 The Linear Algorithm.- 7.1.1 Prefiltering.- 7.1.2 Measurement and Suppression of F1.- 7.1.3 The Basic Extractor.- 7.1.4 Problems with the Formant F2. Implementation of a Multiple Two-Pulse Filter (TPF).- 7.1.5 Phase Relations and Starting Point of the Period.- 7.1.6 Performance of the Algorithm with Respect to Linear Distortions, Especially to Band Limitations.- 7.2 Band-Limited Signals in Time-Domain PDAs.- 7.2.1 Concept of the Universal PDA.- 7.2.2 Once More: Use of Nonlinear Distortion in Time-Domain PDAs.- 7.3 An Experimental Study Towards a Universal Time-Domain PDA Applying a Nonlinear Function and a Threshold Analysis Basic Extractor.- 7.3.1 Setup of the Experiment.- 7.3.2 Relative Amplitude and Enhancement of First Harmonic.- 7.4 Toward a Choice of Optimal Nonlinear Functions.- 7.4.1 Selection with Respect to Phase Distortions.- 7.4.2 Selection with Respect to Amplitude Characteristics.- 7.4.3 Selection with Respect to the Sequence of Processing.- 7.5 Implementation of a Three-Channel PDA with Nonlinear Processing.- 7.5.1 Selection of Nonlinear Functions.- 7.5.2 Determination of the Parameter for the Comb Filter.- 7.5.3 Threshold Function in the Basic Extractor.- 7.5.4 Selection of the Most Likely Channel in the Basic Extractor.
8. Short-Term Analysis Pitch Determination.- 8.1 The Short-Term Transformation and Its Consequences.- 8.2 Autocorrelation Pitch Determination.- 8.2.1 The Autocorrelation Function and Its Relation to the Power Spectrum.- 8.2.2 Analog Realizations.- 8.2.3 "Ordinary" Autocorrelation PDAs.- 8.2.4 Autocorrelation PDAs with Nonlinear Preprocessing.- 8.2.5 Autocorrelation PDAs with Linear Adaptive Preprocessing.- 8.3 "Anticorrelation" Pitch Determination: Average Magnitude Difference Function, Distance and Dissimilarity Measures, and Other Nonstationary Short-Term Analysis PDAs.- 8.3.1 Average Magnitude Difference Function (AMDF).- 8.3.2 Generalized Distance Functions.- 8.3.3 Nonstationary Short-Term Analysis and Incremental Time-Domain PDAs.- 8.4 Multiple Spectral Transform ("Cepstrum") Pitch Determination.- 8.4.1 The More General Aspect: Deconvolution.- 8.4.2 Cepstrum Pitch Determination.- 8.5 Frequency-Domain PDAs.- 8.5.1 Spectral Compression: Frequency and Period Histogram Product Spectrum.- 8.5.2 Harmonic Matching. Psychoacoustic PDAs.- 8.5.3 Determination of f0 from the Distance of Adjacent Spectral Peaks.- 8.5.4 The Fast Fourier Transform, Spectral Resolution, and the Computing Effort.- 8.6 Maximum-Likelihood (Least-Squares) Pitch Determination.- 8.6.1 The Least-Squares Algorithm.- 8.6.2 A Multichannel Solution.- 8.6.3 Computing Complexity, Relation to Comb Filters, Simplified Realizations.- 8.7 Summary and Conclusions.
9. General Discussion: Summary, Error Analysis, Applications.- 9.1 A Short Survey of the Principal Methods of Pitch Determination.- 9.1.1 Categorization of PDAs and Definitions of Pitch.- 9.1.2 The Basic Extractor.- 9.1.3 The Postprocessor.- 9.1.4 Methods of Preprocessing.- 9.1.5 The Impact of Technology of the Design of PDAs and the Question of Computing Effort.- 9.2 Calibration, Search for Standards.- 9.2.1 Data Acquisition.- 9.2.2 Creating the Standard Pitch Contour Manually, Automatically, and by an Interactive PDA.- 9.2.3 Creating a Standard Contour by Means of a PDI.- 9.3 Performance Evaluation of PDAs.- 9.3.1 Comparative Performance Evaluation of PDAs: Some Examples from the Literature.- 9.3.2 Methods of Error Analysis.- 9.4 A Closer Look at the Applications.- 9.4.1 Has the Problem Been Solved?.- 9.4.2 Application in Phonetics, Linguistics, and Musicology.- 9.4.3 Application in Education and in Pathology.- 9.4.4 The "Technical" Application: Speech Communication.- 9.4.5 A Way Around the Problem in Speech Communication: Voice-Excited and Residual-Excited Vocoding (Baseband Coding).- 9.5 Possible Paths Towards a General Solution.
Appendix A. Experimental Data on the Behavior of Nonlinear Functions in Time-Domain Pitch Determination Algorithms.- A.1 The Data Base of the Investigation.- A.2 Examples for the Behavior of the Nonlinear Functions.- A.3 Relative Amplitude RA1 and Enhancement RE1 of the First Harmonic.- A.4 Relative Amplitude RASM of Spurious Maximum and Autocorrelation Threshold.- A.5 Processing Sequence, Preemphasis, Phase, Band Limitation.- A.6 Optimal Performance of Nonlinear Functions.- A.7 Performance of the Comb Filters.
Appendix B. Original Text of the Quotations in Foreign Languages Throughout This Book.
List of Abbreviations.
Author and Subject Index.

212 citations

Journal Article
TL;DR: It is demonstrated, by means of extensive simulations, that the proposed tricepstrum-based equalization scheme performs well and outperforms other existing blind equalizers, at the expense of higher computational complexity.
Abstract: An adaptive blind equalization method is introduced for nonminimum phase communication channels. The method estimates the inverse channel impulse response, by using the complex cepstrum of the fourth-order cumulants (tricepstrum) of the synchronously sampled received signal. As such, the proposed adaptive method depends only on the statistics of the received sequence, and is capable of reconstructing separately both the minimum and maximum phase response of the channel. It is demonstrated, by means of extensive simulations, that the proposed tricepstrum-based equalization scheme performs well and outperforms other existing blind equalizers, at the expense of higher computational complexity.

211 citations
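
The separation of minimum- and maximum-phase responses mentioned in the abstract above rests on a property of the complex cepstrum that can be shown on a toy channel. The sketch below is not the tricepstrum method: it applies the ordinary complex cepstrum to a known, noise-free channel rather than to fourth-order cumulants of a received signal. The channel H(z) = (1 - a z^-1)(1 - b z) with a = 0.5 and b = 0.4 is an assumption chosen so that the phase never wraps and no unwrapping step is needed.

```python
import numpy as np

a, b = 0.5, 0.4      # assumed zero locations: a inside, 1/b outside the unit circle
n_fft = 1024

# Impulse response of (1 - a z^-1)(1 - b z), stored circularly so that
# index -1 holds the anticausal tap.
h = np.zeros(n_fft)
h[0] = 1 + a * b
h[1] = -a
h[-1] = -b

# Complex cepstrum: inverse FFT of the complex log spectrum.
c_hat = np.fft.ifft(np.log(np.fft.fft(h))).real

# Positive quefrencies carry the minimum-phase factor,
# negative quefrencies (upper half of the buffer) the maximum-phase factor.
half = n_fft // 2
c_min = np.zeros(n_fft)
c_max = np.zeros(n_fft)
c_min[1:half] = c_hat[1:half]
c_max[half + 1:] = c_hat[half + 1:]

# Map each part back to an impulse response.
h_min = np.fft.ifft(np.exp(np.fft.fft(c_min))).real   # approximately 1 - a z^-1
h_max = np.fft.ifft(np.exp(np.fft.fft(c_max))).real   # approximately 1 - b z
```

For this channel the cepstrum has the closed form -a^n/n for n > 0 and -b^|n|/|n| for n < 0, so `c_hat[1]` is close to -0.5 and `c_hat[-1]` is close to -0.4. A general channel would additionally need phase unwrapping and removal of any linear-phase term before the logarithm.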

Proceedings Article
19 Oct 1993
TL;DR: It is shown that a cepstral-based algorithm exhibits a high degree of independence from background noise levels, and that successful speech end-pointing can be achieved by thresholding cepstral distance measures.
Abstract: This paper reviews algorithms which rely on the analysis of time domain samples to provide energy and zero-crossing rates, together with more recent algorithms that use different methods for speech detection. We then examine a different approach using cepstral analysis, showing a high degree of amplitude and noise level independence. We show that a cepstral based algorithm exhibits a high degree of independence to levels of background noise and successful speech end-pointing can be achieved via thresholding cepstral distance measures. Through the use of a noise code-book we are able to provide a successful reference for Euclidean distance measures in the voice detection algorithm.

208 citations
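
Thresholding a cepstral distance, as described in the abstract above, can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the noise code-book is replaced by a single reference vector (the mean cepstrum of the first few frames, assumed to contain noise only), and the function names, frame sizes, number of coefficients, and threshold rule are all assumptions.

```python
import numpy as np

def frame_cepstra(x, frame_len=256, hop=128, n_ceps=12):
    """Low-order real cepstral coefficients c[1..n_ceps] for each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    ceps = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
        # c[0] reflects overall level and is skipped, so the distance
        # depends on spectral shape rather than amplitude.
        ceps[i] = np.fft.irfft(log_mag)[1:n_ceps + 1]
    return ceps

def detect_speech(x, n_noise_frames=10, k=4.0):
    """Flag frames whose Euclidean cepstral distance to the noise reference is large."""
    ceps = frame_cepstra(x)
    noise_ref = ceps[:n_noise_frames].mean(axis=0)   # stand-in for a noise code-book
    dist = np.linalg.norm(ceps - noise_ref, axis=1)
    lead = dist[:n_noise_frames]
    threshold = lead.mean() + k * lead.std()
    return dist > threshold, dist, threshold

# Synthetic test: 0.5 s noise, 0.5 s harmonic "speech" plus noise, 0.5 s noise.
fs = 8000
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(12000)
t = np.arange(4000) / fs
x[4000:8000] += sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 26))

flags, dist, threshold = detect_speech(x)
```

In this setup frames 32 to 60 lie entirely inside the synthetic speech segment, and the first ten frames form the noise reference. Scaling the whole signal by a constant leaves the flags essentially unchanged, since only c[0] carries the overall level.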


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Robustness (computer science): 94.7K papers, 1.6M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Deep learning: 79.8K papers, 2.1M citations, 79% related
Support vector machine: 73.6K papers, 1.7M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130