Topic

Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3,346 publications have been published on this topic, receiving 55,742 citations.

Papers

Proceedings Article

Ideal ratio mask estimation using deep neural networks for robust speech recognition

Arun Narayanan¹, DeLiang Wang¹

Ohio State University¹

26 May 2013

TL;DR: The proposed feature enhancement algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask.

Abstract: We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the multi-condition training data. In terms of instantaneous SNR estimation performance, the proposed system obtains a mean absolute error of less than 4 dB in most frequency channels.
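
The masking step described in this abstract can be sketched as follows. This is a minimal illustration for a single frame, assuming one common definition of the ratio mask (clean energy divided by clean-plus-noise energy) computed from known clean and noise energies. The paper's DNN estimator, its smoothing, and its exact mask definition are not reproduced; the function names and the toy energies are hypothetical.

```python
import math

def ideal_ratio_mask(clean_energy, noise_energy, eps=1e-12):
    """Per-channel ratio mask: clean energy over clean-plus-noise energy."""
    return [s / (s + n + eps) for s, n in zip(clean_energy, noise_energy)]

def apply_mask(noisy_energy, mask):
    """Scale each noisy Mel channel by its mask value (between 0 and 1)."""
    return [m * y for m, y in zip(mask, noisy_energy)]

def cepstrum_from_mel(mel_energy, num_coeffs, floor=1e-10):
    """Unnormalized DCT-II of the log Mel energies."""
    log_e = [math.log(max(e, floor)) for e in mel_energy]
    n = len(log_e)
    return [
        sum(log_e[k] * math.cos(math.pi * q * (k + 0.5) / n) for k in range(n))
        for q in range(num_coeffs)
    ]

# Toy single-frame example with four Mel channels (values are made up).
clean = [4.0, 1.0, 0.25, 9.0]
noise = [1.0, 1.0, 0.75, 0.0]
# Assumption: clean and noise energies add, i.e. the two are uncorrelated.
noisy = [s + n for s, n in zip(clean, noise)]

mask = ideal_ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
cepstra = cepstrum_from_mel(enhanced, num_coeffs=3)
```

In a real system the mask would be predicted from noisy features, since the clean and noise energies are unknown at test time.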

557 citations

Journal Article

Blind deconvolution of spatially invariant image blurs with phase

M. Cannon¹

Los Alamos National Laboratory¹

01 Feb 1976-IEEE Transactions on Acoustics, Speech, and Signal Processing

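TL;DR: This paper estimates the frequency response of a two-dimensional spatially invariant linear system through which an image has been passed and blurred, and shows that for uniform linear camera motion and an out-of-focus lens system the power cepstrum of the image contains sufficient information to identify the blur.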

Abstract: This paper is concerned with the digital estimation of the frequency response of a two-dimensional spatially invariant linear system through which an image has been passed and blurred. For the cases of uniform linear camera motion and an out-of-focus lens system it is shown that the power cepstrum of the image contains sufficient information to identify the blur. Methods for deblurring are presented, including restoration of the density version of the image. The restoration procedure consumes only a modest amount of computation time. Results are demonstrated on images blurred in the camera.
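
As a rough illustration of why a power cepstrum can expose a degradation, the sketch below computes a 1D power cepstrum (here taken as the inverse DFT of the log power spectrum; some definitions also square the result) and uses it to find the delay of a circular echo. It is a 1D analogue chosen for brevity, not the paper's 2D image procedure; the signal, the echo parameters, and the function names are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short signals."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(
            v * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t, v in enumerate(values)
        )
        out.append(acc / n if inverse else acc)
    return out

def power_cepstrum(signal, floor=1e-12):
    """Inverse DFT of the log power spectrum (real part)."""
    spectrum = dft(signal)
    log_power = [math.log(max(abs(c) ** 2, floor)) for c in spectrum]
    return [c.real for c in dft(log_power, inverse=True)]

# Decaying source signal plus a circular echo of gain 0.5 at delay 8.
n_samples, delay, gain = 64, 8, 0.5
source = [0.6 ** t for t in range(n_samples)]
observed = [
    source[t] + gain * source[(t - delay) % n_samples]
    for t in range(n_samples)
]

cep = power_cepstrum(observed)
# Skip the lowest quefrencies, which are dominated by the source itself.
search = range(4, n_samples // 2 + 1)
peak_quefrency = max(search, key=lambda q: cep[q])
```

The echo multiplies the spectrum by a periodic ripple; the logarithm turns that product into a sum, so the ripple appears as an isolated peak at the echo delay.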

489 citations

Book

Acoustical and environmental robustness in automatic speech recognition

Alex Acero

01 May 1991

TL;DR: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN).

Abstract: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of environmental variability are introduced by the use of desk-top microphones and different training and testing conditions: additive noise and spectral tilt introduced by linear filtering. An important attribute of the novel compensation algorithms described in this thesis is that they provide joint rather than independent compensation for these two types of degradation. Acoustical compensation is applied in our algorithms as an additive correction in the cepstral domain. This allows a higher degree of integration within SPHINX, the Carnegie Mellon speech recognition system, that uses the cepstrum as its feature vector. Therefore, these algorithms can be implemented very efficiently. Processing in many of these algorithms is based on instantaneous signal-to-noise ratio (SNR), as the appropriate compensation represents a form of noise suppression at low SNRs and spectral equalization at high SNRs. The compensation vectors for additive noise and spectral transformations are estimated by minimizing the differences between speech feature vectors obtained from a "standard" training corpus of speech and feature vectors that represent the current acoustical environment. In our work this is accomplished by minimizing the distortion of vector-quantized cepstra that are produced by the feature extraction module in SPHINX. 
In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN). With CDCN, the accuracy of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded with a desk-top microphone is essentially the same as that obtained when the system is trained and tested on speech from the desk-top microphone. An algorithm for frequency normalization has also been proposed in which the parameter of the bilinear transformation that is used by the signal-processing stage to produce frequency warping is adjusted for each new speaker and acoustical environment. The optimum value of this parameter is again chosen to minimize the vector-quantization distortion between the standard environment and the current one. In preliminary studies, use of this frequency normalization produced a moderate additional decrease in the observed error rate.

474 citations

Proceedings Article

Environmental robustness in automatic speech recognition

Alejandro Acero¹, Richard M. Stern¹

Carnegie Mellon University¹

03 Apr 1990

TL;DR: Initial efforts to make Sphinx, a continuous-speech speaker-independent recognition system, robust to changes in the environment are reported, and two novel methods based on additive corrections in the cepstral domain are proposed.

Abstract: Initial efforts to make Sphinx, a continuous-speech speaker-independent recognition system, robust to changes in the environment are reported. To deal with differences in noise level and spectral tilt between close-talking and desk-top microphones, two novel methods based on additive corrections in the cepstral domain are proposed. In the first algorithm, the additive correction depends on the instantaneous SNR of the signal. In the second technique, expectation-maximization techniques are used to best match the cepstral vectors of the input utterances to the ensemble of codebook entries representing a standard acoustical ambience. Use of the algorithms dramatically improves recognition accuracy when the system is tested on a microphone other than the one on which it was trained.
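
The idea of an SNR-dependent additive correction in the cepstral domain might look roughly like the sketch below. It assumes the correction vectors are stored in a table indexed by quantized instantaneous SNR. The table values, the 5 dB bin width, and the function name are hypothetical placeholders; how the paper actually estimates its correction vectors is not shown.

```python
def snr_dependent_correction(noisy_cepstrum, snr_db, correction_table,
                             snr_min_db=0.0, snr_step_db=5.0):
    """Add the correction vector for the SNR bin that contains snr_db."""
    index = int((snr_db - snr_min_db) // snr_step_db)
    # Clamp so that very low or very high SNRs use the first or last bin.
    index = max(0, min(index, len(correction_table) - 1))
    correction = correction_table[index]
    return [z + w for z, w in zip(noisy_cepstrum, correction)]

# Placeholder table: one two-coefficient correction vector per 5 dB bin.
# In this made-up example, larger corrections apply at lower SNRs.
table = [
    [0.9, 0.3],   # 0 to 5 dB
    [0.5, 0.1],   # 5 to 10 dB
    [0.2, 0.0],   # 10 to 15 dB
    [0.0, 0.0],   # 15 dB and above
]

frame = [1.0, -0.5]
compensated = snr_dependent_correction(frame, snr_db=12.0, correction_table=table)
```

Because the correction is a per-frame vector addition, it costs very little on top of the cepstral feature extraction itself.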

461 citations

Proceedings Article

Distance measures for effective clustering of ARIMA time-series

Konstantinos Kalpakis¹, D. Gada, Vasundhara Puttagunta

University of Baltimore¹

29 Nov 2001

TL;DR: This work proposes the use of the linear predictive coding (LPC) cepstrum for clustering ARIMA time series, by using the Euclidean distance between the LPC cepstra of two time series as their dissimilarity measure.

Abstract: Much environmental and socioeconomic time-series data can be adequately modeled using autoregressive integrated moving average (ARIMA) models. We call such time series "ARIMA time series". We propose the use of the linear predictive coding (LPC) cepstrum for clustering ARIMA time series, by using the Euclidean distance between the LPC cepstra of two time series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time series. For example, just a few LPC cepstral coefficients are sufficient in order to discriminate between time series that are modeled by different ARIMA models. In fact, this approach requires fewer coefficients than traditional approaches, such as DFT (discrete Fourier transform) and DWT (discrete wavelet transform). The proposed distance measure can be used for measuring the similarity between different ARIMA models as well. We cluster ARIMA time series using the "partition around medoids" method with various similarity measures. We present experimental results demonstrating that, using the proposed measure, we achieve significantly better clusterings of ARIMA time series data as compared to clusterings obtained by using other traditional similarity measures, such as DFT, DWT, PCA (principal component analysis), etc. Experiments were performed both on simulated and real data.
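
The dissimilarity measure described here can be sketched with the standard recursion that maps autoregressive coefficients to cepstral coefficients. The sketch assumes the AR coefficients have already been fitted and uses the convention x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t; the paper's own sign convention, its handling of differencing, and its model fitting are not reproduced, and the function names are hypothetical.

```python
import math

def lpc_cepstrum(ar_coeffs, num_coeffs):
    """Cepstral coefficients c_1..c_N of the all-pole model 1 / (1 - sum a_i z^-i)."""
    p = len(ar_coeffs)
    cep = []
    for n in range(1, num_coeffs + 1):
        # Direct term is present only while n does not exceed the model order.
        value = ar_coeffs[n - 1] if n <= p else 0.0
        for m in range(max(1, n - p), n):
            value += (m / n) * cep[m - 1] * ar_coeffs[n - m - 1]
        cep.append(value)
    return cep

def cepstral_distance(ar_a, ar_b, num_coeffs=8):
    """Euclidean distance between the LPC cepstra of two AR models."""
    ca = lpc_cepstrum(ar_a, num_coeffs)
    cb = lpc_cepstrum(ar_b, num_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# For an AR(1) model with coefficient a, the cepstrum is a**n / n.
c_pos = lpc_cepstrum([0.5], 3)
same = cepstral_distance([0.5], [0.5])
different = cepstral_distance([0.5], [-0.5], num_coeffs=3)
```

Since the coefficients decay quickly for stable models, a short cepstral vector already separates models with different dynamics, which matches the abstract's remark that only a few coefficients are needed.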

445 citations

Performance

Metrics

Papers: 3,645

Citations: 60,375

No. of papers in the topic in previous years
Year	Papers
2023	86
2022	206
2021	60
2020	96
2019	135
2018	130
