Open Access · Journal Article (DOI)

DNN-Based Cepstral Excitation Manipulation for Speech Enhancement

TLDR
The new approach exceeds the performance of a formerly introduced classical signal processing-based cepstral excitation manipulation (CEM) method in terms of noise attenuation by about 1.5 dB and shows that this gain also holds true when comparing serial combinations of envelope and excitation enhancement.
Abstract
This contribution aims at speech model-based speech enhancement by exploiting the source-filter model of human speech production. The proposed method enhances the excitation signal in the cepstral domain by making use of a deep neural network (DNN). We investigate two types of target representations along with the significant effects of their normalization. The new approach exceeds the performance of a formerly introduced classical signal processing-based cepstral excitation manipulation (CEM) method in terms of noise attenuation by about 1.5 dB. We show that this gain also holds true when comparing serial combinations of envelope and excitation enhancement. In the important low-SNR conditions, no significant trade-off for speech component quality or speech intelligibility is induced, while allowing for substantially higher noise attenuation. In total, a traditional purely statistical state-of-the-art speech enhancement system is outperformed by more than 3 dB noise attenuation.
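The abstract builds on the source-filter model: in the cepstral domain, low-quefrency coefficients capture the vocal-tract envelope (filter) while high-quefrency coefficients carry the excitation (source), which is the part a CEM system manipulates. A minimal NumPy sketch of that split is shown below; it is illustrative only, not the paper's pipeline, and the frame length and lifter cutoff are assumptions.

```python
import numpy as np

def cepstral_split(frame, n_lifter=30):
    """Split a speech frame into envelope and excitation cepstra
    via the real cepstrum (illustrative sketch; n_lifter is an
    assumed low-quefrency cutoff, not a value from the paper)."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    # Low-quefrency bins model the spectral envelope (vocal tract);
    # the remainder is the excitation part that CEM would enhance.
    envelope_cep = np.zeros_like(cepstrum)
    envelope_cep[:n_lifter] = cepstrum[:n_lifter]
    excitation_cep = cepstrum - envelope_cep
    return envelope_cep, excitation_cep
```

A DNN-based CEM approach, as summarized above, would then map noisy excitation cepstra toward clean ones before resynthesis, leaving the envelope to a separate enhancement stage.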


Citations
Journal ArticleDOI

Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN

TL;DR: A comparative analysis of the accuracies obtained in ASR using a classical Gaussian mixture model (GMM), a support vector machine (SVM), and a state-of-the-art 1-D CNN as classifiers; results indicate that the SVM and the 1-D CNN outperform the GMM.
Journal ArticleDOI

Multi-scale decomposition based supervised single channel deep speech enhancement

TL;DR: A nonlinear multi-scale decomposition-based deep speech enhancement method that improves the quality and intelligibility of contaminated speech by applying Hurst exponent-based Empirical Mode Decomposition (HEMD) to the noisy signal, yielding a set of intrinsic mode functions (IMFs) and a residual.
Journal ArticleDOI

Improved CEM for Speech Harmonic Enhancement in Single Channel Noise Suppression

TL;DR: In this article, the authors propose two modifications to improve the robustness and performance of CEM in low signal-to-noise ratio (SNR) conditions, resulting in better preservation of speech harmonics, a more refined fine structure, and higher inter-harmonic noise suppression.
Posted Content

Robust Acoustic Scene Classification in the Presence of Active Foreground Speech

TL;DR: In this article, an iVector-based acoustic scene classification (ASC) system is proposed for real-life settings where active foreground speech can be present; each recording is represented by a fixed-length iVector that models the recording's important properties.
Proceedings ArticleDOI

Improvement of Speech Residuals for Speech Enhancement

TL;DR: A deep neural network is used to enhance residual signals in the cepstral domain, thereby exceeding a former cepstral excitation manipulation approach in several respects and providing higher speech component quality in low-SNR conditions.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
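The Adam update described above combines adaptive per-parameter learning rates with bias-corrected estimates of the first and second moments of the gradient. A minimal single-step sketch (standard Kingma & Ba formulation; hyperparameter values are the usual defaults, not values from the DNN-CEM paper):

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; state holds (m, v, t) for the moment estimates."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad        # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2   # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias corrections
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```

Because the step is normalized by the second-moment estimate, its magnitude is roughly bounded by `lr`, which is what makes Adam robust to gradient scaling.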
Book

Neural networks for pattern recognition

TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Book ChapterDOI

Neural Networks for Pattern Recognition

TL;DR: The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue.
Proceedings Article

Understanding the difficulty of training deep feedforward neural networks

TL;DR: The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
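The reference above motivated the now-standard "Glorot/Xavier" initialization, which scales random weights so activation and gradient variances stay roughly constant across layers. A minimal sketch of the uniform variant (the function name and seed handling are illustrative):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    """Glorot/Xavier uniform initialization: draw weights from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    giving variance 2 / (fan_in + fan_out)."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

The variance of U(-L, L) is L²/3, so this choice yields Var(W) = 2/(fan_in + fan_out), the condition derived in the paper.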