Proceedings Article
Robust speech/non-speech detection using LDA applied to MFCC
Arnaud Martin, Delphine Charlet, L. Mauuary
Vol. 1, pp. 237-240
TL;DR: In this article, a method for speech/non-speech detection using a linear discriminant analysis (LDA) applied to mel frequency cepstrum coefficients (MFCC) is presented.
Abstract:
In speech recognition, speech/non-speech detection must be robust to noise. In this paper, a method for speech/non-speech detection using a linear discriminant analysis (LDA) applied to mel frequency cepstrum coefficients (MFCC) is presented. Energy is the most discriminant parameter between noise and speech, but with this single parameter the speech/non-speech detection system detects too many noise segments. The LDA applied to MFCC, together with the associated test, reduces the detection of noise segments. This new algorithm is compared to the one based on the signal-to-noise ratio (Mauuary and Monne, 1993).
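The abstract does not spell out the form of the LDA or of the associated test, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes MFCC vectors are already computed for each frame, trains a standard two-class Fisher LDA on labeled speech and noise frames, and uses the LDA score as a second stage that re-examines frames the energy detector has accepted. Decisions are made per frame rather than per segment for brevity, and the function names, the -30 dB energy threshold, and the toy 2-D "MFCC" vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(speech: Sequence[Vector], noise: Sequence[Vector],
               ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Two-class Fisher LDA: w = Sw^-1 (mu_speech - mu_noise).

    Returns w and a threshold halfway between the projected class means,
    so speech frames project above the threshold and noise frames below.
    """
    mu_s, mu_n = mean(speech), mean(noise)
    d = len(mu_s)
    # Within-class scatter; a tiny ridge keeps it invertible.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for data, mu in ((speech, mu_s), (noise, mu_n)):
        for v in data:
            diff = [v[i] - mu[i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [mu_s[i] - mu_n[i] for i in range(d)])
    threshold = 0.5 * (dot(w, mu_s) + dot(w, mu_n))
    return w, threshold


def log_energy(frame: Sequence[float]) -> float:
    """Frame log-energy in dB; a small floor avoids log(0)."""
    return 10.0 * math.log10(sum(s * s for s in frame) / len(frame) + 1e-12)


def detect(frames: Sequence[Sequence[float]], mfccs: Sequence[Vector],
           w: Vector, lda_thr: float, energy_thr_db: float) -> List[bool]:
    """Stage 1: energy threshold. Stage 2: LDA test rejects noise-like frames."""
    decisions = []
    for frame, c in zip(frames, mfccs):
        if log_energy(frame) < energy_thr_db:
            decisions.append(False)  # too quiet: non-speech
        else:
            decisions.append(dot(w, c) > lda_thr)  # loud, but speech-like?
    return decisions


if __name__ == "__main__":
    # Hypothetical labeled training vectors standing in for MFCCs.
    speech = [[2.0, 1.0], [2.5, 1.5], [3.0, 1.0], [2.5, 0.5]]
    noise = [[0.0, 1.0], [0.5, 1.5], [-0.5, 1.0], [0.0, 0.5]]
    w, thr = fisher_lda(speech, noise)
    loud, quiet = [0.5, -0.5, 0.5, -0.5], [0.001, -0.001, 0.001, -0.001]
    frames = [loud, loud, quiet]
    mfccs = [[2.4, 1.1], [0.1, 1.0], [2.4, 1.1]]  # speech-like, noise-like, speech-like
    print([log_energy(f) >= -30.0 for f in frames])  # energy only -> [True, True, False]
    print(detect(frames, mfccs, w, thr, -30.0))      # energy + LDA -> [True, False, False]
```

In the toy run, the energy threshold alone accepts the loud but noise-like second frame, and the LDA stage rejects it, mirroring the abstract's point that energy by itself lets too many noise segments through.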
Citations
Proceedings Article
Mahimahi: accurate record-and-replay for HTTP
Ravi Netravali, Anirudh Sivaraman, Somak Das, Ameesh Goyal, Keith Winstein, James Mickens, Hari Balakrishnan
TL;DR: Mahimahi is a framework to record traffic from HTTP-based applications, and later replay it under emulated network conditions, designed as a set of composable shells, providing ease-of-use and extensibility.
Proceedings Article
Wishbone: profile-based partitioning for sensornet applications
TL;DR: Wishbone is a system that takes a dataflow graph of operators and produces an optimal partitioning; the system is shown to quickly identify good trade-offs given limitations in CPU and network capacity.
Proceedings Article
Robust Speech/Non-Speech Detection using LDA applied to MFCC for Continuous Speech Recognition
TL;DR: A method for speech/non-speech detection using a linear discriminant analysis (LDA) applied to mel frequency cepstrum coefficients (MFCC) is presented, which reduces the detection of noise segments.
Patent
Voice activity detection system and method
TL;DR: A set of frames containing an input signal is received, and at least two different feature vectors are determined for each frame by applying at least one weighting factor to each feature vector.
Proceedings Article
Real-time speaker identification.
TL;DR: Vector quantization (VQ) based speaker identification is optimized by reducing the number of test vectors, through pre-quantizing the test sequence prior to matching, and the number of speakers, through pruning out unlikely speakers during the identification process.
References
Proceedings Article
A novel approach to robust speech endpoint detection in car environments
Liang-Sheng Huang, Chung-Ho Yang
TL;DR: A novel approach is proposed that finds robust features for endpoint detection in a noisy in-car environment by integrating the widely used energy and entropy into a new feature that combines the advantages of each while compensating for each other's drawbacks.
Proceedings Article
Speech/non-speech classification using multiple features for robust endpoint detection
TL;DR: A new speech/non-speech classification method is proposed that improves endpoint detection performance for speech recognition in noisy environments; the classification and regression tree (CART) technique is applied to effectively combine multiple features for the classification of each frame.
Journal Article
Towards improving ASR robustness for PSN and GSM telephone applications
Chafic Mokbel, Laurent Mauuary, Lamia Karray, Denis Jouvet, Jean Monne, Jacques Simonin, Katarina Bartkova
TL;DR: The results obtained prove that HMM adaptation and preprocessing techniques can be advantageously combined to improve Automatic Speech Recognition (ASR) robustness and show that spectral subtraction improves speech detection under noisy GSM conditions.
Journal Article
Evaluation of a statistical approach to voiced-unvoiced-silence analysis for telephone-quality speech
TL;DR: An investigation was undertaken to determine a suitable set of parameters that would provide a reliable voiced-unvoiced-silence decision across a variety of standard telephone connections, and the use of the Itakura two-pole spectral normalization was investigated to see its effect on the error scores.
Proceedings Article
Prosodic word boundary detection using statistical modeling of moraic fundamental frequency contours and its use for continuous speech recognition
Koji Iwano, Keikichi Hirose
TL;DR: A new method for prosodic word boundary detection in continuous speech was developed based on the statistical modeling of moraic transitions of fundamental frequency (F0) contours, previously proposed by the authors.