Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques


- 22 Mar 2010 -

TLDR

This paper presents the viability of MFCC for feature extraction and DTW for comparing test patterns, and explains why alignment is important for better performance.

Abstract:

Digital processing of speech signals and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information. Direct analysis and synthesis of the complex voice signal is difficult because of the large amount of information contained in the signal. Therefore, digital signal processes such as Feature Extraction and Feature Matching are introduced to represent the voice signal. Several methods such as Linear Predictive Coding (LPC), Hidden Markov Model (HMM), Artificial Neural Network (ANN), etc. are evaluated with a view to identifying a straightforward and effective method for the voice signal. The extraction and matching process is implemented right after the pre-processing or filtering of the signal is performed. Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are utilized as the extraction technique. The non-linear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature matching technique. Since the voice signal tends to have a different temporal rate, the alignment is important to produce better performance. This paper presents the viability of MFCC to extract features and DTW to compare the test patterns.
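
The feature matching step described in the abstract can be illustrated with a minimal sketch of classic DTW. It assumes the MFCC frames have already been extracted and are passed in as lists of equal-length float vectors. The function name `dtw_distance`, the Euclidean local cost, and the unconstrained three-way step pattern are illustrative assumptions, not the paper's implementation.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    `a` and `b` are lists of frames; each frame is a list of floats
    (for example, one MFCC vector per frame). Hypothetical helper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between the two frames (an assumption).
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy one-dimensional "features": the same contour spoken at two different rates.
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
print(dtw_distance(template, slow))
```

In this toy case the slower sequence aligns to the template at zero cost despite its different length, which is the temporal-rate tolerance the abstract refers to. The full table costs O(n·m) time and memory, so a practical system would likely add a warping constraint.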

Citations


Journal Article

Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring

Rory Gibb, +4 more

- 01 Feb 2019 -

Methods in Ecology and Evolution

TL;DR: It is shown that terrestrial and marine PAM applications are advancing rapidly, driven by emerging sensor hardware, the application of machine learning innovations to automated wildlife call identification, and work towards developing acoustic biodiversity indicators.


Proceedings Article

Commandersong: a systematic approach for practical adversarial voice recognition

Xuejing Yuan, +9 more

TL;DR: Novel techniques are developed that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener.


Cocaine noodles: exploiting the gap between human and machine speech recognition

Tavish Vaidya, +3 more

TL;DR: It is found that differences in how humans and machines understand spoken speech can be easily exploited by an adversary to produce sound which is intelligible as a command to a computer speech recognition system but is not easily understandable by humans.


Journal Article

De-identification for privacy protection in multimedia content

Slobodan Ribarić, +2 more

- 01 Sep 2016 -

Signal Processing-image Communication

TL;DR: The study provides an overview of de-identification approaches for non-biometric identifiers (text, hairstyle, dressing style, license plates), as well as for the physiological, behavioural and soft biometric identifiers in multimedia documents.


Journal Article

Indoor Localization Improved by Spatial Context—A Survey

Fuqiang Gu, +6 more

- 03 Jul 2019 -

ACM Computing Surveys

TL;DR: This survey gives a comprehensive review of state-of-the-art indoor localization methods and localization improvement methods using maps, spatial models, and landmarks.


References


Journal Article

Dynamic programming algorithm optimization for spoken word recognition

H. Sakoe, +1 more

- 01 Feb 1978 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.


Proceedings Article

Word image matching using dynamic time warping

Toni M. Rath, +1 more

TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.


FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space

Stan Salvador, +1 more

TL;DR: This paper introduces FastDTW, an approximation of DTW with linear time and space complexity, which uses a multilevel approach that recursively projects a solution from a coarse resolution and refines the projected solution.


Journal Article

Experiments with a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and the projection, for robust speech recognition in cars

Philip Lockwood, +1 more

TL;DR: The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement and the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS.


Book

Signal and Linear System Analysis

Gordon E. Carlson

TL;DR: Preliminary concepts: signal and system characteristics and models; convolution; continuous-time signals and systems; continuous-time signals; continuous-time signal spectra; time-domain analysis of discrete-time systems; spectral analysis of continuous-time systems; analysis of continuous-time series using the Laplace transform; continuous-time filters; state variable concepts for discrete-time linear systems; discrete-time signals and systems; discrete-time signals; discrete-time signal spectra; time-domain analysis of DTLS; spectral analysis of discrete-time systems.


Related Papers (5)

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

- 01 Aug 1980 -

IEEE Transactions on Acoustics, Speech, ...

Dynamic programming algorithm optimization for spoken word recognition

H. Sakoe, +1 more

- 01 Feb 1978 -

IEEE Transactions on Acoustics, Speech, ...

IEEE Transactions on Speech and Audio Pr...

Fundamentals of speech recognition

A tutorial on hidden Markov models and selected applications in speech recognition

Robust text-independent speaker identification using Gaussian mixture speaker models