Open Access, Posted Content

Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques

TLDR
This paper presents the viability of MFCC for feature extraction and DTW for comparing test patterns, and explains why alignment is important for better performance.
Abstract
Digital processing of speech signals and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information. Direct analysis and synthesis of the complex voice signal is difficult because of the large amount of information contained in the signal. Therefore, digital signal processes such as Feature Extraction and Feature Matching are introduced to represent the voice signal. Several methods, such as Linear Predictive Coding (LPC), Hidden Markov Model (HMM), and Artificial Neural Network (ANN), are evaluated with a view to identifying a straightforward and effective method for the voice signal. The extraction and matching process is implemented right after the pre-processing or filtering of the signal is performed. Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are utilized as the extraction technique. The non-linear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature matching technique. Since voice signals tend to have different temporal rates, the alignment is important for better performance. This paper presents the viability of MFCC to extract features and DTW to compare the test patterns.
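
The matching step described in the abstract can be illustrated with a minimal sketch of classic DTW. This is not the paper's implementation: the function names `euclidean` and `dtw_distance`, the unweighted symmetric step pattern, and the toy one-dimensional "feature vectors" (standing in for per-frame MFCC vectors) are all assumptions made for illustration. No slope constraint or adjustment window is applied here.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(a, b, dist=euclidean):
    """Accumulated cost of the best monotonic alignment of sequences a and b.

    a and b are lists of feature vectors (one vector per frame). Each step
    may advance in a, in b, or in both, so frames can be stretched in time.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat frame b[j-1]
                cost[i][j - 1],      # repeat frame a[i-1]
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


# Toy data: `slow` is the same contour as `template`, spoken more slowly.
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
other = [[2.0], [2.0], [2.0]]

print(dtw_distance(template, slow))   # time-stretched copy aligns at zero cost
print(dtw_distance(template, other))  # a different contour costs more
```

In this toy case the time-stretched sequence aligns with the template at zero cost even though the two differ in length, which is the reason alignment matters when speaking rates vary. The quadratic table costs O(nm) time and memory.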

Citations
Journal Article

Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring

TL;DR: It is shown that terrestrial and marine PAM applications are advancing rapidly, driven by emerging sensor hardware, the application of machine learning innovations to automated wildlife call identification, and work towards developing acoustic biodiversity indicators.
Proceedings Article

Commandersong: a systematic approach for practical adversarial voice recognition

TL;DR: Novel techniques are developed that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener.

Cocaine noodles: exploiting the gap between human and machine speech recognition

TL;DR: It is found that differences in how humans and machines understand spoken speech can be easily exploited by an adversary to produce sound which is intelligible as a command to a computer speech recognition system but is not easily understandable by humans.
Journal Article

De-identification for privacy protection in multimedia content

TL;DR: The study provides an overview of de-identification approaches for non-biometric identifiers (text, hairstyle, dressing style, license plates), as well as for the physiological, behavioural and soft biometric identifiers in multimedia documents.
Journal Article

Indoor Localization Improved by Spatial Context—A Survey

TL;DR: This survey gives a comprehensive review of state-of-the-art indoor localization methods and localization improvement methods using maps, spatial models, and landmarks.
References
Journal Article

Dynamic programming algorithm optimization for spoken word recognition

TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.
Proceedings Article

Word image matching using dynamic time warping

TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.

FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space

TL;DR: This paper introduces FastDTW, an approximation of DTW with linear time and space complexity. It uses a multilevel approach that recursively projects a solution from a coarse resolution and refines the projected solution.
Journal Article

Experiments with a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and the projection, for robust speech recognition in cars

TL;DR: The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement and the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS.
Book

Signal and Linear System Analysis

TL;DR: Preliminary concepts: Signal and System Characteristics and Models; Convolution; Continuous-Time Signals and Systems; Continuous-Time Signals; Continuous-Time Signal Spectra; Time-Domain Analysis of Discrete-Time Systems; Spectral Analysis of Continuous-Time Systems; Analysis of Continuous-Time Series Using the Laplace Transform; Continuous-Time Filters; State Variable Concepts for Discrete-Time Linear Systems. Discrete-Time Signals and Systems: Discrete-Time Signals; Discrete-Time Signal Spectra; Time-Domain Analysis of DTLS; Spectral Analysis of Discrete-Time Systems.