Open Access, Posted Content
Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques
TLDR
This paper presents the viability of MFCC for extracting features and DTW for comparing test patterns, and explains why alignment is important for better performance.
Abstract:
Digital processing of speech signals and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information. Direct analysis and synthesis of the complex voice signal is difficult because of the large amount of information contained in the signal. Therefore, digital signal processes such as feature extraction and feature matching are introduced to represent the voice signal. Several methods, such as Linear Predictive Coding (LPC), Hidden Markov Models (HMM), and Artificial Neural Networks (ANN), are evaluated with a view to identifying a straightforward and effective method for voice signals. The extraction and matching process is implemented right after pre-processing (filtering) of the signal is performed. Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are utilized as the extraction technique. The nonlinear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature matching technique. Since voice signals tend to have different temporal rates, alignment is important for better performance. This paper presents the viability of MFCC for extracting features and DTW for comparing test patterns.
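The matching step described in the abstract can be illustrated with a minimal sketch of classic DTW. This is not the paper's implementation: the function names `euclidean` and `dtw_distance`, the Euclidean local distance, the unconstrained step pattern, and the toy one-dimensional "feature" sequences are all assumptions made for illustration. In practice each frame would be an MFCC vector rather than a single number.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Cumulative cost of the best nonlinear alignment of two sequences.

    Each sequence is a list of feature vectors (hypothetical stand-ins
    for per-frame MFCC vectors). No window or slope constraint is applied.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning
    # the first i frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a frame of seq_b
                cost[i][j - 1],      # repeat a frame of seq_a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


# Toy data: "slow" is the template spoken at half speed,
# "other" is a different pattern of the same length as the template.
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [1.0], [0.0], [0.0]]
other = [[2.0], [2.0], [2.0], [2.0], [2.0]]

same_word_cost = dtw_distance(template, slow)
different_word_cost = dtw_distance(template, other)
```

In this toy case the half-speed utterance aligns to the template at zero cost, while the different pattern does not. That is the sense in which alignment compensates for differing temporal rates before patterns are compared.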
Citations
Journal Article
Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring
TL;DR: It is shown that terrestrial and marine PAM applications are advancing rapidly, driven by emerging sensor hardware, the application of machine learning innovations to automated wildlife call identification, and work towards developing acoustic biodiversity indicators.
Proceedings Article
Commandersong: a systematic approach for practical adversarial voice recognition
Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, XiaoFeng Wang, Carl A. Gunter
TL;DR: Novel techniques are developed that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener.
Cocaine noodles: exploiting the gap between human and machine speech recognition
TL;DR: It is found that differences in how humans and machines understand spoken speech can be easily exploited by an adversary to produce sound which is intelligible as a command to a computer speech recognition system but is not easily understandable by humans.
Journal Article
De-identification for privacy protection in multimedia content
TL;DR: The study provides an overview of de-identification approaches for non-biometric identifiers (text, hairstyle, dressing style, license plates), as well as for the physiological, behavioural and soft biometric identifiers in multimedia documents.
Journal Article
Indoor Localization Improved by Spatial Context—A Survey
Fuqiang Gu, Xuke Hu, Milad Ramezani, Debaditya Acharya, Kourosh Khoshelham, Shahrokh Valaee, Jianga Shang
TL;DR: This survey gives a comprehensive review of state-of-the-art indoor localization methods and localization improvement methods using maps, spatial models, and landmarks.
References
Journal Article
Dynamic programming algorithm optimization for spoken word recognition
TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.
Proceedings Article
Word image matching using dynamic time warping
Toni M. Rath, R. Manmatha
TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.
FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space
Stan Salvador, Philip K. Chan
TL;DR: This paper introduces FastDTW, an approximation of DTW with linear time and space complexity. It uses a multilevel approach that recursively projects a solution from a coarse resolution and refines the projected solution.
Journal Article
Experiments with a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and the projection, for robust speech recognition in cars
Philip Lockwood, J. Boudy
TL;DR: The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement and the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS.
Book
Signal and Linear System Analysis
TL;DR: Preliminary Concepts: Signal and System Characteristics and Models; Convolution. Continuous-Time Signals and Systems: Continuous-Time Signals; Continuous-Time Signal Spectra; Time-Domain Analysis of Discrete-Time Systems; Spectral Analysis of Continuous-Time Systems; Analysis of Continuous-Time Series Using the Laplace Transform; Continuous-Time Filters; State Variable Concepts for Discrete-Time Linear Systems. Discrete-Time Signals and Systems: Discrete-Time Signals; Discrete-Time Signal Spectra; Time-Domain Analysis of DTLS; Spectral Analysis of Discrete-Time Systems.
Related Papers (5)
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis, Paul Mermelstein