Author

A. M. Natarajan

Bio: A. M. Natarajan is an academic researcher from Bannari Amman Institute of Technology, Sathy. The author has contributed to research in topics: Syllable & Language model. The author has an h-index of 4 and has co-authored 4 publications receiving 94 citations.

Papers
Journal Article
TL;DR: Hidden Markov Model (HMM) based word and triphone acoustic models for medium and large vocabulary continuous speech recognizers for the Tamil language are attempted.
Abstract: Building a continuous speech recognizer for an Indian language like Tamil is a challenging task due to the unique inherent features of the language, such as long and short vowels, lack of aspirated stops, aspirated consonants and many instances of allophones. Stress and accent vary in spoken Tamil from region to region, but in formal read Tamil speech, stress and accent are ignored. There are three approaches to continuous speech recognition (CSR) based on the sub-word unit, viz. word, phoneme and syllable. Like other Indian languages, Tamil is syllabic in nature. Pronunciation of words and sentences is strictly governed by a set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Tamil for small and restricted tasks. However, medium and large vocabulary CSR for Tamil is relatively new and unexplored. In this paper, the authors have attempted to build Hidden Markov Model (HMM) based word and triphone acoustic models. The objective of this research is to build a small vocabulary word based and a medium vocabulary triphone based continuous speech recognizer for the Tamil language. In this experimentation, a word based Context Independent (CI) acoustic model for 371 unique words and a triphone based Context Dependent (CD) acoustic model for 1700 unique words have been built. In addition to the acoustic models, a pronunciation dictionary with 44 base phones and a trigram based statistical language model have also been built as integral components of the linguist. These recognizers give very good word accuracy for trained and test sentences read by trained and new speakers.
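
The trigram based statistical language model mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `<s>` and `</s>` padding tokens, and the toy corpus are assumptions made for illustration, and the estimate is an unsmoothed maximum-likelihood one, whereas practical recognizers normally add smoothing or backoff.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories (hypothetical helper)."""
    tri, hist = Counter(), Counter()
    for words in sentences:
        # Pad so the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            tri[history + (padded[i],)] += 1
            hist[history] += 1
    return tri, hist


def trigram_prob(tri, hist, w1, w2, w3):
    """Unsmoothed estimate of P(w3 | w1, w2); 0.0 for an unseen history."""
    history = (w1, w2)
    if hist[history] == 0:
        return 0.0
    return tri[(w1, w2, w3)] / hist[history]


# Toy corpus with placeholder tokens (not Tamil data from the paper).
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
tri, hist = train_trigram(corpus)
print(trigram_prob(tri, hist, "a", "b", "c"))
```

In this toy corpus the history `("a", "b")` occurs three times and is followed by `"c"` twice, so the estimate is 2/3.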

55 citations

Journal ArticleDOI
TL;DR: In this article, small vocabulary context independent word models and medium vocabulary context dependent phone models are developed, an algorithm based on the prosodic syllable is proposed, and two experiments are conducted.
Abstract: In automatic speech recognition, the phone has probably been the dominant sub-word unit for more than a decade. Context Dependent phone or triphone modeling accounts for contextual variations between adjacent phones, and state tying addresses modeling of triphones that are not seen during training. Recently, the syllable has been gaining momentum as a new sub-word unit. The syllable, being a larger unit than a phone, addresses the severe contextual variations between the phones within it. Therefore, it is more stable than a phone and models pronunciation variability in a systematic way. The Tamil language has challenging features like agglutination and morpho-phonology. In this paper, attempts have been made to provide solutions to these issues by using the syllable as a sub-word unit in an acoustic model. Initially, small vocabulary context independent word models and medium vocabulary context dependent phone models are developed. Subsequently, an algorithm based on the prosodic syllable is proposed and two experiments have been conducted. First, syllable based context independent models have been trained and tested. Despite the large number of syllables, this system has performed reasonably well compared to context independent word models in terms of word error rate and out of vocabulary words. In the second experiment, syllable information is integrated into conventional triphone modeling, wherein cross-syllable triphones are replaced with monophones and the number of context dependent phone models is reduced by 22.76% in untied units. In spite of the reduction in the number of models, the accuracy of the proposed system is comparable to that of the baseline triphone system.
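
The idea of replacing cross-syllable triphones with monophones can be sketched as follows. This is only one possible reading of the abstract, not the authors' algorithm: the function name, the `left-phone+right` label format, and the rule that any phone at a syllable edge becomes a monophone are all assumptions, and the phone symbols are placeholders.

```python
def syllable_aware_units(syllables):
    """Map a syllabified word to acoustic units (hypothetical sketch).

    A phone keeps a triphone label only when both of its neighbours lie in
    the same syllable; a phone at a syllable edge would need context from
    another syllable, so it is emitted as a plain monophone instead.
    """
    units = []
    for syllable in syllables:
        last = len(syllable) - 1
        for i, phone in enumerate(syllable):
            if 0 < i < last:
                units.append(f"{syllable[i - 1]}-{phone}+{syllable[i + 1]}")
            else:
                units.append(phone)
    return units


# Placeholder word made of two syllables: [k a l] [v i].
print(syllable_aware_units([["k", "a", "l"], ["v", "i"]]))
```

Under these assumptions only `a` keeps a triphone label, because it is the only phone whose left and right neighbours both fall inside its own syllable.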

28 citations

01 Jan 2009
TL;DR: A robust speech recognizer which automatically employs either MFCC or SSCH feature extraction methods based on the variance of short-term power of the input utterance is suggested.
Abstract: Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and the actual speech to be recognized is due to factors like background noise. It can cause severe degradation in the accuracy of recognizers which are based on commonly used features like mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It is well understood that all previous auditory based feature extraction methods perform extremely well in terms of robustness due to the dominant-frequency information present in them. But these methods suffer from high computational cost. Another method called sub-band spectral centroid histograms (SSCH) integrates dominant-frequency information with sub-band power information. This method is based on sub-band spectral centroids (SSC), which are closely related to spectral peaks for both clean and noisy speech. Since SSC can be computed efficiently from a short-term speech power spectrum estimate, the SSCH method is quite robust to additive background noise at a lower computational cost. It has been noted that the MFCC method outperforms the SSCH method in the case of clean speech. However, in the case of speech with additive noise, the MFCC method degrades substantially. In this paper, both MFCC and SSCH feature extraction have been implemented in Carnegie Mellon University (CMU) Sphinx 4.0 and trained and tested on the AN4 database for clean and noisy speech. Finally, a robust speech recognizer which automatically employs either MFCC or SSCH feature extraction methods based on the variance of short-term power of the input utterance is suggested.
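
A sub-band spectral centroid, the quantity on which SSCH is built, can be sketched as the power-weighted mean frequency inside each sub-band. The sketch below is a simplified illustration, not the paper's or Sphinx's implementation: the function name, the half-open band convention, and the midpoint fallback for an empty band are assumptions, and the histogram step of SSCH is not shown.

```python
def subband_centroids(power, freqs, band_edges):
    """Power-weighted mean frequency for each sub-band [lo, hi).

    power      -- short-term power spectrum estimate, one value per bin
    freqs      -- centre frequency of each bin, same length as power
    band_edges -- increasing sub-band boundaries (assumed layout)
    """
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        weighted_sum = 0.0
        total_power = 0.0
        for f, p in zip(freqs, power):
            if lo <= f < hi:
                weighted_sum += f * p
                total_power += p
        if total_power > 0:
            centroids.append(weighted_sum / total_power)
        else:
            # Assumption: fall back to the band midpoint when the band is silent.
            centroids.append((lo + hi) / 2.0)
    return centroids


# Four spectrum bins split into two sub-bands.
print(subband_centroids([1, 3, 0, 2], [0, 100, 200, 300], [0, 200, 400]))
```

Because the centroid is pulled toward whichever bins carry the most power, it tends to track spectral peaks, which is the property the abstract relies on.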

7 citations

Journal ArticleDOI
TL;DR: To add the structural component, balance the vocabulary size and meet the challenging features, lexicalized and statistical parsing (LSP) is to be employed with the assimilation and semantic coverage models.
Abstract: Parsing is an important process of Natural Language Processing (NLP) and Computational Linguistics which is used to understand the syntax and semantics of natural language sentences confined to the grammar. Parsing models need syntactic and semantic coverage for better interpretation of natural language sentences. Though statistical parsing with trigram language models gives better performance through trigram probabilities and large vocabulary size, it has some disadvantages, like lack of support for syntax, free ordering of words and long distance relationships, which are the challenging features of the Tamil language. Grammar based structural parsing provides solutions to some extent. To overcome these disadvantages, a structural component is to be involved in the statistical approach, which results in hybrid models like phrase and dependency models. To add the structural component, balance the vocabulary size and meet the challenging features, lexicalized and statistical parsing (LSP) is to be employed with the assistance of hybrid models. To incorporate all the features in complex and large sentences, the phrase structure model may not be suitable to a larger extent. When dependency relations are applied among words, direct relationships can be established. Lexicalized and statistical parsing of natural language text in the Tamil language using the dependency model will give better performance than using the phrase structure model. New part of speech (POS) and dependency tag sets for the Tamil language have been developed. A treebank has been developed with 326 sentences, comprising more than 5000 words, with manual annotation. It has been extended to 1000 sentences using bootstrapping and manual correction and used to train the dependency model. This LSP with dependency model provides better results and covers all the features of the Tamil language.

6 citations


Cited by
Proceedings ArticleDOI
01 Aug 1998
TL;DR: The authors presented a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies, which is useful for automatic speech recognition.
Abstract: The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. The model assigns probability to every joint sequence of words-binary-parse-structure with headword annotation and operates in a left-to-right manner --- therefore usable for automatic speech recognition. The model, its probabilistic parameterization, and a set of experiments meant to evaluate its predictive power are presented; an improvement over standard trigram modeling is achieved.

177 citations

Journal ArticleDOI
TL;DR: A bidirectional recurrent neural network (BRNN) with a self-organizing map (SOM)-based classification scheme is suggested for Tamil speech recognition; the experiments show that the suggested scheme achieves better results than the existing deep neural network–hidden Markov model algorithm in terms of signal-to-noise ratio, classification accuracy, and mean square error.
Abstract: Speech recognition is one of the fascinating fields in the area of computer science. The accuracy of a speech recognition system may decrease because of noise present in the speech signal. Consequently, noise removal is an essential step in an automatic speech recognition (ASR) system. ASR is researched for various languages because every language has its own particular features. In particular, the need for an ASR system for the Tamil language has increased widely over the last few years. In this work, a bidirectional recurrent neural network (BRNN) with a self-organizing map (SOM)-based classification scheme is suggested for Tamil speech recognition. First, the input speech signal is pre-processed using a Savitzky–Golay filter in order to remove the background noise and enhance the signal. Then, multivariate autoregressive based features are extracted by introducing a discrete cosine transformation step to provide an efficient signal analysis. In addition, perceptual linear predictive coefficients are also extracted to improve the classification accuracy. The feature vector varies in size, so the SOM is used to choose the right length of feature vector. Finally, Tamil digits and words are classified using the BRNN classifier, where the fixed-length feature vector from the SOM is given as input; the combined scheme is named BRNN-SOM. The experimental analysis demonstrates that the suggested scheme achieves better results than the existing deep neural network–hidden Markov model algorithm in terms of signal-to-noise ratio, classification accuracy, and mean square error.

115 citations

Journal ArticleDOI
TL;DR: The proposed VOP detection method has shown significant improvement in the performance compared to the existing method under clean as well as coded cases and is analyzed in CV recognition by using VOP as an anchor point.
Abstract: In this paper, we propose a method for detecting the vowel onset points (VOPs) for low bit rate coded speech. The VOP is the instant at which the onset of the vowel takes place in the speech signal. The VOP plays an important role in applications such as consonant-vowel (CV) unit recognition and speech rate modification. The proposed VOP detection method is based on the spectral energy present in the glottal closure region of the speech signal. The speech coders considered in this study are Global System for Mobile Communications (GSM) full rate, code-excited linear prediction (CELP), and mixed-excitation linear prediction (MELP). The TIMIT database and CV units collected from the broadcast news corpus are used for evaluation. The performance of the proposed method is compared with an existing method, which uses the combination of evidence from the excitation source, spectral peaks energy, and modulation spectrum. The proposed VOP detection method has shown significant improvement in performance compared to the existing method under clean as well as coded cases. The effectiveness of the proposed VOP detection method is analyzed in CV recognition by using the VOP as an anchor point.

72 citations

Journal ArticleDOI
TL;DR: An effort has been made to highlight the progress made so far for ASRs of different languages and the technological perspective of automatic speech recognition in countries like China, Russia, Portugal, Spain, Saudi Arabia, Vietnam, Japan, UK, Sri Lanka, Philippines, Algeria and India.
Abstract: Automatic speech recognition, which was considered to be a concept of science fiction and which has been hit by a number of performance degrading factors, is now an important part of information and communication technology. Improvements in the fundamental approaches and the development of new approaches by researchers have led to the advancement from ASRs which just responded to a set of sounds to sophisticated ASRs which respond to fluently spoken natural language. Using artificial neural networks (ANNs), mathematical models of the low-level circuits in the human brain, to improve speech-recognition performance through a model known as the ANN-Hidden Markov Model (ANN-HMM) has shown promise for large-vocabulary speech recognition systems. Achieving higher recognition accuracy and a low word error rate, developing a speech corpus depending upon the nature of the language, and addressing the sources of variability through approaches like Missing Data Techniques and Convolutive Non-Negative Matrix Factorization are the major considerations for developing an efficient ASR. In this paper, an effort has been made to highlight the progress made so far for ASRs of different languages and the technological perspective of automatic speech recognition in countries like China, Russia, Portugal, Spain, Saudi Arabia, Vietnam, Japan, UK, Sri Lanka, Philippines, Algeria and India.

65 citations

Journal Article
TL;DR: This ebooks is under topic such as teacher, state hospitals (speech development and correction) hearing and speech development children's minnesota speech and language developmental milestones speech/language development, delay and disorders speech andlanguage developmental milestones delayed speech or language development nemours typical development of speech.
Abstract: The best ebooks about The Development Of Speech that you can get for free here by download this The Development Of Speech and save to your desktop. This ebooks is under topic such as teacher, state hospitals (speech development and correction) hearing and speech development children's minnesota speech and language developmental milestones speech/language development, delay and disorders speech and language developmental milestones delayed speech or language development nemours typical development of speech frequently asked questions (faqs) about speech and language development in children hmc computer science hearing and understanding talking pediatric dentistry the evolution of human speech brown university developmental stages of infants and children early language development: birth t0 12 months super duper speech and language delay in children early speech-language teachmetotalk early morphological development center for speech and speech development related to cleft palate hearing and speech developmental milestones maryland speech/language developmental history form development of the speech (sir) test for hearing speech & language development dsrf phonology development chart st rita school for the deaf speech and language developmental lynchburg college screening for speech and language delay in preschool the physiologic development of speech motor control: lip speech and language developmental milestones alexius speech sound development chart mommyspeechtherapy does motor development influence language development? 
child narrative development center for speech and age guide for feeding, speech, and mouth development speech development in monozygotic and dizygotic twins with speech contest judges training toastmasters international evaluation and management of the child with speech delay speech pathologist i departments mental health and tips for encouraging speech an introduction to and speech development in children with autism spectrum disorders pe2036 speech and language development in infants speech and language developmental milestones speech and language lexical and grammatical development the effects of premature birth on language development

54 citations