Proceedings ArticleDOI
Multi-label Classification Models for Detection of Phonetic Features in Building Acoustic Models
Rupam Ojha, C. Chandra Sekhar +1 more
pp. 1-8
TL;DR: Performance improvement over other phoneme recognition studies using phonetic features is obtained, and the effectiveness of the proposed approach is demonstrated on the TIMIT and Wall Street Journal corpora.
Abstract:
Acoustic modeling in large vocabulary continuous speech recognition systems is commonly done by building models for subword units such as phonemes, syllables, or senones. In recent years, various end-to-end systems using acoustic models built at the grapheme or phoneme level have also been explored. These systems either require a lot of data and/or rely heavily on language models or a pronunciation dictionary for good recognition performance. With the intention of reducing the dependence on data and external models, we have explored the use of phonetic features in building acoustic models for speech recognition. Phonetic features describe a sound in terms of the human speech production mechanism. Multi-label classification models are built for the detection of phonetic features in a given speech signal. The detected phonetic features are used along with the acoustic features as input to models for phoneme identification. The effectiveness of the proposed approach is demonstrated on the TIMIT and Wall Street Journal corpora. Performance improvement over other phoneme recognition studies using phonetic features is obtained.
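The pipeline in the abstract — independent detectors for phonetic features whose outputs are concatenated with the acoustic features before phoneme identification — can be sketched roughly as follows. This is a minimal, hypothetical illustration: the feature names, the sigmoid detectors, and the random stand-in weights `W` and `b` are all assumptions, not the paper's actual trained models.

```python
import numpy as np

# Hypothetical per-frame acoustic feature vector (e.g., 13 MFCCs).
rng = np.random.default_rng(0)
acoustic = rng.normal(size=13)

# One independent binary detector per phonetic feature (multi-label setup).
# The weights below are random stand-ins for trained detector parameters.
PHONETIC_FEATURES = ["voiced", "nasal", "fricative", "vowel", "labial"]
W = rng.normal(size=(len(PHONETIC_FEATURES), 13))
b = np.zeros(len(PHONETIC_FEATURES))

def detect_phonetic_features(x):
    """Return per-feature posteriors via independent sigmoids."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

posteriors = detect_phonetic_features(acoustic)

# Concatenate the detected phonetic-feature posteriors with the acoustic
# features to form the input for the phoneme-identification model.
phoneme_input = np.concatenate([acoustic, posteriors])
```

Because the labels are detected independently, several phonetic features can be active for the same frame at once, which is what makes this a multi-label rather than multi-class formulation.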
Citations
Book ChapterDOI
Improving Word Recognition in Speech Transcriptions by Decision-level Fusion of Stemming and Two-way Phoneme Pruning
Sunakshi Mehra, Seba Susan +1 more
TL;DR: An unsupervised approach for correcting highly imperfect speech transcriptions, based on a decision-level fusion of stemming and two-way phoneme pruning, is introduced, yielding an improvement in word recognition rate of up to 32.96%.
References
Proceedings ArticleDOI
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
TL;DR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
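CTC's output convention — the many-to-one mapping from frame-wise label paths to label sequences — can be illustrated with a small greedy decoder. This is only a sketch of the decoding convention, not the CTC training algorithm from the cited paper; the blank index of 0 and the toy label sequence are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeated labels, then drop blanks
    (CTC's many-to-one path-to-sequence mapping)."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:  # new non-blank symbol
            decoded.append(lab)
        prev = lab
    return decoded

# Frame-wise best path [a, a, blank, a, b, b] decodes to [a, a, b]:
# the blank separates the two a's, so they are not collapsed together.
assert ctc_greedy_decode([1, 1, 0, 1, 2, 2]) == [1, 1, 2]
```

The blank symbol is what lets CTC emit the same label twice in a row, which is why it is removed only after repeats are collapsed.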
Journal ArticleDOI
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Journal ArticleDOI
ML-KNN: A lazy learning approach to multi-label learning
Min-Ling Zhang, Zhi-Hua Zhou +1 more
TL;DR: Experiments on three different real-world multi-label learning problems, i.e., yeast gene functional analysis, natural scene classification, and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
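The lazy, instance-based idea behind ML-KNN can be sketched in a few lines: for a query point, find its k nearest training examples and decide each label from how many of those neighbours carry it. Note this simplified sketch uses a plain per-label majority vote; the actual ML-KNN algorithm uses a MAP rule with estimated label priors and neighbour-count likelihoods.

```python
import numpy as np

def mlknn_predict(X_train, Y_train, x, k=3):
    """Simplified ML-KNN-style lazy multi-label prediction: for each
    label, count how many of the k nearest training points carry it,
    and predict the label when a majority do."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]          # indices of k nearest neighbours
    votes = Y_train[nn].sum(axis=0)     # per-label neighbour counts
    return votes * 2 > k                # majority of neighbours carry label

# Toy data: four 2-D points with two binary labels each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
Y = np.array([[1, 0], [1, 0], [1, 1], [0, 1]])
pred = mlknn_predict(X, Y, np.array([0.05, 0.05]), k=3)  # -> [True, False]
```

Because each label is decided separately from the same neighbourhood, the prediction is naturally a label *set*, which is the multi-label behaviour the cited paper formalizes.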
Proceedings ArticleDOI
Hybrid speech recognition with Deep Bidirectional LSTM
TL;DR: The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates; however, the improvement in word error rate over the deep network is modest, despite a great increase in frame-level accuracy.
Posted Content
Sequence Transduction with Recurrent Neural Networks
TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.