Proceedings ArticleDOI

Multi-label Classification Models for Detection of Phonetic Features in building Acoustic Models

TL;DR: The proposed approach is shown to be effective on the TIMIT and Wall Street Journal corpora, and a performance improvement over other phoneme recognition studies using phonetic features is obtained.
Abstract
Acoustic modeling in large-vocabulary continuous speech recognition systems is commonly done by building models for subword units such as phonemes, syllables or senones. In recent years, various end-to-end systems using acoustic models built at the grapheme or phoneme level have also been explored. These systems either require a lot of data and/or rely heavily on language models or a pronunciation dictionary for good recognition performance. With the intention of reducing the dependence on data or external models, we have explored the use of phonetic features in building acoustic models for speech recognition. Phonetic features describe a sound in terms of the human speech production mechanism. Multi-label classification models are built to detect the phonetic features present in a given speech signal. The detected phonetic features are then used along with the acoustic features as input to phoneme identification models. The effectiveness of the proposed approach is demonstrated on the TIMIT and Wall Street Journal corpora, and a performance improvement over other phoneme recognition studies using phonetic features is obtained.
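The pipeline the abstract describes — detect binary phonetic features per frame with multi-label classifiers, then feed them alongside the acoustic features into a phoneme model — can be sketched roughly as follows. Everything here is an illustrative assumption (synthetic data, nearest-centroid detectors trained per feature, toy dimensions), not the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 200 frames of 13-dim acoustic features (e.g. MFCCs),
# each annotated with 5 binary phonetic features (e.g. voiced, nasal, ...).
X = rng.normal(size=(200, 13))
Y = (rng.random((200, 5)) < 0.4).astype(int)

def fit_detectors(X, Y):
    """One nearest-centroid detector per phonetic feature (binary relevance)."""
    detectors = []
    for j in range(Y.shape[1]):
        pos = X[Y[:, j] == 1].mean(axis=0)  # centroid of frames with the feature
        neg = X[Y[:, j] == 0].mean(axis=0)  # centroid of frames without it
        detectors.append((pos, neg))
    return detectors

def detect(X, detectors):
    """Predict a binary phonetic-feature vector for each frame."""
    preds = []
    for pos, neg in detectors:
        d_pos = np.linalg.norm(X - pos, axis=1)
        d_neg = np.linalg.norm(X - neg, axis=1)
        preds.append((d_pos < d_neg).astype(int))
    return np.stack(preds, axis=1)

detectors = fit_detectors(X, Y)
F = detect(X, detectors)                # detected phonetic features, shape (200, 5)
X_aug = np.concatenate([X, F], axis=1)  # acoustic + phonetic features -> phoneme model input
```

The augmented matrix `X_aug` would then be the input to whatever phoneme identification model is used downstream; the paper's detectors are learned multi-label classifiers rather than the centroid rule used here for brevity.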


Citations
Book ChapterDOI

Improving Word Recognition in Speech Transcriptions by Decision-level Fusion of Stemming and Two-way Phoneme Pruning

TL;DR: An unsupervised approach for correcting highly imperfect speech transcriptions, based on a decision-level fusion of stemming and two-way phoneme pruning, is introduced; it improves the word recognition rate by up to 32.96%.
References
Proceedings ArticleDOI

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

TL;DR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, removing the need for pre-segmented training data and post-processing of the outputs.
Journal ArticleDOI

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Journal ArticleDOI

ML-KNN: A lazy learning approach to multi-label learning

TL;DR: Experiments on three different real-world multi-label learning problems, i.e. Yeast gene functional analysis, natural scene classification and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
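ML-KNN combines k-nearest-neighbour label counts with a per-label maximum-a-posteriori decision. A compressed numpy sketch of the idea (single query point, Laplace smoothing `s`; the statistics are recomputed on every call here for brevity, whereas the published algorithm precomputes them once over the training set):

```python
import numpy as np

def mlknn_predict(X_train, Y_train, x, k=5, s=1.0):
    """Simplified ML-KNN: MAP decision per label from k-NN label counts."""
    n, q = Y_train.shape
    # k nearest training neighbours of the query point
    nn = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    out = np.zeros(q, dtype=int)
    for j in range(q):
        # Smoothed prior probability that label j is relevant
        p_h1 = (s + Y_train[:, j].sum()) / (2 * s + n)
        c = int(Y_train[nn, j].sum())  # query neighbours carrying label j
        # For every training point, count how many of its k neighbours
        # carry label j, split by whether the point itself carries it
        kj = np.zeros(k + 1)
        kj_not = np.zeros(k + 1)
        for i in range(n):
            d = np.linalg.norm(X_train - X_train[i], axis=1)
            d[i] = np.inf  # exclude the point itself
            delta = int(Y_train[np.argsort(d)[:k], j].sum())
            if Y_train[i, j] == 1:
                kj[delta] += 1
            else:
                kj_not[delta] += 1
        # Smoothed likelihoods of observing c relevant neighbours
        p_c_h1 = (s + kj[c]) / (s * (k + 1) + kj.sum())
        p_c_h0 = (s + kj_not[c]) / (s * (k + 1) + kj_not.sum())
        out[j] = int(p_h1 * p_c_h1 > (1.0 - p_h1) * p_c_h0)
    return out
```

On two well-separated 1-D clusters (positive points near +1 labelled 1, negative points near -1 labelled 0), a query near either cluster is assigned the cluster's label.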
Proceedings ArticleDOI

Hybrid speech recognition with Deep Bidirectional LSTM

TL;DR: The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates; the improvement in word error rate over the deep network is modest, despite a large increase in frame-level accuracy.
Posted Content

Sequence Transduction with Recurrent Neural Networks

TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.