Open Access · Posted Content

Improving Phoneme Segmentation with Recurrent Neural Networks

TL;DR
This work proposes a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks, which tries to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error.
Abstract
Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame-by-frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, showing improvements over similar methods.
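As a rough illustration of the boundary-hypothesis step, here is a minimal sketch that picks strict local maxima of the frame-wise prediction error. The function name, the Euclidean error metric, and the simple peak-picking rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def boundaries_from_error(pred, target, min_gap=2):
    """Hypothesize segment boundaries at local maxima of the frame-wise
    prediction error between predicted and observed feature frames.

    pred, target: arrays of shape (num_frames, feature_dim), e.g. MFCCs.
    min_gap: minimum number of frames between two hypothesized boundaries.
    """
    # Per-frame prediction error (Euclidean distance in feature space).
    err = np.linalg.norm(pred - target, axis=1)
    peaks = []
    for t in range(1, len(err) - 1):
        # Keep frames where the error is a strict local maximum.
        if err[t] > err[t - 1] and err[t] > err[t + 1]:
            if not peaks or t - peaks[-1] >= min_gap:
                peaks.append(t)
    return peaks
```

In practice, one would smooth the error curve and tune a threshold or minimum-gap constraint against a development set rather than accept every local maximum.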


Citations
Proceedings ArticleDOI

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.

TL;DR: In this article, a self-supervised representation-learning model, optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle, is proposed for unsupervised phoneme boundary detection.
Posted Content

Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries

TL;DR: The temporal structure of gate activation signals inside gated recurrent neural networks is highly correlated with phoneme boundaries; this correlation is further verified by a set of phoneme segmentation experiments.
Posted Content

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

TL;DR: This work constrains pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units.
Posted Content

Phoneme Boundary Detection using Learnable Segmental Features.

TL;DR: The authors proposed a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection, achieving state-of-the-art performance in terms of F1 and R-value.
Proceedings ArticleDOI

Attacking the problem of continuous speech segmentation into basic units

TL;DR: The paper considers an algorithm for segmenting continuous speech into basic units, namely phonemes, certain combinations of phonemes, and pauses, based on transforming the speech signal into a two-dimensional image, i.e. an autocorrelation portrait.
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
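To make the mechanism behind this summary concrete, a single step of a standard LSTM cell can be sketched as below. This is the modern formulation with a forget gate (the original 1997 design did not yet include one); the gating keeps a near-linear path through the cell state, which is what preserves error flow over long time lags:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One step of a standard LSTM cell.

    x: input vector, h: previous hidden state, c: previous cell state.
    W: dict of gate weight matrices, each of shape (hidden, input + hidden).
    Biases are omitted for brevity.
    """
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z)   # input gate
    f = sigmoid(W["f"] @ z)   # forget gate
    o = sigmoid(W["o"] @ z)   # output gate
    g = np.tanh(W["g"] @ z)   # candidate cell update
    c_new = f * c + i * g     # near-linear "carousel" path for gradients
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```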
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
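The idea summarized above can be sketched as "inverted" dropout, the common variant that scales surviving activations at training time so no rescaling is needed at test time. The function name and interface here are illustrative:

```python
import numpy as np

def dropout(a, p, rng=None, train=True):
    """Inverted dropout: zero each activation with probability p at
    training time and scale the survivors by 1/(1-p), so the expected
    activation is unchanged and test-time inference needs no rescaling."""
    if not train or p == 0.0:
        return a
    rng = np.random.default_rng(rng)
    mask = rng.random(a.shape) >= p   # keep each unit with prob 1 - p
    return a * mask / (1.0 - p)
```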
Journal ArticleDOI

Finding Structure in Time

TL;DR: A proposal along these lines, first described by Jordan (1986), involving the use of recurrent links to provide networks with a dynamic memory, is developed, and a method for representing lexical categories and the type/token distinction is suggested.
Proceedings ArticleDOI

k-means++: the advantages of careful seeding

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
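The seeding technique summarized above can be sketched as follows: each new center is sampled with probability proportional to its squared distance from the nearest center already chosen, which spreads the initial centers across the data. The function name and interface are illustrative:

```python
import numpy as np

def kmeanspp_init(X, k, rng=None):
    """k-means++ seeding: pick the first center uniformly at random, then
    pick each subsequent center with probability proportional to the
    squared distance to its nearest already-chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

Lloyd's algorithm then refines these seeds as usual; the Θ(log k) guarantee applies to the expected cost of the seeding itself.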
Journal ArticleDOI

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, with emphasis on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.