Open Access, Proceedings Article

An Improved Speech Segmentation Quality Measure: the R-value

TLDR
A new quality measure, the R-value, is introduced that indicates how close a segmentation algorithm’s performance is to an ideal point of operation. It is motivated by the finding that established measures are insensitive to random boundary insertion, so over-segmentation alone can raise the hit-rate.
Abstract
Phone segmentation in ASR is usually performed indirectly by Viterbi decoding of HMM output. Direct approaches also exist, e.g., blind speech segmentation algorithms. In either case, the performance of automatic speech segmentation algorithms is often measured using automated evaluation algorithms, and the resulting scores are used to optimize a segmentation system’s performance. However, evaluation approaches reported in the literature were found to be lacking. We have also determined that increases in phone boundary detection rates are often due to increased over-segmentation levels and not to algorithmic improvements, i.e., simply adding random boundaries yields a better hit-rate under current quality measures. Since established measures were found to be insensitive to this type of random boundary insertion, a new R-value quality measure is introduced that indicates how close a segmentation algorithm’s performance is to an ideal point of operation.
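
The abstract does not give the formula, so the following is only a minimal sketch of the R-value as it is commonly stated in later segmentation literature: the ideal point of operation is a 100% hit-rate with 0% over-segmentation, and the score combines the distance to that point with a penalty for lying on the over-segmentation side of it. The function name `segmentation_scores` and the boundary counts are hypothetical, chosen only to illustrate how padding a system with extra boundaries can raise the hit-rate while lowering the R-value.

```python
import math


def segmentation_scores(n_hit, n_ref, n_found):
    """Return (hit-rate %, over-segmentation %, R-value).

    n_hit   : reference boundaries matched by a detected boundary
    n_ref   : number of reference boundaries
    n_found : number of boundaries produced by the algorithm
    Formula as commonly cited for the R-value; not taken from the abstract.
    """
    hit_rate = 100.0 * n_hit / n_ref
    over_seg = 100.0 * (n_found / n_ref - 1.0)
    # r1: Euclidean distance to the ideal point (hit-rate 100, over-seg 0).
    r1 = math.sqrt((100.0 - hit_rate) ** 2 + over_seg ** 2)
    # r2: signed distance to the line through the ideal point with slope 1
    # in the (over-segmentation, hit-rate) plane.
    r2 = (-over_seg + hit_rate - 100.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 200.0
    return hit_rate, over_seg, r_value


# Hypothetical counts: the "padded" system emits twice as many boundaries
# and therefore hits more reference boundaries by chance.
baseline = segmentation_scores(n_hit=750, n_ref=1000, n_found=1000)
padded = segmentation_scores(n_hit=850, n_ref=1000, n_found=2000)
ideal = segmentation_scores(n_hit=1000, n_ref=1000, n_found=1000)

for name, (hr, os_, r) in [("baseline", baseline), ("padded", padded), ("ideal", ideal)]:
    print(f"{name:9s} hit-rate={hr:6.1f}%  over-seg={os_:6.1f}%  R={r:.3f}")
```

With these counts the padded system reports the higher hit-rate but the lower R-value, which is the behaviour the abstract argues a quality measure should have.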

