Open Access Proceedings Article
An Improved Speech Segmentation Quality Measure: the R-value
Okko Räsänen, Unto K. Laine, Toomas Altosaar +2 more
- pp 1851-1854
TLDR
A new quality measure, the R-value, is introduced that indicates how close a segmentation algorithm’s performance is to an ideal point of operation; it was motivated by the finding that established measures are insensitive to random boundary insertion.
Abstract:
Phone segmentation in ASR is usually performed indirectly by Viterbi decoding of HMM output. Direct approaches also exist, e.g., blind speech segmentation algorithms. In either case, performance of automatic speech segmentation algorithms is often measured using automated evaluation algorithms and used to optimize a segmentation system’s performance. However, evaluation approaches reported in the literature were found to be lacking. Also, we have determined that increases in phone boundary location detection rates are often due to increased over-segmentation levels and not to algorithmic improvements, i.e., by simply adding random boundaries a better hit-rate can be achieved when using current quality measures. Since established measures were found to be insensitive to this type of random boundary insertion, a new R-value quality measure is introduced that indicates how close a segmentation algorithm’s performance is to an ideal point of operation.
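The effect described in the abstract can be illustrated with a small sketch. The abstract itself does not give the formula, so the code below assumes the definition as it is commonly stated in later literature: with hit rate HR and over-segmentation OS both in percent, r1 = sqrt((100 - HR)^2 + OS^2), r2 = (-OS + HR - 100) / sqrt(2), and R = 1 - (|r1| + |r2|) / 200. The function names, the 20 ms tolerance, the greedy matching rule, and the toy boundary lists are illustrative assumptions, not the authors' evaluation code.

```python
import math


def hit_rate(reference, detected, tolerance=0.02):
    """Percentage of reference boundaries that have a detected boundary
    within `tolerance` seconds (assumed greedy one-to-one matching)."""
    unused = sorted(detected)
    hits = 0
    for ref in sorted(reference):
        # Take the earliest unused detection that lies inside the tolerance window.
        match = next((d for d in unused if abs(d - ref) <= tolerance), None)
        if match is not None:
            unused.remove(match)
            hits += 1
    return 100.0 * hits / len(reference)


def over_segmentation(reference, detected):
    """Relative surplus of detected boundaries over reference boundaries, in percent."""
    return 100.0 * (len(detected) / len(reference) - 1.0)


def r_value(hr, os_):
    """R-value from hit rate and over-segmentation (both in percent),
    using the commonly cited formulation (an assumption here)."""
    r1 = math.hypot(100.0 - hr, os_)            # distance to the ideal point (HR=100, OS=0)
    r2 = (-os_ + hr - 100.0) / math.sqrt(2.0)   # signed distance to the line OS = HR - 100
    return 1.0 - (abs(r1) + abs(r2)) / 200.0


# Hypothetical toy data: boundary times in seconds.
reference = [0.10, 0.25, 0.40, 0.60]
sparse = [0.11, 0.26, 0.50]               # a cautious detector
dense = [i * 0.03 for i in range(24)]     # a boundary every 30 ms, regardless of the signal

for name, detected in (("sparse", sparse), ("dense", dense)):
    hr = hit_rate(reference, detected)
    os_ = over_segmentation(reference, detected)
    print(f"{name}: HR={hr:.1f}%  OS={os_:.1f}%  R={r_value(hr, os_):.3f}")
```

In this toy case the dense grid reaches a higher hit rate than the sparse detector purely through over-segmentation, while its R-value is far lower, which is the behaviour the abstract attributes to the measure.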
Citations
Journal Article
Multilingual processing of speech via web services
TL;DR: Five multilingual web services for speech science, operational since 2012, are described, and the benefits and drawbacks of the new paradigm, as well as experiences with user acceptance and implementation problems, are discussed.
Book Chapter
Phoneme Recognition on the TIMIT Database
Carla Lopes, Fernando Perdigão +1 more
TL;DR: Speech recognition based on phones is very attractive since it is inherently free from vocabulary limitations, but the performance of large-vocabulary ASR systems depends on the quality of the phone recognizer, so research teams continue developing phone recognizers in order to enhance their performance as much as possible.
Journal Article
Pre-linguistic segmentation of speech into syllable-like units
TL;DR: The present study investigates the feasibility of speech segmentation into syllable-like chunks without any a priori linguistic knowledge, and shows that the sonority fluctuation in speech is highly informative of syllable and word boundaries in all three cases without any language-specific tuning of the model.
Proceedings Article
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.
TL;DR: In this article, a self-supervised representation learning model is proposed for unsupervised phoneme boundary detection, which is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle.
Journal Article
Phonetic segmentation of speech signal using local singularity analysis
TL;DR: Local singularity analysis is shown to convey relevant information about the local dynamics of the speech signal that can be used for phonetic segmentation, and a two-stage segmentation algorithm is developed that is significantly more accurate than state-of-the-art ones.
References
Journal Article
Robust speaker change detection
TL;DR: In this article, the authors present a criterion which can be used to identify speaker changes in an audio stream without such tuning, which consists of calculating the log likelihood ratio (LLR) of two models with the same number of parameters.
An HMM-based system for automatic segmentation and alignment of speech
TL;DR: A system for automatic time-aligned phone transcription of spoken Swedish has been developed using a speech recording and an orthographic transcription of the words spoken in the recording to generate a phone-level segmentation without manual intervention.
Proceedings Article
A new text-independent method for phoneme segmentation
TL;DR: A new approach for text-independent speech segmentation is proposed, based on critical-band perceptual analysis and an original algorithm for the individuation of phoneme boundaries; the results are promising, since the method gives 74% correct segmentation without over-segmentation.
Proceedings Article
Finding Maximum Margin Segments in Speech
TL;DR: Initial analyses show that MMC is a promising method for the automatic detection of sub-phonetic information in the speech signal and is highly competitive with existing unsupervised methods for the automatic detection of phoneme boundaries.
Proceedings Article
"Blind" speech segmentation: automatic segmentation of speech without linguistic knowledge
Manish Sharma, Richard J. Mammone +1 more
TL;DR: A new automatic speech segmentation procedure, called "Blind" speech segmentation, is presented, which involves finding the optimal number of sub-word segments in the given speech sample before locating the sub-word segment boundaries.