Author

Teva Merlin

Bio: Teva Merlin is an academic researcher from the University of Avignon. The author has contributed to research on topics including speaker recognition and speaker diarisation, has an h-index of 12, and has co-authored 22 publications that have received 1,617 citations.

Papers
Journal ArticleDOI
TL;DR: An introduction proposes a modular scheme of the training and test phases of a speaker verification system, and the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed.
Abstract: This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step for dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Some applications of speaker verification are then proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. The paper concludes by giving a few research trends in speaker verification for the next couple of years.

874 citations
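
As a rough, self-contained sketch of the GMM-based scoring described in the abstract above (not the authors' implementation), the Python snippet below fits a universal background model and a target-speaker model on placeholder feature matrices and scores a test utterance as an average log-likelihood ratio; the array shapes, component count, and threshold are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder features: rows are frames, columns are cepstral coefficients.
# In a real system these would be MFCCs extracted from speech.
rng = np.random.default_rng(0)
background_feats = rng.normal(size=(5000, 20))       # pooled speech from many speakers
target_feats = background_feats[:500] + 0.5          # enrollment speech (shifted for illustration)
test_feats = rng.normal(loc=0.5, size=(300, 20))     # test utterance

# Universal background model (UBM) and target-speaker model.
ubm = GaussianMixture(n_components=16, covariance_type="diag", max_iter=50, random_state=0).fit(background_feats)
spk = GaussianMixture(n_components=16, covariance_type="diag", max_iter=50, random_state=0).fit(target_feats)

# Verification score: average per-frame log-likelihood ratio, compared against a decision threshold.
llr = spk.score_samples(test_feats).mean() - ubm.score_samples(test_feats).mean()
print(f"log-likelihood ratio: {llr:.3f}")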

01 Jan 2010
TL;DR: An open-source diarization toolkit, mostly dedicated to speaker diarization and developed by the LIUM, is presented; it includes hierarchical agglomerative clustering methods using well-known measures such as BIC and CLR.
Abstract: This paper presents an open-source diarization toolkit which is mostly dedicated to speaker diarization and developed by the LIUM. The toolkit includes hierarchical agglomerative clustering methods using well-known measures such as BIC and CLR. Two applications for which the toolkit has been used are presented: one is broadcast news, using the ESTER 2 data, and the other is telephone conversations, using the MEDIA corpus.

190 citations
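
A minimal illustration (not code from the LIUM toolkit) of the ΔBIC criterion that drives this kind of hierarchical agglomerative clustering: two segments are candidates for merging when modelling them with a single full-covariance Gaussian is cheaper, after the BIC penalty, than modelling them separately. The feature matrices and penalty weight below are placeholders.

import numpy as np

def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two feature segments (frames x dims), each modelled as a single
    full-covariance Gaussian. Negative values suggest the same speaker (merge)."""
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet(a):
        # Log-determinant of the sample covariance, with a small ridge for numerical stability.
        return np.linalg.slogdet(np.cov(a, rowvar=False) + 1e-6 * np.eye(d))[1]

    # Likelihood gain of two separate Gaussians over one shared Gaussian, minus the BIC penalty.
    gain = 0.5 * (n_z * logdet(z) - n_x * logdet(x) - n_y * logdet(y))
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n_z)
    return gain - penalty

rng = np.random.default_rng(1)
seg_a = rng.normal(size=(200, 12))
seg_b = rng.normal(loc=2.0, size=(200, 12))
print(delta_bic(seg_a, seg_a[::-1]))  # clearly negative: merge
print(delta_bic(seg_a, seg_b))        # clearly positive: keep separate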

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper presents the LIUM open-source speaker diarization toolbox, mostly dedicated to broadcast news, which includes both Hierarchical Agglomerative Clustering using well-known measures such as BIC and CLR, and the new ILP clustering algorithm using i-vectors.
Abstract: This paper presents the LIUM open-source speaker diarization toolbox, mostly dedicated to broadcast news. This tool includes both Hierarchical Agglomerative Clustering using well-known measures such as BIC and CLR, and the new ILP clustering algorithm using i-vectors. Diarization systems are tested on the French evaluation data from ESTER, ETAPE and REPERE campaigns.

162 citations
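
The ILP formulation itself does not fit in a short snippet, but the cosine-distance grouping of per-segment i-vectors that it builds on can be sketched as below; the i-vectors, their dimensionality, and the distance threshold are placeholder assumptions.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Placeholder i-vectors: one 100-dimensional vector per speech segment, three synthetic speakers.
rng = np.random.default_rng(2)
ivectors = np.vstack([rng.normal(loc=c, size=(10, 100)) for c in (0.0, 1.5, -1.5)])

# Length-normalise, then group segments by cosine distance with complete-linkage clustering.
ivectors /= np.linalg.norm(ivectors, axis=1, keepdims=True)
condensed = pdist(ivectors, metric="cosine")
labels = fcluster(linkage(condensed, method="complete"), t=0.5, criterion="distance")
print(labels)  # one cluster (speaker) label per segment; the threshold t is tunable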

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A speaker tracking system is built by applying a speaker change detector followed by a speaker verification system, in order to find, in a conversation between several people, target speakers chosen from a set of enrolled users.
Abstract: A speaker tracking system (STS) is built by successively applying a speaker change detector and a speaker verification system. The aim of the STS is to find, in a conversation between several people (some already enrolled and others completely unknown), target speakers chosen from a set of enrolled users. In a first step, speech is segmented into homogeneous segments each containing only one speaker, without any use of a priori knowledge about the speakers. The resulting segments are then checked to determine whether they belong to one of the target speakers. The system has been used in a NIST evaluation test with satisfactory results.

81 citations
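
A toy sketch of the two-stage idea (change detection followed by verification against enrolled targets); this is an assumption-laden skeleton, not the paper's system, with change points given rather than detected and all models trained on synthetic data.

import numpy as np
from sklearn.mixture import GaussianMixture

def track_targets(frames, boundaries, target_models, ubm, threshold=0.0):
    """Label each segment with the enrolled target whose log-likelihood ratio against
    the UBM clears the threshold, or None for unknown speakers."""
    labels = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        seg = frames[start:end]
        bg = ubm.score_samples(seg).mean()
        scores = {name: m.score_samples(seg).mean() - bg for name, m in target_models.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] > threshold else None)
    return labels

# Synthetic 20-dimensional "cepstral" frames with an artificial speaker change at frame 300.
rng = np.random.default_rng(3)
frames = np.vstack([rng.normal(size=(300, 20)), rng.normal(loc=1.0, size=(300, 20))])
ubm = GaussianMixture(8, covariance_type="diag", random_state=0).fit(rng.normal(size=(2000, 20)))
alice = GaussianMixture(8, covariance_type="diag", random_state=0).fit(rng.normal(loc=1.0, size=(400, 20)))
print(track_targets(frames, [0, 300, 600], {"alice": alice}, ubm))  # expected: [None, 'alice']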

Proceedings ArticleDOI
04 Sep 2005
TL;DR: This paper presents the system used by the LIUM to participate in ESTER, the French broadcast news evaluation campaign; it is based on the CMU Sphinx 3.3 (fast) decoder, with very few modifications and a simple MAP acoustic model estimation.
Abstract: This paper presents the system used by the LIUM to participate in ESTER, the French broadcast news evaluation campaign. The system is based on the CMU Sphinx 3.3 (fast) decoder. Tools added at different steps of the Sphinx recognition process are presented: segmentation, acoustic model adaptation, and word-lattice rescoring. Several experiments were conducted to study the effect of signal segmentation on the recognition process, the injection of automatically transcribed data into the training corpora, and different approaches to acoustic model adaptation; the results are reported in this paper. With very few modifications and a simple MAP acoustic model estimation, the Sphinx 3.3 decoder reached a word error rate of 28.2%. The complete system developed by the LIUM obtained an official word error rate of 23.6% in the ESTER evaluation, and 23.4% with an unsubmitted system.

77 citations
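
For reference, the word error rates quoted above are the standard edit-distance metric; a minimal implementation (not code from the paper) looks like this:

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167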


Cited by
Proceedings ArticleDOI
19 Apr 2015
TL;DR: It is shown that acoustic models trained on LibriSpeech give a lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself.
Abstract: This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.

4,770 citations
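
If you want to experiment with the corpus from Python, one convenient route (an assumption on my part, not something the paper ships) is torchaudio's built-in LibriSpeech wrapper; the local path "./data" is a placeholder.

import torchaudio

# Downloads the 100-hour clean training subset on first use.
dataset = torchaudio.datasets.LIBRISPEECH("./data", url="train-clean-100", download=True)

# Each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
print(sample_rate, waveform.shape, transcript[:60])  # audio is 16 kHz, as described in the paper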

Journal ArticleDOI
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations
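
The central step of that GMM-UBM recipe is MAP adaptation of the UBM means towards each enrolled speaker's data; below is a compact sketch on placeholder data, using the usual relevance-factor interpolation.

import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm, feats, relevance=16.0):
    """Return speaker-adapted component means: a count-weighted interpolation between
    each UBM mean and the posterior-weighted mean of the speaker's data."""
    post = ubm.predict_proba(feats)                           # (frames, components) responsibilities
    n_k = post.sum(axis=0)                                    # soft frame counts per component
    ex_k = post.T @ feats / np.maximum(n_k, 1e-8)[:, None]    # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]                # adaptation coefficients
    return alpha * ex_k + (1.0 - alpha) * ubm.means_

rng = np.random.default_rng(4)
ubm = GaussianMixture(16, covariance_type="diag", random_state=0).fit(rng.normal(size=(3000, 20)))
speaker_means = map_adapt_means(ubm, rng.normal(loc=0.3, size=(400, 20)))
print(speaker_means.shape)  # (16, 20): one adapted mean per mixture component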

Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, covering feature extraction and speaker modeling, and then elaborates on advanced computational techniques that address robustness and session variability.

1,433 citations
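
Feature extraction in most of these systems means cepstral features such as MFCCs; a minimal example using librosa (an assumed dependency, not tied to this particular survey):

import numpy as np
import librosa

# One second of synthetic audio at 16 kHz stands in for real speech.
sr = 16000
signal = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)

# 20 MFCCs per 25 ms frame with a 10 ms hop, a typical front-end configuration.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20,
                            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
print(mfcc.shape)  # (20, n_frames)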


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech and introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech.
Abstract: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve enacted emotional states overall. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set produced with the openSMILE toolkit and provided to the participants. Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism

694 citations
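
The challenge baselines rely on openSMILE feature vectors; the audEERING Python wrapper exposes a closely related set (ComParE_2016 rather than the 2013 set, so treat this only as an approximation), and "utterance.wav" is a placeholder path.

import opensmile

# Per-utterance functionals: a fixed-length feature vector in the style of the challenge baselines.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("utterance.wav")
print(features.shape)  # (1, 6373) functionals for one utterance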