Book Chapter
The ICSI RT07s Speaker Diarization System
Chuck Wooters,Marijn Huijbregts +1 more
- pp 509-519
TLDR
The ICSI speaker diarization system presented in this paper automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers, using standard speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion.
Abstract:
In this paper, we present the ICSI speaker diarization system. This system was used in the 2007 National Institute of Standards and Technology (NIST) Rich Transcription evaluation. The ICSI system automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers. Our system uses "standard" speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion. However, we have developed the system with an eye towards robustness and ease of portability. Thus, we have avoided the use of any sort of model that requires training on "outside" data, and we have attempted to develop algorithms that require as little tuning as possible.
The system is similar to last year's system [1] except for three aspects: we used the most recent available version of the beam-forming toolkit, we implemented a new speech/non-speech detector that does not require models trained on meeting data, and we performed our development on a much larger set of recordings.
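The combination of agglomerative clustering and the Bayesian Information Criterion named in the abstract can be illustrated with a minimal sketch. It is not the ICSI implementation: it assumes each cluster is modeled by a single full-covariance Gaussian (the abstract mentions HMMs) and uses the classic penalized delta-BIC merge test, whose `penalty_weight` is exactly the kind of tuning parameter the authors say they try to avoid. The function names `log_det_cov`, `delta_bic`, and `agglomerative_bic` are hypothetical.

```python
import numpy as np


def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(x.shape[1])  # small ridge for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(a, b, penalty_weight=1.0):
    """Classic delta-BIC between 'two Gaussians' and 'one merged Gaussian'.

    Positive values favor keeping a and b apart; negative values favor merging.
    Assumption: one full-covariance Gaussian per cluster (illustration only).
    """
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    d = a.shape[1]
    merged = np.vstack([a, b])
    gain = 0.5 * (n * log_det_cov(merged)
                  - n_a * log_det_cov(a)
                  - n_b * log_det_cov(b))
    n_params = d + d * (d + 1) / 2  # mean plus symmetric covariance
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def agglomerative_bic(segments, penalty_weight=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative.

    Returns a list of clusters, each a sorted list of input segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # no remaining pair is better explained by one model
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return [sorted(m) for m in members]


# Usage on synthetic 2-D "features": segments 0 and 2 share one source,
# segments 1 and 3 share another.
rng = np.random.default_rng(0)
demo_segments = [rng.normal(loc=(0.0 if k % 2 == 0 else 10.0), size=(200, 2))
                 for k in range(4)]
demo_clusters = agglomerative_bic(demo_segments)
```

Because merging stops as soon as no pair has a negative delta-BIC, the number of clusters is not fixed in advance, which mirrors the abstract's point that the number of speakers need not be known beforehand.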
Citations
One-vector representations of stochastic signals for pattern recognition
Thomas S. Huang,Hao Tang +1 more
TL;DR: A new maximum likelihood learning algorithm for HMMs, referred to as the boosting Baum-Welch algorithm, and a novel one-vector representation of stochastic signals based on adapted ergodic hidden Markov models (HMMs) and adapted left-to-right HMMs.
Proceedings Article
Signature Cluster Model Selection for Incremental Gaussian Mixture Cluster Modeling in Agglomerative Hierarchical Speaker Clustering
TL;DR: To minimize contamination in cluster models by heterogeneous data, select and keep updating a representative (or signature) model for each cluster during AHSC based on incremental Gaussian mixture models.
Journal Article
Unsupervised deep feature embeddings for speaker diarization
Rehan Ahmad,Syed M. Zubair +1 more
TL;DR: This paper proposes to learn a set of high-level feature representations, referred to as feature embeddings, from an unsupervised deep architecture for speaker diarization, which are learned through a deep autoencoder model when trained on mel-frequency cepstral coefficients of input speech frames.
Proceedings Article
Low-latency meeting recognition and understanding using distant microphones
Shoko Araki, Takaaki Hori, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato
TL;DR: This demonstration presents a real-time meeting analyzer for group meetings that automatically recognizes “who speaks what to whom and when” in an online manner by using the audio and visual information captured by a microphone array and an omni-directional camera at the center of a table.
Proceedings Article
An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives.
TL;DR: In this article, a speaker re-diarization scheme is proposed that improves speaker diarization by making use of speakers who recur across multiple recordings within a large corpus. However, the scheme is limited to the SAIVT-BNEWS corpus of Australian broadcast data.
References
Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion
TL;DR: The segmentation algorithm can successfully detect acoustic changes; the clustering algorithm can produce clusters with high purity, leading to improvements in accuracy through unsupervised adaptation as much as the ideal clustering by the true speaker identities.
Proceedings Article
A robust speaker clustering algorithm
Jitendra Ajmera,Chuck Wooters +1 more
TL;DR: The algorithm automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers and has the following advantages: no threshold adjustment requirements; no need for training/development data; and robustness to different data conditions.
Proceedings Article
Approaches and applications of audio diarization
TL;DR: An overview of current audio diarization approaches is provided, and the performance of current systems, as measured in the DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation, is discussed along with potential applications.
Journal Article
Robust speaker change detection
TL;DR: In this article, the authors present a criterion that can be used to identify speaker changes in an audio stream without tuning a threshold or penalty term; it consists of calculating the log likelihood ratio (LLR) of two models with the same number of parameters.