Book Chapter

The ICSI RT07s Speaker Diarization System

TL;DR: The ICSI speaker diarization system described in this paper automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers, using standard speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion.
Abstract
In this paper, we present the ICSI speaker diarization system, which was used in the 2007 National Institute of Standards and Technology (NIST) Rich Transcription evaluation. The system automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers. It uses "standard" speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion. However, we have developed the system with an eye towards robustness and ease of portability. We have therefore avoided any model that requires training on "outside" data, and we have attempted to develop algorithms that require as little tuning as possible. The system is similar to last year's system [1] except in three respects: we used the most recent available version of the beamforming toolkit; we implemented a new speech/non-speech detector that does not require models trained on meeting data; and we performed our development on a much larger set of recordings.
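The abstract names agglomerative clustering and the Bayesian Information Criterion as core components. The following is a minimal sketch of bottom-up clustering with a penalized delta-BIC merge test, under simplifying assumptions: 1-D features, one Gaussian per cluster, and a penalty weight `lam = 1.0`. The function names (`gaussian_loglik`, `delta_bic`, `agglomerative_bic`) and the toy data are hypothetical. It illustrates the general technique only and is not the ICSI system's actual merge criterion or implementation.

```python
import math


def gaussian_loglik(xs):
    """Maximum-likelihood log-likelihood of 1-D data under a single Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    var = max(var, 1e-6)  # variance floor so constant segments do not hit log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def delta_bic(a, b, lam=1.0):
    """Delta-BIC of modelling a and b with two Gaussians vs. one shared Gaussian.

    Positive: separate models are preferred (keep the clusters apart).
    Negative: the shared model is preferred (merge the clusters).
    """
    merged = a + b
    n = len(merged)
    gain = gaussian_loglik(a) + gaussian_loglik(b) - gaussian_loglik(merged)
    extra_params = 2  # one extra mean and one extra variance in 1-D
    penalty = 0.5 * lam * extra_params * math.log(n)
    return gain - penalty


def agglomerative_bic(segments, lam=1.0):
    """Repeatedly merge the pair with the lowest delta-BIC while it is negative."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # every remaining pair prefers separate models: stop
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Toy data: two segments from a "speaker" near 0 and two from one near 10.
seg_a1 = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1]
seg_a2 = [0.2, -0.1, 0.0, 0.1, -0.2, 0.3, -0.1, 0.0]
seg_b1 = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.1]
seg_b2 = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.9]

clusters = agglomerative_bic([seg_a1, seg_a2, seg_b1, seg_b2])
print(len(clusters), sorted(len(c) for c in clusters))  # prints: 2 [16, 16]
```

Merging stops once every remaining pair prefers separate models, so the number of clusters is not fixed in advance, in line with the abstract's point that no prior knowledge of the number of speakers is needed. In this formulation a larger `lam` raises the penalty and therefore favors more merges.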


Citations
Proceedings Article

Investigating Deep Neural Networks for Speaker Diarization in the DIHARD Challenge

TL;DR: This work investigates the use of deep neural networks (DNNs) for the speaker diarization task to improve performance under domain-mismatched conditions, and presents results on the DIHARD data, which was released for the 2018 DIHARD challenge.
Proceedings Article

Improved Speaker Diarization of Meeting Speech with Recurrent Selection of Representative Speech Segments and Participant Interaction Pattern Modeling

TL;DR: Two distinct novel improvements to the speaker diarization system are described, one focusing on recurrent selection of representative speech segments for speaker clustering while the other is based on participant interaction pattern modeling.
Proceedings Article

Linguistic influences on bottom-up and top-down clustering for speaker diarization

TL;DR: Experimental results confirm that clusters produced through top-down clustering are better normalized against phone variation than those produced through bottom-up clustering and that this accounts for the observed inconsistencies in purification performance.

A Framework for Productive, Efficient and Portable Parallel Computing

TL;DR: This dissertation presents PyCASP, a Python-based, application-domain-specific software framework that automatically maps Python application code to a variety of parallel platforms and uses a systematic, pattern-oriented approach to offer application writers a single productive software development environment.
Journal Article

Speech Activity Detection for Multi-Party Conversation Analyses Based on Likelihood Ratio Test on Spatial Magnitude

TL;DR: The proposed microphone-array-based speech activity detection method targets conversations in which neither the number of speakers nor their locations can be restricted, such as standing conversations and poster sessions, and it can exploit the enhanced signals obtained from time-frequency masking.
References

Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion

S. Chen
TL;DR: The segmentation algorithm can successfully detect acoustic changes, and the clustering algorithm can produce clusters with high purity; unsupervised adaptation using these clusters improves accuracy as much as ideal clustering by the true speaker identities does.
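For reference, the BIC change-detection statistic associated with this work is commonly written as follows, assuming each hypothesis models the data with full-covariance Gaussians over d-dimensional features. For a window of N frames split at frame i into N_1 = i and N_2 = N - i frames, with sample covariances Σ (whole window), Σ_1 and Σ_2 (the two parts), and penalty weight λ (nominally 1):

```latex
\Delta\mathrm{BIC}(i) = \frac{N}{2}\log\lvert\Sigma\rvert
  - \frac{N_1}{2}\log\lvert\Sigma_1\rvert
  - \frac{N_2}{2}\log\lvert\Sigma_2\rvert
  - \lambda P,
\qquad
P = \frac{1}{2}\left(d + \frac{1}{2}d(d+1)\right)\log N
```

A change is hypothesized at the frame maximizing ΔBIC(i) when that maximum is positive; the weight λ is the tunable quantity that penalty-free variants try to eliminate.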
Proceedings Article

A robust speaker clustering algorithm

TL;DR: The algorithm automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers and has the following advantages: no threshold adjustment requirements; no need for training/development data; and robustness to different data conditions.
Proceedings Article

Approaches and applications of audio diarization

TL;DR: An overview of current audio diarization approaches is provided, and potential applications are discussed, along with the performance of current systems as measured in the DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation.
Journal Article

Robust speaker change detection

TL;DR: The authors present a criterion that can identify speaker changes in an audio stream without the penalty tuning required by standard BIC-based detection; it consists of calculating the log-likelihood ratio (LLR) of two models with the same number of parameters.
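The equal-parameter idea can be stated as a worked formula, written under assumptions: D_a and D_b are two segments (or clusters) modelled by GMMs θ_a and θ_b, and θ is a single GMM fit to D = D_a ∪ D_b whose number of mixture components is chosen as the sum of those of θ_a and θ_b, so that the parameter counts essentially match. Starting from the usual BIC comparison of the merged and separate hypotheses (N frames, #θ the parameter count, λ the penalty weight):

```latex
\Delta\mathrm{BIC} = \log p(D \mid \theta)
  - \bigl[\log p(D_a \mid \theta_a) + \log p(D_b \mid \theta_b)\bigr]
  - \frac{\lambda}{2}\bigl(\#\theta - \#\theta_a - \#\theta_b\bigr)\log N
```

When #θ = #θ_a + #θ_b, the last term vanishes regardless of λ, leaving a plain log-likelihood ratio. In this convention, a positive value favors the single merged model (same speaker, no change), so there is no penalty weight left to tune.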