The ICSI RT07s Speaker Diarization System

doi:10.1007/978-3-540-68585-2_47

Book Chapter

Chuck Wooters, +1 more

- pp 509-519

TLDR

The ICSI speaker diarization system presented in this paper automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers, using standard speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion.

Abstract:

In this paper, we present the ICSI speaker diarization system. This system was used in the 2007 National Institute of Standards and Technology (NIST) Rich Transcription evaluation. The ICSI system automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers. Our system uses "standard" speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion. However, we have developed the system with an eye towards robustness and ease of portability. Thus, we have avoided the use of any sort of model that requires training on "outside" data, and we have attempted to develop algorithms that require as little tuning as possible. The system is similar to last year's system [1] except for three aspects: we used the most recent available version of the beam-forming toolkit, we implemented a new speech/non-speech detector that does not require models trained on meeting data, and we performed our development on a much larger set of recordings.
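
As an illustration of how agglomerative clustering and the Bayesian Information Criterion can fit together, the following is a minimal Python sketch, not the ICSI implementation. It assumes each initial segment is a NumPy array of feature frames, models every cluster with a single full-covariance Gaussian, and uses the textbook delta-BIC with a penalty weight `lam`. The function names (`logdet_cov`, `delta_bic`, `agglomerative_bic`) and these modeling choices are assumptions made for the example, and the HMM-based resegmentation step is omitted.

```python
import numpy as np


def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    d = x.shape[1]
    # A small ridge keeps the matrix invertible for short segments (assumption).
    cov = np.cov(x, rowvar=False, bias=True).reshape(d, d) + 1e-6 * np.eye(d)
    return np.linalg.slogdet(cov)[1]


def delta_bic(a, b, lam=1.0):
    """Textbook delta-BIC for two clusters, each modeled by one Gaussian.

    Negative values mean one shared Gaussian explains the data well enough
    that the extra parameters of two separate models are not justified.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    d = a.shape[1]
    # Log-likelihood gained by keeping the two clusters separate.
    gain = 0.5 * (n * logdet_cov(np.vstack([a, b]))
                  - n1 * logdet_cov(a) - n2 * logdet_cov(b))
    # Cost of the extra mean and covariance parameters.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


def agglomerative_bic(segments, lam=1.0):
    """Merge the closest pair of clusters until no merge has delta-BIC < 0.

    Returns a list of clusters, each a list of indices into `segments`.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # stopping rule: no pair is better off merged
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j], members[j]
    return members
```

Calling `agglomerative_bic(segments)` returns one list of segment indices per inferred speaker. The number of speakers is not given in advance; in this sketch it falls out of the stopping rule.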

Citations

Journal Article

Speaker Diarization: A Review of Recent Research

Xavier Anguera Miro, +5 more

- 01 Feb 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: An analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data is presented, and important areas for future research are identified.

Proceedings Article

An HDP-HMM for systems with state persistence

Emily B. Fox, +3 more

TL;DR: A sampling algorithm is developed that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates and demonstrating the advantages of the sticky extension, and the utility of the HDP-HMM in real-world applications.

Book Chapter

Bayesian Nonparametrics: Hierarchical Bayesian nonparametric models with applications

Yee Whye Teh, +1 more

TL;DR: The role of hierarchical modeling in Bayesian nonparametrics is discussed, focusing on models in which the infinite-dimensional parameters are treated hierarchically, and the value of these hierarchical constructions is demonstrated in a wide range of practical applications.

Journal Article

Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

Shrikanth S. Narayanan, +1 more

TL;DR: Behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion are illustrated.

A sticky HDP-HMM with application to speaker diarization

Emily B. Fox, +3 more

TL;DR: An augmented HDP-HMM is described that provides effective control over the switching rate and makes it possible to treat emission distributions nonparametrically, and a sampling algorithm is developed that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence.

References

Journal Article

Robust speaker diarization for meetings : ICSI RT06s meetings evaluation system

Xavier Anguera, +2 more

- 01 Jan 2006 -

Lecture Notes in Computer Science

TL;DR: The ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s), conducted on the meetings environment, is based on the RT05s system, which uses agglomerative clustering with a modified Bayesian information criterion (BIC) measure to decide which pairs of clusters to merge and to determine when to stop merging clusters.

Proceedings Article

Filtering the Unknown: Speech Activity Detection in Heterogeneous Video Collections

Marijn Huijbregts, +2 more

TL;DR: The speech activity detection system that was used for detecting speech regions in the Dutch TRECVID video collection is discussed; it is designed to filter non-speech, such as music or sound effects, out of the signal without the use of predefined non-speech models.

Journal Article

Speech-based Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

Marijn Huijbregts, +2 more

- 09 May 2007 -

CTIT technical report series

TL;DR: The setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections, are reported on for the NIST/TRECVID-2007 test collection.

Related Papers (5)

An overview of automatic speaker diarization systems

S. E. Tranter, +1 more

- 01 Sep 2006 -

IEEE Transactions on Audio, Speech, and ...

Approaches and applications of audio diarization

D.A. Reynolds, +1 more

Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion

A robust speaker clustering algorithm

Speaker Diarization: A Review of Recent Research
