The ICSI RT07s Speaker Diarization System

doi:10.1007/978-3-540-68585-2_47

Book Chapter

- pp 509-519

TLDR

The ICSI speaker diarization system presented in this paper automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers, using standard speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion.

Abstract:

In this paper, we present the ICSI speaker diarization system. This system was used in the 2007 National Institute of Standards and Technology (NIST) Rich Transcription evaluation. The ICSI system automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers. Our system uses "standard" speech processing components and techniques such as HMMs, agglomerative clustering, and the Bayesian Information Criterion. However, we have developed the system with an eye towards robustness and ease of portability. Thus, we have avoided the use of any sort of model that requires training on "outside" data, and we have attempted to develop algorithms that require as little tuning as possible. The system is similar to last year's system [1] except for three aspects: we used the most recent available version of the beam-forming toolkit, we implemented a new speech/non-speech detector that does not require models trained on meeting data, and we performed our development on a much larger set of recordings.
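
The abstract names agglomerative clustering and the Bayesian Information Criterion (BIC) but gives no implementation details. The following is only a minimal sketch of how a BIC-based merge test can drive agglomerative clustering, under strong simplifying assumptions: each cluster is modeled by a single 1-D Gaussian rather than the HMM/GMM models a real diarization system would use, and the function names (`gaussian_loglik`, `delta_bic`, `agglomerate`) and the `penalty_weight` parameter are hypothetical, chosen for illustration. It is not the ICSI implementation.

```python
import math


def gaussian_loglik(xs):
    """Log-likelihood of xs under a 1-D Gaussian fitted by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    # Variance floor (an assumption of this sketch) avoids log(0) on constant data.
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def delta_bic(a, b, penalty_weight=1.0):
    """BIC(merged model) - BIC(two separate models); positive favors merging."""
    n = len(a) + len(b)
    gain = gaussian_loglik(a + b) - gaussian_loglik(a) - gaussian_loglik(b)
    # Merging removes one Gaussian, i.e. 2 free parameters (mean and variance).
    return gain + 0.5 * penalty_weight * 2 * math.log(n)


def agglomerate(segments, penalty_weight=1.0):
    """Greedily merge the best-scoring pair until no merge improves BIC."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        score, i, j = max(
            (delta_bic(clusters[i], clusters[j], penalty_weight), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score <= 0:
            break  # stopping criterion: no remaining pair is better off merged
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy "feature" segments: two around 0 and two around 10.
segments = [
    [0.0, 0.1, -0.1, 0.2, -0.2],
    [0.05, -0.05, 0.15, -0.15, 0.0],
    [10.0, 10.1, 9.9, 10.2, 9.8],
    [10.05, 9.95, 10.15, 9.85, 10.0],
]
print(len(agglomerate(segments)))
```

In this sketch the number of clusters is not supplied in advance; it falls out of the stopping criterion, which mirrors the abstract's claim of needing no prior knowledge of the number of speakers. The `penalty_weight` here is exactly the kind of tunable parameter the abstract says the authors tried to minimize.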

Citations

Journal Article

Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

Israel D. Gebru, +3 more

- 01 May 2018 -

IEEE Transactions on Pattern Analysis an...

TL;DR: The proposed audio-visual spatiotemporal diarization model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones.

The LIA-EURECOM RT'09 Speaker Diarization System

Corinne Fredouille

TL;DR: This article describes beamforming for the multiple distant microphone (MDM) condition, along with significant enhancements to the speaker segmentation stage of the core speaker diarization system.

Journal Article

Prosodic and other Long-Term Features for Speaker Diarization

Gerald Friedland, +3 more

- 01 Jul 2009 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features.

Proceedings Article

Multi-modal speaker diarization of real-world meetings using compressed-domain video features

Gerald Friedland, +2 more

TL;DR: A multi-modal approach is shown where a state-of-the-art speaker diarization system is improved by combining standard acoustic features (MFCCs) with compressed domain video features.

Journal Article

Multimodal Speaker Diarization

A.K. Noulas, +2 more

- 01 Jan 2012 -

IEEE Transactions on Pattern Analysis an...

TL;DR: A novel probabilistic framework that fuses information from the audio and video modalities to perform speaker diarization. The framework is a Dynamic Bayesian Network (DBN) that extends a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space.

References

Book

Extrapolation, Interpolation, and Smoothing of Stationary Time Series

Norbert Wiener

Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion

S. Chen

TL;DR: The segmentation algorithm can successfully detect acoustic changes; the clustering algorithm can produce clusters with high purity, leading to improvements in accuracy through unsupervised adaptation as much as the ideal clustering by the true speaker identities.

Proceedings Article

A robust speaker clustering algorithm

Jitendra Ajmera, +1 more

TL;DR: The algorithm automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers and has the following advantages: no threshold adjustment requirements; no need for training/development data; and robustness to different data conditions.

Proceedings Article

Approaches and applications of audio diarization

D.A. Reynolds, +1 more

TL;DR: An overview of current audio diarization approaches is provided, and the performance of current systems, as measured in the DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation, is discussed along with potential applications.

Journal Article

Robust speaker change detection

Jitendra Ajmera, +2 more

- 26 Jul 2004 -

IEEE Signal Processing Letters

TL;DR: The authors present a criterion for identifying speaker changes in an audio stream without requiring tuning; the criterion consists of calculating the log likelihood ratio (LLR) of two models with the same number of parameters.

Related Papers (5)

An overview of automatic speaker diarization systems

S. E. Tranter, +1 more

- 01 Sep 2006 -

IEEE Transactions on Audio, Speech, and ...

Approaches and applications of audio diarization

D.A. Reynolds, +1 more

Speaker Diarization: A Review of Recent Research
