Author

Simon Bozonnet

Bio: Simon Bozonnet is an academic researcher from Institut Eurécom. The author has contributed to research on speaker diarization and speaker recognition, has an h-index of 9, and has co-authored 13 publications receiving 850 citations. Previous affiliations of Simon Bozonnet include Technische Universität München.

Papers
Journal ArticleDOI
TL;DR: An analysis of speaker diarization performance, as reported through the NIST Rich Transcription evaluations on meeting data, is presented, and important areas for future research are identified.
Abstract: Speaker diarization is the task of determining “who spoke when?” in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
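The "who spoke when?" output described above is commonly represented as a list of labelled time segments. A minimal sketch in plain Python; the segment boundaries and speaker labels below are hypothetical, purely for illustration:

```python
# Diarization output as (start_sec, end_sec, speaker) segments.
# These values are made up for illustration, not taken from any system.
segments = [
    (0.0, 3.2, "spk1"),
    (3.2, 5.0, "spk2"),
    (5.0, 9.1, "spk1"),
]

def speaking_time(segments):
    """Total speech duration attributed to each speaker label."""
    totals = {}
    for start, end, spk in segments:
        totals[spk] = totals.get(spk, 0.0) + (end - start)
    return totals

totals = speaking_time(segments)  # e.g. spk1 gets 3.2 s + 4.1 s
```

Downstream tasks such as retrieval or higher-level inference typically consume exactly this kind of segment timeline.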

706 citations

Journal Article
TL;DR: Speaker diarization is the task of determining "who spoke when" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers.
Abstract: Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.

64 citations

Journal ArticleDOI
TL;DR: In this paper, the relative merits of the two most general, dominant approaches to speaker diarization, bottom-up and top-down hierarchical clustering, are analyzed, and a new combination strategy is proposed to exploit the merits of both.
Abstract: This paper presents a theoretical framework to analyze the relative merits of the two most general, dominant approaches to speaker diarization involving bottom-up and top-down hierarchical clustering. We present an original qualitative comparison which argues how the two approaches are likely to exhibit different behavior in speaker inventory optimization and model training: bottom-up approaches will capture comparatively purer models and will thus be more sensitive to nuisance variation such as that related to the speech content; top-down approaches, in contrast, will produce less discriminative speaker models but, importantly, models which are potentially better normalized against nuisance variation. We report experiments conducted on two standard, single-channel NIST RT evaluation datasets which validate our hypotheses. Results show that competitive performance can be achieved with both bottom-up and top-down approaches (average DERs of 21% and 22%), and that neither approach is superior. Speaker purification, which aims to improve speaker discrimination, gives more consistent improvements with the top-down system than with the bottom-up system (average DERs of 19% and 25%), thereby confirming that the top-down system is less discriminative and that the bottom-up system is less stable. Finally, we report a new combination strategy that exploits the merits of the two approaches. Combination delivers an average DER of 17% and confirms the intrinsic complementary of the two approaches.
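The DER figures quoted above follow the standard NIST definition: the sum of missed-speech, false-alarm and speaker-confusion time, divided by total scored speech time. A minimal sketch; the component durations below are hypothetical, chosen only to illustrate the arithmetic:

```python
def diarization_error_rate(missed, false_alarm, speaker_error, scored_speech):
    """NIST-style DER: missed speech + false-alarm speech +
    speaker-confusion time, as a fraction of scored speech time."""
    return (missed + false_alarm + speaker_error) / scored_speech

# Hypothetical durations in seconds, not taken from the paper:
der = diarization_error_rate(missed=30.0, false_alarm=12.0,
                             speaker_error=60.0, scored_speech=600.0)
print(f"DER = {der:.1%}")  # prints "DER = 17.0%"
```

In practice the NIST md-eval scoring tool also applies a forgiveness collar around reference boundaries before computing these durations.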

49 citations

Proceedings ArticleDOI
14 Mar 2010
TL;DR: In this paper, enhancements to a state-of-the-art top-down approach to speaker diarization are presented; while top-down systems are computationally efficient and competitive with bottom-up systems, they are prone to poor model initialisation and cluster impurities, which the proposed enhancements address.
Abstract: There are two approaches to speaker diarization: bottom-up and top-down. Our work on top-down systems shows that they can deliver competitive results compared to bottom-up systems and that they are extremely computationally efficient, but also that they are particularly prone to poor model initialisation and cluster impurities. In this paper we present enhancements to our state-of-the-art, top-down approach to speaker diarization that deliver improved stability across three different datasets composed of conference meetings from five standard NIST RT evaluations. We report an improved approach to speaker modelling which, despite having greater chances for cluster impurities, delivers a 35% relative improvement in DER for the MDM condition. We also describe new work to incorporate cluster purification into a top-down system which delivers relative improvements of 44% over the baseline system without compromising computational efficiency.

42 citations

Proceedings ArticleDOI
25 Mar 2012
TL;DR: Experimental results on NIST RT data show that the CNSC approach gives comparable results to a state-of-the-art hidden Markov model based overlap detector and in a practical diarization system, CNSC based speaker attribution is shown to reduce the speaker error by over 40% relative in overlapping segments.
Abstract: Overlapping speech is known to degrade speaker diarization performance with impacts on speaker clustering and segmentation. While previous work made important advances in detecting overlapping speech intervals and in attributing them to relevant speakers, the problem remains largely unsolved. This paper reports the first application of convolutive non-negative sparse coding (CNSC) to the overlap problem. CNSC aims to decompose a composite signal into its underlying contributory parts and is thus naturally suited to overlap detection and attribution. Experimental results on NIST RT data show that the CNSC approach gives comparable results to a state-of-the-art hidden Markov model based overlap detector. In a practical diarization system, CNSC based speaker attribution is shown to reduce the speaker error by over 40% relative in overlapping segments.
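The decomposition idea behind CNSC can be illustrated with its simpler, non-convolutive relative: plain non-negative matrix factorization (NMF) with multiplicative updates. This is a stand-in sketch, not the paper's method; CNSC additionally uses convolutive (time-shifted) bases and a sparsity penalty:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates: V ~ W @ H with all
    factors non-negative. A simplified stand-in for CNSC, which further
    imposes convolutive bases and sparsity on the activations H."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update bases
    return W, H

# Toy non-negative "spectrogram" built from two underlying parts,
# loosely analogous to two overlapping speakers:
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 20))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)  # reconstruction error, should be small
```

Attributing each part (column of W) to a speaker is then what turns the decomposition into overlap attribution.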

29 citations


Cited by

Journal ArticleDOI
11 Dec 2015, PLOS ONE
TL;DR: In this paper, the authors present pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization.
Abstract: Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
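The frame-based short-term analysis at the core of the library's feature extraction can be sketched in plain Python. This is a minimal stand-in computing only energy and zero-crossing rate, whereas pyAudioAnalysis extracts a much richer set (MFCCs, chroma, spectral statistics, etc.); window and step sizes here are illustrative defaults:

```python
import math

def short_term_features(signal, fs, win=0.050, step=0.025):
    """Slide a 50 ms window in 25 ms steps and compute, per frame,
    mean energy and zero-crossing rate. A minimal sketch of the
    kind of short-term feature extraction pyAudioAnalysis performs."""
    w, s = int(win * fs), int(step * fs)
    feats = []
    for start in range(0, len(signal) - w + 1, s):
        frame = signal[start:start + w]
        energy = sum(x * x for x in frame) / w
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (w - 1)
        feats.append((energy, zcr))
    return feats

# Synthetic test signal: a 1-second 440 Hz tone sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
feats = short_term_features(signal, fs)
```

Sequences of such per-frame feature vectors are what the library's classifiers and segmenters consume.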

362 citations

Journal ArticleDOI
07 Feb 2013
TL;DR: Behavioral informatics applications of these signal processing techniques are illustrated, showing how they contribute to quantifying higher-level, often subjectively described, human behavior in a domain-sensitive fashion.
Abstract: The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.

286 citations

Journal ArticleDOI
01 Jun 2018
TL;DR: A wide survey of publicly available datasets suitable for data-driven learning of dialogue systems is carried out, discussing important characteristics of these datasets and how they can be used to learn diverse dialogue strategies.
Abstract: During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective.

239 citations