Home
/
Authors
/
Gilles Boulianne

Author

Gilles Boulianne

Other affiliations: Institut national de la recherche scientifique

Bio: Gilles Boulianne is an academic researcher from École de technologie supérieure. The author has contributed to research in topics: Speaker recognition & Word error rate. The author has an hindex of 19, co-authored 73 publications receiving 7592 citations. Previous affiliations of Gilles Boulianne include Institut national de la recherche scientifique.

Papers published on a yearly basis

2023
2022
2020
2019
2018
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1996
1994
1992

Papers

PDF

Open Access

More filters

Proceedings Article•

The Kaldi Speech Recognition Toolkit

[...]

Daniel Povey¹, Arnab Ghoshal², Gilles Boulianne, Lukas Burget³, Ondrej Glembek³, Nagendra Kumar Goel, Mirko Hannemann³, Petr Motlicek⁴, Yanmin Qian⁵, Petr Schwarz³, Jan Silovsky, Georg Stemmer⁶, Karel Vesely³ - Show less +9 more•Institutions (6)

Microsoft¹, Saarland University², Brno University of Technology³, Idiap Research Institute⁴, Tsinghua University⁵, University of Erlangen-Nuremberg⁶

01 Jan 2011

TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.

...read moreread less

Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written is C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

...read moreread less

5,857 citations

Journal Article•DOI•

Joint Factor Analysis Versus Eigenchannels in Speaker Recognition

[...]

Patrick Kenny, Gilles Boulianne, Pierre Ouellet, Pierre Dumouchel

01 May 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: It is shown how the two approaches to the problem of session variability in Gaussian mixture model (GMM)-based speaker verification, eigenchannels, and joint factor analysis can be implemented using essentially the same software at all stages except for the enrollment of target speakers.

...read moreread less

Abstract: We compare two approaches to the problem of session variability in Gaussian mixture model (GMM)-based speaker verification, eigenchannels, and joint factor analysis, on the National Institute of Standards and Technology (NIST) 2005 speaker recognition evaluation data. We show how the two approaches can be implemented using essentially the same software at all stages except for the enrollment of target speakers. We demonstrate the effectiveness of zt-norm score normalization and a new decision criterion for speaker recognition which can handle large numbers of t-norm speakers and large numbers of speaker factors at little computational cost. We found that factor analysis was far more effective than eigenchannel modeling. The best result we obtained was a detection cost of 0.016 on the core condition (all trials) of the evaluation

...read moreread less

773 citations

Journal Article•DOI•

Eigenvoice modeling with sparse training data

[...]

Patrick Kenny, Gilles Boulianne, Pierre Dumouchel

18 Apr 2005-IEEE Transactions on Speech and Audio Processing

TL;DR: This work derives an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and shows how it can be regarded as a new method of eigenvoice estimation.

...read moreread less

Abstract: We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoices in situations where speaker-dependent training is not feasible, our method enables us to estimate as many eigenvoices from a given training set as there are training speakers. In the limit as the amount of training data for each speaker tends to infinity, it is equivalent to cluster adaptive training.

...read moreread less

523 citations

Journal Article•DOI•

Speaker and Session Variability in GMM-Based Speaker Verification

[...]

Patrick Kenny, Gilles Boulianne, Pierre Ouellet, Pierre Dumouchel

01 May 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A corpus-based approach to speaker verification in which maximum-likelihood II criteria are used to train a large-scale generative model of speaker and session variability which is called joint factor analysis is presented.

...read moreread less

Abstract: We present a corpus-based approach to speaker verification in which maximum-likelihood II criteria are used to train a large-scale generative model of speaker and session variability which we call joint factor analysis. Enrolling a target speaker consists in calculating the posterior distribution of the hidden variables in the factor analysis model and verification tests are conducted using a new type of likelihood II ratio statistic. Using the NIST 1999 and 2000 speaker recognition evaluation data sets, we show that the effectiveness of this approach depends on the availability of a training corpus which is well matched with the evaluation set used for testing. Experiments on the NIST 1999 evaluation set using a mismatched corpus to train factor analysis models did not result in any improvement over standard methods, but we found that, even with this type of mismatch, feature warping performs extremely well in conjunction with the factor analysis model, and this enabled us to obtain very good results (equal error rates of about 6.2%)

...read moreread less

268 citations

Proceedings Article•DOI•

Generating exact lattices in the WFST framework

[...]

Daniel Povey¹, Mirko Hannemann¹, Gilles Boulianne, Lukas Burget², Arnab Ghoshal³, Milos Janda², Martin Karafiat², Stefan Kombrink², Petr Motlicek, Yanmin Qian⁴, Korbinian Riedhammer⁵, Karel Vesely², Ngoc Thang Vu⁶ - Show less +9 more•Institutions (6)

Microsoft¹, Brno University of Technology², University of Edinburgh³, Tsinghua University⁴, University of Erlangen-Nuremberg⁵, Karlsruhe Institute of Technology⁶

25 Mar 2012

TL;DR: A lattice generation method that is exact, i.e. it satisfies all the natural properties the authors would want from a lattice of alternative transcriptions of an utterance, and does not introduce substantial overhead above one-best decoding.

...read moreread less

Abstract: We describe a lattice generation method that is exact, i.e. it satisfies all the natural properties we would want from a lattice of alternative transcriptions of an utterance. This method does not introduce substantial overhead above one-best decoding. Our method is most directly applicable when using WFST decoders where the WFST is “fully expanded”, i.e. where the arcs correspond to HMM transitions. It outputs lattices that include HMM-state-level alignments as well as word labels. The general idea is to create a state-level lattice during decoding, and to do a special form of determinization that retains only the best-scoring path for each word sequence. This special determinization algorithm is a solution to the following problem: Given a WFST A, compute a WFST B that, for each input-symbol-sequence of A, contains just the lowest-cost path through A.

...read moreread less

127 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

Dropout: a simple way to prevent neural networks from overfitting

[...]

Nitish Srivastava¹, Geoffrey E. Hinton¹, Alex Krizhevsky¹, Ilya Sutskever¹, Ruslan Salakhutdinov¹ - Show less +1 more•Institutions (1)

University of Toronto¹

01 Jan 2014-Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

...read moreread less

Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

...read moreread less

33,597 citations

Posted Content•

Representation Learning with Contrastive Predictive Coding

[...]

Aaron van den Oord¹, Yazhe Li¹, Oriol Vinyals¹•Institutions (1)

Google¹

10 Jul 2018-arXiv: Learning

TL;DR: This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.

...read moreread less

Abstract: While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.

...read moreread less

5,444 citations

Proceedings Article•DOI•

Librispeech: An ASR corpus based on public domain audio books

[...]

Vassil Panayotov¹, Guoguo Chen¹, Daniel Povey¹, Sanjeev Khudanpur¹•Institutions (1)

Johns Hopkins University¹

19 Apr 2015

TL;DR: It is shown that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models training on WSJ itself.

...read moreread less

Abstract: This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.

...read moreread less

4,770 citations

Journal Article•DOI•

Front-End Factor Analysis for Speaker Verification

[...]

Najim Dehak¹, Patrick Kenny, Réda Dehak², Pierre Dumouchel, Pierre Ouellet - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, École Pour l'Informatique et les Techniques Avancées²

01 May 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An extension of the previous work which proposes a new speaker representation for speaker verification, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis, named the total variability space because it models both speaker and channel variabilities.

...read moreread less

Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, which are within-class covariance normalization (WCCN), linear discriminate analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.

...read moreread less

3,526 citations

Book•

Deep Learning: Methods and Applications

[...]

Li Deng¹, Dong Yu¹•Institutions (1)

Microsoft¹

12 Jun 2014

TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

...read moreread less

Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

...read moreread less

2,817 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse