Home
/
Authors
/
Lukas Burget

Author

Lukas Burget

Other affiliations: Qualcomm, University of California, Berkeley, International Computer Science Institute ...read more

Bio: Lukas Burget is an academic researcher from Brno University of Technology. The author has contributed to research in topics: Speaker recognition & Speaker diarisation. The author has an hindex of 53, co-authored 252 publications receiving 21375 citations. Previous affiliations of Lukas Burget include Qualcomm & University of California, Berkeley.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001

Papers

PDF

Open Access

More filters

Proceedings Article•

The Kaldi Speech Recognition Toolkit

[...]

Daniel Povey¹, Arnab Ghoshal², Gilles Boulianne, Lukas Burget³, Ondrej Glembek³, Nagendra Kumar Goel, Mirko Hannemann³, Petr Motlicek⁴, Yanmin Qian⁵, Petr Schwarz³, Jan Silovsky, Georg Stemmer⁶, Karel Vesely³ - Show less +9 more•Institutions (6)

Microsoft¹, Saarland University², Brno University of Technology³, Idiap Research Institute⁴, Tsinghua University⁵, University of Erlangen-Nuremberg⁶

01 Jan 2011

TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.

...read moreread less

Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written is C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

...read moreread less

5,857 citations

Proceedings Article•

Recurrent neural network based language model

[...]

Tomas Mikolov¹, Martin Karafiat¹, Lukas Burget¹, Jan Cernocký, Sanjeev Khudanpur² - Show less +1 more•Institutions (2)

Brno University of Technology¹, Johns Hopkins University²

01 Jan 2010

TL;DR: Results indicate that it is possible to obtain around 50% reduction of perplexity by using mixture of several RNN LMs, compared to a state of the art backoff language model.

...read moreread less

Abstract: A new recurrent neural network based language model (RNN LM) with applications to speech recognition is presented. Results indicate that it is possible to obtain around 50% reduction of perplexity by using mixture of several RNN LMs, compared to a state of the art backoff language model. Speech recognition experiments show around 18% reduction of word error rate on the Wall Street Journal task when comparing models trained on the same amount of data, and around 5% on the much harder NIST RT05 task, even when the backoff model is trained on much more data than the RNN LM. We provide ample empirical evidence to suggest that connectionist language models are superior to standard n-gram techniques, except their high computational (training) complexity. Index Terms: language modeling, recurrent neural networks, speech recognition

...read moreread less

5,751 citations

Proceedings Article•DOI•

Extensions of recurrent neural network language model

[...]

Tomas Mikolov¹, Stefan Kombrink¹, Lukas Burget¹, Jan Cernocky¹, Sanjeev Khudanpur² - Show less +1 more•Institutions (2)

Brno University of Technology¹, Johns Hopkins University²

22 May 2011

TL;DR: Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than 15 times speedup for both training and testing phases and possibilities how to reduce the amount of parameters in the model.

...read moreread less

Abstract: We present several modifications of the original recurrent neural network language model (RNN LM).While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is the computational complexity. In this work, we show approaches that lead to more than 15 times speedup for both training and testing phases. Next, we show importance of using a backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end, we discuss possibilities how to reduce the amount of parameters in the model. The resulting RNN model can thus be smaller, faster both during training and testing, and more accurate than the basic one.

...read moreread less

1,675 citations

Proceedings Article•DOI•

Sequence-discriminative training of deep neural networks

[...]

Karel Veselý¹, Arnab Ghoshal², Lukas Burget¹, Daniel Povey³•Institutions (3)

Brno University of Technology¹, University of Edinburgh², Johns Hopkins University³

01 Aug 2013

TL;DR: Different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on a standard 300 hour American conversational telephone speech task.

...read moreread less

Abstract: Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300 hour American En- glish conversational telephone speech task. Different sequence- discriminative criteria — maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI — are compared. Two different heuristics are investigated to improve the performance of the DNNs trained using sequence-based criteria — lattices are re- generated after the first iteration of training; and, for MMI and BMMI, the frames where the numerator and denominator hy- potheses are disjoint are removed from the gradient compu- tation. Starting from a competitive DNN baseline trained us- ing cross-entropy, different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on aver- age. Little difference is noticed between the different sequence- based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results. Index Terms: speech recognition, deep learning, sequence- criterion training, neural networks, reproducible research

...read moreread less

745 citations

Proceedings Article•DOI•

Strategies for training large scale neural network language models

[...]

Tomas Mikolov¹, Anoop Deoras², Daniel Povey³, Lukas Burget¹, Jan Cernocky¹ - Show less +1 more•Institutions (3)

Brno University of Technology¹, Johns Hopkins University², Microsoft³

01 Dec 2011

TL;DR: This work describes how to effectively train neural network based language models on large data sets and introduces hash-based implementation of a maximum entropy model, that can be trained as a part of the neural network model.

...read moreread less

Abstract: We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance is observed when the training data are sorted by their relevance. We introduce hash-based implementation of a maximum entropy model, that can be trained as a part of the neural network model. This leads to significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on English Broadcast News speech recognition task, against large 4-gram model trained on 400M tokens.

...read moreread less

539 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54

Collapse

Cited by

PDF

Open Access

More filters

Book•

Deep Learning

[...]

Ian Goodfellow¹, Yoshua Bengio², Aaron Courville²•Institutions (2)

Google¹, Université de Montréal²

18 Nov 2016

TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

...read moreread less

Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

...read moreread less

38,208 citations

Journal Article•

Dropout: a simple way to prevent neural networks from overfitting

[...]

Nitish Srivastava¹, Geoffrey E. Hinton¹, Alex Krizhevsky¹, Ilya Sutskever¹, Ruslan Salakhutdinov¹ - Show less +1 more•Institutions (1)

University of Toronto¹

01 Jan 2014-Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

...read moreread less

Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

...read moreread less

33,597 citations

Journal Article•

다중혈관 관상동맥 환자에서 y-문합을 이용하여 양쪽 내흉동맥만을 사용한 우회술의 조기 성적

[...]

성기익, 이영탁, 박계현, 전태국, 박표원, 한일용, 장윤희 - Show less +3 more

01 Mar 2003-The Korean Journal of Thoracic and Cardiovascular Surgery

28,685 citations

Proceedings Article•

Distributed Representations of Words and Phrases and their Compositionality

[...]

Tomas Mikolov¹, Ilya Sutskever¹, Kai Chen¹, Greg S. Corrado¹, Jeffrey Dean¹ - Show less +1 more•Institutions (1)

Google¹

05 Dec 2013

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

...read moreread less

Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

...read moreread less

24,012 citations

Posted Content•

Efficient Estimation of Word Representations in Vector Space

[...]

Tomas Mikolov¹, Kai Chen², Greg S. Corrado³, Jeffrey Dean³•Institutions (3)

Brno University of Technology¹, Beijing University of Posts and Telecommunications², Google³

16 Jan 2013-arXiv: Computation and Language

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.

...read moreread less

Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

...read moreread less

20,077 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse