Home
/
Authors
/
Dong Yu

Author

Dong Yu

Other affiliations: Peking University, Microsoft, City University of Hong Kong

Bio: Dong Yu is an academic researcher from Tencent. The author has contributed to research in topics: Artificial neural network & Word error rate. The author has an hindex of 72, co-authored 339 publications receiving 39098 citations. Previous affiliations of Dong Yu include Peking University & Microsoft.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003

Papers

PDF

Open Access

More filters

Book Chapter•DOI•

Recurrent Neural Networks and Related Models

[...]

Dong Yu¹, Li Deng¹•Institutions (1)

Microsoft¹

01 Jan 2015

TL;DR: This chapter presents the state-space formulation of the basic RNN as a nonlinear dynamical system, where the recurrent matrix governing the system dynamics is largely unstructured and analyzes the RNNAs a bottom-up, discriminative, dynamic system model against the top-down, generative counterpart of dynamic system.

...read moreread less

Abstract: A recurrent neural network (RNN) is a class of neural network models where many connections among its neurons form a directed cycle. This gives rise to the structure of internal states or memory in the RNN, endowing it with the dynamic temporal behavior not exhibited by the DNN discussed in earlier chapters. In this chapter, we first present the state-space formulation of the basic RNN as a nonlinear dynamical system, where the recurrent matrix governing the system dynamics is largely unstructured. For such basic RNNs, we describe two algorithms for learning their parameters in some detail: (1) the most popular algorithm of backpropagation through time (BPTT); and (2) a more rigorous, primal-dual optimization technique, where constraints on the RNN’s recurrent matrix are imposed to guarantee stability during RNN learning. Going beyond basic RNNs, we further study an advanced version of the RNN, which exploits the structure called long-short-term memory (LSTM), and analyzes its strengths over the basic RNN both in terms of model construction and of practical applications including some latest speech recognition results. Finally, we analyze the RNN as a bottom-up, discriminative, dynamic system model against the top-down, generative counterpart of dynamic system as discussed in Chap. 4. The analysis and discussion lead to potentially more effective and advanced RNN-like architectures and learning paradigm where the strengths of discriminative and generative modeling are integrated while their respective weaknesses are overcome.

...read moreread less

5 citations

Proceedings Article•DOI•

Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.

[...]

Lianwu Chen¹, Meng Yu¹, Yanmin Qian¹, Dan Su¹, Dong Yu¹ - Show less +1 more•Institutions (1)

Tencent¹

02 Sep 2018

TL;DR: It is found that SSGAN-PIT outperforms SSGAN without PIT and the neural networks based speech separation with or without PIT, which confirms the feasibility of the proposed model and training approach for efficient speech separation.

...read moreread less

Abstract: We explore generative adversarial networks (GANs) for speech separation, particularly with permutation invariant training (SSGAN-PIT). Prior work [1] demonstrates that GANs can be implemented for suppressing additive noise in noisy speech waveform and improving perceptual speech quality. In this work, we train GANs for speech separation which enhances multiple speech sources simultaneously with the permutation issue addressed by the utterance level PIT in the training of the generator network. We propose operating GANs on the power spectrum domain instead of waveforms to reduce computation. To better explore time dependencies, recurrent neural networks (RNNs) with long short-term memory (LSTM) are adopted for both generator and discriminator in this study. We evaluated SSGAN-PIT on the WSJ0 two-talker mixed speech separation task and found that SSGAN-PIT outperforms SSGAN without PIT and the neural networks based speech separation with or without PIT. The evaluation confirms the feasibility of the proposed model and training approach for efficient speech separation. The convergence behavior of permutation invariant training and adversarial training are analyzed.

...read moreread less

5 citations

Proceedings Article•DOI•

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis

[...]

Mu Wang¹, Xixin Wu², Zhiyong Wu¹, Shiyin Kang³, Deyi Tuo³, Li Guangzhi³, Dan Su³, Dong Yu³, Helen Meng¹ - Show less +5 more•Institutions (3)

Tsinghua University¹, The Chinese University of Hong Kong², Tencent³

12 May 2019

TL;DR: This work introduces a fully convolutional neural network (CNN) model, which can effiently run on parallel processers, for speech synthesis, and shows that CNNs with variational inference can generate highly natural speech on a par with end-to-end models.

...read moreread less

Abstract: Recurrent neural networks, such as gated recurrent units (GRUs) and long short-term memory (LSTM), are widely used on acoustic modeling for speech synthesis. However, such sequential generating processes are not friendly to today’s massively parallel computing devices. We introduce a fully convolutional neural network (CNN) model, which can effiently run on parallel processers, for speech synthesis. To improve the quality of the generated acoustic features, we strengthen our model with variational inference. We also use quasi-recurrent neural networks (QRNNs) to smoothen the generated acoustic features. Finally, a high-quality parallel WaveNet model is used to generate audio samples. Our contributions are twofold. First, we show that CNNs with variational inference can generate highly natural speech on a par with end-to-end models; the use of QRNNs further improves the synthetic quality by reducing trembling of generated acoustic features and introduces very little run-time overheads. Second, we show some techniques to further speed up the sampling process of the parallel WaveNet model.

...read moreread less

5 citations

Journal Article•DOI•

Resilient control design for networked DC microgrids under time-constrained DoS attacks.

[...]

Songlin Hu, Fuyi Yang, S.B. Gorbachev, Dong Yu, Victor Kuzin, Chao Deng - Show less +2 more

01 Feb 2022-Isa Transactions

TL;DR: In this paper , the authors considered the duration characteristics of sporadic/aperiodic DoS attacks, and proposed a new type of time-constrained DoS attack model, which is characterized by DoS frequency and DoS duration.

...read moreread less

Abstract: This paper studies the resilient current controller design for the networked DC microgrid system with multiple constant power loads (CPLs) under a new type of time-constrained denial-of-service (DoS) attack. Different from the existing DoS attack models, which are often characterized by DoS frequency and DoS duration, this paper only considers the duration characteristics of the sporadic/aperiodic DoS attacks, and proposes a new type of time-constrained DoS attack model. Under the effects of such DoS attacks, a switching state feedback control law is constructed and a switching-like DC microgrid system model is then established. Furthermore, based on an attack-parameter-dependent time-varying Lyapunov function (TVLF) method, the exponential stability criterion of the resulting DC microgrid system under aperiodic DoS attacks is derived, and a new resilient controller design method is proposed. Finally, simulation studies are given to verify the effectiveness and merits of the proposed resilient control design scheme on achieving the desired control performance and attack resilience.

...read moreread less

5 citations

Proceedings Article•DOI•

Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data

[...]

Jun Wang¹, Dan Su¹, Jie Chen¹, Shulin Feng², Dongpeng Ma¹, Na Li¹, Dong Yu¹ - Show less +3 more•Institutions (2)

Tencent¹, Peking University²

12 May 2019

TL;DR: This article proposed a novel method which simultaneously models both the sequence discriminative training and the feature discriminating learning within a single network architecture, which obviates the need for pre-segmented training data.

...read moreread less

Abstract: In this work, we try to answer two questions: Can deeply learned features with discriminative power benefit an ASR system’s robustness to acoustic variability? And how to learn them without requiring framewise labelled sequence training data? As existing methods usually require knowing where the labels occur in the input sequence, they have so far been limited to many real-world sequence learning tasks. We propose a novel method which simultaneously models both the sequence discriminative training and the feature discriminative learning within a single network architecture, so that it can learn discriminative deep features in sequence training that obviates the need for presegmented training data. Our experiment in a realistic industrial ASR task shows that, without requiring any specific fine-tuning or additional complexity, our proposed models have consistently outperformed state-of-the-art models and significantly reduced Word Error Rate (WER) under all test conditions, and especially with highest improvements under unseen noise conditions, by relative 12.94%, 8.66% and 5.80%, showing our proposed models can generalize better to acoustic variability.

...read moreread less

5 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
…
55
56
57
58
59
60
61
…
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

Journal Article•DOI•

Deep learning

[...]

Yann LeCun¹, Yann LeCun², Yoshua Bengio³, Geoffrey E. Hinton⁴, Geoffrey E. Hinton⁵ - Show less +1 more•Institutions (5)

Facebook¹, New York University², Université de Montréal³, University of Toronto⁴, Google⁵

28 May 2015-Nature

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

...read moreread less

Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

...read moreread less

46,982 citations

Journal Article•DOI•

Generative Adversarial Nets

[...]

Ian Goodfellow¹, Jean Pouget-Abadie¹, Mehdi Mirza¹, Bing Xu¹, David Warde-Farley¹, Sherjil Ozair², Aaron Courville¹, Yoshua Bengio¹ - Show less +4 more•Institutions (2)

Université de Montréal¹, Indian Institute of Technology Delhi²

08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

...read moreread less

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

...read moreread less

38,211 citations

Book•

Deep Learning

[...]

Ian Goodfellow¹, Yoshua Bengio², Aaron Courville²•Institutions (2)

Google¹, Université de Montréal²

18 Nov 2016

TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

...read moreread less

Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

...read moreread less

38,208 citations

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse