Di He
Researcher at Microsoft
Publications - 112
Citations - 5206
Di He is an academic researcher from Microsoft. The author has contributed to research on topics including machine translation and computer science. The author has an h-index of 31, has co-authored 92 publications, and has received 3623 citations. Previous affiliations of Di He include Peking University.
Papers
Proceedings Article
Dual learning for machine translation
TL;DR: Experiments show that dual-NMT works very well on English ↔ French translation; in particular, by learning from monolingual data, it achieves accuracy comparable to an NMT model trained on the full bilingual data for the French-to-English translation task.
Posted Content
Dual Learning for Machine Translation
TL;DR: In this paper, the authors propose a dual-learning mechanism that enables an NMT system to automatically learn from unlabeled data through a dual-learning game. It is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler.
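The closed-loop idea can be sketched in a few lines. Everything below is a toy illustration, not the paper's actual models: `primal`, `dual`, and `lm_score` are hypothetical stand-ins (here, dictionary lookups), and the `alpha` weighting is an assumption.

```python
def dual_feedback(src, primal, dual, lm_score, alpha=0.5):
    """Toy feedback signal from one primal -> dual round trip.

    primal:   source-language -> target-language translator
    dual:     target-language -> source-language translator
    lm_score: target-side language model returning a fluency score in [0, 1]
    """
    mid = primal(src)                # primal step: translate forward
    back = dual(mid)                 # dual step: translate back
    fluency = lm_score(mid)          # is the intermediate translation fluent?
    reconstruction = 1.0 if back == src else 0.0  # toy reconstruction reward
    return alpha * fluency + (1 - alpha) * reconstruction

# Toy word-level "translators" built from a bilingual dictionary
en_fr = {"hello": "bonjour"}
fr_en = {"bonjour": "hello"}
reward = dual_feedback(
    "hello",
    primal=en_fr.get,
    dual=fr_en.get,
    lm_score=lambda s: 1.0 if s in fr_en else 0.0,
)
```

In the paper, these feedback signals drive reinforcement-style updates of both translation models; the sketch only shows how a round trip yields a training signal without any human label.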
Posted Content
On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu +9 more
TL;DR: In this paper, the authors show that the placement of layer normalization matters: in Pre-LN Transformers the gradients are well-behaved at initialization, so the learning rate warm-up stage can be safely removed, whereas Post-LN Transformers rely on warm-up for stable training.
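The difference between the two placements comes down to where normalization sits relative to the residual connection. A minimal sketch, with `sublayer` and `norm` as hypothetical scalar stand-ins for the attention/FFN sublayer and LayerNorm:

```python
def post_ln_step(x, sublayer, norm):
    # Post-LN (original Transformer): normalize AFTER the residual addition,
    # so every layer's output passes through the normalization
    return norm(x + sublayer(x))

def pre_ln_step(x, sublayer, norm):
    # Pre-LN: normalize the sublayer INPUT; the residual path stays an
    # identity, which keeps gradients well-behaved at initialization
    return x + sublayer(norm(x))

# Scalar toy stand-ins to show the two orderings give different outputs
double = lambda v: 2 * v    # stands in for attention / FFN
scale = lambda v: v / 10    # stands in for LayerNorm
post = post_ln_step(10.0, double, scale)  # scale(10 + 20) = 3.0
pre = pre_ln_step(10.0, double, scale)    # 10 + double(1.0) = 12.0
```

In Pre-LN the raw input `x` flows to the output unmodified, so early gradients do not blow up; that is what lets the warm-up stage be removed.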
Proceedings Article
Incorporating BERT into Neural Machine Translation
TL;DR: A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.
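The fusion step can be sketched in pure Python: each encoder state attends over the (fixed) BERT representations, and the read-out is averaged with the state itself. The `attend` helper and the 50/50 averaging are simplifications for illustration, not the paper's exact drop-net formulation:

```python
import math

def attend(query, memory):
    # scaled dot-product attention of one query vector over a list of
    # memory vectors (here: BERT token representations)
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, vec)) / math.sqrt(d)
              for vec in memory]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [sum(e / z * vec[i] for e, vec in zip(exps, memory))
            for i in range(d)]

def bert_fused_layer(encoder_states, bert_states):
    # fuse each encoder position's own state with an attention read-out
    # over the frozen BERT representations (simple 50/50 average)
    return [[(h[i] + a[i]) / 2 for i in range(len(h))]
            for h, a in ((h, attend(h, bert_states)) for h in encoder_states)]
```

With a single BERT vector in memory, attention returns that vector with weight 1, so the fused state is just the midpoint of the two representations.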
Posted Content
A Theoretical Analysis of NDCG Type Ranking Measures
TL;DR: This paper studies, from a theoretical perspective, the widely used Normalized Discounted Cumulative Gain (NDCG)-type ranking measures and shows that NDCG with a logarithmic discount has consistent distinguishability, even though it converges to the same limit for all ranking functions.
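The measure under analysis is easy to compute directly. A minimal sketch of NDCG with the logarithmic discount, taking the gain to be the raw relevance score (one common convention; the paper analyzes a family of discount choices):

```python
import math

def dcg(relevances):
    # Discounted Cumulative Gain with logarithmic discount:
    # sum over positions i of rel_i / log2(i + 1)
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # normalize by the DCG of the ideal (descending-relevance) ordering,
    # so a perfect ranking scores exactly 1.0
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

perfect = ndcg([3, 2, 1])  # already ideally ordered
worse = ndcg([1, 2, 3])    # reversed ordering scores below 1.0
```

The normalization is what makes NDCG comparable across queries with different relevance distributions; the paper's distinguishability result concerns how the discount affects NDCG's ability to tell ranking functions apart as list length grows.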