Di He

Researcher at Microsoft

Publications -  112
Citations -  5206

Di He is an academic researcher from Microsoft. The author has contributed to research in topics: Machine translation & Computer science. The author has an h-index of 31 and has co-authored 92 publications receiving 3623 citations. Previous affiliations of Di He include Peking University.

Papers
Proceedings Article

Dual learning for machine translation

TL;DR: Experiments show that dual-NMT works very well on English ↔ French translation; in particular, by learning from monolingual data, it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task.
Posted Content

Dual Learning for Machine Translation

TL;DR: This paper proposes a dual-learning mechanism that enables an NMT system to learn automatically from unlabeled data through a dual-learning game. The mechanism is inspired by the observation that any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler.
Posted Content

On Layer Normalization in the Transformer Architecture

TL;DR: In this paper, the authors show that the placement of layer normalization is crucial to the training behavior of Transformers, and that the warm-up stage can be removed when training Pre-LN Transformers.
Proceedings Article

Incorporating BERT into Neural Machine Translation

TL;DR: A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.
Posted Content

A Theoretical Analysis of NDCG Type Ranking Measures

TL;DR: This paper studies, from a theoretical perspective, the widely used Normalized Discounted Cumulative Gain (NDCG)-type ranking measures, and shows that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.
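
For readers unfamiliar with the measure named in the last entry, here is a minimal sketch of NDCG with a logarithmic discount. It is an illustration only, not code from the paper: the function names `dcg` and `ndcg` are hypothetical, and the exponential gain `2**rel - 1` and the discount `1 / log2(rank + 1)` are one common convention assumed here.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumed convention: gain 2**rel - 1, discount 1 / log2(rank + 1),
    with ranks starting at 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    # If no item is relevant, the ideal DCG is 0; return 0.0 by convention.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance grades listed in the order a ranking function returned the items.
    print(ndcg([3, 2, 1]))  # already in ideal order
    print(ndcg([1, 2, 3]))  # reversed order scores lower
```

Normalizing by the ideal ordering keeps the score in [0, 1], so rankings of lists with different relevance distributions can be compared on one scale.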