Bei Li
Researcher at Northeastern University (China)
Publications - 34
Citations - 843
Bei Li is an academic researcher from Northeastern University (China). The author has contributed to research in topics: Machine translation & Transformer (machine learning model). The author has an h-index of 8 and has co-authored 23 publications receiving 398 citations. Previous affiliations of Bei Li include Northeastern University.
Papers
Proceedings ArticleDOI
Learning Deep Transformer Models for Machine Translation.
TL;DR: This paper showed that a deep Transformer model can surpass the Transformer-Big counterpart by proper use of layer normalization and a novel way of passing the combination of previous layers to the next.
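The TL;DR above names the paper's two ingredients: placing layer normalization before each sublayer (pre-norm) and feeding each layer a learned linear combination of all previous layers' outputs (the paper refers to this as a dynamic linear combination of layers, DLCL). The PyTorch sketch below is a minimal illustration of both ideas, not the authors' released code; all hyperparameters are placeholders and the per-layer normalization of stored outputs is omitted for brevity.

```python
import torch
import torch.nn as nn

class PreNormLayer(nn.Module):
    """One Transformer encoder layer with pre-norm residuals:
    LayerNorm is applied before each sublayer, not after it."""
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around self-attention
        return x + self.ff(self.norm2(x))                  # residual around feed-forward

class DLCLEncoder(nn.Module):
    """Stack in which layer l reads a learned linear combination of the
    outputs of all earlier layers (output 0 is the embedding)."""
    def __init__(self, n_layers, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.layers = nn.ModuleList(
            PreNormLayer(d_model, n_heads, d_ff) for _ in range(n_layers))
        # Learnable lower-triangular mixing weights; identity at init, so
        # training starts from the vanilla "previous layer only" behaviour.
        self.w = nn.Parameter(torch.eye(n_layers))

    def forward(self, x):
        outputs = [x]
        for l, layer in enumerate(self.layers):
            mix = sum(self.w[l, i] * outputs[i] for i in range(l + 1))
            outputs.append(layer(mix))
        return outputs[-1]

# Example: a deep 30-layer encoder over a batch of 10-token sequences.
# enc = DLCLEncoder(30); y = enc(torch.randn(2, 10, 512))
```

With the identity initialization of the mixing weights, the stack starts out equivalent to a plain pre-norm encoder and learns the cross-layer mixture during training, which is what lets much deeper models train stably.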
Proceedings ArticleDOI
The NiuTrans Machine Translation Systems for WMT19.
Bei Li, Yinqiao Li, Chen Xu, Ye Lin, Jiqiang Liu, Hui Liu, Ziyang Wang, Yuhao Zhang, Nuo Xu, Zeyang Wang, Kai Feng, Hexuan Chen, Tengbo Liu, Yanyang Li, Qiang Wang, Tong Xiao, Jingbo Zhu +16 more
TL;DR: The NiuTrans neural machine translation systems for the WMT 2019 news translation tasks achieved the highest BLEU scores in the {KK↔EN, GU→EN} directions, ranking 2nd in {RU→EN, DE↔CS} and 3rd in {ZH→EN, LT→EN, EN→RU, EN↔DE} among all constrained submissions.
Posted Content
Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation
TL;DR: Surprisingly, it is found that the context encoder not only encodes the surrounding sentences but also behaves as a noise generator, which prompts a rethinking of the real benefits of the multi-encoder in context-aware translation.
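For a concrete picture of the multi-encoder setup this case study examines: a second encoder reads the surrounding context sentence, and its representation is fused into the source representation before decoding. The gated cross-attention fusion below is one common variant, written as an assumption since the paper compares several integration strategies; module names and dimensions are placeholders, not the paper's code.

```python
import torch
import torch.nn as nn

class GatedMultiEncoder(nn.Module):
    """Source encoder plus context encoder, fused by a learned gate."""
    def __init__(self, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        def make_encoder():
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            return nn.TransformerEncoder(layer, n_layers)
        self.src_encoder = make_encoder()
        self.ctx_encoder = make_encoder()
        # Each source position queries the encoded context sentence.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, src_emb, ctx_emb):
        h_src = self.src_encoder(src_emb)            # (B, T_src, d)
        h_ctx = self.ctx_encoder(ctx_emb)            # (B, T_ctx, d)
        c, _ = self.cross_attn(h_src, h_ctx, h_ctx)  # context summary per source token
        g = torch.sigmoid(self.gate(torch.cat([h_src, c], dim=-1)))
        return h_src + g * c   # gated fusion fed to the decoder
```

The paper's observation that the context encoder partly behaves as a noise generator suggests that much of the gain would survive even if ctx_emb were replaced with random input, i.e. the second branch partly acts as a regularizer rather than a genuine source of contextual information.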
Proceedings Article
Learning Light-Weight Translation Models from Deep Transformer
TL;DR: This paper proposed GPKD, a group-permutation based knowledge distillation approach that compresses a deep Transformer model into a shallow one, achieving a BLEU score of 30.63 on the English-German newstest2014 test set.
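For context, GPKD first trains the deep teacher with group-permuted layer orders so that a representative layer can be picked from each group to form the shallow student, which is then trained by knowledge distillation. The sketch below shows only the generic distillation objective such student training typically uses (gold-token cross-entropy mixed with a KL term toward the teacher); the function name and the alpha/temperature interface are hypothetical illustrations, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, gold_ids,
                 alpha=0.5, temperature=1.0):
    """Mix gold-token cross-entropy with KL divergence toward the teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    gold_ids: (batch, seq_len) reference token ids
    """
    # Standard translation loss on the reference tokens.
    ce = F.cross_entropy(student_logits.transpose(1, 2), gold_ids)
    # Match the (frozen) teacher's softened output distribution.
    T = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

# Example shapes:
# loss = distill_loss(torch.randn(8, 20, 32000),
#                     torch.randn(8, 20, 32000),
#                     torch.randint(0, 32000, (8, 20)))
```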