Derek F. Wong

Researcher at University of Macau

Publications -  117
Citations -  2343

Derek F. Wong is an academic researcher at the University of Macau. The author has contributed to research on the topics of machine translation and computer science. The author has an h-index of 21 and has co-authored 117 publications receiving 1604 citations. Previous affiliations of Derek F. Wong include Tencent.

Papers
Proceedings ArticleDOI

Learning Deep Transformer Models for Machine Translation.

TL;DR: This paper showed that a deep Transformer model can surpass the Transformer-Big counterpart by proper use of layer normalization and a novel way of passing the combination of previous layers to the next.
Posted Content

Learning Deep Transformer Models for Machine Translation

TL;DR: It is claimed that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next.
Proceedings ArticleDOI

Modeling Localness for Self-Attention Networks

TL;DR: This work casts localness modeling as a learnable Gaussian bias that indicates the center and scope of the local region deserving more attention in self-attention networks, preserving the ability to capture long-distance dependencies while enhancing the ability to capture short-range dependencies.
Proceedings Article

UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation

TL;DR: This paper describes the acquisition of a large-scale, high-quality English-Chinese parallel corpus for Statistical Machine Translation (SMT), designed to cover eight different domains.
Proceedings ArticleDOI

Norm-Based Curriculum Learning for Neural Machine Translation

TL;DR: This paper aims to improve the efficiency of training an NMT model by introducing a novel norm-based curriculum learning method that uses the norm (aka length or module) of a word embedding as a measure of the difficulty of the sentence, the competence of the model, and the weight of the sentence.