
Zihang Dai

Researcher at Google

Publications - 57
Citations - 14,783

Zihang Dai is an academic researcher at Google. His research focuses on topics including language models and computer science. He has an h-index of 28 and has co-authored 52 publications receiving 9,340 citations. His previous affiliations include Baidu and Carnegie Mellon University.

Papers
Proceedings Article

XLNet: Generalized Autoregressive Pretraining for Language Understanding

TL;DR: The authors propose XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, thereby overcoming the limitations of BERT.
Posted Content

XLNet: Generalized Autoregressive Pretraining for Language Understanding

TL;DR: XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
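
The permutation idea can be illustrated with a short sketch. The snippet below is a minimal, simplified illustration rather than the authors' implementation: it samples one factorization order, builds an attention mask so each position only sees tokens earlier in that order, and computes an autoregressive log-likelihood. The `logits_fn` interface is an assumption, and XLNet's two-stream attention and partial prediction are omitted for brevity.

```python
import torch

def permutation_lm_loss(logits_fn, tokens):
    """Toy permutation language-modeling objective in the spirit of XLNet.

    `logits_fn(tokens, attn_mask)` is an assumed model interface that returns
    per-position vocabulary logits while respecting `attn_mask`
    (attn_mask[i, j] = True means position i may attend to position j).
    """
    seq_len = tokens.size(0)
    order = torch.randperm(seq_len)        # one sampled factorization order z
    rank = torch.empty_like(order)
    rank[order] = torch.arange(seq_len)    # rank[i] = place of position i in z

    # Position i may only attend to positions that come earlier in the permutation.
    attn_mask = rank.unsqueeze(1) > rank.unsqueeze(0)

    logits = logits_fn(tokens, attn_mask)             # (seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    nll = -log_probs[torch.arange(seq_len), tokens]   # token-level negative log-likelihood
    return nll.mean()   # averaging over sampled orders approximates the expected likelihood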
Proceedings ArticleDOI

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.

TL;DR: This work proposes Transformer-XL, a novel neural architecture consisting of a segment-level recurrence mechanism and a novel positional encoding scheme, which enables learning dependency beyond a fixed length without disrupting temporal coherence.
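
A rough sketch of the segment-level recurrence mechanism, assuming a generic stack of attention layers (the `layer(x, context)` interface is hypothetical, not Transformer-XL's actual API): hidden states from the previous segment are cached with gradients stopped and prepended as extra context for the current segment.

```python
import torch

def forward_with_memory(layers, segment, memory):
    """Toy segment-level recurrence in the spirit of Transformer-XL.

    `layers` is an assumed list of callables layer(x, context) -> hidden states;
    `memory[l]` caches the input hidden states of layer l from the previous
    segment. Gradients are stopped through the cache, so the usable context
    grows across segments without backpropagating through them.
    """
    new_memory = []
    hidden = segment
    for layer, mem in zip(layers, memory):
        new_memory.append(hidden.detach())            # cache this layer's input for the next segment
        context = torch.cat([mem, hidden], dim=0)     # [previous-segment memory; current segment]
        hidden = layer(hidden, context)               # queries from current segment, keys/values from both
    return hidden, new_memory
```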
Posted Content

Unsupervised Data Augmentation for Consistency Training

TL;DR: A new perspective on how to effectively noise unlabeled examples is presented, and it is argued that the quality of noising, specifically the noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
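
The consistency-training idea can be sketched as follows, assuming generic `model` and `augment` callables (hypothetical names, not the paper's code): a supervised cross-entropy term on labeled data is combined with a KL term that pushes predictions on an augmented unlabeled example toward predictions on its clean counterpart.

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, augment, lam=1.0):
    """Toy consistency-training loss in the spirit of Unsupervised Data Augmentation.

    `model(x)` is assumed to return class logits and `augment(x)` to apply a
    strong, label-preserving augmentation (e.g. back-translation for text,
    RandAugment for images).
    """
    # Standard supervised cross-entropy on the labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Consistency term: predictions on the clean unlabeled examples serve as
    # fixed targets for predictions on their augmented versions.
    with torch.no_grad():
        target = F.softmax(model(x_unlabeled), dim=-1)
    pred_log = F.log_softmax(model(augment(x_unlabeled)), dim=-1)
    unsup_loss = F.kl_div(pred_log, target, reduction="batchmean")

    return sup_loss + lam * unsup_loss
```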
Posted Content

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

TL;DR: Transformer-XL, as discussed by the authors, uses a segment-level recurrence mechanism and a novel positional encoding scheme to learn longer-term dependency beyond a fixed-length context without disrupting temporal coherence.