Home
/
Authors
/
Tianyu He

Author

Tianyu He

Other affiliations: University of Science and Technology of China

Bio: Tianyu He is an academic researcher from Alibaba Group. The author has contributed to research in topics: Computer science & Data compression. The author has an hindex of 8, co-authored 23 publications receiving 341 citations. Previous affiliations of Tianyu He include University of Science and Technology of China.

Topics: Computer science, Data compression, Encoder, Image compression, Codec ...read more

Papers

PDF

Open Access

More filters

Proceedings Article•

Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

[...]

Tianyu He¹, Xu Tan², Yingce Xia¹, Di He², Tao Qin², Zhibo Chen¹, Tie-Yan Liu² - Show less +3 more•Institutions (2)

University of Science and Technology of China¹, Microsoft²

01 Jan 2018

TL;DR: The concept of layer-wise coordination for NMT is proposed, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level.

...read moreread less

Abstract: Neural Machine Translation (NMT) has achieved remarkable progress with the quick evolvement of model structures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level. Specifically, we design a layer-wise attention and mixed attention mechanism, and further share the parameters of each layer between the encoder and decoder to regularize and coordinate the learning. Experiments show that combined with the state-of-the-art Transformer model, layer-wise coordination achieves improvements on three IWSLT and two WMT translation tasks. More specifically, our method achieves 34.43 and 29.01 BLEU score on WMT16 English-Romanian and WMT14 English-German tasks, outperforming the Transformer baseline.

...read moreread less

120 citations

Journal Article•DOI•

Learning for Video Compression

[...]

Zhibo Chen¹, Tianyu He¹, Xin Jin¹, Feng Wu¹•Institutions (1)

University of Science and Technology of China¹

01 Feb 2020-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: The proposed PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks can model spatiotemporal coherence to effectively perform predictive coding inside the learning network and provides a possible new direction to further improve compression efficiency and functionalities of future video coding.

...read moreread less

Abstract: One key challenge to learning-based video compression is that motion predictive coding, a very effective tool for video compression, can hardly be trained into a neural network. In this paper, we propose the concept of PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks. PMCNN can model spatiotemporal coherence to effectively perform predictive coding inside the learning network. On the basis of PMCNN, we further explore a learning-based framework for video compression with additional components of iterative analysis/synthesis and binarization. The experimental results demonstrate the effectiveness of the proposed scheme. Although entropy coding and complex configurations are not employed in this paper, we still demonstrate superior performance compared with MPEG-2 and achieve comparable results with H.264 codec. The proposed learning-based scheme provides a possible new direction to further improve compression efficiency and functionalities of future video coding.

...read moreread less

116 citations

Journal Article•DOI•

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

[...]

Yingce Xia¹, Tianyu He², Xu Tan¹, Fei Tian¹, Di He³, Tao Qin¹ - Show less +2 more•Institutions (3)

Microsoft¹, University of Science and Technology of China², Peking University³

17 Jul 2019

TL;DR: This work considers model-level sharing and ties the whole parts of the encoder and decoder of an NMT model, and obtains a compact model named Tied Transformer, which demonstrates that such a simple method works well for both similar and dissimilar language pairs.

...read moreread less

Abstract: Sharing source and target side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English to French or German translation). The success of such wordlevel sharing motivates us to move one step further: we consider model-level sharing and tie the whole parts of the encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised NMT and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German to English translation, 28.98/29.89 BLEU scores on WMT 2014 English to German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German to English translation.

...read moreread less

58 citations

Journal Article•DOI•

ImmerTai: Immersive Motion Learning in VR Environments

[...]

Xiaoming Chen¹, Zhibo Chen¹, Ye Li¹, Tianyu He¹, Junhui Hou², Sen Liu¹, Ying He³ - Show less +3 more•Institutions (3)

University of Science and Technology of China¹, City University of Hong Kong², Nanyang Technological University³

01 Jan 2019-Journal of Visual Communication and Image Representation

TL;DR: The results show that ImmerTai can accelerate the learning process of students noticeably compared to non-immersive learning with the conventional PC setup, and there is a substantial difference in the quality of the learnt motion between CAVE and HMD compared to PC.

...read moreread less

39 citations

Journal Article•DOI•

Learning based Facial Image Compression with semantic fidelity metric

[...]

Zhibo Chen¹, Tianyu He¹•Institutions (1)

University of Science and Technology of China¹

21 Apr 2019-Neurocomputing

TL;DR: A Learning based Facial Image Compression framework with a novel Regionally Adaptive Pooling module whose parameters can be automatically optimized according to gradient feedback from an integrated hybrid semantic fidelity metric, including a successfully exploration to apply Generative Adversarial Network (GAN) as metric directly in image compression scheme.

...read moreread less

38 citations

1
2
3
4
…
5
6
7

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•

Language Models are Few-Shot Learners

[...]

Tom B. Brown¹, Benjamin Mann, Nick Ryder², Melanie Subbiah, Jared Kaplan³, Prafulla Dhariwal¹, Arvind Neelakantan⁴, Pranav Shyam, Girish Sastry¹, Amanda Askell¹, Sandhini Agarwal¹, Ariel Herbert-Voss¹, Gretchen Krueger¹, Thomas Henighan¹, Rewon Child¹, Aditya Ramesh¹, Daniel M. Ziegler⁵, Jeffrey Wu¹, Clemens Winter, Christopher Hesse¹, Mark Chen¹, Eric Sigler, Mateusz Litwin, Scott Gray¹, Benjamin Chess¹, Jack Clark¹, Christopher Berner, Samuel McCandlish¹, Alec Radford¹, Ilya Sutskever¹, Dario Amodei¹ - Show less +27 more•Institutions (5)

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020

TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

...read moreread less

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

...read moreread less

10,132 citations

Journal Article•

“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告

[...]

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Posted Content•

Language Models are Few-Shot Learners

[...]

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020-arXiv: Computation and Language

TL;DR: This article showed that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.

...read moreread less

1,886 citations

Journal Article•

PaLM: Scaling Language Modeling with Pathways

[...]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Velu Prabhakaran, Emily Reif, Nan Du, B. C. Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Peng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, L Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Oliveira Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zong Tuan Zhou, Xuezhi Wang, Brennan Saeta, Mark Díaz, Orhan Firat, M. Catasta, Jason Loh Seong Wei, Kathleen S. Meier-Hellstern, Douglas Eck, Jeffrey Dean, Slav Petrov, Noah Fiedel - Show less +63 more

05 Apr 2022-arXiv.org

TL;DR: A 540-billion parameter, densely activated, Transformer language model, which is called PaLM achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark.

...read moreread less

Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning , which drastically reduces the number of task-speciﬁc training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly eﬃcient training across multiple TPU Pods. We demonstrate continued beneﬁts of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the ﬁnetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A signiﬁcant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

...read moreread less

1,429 citations

Proceedings Article•

MASS: Masked Sequence to Sequence Pre-training for Language Generation

[...]

Kaitao Song¹, Xu Tan², Tao Qin², Jianfeng Lu¹, Tie-Yan Liu² - Show less +1 more•Institutions (2)

Nanjing University of Science and Technology¹, Microsoft²

24 May 2019

TL;DR: This work proposes MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks, which achieves the state-of-the-art accuracy on the unsupervised English-French translation, even beating the early attention-based supervised model.

...read moreread less

Abstract: Pre-training and fine-tuning, e.g., BERT, have achieved great success in language understanding by transferring knowledge from rich-resource pre-training task to the low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes a sentence with randomly masked fragment (several consecutive tokens) as input, and its decoder tries to predict this masked fragment. In this way, MASS can jointly train the encoder and decoder to develop the capability of representation extraction and language modeling. By further fine-tuning on a variety of zero/low-resource language generation tasks, including neural machine translation, text summarization and conversational response generation (3 tasks and totally 8 datasets), MASS achieves significant improvements over the baselines without pre-training or with other pre-training methods. Specially, we achieve the state-of-the-art accuracy (37.5 in terms of BLEU score) on the unsupervised English-French translation, even beating the early attention-based supervised model.

...read moreread less

622 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104

Collapse