Author

Tong He

Other affiliations: York University, Simon Fraser University

Bio: Tong He is a researcher at Amazon.com whose work covers computer science and object detection. The author has an h-index of 12 and has co-authored 19 publications that have received 1,828 citations. Previous affiliations include York University and Simon Fraser University.

Papers

Proceedings Article

Bag of Tricks for Image Classification with Convolutional Neural Networks

Tong He¹, Zhi Zhang¹, Hang Zhang¹, Zhongyue Zhang¹, Junyuan Xie¹, Mu Li¹

Amazon.com¹

01 Jun 2019

TL;DR: This paper examines a collection of training procedure refinements, empirically evaluates their impact on final model accuracy through an ablation study, and shows that combining these refinements significantly improves various CNN models.

Abstract: Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. We will show that, by combining these refinements together, we are able to improve various CNN models significantly. For example, we raise ResNet-50's top-1 validation accuracy from 75.3% to 79.29% on ImageNet. We will also demonstrate that improvement on image classification accuracy leads to better transfer learning performance in other application domains such as object detection and semantic segmentation.

980 citations

Posted Content

ResNeSt: Split-Attention Networks

Hang Zhang¹, Chongruo Wu², Zhongyue Zhang³, Yi Zhu, Zhi Zhang³, Haibin Lin³, Yue Sun³, Tong He³, Jonas Mueller³, R. Manmatha³, Mu Li³, Alexander J. Smola³

Facebook¹, University of California, Davis², Amazon.com³

19 Apr 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper presents a simple and modular Split-Attention block that enables attention across feature-map groups. Stacking these blocks ResNet-style preserves the overall ResNet structure, so the network can be used in downstream tasks straightforwardly without introducing additional computational costs.

Abstract: It is well known that featuremap attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification. In addition, ResNeSt has achieved superior transfer learning results on several public benchmarks serving as the backbone, and has been adopted by the winning entries of COCO-LVIS challenge. The source code for complete system and pretrained models are publicly available.

822 citations

Posted Content

Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks

Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, Zheng Zhang

03 Sep 2019-arXiv: Learning

TL;DR: DGL distills the computational patterns of GNNs into a few generalized sparse tensor operations suitable for extensive parallelization and allows users to easily port and leverage the existing components across multiple deep learning frameworks.

Abstract: Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs. In this paper, we present the design principles and implementation of Deep Graph Library (DGL). DGL distills the computational patterns of GNNs into a few generalized sparse tensor operations suitable for extensive parallelization. By advocating graph as the central programming abstraction, DGL can perform optimizations transparently. By cautiously adopting a framework-neutral design, DGL allows users to easily port and leverage the existing components across multiple deep learning frameworks. Our evaluation shows that DGL significantly outperforms other popular GNN-oriented frameworks in both speed and memory consumption over a variety of benchmarks and has little overhead for small scale workloads.

588 citations

Posted Content

Bag of Tricks for Image Classification with Convolutional Neural Networks

Tong He¹, Zhi Zhang¹, Hang Zhang¹, Zhongyue Zhang¹, Junyuan Xie¹, Mu Li¹

Amazon.com¹

04 Dec 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper examines a collection of training procedure refinements, empirically evaluates their impact on final model accuracy through an ablation study, and shows that combining these refinements significantly improves various CNN models.

299 citations

Journal Article

SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines

Tong He¹, Marten Heidemeyer¹, Fuqiang Ban², Artem Cherkasov², Martin Ester¹

Simon Fraser University¹, University of British Columbia²

18 Apr 2017-Journal of Cheminformatics

TL;DR: This paper presents SimBoost, a method that predicts continuous (non-binary) values of binding affinities of compounds and proteins, thereby incorporating the whole interaction spectrum from true negative to true positive interactions, and that outperforms previously reported models across the studied datasets.

Abstract: Computational prediction of the interaction between drugs and targets is a standing challenge in the field of drug discovery. A number of rather accurate predictions were reported for various binary drug–target benchmark datasets. However, a notable drawback of a binary representation of interaction data is that missing endpoints for non-interacting drug–target pairs are not differentiated from inactive cases, and that predicted levels of activity depend on pre-defined binarization thresholds. In this paper, we present a method called SimBoost that predicts continuous (non-binary) values of binding affinities of compounds and proteins and thus incorporates the whole interaction spectrum from true negative to true positive interactions. Additionally, we propose a version of the method called SimBoostQuant which computes a prediction interval in order to assess the confidence of the predicted affinity, thus defining the Applicability Domain metrics explicitly. We evaluate SimBoost and SimBoostQuant on two established drug–target interaction benchmark datasets and one new dataset that we propose to use as a benchmark for read-across cheminformatics applications. We demonstrate that our methods outperform the previously reported models across the studied datasets.

228 citations

Cited by

Posted Content

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Ze Liu¹, Yutong Lin¹, Yue Cao¹, Han Hu¹, Yixuan Wei¹, Zheng Zhang¹, Stephen Lin¹, Baining Guo¹

Microsoft¹

25 Mar 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes Swin Transformer, a hierarchical vision Transformer whose representation is computed with shifted windows. The design addresses differences between language and vision, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.

Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (86.4 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The code and models will be made publicly available.

3,518 citations

The PASCAL Visual Object Classes Challenge

Jianguo Zhang

01 Jan 2006

3,012 citations

Matrix Factorization Techniques for Recommender Systems

Patrick Seemann

01 Jan 2014

2,080 citations

Journal Article

Res2Net: A New Multi-Scale Backbone Architecture

Shanghua Gao¹, Ming-Ming Cheng¹, Kai Zhao¹, Xin-Yu Zhang¹, Ming-Hsuan Yang², Philip H. S. Torr³

Nankai University¹, University of California, Merced², University of Oxford³

01 Feb 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Res2Net constructs hierarchical residual-like connections within a single residual block, which represents multi-scale features at a granular level and increases the range of receptive fields for each network layer.

Abstract: Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available on https://mmcheng.net/res2net/ .

1,553 citations

Proceedings Article

GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond

Yue Cao¹, Jiarui Xu², Stephen Lin¹, Fangyun Wei¹, Han Hu¹

Microsoft¹, Hong Kong University of Science and Technology²

25 Apr 2019

TL;DR: This paper creates a simplified network based on a query-independent formulation that maintains the accuracy of NLNet with significantly less computation. Because this design shares a similar structure with the Squeeze-Excitation Network (SENet), the two are unified into a general framework for global context modeling, from which the authors build the global context network (GCNet). GCNet generally outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks.

Abstract: The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by non-local network are almost the same for different query positions within an image. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further observe that this simplified design shares similar structure with Squeeze-Excitation Network (SENet). Hence we unify them into a three-step general framework for global context modeling. Within the general framework, we design a better instantiation, called the global context (GC) block, which is lightweight and can effectively model the global context. The lightweight property allows us to apply it for multiple layers in a backbone network to construct a global context network (GCNet), which generally outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks.

1,202 citations
