Home
/
Authors
/
Hai-Tao Zheng

Author

Hai-Tao Zheng

Other affiliations: Tencent, Seoul National University

Bio: Hai-Tao Zheng is an academic researcher from Tsinghua University. The author has contributed to research in topics: Ontology (information science) & Language model. The author has an hindex of 17, co-authored 120 publications receiving 2631 citations. Previous affiliations of Hai-Tao Zheng include Tencent & Seoul National University.

Papers published on a yearly basis

2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006

Papers

PDF

Open Access

More filters

Book Chapter•DOI•

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

[...]

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng¹, Jian Sun•Institutions (1)

Tsinghua University¹

08 Sep 2018

TL;DR: ShuffleNet V2 as discussed by the authors proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs, based on a series of controlled experiments, and derives several practical guidelines for efficient network design.

...read moreread less

Abstract: Currently, the neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on the other factors such as memory access cost and platform characterics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.

...read moreread less

3,393 citations

Posted Content•

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

[...]

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng¹, Jian Sun•Institutions (1)

Tsinghua University¹

30 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs, and derives several practical guidelines for efficient network design, called ShuffleNet V2.

...read moreread less

Abstract: Currently, the neural network architecture design is mostly guided by the \emph{indirect} metric of computation complexity, i.e., FLOPs. However, the \emph{direct} metric, e.g., speed, also depends on the other factors such as memory access cost and platform characterics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical \emph{guidelines} for efficient network design. Accordingly, a new architecture is presented, called \emph{ShuffleNet V2}. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.

...read moreread less

157 citations

Journal Article•DOI•

Exploiting noun phrases and semantic relationships for text document clustering

[...]

Hai-Tao Zheng¹, Bo-Yeong Kang¹, Hong-Gee Kim¹•Institutions (1)

Seoul National University¹

01 Jun 2009-Information Sciences

TL;DR: This work combines detection of noun phrases with the use of WordNet as background knowledge to explore better ways of representing documents semantically for clustering, and finds that noun phrase analysis improves the WordNet-based clustering method.

...read moreread less

96 citations

Proceedings Article•DOI•

Byte Segment Neural Network for Network Traffic Classification

[...]

Rui Li¹, Xi Xiao¹, Shiguang Ni¹, Hai-Tao Zheng¹, Shu-Tao Xia¹ - Show less +1 more•Institutions (1)

Tsinghua University¹

04 Jun 2018

TL;DR: The recurrent neural network is introduced to network traffic classification and a novel neural network, the Byte Segment Neural Network (BSNN), which has superiority over the traditional machine learning-based method and the packet inspection method.

...read moreread less

Abstract: Network traffic classification, which can map network traffic to protocols in the application layer, is a fundamental technique for network management and security issues such as Quality of Service, network measurement, and network monitoring. Recent researchers focus on extracting features for traditional machine learning methods from flows or datagrams of the specific protocol. However, as the rapid growth of network applications, previous works cannot handle complex novel protocols well. In this paper, we introduce the recurrent neural network to network traffic classification and design a novel neural network, the Byte Segment Neural Network (BSNN). BSNN treats network datagrams as input and gives the classification results directly. In BSNN, a datagram is firstly broken into serval byte segments. Then, these segments are fed to encoders which are based on the recurrent neural network. The information extracted by encoders is combined to a representation vector of the whole datagram. Finally, we apply the softmax function to use this vector for predicting the application protocol of this datagram. There are several key advantages of BSNN: 1) no need for prior knowledge of target applications; 2) can handle both connection-oriented protocols and connection-less protocols; 3) supports multi-classification for protocols; 4) shows outstanding accuracy in both traditional protocols and complex novel protocols. Our thorough experiments on real-world data with different protocols indicate that BSNN gains average F1-measure about 95.82% in multi-classification for five protocols including QQ, PPLive, DNS, 360 and BitTorrent. And it also shows excellent performance for detection of novel protocols. Furthermore, compared with two recent state-of-the-art works, BSNN has superiority over the traditional machine learning-based method and the packet inspection method.

...read moreread less

67 citations

Proceedings Article•DOI•

Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge.

[...]

Ziran Li¹, Ning Ding¹, Zhiyuan Liu¹, Hai-Tao Zheng¹, Ying Shen² - Show less +1 more•Institutions (2)

Tsinghua University¹, Peking University²

01 Jul 2019

TL;DR: A multi-grained lattice framework for Chinese relation extraction is proposed, which incorporates word-level information into character sequence inputs so that segmentation errors can be avoided and model multiple senses of polysemous words with the help of external linguistic knowledge to alleviate polysemy ambiguity.

...read moreread less

Abstract: Chinese relation extraction is conducted using neural networks with either character-based or word-based inputs, and most existing methods typically suffer from segmentation errors and ambiguity of polysemy. To address the issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided. (2) We also model multiple senses of polysemous words with the help of external linguistic knowledge, so as to alleviate polysemy ambiguity. Experiments on three real-world datasets in distinct domains show consistent and significant superiority and robustness of our model, as compared with other baselines. We will release the source code of this paper in the future.

...read moreread less

65 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Posted Content•

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

[...]

Mingxing Tan¹, Quoc V. Le¹•Institutions (1)

Google¹

28 May 2019-arXiv: Learning

TL;DR: A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.

...read moreread less

Abstract: Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at this https URL.

...read moreread less

6,222 citations

Posted Content•

YOLOv4: Optimal Speed and Accuracy of Object Detection

[...]

Alexey Bochkovskiy, Chien-Yao Wang¹, Hong-Yuan Mark Liao¹•Institutions (1)

Academia Sinica¹

23 Apr 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work uses new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, C mBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.

...read moreread less

Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at this https URL

...read moreread less

5,709 citations

Proceedings Article•

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

[...]

Mingxing Tan¹, Quoc V. Le¹•Institutions (1)

Google¹

24 May 2019

TL;DR: EfficientNet-B7 as discussed by the authors proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, which achieves state-of-the-art accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference.

...read moreread less

3,445 citations

Matrix Factorization Techniques for Recommender Systems

[...]

Patrick Seemann

01 Jan 2014

2,080 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse