Author

Hang Zhao

Other affiliations: Zhejiang University, Nvidia, New York University ...

Bio: Hang Zhao is an academic researcher from Tsinghua University. The author has contributed to research in topics including computer science and engineering. The author has an h-index of 32 and has co-authored 83 publications receiving 12,696 citations. Previous affiliations of Hang Zhao include Zhejiang University and Nvidia.

Topics: Computer science, Engineering, Artificial neural network, Trajectory, Parsing ...

[Chart: papers published on a yearly basis, 2012 to 2023]

Papers

Proceedings Article

Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

Tianyu Hua, Yonglong Tian, Sucheng Ren, Hang Zhao, Leonid Sigal

22 Mar 2022

TL;DR: This paper introduces Random Segments with Autoregressive Coding (RandSAC), a self-supervised pre-training strategy for visual feature learning, and proposes a conceptually simple but highly effective addition to the decoder that allows learnable skip-connections to encoder feature layers, which further improves performance.

Abstract: Inspired by the success of self-supervised autoregressive representation learning in natural language (GPT and its variants), and advances in recent visual architecture design with Vision Transformers (ViTs), in this paper, we explore the effect various design choices have on the success of applying such training strategies for visual feature learning. Specifically, we introduce a novel strategy that we call Random Segments with Autoregressive Coding (RandSAC). In RandSAC, we group patch representations (image tokens) into hierarchically arranged segments; within each segment, tokens are predicted in parallel, similar to BERT, while across segment predictions are sequential, similar to GPT. We illustrate that randomized serialization of the segments significantly improves the performance and results in distribution over spatially-long (across-segments) and -short (within-segment) predictions which are effective for feature learning. We illustrate the pertinence of these design choices and explore alternatives on a number of datasets (e.g., CIFAR10, CIFAR100, ImageNet). While our pre-training strategy works with a vanilla Transformer, we also propose a conceptually simple, but highly effective, addition to the decoder that allows learnable skip-connections to encoder's feature layers, which further improves the performance.
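
The segment-wise prediction order described in this abstract can be illustrated with a small visibility mask. This is a minimal sketch, not the authors' implementation: it assumes flat, equally sized segments of consecutive tokens rather than the hierarchical arrangement the abstract mentions, and the function name `segment_visibility_mask` and its parameters are hypothetical.

```python
import random


def segment_visibility_mask(num_tokens, segment_size, seed=0):
    """Return (step, can_see) for a randomized segment-wise prediction order.

    Assumption (not from the paper): segments are flat chunks of
    consecutive tokens; the paper arranges segments hierarchically.
    """
    rng = random.Random(seed)
    num_segments = -(-num_tokens // segment_size)  # ceiling division

    # Randomized serialization: shuffle the order in which segments are predicted.
    order = list(range(num_segments))
    rng.shuffle(order)
    step_of_segment = {segment: step for step, segment in enumerate(order)}

    # step[t] is the prediction step at which token t is produced.
    step = [step_of_segment[t // segment_size] for t in range(num_tokens)]

    # can_see[i][j] is True when token j is context for predicting token i.
    # Tokens in the same segment share a step, so they never see each other
    # (parallel, BERT-like); earlier segments are visible (sequential, GPT-like).
    can_see = [[step[j] < step[i] for j in range(num_tokens)]
               for i in range(num_tokens)]
    return step, can_see


step, can_see = segment_visibility_mask(num_tokens=8, segment_size=2, seed=1)
for row in can_see:
    print("".join("x" if visible else "." for visible in row))
```

Each printed row shows which tokens are available as context when the row's token is predicted; tokens in the first serialized segment see nothing.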

6 citations

Posted Content

HDMapNet: An Online HD Map Construction and Evaluation Framework.

Qi Li, Yue Wang, Yilun Wang, Hang Zhao

13 Jul 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors propose HDMapNet, an online map learning method that dynamically constructs HD maps from local sensor observations, and argue that this is a more scalable way to provide semantic and geometry priors to self-driving vehicles than traditional pre-annotated HD maps.

Abstract: High-definition map (HD map) construction is a crucial problem for autonomous driving. This problem typically involves collecting high-quality point clouds, fusing multiple point clouds of the same scene, annotating map elements, and updating maps constantly. This pipeline, however, requires a vast amount of human efforts and resources which limits its scalability. Additionally, traditional HD maps are coupled with centimeter-level accurate localization which is unreliable in many scenarios. In this paper, we argue that online map learning, which dynamically constructs the HD maps based on local sensor observations, is a more scalable way to provide semantic and geometry priors to self-driving vehicles than traditional pre-annotated HD maps. Meanwhile, we introduce an online map learning method, titled HDMapNet. It encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on the nuScenes dataset and show that in all settings, it performs better than baseline methods. Of note, our fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. To accelerate future research, we develop customized metrics to evaluate map learning performance, including both semantic-level and instance-level ones. By introducing this method and metrics, we invite the community to study this novel map learning problem. We will release our code and evaluation kit to facilitate future development.

6 citations

Posted Content

LID 2020: The Learning from Imperfect Data Challenge Results.

Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, Liwei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc Van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He

17 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This report summarizes the LID 2020 challenge, whose goal is to find the state-of-the-art approaches in the weakly supervised learning setting for object detection, semantic segmentation, and scene parsing, and introduces the IoU curve, an evaluation metric from prior work (cited as zhang2020rethinking), to measure the quality of the generated object localization maps.

Abstract: Learning from imperfect data becomes an issue in many industrial applications after the research community has made profound progress in supervised learning from perfectly annotated datasets. The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate the research in developing novel approaches that would harness the imperfect data and improve the data-efficiency during training. A massive amount of user-generated data is nowadays available on multiple internet services. How to leverage those and improve the machine learning models is a high impact problem. We organize the challenges in conjunction with the workshop. The goal of these challenges is to find the state-of-the-art approaches in the weakly supervised learning setting for object detection, semantic segmentation, and scene parsing. There are three tracks in the challenge, i.e., weakly supervised semantic segmentation (Track 1), weakly supervised scene parsing (Track 2), and weakly supervised object localization (Track 3). In Track 1, based on ILSVRC DET, we provide pixel-level annotations of 15K images from 200 categories for evaluation. In Track 2, we provide point-based annotations for the training set of ADE20K. In Track 3, based on ILSVRC CLS-LOC, we provide pixel-level annotations of 44,271 images for evaluation. Besides, we further introduce a new evaluation metric proposed in prior work (cited as zhang2020rethinking), i.e., IoU curve, to measure the quality of the generated object localization maps. This technical report summarizes the highlights from the challenge. The challenge submission server and the leaderboard will remain open to interested researchers. More details regarding the challenge and the benchmarks are available online.

5 citations

Posted Content

On Feature Decorrelation in Self-Supervised Learning

Tianyu Hua¹, Wenxiao Wang, Zihui Xue², Sucheng Ren³, Yue Wang⁴, Hang Zhao⁵

Vanderbilt University¹, Fudan University², South China University of Technology³, Massachusetts Institute of Technology⁴, Tsinghua University⁵

02 May 2021-arXiv: Learning

TL;DR: In this article, the authors verify the existence of complete collapse and discover another reachable collapse pattern that is usually overlooked, namely dimensional collapse, and connect dimensional collapse with strong correlations between axes and consider such connection as a strong motivation for feature decorrelation.

Abstract: In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations. A potential issue of this idea is the existence of completely collapsed solutions (i.e., constant features), which are typically avoided implicitly by carefully chosen implementation details. In this work, we study a relatively concise framework containing the most common components from recent approaches. We verify the existence of complete collapse and discover another reachable collapse pattern that is usually overlooked, namely dimensional collapse. We connect dimensional collapse with strong correlations between axes and consider such connection as a strong motivation for feature decorrelation (i.e., standardizing the covariance matrix). The capability of correlation as an unsupervised metric and the gains from feature decorrelation are verified empirically to highlight the importance and the potential of this insight.
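
The link between dimensional collapse and correlated feature axes can be made concrete with a small numeric check. The sketch below is only illustrative: the metric (mean absolute off-diagonal correlation) and the function names are assumptions chosen for this example, not necessarily the measure used in the paper, and the code assumes every feature axis has non-zero variance.

```python
import math


def correlation_matrix(features):
    """Pearson correlation between feature axes; rows are samples.

    Assumes every axis has non-zero variance (no completely collapsed axis).
    """
    n, d = len(features), len(features[0])
    means = [sum(row[k] for row in features) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in features]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    std = [math.sqrt(cov[k][k]) for k in range(d)]
    # Standardizing the covariance matrix yields the correlation matrix.
    return [[cov[a][b] / (std[a] * std[b]) for b in range(d)] for a in range(d)]


def mean_offdiag_abs_correlation(features):
    """Hypothetical collapse indicator: 0 means decorrelated axes, 1 means fully correlated."""
    corr = correlation_matrix(features)
    d = len(corr)
    total = sum(abs(corr[a][b]) for a in range(d) for b in range(d) if a != b)
    return total / (d * (d - 1))


# Two axes that carry the same information: a dimensionally collapsed feature set.
collapsed = [[x, 2.0 * x] for x in (1.0, 2.0, 3.0, 4.0)]
# Two axes that vary independently.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

print(mean_offdiag_abs_correlation(collapsed))
print(mean_offdiag_abs_correlation(decorrelated))
```

A value near 1 indicates that the features occupy fewer effective dimensions than the embedding size, which is the situation feature decorrelation is meant to counteract.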

4 citations

Proceedings Article

Self-Supervised Segmentation and Source Separation on Videos

Andrew Rouditchenko¹, Hang Zhao², Chuang Gan³, Josh H. McDermott², Antonio Torralba²

Intel¹, Massachusetts Institute of Technology², IBM³

01 Jan 2019

4 citations

Cited by

Proceedings Article

Pyramid Scene Parsing Network

Hengshuang Zhao¹, Jianping Shi², Xiaojuan Qi¹, Xiaogang Wang¹, Jiaya Jia¹

The Chinese University of Hong Kong¹, SenseTime²

21 Jul 2017

TL;DR: This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.

Abstract: Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

10,189 citations

Journal Article

What Will 5G Be

Jeffrey G. Andrews¹, Stefano Buzzi², Wan Choi, Stephen V. Hanly³, Angel Lozano⁴, Anthony C. K. Soong⁵, Jianzhong Charlie Zhang⁶

University of Texas at Austin¹, University of Cassino², Macquarie University³, Pompeu Fabra University⁴, Huawei⁵, Samsung⁶

03 Jun 2014-IEEE Journal on Selected Areas in Communications

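TL;DR: This paper discusses the paradigm shift expected of 5G (very high carrier frequencies with massive bandwidths, extreme base station and device densities, unprecedented numbers of antennas, and integration with LTE and WiFi), identifying key challenges for future research and preliminary 5G standardization activities, while providing a comprehensive overview of the current literature, and in particular of the papers appearing in the associated special issue.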

Abstract: What will 5G be? What it will not be is an incremental advance on 4G. The previous four generations of cellular technology have each been a major paradigm shift that has broken backward compatibility. Indeed, 5G will need to be a paradigm shift that includes very high carrier frequencies with massive bandwidths, extreme base station and device densities, and unprecedented numbers of antennas. However, unlike the previous four generations, it will also be highly integrative: tying any new 5G air interface and spectrum together with LTE and WiFi to provide universal high-rate coverage and a seamless user experience. To support this, the core network will also have to reach unprecedented levels of flexibility and intelligence, spectrum regulation will need to be rethought and improved, and energy and cost efficiencies will become even more critical considerations. This paper discusses all of these topics, identifying key challenges for future research and preliminary 5G standardization activities, while providing a comprehensive overview of the current literature, and in particular of the papers appearing in this special issue.

7,139 citations

Book Chapter

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen¹, Yukun Zhu¹, George Papandreou¹, Florian Schroff¹, Hartwig Adam¹

Google¹

08 Sep 2018

TL;DR: This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.

Abstract: Spatial pyramid pooling modules or encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on PASCAL VOC 2012 and Cityscapes datasets, achieving the test set performance of 89% and 82.1% without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://github.com/tensorflow/models/tree/master/research/deeplab.

7,113 citations

Journal Article

Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!

Theodore S. Rappaport¹, Shu Sun¹, Rimma Mayzus¹, Hang Zhao¹, Yaniv Azar¹, Kevin H. Wang¹, George N. Wong¹, Jocelyn K. Schulz¹, Mathew K. Samimi¹, Felix Gutierrez¹

New York University¹

10 May 2013-IEEE Access

TL;DR: The motivation for new mm-wave cellular systems, methodology, and hardware for measurements are presented and a variety of measurement results are offered that show 28 and 38 GHz frequencies can be used when employing steerable directional antennas at base stations and mobile devices.

Abstract: The global bandwidth shortage facing wireless carriers has motivated the exploration of the underutilized millimeter wave (mm-wave) frequency spectrum for future broadband cellular communication networks. There is, however, little knowledge about cellular mm-wave propagation in densely populated indoor and outdoor environments. Obtaining this information is vital for the design and operation of future fifth generation cellular networks that use the mm-wave spectrum. In this paper, we present the motivation for new mm-wave cellular systems, methodology, and hardware for measurements and offer a variety of measurement results that show 28 and 38 GHz frequencies can be used when employing steerable directional antennas at base stations and mobile devices.

6,708 citations

Posted Content

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

17 Jun 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Abstract: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed 'DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
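
How the atrous rate adjusts a filter's field-of-view can be shown in one dimension. This is a minimal sketch under stated assumptions: it uses a 1-D signal, no padding, and hypothetical function names (`atrous_conv1d`, `field_of_view`); it is not the DeepLabv3 implementation, which operates on 2-D feature maps.

```python
def field_of_view(kernel_size, rate):
    """Number of input positions spanned by one filter application."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) correlation without padding.

    Kernel taps are applied `rate` positions apart, so the same number of
    weights covers a wider span of the input as `rate` grows.
    """
    span = (len(kernel) - 1) * rate
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

# Running the same filter at several rates, in the spirit of the
# "multiple atrous rates" described above.
for rate in (1, 2):
    print(rate, field_of_view(len(kernel), rate), atrous_conv1d(signal, kernel, rate))
```

With `rate=1` the three taps cover three adjacent samples; with `rate=2` the same three weights cover five samples, which is the sense in which the rate enlarges the field-of-view without adding parameters.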

5,691 citations
