scispace - formally typeset
Author

Hang Zhao

Other affiliations: Zhejiang University, Nvidia, New York University  ...read more
Bio: Hang Zhao is an academic researcher from Tsinghua University. The author has contributed to research in topic(s): Artificial neural network & Trajectory. The author has an hindex of 32, co-authored 83 publication(s) receiving 12696 citation(s). Previous affiliations of Hang Zhao include Zhejiang University & Nvidia.

Papers published on a yearly basis

Papers
More filters
Journal ArticleDOI
TL;DR: The motivation for new mm-wave cellular systems, methodology, and hardware for measurements are presented and a variety of measurement results are offered that show 28 and 38 GHz frequencies can be used when employing steerable directional antennas at base stations and mobile devices.
Abstract: The global bandwidth shortage facing wireless carriers has motivated the exploration of the underutilized millimeter wave (mm-wave) frequency spectrum for future broadband cellular communication networks. There is, however, little knowledge about cellular mm-wave propagation in densely populated indoor and outdoor environments. Obtaining this information is vital for the design and operation of future fifth generation cellular networks that use the mm-wave spectrum. In this paper, we present the motivation for new mm-wave cellular systems, methodology, and hardware for measurements and offer a variety of measurement results that show 28 and 38 GHz frequencies can be used when employing steerable directional antennas at base stations and mobile devices.

5,589 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: The ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, is introduced and it is shown that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis.
Abstract: Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the communitys efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A scene parsing benchmark is built upon the ADE20K with 150 object and stuff classes included. Several segmentation baseline models are evaluated on the benchmark. A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines. We further show that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis1.

1,243 citations

Journal ArticleDOI
TL;DR: It is shown that the quality of the results improves significantly with better loss functions, even when the network architecture is left unchanged, and a novel, differentiable error function is proposed.
Abstract: Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems. The impact of the loss layer of neural networks, however, has not received much attention in the context of image processing: the default and virtually only choice is $\ell _2$ . In this paper, we bring attention to alternative choices for image restoration. In particular, we show the importance of perceptually-motivated losses when the resulting image is to be evaluated by a human observer. We compare the performance of several losses, and propose a novel, differentiable error function. We show that the quality of the results improves significantly with better loss functions, even when the network architecture is left unchanged.

1,069 citations

Journal ArticleDOI
TL;DR: The ADE20K dataset as discussed by the authors contains 25k images of complex everyday scenes containing a variety of objects in their natural spatial context, on average there are 19.5 instances and 10.5 object classes per image.
Abstract: Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite efforts of the community in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. Totally there are 25k images of the complex everyday scenes containing a variety of objects in their natural spatial context. On average there are 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation. We provide baseline performances on both of the benchmarks and re-implement state-of-the-art models for open source. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for the semantic segmentation performance. We show that the networks trained on ADE20K are able to segment a wide variety of scenes and objects.

402 citations

Posted Content
TL;DR: This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.
Abstract: Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A generic network design called Cascade Segmentation Module is then proposed to enable the segmentation networks to parse a scene into stuff, objects, and object parts in a cascade. We evaluate the proposed module integrated within two existing semantic segmentation networks, yielding significant improvements for scene parsing. We further show that the scene parsing networks trained on ADE20K can be applied to a wide variety of scenes and objects.

399 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This paper discusses all of these topics, identifying key challenges for future research and preliminary 5G standardization activities, while providing a comprehensive overview of the current literature, and in particular of the papers appearing in this special issue.
Abstract: What will 5G be? What it will not be is an incremental advance on 4G. The previous four generations of cellular technology have each been a major paradigm shift that has broken backward compatibility. Indeed, 5G will need to be a paradigm shift that includes very high carrier frequencies with massive bandwidths, extreme base station and device densities, and unprecedented numbers of antennas. However, unlike the previous four generations, it will also be highly integrative: tying any new 5G air interface and spectrum together with LTE and WiFi to provide universal high-rate coverage and a seamless user experience. To support this, the core network will also have to reach unprecedented levels of flexibility and intelligence, spectrum regulation will need to be rethought and improved, and energy and cost efficiencies will become even more critical considerations. This paper discusses all of these topics, identifying key challenges for future research and preliminary 5G standardization activities, while providing a comprehensive overview of the current literature, and in particular of the papers appearing in this special issue.

6,462 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.
Abstract: Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

5,721 citations

Journal ArticleDOI
TL;DR: The motivation for new mm-wave cellular systems, methodology, and hardware for measurements are presented and a variety of measurement results are offered that show 28 and 38 GHz frequencies can be used when employing steerable directional antennas at base stations and mobile devices.
Abstract: The global bandwidth shortage facing wireless carriers has motivated the exploration of the underutilized millimeter wave (mm-wave) frequency spectrum for future broadband cellular communication networks. There is, however, little knowledge about cellular mm-wave propagation in densely populated indoor and outdoor environments. Obtaining this information is vital for the design and operation of future fifth generation cellular networks that use the mm-wave spectrum. In this paper, we present the motivation for new mm-wave cellular systems, methodology, and hardware for measurements and offer a variety of measurement results that show 28 and 38 GHz frequencies can be used when employing steerable directional antennas at base stations and mobile devices.

5,589 citations

Posted Content
TL;DR: The proposed `DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
Abstract: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed `DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

3,664 citations