Showing papers by "Jian Sun" published in 2019

Proceedings Article•DOI•

DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

Hanchao Li¹, Pengfei Xiong, Haoqiang Fan, Jian Sun

03 Apr 2019

TL;DR: This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints that substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance.

Abstract: This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of DFANet with 8× fewer FLOPs and 2× faster inference than the existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% Mean IOU on the Cityscapes test dataset with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% Mean IOU with 3.4 GFLOPs while inferring on a higher resolution image.
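
The Mean IOU figures quoted in this abstract follow the usual intersection-over-union definition. As a rough illustration only (this is not the Cityscapes evaluation code, which accumulates counts over the whole test set), a minimal Python sketch for flat label lists might look like the following. The function name `mean_iou` and the toy labels are hypothetical and chosen just for this example.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class intersection-over-union for two flat label lists.

    Classes that appear in neither list are skipped (an assumption of this
    sketch; benchmark tools may handle absent classes differently).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.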

409 citations

Proceedings Article•DOI•

Objects365: A Large-Scale, High-Quality Dataset for Object Detection

Shuai Shao, Zeming Li, Tianyuan Zhang¹, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, Jian Sun

Peking University¹

01 Oct 2019

TL;DR: Objects365 can serve as a better feature learning dataset for localization-sensitive tasks like object detection and semantic segmentation, and the better generalization ability of Objects365 has been verified on CityPersons, VOC segmentation, and ADE tasks.

Abstract: In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K training images. More than 10 million high-quality bounding boxes are manually labeled through a three-step, carefully designed annotation pipeline. It is the largest object detection dataset (with full annotation) so far and establishes a more challenging benchmark for the community. Objects365 can serve as a better feature learning dataset for localization-sensitive tasks like object detection and semantic segmentation. The Objects365 pre-trained models significantly outperform ImageNet pre-trained models with a 5.6-point gain (42 vs 36.4) based on the standard setting of 90K iterations on the COCO benchmark. Even compared with a much longer training schedule of 540K iterations, our Objects365 pre-trained model with 90K iterations still has a 2.7-point gain (42 vs 39.3). Meanwhile, the finetuning time can be greatly reduced (up to 10 times) when reaching the same accuracy. The better generalization ability of Objects365 has also been verified on CityPersons, VOC segmentation, and ADE tasks. The dataset as well as the pre-trained models have been released at www.objects365.org.

331 citations

Posted Content•

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning

Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Tim Cheng, Jian Sun

25 Mar 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a meta learning approach for channel pruning of deep neural networks is proposed, where the weights are directly generated by the trained PruningNet and do not need any finetuning at search time.

Abstract: In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. We use a simple stochastic structure sampling method for training the PruningNet. Then, we apply an evolutionary procedure to search for good-performing pruned networks. The search is highly efficient because the weights are directly generated by the trained PruningNet and we do not need any finetuning at search time. With a single PruningNet trained for the target network, we can search for various Pruned Networks under different constraints with little human participation. Compared to the state-of-the-art pruning methods, we have demonstrated superior performances on MobileNet V1/V2 and ResNet. Codes are available on this https URL.
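
The evolutionary search step described in this abstract can be sketched in a few lines. The sketch below is only an illustration under strong assumptions: the channel options, the cost proxy, the budget, and especially `toy_score` are hypothetical stand-ins. In the paper's setting, a candidate would instead be scored by evaluating the pruned network with weights generated by the trained PruningNet.

```python
import random

CHOICES = [8, 16, 24, 32]   # hypothetical per-layer channel options
NUM_LAYERS = 4              # hypothetical network depth
BUDGET = 1200               # hypothetical limit on the cost proxy below


def cost(widths):
    """Crude FLOPs proxy: sum of products of consecutive layer widths."""
    return sum(a * b for a, b in zip(widths, widths[1:]))


def feasible(widths):
    """A candidate is kept only if it satisfies the resource constraint."""
    return cost(widths) <= BUDGET


def toy_score(widths):
    """Stand-in fitness; NOT the PruningNet-based accuracy evaluation."""
    return sum(w ** 0.5 for w in widths)


def evolve(rng, generations=20, pop_size=20, top_k=5):
    """Evolutionary search over channel widths under the cost budget."""
    population = []
    while len(population) < pop_size:
        cand = [rng.choice(CHOICES) for _ in range(NUM_LAYERS)]
        if feasible(cand):
            population.append(cand)
    for _ in range(generations):
        # Keep the best candidates and refill the population from them.
        parents = sorted(population, key=toy_score, reverse=True)[:top_k]
        children = [list(p) for p in parents]
        while len(children) < pop_size:
            if rng.random() < 0.5:
                # Mutation: resample the width of one random layer.
                child = list(rng.choice(parents))
                child[rng.randrange(NUM_LAYERS)] = rng.choice(CHOICES)
            else:
                # Crossover: pick each layer width from one of two parents.
                a, b = rng.choice(parents), rng.choice(parents)
                child = [rng.choice(pair) for pair in zip(a, b)]
            if feasible(child):
                children.append(child)
        population = children
    return max(population, key=toy_score)


best = evolve(random.Random(0))
```

Because no finetuning is needed to score a candidate, each generation in such a loop only costs one evaluation per child, which is the efficiency argument the abstract makes.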

291 citations

Proceedings Article•DOI•

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning

Zechun Liu¹, Haoyuan Mu², Xiangyu Zhang, Zichao Guo, Xin Yang³, Kwang-Ting Cheng¹, Jian Sun

Hong Kong University of Science and Technology¹, Tsinghua University², Huazhong University of Science and Technology³

01 Oct 2019

TL;DR: A novel meta learning approach for automatic channel pruning of very deep neural networks by training a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network.

286 citations

Proceedings Article•DOI•

ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices

Zheng Qin¹, Zeming Li, Zhaoning Zhang¹, Yiping Bao, Gang Yu, Yuxing Peng¹, Jian Sun

National University of Defense Technology¹

01 Oct 2019

TL;DR: Benefiting from the highly efficient backbone and detection part design, ThunderNet surpasses previous lightweight one-stage detectors with only 40% of the computational cost on PASCAL VOC and COCO benchmarks.

Abstract: Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. Prior lightweight CNN-based detectors are inclined to use a one-stage pipeline. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. Finally, we investigate the balance between the input resolution, the backbone, and the detection head. Benefiting from the highly efficient backbone and detection part design, ThunderNet surpasses previous lightweight one-stage detectors with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, ThunderNet runs at 24.1 fps on an ARM-based device with 19.2 AP on COCO. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.

179 citations

Journal Article•DOI•

Optimizing a Parameterized Plug-and-Play ADMM for Iterative Low-Dose CT Reconstruction

Ji He¹, Yan Yang², Yongbo Wang¹, Dong Zeng¹, Zhaoying Bian¹, Hao Zhang³, Jian Sun², Zongben Xu², Jianhua Ma¹

Southern Medical University¹, Xi'an Jiaotong University², Johns Hopkins University³

01 Feb 2019-IEEE Transactions on Medical Imaging

TL;DR: Experimental results obtained on clinical patient datasets demonstrate that the proposed deep learning-based strategy for MBIR can achieve promising gains over existing algorithms for LdCT image reconstruction in terms of noise-induced artifact suppression and edge detail preservation.

Abstract: Reducing the exposure to X-ray radiation while maintaining a clinically acceptable image quality is desirable in various CT applications. To realize low-dose CT (LdCT) imaging, model-based iterative reconstruction (MBIR) algorithms are widely adopted, but they require proper prior knowledge assumptions in the sinogram and/or image domains and involve tedious manual optimization of multiple parameters. In this paper, we propose a deep learning (DL)-based strategy for MBIR to simultaneously address prior knowledge design and MBIR parameter selection in one optimization framework. Specifically, a parameterized plug-and-play alternating direction method of multipliers (3pADMM) is proposed for the general penalized weighted least-squares model, and then, by adopting the basic idea of DL, the parameterized plug-and-play (3p) prior and the related parameters are optimized simultaneously in a single framework using a large number of training data. The main contribution of this paper is that the 3p prior and the related parameters in the proposed 3pADMM framework can be supervised and optimized simultaneously to achieve robust LdCT reconstruction performance. Experimental results obtained on clinical patient datasets demonstrate that the proposed method can achieve promising gains over existing algorithms for LdCT image reconstruction in terms of noise-induced artifact suppression and edge detail preservation.

137 citations

Posted Content•

DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

Hanchao Li¹, Pengfei Xiong, Haoqiang Fan, Jian Sun

Beijing Institute of Technology¹

03 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: DFANet is an efficient CNN architecture based on multi-scale feature propagation that substantially reduces the number of parameters while still obtaining a sufficient receptive field and enhancing the model's learning ability.

118 citations

Posted Content•

DetNAS: Neural Architecture Search on Object Detection

Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Chunhong Pan, Jian Sun

26 Mar 2019

TL;DR: This paper proposes DetNAS to automatically search neural architectures for the backbones of object detectors; the search space is formulated as a supernet and the search method relies on an evolutionary algorithm (EA).

Abstract: Object detectors are usually equipped with networks designed for image classification as backbones, e.g., ResNet. Although it is publicly known that there is a gap between the task of image classification and object detection, designing a suitable detector backbone is still manually exhaustive. In this paper, we propose DetNAS to automatically search neural architectures for the backbones of object detectors. In DetNAS, the search space is formulated as a supernet and the search method relies on an evolutionary algorithm (EA). In experiments, we show the effectiveness of DetNAS on various detectors: the one-stage detector RetinaNet and the two-stage detector FPN. For each case, we search in both the training-from-scratch scheme and the ImageNet pre-training scheme. There is a consistent superiority compared to the architectures searched on ImageNet classification. Our main result architecture achieves better performance than ResNet-101 on COCO with the FPN detector. In addition, we illustrate the architectures searched by DetNAS and find some meaningful patterns.

112 citations

Journal Article•DOI•

A Graph-Based Semisupervised Deep Learning Model for PolSAR Image Classification

Haixia Bi¹, Jian Sun¹, Zongben Xu¹

Xi'an Jiaotong University¹

01 Apr 2019-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: A graph-based semisupervised deep learning model for PolSAR image classification that enforces the category label constraints on the human-labeled pixels and encourages class label smoothness and the alignment of class label boundaries with the image edges.

Abstract: Aiming at improving the classification accuracy with limited numbers of labeled pixels in polarimetric synthetic aperture radar (PolSAR) image classification task, this paper presents a graph-based semisupervised deep learning model for PolSAR image classification. It models the PolSAR image as an undirected graph, where the nodes correspond to the labeled and unlabeled pixels, and the weighted edges represent similarities between the pixels. Upon the graph, we design an energy function incorporating a semisupervision term, a convolutional neural network (CNN) term, and a pairwise smoothness term. The employed CNN extracts abstract and data-driven polarimetric features and outputs class label predictions to the graph model. The semisupervision term enforces the category label constraints on the human-labeled pixels. The pairwise smoothness term encourages class label smoothness and the alignment of class label boundaries with the image edges. Starting from an initialized class label map generated based on the K-Wishart distribution hypothesis or superpixel segmentation of PauliRGB images, we iteratively and alternately optimize the defined energy function until it converges. We conducted experiments on two real benchmark PolSAR images, and extensive experiments demonstrated that our approach achieved the state-of-the-art results for PolSAR image classification.

102 citations

Proceedings Article•DOI•

Disentangled Image Matting

Shaofan Cai, Xiaoshuai Zhang¹, Haoqiang Fan, Haibin Huang, Jiangyu Liu, Jiaming Liu, Jiaying Liu¹, Jue Wang, Jian Sun

Peking University¹

01 Oct 2019

TL;DR: AdaMatting disentangles the matting problem into two sub-tasks: trimap adaptation, a pixel-wise classification problem that infers the global structure of the input image, and alpha estimation.

Abstract: Most previous image matting methods require a roughly specified trimap as input, and estimate fractional alpha values for all pixels that are in the unknown region of the trimap. In this paper, we argue that directly estimating the alpha matte from a coarse trimap is a major limitation of previous methods, as this practice tries to address two difficult and inherently different problems at the same time: identifying true blending pixels inside the trimap region, and estimating accurate alpha values for them. We propose AdaMatting, a new end-to-end matting framework that disentangles this problem into two sub-tasks: trimap adaptation and alpha estimation. Trimap adaptation is a pixel-wise classification problem that infers the global structure of the input image by identifying definite foreground, background, and semi-transparent image regions. Alpha estimation is a regression problem that calculates the opacity value of each blended pixel. Our method separately handles these two sub-tasks within a single deep convolutional neural network (CNN). Extensive experiments show that AdaMatting has additional structure awareness and trimap fault-tolerance. Our method achieves the state-of-the-art performance on Adobe Composition-1k dataset both qualitatively and quantitatively. It is also the current best-performing method on the alphamatting.com online evaluation for all commonly-used metrics.

77 citations

Posted Content•

Disentangled Image Matting

Shaofan Cai, Xiaoshuai Zhang¹, Haoqiang Fan, Haibin Huang, Jiangyu Liu, Jiaming Liu, Jiaying Liu¹, Jue Wang, Jian Sun

Peking University¹

10 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes AdaMatting, a new end-to-end matting framework that disentangles the matting problem into two sub-tasks, trimap adaptation and alpha estimation, and achieves the state-of-the-art performance on Adobe Composition-1k dataset both qualitatively and quantitatively.

Abstract: Most previous image matting methods require a roughly specified trimap as input, and estimate fractional alpha values for all pixels that are in the unknown region of the trimap. In this paper, we argue that directly estimating the alpha matte from a coarse trimap is a major limitation of previous methods, as this practice tries to address two difficult and inherently different problems at the same time: identifying true blending pixels inside the trimap region, and estimating accurate alpha values for them. We propose AdaMatting, a new end-to-end matting framework that disentangles this problem into two sub-tasks: trimap adaptation and alpha estimation. Trimap adaptation is a pixel-wise classification problem that infers the global structure of the input image by identifying definite foreground, background, and semi-transparent image regions. Alpha estimation is a regression problem that calculates the opacity value of each blended pixel. Our method separately handles these two sub-tasks within a single deep convolutional neural network (CNN). Extensive experiments show that AdaMatting has additional structure awareness and trimap fault-tolerance. Our method achieves the state-of-the-art performance on Adobe Composition-1k dataset both qualitatively and quantitatively. It is also the current best-performing method on the this http URL online evaluation for all commonly-used metrics.

Posted Content•

ThunderNet: Towards Real-time Generic Object Detection

Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun

28 Mar 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper investigates the effectiveness of two-stage detectors in real-time generic detection and proposes a lightweight two-stage detector named ThunderNet, which achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks.

Abstract: Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. Finally, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.

Posted Content•

Content-Aware Unsupervised Deep Homography Estimation

Jirong Zhang¹, Chuan Wang, Shuaicheng Liu¹, Lanpeng Jia, Nianjin Ye, Jue Wang, Ji Zhou¹, Jian Sun

University of Electronic Science and Technology of China¹

12 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes an unsupervised deep homography method with a new architecture design that outperforms the state-of-the-art including deep solutions and feature-based solutions.

Abstract: Homography estimation is a basic image alignment method in many applications. It is usually conducted by extracting and matching sparse feature points, which are error-prone in low-light and low-texture images. On the other hand, previous deep homography approaches use either synthetic images for supervised learning or aerial images for unsupervised learning, both ignoring the importance of handling depth disparities and moving objects in real world applications. To overcome these problems, in this work we propose an unsupervised deep homography method with a new architecture design. In the spirit of the RANSAC procedure in traditional methods, we specifically learn an outlier mask to only select reliable regions for homography estimation. We calculate loss with respect to our learned deep features instead of directly comparing image content as was done previously. To achieve the unsupervised training, we also formulate a novel triplet loss customized for our network. We verify our method by conducting comprehensive comparisons on a new dataset that covers a wide range of scenes with varying degrees of difficulty for the task. Experimental results reveal that our method outperforms the state-of-the-art including deep solutions and feature-based solutions.

Journal Article•DOI•

HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Shipeng Wang¹, Jian Sun¹, Zongben Xu¹

Xi'an Jiaotong University¹

17 Jul 2019

TL;DR: HyperAdam combines the idea of learning to optimize with the traditional Adam optimizer and is reported to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs, and LSTMs.

Abstract: Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of “learning to optimize” and the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is justified to be state-of-the-art for various network training, such as multilayer perceptron, CNN and LSTM.
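
The core idea in this abstract, a parameter update formed as a weighted combination of several Adam updates with different decay rates, can be illustrated with a small sketch. This is not the HyperAdam implementation: in the paper the combination weights and decay rates are produced by a learned recurrent network, whereas here they are fixed constants. The function names, the use of one decay rate for both moments, and all numeric values are assumptions made for the example.

```python
import math


def combined_adam_step(param, grad, states, betas, weights, step, lr=0.1, eps=1e-8):
    """Update a scalar parameter with a weighted mix of Adam directions.

    states[i] holds the (first moment, second moment) pair for betas[i].
    Using one decay rate per candidate for both moments is a simplification.
    """
    update = 0.0
    for i, (beta, weight) in enumerate(zip(betas, weights)):
        m, v = states[i]
        m = beta * m + (1.0 - beta) * grad          # first-moment estimate
        v = beta * v + (1.0 - beta) * grad * grad   # second-moment estimate
        states[i] = (m, v)
        correction = 1.0 - beta ** step             # Adam bias correction
        m_hat, v_hat = m / correction, v / correction
        update += weight * m_hat / (math.sqrt(v_hat) + eps)
    return param - lr * update


def minimize_quadratic(x0=5.0, steps=200):
    """Minimize f(x) = x^2 with fixed (not learned) combination weights."""
    betas = [0.5, 0.9, 0.99]      # hypothetical decay rates
    weights = [0.2, 0.5, 0.3]     # hypothetical weights, summing to 1
    states = [(0.0, 0.0) for _ in betas]
    x = x0
    trajectory = [x]
    for step in range(1, steps + 1):
        grad = 2.0 * x            # gradient of x^2
        x = combined_adam_step(x, grad, states, betas, weights, step)
        trajectory.append(x)
    return trajectory


trajectory = minimize_quadratic()
```

Because each bias-corrected candidate direction has magnitude at most 1 and the weights sum to 1, every step in this sketch moves the parameter by at most `lr`.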

Posted Content•

PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

Yisheng He¹, Wei Sun, Haibin Huang, Jianran Liu, Haoqiang Fan, Jian Sun

Hong Kong University of Science and Technology¹

11 Nov 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: A deep Hough voting network is proposed to detect 3D keypoints of objects and then estimate the 6D pose parameters within a least-squares fitting manner, which is a natural extension of 2D-keypoint approaches that successfully work on RGB based 6DoF estimation.

Abstract: In this work, we present a novel data-driven method for robust 6DoF object pose estimation from a single RGBD image. Unlike previous methods that directly regress pose parameters, we tackle this challenging task with a keypoint-based approach. Specifically, we propose a deep Hough voting network to detect 3D keypoints of objects and then estimate the 6D pose parameters within a least-squares fitting manner. Our method is a natural extension of 2D-keypoint approaches that successfully work on RGB based 6DoF estimation. It allows us to fully utilize the geometric constraint of rigid objects with the extra depth information and is easy for a network to learn and optimize. Extensive experiments were conducted to demonstrate the effectiveness of 3D-keypoint detection in the 6D pose estimation task. Experimental results also show our method outperforms the state-of-the-art methods by large margins on several benchmarks. Code and video are available at this https URL.
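
The least-squares fitting step mentioned in this abstract can be written, in its standard rigid-alignment form, as the objective below. The notation is an assumption introduced for illustration: $p_i$ are the $M$ keypoints in the object coordinate frame, $\hat{p}_i$ are the corresponding detected 3D keypoints in the camera frame, and $(R, t)$ is the rotation and translation being estimated. The abstract does not give the exact formulation used by the authors.

```latex
\[
(R^{*}, t^{*}) \;=\; \operatorname*{arg\,min}_{R \in SO(3),\; t \in \mathbb{R}^{3}}
\;\sum_{i=1}^{M} \bigl\lVert R\,p_i + t - \hat{p}_i \bigr\rVert_2^{2}
\]
```

A problem of this form is commonly solved in closed form with an SVD of the cross-covariance of the centered point sets.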

Book Chapter•DOI•

A Prior Learning Network for Joint Image and Sensitivity Estimation in Parallel MR Imaging

Nan Meng¹, Yan Yang¹, Zongben Xu¹, Jian Sun¹

Xi'an Jiaotong University¹

13 Oct 2019

TL;DR: A novel deep network is proposed, dubbed as Blind-PMRI-Net, to simultaneously reconstruct the MR image and sensitivity maps in a blind setting for parallel imaging, which naturally combines the physical constraint of parallel imaging and prior learning in a single deep architecture.

Abstract: Parallel imaging is a fast magnetic resonance imaging technique through spatial sensitivity coding using multi-coils. To reconstruct a high quality MR image from under-sampled k-space data, we propose a novel deep network, dubbed Blind-PMRI-Net, to simultaneously reconstruct the MR image and sensitivity maps in a blind setting for parallel imaging. The Blind-PMRI-Net is a novel deep architecture inspired by the iterative algorithm optimizing a novel energy model for joint image and sensitivity estimation based on image and sensitivity priors. The network is designed to be able to automatically learn these two priors by learning their corresponding proximal operators using convolutional neural networks. Blind-PMRI-Net naturally combines the physical constraint of parallel imaging and prior learning in a single deep architecture. Experiments on a knee MRI dataset show that our network can effectively reconstruct MR images with higher accuracy than previous methods, with fast computational speed. For example, Blind-PMRI-Net takes 0.72 s on GPU to reconstruct 15-channel sensitivity maps and a complex-valued MR image of size 320 × 320.

Proceedings Article•

Neural Diffusion Distance for Image Segmentation

Jian Sun¹, Zongben Xu²

Tsinghua University¹, Xi'an Jiaotong University²

01 Jan 2019

TL;DR: With the learned diffusion distance, a hierarchical image segmentation method outperforming previous segmentation methods is proposed, and a weakly supervised semantic segmentation network using diffusion distance achieves promising results on the PASCAL VOC 2012 segmentation dataset.

Abstract: Diffusion distance is a spectral method for measuring distance among nodes on a graph considering global data structure. In this work, we propose a spec-diff-net for computing diffusion distance on a graph based on approximate spectral decomposition. The network is a differentiable deep architecture consisting of feature extraction and diffusion distance modules for computing diffusion distance on an image by end-to-end training. We design a low-resolution kernel matching loss and a high-resolution segment matching loss to enforce the network's output to be consistent with human-labeled image segments. To compute high-resolution diffusion distance or segmentation mask, we design an up-sampling strategy by feature-attentional interpolation which can be learned when training spec-diff-net. With the learned diffusion distance, we propose a hierarchical image segmentation method outperforming previous segmentation methods. Moreover, a weakly supervised semantic segmentation network is designed using diffusion distance and achieves promising results on the PASCAL VOC 2012 segmentation dataset.
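
For readers unfamiliar with diffusion distance, the classical random-walk definition can be computed directly on a small graph. The sketch below uses that textbook definition (differences between t-step random-walk distributions, weighted by the inverse stationary distribution). It is not the spec-diff-net of the paper, which approximates the spectral decomposition inside a trainable network. The graph, function names and choice of t are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_distance(weights, x, y, t):
    """Diffusion distance between nodes x and y after t random-walk steps."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    total = sum(degree)
    # Row-normalized transition matrix of the random walk on the graph.
    transition = [[weights[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # t-step transition probabilities, starting from the identity matrix.
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        walk = mat_mul(walk, transition)
    # Stationary distribution of the walk is proportional to node degree.
    stationary = [d / total for d in degree]
    squared = sum((walk[x][z] - walk[y][z]) ** 2 / stationary[z] for z in range(n))
    return math.sqrt(squared)


# Two triangles (nodes 0-2 and 3-5) joined by one weak edge between 2 and 3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
within = diffusion_distance(W, 0, 1, t=3)   # same cluster
across = diffusion_distance(W, 0, 4, t=3)   # different clusters
```

Nodes in the same tightly connected cluster end up much closer than nodes separated by the weak edge, which is the property that makes the distance useful for grouping pixels into segments.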

Posted Content•

HRGE-Net: Hierarchical Relational Graph Embedding Network for Multi-view 3D Shape Recognition

Xin Wei¹, Ruixuan Yu¹, Jian Sun¹

Xi'an Jiaotong University¹

27 Aug 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work builds a hierarchical relational graph embedding network (HRGE-Net) to aggregate the multi-view features extracted from 2D images to be a global shape descriptor and proposes a novel feature aggregation network by fully investigating the relations among views.

Abstract: The view-based approach, which recognizes a 3D shape through its projected 2D images, has achieved state-of-the-art performance for 3D shape recognition. One essential challenge for the view-based approach is how to aggregate the multi-view features extracted from 2D images into a global 3D shape descriptor. In this work, we propose a novel feature aggregation network by fully investigating the relations among views. We construct a relational graph with multi-view images as nodes, and design relational graph embedding by modeling pairwise and neighboring relations among views. By gradually coarsening the graph, we build a hierarchical relational graph embedding network (HRGE-Net) to aggregate the multi-view features into a global shape descriptor. Extensive experiments show that HRGE-Net achieves state-of-the-art performance for 3D shape classification and retrieval on benchmark datasets.
