Showing papers by "Jian Sun" published in 2018

Open Access

Proceedings Article•DOI•

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

18 Jun 2018

TL;DR: ShuffleNet uses two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy, and achieves an actual speedup of about 13× over AlexNet on an ARM-based mobile device at comparable accuracy.

Abstract: We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet [12] on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13× actual speedup over AlexNet while maintaining comparable accuracy.
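
The abstract above names channel shuffle and pointwise group convolution but does not define them. The following is a minimal sketch, assuming the commonly described reshape-transpose-flatten form of channel shuffle and the usual multiply-accumulate count for a grouped 1x1 convolution. The function names `channel_shuffle` and `pointwise_macs` are hypothetical and are not taken from the authors' code.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each output group draws from every input group.

    Assumption: channels are laid out group by group, as a grouped
    convolution would produce them.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View the list as a (groups, per_group) grid, transpose it, and flatten:
    # the channel at index g * per_group + k moves to index k * groups + g.
    return [channels[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]


def pointwise_macs(c_in, c_out, height, width, groups=1):
    """Multiply-accumulates of a 1x1 convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups inputs to c_out/groups outputs per pixel.
    return height * width * (c_in // groups) * (c_out // groups) * groups


shuffled = channel_shuffle(list(range(6)), groups=3)
print(shuffled)  # [0, 2, 4, 1, 3, 5]
```

With `groups=3`, the grouped 1x1 convolution in this sketch costs one third of the ungrouped one, and the shuffle is what lets information cross group boundaries in the next layer.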

4,503 citations

Book Chapter•DOI•

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng¹, Jian Sun•Institutions (1)

Tsinghua University¹

08 Sep 2018

TL;DR: This work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs, and, based on a series of controlled experiments, derives several practical guidelines for efficient network design that lead to a new architecture, ShuffleNet V2.

Abstract: Currently, neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.

3,393 citations

Posted Content•

CrowdHuman: A Benchmark for Detecting Human in a Crowd

Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun

30 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The cross-dataset generalization results of the CrowdHuman dataset demonstrate state-of-the-art performance on previous datasets, including Caltech-USA, CityPersons, and Brainwash, without bells and whistles.

Abstract: Human detection has witnessed impressive progress in recent years. However, the occlusion issue of detecting humans in highly crowded environments is far from solved. To make matters worse, crowd scenarios are still under-represented in current human detection benchmarks. In this paper, we introduce a new dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated, and highly diverse. There are a total of 470K human instances in the train and validation subsets, and ~22.6 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, a human visible-region bounding-box, and a human full-body bounding-box. Baseline performance of state-of-the-art detection frameworks on CrowdHuman is presented. The cross-dataset generalization results of the CrowdHuman dataset demonstrate state-of-the-art performance on previous datasets, including Caltech-USA, CityPersons, and Brainwash, without bells and whistles. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks.

386 citations

Book Chapter•DOI•

ExFuse: Enhancing Feature Fusion for Semantic Segmentation

Zhenli Zhang¹, Xiangyu Zhang, Chao Peng, Xiangyang Xue¹, Jian Sun•Institutions (1)

Fudan University¹

08 Sep 2018

TL;DR: A new framework, named ExFuse, is proposed to bridge the gap between low-level and high-level features and significantly improve segmentation quality, outperforming previous state-of-the-art results on PASCAL VOC 2012.

Abstract: Modern semantic segmentation frameworks usually combine low-level and high-level features from pre-trained backbone convolutional models to boost performance. In this paper, we first point out that a simple fusion of low-level and high-level features could be less effective because of the gap in semantic levels and spatial resolution. We find that introducing semantic information into low-level features and high-resolution details into high-level features is more effective for the later fusion. Based on this observation, we propose a new framework, named ExFuse, to bridge the gap between low-level and high-level features, which significantly improves the segmentation quality by 4.0% in total. Furthermore, we evaluate our approach on the challenging PASCAL VOC 2012 segmentation benchmark and achieve 87.9% mean IoU, which outperforms the previous state-of-the-art results.

349 citations

Posted Content•

ExFuse: Enhancing Feature Fusion for Semantic Segmentation

Zhenli Zhang, Xiangyu Zhang, Chao Peng, Dazhi Cheng, Jian Sun

11 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new framework, named ExFuse, is proposed to bridge the gap between low-level and high-level features and improve the segmentation quality.

Abstract: Modern semantic segmentation frameworks usually combine low-level and high-level features from pre-trained backbone convolutional models to boost performance. In this paper, we first point out that a simple fusion of low-level and high-level features could be less effective because of the gap in semantic levels and spatial resolution. We find that introducing semantic information into low-level features and high-resolution details into high-level features is more effective for the later fusion. Based on this observation, we propose a new framework, named ExFuse, to bridge the gap between low-level and high-level features, which significantly improves the segmentation quality by 4.0% in total. Furthermore, we evaluate our approach on the challenging PASCAL VOC 2012 segmentation benchmark and achieve 87.9% mean IoU, which outperforms the previous state-of-the-art results.

296 citations

Posted Content•

DetNet: A Backbone network for Object Detection.

Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

17 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: State-of-the-art results are obtained for both object detection and instance segmentation on the MSCOCO benchmark with the DetNet (4.8G FLOPs) backbone.

Abstract: Recent CNN-based object detectors, whether one-stage methods like YOLO, SSD, and RetinaNet or two-stage detectors like Faster R-CNN, R-FCN, and FPN, usually fine-tune directly from ImageNet pre-trained models designed for image classification. There has been little work discussing a backbone feature extractor specifically designed for object detection. More importantly, there are several differences between the tasks of image classification and object detection. (1) Recent object detectors like FPN and RetinaNet usually involve extra stages, compared with image classification, to handle objects at various scales. (2) Object detection needs not only to recognize the category of the object instances but also to spatially locate them. A large downsampling factor brings a large valid receptive field, which is good for image classification but compromises the ability to localize objects. Due to this gap between image classification and object detection, we propose DetNet in this paper, a novel backbone network specifically designed for object detection. Moreover, DetNet includes extra stages compared with traditional backbone networks for image classification, while maintaining high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet (4.8G FLOPs) backbone. The code will be released for reproduction.

238 citations

Book Chapter•DOI•

Proximal Dehaze-Net: A Prior Learning-Based Deep Network for Single Image Dehazing

Dong Yang¹, Jian Sun¹•Institutions (1)

Xi'an Jiaotong University¹

08 Sep 2018

TL;DR: A novel deep learning approach for single image dehazing by learning dark channel and transmission priors and incorporating haze-related prior learning into deep network is proposed.

Abstract: Photos taken in hazy weather are usually covered with white masks and often lose important details. In this paper, we propose a novel deep learning approach for single image dehazing by learning dark channel and transmission priors. First, we build an energy model for dehazing using dark channel and transmission priors and design an iterative optimization algorithm using proximal operators for these two priors. Second, we unfold the iterative algorithm to be a deep network, dubbed as proximal dehaze-net, by learning the proximal operators using convolutional neural networks. Our network combines the advantages of traditional prior-based dehazing methods and deep learning methods by incorporating haze-related prior learning into deep network. Experiments show that our method achieves state-of-the-art performance for single image dehazing.

234 citations

Book Chapter•DOI•

DetNet: Design Backbone for Object Detection

Zeming Li¹, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng¹, Jian Sun•Institutions (1)

Tsinghua University¹

08 Sep 2018

TL;DR: DetNet is proposed, a novel backbone network specifically designed for object detection that includes extra stages compared with traditional backbone networks for image classification, while maintaining high spatial resolution in deeper layers.

Abstract: Recent CNN-based object detectors, either one-stage methods like YOLO, SSD, and RetinaNet, or two-stage detectors like Faster R-CNN, R-FCN, and FPN, usually fine-tune directly from ImageNet pre-trained models designed for the task of image classification. However, there has been little work discussing a backbone feature extractor specifically designed for the task of object detection. More importantly, there are several differences between the tasks of image classification and object detection. (i) Recent object detectors like FPN and RetinaNet usually involve extra stages, compared with image classification, to handle objects at various scales. (ii) Object detection needs not only to recognize the category of the object instances but also to spatially locate them. Large downsampling factors bring a large valid receptive field, which is good for image classification but compromises the ability to localize objects. Due to this gap between image classification and object detection, we propose DetNet in this paper, a novel backbone network specifically designed for object detection. Moreover, DetNet includes extra stages compared with traditional backbone networks for image classification, while maintaining high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet (4.8G FLOPs) backbone. Codes will be released (https://github.com/zengarden/DetNet).

233 citations

Posted Content•

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng¹, Jian Sun•Institutions (1)

Tsinghua University¹

30 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs, derives several practical guidelines for efficient network design, and presents a new architecture called ShuffleNet V2.

Abstract: Currently, neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.

157 citations

Journal Article•DOI•

BM3D-Net: A Convolutional Neural Network for Transform-Domain Collaborative Filtering

Dong Yang¹, Jian Sun¹•Institutions (1)

Xi'an Jiaotong University¹

01 Jan 2018-IEEE Signal Processing Letters

TL;DR: This letter proposes a new convolutional neural network inspired by the classical BM3D algorithm, dubbed BM3D-Net, which unrolls the computational pipeline of BM3D into a convolutional neural network structure, with “extraction” and “aggregation” layers to model the block matching stage in BM3D.

Abstract: Denoising is a fundamental task in image processing with wide applications for enhancing image qualities. BM3D is considered as an effective baseline for image denoising. Although learning-based methods have been dominant in this area recently, the traditional methods are still valuable to inspire new ideas by combining with learning-based approaches. In this letter, we propose a new convolutional neural network inspired by the classical BM3D algorithm, dubbed as BM3D-Net. We unroll the computational pipeline of BM3D algorithm into a convolutional neural network structure, with “extraction” and “aggregation” layers to model block matching stage in BM3D. We apply our network to three denoising tasks: gray-scale image denoising, color image denoising, and depth map denoising. Experiments show that BM3D-Net significantly outperforms the basic BM3D method, and achieves competitive results compared with state of the art on these tasks.

139 citations

Journal Article•DOI•

A discrete uniformization theorem for polyhedral surfaces

Xianfeng David Gu¹, Feng Luo², Jian Sun³, Tianqi Wu⁴•Institutions (4)

Stony Brook University¹, Rutgers University², Tsinghua University³, New York University⁴

01 Jun 2018-Journal of Differential Geometry

TL;DR: A notion of discrete conformality for hyperbolic polyhedral surfaces is introduced and shown to be computable; the hyperbolic polyhedral metric with a given curvature can be obtained using a discrete Yamabe flow with surgery.

Abstract: A notion of discrete conformality for hyperbolic polyhedral surfaces is introduced in this paper. This discrete conformality is shown to be computable. It is proved that each hyperbolic polyhedral metric on a closed surface is discrete conformal to a unique hyperbolic polyhedral metric with a given discrete curvature satisfying Gauss–Bonnet formula. Furthermore, the hyperbolic polyhedral metric with given curvature can be obtained using a discrete Yamabe flow with surgery. In particular, each hyperbolic polyhedral metric on a closed surface with negative Euler characteristic is discrete conformal to a unique hyperbolic metric.

Book Chapter•DOI•

Unpaired Brain MR-to-CT Synthesis Using a Structure-Constrained CycleGAN

Heran Yang¹,², Jian Sun², Aaron Carass¹, Can Zhao¹, Junghoon Lee¹, Zongben Xu², Jerry L. Prince¹•Institutions (2)

Johns Hopkins University¹, Xi'an Jiaotong University²

20 Sep 2018

TL;DR: In this article, a structure-constrained cycleGAN was proposed for brain MR-to-CT synthesis using unpaired data that defines an extra structure-consistency loss based on the modality independent neighborhood descriptor to constrain structural consistency.

Abstract: The cycleGAN is becoming an influential method in medical image synthesis. However, due to a lack of direct constraints between input and synthetic images, the cycleGAN cannot guarantee structural consistency between these two images, and such consistency is of extreme importance in medical imaging. To overcome this, we propose a structure-constrained cycleGAN for brain MR-to-CT synthesis using unpaired data that defines an extra structure-consistency loss based on the modality independent neighborhood descriptor to constrain structural consistency. Additionally, we use a position-based selection strategy for selecting training images instead of a completely random selection scheme. Experimental results on synthesizing CT images from brain MR images demonstrate that our method is better than the conventional cycleGAN and approximates the cycleGAN trained with paired data.

Journal Article•DOI•

Model-driven deep-learning

Zongben Xu¹, Jian Sun¹•Institutions (1)

Xi'an Jiaotong University¹

01 Jan 2018-National Science Review

Posted Content•

Unpaired Brain MR-to-CT Synthesis using a Structure-Constrained CycleGAN

Heran Yang¹,², Jian Sun², Aaron Carass¹, Can Zhao¹, Junghoon Lee¹, Zongben Xu², Jerry L. Prince¹•Institutions (2)

Johns Hopkins University¹, Xi'an Jiaotong University²

12 Sep 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: A structure-constrained cycleGAN is proposed for brain MR-to-CT synthesis using unpaired data that defines an extra structure-consistency loss based on the modality independent neighborhood descriptor to constrain structural consistency.

Abstract: The cycleGAN is becoming an influential method in medical image synthesis. However, due to a lack of direct constraints between input and synthetic images, the cycleGAN cannot guarantee structural consistency between these two images, and such consistency is of extreme importance in medical imaging. To overcome this, we propose a structure-constrained cycleGAN for brain MR-to-CT synthesis using unpaired data that defines an extra structure-consistency loss based on the modality independent neighborhood descriptor to constrain structural consistency. Additionally, we use a position-based selection strategy for selecting training images instead of a completely random selection scheme. Experimental results on synthesizing CT images from brain MR images demonstrate that our method is better than the conventional cycleGAN and approximates the cycleGAN trained with paired data.

Proceedings Article•DOI•

Unsupervised Domain Adaptation with Regularized Optimal Transport for Multimodal 2D+3D Facial Expression Recognition

Xiaofan Wei¹, Huibin Li¹, Jian Sun¹, Liming Chen²•Institutions (2)

Xi'an Jiaotong University¹, École centrale de Lyon²

15 May 2018

TL;DR: Experimental results demonstrate that the proposed unsupervised domain adaptation with regularized optimal transport for multimodal 2D+3D Facial Expression Recognition can achieve superior performance compared with the state-of-the-art methods.

Abstract: Since human expressions have strong flexibility and personality, subject-independent facial expression recognition is a typical data bias problem. To address this problem, we propose a novel approach, namely unsupervised domain adaptation with regularized optimal transport for multimodal 2D+3D Facial Expression Recognition (FER). In particular, Wasserstein distance is employed to measure the distribution inconsistency between the training samples (i.e. source domain) and test samples (i.e. target domain). Minimization of this Wasserstein distance is equivalent to finding an optimal transport mapping from training to test samples. Once we find this mapping, original training samples can be transformed into a new space in which the distributions of the mapped training samples and the test samples can be well-aligned. In this case, classifier learned from the transformed training samples can be well generalized to the test samples for expression prediction. In practice, approximate optimal transport can be effectively solved by adding entropy regularization. To fully explore the class label information of training samples, group sparsity regularizer is also used to enforce that the training samples from the same expression class can be mapped to the same group. Experimental results evaluated on the BU-3DFE and Bosphorus databases demonstrate that the proposed approach can achieve superior performance compared with the state-of-the-art methods.
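
The abstract above says that approximate optimal transport can be solved by adding entropy regularization, without naming a solver. The following is a minimal sketch, assuming the standard Sinkhorn row and column rescaling as that solver. The function name `sinkhorn`, the toy cost matrix, and the parameter values are hypothetical, and the group sparsity regularizer mentioned in the abstract is not included.

```python
import math


def sinkhorn(cost, a, b, reg=0.1, n_iter=200):
    """Entropy-regularized transport plan between histograms a and b.

    cost[i][j] is the cost of moving mass from source i to target j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg); a smaller reg gives a sharper plan.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem: two source samples and two target samples with unit mismatch cost.
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]  # mass on the training (source) samples
b = [0.5, 0.5]  # mass on the test (target) samples
plan = sinkhorn(cost, a, b)
```

Here `plan[i][j]` is the mass moved from source sample `i` to target sample `j`; in this toy case nearly all mass stays on the zero-cost diagonal.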

Journal Article•DOI•

Neural multi-atlas label fusion: Application to cardiac MR images.

Heran Yang¹, Jian Sun¹, Huibin Li¹, Lisheng Wang², Zongben Xu¹•Institutions (2)

Xi'an Jiaotong University¹, Shanghai Jiao Tong University²

01 Oct 2018-Medical Image Analysis

TL;DR: The proposed novel multi‐atlas segmentation method, dubbed deep fusion net (DFN), is a deep architecture that integrates a feature extraction subnet and a non‐local patch‐based label fusion (NL‐PLF) subnet in a single network.

Book Chapter•DOI•

GridFace: Face Rectification via Learning Local Homography Transformations

Erjin Zhou, Zhimin Cao, Jian Sun

08 Sep 2018

TL;DR: This paper proposes a method, called GridFace, to reduce facial geometric variations and improve the recognition performance, which rectifies the face by local homography transformations, which are estimated by a face rectification network.

Abstract: In this paper, we propose a method, called GridFace, to reduce facial geometric variations and improve the recognition performance. Our method rectifies the face by local homography transformations, which are estimated by a face rectification network. To encourage the image generation with canonical views, we apply a regularization based on the natural face distribution. We learn the rectification network and recognition network in an end-to-end manner. Extensive experiments show our method greatly reduces geometric variations, and gains significant improvements in unconstrained face recognition scenarios.

Journal Article•DOI•

The role of Th1/Th2 cytokines played in regulation of specific CD4 + Th1 cell conversion and activation during inflammatory reaction of chronic obstructive pulmonary disease

Jian Sun¹, Ting Liu¹, Y. Yan, K. Huo¹, Wanggang Zhang¹, Hongli Liu¹, Zhongqi Shi¹•Institutions (1)

Xi'an Jiaotong University¹

01 Jul 2018-Scandinavian Journal of Immunology

TL;DR: The hypothesis that elastin exposure might serve as an antigen to initiate the stimulation of CD4 + Th1‐CXCR3 immune inflammation pathway is confirmed and the CD4+Th1‐specific conversion and activation may be an initiator of COPD immune inflammatory response.

Abstract: CD4 + Th1-CXCR3 signalling pathway may play a key role in chronic obstructive pulmonary disease (COPD). The aim of this study was to explore Th1/Th2 cytokines ratio differences in patients in different stages of COPD and to confirm the hypothesis that elastin exposure might serve as an antigen to initiate the stimulation of CD4 + Th1-CXCR3 immune inflammation pathway. Patients of COPD in different stages and normal individuals were enrolled. Ten millilitres of peripheral blood was drawn from patients. The concentration of CXCR3, IFN-γ, IL-2, IL-4 and IL-13 in plasma was detected by ELISA. The Naive CD4+ T cells were isolated from the peripheral blood mononuclear cells, which were stimulated by elastin and collagen before determining the level of IFN-γ secretion by ELISPOT. Compared with control group, the concentration of CXCR3 in the acute exacerbation COPD (AECOPD) group was higher (P < .05). The concentration of IFN-γ and IL-2 in AECOPD group was lower than that in remission (P < .05). The concentration of IFN-γ in the AECOPD and remission was higher than that in controls (P < .05), while IL-2 was opposite (P < .01). The concentration of IL-4 and IL-13 in AECOPD group was higher than that in the controls (P < .05). The CD4+ Th1 cells stimulated by the elastin as antigen secreted more IFN-γ than that by collagen (P < .01). CXCR3 was highly expressed in patients with COPD. There were different Th1/Th2 cytokines in different stages of COPD. The CD4+Th1-specific conversion and activation may be an initiator of COPD immune inflammatory response.

Proceedings Article•DOI•

LdCT-Net: low-dose CT image reconstruction strategy driven by a deep dual network

Ji He¹, Yongbo Wang¹, Yan Yang², Zhaoying Bian¹, Dong Zeng¹, Jian Sun², Zongben Xu², Jianhua Ma¹•Institutions (2)

Southern Medical University¹, Xi'an Jiaotong University²

09 Mar 2018

TL;DR: This work presents a low-dose CT image reconstruction strategy driven by a deep dual network (LdCT-Net) to yield high-quality CT images by incorporating both projection information and image information simultaneously.

Abstract: High radiation dose in CT imaging is a major concern, which could result in increased lifetime risk of cancers. Therefore, to reduce the radiation dose at the same time maintaining clinically acceptable CT image quality is desirable in CT application. One of the most successful strategies is to apply statistical iterative reconstruction (SIR) to obtain promising CT images at low dose. Although the SIR algorithms are effective, they usually have three disadvantages: 1) desired-image prior design; 2) optimal parameters selection; and 3) high computation burden. To address these three issues, in this work, inspired by the deep learning network for inverse problem, we present a low-dose CT image reconstruction strategy driven by a deep dual network (LdCT-Net) to yield high-quality CT images by incorporating both projection information and image information simultaneously. Specifically, the present LdCT-Net effectively reconstructs CT images by adequately taking into account the information learned in dual-domain, i.e., projection domain and image domain, simultaneously. The experiment results on patients data demonstrated the present LdCT-Net can achieve promising gains over other existing algorithms in terms of noise-induced artifacts suppression and edge details preservation.

Journal Article•DOI•

Surface Reconstruction Based on the Modified Gauss Formula

Wenjia Lu¹, Zuoqiang Shi¹, Jian Sun¹, Bin Wang¹•Institutions (1)

Tsinghua University¹

14 Dec 2018-ACM Transactions on Graphics

TL;DR: A surface reconstruction method that has excellent performance despite nonuniformly distributed, noisy, and sparse data is introduced and can be parallelized with small overhead and shows compelling performance in a GPU version by implementing this direct and simple approach.

Abstract: In this article, we introduce a surface reconstruction method that has excellent performance despite nonuniformly distributed, noisy, and sparse data. We reconstruct the surface by estimating an implicit function and then obtain a triangle mesh by extracting an iso-surface. Our implicit function takes advantage of both the indicator function and the signed distance function. The implicit function is dominated by the indicator function at the regions away from the surface and is approximated (up to scaling) by the signed distance function near the surface. On one hand, the implicit function is well defined over the entire space for the extracted iso-surface to remain near the underlying true surface. On the other hand, a smooth iso-surface can be extracted using the marching cubes algorithm with simple linear interpolations due to the properties of the signed distance function. Moreover, our implicit function can be estimated directly from an explicit integral formula without solving any linear system. An approach called disk integration is also incorporated to improve the accuracy of the implicit function. Our method can be parallelized with small overhead and shows compelling performance in a GPU version by implementing this direct and simple approach. We apply our method to synthetic and real-world scanned data to demonstrate the accuracy, noise resilience, and efficiency of this method. The performance of the proposed method is also compared with several state-of-the-art methods.

Journal Article•DOI•

Harmonic Extension on The Point Cloud

Zuoqiang Shi, Jian Sun, Minghao Tian

06 Feb 2018-Multiscale Modeling & Simulation

TL;DR: The harmonic extension problem, which is widely used in many applications of machine learning, is considered and formulated as solving a Laplace-Beltrami equation.

Abstract: In this paper, we consider the harmonic extension problem, which is widely used in many applications of machine learning. We formulate the harmonic extension as solving a Laplace-Beltrami equation...

Journal Article•DOI•

Surface reconstruction from unorganized points with l0 gradient minimization

Huibin Li¹, Yibao Li¹, Ruixuan Yu¹, Jian Sun¹, Junseok Kim²•Institutions (2)

Xi'an Jiaotong University¹, Korea University²

01 Apr 2018-Computer Vision and Image Understanding

TL;DR: A novel, efficient method using l0 gradient minimization is proposed; it directly measures the sparsity of a solution, produces sharper surfaces, and is particularly effective for sharpening major edges and removing noise.

Posted Content•

HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Shipeng Wang¹, Jian Sun¹, Zongben Xu¹•Institutions (1)

Xi'an Jiaotong University¹

22 Nov 2018-arXiv: Learning

TL;DR: A new optimizer, dubbed as HyperAdam, is proposed that combines the idea of "learning to optimize" and traditional Adam optimizer and is justified to be state-of-the-art for various network training, such as multilayer perceptron, CNN and LSTM.

Abstract: Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of "learning to optimize" with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is justified to be state-of-the-art for various network training, such as multilayer perceptron, CNN and LSTM.
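
The idea in the abstract above, a parameter update formed as a combination of several Adam updates with different decay rates, can be illustrated on a scalar problem. This is only a sketch under strong assumptions: in HyperAdam the combination weights and decay rates are learned by a recurrent network, whereas here they are fixed by hand. The names `adam_direction` and `combined_adam`, the decay pairs, and the weights are all hypothetical.

```python
import math


def adam_direction(m, v, grad, beta1, beta2, t, eps=1e-8):
    """One Adam moment update; returns new moments and the bias-corrected direction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return m, v, m_hat / (math.sqrt(v_hat) + eps)


def combined_adam(x, grad_fn, decay_pairs, weights, lr=0.1, steps=200):
    """Minimize a scalar function with a weighted mix of several Adam directions."""
    # One (first moment, second moment) pair per decay-rate setting.
    state = [(0.0, 0.0) for _ in decay_pairs]
    for t in range(1, steps + 1):
        g = grad_fn(x)
        update = 0.0
        for i, (beta1, beta2) in enumerate(decay_pairs):
            m, v, direction = adam_direction(state[i][0], state[i][1], g, beta1, beta2, t)
            state[i] = (m, v)
            # Fixed weights here; the abstract describes them as learned per task.
            update += weights[i] * direction
        x -= lr * update
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3), starting from x = 0.
result = combined_adam(
    0.0,
    lambda x: 2.0 * (x - 3.0),
    decay_pairs=[(0.9, 0.999), (0.5, 0.9)],
    weights=[0.5, 0.5],
)
```

Each decay pair keeps its own moment estimates, so the fast pair reacts to recent gradients while the slow pair smooths over a longer history.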

Posted Content•

GridFace: Face Rectification via Learning Local Homography Transformations

Erjin Zhou, Zhimin Cao, Jian Sun

19 Aug 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a method, called GridFace, to reduce facial geometric variations and improve recognition performance; it rectifies the face by local homography transformations, which are estimated by a face rectification network.

Journal Article•DOI•

A tensor-based nonlocal total variation model for multi-channel image recovery

Wenfei Cao¹, Jing Yao², Jian Sun², Guodong Han¹•Institutions (2)

Shaanxi Normal University¹, Xi'an Jiaotong University²

01 Dec 2018-Signal Processing

TL;DR: Extensive experimental results demonstrate that the proposed regularizer is systematically superior over other competing local and nonlocal regularization approaches, both quantitatively and visually.

Book Chapter•DOI•

Learning Spectral Transform Network on 3D Surface for Non-rigid Shape Analysis

Ruixuan Yu¹, Jian Sun¹, Huibin Li¹•Institutions (1)

Xi'an Jiaotong University¹

08 Sep 2018

TL;DR: This work proposes a novel spectral transform network on 3D surface to learn shape descriptors that achieved the highest accuracies on SHREC’14, 15 datasets as well as the “range” subset of SHREC'17 dataset.

Abstract: Designing a network on 3D surface for non-rigid shape analysis is a challenging task. In this work, we propose a novel spectral transform network on 3D surface to learn shape descriptors. The proposed network architecture consists of four stages: raw descriptor extraction, surface second-order pooling, mixture of power function-based spectral transform, and metric learning. The proposed network is simple and shallow. Quantitative experiments on challenging benchmarks show its effectiveness for non-rigid shape retrieval and classification, e.g., it achieved the highest accuracies on SHREC’14, 15 datasets as well as the “range” subset of SHREC’17 dataset.

Posted Content•

Learning Spectral Transform Network on 3D Surface for Non-rigid Shape Analysis

Ruixuan Yu¹, Jian Sun¹, Huibin Li¹•Institutions (1)

Xi'an Jiaotong University¹

21 Oct 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a spectral transform network on 3D surface to learn shape descriptors, which achieved the highest accuracies on SHREC14, 15 datasets as well as the Range subset of SHREC17 dataset.

Abstract: Designing a network on 3D surface for non-rigid shape analysis is a challenging task. In this work, we propose a novel spectral transform network on 3D surface to learn shape descriptors. The proposed network architecture consists of four stages: raw descriptor extraction, surface second-order pooling, mixture of power function-based spectral transform, and metric learning. The proposed network is simple and shallow. Quantitative experiments on challenging benchmarks show its effectiveness for non-rigid shape retrieval and classification, e.g., it achieved the highest accuracies on SHREC14, 15 datasets as well as the Range subset of SHREC17 dataset.
