Papers by Jian Sun published in 2017

Open Access

Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren¹, Kaiming He², Ross Girshick³, Jian Sun²

University of Science and Technology of China¹, Microsoft², Facebook³

01 Jun 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
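The abstract says the RPN predicts object bounds and objectness scores "at each position". The sketch below shows the bookkeeping around those predictions, under stated assumptions: reference boxes ("anchors") at every feature-map cell with 3 scales × 3 aspect ratios (k = 9 per position, as in the full paper but not stated in this abstract), and the usual (tx, ty, tw, th) centre/log-size offsets. The feature-map size, stride, and function names are illustrative; the convolutional layers that would produce the offsets and scores are omitted, and so is the non-maximum suppression a real pipeline applies before keeping the top 300 proposals.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors (x0, y0, x1, y1) centred on every feature-map cell, k = len(scales) * len(ratios) per cell."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride  # cell centre in image pixels
            for s in scales:
                for r in ratios:  # r = height / width; the area stays s * s
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def decode(anchor, deltas):
    """Apply predicted (tx, ty, tw, th) offsets to an anchor box."""
    x0, y0, x1, y1 = anchor
    wa, ha = x1 - x0, y1 - y0
    xa, ya = x0 + wa / 2, y0 + ha / 2
    tx, ty, tw, th = deltas
    x, y = xa + tx * wa, ya + ty * ha            # shift the centre
    w, h = wa * math.exp(tw), ha * math.exp(th)  # rescale width and height
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

def top_proposals(boxes, scores, n=300):
    """Keep the n highest-objectness boxes (NMS omitted in this sketch)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:n]]

anchors = make_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))  # 54 = 2 * 3 cells * 9 anchors
print(decode((0, 0, 10, 10), (0.1, 0.0, math.log(2), 0.0)))
```

Because the anchors are fixed, the network only has to regress small, scale-normalized offsets per anchor, which is what lets one fully convolutional head serve every position.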

26,458 citations

Proceedings Article

Channel Pruning for Accelerating Very Deep Neural Networks

Yihui He¹, Xiangyu Zhang², Jian Sun²

Carnegie Mellon University¹, Xi'an Jiaotong University²

01 Oct 2017

TL;DR: In this paper, a channel pruning method based on LASSO-regression channel selection and least-squares reconstruction is proposed to accelerate very deep convolutional neural networks; it achieves a 5× speed-up with only a 0.3% increase in error.

Abstract: In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, using LASSO-regression-based channel selection and least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5× speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks such as ResNet and Xception, which suffer only 1.4% and 1.0% accuracy loss, respectively, under a 2× speed-up, which is significant.
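The two-step idea can be illustrated on a toy problem. This sketch assumes each candidate channel's contribution to the layer output has already been flattened into a column vector (in the method, that contribution comes from the channel's input and its filter weights) and treats per-channel coefficients β as the only unknowns. Step 1 selects channels with LASSO, solved here by coordinate descent while λ is raised until at most `keep` channels survive; step 2 refits the surviving coefficients by ordinary least squares. The method's actual reconstruction step re-solves the filter weights themselves, so this is a simplified, hypothetical stand-in, and all function names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def soft_threshold(x, lam):
    return x - lam if x > lam else x + lam if x < -lam else 0.0

def lasso_cd(cols, y, lam, sweeps=200):
    """Coordinate descent for 0.5 * ||y - sum_j b[j] * cols[j]||^2 + lam * ||b||_1."""
    b = [0.0] * len(cols)
    resid = list(y)  # current residual y - sum_j b[j] * cols[j]
    for _ in range(sweeps):
        for j, z in enumerate(cols):
            zz = dot(z, z)
            if zz == 0.0:
                continue
            rho = dot(z, resid) + b[j] * zz  # correlation with the residual that excludes channel j
            new = soft_threshold(rho, lam) / zz
            if new != b[j]:
                resid = [r - (new - b[j]) * zi for r, zi in zip(resid, z)]
                b[j] = new
    return b

def select_channels(cols, y, keep, lam=1e-3, growth=2.0):
    """Step 1: raise lam until at most `keep` channels have non-zero coefficients."""
    while True:
        b = lasso_cd(cols, y, lam)
        chosen = [j for j, v in enumerate(b) if v != 0.0]
        if len(chosen) <= keep:
            return chosen
        lam *= growth

def least_squares(cols, y):
    """Step 2: solve the normal equations (Z^T Z) b = Z^T y by Gauss-Jordan elimination."""
    k = len(cols)
    aug = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], y)] for i in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(aug[r][p]))  # partial pivoting
        aug[p], aug[piv] = aug[piv], aug[p]
        for r in range(k):
            if r != p:
                f = aug[r][p] / aug[p][p]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[p])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Toy layer: three channels, but the output depends only on channels 0 and 2.
cols = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
y = [2.0, 0.0, 3.0, 0.0]
kept = select_channels(cols, y, keep=2)
print(kept, least_squares([cols[j] for j in kept], y))  # [0, 2] [2.0, 3.0]
```

The refit matters because the L1 penalty shrinks the surviving coefficients toward zero; re-solving without the penalty restores their scale.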

2,082 citations

Posted Content

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

04 Jul 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: An extremely computation-efficient CNN architecture named ShuffleNet is introduced, which is designed specifically for mobile devices with very limited computing power (e.g., 10-150 MFLOPs), to greatly reduce computation cost while maintaining accuracy.

Abstract: We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specifically for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g., a lower top-1 error (7.8% absolute) than the recent MobileNet on the ImageNet classification task under a computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves a ~13× actual speedup over AlexNet while maintaining comparable accuracy.
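The two operations can be sketched on channel indices alone. Channel shuffle is written here as the reshape to (groups, n), transpose, flatten sequence commonly used to describe it, and the cost helper shows why pointwise group convolution is cheaper: each output channel reads only `c_in / groups` inputs. The 240-channel, 28×28 figures are illustrative choices, not numbers from the paper, and the function names are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Reshape channels into (groups, n), transpose to (n, groups), and flatten."""
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

def pointwise_macs(h, w, c_in, c_out, groups=1):
    """Multiply-accumulates of a 1x1 (pointwise) convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return h * w * (c_in // groups) * c_out  # each output channel sees only c_in / groups inputs

# 6 channels from a 2-group convolution: group 0 = [0, 1, 2], group 1 = [3, 4, 5].
print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
# Grouping a 240 -> 240 pointwise conv on a 28x28 map into 3 groups cuts its cost by 3x.
print(pointwise_macs(28, 28, 240, 240), pointwise_macs(28, 28, 240, 240, groups=3))
```

After the shuffle, every consecutive slice of the channel list contains one channel from each original group, so the next group convolution mixes information across groups instead of staying confined to one.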

1,645 citations

Posted Content

Channel Pruning for Accelerating Very Deep Neural Networks

Yihui He¹, Xiangyu Zhang², Jian Sun²

Carnegie Mellon University¹, Xi'an Jiaotong University²

19 Jul 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a LASSO regression based channel selection and least square reconstruction is proposed to accelerate very deep convolutional neural networks, which reduces the accumulated error and enhances the compatibility with various architectures.

Abstract: In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, using LASSO-regression-based channel selection and least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5× speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks such as ResNet and Xception, which suffer only 1.4% and 1.0% accuracy loss, respectively, under a 2× speed-up, which is significant. Code has been made publicly available.

1,008 citations

Journal Article

Object Detection Networks on Convolutional Feature Maps

Shaoqing Ren¹, Kaiming He², Ross Girshick³, Xiangyu Zhang⁴, Jian Sun²

University of Science and Technology of China¹, Microsoft², Facebook³, Xi'an Jiaotong University⁴

01 Jul 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this article, a network on convolutional feature maps (NoC) is proposed for object detection, which uses shared, region-independent CNN features to improve the performance of object detection.

Abstract: Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention, and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them “Networks on Convolutional feature maps” (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without such a per-region classifier. We show experimentally that, despite the effectiveness of ResNets and Faster R-CNN systems, the design of NoCs is an essential element of the 1st-place winning entries in the ImageNet and MS COCO 2015 challenges.
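A NoC is a classifier applied to each region's slice of the shared, region-independent feature map, so every region must first be turned into a fixed-size grid. A minimal sketch of that step, using Fast R-CNN-style RoI max pooling on a single-channel map; the pooling details are an assumption (the abstract does not specify them), the function name is hypothetical, and the NoC itself (the convolutional and fully connected layers stacked on the pooled grid) is omitted.

```python
import math

def roi_max_pool(fmap, roi, out_h, out_w):
    """Max-pool one region of a feature map into a fixed out_h x out_w grid.

    fmap: H x W list of lists (one channel); roi: (x0, y0, x1, y1) in feature-map cells,
    with x1 and y1 exclusive.
    """
    H, W = len(fmap), len(fmap[0])
    x0, y0, x1, y1 = roi
    bin_h, bin_w = (y1 - y0) / out_h, (x1 - x0) / out_w
    out = []
    for i in range(out_h):
        ys = min(H - 1, max(0, math.floor(y0 + i * bin_h)))
        ye = min(H, max(ys + 1, math.ceil(y0 + (i + 1) * bin_h)))  # at least one row per bin
        row = []
        for j in range(out_w):
            xs = min(W - 1, max(0, math.floor(x0 + j * bin_w)))
            xe = min(W, max(xs + 1, math.ceil(x0 + (j + 1) * bin_w)))
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out

fmap = [[4 * y + x for x in range(4)] for y in range(4)]  # toy 4x4 single-channel feature map
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
```

Because the feature map is computed once per image and only this pooling runs per region, the per-region classifier can be made deep (the NoC's point) while the expensive backbone stays shared.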

375 citations

Posted Content

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

20 Nov 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors' ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient, and a variant with a tiny backbone significantly outperforms single-stage, fast detectors such as YOLO and SSD in both speed and accuracy.

Abstract: In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, these networks are slow due to the heavy-head design of their architecture. Even if we significantly reduce the base model, the computation cost cannot be decreased correspondingly. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming of current two-stage approaches. In our design, we make the head of the network as light as possible by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN achieves 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD in both speed and accuracy. Code will be made publicly available.
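The "pooling" half of the cheap R-CNN subnet can be illustrated with R-FCN-style position-sensitive RoI pooling over a thin feature map. This is a sketch under assumptions: average pooling, RoI coordinates already in feature-map cells, and a hypothetical channel layout in which bin (i, j) of output channel c reads input channel (i*k + j)*c_out + c. The single fully connected layer that would consume the pooled k×k×c_out values is not shown.

```python
import math

def ps_roi_avg_pool(features, roi, k, c_out):
    """Position-sensitive RoI average pooling.

    features: k*k*c_out channels, each an H x W list of lists (assumed layout: bin (i, j)
              of output channel c reads input channel (i*k + j)*c_out + c).
    roi: (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive.
    Returns c_out grids of shape k x k.
    """
    if len(features) != k * k * c_out:
        raise ValueError("expected k*k*c_out input channels")
    H, W = len(features[0]), len(features[0][0])
    x0, y0, x1, y1 = roi
    bin_h, bin_w = (y1 - y0) / k, (x1 - x0) / k
    out = []
    for c in range(c_out):
        grid = []
        for i in range(k):
            ys = min(H - 1, max(0, math.floor(y0 + i * bin_h)))
            ye = min(H, max(ys + 1, math.ceil(y0 + (i + 1) * bin_h)))
            row = []
            for j in range(k):
                xs = min(W - 1, max(0, math.floor(x0 + j * bin_w)))
                xe = min(W, max(xs + 1, math.ceil(x0 + (j + 1) * bin_w)))
                fmap = features[(i * k + j) * c_out + c]  # each spatial bin reads its own channel
                vals = [fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)]
                row.append(sum(vals) / len(vals))
            grid.append(row)
        out.append(grid)
    return out

# k = 2 bins per side, one output channel; input channel ch is filled with the value ch.
features = [[[float(ch)] * 4 for _ in range(4)] for ch in range(4)]
print(ps_roi_avg_pool(features, (0, 0, 4, 4), k=2, c_out=1))  # [[[0.0, 1.0], [2.0, 3.0]]]
```

The input needs k*k*c_out channels, so keeping c_out small (a "thin" map) is what keeps both the pooled tensor and the following fully connected layer cheap, in contrast to score maps with one channel group per class.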

273 citations

Journal Article

Multimodal 2D+3D Facial Expression Recognition With Deep Fusion Convolutional Neural Network

Huibin Li¹, Jian Sun¹, Zongben Xu¹, Liming Chen²

Xi'an Jiaotong University¹, École centrale de Lyon²

08 Jun 2017-IEEE Transactions on Multimedia

TL;DR: This paper presents a novel and efficient deep fusion convolutional neural network (DF-CNN) for multimodal 2D+3D facial expression recognition (FER); it is the first work to introduce deep CNNs to 3D FER and deep-learning-based feature-level fusion to multimodal 2D+3D FER.

Abstract: This paper presents a novel and efficient deep fusion convolutional neural network (DF-CNN) for multimodal 2D+3D facial expression recognition (FER). DF-CNN comprises a feature extraction subnet, a feature fusion subnet, and a softmax layer. In particular, each textured three-dimensional (3D) face scan is represented as six types of 2D facial attribute maps (i.e., geometry map, three normal maps, curvature map, and texture map), all of which are jointly fed into DF-CNN for feature learning and fusion learning, resulting in a highly concentrated (32-dimensional) facial representation. Expression prediction is performed in two ways: 1) learning linear support vector machine classifiers on the 32-dimensional fused deep features, or 2) directly performing softmax prediction using the six-dimensional expression probability vectors. Unlike existing 3D FER methods, DF-CNN combines feature learning and fusion learning in a single end-to-end training framework. To demonstrate the effectiveness of DF-CNN, we conducted comprehensive experiments comparing DF-CNN with handcrafted features, pre-trained deep features, fine-tuned deep features, and state-of-the-art methods on three 3D face datasets (i.e., BU-3DFE Subset I, BU-3DFE Subset II, and Bosphorus Subset). In all cases, DF-CNN consistently achieved the best results. To the best of our knowledge, this is the first work to introduce deep CNNs to 3D FER and deep-learning-based feature-level fusion to multimodal 2D+3D FER.
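The second prediction route (softmax over six expressions from the 32-dimensional fused feature) can be sketched as a linear layer followed by a numerically stable softmax. The linear 32→6 mapping, the label order, and all weights below are placeholders assumed for illustration; in DF-CNN these parameters are learned end-to-end together with the feature extraction and fusion subnets, which are not shown.

```python
import math

# Hypothetical label order; BU-3DFE annotates these six prototypical expressions.
EXPRESSIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_expression(fused, weights, biases):
    """fused: 32-d fused feature; weights: 6 rows of 32 values; biases: 6 values (placeholders)."""
    if len(fused) != 32 or len(weights) != len(EXPRESSIONS):
        raise ValueError("expected a 32-d feature and 6 weight rows")
    logits = [sum(w * f for w, f in zip(row, fused)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)  # six-dimensional expression probability vector
    best = max(range(len(probs)), key=probs.__getitem__)
    return EXPRESSIONS[best], probs

# Placeholder parameters: only the "happiness" row responds to the feature.
weights = [[0.0] * 32 for _ in EXPRESSIONS]
weights[3] = [1.0] * 32
label, probs = predict_expression([0.1] * 32, weights, [0.0] * 6)
print(label, round(sum(probs), 6))  # happiness 1.0
```

The first route in the abstract would instead train linear SVMs on the same 32-dimensional vectors, replacing only the final scoring step.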

181 citations

Posted Content

ADMM-Net: A Deep Learning Approach for Compressive Sensing MRI

Yan Yang, Jian Sun, Huibin Li, Zongben Xu

19 May 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: By combining the advantages of model-based and deep learning approaches, ADMM-Nets achieve state-of-the-art reconstruction accuracy with fast computational speed.

Abstract: Compressive sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims to reconstruct MR images from a small amount of undersampled k-space data, thereby accelerating data acquisition in MRI. To improve the current MRI system in reconstruction accuracy and speed, in this paper we propose two novel deep architectures, dubbed ADMM-Nets, in basic and generalized versions. ADMM-Nets are defined over data flow graphs, which are derived from the iterative procedures in the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a general CS-based MRI model. They take the sampled k-space data as inputs and output reconstructed MR images. Moreover, we extend our network to cope with complex-valued MR images. In the training phase, all parameters of the nets, e.g., transforms and shrinkage functions, are discriminatively trained end-to-end. In the testing phase, they have computational overhead similar to that of the ADMM algorithm but use optimized parameters learned from data for the CS-based reconstruction task. We investigate different configurations of network structures and conduct extensive experiments on MR image reconstruction under different sampling rates. By combining the advantages of the model-based and deep learning approaches, ADMM-Nets achieve state-of-the-art reconstruction accuracy with fast computational speed.
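The iterative procedure that ADMM-Nets unroll can be seen in plain (untrained) ADMM on a tiny 1-D problem. This sketch assumes the simplest CS-MRI model, 0.5·‖M F x − y‖² + λ‖x‖₁, with F a unitary DFT, M a 0/1 sampling mask, and sparsity in the image domain itself (identity transform); the signal, mask, λ, ρ, and function names are hypothetical, and the O(n²) DFT is only meant for toy sizes.

```python
import cmath
import math

def dft(x):
    """Unitary DFT, O(n^2); fine for tiny examples."""
    n = len(x)
    s = 1.0 / math.sqrt(n)
    return [s * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    """Inverse of the unitary DFT above."""
    n = len(X)
    s = 1.0 / math.sqrt(n)
    return [s * sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) for t in range(n)]

def soft(v, tau):
    """Complex soft-thresholding: shrink the magnitude of v by tau, keep its phase."""
    m = abs(v)
    return 0j if m <= tau else v * (1.0 - tau / m)

def admm_cs_mri(y, mask, lam, rho, iters):
    """Scaled-form ADMM for 0.5*||M F x - y||^2 + lam*||x||_1.

    y: k-space samples (zeros where mask == 0); mask: 0/1 per frequency.
    """
    n = len(mask)
    x, z, u = [0j] * n, [0j] * n, [0j] * n
    for _ in range(iters):
        # x-update: (F^H M F + rho I) x = F^H y + rho (z - u) is diagonal in k-space.
        v_hat = dft([zi - ui for zi, ui in zip(z, u)])
        x = idft([(m * yk + rho * vk) / (m + rho) for m, yk, vk in zip(mask, y, v_hat)])
        # z-update: proximal step of the l1 penalty.
        z = [soft(xi + ui, lam / rho) for xi, ui in zip(x, u)]
        # u-update: accumulate the constraint violation x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return x

x_true = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mask = [1, 1, 0, 1, 0, 1, 0, 1]  # hypothetical undersampling pattern (5 of 8 frequencies)
y = [m * v for m, v in zip(mask, dft(x_true))]
x_rec = admm_cs_mri(y, mask, lam=0.05, rho=1.0, iters=50)
print([round(abs(v), 2) for v in x_rec])
```

In ADMM-Net, the quantities fixed here (the sparsifying transform, taken as the identity; the soft-threshold shrinkage; λ and ρ) become per-stage parameters learned end-to-end, and the loop is unrolled for a fixed number of stages.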

106 citations

Journal Article

Unsupervised PolSAR Image Classification Using Discriminative Clustering

Haixia Bi¹, Jian Sun¹, Zongben Xu¹

Xi'an Jiaotong University¹

17 Mar 2017-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: This paper designs an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov random field smoothness constraint, and then iteratively optimizes the classifiers and class labels by alternately minimizing the energy function with respect to each of them.

Abstract: This paper presents a novel unsupervised image classification method for polarimetric synthetic aperture radar (PolSAR) data. The proposed method is based on a discriminative clustering framework that explicitly relies on a discriminative supervised classification technique to perform unsupervised clustering. To implement this idea, we design an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov random field smoothness constraint. In this model, both the pixelwise class labels and classifiers are taken as unknown variables to be optimized. Starting from initial class labels generated by the Cloude–Pottier decomposition and the K-Wishart distribution hypothesis, we iteratively optimize the classifiers and class labels by alternately minimizing the energy function with respect to them. Finally, the optimized class labels are taken as the classification result, and the classifiers for different classes are also derived as a side effect. We apply this approach to real PolSAR benchmark data. Extensive experiments demonstrate that our approach can effectively classify the PolSAR image in an unsupervised way and produce higher accuracies than the compared state-of-the-art methods.
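The label-update half of the alternating minimization can be sketched with iterated conditional modes (ICM) on a Potts smoothness term. ICM is a standard MRF optimizer swapped in here as an assumption; the abstract does not say which optimizer the authors use. The unary costs stand in for −log p(class) from the current softmax classifier, and the classifier-update half (refitting softmax regression on the new labels) is omitted. Names and numbers are hypothetical.

```python
def icm_potts(unary, beta, sweeps=10):
    """ICM for sum of unary[r][c][label] + beta * (number of 4-neighbours with a different label).

    unary[r][c][k]: cost of class k at pixel (r, c), e.g. -log p(k) from the current classifier.
    Returns the label grid.
    """
    rows, cols, n_classes = len(unary), len(unary[0]), len(unary[0][0])
    # Initialise from the unary term alone (the paper initialises from Cloude-Pottier + K-Wishart instead).
    labels = [[min(range(n_classes), key=unary[r][c].__getitem__) for c in range(cols)]
              for r in range(rows)]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nbrs = [labels[rr][cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]
                energy = [unary[r][c][k] + beta * sum(n != k for n in nbrs) for k in range(n_classes)]
                best = min(range(n_classes), key=energy.__getitem__)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # local minimum reached
            break
    return labels

unary = [[[0.1, 0.9] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]  # centre pixel weakly prefers class 1
print(icm_potts(unary, beta=0.5))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

With β = 0.5 the smoothness term overrides the centre pixel's weak preference; with β = 0 the labels fall back to the classifier's per-pixel choice, which is the trade-off the MRF constraint controls.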

38 citations

Journal Article

Point Integral Method for Solving Poisson-Type Equations on Manifolds from Point Clouds with Convergence Guarantees

Zhen Li¹, Zuoqiang Shi¹, Jian Sun¹

Tsinghua University¹

01 Jul 2017-Communications in Computational Physics

TL;DR: In this article, a method called the point integral method (PIM) is proposed to solve Poisson-type equations from point clouds; it approximates these equations by integral equations that involve only the values of the unknown function, not its derivatives.

Abstract: Partial differential equations (PDEs) on manifolds arise in many areas, including mathematics and many applied fields. Due to the complicated geometric structure of manifolds, it is difficult to develop efficient numerical methods for solving PDEs on them. In this paper, we propose a method called the point integral method (PIM) to solve Poisson-type equations from point clouds. Among the different kinds of PDEs, Poisson-type equations, including the standard Poisson equation and the related eigenproblem of the Laplace-Beltrami operator, are among the most important. In PIM, the key idea is to derive integral equations that approximate the Poisson-type equations and contain no derivatives, only the values of the unknown function. This feature makes the integral equations easy to discretize from point clouds. In this paper, we explain the derivation of the integral equations, describe the point integral method and its implementation, and present numerical experiments that demonstrate the convergence of PIM.
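The key idea, an integral expression that uses only values of u yet approximates the Laplacian, can be checked numerically on the simplest closed manifold, the unit circle, where there is no boundary and hence no boundary term. A minimal sketch assuming a Gaussian kernel R_t(x, y) = C_t·exp(−|x − y|²/(4t)) with C_t = (4πt)^(−1/2) for a 1-D manifold, ambient Euclidean distances, and uniform volume weights V_j = 2π/n; it evaluates only the interior term (1/t)·Σ_j R_t(x_i, x_j)(u_i − u_j)V_j, which should approximate −Δu. Assembling and solving the full PIM linear system, with boundary terms, is omitted, and the function name is hypothetical.

```python
import math

def pim_operator(points, u, volumes, t):
    """Point-cloud estimate of (1/t) * sum_j R_t(x_i, x_j) * (u_i - u_j) * V_j at every sample.

    Gaussian kernel R_t(x, y) = C_t * exp(-|x - y|^2 / (4t)), C_t = (4*pi*t)^(-k/2),
    with intrinsic dimension k = 1 (a curve) and ambient Euclidean distances.
    """
    c_t = (4.0 * math.pi * t) ** -0.5
    out = []
    for xi, ui in zip(points, u):
        acc = 0.0
        for xj, uj, vj in zip(points, u, volumes):
            d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            acc += c_t * math.exp(-d2 / (4.0 * t)) * (ui - uj) * vj
        out.append(acc / t)
    return out

# Unit circle sampled uniformly; u = cos(theta) satisfies -Laplacian(u) = cos(theta).
n = 400
thetas = [2.0 * math.pi * i / n for i in range(n)]
points = [(math.cos(th), math.sin(th)) for th in thetas]
u = [math.cos(th) for th in thetas]
approx = pim_operator(points, u, [2.0 * math.pi / n] * n, t=0.002)
print(max(abs(a - c) for a, c in zip(approx, u)))  # small, and shrinks as t -> 0
```

No derivative of u is ever evaluated, which is why the scheme needs only point positions, function values, and volume weights rather than a mesh.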

36 citations

Journal Article

An Efficient Joint Formulation for Bayesian Face Verification

Dong Chen¹, Xudong Cao¹, David Wipf¹, Fang Wen¹, Jian Sun¹

Microsoft¹

01 Jan 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper revisits the classical Bayesian face recognition algorithm from Baback Moghaddam et al. and proposes enhancements tailored to face verification, the problem of predicting whether or not a pair of facial images share the same identity.

Abstract: This paper revisits the classical Bayesian face recognition algorithm from Baback Moghaddam et al. and proposes enhancements tailored to face verification, the problem of predicting whether or not a pair of facial images share the same identity. Like a variety of face verification algorithms, the original Bayesian face model only considers the appearance difference between two faces rather than the raw images themselves. However, we argue that such a fixed and blind projection may prematurely reduce the separability between classes. Consequently, we model two facial images jointly with an appropriate prior that considers intra- and extra-personal variations over the image pairs. This joint formulation is trained using a principled EM algorithm, while testing involves only efficient closed-form computations that are suitable for real-time practical deployment. Supporting theoretical analyses investigate computational complexity, scale-invariance properties, and convergence issues. We also detail important relationships with existing algorithms, such as probabilistic linear discriminant analysis and metric learning. Finally, in extensive experimental evaluations, the proposed model is superior to the classical Bayesian face algorithm and many alternative state-of-the-art supervised approaches, achieving the best test accuracy on three challenging datasets, Labeled Faces in the Wild, Multi-PIE, and YouTube Faces, all with unparalleled computational efficiency.
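Modelling the two images jointly, rather than only their difference, can be illustrated with a log-likelihood ratio between a "same person" and a "different people" hypothesis. The sketch assumes a common form of such a joint prior: each feature x = μ + ε, with an identity term μ ~ N(0, S_μ) shared by both images of one person and an intra-personal term ε ~ N(0, S_ε) drawn independently per image. It further assumes diagonal covariances so each dimension contributes independently, which is a simplification of a full-covariance model trained by EM; the variances and function names are hypothetical.

```python
import math

def log_gauss2(x1, x2, var, cov):
    """Log-density of (x1, x2) under a zero-mean 2-D Gaussian [[var, cov], [cov, var]]."""
    det = var * var - cov * cov
    quad = (var * x1 * x1 - 2.0 * cov * x1 * x2 + var * x2 * x2) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)

def joint_bayesian_ratio(f1, f2, s_mu, s_eps):
    """log P(f1, f2 | same person) - log P(f1, f2 | different people), summed over dimensions.

    Per dimension: x = mu + eps, mu ~ N(0, s_mu) shared by one identity,
    eps ~ N(0, s_eps) independent per image. Diagonal covariances are an assumption.
    """
    r = 0.0
    for x1, x2, m, e in zip(f1, f2, s_mu, s_eps):
        var = m + e
        r += log_gauss2(x1, x2, var, m)    # same identity: the images share mu, so cov = s_mu
        r -= log_gauss2(x1, x2, var, 0.0)  # different identities: independent
    return r

print(joint_bayesian_ratio([1.0], [1.0], [1.0], [0.1]) > 0)   # True: consistent pair
print(joint_bayesian_ratio([1.0], [-1.0], [1.0], [0.1]) > 0)  # False: inconsistent pair
```

Once the variances are known, scoring a pair is a handful of closed-form arithmetic operations per dimension, which is the property that makes test-time verification cheap.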

Proceedings Article

Location-sensitive sparse representation of deep normal patterns for expression-robust 3D face recognition

Huibin Li¹, Jian Sun¹, Liming Chen²

Xi'an Jiaotong University¹, École centrale de Lyon²

01 Oct 2017-International Joint Conference on Biometrics

TL;DR: The proposed location-sensitive sparse representation classifier (LS-SRC), which measures similarity among deep normal patterns associated with different 3D faces, achieves high performance and suggests that 3D face recognition could be continually improved by training deep models on massive 2D face image collections, opening new directions for 3D face recognition.

Abstract: This paper presents a straightforward yet efficient and expression-robust 3D face recognition approach that explores location-sensitive sparse representation of deep normal patterns (DNP). In particular, given raw 3D facial surfaces, we first run a 3D face pre-processing pipeline, including nose tip detection, face region cropping, and pose normalization. The 3D coordinates of each normalized 3D facial surface are then projected onto a 2D plane to generate geometry images, from which three images of facial surface normal components are estimated. Each normal image is then fed into a pre-trained deep face net to generate deep representations of facial surface normals, i.e., deep normal patterns. Considering the importance of different facial locations, we propose a location-sensitive sparse representation classifier (LS-SRC) for measuring similarity among deep normal patterns associated with different 3D faces. Finally, simple score-level fusion of the different normal components is used for the final decision. The proposed approach achieves high performance, reporting rank-one scores of 98.01%, 97.60%, and 96.13% on the FRGC v2.0, Bosphorus, and BU-3DFE databases when only one sample per subject is used in the gallery. These experimental results reveal that the performance of 3D face recognition could be continually improved with the aid of deep models trained on massive 2D face image collections, which opens the door to future directions in 3D face recognition.
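The final step, score-level fusion of the three normal components, can be sketched as follows. The abstract says only that "simple score-level fusion" is used, so the fusion rule here (min-max normalization of each component's gallery scores, then a weighted sum, then picking the best gallery subject for rank-one identification) is an assumption, as are the scores and function names. The LS-SRC similarity scores themselves are taken as given.

```python
def min_max(scores):
    """Rescale one component's gallery scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(component_scores, weights=None):
    """Weighted sum of min-max-normalised similarity scores; returns (best gallery index, fused scores).

    component_scores: one list per normal component, each with one similarity score per gallery subject.
    """
    if weights is None:
        weights = [1.0] * len(component_scores)
    normed = [min_max(s) for s in component_scores]
    n_gallery = len(component_scores[0])
    fused = [sum(w * s[g] for w, s in zip(weights, normed)) for g in range(n_gallery)]
    best = max(range(n_gallery), key=fused.__getitem__)  # rank-one match
    return best, fused

# Hypothetical similarity scores of one probe against a 3-subject gallery,
# one list per surface-normal component.
scores_x = [0.2, 0.9, 0.1]
scores_y = [0.3, 0.8, 0.5]
scores_z = [0.6, 0.4, 0.1]
best, fused = fuse_scores([scores_x, scores_y, scores_z])
print(best)  # 1
```

Normalizing before summing keeps one component with a wider score range from dominating the decision; subject 1 wins here even though the third component alone would have picked subject 0.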

Journal Article

Convergence of the point integral method for Laplace–Beltrami equation on point cloud

Zuoqiang Shi¹, Jian Sun¹

Tsinghua University¹

01 Dec 2017-Research in the Mathematical Sciences

TL;DR: The convergence of PIM for Poisson equation with Neumann boundary condition on submanifolds that are isometrically embedded in Euclidean spaces is analyzed.

Abstract: The Laplace–Beltrami operator, a fundamental object associated with Riemannian manifolds, encodes all intrinsic geometry of manifolds and has many desirable properties. Recently, we proposed the point integral method (PIM), a novel numerical method for discretizing the Laplace–Beltrami operator on point clouds (Li et al., Commun. Comput. Phys. 22(1):228–258, 2017). In this paper, we analyze the convergence of PIM for the Poisson equation with Neumann boundary condition on submanifolds that are isometrically embedded in Euclidean spaces.

Proceedings Article

PolSAR image classification using discriminative clustering

Haixia Bi¹, Jian Sun¹, Zongben Xu¹

Xi'an Jiaotong University¹

18 May 2017

TL;DR: This paper designs an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov Random Field (MRF) smoothness constraint, and then iteratively optimizes the classifiers and class labels by alternately minimizing the energy function w.r.t. each of them.

Abstract: This paper presents a novel unsupervised image classification method for polarimetric synthetic aperture radar (PolSAR) data. The proposed method is based on a discriminative clustering framework that explicitly relies on a discriminative supervised classification technique to perform unsupervised clustering. To implement this idea, we design an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov Random Field (MRF) smoothness constraint. In this model, both the pixel-wise class labels and classifiers are taken as unknown variables to be optimized. Starting from initial class labels generated by the Cloude-Pottier decomposition and the K-Wishart distribution hypothesis, we iteratively optimize the classifiers and class labels by alternately minimizing the energy function w.r.t. them. Finally, the optimized class labels are taken as the classification result, and the classifiers for different classes are also derived as a side effect. We apply this approach to real PolSAR benchmark data. Extensive experiments demonstrate that our approach can effectively classify the PolSAR image in an unsupervised way and produce higher accuracies than the compared state-of-the-art methods.
