
Showing papers by "Jian Sun" published in 2017


Journal Article
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

26,458 citations
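To illustrate how an RPN of the kind described above turns per-position predictions into a short list of proposals, here is a minimal sketch: it places anchor boxes at every feature-map position, decodes predicted offsets into boxes, and keeps the top-scoring boxes after greedy non-maximum suppression. The stride of 16, the three scales and three aspect ratios, the (dx, dy, dw, dh) offset parameterization, the 0.7 IoU threshold, and all function names are assumptions for illustration; only the cap of 300 proposals comes from the abstract.

```python
import itertools
import math

def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors: len(scales) * len(ratios) per feature-map position."""
    anchors = []
    for i, j in itertools.product(range(feat_h), range(feat_w)):
        cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # anchor center in image pixels
        for s, r in itertools.product(scales, ratios):
            w, h = s / math.sqrt(r), s * math.sqrt(r)  # area s*s, aspect ratio h/w = r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def decode(anchor, delta):
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor box."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    cx = x1 + wa / 2 + delta[0] * wa  # shift center by a fraction of the anchor size
    cy = y1 + ha / 2 + delta[1] * ha
    w, h = wa * math.exp(delta[2]), ha * math.exp(delta[3])  # log-space size change
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)

def select_proposals(boxes, scores, iou_thresh=0.7, max_keep=300):
    """Greedy NMS over objectness scores; return indices of at most max_keep boxes."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(iou(boxes[k], boxes[m]) <= iou_thresh for m in keep):
            keep.append(k)
            if len(keep) == max_keep:
                break
    return keep
```

For example, a 38×50 feature map (roughly a 600×800 image at stride 16) gives `make_anchors(38, 50)` = 38 × 50 × 9 = 17,100 anchors, which NMS then reduces to at most 300 proposals.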


Proceedings Article
01 Oct 2017
TL;DR: In this paper, a channel pruning method based on LASSO regression channel selection and least-squares reconstruction is proposed to accelerate very deep convolutional neural networks, achieving a 5× speed-up with only a 0.3% increase in error.
Abstract: In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by LASSO regression based channel selection and least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5× speed-up and only a 0.3% increase in error. More importantly, our method can accelerate modern networks such as ResNet and Xception with only 1.4% and 1.0% accuracy loss, respectively, under a 2× speed-up.

2,082 citations
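The two-step idea can be sketched on a toy single-output linear layer (equivalent to a 1×1 convolution), assuming each input channel i contributes z_i = X[:, i] * W[i] to the output. Step one solves a LASSO problem over per-channel coefficients β by coordinate descent and prunes channels whose β is zero; step two refits the kept channels' weights by least squares so the pruned layer reproduces the original output. The fixed penalty `lam`, the coordinate-descent solver, and all function names are illustrative assumptions; the paper's formulation operates on full convolution kernels with many output channels.

```python
def column(rows, i):
    return [row[i] for row in rows]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def soft_threshold(a, lam):
    return max(abs(a) - lam, 0.0) * (1.0 if a > 0 else -1.0)

def lasso_channel_select(X, W, y, lam, sweeps=100):
    """Coordinate descent on 0.5*||y - sum_i beta_i z_i||^2 + lam*||beta||_1; return kept channels."""
    z = [[x * W[i] for x in column(X, i)] for i in range(len(W))]  # per-channel contributions
    beta = [0.0] * len(W)
    for _ in range(sweeps):
        for j, zj in enumerate(z):
            # residual that excludes channel j's own contribution
            r = [y[n] - sum(beta[k] * z[k][n] for k in range(len(z)) if k != j)
                 for n in range(len(y))]
            norm = dot(zj, zj)
            beta[j] = soft_threshold(dot(zj, r), lam) / norm if norm > 0 else 0.0
    return [i for i, b in enumerate(beta) if abs(b) > 1e-12]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def reconstruct(X, y, kept):
    """Least-squares refit of the kept channels' weights: min ||y - X_kept w||^2."""
    cols = [column(X, i) for i in kept]
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]  # normal equations
    rhs = [dot(ci, y) for ci in cols]
    return solve(gram, rhs)
```

With orthogonal toy channels and weights `[2.0, 0.1, 1.0]`, a penalty of `lam=1.0` drops the weak middle channel, and the refit recovers weights 2.0 and 1.0 for the two survivors.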


Posted Content
TL;DR: An extremely computation-efficient CNN architecture named ShuffleNet is introduced; it is designed specifically for mobile devices with very limited computing power (e.g., 10-150 MFLOPs) and greatly reduces computation cost while maintaining accuracy.
Abstract: We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specifically for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g., an absolute 7.8% lower top-1 error than the recent MobileNet on the ImageNet classification task under a computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves a ~13× actual speedup over AlexNet while maintaining comparable accuracy.

1,645 citations
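The two operations named in the abstract can be illustrated in a few lines. Channel shuffle amounts to reshaping the channel axis to (groups, n), transposing, and flattening; the sketch applies it to a list of channel indices so the permutation is easy to inspect. The second helper counts multiply-adds of a 1×1 (pointwise) convolution to show that grouping divides the cost by the number of groups. Function names and the example feature-map size are illustrative assumptions.

```python
def channel_shuffle(channels, groups):
    """Reshape to (groups, n), transpose to (n, groups), then flatten."""
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    grid = [channels[g * n:(g + 1) * n] for g in range(groups)]
    return [grid[g][k] for k in range(n) for g in range(groups)]

def pointwise_macs(h, w, c_in, c_out, groups=1):
    """Multiply-adds of a 1x1 convolution: each output channel sees c_in // groups inputs."""
    return h * w * (c_in // groups) * c_out
```

For instance, `channel_shuffle(list(range(6)), 2)` returns `[0, 3, 1, 4, 2, 5]`, so each group in the next grouped convolution receives channels from both previous groups, and `pointwise_macs(28, 28, 240, 240, groups=3)` is one third of the ungrouped count.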


Posted Content
TL;DR: In this article, a channel pruning method based on LASSO regression channel selection and least-squares reconstruction is proposed to accelerate very deep convolutional neural networks; it reduces the accumulated error and enhances compatibility with various architectures.
Abstract: In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by LASSO regression based channel selection and least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5× speed-up and only a 0.3% increase in error. More importantly, our method can accelerate modern networks such as ResNet and Xception with only 1.4% and 1.0% accuracy loss, respectively, under a 2× speed-up. Code has been made publicly available.

1,008 citations


Journal Article
TL;DR: In this article, Networks on Convolutional feature maps (NoCs) are proposed for object detection: region-wise classifier networks that operate on shared, region-independent CNN features, showing that a deep, convolutional per-region classifier is important for detection accuracy.
Abstract: Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention, and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them “Networks on Convolutional feature maps” (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without such a per-region classifier. We show by experiments that, even with the effective ResNets and Faster R-CNN systems, the design of NoCs is an essential element of the 1st-place winning entries in the ImageNet and MS COCO 2015 challenges.

375 citations
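A region-wise classifier on shared features needs a fixed-size input per region; detectors in this family typically obtain it by RoI pooling over the shared feature map. Below is a minimal sketch of RoI max pooling on a single-channel map. The bin-splitting rule, the inclusive-coordinate convention, and the function name are assumptions, and the per-region classifier that would consume the pooled grid (an MLP or a deeper convolutional NoC) is omitted.

```python
import math

def roi_max_pool(fmap, roi, out_h, out_w):
    """Max-pool roi = (y1, x1, y2, x2), inclusive feature-map cells, into out_h x out_w bins."""
    y1, x1, y2, x2 = roi
    rh, rw = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for by in range(out_h):
        # rows covered by this bin; floor/ceil guarantees every bin is non-empty
        ys = range(y1 + math.floor(by * rh / out_h), y1 + math.ceil((by + 1) * rh / out_h))
        row = []
        for bx in range(out_w):
            xs = range(x1 + math.floor(bx * rw / out_w), x1 + math.ceil((bx + 1) * rw / out_w))
            row.append(max(fmap[y][x] for y in ys for x in xs))
        pooled.append(row)
    return pooled
```

Because every region yields the same `out_h × out_w` grid, one classifier network can be shared across all regions while the convolutional features underneath are computed once per image.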


Posted Content
TL;DR: The authors' ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient, and with a tiny backbone it significantly outperforms single-stage, fast detectors like YOLO and SSD in both speed and accuracy.
Abstract: In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, these networks are slow due to the heavy-head design of the architecture. Even if we significantly reduce the base model, the computation cost cannot be decreased correspondingly. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming of current two-stage approaches. In our design, we make the head of the network as light as possible by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD in both speed and accuracy. Code will be made publicly available.

273 citations
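The heavy-head versus light-head contrast comes down largely to the channel count of the map that RoI warping reads from. The arithmetic below assumes R-FCN uses p×p position-sensitive score maps for each of C+1 classes, while Light-Head R-CNN uses a class-agnostic thin map of α×p×p channels; p = 7, α = 10, and C = 80 (the COCO object classes) are assumed values for illustration, not figures from the abstract.

```python
def rfcn_score_map_channels(p, num_classes):
    """Position-sensitive score maps: p*p bins for every object class plus background."""
    return p * p * (num_classes + 1)

def light_head_thin_channels(p, alpha):
    """Class-agnostic thin feature map with alpha*p*p channels."""
    return alpha * p * p

heavy = rfcn_score_map_channels(p=7, num_classes=80)
light = light_head_thin_channels(p=7, alpha=10)
print(heavy, light, round(heavy / light, 1))  # 3969 490 8.1
```

Under these assumptions the thin map carries about 8× fewer channels, and because it does not grow with the number of classes, the per-RoI cost stays small even for datasets with many categories.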


Journal Article
TL;DR: This paper presents a novel and efficient deep fusion convolutional neural network (DF-CNN) for multimodal 2D+3D facial expression recognition (FER); it is the first work to introduce deep CNNs to 3D FER and deep learning-based feature-level fusion to multimodal 2D+3D FER.
Abstract: This paper presents a novel and efficient deep fusion convolutional neural network (DF-CNN) for multimodal 2D+3D facial expression recognition (FER). DF-CNN comprises a feature extraction subnet, a feature fusion subnet, and a softmax layer. In particular, each textured three-dimensional (3D) face scan is represented as six types of 2D facial attribute maps (i.e., geometry map, three normal maps, curvature map, and texture map), all of which are jointly fed into DF-CNN for feature learning and fusion learning, resulting in a highly concentrated facial representation (32-dimensional). Expression prediction is performed in two ways: 1) learning linear support vector machine classifiers using the 32-dimensional fused deep features, or 2) directly performing softmax prediction using the six-dimensional expression probability vectors. Different from existing 3D FER methods, DF-CNN combines feature learning and fusion learning into a single end-to-end training framework. To demonstrate the effectiveness of DF-CNN, we conducted comprehensive experiments to compare the performance of DF-CNN with handcrafted features, pre-trained deep features, fine-tuned deep features, and state-of-the-art methods on three 3D face datasets (i.e., BU-3DFE Subset I, BU-3DFE Subset II, and Bosphorus Subset). In all cases, DF-CNN consistently achieved the best results. To the best of our knowledge, this is the first work to introduce deep CNNs to 3D FER and deep learning-based feature-level fusion to multimodal 2D+3D FER.

181 citations
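The second prediction route, softmax over a six-dimensional expression vector, reduces to the computation below. The expression names assume the six prototypical expressions annotated in BU-3DFE, and the logits stand in for hypothetical outputs of the network's last layer; the function names are likewise illustrative.

```python
import math

EXPRESSIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_expression(logits):
    """Return the most probable expression label and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EXPRESSIONS[best], probs
```

The first route would instead feed the 32-dimensional fused feature to linear SVMs; both routes share the same learned representation.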


Posted Content
TL;DR: By combining the advantages of model-based and deep learning approaches, ADMM-Nets achieve state-of-the-art reconstruction accuracy with fast computational speed.
Abstract: Compressive sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing MR images from a small amount of under-sampled k-space data, thereby accelerating data acquisition in MRI. To improve current MRI systems in reconstruction accuracy and speed, in this paper we propose two novel deep architectures, dubbed ADMM-Nets, in basic and generalized versions. ADMM-Nets are defined over data flow graphs, which are derived from the iterative procedures of the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a general CS-based MRI model. They take the sampled k-space data as inputs and output reconstructed MR images. Moreover, we extend our network to cope with complex-valued MR images. In the training phase, all parameters of the nets, e.g., transforms, shrinkage functions, etc., are discriminatively trained end-to-end. In the testing phase, they have computational overhead similar to the ADMM algorithm but use optimized parameters learned from the data for the CS-based reconstruction task. We investigate different configurations of network structures and conduct extensive experiments on MR image reconstruction under different sampling rates. By combining the advantages of model-based and deep learning approaches, ADMM-Nets achieve state-of-the-art reconstruction accuracy with fast computational speed.

106 citations
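ADMM-Net unrolls ADMM iterations into network stages. As a minimal sketch, the code below runs plain, untrained ADMM on a simplified real-valued model, minimize ½‖M⊙x − y‖² + λ‖x‖₁, where M is a 0/1 sampling mask applied directly to the signal (not to its Fourier transform) and the sparsifying transform is the identity. Each loop iteration plays the role of one stage: a reconstruction step, a shrinkage (nonlinear) step, and a multiplier update. In ADMM-Net the penalty ρ, the shrinkage functions, and the transforms would be learned per stage; here they are fixed, and all names are hypothetical.

```python
def soft(v, thresh):
    return max(abs(v) - thresh, 0.0) * (1.0 if v > 0 else -1.0)

def admm_reconstruct(y, mask, lam=0.1, rho=1.0, stages=50):
    """y: measurements (0 where unsampled); mask: 1 for sampled entries, 0 otherwise."""
    n = len(y)
    x, z, beta = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(stages):
        # reconstruction layer: closed-form x-update, elementwise because the mask is diagonal
        x = [(m * yi + rho * (zi - bi)) / (m + rho) for m, yi, zi, bi in zip(mask, y, z, beta)]
        # nonlinear layer: soft-thresholding (learned piecewise-linear functions in ADMM-Net)
        z = [soft(xi + bi, lam / rho) for xi, bi in zip(x, beta)]
        # multiplier update layer
        beta = [bi + xi - zi for bi, xi, zi in zip(beta, x, z)]
    return x
```

With `lam=0.1` and a fully sampled value of 3.0, the iterates converge to 2.9, the soft-thresholded value, while an unsampled zero entry stays at zero; with `lam=0.0` and full sampling the output converges to `y`.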


Journal Article
TL;DR: This paper designs an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov random field smoothness constraint, and iteratively optimizes the classifiers and class labels by alternately minimizing the energy function with respect to each.
Abstract: This paper presents a novel unsupervised image classification method for polarimetric synthetic aperture radar (PolSAR) data. The proposed method is based on a discriminative clustering framework that explicitly relies on a discriminative supervised classification technique to perform unsupervised clustering. To implement this idea, we design an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov random field smoothness constraint. In this model, both the pixelwise class labels and the classifiers are taken as unknown variables to be optimized. Starting from initial class labels generated by Cloude–Pottier decomposition and the K-Wishart distribution hypothesis, we iteratively optimize the classifiers and class labels by alternately minimizing the energy function with respect to them. Finally, the optimized class labels are taken as the classification result, and the classifiers for different classes are also derived as a side effect. We apply this approach to real PolSAR benchmark data. Extensive experiments demonstrate that our approach can effectively classify the PolSAR image in an unsupervised way and produces higher accuracy than the compared state-of-the-art methods.

38 citations
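The label-update half of the alternating optimization can be illustrated with iterated conditional modes (ICM) on a Potts-style MRF energy: each pixel takes the label that minimizes its unary cost (which in the paper would come from the softmax classifiers; here it is given directly) plus a penalty `beta` for every 4-neighbor carrying a different label. ICM is an assumed optimizer for this sketch, the classifier-update step is omitted, and all names are hypothetical.

```python
def energy(unary, labels, beta):
    """Unary costs plus beta for each disagreeing horizontal or vertical neighbor pair."""
    h, w = len(labels), len(labels[0])
    e = sum(unary[i][j][labels[i][j]] for i in range(h) for j in range(w))
    e += beta * sum(labels[i][j] != labels[i][j + 1] for i in range(h) for j in range(w - 1))
    e += beta * sum(labels[i][j] != labels[i + 1][j] for i in range(h - 1) for j in range(w))
    return e

def icm(unary, beta, sweeps=5):
    """unary[i][j][k]: cost of assigning class k to pixel (i, j)."""
    h, w, k = len(unary), len(unary[0]), len(unary[0][0])
    # initialize each pixel with its cheapest unary label
    labels = [[min(range(k), key=lambda c: unary[i][j][c]) for j in range(w)] for i in range(h)]
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                nbrs = [labels[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < h and 0 <= b < w]
                labels[i][j] = min(range(k), key=lambda c: unary[i][j][c]
                                   + beta * sum(c != nb for nb in nbrs))
    return labels
```

On a 3×3 grid where only the center pixel mildly prefers class 1, `beta=0` keeps that isolated label, while `beta=1.0` smooths it away and lowers the total energy to the center's unary cost of 0.5.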


Journal Article
TL;DR: In this article, a method called the point integral method (PIM) is proposed to solve Poisson-type equations from point clouds by deriving integral equations that approximate them and involve only the values of the unknown function, not its derivatives.
Abstract: Partial differential equations (PDEs) on manifolds arise in many areas, including mathematics and many applied fields. Due to the complicated geometrical structure of the manifold, it is difficult to obtain efficient numerical methods for solving PDEs on manifolds. In this paper, we propose a method called the point integral method (PIM) to solve Poisson-type equations from point clouds. Among different kinds of PDEs, the Poisson-type equations, including the standard Poisson equation and the related eigenproblem of the Laplace-Beltrami operator, are among the most important. In PIM, the key idea is to derive integral equations that approximate the Poisson-type equations and contain no derivatives, only the values of the unknown function. This feature makes the integral equations easy to discretize from point clouds. In this paper, we explain the derivation of the integral equations, describe the point integral method and its implementation, and present numerical experiments that demonstrate the convergence of PIM.

36 citations
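As a rough consistency check of the derivative-free idea, consider a closed manifold (so no boundary term) and assume a Gaussian kernel R(r) = exp(−r), for which the integrated kernel R̄ equals R. A discretized integral equation for −Δu = f then reads (1/t) Σ_j R_t(x_i, x_j)(u_i − u_j) V_j = Σ_j R_t(x_i, x_j) f_j V_j, with R_t(x, y) = (4πt)^(−k/2) exp(−|x − y|²/(4t)). This specific form, the kernel, and the test problem are assumptions for the sketch, not quoted from the abstract. The code samples the unit circle (k = 1) embedded in the plane, takes u = cos θ (so f = −Δu = cos θ), and measures how far the two sides differ.

```python
import math

def pim_residual(n=400, t=0.01):
    """Residual of the discretized integral equation for u = cos(theta) on the unit circle."""
    theta = [2 * math.pi * i / n for i in range(n)]
    pts = [(math.cos(a), math.sin(a)) for a in theta]
    vol = 2 * math.pi / n                   # V_j: arc length carried by each sample
    c_t = 1.0 / math.sqrt(4 * math.pi * t)  # (4*pi*t)^(-k/2) with intrinsic dimension k = 1
    u = [math.cos(a) for a in theta]
    f = u[:]                                # -Laplace-Beltrami of cos(theta) is cos(theta)
    residual = []
    for i in range(n):
        lhs = rhs = 0.0
        for j in range(n):
            # ambient Euclidean distance, as available from a point cloud
            d2 = (pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2
            r_t = c_t * math.exp(-d2 / (4 * t))
            lhs += r_t * (u[i] - u[j]) * vol / t  # only values of u, no derivatives
            rhs += r_t * f[j] * vol               # R-bar equals R for the Gaussian kernel
        residual.append(lhs - rhs)
    return residual
```

In this setting the residual is small and roughly proportional to t, so it shrinks as t decreases provided the sampling stays dense relative to the kernel width √t.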


Journal Article
Dong Chen, Xudong Cao, David Wipf, Fang Wen, Jian Sun
TL;DR: This paper revisits the classical Bayesian face recognition algorithm from Baback Moghaddam et al. and proposes enhancements tailored to face verification, the problem of predicting whether or not a pair of facial images share the same identity.
Abstract: This paper revisits the classical Bayesian face recognition algorithm from Baback Moghaddam et al. and proposes enhancements tailored to face verification, the problem of predicting whether or not a pair of facial images share the same identity. Like a variety of face verification algorithms, the original Bayesian face model only considers the appearance difference between two faces rather than the raw images themselves. However, we argue that such a fixed and blind projection may prematurely reduce the separability between classes. Consequently, we model two facial images jointly with an appropriate prior that considers intra- and extra-personal variations over the image pairs. This joint formulation is trained using a principled EM algorithm, while testing involves only efficient closed-form computations that are suitable for real-time practical deployment. Supporting theoretical analyses investigate computational complexity, scale-invariance properties, and convergence issues. We also detail important relationships with existing algorithms, such as probabilistic linear discriminant analysis and metric learning. Finally, in extensive experimental evaluations, the proposed model is superior to the classical Bayesian face algorithm and many alternative state-of-the-art supervised approaches, achieving the best test accuracy on three challenging datasets, Labeled Faces in the Wild, Multi-PIE, and YouTube Faces, all with unparalleled computational efficiency.
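The closed-form test-time computation can be illustrated in one dimension. Assume each feature is x = μ + ε, with an identity term μ ~ N(0, s_mu) and within-person variation ε ~ N(0, s_eps). Under the same-identity hypothesis the pair (x1, x2) is jointly Gaussian with covariance [[s_mu + s_eps, s_mu], [s_mu, s_mu + s_eps]]; under the different-identity hypothesis the off-diagonal terms are zero. The verification score is the log-likelihood ratio of the two. The scalar setting, the variance values, and the function names are illustrative assumptions; the paper works with multivariate features whose covariances are learned with EM.

```python
import math

def log_gauss2(x, cov):
    """Log density of a zero-mean 2-D Gaussian with symmetric covariance [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    q = (d * x[0] ** 2 - 2 * b * x[0] * x[1] + a * x[1] ** 2) / det  # x^T cov^{-1} x
    return -0.5 * q - 0.5 * math.log(det) - math.log(2 * math.pi)

def verification_score(x1, x2, s_mu=1.0, s_eps=0.1):
    """Log-likelihood ratio: same identity vs. different identities."""
    same = [[s_mu + s_eps, s_mu], [s_mu, s_mu + s_eps]]
    diff = [[s_mu + s_eps, 0.0], [0.0, s_mu + s_eps]]
    return log_gauss2((x1, x2), same) - log_gauss2((x1, x2), diff)
```

A positive score favors "same identity" and a negative one "different identities"; thresholding the score gives the verification decision, and each evaluation costs only a few arithmetic operations.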

Proceedings Article
TL;DR: The proposed approach, which uses a location-sensitive sparse representation classifier (LS-SRC) to measure similarity among the deep normal patterns of different 3D faces, achieves high recognition performance and suggests that 3D face recognition can be steadily improved by training deep models on massive 2D face images, opening future directions for 3D face recognition.
Abstract: This paper presents a straightforward yet efficient and expression-robust 3D face recognition approach that explores location-sensitive sparse representation of deep normal patterns (DNP). In particular, given raw 3D facial surfaces, we first run a 3D face pre-processing pipeline, including nose tip detection, face region cropping, and pose normalization. The 3D coordinates of each normalized 3D facial surface are then projected onto a 2D plane to generate geometry images, from which three images of facial surface normal components are estimated. Each normal image is then fed into a pre-trained deep face net to generate deep representations of facial surface normals, i.e., deep normal patterns. Considering the importance of different facial locations, we propose a location-sensitive sparse representation classifier (LS-SRC) for similarity measurement among deep normal patterns associated with different 3D faces. Finally, simple score-level fusion of the different normal components is used for the final decision. The proposed approach achieves high performance, reporting rank-one scores of 98.01%, 97.60%, and 96.13% on the FRGC v2.0, Bosphorus, and BU-3DFE databases when only one sample per subject is used in the gallery. These experimental results reveal that the performance of 3D face recognition can be consistently improved with the aid of deep models trained on massive 2D face images, which opens the door to future directions for 3D face recognition.
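The step that turns a normalized facial surface into three normal-component images can be sketched as follows, assuming the geometry image is a depth map z(row, col) on a unit-spaced grid, so the unnormalized surface normal is (−∂z/∂x, −∂z/∂y, 1). Derivatives use central differences in the interior and one-sided differences on the border. The paper's pipeline may estimate normals differently, and the function name is hypothetical.

```python
import math

def normal_images(depth):
    """Return (nx, ny, nz) images of unit surface normals for a depth map."""
    h, w = len(depth), len(depth[0])

    def diff(i, j, di, dj):
        # central difference where both neighbors exist, one-sided on the border
        lo_i, lo_j = max(i - di, 0), max(j - dj, 0)
        hi_i, hi_j = min(i + di, h - 1), min(j + dj, w - 1)
        steps = (hi_i - lo_i) + (hi_j - lo_j)
        return (depth[hi_i][hi_j] - depth[lo_i][lo_j]) / steps if steps else 0.0

    nx = [[0.0] * w for _ in range(h)]
    ny = [[0.0] * w for _ in range(h)]
    nz = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx, gy = diff(i, j, 0, 1), diff(i, j, 1, 0)  # dz/dx along columns, dz/dy along rows
            norm = math.sqrt(gx * gx + gy * gy + 1.0)
            nx[i][j], ny[i][j], nz[i][j] = -gx / norm, -gy / norm, 1.0 / norm
    return nx, ny, nz
```

Each of the three returned images could then be passed separately to a pre-trained face network, which matches the per-component processing and later score-level fusion described in the abstract.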

Journal Article
TL;DR: The convergence of PIM for the Poisson equation with Neumann boundary conditions on submanifolds isometrically embedded in Euclidean spaces is analyzed.
Abstract: The Laplace–Beltrami operator, a fundamental object associated with Riemannian manifolds, encodes all intrinsic geometry of manifolds and has many desirable properties. Recently, we proposed the point integral method (PIM), a novel numerical method for discretizing the Laplace–Beltrami operator on point clouds (Li et al. in Commun. Comput. Phys. 22(1):228–258, 2017). In this paper, we analyze the convergence of PIM for the Poisson equation with Neumann boundary condition on submanifolds that are isometrically embedded in Euclidean spaces.

Proceedings Article
18 May 2017
TL;DR: This paper designs an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov Random Field (MRF) smoothness constraint, and iteratively optimizes the classifiers and class labels by alternately minimizing the energy function w.r.t. them.
Abstract: This paper presents a novel unsupervised image classification method for polarimetric synthetic aperture radar (PolSAR) data. The proposed method is based on a discriminative clustering framework that explicitly relies on a discriminative supervised classification technique to perform unsupervised clustering. To implement this idea, we design an energy function for unsupervised PolSAR image classification by combining a supervised softmax regression model with a Markov Random Field (MRF) smoothness constraint. In this model, both the pixel-wise class labels and the classifiers are taken as unknown variables to be optimized. Starting from initial class labels generated by Cloude-Pottier decomposition and the K-Wishart distribution hypothesis, we iteratively optimize the classifiers and class labels by alternately minimizing the energy function w.r.t. them. Finally, the optimized class labels are taken as the classification result, and the classifiers for different classes are also derived as a side effect. We apply this approach to real PolSAR benchmark data. Extensive experiments demonstrate that our approach can effectively classify the PolSAR image in an unsupervised way and produces higher accuracy than the compared state-of-the-art methods.