
Showing papers by "Stan Z. Li published in 2017"


Proceedings Article•DOI•
Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, Stan Z. Li
17 Aug 2017
TL;DR: S3FD proposes a scale-equitable face detection framework to handle different scales of faces well and improves the recall rate of small faces with a scale compensation anchor matching strategy.
Abstract: This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S3FD), which performs superiorly on various scales of faces with a single deep neural network, especially for small faces. Specifically, we try to solve the common problem that anchor-based detectors deteriorate dramatically as the objects become smaller. We make contributions in the following three aspects: 1) proposing a scale-equitable face detection framework to handle different scales of faces well. We tile anchors on a wide range of layers to ensure that all scales of faces have enough features for detection. Besides, we design anchor scales based on the effective receptive field and a proposed equal proportion interval principle; 2) improving the recall rate of small faces by a scale compensation anchor matching strategy; 3) reducing the false positive rate of small faces via a max-out background label. As a consequence, our method achieves state-of-the-art detection performance on all the common face detection benchmarks, including the AFW, PASCAL face, FDDB and WIDER FACE datasets, and can run at 36 FPS on an Nvidia Titan X (Pascal) for VGA-resolution images.
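
As a rough illustration of the scale-equitable design described above, the sketch below tiles one square anchor per feature-map cell, with each layer's anchor scale set to four times its stride (the strides and scales follow the paper's setup; the helper itself is our own illustration, not the authors' code).

```python
# Illustrative sketch of S3FD's "equal proportion interval" principle:
# every detection layer gets an anchor scale of 4x its stride, so anchors
# of all sizes tile the image with the same density.
strides = [4, 8, 16, 32, 64, 128]          # detection-layer strides in S3FD
anchor_scales = [4 * s for s in strides]   # -> [16, 32, 64, 128, 256, 512]

def tile_anchors(img_w, img_h):
    """Yield square anchors (cx, cy, side) for every layer, one per cell."""
    for stride, scale in zip(strides, anchor_scales):
        for y in range(0, img_h, stride):
            for x in range(0, img_w, stride):
                # anchor centered on this feature-map cell
                yield (x + stride / 2, y + stride / 2, scale)
```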

374 citations


Proceedings Article•DOI•
Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, Stan Z. Li
TL;DR: FaceBoxes is a novel face detector consisting of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL).
Abstract: Although tremendous strides have been made in face detection, one of the remaining open challenges is to achieve real-time speed on the CPU as well as maintain high performance, since effective models for face detection tend to be computationally prohibitive. To address this challenge, we propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Specifically, our method has a lightweight yet powerful network structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales. Besides, we propose a new anchor densification strategy to make different types of anchors have the same density on the image, which significantly improves the recall rate of small faces. As a consequence, the proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the number of faces. We comprehensively evaluate this method and present state-of-the-art detection performance on several face detection benchmark datasets, including the AFW, PASCAL face, and FDDB.
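
The anchor densification idea lends itself to a short sketch: replicate a small anchor n x n times with sub-cell offsets so that every anchor type reaches the same density (anchor scale divided by tiling interval). The function and the example densification factors below are illustrative assumptions, not the released implementation.

```python
# Hedged sketch of FaceBoxes-style anchor densification.
def densify(cx, cy, scale, stride, n):
    """Return n*n copies of one anchor, evenly offset inside its cell."""
    boxes = []
    for i in range(n):
        for j in range(n):
            off_x = (i + 0.5) / n - 0.5   # offsets within (-0.5, 0.5) cells
            off_y = (j + 0.5) / n - 0.5
            boxes.append((cx + off_x * stride, cy + off_y * stride, scale))
    return boxes

# e.g. on a stride-32 layer, densifying a 32-pixel anchor 4x and a 64-pixel
# anchor 2x would match the density of an undensified 128-pixel anchor.
```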

207 citations


Proceedings Article•DOI•
21 Jul 2017
TL;DR: A novel multi-view subspace clustering model that harnesses the complementary information between different representations by introducing a novel position-aware exclusivity term, while a consistency term is employed to make these complementary representations further share a common indicator.
Abstract: Multi-view subspace clustering aims to partition a set of multi-source data into their underlying groups. To boost the performance of multi-view clustering, numerous subspace learning algorithms have been developed in recent years, but with rare exploitation of the representation complementarity between different views as well as the indicator consistency among the representations, let alone considering them simultaneously. In this paper, we propose a novel multi-view subspace clustering model that attempts to harness the complementary information between different representations by introducing a novel position-aware exclusivity term. Meanwhile, a consistency term is employed to make these complementary representations further share a common indicator. We formulate the above concerns into a unified optimization framework, sketched below. Experiments on several benchmark datasets demonstrate the effectiveness of our algorithm over other state-of-the-art methods.
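
One plausible way to write the unified optimization framework is sketched below; the exclusivity and consistency terms are assumptions chosen to match the abstract's description, not the paper's exact objective.

```latex
% Assumed formulation: per-view self-expressive reconstruction, a
% position-aware exclusivity term between view representations (Hadamard
% product), and a consistency term tying every view to a common indicator.
\min_{\{Z_v\},\, Z^{*}} \;
  \sum_{v=1}^{V} \lVert X_v - X_v Z_v \rVert_F^2
  \;+\; \lambda_1 \sum_{v \neq w} \lVert Z_v \odot Z_w \rVert_1
  \;+\; \lambda_2 \sum_{v=1}^{V} \lVert Z_v - Z^{*} \rVert_F^2
```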

199 citations


Posted Content•
TL;DR: This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S3FD), which performs superiorly on various scales of faces with a single deep neural network, especially for small faces.
Abstract: This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S$^3$FD), which performs superiorly on various scales of faces with a single deep neural network, especially for small faces. Specifically, we try to solve the common problem that anchor-based detectors deteriorate dramatically as the objects become smaller. We make contributions in the following three aspects: 1) proposing a scale-equitable face detection framework to handle different scales of faces well. We tile anchors on a wide range of layers to ensure that all scales of faces have enough features for detection. Besides, we design anchor scales based on the effective receptive field and a proposed equal proportion interval principle; 2) improving the recall rate of small faces by a scale compensation anchor matching strategy; 3) reducing the false positive rate of small faces via a max-out background label. As a consequence, our method achieves state-of-the-art detection performance on all the common face detection benchmarks, including the AFW, PASCAL face, FDDB and WIDER FACE datasets, and can run at 36 FPS on an Nvidia Titan X (Pascal) for VGA-resolution images.

150 citations


Posted Content•
Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, Stan Z. Li
TL;DR: FaceBoxes proposes a lightweight yet powerful network structure, consisting of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL), that enables real-time face detection speed on the CPU.
Abstract: Although tremendous strides have been made in face detection, one of the remaining open challenges is to achieve real-time speed on the CPU as well as maintain high performance, since effective models for face detection tend to be computationally prohibitive. To address this challenge, we propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Specifically, our method has a lightweight yet powerful network structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales. Besides, we propose a new anchor densification strategy to make different types of anchors have the same density on the image, which significantly improves the recall rate of small faces. As a consequence, the proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the number of faces. We comprehensively evaluate this method and present state-of-the-art detection performance on several face detection benchmark datasets, including the AFW, PASCAL face, and FDDB. Code is available at this https URL

117 citations


Journal Article•DOI•
TL;DR: The proposed multi-label convolutional neural network (MLCNN) can simultaneously predict multiple pedestrian attributes and significantly outperforms the SVM-based method on the PETA database.
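
A minimal sketch of how multi-label attribute prediction is typically wired (our illustration; the backbone, attribute count, and loss are assumptions, not MLCNN's exact architecture):

```python
import torch
import torch.nn as nn

# Toy multi-label setup: one backbone, one sigmoid output per attribute,
# trained with binary cross-entropy so all attributes are predicted at once.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 35)                 # e.g. 35 binary pedestrian attributes
criterion = nn.BCEWithLogitsLoss()       # independent per-attribute labels

x = torch.randn(2, 3, 128, 64)           # toy batch of pedestrian crops
labels = torch.randint(0, 2, (2, 35)).float()
loss = criterion(head(backbone(x)), labels)
```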

106 citations


Book Chapter•DOI•
Xuezhi Liang, Xiaobo Wang, Zhen Lei, Shengcai Liao, Stan Z. Li
14 Nov 2017
TL;DR: A novel soft-margin softmax (SM-Softmax) loss that improves the discriminative power of features; it can not only adjust the desired continuous soft margin but can also be easily optimized by typical stochastic gradient descent (SGD).
Abstract: In deep classification, the softmax loss (Softmax) is arguably one of the most commonly used components to train deep convolutional neural networks (CNNs). However, such a widely used loss is limited due to its lack of encouraging the discriminability of features. Recently, the large-margin softmax loss (L-Softmax [1]) was proposed to explicitly enhance the feature discrimination, with a hard margin and complex forward and backward computation. In this paper, we propose a novel soft-margin softmax (SM-Softmax) loss to improve the discriminative power of features. Specifically, SM-Softmax only modifies the forward of Softmax by introducing a non-negative real number m, without changing the backward. Thus it can not only adjust the desired continuous soft margin but can also be easily optimized by typical stochastic gradient descent (SGD). Experimental results on three benchmark datasets demonstrate the superiority of our SM-Softmax over the baseline Softmax, the alternative L-Softmax and several state-of-the-art competitors.
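
Since SM-Softmax only changes the forward pass, the idea fits in a few lines. The sketch below subtracts the margin m from the target-class logit before the softmax; reading the abstract this way is our assumption, not the paper's exact formulation.

```python
import numpy as np

# Hedged sketch of the soft-margin softmax idea: enforce f_y - m > f_j by
# subtracting a non-negative margin m from the target logit in the forward
# pass; the backward keeps the standard softmax form, so SGD applies as-is.
def sm_softmax_loss(logits, label, m=0.5):
    """logits: (C,) scores for one sample; label: target class index."""
    z = logits.copy()
    z[label] -= m                      # the soft margin, continuous in m
    z -= z.max()                       # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])
```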

81 citations


Proceedings Article•
01 Jan 2017
TL;DR: A novel coding method named weighted linear coding (WLC) learns multi-level descriptors from raw pixel data in an unsupervised manner; it guarantees saliency through a similarity constraint, and the resulting descriptors strike a good balance between robustness and distinctiveness.
Abstract: In this paper, we propose a novel coding method named weighted linear coding (WLC) to learn multi-level (e.g., pixel-level, patch-level and image-level) descriptors from raw pixel data in an unsupervised manner. It guarantees the property of saliency through a similarity constraint. The resulting multi-level descriptors strike a good balance between robustness and distinctiveness. Based on WLC, all data from the same region can be jointly encoded. Consequently, spatial consistency is preserved when we extract holistic image features. Furthermore, we apply PCA to these features to obtain compact person representations. During the person matching stage, we exploit the complementary information residing in the multi-level descriptors via a score-level fusion strategy. Experiments on the challenging person re-identification datasets VIPeR and CUHK01 demonstrate the effectiveness of our method.
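
The score-level fusion step can be sketched as a convex combination of per-level similarity scores (the weights here are placeholder assumptions; the paper's fusion details may differ):

```python
import numpy as np

# Minimal sketch of score-level fusion over multi-level descriptors:
# match at each level separately, then mix the scores with convex weights.
def fused_score(desc_a, desc_b, weights=(0.3, 0.3, 0.4)):
    """desc_a, desc_b: lists of L2-normalized pixel-, patch- and image-level
    descriptors for two person images; returns one fused similarity."""
    scores = [float(a @ b) for a, b in zip(desc_a, desc_b)]  # cosine scores
    return sum(w * s for w, s in zip(weights, scores))
```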

46 citations


Posted Content•
TL;DR: This paper proposes a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains comparable efficiency of one- stage methods.
Abstract: For object detection, the two-stage approach (e.g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper, we propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains comparable efficiency to one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce the search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors from the former as input to further improve the regression and predict multi-class labels. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at this https URL
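
A toy module can make the two-step structure concrete. The sketch below is a deliberately simplified illustration (single feature map, assumed channel counts), not the released RefineDet code:

```python
import torch
import torch.nn as nn

# Toy sketch of RefineDet's two inter-connected modules.
class TinyRefineDet(nn.Module):
    def __init__(self, channels=256, num_classes=21, anchors_per_cell=3):
        super().__init__()
        k = anchors_per_cell
        self.arm_cls = nn.Conv2d(channels, 2 * k, 3, padding=1)  # fg/bg score
        self.arm_reg = nn.Conv2d(channels, 4 * k, 3, padding=1)  # anchor deltas
        self.tcb = nn.Sequential(                                # transfer block
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.odm_cls = nn.Conv2d(channels, num_classes * k, 3, padding=1)
        self.odm_reg = nn.Conv2d(channels, 4 * k, 3, padding=1)

    def forward(self, feat):
        arm_scores = self.arm_cls(feat)  # used to discard easy negative anchors
        arm_deltas = self.arm_reg(feat)  # coarse anchor adjustment
        odm_feat = self.tcb(feat)        # ARM features handed to the ODM
        return (arm_scores, arm_deltas,
                self.odm_cls(odm_feat), self.odm_reg(odm_feat))
```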

37 citations


Proceedings Article•DOI•
TL;DR: Identification loss combined with center loss trains a deep model for person re-identification without requiring image pairs or triplets, while the inter-class distinction and intra-class variance are well handled.
Abstract: The person re-identification task has been greatly boosted by deep convolutional neural networks (CNNs) in recent years. The core is to enlarge the inter-class distinction as well as reduce the intra-class variance. However, to achieve this, existing deep models prefer to adopt image pairs or triplets to form a verification loss, which is inefficient and unstable since the number of training pairs or triplets grows rapidly as the training data grows. Moreover, their performance is limited since they ignore the fact that different embedding dimensions may carry different importance. In this paper, we propose to employ identification loss with center loss to train a deep model for person re-identification. The training process is efficient since it does not require image pairs or triplets for training, while the inter-class distinction and intra-class variance are well handled. To boost the performance, a new feature reweighting (FRW) layer is designed to explicitly emphasize the importance of each embedding dimension, thus leading to an improved embedding. Experiments on several benchmark datasets have shown the superiority of our method over the state-of-the-art alternatives in both accuracy and speed.
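
A compact sketch of the training objective and the FRW idea, assuming a simple reading of FRW as a learned per-dimension scaling and a placeholder center-loss weight lam (both are our assumptions):

```python
import torch
import torch.nn as nn

class FRW(nn.Module):
    """Feature reweighting: one learned weight per embedding dimension."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * self.w   # emphasize important dimensions, damp the rest

def total_loss(logits, emb, labels, centers, lam=0.005):
    """centers: (num_classes, dim) class centers (an nn.Parameter in practice)."""
    id_loss = nn.functional.cross_entropy(logits, labels)      # identification
    center_loss = ((emb - centers[labels]) ** 2).sum(1).mean() # pull to center
    return id_loss + lam * center_loss
```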

37 citations


Proceedings Article•DOI•
01 May 2017
TL;DR: In the final testing phase of the Micro Emotion Challenge, the proposed multi-modality convolutional neural network based on visual and geometrical information proved more effective and achieved better performance.
Abstract: Micro emotion recognition is a very challenging problem because of the subtle appearance variations among different facial expression classes. To deal with this problem, we propose a multi-modality convolutional neural network (CNN) based on visual and geometrical information. The visual face image and structured geometry are embedded into a unified network, and the recognition accuracy benefits from the fused information. The proposed network includes two branches. The first branch extracts visual features from color face images, and the other branch extracts geometry features from 68 facial landmarks. Both visual and geometry features are then concatenated into a long vector, which is finally fed to a hinge loss layer. Compared with a CNN architecture that uses only face images, our method is more effective and achieves better performance. In the final testing phase of the Micro Emotion Challenge, our method took first place with a misclassification score of 80.212137.
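
The two-branch fusion can be illustrated with a toy network; the layer sizes below are placeholder assumptions, and only the overall structure (a CNN branch for the image, a branch for the 68 landmarks, concatenation before the classifier) follows the description above:

```python
import torch
import torch.nn as nn

# Simplified sketch of the two-branch, multi-modality design.
class TwoBranchNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.visual = nn.Sequential(                       # color-image branch
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())         # -> 64-d visual feature
        self.geometry = nn.Sequential(                     # landmark branch
            nn.Linear(136, 64), nn.ReLU())                 # 68 x 2 = 136 inputs
        self.classifier = nn.Linear(64 + 64, num_classes)  # fused long vector

    def forward(self, image, landmarks):
        """image: (B, 3, H, W); landmarks: (B, 136) flattened 68 points."""
        fused = torch.cat([self.visual(image), self.geometry(landmarks)], dim=1)
        return self.classifier(fused)  # scores for a hinge (or other) loss
```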

Journal Article•DOI•
Hailin Shi, Xiaobo Wang, Dong Yi, Zhen Lei, Xiangyu Zhu, Stan Z. Li
TL;DR: The proposed HJB explicitly models the modality difference of image pairs and is able to discriminate same/different face pairs more accurately, showing the superiority of the HJB over previous methods.
Abstract: In many face recognition applications, the modalities of face images in the gallery and probe sets are different, which is known as heterogeneous face recognition. How to reduce the feature gap between images from different modalities is a critical issue in developing a highly accurate face recognition algorithm. Recently, joint Bayesian (JB) has demonstrated superior performance on general face recognition compared to traditional discriminant analysis methods like subspace learning. However, the original JB treats the two input samples equally, without taking the modality difference between them into account, and may thus be suboptimal for the heterogeneous face recognition problem. In this work, we extend the original JB by modeling the gallery and probe images using two different Gaussian distributions to propose a heterogeneous joint Bayesian (HJB) formulation for cross-modality face recognition. The proposed HJB explicitly models the modality difference of image pairs and is therefore able to discriminate same/different face pairs more accurately. Extensive experiments conducted on visible–near-infrared and ID photo versus spot face recognition problems show the superiority of the HJB over previous methods.
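
Sketching the modeling step in the joint Bayesian style the abstract describes (the exact covariance parameterization and learning procedure are in the paper; this is a paraphrase, not the derivation):

```latex
% Joint Bayesian writes a face as identity plus within-class noise; the
% HJB extension gives the gallery (g) and probe (p) modalities their own
% noise Gaussians instead of a shared one.
x_g = \mu + \varepsilon_g, \qquad x_p = \mu + \varepsilon_p, \qquad
\mu \sim \mathcal{N}(0, S_\mu),\;\;
\varepsilon_g \sim \mathcal{N}(0, S_{\varepsilon_g}),\;\;
\varepsilon_p \sim \mathcal{N}(0, S_{\varepsilon_p})
% A pair is then verified by the usual log-likelihood ratio:
r(x_g, x_p) = \log \frac{P(x_g, x_p \mid \mathrm{same})}
                        {P(x_g, x_p \mid \mathrm{different})}
```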

Book Chapter•DOI•
Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, Stan Z. Li
28 Oct 2017
TL;DR: This work proposes a novel face detector, dubbed the Densely Connected Face Proposal Network (DCFPN), with high performance as well as real-time speed on CPU devices; it uses a dense anchor strategy and a fair L1 loss function to handle small faces well.
Abstract: Accuracy and efficiency are two conflicting challenges for face detection, since effective models tend to be computationally prohibitive. To address these two conflicting challenges, our core idea is to shrink the input image and focus on detecting small faces. Specifically, we propose a novel face detector, dubbed the Densely Connected Face Proposal Network (DCFPN), with high performance as well as real-time speed on CPU devices. On the one hand, we subtly design a lightweight-but-powerful fully convolutional network with consideration of both efficiency and accuracy. On the other hand, we use a dense anchor strategy and propose a fair L1 loss function to handle small faces well. As a consequence, our method can detect faces at 30 FPS on a single 2.60 GHz CPU core and 250 FPS using a GPU for VGA-resolution images. We achieve state-of-the-art performance on the AFW, PASCAL face and FDDB datasets.
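
The abstract does not define the fair L1 loss, so the sketch below is speculative: one natural reading is an L1 regression error normalized by face size, so small faces are not drowned out by large ones.

```python
import numpy as np

# Speculative sketch of a size-normalized ("fair") L1 regression loss.
def fair_l1(pred, target):
    """pred, target: (N, 4) boxes as (cx, cy, w, h)."""
    scale = np.maximum(target[:, 2:3], target[:, 3:4])   # per-face size
    return np.abs((pred - target) / scale).sum(axis=1).mean()
```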

Posted Content•
09 May 2017
TL;DR: A new feature reweighting (FRW) layer is designed to explicitly emphasize the importance of each embedding dimension, thus leading to an improved embedding in a deep model for person re-identification.
Abstract: The person re-identification task has been greatly boosted by deep convolutional neural networks (CNNs) in recent years. The core is to enlarge the inter-class distinction as well as reduce the intra-class variance. However, to achieve this, existing deep models prefer to adopt image pairs or triplets to form a verification loss, which is inefficient and unstable since the number of training pairs or triplets grows rapidly as the training data grows. Moreover, their performance is limited since they ignore the fact that different embedding dimensions may carry different importance. In this paper, we propose to employ identification loss with center loss to train a deep model for person re-identification. The training process is efficient since it does not require image pairs or triplets for training, while the inter-class distinction and intra-class variance are well handled. To boost the performance, a new feature reweighting (FRW) layer is designed to explicitly emphasize the importance of each embedding dimension, thus leading to an improved embedding. Experiments on several benchmark datasets have shown the superiority of our method over the state-of-the-art alternatives.

Book Chapter•DOI•
Zhen Lei, Wang Tao, Xiangyu Zhu, Tianyu Fu, Stan Z. Li 
30 Sep 2017
TL;DR: The proposed anti-spoofing method is applicable to face recognition applications such as face access control and remote authentication on mobile devices, and the simple head rotation requirement is acceptable in these applications.
Abstract: This work focuses on the most common and cheapest face spoofing methods, i.e., photo attacks (including a photo printed on paper or a photo displayed on an electronic screen). Many previous works [3-6] propose to classify genuine and fake samples based on frontal face images and achieve good performance on several face spoofing databases. However, in real applications, the imposter will try his best to fool the system, and the texture difference between genuine and fake samples is usually very small. In order to achieve robust face anti-spoofing performance, other cues like 3D face structure and motion patterns can be incorporated. In this work, we propose to detect spoofing photo attacks based on a sequence of rotated face images. Both the structure and texture information from the rotated face sequence are exploited. In practice, the users are only asked to make a simple movement (i.e., rotate their faces). As pointed out in [7], this head rotation requirement is much simpler than traditional challenge-response-based face anti-spoofing methods, in which a combination of multiple movements is usually necessary. The proposed anti-spoofing method is applicable to face recognition applications such as face access control and remote authentication on mobile devices, and the simple head rotation requirement is acceptable in these applications.
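
One simple structure cue of the kind this approach can exploit is planarity: under rotation, landmarks on a printed photo move according to a single homography, whereas a real 3D face leaves a larger fitting residual. The check below is our own toy illustration, not the paper's algorithm:

```python
import numpy as np
import cv2

# Toy planarity check for photo-attack detection.
def planarity_score(pts_a, pts_b):
    """pts_a, pts_b: (68, 2) landmark arrays from two frames of the rotation."""
    a = pts_a.reshape(-1, 1, 2).astype(np.float32)
    b = pts_b.reshape(-1, 1, 2).astype(np.float32)
    H, _ = cv2.findHomography(a, b, 0)        # 0 = plain least-squares fit
    proj = cv2.perspectiveTransform(a, H)     # where a planar scene would land
    return float(np.linalg.norm(proj - b, axis=2).mean())

# Small residual -> planar surface (likely a photo attack); large -> 3D face.
```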

Posted Content•
TL;DR: This paper proposes a new method named soft Gaussian mapping (SGM) that models the discrepancies between color names and pixels using a Gaussian and utilizes the inverse of the covariance matrix to bridge the gap between them.
Abstract: Color names based image representation is successfully used in person re-identification, due to the advantages of being compact, intuitively understandable, and robust to photometric variance. However, the underlying distribution of color names' RGB values differs from that of image pixels' RGB values, which may lead to inaccuracy when directly comparing them in Euclidean space. In this paper, we propose a new method named soft Gaussian mapping (SGM) to address this problem. We model the discrepancies between color names and pixels using a Gaussian and utilize the inverse of the covariance matrix to bridge the gap between them. Based on SGM, an image can be converted into several soft Gaussian maps. In each soft Gaussian map, we further seek to establish stable and robust descriptors within a local region through a max pooling operation. A robust image representation based on color names is then obtained by concatenating the statistical descriptors in each stripe. When labeled data are available, a discriminative subspace projection matrix is learned to build efficient representations of an image via cross-view coupling learning. Experiments on the public datasets VIPeR, PRID450S and CUHK03 demonstrate the effectiveness of our method.
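
The soft Gaussian mapping step can be sketched as a Mahalanobis-weighted soft assignment of pixels to color names; in practice the covariance would be estimated from data, so the argument below is a placeholder for that estimate:

```python
import numpy as np

# Hedged sketch of soft Gaussian mapping: each pixel's RGB value is softly
# assigned to every color name through a Gaussian whose inverse covariance
# bridges the gap between the two distributions.
def soft_gaussian_map(pixels, color_names, inv_cov):
    """pixels: (N, 3) RGB; color_names: (K, 3) prototypes; inv_cov: (3, 3)."""
    diff = pixels[:, None, :] - color_names[None, :, :]       # (N, K, 3)
    maha = np.einsum('nkd,de,nke->nk', diff, inv_cov, diff)   # Mahalanobis^2
    weights = np.exp(-0.5 * maha)
    return weights / weights.sum(axis=1, keepdims=True)       # (N, K) soft maps
```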

Posted Content•
TL;DR: Identification loss combined with center loss trains a deep model for person re-identification without requiring image pairs or triplets, while the inter-class distinction and intra-class variance are well handled.
Abstract: The person re-identification task has been greatly boosted by deep convolutional neural networks (CNNs) in recent years. The core is to enlarge the inter-class distinction as well as reduce the intra-class variance. However, to achieve this, existing deep models prefer to adopt image pairs or triplets to form a verification loss, which is inefficient and unstable since the number of training pairs or triplets grows rapidly as the training data grows. Moreover, their performance is limited since they ignore the fact that different embedding dimensions may carry different importance. In this paper, we propose to employ identification loss with center loss to train a deep model for person re-identification. The training process is efficient since it does not require image pairs or triplets for training, while the inter-class distinction and intra-class variance are well handled. To boost the performance, a new feature reweighting (FRW) layer is designed to explicitly emphasize the importance of each embedding dimension, thus leading to an improved embedding. Experiments on several benchmark datasets have shown the superiority of our method over the state-of-the-art alternatives in both accuracy and speed.