Author

Stan Z. Li

Bio: Stan Z. Li is an academic researcher from Westlake University. The author has contributed to research in topics: Facial recognition system & Face detection. The author has an h-index of 97 and has co-authored 532 publications receiving 41,793 citations. Previous affiliations of Stan Z. Li include Microsoft & Macau University of Science and Technology.


Papers
Posted Content
TL;DR: A novel CNN-based method to learn a discriminative metric with good robustness to the over-fitting problem in person re-identification is proposed and it is found that the selection of intra-class sample pairs is crucial for learning but has received little attention.
Abstract: Person re-identification aims to re-identify the probe image from a given set of images under different camera views. It is challenging due to large variations of pose, illumination, occlusion and camera view. Since the convolutional neural networks (CNN) have excellent capability of feature extraction, certain deep learning methods have been recently applied in person re-identification. However, in person re-identification, the deep networks often suffer from the over-fitting problem. In this paper, we propose a novel CNN-based method to learn a discriminative metric with good robustness to the over-fitting problem in person re-identification. Firstly, a novel deep architecture is built where the Mahalanobis metric is learned with a weight constraint. This weight constraint is used to regularize the learning, so that the learned metric has a better generalization ability. Secondly, we find that the selection of intra-class sample pairs is crucial for learning but has received little attention. To cope with the large intra-class variations in pedestrian images, we propose a novel training strategy named moderate positive mining to prevent the training process from over-fitting to the extreme samples in intra-class pairs. Experiments show that our approach significantly outperforms state-of-the-art methods on several benchmarks of person re-identification.
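
As a minimal sketch of the kind of metric the abstract describes (not the authors' implementation), the snippet below computes a squared Mahalanobis distance with M = W Wᵀ and a penalty that pulls M toward the identity. The factorization, the exact form of the penalty, and the names `mahalanobis_sq` and `identity_penalty` are assumptions made for illustration; the abstract only states that a weight constraint regularizes the learned metric.

```python
import numpy as np

def mahalanobis_sq(x, y, W):
    """Squared Mahalanobis distance (x - y)^T M (x - y) with M = W W^T.

    Factoring M through W keeps it positive semi-definite by construction.
    """
    diff = W.T @ (x - y)
    return float(diff @ diff)

def identity_penalty(W):
    """Hypothetical weight constraint: squared Frobenius norm of (W W^T - I).

    The penalty is zero when the metric reduces to plain Euclidean distance.
    """
    M = W @ W.T
    return float(np.sum((M - np.eye(M.shape[0])) ** 2))
```

With `W` set to the identity matrix, `mahalanobis_sq` is the ordinary squared Euclidean distance and the penalty vanishes, which is the sense in which such a constraint limits how far the learned metric can drift.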

42 citations

Proceedings ArticleDOI
17 Oct 2003
TL;DR: This work shows that a face lighting subspace can be constructed based on three or more training face images illuminated by noncoplanar lights, and presents a face normalization algorithm, illumination alignment, i.e. changing the lighting of one face image to that of another face image.
Abstract: We present a general framework for face modeling under varying lighting conditions. First, we show that a face lighting subspace can be constructed based on three or more training face images illuminated by noncoplanar lights. The lighting of any face image can be represented as a point in this subspace. Second, we show that the extreme rays, i.e. the boundary of an illumination cone, cover the entire light sphere. Therefore, a relatively sparse sample of face images can be used to build a face model instead of calculating each extremely illuminated face image. Third, we present a face normalization algorithm, illumination alignment, i.e. changing the lighting of one face image to that of another face image. Experiments are presented.
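
The idea of representing lighting as a point in a subspace can be sketched as a least-squares fit, assuming an image is approximately a linear combination of the training images and ignoring shadows. The function names `lighting_coefficients` and `relight` are hypothetical, and this is an illustration of the idea rather than the paper's algorithm.

```python
import numpy as np

def _basis_matrix(basis_images):
    # Each training image becomes one column of a (pixels x k) matrix.
    return np.stack([b.ravel() for b in basis_images], axis=1)

def lighting_coefficients(basis_images, image):
    """Least-squares coordinates of `image` in the span of the training images."""
    B = _basis_matrix(basis_images)
    coeffs, *_ = np.linalg.lstsq(B, image.ravel(), rcond=None)
    return coeffs

def relight(basis_images, coeffs):
    """Synthesize an image of the same face from lighting coordinates."""
    B = _basis_matrix(basis_images)
    return (B @ coeffs).reshape(basis_images[0].shape)
```

Under these assumptions, illumination alignment amounts to estimating the coefficients of one image and calling `relight` with the training images of another face.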

42 citations

Journal ArticleDOI
TL;DR: This paper extends the original shallow face descriptors to deep discriminant face features by introducing a stacked image descriptor (SID). With its deep structure, more complex facial information can be extracted and the discriminability and compactness of the feature representation can be improved.
Abstract: Learning-based face descriptors have steadily improved face recognition performance. Compared with hand-crafted features, learning-based features are considered able to exploit information with better discriminative ability for specific tasks. Motivated by the recent success of deep learning, in this paper, we extend the original shallow face descriptors to deep discriminant face features by introducing a stacked image descriptor (SID). With deep structure, more complex facial information can be extracted and the discriminability and compactness of the feature representation can be improved. The SID is learned in a forward optimization way, which is computationally efficient compared with deep learning. Extensive experiments on various face databases are conducted to show that SID is able to achieve high face recognition performance with compact face representation, compared with other state-of-the-art descriptors.

42 citations

Proceedings ArticleDOI
Changtao Zhou, Zhiwei Zhang, Dong Yi, Zhen Lei, Stan Z. Li
TL;DR: Simultaneous Discriminant Analysis learns two mappings from LR and HR images respectively to a common subspace where discrimination property is maximized and the conventional classification method is applied in the common space for final decision.
Abstract: Low resolution (LR) is an important issue when handling real world face recognition problems. The performance of traditional recognition algorithms will drop drastically due to the loss of facial texture information in original high resolution (HR) images. To address this problem, in this paper we propose an effective approach named Simultaneous Discriminant Analysis (SDA). SDA learns two mappings from LR and HR images respectively to a common subspace where discrimination property is maximized. In SDA, (1) the data gap between LR and HR is reduced by mapping into a common space; and (2) the mapping is designed for preserving most discriminative information. After that, the conventional classification method is applied in the common space for final decision. Extensive experiments are conducted on both FERET and Multi-PIE, and the results clearly show the superiority of the proposed SDA over state-of-the-art methods.
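
The matching step described in the abstract can be sketched as nearest-neighbor search after projecting both resolutions into a shared space. This sketch assumes the two projection matrices have already been learned; it does not show how SDA learns them. The names `P_lr`, `P_hr`, and `match_in_common_space` are hypothetical, and nearest neighbor stands in for the unspecified "conventional classification method".

```python
import numpy as np

def match_in_common_space(P_lr, P_hr, probe_lr, gallery_hr, gallery_ids):
    """Return the gallery identity closest to the probe in the common subspace.

    P_lr:        (d_lr, d) projection for low-resolution features (assumed given)
    P_hr:        (d_hr, d) projection for high-resolution features (assumed given)
    probe_lr:    (d_lr,)   low-resolution probe feature vector
    gallery_hr:  (n, d_hr) high-resolution gallery feature vectors, one per row
    gallery_ids: sequence of n identity labels
    """
    z_probe = P_lr.T @ probe_lr      # probe mapped into the common space
    Z_gallery = gallery_hr @ P_hr    # gallery mapped into the same space
    dists = np.linalg.norm(Z_gallery - z_probe, axis=1)
    return gallery_ids[int(np.argmin(dists))]
```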

42 citations

Posted Content
TL;DR: In this article, the authors conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make their work easily reproducible, and propose three CNN architectures which are the first reported architectures trained using LFW data.
Abstract: Deep learning, in particular the Convolutional Neural Network (CNN), has achieved promising results in face recognition recently. However, it remains an open question why CNNs work well and how to design a 'good' architecture. Existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigating the reason. In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible. Specifically, we use the public database LFW (Labeled Faces in the Wild) to train CNNs, unlike most existing CNNs trained on private databases. We propose three CNN architectures which are the first reported architectures trained using LFW data. This paper quantitatively compares the architectures of CNNs and evaluates the effect of different implementation choices. We identify several useful properties of CNN-FRS. For instance, the dimensionality of the learned features can be significantly reduced without adverse effect on face recognition accuracy. In addition, a traditional metric learning method exploiting CNN-learned features is evaluated. Experiments show that two crucial factors for good CNN-FRS performance are the fusion of multiple CNNs and metric learning. To make our work reproducible, source code and models will be made publicly available.

41 citations


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

27,256 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This work covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.
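
A rough sketch of classification by sparse representation follows. It expresses a test sample as a sparse combination of all training samples and assigns the class whose samples give the smallest reconstruction residual. As an assumption for brevity, it solves the Lagrangian (lasso) form of the ℓ1 problem with a plain ISTA loop instead of the constrained ℓ1-minimization named in the abstract; `ista_l1`, `src_classify`, and the default `lam` are illustrative choices, not the authors' code.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))     # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x

def src_classify(A, labels, y, lam=0.01):
    """Assign y to the class whose training columns best reconstruct it.

    A:      (features, n_train) matrix with one training sample per column
    labels: (n_train,) array of class labels for the columns of A
    """
    x = ista_l1(A, y, lam)
    best_label, best_residual = None, np.inf
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)    # keep only the coefficients of class c
        residual = np.linalg.norm(y - A @ x_c)
        if residual < best_residual:
            best_label, best_residual = int(c), residual
    return best_label
```

In practice the columns of `A` are usually normalized to unit length so that no training sample dominates the sparse code because of its scale.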

9,658 citations

Journal ArticleDOI
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

7,741 citations