Author

Stan Z. Li

Bio: Stan Z. Li is an academic researcher from Westlake University. The author has contributed to research in topics: Facial recognition system & Face detection. The author has an h-index of 97 and has co-authored 532 publications receiving 41,793 citations. Previous affiliations of Stan Z. Li include Microsoft & Macau University of Science and Technology.


Papers
Posted Content
TL;DR: This paper proposes a more general approach that learns a similarity metric directly from image pixels, using a "siamese" deep neural network to jointly learn the color feature, texture feature, and metric in a unified framework.
Abstract: Various hand-crafted features and metric learning methods prevail in the field of person re-identification. Compared to these methods, this paper proposes a more general way to learn a similarity metric from image pixels directly. By using a "siamese" deep neural network, the proposed method can jointly learn the color feature, texture feature, and metric in a unified framework. The network has a symmetric structure with two sub-networks connected by a cosine function. To deal with the large variations of person images, binomial deviance is used to evaluate the cost between similarities and labels, which proves robust to outliers. Compared to existing research, the experiments study a more practical setting: training and testing on different datasets (cross-dataset person re-identification). In both the "intra-dataset" and "cross-dataset" settings, the advantages of the proposed method are demonstrated on VIPeR and PRID.
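The scoring described in the abstract, cosine similarity between the two sub-network embeddings evaluated by a binomial deviance cost, can be sketched in a few lines. This is an illustrative sketch, not the paper's network; the margin parameters `alpha` and `beta` are placeholder values, not the authors'.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def binomial_deviance(s, label, alpha=2.0, beta=0.5):
    """Binomial deviance cost between a similarity s and a pair label.

    label is +1 for a matched pair, -1 for a mismatched pair.
    The soft-margin form log(1 + exp(-alpha * (s - beta) * label))
    penalizes matched pairs with low similarity and mismatched pairs
    with high similarity, and grows only linearly far from the margin,
    which is the robustness-to-outliers property the abstract cites.
    """
    return float(np.log(1.0 + np.exp(-alpha * (s - beta) * label)))
```

A matched pair with high cosine similarity incurs a small cost, and the same similarity labeled as a mismatch incurs a large one, which is what drives the two sub-networks toward a discriminative embedding.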

171 citations

Book ChapterDOI
Dong Yi, Rong Liu, Rufeng Chu, Zhen Lei, Stan Z. Li 
27 Aug 2007
TL;DR: The work aims to develop a new solution for meeting the accuracy requirement of face-based biometric recognition, taking advantage of recent NIR face technology while allowing existing VIS face photos to be used as gallery templates.
Abstract: In many applications, such as e-passports and driver's licenses, the enrollment of face templates is done using visible light (VIS) face images. Such images are normally acquired in a controlled environment where the lighting is approximately frontal. However, authentication is done under variable lighting conditions. Matching faces in VIS images taken under different lighting conditions is still a big challenge. A recent development in near infrared (NIR) image based face recognition [1] has largely overcome the difficulty arising from lighting changes. However, it requires that enrollment face images be acquired using NIR as well. In this paper, we present a new problem, that of matching a face in an NIR image against one in a VIS image, and propose a solution to it. The work aims to develop a new solution for meeting the accuracy requirement of face-based biometric recognition by taking advantage of recent NIR face technology while allowing the use of existing VIS face photos as gallery templates. Face recognition is done by matching an NIR probe face against a VIS gallery face. Based on an analysis of the properties of NIR and VIS face images, we propose a learning-based approach for this cross-modality matching. A correlation mechanism between NIR and VIS faces is learned from NIR → VIS face pairs, and the learned correlation is used to evaluate the similarity between an NIR face and a VIS face. We provide preliminary results of NIR → VIS face matching for recognition under different illumination conditions. The results demonstrate the advantages of NIR → VIS matching over VIS → VIS matching.

171 citations

Proceedings ArticleDOI
01 Dec 2001
TL;DR: This work proposes a new appearance model, called the direct appearance model (DAM), which does not combine shape and texture as AAM does, but uses texture information directly to predict the shape and to estimate position and appearance (hence the name DAM).
Abstract: The active appearance model (AAM), which makes ingenious use of both shape and texture constraints, is a powerful tool for face modeling, alignment, and facial feature extraction under shape deformations and texture variations. However, as we show through our analysis and experiments, there exist admissible appearances that are not modeled by AAM and hence cannot be reached by AAM search; also, the mapping from the texture subspace to the shape subspace is many-to-one, and therefore a shape should be determined entirely by the texture in it. We propose a new appearance model, called the direct appearance model (DAM), which does not combine shape and texture as AAM does. The DAM model uses texture information directly in the prediction of the shape and in the estimation of position and appearance (hence the name DAM). In addition, DAM predicts the new face position and appearance based on principal components of texture difference vectors, instead of the raw vectors themselves as in AAM. This leads to the following advantages over AAM: (1) DAM subspaces include admissible appearances previously unseen in AAM, (2) convergence and accuracy are improved, and (3) memory requirements are substantially reduced. These advantages are substantiated by comparative experimental results.
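The core DAM idea, predicting shape directly from principal components of the texture with a learned linear map, can be sketched on synthetic data. Everything here is an assumption for illustration (the dimensions, the random training data, and the plain least-squares fit standing in for the paper's learned regression); it is not the authors' implementation.

```python
import numpy as np

# Hypothetical dimensions: 100-D texture vectors, 10 texture PCs, 20-D shapes.
rng = np.random.default_rng(0)
T = rng.standard_normal((50, 100))   # training texture vectors (one per row)
S = rng.standard_normal((50, 20))    # corresponding shape vectors

# PCA of the textures (mean-centered SVD), keeping k principal axes.
k = 10
T_mean = T.mean(axis=0)
_, _, Vt = np.linalg.svd(T - T_mean, full_matrices=False)
P = Vt[:k]                           # k x 100 texture principal axes

# Learn a linear map from texture PC coefficients to shape by least
# squares, mirroring DAM's claim that texture alone determines shape.
C = (T - T_mean) @ P.T               # texture PC coefficients (50 x k)
W, *_ = np.linalg.lstsq(C, S, rcond=None)

def predict_shape(t):
    """Predict a shape vector directly from a raw texture vector t."""
    return ((t - T_mean) @ P.T) @ W
```

Working in the k-dimensional PC space rather than the raw 100-D texture space is what gives the memory reduction the abstract mentions: the regression matrix is k x 20 instead of 100 x 20.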

168 citations

Proceedings ArticleDOI
Peng Yang, Shiguang Shan, Wen Gao, Stan Z. Li, Dong Zhang 
17 May 2004
TL;DR: AdaBoost is successfully applied to face recognition by introducing the intra-face and extra-face difference space in the Gabor feature space, and an appropriate re-sampling scheme is adopted to deal with the imbalance between the number of positive samples and the number of negative samples.
Abstract: Face representation based on Gabor features has attracted much attention and achieved great success in the face recognition area, owing to the advantages of Gabor features. However, the Gabor features currently adopted by most systems are redundant and too high-dimensional. In this paper, we propose a face recognition method using AdaBoosted Gabor features, which are not only low-dimensional but also discriminative. The main contribution of the paper lies in two points: (1) AdaBoost is successfully applied to face recognition by introducing the intra-face and extra-face difference space in the Gabor feature space; (2) an appropriate re-sampling scheme is adopted to deal with the imbalance between the number of positive samples and the number of negative samples. Using the proposed method, only hundreds of Gabor features are selected. Experiments on the FERET database show that these hundreds of Gabor features are enough to achieve good performance, comparable to that of methods using the complete set of Gabor features.

164 citations

Proceedings ArticleDOI
24 Oct 2004
TL;DR: A novel framework, called the self-quotient image, is presented for eliminating the lighting effect in an image; it combines the image processing technique of edge-preserving filtering with the Retinex-based methods of Jobson et al. (1997) and Gross and Brajovic (2003).
Abstract: The reliability of facial recognition techniques is often affected by variations in illumination, such as shadows and changes in illumination direction. In this paper, we present a novel framework, called the self-quotient image, for eliminating the lighting effect in an image. Although this method has an invariant form similar to the quotient image of Shashua et al. (2001), it does not need the alignment and bootstrap images. Our method combines the image processing technique of edge-preserving filtering with the Retinex-based methods of Jobson et al. (1997) and Gross and Brajovic (2003). We have analyzed this algorithm with a 3D imaging model and formulated the conditions under which illumination-invariant and illumination-variant properties can be realized, respectively. A fast anisotropic filter is also presented. The experimental results show that our method is effective in removing the effect of illumination for robust face recognition.
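The invariant form described here is a quotient of the image by a smoothed version of itself, Q = I / (F * I). The sketch below substitutes a plain box filter for the paper's fast edge-preserving anisotropic filter, so it shows only the quotient structure, not the authors' filter; `ksize` and `eps` are placeholder choices.

```python
import numpy as np

def self_quotient_image(img, ksize=5, eps=1e-6):
    """Self-quotient image Q = I / (F * I) for a 2-D grayscale array.

    F here is a simple box filter (a stand-in for the paper's
    edge-preserving anisotropic filter). Dividing each pixel by a
    local illumination estimate cancels slowly varying lighting
    while keeping local contrast.
    """
    pad = ksize // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    smoothed = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            smoothed[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    return img / (smoothed + eps)
```

Because both numerator and denominator scale together, globally rescaling the illumination (e.g. doubling every pixel) leaves the quotient essentially unchanged, which is the invariance property the framework relies on.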

158 citations


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
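Framing detection as regression starts with encoding each ground-truth box against the S x S grid: the cell containing the box center is responsible for predicting it. A minimal sketch of that target encoding follows, with S=7 matching the paper's base model; the helper name and return layout are ours, not the paper's.

```python
def encode_box(cx, cy, w, h, S=7):
    """Encode one ground-truth box as a YOLO-style regression target.

    (cx, cy, w, h) are the box center and size, normalized to [0, 1]
    relative to the image. The image is divided into an S x S grid;
    the cell containing the center is responsible for the box, and
    the center is re-expressed as an offset within that cell.
    """
    col = min(int(cx * S), S - 1)   # grid column containing the center
    row = min(int(cy * S), S - 1)   # grid row containing the center
    x_off = cx * S - col            # center offset within the cell, in [0, 1)
    y_off = cy * S - row
    return row, col, (x_off, y_off, w, h)
```

For example, a box centered at (0.5, 0.5) lands in cell (3, 3) of the 7 x 7 grid with offsets (0.5, 0.5); the network's output at that cell is regressed toward those values.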

27,256 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions and linear models for regression and classification, along with neural networks, kernel methods, graphical models, approximate inference, sampling methods, and the combination of models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and to corroborate the above claims.
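The classification rule, solving an ℓ1-regularized least-squares problem over a dictionary of training samples and picking the class whose coefficients alone give the smallest reconstruction residual, can be sketched with a basic iterative soft-thresholding (ISTA) solver. The paper uses different ℓ1 solvers and real face features; this toy dictionary, the solver choice, and `lam` are assumptions for illustration only.

```python
import numpy as np

def ista(A, b, lam=0.1, n_iter=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by iterative
    soft-thresholding, a simple stand-in for the paper's
    l1-minimization step."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)              # gradient of the smooth term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

def src_classify(A, labels, b, lam=0.1):
    """Sparse-representation classification: return the class whose
    coefficients alone reconstruct b with the smallest residual."""
    x = ista(A, b, lam)
    residuals = {}
    for c in set(labels):
        mask = np.array([l == c for l in labels], dtype=float)
        residuals[c] = np.linalg.norm(b - A @ (x * mask))
    return min(residuals, key=residuals.get)
```

With a dictionary whose columns are training samples grouped by identity, a probe belonging to one class concentrates its sparse coefficients on that class's columns, so the class-restricted residual is small for the correct class and near ||b|| for the others.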

9,658 citations

Journal ArticleDOI
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

7,741 citations