scispace - formally typeset
Search or ask a question
Author

Stan Z. Li

Bio: Stan Z. Li is an academic researcher from Westlake University. The author has contributed to research in topics: Facial recognition system & Face detection. The author has an hindex of 97, co-authored 532 publications receiving 41793 citations. Previous affiliations of Stan Z. Li include Microsoft & Macau University of Science and Technology.


Papers
More filters
Book ChapterDOI
20 Oct 2007
TL;DR: This work presents an effective approach for spatiotemporal face recognition from videos using an Extended set of Volume LBP (Local Binary Pattern features) and a boosting scheme and is the first work addressing the issue of learning the personal specific facial dynamics for face recognition.
Abstract: In this paper, we present an effective approach for spatiotemporal face recognition from videos using an Extended set of Volume LBP (Local Binary Pattern features) and a boosting scheme. Among the key properties of our approach are: (1) the use of local Extended Volume LBP based spatiotemporal description instead of the holistic representations commonly used in previous works; (2) the selection of only personal specific facial dynamics while discarding the intrapersonal temporal information; and (3) the incorporation of the contribution of each local spatiotemporal information. To the best of our knowledge, this is the first work addressing the issue of learning the personal specific facial dynamics for face recognition. We experimented with three different publicly available video face databases (MoBo, CRIM and Honda/UCSD) and considered five benchmark methods (PCA, LDA, LBP, HMMs and ARMA) for comparison. Our extensive experimental analysis clearly assessed the excellent performance of the proposed approach, significantly outperforming the comparative methods and thus advancing the state-of-the-art.

44 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper proposes to learn the discriminative image filter to improve the discriminant power of the LBP like feature and a coupled discriminant image filters learning method is proposed to deal with the heterogenous face images matching problem by reducing the feature gap between the heterogeneous images.
Abstract: Local binary pattern (LBP) and its variants are effective descriptors for face recognition. The traditional LBP like features are extracted based on the original pixel or patch values of images. In this paper, we propose to learn the discriminative image filter to improve the discriminant power of the LBP like feature. The basic idea is after the image filtering with the learned filter, the difference of pixel difference vectors (PDVs) between the images from the same person is consistent and the difference between the images from different persons is enlarged. In this way, the LBP like features extracted from the filtered images are considered to be more discriminant than those extracted from the original images. Moreover, a coupled discriminant image filters learning method is proposed to deal with the heterogenous face images matching problem by reducing the feature gap between the heterogeneous images. Experiments on FERET, FRGC and a VIS-NIR heterogeneous face databases validate the effectiveness of our proposed image filter learning method combined with LBP like features.

43 citations

Posted Content
TL;DR: This paper calls the proposed method "CRAFT" (Cascade Regionproposal-network And FasT-rcnn), which tackles each task with a carefully designed network cascade, and shows that the cascade structure helps in both tasks.
Abstract: Object detection is a fundamental problem in image understanding. One popular solution is the R-CNN framework and its fast versions. They decompose the object detection problem into two cascaded easier tasks: 1) generating object proposals from images, 2) classifying proposals into various object categories. Despite that we are handling with two relatively easier tasks, they are not solved perfectly and there's still room for improvement. In this paper, we push the "divide and conquer" solution even further by dividing each task into two sub-tasks. We call the proposed method "CRAFT" (Cascade Region-proposal-network And FasT-rcnn), which tackles each task with a carefully designed network cascade. We show that the cascade structure helps in both tasks: in proposal generation, it provides more compact and better localized object proposals; in object classification, it reduces false positives (mainly between ambiguous categories) by capturing both inter- and intra-category variances. CRAFT achieves consistent and considerable improvement over the state-of-the-art on object detection benchmarks like PASCAL VOC 07/12 and ILSVRC.

42 citations

Book ChapterDOI
Dong Yi1, Shengcai Liao1, Zhen Lei1, Jitao Sang1, Stan Z. Li1 
04 Jun 2009
TL;DR: The proposed approach, without knowing statistical characteristics of the subjects or data, outperforms the methods of contrast significantly, with ten-fold higher verification rates at FAR of 0.1%.
Abstract: The latest multi-biometric grand challenge (MBGC 2008) sets up a new experiment in which near infrared (NIR) face videos containing partial faces are used as a probe set and the visual (VIS) images of full faces are used as the target set. This is challenging for two reasons: (1) it has to deal with partially occluded faces in the NIR videos, and (2) the matching is between heterogeneous NIR and VIS faces. Partial face matching is also a problem often confronted in many video based face biometric applications. In this paper, we propose a novel approach for solving this challenging problem. For partial face matching, we propose a local patch based method to deal with partial face data. For heterogeneous face matching, we propose the philosophy of enhancing common features in heterogeneous images while reducing differences. This is realized by using edge-enhancing filters, which at the same time is also beneficial for partial face matching. The approach requires neither learning procedures nor training data. Experiments are performed using the MBGC portal challenge data, comparing with several known state-of-the-arts methods. Extensive results show that the proposed approach, without knowing statistical characteristics of the subjects or data, outperforms the methods of contrast significantly, with ten-fold higher verification rates at FAR of 0.1%.

42 citations

Posted Content
TL;DR: A single-shot refinement face detector namely RefineFace is presented to achieve high performance and achieves state-of-the-art results and runs at 37.3 FPS with ResNet-18 for VGA-resolution images.
Abstract: Face detection has achieved significant progress in recent years. However, high performance face detection still remains a very challenging problem, especially when there exists many tiny faces. In this paper, we present a single-shot refinement face detector namely RefineFace to achieve high performance. Specifically, it consists of five modules: Selective Two-step Regression (STR), Selective Two-step Classification (STC), Scale-aware Margin Loss (SML), Feature Supervision Module (FSM) and Receptive Field Enhancement (RFE). To enhance the regression ability for high location accuracy, STR coarsely adjusts locations and sizes of anchors from high level detection layers to provide better initialization for subsequent regressor. To improve the classification ability for high recall efficiency, STC first filters out most simple negatives from low level detection layers to reduce search space for subsequent classifier, then SML is applied to better distinguish faces from background at various scales and FSM is introduced to let the backbone learn more discriminative features for classification. Besides, RFE is presented to provide more diverse receptive field to better capture faces in some extreme poses. Extensive experiments conducted on WIDER FACE, AFW, PASCAL Face, FDDB, MAFA demonstrate that our method achieves state-of-the-art results and runs at $37.3$ FPS with ResNet-18 for VGA-resolution images.

42 citations


Cited by
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

27,256 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by C1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by C1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.

9,658 citations

Journal ArticleDOI
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

7,741 citations