Stan Z. Li
Other affiliations: Microsoft, Macau University of Science and Technology, Beihang University ...read more
Bio: Stan Z. Li is an academic researcher from Westlake University. The author has contributed to research in topic(s): Facial recognition system & Face detection. The author has an hindex of 97, co-authored 532 publication(s) receiving 41793 citation(s). Previous affiliations of Stan Z. Li include Microsoft & Macau University of Science and Technology.
Papers published on a yearly basis
07 Jun 2015
TL;DR: This paper proposes an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA), and presents a practical computation method for XQDA.
Abstract: Person re-identification is an important technique towards automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification, feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features, and maximizes the occurrence to make a stable representation against viewpoint changes. Besides, to handle illumination variations, we apply the Retinex transform and a scale invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low dimensional subspace by cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is learned on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.
01 Jan 2001
TL;DR: This detailed and thoroughly enhanced third edition presents a comprehensive study / reference to theories, methodologies and recent developments in solving computer vision problems based on MRFs, statistics and optimisation.
Abstract: Markov random field (MRF) theory provides a basis for modeling contextual constraints in visual processing and interpretation. It enables systematic development of optimal vision algorithms when used with optimization principles. This detailed and thoroughly enhanced third edition presents a comprehensive study / reference to theories, methodologies and recent developments in solving computer vision problems based on MRFs, statistics and optimisation. It treats various problems in low- and high-level computational vision in a systematic and unified way within the MAP-MRF framework. Among the main issues covered are: how to use MRFs to encode contextual constraints that are indispensable to image understanding; how to derive the objective function for the optimal solution to a problem; and how to design computational algorithms for finding an optimal solution. Easy-to-follow and coherent, the revised edition is accessible, includes the most recent advances, and has new and expanded sections on such topics as: Discriminative Random Fields (DRF) Strong Random Fields (SRF) Spatial-Temporal Models Total Variation Models Learning MRF for Classification (motivation + DRF) Relation to Graphic Models Graph Cuts Belief Propagation Features: Focuses on the application of Markov random fields to computer vision problems, such as image restoration and edge detection in the low-level domain, and object matching and recognition in the high-level domain Presents various vision models in a unified framework, including image restoration and reconstruction, edge and region segmentation, texture, stereo and motion, object matching and recognition, and pose estimation Uses a variety of examples to illustrate how to convert a specific vision problem involving uncertainties and constraints into essentially an optimization problem under the MRF setting Introduces readers to the basic concepts, important models and various special classes of MRFs on the regular image lattice and MRFs on relational graphs derived from images Examines the problems of parameter estimation and function optimization Includes an extensive list of references This broad-ranging and comprehensive volume is an excellent reference for researchers working in computer vision, image processing, statistical pattern recognition and applications of MRFs. It has been class-tested and is suitable as a textbook for advanced courses relating to these areas.
31 Aug 2011
TL;DR: This highly anticipated new edition provides a comprehensive account of face recognition research and technology, spanning the full range of topics needed for designing operational face recognition systems, as well as offering challenges and future directions.
Abstract: This highly anticipated new edition provides a comprehensive account of face recognition research and technology, spanning the full range of topics needed for designing operational face recognition systems. After a thorough introductory chapter, each of the following chapters focus on a specific topic, reviewing background information, up-to-date techniques, and recent results, as well as offering challenges and future directions. Features: fully updated, revised and expanded, covering the entire spectrum of concepts, methods, and algorithms for automated face detection and recognition systems; provides comprehensive coverage of face detection, tracking, alignment, feature extraction, and recognition technologies, and issues in evaluation, systems, security, and applications; contains numerous step-by-step algorithms; describes a broad range of applications; presents contributions from an international selection of experts; integrates numerous supporting graphs, tables, charts, and performance data.
TL;DR: A semi-automatical way to collect face images from Internet is proposed and a large scale dataset containing about 10,000 subjects and 500,000 images, called CASIAWebFace is built, based on which a 11-layer CNN is used to learn discriminative representation and obtain state-of-theart accuracy on LFW and YTF.
Abstract: Pushing by big data and deep convolutional neural network (CNN), the performance of face recognition is becoming comparable to human. Using private large scale training datasets, several groups achieve very high performance on LFW, ie, 97% to 99%. While there are many open source implementations of CNN, none of large scale face dataset is publicly available. The current situation in the field of face recognition is that data is more important than algorithm. To solve this problem, this paper proposes a semi- automatical way to collect face images from Internet and builds a large scale dataset containing about 10,000 subjects and 500,000 images, called CASIAWebFace. Based on the database, we use a 11-layer CNN to learn discriminative representation and obtain state- of-theart accuracy on LFW and YTF.
01 Aug 1995
TL;DR: This book presents a comprehensive study on the use of MRFs for solving computer vision problems, and covers the following parts essential to the subject: introduction to fundamental theories, formulations of MRF vision models, MRF parameter estimation, and optimization algorithms.
Abstract: From the Publisher: Markov random field (MRF) theory provides a basis for modeling contextual constraints in visual processing and interpretation. It enables us to develop optimal vision algorithms systematically when used with optimization principles. This book presents a comprehensive study on the use of MRFs for solving computer vision problems. The book covers the following parts essential to the subject: introduction to fundamental theories, formulations of MRF vision models, MRF parameter estimation, and optimization algorithms. Various vision models are presented in a unified framework, including image restoration and reconstruction, edge and region segmentation, texture, stereo and motion, object matching and recognition, and pose estimation. This book is an excellent reference for researchers working in computer vision, image processing, statistical pattern recognition, and applications of MRFs. It is also suitable as a text for advanced courses in these areas.
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
01 Dec 1996-ACM Computing Surveys
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by C1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by C1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.
TL;DR: This work presents two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves that allow important cases of discontinuity preserving energies.
Abstract: Many tasks in computer vision involve assigning a label (such as disparity) to every pixel. A common constraint is that the labels should vary smoothly almost everywhere while preserving sharp discontinuities that may exist, e.g., at object boundaries. These tasks are naturally stated in terms of energy minimization. The authors consider a wide class of energies with various smoothness constraints. Global minimization of these energy functions is NP-hard even in the simplest discontinuity-preserving case. Therefore, our focus is on efficient approximation algorithms. We present two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves. These moves can simultaneously change the labels of arbitrarily large sets of pixels. In contrast, many standard algorithms (including simulated annealing) use small moves where only one pixel changes its label at a time. Our expansion algorithm finds a labeling within a known factor of the global minimum, while our swap algorithm handles more general energy functions. Both of these algorithms allow important cases of discontinuity preserving energies. We experimentally demonstrate the effectiveness of our approach for image restoration, stereo and motion. On real data with ground truth, we achieve 98 percent accuracy.