Showing papers by "Stan Z. Li" published in 2002


Book ChapterDOI
Stan Z. Li1, Long Zhu1, ZhenQiu Zhang, Andrew Blake1, Hong-Jiang Zhang1, Heung-Yeung Shum1 
28 May 2002
TL;DR: FloatBoost incorporates the idea of Floating Search into AdaBoost to solve the non-monotonicity problem encountered in the sequential search of AdaBoost and leads to the first real-time multi-view face detection system in the world.
Abstract: A new boosting algorithm, called FloatBoost, is proposed to overcome the monotonicity problem of the sequential AdaBoost learning. AdaBoost [1, 2] is a sequential forward search procedure using the greedy selection strategy. The premise offered by the sequential procedure can be broken down when the monotonicity assumption, i.e. that when adding a new feature to the current set, the value of the performance criterion does not decrease, is violated. FloatBoost incorporates the idea of Floating Search [3] into AdaBoost to solve the non-monotonicity problem encountered in the sequential search of AdaBoost. We then present a system which learns to detect multi-view faces using FloatBoost. The system uses a coarse-to-fine, simple-to-complex architecture called detector-pyramid. FloatBoost learns the component detectors in the pyramid and yields similar or higher classification accuracy than AdaBoost with a smaller number of weak classifiers. This work leads to the first real-time multi-view face detection system in the world. It runs at 200 ms per image of size 320x240 pixels on a Pentium-III CPU of 700 MHz. A live demo will be shown at the conference.

489 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: A set of orthogonal, binary, localized basis components is learned from a well-aligned face image database, leading to a Walsh function-based representation of the face images, which can be used to resolve the occlusion problem, improve the computing efficiency and compress the storage requirements of a face detection and recognition system.
Abstract: Proposes a novel method, called local non-negative matrix factorization (LNMF), for learning a spatially localized, parts-based subspace representation of visual patterns. An objective function is defined to impose the localization constraint, in addition to the non-negativity constraint in the standard non-negative matrix factorization (NMF). This gives a set of bases which not only allows a non-subtractive (part-based) representation of images but also manifests localized features. An algorithm is presented for the learning of such basis components. Experimental results are presented to compare LNMF with the NMF and principal component analysis (PCA) methods for face representation and recognition, which demonstrates the advantages of LNMF. Based on our LNMF approach, a set of orthogonal, binary, localized basis components are learned from a well-aligned face image database. It leads to a Walsh function-based representation of the face images. These properties can be used to resolve the occlusion problem, improve the computing efficiency and compress the storage requirements of a face detection and recognition system.

108 citations
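
The abstract above builds on the non-negativity constraint of standard NMF. As a rough illustration of that baseline only, here is a minimal pure-Python sketch of the well-known multiplicative updates for the squared-error NMF objective. It is not the LNMF algorithm: the localization terms of LNMF are not given in the abstract and are not reproduced here. The function names (`nmf`, `frob_error`), the random initialization, and the iteration count are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frob_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Standard NMF (not LNMF): V ~ W H with W, H kept non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Positive random start; multiplicative updates never change the sign.
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative data of rank 1 (three "images" with two "pixels" each).
V = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
W, H = nmf(V, rank=1)
```

Because every update multiplies by a ratio of non-negative quantities, the bases and coefficients stay non-negative, which is the non-subtractive property the abstract refers to.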


Proceedings ArticleDOI
20 May 2002
TL;DR: This work presents the first real-time multi-view face detection system, which runs at 5 frames per second for 320×240 image sequences and is trained using a new meta boosting learning algorithm.
Abstract: We present a detector-pyramid architecture for real-time multi-view face detection. Using a coarse to fine strategy, the full view is partitioned into finer and finer views. Each face detector in the pyramid detects faces of its respective view range. Its training is performed by using a new meta boosting learning algorithm. This results in the first real-time multi-view face detection system which runs at 5 frames per second for 320×240 image sequences.

88 citations
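
The coarse-to-fine idea in the abstract above can be sketched as a layered filter: a sub-window moves to a finer layer only if some detector in the current layer accepts it. The sketch below is an illustration under stated assumptions, not the authors' implementation. The window representation (a dict with a `yaw` angle and a `score`), the helper `make_detector`, and all thresholds and view ranges are hypothetical.

```python
from typing import Callable, Dict, Sequence

Window = Dict[str, float]          # hypothetical: {"yaw": degrees, "score": 0..1}
Detector = Callable[[Window], bool]


def make_detector(lo: float, hi: float, threshold: float) -> Detector:
    """Hypothetical view-specific detector covering yaw angles in [lo, hi]."""
    def detect(window: Window) -> bool:
        return lo <= window["yaw"] <= hi and window["score"] >= threshold
    return detect


def run_pyramid(window: Window, layers: Sequence[Sequence[Detector]]) -> bool:
    """Return True if the window survives every layer, coarse to fine."""
    for detectors in layers:
        # A window passes a layer if any detector of that layer accepts it.
        if not any(detect(window) for detect in detectors):
            return False  # rejected early; finer layers never run
    return True


# Layer 1 covers the full view loosely; layer 2 splits it into three view ranges.
LAYERS = [
    [make_detector(-90, 90, 0.2)],
    [make_detector(-90, -30, 0.5), make_detector(-30, 30, 0.5),
     make_detector(30, 90, 0.5)],
]

likely_face = {"yaw": 40.0, "score": 0.9}
weak_candidate = {"yaw": 40.0, "score": 0.3}
background = {"yaw": 0.0, "score": 0.1}
```

In this toy setup `background` is discarded by the single coarse detector, so the three finer detectors are never evaluated for it, which is where the speed-up of such an architecture would come from.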


Proceedings Article
01 Jan 2002
TL;DR: FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates, and proposes a statistical model for learning weak classifier, based on a stagewise approximation of the posterior using an overcomplete set of scalar features.
Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

75 citations
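
The backtrack mechanism described in the abstract above follows the floating search pattern: after each forward addition, members are removed as long as the removal gives a lower error than the best subset of that smaller size seen so far. The sketch below shows only this selection logic on abstract "weak classifier" labels. It is an assumption-laden illustration and not FloatBoost itself: there is no sample reweighting or weak-learner training, and `floating_select`, the `error` callback, the `target` threshold, and the toy error table are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def floating_select(
    candidates: Sequence[str],
    error: Callable[[FrozenSet[str]], float],
    target: float,
    max_size: int,
) -> List[str]:
    """Greedy forward selection with conditional backward removal."""
    selected: List[str] = []
    best_err: Dict[int, float] = {}  # lowest error seen per subset size
    while len(selected) < max_size:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Forward step: add the candidate giving the lowest error.
        best = min(remaining, key=lambda c: error(frozenset(selected + [c])))
        selected.append(best)
        size = len(selected)
        best_err[size] = min(best_err.get(size, float("inf")),
                             error(frozenset(selected)))
        # Backward step: drop the least useful member while that improves
        # on the best subset of the smaller size found so far.
        while len(selected) > 1:
            worst = min(
                selected,
                key=lambda c: error(frozenset(x for x in selected if x != c)),
            )
            reduced = [x for x in selected if x != worst]
            reduced_err = error(frozenset(reduced))
            if reduced_err < best_err[len(reduced)]:
                selected = reduced
                best_err[len(reduced)] = reduced_err
            else:
                break
        if error(frozenset(selected)) <= target:
            break
    return selected


# Toy, non-monotone error table: "a" looks best alone but hurts later.
TOY_ERR = {
    frozenset(): 1.0,
    frozenset("a"): 0.5, frozenset("b"): 0.6, frozenset("c"): 0.6,
    frozenset("ab"): 0.4, frozenset("ac"): 0.4, frozenset("bc"): 0.1,
    frozenset("abc"): 0.15,
}
print(floating_select("abc", TOY_ERR.__getitem__, target=0.2, max_size=3))
# prints ['b', 'c']
```

On this toy table, plain greedy selection would end with all three members at error 0.15, while the backward step removes "a" and reaches error 0.1 with two members, mirroring the abstract's claim of fewer weak classifiers with lower error.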


Proceedings ArticleDOI
20 May 2002
TL;DR: The way that DAM models shapes and textures has the following advantages as compared to AAM: (1) DAM subspaces include admissible appearances previously unseen in AAM, (2) the convergence and accuracy are improved, and (3) the memory requirement is cut down to a large extent.
Abstract: Alignment makes face distribution statistically more compact than un-aligned faces and provides a good basis for face modeling, recognition and synthesis. In this paper we present a method for multi-view face alignment using a new model called direct appearance model (DAM). Like active appearance model (AAM), DAM also makes use of both shape and texture constraints; however it does this without combining shape and texture as in AAM. The way that DAM models shapes and textures has the following advantages as compared to AAM: (1) DAM subspaces include admissible appearances previously unseen in AAM, (2) the convergence and accuracy are improved, and (3) the memory requirement is cut down to a large extent. Extensive experiments are presented to evaluate the DAM alignment in comparison with AAM.

65 citations


Journal ArticleDOI
TL;DR: Experiments based on real object matching demonstrate that the proposed AI-ES model is more robust and insensitive to the positions, viewpoints, and large deformations of object shapes, as compared to the Active Shape Model and the AI-Snake Model.

31 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: A new boosting algorithm, called FloatBoost, is proposed to construct a strong face-nonface classifier from weak classifiers for the component detectors in the pyramid, which leads to the first real-time multi-view face detection system in the world.
Abstract: In this paper, we present a system which learns to detect multi-view faces. The system uses a coarse-to-fine, simple-to-complex architecture called detector-pyramid. A new boosting algorithm, called FloatBoost, is proposed to construct a strong face-nonface classifier from weak classifiers for the component detectors in the pyramid. FloatBoost incorporates the idea of Floating Search into AdaBoost, and yields similar or higher classification accuracy than AdaBoost with a smaller number of weak classifiers. This work leads to the first real-time multi-view face detection system in the world. It runs at 200 ms per image of size 320×240 pixels on a Pentium-III CPU of 700 MHz.

30 citations


01 Jan 2002
TL;DR: A texture-constrained active shape model (TC-ASM) is proposed to localize a face in an image; it is stable to initialization, accurate in shape localization and robust to illumination variation, at low computational cost.
Abstract: In this paper, we propose a texture-constrained active shape model (TC-ASM) to localize a face in an image. TC-ASM effectively incorporates not only the shape prior and local appearance around each landmark, but also the global texture constraint over the shape. Therefore, it is stable to initialization, accurate in shape localization and robust to illumination variation, at low computational cost. Extensive experiments are provided to demonstrate

28 citations


Proceedings ArticleDOI
03 Dec 2002
TL;DR: A new boosting algorithm, called FloatBoost, is proposed to construct a strong face-nonface classifier that incorporates the idea of Floating Search into AdaBoost, and yields similar or higher classification accuracy than AdaBoost with a smaller number of weak classifiers.
Abstract: In this paper, a new boosting algorithm, called FloatBoost, is proposed to construct a strong face-nonface classifier. FloatBoost incorporates the idea of Floating Search into AdaBoost, and yields similar or higher classification accuracy than AdaBoost with a smaller number of weak classifiers. We also present a novel framework for fast multi-view face detection. A detector-pyramid architecture is designed to quickly discard a vast number of non-face sub-windows and hence perform multi-view face detection efficiently. This results in the first real-time multi-view face detection system which runs at 5 frames per second for 320x240 image sequence.

26 citations


Proceedings ArticleDOI
20 May 2002
TL;DR: A supervised method is presented for more effective learning of view-subspace, assuming that view-labeled face examples are available and the models thus learned give more accurate pose estimation than those obtained with the unsupervised ISA.
Abstract: Independent subspace analysis (ISA) is able to learn view-subspaces unsupervisedly from (view-unlabeled) multi-view face examples (S.Z. Li et al., 2001). We explain underlying reasons for the emergent formation of ISA view-subspaces. Based on the analysis, we present a supervised method for more effective learning of view-subspace, assuming that view-labeled face examples are available. The models thus learned give more accurate pose estimation than those obtained with the unsupervised ISA.

22 citations


Proceedings Article
Lie Lu1, Stan Z. Li1, Liu Wenyin1, Hong-Jiang Zhang1, Yi Mao2 
01 May 2002
TL;DR: This paper introduces a new audio medium, called audio texture, as a means of synthesizing long audio stream according to a given short example audio clip, and proposes a method for implementing audio textures.
Abstract: In this paper, we introduce a new audio medium, called audio texture, as a means of synthesizing long audio stream according to a given short example audio clip. The example clip is analyzed, and basic building patterns are extracted. Then an audio stream of arbitrary length is synthesized using a sequence of extracted building patterns. The patterns can be varied in the synthesis process to add variations to the generated sound. Audio textures are useful in applications such as background music, lullabies, game music, and screen saver sounds. A method is proposed for implementing audio textures. Preliminary results of audio textures are provided at our website for evaluation.
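
The analyze, extract, and synthesize steps in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the method proposed in the paper: cutting the clip into fixed-size blocks stands in for real pattern extraction, and the jump rule (occasionally replacing the natural successor with its most similar pattern) is a hypothetical way to add variation. All function names and parameters are invented for the example.

```python
import random
from typing import List, Sequence


def extract_patterns(clip: Sequence[float], size: int) -> List[List[float]]:
    """Cut the clip into consecutive fixed-size blocks (stand-in for analysis)."""
    return [list(clip[i:i + size]) for i in range(0, len(clip) - size + 1, size)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def synthesize(patterns: List[List[float]], count: int,
               jump_prob: float = 0.3, seed: int = 0) -> List[float]:
    """Chain `count` patterns into one stream of arbitrary length."""
    rng = random.Random(seed)
    order = [0]
    while len(order) < count:
        nxt = (order[-1] + 1) % len(patterns)  # natural successor in the clip
        if len(patterns) > 1 and rng.random() < jump_prob:
            # Variation: jump to the pattern most similar to the successor.
            others = [j for j in range(len(patterns)) if j != nxt]
            nxt = min(others, key=lambda j: distance(patterns[j], patterns[nxt]))
        order.append(nxt)
    return [sample for index in order for sample in patterns[index]]


# A short example "clip" of 12 samples, cut into 4 patterns of 3 samples.
CLIP = [0.0, 0.5, 1.0, 0.1, 0.6, 0.9, -1.0, -0.5, 0.0, -0.9, -0.4, 0.1]
PATTERNS = extract_patterns(CLIP, size=3)
STREAM = synthesize(PATTERNS, count=10)
```

The output length depends only on `count`, so a short clip can yield a stream of any length, and changing `seed` or `jump_prob` gives different pattern sequences from the same clip.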

Proceedings ArticleDOI
01 Dec 2002
TL;DR: This paper presents a novel application of the Bayesian shape model (BSM) for facial feature extraction: a full-face model is designed to describe the shape of a face, PCA is used to estimate the shape variance of the face model, and the BSM is applied to match and extract the face patch from input face images.
Abstract: This paper presents a novel application of the Bayesian shape model (BSM) for facial feature extraction. First, a full-face model is designed to describe the shape of a face, and the PCA is used to estimate the shape variance of the face model. Then, the BSM is applied to match and extract the face patch from input face images. Finally, using the face model, the extracted face patches are easily warped or normalized to a standard view. Applications of this facial feature extraction algorithm include face recognition, face video coding and retrieval, face animation and multimedia.

Journal ArticleDOI
TL;DR: In this article, both the microtexture and mesotexture of powder-in-tube (PIT) processed (Bi,Pb)2Sr2Ca2Cu3O10 (Bi2223) superconductor tapes were analyzed in the transverse direction of the tapes.
Abstract: Powder-in-tube (PIT) processing of BiSrCaCuO superconductor is widely used to introduce textured microstructure to high temperature superconductor tapes, thus effectively minimizing the weak-link effects caused by grain boundary misorientations. Although it was reported that PIT tapes have a parabolic critical current density (Jc) distribution across the tape width, the role played by texture in this is not clearly understood. In this study, both the micro- and the mesotexture of PIT processed (Bi,Pb)2Sr2Ca2Cu3O10 (Bi2223) superconductor tapes were analysed in the transverse direction of the tapes. Microtexture and mesotexture of PIT processed Bi2223 tapes were characterized using angle-axis pairs and Rodrigues–Frank vectors. The results of the microtexture evaluation indicate that a/b axes texture did exist in PIT processed tapes, while the mesotexture RF plot shows that the majority of the grain boundaries were formed by grains with non-parallel c-axes. These grain boundaries generally had low mismatch angles of up to ~10°. High-angle misorientation boundaries ranging up to 45° were generally associated with c-axis twist boundaries. The dominating misorientation angle for both sample sides and centre was found to be 4°. It is believed that the micro- and mesotexture distribution characteristics influence the Jc distribution in the transverse direction of Bi2223 tapes.