Showing papers by "Stan Z. Li" published in 2003


Open Access

Journal Article

Content-based audio classification and retrieval by support vector machines


Guodong Guo¹, Stan Z. Li²

University of Wisconsin-Madison¹, Microsoft²

01 Jan 2003-IEEE Transactions on Neural Networks

TL;DR: SVMs with a binary-tree recognition strategy are used to tackle the audio classification problem, and experimental comparisons for audio retrieval show the superiority of a novel metric called distance-from-boundary (DFB).


Abstract: Support vector machines (SVMs) have recently been proposed as a new learning algorithm for pattern recognition. In this paper, SVMs with a binary tree recognition strategy are used to tackle the audio classification problem. We illustrate the potential of SVMs on a common audio database, which consists of 409 sounds of 16 classes. We compare SVM-based classification with other popular approaches. For audio retrieval, we propose a new metric, called distance-from-boundary (DFB). When a query audio clip is given, the system first finds a boundary inside which the query pattern is located. Then, all the audio patterns in the database are sorted by their distances to this boundary. All boundaries are learned by the SVMs and stored together with the audio database. Experimental comparisons for audio retrieval are presented to show the superiority of this novel metric to other similarity measures.


455 citations
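A minimal sketch of the distance-from-boundary ranking described in this abstract. It is not the authors' implementation: it assumes the boundary enclosing the query is a single linear SVM hyperplane with already-learned weights `w` and bias `b` (hypothetical inputs), whereas the paper learns its boundaries with SVMs arranged in a binary tree. It also assumes that patterns lying farther inside the query's side of the boundary should rank first.

```python
import math


def signed_distance(x, w, b):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def dfb_rank(query, database, w, b):
    """Return database indices sorted by distance-from-boundary.

    Assumption (not from the paper): the boundary is one linear hyperplane
    (w, b), and patterns deeper on the query's side of it rank first.
    """
    # Orient the boundary so that the query's side counts as positive.
    side = 1.0 if signed_distance(query, w, b) >= 0 else -1.0
    scores = [side * signed_distance(x, w, b) for x in database]
    # Largest oriented distance first; sorted() keeps ties in input order.
    return sorted(range(len(database)), key=lambda i: -scores[i])
```

For example, with `w = [1.0, 0.0]`, `b = 0.0` and a query at `[2.0, 0.0]`, the database `[[-1.0, 0.0], [3.0, 0.0], [1.0, 0.0]]` is ranked `[1, 2, 0]`.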

Journal Article

Content-based audio classification and segmentation by using support vector machines


Lie Lu¹, Hong-Jiang Zhang¹, Stan Z. Li¹

Microsoft¹

01 Apr 2003-Multimedia Systems

TL;DR: Experiments on a database of about 4 hours of audio show that the proposed classifier is very efficient for audio classification and segmentation, and that the SVM-based method is much more accurate than methods based on KNN and GMM.


Abstract: Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification using support vector machines (SVMs). Five audio classes are considered: silence, music, background sound, pure speech, and non-pure speech, which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of SVMs on the classification of different pairs of audio types, using test units of different lengths, and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database composed of about 4 hours of audio data show that the proposed classifier is very efficient for audio classification and segmentation. They also show that the SVM-based method is much more accurate than the methods based on KNN and GMM.


251 citations
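The segmentation step, where a stream is cut by classifying each sub-segment, can be sketched as merging runs of identical labels. This assumes the per-sub-segment class labels have already been produced by a classifier (the paper uses SVMs) and that sub-segments have a fixed length given by the hypothetical parameter `unit_seconds`; any smoothing of isolated labels is omitted.

```python
def merge_subsegments(labels, unit_seconds=1.0):
    """Merge consecutive sub-segments with the same class into segments.

    labels: per-sub-segment class names in stream order, e.g. "silence",
        "music", "background sound", "pure speech", "non-pure speech".
    unit_seconds: hypothetical duration of one sub-segment.
    Returns a list of (start_time, end_time, label) tuples.
    """
    segments = []
    for i, label in enumerate(labels):
        start, end = i * unit_seconds, (i + 1) * unit_seconds
        if segments and segments[-1][2] == label:
            # Same class as the previous sub-segment: extend the open segment.
            segments[-1] = (segments[-1][0], end, label)
        else:
            # Class change: start a new segment.
            segments.append((start, end, label))
    return segments
```

For instance, the labels `["silence", "music", "music", "pure speech"]` yield three segments, with the two music sub-segments merged into one spanning 1.0 to 3.0 seconds.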

Journal Article

Relevance feedback in content-based image retrieval: Bayesian framework, feature subspaces, and progressive learning


Zhong Su¹, Hong-Jiang Zhang², Stan Z. Li², Shaoping Ma¹

Tsinghua University¹, Microsoft²

01 Aug 2003-IEEE Transactions on Image Processing

TL;DR: A new feedback approach with progressive learning capability, combined with a novel feature subspace extraction method, based on a Bayesian classifier that treats positive and negative feedback examples with different strategies to improve retrieval accuracy.


Abstract: In the past few years, research has been devoted to relevance feedback as an effective way to improve the performance of content-based image retrieval (CBIR). In this paper, we propose a new feedback approach with progressive learning capability, combined with a novel method for feature subspace extraction. The proposed approach is based on a Bayesian classifier and treats positive and negative feedback examples with different strategies. Positive examples are used to estimate a Gaussian distribution that represents the desired images for a given query, while negative examples are used to modify the ranking of the retrieved candidates. In addition, a feature subspace is extracted and updated during the feedback process using principal component analysis (PCA), based on the user's feedback. That is, in addition to reducing the dimensionality of the feature spaces, a proper subspace for each type of feature is obtained during the feedback process to further improve retrieval accuracy. Experiments demonstrate that the proposed method increases retrieval speed, reduces the required memory, and significantly improves retrieval accuracy.


214 citations
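A simplified sketch of the feedback ranking in this abstract: positive examples fit a Gaussian, and negative examples adjust the ranking. Assumptions not taken from the paper: the Gaussian has a diagonal covariance, and the negative-example rule (demoting any candidate that lies closer to some negative example than to the positive mean) is a hypothetical stand-in for the authors' strategy. The PCA subspace update is omitted.

```python
def fit_diag_gaussian(positives, eps=1e-6):
    """Mean and per-dimension variance of the positive feedback examples."""
    n, dim = len(positives), len(positives[0])
    mean = [sum(p[j] for p in positives) / n for j in range(dim)]
    # eps keeps variances positive when all positives agree on a dimension.
    var = [sum((p[j] - mean[j]) ** 2 for p in positives) / n + eps for j in range(dim)]
    return mean, var


def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def euclid_sq(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def rank_with_feedback(candidates, positives, negatives):
    """Rank candidate indices, best first.

    Candidates are ordered by closeness to the positive Gaussian; those lying
    nearer to a negative example than to the positive mean (a hypothetical
    rule) are moved behind all others.
    """
    mean, var = fit_diag_gaussian(positives)

    def key(i):
        x = candidates[i]
        near_negative = any(euclid_sq(x, n) < euclid_sq(x, mean) for n in negatives)
        # False sorts before True, so demoted candidates come last.
        return (near_negative, mahalanobis_sq(x, mean, var))

    return sorted(range(len(candidates)), key=key)
```

With positives at the corners of the square from `(0, 0)` to `(2, 2)`, a candidate at `(1.5, 1.5)` normally ranks ahead of one at `(0, 1)`, but a negative example at `(1.6, 1.6)` pushes it behind.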

Proceedings Article

Face alignment using statistical models and wavelet features


Feng Jiao¹, Stan Z. Li², Heung-Yeung Shum², Dale Schuurmans¹

University of Waterloo¹, Microsoft²

18 Jun 2003

TL;DR: W-ASM, a method in which Gabor wavelet features are used to model local image structure; experimental results demonstrate its ability to accurately align and locate facial features.


Abstract: The active shape model (ASM) is a powerful statistical tool for face alignment by shape. However, it can suffer from changes in illumination and facial expression, and from local minima in optimization. In this paper, we present a method, W-ASM, in which Gabor wavelet features are used to model local image structure. The magnitude and phase of Gabor features contain rich information about the local structural features of the face images to be aligned, and provide accurate guidance for search. To a large extent, this repairs the defects of gray-scale-based search. An EM algorithm is used to model the Gabor feature distribution, and a coarse-to-fine search is used to position local features in the image. Experimental results demonstrate the ability of W-ASM to accurately align and locate facial features.


105 citations

Journal Article

Face alignment using texture-constrained active shape models


Shuicheng Yan¹, Ce Liu², Stan Z. Li², Hong-Jiang Zhang², Heung-Yeung Shum², Qiansheng Cheng¹

Peking University¹, Microsoft²

10 Jan 2003-Image and Vision Computing

TL;DR: A texture-constrained active shape model (TC-ASM) for localizing a face in an image; it is stable with respect to initialization, accurate in shape localization, and robust to illumination variation, with low computational cost.


77 citations

Journal Article

Bayesian shape model for facial feature extraction and recognition


Zhong Xue¹, Stan Z. Li², Eam Khwang Teoh¹

Nanyang Technological University¹, Microsoft²

01 Dec 2003-Pattern Recognition

TL;DR: Experimental results demonstrate that the proposed BSM facial feature extraction algorithm is more accurate and effective than the active shape model (ASM).


48 citations

Proceedings Article

Illumination modeling and normalization for face recognition


Haitao Wang¹, Stan Z. Li, Yangsheng Wang, Weiwei Zhang

Chinese Academy of Sciences¹

17 Oct 2003

TL;DR: This work shows that a face lighting subspace can be constructed from three or more training face images illuminated by noncoplanar lights, and presents a face normalization algorithm, illumination alignment, i.e., changing the lighting of one face image to that of another.


Abstract: We present a general framework for face modeling under varying lighting conditions. First, we show that a face lighting subspace can be constructed from three or more training face images illuminated by noncoplanar lights. The lighting of any face image can be represented as a point in this subspace. Second, we show that the extreme rays, i.e., the boundary of an illumination cone, cover the entire light sphere. Therefore, a relatively sparse set of sampled face images can be used to build a face model instead of computing every extremely illuminated face image. Third, we present a face normalization algorithm, illumination alignment, i.e., changing the lighting of one face image to that of another face image. Experiments are presented.


42 citations
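The statement that the lighting of a face image is a point in a subspace spanned by three or more training images can be illustrated with a least-squares projection. This sketch assumes a purely linear model (images as flattened pixel lists, lighting coordinates found by projecting onto the span of the basis images) and ignores shadows and the non-negativity constraints of an illumination cone. `relight` is a hypothetical helper that re-synthesizes an image from a set of coordinates; it is in the spirit of, but not identical to, the paper's illumination alignment.

```python
def solve_linear(a, y):
    """Solve the square system a @ c = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [yi] for row, yi in zip(a, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c


def lighting_coefficients(basis_images, image):
    """Least-squares coordinates of `image` in the span of `basis_images`.

    Solves the normal equations (B^T B) c = B^T x, where the columns of B are
    the flattened basis images.
    """
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis_images]
            for bi in basis_images]
    rhs = [sum(p * q for p, q in zip(bi, image)) for bi in basis_images]
    return solve_linear(gram, rhs)


def relight(basis_images, coefficients):
    """Synthesize an image from lighting coordinates (a point in the subspace)."""
    return [sum(c * b[k] for c, b in zip(coefficients, basis_images))
            for k in range(len(basis_images[0]))]
```

An image that is exactly `2 * b1 + 0.5 * b2 + 1 * b3` recovers the coordinates `[2, 0.5, 1]` up to rounding, and `relight` maps them back to the same pixels.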

Journal Article

Face alignment using view‐based direct appearance models


Shuicheng Yan¹, Xinwen Hou², Stan Z. Li³, Hong-Jiang Zhang³, Qiansheng Cheng¹

Peking University¹, Nankai University², Microsoft³

01 Jan 2003-International Journal of Imaging Systems and Technology

TL;DR: A novel appearance model, called the direct appearance model (DAM), is proposed, and its view-based extensions are applied to multiview face alignment; compared with AAM, DAM converges more quickly and achieves higher accuracy.


Abstract: Accurate face alignment is a prerequisite for many computer vision problems, such as face recognition, face synthesis, and 3D face modeling. In this article, a novel appearance model, called the direct appearance model (DAM), is proposed, and its view-based extensions are applied to multiview face alignment. Like the active appearance model (AAM), DAM makes ingenious use of both shape and texture constraints; however, it does not combine them as AAM does. Instead, texture information is used directly to predict the shape and to estimate the position and appearance (hence the name DAM). The way DAM models shapes and textures has the following advantages over AAM: (1) DAM subspaces include admissible appearances previously unseen in AAM, (2) DAM converges more quickly and achieves higher accuracy, and (3) the memory requirement is greatly reduced. Extensive experiments are presented to evaluate DAM alignment in comparison with AAM. © 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol 13: 106–112, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.10039


32 citations

Proceedings Article

Multi-modal face tracking using Bayesian network


Fang Liu¹, Xueyin Lin¹, Stan Z. Li, Yuanchun Shi

Tsinghua University¹

17 Oct 2003

TL;DR: A Bayesian-network-based multimodal fusion method for robust, real-time face tracking that combines a second-order system-dynamics prior with likelihood cues from color, edge, and face appearance.


Abstract: We present a Bayesian network based multimodal fusion method for robust and real-time face tracking. The Bayesian network integrates a prior of second-order system dynamics with likelihood cues from color, edge, and face appearance. Because different modalities have different confidence scales, we encode the environmental factors related to the confidences of the modalities into the Bayesian network, and develop a Fisher discriminant analysis method for learning optimal fusion. The face tracker can track multiple faces under different poses. It consists of two stages. First, hypotheses are efficiently generated using a coarse-to-fine strategy; then multiple modalities are integrated in the Bayesian network to evaluate the posterior of each hypothesis. The hypothesis with the maximum a posteriori (MAP) probability is selected as the estimate of the object state. Experimental results demonstrate the robustness and real-time performance of our face tracking approach.


26 citations
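The final MAP selection step in this abstract can be sketched as scoring each hypothesis by an unnormalized log posterior. The sketch assumes the cues are conditionally independent given the state and that each cue's log-likelihood is scaled by a confidence weight; the weights here are hypothetical inputs, whereas the paper encodes confidences in the Bayesian network and learns the fusion with Fisher discriminant analysis. Hypothesis generation (the coarse-to-fine stage) is not shown.

```python
def map_hypothesis(hypotheses, log_prior, cue_log_likelihoods, cue_weights):
    """Return the hypothesis with the largest weighted log posterior.

    hypotheses: candidate object states, e.g. (x, y, scale) tuples.
    log_prior: function h -> log p(h), e.g. from a motion model.
    cue_log_likelihoods: dict cue_name -> function h -> log p(z_cue | h).
    cue_weights: dict cue_name -> confidence weight (hypothetical inputs).
    """
    def log_posterior(h):
        # Unnormalized: log p(h) + sum_k w_k * log p(z_k | h).
        # The normalizing constant is the same for every h, so it is dropped.
        return log_prior(h) + sum(
            cue_weights[name] * f(h) for name, f in cue_log_likelihoods.items()
        )

    return max(hypotheses, key=log_posterior)
```

Lowering a cue's weight reduces its influence on the choice: with the color and edge cues favoring different states, setting the edge weight to zero lets the color cue decide.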

Proceedings Article

Learning to boost GMM based speaker verification


Stan Z. Li¹, Dong Zhang, Chengyuan Ma, Heung-Yeung Shum, Eric Chang

Microsoft¹

01 Jan 2003

TL;DR: The proposed AdaBoost-GMM method is non-parametric: a selected set of weak classifiers, each constructed from a single Gaussian model, is optimally combined to form a strong classifier, where optimality is in the sense of maximum margin.


Abstract: The Gaussian mixture model (GMM) has proved to be an effective probabilistic model for speaker verification and is widely used in most state-of-the-art systems. In this paper, we introduce a new method for this task that uses AdaBoost learning based on the GMM. The motivation is as follows: whereas a GMM linearly combines a number of Gaussian models according to a set of mixing weights, we believe there is a better way to combine the individual Gaussian models. The proposed AdaBoost-GMM method is non-parametric: a selected set of weak classifiers, each constructed from a single Gaussian model, is optimally combined to form a strong classifier, where optimality is in the sense of maximum margin. Experiments show that the boosted GMM classifier yields a 10.81% relative reduction in equal error rate for the same handsets and 11.24% for different handsets, a significant improvement over the baseline adapted GMM system.


14 citations
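The boosting idea in this abstract can be sketched with standard discrete AdaBoost over a fixed pool of weak classifiers. Assumptions: each weak classifier's ±1 decisions on the training utterances are precomputed (for instance, hypothetically, by thresholding a score derived from one Gaussian component), and the textbook discrete AdaBoost weight update is used. The paper's exact weak learners and its maximum-margin formulation may differ.

```python
import math


def adaboost(weak_outputs, labels, rounds):
    """Discrete AdaBoost over a fixed pool of weak classifiers.

    weak_outputs[m][i] in {-1, +1}: decision of weak classifier m on sample i.
    labels[i] in {-1, +1}: +1 for the target speaker, -1 for impostors.
    Returns a list of (classifier_index, alpha) pairs.
    """
    n = len(labels)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current weights.
        errors = [sum(w for w, h, y in zip(weights, outputs, labels) if h != y)
                  for outputs in weak_outputs]
        m = min(range(len(weak_outputs)), key=errors.__getitem__)
        err = min(max(errors[m], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        if err >= 0.5:
            break  # no weak classifier beats chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((m, alpha))
        # Misclassified samples (y * h = -1) gain weight; others lose weight.
        weights = [w * math.exp(-alpha * y * h)
                   for w, h, y in zip(weights, weak_outputs[m], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, weak_outputs, i):
    """Sign of the alpha-weighted vote of the selected weak classifiers on sample i."""
    score = sum(alpha * weak_outputs[m][i] for m, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In a toy pool where each of three weak classifiers errs on a different single sample, three rounds produce a combination that classifies all four samples correctly, although none of the weak classifiers does so alone.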

Proceedings Article

Real-Time Face Detection Using Boosting Learning in Hierarchical Feature Spaces


Dong Zhang, Stan Z. Li, Daniel Gatica-Perez

01 Jan 2003

TL;DR: It is argued that global features, like those derived from Principal Component Analysis, can be advantageously used in the later stages of boosting, when local features do not provide any further benefit, without affecting computational complexity.


Abstract: Boosting-based methods have recently led to state-of-the-art face detection systems. In these systems, the weak classifiers to be boosted are based on simple, local, Haar-like features. However, it can be observed empirically that in later stages of the boosting process, the non-face examples collected by bootstrapping become very similar to the face examples, and the classification error of Haar-like-feature-based weak classifiers is thus very close to 50%. As a result, the performance of a face detector cannot be improved further. This paper proposes a solution to this problem: a face detection method based on boosting in hierarchical feature spaces (both local and global). We argue that global features, like those derived from Principal Component Analysis, can be advantageously used in the later stages of boosting, when local features no longer provide any benefit, without affecting computational complexity. We show, based on statistics of face and non-face examples, that weak classifiers learned in hierarchical feature spaces are better boosted. Our methodology leads to a face detection system that achieves higher performance than the current state-of-the-art system at comparable speed.


Parameter optimization for active shape models


Chun Chen¹, Ming Zhao, Stan Z. Li, Jiajun Bu

Zhejiang University¹

01 Jan 2003

TL;DR: By optimizing the subspace explanation proportion, the overall performance of ASM improved by about 20% in the authors' experiments, and a method to estimate the optimal explanation proportion is proposed.


Abstract: Active Shape Models (ASM) are a powerful statistical tool for extracting objects, e.g. faces, from images. ASM is composed of two parts: the ASM model and ASM search. In ASM, these two parts are treated separately: first, the ASM model is trained; then, ASM search is performed using this model. However, we find that these two parts are closely interrelated, and the performance of ASM depends on both of them. Improving one of them does not necessarily improve the overall performance, because it may worsen the other. In this paper, we identify the key parameter that relates the two parts: the subspace explanation proportion. By optimizing the subspace explanation proportion, the overall performance of ASM improved by about 20% in our experiments. Furthermore, this paper proposes decomposing the overall ASM error into the ASM model subspace reconstruction error and the ASM search error, proving that the square of the subspace reconstruction error is linearly related to the subspace explanation proportion, and finding that the square of the search error is a piecewise function of the explanation proportion. This decomposition provides a new basis for further analysis and possible improvement. Based on it, we propose a method to estimate the optimal explanation proportion. Experiments show that the estimate is satisfactory.
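The subspace explanation proportion can be made concrete with the usual PCA bookkeeping. The sketch picks the smallest number of shape components whose eigenvalues explain at least a given proportion of the variance, and computes the training-set mean squared reconstruction error as the sum of the discarded eigenvalues (assuming the covariance is normalized by the number of training shapes). Whenever the kept components explain exactly a proportion p, that error equals (1 - p) times the total variance, which is consistent with the linear relation stated in the abstract. The function names are illustrative, and the search-error side of the decomposition is not modeled.

```python
def components_for_proportion(eigenvalues, proportion):
    """Smallest k whose top-k PCA eigenvalues explain >= `proportion` of the variance."""
    ev = sorted(eigenvalues, reverse=True)
    total = sum(ev)
    cumulative = 0.0
    for k, value in enumerate(ev, start=1):
        cumulative += value
        if cumulative / total >= proportion:
            return k
    return len(ev)


def mean_squared_reconstruction_error(eigenvalues, k):
    """Training-set mean squared error when shapes are reconstructed from the
    top-k components: the sum of the discarded eigenvalues."""
    ev = sorted(eigenvalues, reverse=True)
    return sum(ev[k:])
```

With eigenvalues `[5, 3, 1, 1]` (total 10), a proportion of 0.8 keeps two components and leaves a mean squared reconstruction error of 2, i.e. (1 - 0.8) × 10.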


Journal Article

Effects of varying mechanical deformations on the relationship between mesotexture and current percolation in (Bi, Pb)2Sr2Ca2Cu3O10/Ag superconductor tapes


Thiam Teck Tan¹, Stan Z. Li¹, Shu Ping Lau¹, Y Y Tay¹, Chang Q. Sun¹, Sihai Zhou², Shi Xue Dou²

Nanyang Technological University¹, University of Wollongong²

09 Jul 2003-Superconductor Science and Technology

TL;DR: In this article, the effects of varying mechanical deformations on the relationship between mesotexture and current percolation in (Bi, Pb)2Sr2Ca2Cu3O10+x (Bi2223) tapes are investigated.


Abstract: In this work, the effects of varying mechanical deformations on the relationship between mesotexture and current percolation in (Bi, Pb)2Sr2Ca2Cu3O10+x (Bi2223) tapes are investigated. Electron backscattered diffraction analysis demonstrates that the mesotexture distribution characteristics, which result from the processing variations, influence the critical current density (Jc). The dependence of Jc on the disorientation angle distribution is also discussed using current percolation theory. The results show that improving the mesotexture distribution in the central region of Bi2223 tapes by optimizing the mechanical deformation processing can significantly increase Jc.
