
Showing papers by "Stan Z. Li" published in 2003


Journal Article
TL;DR: SVMs with a binary tree recognition strategy are used to tackle the audio classification problem, and a new retrieval metric called distance-from-boundary (DFB) is proposed; experimental comparisons for audio retrieval show the superiority of this metric to other similarity measures.
Abstract: Support vector machines (SVMs) have recently been proposed as a new learning algorithm for pattern recognition. In this paper, SVMs with a binary tree recognition strategy are used to tackle the audio classification problem. We illustrate the potential of SVMs on a common audio database consisting of 409 sounds in 16 classes, and compare SVM-based classification with other popular approaches. For audio retrieval, we propose a new metric, called distance-from-boundary (DFB). When a query audio clip is given, the system first finds a boundary inside which the query pattern is located. Then all audio patterns in the database are sorted by their distances to this boundary. All boundaries are learned by the SVMs and stored together with the audio database. Experimental comparisons for audio retrieval are presented to show the superiority of this novel metric to other similarity measures.

455 citations
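The DFB retrieval step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear boundaries given as (w, b) pairs already learned by SVMs (the paper's boundaries need not be linear), and it assumes a hypothetical rule for picking "the boundary inside which the query is located" (the one with the largest signed distance for the query) and for sorting (deepest on the query's side first).

```python
import numpy as np

def signed_distance(X, w, b):
    """Signed distance of each row of X to the hyperplane w . x + b = 0."""
    return (X @ w + b) / np.linalg.norm(w)

def dfb_rank(query, database, boundaries):
    """Distance-from-boundary ranking (illustrative sketch).

    boundaries: list of (w, b) pairs, assumed already learned by SVMs.
    Assumption: the boundary enclosing the query is the one with the largest
    signed distance for the query; database items are then ranked by their
    signed distance to that boundary, largest (deepest inside) first.
    """
    query_scores = [signed_distance(query[None, :], w, b)[0] for w, b in boundaries]
    k = int(np.argmax(query_scores))
    w, b = boundaries[k]
    d = signed_distance(database, w, b)
    return k, np.argsort(-d)

# Toy example with two hypothetical boundaries in a 2-D feature space.
boundaries = [(np.array([1.0, 0.0]), 0.0), (np.array([-1.0, 0.0]), 0.0)]
database = np.array([[0.5, 0.0], [3.0, 0.0], [-1.0, 0.0]])
k, order = dfb_rank(np.array([2.0, 0.0]), database, boundaries)
print(k, order.tolist())  # 0 [1, 0, 2]
```

Ranking by signed rather than absolute distance keeps items on the far side of the boundary at the bottom of the list, which matches the idea of a boundary "inside which" relevant patterns lie.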


Journal Article
TL;DR: Experiments on a database of about 4 hours of audio data show that the proposed classifier is very efficient for audio classification and segmentation, and that the SVM-based method is much more accurate than methods based on KNN and GMM.
Abstract: Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification using support vector machines (SVMs). Five audio classes are considered: silence, music, background sound, pure speech, and non-pure speech, which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We evaluated the performance of SVMs on the classification of different audio type pairs with testing units of different lengths, and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database of about 4 hours of audio data show that the proposed classifier is very efficient for audio classification and segmentation. They also show that the accuracy of the SVM-based method is much better than that of methods based on KNN and GMM.

251 citations
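The segmentation step in the abstract above ("classifying each sub-segment into one of these five classes") can be illustrated by merging runs of identical per-sub-segment labels into time segments. This sketch assumes the labels have already been produced by some classifier (the SVMs are not shown), and the label strings and fixed `unit_seconds` sub-segment length are hypothetical.

```python
from itertools import groupby

CLASSES = ("silence", "music", "background_sound", "pure_speech", "non_pure_speech")

def segment_stream(sub_segment_labels, unit_seconds=1.0):
    """Merge consecutive sub-segments that share a predicted class.

    sub_segment_labels: one label per sub-segment, in time order (assumed to
    come from a classifier such as an SVM; not shown here).
    unit_seconds: hypothetical sub-segment duration.
    Returns a list of (start_time, end_time, label) tuples.
    """
    segments = []
    start = 0.0
    for label, run in groupby(sub_segment_labels):
        if label not in CLASSES:
            raise ValueError(f"unknown class: {label}")
        end = start + len(list(run)) * unit_seconds
        segments.append((start, end, label))
        start = end
    return segments

print(segment_stream(["silence", "music", "music", "pure_speech"]))
# [(0.0, 1.0, 'silence'), (1.0, 3.0, 'music'), (3.0, 4.0, 'pure_speech')]
```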


Journal Article
TL;DR: A new relevance feedback approach with progressive learning capability, combined with a novel feature subspace extraction method; it is based on a Bayesian classifier and treats positive and negative feedback examples with different strategies to improve retrieval accuracy.
Abstract: Research in the past few years has been devoted to relevance feedback as an effective way to improve the performance of content-based image retrieval (CBIR). In this paper, we propose a new feedback approach with progressive learning capability, combined with a novel method for feature subspace extraction. The proposed approach is based on a Bayesian classifier and treats positive and negative feedback examples with different strategies. Positive examples are used to estimate a Gaussian distribution that represents the desired images for a given query, while negative examples are used to modify the ranking of the retrieved candidates. In addition, a feature subspace is extracted and updated during the feedback process using principal component analysis (PCA), based on the user's feedback. That is, in addition to reducing the dimensionality of the feature spaces, a proper subspace for each type of feature is obtained during feedback to further improve retrieval accuracy. Experiments demonstrate that the proposed method increases retrieval speed, reduces the required memory, and significantly improves retrieval accuracy.

214 citations
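A toy version of the feedback step described in the abstract above might look like the following. It is a sketch under stated assumptions, not the paper's algorithm: the PCA subspace is computed once from the database rather than updated from feedback, the Gaussian uses a diagonal covariance, and the way negative examples modify the ranking (a penalty that decays with distance to the nearest negative example, scaled by a hypothetical `neg_weight`) is invented for illustration.

```python
import numpy as np

def pca_basis(X, k):
    """Mean and top-k principal directions (as columns) of the rows of X."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k].T

def feedback_rank(db, positives, negatives, k=2, neg_weight=1.0, eps=1e-6):
    """Rank database rows after one round of relevance feedback (illustrative)."""
    # Project features into a k-dimensional PCA subspace (computed from db here).
    mu, P = pca_basis(db, k)
    Z, Zp, Zn = (db - mu) @ P, (positives - mu) @ P, (negatives - mu) @ P
    # Gaussian fitted to the positive examples, diagonal covariance for simplicity.
    mean = Zp.mean(axis=0)
    var = Zp.var(axis=0) + eps
    score = -0.5 * np.sum((Z - mean) ** 2 / var, axis=1)  # log-likelihood up to a constant
    # Hypothetical negative-feedback rule: penalize items near any negative example.
    if len(Zn) > 0:
        nearest = np.linalg.norm(Z[:, None, :] - Zn[None, :, :], axis=2).min(axis=1)
        score = score - neg_weight * np.exp(-nearest ** 2)
    return np.argsort(-score)

positives = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
db = np.vstack([positives, [[0.1, 0.0], [3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]]])
order = feedback_rank(db, positives, negatives=db[[6]])
print(order[0], order[-1])  # 4 7
```

In the toy data, items 5 and 6 are equally far from the positive Gaussian, but item 6 is a negative example, so the penalty pushes it below item 5.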


Proceedings Article
18 Jun 2003
TL;DR: A method, W-ASM, in which Gabor wavelet features are used to model local image structure; experiments demonstrate its ability to accurately align and locate facial features.
Abstract: The active shape model (ASM) is a powerful statistical tool for face alignment by shape. However, it can suffer from changes in illumination and facial expression, and from local minima in optimization. In this paper, we present a method, W-ASM, in which Gabor wavelet features are used to model local image structure. The magnitude and phase of Gabor features contain rich information about the local structural features of the face images to be aligned, and provide accurate guidance for the search. To a large extent, this remedies the defects of gray-scale-based search. An expectation-maximization (EM) algorithm is used to model the Gabor feature distribution, and a coarse-to-fine search is used to position local features in the image. Experimental results demonstrate the ability of W-ASM to accurately align and locate facial features.

105 citations
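To make "magnitude and phase of Gabor features" in the abstract above concrete, the sketch below computes a small Gabor jet (responses at several wavelengths and orientations) at one pixel. The kernel form, the `sigma = wavelength / 2` choice and the filter-bank sizes are assumptions for illustration; the EM modeling of the feature distribution and the coarse-to-fine search are not shown.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid along angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(1j * 2.0 * np.pi * along / wavelength)

def gabor_jet(image, row, col, wavelengths=(4, 8), n_orient=4, size=15):
    """Magnitudes and phases of Gabor responses centred at (row, col)."""
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    mags, phases = [], []
    for lam in wavelengths:
        for i in range(n_orient):
            kernel = gabor_kernel(size, lam, np.pi * i / n_orient, sigma=lam / 2.0)
            response = np.sum(patch * np.conj(kernel))  # correlation value at the centre pixel
            mags.append(np.abs(response))
            phases.append(np.angle(response))
    return np.array(mags), np.array(phases)

# Vertical stripes with a period of 8 pixels.
cols = np.arange(41)
image = np.tile(np.cos(2 * np.pi * cols / 8), (41, 1))
mags, phases = gabor_jet(image, 20, 20)
print(int(np.argmax(mags)))  # 4, i.e. wavelength 8 at orientation 0
```

The magnitude is stable under small shifts of the feature point, while the phase changes with position, which is why both are useful for guiding a local search.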


Journal Article
TL;DR: A texture-constrained active shape model (TC-ASM) for localizing a face in an image; it is stable with respect to initialization, accurate in shape localization, and robust to illumination variation, with low computational cost.

77 citations


Journal Article
TL;DR: Experimental results demonstrate that the proposed BSM facial feature extraction algorithm is more accurate and effective than the active shape model (ASM).

48 citations


Proceedings Article
17 Oct 2003
TL;DR: This work shows that a face lighting subspace can be constructed based on three or more training face images illuminated by noncoplanar lights, and presents a face normalization algorithm, illumination alignment, i.e. changing the lighting of one face image to that of another face image.
Abstract: We present a general framework for face modeling under varying lighting conditions. First, we show that a face lighting subspace can be constructed from three or more training face images illuminated by noncoplanar lights. The lighting of any face image can be represented as a point in this subspace. Second, we show that the extreme rays, i.e. the boundary of an illumination cone, cover the entire light sphere. Therefore, a relatively sparse set of sampled face images can be used to build a face model instead of computing each extremely illuminated face image. Third, we present a face normalization algorithm, illumination alignment, which changes the lighting of one face image to that of another. Experiments are presented.

42 citations
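Under an idealized Lambertian model that ignores shadows (an assumption; the illumination-cone analysis in the abstract above covers more general cases), a face image is I = B s, where B is an n x 3 matrix of albedo-scaled surface normals and s is the light vector, so three images under noncoplanar lights span the face's lighting subspace. The sketch below represents an image's lighting by least-squares coefficients in that span and reuses them for a simple form of illumination alignment. It additionally assumes both faces were photographed under the same training lights; the function names are hypothetical.

```python
import numpy as np

def lighting_coefficients(train_images, image):
    """Least-squares coordinates of `image` in the span of the training images.

    train_images: (n_pixels, m) array, one column per training image, m >= 3.
    """
    coeffs, *_ = np.linalg.lstsq(train_images, image, rcond=None)
    return coeffs

def align_illumination(src_train, dst_train, dst_image):
    """Relight the source face with the lighting of dst_image.

    Assumes src_train and dst_train were captured under the same m lights, so the
    same coefficients describe the same lighting for both faces.
    """
    coeffs = lighting_coefficients(dst_train, dst_image)
    return src_train @ coeffs

# Synthetic check with random "faces" B (n x 3) and three noncoplanar lights S (3 x 3).
rng = np.random.default_rng(0)
B_src, B_dst = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
S = np.array([[1.0, 0.2, 0.1], [0.1, 1.0, 0.3], [0.2, 0.1, 1.0]])
s_new = np.array([0.5, -0.2, 0.8])
relit = align_illumination(B_src @ S, B_dst @ S, B_dst @ s_new)
print(np.allclose(relit, B_src @ s_new))  # True
```

With shadows present the relation is no longer exactly linear, so real images would only be approximated by this projection.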


Journal Article
TL;DR: A novel appearance model, called the direct appearance model (DAM), is proposed, and its extended view-based models are applied to multiview face alignment; compared with AAM, DAM converges more quickly and achieves higher accuracy.
Abstract: Accurate face alignment is a prerequisite for many computer vision problems, such as face recognition, synthesis and 3D face modeling. In this article, a novel appearance model, called the direct appearance model (DAM), is proposed, and its extended view-based models are applied to multiview face alignment. Like the active appearance model (AAM), DAM makes ingenious use of both shape and texture constraints. However, it does not combine them as AAM does: texture information is used directly to predict the shape and to estimate the position and appearance (hence the name DAM). The way DAM models shapes and textures has the following advantages over AAM: (1) DAM subspaces include admissible appearances previously unseen in AAM, (2) DAM converges more quickly and has higher accuracy, and (3) the memory requirement is greatly reduced. Extensive experiments are presented to evaluate DAM alignment in comparison with AAM. © 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol 13: 106–112, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.10039

32 citations
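The DAM abstract above says texture information is used directly to predict the shape. One simple way to realize such a prediction, assumed here rather than taken from the paper, is a linear map fitted by least squares from texture residuals to shape displacements, learned from training pairs generated by perturbing shapes by known offsets. The sketch below shows only that regression step, on synthetic data; the function names are hypothetical.

```python
import numpy as np

def fit_texture_to_shape(texture_residuals, shape_offsets):
    """Least-squares linear map R such that shape_offset ~= R @ texture_residual.

    texture_residuals: (N, t) array; shape_offsets: (N, s) array.
    """
    R_transposed, *_ = np.linalg.lstsq(texture_residuals, shape_offsets, rcond=None)
    return R_transposed.T  # shape (s, t)

def predict_shape_update(R, texture_residual):
    """Predict the shape displacement directly from a texture residual."""
    return R @ texture_residual

# Synthetic training pairs generated from a known (hypothetical) linear relation.
rng = np.random.default_rng(1)
R_true = rng.normal(size=(2, 5))
T = rng.normal(size=(50, 5))
R = fit_texture_to_shape(T, T @ R_true.T)
print(np.allclose(R, R_true))  # True
```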


Proceedings Article
17 Oct 2003
TL;DR: A Bayesian-network-based multimodal fusion method for robust, real-time face tracking is presented; it combines a prior of second-order system dynamics with likelihood cues from color, edge and face appearance.
Abstract: We present a Bayesian-network-based multimodal fusion method for robust and real-time face tracking. The Bayesian network integrates a prior of second-order system dynamics with likelihood cues from color, edge and face appearance. Because different modalities have different confidence scales, we encode the environmental factors related to the confidence of each modality into the Bayesian network, and develop a Fisher discriminant analysis method for learning optimal fusion. The face tracker can track multiple faces under different poses. It consists of two stages. First, hypotheses are efficiently generated using a coarse-to-fine strategy; then multiple modalities are integrated in the Bayesian network to evaluate the posterior of each hypothesis. The hypothesis that maximizes the posterior (maximum a posteriori, MAP) is selected as the estimate of the object state. Experimental results demonstrate the robustness and real-time performance of our face tracking approach.

26 citations
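The MAP selection step in the tracking abstract above can be sketched as follows, with heavy simplifications: a constant-velocity model stands in for the "second-order system dynamics" prior, each cue is a hypothetical log-likelihood function, and the cues are treated as independent and fused with fixed weights, rather than through the paper's Bayesian network and Fisher-discriminant fusion learning. Hypothesis generation is not shown.

```python
import numpy as np

def predict_second_order(x_prev, x_prev2):
    """Constant-velocity prediction, one simple second-order model: x_t ~ 2 x_{t-1} - x_{t-2}."""
    return 2.0 * x_prev - x_prev2

def map_hypothesis(hypotheses, x_prev, x_prev2, cue_loglikes, weights, prior_sigma=5.0):
    """Return the hypothesis with the highest fused log posterior.

    hypotheses: (H, 2) candidate face positions (e.g. from a coarse-to-fine search).
    cue_loglikes: functions mapping (H, 2) -> (H,) log-likelihoods, stand-ins for
                  color, edge and appearance cues.
    weights: fixed per-cue confidence weights (the paper learns the fusion instead).
    """
    pred = predict_second_order(x_prev, x_prev2)
    log_prior = -np.sum((hypotheses - pred) ** 2, axis=1) / (2.0 * prior_sigma ** 2)
    log_post = log_prior + sum(w * f(hypotheses) for w, f in zip(weights, cue_loglikes))
    return hypotheses[int(np.argmax(log_post))]

def color(h):
    """Hypothetical color cue that favours positions near (5, 5)."""
    return -np.sum((h - np.array([5.0, 5.0])) ** 2, axis=1)

hyps = np.array([[2.0, 2.0], [5.0, 5.0], [0.0, 0.0]])
x1, x0 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
print(map_hypothesis(hyps, x1, x0, [color], [0.0]))  # [2. 2.] (prior only)
print(map_hypothesis(hyps, x1, x0, [color], [1.0]))  # [5. 5.] (cue dominates)
```

Setting a cue's weight to zero removes it from the fusion, which is a crude analogue of down-weighting an unreliable modality.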


Proceedings Article
Stan Z. Li, Dong Zhang, Chengyuan Ma, Heung-Yeung Shum, Eric Chang
01 Jan 2003
TL;DR: The proposed AdaBoost-GMM method is non-parametric: a selected set of weak classifiers, each constructed from a single Gaussian model, is optimally combined into a strong classifier, optimality being in the sense of maximum margin.
Abstract: The Gaussian mixture model (GMM) has proved to be an effective probabilistic model for speaker verification and has been widely used in most state-of-the-art systems. In this paper, we introduce a new method for this task that uses AdaBoost learning based on the GMM. The motivation is as follows: while a GMM linearly combines a number of Gaussian models according to a set of mixing weights, we believe there exists a better means of combining the individual Gaussian models. The proposed AdaBoost-GMM method is non-parametric: a selected set of weak classifiers, each constructed from a single Gaussian model, is optimally combined to form a strong classifier, optimality being in the sense of maximum margin. Experiments show that the boosted GMM classifier yields a 10.81% relative reduction in equal error rate for the same handsets and 11.24% for different handsets, a significant improvement over the baseline adapted GMM system.

14 citations
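The idea in the abstract above of boosting weak classifiers built on single Gaussians can be illustrated with textbook discrete AdaBoost, where each weak classifier thresholds the log-likelihood of one Gaussian component. This is a sketch, not the paper's AdaBoost-GMM: the weak-learner form (threshold plus polarity), the diagonal covariances and the toy data are assumptions, and no maximum-margin analysis is shown.

```python
import numpy as np

def gaussian_loglik(X, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of each row of X."""
    return -0.5 * np.sum((X - mean) ** 2 / var + np.log(2 * np.pi * var), axis=1)

def stump_predict(scores, thr, pol):
    """+1 where pol * (score - thr) >= 0, else -1."""
    return np.where(pol * (scores - thr) >= 0, 1, -1)

def best_stump(scores, y, w):
    """Threshold and polarity on a 1-D score that minimise the weighted error."""
    best = (np.inf, 0.0, 1)
    for thr in np.unique(scores):
        for pol in (1, -1):
            err = np.sum(w[stump_predict(scores, thr, pol) != y])
            if err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost_gaussians(X, y, components, rounds=5):
    """Discrete AdaBoost; weak classifier k thresholds the log-likelihood of component k.

    components: list of (mean, var) pairs, e.g. taken from a trained GMM.
    y: labels in {-1, +1}, e.g. target speaker vs impostor.
    """
    w = np.full(len(y), 1.0 / len(y))
    scores = [gaussian_loglik(X, m, v) for m, v in components]
    ensemble = []
    for _ in range(rounds):
        candidates = [best_stump(s, y, w) + (k,) for k, s in enumerate(scores)]
        err, thr, pol, k = min(candidates, key=lambda c: c[0])
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(scores[k], thr, pol))
        w /= w.sum()
        ensemble.append((alpha, k, thr, pol))
    return ensemble

def predict(ensemble, X, components):
    """Sign of the alpha-weighted vote of the selected weak classifiers."""
    total = np.zeros(len(X))
    for alpha, k, thr, pol in ensemble:
        total += alpha * stump_predict(gaussian_loglik(X, *components[k]), thr, pol)
    return np.where(total >= 0, 1, -1)

X = np.array([[-0.5], [0.0], [0.4], [4.8], [5.2], [5.5]])
y = np.array([1, 1, 1, -1, -1, -1])
components = [(np.array([0.0]), np.array([1.0])), (np.array([5.0]), np.array([1.0]))]
model = adaboost_gaussians(X, y, components, rounds=3)
print(predict(model, X, components).tolist())  # [1, 1, 1, -1, -1, -1]
```

Unlike a GMM's fixed mixing weights, the vote weights `alpha` here come from each selected component's weighted classification error.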


Proceedings Article
01 Jan 2003
TL;DR: It is argued that global features, like those derived from Principal Component Analysis, can be advantageously used in the later stages of boosting, when local features do not provide any further benefit, without affecting computational complexity.
Abstract: Boosting-based methods have recently led to state-of-the-art face detection systems. In these systems, the weak classifiers to be boosted are based on simple, local, Haar-like features. However, it can be observed empirically that in the later stages of boosting, the non-face examples collected by bootstrapping become very similar to the face examples, so the classification error of weak classifiers based on Haar-like features comes very close to 50%. As a result, the performance of the face detector cannot be improved further. This paper proposes a solution to this problem: a face detection method based on boosting in hierarchical feature spaces (both local and global). We argue that global features, like those derived from principal component analysis (PCA), can be advantageously used in the later stages of boosting, when local features no longer provide any benefit, without affecting computational complexity. Based on statistics of face and non-face examples, we show that weak classifiers learned in hierarchical feature spaces are better boosted. Our methodology leads to a face detection system that achieves higher performance than the current state-of-the-art system at a comparable speed.
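The two feature families in the face-detection abstract above can be sketched as follows. The local features use the standard integral-image trick for Haar-like rectangle sums (a well-known technique that the abstract does not itself describe), and the global features are projections onto principal components of training patches. The function names and the two-rectangle layout are illustrative assumptions; the boosting procedure that moves between the two feature spaces is not shown.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r, :c]; a zero row and column are padded in front."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] using four integral-image lookups."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_two_rect(ii, r, c, h, w):
    """Local Haar-like feature: left h x w block minus the adjacent right h x w block."""
    return rect_sum(ii, r, c, h, w) - rect_sum(ii, r, c + w, h, w)

def pca_features(patches, n_components):
    """Global features: projections of flattened, mean-centred patches onto top principal components."""
    X = patches.reshape(len(patches), -1).astype(float)
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ vt[:n_components].T

img = np.arange(20, dtype=float).reshape(4, 5)
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 2, 2))  # -8.0
```

Each Haar feature costs a constant number of lookups regardless of rectangle size, whereas a PCA feature costs one dot product per component over the whole patch.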

01 Jan 2003
TL;DR: By optimizing the subspace explanation proportion, the overall performance of ASM can improve by about 20% in the authors' experiments; a method to estimate the optimal explanation proportion is also proposed.
Abstract: The Active Shape Model (ASM) is a powerful statistical tool for extracting objects, e.g. faces, from images. It is composed of two parts: the ASM model and ASM search. In ASM, these two parts are treated separately: first the ASM model is trained, and then ASM search is performed using this model. However, we find that the two parts are closely interrelated, and the performance of ASM depends on both of them. Improving one of them does not necessarily improve the overall performance, because it may worsen the other. In this paper, we identify the key parameter that relates the two parts: the subspace explanation proportion. By optimizing the subspace explanation proportion, the overall performance of ASM can improve by about 20% in our experiments. Furthermore, this paper proposes to decompose the overall ASM error into the ASM model subspace reconstruction error and the ASM search error, proving that the square of the subspace reconstruction error is linearly related to the subspace explanation proportion and finding that the square of the search error is a piecewise function of the explanation proportion. This decomposition is a new method for further analysis and possible improvement. Based on this decomposition, we propose a method to estimate the optimal explanation proportion. Experiments show that the estimate is satisfactory.
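For a standard PCA shape model, the linear relation between squared reconstruction error and explanation proportion in the abstract above can be checked numerically on the training shapes. If λ_1 ≥ λ_2 ≥ ... are the eigenvalues of the shape covariance and p_k = (λ_1 + ... + λ_k) / Σ λ_i is the explanation proportion of the first k modes, the mean squared training reconstruction error equals (1 - p_k) Σ λ_i. The sketch below verifies this identity on random data; it illustrates the standard-PCA case only and is not the paper's derivation or error definition.

```python
import numpy as np

def shape_pca(shapes):
    """Mean shape, principal directions (rows) and eigenvalues of the shape covariance."""
    mu = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - mu, full_matrices=False)
    return mu, vt, s ** 2 / len(shapes)

def explanation_proportion(eigvals, k):
    """Fraction of total shape variance explained by the first k modes."""
    return eigvals[:k].sum() / eigvals.sum()

def mean_sq_reconstruction_error(shapes, mu, vt, k):
    """Mean squared error when each training shape is projected onto the first k modes."""
    X = shapes - mu
    P = vt[:k].T
    return np.sum((X - X @ P @ P.T) ** 2) / len(shapes)

rng = np.random.default_rng(0)
shapes = rng.normal(size=(40, 6)) * np.array([3.0, 2.0, 1.5, 1.0, 0.5, 0.2])
mu, vt, eigvals = shape_pca(shapes)
for k in range(1, 7):
    p = explanation_proportion(eigvals, k)
    err = mean_sq_reconstruction_error(shapes, mu, vt, k)
    print(k, round(p, 3), np.isclose(err, (1 - p) * eigvals.sum()))  # last column: True
```

Raising p_k lowers the reconstruction error linearly but keeps more modes, which can make the search less constrained; this is the trade-off the abstract describes.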

Journal Article
TL;DR: In this article, the effects of varying mechanical deformations on the relationship between mesotexture and current percolation in (Bi, Pb)2Sr2Ca2Cu3O10+x (Bi2223) tapes are investigated.
Abstract: In this work, the effects of varying mechanical deformation on the relationship between mesotexture and current percolation in (Bi, Pb)2Sr2Ca2Cu3O10+x (Bi2223) tapes are investigated. Electron backscattered diffraction analysis demonstrates that the mesotexture distribution characteristics, which result from the processing variations, influence the critical current density (Jc). The dependence of Jc on the disorientation angle distribution is also discussed using current percolation theory. The results show that improving the mesotexture distribution in the central region of Bi2223 tapes by optimizing the mechanical deformation processing can significantly increase Jc.