
Showing papers on "Feature vector" published in 2002


Journal ArticleDOI
TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, establishing its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
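The core computation is compact enough to sketch. Below is a minimal mean shift mode-seeker in Python using a flat kernel; the bandwidth h plays the role of the paper's "resolution of the analysis", while the kernel choice, tolerance, and toy data are our illustrative assumptions.

```python
import numpy as np

def mean_shift_mode(x, data, h, tol=1e-6, max_iter=500):
    """Iterate the mean shift update until the shift is negligible.

    x    -- starting point (1-D array)
    data -- (n, d) array of feature vectors
    h    -- bandwidth (the 'resolution of the analysis')
    """
    for _ in range(max_iter):
        # Flat kernel: average all points within radius h of x.
        mask = np.linalg.norm(data - x, axis=1) <= h
        if not mask.any():
            break                      # no neighbors: stay where we are
        new_x = data[mask].mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            return new_x               # converged to a stationary point
        x = new_x
    return x

# Two Gaussian blobs; starting near either one converges to its mode.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
print(mean_shift_mode(np.array([0.5, 0.5]), data, h=1.0))
print(mean_shift_mode(np.array([2.5, 2.5]), data, h=1.0))
```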

11,727 citations


01 Jan 2002
TL;DR: Support Vector Machines (SVMs) are intuitive, theoretically well-founded, and have been shown to be practically successful.
Abstract: Support Vector Machines (SVMs) are a relatively new learning method used for binary classification. The basic idea is to find a hyperplane which separates the d-dimensional data perfectly into its two classes. However, since example data is often not linearly separable, SVMs introduce the notion of a "kernel induced feature space" which casts the data into a higher dimensional space where the data is separable. Typically, casting into such a space would cause problems computationally and with overfitting. The key insight used in SVMs is that the higher-dimensional space doesn't need to be dealt with directly (as it turns out, only the formula for the dot product in that space is needed), which eliminates the above concerns. Furthermore, the VC-dimension (a measure of a system's likelihood to perform well on unseen data) of SVMs can be explicitly calculated, unlike other learning methods such as neural networks, for which there is no such measure. Overall, SVMs are intuitive, theoretically well-founded, and have been shown to be practically successful. SVMs have also been extended to solve regression tasks (where the system is trained to output a numerical value, rather than a "yes/no" classification).
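The kernel insight is easy to verify numerically: for the degree-2 polynomial kernel on 2-D inputs, the kernel value equals an ordinary dot product in an explicitly constructed 3-D feature space. A minimal sketch (the explicit map shown is one standard choice for this kernel, not taken from the paper):

```python
import numpy as np

def phi(x):
    """Explicit feature map for k(x, z) = (x . z)**2 on 2-D inputs:
    (x1^2, x2^2, sqrt(2) * x1 * x2)."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

lhs = np.dot(x, z) ** 2        # kernel value, computed in the original 2-D space
rhs = np.dot(phi(x), phi(z))   # dot product in the explicit 3-D feature space
print(lhs, rhs)                # both 16.0: the feature space is never needed
```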

1,816 citations


Journal ArticleDOI
TL;DR: A novel Gabor-Fisher classifier (GFC) for face recognition is introduced; robust to changes in illumination and facial expression, it applies the enhanced Fisher linear discriminant model (EFM) to an augmented Gabor feature vector derived from the Gabor wavelet representation of face images.
Abstract: This paper introduces a novel Gabor-Fisher classifier (GFC) for face recognition. The GFC method, which is robust to changes in illumination and facial expression, applies the enhanced Fisher linear discriminant model (EFM) to an augmented Gabor feature vector derived from the Gabor wavelet representation of face images. The novelty of this paper comes from (1) the derivation of an augmented Gabor feature vector, whose dimensionality is further reduced using the EFM by considering both data compression and recognition (generalization) performance; (2) the development of a Gabor-Fisher classifier for multi-class problems; and (3) extensive performance evaluation studies. In particular, we performed comparative studies of different similarity measures applied to various classifiers. We also performed comparative experimental studies of various face recognition schemes, including our novel GFC method, the Gabor wavelet method, the eigenfaces method, the Fisherfaces method, the EFM method, the combination of Gabor and the eigenfaces method, and the combination of Gabor and the Fisherfaces method. The feasibility of the new GFC method has been successfully tested on face recognition using 600 FERET frontal face images corresponding to 200 subjects, which were acquired under variable illumination and facial expressions. The novel GFC method achieves 100% accuracy on face recognition using only 62 features.

1,759 citations


Journal ArticleDOI
TL;DR: A novel kernel for comparing two text documents is introduced: an inner product in the feature space generated by all subsequences of length k, which can be efficiently evaluated by a dynamic programming technique.
Abstract: We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the kernel with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
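The recursion behind that dynamic programme can be written down directly. The sketch below follows the K/K' recursion for the string subsequence kernel, with memoization standing in for the iterative tables; variable names and the normalization helper are our own, and a production version would use the O(k|s||t|) table-filling form.

```python
from functools import lru_cache
import math

def ssk(s, t, k, lam):
    """String subsequence kernel of order k with decay lam (0 < lam <= 1)."""
    @lru_cache(maxsize=None)
    def kp(i, m, n):                      # K'_i on prefixes s[:m], t[:n]
        if i == 0:
            return 1.0
        if min(m, n) < i:
            return 0.0
        x = s[m - 1]
        val = lam * kp(i, m - 1, n)       # skip the last character of s
        for j in range(n):                # match x against every t[j] == x
            if t[j] == x:
                val += kp(i - 1, m - 1, j) * lam ** (n - j + 1)
        return val

    @lru_cache(maxsize=None)
    def kk(m, n):                         # K_k on prefixes s[:m], t[:n]
        if min(m, n) < k:
            return 0.0
        x = s[m - 1]
        val = kk(m - 1, n)
        for j in range(n):
            if t[j] == x:
                val += kp(k - 1, m - 1, j) * lam ** 2
        return val

    return kk(len(s), len(t))

def ssk_normalized(s, t, k, lam):
    """Length-normalized kernel, so that k(s, s) == 1."""
    return ssk(s, t, k, lam) / math.sqrt(ssk(s, s, k, lam) * ssk(t, t, k, lam))

# Shared length-2 subsequences of "cat" and "cart": ca, ct, at.
print(ssk("cat", "cart", 2, 0.5), ssk_normalized("cat", "cart", 2, 0.5))
```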

1,281 citations


Journal ArticleDOI
TL;DR: It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature space partitioning of the data.
Abstract: The article presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work exploits the notion that performing a nonlinear data transformation into some high dimensional feature space increases the probability of the linear separability of the patterns within the transformed space and therefore simplifies the associated data structure. It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature space partitioning of the data.
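A minimal sketch of the eigenvalue idea, assuming an RBF kernel and judging the cluster count by the gap in the leading eigenvalues (the paper's own estimation procedure is more refined):

```python
import numpy as np

rng = np.random.default_rng(1)
# Three well-separated 2-D clusters.
X = np.vstack([rng.normal(c, 0.2, (50, 2)) for c in ((0, 0), (4, 0), (2, 4))])

# RBF kernel matrix, i.e. the Gram matrix of the implicit feature mapping.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * 0.5 ** 2))

# Dominant eigenvalues of K correspond to directions in feature space
# along which the mapped data is (nearly) linearly separable; a sharp
# drop after the 3rd value suggests three inherent clusters.
eigvals = np.linalg.eigvalsh(K)[::-1]
print(np.round(eigvals[:6], 1))
```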

905 citations


Journal ArticleDOI
TL;DR: A probabilistic approach is described that is able to compensate for imprecisely localized, partially occluded, and expression-variant faces, even when only a single training sample per class is available to the system.
Abstract: The classical way of attempting to solve the face (or object) recognition problem is by using large and representative data sets. In many applications, though, only one sample per class is available to the system. In this contribution, we describe a probabilistic approach that is able to compensate for imprecisely localized, partially occluded, and expression-variant faces even when only a single training sample per class is available to the system. To solve the localization problem, we find the subspace (within the feature space, e.g., eigenspace) that represents this error for each of the training images. To resolve the occlusion problem, each face is divided into k local regions which are analyzed in isolation. In contrast with other approaches where a simple voting space is used, we present a probabilistic method that analyzes how "good" a local match is. To make the recognition system less sensitive to the differences between the facial expression displayed on the training and the testing images, we weight the results obtained on each local area on the basis of how much of this local area is affected by the expression displayed on the current test image.

885 citations


Journal ArticleDOI
TL;DR: A novel variational framework to deal with frame partition problems in Computer Vision that exploits boundary and region-based segmentation modules under a curve-based optimization objective function is presented.
Abstract: This paper presents a novel variational framework to deal with frame partition problems in Computer Vision. This framework exploits boundary and region-based segmentation modules under a curve-based optimization objective function. The task of supervised texture segmentation is considered to demonstrate the potential of the proposed framework. The textured feature space is generated by filtering the given textured images using isotropic and anisotropic filters, and analyzing their responses as multi-component conditional probability density functions. The texture segmentation is obtained by unifying region and boundary-based information as an improved Geodesic Active Contour Model. The defined objective function is minimized using a gradient-descent method where a level set approach is used to implement the obtained PDE. According to this PDE, the curve propagation towards the final solution is guided by boundary and region-based segmentation forces, and is constrained by a regularity force. The level set implementation is performed using a fast front propagation algorithm where topological changes are naturally handled. The performance of our method is demonstrated on a variety of synthetic and real textured frames.

867 citations


Proceedings ArticleDOI
20 May 2002
TL;DR: This work describes a representation of gait appearance based on simple features such as moments extracted from orthogonal view video silhouettes of human walking motion that contains enough information to perform well on human identification and gender classification tasks.
Abstract: We describe a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple features such as moments extracted from orthogonal view video silhouettes of human walking motion. Despite its simplicity, the resulting feature vector contains enough information to perform well on human identification and gender classification tasks. We explore the recognition behaviors of two different methods to aggregate features over time under different recognition tasks. We demonstrate the accuracy of recognition using gait video sequences collected over different days and times and under varying lighting environments. In addition, we show results for gender classification based on our gait appearance features using a support-vector machine.
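To make the flavor of such features concrete, here is a hedged sketch computing simple moment features from a binary silhouette; the exact feature set and the time-aggregation scheme in the paper differ, so treat this as illustrative only.

```python
import numpy as np

def silhouette_moments(mask):
    """Centroid, second central moments, and orientation of a binary silhouette."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    orientation = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return np.array([cx, cy, mu20, mu02, mu11, orientation])

# Toy 'silhouette': an upright rectangle.
mask = np.zeros((40, 20), dtype=bool)
mask[5:35, 8:12] = True
print(silhouette_moments(mask))
# Per-frame vectors like this are then aggregated over time
# (e.g. averaged) to form the gait feature vector.
```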

775 citations


Journal ArticleDOI
TL;DR: The texture detection capabilities of the operators are compared; the grating cell operator is the only one that responds selectively to texture and does not give false responses to nontexture features such as object contours.
Abstract: Texture features that are based on the local power spectrum obtained by a bank of Gabor filters are compared. The features differ in the type of nonlinear post-processing which is applied to the local power spectrum. The following features are considered: Gabor energy, complex moments, and grating cell operator features. The capability of the corresponding operators to produce distinct feature vector clusters for different textures is compared using two methods: the Fisher criterion and the classification result comparison. Both methods give consistent results. The grating cell operator gives the best discrimination and segmentation results. The texture detection capabilities of the operators and their robustness to nontexture features are also compared. The grating cell operator is the only one that responds selectively to texture and does not give false responses to nontexture features such as object contours.
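For two classes, the Fisher criterion used for the cluster comparison reduces to between-class separation over within-class scatter. A small sketch on toy 1-D Gabor-energy responses (data and names are our own):

```python
import numpy as np

def fisher_criterion(a, b):
    """Between-class separation over within-class scatter for two
    1-D feature samples: (mu_a - mu_b)^2 / (var_a + var_b)."""
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

rng = np.random.default_rng(0)
grass = rng.normal(1.0, 0.2, 500)   # Gabor-energy responses, texture A
brick = rng.normal(2.5, 0.3, 500)   # Gabor-energy responses, texture B
print(fisher_criterion(grass, brick))  # larger = better-separated clusters
```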

738 citations


Journal ArticleDOI
TL;DR: A set of real-world problems is compared to random labelings of points, and it is found that real problems contain structures in this measurement space that are significantly different from the random sets.
Abstract: We studied a number of measures that characterize the difficulty of a classification problem, focusing on the geometrical complexity of the class boundary. We compared a set of real-world problems to random labelings of points and found that real problems contain structures in this measurement space that are significantly different from the random sets. Distributions of problems in this space show that there exist at least two independent factors affecting a problem's difficulty. We suggest using this space to describe a classifier's domain of competence. This can guide static and dynamic selection of classifiers for specific problems as well as subproblems formed by confinement, projection, and transformations of the feature vectors.

650 citations


Book ChapterDOI
01 Jan 2002
TL;DR: A new geometric framework for unsupervised anomaly detection is presented: algorithms designed to process unlabeled data and detect anomalies in sparse regions of the feature space.
Abstract: Most current intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. This training data is typically expensive to produce. We present a new geometric framework for unsupervised anomaly detection: algorithms that are designed to process unlabeled data. In our framework, data elements are mapped to a feature space, which is typically a vector space ℝ^d. Anomalies are detected by determining which points lie in sparse regions of the feature space. We present two feature maps for mapping data elements to a feature space. Our first map is a data-dependent normalization feature map which we apply to network connections. Our second feature map is a spectrum kernel which we apply to system call traces. We present three algorithms for detecting which points lie in sparse regions of the feature space. We evaluate our methods by performing experiments over network records from the KDD CUP 1999 data set and system call traces from the 1999 Lincoln Labs DARPA evaluation.
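The "sparse region" test can be operationalized in several ways; one of the simplest is a k-nearest-neighbor distance score, sketched below on synthetic data (the paper's three algorithms, including its cluster- and SVM-based variants, differ in detail):

```python
import numpy as np

def knn_anomaly_scores(X, k=5):
    """Score each point by its distance to its k-th nearest neighbor;
    points in sparse regions of feature space get large scores."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)        # column 0 is the zero self-distance
    return d_sorted[:, k]

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (200, 4))      # dense cloud of normal records
attack = rng.uniform(5, 8, (3, 4))       # a few isolated points
X = np.vstack([normal, attack])
scores = knn_anomaly_scores(X)
print(np.argsort(scores)[-3:])           # indices 200..202 score highest
```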

Journal ArticleDOI
TL;DR: This work applies the multiresolution wavelet transform to extract the waveletface and performs the linear discriminant analysis on waveletfaces to reinforce discriminant power.
Abstract: Feature extraction, discriminant analysis, and classification rules are three crucial issues for face recognition. We present hybrid approaches to handle these three issues together. For feature extraction, we apply the multiresolution wavelet transform to extract the waveletface. We also perform the linear discriminant analysis on waveletfaces to reinforce discriminant power. During classification, the nearest feature plane (NFP) and nearest feature space (NFS) classifiers are explored for robust decisions in the presence of wide facial variations. Their relationships to conventional nearest neighbor and nearest feature line classifiers are demonstrated. In the experiments, the discriminant waveletface incorporated with the NFS classifier achieves the best face recognition performance.

Proceedings Article
01 Jan 2002
TL;DR: Two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems are described.
Abstract: Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels of performance. This paper describes two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems. The approaches include both acoustic scoring and a recently developed GMM tokenization system that is based on a variation of phonetic recognition and language modeling. System performance is evaluated on both the CallFriend and OGI corpora.
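SDC features have a concrete definition, conventionally parameterized as N-d-P-k: at frame t, stack k delta-cepstral vectors computed at shifts of P frames, each a difference over a window of d. A hedged NumPy sketch (the common 7-1-3-7 configuration is assumed, and the boundary handling is our choice):

```python
import numpy as np

def sdc(cep, d=1, P=3, k=7):
    """Shifted delta cepstra.

    cep -- (T, N) array of cepstral frames (N coefficients per frame)
    At frame t, concatenate k delta vectors taken at shifts of P frames:
        delta_i(t) = cep[t + i*P + d] - cep[t + i*P - d],  i = 0..k-1
    Frames whose window would run off either end are simply dropped.
    """
    T, N = cep.shape
    t_max = T - ((k - 1) * P + d)        # last usable frame (exclusive)
    out = np.empty((t_max - d, k * N))
    for t in range(d, t_max):
        deltas = [cep[t + i * P + d] - cep[t + i * P - d] for i in range(k)]
        out[t - d] = np.concatenate(deltas)
    return out

frames = np.random.randn(100, 7)   # e.g. 7 cepstral coefficients per frame
print(sdc(frames).shape)           # (80, 49): a 49-dim SDC vector per frame
```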

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The method consists of three major components: image preprocessing, feature extraction, and classifier design; iris matching uses an efficient approach called the nearest feature line (NFL).
Abstract: Proposes a method for personal identification based on iris recognition. The method consists of three major components: image preprocessing, feature extraction and classifier design. A bank of circular symmetric filters is used to capture local iris characteristics to form a fixed length feature vector. In iris matching, an efficient approach called nearest feature line (NFL) is used. Constraints are imposed on the original NFL method to improve performance. Experimental results show that the proposed method has an encouraging performance.
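The nearest feature line distance has a closed form: project the query feature vector onto the line through each pair of prototypes of a class and take the smallest residual. A small sketch in our own notation (the paper's added constraints on the original NFL method are not modeled here):

```python
import numpy as np
from itertools import combinations

def nfl_distance(x, prototypes):
    """Distance from query x to the nearest feature line spanned by
    any pair of prototype feature vectors of one class."""
    best = np.inf
    for x1, x2 in combinations(prototypes, 2):
        direction = x2 - x1
        mu = np.dot(x - x1, direction) / np.dot(direction, direction)
        foot = x1 + mu * direction          # projection onto the line
        best = min(best, np.linalg.norm(x - foot))
    return best

protos = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 2.0])]
print(nfl_distance(np.array([1.0, 0.5]), protos))   # 0.5, from the first line
```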

Book ChapterDOI
28 May 2002
TL;DR: The issues that need to be resolved before fully automated outdoor surveillance systems can be developed are discussed, and solutions to some of these problems are presented.
Abstract: In this paper we discuss the issues that need to be resolved before fully automated outdoor surveillance systems can be developed, and present solutions to some of these problems. Any outdoor surveillance system must be able to track objects moving in its field of view, classify these objects, and detect some of their activities. We have developed a method to track and classify these objects in realistic scenarios. Object tracking in a single camera is performed using background subtraction, followed by region correspondence. This takes into account multiple cues, including velocities, sizes, and distances of bounding boxes. Objects can be classified based on the type of their motion. This property may be used to label objects as a single person, vehicle, or group of persons. Our proposed method to classify objects is based upon detecting recurrent motion for each tracked object. We develop a specific feature vector called a 'Recurrent Motion Image' (RMI) to calculate repeated motion of objects. Different types of objects yield very different RMIs and therefore can easily be classified into different categories on the basis of their RMI. The proposed approach is very efficient in terms of both computational and space criteria. RMIs are further used to detect carried objects. We present results on a large number of real world sequences, including the PETS 2001 sequences. Our surveillance system works in real time at approximately 15 Hz for 320×240 resolution color images on a 1.7 GHz Pentium 4 PC.

Journal ArticleDOI
TL;DR: This work proposes a method for generating artificial outliers, uniformly distributed in a hypersphere around the target set, and obtains an efficient estimate of the volume covered by the one-class classifiers.
Abstract: In one-class classification, one class of data, called the target class, has to be distinguished from the rest of the feature space. It is assumed that only examples of the target class are available. This classifier has to be constructed such that objects not originating from the target set, by definition outlier objects, are not classified as target objects. In previous research the support vector data description (SVDD) was proposed to solve the problem of one-class classification. It models a hypersphere around the target set, and by the introduction of kernel functions, more flexible descriptions are obtained. In the original optimization of the SVDD, two parameters have to be given beforehand by the user. To automatically optimize the values for these parameters, the error on both the target and outlier data has to be estimated. Because no outlier examples are available, we propose a method for generating artificial outliers, uniformly distributed in a hypersphere. A (relatively) efficient estimate for the volume covered by the one-class classifiers is obtained, and so an estimate for the outlier error. Results are shown for artificial data and for real world data.
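Drawing outliers uniformly from a d-dimensional hypersphere has a standard recipe: use a Gaussian draw for the direction and scale the radius by U^(1/d). A sketch (the choice of center and enclosing radius around the target set is our assumption):

```python
import numpy as np

def uniform_in_hypersphere(n, d, center, radius, rng=None):
    """Draw n points uniformly from the d-dimensional ball around `center`.

    Gaussian vectors give uniformly random directions; scaling the radius
    by U**(1/d) corrects for volume concentrating near the surface in
    high dimensions.
    """
    rng = rng or np.random.default_rng()
    g = rng.normal(size=(n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return center + radii * directions

# Artificial outliers around a target set: a sphere enclosing the data.
target = np.random.default_rng(0).normal(0, 1, (100, 5))
center = target.mean(axis=0)
radius = np.linalg.norm(target - center, axis=1).max() * 1.1
outliers = uniform_in_hypersphere(2000, 5, center, radius)
# The fraction of these outliers accepted by a one-class classifier then
# estimates the volume it covers, and hence its outlier error.
```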

01 Jan 2002
TL;DR: A new approach for personal identification based on iris recognition is presented, which uses a bank of Gabor filters to capture both local and global iris characteristics to form a fixed length feature vector.
Abstract: A new approach for personal identification based on iris recognition is presented in this paper. The body of this paper details the steps of iris recognition, including image preprocessing, feature extraction, and classifier design. The proposed algorithm uses a bank of Gabor filters to capture both local and global iris characteristics to form a fixed length feature vector. Iris matching is based on the weighted Euclidean distance between the two corresponding iris vectors and is therefore very fast. Experimental results are reported to demonstrate the performance of the algorithm.

Journal ArticleDOI
TL;DR: This article presents applications of entropic spanning graphs to imaging and feature clustering; these graphs are naturally suited to problems where entropy and information divergence are used as discriminants.
Abstract: This article presents applications of entropic spanning graphs to imaging and feature clustering applications. Entropic spanning graphs span a set of feature vectors in such a way that the normalized spanning length of the graph converges to the entropy of the feature distribution as the number of random feature vectors increases. This property makes these graphs naturally suited to applications where entropy and information divergence are used as discriminants: texture classification, feature clustering, image indexing, and image registration. Among other areas, these problems arise in geographical information systems, digital libraries, medical information processing, video indexing, multisensor fusion, and content-based retrieval.

Journal ArticleDOI
TL;DR: This work discusses the relation between ε-support vector regression (ε-SVR) and ν-support vector regression (ν-SVR), and focuses on properties that are different from those of C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC).
Abstract: We discuss the relation between ε-support vector regression (ε-SVR) and ν-support vector regression (ν-SVR). In particular, we focus on properties that are different from those of C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC). We then discuss some issues that do not occur in the case of classification: the possible range of ε and the scaling of target values. A practical decomposition method for ν-SVR is implemented, and computational experiments are conducted. We show some interesting numerical observations specific to regression.
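For readers who want to see the two formulations side by side, a modern library sketch (ours, not the paper's decomposition implementation): in ε-SVR the tube width ε is fixed by the user, while in ν-SVR the parameter ν lower-bounds the fraction of support vectors and the tube width is found automatically.

```python
import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, (80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

eps_svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)  # tube width fixed
nu_svr = NuSVR(kernel="rbf", C=10.0, nu=0.5).fit(X, y)      # tube width adapts

# nu lower-bounds the fraction of support vectors: with nu = 0.5,
# at least half the points end up as support vectors.
print(len(eps_svr.support_), len(nu_svr.support_))
```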

Journal ArticleDOI
TL;DR: A feature-based face recognition system uses both 3D range data and 2D gray-level facial images; the best match in the model library is identified according to a similarity function or a Support Vector Machine.

Proceedings ArticleDOI
24 Jun 2002
TL;DR: A local Fourier transform is adopted as a texture representation scheme and eight characteristic maps for describing different aspects of cooccurrence relations of image pixels in each channel of the (SVcosH, SVsinH, V) color space are derived, resulting in a 48-dimensional feature vector.
Abstract: We adopt a local Fourier transform as a texture representation scheme and derive eight characteristic maps for describing different aspects of cooccurrence relations of image pixels in each channel of the (SVcosH, SVsinH, V) color space. Then we calculate the first and second moments of these maps as a representation of the natural color image pixel distribution, resulting in a 48-dimensional feature vector. The novel low-level feature is named color texture moments (CTM), which can also be regarded as a certain extension to color moments in eight aspects through eight orthogonal templates. Experiments show that this new feature can achieve good retrieval performance for CBIR.

Patent
25 Apr 2002
TL;DR: In this article, a spoken query is represented as a lattice indicating possible sequential combinations of words in the spoken query, and the lattice is converted to a query certainty vector.
Abstract: A system and method indexes and retrieves documents stored in a database. A document feature vector is extracted for each document to be indexed. The feature vector is projected to a low dimension document feature vector, and the documents are indexed according to the low dimension document feature vectors. A spoken query is represented as a lattice indicating possible sequential combinations of words in the spoken query. The lattice is converted to a query certainty vector, which is also projected to a low dimension query certainty vector. The low dimension query vector is compared to each of the low dimension document feature vectors to retrieve a matching result set of documents.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: It is shown that using mixtures of kernels can result in both good interpolation and extrapolation abilities; the performance is illustrated with an artificial as well as an industrial data set.
Abstract: Kernels are used in support vector machines to map the learning data (nonlinearly) into a higher dimensional feature space where the computational power of the linear learning machine is increased. Every kernel has its advantages and disadvantages. A desirable characteristic for learning may not be a desirable characteristic for generalization. Preferably, the 'good' characteristics of two or more kernels should be combined. It is shown that using mixtures of kernels can result in both good interpolation and extrapolation abilities. The performance of this method is illustrated with an artificial as well as an industrial data set.
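Since a convex combination of positive semidefinite kernels is again a valid kernel, a mixture can be dropped straight into any kernel machine. A hedged sketch using a precomputed-kernel interface; the particular kernels, mixing weight rho, and toy task are our choices:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.svm import SVC

def mixed_kernel(A, B, rho=0.7):
    """rho * local (RBF) kernel + (1 - rho) * global (polynomial) kernel:
    the RBF part favors interpolation near the training data, the
    polynomial part better extrapolation away from it."""
    return (rho * rbf_kernel(A, B, gamma=0.5)
            + (1 - rho) * polynomial_kernel(A, B, degree=2))

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (100, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # XOR-like toy labels

clf = SVC(kernel="precomputed").fit(mixed_kernel(X, X), y)
X_test = rng.normal(0, 1, (20, 2))
print(clf.predict(mixed_kernel(X_test, X)))
```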

Journal ArticleDOI
TL;DR: A fully automatic method is presented to detect abnormalities in frontal chest radiographs and aggregate them into an overall abnormality score; the method is aimed at finding abnormal signs of a diffuse textural nature, such as those encountered in mass chest screening against tuberculosis (TB).
Abstract: A fully automatic method is presented to detect abnormalities in frontal chest radiographs; these are aggregated into an overall abnormality score. The method is aimed at finding abnormal signs of a diffuse textural nature, such as those encountered in mass chest screening against tuberculosis (TB). The scheme starts with automatic segmentation of the lung fields, using active shape models. The segmentation is used to subdivide the lung fields into overlapping regions of various sizes. Texture features are extracted from each region, using the moments of responses to a multiscale filter bank. Additional "difference features" are obtained by subtracting feature vectors from corresponding regions in the left and right lung fields. A separate training set is constructed for each region. All regions are classified by voting among the k nearest neighbors, with leave-one-out. Next, the classification results of each region are combined, using a weighted multiplier in which regions with higher classification reliability weigh more heavily. This produces an abnormality score for each image. The method is evaluated on two databases. The first database was collected from a TB mass chest screening program, from which 147 images with textural abnormalities and 241 normal images were selected. Although this database contains many subtle abnormalities, the classification has a sensitivity of 0.86 at a specificity of 0.50 and an area under the receiver operating characteristic (ROC) curve of 0.820. The second database consists of 100 normal images and 100 abnormal images with interstitial disease. For this database, the results were a sensitivity of 0.97 at a specificity of 0.90 and an area under the ROC curve of 0.986.

01 Jan 2002
TL;DR: A high-frequency device comprises a number of solenoid coils arranged in a row along the edge of a board, and a plate provided with apertures coaxial with the coils.
Abstract: A high-frequency device, comprising a number of solenoid coils which are arranged in a row along the edge of a board. Parallel to the row of coils there is provided a plate which extends perpendicularly to the board and which is provided with apertures which are coaxial with the coils. Adjusting cores for adjusting the coils are provided in the apertures.

Journal ArticleDOI
TL;DR: This work investigates the use of interior-point methods for solving quadratic programming problems with a small number of linear constraints, where the quadratic term consists of a low-rank update to a positive semidefinite matrix.
Abstract: We investigate the use of interior-point methods for solving quadratic programming problems with a small number of linear constraints, where the quadratic term consists of a low-rank update to a positive semidefinite matrix. Several formulations of the support vector machine fit into this category. An interesting feature of these particular problems is the volume of data, which can lead to quadratic programs with between 10 and 100 million variables and, if written explicitly, a dense Q matrix. Our code is based on OOQP, an object-oriented interior-point code, with the linear algebra specialized for the support vector machine application. For the targeted massive problems, all of the data is stored out of core and we overlap computation and input/output to reduce overhead. Results are reported for several linear support vector machine formulations demonstrating that the method is reliable and scalable.

Proceedings Article
01 Jan 2002
TL;DR: An application of the gray level co-occurrence matrix (GLCM) to texture-based similarity evaluation of rock images could reduce the cost of geological investigations by allowing improved accuracy in automatic rock sample selection.
Abstract: Nowadays, as computational power increases, the role of automatic visual inspection becomes more important. Therefore, visual quality control has also gained in popularity. This paper presents an application of the gray level co-occurrence matrix (GLCM) to texture-based similarity evaluation of rock images. Retrieval results were evaluated for two databases, one consisting of the whole images and the other of blocks obtained by splitting the original images. Retrieval results for both databases were obtained by calculating the distance between the feature vector of the query image and the other feature vectors in the database. Performance of the co-occurrence matrices was also compared to that of Gabor wavelet features. Co-occurrence matrices performed better for the given rock image dataset. This similarity evaluation application could reduce the cost of geological investigations by allowing improved accuracy in automatic rock sample selection.
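GLCM features of this kind are straightforward to reproduce with scikit-image; a hedged sketch (graycomatrix/graycoprops are the current names of the functions, and the distances, angles, and properties chosen here are illustrative rather than the paper's exact setup):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(image, levels=64):
    """Haralick-style GLCM descriptors for one grayscale image block."""
    img = (image / image.max() * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

# Similarity = distance between feature vectors, as in the retrieval setup.
a = np.random.default_rng(0).integers(0, 255, (128, 128)).astype(float)
b = np.random.default_rng(1).integers(0, 255, (128, 128)).astype(float)
print(np.linalg.norm(glcm_features(a) - glcm_features(b)))
```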

Journal ArticleDOI
TL;DR: This work proposes a general active learning framework for content-based information retrieval and uses this framework to guide hidden annotations in order to improve the retrieval performance.
Abstract: We propose a general active learning framework for content-based information retrieval. We use this framework to guide hidden annotations in order to improve the retrieval performance. For each object in the database, we maintain a list of probabilities, each indicating the probability of this object having one of the attributes. During training, the learning algorithm samples objects in the database and presents them to the annotator to assign attributes. For each sampled object, each probability is set to one or zero depending on whether or not the corresponding attribute is assigned by the annotator. For objects that have not been annotated, the learning algorithm estimates their probabilities with biased kernel regression. Knowledge gain is then defined to determine which object, among those that have not been annotated, the system is most uncertain about. The system then presents that object to the annotator as the next sample to be assigned attributes. During retrieval, the list of probabilities works as a feature vector for us to calculate the semantic distance between two objects, or between the user query and an object in the database. The overall distance between two objects is determined by a weighted sum of the semantic distance and the low-level feature distance. The algorithm is tested on both synthetic databases and real databases of 3D models. In both cases, the retrieval performance of the system improves rapidly with the number of annotated samples. Furthermore, we show that active learning outperforms learning based on random sampling.

Dissertation
01 Jan 2002
TL;DR: This dissertation takes a relationship-based approach to cluster analysis of high-dimensional (1000 and more dimensions) data that side-steps the 'curse of dimensionality' issue by working in a suitable similarity space instead of the original feature space, and proposes two frameworks that leverage graph algorithms to achieve relationship-based clustering and visualization, respectively.
Abstract: This dissertation takes a relationship-based approach to cluster analysis of high-dimensional (1000 and more dimensions) data that side-steps the 'curse of dimensionality' issue by working in a suitable similarity space instead of the original feature space. We propose two frameworks that leverage graph algorithms to achieve relationship-based clustering and visualization, respectively. In the visualization framework, the output from the clustering algorithm is used to reorder the data points so that the resulting permuted similarity matrix can be readily visualized in 2 dimensions, with clusters showing up as bands. Results on retail transaction, document (bag-of-words), and web-log data show that our approach can yield superior results while also taking additional balance constraints into account. The choice of similarity is a critical step in relationship-based clustering, and this motivates our systematic comparative study of the impact of similarity measures on the quality of document clusters. The key findings of our experimental study are: (i) cosine, correlation, and extended Jaccard similarities perform comparably; (ii) Euclidean distances do not work well; (iii) graph partitioning tends to be superior to k-means and SOMs, especially when balanced clusters are desired; and (iv) performance curves generally do not cross. We also propose a cluster quality evaluation measure based on normalized mutual information and find an analytical relation between similarity measures. It is widely recognized that combining multiple classification or regression models typically provides superior results compared to using a single, well-tuned model. However, there are no well known approaches to combining multiple clusterings. The idea of combining cluster labelings without accessing the original features leads to a general knowledge reuse framework that we call cluster ensembles. We propose a formal definition of the cluster ensemble as an optimization problem. Taking a relationship-based approach, we propose three effective and efficient combining algorithms for solving it heuristically based on a hypergraph model. Results on synthetic as well as real data sets show that cluster ensembles can (i) improve quality and robustness, (ii) enable distributed clustering, and (iii) speed up processing significantly with little loss in quality.

Proceedings ArticleDOI
03 Dec 2002
TL;DR: It is argued that feature selection is an important issue in gender classification and demonstrated that Genetic Algorithms (GA) can select good subsets of features (i.e., features that encode mostly gender information), reducing the classification error.
Abstract: We consider the problem of gender classification from frontal facial images using genetic feature subset selection. We argue that feature selection is an important issue in gender classification and demonstrate that Genetic Algorithms (GAs) can select good subsets of features (i.e., features that encode mostly gender information), reducing the classification error. First, Principal Component Analysis (PCA) is used to represent each image as a feature vector (i.e., eigen-features) in a low-dimensional space. GAs are then employed to select a subset of features from the low-dimensional representation by disregarding certain eigenvectors that do not seem to encode important gender information. Four different classifiers were compared in this study using genetic feature subset selection: a Bayes classifier, a Neural Network (NN) classifier, a Support Vector Machine (SVM) classifier, and a classifier based on Linear Discriminant Analysis (LDA). Our experimental results show a significant error rate reduction in all cases. The best performance was obtained using the SVM classifier. Using only 8.4% of the features in the complete set, the SVM classifier achieved an error rate of 4.7%, down from an average error rate of 8.9% using manually selected features.
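The GA search itself is compact: individuals are bit masks over the eigen-features, fitness is cross-validated accuracy, and selection, crossover, and mutation proceed as usual. A toy sketch (population size, rates, synthetic data, and the classifier settings are our choices, not the paper's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=6,
                           random_state=0)

def fitness(mask):
    """Held-out accuracy of an SVM trained on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.5           # random bit-mask population
for gen in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:10]]                      # truncation selection
    cut = rng.integers(1, X.shape[1], size=10)     # one-point crossover
    children = np.array([np.concatenate([parents[i][:c],
                                         parents[(i + 1) % 10][c:]])
                         for i, c in enumerate(cut)])
    children ^= rng.random(children.shape) < 0.02  # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", best.sum(), "accuracy:", fitness(best))
```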