
Showing papers on "Unsupervised learning" published in 2003


Proceedings Article
09 Dec 2003
TL;DR: A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points.
Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

4,205 citations
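The paper's closed-form solution can be sketched in a few lines: build an affinity graph over all points, normalize it symmetrically, and solve a linear system that trades off fitting the known labels against smoothness on the graph. The sketch below is a minimal reading of that recipe; the parameter names (`alpha`, `sigma`) and the RBF affinity are common choices, not the authors' reference implementation.

```python
import numpy as np

def label_spreading(X, y, alpha=0.99, sigma=1.0):
    """Semi-supervised label spreading in the spirit of the paper:
    smooth a classifying function over the graph of labeled and
    unlabeled points.  y uses -1 for unlabeled, {0..C-1} for labels.
    Parameter names here are illustrative, not from the paper."""
    n = X.shape[0]
    classes = np.unique(y[y >= 0])
    # Affinity matrix with an RBF kernel, zero diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))
    # One-hot label matrix Y (all-zero rows for unlabeled points).
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    # Closed-form smooth solution F* = (I - alpha S)^{-1} Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[F.argmax(1)]
```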


Proceedings Article
21 Aug 2003
TL;DR: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
Abstract: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning problem is then formulated in terms of a Gaussian random field on this graph, where the mean of the field is characterized in terms of harmonic functions, and is efficiently obtained using matrix methods or belief propagation. The resulting learning algorithms have intimate connections with random walks, electric networks, and spectral graph theory. We discuss methods to incorporate class priors and the predictions of classifiers obtained by supervised learning. We also propose a method of parameter learning by entropy minimization, and show the algorithm's ability to perform feature selection. Promising experimental results are presented for synthetic data, digit classification, and text classification tasks.

3,908 citations
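The mean of the Gaussian field has a well-known closed form: clamp the labeled vertices and solve a linear system in the graph Laplacian for the unlabeled ones. A minimal sketch, assuming a precomputed weight matrix `W` and binary 0/1 labels with `np.nan` marking unlabeled vertices:

```python
import numpy as np

def harmonic_solution(W, labels):
    """Mean of the Gaussian random field: the harmonic function that
    matches the given labels and equals the weighted average of its
    neighbors on unlabeled nodes.  `labels` holds 0/1 for labeled
    vertices and np.nan for unlabeled ones; names are illustrative."""
    u = np.isnan(labels)           # unlabeled mask
    l = ~u                         # labeled mask
    D = np.diag(W.sum(1))
    L = D - W                      # combinatorial graph Laplacian
    # Closed form: f_u = (L_uu)^{-1} W_ul f_l
    f = labels.copy()
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, l)] @ labels[l])
    return f
```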


Journal ArticleDOI
TL;DR: Locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data, is described and several extensions that enhance its performance are discussed.
Abstract: The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE---though capable of generating highly nonlinear embeddings---are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance---both successes and failures---and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.

1,614 citations
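LLE's three steps (find neighbors, solve for reconstruction weights, take bottom eigenvectors) translate directly into NumPy. The sketch below follows the paper's algorithm; the regularization constant `reg` is a standard conditioning choice rather than something prescribed in the text.

```python
import numpy as np

def lle(X, k=10, d=2, reg=1e-3):
    """Bare-bones locally linear embedding."""
    n = X.shape[0]
    # Step 1: k nearest neighbors of each point (excluding itself).
    dist = ((X[:, None] - X[None]) ** 2).sum(-1)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # Step 2: weights minimizing ||x_i - sum_j w_ij x_j||^2 with
    # sum_j w_ij = 1, solved via each point's local Gram matrix.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(k)   # condition the solve
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()
    # Step 3: embedding = bottom eigenvectors of M = (I-W)^T (I-W),
    # discarding the constant eigenvector (a sparse eigenproblem in
    # serious implementations; dense here for brevity).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]
```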


Proceedings Article
09 Dec 2003
TL;DR: A unified framework for extending Local Linear Embedding, Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling as well as for Spectral Clustering is provided.
Abstract: Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.

1,072 citations
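At the heart of the framework is the Nystrom formula, which extends a training-set eigenvector to a new point from kernel evaluations against the training points. A minimal sketch, with variable names that are mine rather than the paper's:

```python
import numpy as np

def nystrom_embedding(k_new, eigvecs, eigvals):
    """Embed a new point x given k_new[i] = K(x, x_i) against the n
    training points, plus the top eigenvectors/eigenvalues of the
    n x n training kernel matrix.  Implements the eigenfunction view
    the paper describes:
        f_j(x) = (sqrt(n) / lambda_j) * sum_i eigvecs[i, j] * K(x, x_i)
    """
    n = eigvecs.shape[0]
    return np.sqrt(n) * (k_new @ eigvecs) / eigvals
```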


Proceedings Article
16 Sep 2003
TL;DR: This work combines active and semi-supervised learning techniques under a Gaussian random field model, which requires a much smaller number of queries to achieve high accuracy compared with random query selection.
Abstract: Active and semi-supervised learning are important techniques when labeled data are scarce. We combine the two under a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The semi-supervised learning problem is then formulated in terms of a Gaussian random field on this graph, the mean of which is characterized in terms of harmonic functions. Active learning is performed on top of the semi-supervised learning scheme by greedily selecting queries from the unlabeled data to minimize the estimated expected classification error (risk); in the case of Gaussian fields the risk is efficiently computed using matrix methods. We present experimental results on synthetic data, handwritten digit recognition, and text classification tasks. The active learning scheme requires a much smaller number of queries to achieve high accuracy compared with random query selection.

578 citations
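The greedy query selection can be sketched by brute force: for each unlabeled node, average the risk of the re-solved harmonic field over both possible labels and pick the node with the smallest expected risk. The paper computes this efficiently with matrix updates; the full re-solve below is a readable but slower stand-in, and all names are mine.

```python
import numpy as np

def harmonic(W, y):
    """Harmonic field: y holds 0/1 for labeled nodes, np.nan otherwise."""
    u = np.isnan(y)
    l = ~u
    L = np.diag(W.sum(1)) - W
    f = y.copy()
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, l)] @ y[l])
    return f

def next_query(W, y):
    """Pick the unlabeled node whose labeling minimizes the estimated
    expected risk sum_i min(f_i, 1 - f_i) of the updated field."""
    f = harmonic(W, y)
    best, best_risk = None, np.inf
    for k in np.flatnonzero(np.isnan(y)):
        risk = 0.0
        for label in (0.0, 1.0):          # expectation over y_k
            y_k = y.copy()
            y_k[k] = label
            f_k = harmonic(W, y_k)
            p = f[k] if label == 1.0 else 1.0 - f[k]
            risk += p * np.minimum(f_k, 1.0 - f_k).sum()
        if risk < best_risk:
            best, best_risk = k, risk
    return best
```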


Proceedings Article
09 Dec 2003
TL;DR: It is argued that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm in situations where training data is abundant and computing resources are comparatively scarce.
Abstract: We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. We present both theoretical and experimental evidence.

440 citations
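The argument favors algorithms that touch each example once with a decaying step size. A bare-bones illustration of the online side of the comparison (plain SGD on a logistic loss; the step-size schedule is a common default, not the paper's exact choice):

```python
import numpy as np

def sgd_logistic(stream, dim, eta0=0.1):
    """Single-pass SGD on the logistic loss: each example is seen once,
    so learning cost scales with the data consumed, whereas a batch
    learner revisits every example at each optimization step."""
    w = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):   # y in {0, 1}
        eta = eta0 / np.sqrt(t)                    # decaying learning rate
        p = 1.0 / (1.0 + np.exp(-w @ x))           # predicted probability
        w -= eta * (p - y) * x                     # stochastic gradient step
    return w
```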


Journal ArticleDOI
TL;DR: A generalized nonlinear TAH learning rule is proposed that allows a balance between stability and sensitivity of learning, and the rule is used to study the capacity of the system to learn patterns of correlations between afferent spike trains.
Abstract: Triggered by recent experimental results, temporally asymmetric Hebbian (TAH) plasticity is considered as a candidate model for the biological implementation of competitive synaptic learning, a key concept for the experience-based development of cortical circuitry. However, because of the well known positive feedback instability of correlation-based plasticity, the stability of the resulting learning process has remained a central problem. Plagued by either a runaway of the synaptic efficacies or a greatly reduced sensitivity to input correlations, the learning performance of current models is limited. Here we introduce a novel generalized nonlinear TAH learning rule that allows a balance between stability and sensitivity of learning. Using this rule, we study the capacity of the system to learn patterns of correlations between afferent spike trains. Specifically, we address the question of under which conditions learning induces spontaneous symmetry breaking and leads to inhomogeneous synaptic distributions that capture the structure of the input correlations. To study the efficiency of learning temporal relationships between afferent spike trains through TAH plasticity, we introduce a novel sensitivity measure that quantifies the amount of information about the correlation structure in the input that a learning rule is capable of storing in the synaptic weights. We demonstrate that by adjusting the weight dependence of the synaptic changes in TAH plasticity, it is possible to enhance the synaptic representation of temporal input correlations while maintaining the system in a stable learning regime. Indeed, for a given distribution of inputs, the learning efficiency can be optimized.

401 citations
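The key ingredient is a weight-dependent update whose exponent interpolates between additive and multiplicative STDP. The following sketch is one plausible reading of such a rule for a single pre/post spike pair; the constants are illustrative and the exact parameterization in the paper may differ.

```python
import numpy as np

def tah_update(w, dt, lam=0.005, alpha=1.05, mu=0.02, tau=20.0):
    """Weight-dependent temporally asymmetric Hebbian (STDP) update for
    one spike pair with timing difference dt = t_post - t_pre (ms).
    The exponent mu interpolates between additive (mu = 0) and
    multiplicative (mu = 1) rules, which is the stability/sensitivity
    knob the paper introduces.  Constants here are assumptions."""
    if dt > 0:    # pre before post: potentiation, scaled by (1 - w)^mu
        return w + lam * (1.0 - w) ** mu * np.exp(-dt / tau)
    else:         # post before (or with) pre: depression, scaled by w^mu
        return w - lam * alpha * w ** mu * np.exp(dt / tau)
```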


Journal ArticleDOI
TL;DR: This paper investigates two models of accuracy-based learning classifier systems on different types of classification problems, and provides a model of the learning complexity of LCS based on the representative examples given to the system.
Abstract: Recently, Learning Classifier Systems (LCS) and particularly XCS have arisen as promising methods for classification tasks and data mining. This paper investigates two models of accuracy-based learning classifier systems on different types of classification problems. Departing from XCS, we analyze the evolution of a complete action map as a knowledge representation. We propose an alternative, UCS, which evolves a best action map more efficiently. We also investigate how the fitness pressure guides the search towards accurate classifiers. While XCS bases fitness on a reinforcement learning scheme, UCS defines fitness from a supervised learning scheme. We find significant differences in how the fitness pressure leads towards accuracy, and suggest the use of a supervised approach especially for multi-class problems and problems with unbalanced classes. We also investigate the complexity factors which arise in each type of accuracy-based LCS. We provide a model of the learning complexity of LCS which is based on the representative examples given to the system. The results and observations are also extended to a set of real world classification problems, where accuracy-based LCS are shown to perform competitively with respect to other learning algorithms. The work presents an extended analysis of accuracy-based LCS, gives insight into the understanding of the LCS dynamics, and suggests open issues for further improvement of LCS on classification tasks.

339 citations
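The core difference between the two systems can be made concrete: UCS computes a rule's accuracy directly from supervised feedback and raises it to a power to sharpen the pressure toward accurate classifiers. A minimal sketch with assumed field names:

```python
def ucs_fitness(classifier, nu=10):
    """Supervised fitness in the spirit of UCS: accuracy is the fraction
    of matched training examples the rule classified correctly, and
    fitness is accuracy raised to the power nu to strengthen the
    pressure toward accurate rules.  Field names are illustrative."""
    accuracy = classifier["correct"] / max(classifier["experience"], 1)
    return accuracy ** nu
```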


Proceedings ArticleDOI
18 Jun 2003
TL;DR: This paper proposes to use adaptive hidden Markov models (HMM) to perform video-based face recognition and shows that the proposed algorithm results in better performance than using majority voting of image-based recognition results.
Abstract: While traditional face recognition is typically based on still images, face recognition from video sequences has become popular. In this paper, we propose to use adaptive hidden Markov models (HMM) to perform video-based face recognition. During the training process, the statistics of training video sequences of each subject, and the temporal dynamics, are learned by an HMM. During the recognition process, the temporal characteristics of the test video sequence are analyzed over time by the HMM corresponding to each subject. The likelihood scores provided by the HMMs are compared, and the highest score provides the identity of the test video sequence. Furthermore, with unsupervised learning, each HMM is adapted with the test video sequence, which results in better modeling over time. Based on extensive experiments with various databases, we show that the proposed algorithm results in better performance than using majority voting of image-based recognition results.

315 citations
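Stripped of the adaptation step, the recognition scheme is: one HMM per subject, and the test sequence is assigned to the subject whose HMM scores it highest. A sketch using the third-party `hmmlearn` package (assumed available; feature extraction and the paper's adaptive update are omitted):

```python
from hmmlearn.hmm import GaussianHMM  # assumed available; any HMM library works

def identify(test_seq, train_seqs):
    """Video-based recognition as the paper describes it: fit one HMM on
    each subject's training feature sequence (shape (T, d)), then pick
    the subject whose HMM gives the test sequence the highest
    log-likelihood.  The number of states is an illustrative choice."""
    models = {subject: GaussianHMM(n_components=3).fit(seq)
              for subject, seq in train_seqs.items()}
    scores = {subject: m.score(test_seq) for subject, m in models.items()}
    return max(scores, key=scores.get)
```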


Book ChapterDOI
TL;DR: This chapter extends the stability-based validation of cluster structure and proposes stability as a figure of merit for comparing clustering solutions, thus helping to choose the data representation, similarity measure, and number of clusters.
Abstract: Clustering is one of the most commonly used tools in the analysis of gene expression data (1, 2). Its use in grouping genes is based on the premise that co-expression is a result of co-regulation. It is thus a preliminary step in extracting gene networks and inference of gene function (3, 4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3, 5, 6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in the chapter by Zhao and Karypis (see also (9)). While we focus on gene expression data in this chapter, the methodology presented here is applicable to other types of data as well. Clustering is a form of unsupervised learning, i.e. no information on the class variable is assumed, and the objective is to find the "natural" groups in the data. However, most clustering algorithms generate a clustering even if the data has no inherent cluster structure, so external validation tools are required. Given a set of partitions of the data into an increasing number of clusters (e.g. by a hierarchical clustering algorithm, or k-means), such a validation tool will tell the user the number of clusters in the data (if any). Many methods have been proposed in the literature to address this problem (10–15). Recent studies have shown the advantages of sampling-based methods (12, 14). These methods are based on the idea that when a partition has captured the structure in the data, this partition should be stable with respect to perturbation of the data. Bittner et al. (16) used a similar approach to validate clusters representing gene expression of melanoma patients. The emergence of cluster structure depends on several choices: data representation and normalization, the choice of a similarity measure and clustering algorithm. In this chapter we extend the stability-based validation of cluster structure, and propose stability as a figure of merit that is useful for comparing clustering solutions, thus helping in making these choices. We use this framework to demonstrate the ability of Principal Component Analysis (PCA) to extract features relevant to the cluster structure. We use stability as a tool for simultaneously choosing the number of principal components and the number of clusters; we compare the performance of different similarity measures and normalization schemes. The approach is demonstrated through a case study of yeast gene expression data from Eisen et al. (1). For yeast, a functional classification of a large number of genes is known, and we use this classification for validating the results produced by clustering. A method for comparing clustering solutions specifically applicable to gene expression data was introduced in (17). However, it cannot be used to choose the number of clusters, and is not directly applicable in choosing the number of principal components. The results of clustering are easily corrupted by the addition of noise: even a few noisy features can mask an otherwise clear cluster structure.

288 citations
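A sampling-based stability score of the kind this chapter builds on can be computed by clustering pairs of random subsamples and measuring how well the two partitions agree on the shared points. The sketch below uses k-means and the adjusted Rand index as common stand-ins for the chapter's specific choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability(X, k, n_pairs=20, frac=0.8, seed=0):
    """Average agreement between clusterings of paired subsamples; a
    high score suggests k captures real structure.  The subsample
    fraction and the ARI agreement measure are common defaults, not
    the chapter's own prescriptions."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = []
    for _ in range(n_pairs):
        a = rng.choice(n, int(frac * n), replace=False)
        b = rng.choice(n, int(frac * n), replace=False)
        shared = np.intersect1d(a, b)
        la = KMeans(n_clusters=k, n_init=10).fit(X[a]).predict(X[shared])
        lb = KMeans(n_clusters=k, n_init=10).fit(X[b]).predict(X[shared])
        scores.append(adjusted_rand_score(la, lb))
    return float(np.mean(scores))
```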


Proceedings ArticleDOI
13 Oct 2003
TL;DR: A new technique for training visual detectors which requires only a small quantity of labeled data, and then uses unlabeled data to improve performance over time is described, based on the cotraining framework of Blum and Mitchell.
Abstract: One significant challenge in the construction of visual detection systems is the acquisition of sufficient labeled data. We describe a new technique for training visual detectors which requires only a small quantity of labeled data, and then uses unlabeled data to improve performance over time. Unsupervised improvement is based on the cotraining framework of Blum and Mitchell, in which two disparate classifiers are trained simultaneously. Unlabeled examples which are confidently labeled by one classifier are added, with labels, to the training set of the other classifier. Experiments are presented on the realistic task of automobile detection in roadway surveillance video. In this application, cotraining reduces the false positive rate by a factor of 2 to 11 from the classifier trained with labeled data alone.
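A generic co-training loop in the Blum-Mitchell style looks like this; the feature views, base classifier, and batch size below are placeholders, not the paper's detector-specific choices:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def cotrain(X1, X2, y, rounds=10, per_view=5):
    """Co-training sketch: classifiers on two disjoint feature views
    take turns adding their most confidently labeled unlabeled
    examples (with predicted labels) to the shared labeled pool."""
    y = y.copy()                                   # -1 marks unlabeled
    for _ in range(rounds):
        for view in (X1, X2):
            lab = y != -1
            unlab = np.flatnonzero(~lab)
            if unlab.size == 0:
                return y
            clf = GaussianNB().fit(view[lab], y[lab])
            proba = clf.predict_proba(view[unlab])
            conf = proba.max(axis=1)
            pick = np.argsort(conf)[-per_view:]    # most confident examples
            y[unlab[pick]] = clf.classes_[proba.argmax(axis=1)[pick]]
    return y
```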

Journal ArticleDOI
TL;DR: A hierarchical "customized-queries" approach (CQA) to content-based image retrieval, together with a new feature selection algorithm called FSSEM (feature subset selection using expectation-maximization clustering), radically improves retrieval precision over the single feature vector approach, and an evaluation trial shows that the system using CQA retrieval doubled the doctors' diagnostic accuracy.
Abstract: This paper describes a new hierarchical approach to content-based image retrieval called the "customized-queries" approach (CQA). Contrary to the single feature vector approach which tries to classify the query and retrieve similar images in one step, CQA uses multiple feature sets and a two-step approach to retrieval. The first step classifies the query according to the class labels of the images using the features that best discriminate the classes. The second step then retrieves the most similar images within the predicted class using the features customized to distinguish "subclasses" within that class. Needing to find the customized feature subset for each class led us to investigate feature selection for unsupervised learning. As a result, we developed a new algorithm called FSSEM (feature subset selection using expectation-maximization clustering). We applied our approach to a database of high resolution computed tomography lung images and show that CQA radically improves the retrieval precision over the single feature vector approach. To determine whether our CBIR system is helpful to physicians, we conducted an evaluation trial with eight radiologists. The results show that our system using CQA retrieval doubled the doctors' diagnostic accuracy.

Book ChapterDOI
03 Dec 2003
TL;DR: This work proposes applying the unsupervised Hebbian algorithm to nonlinear units for training FCMs; with the proposed learning procedure, the FCM modifies its fuzzy causal web as causal patterns change and as experts update their causal knowledge.
Abstract: Fuzzy Cognitive Map (FCM) is a soft computing technique for modeling systems. It combines synergistically the theories of neural networks and fuzzy logic. The methodology of developing FCMs is easily adaptable but relies on human experience and knowledge, and thus FCMs exhibit weaknesses and dependence on human experts. The critical dependence on the expert’s opinion and knowledge, and the potential convergence to undesired steady states are deficiencies of FCMs. In order to overcome these deficiencies and improve the efficiency and robustness of FCM a possible solution is the utilization of learning methods. This research work proposes the utilization of the unsupervised Hebbian algorithm to nonlinear units for training FCMs. Using the proposed learning procedure, the FCM modifies its fuzzy causal web as causal patterns change and as experts update their causal knowledge.
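One plausible reading of such a training loop: propagate concept activations through the map, then apply a Hebbian update with decay to the existing causal links, keeping weights in [-1, 1]. The sketch below is an assumption-laden illustration; the paper's nonlinear Hebbian rule differs in detail, and all constants are mine.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

def train_fcm(W, A0, eta=0.01, gamma=0.98, steps=100):
    """Hebbian-style FCM training sketch.  W[j, i] is the causal weight
    from concept j to concept i, A0 the initial concept activations.
    Only nonzero links (the existing causal web) are adapted."""
    A = A0.copy()
    mask = W != 0                          # adapt only existing causal links
    for _ in range(steps):
        A = sigmoid(A + W.T @ A)           # FCM activation update
        dW = eta * np.outer(A, A) - (1.0 - gamma) * W   # Hebb term + decay
        W = np.where(mask, np.clip(W + dW, -1.0, 1.0), W)
    return W, A
```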

Journal ArticleDOI
TL;DR: It is suggested that the phasic and tonic components of dopamine neuron firing can encode the signal required for meta-learning of reinforcement learning.

Proceedings Article
21 Aug 2003
TL;DR: This paper analyzes the performance of semi-supervised learning of mixture models and shows that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error.
Abstract: This paper analyzes the performance of semi-supervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. We present a mathematical analysis of this "degradation" phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data. We discuss the impact of these theoretical results to practical situations.

Journal ArticleDOI
TL;DR: An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts automatically from unlabeled training data is presented.
Abstract: An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts (a moving human body in our examples) automatically from unlabeled training data is presented. The training data include both useful "foreground" features as well as features that arise from irrelevant background clutter - the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs which allow for fast detection. To learn the model structure as well as model parameters, an EM-like algorithm is developed where the labeling of the data (part assignments) is treated as hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of our algorithm is demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and testing the learned models on a variety of sequences.

Book
21 Aug 2003
TL;DR: This book develops the Neural Abstraction Pyramid architecture together with unsupervised and supervised learning methods for it, and applies the approach to the recognition of meter values, binarization of matrix codes, iterative image reconstruction, and face localization.
Abstract: I. Theory: Neurobiological Background; Related Work; Neural Abstraction Pyramid Architecture; Unsupervised Learning; Supervised Learning. II. Applications: Recognition of Meter Values; Binarization of Matrix Codes; Learning Iterative Image Reconstruction; Face Localization; Summary and Conclusions.

Journal ArticleDOI
Heiko Wersing, Edgar Körner
TL;DR: This work proposes a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network.
Abstract: There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows determining more efficiently the basis representatives, which achieve a sparse decomposition of the input. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.

Proceedings ArticleDOI
21 Oct 2003
TL;DR: The results indicate the power of the method to determine a meaningful user context model while only requiring data from a comfortable physiological sensor device.
Abstract: Context-aware computing describes the situation where a wearable / mobile computer is aware of its user's state and surroundings and modifies its behavior based on this information. We designed, implemented and evaluated a wearable system which can determine typical user context and context transition probabilities online and without external supervision. The system relies on techniques from machine learning, statistical analysis and graph algorithms. It can be used for online classification and prediction. Our results indicate the power of our method to determine a meaningful user context model while only requiring data from a comfortable physiological sensor device.
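The offline skeleton of such a system is short: cluster the sensor feature stream into typical contexts, then count transitions between successive context labels. The sketch below substitutes batch k-means for the paper's online method, so it is illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans

def context_model(features, n_contexts=5):
    """Cluster a (T, d) sensor feature stream into typical contexts and
    estimate the context transition probability matrix from the
    resulting label sequence.  The clusterer and the number of
    contexts are stand-ins for the paper's online procedure."""
    labels = KMeans(n_clusters=n_contexts, n_init=10).fit_predict(features)
    T = np.zeros((n_contexts, n_contexts))
    for a, b in zip(labels[:-1], labels[1:]):
        T[a, b] += 1
    # Row-normalize counts into probabilities (guard empty rows).
    T = T / np.maximum(T.sum(axis=1, keepdims=True), 1.0)
    return labels, T
```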

Book ChapterDOI
21 Feb 2003
TL;DR: This chapter reviews the main clustering methods used for analyzing chemical structures pertinent to pharmaceutical discovery; clustering is used increasingly in preliminary analyses of large data sets of medium and high dimensionality as a method of selection, diversity analysis, and data reduction.
Abstract: Clustering is a data analysis technique that, when applied to a set of heterogeneous items, identifies homogeneous subgroups as defined by a given model or measure of similarity. Of the many uses of clustering, a prime motivation for the increasing interest in clustering methods is their use in the selection and design of combinatorial libraries of chemical structures pertinent to pharmaceutical discovery. One feature of clustering is that the process is unsupervised, that is, there is no predefined grouping that the clustering seeks to reproduce. In contrast to supervised learning, where the task is to establish relationships between given inputs and outputs to enable prediction of the output from new inputs, in unsupervised learning only the inputs are available and the task is to reveal aspects of the underlying distribution of the input data. Clustering is thus complemented by the related supervised process of classification, in which items are assigned labels applied to predefined groups: examples include recursive partitioning, naive Bayesian analysis, and K nearest-neighbor selection. Clustering is a technique for exploratory data analysis and is used increasingly in preliminary analyses of large data sets of medium and high dimensionality as a method of selection, diversity analysis, and data reduction. This chapter reviews the main clustering methods that are used for analyzing chemical structures pertinent to pharmaceutical discovery.

Proceedings Article
21 Aug 2003
TL;DR: It is argued that the use of SVMs, particularly in combination with the kernel trick, can make it easier to apply reinforcement learning as an "out-of-the-box" technique, without extensive feature engineering.
Abstract: The basic tools of machine learning appear in the inner loop of most reinforcement learning algorithms, typically in the form of Monte Carlo methods or function approximation techniques. To a large extent, however, current reinforcement learning algorithms draw upon machine learning techniques that are at least ten years old and, with a few exceptions, very little has been done to exploit recent advances in classification learning for the purposes of reinforcement learning. We use a variant of approximate policy iteration based on rollouts that allows us to use a pure classification learner, such as a support vector machine (SVM), in the inner loop of the algorithm. We argue that the use of SVMs, particularly in combination with the kernel trick, can make it easier to apply reinforcement learning as an "out-of-the-box" technique, without extensive feature engineering. Our approach opens the door to modern classification methods, but does not preclude the use of classical methods. We present experimental results in the pendulum balancing and bicycle riding domains using both SVMs and neural networks for classifiers.
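One iteration of the rollout-based scheme can be sketched as follows: estimate Q(s, a) for a set of sampled states by Monte Carlo rollouts of the current policy, then train an SVM to imitate the greedy action. The `env.step(state, action) -> (next_state, reward)` interface and all constants are assumptions for illustration, not the paper's API.

```python
import numpy as np
from sklearn.svm import SVC

def rollout_policy_iteration(env, states, actions, policy,
                             gamma=0.95, n_rollouts=10, horizon=50):
    """One step of approximate policy iteration via rollouts: the
    returned SVM classifier maps a state to the index of the greedy
    action and serves as the improved policy."""
    X, y = [], []
    for s in states:
        q = []
        for a in actions:
            returns = []
            for _ in range(n_rollouts):
                state, reward = env.step(s, a)        # try action a once
                total = reward
                for t in range(1, horizon):           # then follow the policy
                    state, reward = env.step(state, policy(state))
                    total += gamma ** t * reward
                returns.append(total)
            q.append(np.mean(returns))
        X.append(s)
        y.append(int(np.argmax(q)))                   # greedy action index
    return SVC(kernel="rbf").fit(np.array(X), np.array(y))
```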

Proceedings ArticleDOI
Lin Liao, Dieter Fox, Jeffrey Hightower, Henry Kautz, Dirk Schulz
08 Dec 2003
TL;DR: This paper proposes a novel approach to tracking moving objects and their identity using noisy, sparse information collected by id-sensors such as infrared and ultrasound badge systems and demonstrates that EM-based learning of behavior patterns increases the tracking performance and provides valuable information for high-level behavior recognition.
Abstract: Tracking the activity of people in indoor environments has gained considerable attention in the robotics community in recent years. Most of the existing approaches are based on sensors which make it possible to determine the locations of people accurately but provide no means of distinguishing between different persons. In this paper we propose a novel approach to tracking moving objects and their identity using noisy, sparse information collected by id-sensors such as infrared and ultrasound badge systems. The key idea of our approach is to use particle filters to estimate the locations of people on the Voronoi graph of the environment. By restricting particles to a graph, we make use of the inherent structure of indoor environments. The approach has two key advantages. First, it is by far more efficient and robust than unconstrained particle filters. Second, the Voronoi graph provides a natural discretization of human motion, which allows us to apply unsupervised learning techniques to derive typical motion patterns of the people in the environment. Experiments using a robot to collect ground-truth data indicate the superior performance of Voronoi tracking. Furthermore, we demonstrate that EM-based learning of behavior patterns increases the tracking performance and provides valuable information for high-level behavior recognition.

Proceedings ArticleDOI
07 Jul 2003
TL;DR: Experimental results show that the proposed competition learning approach to coreference resolution, which adopts a twin-candidate model and applies a candidate filter to reduce computational cost and data noise during training and resolution, outperforms approaches based on the single-candidate model.
Abstract: In this paper we propose a competition learning approach to coreference resolution. Traditionally, supervised machine learning approaches adopt the single-candidate model. Nevertheless the preference relationship between the antecedent candidates cannot be determined accurately in this model. By contrast, our approach adopts a twin-candidate learning model. Such a model can present the competition criterion for antecedent candidates reliably, and ensure that the most preferred candidate is selected. Furthermore, our approach applies a candidate filter to reduce the computational cost and data noise during training and resolution. The experimental results on the MUC-6 and MUC-7 data sets show that our approach can outperform those based on the single-candidate model.

Journal ArticleDOI
01 Feb 2003
TL;DR: A neural fuzzy system with mixed coarse learning and fine learning phases is proposed, which is able to perform collision-free navigation and a new learning method using a modification of Sutton and Barto's model is proposed to strengthen the exploration.
Abstract: Fuzzy logic systems are promising for efficient obstacle avoidance. However, it is difficult to maintain the correctness, consistency, and completeness of a fuzzy rule base constructed and tuned by a human expert. A reinforcement learning method is capable of learning the fuzzy rules automatically. However, it incurs a heavy learning phase and may result in an insufficiently learned rule base due to the curse of dimensionality. In this paper, we propose a neural fuzzy system with mixed coarse learning and fine learning phases. In the first phase, a supervised learning method is used to determine the membership functions for input and output variables simultaneously. After sufficient training, fine learning is applied which employs reinforcement learning algorithm to fine-tune the membership functions for output variables. For sufficient learning, a new learning method using a modification of Sutton and Barto's model is proposed to strengthen the exploration. Through this two-step tuning approach, the mobile robot is able to perform collision-free navigation. To deal with the difficulty of acquiring a large amount of training data with high consistency for supervised learning, we develop a virtual environment (VE) simulator, which is able to provide desktop virtual environment (DVE) and immersive virtual environment (IVE) visualization. Through operating a mobile robot in the virtual environment (DVE/IVE) by a skilled human operator, training data are readily obtained and used to train the neural fuzzy system.

Proceedings Article
09 Dec 2003
TL;DR: This paper proposes three theoretical methods for taking into account this distribution P(x) for regularization and provides links to existing graph-based semi-supervised learning algorithms.
Abstract: We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

Journal ArticleDOI
TL;DR: The use of a multiobjective EA (NSGA-II) enables the discovery of small gene subsets that correctly classify 100% or near 100% of samples on three cancer data sets, and a prediction strength threshold is introduced for determining a sample's membership in one class or the other.
Abstract: In the area of bioinformatics, the identification of gene subsets responsible for classifying available disease samples into two or more of their variants is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-means clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). Such problems can also be posed as optimization problems of minimizing gene subset size to achieve reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although there exist a few applications of evolutionary algorithms (EAs) for this task, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. In contrast to past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover smaller gene subsets (of size four or five) that correctly classify 100% or near 100% of samples for three cancer data sets (Leukemia, Lymphoma, and Colon). We have also extended the NSGA-II to obtain multiple non-dominated solutions, discovering as many as 352 different three-gene combinations providing 100% correct classification on the Leukemia data. In order to have further confidence in the identification task, we have also introduced a prediction strength threshold for determining a sample's membership in one class or the other. All simulation results show consistent gene subset identifications on three disease data sets and exhibit the flexibility and efficacy of using a multiobjective EA for the gene subset identification task.

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A methodology for feature selection in unsupervised learning makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters are used to guide the search toward the more discriminant features and the best number of clusters.
Abstract: In this paper a methodology for feature selection in unsupervised learning is proposed. It makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters are used to guide the search towards the more discriminant features and the best number of clusters. The proposed strategy is evaluated using two synthetic data sets and then applied to handwritten month word recognition. Comprehensive experiments demonstrate the feasibility and efficiency of the proposed methodology.

Proceedings ArticleDOI
01 Jan 2003
TL;DR: Two useful extensions of the incremental SVM are presented, which enable application of the online paradigm to unsupervised learning and can be used in large-scale classification problems to limit the memory requirements for storing the kernel matrix.
Abstract: The paper presents two useful extensions of the incremental SVM in the context of online learning. An online support vector data description algorithm enables application of the online paradigm to unsupervised learning. Furthermore, online learning can be used in large-scale classification problems to limit the memory requirements for storage of the kernel matrix. The proposed algorithms are evaluated on the task of online monitoring of EEG data, and on the classification task of learning the USPS dataset with an a priori chosen working set size.

Proceedings ArticleDOI
16 Jun 2003
TL;DR: A shape learning algorithm is described, together with a general technique for "teaching" the algorithm to identify new or hidden classifications that are relevant to many engineering applications, allowing great flexibility in search and data mining of engineering data.
Abstract: This paper describes a new approach to automate the classification of solid models using machine learning techniques. Existing approaches, based on group technology, fixed matching algorithms or pre-defined feature sets, impose a priori categorization schemes on engineering data or require significant human labeling of design data. This paper describes a shape learning algorithm and a general technique for "teaching" the algorithm to identify new or hidden classifications that are relevant in many engineering applications. In this way, the core shape learning algorithm can be used to find a wide variety of model classifications based on user input and training data. This allows for great flexibility in search and data mining of engineering data.

Proceedings ArticleDOI
21 Jul 2003
TL;DR: Methods that allow unsupervised learning of the model from trajectory data derived from automatic visual surveillance cameras are illustrated and the benefits of such a model in a visual surveillance system are discussed.
Abstract: The paper proposes an activity-based semantic model for a scene under visual surveillance. It illustrates methods that allow unsupervised learning of the model from trajectory data derived from automatic visual surveillance cameras. Results are shown for each method. Finally, the benefits of such a model in a visual surveillance system are discussed.