scispace - formally typeset
Search or ask a question

Showing papers on "Unsupervised learning published in 2005"


Book
23 Nov 2005
TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Abstract: A comprehensive and self-contained introduction to Gaussian processes, which provide a principled, practical, probabilistic approach to learning in kernel machines. Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

11,357 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.
Abstract: We propose a novel approach to learn and recognize natural scene categories. Unlike previous work, it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on a large set of 13 categories of complex scenes.

3,920 citations


Book
01 Dec 2005
TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and includes detailed algorithms for supervised-learning problem for both regression and classification.
Abstract: Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

2,732 citations


Proceedings Article
05 Dec 2005
TL;DR: This paper proposes a "filter" method for feature selection which is independent of any learning algorithm, based on the observation that, in many real world classification problems, data from the same class are often close to each other.
Abstract: In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. And, almost all of previous unsupervised feature selection methods are "wrapper" techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a "filter" method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its power of locality preserving, or, Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm.

1,817 citations


Journal ArticleDOI
TL;DR: This paper presents a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data, and algorithms for structural learning will be proposed, and computational issues will be investigated.
Abstract: One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods are proposed, at the current stage, we still don't have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semi-supervised learning. Specifically we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, algorithms for structural learning will be proposed, and computational issues will be investigated. Experiments will be given to demonstrate the effectiveness of the proposed algorithms in the semi-supervised learning setting.

1,484 citations


Journal ArticleDOI
TL;DR: Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.
Abstract: In many practical data mining applications, such as Web page classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. In this paper, a new co-training style semi-supervised learning algorithm, named tri-training, is proposed. This algorithm generates three classifiers from the original labeled example set. These classifiers are then refined using unlabeled examples in the tri-training process. In detail, in each round of tri-training, an unlabeled example is labeled for a classifier if the other two classifiers agree on the labeling, under certain conditions. Since tri-training neither requires the instance space to be described with sufficient and redundant views nor does it put any constraints on the supervised learning algorithm, its applicability is broader than that of previous co-training style algorithms. Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.

1,067 citations


Journal ArticleDOI
TL;DR: A hybrid learning algorithm is proposed which uses the differential evolutionary algorithm to select the input weights and Moore-Penrose (MP) generalized inverse to analytically determine the output weights.

734 citations


01 Jan 2005
TL;DR: A series of novel semi-supervised learning approaches arising from a graph representation, where labeled and unlabeled instances are represented as vertices, and edges encode the similarity between instances are presented.
Abstract: In traditional machine learning approaches to classification, one uses only a labeled set to train the classifier. Labeled instances however are often difficult, expensive, or time consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile unlabeled data may be relatively easy to collect, but there has been few ways to use them. Semi-supervised learning addresses this problem by using large amount of unlabeled data, together with the labeled data, to build better classifiers. Because semi-supervised learning requires less human effort and gives higher accuracy, it is of great interest both in theory and in practice. We present a series of novel semi-supervised learning approaches arising from a graph representation, where labeled and unlabeled instances are represented as vertices, and edges encode the similarity between instances. They address the following questions: How to use unlabeled data? (label propagation); What is the probabilistic interpretation? (Gaussian fields and harmonic functions); What if we can choose labeled data? (active learning); How to construct good graphs? (hyperparameter learning); How to work with kernel machines like SVM? (graph kernels); How to handle complex data like sequences? (kernel conditional random fields); How to handle scalability and induction? (harmonic mixtures). An extensive literature review is included at the end.

707 citations


Journal ArticleDOI
TL;DR: The key idea is to view clustering as a supervised classification problem, in which the “true” class labels are estimated, and the resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well.
Abstract: This article proposes a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the “true” class labels. The resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well. In the process, we develop novel notions of bias and variance for unlabeled data. Prediction strength performs well in simulation studies, and we apply it to clusters of breast cancer samples from a DNA microarray study. Finally, some consistency properties of the method are established.

594 citations


Journal ArticleDOI
TL;DR: The results show that temporal codes may be a key to understanding the phenomenal processing speed achieved by the visual system and that STDP can lead to fast and selective responses.
Abstract: Spike timing dependent plasticity (STDP) is a learning rule that modifies synaptic strength as a function of the relative timing of pre- and postsynaptic spikes. When a neuron is repeatedly presented with similar inputs, STDP is known to have the effect of concentrating high synaptic weights on afferents that systematically fire early, while postsynaptic spike latencies decrease. Here we use this learning rule in an asynchronous feedforward spiking neural network that mimics the ventral visual pathway and shows that when the network is presented with natural images, selectivity to intermediate-complexity visual features emerges. Those features, which correspond to prototypical patterns that are both salient and consistently present in the images, are highly informative and enable robust object recognition, as demonstrated on various classification tasks. Taken together, these results show that temporal codes may be a key to understanding the phenomenal processing speed achieved by the visual system and that STDP can lead to fast and selective responses.

550 citations


Proceedings ArticleDOI
15 Nov 2005
TL;DR: This work proposes a novel method for traffic classification and application identification using an unsupervised machine learning technique that uses feature selection to find an optimal feature set and determine the influence of different features in traffic flows.
Abstract: The dynamic classification and identification of network applications responsible for network traffic flows offers substantial benefits to a number of key areas in IP network engineering, management and surveillance. Currently such classifications rely on selected packet header fields (e.g. port numbers) or application layer protocol decoding. These methods have a number of shortfalls e.g. many applications can use unpredictable port numbers and protocol decoding requires a high amount of computing resources or is simply infeasible in case protocols are unknown or encrypted. We propose a novel method for traffic classification and application identification using an unsupervised machine learning technique. Flows are automatically classified based on statistical flow characteristics. We evaluate the efficiency of our approach using data from several traffic traces collected at different locations of the Internet. We use feature selection to find an optimal feature set and determine the influence of different features

Proceedings ArticleDOI
07 Aug 2005
TL;DR: This paper constructs a family of data-dependent norms on Reproducing Kernel Hilbert Spaces (RKHS) that allow the structure of the RKHS to reflect the underlying geometry of the data.
Abstract: Due to its occurrence in engineering domains and implications for natural learning, the problem of utilizing unlabeled data is attracting increasing attention in machine learning. A large body of recent literature has focussed on the transductive setting where labels of unlabeled examples are estimated by learning a function defined only over the point cloud data. In a truly semi-supervised setting however, a learning machine has access to labeled and unlabeled examples and must make predictions on data points never encountered before. In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. We construct a family of data-dependent norms on Reproducing Kernel Hilbert Spaces (RKHS). These norms allow us to warp the structure of the RKHS to reflect the underlying geometry of the data. We derive explicit formulas for the corresponding new kernels. Our approach demonstrates state of the art performance on a variety of classification tasks.

Journal ArticleDOI
TL;DR: This paper presents an electricity consumer characterization framework based on a knowledge discovery in databases (KDD) procedure, supported by data mining techniques, applied on the different stages of the process.
Abstract: This paper presents an electricity consumer characterization framework based on a knowledge discovery in databases (KDD) procedure, supported by data mining (DM) techniques, applied on the different stages of the process. The core of this framework is a data mining model based on a combination of unsupervised and supervised learning techniques. Two main modules compose this framework: the load profiling module and the classification module. The load profiling module creates a set of consumer classes using a clustering operation and the representative load profiles for each class. The classification module uses this knowledge to build a classification model able to assign different consumers to the existing classes. The quality of this framework is illustrated with a case study concerning a real database of LV consumers from the Portuguese distribution company.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: An approach in which supervised learning is first used to estimate depths from single monocular images, which is able to learn monocular vision cues that accurately estimate the relative depths of obstacles in a scene is presented.
Abstract: We consider the task of driving a remote control car at high speeds through unstructured outdoor environments. We present an approach in which supervised learning is first used to estimate depths from single monocular images. The learning algorithm can be trained either on real camera images labeled with ground-truth distances to the closest obstacles, or on a training set consisting of synthetic graphics images. The resulting algorithm is able to learn monocular vision cues that accurately estimate the relative depths of obstacles in a scene. Reinforcement learning/policy search is then applied within a simulator that renders synthetic scenes. This learns a control policy that selects a steering direction as a function of the vision system's output. We present results evaluating the predictive ability of the algorithm both on held out test data, and in actual autonomous driving experiments.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: A SARSA based extension of GPTD is presented, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world-model.
Abstract: Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues, which were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is the issue of stochasticity in the state transitions, and the second is concerned with action selection and policy improvement. We present a new generative model for the value function, deduced from its relation with the discounted return. We derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA based extension of GPTD, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world-model.

ReportDOI
09 Jul 2005
TL;DR: A new active learning paradigm is proposed which reduces not only how many instances the annotator must label, but also how difficult each instance is to annotate, which can vary widely in structured prediction tasks.
Abstract: A common obstacle preventing the rapid deployment of supervised machine learning algorithms is the lack of labeled training data. This is particularly expensive to obtain for structured prediction tasks, where each training instance may have multiple, interacting labels, all of which must be correctly annotated for the instance to be of use to the learner. Traditional active learning addresses this problem by optimizing the order in which the examples are labeled to increase learning efficiency. However, this approach does not consider the difficulty of labeling each example, which can vary widely in structured prediction tasks. For example, the labeling predicted by a partially trained system may be easier to correct for some instances than for others. We propose a new active learning paradigm which reduces not only how many instances the annotator must label, but also how difficult each instance is to annotate. The system also leverages information from partially correct predictions to efficiently solicit annotations from the user. We validate this active learning framework in an interactive information extraction system, reducing the total number of annotation actions by 22%.

Book ChapterDOI
01 Aug 2005
TL;DR: A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria, and demonstrates the different ingredients of the DMP approach in various examples.
Abstract: This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.

Journal ArticleDOI
TL;DR: Results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.
Abstract: In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. Similarities among target images are usually ignored. This paper introduces a new technique, cluster-based retrieval of images by unsupervised learning (CLUE), for improving user interaction with image retrieval systems by fully exploiting the similarity information. CLUE retrieves image clusters by applying a graph-theoretic clustering algorithm to a collection of images in the vicinity of the query. Clustering in CLUE is dynamic. In particular, clusters formed depend on which images are retrieved in response to the query. CLUE can be combined with any real-valued symmetric similarity measure (metric or nonmetric). Thus, it may be embedded in many current CBIR systems, including relevance feedback systems. The performance of an experimental image retrieval system using CLUE is evaluated on a database of around 60,000 images from COREL. Empirical results demonstrate improved performance compared with a CBIR system using the same image similarity measure. In addition, results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.

Journal ArticleDOI
TL;DR: This study investigates temporal slowness as a learning principle for receptive fields using slow feature analysis, a new algorithm to determine functions that extract slowly varying signals from the input data.
Abstract: In this study we investigate temporal slowness as a learning principle for receptive fields using slow feature analysis, a new algorithm to determine functions that extract slowly varying signals from the input data. We find a good qualitative and quantitative match between the set of learned functions trained on image sequences and the population of complex cells in the primary visual cortex (V1). The functions show many properties found also experimentally in complex cells, such as direction selectivity, non-orthogonal inhibition, end-inhibition, and side-inhibition. Our results demonstrate that a single unsupervised learning principle can account for such a rich repertoire of receptive field properties.

Proceedings Article
01 Jan 2005
TL;DR: A review of BSS and ICA, including various algorithms for static and dynamic models and their applications, including several algorithms for dynamic models (convolutive mixtures) incorporating with sparseness or non-negativity constraints is presented.
Abstract: Blind source separation (BSS) and independent component analysis (ICA) are generally based on a wide class of unsupervised learning algorithms and they found potential applications in many areas from engineering to neuroscience. A recent trend in BSS is to consider problems in the framework of matrix factorization or more general signals decomposition with probabilistic generative and tree structured graphical models and exploit a priori knowledge about true nature and structure of latent (hidden) variables or sources such as spatio-temporal decorrelation, statistical independence, sparseness, smoothness or lowest complexity in the sense e.g., of best predictability. The possible goal of such decomposition can be considered as the estimation of sources not necessary statistically independent and parameters of a mixing system or more generally as finding a new reduced or hierarchical and structured representation for the observed (sensor) data that can be interpreted as physically meaningful coding or blind source estimation. The key issue is to find a such transformation or coding (linear or nonlinear) which has true physical meaning and interpretation. We present a review of BSS and ICA, including various algorithms for static and dynamic models and their applications. The paper mainly consists of three parts: (1) BSS algorithms for static models (instantaneous mixtures); (2) extension of BSS and ICA incorporating with sparseness or non-negativity constraints; (3) BSS algorithms for dynamic models (convolutive mixtures).

Journal ArticleDOI
01 Jun 2005
TL;DR: This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data and proposes a scene model that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones.
Abstract: This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data. A scene model is proposed that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones. We present several unsupervised methods that learn these scene elements and present results that show the efficiency of our approach. Finally, we describe how the models can be used to support the interpretation of moving objects in a visual surveillance environment.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A semi-supervised adapted hidden Markov model (HMM) framework is proposed, in which usual event models are first learned from a large amount of (commonly available) training data, while unusualevent models are learned by Bayesian adaptation in an unsupervised manner.
Abstract: We address the problem of temporal unusual event detection. Unusual events are characterized by a number of features (rarity, unexpectedness, and relevance) that limit the application of traditional supervised model-based approaches. We propose a semi-supervised adapted hidden Markov model (HMM) framework, in which usual event models are first learned from a large amount of (commonly available) training data, while unusual event models are learned by Bayesian adaptation in an unsupervised manner. The proposed framework has an iterative structure, which adapts a new unusual event model at each iteration. We show that such a framework can address problems due to the scarcity of training data and the difficulty in pre-defining unusual events. Experiments on audio, visual, and audiovisual data streams illustrate its effectiveness, compared with both supervised and unsupervised baseline methods.

Journal ArticleDOI
TL;DR: A new cluster labeling method for SVC is developed based on some invariant topological properties of a trained kernel radius function that outperforms previously reported labeling techniques.
Abstract: The support vector clustering (SVC) algorithm is a recently emerged unsupervised learning method inspired by support vector machines. One key step involved in the SVC algorithm is the cluster assignment of each data point. A new cluster labeling method for SVC is developed based on some invariant topological properties of a trained kernel radius function. Benchmark results show that the proposed method outperforms previously reported labeling techniques.

Proceedings Article
30 Jul 2005
TL;DR: Experiments show that COREG can effectively exploit unlabeled data to improve regression estimates and is proposed as a co-training style semi-supervised regression algorithm.
Abstract: In many practical machine learning and data mining applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. Previous research mainly focuses on semi-supervised classification. In this paper, a co-training style semi-supervised regression algorithm, i.e. COREG, is proposed. This algorithm uses two k-nearest neighbor regressors with different distance metrics, each of which labels the unlabeled data for the other regressor where the labeling confidence is estimated through consulting the influence of the labeling of unlabeled examples on the labeled ones. Experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.

Journal ArticleDOI
TL;DR: This work shows that this so-called credit assignment problem can be solved by a new role for attention in learning and shows that the new scheme, called attention-gated reinforcement learning (AGREL), is as efficient as supervised learning in classification tasks.
Abstract: Animal learning is associated with changes in the efficacy of connections between neurons. The rules that govern this plasticity can be tested in neural networks. Rules that train neural networks to map stimuli onto outputs are given by supervised learning and reinforcement learning theories. Supervised learning is efficient but biologically implausible. In contrast, reinforcement learning is biologically plausible but comparatively inefficient. It lacks a mechanism that can identify units at early processing levels that play a decisive role in the stimulus-response mapping. Here we show that this so-called credit assignment problem can be solved by a new role for attention in learning. There are two factors in our new learning scheme that determine synaptic plasticity: (1) a reinforcement signal that is homogeneous across the network and depends on the amount of reward obtained after a trial, and (2) an attentional feedback signal from the output layer that limits plasticity to those units at earlier processing levels that are crucial for the stimulus-response mapping. The new scheme is called attention-gated reinforcement learning (AGREL). We show that it is as efficient as supervised learning in classification tasks. AGREL is biologically realistic and integrates the role of feedback connections, attention effects, synaptic plasticity, and reinforcement learning signals into a coherent framework.

Journal ArticleDOI
TL;DR: The proposed method for the selection of salient regions likely to contain objects, based on bottom-up visual attention, can enable one-shot learning of multiple objects from complex scenes, and can strongly improve learning and recognition performance in the presence of large amounts of clutter.

Proceedings ArticleDOI
27 Dec 2005
TL;DR: It is demonstrated that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used.
Abstract: The spherical k-means algorithm, i.e., the k-means algorithm with cosine similarity, is a popular method for clustering high-dimensional text data. In this algorithm, each document as well as each cluster mean is represented as a high-dimensional unit-length vector. However, it has been mainly used in hatch mode. Thus is, each cluster mean vector is updated, only after all document vectors being assigned, as the (normalized) average of all the document vectors assigned to that cluster. This paper investigates an online version of the spherical k-means algorithm based on the well-known winner-take-all competitive learning. In this online algorithm, each cluster centroid is incrementally updated given a document. We demonstrate that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used. We also present heuristics to improve the speed, yet almost without loss of clustering quality.

Proceedings ArticleDOI
Rie Ando1, Tong Zhang1
25 Jun 2005
TL;DR: A novel semi-supervised method that employs a learning paradigm which is to find "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data, which produces performance higher than the previous best results.
Abstract: In machine learning, whether one can build a more accurate classifier by using unlabeled data (semi-supervised learning) is an important issue. Although a number of semi-supervised methods have been proposed, their effectiveness on NLP tasks is not always clear. This paper presents a novel semi-supervised method that employs a learning paradigm which we call structural learning. The idea is to find "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data. By doing so, the common predictive structure shared by the multiple classification problems can be discovered, which can then be used to improve performance on the target problem. The method produces performance higher than the previous best results on CoNLL'00 syntactic chunking and CoNLL'03 named entity chunking (English and German).

Book ChapterDOI
06 Sep 2005
TL;DR: This contribution develops an experimental framework for comparative analysis of unsupervised techniques into a special case of classification, for which training and model selection can be performed by means of ROC analysis.
Abstract: Application and development of specialized machine learning techniques is gaining increasing attention in the intrusion detection community. A variety of learning techniques proposed for different intrusion detection problems can be roughly classified into two broad categories: supervised (classification) and unsupervised (anomaly detection and clustering). In this contribution we develop an experimental framework for comparative analysis of both kinds of learning techniques. In our framework we cast unsupervised techniques into a special case of classification, for which training and model selection can be performed by means of ROC analysis. We then investigate both kinds of learning techniques with respect to their detection accuracy and ability to detect unknown attacks.

Proceedings Article
05 Dec 2005
TL;DR: It is shown that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.