Showing papers on "Unsupervised learning" published in 2005


Book•


Carl Edward Rasmussen¹, Christopher Williams•Institutions (1)

23 Nov 2005

TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.


Abstract: Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.


11,357 citations

Proceedings Article•DOI•

A Bayesian hierarchical model for learning natural scene categories


Li Fei-Fei¹, Pietro Perona¹•Institutions (1)

California Institute of Technology¹

20 Jun 2005

TL;DR: This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.


Abstract: We propose a novel approach to learn and recognize natural scene categories. Unlike previous work, it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on a large set of 13 categories of complex scenes.


3,920 citations

Book•

Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)


Carl Edward Rasmussen, Christopher Williams

01 Dec 2005

TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and includes detailed algorithms for the supervised-learning problem for both regression and classification.


Abstract: Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.


2,732 citations
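
The GP regression setting described in the abstract above can be illustrated with a minimal sketch. It assumes a zero-mean prior, a squared-exponential covariance function, and fixed hyperparameters; the function names `rbf_kernel` and `gp_predict` and all parameter values are illustrative choices, not taken from the book's code.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test.

    Hyperparameters are fixed here (an assumption); in practice they
    would be chosen by model selection.
    """
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)    # train/test covariances
    K_ss = rbf_kernel(x_test, x_test)    # test/test covariances
    L = np.linalg.cholesky(K)            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy data: noiseless samples of sin(x) on [0, 5].
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.0, 2.5, 10.0])
mean, var = gp_predict(x_train, y_train, x_test)
```

Near the training inputs the predictive variance is small, while at `x = 10.0`, far from any data, it returns to roughly the prior variance.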

Proceedings Article•

Laplacian Score for Feature Selection


Xiaofei He¹, Deng Cai², Partha Niyogi¹•Institutions (2)

University of Chicago¹, University of Illinois at Urbana–Champaign²

05 Dec 2005

TL;DR: This paper proposes a "filter" method for feature selection which is independent of any learning algorithm, based on the observation that, in many real world classification problems, data from the same class are often close to each other.


Abstract: In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. Moreover, almost all previous unsupervised feature selection methods are "wrapper" techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a "filter" method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its locality-preserving power, or Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm.


1,817 citations
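
As a rough illustration of the unsupervised variant described above, the sketch below scores each feature by how well it respects a nearest-neighbour graph; a lower score means better locality preservation. The graph construction (symmetric k-nearest-neighbour graph with heat-kernel weights) and the parameters `n_neighbors` and `t` are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def laplacian_score(X, n_neighbors=5, t=1.0):
    """Laplacian Score per feature (lower = better locality preservation).

    Assumed setup: symmetric kNN graph, heat-kernel weights exp(-d^2 / t).
    """
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Column 0 of the sorted indices is the point itself, so skip it.
    nearest = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = True
    adj = adj | adj.T                      # connect if either is a neighbour
    S = np.where(adj, np.exp(-sq / t), 0.0)
    d = S.sum(axis=1)                      # node degrees
    lap = np.diag(d) - S                   # graph Laplacian
    # Remove the degree-weighted mean from every feature.
    F = X - (d @ X) / d.sum()
    num = np.sum(F * (lap @ F), axis=0)    # f^T L f
    den = np.sum(F * (d[:, None] * F), axis=0)  # f^T D f
    return num / den

# Toy data: feature 0 separates two groups, feature 1 is pure noise.
rng = np.random.default_rng(0)
n = 30
informative = np.concatenate([rng.normal(0, 0.5, n), rng.normal(10, 0.5, n)])
noise = rng.normal(0, 1, 2 * n)
X = np.column_stack([informative, noise])
scores = laplacian_score(X)
```

On this toy data the informative feature receives the smaller score, so ranking features by ascending score would select it first.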

Journal Article•DOI•

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data


Rie Kubota Ando, Tong Zhang

01 Dec 2005-Journal of Machine Learning Research

TL;DR: This paper presents a general framework in which the structural learning problem can be formulated and analyzed theoretically, relates it to learning with unlabeled data, proposes algorithms for structural learning, and investigates computational issues.


Abstract: One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods have been proposed, we still do not have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semi-supervised learning. Specifically, we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, algorithms for structural learning will be proposed, and computational issues will be investigated. Experiments will be given to demonstrate the effectiveness of the proposed algorithms in the semi-supervised learning setting.


1,484 citations

Journal Article•DOI•

Tri-training: exploiting unlabeled data using three classifiers


Zhi-Hua Zhou¹, Ming Li¹•Institutions (1)

Nanjing University¹

01 Nov 2005-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.


Abstract: In many practical data mining applications, such as Web page classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. In this paper, a new co-training style semi-supervised learning algorithm, named tri-training, is proposed. This algorithm generates three classifiers from the original labeled example set. These classifiers are then refined using unlabeled examples in the tri-training process. In detail, in each round of tri-training, an unlabeled example is labeled for a classifier if the other two classifiers agree on the labeling, under certain conditions. Since tri-training neither requires the instance space to be described with sufficient and redundant views nor does it put any constraints on the supervised learning algorithm, its applicability is broader than that of previous co-training style algorithms. Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.


1,067 citations
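
The agreement rule described in the abstract above can be sketched in simplified form. This is a single refinement round that deliberately omits the "certain conditions" the abstract mentions; the bootstrap sampling used to obtain three different classifiers, the toy `NearestCentroid` base learner, and the helper names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

class NearestCentroid:
    """Tiny base learner (an assumption; any supervised learner could be used)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def tri_training_round(X_lab, y_lab, X_unlab, rng):
    """One simplified round: classifier i receives the unlabeled examples
    on which the other two classifiers agree, with their agreed label."""
    n = len(X_lab)
    clfs = []
    for _ in range(3):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (assumed)
        clfs.append(NearestCentroid().fit(X_lab[idx], y_lab[idx]))
    preds = [c.predict(X_unlab) for c in clfs]
    refined = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        agree = preds[j] == preds[k]
        X_new = np.vstack([X_lab, X_unlab[agree]])
        y_new = np.concatenate([y_lab, preds[j][agree]])
        refined.append(NearestCentroid().fit(X_new, y_new))
    return refined

def majority_vote(clfs, X):
    """Combine the three classifiers; assumes non-negative integer labels."""
    votes = np.stack([c.predict(X) for c in clfs])
    return np.array([np.bincount(col).argmax() for col in votes.T])

rng = np.random.default_rng(0)

def blobs(n):
    X = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(4, 1, (n, 2))])
    return X, np.repeat([0, 1], n)

X_lab, y_lab = blobs(10)       # few labeled examples
X_unlab, _ = blobs(200)        # many unlabeled examples
X_test, y_test = blobs(100)
clfs = tri_training_round(X_lab, y_lab, X_unlab, rng)
accuracy = np.mean(majority_vote(clfs, X_test) == y_test)
```

A fuller version would repeat the round until no classifier changes and would only accept the newly labeled examples when the conditions referred to in the abstract hold.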

Journal Article•DOI•

Rapid and brief communication: Evolutionary extreme learning machine


Qin-Yu Zhu¹, A. K. Qin¹, Ponnuthurai Nagaratnam Suganthan¹, Guang-Bin Huang¹•Institutions (1)

Nanyang Technological University¹

01 Oct 2005-Pattern Recognition

TL;DR: A hybrid learning algorithm is proposed which uses the differential evolutionary algorithm to select the input weights and Moore-Penrose (MP) generalized inverse to analytically determine the output weights.


734 citations

Semi-supervised learning with graphs


Xiaojin Zhu¹, John Lafferty¹, Ronald Rosenfeld¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 2005

TL;DR: A series of novel semi-supervised learning approaches arising from a graph representation, where labeled and unlabeled instances are represented as vertices, and edges encode the similarity between instances are presented.


Abstract: In traditional machine learning approaches to classification, one uses only a labeled set to train the classifier. Labeled instances, however, are often difficult, expensive, or time consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile unlabeled data may be relatively easy to collect, but there have been few ways to use them. Semi-supervised learning addresses this problem by using large amounts of unlabeled data, together with the labeled data, to build better classifiers. Because semi-supervised learning requires less human effort and gives higher accuracy, it is of great interest both in theory and in practice. We present a series of novel semi-supervised learning approaches arising from a graph representation, where labeled and unlabeled instances are represented as vertices, and edges encode the similarity between instances. They address the following questions: How to use unlabeled data? (label propagation); What is the probabilistic interpretation? (Gaussian fields and harmonic functions); What if we can choose labeled data? (active learning); How to construct good graphs? (hyperparameter learning); How to work with kernel machines like SVM? (graph kernels); How to handle complex data like sequences? (kernel conditional random fields); How to handle scalability and induction? (harmonic mixtures). An extensive literature review is included at the end.


707 citations
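
The graph-based idea in the abstract above (labeled and unlabeled instances as vertices, similarities as edge weights) can be sketched with the harmonic-function solution, in which each unlabeled vertex takes the weighted average of its neighbours' label scores. The fully connected Gaussian-weighted graph, the bandwidth, and the function name `harmonic_labels` are assumptions made for this example.

```python
import numpy as np

def harmonic_labels(W, y_lab, n_classes):
    """Harmonic solution on a weighted graph.

    W: symmetric weight matrix whose first len(y_lab) rows/columns are the
    labeled vertices. Returns one row of class scores per unlabeled vertex.
    """
    l = len(y_lab)
    lap = np.diag(W.sum(axis=1)) - W       # graph Laplacian D - W
    F_l = np.eye(n_classes)[y_lab]         # one-hot labels
    # Solve L_uu F_u = W_ul F_l for the unlabeled block.
    return np.linalg.solve(lap[l:, l:], W[l:, :l] @ F_l)

# Toy data: two blobs with a single labeled point each.
rng = np.random.default_rng(0)
A = rng.normal(0, 0.7, (20, 2))
B = rng.normal(5, 0.7, (20, 2))
X = np.vstack([A[:1], B[:1], A[1:], B[1:]])   # labeled points come first
y_lab = np.array([0, 1])
y_true_unlab = np.array([0] * 19 + [1] * 19)

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
W = np.exp(-sq / 2.0)                      # Gaussian weights, bandwidth 1 (assumed)
np.fill_diagonal(W, 0.0)

F_u = harmonic_labels(W, y_lab, n_classes=2)
pred = F_u.argmax(axis=1)
accuracy = np.mean(pred == y_true_unlab)
```

Each row of `F_u` sums to one, so the scores can be read as soft class memberships for the unlabeled vertices.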

Journal Article•DOI•

Cluster Validation by Prediction Strength


Robert Tibshirani¹, Guenther Walther¹•Institutions (1)

Stanford University¹

01 Sep 2005-Journal of Computational and Graphical Statistics

TL;DR: The key idea is to view clustering as a supervised classification problem, in which the “true” class labels are estimated, and the resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well.


Abstract: This article proposes a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the “true” class labels. The resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well. In the process, we develop novel notions of bias and variance for unlabeled data. Prediction strength performs well in simulation studies, and we apply it to clusters of breast cancer samples from a DNA microarray study. Finally, some consistency properties of the method are established.


594 citations
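
One way to make the idea of treating clustering as a classification problem concrete is sketched below: cluster a training half and a test half separately, classify the test points by the nearest training centroid, and measure how often pairs that share a test cluster also share a predicted class. The simple k-means with farthest-point initialisation, the single train/test split, and the helper names are assumptions for illustration rather than the article's exact procedure.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations; farthest-point init keeps the sketch deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, assign(X, centers)

def prediction_strength(X_train, X_test, k):
    """Worst-case fraction of same-cluster test pairs that the training
    centroids also place together."""
    train_centers, _ = kmeans(X_train, k)
    _, test_labels = kmeans(X_test, k)
    predicted = assign(X_test, train_centers)
    strengths = []
    for j in range(k):
        members = predicted[test_labels == j]
        n = len(members)
        if n < 2:
            continue                      # no pairs to compare
        same = (members[:, None] == members[None, :]).sum() - n
        strengths.append(same / (n * (n - 1)))
    return min(strengths) if strengths else 0.0

# Toy data: two well-separated groups, split into train and test halves.
rng = np.random.default_rng(0)
A = rng.normal(0, 1, (40, 2))
B = rng.normal(8, 1, (40, 2))
X_train = np.vstack([A[:20], B[:20]])
X_test = np.vstack([A[20:], B[20:]])
ps2 = prediction_strength(X_train, X_test, 2)
ps4 = prediction_strength(X_train, X_test, 4)
```

With two genuine groups, the measure is high for `k = 2`; comparing it across several values of `k` is how the number of clusters would be assessed.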

Journal Article•DOI•

Unsupervised Learning of Visual Features through Spike Timing Dependent Plasticity


Timothée Masquelier¹, Simon J. Thorpe¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Jan 2005-PLOS Computational Biology

TL;DR: The results show that temporal codes may be a key to understanding the phenomenal processing speed achieved by the visual system and that STDP can lead to fast and selective responses.


Abstract: Spike timing dependent plasticity (STDP) is a learning rule that modifies synaptic strength as a function of the relative timing of pre- and postsynaptic spikes. When a neuron is repeatedly presented with similar inputs, STDP is known to have the effect of concentrating high synaptic weights on afferents that systematically fire early, while postsynaptic spike latencies decrease. Here we use this learning rule in an asynchronous feedforward spiking neural network that mimics the ventral visual pathway and show that when the network is presented with natural images, selectivity to intermediate-complexity visual features emerges. Those features, which correspond to prototypical patterns that are both salient and consistently present in the images, are highly informative and enable robust object recognition, as demonstrated on various classification tasks. Taken together, these results show that temporal codes may be a key to understanding the phenomenal processing speed achieved by the visual system and that STDP can lead to fast and selective responses.


550 citations

Proceedings Article•DOI•

Automated traffic classification and application identification using machine learning


Sebastian Zander¹, Thuy T. T. Nguyen¹, Grenville Armitage¹•Institutions (1)

Swinburne University of Technology¹

15 Nov 2005

TL;DR: This work proposes a novel method for traffic classification and application identification using an unsupervised machine learning technique that uses feature selection to find an optimal feature set and determine the influence of different features in traffic flows.


Abstract: The dynamic classification and identification of network applications responsible for network traffic flows offers substantial benefits to a number of key areas in IP network engineering, management and surveillance. Currently such classifications rely on selected packet header fields (e.g. port numbers) or application layer protocol decoding. These methods have a number of shortfalls e.g. many applications can use unpredictable port numbers and protocol decoding requires a high amount of computing resources or is simply infeasible in case protocols are unknown or encrypted. We propose a novel method for traffic classification and application identification using an unsupervised machine learning technique. Flows are automatically classified based on statistical flow characteristics. We evaluate the efficiency of our approach using data from several traffic traces collected at different locations of the Internet. We use feature selection to find an optimal feature set and determine the influence of different features


Proceedings Article•DOI•

Beyond the point cloud: from transductive to semi-supervised learning


Vikas Sindhwani¹, Partha Niyogi¹, Mikhail Belkin¹•Institutions (1)

University of Chicago¹

07 Aug 2005

TL;DR: This paper constructs a family of data-dependent norms on Reproducing Kernel Hilbert Spaces (RKHS) that allow the structure of the RKHS to reflect the underlying geometry of the data.


Abstract: Due to its occurrence in engineering domains and implications for natural learning, the problem of utilizing unlabeled data is attracting increasing attention in machine learning. A large body of recent literature has focussed on the transductive setting where labels of unlabeled examples are estimated by learning a function defined only over the point cloud data. In a truly semi-supervised setting however, a learning machine has access to labeled and unlabeled examples and must make predictions on data points never encountered before. In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. We construct a family of data-dependent norms on Reproducing Kernel Hilbert Spaces (RKHS). These norms allow us to warp the structure of the RKHS to reflect the underlying geometry of the data. We derive explicit formulas for the corresponding new kernels. Our approach demonstrates state of the art performance on a variety of classification tasks.


Journal Article•DOI•

An electric energy consumer characterization framework based on data mining techniques


Vera Figueiredo, Fátima Rodrigues¹, Zita Vale¹, Joaquim Borges Gouveia²•Institutions (2)

Polytechnic Institute of Porto¹, University of Aveiro²

02 May 2005-IEEE Transactions on Power Systems

TL;DR: This paper presents an electricity consumer characterization framework based on a knowledge discovery in databases (KDD) procedure, supported by data mining techniques, applied on the different stages of the process.


Abstract: This paper presents an electricity consumer characterization framework based on a knowledge discovery in databases (KDD) procedure, supported by data mining (DM) techniques, applied on the different stages of the process. The core of this framework is a data mining model based on a combination of unsupervised and supervised learning techniques. Two main modules compose this framework: the load profiling module and the classification module. The load profiling module creates a set of consumer classes using a clustering operation and the representative load profiles for each class. The classification module uses this knowledge to build a classification model able to assign different consumers to the existing classes. The quality of this framework is illustrated with a case study concerning a real database of LV consumers from the Portuguese distribution company.


Proceedings Article•DOI•

High speed obstacle avoidance using monocular vision and reinforcement learning


Jeffrey Lawrence Michels¹, Ashutosh Saxena¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

07 Aug 2005

TL;DR: This work presents an approach in which supervised learning is first used to estimate depths from single monocular images; the resulting algorithm learns monocular vision cues that accurately estimate the relative depths of obstacles in a scene.


Abstract: We consider the task of driving a remote control car at high speeds through unstructured outdoor environments. We present an approach in which supervised learning is first used to estimate depths from single monocular images. The learning algorithm can be trained either on real camera images labeled with ground-truth distances to the closest obstacles, or on a training set consisting of synthetic graphics images. The resulting algorithm is able to learn monocular vision cues that accurately estimate the relative depths of obstacles in a scene. Reinforcement learning/policy search is then applied within a simulator that renders synthetic scenes. This learns a control policy that selects a steering direction as a function of the vision system's output. We present results evaluating the predictive ability of the algorithm both on held out test data, and in actual autonomous driving experiments.


Proceedings Article•DOI•

Reinforcement learning with Gaussian processes


Yaakov Engel¹, Shie Mannor², Ron Meir³•Institutions (3)

University of Alberta¹, McGill University², Technion – Israel Institute of Technology³

07 Aug 2005

TL;DR: A SARSA based extension of GPTD is presented, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world-model.


Abstract: Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues, which were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is the issue of stochasticity in the state transitions, and the second is concerned with action selection and policy improvement. We present a new generative model for the value function, deduced from its relation with the discounted return. We derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA based extension of GPTD, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world-model.


Report•DOI•

Reducing labeling effort for structured prediction tasks


Aron Culotta¹, Andrew McCallum¹•Institutions (1)

University of Massachusetts Amherst¹

09 Jul 2005

TL;DR: A new active learning paradigm is proposed which reduces not only how many instances the annotator must label, but also how difficult each instance is to annotate, which can vary widely in structured prediction tasks.


Abstract: A common obstacle preventing the rapid deployment of supervised machine learning algorithms is the lack of labeled training data. This is particularly expensive to obtain for structured prediction tasks, where each training instance may have multiple, interacting labels, all of which must be correctly annotated for the instance to be of use to the learner. Traditional active learning addresses this problem by optimizing the order in which the examples are labeled to increase learning efficiency. However, this approach does not consider the difficulty of labeling each example, which can vary widely in structured prediction tasks. For example, the labeling predicted by a partially trained system may be easier to correct for some instances than for others. We propose a new active learning paradigm which reduces not only how many instances the annotator must label, but also how difficult each instance is to annotate. The system also leverages information from partially correct predictions to efficiently solicit annotations from the user. We validate this active learning framework in an interactive information extraction system, reducing the total number of annotation actions by 22%.


Book Chapter•DOI•

Learning Movement Primitives


Stefan Schaal¹, Jan Peters¹, Jun Nakanishi, Auke Jan Ijspeert¹•Institutions (1)

University of Southern California¹

01 Aug 2005

TL;DR: A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria, and demonstrates the different ingredients of the DMP approach in various examples.


Abstract: This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.


Journal Article•DOI•

CLUE: cluster-based retrieval of images by unsupervised learning


Yixin Chen¹, James Z. Wang², Robert Krovetz•Institutions (2)

University of New Orleans¹, Penn State College of Information Sciences and Technology²

01 Aug 2005-IEEE Transactions on Image Processing

TL;DR: Results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.


Abstract: In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. Similarities among target images are usually ignored. This paper introduces a new technique, cluster-based retrieval of images by unsupervised learning (CLUE), for improving user interaction with image retrieval systems by fully exploiting the similarity information. CLUE retrieves image clusters by applying a graph-theoretic clustering algorithm to a collection of images in the vicinity of the query. Clustering in CLUE is dynamic. In particular, clusters formed depend on which images are retrieved in response to the query. CLUE can be combined with any real-valued symmetric similarity measure (metric or nonmetric). Thus, it may be embedded in many current CBIR systems, including relevance feedback systems. The performance of an experimental image retrieval system using CLUE is evaluated on a database of around 60,000 images from COREL. Empirical results demonstrate improved performance compared with a CBIR system using the same image similarity measure. In addition, results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.


Journal Article•DOI•

Slow feature analysis yields a rich repertoire of complex cell properties.


Pietro Berkes¹, Laurenz Wiskott¹•Institutions (1)

Humboldt University of Berlin¹

01 Jun 2005-Journal of Vision

TL;DR: This study investigates temporal slowness as a learning principle for receptive fields using slow feature analysis, a new algorithm to determine functions that extract slowly varying signals from the input data.


Abstract: In this study we investigate temporal slowness as a learning principle for receptive fields using slow feature analysis, a new algorithm to determine functions that extract slowly varying signals from the input data. We find a good qualitative and quantitative match between the set of learned functions trained on image sequences and the population of complex cells in the primary visual cortex (V1). The functions show many properties found also experimentally in complex cells, such as direction selectivity, non-orthogonal inhibition, end-inhibition, and side-inhibition. Our results demonstrate that a single unsupervised learning principle can account for such a rich repertoire of receptive field properties.
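
The slowness principle mentioned in the abstract above can be illustrated with a purely linear sketch: whiten the input, then pick the direction whose time-differences have the smallest variance. The study itself works with a richer function space on image sequences; the linear restriction, the toy signals, and the function name `linear_sfa` are simplifying assumptions.

```python
import numpy as np

def linear_sfa(X, n_features=1):
    """Linear slow feature analysis on a (time, channels) array.

    Step 1: center and whiten the signals.
    Step 2: find directions minimising the variance of the time-differences.
    """
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ (eigvec / np.sqrt(eigval))    # whitened signals, unit covariance
    dZ = np.diff(Z, axis=0)                # discrete-time derivative
    _, dvec = np.linalg.eigh(np.cov(dZ, rowvar=False))
    # eigh sorts eigenvalues in ascending order, so the first columns are slowest.
    return Z @ dvec[:, :n_features]

# Toy data: two sensors observe mixtures of a slow and a fast sine wave.
t = np.linspace(0, 2 * np.pi, 1000)
slow = np.sin(t)
fast = np.sin(29 * t)
X = np.column_stack([slow + 0.5 * fast, 0.3 * slow - fast])
y = linear_sfa(X, n_features=1)
corr = abs(np.corrcoef(y[:, 0], slow)[0, 1])
```

On this toy mixture the slowest extracted feature matches the slow source up to sign and scale, which is what `corr` measures.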


Proceedings Article•

Blind Source Separation and Independent Component Analysis: A Review


Soo-Young Lee¹•Institutions (1)

Pohang University of Science and Technology¹

01 Jan 2005

TL;DR: A review of BSS and ICA is presented, covering algorithms for static models (instantaneous mixtures), extensions of BSS and ICA incorporating sparseness or non-negativity constraints, and algorithms for dynamic models (convolutive mixtures), together with their applications.


Abstract: Blind source separation (BSS) and independent component analysis (ICA) are generally based on a wide class of unsupervised learning algorithms, and they have found potential applications in many areas from engineering to neuroscience. A recent trend in BSS is to consider problems in the framework of matrix factorization or more general signal decomposition with probabilistic generative and tree-structured graphical models, and to exploit a priori knowledge about the true nature and structure of latent (hidden) variables or sources, such as spatio-temporal decorrelation, statistical independence, sparseness, smoothness, or lowest complexity in the sense of, e.g., best predictability. The possible goal of such decomposition can be considered as the estimation of sources that are not necessarily statistically independent and of the parameters of a mixing system or, more generally, as finding a new reduced or hierarchical and structured representation for the observed (sensor) data that can be interpreted as physically meaningful coding or blind source estimation. The key issue is to find such a transformation or coding (linear or nonlinear) which has true physical meaning and interpretation. We present a review of BSS and ICA, including various algorithms for static and dynamic models and their applications. The paper mainly consists of three parts: (1) BSS algorithms for static models (instantaneous mixtures); (2) extension of BSS and ICA incorporating sparseness or non-negativity constraints; (3) BSS algorithms for dynamic models (convolutive mixtures).


Journal Article•DOI•

Learning semantic scene models from observing activity in visual surveillance


Dimitrios Makris¹, Tim Ellis¹•Institutions (1)

Kingston University¹

01 Jun 2005

TL;DR: This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data and proposes a scene model that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones.


Abstract: This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data. A scene model is proposed that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones. We present several unsupervised methods that learn these scene elements and present results that show the efficiency of our approach. Finally, we describe how the models can be used to support the interpretation of moving objects in a visual surveillance environment.

Proceedings Article•DOI•

Semi-supervised adapted HMMs for unusual event detection

Dong Zhang¹, Daniel Gatica-Perez¹, Samy Bengio¹, Iain McCowan¹•Institutions (1)

Idiap Research Institute¹

20 Jun 2005

TL;DR: A semi-supervised adapted hidden Markov model (HMM) framework is proposed, in which usual event models are first learned from a large amount of (commonly available) training data, while unusual event models are learned by Bayesian adaptation in an unsupervised manner.

Abstract: We address the problem of temporal unusual event detection. Unusual events are characterized by a number of features (rarity, unexpectedness, and relevance) that limit the application of traditional supervised model-based approaches. We propose a semi-supervised adapted hidden Markov model (HMM) framework, in which usual event models are first learned from a large amount of (commonly available) training data, while unusual event models are learned by Bayesian adaptation in an unsupervised manner. The proposed framework has an iterative structure, which adapts a new unusual event model at each iteration. We show that such a framework can address problems due to the scarcity of training data and the difficulty in pre-defining unusual events. Experiments on audio, visual, and audiovisual data streams illustrate its effectiveness, compared with both supervised and unsupervised baseline methods.

Journal Article•DOI•

An improved cluster labeling method for support vector clustering

Jaewook Lee, Daewon Lee

01 Mar 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new cluster labeling method for SVC is developed based on some invariant topological properties of a trained kernel radius function that outperforms previously reported labeling techniques.

Abstract: The support vector clustering (SVC) algorithm is a recently emerged unsupervised learning method inspired by support vector machines. One key step involved in the SVC algorithm is the cluster assignment of each data point. A new cluster labeling method for SVC is developed based on some invariant topological properties of a trained kernel radius function. Benchmark results show that the proposed method outperforms previously reported labeling techniques.

Proceedings Article•

Semi-supervised regression with co-training

Zhi-Hua Zhou¹, Ming Li¹•Institutions (1)

Nanjing University¹

30 Jul 2005

TL;DR: COREG, a co-training style semi-supervised regression algorithm, is proposed, and experiments show that it can effectively exploit unlabeled data to improve regression estimates.

Abstract: In many practical machine learning and data mining applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. Previous research mainly focuses on semi-supervised classification. In this paper, a co-training style semi-supervised regression algorithm, i.e. COREG, is proposed. This algorithm uses two k-nearest neighbor regressors with different distance metrics, each of which labels the unlabeled data for the other regressor where the labeling confidence is estimated through consulting the influence of the labeling of unlabeled examples on the labeled ones. Experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
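
The co-training loop described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the Minkowski orders `p1=2` and `p2=5`, the neighbourhood size `k=3`, the leave-one-out form of the confidence estimate, and all function names are assumptions made for this example. Each regressor pseudo-labels the unlabeled example whose addition most reduces the squared error on that example's labeled neighbours, and hands it to the other regressor.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train, x, k, p):
    """Average target of the k nearest (features, target) pairs in train."""
    nearest = sorted(train, key=lambda t: minkowski(t[0], x, p))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def confidence(train, x, y_hat, k, p):
    """Assumed confidence: drop in squared error on the labeled neighbours
    of x when the pseudo-labeled pair (x, y_hat) is added to the training set."""
    neighbours = sorted(train, key=lambda t: minkowski(t[0], x, p))[:k]
    gain = 0.0
    for xi, yi in neighbours:
        rest = [t for t in train if t[0] is not xi]  # leave the neighbour out
        before = knn_predict(rest, xi, k, p)
        after = knn_predict(rest + [(x, y_hat)], xi, k, p)
        gain += (yi - before) ** 2 - (yi - after) ** 2
    return gain

def coreg(labeled, unlabeled, k=3, p1=2, p2=5, rounds=10):
    """Co-train two kNN regressors; needs at least two labeled examples."""
    sets = [list(labeled), list(labeled)]  # one training set per regressor
    orders = [p1, p2]
    pool = list(unlabeled)
    for _ in range(rounds):
        changed = False
        for j in (0, 1):
            best = None
            for x in pool:
                y_hat = knn_predict(sets[j], x, k, orders[j])
                gain = confidence(sets[j], x, y_hat, k, orders[j])
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, x, y_hat)
            if best is not None:
                _, x, y_hat = best
                sets[1 - j].append((x, y_hat))  # label for the OTHER regressor
                pool.remove(x)
                changed = True
        if not changed or not pool:
            break

    def predict(x):
        # Final estimate: mean of the two regressors' predictions.
        return 0.5 * (knn_predict(sets[0], x, k, orders[0])
                      + knn_predict(sets[1], x, k, orders[1]))
    return predict
```

A call such as `coreg(labeled, unlabeled)` returns a `predict` function; the two regressors differ only in their distance metric, which is what gives them different views of the same data.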

Journal Article•DOI•

Attention-Gated Reinforcement Learning of Internal Representations for Classification

Pieter R. Roelfsema¹, Arjen van Ooyen¹•Institutions (1)

VU University Amsterdam¹

01 Oct 2005-Neural Computation

TL;DR: This work shows that the credit assignment problem in reinforcement learning can be solved by a new role for attention in learning, and that the resulting scheme, called attention-gated reinforcement learning (AGREL), is as efficient as supervised learning in classification tasks.

Abstract: Animal learning is associated with changes in the efficacy of connections between neurons. The rules that govern this plasticity can be tested in neural networks. Rules that train neural networks to map stimuli onto outputs are given by supervised learning and reinforcement learning theories. Supervised learning is efficient but biologically implausible. In contrast, reinforcement learning is biologically plausible but comparatively inefficient. It lacks a mechanism that can identify units at early processing levels that play a decisive role in the stimulus-response mapping. Here we show that this so-called credit assignment problem can be solved by a new role for attention in learning. There are two factors in our new learning scheme that determine synaptic plasticity: (1) a reinforcement signal that is homogeneous across the network and depends on the amount of reward obtained after a trial, and (2) an attentional feedback signal from the output layer that limits plasticity to those units at earlier processing levels that are crucial for the stimulus-response mapping. The new scheme is called attention-gated reinforcement learning (AGREL). We show that it is as efficient as supervised learning in classification tasks. AGREL is biologically realistic and integrates the role of feedback connections, attention effects, synaptic plasticity, and reinforcement learning signals into a coherent framework.

Journal Article•DOI•

Selective visual attention enables learning and recognition of multiple objects in cluttered scenes

Dirk B. Walther¹, Ueli Rutishauser¹, Christof Koch¹, Pietro Perona¹•Institutions (1)

California Institute of Technology¹

01 Oct 2005-Computer Vision and Image Understanding

TL;DR: The proposed method for the selection of salient regions likely to contain objects, based on bottom-up visual attention, can enable one-shot learning of multiple objects from complex scenes, and can strongly improve learning and recognition performance in the presence of large amounts of clutter.

Proceedings Article•DOI•

Efficient online spherical k-means clustering

Shi Zhong¹•Institutions (1)

Florida Atlantic University¹

27 Dec 2005

TL;DR: It is demonstrated that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used.

Abstract: The spherical k-means algorithm, i.e., the k-means algorithm with cosine similarity, is a popular method for clustering high-dimensional text data. In this algorithm, each document as well as each cluster mean is represented as a high-dimensional unit-length vector. However, it has been mainly used in batch mode. That is, each cluster mean vector is updated, only after all document vectors have been assigned, as the (normalized) average of all the document vectors assigned to that cluster. This paper investigates an online version of the spherical k-means algorithm based on the well-known winner-take-all competitive learning. In this online algorithm, each cluster centroid is incrementally updated given a document. We demonstrate that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used. We also present heuristics to improve the speed, yet almost without loss of clustering quality.
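
A minimal sketch of the online, winner-take-all update described in this abstract is given below. It is an illustration under stated assumptions rather than the paper's algorithm: the exponentially decaying learning rate (from `eta0` to `eta_final`), the random initialization from the documents, and all function names are choices made for this example.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def online_spherical_kmeans(docs, k, epochs=10, eta0=1.0, eta_final=0.01, seed=0):
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Assumed initialization: k distinct documents chosen at random.
    centroids = [list(d) for d in rng.sample(docs, k)]
    total = epochs * len(docs)
    step = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            # Assumed annealing schedule: exponential decay from eta0 to eta_final.
            eta = eta0 * (eta_final / eta0) ** (step / total)
            # Winner-take-all: only the most cosine-similar centroid is updated.
            w = max(range(k), key=lambda j: dot(centroids[j], x))
            moved = [c + eta * xi for c, xi in zip(centroids[w], x)]
            centroids[w] = normalize(moved)  # project back onto the unit sphere
            step += 1
    labels = [max(range(k), key=lambda j: dot(centroids[j], x)) for x in docs]
    return centroids, labels
```

Unlike the batch version, a centroid moves immediately after each document is seen, and re-normalizing after every step keeps the dot product equal to cosine similarity.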

Proceedings Article•DOI•

A High-Performance Semi-Supervised Learning Method for Text Chunking

Rie Ando¹, Tong Zhang¹•Institutions (1)

IBM¹

25 Jun 2005

TL;DR: A novel semi-supervised method is presented that employs a learning paradigm, called structural learning, whose idea is to find "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data; the method produces performance higher than the previous best results.

Abstract: In machine learning, whether one can build a more accurate classifier by using unlabeled data (semi-supervised learning) is an important issue. Although a number of semi-supervised methods have been proposed, their effectiveness on NLP tasks is not always clear. This paper presents a novel semi-supervised method that employs a learning paradigm which we call structural learning. The idea is to find "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data. By doing so, the common predictive structure shared by the multiple classification problems can be discovered, which can then be used to improve performance on the target problem. The method produces performance higher than the previous best results on CoNLL'00 syntactic chunking and CoNLL'03 named entity chunking (English and German).

Book Chapter•DOI•

Learning intrusion detection: supervised or unsupervised?

Pavel Laskov, Patrick Düssel, Christin Schäfer, Konrad Rieck

06 Sep 2005

TL;DR: This contribution develops an experimental framework for comparative analysis of supervised and unsupervised learning techniques, in which unsupervised techniques are cast into a special case of classification, for which training and model selection can be performed by means of ROC analysis.

Abstract: Application and development of specialized machine learning techniques is gaining increasing attention in the intrusion detection community. A variety of learning techniques proposed for different intrusion detection problems can be roughly classified into two broad categories: supervised (classification) and unsupervised (anomaly detection and clustering). In this contribution we develop an experimental framework for comparative analysis of both kinds of learning techniques. In our framework we cast unsupervised techniques into a special case of classification, for which training and model selection can be performed by means of ROC analysis. We then investigate both kinds of learning techniques with respect to their detection accuracy and ability to detect unknown attacks.
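
To illustrate how an unsupervised detector can be evaluated like a classifier through ROC analysis, the sketch below scores each point by its distance to the data centroid and computes the area under the ROC curve from held-out labels. The centroid-distance scorer is a stand-in chosen for brevity and is not claimed to be one of the methods compared in the paper; the function names are hypothetical.

```python
import math

def centroid_distance_scores(points):
    """Anomaly score = Euclidean distance to the mean of all points.
    No labels are used, so the scorer itself is unsupervised."""
    dim = len(points[0])
    centre = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [math.sqrt(sum((p[d] - centre[d]) ** 2 for d in range(dim)))
            for p in points]

def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a random attack (label 1)
    scores higher than a random normal point (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because the AUC depends only on the ranking of the scores, it can be used to compare detectors or to select their parameters without fixing a detection threshold in advance.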

Proceedings Article•

Interpolating between types and tokens by estimating power-law generators

Sharon Goldwater, Mark Johnson¹, Thomas L. Griffiths¹•Institutions (1)

Brown University¹

05 Dec 2005

TL;DR: It is shown that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.

Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
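
The Pitman-Yor process mentioned here is commonly simulated through its Chinese-restaurant construction, sketched below. With `i` customers seated at `K` tables, a new customer joins table `t` with probability `(count_t - discount) / (i + concentration)` and opens a new table with probability `(concentration + discount * K) / (i + concentration)`. The function name and default parameter values are assumptions for this example, and the sketch covers only the adaptor, not the paper's morphology model.

```python
import random

def pitman_yor_crp(n, discount=0.5, concentration=1.0, seed=0):
    """Seat n customers; requires 0 <= discount < 1 and concentration > -discount.
    Returns the table index of each customer and the final table sizes."""
    rng = random.Random(seed)
    counts = []        # counts[t] = number of customers at table t
    assignments = []
    for i in range(n):
        # Draw a point in [0, i + concentration); existing table t owns an
        # interval of length counts[t] - discount, the remainder is "new table".
        u = rng.random() * (i + concentration)
        acc = 0.0
        chosen = len(counts)
        for t, c in enumerate(counts):
            acc += c - discount
            if u < acc:
                chosen = t
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts
```

Larger tables attract new customers in proportion to their size (minus the discount), which is the rich-get-richer mechanism behind the heavy-tailed table sizes; a positive discount makes new tables appear more often as the number of tables grows.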
