Showing papers on "Unsupervised learning" published in 2004


Book
01 Oct 2004
TL;DR: Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts and discussing methods from many different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining.
Abstract: The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. In order to present a unified treatment of machine learning problems and solutions, it discusses many methods from different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining. All learning algorithms are explained so that the student can easily move from the equations in the book to a computer program. The text covers such topics as supervised learning, Bayesian decision theory, parametric methods, multivariate methods, multilayer perceptrons, local models, hidden Markov models, assessing and comparing classification algorithms, and reinforcement learning. New to the second edition are chapters on kernel machines, graphical models, and Bayesian estimation; expanded coverage of statistical tests in a chapter on design and analysis of machine learning experiments; case studies available on the Web (with downloadable results for instructors); and many additional exercises. All chapters have been revised and updated. Introduction to Machine Learning can be used by advanced undergraduates and graduate students who have completed courses in computer programming, probability, calculus, and linear algebra. It will also be of interest to engineers in the field who are concerned with the application of machine learning methods. (Adaptive Computation and Machine Learning series.)

3,950 citations


Proceedings ArticleDOI
25 Jul 2004
TL;DR: A new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs) which randomly chooses the input weights and analytically determines the output weights of SLFNs is proposed.
Abstract: The learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for past decades. Two key reasons may be: 1) slow gradient-based learning algorithms are extensively used to train neural networks, and 2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these traditional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs) which randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the new algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.
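To make the recipe concrete, here is a minimal numpy sketch of an ELM regressor, assuming a tanh hidden layer and a least-squares solve via the Moore-Penrose pseudoinverse; the function names and the toy sine-fitting data are illustrative, not from the paper.

```python
# Minimal ELM sketch: random, fixed hidden-layer weights; output weights
# solved analytically with the pseudoinverse, as the abstract describes.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    """Fit an ELM regressor: random input weights, analytic output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                  # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: approximate a noisy sine function.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.normal(size=200)
W, b, beta = elm_fit(X, y)
print(np.mean((elm_predict(X, W, b, beta) - y) ** 2))  # training MSE
```

No iterative tuning happens anywhere above, which is exactly the source of the speed advantage the abstract claims.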

3,643 citations


Book
01 Jan 2004
TL;DR: This book discusses Bayesian networks, a framework for Bayesian structure learning, and some of the algorithms used in this framework.
Abstract: Preface. I. BASICS. 1. Introduction to Bayesian Networks. 2. More DAG/Probability Relationships. II. INFERENCE. 3. Inference: Discrete Variables. 4. More Inference Algorithms. 5. Influence Diagrams. III. LEARNING. 6. Parameter Learning: Binary Variables. 7. More Parameter Learning. 8. Bayesian Structure Learning. 9. Approximate Bayesian Structure Learning. 10. Constraint-Based Learning. 11. More Structure Learning. IV. APPLICATIONS. 12. Applications. Bibliography. Index.

2,575 citations


Proceedings ArticleDOI
22 Aug 2004
TL;DR: An approach to multi-task learning based on the minimization of regularization functionals similar to existing ones, such as the one for Support Vector Machines, that have been successfully used in the past for single-task learning is presented.
Abstract: Past empirical work has shown that learning multiple related tasks from data simultaneously can be advantageous in terms of predictive performance relative to learning these tasks independently. In this paper we present an approach to multi-task learning based on the minimization of regularization functionals similar to existing ones, such as the one for Support Vector Machines (SVMs), that have been successfully used in the past for single-task learning. Our approach allows us to model the relation between tasks in terms of a novel kernel function that uses a task-coupling parameter. We implement an instance of the proposed approach similar to SVMs and test it empirically using simulated as well as real data. The experimental results show that the proposed method performs better than existing multi-task learning methods and largely outperforms single-task learning using SVMs.
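The abstract does not spell out the kernel, so the sketch below uses one plausible task-coupling form, K((x,s),(z,t)) = (rho + [s = t]) k(x,z), in which a shared term and a task-specific term are traded off by rho; the form and all names here are assumptions in the spirit of the paper, not its exact formulation.

```python
# Hedged sketch of a task-coupling kernel over (input, task) pairs. Both the
# shared part (rho * k) and the task-specific part ([s == t] * k) are valid
# kernels, so their sum is too, and it can be fed to any kernel method.
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def multitask_kernel(x, task_x, z, task_z, rho=1.0, gamma=1.0):
    """Shared component (rho) plus a task-specific component when tasks match."""
    coupling = rho + (1.0 if task_x == task_z else 0.0)
    return coupling * rbf(x, z, gamma)

# Build a Gram matrix over (input, task) pairs.
X = np.array([[0.0], [1.0], [0.5], [1.5]])
tasks = [0, 0, 1, 1]
K = np.array([[multitask_kernel(X[i], tasks[i], X[j], tasks[j])
               for j in range(len(X))] for i in range(len(X))])
print(K.round(3))
```

Large rho pulls the tasks toward one shared predictor; rho near zero decouples them into independent single-task learners.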

1,617 citations


Proceedings Article
01 Dec 2004
TL;DR: This framework motivates minimum entropy regularization, which enables unlabeled data to be incorporated into standard supervised learning and includes other approaches to the semi-supervised problem as particular or limiting cases.
Abstract: We consider the semi-supervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables unlabeled data to be incorporated into standard supervised learning. Our approach includes other approaches to the semi-supervised problem as particular or limiting cases. A series of experiments illustrates that the proposed solution benefits from unlabeled data. The method challenges mixture models when the data are sampled from the distribution class spanned by the generative model. Performance is clearly in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to violation of the "cluster assumption". Finally, we illustrate that the method can be far superior to manifold learning in high-dimensional spaces.
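As a rough illustration of the objective, the following sketch adds a scaled Shannon entropy of the unlabeled predictions to the usual cross-entropy of a binary logistic model; the optimizer, hyperparameters, and names are assumptions, not the paper's setup.

```python
# Minimum-entropy-regularization sketch for binary logistic regression:
# minimize (supervised cross-entropy) + lam * (entropy on unlabeled points),
# which pushes the decision boundary away from dense unlabeled regions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X_lab, y_lab, X_unlab, lam=0.5, lr=0.1, steps=500):
    w = np.zeros(X_lab.shape[1])
    for _ in range(steps):
        p_lab = sigmoid(X_lab @ w)
        grad = X_lab.T @ (p_lab - y_lab) / len(y_lab)       # supervised term
        p_un = np.clip(sigmoid(X_unlab @ w), 1e-6, 1 - 1e-6)
        # dH/dw for H(p) = -p log p - (1-p) log(1-p), via chain rule:
        dH = -np.log(p_un / (1 - p_un)) * p_un * (1 - p_un)
        grad += lam * X_unlab.T @ dH / len(X_unlab)         # entropy term
        w -= lr * grad
    return w
```

With lam = 0 this reduces to plain supervised logistic regression, matching the abstract's claim that supervised learning is recovered as a limiting case.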

1,606 citations


Proceedings ArticleDOI
04 Jul 2004
TL;DR: This paper proposes to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs, and demonstrates the versatility and effectiveness of the method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.
Abstract: Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs such as multiple dependent output variables and structured output spaces. We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem. We demonstrate the versatility and effectiveness of our method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.

1,446 citations


Proceedings ArticleDOI
04 Jul 2004
TL;DR: It is proved that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, which indicates that unsupervised dimension reduction is closely related to unsupervised learning.
Abstract: Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used method for performing unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for the K-means objective function are derived: the total variance minus the leading eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insight into the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression and Internet newsgroup data are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
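The bound is easy to probe numerically. The sketch below compares the K-means objective with total scatter minus the K-1 leading eigenvalues of the centered scatter matrix on synthetic data; the data, K, and library choices are all illustrative.

```python
# Check the paper's style of lower bound: the K-means objective J_K is
# bounded below by (total variance) - (sum of the K-1 leading eigenvalues
# of the data scatter matrix).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, size=(100, 5)) for m in (0.0, 3.0, 6.0)])
Xc = X - X.mean(axis=0)                          # center the data

K = 3
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(Xc)
J = km.inertia_                                  # K-means objective value

eigvals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc))[::-1]
bound = eigvals.sum() - eigvals[:K - 1].sum()    # total scatter minus top K-1
print(J, ">=", bound)
```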

1,431 citations


Journal ArticleDOI
01 Feb 2004
TL;DR: An approach to the online learning of Takagi-Sugeno (TS) type models is proposed, based on a novel learning algorithm that recursively updates TS model structure and parameters by combining supervised and unsupervised learning.
Abstract: An approach to the online learning of Takagi-Sugeno (TS) type models is proposed in the paper. It is based on a novel learning algorithm that recursively updates TS model structure and parameters by combining supervised and unsupervised learning. The rule-base and parameters of the TS model continually evolve by adding new rules with more summarization power and by modifying existing rules and parameters. In this way, the rule-base structure is inherited and updated when new data become available. By applying this learning concept to the TS model we arrive at a new type of adaptive model called the Evolving Takagi-Sugeno model (ETS). The adaptive nature of these evolving TS models in combination with the highly transparent and compact form of fuzzy rules makes them a promising candidate for online modeling and control of complex processes, competitive with neural networks. The approach has been tested on data from an air-conditioning installation serving a real building. The results illustrate the viability and efficiency of the approach. The proposed concept, however, has significantly wider implications in a number of fields, including adaptive nonlinear control, fault detection and diagnostics, performance analysis, forecasting, knowledge extraction, robotics, and behavior modeling.

956 citations


Journal ArticleDOI
TL;DR: This paper explores the feature selection problem and issues through FSSEM (Feature Subset Selection using Expectation-Maximization (EM) clustering) and through two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood.
Abstract: In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dimension. We explore the feature selection problem and these issues through FSSEM (Feature Subset Selection using Expectation-Maximization (EM) clustering) and through two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. We present proofs on the dimensionality biases of these feature criteria, and present a cross-projection normalization scheme that can be applied to any criterion to ameliorate these biases. Our experiments show the need for feature selection, the need for addressing these two issues, and the effectiveness of our proposed solutions.

939 citations


Journal ArticleDOI
TL;DR: SUSTAIN successfully extends category learning models to studies of inference learning, unsupervised learning, category construction, and contexts in which identification learning is faster than classification learning.
Abstract: SUSTAIN (Supervised and Unsupervised STratified Adaptive Incremental Network) is a model of how humans learn categories from examples. SUSTAIN initially assumes a simple category structure. If simple solutions prove inadequate and SUSTAIN is confronted with a surprising event (e.g., it is told that a bat is a mammal instead of a bird), SUSTAIN recruits an additional cluster to represent the surprising event. Newly recruited clusters are available to explain future events and can themselves evolve into prototypes-attractors-rules. SUSTAIN's discovery of category substructure is affected not only by the structure of the world but by the nature of the learning task and the learner's goals. SUSTAIN successfully extends category learning models to studies of inference learning, unsupervised learning, category construction, and contexts in which identification learning is faster than classification learning.

724 citations


Journal ArticleDOI
TL;DR: This paper proposes the concept of feature saliency and introduces an expectation-maximization algorithm to estimate it, in the context of mixture-based clustering, and extends the criterion and algorithm to simultaneously estimate the feature saliencies and the number of clusters.
Abstract: Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.

Proceedings ArticleDOI
22 Aug 2004
TL;DR: The methodology is applied to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and a model with 300 topics is learned using a Markov chain Monte Carlo algorithm.
Abstract: We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words for that topic. The words in a multi-author paper are assumed to be the result of a mixture of the individual authors' topic mixtures. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 1990 and 2002, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. An online query interface to the model is also discussed that allows interactive exploration of author-topic models for corpora such as CiteSeer.
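A toy version of the two-stage generative story can be written in a few lines; the author and topic distributions below are invented for illustration, and the paper's inference side (MCMC over real corpora) is not shown.

```python
# Toy author-topic generative process: each word picks an author uniformly,
# then a topic from that author's topic distribution, then a word from that
# topic's word distribution. All distributions here are made up.
import numpy as np

rng = np.random.default_rng(1)
n_topics, vocab = 3, ["data", "model", "gene", "network", "query"]

author_topic = {"ann": np.array([0.7, 0.2, 0.1]),   # P(topic | author)
                "bob": np.array([0.1, 0.3, 0.6])}
topic_word = rng.dirichlet(np.ones(len(vocab)), size=n_topics)  # P(word | topic)

def generate(authors, n_words=10):
    words = []
    for _ in range(n_words):
        a = rng.choice(authors)                       # pick an author
        z = rng.choice(n_topics, p=author_topic[a])   # topic from author's mix
        words.append(rng.choice(vocab, p=topic_word[z]))
    return words

print(generate(["ann", "bob"]))
```

Learning reverses this process: given only the words and the author lists, the paper's MCMC algorithm recovers the author-topic and topic-word tables.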

Proceedings ArticleDOI
27 Jun 2004
TL;DR: This work investigates empirically to what extent pure bottom-up attention can extract useful information about the location, size, and shape of objects from images, and demonstrates how this information can be utilized to enable unsupervised learning of objects from unlabeled images.
Abstract: A key problem in learning multiple objects from unlabeled images is that it is a priori impossible to tell which part of the image corresponds to each individual object, and which part is irrelevant clutter which is not associated to the objects. We investigate empirically to what extent pure bottom-up attention can extract useful information about the location, size and shape of objects from images and demonstrate how this information can be utilized to enable unsupervised learning of objects from unlabeled images. Our experiments demonstrate that the proposed approach to using bottom-up attention is indeed useful for a variety of applications.

Journal ArticleDOI
TL;DR: An experimental application to network intrusion detection shows that SmartSifter was able to identify data with high scores that corresponded to attacks, with low computational costs.
Abstract: Outlier detection is a fundamental issue in data mining, specifically in fraud detection, network intrusion detection, network monitoring, etc. SmartSifter is an outlier detection engine addressing this problem from the viewpoint of statistical learning theory. This paper provides a theoretical basis for SmartSifter and empirically demonstrates its effectiveness. SmartSifter detects outliers in an on-line process through the on-line unsupervised learning of a probabilistic model (using a finite mixture model) of the information source. Each time a datum is input, SmartSifter employs an on-line discounting learning algorithm to learn the probabilistic model. A score is given to the datum based on the learned model, with a high score indicating a high possibility of being a statistical outlier. The novel features of SmartSifter are: (1) it is adaptive to non-stationary sources of data; (2) a score has a clear statistical/information-theoretic meaning; (3) it is computationally inexpensive; and (4) it can handle both categorical and continuous variables. An experimental application to network intrusion detection shows that SmartSifter was able to identify data with high scores that corresponded to attacks, with low computational costs. Further experimental application has identified a number of meaningful rare cases in actual health insurance pathology data from Australia's Health Insurance Commission.
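The following drastically simplified one-dimensional analogue conveys the scoring idea: fit a model online with exponential discounting and score each datum by its negative log-likelihood. SmartSifter itself uses finite mixture models over categorical and continuous variables; the single discounted Gaussian here is purely illustrative.

```python
# Online outlier scoring sketch: discounted moment estimates for a Gaussian,
# with score = -log N(x; mean, var). High score suggests a statistical outlier.
import math

class OnlineScorer:
    def __init__(self, discount=0.05):
        self.r = discount
        self.mean, self.var = 0.0, 1.0

    def score_and_update(self, x):
        score = 0.5 * math.log(2 * math.pi * self.var) \
              + 0.5 * (x - self.mean) ** 2 / self.var      # -log-likelihood
        self.mean = (1 - self.r) * self.mean + self.r * x  # discounted updates
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
        return score

s = OnlineScorer()
for x in [0.1, -0.2, 0.05, 8.0, 0.0]:     # 8.0 should receive a high score
    print(round(s.score_and_update(x), 2))
```

The discounting factor is what makes the scheme adaptive to non-stationary sources: old observations decay geometrically instead of accumulating forever.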

Proceedings ArticleDOI
04 Jul 2004
TL;DR: A formal framework that incorporates clustering into two-class active learning, allowing the most representative samples to be selected while avoiding repeated labeling of samples in the same cluster.
Abstract: The paper is concerned with two-class active learning. While the common approach for collecting data in active learning is to select samples close to the classification boundary, better performance can be achieved by taking the prior data distribution into account. The main contribution of the paper is a formal framework that incorporates clustering into active learning. The algorithm first constructs a classifier on the set of the cluster representatives, and then propagates the classification decision to the other samples via a local noise model. The proposed model allows the most representative samples to be selected and avoids repeatedly labeling samples in the same cluster. During the active learning process, the clustering is adjusted using a coarse-to-fine strategy in order to balance the advantage of large clusters against the accuracy of the data representation. The results of experiments on image databases show better performance of our algorithm compared to current methods.
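A bare-bones sketch of the representative-sampling idea follows: cluster the pool, query labels only for points nearest the centroids, and propagate each label within its cluster. The paper's local noise model and coarse-to-fine adjustment are omitted, and all names are illustrative.

```python
# Cluster-based active learning sketch: one label query per cluster.
import numpy as np
from sklearn.cluster import KMeans

def representatives(X, n_clusters=5, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    reps = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
        reps.append(idx[np.argmin(d)])     # sample closest to the centroid
    return km.labels_, reps

def propagate(labels, reps, oracle):
    """oracle(i) returns the true label of sample i; one query per cluster."""
    y = np.empty(len(labels), dtype=int)
    for c, r in enumerate(reps):
        y[labels == c] = oracle(r)
    return y
```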

Proceedings ArticleDOI
21 Jul 2004
TL;DR: This work presents a generative model for the unsupervised learning of dependency structures and describes the multiplicative combination of this dependency model with a model of linear constituency; the combined model works and is robust cross-linguistically.
Abstract: We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model works and is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data.

Journal ArticleDOI
Tom Verguts, Wim Fias
TL;DR: This article addresses the representation of numerical information conveyed by nonsymbolic and symbolic stimuli and presents a concrete proposal on the linkage between higher order numerical cognition and more primitive numerical abilities and generates specific predictions on the neural substrate of number processing.
Abstract: This article addresses the representation of numerical information conveyed by nonsymbolic and symbolic stimuli. In a first simulation study, we show how number-selective neurons develop when an initially uncommitted neural network is given nonsymbolic stimuli as input (e.g., collections of dots) under unsupervised learning. The resultant network is able to account for the distance and size effects, two ubiquitous effects in numerical cognition. Furthermore, the properties of the network units conform in detail to the characteristics of recently discovered number-selective neurons. In a second study, we simulate symbol learning by presenting symbolic and nonsymbolic input simultaneously. The same number-selective neurons learn to represent the numerical meaning of symbols. In doing so, they show properties reminiscent of the originally available number-selective neurons, but at the same time, the representational efficiency of the neurons is increased when presented with symbolic input. This finding presents a concrete proposal on the linkage between higher order numerical cognition and more primitive numerical abilities and generates specific predictions on the neural substrate of number processing.

Proceedings Article
01 Jan 2004
TL;DR: A probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings is offered and a combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm.
Abstract: Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or statistical perspectives. We offer a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. The excellent scalability of this algorithm and comprehensible underlying model are particularly important for clustering of large datasets. This study compares the performance of the EM consensus algorithm with other fusion approaches for clustering ensembles. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed method on large real-world datasets. Keywords: unsupervised learning, clustering ensemble, consensus function, mixture model, EM algorithm.
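A compact EM implementation of this flavor of consensus is sketched below, treating each object's vector of base-clustering labels as draws from per-component categorical (multinomial) distributions; the smoothing constant and initialization are assumptions, not the paper's exact estimator.

```python
# EM for consensus clustering: mixture components are products of categorical
# distributions, one per base partition, fitted to the label matrix L.
import numpy as np

def consensus_em(L, n_consensus, n_labels, iters=50, seed=0):
    """L: (n_objects, H) integer label matrix from H base clusterings."""
    rng = np.random.default_rng(seed)
    n, H = L.shape
    pi = np.full(n_consensus, 1.0 / n_consensus)            # mixture weights
    theta = rng.dirichlet(np.ones(n_labels), size=(n_consensus, H))
    for _ in range(iters):
        # E-step: responsibility of each consensus cluster for each object.
        logp = np.log(pi)[None, :] + sum(
            np.log(theta[:, h, L[:, h]]).T for h in range(H))
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and per-partition label distributions.
        pi = R.mean(axis=0)
        for h in range(H):
            counts = np.array([np.bincount(L[:, h], weights=R[:, k],
                                           minlength=n_labels)
                               for k in range(n_consensus)])
            theta[:, h] = counts + 1e-9                      # smoothing
            theta[:, h] /= theta[:, h].sum(axis=1, keepdims=True)
    return R.argmax(axis=1)                                  # consensus labels

# Tiny usage: 4 objects, 2 base partitions that agree up to label renaming.
L = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])
print(consensus_em(L, n_consensus=2, n_labels=2))
```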


Proceedings ArticleDOI
27 Jun 2004
TL;DR: The paper investigates the unsupervised learning of an activity model for a multi-camera surveillance network from a large set of observations, enabling the learning algorithm to establish links between camera views associated with an activity.
Abstract: The paper investigates the unsupervised learning of a model of activity for a multi-camera surveillance network that can be created from a large set of observations. This enables the learning algorithm to establish links between camera views associated with an activity. The learning algorithm operates in a correspondence-free manner, exploiting the statistical consistency of the observation data. The derived model is used to automatically determine the topography of a network of cameras and to provide a means for tracking targets across the "blind" areas of the network. A theoretical justification and experimental validation of the methods are provided.

Journal ArticleDOI
25 Mar 2004, Nature
TL;DR: Conditions for generalization in terms of a precise stability property of the learning process are provided: when the training set is perturbed by deleting one example, the learned hypothesis does not change much.
Abstract: Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
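The stability property can be probed empirically, if crudely: retrain with one example deleted and measure how much predictions move. The ridge-regression example below is only a numerical illustration of the idea, not the paper's formal definition.

```python
# Leave-one-out stability probe: compare predictions of the full model with
# models trained on the data minus one example.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

full = Ridge(alpha=1.0).fit(X, y)
changes = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i                  # delete example i
    loo = Ridge(alpha=1.0).fit(X[mask], y[mask])
    changes.append(np.max(np.abs(full.predict(X) - loo.predict(X))))
print(max(changes))   # small value = stable learning map on this sample
```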

Proceedings ArticleDOI
14 Mar 2004
TL;DR: A two-tier architecture is introduced: the first tier is an unsupervised clustering algorithm which reduces the network packet payload to a tractable size, and the second tier is a traditional anomaly detection algorithm whose efficiency is improved by the availability of data on the packet payload content.

Abstract: With the continuous evolution of the types of attacks against computer networks, traditional intrusion detection systems, based on pattern matching and static signatures, are increasingly limited by their need for an up-to-date and comprehensive knowledge base. Data mining techniques have been successfully applied in host-based intrusion detection. Applying data mining techniques to raw network data, however, is made difficult by the sheer size of the input; this is usually avoided by discarding the network packet contents. In this paper, we introduce a two-tier architecture to overcome this problem: the first tier is an unsupervised clustering algorithm which reduces the network packet payload to a tractable size. The second tier is a traditional anomaly detection algorithm, whose efficiency is improved by the availability of data on the packet payload content.
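Schematically, the two tiers might be wired together as below: byte-histogram payload features are compressed by clustering, and a simple distance-based anomaly score runs on top. Every detail here (features, cluster count, threshold) is an illustrative assumption rather than the paper's design.

```python
# Two-tier sketch: tier one compresses payloads via clustering; tier two flags
# payloads far from any centroid as anomalous.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def payload_features(payloads):
    """Byte-frequency histogram per payload (a common payload encoding)."""
    return np.array([np.bincount(np.frombuffer(p, dtype=np.uint8),
                                 minlength=256) / max(len(p), 1)
                     for p in payloads])

payloads = [bytes(rng.integers(0, 256, size=64, dtype=np.uint8))
            for _ in range(200)]
F = payload_features(payloads)

tier1 = KMeans(n_clusters=8, n_init=10, random_state=0)
cluster_id = tier1.fit_predict(F)            # tractable payload summary

# Tier two: distance to the nearest centroid as a simple anomaly score.
dist = np.min(tier1.transform(F), axis=1)
threshold = np.percentile(dist, 99)
print(np.where(dist > threshold)[0])         # indices of anomalous packets
```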

Journal ArticleDOI
TL;DR: The state-space paradigm estimates learning curves for single animals, gives a precise definition of learning, and suggests a coherent statistical framework for the design and analysis of learning experiments that could reduce the number of animals and trials per animal that these studies require.
Abstract: Understanding how an animal's ability to learn relates to neural activity or is altered by lesions, different attentional states, pharmacological interventions, or genetic manipulations are central questions in neuroscience. Although learning is a dynamic process, current analyses do not use dynamic estimation methods, require many trials across many animals to establish the occurrence of learning, and provide no consensus as how best to identify when learning has occurred. We develop a state-space model paradigm to characterize learning as the probability of a correct response as a function of trial number (learning curve). We compute the learning curve and its confidence intervals using a state-space smoothing algorithm and define the learning trial as the first trial on which there is reasonable certainty (>0.95) that a subject performs better than chance for the balance of the experiment. For a range of simulated learning experiments, the smoothing algorithm estimated learning curves with smaller mean integrated squared error and identified the learning trials with greater reliability than commonly used methods. The smoothing algorithm tracked easily the rapid learning of a monkey during a single session of an association learning experiment and identified learning 2 to 4 d earlier than accepted criteria for a rat in a 47 d procedural learning experiment. Our state-space paradigm estimates learning curves for single animals, gives a precise definition of learning, and suggests a coherent statistical framework for the design and analysis of learning experiments that could reduce the number of animals and trials per animal that these studies require.

Journal ArticleDOI
TL;DR: An online (recursive) algorithm is proposed that estimates the parameters of the mixture and that simultaneously selects the number of components to search for the maximum a posteriori (MAP) solution and to discard the irrelevant components.
Abstract: There are two open problems when finite mixture densities are used to model multivariate data: the selection of the number of components and the initialization. In this paper, we propose an online (recursive) algorithm that estimates the parameters of the mixture and that simultaneously selects the number of components. The new algorithm starts with a large number of randomly initialized components. A prior is used as a bias for maximally structured models. A stochastic approximation recursive learning algorithm is proposed to search for the maximum a posteriori (MAP) solution and to discard the irrelevant components.

Journal ArticleDOI
TL;DR: It was found that the Naïve Bayes algorithm is the most appropriate to be used for the construction of a software support tool, has more than satisfactory accuracy, its overall sensitivity is extremely satisfactory, and is the easiest algorithm to implement.
Abstract: The ability to predict a student's performance could be useful in a great number of different ways associated with university-level distance learning. Students' key demographic characteristics and their marks on a few written assignments can constitute the training set for a supervised machine learning algorithm. The learning algorithm could then be able to predict the performance of new students, thus becoming a useful tool for identifying predicted poor performers. The scope of this work is to compare some of the state-of-the-art learning algorithms. Two experiments have been conducted with six algorithms, which were trained using data sets provided by the Hellenic Open University. Among other significant conclusions, it was found that the Naive Bayes algorithm is the most appropriate to be used for the construction of a software support tool, has more than satisfactory accuracy, its overall sensitivity is extremely satisfactory, and is the easiest algorithm to implement.
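A minimal version of the winning setup is easy to reproduce in outline: Gaussian Naive Bayes over demographic attributes plus early assignment marks. The feature set and synthetic data below are invented for illustration; the study's actual data came from the Hellenic Open University.

```python
# Naive Bayes sketch for pass/fail prediction from demographics + early marks.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Columns: age, sex (0/1), mark on assignment 1, mark on assignment 2.
X = np.column_stack([rng.integers(20, 50, 200), rng.integers(0, 2, 200),
                     rng.uniform(0, 10, 200), rng.uniform(0, 10, 200)])
y = (X[:, 2] + X[:, 3] + rng.normal(0, 2, 200) > 10).astype(int)  # pass/fail

clf = GaussianNB().fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```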

Proceedings ArticleDOI
27 Jun 2004
TL;DR: The proposed algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold, and overcomes certain limitations of previous work in manifold learning, such as Isomap and locally linear embedding.
Abstract: Can we detect low dimensional structure in high dimensional data sets of images and video? The problem of dimensionality reduction arises often in computer vision and pattern recognition. In this paper, we propose a new solution to this problem based on semidefinite programming. Our algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold. It overcomes certain limitations of previous work in manifold learning, such as Isomap and locally linear embedding. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.

Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work considers a graph theoretic approach for automatic construction of options in a dynamic environment and considers building a map that includes preliminary indication of the location of "interesting" regions of the state space, where the value gradient is significant and additional exploration might be beneficial.
Abstract: We consider a graph theoretic approach for automatic construction of options in a dynamic environment. A map of the environment is generated on-line by the learning agent, representing the topological structure of the state transitions. A clustering algorithm is then used to partition the state space to different regions. Policies for reaching the different parts of the space are separately learned and added to the model in a form of options (macro-actions). The options are used for accelerating the Q-Learning algorithm. We extend the basic algorithm and consider building a map that includes preliminary indication of the location of "interesting" regions of the state space, where the value gradient is significant and additional exploration might be beneficial. Experiments indicate significant speedups, especially in the initial learning phase.

Journal ArticleDOI
TL;DR: New algorithms that perform clustering and feature weighting simultaneously and in an unsupervised manner are introduced and can be used in the subsequent steps of a learning system to improve its learning behavior.

Proceedings ArticleDOI
15 Nov 2004
TL;DR: This work presents democratic co-learning, in which multiple algorithms instead of multiple views enable learners to label data for each other, together with democratic priority sampling, a new example selection method for active learning.
Abstract: For many machine learning applications it is important to develop algorithms that use both labeled and unlabeled data. We present democratic co-learning, in which multiple algorithms instead of multiple views enable learners to label data for each other. Our technique leverages the fact that different learning algorithms have different inductive biases and that better predictions can be made by the voted majority. We also present democratic priority sampling, a new example selection method for active learning.
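One round of the labeling mechanism might look like the sketch below, with three learners of different inductive biases and a conservative unanimity rule for promoting unlabeled examples; the paper's confidence weighting and full democratic voting procedure are omitted, and all names are illustrative.

```python
# Co-learning round sketch: diverse learners vote labels onto unlabeled data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def colearning_round(X_lab, y_lab, X_unlab):
    """One round: learners with different biases vote on unlabeled examples."""
    learners = [GaussianNB(), KNeighborsClassifier(3), DecisionTreeClassifier()]
    votes = np.array([m.fit(X_lab, y_lab).predict(X_unlab) for m in learners])
    unanimous = (votes == votes[0]).all(axis=0)   # conservative: all agree
    # Promote unanimously labeled examples into the labeled set.
    X_new = np.vstack([X_lab, X_unlab[unanimous]])
    y_new = np.concatenate([y_lab, votes[0][unanimous]])
    return X_new, y_new, X_unlab[~unanimous]      # remaining unlabeled pool
```

Repeating the round grows the labeled set only where algorithms with different biases agree, which is the source of robustness the abstract describes.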

Journal ArticleDOI
TL;DR: An unsupervised algorithm for learning a finite mixture model from multivariate data based on the Dirichlet distribution, which offers high flexibility for modeling data.
Abstract: This paper presents an unsupervised algorithm for learning a finite mixture model from multivariate data. This mixture model is based on the Dirichlet distribution, which offers high flexibility for modeling data. The proposed approach for estimating the parameters of a Dirichlet mixture is based on the maximum likelihood (ML) and Fisher scoring methods. Experimental results are presented for the following applications: estimation of artificial histograms, summarization of image databases for efficient retrieval, and human skin color modeling and its application to skin detection in multimedia databases.