
Showing papers on "Unsupervised learning" published in 2001



Book ChapterDOI
05 Sep 2001
TL;DR: This paper presents a simple unsupervised learning algorithm, PMI-IR, for recognizing synonyms based on statistical data acquired by querying a Web search engine; it uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words.
Abstract: This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
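
As a rough, hypothetical illustration of the scoring idea, the sketch below ranks candidate synonyms by a co-occurrence ratio derived from search-engine hit counts. The `hits` function is a placeholder to be wired to a real search API, and this simplest score omits the NEAR-operator and context refinements the paper evaluates.

```python
# Hypothetical sketch of the simplest PMI-IR-style score: rank each
# candidate synonym by how often it co-occurs with the problem word,
# normalized by the candidate's overall frequency.

def hits(query: str) -> int:
    """Placeholder: number of documents a search engine returns for `query`."""
    raise NotImplementedError("wire this to a real search engine API")

def pmi_ir_score(problem_word: str, choice: str) -> float:
    # Proportional to p(problem, choice) / p(choice); the constants that
    # would make this a true PMI cancel when comparing choices.
    return hits(f"{problem_word} AND {choice}") / max(hits(choice), 1)

def best_synonym(problem_word: str, choices: list[str]) -> str:
    return max(choices, key=lambda c: pmi_ir_score(problem_word, c))
```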

1,232 citations


Journal ArticleDOI
TL;DR: This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words.
Abstract: This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar matches well the analysis that would be developed by a human morphologist. In the final section, we discuss the relationship of this style of MDL grammatical analysis to the notion of evaluation metric in early generative grammar.
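
To make the MDL criterion concrete, here is a hedged sketch of a two-part description length for a stem-plus-suffix segmentation: bits to write down the lexicon plus bits to encode the corpus given the lexicon. The per-letter cost and the coding scheme are simplifying assumptions, not the paper's actual model.

```python
import math
from collections import Counter

def description_length(corpus_words, segmentation):
    """Two-part MDL cost: bits for the stem/suffix lexicon plus bits for
    encoding each word as a (stem, suffix) pair under their empirical
    frequencies. `segmentation` maps word -> (stem, suffix)."""
    stems, suffixes = Counter(), Counter()
    for w in corpus_words:
        stem, suffix = segmentation[w]
        stems[stem] += 1
        suffixes[suffix] += 1
    # Lexicon cost: a crude ~5 bits per letter (about log2(26)) per entry.
    lexicon_bits = sum((len(s) + 1) * 5 for s in stems)
    lexicon_bits += sum((len(s) + 1) * 5 for s in suffixes)
    # Corpus cost: -log2 probability of each (stem, suffix) choice.
    n_stem, n_suf = sum(stems.values()), sum(suffixes.values())
    corpus_bits = 0.0
    for w in corpus_words:
        stem, suffix = segmentation[w]
        corpus_bits -= math.log2(stems[stem] / n_stem)
        corpus_bits -= math.log2(suffixes[suffix] / n_suf)
    return lexicon_bits + corpus_bits

# A proposed change (e.g., splitting "jumped" as jump+ed rather than
# keeping jumped whole) is adopted only if it lowers this total.
```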

789 citations


Journal ArticleDOI
TL;DR: Unsupervised learning of higher-order statistics provides support for Barlow's theory of visual recognition, which posits that detecting “suspicious coincidences” of elements during recognition is a necessary prerequisite for efficient learning of new visual features.
Abstract: Three experiments investigated the ability of human observers to extract the joint and conditional probabilities of shape co-occurrences during passive viewing of complex visual scenes. Results indicated that statistical learning of shape conjunctions was both rapid and automatic, as subjects were not instructed to attend to any particular features of the displays. Moreover, in addition to single-shape frequency, subjects acquired in parallel several different higher-order aspects of the statistical structure of the displays, including absolute shape-position relations in an array, shape-pair arrangements independent of position, and conditional probabilities of shape co-occurrences. Unsupervised learning of these higher-order statistics provides support for Barlow's theory of visual recognition, which posits that detecting "suspicious coincidences" of elements during recognition is a necessary prerequisite for efficient learning of new visual features.
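
The statistics in question are simple to state computationally. The sketch below estimates single-shape frequencies and joint and conditional probabilities of shape pairs from a list of scenes, ignoring the positional relations the experiments also probed.

```python
from collections import Counter
from itertools import combinations

def shape_statistics(scenes):
    """`scenes` is a list of sets of shape labels; positions are ignored."""
    n = len(scenes)
    shape_counts, pair_counts = Counter(), Counter()
    for scene in scenes:
        shape_counts.update(scene)
        pair_counts.update(frozenset(p) for p in combinations(scene, 2))
    single = {s: c / n for s, c in shape_counts.items()}
    joint = {tuple(sorted(p)): c / n for p, c in pair_counts.items()}
    return single, joint

def conditional(a, b, scenes):
    """P(shape a appears | shape b appears)."""
    with_b = [s for s in scenes if b in s]
    return sum(a in s for s in with_b) / max(len(with_b), 1)
```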

694 citations


Proceedings ArticleDOI
29 Nov 2001
TL;DR: An approach for incremental learning with support vector machines is presented that improves the existing approach of Syed et al. (1999), and an insight into the interpretability of support vectors is given.
Abstract: Support vector machines (SVMs) have become a popular tool for machine learning with large amounts of high-dimensional data. In this paper an approach for incremental learning with support vector machines is presented that improves the existing approach of Syed et al. (1999). An insight into the interpretability of support vectors is also given.
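
For context, here is a minimal scikit-learn sketch of the Syed et al. style of batch-incremental training that the paper builds on: each new batch is trained together with the support vectors retained from earlier batches, which act as a compressed summary of the data seen so far. The paper's improvements go beyond this sketch.

```python
import numpy as np
from sklearn.svm import SVC

def incremental_svm(batches, kernel="rbf"):
    """`batches` is a list of (X, y) pairs arriving over time."""
    clf = SVC(kernel=kernel)
    X_keep = np.empty((0, batches[0][0].shape[1]))
    y_keep = np.empty((0,))
    for X_batch, y_batch in batches:
        X = np.vstack([X_keep, X_batch])
        y = np.concatenate([y_keep, y_batch])
        clf.fit(X, y)
        # Retain only the support vectors as a compressed summary
        # of everything seen so far.
        X_keep, y_keep = X[clf.support_], y[clf.support_]
    return clf
```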

587 citations


Proceedings ArticleDOI
07 Jul 2001
TL;DR: In this article, a statistical model for organizing image collections is presented that integrates semantic information provided by associated text with visual information provided by image features; because the model learns relationships between text and image features, it can also be used for unsupervised learning for object recognition.
Abstract: We present a statistical model for organizing image collections which integrates semantic information provided by associated text and visual information provided by image features. The model is very promising for information retrieval tasks such as database browsing and searching for images based on text and/or image features. Furthermore, since the model learns relationships between text and image features, it can be used for novel applications such as associating words with pictures, and unsupervised learning for object recognition.

543 citations


Journal ArticleDOI
01 Dec 2001
TL;DR: This paper introduces evolving fuzzy neural networks (EFuNNs) as a means for the implementation of the evolving connectionist systems (ECOS) paradigm that is aimed at building online, adaptive intelligent systems that have both their structure and functionality evolving in time.
Abstract: This paper introduces evolving fuzzy neural networks (EFuNNs) as a means for the implementation of the evolving connectionist systems (ECOS) paradigm that is aimed at building online, adaptive intelligent systems that have both their structure and functionality evolving in time. EFuNNs evolve their structure and parameter values through incremental, hybrid supervised/unsupervised, online learning. They can accommodate new input data, including new features, new classes, etc., through local element tuning. New connections and new neurons are created during the operation of the system. EFuNNs can learn spatial-temporal sequences in an adaptive way through one-pass learning and automatically adapt their parameter values as they operate. Fuzzy or crisp rules can be inserted and extracted at any time of the EFuNN operation. The characteristics of EFuNNs are illustrated on several case study data sets for time series prediction and spoken word classification. Their performance is compared with traditional connectionist methods and systems. The applicability of EFuNNs as general-purpose online learning machines is discussed with regard to systems that learn from large databases, life-long learning systems, and online adaptive systems in different areas of engineering.

493 citations


Journal ArticleDOI
TL;DR: It is argued that this new resource-based mechanism is a large step forward in making AISs a viable contender for effective unsupervised machine learning, allowing not just a one-shot learning mechanism but a continual learning model to be developed.
Abstract: This paper presents a resource limited artificial immune system (RLAIS) for data analysis. The work presented here builds upon previous work on artificial immune systems (AIS) for data analysis. A population control mechanism, inspired by the natural immune system, has been introduced to control population growth and allow termination of the learning algorithm. The new algorithm is presented, along with the immunological metaphors used as inspiration. Results are presented for the Fisher iris data set, where very successful results are obtained in identifying clusters within the data set. It is argued that this new resource-based mechanism is a large step forward in making AISs a viable contender for effective unsupervised machine learning, allowing not just a one-shot learning mechanism but a continual learning model to be developed.

349 citations


Journal ArticleDOI
TL;DR: The present work describes how the SOM can be used for the study of ecological communities, and how it can complement classical techniques for exploring data and for achieving community ordination.
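
For readers unfamiliar with the method, a bare-bones NumPy SOM of the kind used for community ordination might look like the following sketch: samples (e.g., site-by-species abundance vectors) are mapped onto a small 2-D grid so that similar communities land on neighboring cells. The grid size and annealing schedules are illustrative choices.

```python
import numpy as np

def train_som(X, grid=(5, 5), epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.normal(size=(rows * cols, X.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)            # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5  # shrinking neighborhood
        for x in rng.permutation(X):
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))   # neighborhood kernel
            W += lr * h[:, None] * (x - W)       # pull units toward x
    return W, coords
```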

334 citations


Proceedings Article
04 Aug 2001
TL;DR: This work proposes a general class of models for classification and clustering in relational domains that capture probabilistic dependencies between related instances, and shows how to learn such models efficiently from data.
Abstract: Supervised and unsupervised learning methods have traditionally focused on data consisting of independent instances of a single type. However, many real-world domains are best described by relational models in which instances of multiple types are related to each other in complex ways. For example, in a scientific paper domain, papers are related to each other via citation, and are also related to their authors. In this case, the label of one entity (e.g., the topic of the paper) is often correlated with the labels of related entities. We propose a general class of models for classification and clustering in relational domains that capture probabilistic dependencies between related instances. We show how to learn such models efficiently from data. We present empirical results on two real world data sets. Our experiments in a transductive classification setting indicate that accuracy can be significantly improved by modeling relational dependencies. Our algorithm automatically induces a very natural behavior, where our knowledge about one instance helps us classify related ones, which in turn help us classify others. In an unsupervised setting, our models produced coherent clusters with a very natural interpretation, even for instance types that do not have any attributes.

292 citations


Proceedings ArticleDOI
12 Jun 2001
TL;DR: An overview of the research in real-time data mining-based intrusion detection systems (IDSs) and an architecture consisting of sensors, detectors, a data warehouse, and model generation components is presented that improves the efficiency and scalability of the IDS.
Abstract: We present an overview of our research in real-time data mining-based intrusion detection systems (IDSs). We focus on issues related to deploying a data mining-based IDS in a real-time environment. We describe our approaches to address three types of issues: accuracy, efficiency, and usability. To improve accuracy, data mining programs are used to analyze audit data and extract features that can distinguish normal activities from intrusions; we use artificial anomalies along with normal and/or intrusion data to produce more effective misuse and anomaly detection models. To improve efficiency, the computational costs of features are analyzed and a multiple-model cost-based approach is used to produce detection models with low cost and high accuracy. We also present a distributed architecture for evaluating cost-sensitive models in real time. To improve usability, adaptive learning algorithms are used to facilitate model construction and incremental updates; unsupervised anomaly detection algorithms are used to reduce the reliance on labeled data. We also present an architecture consisting of sensors, detectors, a data warehouse, and model generation components. This architecture facilitates the sharing and storage of audit data and the distribution of new or updated models. This architecture also improves the efficiency and scalability of the IDS.

Journal ArticleDOI
TL;DR: This paper proposes a hierarchical reinforcement learning architecture that realizes practical learning speed in real hardware control tasks and applies it to a three-link, two-joint robot for the task of learning to stand up by trial and error.

Proceedings Article
04 Aug 2001
TL;DR: Experimental results show that active learning can substantially reduce the number of observations required to determine the structure of a domain.
Abstract: The task of causal structure discovery from empirical data is a fundamental problem in many areas. Experimental data is crucial for accomplishing this task. However, experiments are typically expensive, and must be selected with great care. This paper uses active learning to determine the experiments that are most informative towards uncovering the underlying structure. We formalize the causal learning task as that of learning the structure of a causal Bayesian network. We consider an active learner that is allowed to conduct experiments, where it intervenes in the domain by setting the values of certain variables. We provide a theoretical framework for the active learning problem, and an algorithm that actively chooses the experiments to perform based on the model learned so far. Experimental results show that active learning can substantially reduce the number of observations required to determine the structure of a domain.

Proceedings Article
03 Jan 2001
TL;DR: The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.
Abstract: High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the "global coordination" of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model's parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.

Proceedings Article
28 Jun 2001
TL;DR: This paper proposes the use of probabilistic models not only for the attributes in a relational model, but for the relational structure itself, and proposes two mechanisms for modeling structural uncertainty: reference uncertainty and existence uncertainty.
Abstract: Most real-world data is stored in relational form. In contrast, most statistical learning methods work with “flat” data representations, forcing us to convert our data into a form that loses much of the relational structure. The recently introduced framework of probabilistic relational models (PRMs) allows us to represent probabilistic models over multiple entities that utilize the relations between them. In this paper, we propose the use of probabilistic models not only for the attributes in a relational model, but for the relational structure itself. We propose two mechanisms for modeling structural uncertainty: reference uncertainty and existence uncertainty. We describe the appropriate conditions for using each model and present learning algorithms for each. We present experimental results showing that the learned models can be used to predict relational structure and, moreover, the observed relational structure can be used to provide better predictions for the attributes in the model.

Patent
Biebesheimer D., Donn P. Jasura, Keller N., Daniel Oblinger, Mark Podlaseck, Stephen J. Rolando
07 Feb 2001
TL;DR: In this paper, a customer self-service system and method for performing resource search and selection is presented, which includes steps of providing an interface enabling entry of a query for a resource and specification of one or more user context elements, each element representing a context associated with the current user state and having context attributes and attribute values associated therewith.
Abstract: A customer self service system and method for performing resource search and selection. The method includes steps of providing an interface enabling entry of a query for a resource and specification of one or more user context elements, each element representing a context associated with the current user state and having context attributes and attribute values associated therewith; enabling user specification of relevant resource selection criteria for enabling expression of relevance of resource results in terms of user context; searching a resource database and generating a resource response set having resources that best match a user's query, user context attributes and user defined relevant resource selection criteria; presenting said resource response set to the user in a manner whereby a relevance of each of the resources being expressed in terms of user context in a manner optimized to facilitate resource selection; and, enabling continued user selection and modification of context attribute values to enable increased specificity and accuracy of a user's query to thereby result in improved selection logic and attainment of resource response sets best fitted to the query. More particularly, adaptive algorithms and supervised and unsupervised learning sub-processes are implemented to enable the self service resource search and selection system to learn from each and all users and make that learning operationally benefit all users over time.

Proceedings Article
01 Jan 2001
TL;DR: This study examines the learning behavior of co-training on natural language processing tasks that typically require large numbers of training instances to achieve usable performance levels, and proposes a moderately supervised variant of co-training in which a human corrects the mistakes made during automatic labeling.
Abstract: Co-training is a weakly supervised learning paradigm in which the redundancy of the learning task is captured by training two classifiers using separate views of the same data. This enables bootstrapping from a small set of labeled training data via a large set of unlabeled data. This study examines the learning behavior of co-training on natural language processing tasks that typically require large numbers of training instances to achieve usable performance levels. Using base noun phrase bracketing as a case study, we find that co-training reduces by 36% the difference in error between co-trained classifiers and fully supervised classifiers trained on a labeled version of all available data. However, degradation in the quality of the bootstrapped data arises as an obstacle to further improvement. To address this, we propose a moderately supervised variant of co-training in which a human corrects the mistakes made during automatic labeling. Our analysis suggests that corrected co-training and similar moderately supervised methods may help co-training scale to large natural language learning tasks.
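
A schematic of the basic co-training loop, before the human-in-the-loop correction the paper proposes, is sketched below; the Gaussian naive Bayes learners and selection sizes are illustrative assumptions, not the paper's setup. Corrected co-training would insert a human check where the comment indicates.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled_idx, rounds=10, per_round=5):
    """X1, X2: two feature views of the same instances; y holds gold
    labels at labeled_idx and receives pseudo-labels elsewhere."""
    labeled, y = set(labeled_idx), y.copy()
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        L = sorted(labeled)
        c1.fit(X1[L], y[L])
        c2.fit(X2[L], y[L])
        for clf, X in ((c1, X1), (c2, X2)):
            U = [i for i in range(len(y)) if i not in labeled]
            if not U:
                break
            proba = clf.predict_proba(X[U])
            # Move the most confidently labeled instances to the pool.
            # (Corrected co-training would have a human vet these.)
            for j in np.argsort(-proba.max(axis=1))[:per_round]:
                y[U[j]] = clf.classes_[proba[j].argmax()]
                labeled.add(U[j])
    return c1, c2
```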

Patent
10 Aug 2001
TL;DR: In this paper, a method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps (SOM) is provided in which an accuracy of information retrieval is improved by adopting Bayesian SOM for performing a real-time document clustering for relevant documents.
Abstract: A method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps(SOM) is provided in which an accuracy of information retrieval is improved by adopting Bayesian SOM for performing a real-time document clustering for relevant documents in accordance with a degree of semantic similarity between entropy data extracted using entropy value and user profiles and query words given by a user, wherein the Bayesian SOM is a combination of Bayesian statistical technique and Kohonen network that is a type of an unsupervised learning.

Journal ArticleDOI
TL;DR: This work derives expectation-maximization algorithms for self-organizing maps with and without missing values from the link between vector quantization and mixture modeling and compares them with the elastic-net approach.
Abstract: Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive expectation-maximization (EM) algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional data. Several extensions and improvements are discussed. As an illustration we apply a self-organizing map based on a multinomial distribution to market basket analysis.
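
One way to see the link the paper exploits is to write a batch SOM as alternating E- and M-steps, as in the hedged sketch below: hard nearest-unit assignments play the E-step, and neighborhood-smoothed weighted means play the M-step. The paper's derivation, with soft responsibilities and missing-value handling, is more general than this.

```python
import numpy as np

def som_em(X, grid=(5, 5), sigma=1.0, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    # Fixed neighborhood kernel on the map grid.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    H = np.exp(-d2 / (2 * sigma ** 2))
    for _ in range(iters):
        # E-step: assign each point to its best-matching unit.
        bmu = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1).argmin(1)
        R = H[bmu]                      # neighborhood-smoothed responsibilities
        # M-step: responsibility-weighted means for every unit.
        W = (R.T @ X) / (R.sum(0)[:, None] + 1e-12)
    return W, coords
```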

Journal ArticleDOI
TL;DR: By virtue of the RBF neural network, this proposed approach takes advantage of the high learning convergence rate of weights in the hidden layer and output layer, natural unsupervised learning characteristics, modular structure, and universal approximation capability.
Abstract: This paper proposes a novel neural-network approach to blind source separation in nonlinear mixtures. The approach utilizes a radial basis function (RBF) neural network to approximate the inverse of the nonlinear mixing mapping, which is assumed to exist and to be approximable by an RBF network. A contrast function, which consists of the mutual information and partial moments of the outputs of the separation system, is defined to separate the nonlinear mixture. The minimization of the contrast function results in the independence of the outputs with desirable moments such that the original sources are separated properly. Two learning algorithms for the parametric RBF network are developed by using the stochastic gradient descent method and an unsupervised clustering method. By virtue of the RBF neural network, this proposed approach takes advantage of the high learning convergence rate of weights in the hidden layer and output layer, natural unsupervised learning characteristics, modular structure, and universal approximation capability. Simulation results are presented to demonstrate the feasibility, robustness, and computability of the proposed method.

Journal ArticleDOI
TL;DR: Numerical results show that the TNN is very effective in finding the optimal solutions of thresholding methods in an MSE sense and usually outperforms other noise reduction methods.
Abstract: In the paper, a type of thresholding neural network (TNN) is developed for adaptive noise reduction. New types of soft and hard thresholding functions are created to serve as the activation function of the TNN. Unlike the standard thresholding functions, the new thresholding functions are infinitely differentiable. By using the new thresholding functions, some gradient-based learning algorithms become possible or more effective. The optimal solution of the TNN in a mean square error (MSE) sense is discussed. It is proved that there is at most one optimal solution for the soft-thresholding TNN. General optimal performances of both soft and hard thresholding TNNs are analyzed and compared to the linear noise reduction method. Gradient-based adaptive learning algorithms are presented to seek the optimal solution for noise reduction. The algorithms include supervised and unsupervised batch learning as well as supervised and unsupervised stochastic learning. It is indicated that the TNN with the stochastic learning algorithms can be used as a novel nonlinear adaptive filter. It is proved that the stochastic learning algorithm is convergent in a certain statistical sense in ideal conditions. Numerical results show that the TNN is very effective in finding the optimal solutions of thresholding methods in an MSE sense and usually outperforms other noise reduction methods. In particular, it is shown that the TNN-based nonlinear adaptive filtering outperforms the conventional linear adaptive filtering in both optimal solution and learning performance.
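
To give a flavor of the construction, below is a smooth soft-threshold of the general kind described, differentiable everywhere so that the threshold itself can be learned by gradient descent. This particular functional form and the finite-difference gradient step are illustrative assumptions, not the paper's exact functions.

```python
import numpy as np

def smooth_soft_threshold(x, t, lam=0.01):
    """Tends to the standard soft threshold as lam -> 0, but is
    infinitely differentiable, enabling gradient-based learning of t."""
    return x + 0.5 * (np.sqrt((x - t) ** 2 + lam) - np.sqrt((x + t) ** 2 + lam))

def supervised_step(x_noisy, x_clean, t, lr=0.01, lam=0.01, eps=1e-5):
    """One stochastic MSE gradient step on the threshold t
    (central finite difference used for simplicity)."""
    err = lambda tt: np.mean((smooth_soft_threshold(x_noisy, tt, lam) - x_clean) ** 2)
    return t - lr * (err(t + eps) - err(t - eps)) / (2 * eps)
```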

Journal ArticleDOI
TL;DR: The edited volume provides a sample of important works on unsupervised learning that cut across the fields of neural networks, including some of the most influential titles of late.
Abstract: Unsupervised Learning: Foundations of Neural Computation is a collection of 21 papers published in the journal Neural Computation in the 10-year period since its founding in 1989 by Terrence Sejnowski. Neural Computation has become the leading journal of its kind. The editors of the book are Geoffrey Hinton and Terrence Sejnowski, two pioneers in neural networks. The selected papers include some of the most influential titles of late, for example, "What Is the Goal of Sensory Coding?" by David Field and "An Information-Maximization Approach to Blind Separation and Blind Deconvolution" by Anthony Bell and Terrence Sejnowski. The edited volume provides a sample of important works on unsupervised learning, which cut across the fields of neural networks.

Journal ArticleDOI
TL;DR: This paper describes a connectionist unsupervised approach to binary classification and compares its performance to that of its supervised counterpart, MLP, concluding that the autoassociator is more efficient than MLP in multi-modal domains, and more accurate than MLP in multi-modal domains for which the negative class creates a particularly strong need for specialization or the positive class creates a particularly weak need for specialization.
Abstract: Binary classification is typically achieved by supervised learning methods. Nevertheless, it is also possible using unsupervised schemes. This paper describes a connectionist unsupervised approach to binary classification and compares its performance to that of its supervised counterpart. The approach consists of training an autoassociator to reconstruct the positive class of a domain at the output layer. After training, the autoassociator is used for classification, relying on the idea that if the network generalizes to a novel instance, then this instance must be positive, but that if generalization fails, then the instance must be negative. When tested on three real-world domains, the autoassociator proved more accurate at classification than its supervised counterpart, MLP, on two of these domains and as accurate on the third (Japkowicz, Myers, & Gluck, 1995). The paper seeks to generalize these results and concludes that, in addition to learning a concept in the absence of negative examples, 1) autoassociation is more efficient than MLP in multi-modal domains, and 2) it is more accurate than MLP in multi-modal domains for which the negative class creates a particularly strong need for specialization or the positive class creates a particularly weak need for specialization. In multi-modal domains for which the positive class creates a particularly strong need for specialization, on the other hand, MLP is more accurate than autoassociation.
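
A compact sketch of the scheme, assuming scikit-learn's MLPRegressor as the autoassociator and a 95th-percentile error threshold (both illustrative choices): train on positives only, then call an instance positive exactly when its reconstruction error stays under the threshold.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_autoassociator(X_pos, hidden=8, seed=0):
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000,
                       random_state=seed)
    net.fit(X_pos, X_pos)                 # learn to reconstruct positives
    errs = ((net.predict(X_pos) - X_pos) ** 2).mean(axis=1)
    threshold = np.percentile(errs, 95)   # tolerate a little training noise
    return net, threshold

def classify(net, threshold, X):
    errs = ((net.predict(X) - X) ** 2).mean(axis=1)
    return errs <= threshold              # True = positive class
```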

Journal ArticleDOI
TL;DR: A neural-fuzzy technology-based classifier for the recognition of power quality disturbances that adopts neural networks in the architecture of frequency-sensitive competitive learning and learning vector quantization is presented.
Abstract: This paper presents a neural-fuzzy technology-based classifier for the recognition of power quality disturbances. The classifier adopts neural networks in the architecture of frequency sensitive competitive learning and learning vector quantization (LVQ). With given size of codewords, the neural networks are trained to determine the optimal decision boundaries separating different categories of disturbances. To cope with the uncertainties in the involved pattern recognition, the neural network outputs, instead of being taken as the final classification, are used to activate the fuzzy-associative-memory (FAM) recalling for identifying the most possible type that the input waveform may belong to. Furthermore, the input waveforms are preprocessed by the wavelet transform for feature extraction so as to improve the classifier with respect to recognition accuracy and scheme simplicity. Each subband of the transform coefficients is then utilized to recognize the associated disturbances.
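
The LVQ stage can be summarized by the classic LVQ1 update, sketched below under the simplifying assumption of a fixed learning rate; the frequency-sensitive competitive learning used to size and place the codewords, the wavelet preprocessing, and the FAM stage are not shown.

```python
import numpy as np

def train_lvq(X, y, codebook, labels, lr=0.05, epochs=20, seed=0):
    """LVQ1: the winning codeword moves toward inputs of its own class
    and away from inputs of other classes."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w = ((codebook - X[i]) ** 2).sum(axis=1).argmin()
            direction = 1.0 if labels[w] == y[i] else -1.0
            codebook[w] += direction * lr * (X[i] - codebook[w])
    return codebook
```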

Proceedings Article
28 Jun 2001
TL;DR: In this paper, the authors present extensions of k-nearest neighbors (k-NN), Citation-kNN, and the diverse density algorithm for the real-valued setting and study their performance on Boolean and real-valued data.
Abstract: The multiple-instance learning model has received much attention recently with a primary application area being that of drug activity prediction. Most prior work on multiple-instance learning has been for concept learning, yet for drug activity prediction, the label is a real-valued affinity measurement giving the binding strength. We present extensions of k-nearest neighbors (k-NN), Citation-kNN, and the diverse density algorithm for the real-valued setting and study their performance on Boolean and real-valued data. We also provide a method for generating chemically realistic artificial data.
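
To ground the k-NN extension, here is a hedged sketch of real-valued prediction over bags using the minimal Hausdorff distance between bags; Citation-kNN's use of citers and the diverse density extension are omitted.

```python
import numpy as np

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: the smallest point-to-point distance
    between two bags (arrays of shape [n_instances, n_features])."""
    d2 = ((bag_a[:, None, :] - bag_b[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min()))

def knn_bag_predict(train_bags, train_labels, query_bag, k=3):
    """Real-valued multiple-instance prediction: mean label of the
    k nearest bags."""
    dists = np.array([min_hausdorff(query_bag, b) for b in train_bags])
    nearest = np.argsort(dists)[:k]
    return float(np.asarray(train_labels)[nearest].mean())
```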

Book
01 Jan 2001
TL;DR: Face Image Analysis by Unsupervised Learning explores adaptive approaches to image analysis that have roots in biological vision and/or learn about the image structure directly from the image ensemble.
Abstract: From the Publisher: "Face Image Analysis by Unsupervised Learning explores adaptive approaches to image analysis. It draws upon principles of unsupervised learning and information theory to adapt processing to the immediate task environment. In contrast to more traditional approaches to image analysis, in which relevant structure is determined in advance and extracted using hand-engineered techniques, Face Image Analysis by Unsupervised Learning explores methods that have roots in biological vision and/or learn about the image structure directly from the image ensemble. Particular attention is paid to unsupervised learning techniques for encoding the statistical dependencies in the image ensemble." "Face Image Analysis by Unsupervised Learning is suitable as a secondary text for a graduate-level course and as a reference for researchers and practitioners in industry."--BOOK JACKET

Journal ArticleDOI
01 Jun 2001
TL;DR: It is argued that the reward-penalty and reward-inaction learning paradigms, in conjunction with the continuous and discrete models of computation, lead to four versions of pursuit learning automata; the paper proves the ε-optimality of the newly introduced algorithms and presents a quantitative comparison between them.
Abstract: A learning automaton (LA) is an automaton that interacts with a random environment, having as its goal the task of learning the optimal action based on its acquired experience. Many learning automata (LAs) have been proposed, with the class of estimator algorithms being among the fastest ones. Thathachar and Sastry, through the pursuit algorithm, introduced the concept of learning algorithms that pursue the current optimal action, following a reward-penalty learning philosophy. Later, Oommen and Lanctot extended the pursuit algorithm into the discretized world by presenting the discretized pursuit algorithm, based on a reward-inaction learning philosophy. In this paper we argue that the reward-penalty and reward-inaction learning paradigms, in conjunction with the continuous and discrete models of computation, lead to four versions of pursuit learning automata. We contend that a scheme that merges the pursuit concept with the most recent response of the environment permits the algorithm to utilize the LA's long-term and short-term perspectives of the environment. In this paper, we present all four resultant pursuit algorithms, prove the ε-optimality of the newly introduced algorithms, and present a quantitative comparison between them.
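
The common core of the four variants is the pursuit update, sketched below in a continuous reward-inaction form (the probability vector moves only on reward); the reward-penalty and discretized variants change when and by how much it moves. Parameter names are illustrative.

```python
import numpy as np

def pursuit_step(p, est_reward, counts, action, reward, lam=0.01):
    """One continuous pursuit update; `reward` is 1 (reward) or 0 (penalty)."""
    counts[action] += 1
    # Running estimate of each action's reward probability.
    est_reward[action] += (reward - est_reward[action]) / counts[action]
    if reward == 1:  # reward-inaction: move only when rewarded
        e_best = np.zeros_like(p)
        e_best[int(np.argmax(est_reward))] = 1.0
        p = (1.0 - lam) * p + lam * e_best  # pursue the best-estimated action
    return p / p.sum()
```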

Journal ArticleDOI
TL;DR: It is argued that cerebellar motor learning is enhanced by a sparse code that simultaneously maximizes information transfer between mossy fibers and granule cells, minimizes redundancies between granule cell discharges, and re-codes the mossy fiber inputs with an adaptive resolution such that inputs corresponding to large errors are finely encoded.

Proceedings ArticleDOI
Kenji Yamanishi, Jun'ichi Takeuchi
26 Aug 2001
TL;DR: Applying this framework to network intrusion detection, it is demonstrated that it can significantly improve the accuracy of SmartSifter, and that outlier filtering rules can help the user discover a general pattern of an outlier group.
Abstract: This paper is concerned with the problem of detecting outliers from unlabeled data. In prior work we have developed SmartSifter, which is an on-line outlier detection algorithm based on unsupervised learning from data. On the basis of SmartSifter this paper yields a new framework for outlier filtering using both supervised and unsupervised learning techniques iteratively in order to make the detection process more effective and more understandable. The outline of the framework is as follows: In the first round, for an initial dataset, we run SmartSifter to give each data item a score, with a high score indicating a high possibility of being an outlier. Next, giving positive labels to a number of higher-scored data and negative labels to a number of lower-scored data, we create labeled examples. Then we construct an outlier filtering rule by supervised learning from them. Here the rule is generated based on the principle of minimizing extended stochastic complexity. In the second round, for a new dataset, we filter the data using the constructed rule, then among the filtered data, we run SmartSifter again to evaluate the data in order to update the filtering rule. Applying our framework to network intrusion detection, we demonstrate that 1) it can significantly improve the accuracy of SmartSifter, and 2) outlier filtering rules can help the user to discover a general pattern of an outlier group.
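
The first round of the framework can be schematized as follows, with stand-ins for both learners: `score` abstracts SmartSifter's unsupervised outlier score, and a shallow decision tree stands in for the supervised filtering rule (the paper instead minimizes extended stochastic complexity). Both stand-ins are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def round_of_filtering(X, score, k=50):
    """One round: score all data, label the extremes, learn a filter rule."""
    s = score(X)
    order = np.argsort(-s)
    hi, lo = order[:k], order[-k:]        # top scores -> outlier labels
    X_lab = np.vstack([X[hi], X[lo]])
    y_lab = np.array([1] * k + [0] * k)   # 1 = outlier, 0 = normal
    rule = DecisionTreeClassifier(max_depth=3).fit(X_lab, y_lab)
    return rule  # apply to the next dataset, rescore survivors, repeat
```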

Journal ArticleDOI
TL;DR: An algorithm for finding principal manifolds that can be regularized in a variety of ways is proposed, and bounds on the covering numbers are given which allow us to obtain nearly optimal learning rates for certain types of regularization operators.
Abstract: Many settings of unsupervised learning can be viewed as quantization problems: the minimization of the expected quantization error subject to some restrictions. This allows the use of tools such as regularization from the theory of (supervised) risk minimization for unsupervised learning. This setting turns out to be closely related to principal curves, the generative topographic map, and robust coding. We explore this connection in two ways: (1) we propose an algorithm for finding principal manifolds that can be regularized in a variety of ways; and (2) we derive uniform convergence bounds and hence bounds on the learning rates of the algorithm. In particular, we give bounds on the covering numbers which allow us to obtain nearly optimal learning rates for certain types of regularization operators. Experimental results demonstrate the feasibility of the approach.
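
As one concrete instance of quantization-plus-regularization, the sketch below fits a principal curve by alternating nearest-codeword assignment with a codebook solve penalized by squared second differences along the chain of codewords. The quadratic penalty and PCA initialization are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def principal_curve(X, m=20, lam=5.0, iters=30):
    """Regularized vector quantization for a 1-D principal manifold."""
    # Initialize m codewords along the first principal component.
    mu = X.mean(0)
    u = np.linalg.svd(X - mu, full_matrices=False)[2][0]
    q = np.quantile((X - mu) @ u, np.linspace(0, 1, m))
    W = mu + q[:, None] * u
    # Second-difference operator: penalizes curvature of the codeword chain.
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    P = lam * D.T @ D + 1e-9 * np.eye(m)
    for _ in range(iters):
        assign = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1).argmin(1)
        counts = np.bincount(assign, minlength=m).astype(float)
        S = np.zeros_like(W)
        np.add.at(S, assign, X)
        # Regularized update: (diag(counts) + P) W = sum of assigned points.
        W = np.linalg.solve(np.diag(counts) + P, S)
    return W
```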