
Showing papers on "Unsupervised learning published in 2007"


Journal ArticleDOI
TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in very mathematical fashion and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

18,802 citations


Journal Article
TL;DR: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features, and the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown.
Abstract: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques. Of course, a single chapter cannot be a complete review of all supervised machine learning classification algorithms (also known as induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored.

2,535 citations


Proceedings ArticleDOI
Rajat Raina1, Alexis Battle1, Honglak Lee1, Benjamin Packer1, Andrew Y. Ng1 
20 Jun 2007
TL;DR: An approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data to form a succinct input representation and significantly improve classification performance.
Abstract: We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
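
To make the recipe concrete, here is a minimal sketch of the pipeline the abstract describes, with scikit-learn's DictionaryLearning and LinearSVC standing in for the authors' sparse-coding solver and Fisher-kernel SVM; the arrays and parameter values are synthetic placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
unlabeled_X = rng.standard_normal((500, 64))   # stand-in for unlabeled patches
labeled_X = rng.standard_normal((100, 64))     # stand-in for labeled examples
labeled_y = rng.integers(0, 2, size=100)

# 1) Learn a sparse-coding basis from the unlabeled data only.
dico = DictionaryLearning(n_components=32, transform_algorithm="lasso_lars",
                          transform_alpha=0.1, max_iter=20, random_state=0)
dico.fit(unlabeled_X)

# 2) Encode the labeled data as sparse activations over that basis
#    (the "higher-level features" of the abstract).
features = dico.transform(labeled_X)

# 3) Train an ordinary supervised classifier on those features.
clf = LinearSVC().fit(features, labeled_y)
print("training accuracy:", clf.score(features, labeled_y))
```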

1,731 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: An unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions that alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.
Abstract: We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a feature-pooling layer that computes the max of each filter output within adjacent windows, and a point-wise sigmoid non-linearity. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64% error on MNIST, and 54% average recognition rate on Caltech 101 with 30 training samples per category. While the resulting architecture is similar to convolutional networks, the layer-wise unsupervised training procedure alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.
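
As a rough illustration (not the authors' code), one stage of the feature extractor described above can be sketched as convolution filters, max-pooling over adjacent windows, and a point-wise sigmoid; the filter values here are random stand-ins for what the unsupervised procedure would learn.

```python
import numpy as np
from scipy.signal import correlate2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_stage(image, filters, pool=2):
    maps = []
    for f in filters:
        conv = correlate2d(image, f, mode="valid")            # filtering step
        h, w = conv.shape
        h, w = h - h % pool, w - w % pool                      # crop to the pooling grid
        pooled = conv[:h, :w].reshape(h // pool, pool,
                                      w // pool, pool).max(axis=(1, 3))   # max-pooling
        maps.append(sigmoid(pooled))                           # point-wise non-linearity
    return np.stack(maps)

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))                          # an MNIST-sized input
filters = 0.1 * rng.standard_normal((8, 5, 5))                 # placeholder filters
features = feature_stage(image, filters)
print(features.shape)   # (8, 12, 12); a second stage would be trained on such maps
```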

1,232 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: A series of experiments indicate that these models with deep architectures show promise in solving harder learning problems that exhibit many factors of variation.
Abstract: Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine learning algorithms already report reasonable results. Here, we present a series of experiments which indicate that these models show promise in solving harder learning problems that exhibit many factors of variation. These models are compared with well-established algorithms such as Support Vector Machines and single hidden-layer feed-forward neural networks.

1,122 citations


Proceedings Article
03 Dec 2007
TL;DR: An unsupervised learning model is presented that faithfully mimics certain properties of visual area V2, and the encoding of these more complex "corner" features matches well with the results from Ito and Komatsu's study of biological V2 responses, suggesting that this sparse variant of deep belief networks holds promise for modeling higher-order features.
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or "deep," structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both colinear ("contour") features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex "corner" features matches well with the results from Ito and Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.
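
The layer-wise building block behind this model is a restricted Boltzmann machine with an added sparsity penalty. The toy numpy sketch below shows that mechanism (CD-1 updates plus a nudge of each hidden unit's activation toward a small target rate); it is a single-layer stand-in for illustration, not the authors' implementation, and all sizes and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid = 64, 32
lr, sparsity_target, sparsity_cost = 0.05, 0.05, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
data = (rng.random((200, n_vis)) < 0.2).astype(float)   # stand-in binary patches

for epoch in range(10):
    for v0 in data:
        # CD-1: up, sample, down, up again.
        p_h0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(n_hid) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        p_h1 = sigmoid(p_v1 @ W + b_hid)
        # Standard contrastive-divergence gradient step.
        W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
        b_vis += lr * (v0 - p_v1)
        b_hid += lr * (p_h0 - p_h1)
        # Sparsity penalty: push hidden activations toward the target rate.
        b_hid += lr * sparsity_cost * (sparsity_target - p_h0)
        W += lr * sparsity_cost * np.outer(v0, sparsity_target - p_h0)
```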

1,048 citations


Journal ArticleDOI
TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.
Abstract: A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density is estimated for each image, and the mixtures associated with all images annotated with a common semantic label are pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.
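
A simplified sketch of the class-density idea follows, using one flat GaussianMixture per semantic label; the paper instead estimates a mixture per image and pools them with a hierarchical EM, a step skipped here. The training dictionary and label names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical training data: for each semantic label, a list of images,
# each image represented as a bag of localized feature vectors (rows).
train = {
    "sky":   [rng.standard_normal((50, 8)) + 2.0 for _ in range(5)],
    "grass": [rng.standard_normal((50, 8)) - 2.0 for _ in range(5)],
}

# Pool the feature vectors of all images sharing a label; fit one class density.
class_models = {}
for label, bags in train.items():
    class_models[label] = GaussianMixture(n_components=3, random_state=0).fit(np.vstack(bags))

# Annotation: pick the label whose density gives the query's bag of features the
# highest total log-likelihood (a minimum-probability-of-error rule under equal priors).
query = rng.standard_normal((50, 8)) + 2.0
scores = {lab: gm.score_samples(query).sum() for lab, gm in class_models.items()}
print(max(scores, key=scores.get))   # expected: "sky"
```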

962 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory, and shows that existing powerful algorithms such as ReliefF and Laplacian Score are special cases of the proposed framework.
Abstract: Feature selection aims to reduce dimensionality for building comprehensible learning models with good generalization performance. Feature selection algorithms are largely studied separately according to the type of learning: supervised or unsupervised. This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. The proposed framework is able to generate families of algorithms for both supervised and unsupervised feature selection. We show that existing powerful algorithms such as ReliefF (supervised) and Laplacian Score (unsupervised) are special cases of the proposed framework. To the best of our knowledge, this work is the first attempt to unify supervised and unsupervised feature selection, and enable their joint study under a general framework. Experiments demonstrate the efficacy of the novel algorithms derived from the framework.
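
As one concrete instance of the framework's unsupervised side, the Laplacian Score can be computed directly from a neighborhood graph; the sketch below is a compact numpy/scikit-learn rendering under assumed kernel and neighborhood settings, with lower scores indicating features that better respect the local graph structure.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_score(X, n_neighbors=5, t=1.0):
    # Symmetric kNN affinity graph with heat-kernel weights.
    dist = kneighbors_graph(X, n_neighbors, mode="distance", include_self=False)
    dist = dist.maximum(dist.T).toarray()
    S = np.where(dist > 0, np.exp(-dist ** 2 / t), 0.0)
    D = S.sum(axis=1)                          # degree of each sample
    L = np.diag(D) - S                         # graph Laplacian
    scores = []
    for f in X.T:
        f_tilde = f - (f @ D) / D.sum()        # remove the trivial constant component
        scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ (D * f_tilde)))
    return np.array(scores)

rng = np.random.default_rng(0)
structured = np.repeat([[0.0], [5.0]], 25, axis=0) + 0.1 * rng.standard_normal((50, 1))
X = np.hstack([structured, rng.standard_normal((50, 3))])   # feature 0 carries the structure
print(laplacian_score(X).round(3))                          # feature 0 is expected to score lowest
```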

857 citations


Proceedings Article
03 Dec 2007
TL;DR: This work proposes a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation, and describes a novel and efficient algorithm to learn sparse representations.
Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured.

852 citations


Proceedings Article
06 Jan 2007
TL;DR: This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
Abstract: Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we show how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions. We present efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions. Experimental results show strong improvement for our methods over previous heuristic-based approaches.

663 citations


Journal ArticleDOI
TL;DR: The framework of multiobjective optimization is used to tackle the unsupervised learning problem, data clustering, following a formulation first proposed in the statistics literature and an evolutionary approach to the problem is developed.
Abstract: The framework of multiobjective optimization is used to tackle the unsupervised learning problem, data clustering, following a formulation first proposed in the statistics literature. The conceptual advantages of the multiobjective formulation are discussed and an evolutionary approach to the problem is developed. The resulting algorithm, multiobjective clustering with automatic k-determination, is compared with a number of well-established single-objective clustering algorithms, a modern ensemble technique, and two methods of model selection. The experiments demonstrate that the conceptual advantages of multiobjective clustering translate into practical and scalable performance benefits.


Journal ArticleDOI
TL;DR: This tutorial discusses the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction based on models derived from existing data, covering supervised and unsupervised learning and examples implemented in R, the open source data analysis and visualization language.
Abstract: The term machine learning refers to a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. Two facets of mechanization should be acknowledged when considering machine learning in broad terms. Firstly, it is intended that the classification and prediction tasks can be accomplished by a suitably programmed computing machine. That is, the product of machine learning is a classifier that can be feasibly used on available hardware. Secondly, it is intended that the creation of the classifier should itself be highly mechanized, and should not involve too much human input. This second facet is inevitably vague, but the basic objective is that the use of automatic algorithm construction methods can minimize the possibility that human biases could affect the selection and performance of the algorithm. Both the creation of the algorithm and its operation to classify objects or predict events are to be based on concrete, observable data. The history of relations between biology and the field of machine learning is long and complex. An early technique [1] for machine learning called the perceptron constituted an attempt to model actual neuronal behavior, and the field of artificial neural network (ANN) design emerged from this attempt. Early work on the analysis of translation initiation sequences [2] employed the perceptron to define criteria for start sites in Escherichia coli. Further artificial neural network architectures such as the adaptive resonance theory (ART) [3] and neocognitron [4] were inspired from the organization of the visual nervous system. In the intervening years, the flexibility of machine learning techniques has grown along with mathematical frameworks for measuring their reliability, and it is natural to hope that machine learning methods will improve the efficiency of discovery and understanding in the mounting volume and complexity of biological data. This tutorial is structured in four main components. Firstly, a brief section reviews definitions and mathematical prerequisites. Secondly, the field of supervised learning is described. Thirdly, methods of unsupervised learning are reviewed. Finally, a section reviews methods and examples as implemented in the open source data analysis and visualization language R (http://www.r-project.org).

Journal ArticleDOI
TL;DR: A new SVM approach is proposed, named Enhanced SVM, which combines these two methods in order to provide unsupervised learning and low false alarm capability, similar to that of a supervised S VM approach.

Proceedings ArticleDOI
06 Nov 2007
TL;DR: It is demonstrated that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes, and an efficient way of selecting informative instances from a smaller pool of samples is proposed that does not necessitate a search through the entire dataset.
Abstract: This paper is concerned with the class imbalance problem, which has been known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning, which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that, with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
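
A hedged sketch of that selection scheme follows: each query is the unlabeled instance closest to the current SVM hyperplane within a small random candidate pool, so the full dataset is never scanned. The dataset, pool size, and query budget are illustrative choices, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Small seed set containing both classes; the rest is treated as unlabeled.
labeled = list(rng.choice(np.where(y == 0)[0], 10, replace=False)) + \
          list(rng.choice(np.where(y == 1)[0], 10, replace=False))
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

pool_size, n_queries = 50, 100
clf = LinearSVC()
for _ in range(n_queries):
    clf.fit(X[labeled], y[labeled])
    pool = rng.choice(unlabeled, size=pool_size, replace=False)   # small random pool
    margins = np.abs(clf.decision_function(X[pool]))              # distance to hyperplane
    pick = int(pool[np.argmin(margins)])                          # most uncertain instance
    labeled.append(pick)
    unlabeled.remove(pick)

# The queried set tends to be considerably more balanced than the 5% base rate.
print("minority fraction among labeled points:", float(np.mean(y[labeled])))
```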

Proceedings ArticleDOI
26 Dec 2007
TL;DR: The alignment method improves performance on a face recognition task, both over unaligned images and over images aligned with a face alignment algorithm specifically developed for and trained on hand-labeled face images.
Abstract: Many recognition algorithms depend on careful positioning of an object into a canonical pose, so the position of features relative to a fixed coordinate system can be examined. Currently, this positioning is done either manually or by training a class-specialized learning algorithm with samples of the class that have been hand-labeled with parts or poses. In this paper, we describe a novel method to achieve this positioning using poorly aligned examples of a class with no additional labeling. Given a set of unaligned exemplars of a class, such as faces, we automatically build an alignment mechanism, without any additional labeling of parts or poses in the data set. Using this alignment mechanism, new members of the class, such as faces resulting from a face detector, can be precisely aligned for the recognition process. Our alignment method improves performance on a face recognition task, both over unaligned images and over images aligned with a face alignment algorithm specifically developed for and trained on hand-labeled face images. We also demonstrate its use on an entirely different class of objects (cars), again without providing any information about parts or pose to the learning algorithm.

Journal ArticleDOI
TL;DR: Morfessor can handle highly inflecting and compounding languages where words can consist of lengthy sequences of morphemes and is shown to perform very well compared to a widely known benchmark algorithm on Finnish data.
Abstract: We present a model family called Morfessor for the unsupervised induction of a simple morphology from raw text data. The model is formulated in a probabilistic maximum a posteriori framework. Morfessor can handle highly inflecting and compounding languages where words can consist of lengthy sequences of morphemes. A lexicon of word segments, called morphs, is induced from the data. The lexicon stores information about both the usage and form of the morphs. Several instances of the model are evaluated quantitatively in a morpheme segmentation task on different sized sets of Finnish as well as English data. Morfessor is shown to perform very well compared to a widely known benchmark algorithm, in particular on Finnish data.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel unsupervised learning framework for activity perception to understand activities in complicated scenes from visual data using a hierarchical Bayesian model to connect three elements: low-level visual features, simple "atomic" activities, and multi-agent interactions.
Abstract: We propose a novel unsupervised learning framework for activity perception. To understand activities in complicated scenes from visual data, we propose a hierarchical Bayesian model to connect three elements: low-level visual features, simple "atomic" activities, and multi-agent interactions. Atomic activities are modeled as distributions over low-level visual features, and interactions are modeled as distributions over atomic activities. Our models improve existing language models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) by modeling interactions without supervision. Our data sets are challenging video sequences from crowded traffic scenes with many kinds of activities co-occurring. Our approach provides a summary of typical atomic activities and interactions in the scene. Unusual activities and interactions are found, with natural probabilistic explanations. Our method supports flexible high-level queries on activities and interactions using atomic activities as components.
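
As a bare-bones stand-in for part of this pipeline, plain LDA can be run on bag-of-"visual-word" counts, with each short clip treated as a document, so that co-occurring low-level features group into candidate atomic activities. The paper's hierarchical model goes further (it models interactions and does not fix the number of topics), and the count matrix below is synthetic.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_clips = 200
# Synthetic clip-by-word counts: two hidden "activities", each a distribution
# concentrated on a different half of a 50-word visual vocabulary.
topic_a = rng.dirichlet(np.r_[np.full(25, 5.0), np.full(25, 0.1)])
topic_b = rng.dirichlet(np.r_[np.full(25, 0.1), np.full(25, 5.0)])
counts = np.array([rng.multinomial(100, topic_a if rng.random() < 0.5 else topic_b)
                   for _ in range(n_clips)])

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
# Top visual words per recovered topic; each row should favour one vocabulary half.
print(lda.components_.argsort(axis=1)[:, -5:])
```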

Proceedings ArticleDOI
26 Dec 2007
TL;DR: Non-metric similarities between pairs of images by matching SIFT features are derived and affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
Abstract: Unsupervised categorization of images or image parts is often needed for image and video summarization or as a preprocessing step in supervised methods for classification, tracking and segmentation. While many metric-based techniques have been applied to this problem in the vision community, often, the most natural measures of similarity (e.g., number of matching SIFT features) between pairs of images or image parts are non-metric. Unsupervised categorization by identifying a subset of representative exemplars can be efficiently performed with the recently proposed 'affinity propagation' algorithm. In contrast to k-centers clustering, which iteratively refines an initial randomly-chosen set of exemplars, affinity propagation simultaneously considers all data points as potential exemplars and iteratively exchanges messages between data points until a good solution emerges. When applied to the Olivetti face data set using a translation-invariant non-metric similarity, affinity propagation achieves a much lower reconstruction error and nearly halves the classification error rate, compared to state-of-the-art techniques. For the more challenging problem of unsupervised categorization of images from the Caltech101 data set, we derived non-metric similarities between pairs of images by matching SIFT features. Affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
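
The clustering step can be reproduced in miniature with scikit-learn's AffinityPropagation on a precomputed similarity matrix; in the paper those similarities would be non-metric scores such as the number of matching SIFT features between two images, whereas the matrix below is a synthetic stand-in with block structure.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
n = 30
groups = np.repeat([0, 1, 2], n // 3)            # hidden categories
# Higher similarity for pairs from the same hidden category, plus noise;
# the matrix need not be metric or even symmetric.
sim = np.where(groups[:, None] == groups[None, :], 10.0, 1.0) + rng.random((n, n))

ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(sim)
print("exemplar indices:", ap.cluster_centers_indices_)
print("cluster labels:  ", labels)
```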

Proceedings Article
01 Jun 2007
TL;DR: This model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE.
Abstract: Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.

Book ChapterDOI
TL;DR: This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.
Abstract: The uniformity of the cortical architecture and the ability of functions to move to different areas of cortex following early damage strongly suggest that there is a single basic learning algorithm for extracting underlying structure from richly structured, high-dimensional sensory data. There have been many attempts to design such an algorithm, but until recently they all suffered from serious computational weaknesses. This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.

Book
12 Oct 2007
TL;DR: This book discusses character recognition, its evolution and development, and the techniques used to achieve it, including Bayes decision theory as well as newer methods based on attributed graph matching.
Abstract: Figures. List of Tables. Preface. Acknowledgments. Acronyms. 1. Introduction: Character Recognition, Evolution and Development. 1.1 Generation and Recognition of Characters. 1.2 History of OCR. 1.3 Development of New Techniques. 1.4 Recent Trends and Movements. 1.5 Organization of the Remaining Chapters. References. 2. Tools for Image Pre-Processing. 2.1 Generic Form Processing System. 2.2 A Stroke Model for Complex Background Elimination. 2.2.1 Global Gray Level Thresholding. 2.2.2 Local Gray Level Thresholding. 2.2.3 Local Feature Thresholding-Stroke Based Model. 2.2.4 Choosing the Most Efficient Character Extraction Method. 2.2.5 Cleaning up Form Items Using Stroke Based Model. 2.3 A Scale-Space Approach for Visual Data Extraction. 2.3.1 Image Regularization. 2.3.2 Data Extraction. 2.3.3 Concluding Remarks. 2.4 Data Pre-Processing. 2.4.1 Smoothing and Noise Removal. 2.4.2 Skew Detection and Correction. 2.4.3 Slant Correction. 2.4.4 Character Normalization. 2.4.5 Contour Tracing/Analysis. 2.4.6 Thinning. 2.5 Chapter Summary. References 72. 3. Feature Extraction, Selection and Creation. 3.1 Feature Extraction. 3.1.1 Moments. 3.1.2 Histogram. 3.1.3 Direction Features. 3.1.4 Image Registration. 3.1.5 Hough Transform. 3.1.6 Line-Based Representation. 3.1.7 Fourier Descriptors. 3.1.8 Shape Approximation. 3.1.9 Topological Features. 3.1.10 Linear Transforms. 3.1.11 Kernels. 3.2 Feature Selection for Pattern Classification. 3.2.1 Review of Feature Selection Methods. 3.3 Feature Creation for Pattern Classification. 3.3.1 Categories of Feature Creation. 3.3.2 Review of Feature Creation Methods. 3.3.3 Future Trends. 3.4 Chapter Summary. References. 4. Pattern Classification Methods. 4.1 Overview of Classification Methods. 4.2 Statistical Methods. 4.2.1 Bayes Decision Theory. 4.2.2 Parametric Methods. 4.2.3 Non-ParametricMethods. 4.3 Artificial Neural Networks. 4.3.1 Single-Layer Neural Network. 4.3.2 Multilayer Perceptron. 4.3.3 Radial Basis Function Network. 4.3.4 Polynomial Network. 4.3.5 Unsupervised Learning. 4.3.6 Learning Vector Quantization. 4.4 Support Vector Machines. 4.4.1 Maximal Margin Classifier. 4.4.2 Soft Margin and Kernels. 4.4.3 Implementation Issues. 4.5 Structural Pattern Recognition. 4.5.1 Attributed String Matching. 4.5.2 Attributed Graph Matching. 4.6 Combining Multiple Classifiers. 4.6.1 Problem Formulation. 4.6.2 Combining Discrete Outputs. 4.6.3 Combining Continuous Outputs. 4.6.4 Dynamic Classifier Selection. 4.6.5 Ensemble Generation. 4.7 A Concrete Example. 4.8 Chapter Summary. References. 5. Word and String Recognition. 5.1 Introduction. 5.2 Character Segmentation. 5.2.1 Overview of Dissection Techniques. 5.2.2 Segmentation of Handwritten Digits. 5.3 Classification-Based String Recognition. 5.3.1 String Classification Model. 5.3.2 Classifier Design for String Recognition. 5.3.3 Search Strategies. 5.3.4 Strategies for Large Vocabulary. 5.4 HMM-Based Recognition. 5.4.1 Introduction to HMMs. 5.4.2 Theory and Implementation. 5.4.3 Application of HMMs to Text Recognition. 5.4.4 Implementation Issues. 5.4.5 Techniques for Improving HMMs' Performance. 5.4.6 Summary to HMM-Based Recognition. 5.5 Holistic Methods For Handwritten Word Recognition. 5.5.1 Introduction to Holistic Methods. 5.5.2 Overview of Holistic Methods. 5.5.3 Summary to Holistic Methods. 5.6 Chapter Summary. References. 6. Case Studies. 6.1 Automatically Generating Pattern Recognizers with Evolutionary Computation. 6.1.1 Motivation. 6.1.2 Introduction. 6.1.3 Hunters and Prey. 6.1.4 Genetic Algorithm. 
6.1.5 Experiments. 6.1.6 Analysis. 6.1.7 Future Directions. 6.2 Offline Handwritten Chinese Character Recognition. 6.2.1 Related Works. 6.2.2 System Overview. 6.2.3 Character Normalization. 6.2.4 Direction Feature Extraction. 6.2.5 Classification Methods. 6.2.6 Experiments. 6.2.7 Concluding Remarks. 6.3 Segmentation and Recognition of Handwritten Dates on Canadian Bank Cheques. 6.3.1 Introduction. 6.3.2 System Architecture. 6.3.3 Date Image Segmentation. 6.3.4 Date Image Recognition. 6.3.5 Experimental Results. 6.3.6 Concluding Remarks. References.

Book
23 Aug 2007
TL;DR: Evolving Connectionist Methods for Unsupervised Learning, Feature Selection, Model Creation, and Model Validation and Brain Inspired Evolving Connectionist Models.
Abstract: Evolving Connectionist Methods.- Feature Selection, Model Creation, and Model Validation.- Evolving Connectionist Methods for Unsupervised Learning.- Evolving Connectionist Methods for Supervised Learning.- Brain Inspired Evolving Connectionist Models.- Evolving Neuro-Fuzzy Inference Models.- Population-Generation-Based Methods: Evolutionary Computation.- Evolving Integrated Multimodel Systems.- Evolving Intelligent Systems.- Adaptive Modelling and Knowledge Discovery in Bioinformatics.- Dynamic Modelling of Brain Functions and Cognitive Processes.- Modelling the Emergence of Acoustic Segments in Spoken Languages.- Evolving Intelligent Systems for Adaptive Speech Recognition.- Evolving Intelligent Systems for Adaptive Image Processing.- Evolving Intelligent Systems for Adaptive Multimodal Information Processing.- Evolving Intelligent Systems for Robotics and Decision Support.- What Is Next: Quantum Inspired Evolving Intelligent Systems?.

Proceedings Article
03 Dec 2007
TL;DR: A discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account.
Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can apply a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning approach is more effective than current state-of-the-art batch mode active learning algorithms.

Book
01 Jun 2007
TL;DR: Covers learning and intelligence, machine learning basics, knowledge representation, learning as search, attribute quality measures, data pre-processing, constructive induction, symbolic learning, and statistical learning.
Abstract: Introduction. Learning and intelligence. Machine learning basics. Knowledge representation. Learning as search. Attribute quality measures. Data pre-processing. Constructive induction. Symbolic learning. Statistical learning. Artificial neural networks. Cluster analysis. Learning theory. Computational learning theory. Definitions. References and index.

Journal ArticleDOI
TL;DR: The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances.
Abstract: We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by the acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector with its attributes representing the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving an equal error rate (EER) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which represents one of the best results reported on these popular tasks.
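
A toy sketch of the vector-space backend follows. A real system would first decode each utterance into acoustic segment model (ASM) units; here the decoded unit sequences are fabricated strings so the term-vector construction and discriminative classifier can be shown end to end.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_decoded_utterance(shift):
    # Hypothetical ASM decoding: two "languages" using the same 32-unit
    # inventory with different frequency profiles.
    probs = np.roll(np.linspace(1.0, 3.0, 32), shift)
    units = rng.choice(32, size=200, p=probs / probs.sum())
    return " ".join(f"a{u}" for u in units)

utterances = [fake_decoded_utterance(0) for _ in range(50)] + \
             [fake_decoded_utterance(16) for _ in range(50)]
languages = [0] * 50 + [1] * 50

# Term-vector representation: unit unigram/bigram co-occurrence counts.
vec = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vec.fit_transform(utterances)

# Discriminative vector-space classifier backend.
clf = LinearSVC().fit(X, languages)
print("training accuracy:", clf.score(X, languages))
```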

Journal ArticleDOI
TL;DR: This article compares learning on a complex task with three function approximators, a cerebellar model arithmetic computer (CMAC), an artificial neural network (ANN), and a radial basis function (RBF), and empirically demonstrates that directly transferring the action-value function can lead to a dramatic speedup in learning with all three.
Abstract: Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. This empirical result has motivated the development of many methods that speed up reinforcement learning by modifying a task for the learner or helping the learner better generalize to novel situations. This article focuses on generalizing across tasks, thereby speeding up learning, via a novel form of transfer using handcoded task relationships. We compare learning on a complex task with three function approximators, a cerebellar model arithmetic computer (CMAC), an artificial neural network (ANN), and a radial basis function (RBF), and empirically demonstrate that directly transferring the action-value function can lead to a dramatic speedup in learning with all three. Using transfer via inter-task mapping (TVITM), agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup soccer Keepaway domain. This article contains and extends material published in two conference papers (Taylor and Stone, 2005; Taylor et al., 2005).
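
A toy tabular illustration of transfer via inter-task mapping is sketched below: Q-values learned on a short "source" chain task seed the value table of a longer "target" chain through a hand-coded state mapping, so target learning starts from informed rather than zero estimates. Tabular Q-learning stands in for the CMAC, ANN, and RBF approximators used in the article, and the tasks, mapping, and budgets are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_learn(n_states, episodes, Q=None, alpha=0.5, gamma=0.95, eps=0.1, max_steps=1000):
    # Chain MDP: action 0 moves left, action 1 moves right, reward 1 at the right end.
    Q = np.zeros((n_states, 2)) if Q is None else Q.copy()
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < eps:
                a = int(rng.integers(2))
            else:
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))  # random tie-break
            s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
            if s == n_states - 1:
                break
    return Q

Q_source = q_learn(n_states=5, episodes=200)           # learn the small source task

# Hand-coded inter-task mapping: each target state maps to a proportionally scaled source state.
n_target = 20
mapping = np.arange(n_target) * 4 // (n_target - 1)
Q_init = Q_source[mapping]                              # transferred action-value estimates

Q_scratch = q_learn(n_target, episodes=3)
Q_transfer = q_learn(n_target, episodes=3, Q=Q_init)
# After the same small budget, the transferred table typically prefers the
# correct ("right") action in more states than learning from scratch.
print("greedy-right states, scratch: ", int((Q_scratch.argmax(axis=1) == 1).sum()))
print("greedy-right states, transfer:", int((Q_transfer.argmax(axis=1) == 1).sum()))
```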

Proceedings ArticleDOI
28 Oct 2007
TL;DR: A novel maximum entropy based technique, iterative feature transformation (IFT), is introduced and it is shown how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.
Abstract: The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. While previous work has studied the supervised version of this problem, we study the more challenging case of unsupervised transductive transfer learning, where no labeled data from the target domain are available at training. We describe some current state-of-the-art inductive and transductive approaches and then adapt these models to the problem of transfer learning for protein name extraction. In the process, we introduce a novel maximum entropy based technique, iterative feature transformation (IFT), and show that it achieves comparable performance with state-of-the-art transductive SVMs. We also show how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.

Journal ArticleDOI
TL;DR: An algorithm, based on Expectation–Maximization, is presented here for learning the categories from a sequence of vowel tokens without receiving any category information with each vowel token, or knowing in advance the number of categories to learn, or having access to the entire data ensemble.
Abstract: Infants rapidly learn the sound categories of their native language, even though they do not receive explicit or focused training. Recent research suggests that this learning is due to infants' sensitivity to the distribution of speech sounds and that infant-directed speech contains the distributional information needed to form native-language vowel categories. An algorithm, based on Expectation–Maximization, is presented here for learning the categories from a sequence of vowel tokens without (i) receiving any category information with each vowel token, (ii) knowing in advance the number of categories to learn, or (iii) having access to the entire data ensemble. When exposed to vowel tokens drawn from either English or Japanese infant-directed speech, the algorithm successfully discovered the language-specific vowel categories (/i, ɪ, e, ɛ/ for English; /i, iː, e, eː/ for Japanese). A nonparametric version of the algorithm, closely related to neural network models based on topographic representation and competitive Hebbian learning, also was able to discover the vowel categories, albeit somewhat less reliably. These results reinforce the proposal that native-language speech categories are acquired through distributional learning and that such learning may be instantiated in a biologically plausible manner.
Keywords: language acquisition, speech perception, expectation maximization, online learning.
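
A minimal sketch in the spirit of the distributional-learning algorithm above: an online (one-token-at-a-time) EM update for a Gaussian mixture along a single acoustic dimension. Unlike the paper's algorithm it fixes the number of candidate categories in advance (relying on spurious components losing weight), and the tokens, step-size schedule, and category count are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "vowel tokens": one acoustic dimension (e.g. a formant value)
# drawn from two underlying categories, presented one token at a time.
tokens = rng.permutation(np.concatenate([rng.normal(300, 30, 1000),
                                         rng.normal(700, 40, 1000)]))

K = 5                                       # more candidate categories than truly present
w = np.full(K, 1.0 / K)                     # mixture weights
mu = rng.uniform(tokens.min(), tokens.max(), K)
var = np.full(K, tokens.var())

for t, x in enumerate(tokens):
    # E-step for a single token: responsibilities under the current mixture.
    p = w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = p / p.sum()
    # Online M-step: nudge weights, means, and variances with a decaying step size.
    eta = (t + 5.0) ** -0.7
    w = (1 - eta) * w + eta * r
    mu = mu + eta * r * (x - mu) / np.maximum(w, 1e-8)
    var = var + eta * r * ((x - mu) ** 2 - var) / np.maximum(w, 1e-8)

# Components that keep appreciable weight are expected to drift toward the two
# true category means (about 300 and 700); low-weight components are spurious.
print(np.round(mu), np.round(w, 2))
```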