Showing papers on "Unsupervised learning" published in 1999


Book ChapterDOI
David Heckerman
01 Feb 1999
TL;DR: In this article, the authors discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models, including techniques for learning with incomplete data.
Abstract: A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
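
As a hedged illustration of the parameter-learning idea mentioned in this abstract, the sketch below updates a Dirichlet prior over a single discrete variable with observed counts and returns the posterior mean. The function name, the prior pseudo-counts, and the sample data are assumptions made for illustration; they are not taken from the chapter. Records with a missing value are simply skipped here, which is a simplification and not one of the incomplete-data techniques the chapter describes.

```python
from collections import Counter

def posterior_mean(prior_counts, observations):
    """Posterior mean of a discrete distribution under a Dirichlet prior.

    prior_counts: dict mapping each state to its Dirichlet pseudo-count.
    observations: iterable of observed states; None marks a missing entry.
    """
    # Count only the entries that were actually observed.
    counts = Counter(o for o in observations if o is not None)
    total = sum(prior_counts.values()) + sum(counts[s] for s in prior_counts)
    # Posterior mean: (pseudo-count + observed count) / (sum of both).
    return {s: (prior_counts[s] + counts[s]) / total for s in prior_counts}

# Hypothetical data: three "yes", one "no", one missing entry.
probs = posterior_mean({"yes": 1.0, "no": 1.0}, ["yes", "yes", None, "no", "yes"])
```

With a uniform prior of one pseudo-count per state, the estimate is pulled slightly toward 0.5, which is one simple way a Bayesian treatment guards against overfitting small samples.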

1,329 citations


Book
01 Jun 1999
TL;DR: In unsupervised learning there are no explicit target outputs or environmental evaluations; instead, the learner brings to bear prior biases as to what aspects of the structure of the input should be captured in the output.
Abstract: Unsupervised learning studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. By contrast with supervised learning or reinforcement learning, there are no explicit target outputs or environmental evaluations associated with each input; rather the unsupervised learner brings to bear prior biases as to what aspects of the structure of the input should be captured in the output.

1,290 citations


01 Jan 1999
TL;DR: This paper surveys the existing theory and methods for independent component analysis (ICA), in which the desired representation is the one that minimizes the statistical dependence of the components of the representation.
Abstract: A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research, is finding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the original data. Well-known linear transformation methods include, for example, principal component analysis, factor analysis, and projection pursuit. A recently developed linear transformation method is independent component analysis (ICA), in which the desired representation is the one that minimizes the statistical dependence of the components of the representation. Such a representation seems to capture the essential structure of the data in many applications. In this paper, we survey the existing theory and methods for ICA.

1,231 citations


Book
01 Jun 1999
TL;DR: It is argued that the redundancy of sensory messages provides the knowledge incorporated in the maps or models and a representation whose elements are independent makes it possible to form associations with logical functions of the elements, not just with the elements themselves.
Abstract: What use can the brain make of the massive flow of sensory information that occurs without any associated rewards or punishments? This question is reviewed in the light of connectionist models of unsupervised learning and some older ideas, namely the cognitive maps and working models of Tolman and Craik, and the idea that redundancy is important for understanding perception (Attneave 1954), the physiology of sensory pathways (Barlow 1959), and pattern recognition (Watanabe 1960). It is argued that (1) The redundancy of sensory messages provides the knowledge incorporated in the maps or models. (2) Some of this knowledge can be obtained by observations of mean, variance, and covariance of sensory messages, and perhaps also by a method called minimum entropy coding. (3) Such knowledge may be incorporated in a model of what usually happens with which incoming messages are automatically compared, enabling unexpected discrepancies to be immediately identified. (4) Knowledge of the sort incorporated into such a filter is a necessary prerequisite of ordinary learning, and a representation whose elements are independent makes it possible to form associations with logical functions of the elements, not just with the elements themselves.

1,179 citations


Journal ArticleDOI
TL;DR: This review unifies several unsupervised learning models as variations of a single basic generative model, shows that independent component analysis is also a variation of that model, and introduces a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise.
Abstract: Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.

986 citations


Journal ArticleDOI
TL;DR: This paper investigates how the learning modules specialized for these three kinds of learning can be assembled into goal-oriented behaving systems and presents a novel view that their computational roles can be characterized by asking what are the "goals" of their computation.

734 citations


Journal ArticleDOI
TL;DR: The experimental results show that negative correlation learning can produce neural network ensembles with good generalisation ability.

708 citations


01 Jan 1999
TL;DR: It is demonstrated that a manually-constructed model that contains multiple states per extraction field outperforms a model with one state per field, and the use of distantly-labeled data to set model parameters provides a significant improvement in extraction accuracy.
Abstract: Statistical machine learning techniques, while well proven in fields such as speech recognition, are just beginning to be applied to the information extraction domain. We explore the use of hidden Markov models for information extraction tasks, specifically focusing on how to learn model structure from data and how to make the best use of labeled and unlabeled data. We show that a manually-constructed model that contains multiple states per extraction field outperforms a model with one state per field, and discuss strategies for learning the model structure automatically from data. We also demonstrate that the use of distantly-labeled data to set model parameters provides a significant improvement in extraction accuracy. Our models are applied to the task of extracting important fields from the headers of computer science research papers, and achieve an extraction accuracy of 92.9%.

449 citations


Book
01 Jan 1999
TL;DR: A kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces and shows that the limiting distribution of the value function estimate is a Gaussian process.
Abstract: We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.

419 citations


Journal ArticleDOI
TL;DR: Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.
Abstract: There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.

351 citations


Book
01 Jan 1999
TL;DR: This volume of Foundations of Neural Computation, on unsupervised learning algorithms, focuses on neural network learning algorithms that do not require an explicit teacher to extract an efficient internal representation of the statistical structure implicit in the inputs.
Abstract: Since its founding in 1989 by Terrence Sejnowski, Neural Computation has become the leading journal in the field. Foundations of Neural Computation collects, by topic, the most significant papers that have appeared in the journal over the past nine years. This volume of Foundations of Neural Computation, on unsupervised learning algorithms, focuses on neural network learning algorithms that do not require an explicit teacher. The goal of unsupervised learning is to extract an efficient internal representation of the statistical structure implicit in the inputs. These algorithms provide insights into the development of the cerebral cortex and implicit learning in humans. They are also of interest to engineers working in areas such as computer vision and speech recognition who seek efficient representations of raw input data.

Journal ArticleDOI
TL;DR: This work presents an algorithm combining variants of Winnow and weighted-majority voting, and applies it to a problem in the aforementioned class: context-sensitive spelling correction, and finds that WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition.
Abstract: A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts depend on only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. In the work reported here, we present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting to for too, casual for causal, and so on. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition; (2) When compared with other systems in the literature, WinSpell exhibits the highest performance; (3) While several aspects of WinSpell's architecture contribute to its superiority over BaySpell, the primary factor is that it is able to learn a better linear separator than BaySpell learns; (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we will present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set.
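
To make the idea of a multiplicative weight-update rule concrete, here is a minimal sketch of the textbook Winnow algorithm learning a disjunction that depends on two of six Boolean features. It is not WinSpell: the variants and the weighted-majority voting described in the abstract are not reproduced, and the target concept, the feature count, and the function name `train_winnow` are assumptions chosen for illustration.

```python
from itertools import product

def train_winnow(examples, n_features, max_passes=100):
    """Basic Winnow: multiplicative updates with threshold n_features."""
    weights = [1.0] * n_features
    theta = float(n_features)

    def predict(x):
        # Predict 1 when the total weight of the active features reaches theta.
        return 1 if sum(w for w, xi in zip(weights, x) if xi) >= theta else 0

    for _ in range(max_passes):
        mistakes = 0
        for x, label in examples:
            if predict(x) == label:
                continue
            mistakes += 1
            # Double active weights after a missed positive,
            # halve them after a false positive.
            factor = 2.0 if label == 1 else 0.5
            for i, xi in enumerate(x):
                if xi:
                    weights[i] *= factor
        if mistakes == 0:  # a full pass without errors: stop
            break
    return weights, predict

# Hypothetical target concept: x0 OR x2 over six Boolean features.
data = [(x, int(x[0] or x[2])) for x in product((0, 1), repeat=6)]
weights, predict = train_winnow(data, n_features=6)
```

Because only the weights of active features change, and they change by a constant factor, the number of mistakes grows with the number of relevant features and only logarithmically with the total number of features, which is why this family of algorithms suits high-dimensional, sparse-target problems.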

Proceedings Article
27 Jun 1999
TL;DR: This paper presents an algorithm for learning a value function that maps hyperlinks to future discounted reward using a naive Bayes text classifier and shows a threefold improvement in spidering efficiency over traditional breadth-first search, and up to a two-fold improvement over reinforcement learning with immediate reward.
Abstract: Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper argues that the creation of efficient web spiders is best framed and solved by reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision making. One strength of reinforcement learning is that it provides a formalism for measuring the utility of actions that give benefit only in the future. We present an algorithm for learning a value function that maps hyperlinks to future discounted reward using a naive Bayes text classifier. Experiments on two real-world spidering tasks show a threefold improvement in spidering efficiency over traditional breadth-first search, and up to a two-fold improvement over reinforcement learning with immediate reward only.
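
The difference between immediate reward and future discounted reward can be shown with a small calculation. The sketch below is a generic discounted-return helper, not the paper's value-function learner or its naive Bayes classifier; the reward sequences and the discount factor are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards[t] * gamma**t, accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical crawl: link A yields one relevant page immediately,
# link B yields nothing at first but leads to two relevant pages afterwards.
link_a = discounted_return([1, 0, 0], gamma=0.9)
link_b = discounted_return([0, 1, 1], gamma=0.9)
```

A spider that scores links by immediate reward alone would follow link A, while one that scores by discounted future reward would prefer link B in this example.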

Book ChapterDOI
T. Villmann
01 Jan 1999
TL;DR: Self-organizing maps (SOMs) are special types of neural maps which have found a wide distribution; a mathematically exact definition of topology preservation is developed for them, and ways of measuring the degree of topology preservation are shown.
Abstract: Self-organizing maps (SOMs) are special types of neural maps which have found a wide distribution. Neural maps constitute an important neural network paradigm. In brains, neural maps occur in all sensory modalities as well as in motor areas. In technical contexts, neural maps are utilized in the fashion of neighborhood preserving vector quantizers. In both cases, these networks project data from some possibly high-dimensional input space onto a position in some output space. To achieve this projection, neural maps are self-organized by unsupervised learning schemes. The chapter also discusses the problem of topology preservation in self-organizing maps. A mathematically exact definition is developed for this, and ways of measuring the degree of topology preservation are shown. Finally, an advanced learning scheme is introduced for generating general hypercube structures for self-organizing maps, which then yield improved topology preservation for the map.
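
A single learning step of a basic self-organizing map can be sketched as follows. This is the standard Kohonen-style update on a one-dimensional map with a Gaussian neighbourhood, offered only as an illustration; it is not the advanced hypercube-growing scheme the chapter introduces, and the learning rate, neighbourhood width, and prototype values are assumptions.

```python
import math

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One update of a 1-D self-organizing map for the input vector x.

    weights: list of prototype vectors, one per map position 0..len-1.
    Returns the index of the best-matching unit (BMU).
    """
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # The BMU is the prototype closest to the input.
    bmu = min(range(len(weights)), key=lambda j: sq_dist(weights[j]))
    for j, w in enumerate(weights):
        # The neighbourhood is measured on the map grid, not in input space.
        h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
        weights[j] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu

prototypes = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
winner = som_step(prototypes, [0.9, 0.8])
```

Units next to the winner on the grid are dragged toward the same input, which is what encourages neighbouring map positions to represent neighbouring regions of the input space.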

Journal ArticleDOI
TL;DR: This paper addresses the segmentation problem, an important step toward the goal of automatic musical accompaniment: given a score to a piece of monophonic music and a sampled recording of a performance of that score, the recording is segmented into notes and rests within the framework of a hidden Markov model.
Abstract: In this paper, we address an important step toward our goal of automatic musical accompaniment: the segmentation problem. Given a score to a piece of monophonic music and a sampled recording of a performance of that score, we attempt to segment the data into a sequence of contiguous regions corresponding to the notes and rests in the score. Within the framework of a hidden Markov model, we model our prior knowledge, perform unsupervised learning of the data model parameters, and compute the segmentation that globally minimizes the posterior expected number of segmentation errors. We also show how to produce "online" estimates of score position. We present examples of our experimental results, and readers are encouraged to access actual sound data we have made available from these experiments.
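
Choosing, at each time step, the state with the highest posterior probability is the generic way to minimize the expected number of per-step labeling errors in a hidden Markov model. The sketch below implements that posterior decoding with the forward-backward recursions for a small discrete HMM. It is a minimal illustration under assumed parameters: the two-state model, the symbol alphabet, and the function name are hypothetical, and the paper's actual score-following model and its parameter learning are not reproduced.

```python
def posterior_decode(pi, A, B, obs):
    """Forward-backward smoothing for a discrete HMM.

    pi[i]: initial probability, A[i][j]: transition i -> j,
    B[i][o]: probability that state i emits symbol o.
    Returns (path, gamma) where gamma[t][i] = P(state_t = i | obs).
    """
    K, T = len(pi), len(obs)

    def normalise(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass, rescaled at every step to avoid underflow.
    alpha = [normalise([pi[i] * B[i][obs[0]] for i in range(K)])]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append(normalise([
            sum(prev[i] * A[i][j] for i in range(K)) * B[j][obs[t]]
            for j in range(K)
        ]))

    # Backward pass, also rescaled; the scaling cancels in gamma.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalise([
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K))
            for i in range(K)
        ])

    gamma = [normalise([a * b for a, b in zip(alpha[t], beta[t])])
             for t in range(T)]
    # Pick the most probable state independently at each time step.
    path = [g.index(max(g)) for g in gamma]
    return path, gamma

# Hypothetical two-state model: each state tends to persist and to emit
# its own symbol.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
path, gamma = posterior_decode(pi, A, B, [0, 0, 0, 1, 1, 1])
```

Unlike Viterbi decoding, which returns the single most probable state sequence, this per-step rule optimizes the expected count of correctly labeled frames, at the cost of possibly producing a sequence that uses an improbable transition.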

Book ChapterDOI
01 Jan 1999
TL;DR: An unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus is described and compared to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers.
Abstract: In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate tagger using only a small amount of manually tagged text.

Proceedings Article
31 Jul 1999
TL;DR: This work presents a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN).
Abstract: We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E^3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning, and the graphical structure (but not the parameters) of the DBN. Unlike the original E^3 algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.

Posted Content
TL;DR: This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted; the algorithm is derived from a probability model of the source that generated the text.
Abstract: This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word-order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that this algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances. Keywords: Bayesian grammar induction, probability models, minimum description length (MDL), unsupervised learning, cognitive modeling, language acquisition, segmentation

Proceedings ArticleDOI
01 Apr 1999
TL;DR: The algorithmic details of TPOT-RL as well as empirical results demonstrating the effectiveness of the developed multi-agent learning approach with learned features are presented.
Abstract: We present a novel multi-agent learning paradigm called team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL introduces the use of action-dependent features to generalize the state space. In our work, we use a learned action-dependent feature space to aid higher-level reinforcement learning. TPOT-RL is an effective technique to allow a team of agents to learn to cooperate towards the achievement of a specific goal. It is an adaptation of traditional RL methods that is applicable in complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities. TPOT-RL is fully implemented and has been tested in the robotic soccer domain, a complex, multi-agent framework. This paper presents the algorithmic details of TPOT-RL as well as empirical results demonstrating the effectiveness of the developed multi-agent learning approach with learned features.

Journal ArticleDOI
01 Sep 1999
TL;DR: This study introduces unsupervised learning (clustering) where optimization is supported by the linguistic granules of context, thereby giving rise to so-called context-sensitive fuzzy clustering.
Abstract: The study is devoted to linguistic data mining, an endeavor that exploits the concepts, constructs, and mechanisms of fuzzy set theory. The roles of information granules, information granulation, and the techniques therein are discussed in detail. Particular attention is given to the manner in which these information granules are represented as fuzzy sets and manipulated according to the main mechanisms of fuzzy sets. We introduce unsupervised learning (clustering) where optimization is supported by the linguistic granules of context, thereby giving rise to so-called context-sensitive fuzzy clustering. The combination of neuro, evolutionary, and granular computing in the context of data mining is explored. Detailed numerical experiments using well-known datasets are also included and analyzed.

Proceedings Article
31 Jul 1999
TL;DR: This paper presents a novel statistical latent class model for text mining and interactive information access, called Cluster-Abstraction Model (CAM), which is purely data driven and utilizes context-specific word occurrence statistics.
Abstract: This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called Cluster-Abstraction Model (CAM), is purely data driven and utilizes context-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits of the CAM for interactive retrieval and automated cluster summarization are investigated experimentally.

Journal ArticleDOI
TL;DR: This paper documents the online implementation and results of a hybrid short-term electrical load forecaster that is being evaluated by a power utility; the forecaster uses Kohonen's self-organizing feature map with unsupervised learning to classify daily load patterns.
Abstract: The online implementation and results from a hybrid short-term electrical load forecaster that is being evaluated by a power utility are documented in this paper. This forecaster employs a new approach involving a parallel neural-fuzzy expert system, whereby Kohonen's self-organizing feature map with unsupervised learning, is used to classify daily load patterns. Post-processing of the neural network outputs is performed with a fuzzy expert system which successfully corrects the load deviations caused by the effects of weather and holiday activity. Being highly automated, little human interference is required during the process of load forecasting. A comparison made between this model and a regression-based model currently being used in the control centre has shown a marked improvement in load forecasting results.

Journal ArticleDOI
TL;DR: A new approach to learning Bayesian network structures based on the minimum description length (MDL) principle and evolutionary programming is developed, which employs an MDL metric and integrates a knowledge-guided genetic operator for the optimization in the search process.
Abstract: We have developed a new approach to learning Bayesian network structures based on the minimum description length (MDL) principle and evolutionary programming. It employs an MDL metric, which is founded on information theory, and integrates a knowledge-guided genetic operator for the optimization in the search process.

Book
15 Dec 1999
TL;DR: This book covers data mining and knowledge discovery for process monitoring, diagnosis and control, including data pre-processing, multivariate statistical analysis, supervised and unsupervised learning, inductive learning, rule extraction from operational data, and inferential models.
Abstract: 1 Introduction.- 1.1 Current Approaches to Process Monitoring, Diagnosis and Control.- 1.2 Monitoring Charts for Statistical Quality Control.- 1.3 The Operating Window.- 1.4 State Space Based Process Monitoring and Control.- 1.5 Characteristics of Process Operational Data.- 1.6 System Requirement and Architecture.- 1.7 Outline of the Book.- 2 Data Mining and Knowledge Discovery - an Overview.- 2.1 Definition and Development.- 2.2 The KDD Process.- 2.3 Data Mining Techniques.- 2.4 Feature Selection with Data Mining.- 2.5 Final Remarks and Additional Resources.- 3 Data Pre-processing for Feature Extraction, Dimension Reduction and Concept Formation.- 3.1 Data Pre-processing.- 3.2 Use of Principal Component Analysis.- 3.3 Wavelet Analysis.- 3.4 Episode Approach.- 3.5 Summary.- 4 Multivariate Statistical Analysis for Data Analysis and Statistical Control.- 4.1 PCA for State Identification and Monitoring.- 4.2 Partial Least Squares (PLS).- 4.3 Variable Contribution Plots.- 4.4 Multiblock PCA and PLS.- 4.5 Batch Process Monitoring Using Multiway PCA.- 4.6 Nonlinear PCA.- 4.7 Operational Strategy Development and Product Design - an Industrial Case Study.- 4.8 General Observations.- 5 Supervised Learning for Operational Support.- 5.1 Feedforward Neural Networks.- 5.2 Variable Selection and Feature Extraction for FFNN Inputs.- 5.3 Model Validation and Confidence Bounds.- 5.4 Application of FFNN to Process Fault Diagnosis.- 5.5 Fuzzy Neural Networks.- 5.6 Fuzzy Set Covering Method.- 5.7 Fuzzy Signed Digraphs.- 5.8 Case Studies.- 5.9 General Observations.- 6 Unsupervised Learning for Operational State Identification.- 6.1 Supervised vs. Unsupervised Learning.- 6.2 Adaptive Resonance Theory.- 6.3 A Framework for Integrating Wavelet Feature Extraction and ART2.- 6.4 Application of ARTnet to the FCC Process.- 6.5 Bayesian Automatic Classification.- 6.6 Application of AutoClass to the FCC Process.- 6.7 General Comments.- 7 Inductive Learning for Conceptual Clustering and Real-time Process Monitoring.- 7.1 Inductive Learning.- 7.2 IL for Knowledge Discovery from Averaged Data.- 7.3 IL for Conceptual Clustering and Real-time Monitoring.- 7.4 Application to the Refinery MTBE Process.- 7.5 General Review.- 8 Automatic Extraction of Knowledge Rules from Process Operational Data.- 8.1 Rules Generation Using Fuzzy Set Operation.- 8.2 Rules Generation from Neural Networks.- 8.3 Rules Generation Using Rough Set Method.- 8.4 A Fuzzy Neural Network Method for Rules Extraction.- 8.5 Discussion.- 9 Inferential Models and Software Sensors.- 9.1 Feedforward Neural Networks as Software Sensors.- 9.2 A Method for Selection of Training / Test Data and Model Retraining.- 9.3 An Industrial Case Study.- 9.4 Dimension Reduction of Input Variables.- 9.5 Dynamic Neural Networks as Inferential Models.- 9.6 Summary.- 10 Concluding Remarks.- Appendix A The Continuous Stirred Tank Reactor (CSTR).- Appendix B The Residue Fluid Catalytic Cracking (R-FCC) Process.- Appendix C The Methyl Tertiary Butyl Ether (MTBE) Process.- References.

Book ChapterDOI
06 Dec 1999
TL;DR: This work addresses the problem of learning with the help of positive and unlabeled data given a small number of labeled examples and presents both theoretical and empirical arguments showing that learning algorithms can be improved by the use of both unlabeled and positive data.
Abstract: In many learning problems, labeled examples are rare or expensive while numerous unlabeled and positive examples are available. However, most learning algorithms only use labeled examples. Thus we address the problem of learning with the help of positive and unlabeled data given a small number of labeled examples. We present both theoretical and empirical arguments showing that learning algorithms can be improved by the use of both unlabeled and positive data. As an illustrating problem, we consider the learning algorithm from statistics for monotone conjunctions in the presence of classification noise and give empirical evidence of our assumptions. We give theoretical results for the improvement of Statistical Query learning algorithms from positive and unlabeled data. Lastly, we apply these ideas to tree induction algorithms. We modify the code of C4.5 to get an algorithm which takes as input a set LAB of labeled examples, a set POS of positive examples and a set UNL of unlabeled data and which uses these three sets to construct the decision tree. We provide experimental results based on data taken from UCI repository which confirm the relevance of this approach.

Journal ArticleDOI
TL;DR: The Meta-AQUA system is presented, an implemented multistrategy learner that operates in the domain of story understanding and it is concluded that explicit representation and sequencing of learning goals is necessary for avoiding negative interactions between learning algorithms that can lead to less effective learning.

01 Jan 1999
TL;DR: This work presents a decision tree based approach to function approximation in reinforcement learning and finds that the decision tree can provide better learning performance than the neural network function approximation and can solve large problems that are infeasible using table lookup.
Abstract: The goal in reinforcement learning is to learn the value of taking each action from each possible state in order to maximize the total reward. In scaling reinforcement learning to problems with large numbers of states and/or actions, the representation of the value function becomes critical. We present a decision tree based approach to function approximation in reinforcement learning. We compare our approach with table lookup and a neural network function approximator on three problems: the well known mountain car and pole balance problems as well as a simulated automobile race car. We find that the decision tree can provide better learning performance than the neural network function approximation and can solve large problems that are infeasible using table lookup.
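
For reference, the table-lookup baseline mentioned in this abstract can be sketched as tabular Q-learning, which stores one value per state-action pair. The example below is a minimal sketch on a hypothetical five-state chain with a reward of 1 for reaching the last state; the environment, the hyperparameters, and the function name are assumptions for illustration, and the paper's decision-tree approximator is not reproduced.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a deterministic chain; reward 1 at the last state."""
    rng = random.Random(seed)
    # q[s][a]: action 0 moves left, action 1 moves right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Uniformly random behaviour policy; Q-learning is off-policy,
            # so it still estimates the values of the greedy policy.
            a = rng.randrange(2)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # The goal row is never updated, so its value stays at 0.
            target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

q_table = q_learning_chain()
# Greedy action in every non-terminal state.
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
```

The table has one entry per state-action pair, so its size grows with the product of the number of states and actions; that growth is the scaling problem that motivates function approximators such as neural networks or decision trees.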

01 Jan 1999
TL;DR: A railroad car dumper, suitable for dumping cars of a unit train, is disclosed and lateral shifting of the dumper frame while the carriage and tracks remain fixed permits a locomotive, larger than the cars to be dumped, to pass through the shifted frame.
Abstract: A railroad car dumper, suitable for dumping cars of a unit train, is disclosed. The dumper has a frame and a carriage, with tracks on the carriage to receive a car from adjacent tracks. The frame has a sidewall to support a car on the tracks during dumping with the car couplers on the axis of rotation of the dumper. The dumper frame is shiftable laterally, while the carriage remains fixed to maintain alignment of the carriage tracks with the adjacent tracks. Lateral shifting of the dumper frame while the carriage and tracks remain fixed permits a locomotive, larger than the cars to be dumped, to pass through the shifted frame.

Journal ArticleDOI
TL;DR: Various neural network architectures and associated adaptive learning algorithms are discussed for handling the cases where the number of sources is unknown, and techniques include estimation of the number of sources, redundancy removal among the outputs of the networks, and extraction of the sources one at a time.

Proceedings Article
27 Jun 1999
TL;DR: The analysis shows that the worst-case (expected) regret for the methods is almost optimal: the upper bounds grow with the number m of trials and the number n of alternatives like $O(m^{3/4} n^{1/2})$ and $O(m^{4/5} n^{2/5})$, and the lower bound is.
Abstract: We consider the problem of maximizing the total number of successes while learning about a probability function determining the likelihood of a success. In particular, we consider the case in which the probability function is represented by a linear function of the attribute vector associated with each action/choice. In the scenario we consider, learning proceeds in trials and in each trial, the algorithm is given a number of alternatives to choose from, each having an attribute vector associated with it, and for the alternative it selects it gets either a success or a failure with probability determined by applying a fixed but unknown linear success probability function to the attribute vector. Our algorithms consist of a learning method like the Widrow-Hoff rule and a probabilistic selection strategy which work together to resolve the so-called exploration-exploitation tradeoff. We analyze the performance of these methods by proving bounds on the worst-case regret, or how many fewer successes they expect to get as compared to the ideal (but unrealistic) strategy that knows the target probability function. Our analysis shows that the worst-case (expected) regret for our methods is almost optimal: the upper bounds grow with the number m of trials and the number n of alternatives like $O(m^{3/4} n^{1/2})$ and $O(m^{4/5} n^{2/5})$, and the lower bound is