Author

Thomas G. Dietterich

Bio: Thomas G. Dietterich is an academic researcher from Oregon State University. The author has contributed to research in topics such as reinforcement learning and Markov decision processes. The author has an h-index of 74 and has co-authored 279 publications receiving 51,935 citations. Previous affiliations of Thomas G. Dietterich include the University of Wyoming and Stanford University.


Papers
Journal ArticleDOI
TL;DR: The Active Anomaly Discovery (AAD) algorithm is described, which incorporates feedback from an expert user who labels a queried data instance as an anomaly or a nominal point; approximations are also presented that make the AAD algorithm much more computationally efficient while maintaining a desirable level of performance.
Abstract: Unsupervised anomaly detection algorithms search for outliers and then predict that these outliers are the anomalies. When deployed, however, these algorithms are often criticized for high false-positive and high false-negative rates. One main cause of poor performance is that not all outliers are anomalies and not all anomalies are outliers. In this article, we describe the Active Anomaly Discovery (AAD) algorithm, which incorporates feedback from an expert user that labels a queried data instance as an anomaly or nominal point. This feedback is intended to adjust the anomaly detector so that the outliers it discovers are more in tune with the expert user’s semantic understanding of the anomalies. The AAD algorithm is based on a weighted ensemble of anomaly detectors. When it receives a label from the user, it adjusts the weights on each individual ensemble member such that the anomalies rank higher in terms of their anomaly score than the outliers. The AAD approach is designed to operate in an interactive data exploration loop. In each iteration of this loop, our algorithm first selects a data instance to present to the expert as a potential anomaly and then the expert labels the instance as an anomaly or as a nominal data point. When it receives the instance label, the algorithm updates its internal model and the loop continues until a budget of B queries is spent. The goal of our approach is to maximize the total number of true anomalies in the B instances presented to the expert. We show that the AAD method performs well and in some cases doubles the number of true anomalies found compared to previous methods. In addition we present approximations that make the AAD algorithm much more computationally efficient while maintaining a desirable level of performance.

10 citations
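To make the interactive loop in the abstract above concrete, here is a minimal Python sketch of an AAD-style session. The actual algorithm fits the ensemble weights by solving a ranking-constrained optimization; this sketch substitutes a simple perceptron-style weight nudge, and all names (`aad_loop`, `label_fn`) are illustrative rather than taken from the paper's code.

```python
import numpy as np

def aad_loop(scores, label_fn, budget, lr=0.1):
    """Sketch of an AAD-style interactive loop.

    scores   : (n_instances, n_members) anomaly scores from an ensemble
    label_fn : expert oracle, instance index -> +1 (anomaly) / -1 (nominal)
    budget   : number of expert queries B
    """
    n, m = scores.shape
    w = np.ones(m) / m            # start with uniform ensemble weights
    labeled = {}
    anomalies_found = 0
    for _ in range(budget):
        combined = scores @ w     # weighted ensemble score per instance
        order = np.argsort(-combined)
        query = next(i for i in order if i not in labeled)  # top unlabeled
        y = label_fn(query)
        labeled[query] = y
        anomalies_found += (y == +1)
        # Perceptron-style surrogate for AAD's ranking-based update: push
        # weight toward detectors that agreed with the expert's label.
        w = np.clip(w + lr * y * scores[query], 0.0, None)
        total = w.sum()
        w = w / total if total > 0 else np.ones(m) / m
    return w, anomalies_found
```

The key design point the sketch preserves is that querying and re-weighting alternate, so each expert label immediately reshapes which instance is presented next.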

Proceedings Article
01 Jan 2017
TL;DR: Three new algorithms, based on a general approach called α-min-2, are proposed to approximately solve N-POMDPs; two are designed to complement an existing POMDP solver, while the third is a solver itself.
Abstract: In many fields in computational sustainability, applications of POMDPs are inhibited by the complexity of the optimal solution. One way of delivering simple solutions is to represent the policy with a small number of α-vectors. We would like to find the best possible policy that can be expressed using a fixed number N of α-vectors. We call this the N-POMDP problem. The existing solver α-min approximately solves finite-horizon POMDPs with a controllable number of α-vectors. However α-min is a greedy algorithm without performance guarantees, and it is rather slow. This paper proposes three new algorithms, based on a general approach that we call α-min-2. These three algorithms are able to approximately solve N-POMDPs. α-min-2-fast (heuristic) and α-min-2-p (with performance guarantees) are designed to complement an existing POMDP solver, while α-min-2-solve (heuristic) is a solver itself. Complexity results are provided for each of the algorithms, and they are tested on well-known benchmarks. These new algorithms will help users to interpret solutions to POMDP problems in computational sustainability.

10 citations
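For readers unfamiliar with the representation the N-POMDP problem constrains: a POMDP policy's value function is the upper surface of a set of α-vectors, and acting means following the maximizing vector. The sketch below shows only this standard evaluation step, not the α-min-2 algorithms themselves; the toy numbers are invented for illustration.

```python
import numpy as np

def policy_value(alpha_vectors, belief):
    """Value and action at a belief for a policy given by alpha-vectors.

    alpha_vectors: list of (alpha, action) pairs; alpha has shape (n_states,)
    belief:        probability distribution over states, shape (n_states,)

    The value function is piecewise linear and convex:
        V(b) = max_i alpha_i . b,
    and the policy executes the action attached to the maximizing vector.
    Restricting the list to N vectors is exactly the N-POMDP constraint.
    """
    values = [alpha @ belief for alpha, _ in alpha_vectors]
    best = int(np.argmax(values))
    return values[best], alpha_vectors[best][1]

# Toy policy constrained to N = 2 alpha-vectors on a 2-state problem:
alphas = [(np.array([1.0, 0.0]), "listen"),
          (np.array([0.2, 0.9]), "act")]
value, action = policy_value(alphas, np.array([0.3, 0.7]))  # -> 0.69, "act"
```

This is why a small N yields interpretable policies: each α-vector corresponds to one action rule, so the whole policy can be read as N belief-region/action pairs.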

Book
01 Jan 2003
TL;DR: A decision-theoretic approach to DLFT is described in which historical test data is mined to create a probabilistic model of patterns of die failure and this model is combined with greedy value-of-information computations to decide in real time which die to test next and when to stop testing.
Abstract: We describe an application of machine learning and decision analysis to the problem of die-level functional tests in integrated circuit manufacturing. Integrated circuits (ICs) are fabricated on large wafers that can hold hundreds of individual chips (die). In current practice, large and expensive machines test each of these die to check that they are functioning properly (die-level functional test or DLFT), and then the wafers are cut up, and the good die are assembled into packages and connected to the package pins. Finally, the resulting packages are tested to ensure that the final product is functioning correctly. The purpose of the die-level functional test is to avoid the expense of packaging bad die and to provide rapid feedback to the fabrication process by detecting die failures. The challenge for a decision-theoretic approach is to reduce the amount of DLFT (and the associated costs) while still providing process feedback. We describe a decision-theoretic approach to DLFT in which historical test data is mined to create a probabilistic model of patterns of die failure. This model is combined with greedy value-of-information computations to decide in real time which die to test next and when to stop testing. We report the results of several experiments that demonstrate the ability of this procedure to make good testing decisions, to make good stopping decisions, and to detect anomalous die. Based on experiments with historical test data from Hewlett-Packard, the resulting system has the potential to improve profits on mature IC products.

9 citations
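A heavily simplified sketch of the greedy value-of-information decision described above, assuming a myopic model in which testing a die only avoids the cost of packaging that die if it is bad. The paper's probabilistic model also propagates information spatially across the wafer, which this sketch omits; all names and cost parameters are hypothetical.

```python
def next_die_to_test(p_fail, test_cost, package_cost):
    """Greedy, myopic value-of-information rule for die-level test.

    p_fail       : dict die_id -> predicted probability the die is bad
    test_cost    : cost of running DLFT on one die
    package_cost : cost wasted by packaging a bad, untested die

    An untested die gets packaged regardless, so its expected loss is
    p_fail * package_cost; testing eliminates that loss at test_cost.
    """
    voi = {die: p * package_cost - test_cost for die, p in p_fail.items()}
    best = max(voi, key=voi.get)
    if voi[best] <= 0:
        return None   # stopping decision: no remaining test pays for itself
    return best

# Example: only the riskiest die is worth testing at these costs.
choice = next_die_to_test({"d1": 0.02, "d2": 0.30},
                          test_cost=1.0, package_cost=5.0)  # -> "d2"
```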

Proceedings ArticleDOI
02 Dec 2013
TL;DR: While locally optimal CRF inference may be sufficient for images of natural scenes, the results demonstrate that CRF with graph cuts performs poorly on the nematocyst images and that HC-Search outperforms CRF with graph cuts, suggesting that biological images of flexible objects present new challenges requiring further advances of, or alternatives to, existing methods.
Abstract: This paper presents a learning approach for detecting nematocysts in Scanning Electron Microscope (SEM) images. The image dataset was collected and made available to us by biologists for the purposes of morphological studies of corals, jellyfish, and other species in the phylum Cnidaria. Challenges for computer vision presented by this biological domain are rarely seen in general images of natural scenes. We formulate nematocyst detection as labeling of a regular grid of image patches. This structured prediction problem is specified within two frameworks: CRF and HC-Search. The CRF uses graph cuts for inference. The HC-Search approach is based on search in the space of outputs. It uses a learned heuristic function (H) to uncover high-quality candidate labelings of image patches, and then uses a learned cost function (C) to select the final prediction among the candidates. While locally optimal CRF inference may be sufficient for images of natural scenes, our results demonstrate that CRF with graph cuts performs poorly on the nematocyst images, and that HC-Search outperforms CRF with graph cuts. This suggests biological images of flexible objects present new challenges requiring further advances of, or alternatives to, existing methods.

9 citations
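A compact sketch of the HC-Search control flow from the abstract above: a learned heuristic H prunes a beam search over complete labelings, and a separate learned cost function C picks the final output. The function signatures and beam-search details are illustrative assumptions, not the paper's implementation.

```python
def hc_search(y0, successors, H, C, beam=10, steps=5):
    """Minimal sketch of the HC-Search framework for structured prediction.

    y0         : initial complete labeling (e.g., all patches "background")
    successors : function mapping a labeling to neighboring labelings
    H          : learned heuristic; lower H(y) = more promising to expand
    C          : learned cost; lower C(y) = better final prediction
    """
    frontier = [y0]
    candidates = [y0]
    for _ in range(steps):
        expanded = [y2 for y in frontier for y2 in successors(y)]
        if not expanded:
            break
        # H steers the search toward high-quality candidate labelings...
        frontier = sorted(expanded, key=H)[:beam]
        candidates.extend(frontier)
    # ...and C selects the final prediction among all candidates seen.
    return min(candidates, key=C)
```

Separating H from C is the point of the framework: search quality and output selection are trained as distinct sub-problems rather than folded into one energy function, as in a CRF.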

Proceedings ArticleDOI
04 Jun 2012
TL;DR: Three machine-learning efforts for modeling ecosystems are described, including a novel approach to modeling the migration of birds; a major challenge for all of these methods is to scale up to large, spatially-distributed systems.
Abstract: To avoid ecological collapse, we must manage Earth's ecosystems sustainably. Viewed as a control problem, the two central challenges of ecosystem management are to acquire a model of the system that is sufficient to guide good decision making and then optimize the control policy against that model. This paper describes three efforts aimed at addressing the first of these challenges—machine learning methods for modeling ecosystems. The first effort focuses on automated quality control of environmental sensor data. Next, we consider the problem of learning species distribution models from citizen science observational data. Finally, we describe a novel approach to modeling the migration of birds. A major challenge for all of these methods is to scale up to large, spatially-distributed systems.

9 citations


Cited by
Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the splitting, and the ideas are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, 1996, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations
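As a present-day illustration (not from the paper itself), the two mechanisms Breiman describes, per-split random feature selection and internal out-of-bag estimates, map directly onto scikit-learn's implementation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Each tree is grown on a bootstrap sample, and each split considers a
# random subset of features (max_features) -- the randomization the
# abstract credits for the favorable error/robustness trade-off.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,      # generalization error converges as trees grow
    max_features="sqrt",   # random feature subset per split
    oob_score=True,        # internal (out-of-bag) error estimate
    random_state=0,
).fit(X, y)
print(forest.oob_score_)   # internal estimate of generalization accuracy
```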

Journal ArticleDOI
01 Jan 1998
TL;DR: This article reviews gradient-based learning methods for handwritten character recognition and proposes graph transformer networks (GTNs), which allow multi-module recognition systems to be trained globally; convolutional neural networks are shown to synthesize complex decision surfaces that classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations
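The convolutional architecture the abstract describes can be approximated in a few lines of modern PyTorch (which of course postdates the paper). This LeNet-style sketch assumes 28×28 single-channel digit images and is illustrative, not the authors' exact network:

```python
import torch.nn as nn

# Convolution + pooling layers absorb the 2D shape variability the paper
# emphasizes; fully connected layers then perform the classification.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                            # -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                            # -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),
    nn.Tanh(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),                          # ten digit classes
)
```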

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations
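The "increased depth and width at constant budget" claim rests on the Inception module: parallel convolutions at multiple scales whose outputs are concatenated, with 1×1 convolutions serving as cheap dimension reductions. A PyTorch sketch follows (illustrative; the channel counts match the paper's inception (3a) block):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """GoogLeNet-style block: parallel 1x1, 3x3, 5x5 convolutions and
    pooling, concatenated along channels. The 1x1 "reduce" convolutions
    keep the computational budget roughly constant as width grows."""
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1))

    def forward(self, x):
        # Multi-scale processing: every branch sees the same input.
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)

# Inception (3a): 192 input channels -> 64+128+32+32 = 256 output channels.
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))   # -> shape (1, 256, 28, 28)
```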

Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; the book surveys its use in applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning; the discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations
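As a taste of the temporal-difference methods covered in Part II of the book, here is a tabular Q-learning sketch. The environment interface (`reset`, `step`, `n_actions`) is an assumption made for illustration, not something taken from the book:

```python
import random
from collections import defaultdict

def q_learning(env, episodes, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning, a temporal-difference control method.

    Assumed environment API (hypothetical): reset() -> state,
    step(action) -> (next_state, reward, done), and an n_actions attribute.
    """
    Q = defaultdict(float)                   # (state, action) -> value
    actions = list(range(env.n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy behavior policy balances explore/exploit.
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a: Q[(s, a)]))
            s2, r, done = env.step(a)
            # TD update toward the one-step bootstrapped target.
            target = r + (0.0 if done
                          else gamma * max(Q[(s2, a2)] for a2 in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```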