Showing papers by "Thomas G. Dietterich" published in 2016


Proceedings Article
01 Dec 2016
TL;DR: This paper describes an Active Anomaly Discovery method for incorporating expert feedback to adjust the anomaly detector so that the outliers it discovers are more in tune with the expert user's semantic understanding of the anomalies.
Abstract: Unsupervised anomaly detection algorithms search for outliers and then predict that these outliers are the anomalies. When deployed, however, these algorithms are often criticized for high false positive and high false negative rates. One cause of poor performance is that not all outliers are anomalies and not all anomalies are outliers. In this paper, we describe an Active Anomaly Discovery (AAD) method for incorporating expert feedback to adjust the anomaly detector so that the outliers it discovers are more in tune with the expert user's semantic understanding of the anomalies. The AAD approach is designed to operate in an interactive data exploration loop. In each iteration of this loop, our algorithm first selects a data instance to present to the expert as a potential anomaly and then the expert labels the instance as an anomaly or as a nominal data point. Our algorithm updates its internal model with the instance label and the loop continues until a budget of B queries is spent. The goal of our approach is to maximize the total number of true anomalies in the B instances presented to the expert. We show that when compared to other state-of-the-art algorithms, AAD is consistently one of the best performers.
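
The interactive loop described in this abstract can be sketched as follows. This is a minimal illustration of the query-label-update cycle only, not the AAD method itself: the linear combination of per-instance detector scores, the perceptron-style weight update, and the names `run_query_loop`, `oracle`, and `step` are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Sequence


def run_query_loop(
    scores: Sequence[Sequence[float]],
    oracle: Callable[[int], bool],
    budget: int,
    step: float = 0.1,
) -> List[int]:
    """Query up to `budget` instances, updating weights after each label.

    scores[i] is a hypothetical vector of detector scores for instance i.
    oracle(i) stands in for the expert and returns True for an anomaly.
    """
    n_features = len(scores[0])
    weights = [1.0 / n_features] * n_features  # start from a uniform weighting
    labels: Dict[int, bool] = {}
    queried: List[int] = []

    def weighted(i: int) -> float:
        return sum(w * x for w, x in zip(weights, scores[i]))

    for _ in range(min(budget, len(scores))):
        # Present the highest-scoring instance that has not been labeled yet.
        candidates = [i for i in range(len(scores)) if i not in labels]
        best = max(candidates, key=weighted)
        is_anomaly = oracle(best)
        labels[best] = is_anomaly
        queried.append(best)
        # Placeholder update (an assumption, not the AAD update rule):
        # move the weights toward confirmed anomalies and away from nominals.
        sign = 1.0 if is_anomaly else -1.0
        weights = [w + sign * step * x for w, x in zip(weights, scores[best])]
    return queried


# Toy data: feature 0 tracks the true anomalies, feature 1 is a misleading outlier signal.
toy_scores = [[0.2, 1.0], [0.9, 0.1], [0.8, 0.0], [0.0, 0.9], [0.1, 0.1]]
true_anomalies = {1, 2}
queried = run_query_loop(toy_scores, lambda i: i in true_anomalies, budget=3)
found = sum(1 for i in queried if i in true_anomalies)
```

On this toy data, feedback on the first (nominal) instance lowers the weight of the misleading feature, so a second true anomaly is reached within the budget of 3 queries; a fixed ranking by the initial weights would reach only one.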

138 citations


Proceedings Article
25 Jun 2016
TL;DR: A Probably Approximately Correct (PAC) framework for anomaly detection based on the identification of rare patterns is introduced and sample complexity results that relate the complexity of the pattern space to the data requirements needed for PAC guarantees are developed.
Abstract: Anomaly detection is a fundamental problem for which a wide variety of algorithms have been developed. However, compared to supervised learning, there has been very little work aimed at understanding the sample complexity of anomaly detection. In this paper, we take a step in this direction by introducing a Probably Approximately Correct (PAC) framework for anomaly detection based on the identification of rare patterns. In analogy with the PAC framework for supervised learning, we develop sample complexity results that relate the complexity of the pattern space to the data requirements needed for PAC guarantees. We instantiate the general result for a number of pattern spaces, some of which are implicit in current state-of-the-art anomaly detectors. Finally, we design a new simple anomaly detection algorithm motivated by our analysis and show experimentally on several benchmark problems that it is competitive with a state-of-the-art detector using the same pattern space.
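
To make the idea of detecting anomalies through rare patterns concrete, here is a small sketch. It is an illustration under stated assumptions, not the paper's algorithm: a pattern is treated as rare when its raw empirical frequency in the sample is at or below a hypothetical threshold `tau`, and the function name `rare_pattern_flags` is invented here. The paper's own definition of rarity may differ, for example by normalizing pattern probabilities.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def rare_pattern_flags(
    patterns_per_point: Sequence[Sequence[Hashable]],
    tau: float,
) -> List[bool]:
    """Flag each point that belongs to at least one rare pattern.

    patterns_per_point[i] lists the patterns that point i satisfies.
    A pattern counts as rare if the fraction of points satisfying it
    is <= tau (an assumed criterion, for illustration only).
    """
    n = len(patterns_per_point)
    # Count each pattern at most once per point.
    counts = Counter(p for pats in patterns_per_point for p in set(pats))
    rare = {p for p, c in counts.items() if c / n <= tau}
    return [any(p in rare for p in pats) for pats in patterns_per_point]


# Four points; pattern "b" is satisfied by only one of them.
points = [["a"], ["a"], ["a"], ["a", "b"]]
flags = rare_pattern_flags(points, tau=0.25)
```

The frequencies here are estimated from the sample itself, which is where sample size matters: with few points, the empirical frequency of a pattern can be far from its true probability.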

9 citations


Proceedings Article
09 Jul 2016
TL;DR: This paper introduces a new approach, Transductive Top K (TTK), that seeks to minimize the hinge loss over all training instances under the constraint that exactly $k$ test instances are predicted as positive.
Abstract: Consider a binary classification problem in which the learner is given a labeled training set, an unlabeled test set, and is restricted to choosing exactly k test points to output as positive predictions. Problems of this kind (transductive precision@k) arise in many applications. Previous methods solve these problems in two separate steps: learning the model, then selecting k test instances by thresholding their scores. As a result, model training is not aware of the constraint that exactly k test instances must be chosen as positive in the test phase. This paper shows the importance of incorporating the knowledge of k into the learning process and introduces a new approach, Transductive Top K (TTK), that seeks to minimize the hinge loss over all training instances under the constraint that exactly k test instances are predicted as positive. The paper presents two optimization methods for this challenging problem. Experiments and analysis confirm the benefit of incorporating k in the learning process. In our experimental evaluations, the performance of TTK matches or exceeds existing state-of-the-art methods on 7 benchmark datasets for binary classification and 3 reserve design problem instances.
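
For reference, the second step of the two-step baseline mentioned in this abstract (selecting k test instances from their scores) and the precision@k metric can be sketched as below. This is not TTK, which builds the constraint on k into training; the function names `select_top_k` and `precision_at_k` are hypothetical, and the scores are assumed to come from some already-trained model.

```python
from typing import List, Sequence


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring test instances."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of test instances")
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def precision_at_k(selected: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of the selected instances whose true label is positive (1)."""
    if not selected:
        return 0.0
    return sum(labels[i] for i in selected) / len(selected)


# Toy test set: model scores and (normally unknown) true labels.
test_scores = [0.2, 0.9, 0.5, 0.7]
test_labels = [0, 1, 0, 1]
chosen = select_top_k(test_scores, k=2)
p_at_2 = precision_at_k(chosen, test_labels)
```

In this baseline the value of k only enters after training, which is the limitation the abstract points out.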

8 citations


Dataset
10 Jun 2016
TL;DR: A collection of benchmarks derived from several data sets found at the UC Irvine Machine Learning Repository.
Abstract: Benchmarks are derived from several data sets found at the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/index.html

6 citations


Journal Article
11 Jan 2016
TL;DR: These are boom times for AI: millions of people routinely use AI-based systems that the founders of the field would hail as miraculous, and there is a palpable sense of excitement about impending applications of AI technologies.
Abstract: These are boom times for AI. Articles celebrating the success of AI research appear frequently in the international press. Every day, millions of people routinely use AI-based systems that the founders of the field would hail as miraculous. And there is a palpable sense of excitement about impending applications of AI technologies.

2 citations