Showing papers by "Mark Hall" published in 2017


Book Chapter
01 Jan 2017
TL;DR: This chapter considers only basic, principled versions of learning algorithms, ranging from single-attribute rules and Naive Bayes through decision trees, association rules, and linear models to instance-based learning and clustering with k-means and hierarchical methods, leaving advanced features needed for real-world deployment for later.
Abstract: Now we plunge into the world of actual machine learning algorithms. This chapter only considers basic, principled, versions of learning algorithms, leaving advanced features that are necessary for real-world deployment for later. A rudimentary rule learning algorithm simply picks a single attribute to make predictions; the well-known “Naive Bayes” method for probabilistic classification uses all the attributes instead, equally weighted. Next we discuss the standard “divide-and-conquer” algorithm for learning decision trees, and the “separate-and-conquer” algorithm for learning decision rules. Then we show how to efficiently mine a dataset for association rules: the seminal Apriori algorithm. Linear models are next: linear regression for numeric prediction and logistic regression for classification. We also consider two mistake-driven algorithms for learning linear classifiers: the classical perceptron and the Winnow method. This is followed by sections on instance-based learning and clustering. For instance-based learning, we discuss data structures for fast nearest-neighbor search; for clustering, we present two classic algorithms that have stood the test of time: k-means and hierarchical clustering. The final section departs from the standard instance-based setting considered in most of the book and provides an introduction to algorithms for multi-instance learning, where an “example” consists of an entire bag of instances, rather than a single one.

26 citations
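To make the flavor of these basic methods concrete, here is a minimal Naive Bayes sketch for categorical attributes in Python, using Laplace (add-one) smoothing; the weather-style toy data and function names are purely illustrative assumptions, not code from the book or from Weka.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(int)   # (attr_index, value, class) -> count
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            value_counts[(i, v, y)] += 1
    return class_counts, value_counts

def predict_naive_bayes(x, class_counts, value_counts, attr_values):
    """Pick the class maximizing log P(class) + sum_i log P(value_i | class),
    with Laplace (add-one) smoothing to avoid zero probabilities."""
    total = sum(class_counts.values())
    best_cls, best_score = None, float("-inf")
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)
        for i, v in enumerate(x):
            score += math.log((value_counts[(i, v, cls)] + 1) /
                              (n_cls + len(attr_values[i])))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Toy usage on a two-attribute weather-style dataset
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]
attr_values = [{"sunny", "rainy"}, {"hot", "mild"}]
counts = train_naive_bayes(X, y)
print(predict_naive_bayes(("sunny", "mild"), *counts, attr_values))  # -> "yes"
```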


Book Chapter
01 Jan 2017
TL;DR: This section discusses how the quality of predictions can be measured reliably, and considers the basic train-test setup for estimating predictive accuracy, before moving on to more sophisticated variants known as “cross-validation” and the “bootstrap” method.
Abstract: The success of machine learning in practical applications hinges on proper evaluation. This section discusses how the quality of predictions can be measured reliably. We consider the basic train-test setup for estimating predictive accuracy, before moving on to more sophisticated variants known as “cross-validation” and the “bootstrap” method. We also discuss the importance of proper parameter tuning when applying and evaluating machine learning, and explain how to use statistical significance tests when comparing the performance of two learning algorithms in a particular application domain. As well as basic classification accuracy, we consider other measures for evaluating the quality of probability estimates, learning and prediction with misclassification costs, and measures for evaluating numeric prediction schemes. The final section discusses model selection, which is the process of determining an appropriate model complexity, using the compression-based minimum description length principle, on the one hand, and evaluation on a validation set, on the other.

22 citations
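As a rough illustration of the cross-validation idea described above, the following Python sketch estimates accuracy with a plain (unstratified) k-fold split. The `train_fn`/`predict_fn` hooks are hypothetical placeholders for any learning scheme; stratification (preserving class proportions in each fold) and repeated runs are omitted for brevity.

```python
import random

def cross_val_accuracy(examples, labels, train_fn, predict_fn, k=10, seed=42):
    """Estimate accuracy with k-fold cross-validation: each example is
    used for testing exactly once and for training k - 1 times."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]      # round-robin assignment
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in fold:
            if predict_fn(model, examples[i]) == labels[i]:
                correct += 1
    return correct / len(examples)
```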


Book Chapter
01 Jan 2017
TL;DR: This chapter revisits the basic instance-based learning method of nearest-neighbor classification, considers how it can be made more robust and storage efficient by generalizing both exemplars and distance functions, and presents alternative methods for tackling learning problems with complex relationships, including kernel-based models, neural networks, and local linear models.
Abstract: We begin by revisiting the basic instance-based learning method of nearest-neighbor classification and considering how it can be made more robust and storage efficient by generalizing both exemplars and distance functions. We then discuss two well-known approaches for generalizing linear models that go beyond modeling linear relationships between the inputs and the outputs. The first is based on the so-called kernel trick, which implicitly creates a high-dimensional feature space and models linear relationships in this extended space. We discuss support vector machines for classification and regression, kernel ridge regression, and kernel perceptrons. The second approach is based on applying simple linear models in a network structure that includes nonlinear transformations. This yields neural networks, and we discuss the classical multilayer perceptron. The final part of the chapter discusses an alternative method for tackling learning problems with complex relationships: building linear models that are local in the sense that they only apply to a small part of the input space. We consider model trees, which are decision trees with linear regression models at the leaf nodes, and locally weighted linear regression, which combines instance-based learning and linear regression.

6 citations
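A small sketch of the kernel trick applied to the mistake-driven perceptron discussed above: the Python kernel perceptron below keeps one mistake count per training example and classifies via kernel-weighted votes over those examples. The RBF kernel, the XOR-style toy data, and the function names are illustrative assumptions, not the book's code.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def train_kernel_perceptron(X, y, kernel=rbf_kernel, epochs=10):
    """Mistake-driven training: alpha[i] counts how often example i was
    misclassified; the decision function is a kernel-weighted sum over
    those mistakes (the perceptron in the implicit feature space)."""
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * s <= 0:          # mistake: strengthen this example
                alpha[i] += 1
    return alpha

def predict_kernel_perceptron(X_train, y_train, alpha, x, kernel=rbf_kernel):
    s = sum(alpha[j] * y_train[j] * kernel(X_train[j], x)
            for j in range(len(X_train)))
    return 1 if s > 0 else -1

# XOR-like toy data that a plain linear perceptron cannot separate
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])
alpha = train_kernel_perceptron(X, y)
print([predict_kernel_perceptron(X, y, alpha, x) for x in X])  # [-1, 1, 1, -1]
```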


Book Chapter
01 Jan 2017
TL;DR: The last section of this chapter switches to unsupervised learning of rule sets by investigating how a special-purpose data structure can be constructed to accelerate the process of finding association rules.
Abstract: This chapter explains practical decision tree and rule learning methods, and also considers more advanced approaches for generating association rules. The basic algorithms for learning classification trees and rules presented in Chapter 4, Algorithms: the basic methods, are extended to make them applicable to real-world problems that contain numeric attributes, noise, and missing values. We discuss the seminal C4.5 algorithm for decision tree learning, consider an alternative pruning method implemented in the CART tree learning algorithm, and discuss the incremental reduced-error pruning method for growing and pruning classification rules, leading up to the RIPPER and PART algorithms for rule induction. We also briefly consider rule sets with exceptions. The last section of this chapter switches to unsupervised learning of rule sets by investigating how a special-purpose data structure can be constructed to accelerate the process of finding association rules. More specifically, we consider frequent-pattern trees and how they can be used to efficiently search for frequent item sets.

5 citations
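To illustrate how tree learners cope with numeric attributes, here is a rough Python sketch of a C4.5-style binary split search that scores candidate thresholds by information gain; the temperature toy data and helper names are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Try thresholds midway between adjacent sorted values and return the
    one with the highest information gain over the class labels."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        remainder = (len(left) * entropy(left) +
                     len(right) * entropy(right)) / len(pairs)
        if base - remainder > best_gain:
            best_gain, best_threshold = base - remainder, threshold
    return best_threshold, best_gain

# Temperatures with a clean class boundary around 70
temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 83]
play  = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
print(best_numeric_split(temps, play))  # (70.5, 1.0)
```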


Book Chapter
01 Jan 2017
TL;DR: This chapter discusses how to combine clustering with classification; more specifically, how mixture model clustering using expectation maximization can be combined with Naive Bayes to blend information from both labeled and unlabeled data.
Abstract: In many practical applications, labeled data is rare or costly to obtain. “Semisupervised” learning exploits unlabeled data to improve the performance of supervised learning. We first discuss how to combine clustering with classification; more specifically, how mixture model clustering using expectation maximization can be combined with Naive Bayes to blend information from both labeled and unlabeled data. Next, we discuss how a generative approach for learning from unlabeled data, such as the one based on fitting a mixture model, can be combined with discriminative learning from labeled data. Following that, we consider the “cotraining” method for semisupervised learning, which can be applied when two different views are available for the same data. These two views normally correspond to two distinct sets of attributes for the same instances. Finally, we see how cotraining can be combined with expectation maximization in the so-called “Co-EM” method to yield further improvements in predictive performance in some scenarios. In addition to semisupervised learning, this chapter also discusses sophisticated techniques for multi-instance learning, which improve on the simple methods presented in Chapter 4, Algorithms: the basic methods.

4 citations
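A hedged sketch of the EM-plus-Naive-Bayes idea described above: it assumes scikit-learn is available and substitutes GaussianNB for the mixture-model setting in the text, alternating between estimating class probabilities for the unlabeled data and retraining on labeled plus probabilistically weighted unlabeled data. The function name and overall shape are illustrative only, not the book's implementation.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def em_naive_bayes(X_lab, y_lab, X_unl, iterations=10):
    """EM-style semisupervised Naive Bayes: train on labeled data, then
    repeatedly (E) predict class probabilities for the unlabeled data and
    (M) retrain on labeled plus probability-weighted unlabeled examples."""
    classes = np.unique(y_lab)
    model = GaussianNB().fit(X_lab, y_lab)
    for _ in range(iterations):
        probs = model.predict_proba(X_unl)            # E-step
        # Replicate each unlabeled example once per class, weighted by its
        # predicted probability of belonging to that class.
        X_aug = np.vstack([X_lab] + [X_unl] * len(classes))
        y_aug = np.concatenate([y_lab] +
                               [np.full(len(X_unl), c) for c in classes])
        w_aug = np.concatenate([np.ones(len(X_lab))] +
                               [probs[:, k] for k in range(len(classes))])
        model = GaussianNB().fit(X_aug, y_aug, sample_weight=w_aug)  # M-step
    return model
```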


Book Chapter
01 Jan 2017
TL;DR: This chapter reviews the types of output that machine learning can generate, ranging from decision tables and linear models such as those produced by linear regression to decision trees, rule sets, and the flat clusters and dendrograms produced by clustering, and briefly explores the realm of inductive logic programming.
Abstract: Having examined the input to machine learning, we move on to review the types of output that can be generated. We first discuss decision tables, which are perhaps the most basic form of knowledge representation, before considering linear models such as those produced by linear regression. Next we explain decision trees, the most widely used kind of knowledge representation in classic machine learning, before looking at rule sets, which are a popular alternative. We consider classification rules, association rules, and rules with exceptions. We briefly venture into the realm of inductive logic programming, which allows for more complex rules than the practical learning techniques covered in this book. Following rules, we discuss instance-based learning and rectangular generalizations. The final section covers the basic types of output generated by clustering techniques: flat clusters and dendrograms.

2 citations
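Since decision tables are described above as the most basic form of knowledge representation, a minimal Python sketch may help: it keys a table on a chosen subset of attributes, stores the majority class per key, and falls back to a default rule for unseen value combinations. The toy data and names are invented for illustration.

```python
from collections import Counter

def build_decision_table(examples, labels, key_attrs):
    """For each combination of values of the selected attributes, store the
    majority class; the overall majority class acts as the default rule."""
    buckets = {}
    for x, y in zip(examples, labels):
        buckets.setdefault(tuple(x[a] for a in key_attrs), []).append(y)
    table = {k: Counter(v).most_common(1)[0][0] for k, v in buckets.items()}
    default = Counter(labels).most_common(1)[0][0]
    return table, default

def lookup(table, default, x, key_attrs):
    return table.get(tuple(x[a] for a in key_attrs), default)

# Toy usage: key the table on attributes 0 and 2 of each example
X = [("sunny", "hot", "high"), ("rainy", "mild", "normal"),
     ("sunny", "mild", "high"), ("overcast", "hot", "normal")]
y = ["no", "yes", "no", "yes"]
table, default = build_decision_table(X, y, key_attrs=(0, 2))
print(lookup(table, default, ("sunny", "cool", "high"), key_attrs=(0, 2)))  # "no"
```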


Book Chapter
01 Jan 2017
TL;DR: This chapter explains what kind of structure is required in the input data when applying the machine learning techniques covered in the book, and establishes the terminology that will be used.
Abstract: Machine learning requires something to learn from: data. This chapter explains what kind of structure is required in the input data when applying the machine learning techniques covered in the book, and establishes the terminology that will be used. First, we explain what is meant by learning a concept from data, and describe the types of machine learning that will be considered: classification learning, association learning, clustering, and numeric prediction. We go on to explain what sort of examples a learning algorithm can be given to learn a concept from. “Examples” are described by attributes, and we continue by reviewing the types of attribute that are used. The final section of this chapter describes the most labor-intensive part of applying data mining in practice: getting the data ready for learning. We discuss the attribute-relation file format (ARFF) used by the Weka machine learning workbench that accompanies this book, and tackle issues such as sparse data, choice of appropriate attribute types, inaccurate and missing values, and getting to know your data by visualization.

1 citation
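As a concrete picture of the ARFF structure mentioned above, here is a small Python sketch that writes a toy ARFF file with a nominal attribute, a numeric attribute, and a missing value marked "?"; the helper name and the weather-style data are illustrative, not part of Weka.

```python
def write_arff(path, relation, attributes, rows):
    """Write a minimal ARFF file: nominal attributes list their values in
    braces, numeric attributes are declared 'numeric', and missing values
    are written as '?'."""
    with open(path, "w") as f:
        f.write(f"@relation {relation}\n\n")
        for name, kind in attributes:
            if kind == "numeric":
                f.write(f"@attribute {name} numeric\n")
            else:  # nominal: kind is the list of allowed values
                f.write(f"@attribute {name} {{{','.join(kind)}}}\n")
        f.write("\n@data\n")
        for row in rows:
            f.write(",".join("?" if v is None else str(v) for v in row) + "\n")

write_arff("weather.arff", "weather",
           [("outlook", ["sunny", "overcast", "rainy"]),
            ("temperature", "numeric"),
            ("play", ["yes", "no"])],
           [("sunny", 85, "no"), ("overcast", None, "yes"), ("rainy", 68, "yes")])
```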


Book Chapter
01 Jan 2017
TL;DR: This chapter introduces a few of them: text mining, including document classification and clustering; web mining, including wrapper induction and the page-rank method used for web search; computer vision, including both object and face recognition; speech recognition; and natural language processing and understanding.
Abstract: Recent years have seen machine learning techniques become prominent in unexpectedly diverse application areas. This chapter introduces a few of them: text mining, including document classification and clustering; web mining, including wrapper induction and the page-rank method used for web search; computer vision, including both object and face recognition; speech recognition; and natural language processing and understanding. Deep learning has made inroads in all these areas and we draw connections to the material covered in Chapter 10, Deep learning. We also consider some other issues that are relevant in practical applications. Applying machine learning to data mining often involves careful choice of learning algorithm and algorithm parameters. Many practical datasets are truly massive and cannot be tackled with standard algorithms designed for small-to-medium size data. In some real-world scenarios, data arrives in a stream, requiring the ability to constantly and quickly update the model and respond to changes in the nature of the data. Often, domain expertise is present in the form of background knowledge that can be used to aid the learning algorithm in finding good concept descriptions. Some applications, particularly in the cyber-security area, involve adversarial situations, where the learning algorithm is confronted with training data that is designed to be misleading. Machine learning techniques are beginning to creep into our daily environment, and we end by glimpsing a future of ubiquitous data mining.

1 citation
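For the page-rank method mentioned in the web-mining discussion above, a compact power-iteration sketch in Python/NumPy; the tiny four-page link matrix and the damping factor of 0.85 are illustrative assumptions.

```python
import numpy as np

def pagerank(adjacency, damping=0.85, iterations=100):
    """Power iteration for PageRank: repeatedly redistribute each page's
    score along its outgoing links, mixed with a uniform 'teleport' term."""
    n = len(adjacency)
    out_degree = adjacency.sum(axis=1)
    # Column-stochastic transition matrix; dangling pages link everywhere.
    transition = np.where(out_degree[:, None] > 0,
                          adjacency / np.maximum(out_degree[:, None], 1),
                          1.0 / n).T
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * transition @ rank
    return rank

# Tiny 4-page web: adjacency[i, j] = 1 means page i links to page j
links = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
print(pagerank(links).round(3))
```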