Journal ArticleDOI

Machine learning

01 Dec 1996-ACM Computing Surveys-Vol. 28, Iss: 4
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
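The mail-filtering example in the abstract can be sketched as a tiny naive Bayes text classifier. This is an illustration, not anything described in the article itself; the messages, labels, and the choice of naive Bayes with add-one smoothing are all made up for the sketch:

```python
from collections import Counter
import math

def train(messages):
    """Count word frequencies per class from (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in messages:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the class with the higher log-posterior (add-one smoothing)."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_score = None, float("-inf")
    for label in ("spam", "ham"):
        score = math.log(totals[label] / sum(totals.values()))  # prior
        n = sum(counts[label].values())
        for w in text.lower().split():
            score += math.log((counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Toy "rejected vs. kept" training mail, standing in for a user's history.
mail = [("win money now", "spam"), ("cheap money offer", "spam"),
        ("meeting agenda attached", "ham"), ("lunch tomorrow", "ham")]
counts, totals = train(mail)
print(classify("win cheap money", counts, totals))  # -> spam
```

Retraining on the user's latest accept/reject decisions is what keeps such a filter up to date without anyone editing rules by hand.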
Citations
Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Journal ArticleDOI
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm together with two means of overcoming it.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
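At each node, ID3 splits on the attribute with the highest information gain, the expected reduction in label entropy. A minimal sketch of that criterion (not Quinlan's full system; the toy data are made up):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attribute index `attr`."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Tiny made-up examples: attributes are (outlook, windy).
rows = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0: outlook separates the classes
print(information_gain(rows, labels, 1))  # 0.0: windy is uninformative
```

The full algorithm applies this greedily: split on the best attribute, then recurse on each subset until the labels are pure.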

17,177 citations

Journal ArticleDOI
28 Jul 2006-Science
TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.
Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
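The core autoencoder idea, reconstructing the input through a narrow code, can be shown with a deliberately tiny linear sketch. This is not the paper's layer-wise pretraining scheme, just a one-unit tied-weight autoencoder trained by plain gradient descent; the data and parameters are made up. With a single linear unit, minimizing reconstruction error recovers (approximately) the first principal direction:

```python
import random

random.seed(0)
# Toy 2-D data stretched along the diagonal; the best 1-D linear code
# is close to the projection onto the first principal direction.
data = [(t + random.gauss(0, 0.1), t + random.gauss(0, 0.1))
        for t in [random.uniform(-1, 1) for _ in range(200)]]

def mse(w, data):
    """Mean squared reconstruction error of the tied-weight autoencoder."""
    total = 0.0
    for x in data:
        code = w[0] * x[0] + w[1] * x[1]               # encode to one number
        err = [code * w[i] - x[i] for i in range(2)]   # decode and compare
        total += err[0] ** 2 + err[1] ** 2
    return total / len(data)

w = [0.5, -0.3]          # arbitrary initial weights
before = mse(w, data)
lr = 0.05
for _ in range(100):     # stochastic gradient descent on reconstruction error
    for x in data:
        code = w[0] * x[0] + w[1] * x[1]
        err = [code * w[i] - x[i] for i in range(2)]
        dot = err[0] * w[0] + err[1] * w[1]
        for i in range(2):
            w[i] -= lr * 2 * (x[i] * dot + code * err[i])
after = mse(w, data)
print(before, "->", after)  # reconstruction error drops during training
```

The paper's contribution is what this sketch sidesteps: in deep, nonlinear autoencoders, such gradient descent works well only from a good starting point, hence the proposed initialization.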

16,717 citations

Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium; it reviews deep supervised learning, unsupervised learning, reinforcement learning, and evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations


Cites background or methods from "Machine learning"

  • ...Learning hierarchical representations through deep SL, UL, RL Many methods of Good Old-Fashioned Artificial Intelligence (GOFAI) (Nilsson, 1980) as well as more recent approaches to AI (Russell, Norvig, Canny, Malik, & Edwards, 1995) and Machine Learning (Mitchell, 1997) learn hierarchies of more and more abstract data representations....

    [...]

  • ...This work also introduced the MNIST data set of handwritten digits (LeCun et al., 1989), which over time has become perhaps the most famous benchmark of Machine Learning....

    [...]


Journal ArticleDOI
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
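The most widely used partitional method in such taxonomies, k-means (Lloyd's algorithm), alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. A generic sketch, not specific to this survey; the 2-D points are made up:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 2-D points: alternate assignment and update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        centers = [                          # update step (keep empty centers)
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

pts = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1),   # blob near the origin
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]   # blob near (5, 5)
centers, clusters = kmeans(pts, 2)
print(sorted(centers))  # one center per blob, near (0.1, 0.1) and (5.03, 5.0)
```

Like most partitional methods, this converges only to a local optimum, which is why restarts and smarter initialization are common in practice.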

14,054 citations

References
Posted Content
TL;DR: This work used machine learning classification to derive peptide detection probabilities that predict the number of tryptic peptides expected to be observed, which can serve to estimate the absolute abundance of a protein with high accuracy.
Abstract: The ultimate target of proteomics is to identify and quantify the proteins in an organism. Mass spectrometry (MS) based label-free protein quantitation has mainly focused on analysis of peptide spectral counts and ion peak heights. A protein of origin can be identified from several observed (proteotypic) peptides. However, each peptide's probability of being detected is strongly influenced by its physicochemical properties, which confounds MS-based counting. Using about a million peptide identifications generated by four different kinds of proteomic platforms, we successfully identified >16,000 proteotypic peptides. We used machine learning classification to derive peptide detection probabilities that predict the number of tryptic peptides expected to be observed, which can serve to estimate the absolute abundance of a protein with high accuracy. We used peptide data (provided by the CAS lab) to derive the best model from several kinds of methods. We first employed SVM and Random Forest classifiers to separate proteotypic from unobserved peptides, and then searched for the parameters giving the best predictions. Given the performance of our model, we can compute an absolute estimate of protein abundance.

3 citations

Proceedings ArticleDOI
12 Nov 2012
TL;DR: To optimize the parameters discriminatively, the formulation of maximum-margin Bayesian network classifiers is extended to missing features and latent variables, and the advantage of these classifiers over classifiers with generatively optimized parameters is demonstrated in experiments.
Abstract: The Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) records hydroacoustic data to detect nuclear explosions. This enables verification of the Comprehensive Nuclear-Test-Ban Treaty once it has entered into force. The detection can be considered as a classification problem discriminating noise-like, earthquake-caused and explosion-like data. Classification of the recorded data is challenging because it suffers from large amounts of missing features. While the classification performance of support vector machines has been evaluated, no such results for Bayesian network classifiers are available. We provide these results using classifiers with generatively and discriminatively optimized parameters and employing different imputation methods. In case of discriminatively optimized parameters, Bayesian network classifiers slightly outperform support vector machines. For optimizing the parameters discriminatively, we extend the formulation of maximum margin Bayesian network classifiers to missing features and latent variables. The advantage of these classifiers over classifiers with generatively optimized parameters is demonstrated in experiments.
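One reason generative classifiers handle missing features gracefully is that a missing feature can simply be marginalized out of the joint model. In a naive Bayes model (a simple special case of a Bayesian network classifier, not the authors' maximum-margin formulation), marginalizing reduces to skipping that feature's factor; the class priors and conditional probabilities below are made up for illustration:

```python
# Two binary features; cond[c][i] = P(feature i = 1 | class c).
priors = {"noise": 0.5, "event": 0.5}
cond = {
    "noise": [0.2, 0.3],
    "event": [0.8, 0.9],
}

def posterior(x, label):
    """Unnormalized P(label | x); None marks a missing feature."""
    p = priors[label]
    for i, v in enumerate(x):
        if v is None:
            continue  # summing over both values of feature i gives a factor 1
        p *= cond[label][i] if v == 1 else 1 - cond[label][i]
    return p

def classify(x):
    return max(priors, key=lambda lbl: posterior(x, lbl))

print(classify([1, None]))  # event: feature 1 missing, decision uses feature 0
print(classify([None, 0]))  # noise: feature 0 missing, decision uses feature 1
```

Imputation methods, by contrast, fill in a concrete value for the missing feature before classification; the paper compares both families of approaches.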

3 citations

Proceedings ArticleDOI
Chunhua Tian1, Hao Zhang1, Feng Li1, Tie Liu1
22 Jul 2009
TL;DR: A rule-based optimization approach is proposed in this paper; it has been successfully applied in a North American airline company and can be applied to other operational planning and scheduling problems with complex business logic.
Abstract: For a multiple-segment flight of a passenger aircraft, the primary objective of commodity load planning is to improve the operational efficiency of the whole journey. Unfortunately, the objective cannot be measured directly because shipment information for down-line stations is unavailable, so traditional optimization algorithms cannot be applied directly. A rule-based optimization approach is proposed in this paper. High-level business logic is captured as a rule flow, and optimization algorithms are embedded as rule-flow actions for specific decisions such as bin assignment. With this approach, a loading plan can adapt to business changes without application code changes. Based on this design, a visual rule editor and rule engine were developed. The system has been successfully applied in a North American airline company. The rule-based optimization approach and tool can also be applied to other operational planning and scheduling problems with complex business logic.

3 citations

Posted Content
TL;DR: In this article, the authors propose a novel learning paradigm for entity resolution, called gradual machine learning, which aims to enable effective machine labeling without manual labeling effort; it is a promising paradigm potentially applicable to other challenging classification tasks that require extensive labeling effort.
Abstract: Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most notably deep neural networks), which require lots of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work, and are therefore not readily available in many real scenarios. In this paper, we propose a novel learning paradigm for ER, called gradual machine learning, which aims to enable effective machine labeling without the requirement for manual labeling effort. It begins with some easy instances in a task, which can be automatically labeled by the machine with high accuracy, and then gradually labels more challenging instances by iterative factor graph inference. In gradual machine learning, the hard instances in a task are gradually labeled in small stages based on the estimated evidential certainty provided by the labeled easier instances. Our extensive experiments on real data have shown that the performance of the proposed approach is considerably better than its unsupervised alternatives, and highly competitive compared to the state-of-the-art supervised techniques. Using ER as a test case, we demonstrate that gradual machine learning is a promising paradigm potentially applicable to other challenging classification tasks requiring extensive labeling effort.
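The easy-instances-first idea can be shown with a toy self-training loop: a greatly simplified stand-in for the paper's factor-graph inference, with made-up 1-D data, two seed labels, and a distance-margin confidence heuristic. The machine labels the instance it is most certain about, updates its model (here, the class means), and repeats:

```python
# Seed labels at the extremes; points in between are labeled gradually,
# easiest (largest margin) first, and each new label updates the means.
labeled = {0.0: "a", 10.0: "b"}
unlabeled = [1.0, 2.0, 4.0, 6.0, 8.0, 9.0]

def mean(vals):
    return sum(vals) / len(vals)

while unlabeled:
    means = {c: mean([x for x, lbl in labeled.items() if lbl == c])
             for c in ("a", "b")}
    # "Evidential certainty" proxy: gap between distances to the two means.
    x = max(unlabeled,
            key=lambda p: abs(abs(p - means["a"]) - abs(p - means["b"])))
    labeled[x] = "a" if abs(x - means["a"]) < abs(x - means["b"]) else "b"
    unlabeled.remove(x)

# Points left of the gap end up "a", the rest "b".
print({x: labeled[x] for x in sorted(labeled)})
```

The borderline points (4.0 and 6.0) are labeled last, after the easier ones have sharpened the estimates of the two classes, which is exactly the ordering the paradigm relies on.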

3 citations

Proceedings ArticleDOI
11 Jun 2007
TL;DR: This tutorial will cover various generic models of the structure of complex networks, and the probabilistic dependencies among networked entities, and discuss practical examples where these models have been or could be used in order to improve understanding or to improve performance.
Abstract: Complex networks connect businesses, consumers, and the artifacts they create, such as pages, products, and accounts. Modeling these networks can help focus on the characteristics that will be useful for understanding or predicting important ecommerce phenomena, such as product demand or illicit behavior. Our tutorial will cover various generic models of (i) the structure of complex networks, and (ii) the probabilistic dependencies among networked entities. We will then discuss practical examples where the different types of models have been or could be used in order to improve understanding or to improve performance. For example, models of the structure of social networks can improve the theoretical understanding of network effects (Sundararajan, 2007). Modeling the structure of co-purchase networks can help explain demand patterns in electronic commerce. Models of networked probabilistic dependencies can improve prediction tasks including the targeting of advertisements/offers (Hill, Provost and Volinsky, 2006), the detection of illicit behavior (such as fraud), and the identification of interesting Web pages. The tutorial will be largely self-contained. Attendees will be assumed to know basic economics, probability and statistics.

3 citations

Trending Questions (1)
What is machine learning?

Machine learning is the study of methods for programming computers to learn.