
Showing papers on "Statistical learning theory published in 2011"


Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
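
The review covers many applications; as a concrete illustration of the splitting idea, the following is a minimal, hedged sketch of ADMM applied to the lasso (one of the problems discussed). It is not the authors' reference implementation; the problem size, penalty lam and step parameter rho are illustrative assumptions.

```python
# Minimal sketch of ADMM for the lasso: minimize (1/2)||Ax - b||^2 + lam*||x||_1.
# Sizes, lam and rho are illustrative assumptions, not taken from the review.
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of the l1 norm (element-wise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    # Cache the (regularized) normal-equation matrix used by the x-update.
    AtA_reg = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_reg, Atb + rho * (z - u))  # quadratic step
        z = soft_threshold(x + u, lam / rho)               # shrinkage step
        u = u + x - z                                      # dual update
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20); x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(admm_lasso(A, b)[:5])
```

Each iteration alternates a ridge-like solve, an element-wise shrinkage, and a dual update; it is this separable structure that makes the method easy to distribute across machines, as the review argues.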

17,433 citations


Journal ArticleDOI
TL;DR: An improved version of TWSVM, named twin bounded support vector machines (TBSVM), is proposed, in which the structural risk minimization principle is implemented by introducing a regularization term.
Abstract: For classification problems, the generalized eigenvalue proximal support vector machine (GEPSVM) and twin support vector machine (TWSVM) are regarded as milestones in the development of the powerful SVMs, as they use the nonparallel hyperplane classifiers. In this brief, we propose an improved version, named twin bounded support vector machines (TBSVM), based on TWSVM. The significant advantage of our TBSVM over TWSVM is that the structural risk minimization principle is implemented by introducing the regularization term. This embodies the marrow of statistical learning theory, so this modification can improve the performance of classification. In addition, the successive overrelaxation technique is used to solve the optimization problems to speed up the training procedure. Experimental results show the effectiveness of our method in both computation time and classification accuracy, and therefore confirm the above conclusion further.
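
For orientation, the first of the two TBSVM quadratic programs (the second is symmetric) is roughly of the following form; the notation is our paraphrase rather than a quotation of the paper, with A and B holding the two classes' training points, e1 and e2 all-ones vectors, and the c3 term being the added regularization that implements structural risk minimization.

```latex
% Sketch of the first TBSVM subproblem (our paraphrase of the formulation).
\begin{aligned}
\min_{w_1,\,b_1,\,\xi}\quad & \tfrac{1}{2}\,c_3\!\left(\lVert w_1\rVert^2 + b_1^2\right)
  + \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1\rVert^2 + c_1\, e_2^{\top}\xi \\
\text{s.t.}\quad & -(B w_1 + e_2 b_1) + \xi \ge e_2, \qquad \xi \ge 0 .
\end{aligned}
```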

476 citations


Book ChapterDOI
11 Apr 2011
TL;DR: ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.
Abstract: The choice of input variables is a fundamental, and yet crucial consideration in identifying the optimal functional form of statistical models. The task of selecting input variables is common to the development of all statistical models, and is largely dependent on the discovery of relationships within the available data to identify suitable predictors of the model output. In the case of parametric, or semi-parametric empirical models, the difficulty of the input variable selection task is somewhat alleviated by the a priori assumption of the functional form of the model, which is based on some physical interpretation of the underlying system or process being modelled. However, in the case of artificial neural networks (ANNs), and other similarly data-driven statistical modelling approaches, there is no such assumption made regarding the structure of the model. Instead, the input variables are selected from the available data, and the model is developed subsequently. The difficulty of selecting input variables arises due to (i) the number of available variables, which may be very large; (ii) correlations between potential input variables, which creates redundancy; and (iii) variables that have little or no predictive power. Variable subset selection has been a longstanding issue in fields of applied statistics dealing with inference and linear regression (Miller, 1984), and the advent of ANN models has only served to create new challenges in this field. The non-linearity, inherent complexity and non-parametric nature of ANN regression make it difficult to apply many existing analytical variable selection methods. The difficulty of selecting input variables is further exacerbated during ANN development, since the task of selecting inputs is often delegated to the ANN during the learning phase of development. A popular notion is that an ANN is adequately capable of identifying redundant and noise variables during training, and that the trained network will use only the salient input variables. ANN architectures can be built with arbitrary flexibility and can be successfully trained using any combination of input variables (assuming they are good predictors). Consequently, allowances are often made for a large number of input variables, with the belief that the ability to incorporate such flexibility and redundancy creates a more robust model. Such pragmatism is perhaps symptomatic of the popularisation of ANN models through machine learning, rather than statistical learning theory. ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.
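
As one concrete (and deliberately simple) illustration of the filtering approaches this chapter surveys, the sketch below ranks candidate ANN inputs by estimated mutual information with the output and keeps the top few; the data and the cut-off k are illustrative assumptions, not a procedure prescribed by the chapter.

```python
# Rank candidate inputs by estimated mutual information with the output and
# keep the top-k. Synthetic data; k is an illustrative choice.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_inputs(X, y, k=3):
    mi = mutual_info_regression(X, y, random_state=0)
    ranked = np.argsort(mi)[::-1]          # most informative first
    return ranked[:k], mi

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 8))       # 8 candidate inputs
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(500)
    keep, scores = select_inputs(X, y, k=3)
    print("selected input indices:", keep)
```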

328 citations


Book ChapterDOI
05 Oct 2011
TL;DR: This chapter contains sections titled: Learning Machine; Statistical Learning Theory; Types of Learning Methods; Common Learning Tasks; Model Estimation; Review Questions and Problems; References for Further Study.
Abstract: This chapter contains sections titled: Learning Machine; Statistical Learning Theory; Types of Learning Methods; Common Learning Tasks; Model Estimation; Review Questions and Problems; References for Further Study.

245 citations


Book ChapterDOI
01 May 2011
TL;DR: Statistical learning theory, as discussed by the authors, is regarded as one of the most beautifully developed branches of artificial intelligence, and it provides the theoretical basis for many of today's machine learning algorithms for tasks such as classification.
Abstract: Statistical learning theory is regarded as one of the most beautifully developed branches of artificial intelligence. It provides the theoretical basis for many of today's machine learning algorithms. The theory helps to explore what permits us to draw valid conclusions from empirical data. This chapter provides an overview of the key ideas and insights of statistical learning theory. Statistical learning theory begins with a class of hypotheses and uses empirical data to select one hypothesis from the class. If the data-generating mechanism is benign, the difference between the training error and the test error of a hypothesis from the class is small. Statistical learning theory generally avoids metaphysical statements about the true underlying dependency, and is instead precise in referring to the difference between training and test error. The chapter also describes some other variants of machine learning.
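
The gap between training and test error referred to here is usually controlled uniformly over the hypothesis class by a VC-type bound; one standard form (our summary, with constants that differ between textbooks) is:

```latex
% For a hypothesis class H of VC dimension d and an i.i.d. sample of size n,
% with probability at least 1 - \delta, simultaneously for every h in H:
R(h) \;\le\; \widehat{R}_n(h) \;+\;
  \sqrt{\frac{d\left(\ln\tfrac{2n}{d} + 1\right) + \ln\tfrac{4}{\delta}}{n}} ,
```

where R(h) is the true (test) error and R_n-hat(h) the empirical (training) error; a benign data-generating mechanism and a not-too-rich hypothesis class keep the right-hand gap small.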

205 citations


Journal ArticleDOI
TL;DR: This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake and highlights the capability of the SVM over the ANN models.
Abstract: This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second technique uses the Support Vector Machine (SVM), a classification method firmly grounded in statistical learning theory. ANN and SVM models have been developed to predict liquefaction susceptibility using the corrected SPT value [(N1)60] and the cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models to require only two parameters [(N1)60 and peak ground acceleration (amax/g)] for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
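
To make the two-input formulation concrete, here is a minimal sketch of an SVM classifier on corrected SPT value (N1)60 and cyclic stress ratio CSR; the data, labelling rule and hyperparameters are fabricated for illustration and are not the Chi-Chi SPT records or the paper's settings.

```python
# Two-input SVM classifier for a liquefied / not-liquefied label.
# All values below are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
n1_60 = rng.uniform(2, 40, n)            # corrected SPT blow count
csr = rng.uniform(0.05, 0.5, n)          # cyclic stress ratio
# Toy labelling rule: low blow count combined with high CSR -> liquefied (1).
liquefied = (csr > 0.004 * n1_60 + 0.12).astype(int)

X = np.column_stack([n1_60, csr])
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
model.fit(X, liquefied)
print("training accuracy:", model.score(X, liquefied))
print("prediction for (N1)60=12, CSR=0.30:", model.predict([[12.0, 0.30]]))
```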

155 citations


Book
25 Jan 2011
TL;DR: The book clarifies the argument from the poverty of the stimulus (APS), examines the nature of the primary linguistic data, and evaluates formal models of learnability, from the Gold paradigm and probabilistic (PAC) learning theory to grammar induction through implemented machine learning.
Abstract: Preface. 1 Introduction: Nativism in Linguistic Theory. 1.1 Historical Development. 1.2 The Rationalist-Empiricist Debate. 1.3 Nativism and Cognitive Modularity. 1.4 Connectionism, Nonmodularity, and Antinativism. 1.5 Adaptation and the Evolution of Natural Language. 1.6 Summary and Conclusions. 2 Clarifying the Argument from the Poverty of the Stimulus. 2.1 Formulating the APS. 2.2 Empiricist Learning versus Nativist Learning. 2.3 Our Version of the APS. 2.4 A Theory-Internal APS. 2.5 Evidence for the APS: Auxiliary Inversion as a Paradigm Case. 2.6 Debate on the PLD. 2.7 Learning Theory and Indispensable Data. 2.8 A Second Empirical Case: Anaphoric One. 2.9 Summary and Conclusions. 3 The Stimulus: Determining the Nature of Primary Linguistic Data. 3.1 Primary Linguistic Data. 3.2 Negative Evidence. 3.3 Semantic, Contextual, and Extralinguistic Evidence. 3.4 Prosodic Information. 3.5 Summary and Conclusions. 4 Learning in the Limit: The Gold Paradigm. 4.1 Formal Models of Language Acquisition. 4.2 Mathematical Models of Learnability. 4.3 The Gold Paradigm of Learnability. 4.4 Critique of the Positive-Evidence-Only APS in IIL. 4.5 Proper Positive Results. 4.6 Variants of the Gold Model. 4.7 Implications of Gold's Results for Linguistic Nativism. 4.8 Summary and Conclusions. 5 Probabilistic Learning Theory for Language Acquisition. 5.1 Chomsky's View of Statistical Learning. 5.2 Basic Assumptions of Statistical Learning Theory. 5.3 Learning Distributions. 5.4 Probabilistic Versions of the IIL Framework. 5.5 PAC Learning. 5.6 Consequences of PAC Learnability. 5.7 Problems with the Standard Model. 5.8 Summary and Conclusions. 6 A Formal Model of Indirect Negative Evidence. 6.1 Introduction. 6.2 From Low Probability to Ungrammaticality. 6.3 Modeling the DDA. 6.4 Applying the Functional Lower Bound. 6.5 Summary and Conclusions. 7 Computational Complexity and Efficient Learning. 7.1 Basic Concepts of Complexity. 7.2 Efficient Learning. 7.3 Negative Results. 7.4 Interpreting Hardness Results. 7.5 Summary and Conclusions. 8 Positive Results in Efficient Learning. 8.1 Regular Languages. 8.2 Distributional Methods. 8.3 Distributional Learning of Context-Free Languages. 8.4 Lattice-Based Formalisms. 8.5 Arguments against Distributional Learning. 8.6 Summary and Conclusions. 9 Grammar Induction through Implemented Machine Learning. 9.1 Supervised Learning. 9.2 Unsupervised Learning. 9.3 Summary and Conclusions. 10 Parameters in Linguistic Theory and Probabilistic Language Models. 10.1 Learnability of Parametric Models of Syntax. 10.2 UG Parameters and Language Variation. 10.3 Parameters in Probabilistic Language Models. 10.4 Inferring Constraints on Hypothesis Spaces with Hierarchical Bayesian Models. 10.5 Summary and Conclusions. 11 A Brief Look at Some Biological and Psychological Evidence. 11.1 Developmental Arguments. 11.2 Genetic Factors: Inherited Language Disorders. 11.3 Experimental Learning of Artificial Languages. 11.4 Summary and Conclusions. 12 Conclusion. 12.1 Summary. 12.2 Conclusions. References. Author Index. Subject Index.

151 citations


Journal Article
TL;DR: The theoretical basis of support vector machines (SVM) is described systematically, the mainstream training algorithms of traditional SVM and some new learning models and algorithms are summed up in detail, and the research and development prospects of SVM are pointed out.
Abstract: Statistical learning theory is the statistical theory of small samples, and it focuses on the statistical laws and the nature of learning from small samples. The support vector machine is a machine learning method based on statistical learning theory, and it has become a major research focus in machine learning because of its excellent performance. This paper describes the theoretical basis of support vector machines (SVM) systematically, sums up the mainstream training algorithms of traditional SVM and some new learning models and algorithms in detail, and finally points out the research and development prospects of support vector machines.

144 citations


14 Dec 2011
TL;DR: One of the standard and thoroughly studied models for learning is the framework of statistical learning theory, and the authors begin by briefly reviewing this model.
Abstract: In a world where automatic data collection becomes ubiquitous, statisticians must update their paradigms to cope with new problems. Whether we discuss the Internet, consumer data sets, or financial markets, a common feature emerges: huge amounts of dynamic data that need to be understood and quickly processed. This state of affairs is dramatically different from classical statistical problems, which involve many observations and few variables of interest. Over the past decades, learning theory has tried to address this issue. One of the standard and thoroughly studied models for learning is the framework of statistical learning theory. We start by briefly reviewing this model.

137 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the general problem of model selection for active learning with a nested hierarchy of hypothesis classes and proposed an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions.
Abstract: We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.

114 citations


Journal ArticleDOI
TL;DR: An evolutionary scheme searches for optimal kernel types and parameters for automated seizure detection and considers the Lyapunov exponent, fractal dimension and wavelet entropy for possible feature extraction.
Abstract: Support vector machines (SVM) have in recent years been gainfully used in various pattern recognition applications. Based on statistical learning theory, this paradigm promises strong robustness to noise and generalization to unseen data. As in any classification technique, appropriate choice of the kernel and input features plays an important role in SVM performance. In this study, an evolutionary scheme searches for optimal kernel types and parameters for automated seizure detection. We consider the Lyapunov exponent, fractal dimension and wavelet entropy for possible feature extraction. The classification accuracy of this approach is examined on the MIT-BIH (Massachusetts Institute of Technology-Beth Israel Hospital) dataset, and the results are compared with those of a standard SVM. The MIT-BIH dataset contains the electrocardiographic (ECG) changes of patients with partial epilepsy, with two types of ECG beats (partial epilepsy and normal). A comparison of results shows that the performance of the evolutionary scheme outweighs that of the standard support vector machine. In the best condition, the proposed approach reaches 100% specificity and 96.29% sensitivity.
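
The following sketch conveys the flavour of such a search, not the authors' actual genetic algorithm or features: a tiny (1+lambda)-style evolutionary loop over kernel type, C and gamma, scored by cross-validated accuracy on synthetic placeholder data (the Lyapunov-exponent, fractal-dimension and wavelet-entropy features are not reproduced here).

```python
# Toy evolutionary search over SVM kernel type and parameters, scored by
# 5-fold cross-validated accuracy. Data are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

def score(cfg):
    kernel, C, gamma = cfg
    return cross_val_score(SVC(kernel=kernel, C=C, gamma=gamma), X, y, cv=5).mean()

def mutate(cfg):
    kernel, C, gamma = cfg
    if rng.random() < 0.3:
        kernel = rng.choice(["rbf", "poly", "sigmoid"])
    C = float(np.clip(C * np.exp(rng.normal(0, 0.5)), 1e-2, 1e3))
    gamma = float(np.clip(gamma * np.exp(rng.normal(0, 0.5)), 1e-4, 1e1))
    return (kernel, C, gamma)

best = ("rbf", 1.0, 0.1)
best_score = score(best)
for generation in range(15):
    for child in (mutate(best) for _ in range(4)):
        s = score(child)
        if s > best_score:
            best, best_score = child, s
print("best kernel/parameters:", best, "CV accuracy:", round(best_score, 3))
```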

01 Jan 2011
TL;DR: This paper introduces the concept of random forest and the latest research, then provides some important aspects of applications in economics, and a summary is given in the final section.
Abstract: Random forests are a statistical learning method that uses bootstrap re-sampling to form training sets and then combines tree predictors by majority voting, so that each tree is grown using a new bootstrap training set. They are widely applied in medicine, bioinformatics, economics and other fields because of their high prediction accuracy, good tolerance of noisy data and, by the law of large numbers, resistance to overfitting. In this paper we first introduce the concept of random forests and the latest research, then provide some important aspects of applications in economics; a summary is given in the final section.
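
A minimal sketch of the procedure the abstract summarizes, bootstrap-resampled trees combined by majority voting, is shown below using scikit-learn; the economics data are not available, so a toy classification dataset stands in.

```python
# Random forest on a toy dataset; the out-of-bag score is a by-product of the
# bootstrap resampling the abstract describes.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,      # number of bootstrap-grown trees
    oob_score=True,        # out-of-bag accuracy estimate
    random_state=0,
)
forest.fit(X, y)
print("out-of-bag accuracy:", round(forest.oob_score_, 3))
print("feature importances:", forest.feature_importances_.round(2))
```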

Book
02 Aug 2011
TL;DR: An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels, and serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.
Abstract: A thought-provoking look at statistical learning theory and its role in understanding human learning and inductive reasoning. A joint endeavor from leading researchers in the fields of philosophy and electrical engineering, An Elementary Introduction to Statistical Learning Theory is a comprehensive and accessible primer on the rapidly evolving fields of statistical pattern recognition and statistical learning theory. Explaining these areas at a level and in a way that is not often found in other books on the topic, the authors present the basic theory behind contemporary machine learning and uniquely utilize its foundations as a framework for philosophical thinking about inductive inference. Promoting the fundamental goal of statistical learning, knowing what is achievable and what is not, this book demonstrates the value of a systematic methodology when used along with the needed techniques for evaluating the performance of a learning system. First, an introduction to machine learning is presented that includes brief discussions of applications such as image recognition, speech recognition, medical diagnostics, and statistical arbitrage. To enhance accessibility, two chapters on relevant aspects of probability theory are provided. Subsequent chapters feature coverage of topics such as the pattern recognition problem, the optimal Bayes decision rule, the nearest neighbor rule, kernel rules, neural networks, support vector machines, and boosting. Appendices throughout the book explore the relationship between the discussed material and related topics from mathematics, philosophy, psychology, and statistics, drawing insightful connections between problems in these areas and statistical learning theory. All chapters conclude with a summary section, a set of practice questions, and a reference section that supplies historical notes and additional resources for further study. An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels. It also serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.

Proceedings Article
21 Dec 2011
TL;DR: The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis, presents two simple algorithms that incorporate association rules, and provides generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory.
Abstract: We consider a supervised learning problem in which data are revealed sequentially and the goal is to determine what will next be revealed. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, there has not previously been a theoretical foundation established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules, and provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition and has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
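
To illustrate why a minimum support threshold is usually needed and how an adjusted measure can relax it, the snippet below contrasts plain rule confidence with a pseudo-count-shrunk variant. The exact "adjusted confidence" is defined in the paper; the form used here, with a shrinkage parameter K, is our illustrative reading rather than a quotation.

```python
# Plain confidence of a rule a -> b versus a pseudo-count-shrunk variant.
def confidence(n_ab, n_a):
    # n_ab: transactions containing a and b; n_a: transactions containing a.
    return n_ab / n_a if n_a else 0.0

def adjusted_confidence(n_ab, n_a, K=10):
    # Shrinks the estimate toward 0 for rules with little support, so rare
    # rules need proportionally stronger evidence to rank highly.
    return n_ab / (n_a + K)

print(confidence(3, 3), adjusted_confidence(3, 3))        # rare rule: 1.0 vs ~0.23
print(confidence(80, 100), adjusted_confidence(80, 100))  # common rule: 0.8 vs ~0.73
```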

Journal ArticleDOI
TL;DR: In this article, a non-asymptotic version of the Wilks phenomenon in bounded contrast optimization procedures is introduced, where the difference between the empirical risk of the minimizer of the true risk in the model and the minimum of the empirically defined empirical risk (the excess empirical risk) satisfies a Bernstein-like inequality.
Abstract: A theorem by Wilks asserts that in smooth parametric density estimation the difference between the maximum likelihood and the likelihood of the sampling distribution converges toward a Chi-square distribution where the number of degrees of freedom coincides with the model dimension. This observation is at the core of some goodness-of-fit testing procedures and of some classical model selection methods. This paper describes a non-asymptotic version of the Wilks phenomenon in bounded contrast optimization procedures. Using concentration inequalities for general functions of independent random variables, it proves that in bounded contrast minimization (as for example in Statistical Learning Theory), the difference between the empirical risk of the minimizer of the true risk in the model and the minimum of the empirical risk (the excess empirical risk) satisfies a Bernstein-like inequality where the variance term reflects the dimension of the model and the scale term reflects the noise conditions. From a mathematical statistics viewpoint, the significance of this result comes from the recent observation that when using model selection via penalization, the excess empirical risk represents a minimum penalty if non-asymptotic guarantees concerning prediction error are to be provided. From the perspective of empirical process theory, this paper describes a concentration inequality for the supremum of a bounded non-centered (actually non-positive) empirical process. Combining the now classical analysis of M-estimation (building on Talagrand’s inequality for suprema of empirical processes) and versatile moment inequalities for functions of independent random variables, this paper develops a genuine Bernstein-like inequality that seems beyond the reach of traditional tools.
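
For reference, the classical parametric Wilks phenomenon that this paper carries over to bounded contrast minimization can be stated as follows, with ell_n the log-likelihood of an i.i.d. sample of size n, theta_0 the sampling parameter and theta-hat_n the maximum likelihood estimator in a smooth d-dimensional model:

```latex
% Classical Wilks phenomenon: the log-likelihood-ratio statistic is
% asymptotically chi-square with d degrees of freedom.
2\left(\ell_n(\hat{\theta}_n) - \ell_n(\theta_0)\right)
  \;\xrightarrow[\;n\to\infty\;]{\;d\;}\; \chi^2_d .
```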

Journal ArticleDOI
TL;DR: In this article, the authors provide a tutorial overview of some aspects of statistical learning theory, which also goes by other names such as statistical pattern recognition, nonparametric classification and estimation, and supervised learning.
Abstract: In this article, we provide a tutorial overview of some aspects of statistical learning theory, which also goes by other names such as statistical pattern recognition, nonparametric classification and estimation, and supervised learning. We focus on the problem of two-class pattern classification for various reasons. This problem is rich enough to capture many of the interesting aspects that are present in the cases of more than two classes and in the problem of estimation, and many of the results can be extended to these cases. Focusing on two-class pattern classification simplifies our discussion, and yet it is directly applicable to a wide range of practical settings. We begin with a description of the two-class pattern recognition problem. We then discuss various classical and state-of-the-art approaches to this problem, with a focus on fundamental formulations, algorithms, and theoretical results. In particular, we describe nearest neighbor methods, kernel methods, multilayer perceptrons, Vapnik-Chervonenkis theory, support vector machines, and boosting. WIREs Comp Stat 2011 3 543-556 DOI: 10.1002/wics.179

Journal ArticleDOI
19 Jul 2011
TL;DR: The proposed SVM-based model is applied to life prediction of a bearing, and the result shows that the model achieves high precision.
Abstract: Life prediction of rolling element bearings is an urgent demand in engineering practice, and effective life prediction techniques are beneficial to predictive maintenance. The support vector machine (SVM) is a machine learning method based on statistical learning theory and has advantages for prediction. This paper develops an SVM-based model for bearing life prediction. The inputs of the model are features of the bearing vibration signal, and the output is the ratio of bearing running time to bearing failure time. The model is built based on data from a few failed bearings, and it can fuse information from the bearing being predicted, so it is advantageous for bearing life prediction in practice. The model is applied to life prediction of a bearing, and the result shows that the proposed model achieves high precision.
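
A minimal sketch of the modelling idea, support vector regression from vibration-signal features to the running-time/failure-time ratio, is given below; the features (RMS, kurtosis), targets and hyperparameters are synthetic placeholders, not the paper's bearing data or settings.

```python
# SVR from two toy vibration features to a fraction-of-life-consumed target.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 120
rms = rng.uniform(0.1, 2.0, n)            # e.g. RMS of the vibration signal
kurtosis = rng.uniform(2.5, 8.0, n)       # e.g. kurtosis of the signal
X = np.column_stack([rms, kurtosis])
# Toy target in [0, 1]: running time / failure time, loosely tied to the features.
life_ratio = np.clip(0.3 * rms + 0.08 * (kurtosis - 3.0) +
                     0.05 * rng.standard_normal(n), 0.0, 1.0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.02))
model.fit(X, life_ratio)
print("predicted life fraction for rms=1.2, kurtosis=5.0:",
      model.predict([[1.2, 5.0]])[0].round(3))
```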

Proceedings Article
21 Dec 2011
TL;DR: In this paper, the authors extend Bayesian MAP and MDL by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case.
Abstract: We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting "safe" estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov's conditions, and reveals new situations in which we can penalize by (-log prior)/n rather than sqrt((-log prior)/n).

Proceedings ArticleDOI
09 Feb 2011
TL;DR: It is shown that in order to obtain a ranking in which each element is an average of O(n/C) positions away from its position in the optimal ranking, one needs to sample O(nC^2) pairs uniformly at random, for any C > 0.
Abstract: Obtaining judgments from human raters is a vital part in the design of search engines' evaluation. Today, a discrepancy exists between judgment acquisition from raters (training phase) and use of the responses for retrieval evaluation (evaluation phase). This discrepancy is due to the inconsistency between the representation of the information in both phases. During training, raters are requested to provide a relevance score for an individual result in the context of a query, whereas the evaluation is performed on ordered lists of search results, with the results' relative position (compared to other results) taken into account. As an alternative to the practice of learning to rank using relevance judgments for individual search results, more and more focus has recently been diverted to the theory and practice of learning from answers to combinatorial questions about sets of search results. That is, users, during training, are asked to rank small sets (typically pairs). Human rater responses to questions about the relevance of individual results are first compared to their responses to questions about the relevance of pairs of results. We empirically show that neither type of response can be deduced from the other, and that the added context created when results are shown together changes the raters' evaluation process. Since pairwise judgments are directly related to ranking, we conclude they are more accurate for that purpose. We go beyond pairs to show that triplets do not contain significantly more information than pairs for the purpose of measuring statistical preference. These two results establish good stability properties of pairwise comparisons for the purpose of learning to rank. We further analyze different scenarios, in which results of varying quality are added as "decoys". A recurring source of worry in papers focusing on pairwise comparison is the quadratic number of pairs in a set of results. Which preferences do we choose to solicit from paid raters? Can we provably eliminate a quadratic cost? We employ results from statistical learning theory to show that the quadratic cost can be provably eliminated in certain cases. More precisely, we show that in order to obtain a ranking in which each element is an average of O(n/C) positions away from its position in the optimal ranking, one needs to sample O(nC^2) pairs uniformly at random, for any C > 0. We also present an active learning algorithm which samples the pairs adaptively, and conjecture that it provides additional improvement.
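
The last paragraph's sampling claim is easy to illustrate empirically: draw a modest random subset of pairwise comparisons and rank items by their fraction of wins among sampled pairs. The sketch below (a Borda-style aggregation on a synthetic ground-truth ordering with noiseless raters) is our illustration of the flavour of the result, not the paper's algorithm or proof.

```python
# Rank n items from a random sample of pairwise comparisons by win fraction.
import random

def rank_from_sampled_pairs(n_items, n_pairs, true_rank, seed=0):
    rng = random.Random(seed)
    wins = [0] * n_items
    counts = [0] * n_items
    for _ in range(n_pairs):
        i, j = rng.sample(range(n_items), 2)
        # The "rater" prefers whichever item is better in the true ranking.
        better = i if true_rank[i] < true_rank[j] else j
        wins[better] += 1
        counts[i] += 1
        counts[j] += 1
    score = [wins[k] / counts[k] if counts[k] else 0.0 for k in range(n_items)]
    return sorted(range(n_items), key=lambda k: -score[k])

n = 50
true_rank = list(range(n))                 # item k has true position k
estimate = rank_from_sampled_pairs(n, n_pairs=8 * n, true_rank=true_rank)
avg_displacement = sum(abs(estimate.index(k) - k) for k in range(n)) / n
print("average displacement from true position:", round(avg_displacement, 2))
```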


Journal ArticleDOI
Pijush Samui
TL;DR: The study shows that RVM is the best model for the prediction of liquefaction potential of soil based on SPT data.
Abstract: The determination of liquefaction potential of soil is an imperative task in earthquake geotechnical engineering. The current research aims at proposing the least squares support vector machine (LSSVM) and the relevance vector machine (RVM) as novel classification techniques for the determination of liquefaction potential of soil from actual standard penetration test (SPT) data. The LSSVM is a statistical learning method that has a self-contained basis in statistical learning theory and excellent learning performance. The RVM is based on a Bayesian formulation; it can generalize well and provide inferences at low computational cost. Both models give probabilistic output. A comparative study has also been done between the two developed models and an artificial neural network model. The study shows that RVM is the best model for the prediction of liquefaction potential of soil based on SPT data.

Journal ArticleDOI
TL;DR: The classification results based on the new features have been compared with the classification based on a conventional method for feature extraction, and it was proved that the recognition rate of the substances used with the new feature type is higher.
Abstract: A new method for real time classification of volatile chemical substance traces is presented. The method is based on electrochemical signals of an array of semiconductor gas sensors. In these sensor signals characteristic patterns of different substances are hidden. There are non-linear correlative relationships between the measured sensor signals and the chemical substances which are treated using two methods derived from statistical learning theory (Support Vector Machine – SVM, Maximum Likelihood Estimation – MLE) for the detection of the substance characteristics in the sensor signals. A key criterion for the presented pattern recognition is a newly developed type of features, which is specially adapted to the low frequency signals of semiconductor sensors. The presented features are based on the evaluation of the range of the transient response in the sensor signals in the frequency domain. To derive the new features, both real measurement data and synthetic generated signals were used. In the experiments the focus was set on the creation of reproducible sensor signals to get characteristic signal patterns. Synthetic signals were derived from a Gaussian Plume Model. With the new features, training data sets were calculated using the classification methods SVM and MLE. With these training data sets new sensor measurements may be assigned to the substances which are to be sought. The advantage of the presented method is that no feature reduction is needed and no loss of information occurs in the learning process. The classification results based on the new features have been compared with the classification based on a conventional method for feature extraction. It was proved that the recognition rate of the substances used with the new feature type is higher. The substance classification is primarily limited by the sensitivity of the semiconductor sensors, because sufficiently large sensor signals must have been provided to obtain appropriate substance patterns. At the present stage of development the method presented is suitable for the classification of substance groups, such as nitro aromatics or alcohols, but not for specific substances.

Book
26 Sep 2011
TL;DR: The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar brings together two important but very different learning problems, learning functional mappings using neural networks and learning natural language grammars, within the same analytical framework.
Abstract: From the Publisher: Among other topics, The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar brings together two important but very different learning problems within the same analytical framework. The first concerns the problem of learning functional mappings using neural networks; the second concerns learning natural language grammars in the principles and parameters tradition of Chomsky. These two learning problems are seemingly very different. Neural networks are real-valued, infinite-dimensional, continuous mappings. On the other hand, grammars are boolean-valued, finite-dimensional, discrete (symbolic) mappings. Furthermore, the research communities that work in the two areas almost never overlap. The book's objective is to bridge this gap. It uses the formal techniques developed in statistical learning theory and theoretical computer science over the last decade to analyze both kinds of learning problems. By asking the same question - how much information does it take to learn - of both problems, it highlights their similarities and differences. Specific results include model selection in neural networks, active learning, language learning and evolutionary models of language change.

01 Jan 2011
TL;DR: This study shows that RVM is a more robust model than SVM for prediction of rainfall in Vellore (India); both SVM and RVM are used as regression techniques.
Abstract: This article adopts the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM) for prediction of rainfall in Vellore (India). SVM is firmly based on statistical learning theory. RVM is a probabilistic model. SVM and RVM use air temperature (T), sunshine, humidity and wind speed (Va) as input variables. This article uses SVM and RVM as regression techniques. Equations have also been developed for prediction of rainfall. The developed RVM gives the variance of the predicted rainfall. This study shows that RVM is a more robust model than SVM.

Proceedings ArticleDOI
13 Oct 2011
TL;DR: Through an analysis of emotion and recognition interaction in personalized E-Learning based on statistical learning theory and support vector machine technology, the paper demonstrates the correctness and feasibility of using support vector machines to build learning styles.
Abstract: In order to accurately model a learner's learning style in E-Learning, and to provide personalized learning materials and a harmonious human-computer interaction environment according to the learner's needs and preferences, this paper combines the Felder-Silverman learning style model with support vector machine technology and uses machine learning to build a dynamic model of learning style. Through an analysis of emotion and recognition interaction in personalized E-Learning based on statistical learning theory and support vector machine technology, it demonstrates the correctness and feasibility of using support vector machines to build learning styles. The combination of support vector machines, emotion and recognition interaction in personalized E-Learning makes a great contribution to building a human-computer interaction environment.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: The Support Vector Machine (SVM), a relatively new method used in this work, can overcome the deficiencies of neural network approaches and provides efficient and powerful classification algorithms that are capable of dealing with high-dimensional input features, with theoretical bounds on the generalization error and sparseness of the solution provided by statistical learning theory.
Abstract: In recent years, pattern recognition, data mining, decision making, and networking have been used as new technologies for automatic classification problems. Classification techniques are needed to predict group membership for data instances. All of these advances tend to process raw data and extract information to obtain knowledge in order to make decisions and solve problems with less human aid. Many of the studies proposed in the literature are based on artificial intelligence (AI) techniques such as Artificial Neural Networks (ANN), Fuzzy Logic (FL), Expert Systems (ES), etc. These techniques use feature vectors derived from disturbance waveforms to classify events. ANNs have attracted a great deal of attention among these techniques because of their ability to handle noisy data and their learning capabilities. A disadvantage of neural networks is that they are notoriously slow, especially in the training phase but also in the application phase. Another significant disadvantage is that it is very difficult to determine how the network makes its decisions. The Support Vector Machine (SVM), which is quite a new method and is used in this work, can overcome these deficiencies and provides efficient and powerful classification algorithms that are capable of dealing with high-dimensional input features, with theoretical bounds on the generalization error and sparseness of the solution provided by statistical learning theory.

Journal ArticleDOI
TL;DR: This paper proposes an analytical closed-form expression to calculate the PPs' weights for classification tasks, which directly calculates (without iterations) the weights using the training patterns and their desired outputs, without any search or numeric function optimization.
Abstract: Parallel perceptrons (PPs) are very simple and efficient committee machines (a single layer of perceptrons with threshold activation functions and binary outputs, and a majority voting decision scheme), which nevertheless behave as universal approximators. The parallel delta (P-Delta) rule is an effective training algorithm, which, following the ideas of statistical learning theory used by the support vector machine (SVM), raises its generalization ability by maximizing the difference between the perceptron activations for the training patterns and the activation threshold (which corresponds to the separating hyperplane). In this paper, we propose an analytical closed-form expression to calculate the PPs' weights for classification tasks. Our method, called Direct Parallel Perceptrons (DPPs), directly calculates (without iterations) the weights using the training patterns and their desired outputs, without any search or numeric function optimization. The calculated weights globally minimize an error function which simultaneously takes into account the training error and the classification margin. Given its analytical and noniterative nature, DPPs are computationally much more efficient than other related approaches (P-Delta and SVM), and its computational complexity is linear in the input dimensionality. Therefore, DPPs are very appealing, in terms of time complexity and memory consumption, and are very easy to use for high-dimensional classification tasks. On real benchmark datasets with two and multiple classes, DPPs are competitive with SVM and other approaches but they also allow online learning and, as opposed to most of them, have no tunable parameters.

Book ChapterDOI
01 Jan 2011
TL;DR: An enhanced support vector machines (ESVM) model is proposed which can integrate the abilities of data preprocessing, parameter selection and rule generation into a SVM model; and apply the ESVM model to solve real world problems.
Abstract: Based on statistical learning theory, the support vector machine (SVM) model is an emerging machine learning technique for solving classification problems with small samples, non-linearity and high dimension. Data preprocessing, parameter selection, and rule generation strongly influence the performance of SVM models. Thus, the main purpose of this chapter is to propose an enhanced support vector machines (ESVM) model which can integrate the abilities of data preprocessing, parameter selection and rule generation into a SVM model, and to apply the ESVM model to solve real world problems. The structure of this chapter is organized as follows. Section 11.1 presents the purpose of classification and the basic concept of SVM models. Sections 11.2 and 11.3 introduce data preprocessing techniques and metaheuristics for selecting SVM models. Rule extraction from SVM models is addressed in Section 11.4. An enhanced SVM scheme and numerical results are illustrated in Sections 11.5 and 11.6. Conclusions are made in Section 11.7.

01 Jan 2011
TL;DR: This work considered the Support Vector Machine (SVM) method for classifying the condition of a centrifugal pump into two types of faults using six features: flow, temperature, suction pressure, discharge pressure, velocity, and vibration, and confirmed the superiority of SVM with some specific kernel functions.
Abstract: Fault detection and diagnosis play an effective role in the safe operation and long life of systems. Condition monitoring is an appropriate maintenance technique applicable to the fault diagnosis of rotating machinery. We considered the Support Vector Machine (SVM) method for classifying the condition of a centrifugal pump into two types of faults using six features: flow, temperature, suction pressure, discharge pressure, velocity, and vibration. The SVM method is based on statistical learning theory (SLT) and is powerful for problems with small samples, nonlinearity and high dimensionality (L.V. Ganyun et al. 2005). The SVM classifier is implemented with four kernel functions and their results are compared. We use an Artificial Neural Network (ANN) as a second classification method to compare the performance of the two approaches. After applying the two methods to our data set, we add noise to the data and again apply the SVMs and the ANN to compare their robustness in noisy conditions; the results obtained from the two methods confirm the superiority of the SVM with some specific kernel functions.