
Showing papers on "Statistical learning theory published in 2011"


Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
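
The review covers many applications; as a concrete illustration of the splitting idea, the following is a minimal, hedged sketch of ADMM applied to the lasso (one of the problems discussed). It is not the authors' reference implementation; the problem size, penalty lam and step parameter rho are illustrative assumptions.

```python
# Minimal sketch of ADMM for the lasso: minimize (1/2)||Ax - b||^2 + lam*||x||_1.
# Sizes, lam and rho are illustrative assumptions, not taken from the review.
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of the l1 norm (element-wise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    # Cache the (regularized) normal-equation matrix used by the x-update.
    AtA_reg = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_reg, Atb + rho * (z - u))  # quadratic step
        z = soft_threshold(x + u, lam / rho)               # shrinkage step
        u = u + x - z                                      # dual update
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20); x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(admm_lasso(A, b)[:5])
```

Each iteration alternates a ridge-like solve, an element-wise shrinkage, and a dual update; it is this separable structure that makes the method easy to distribute across machines, as the review argues.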

17,433 citations


Journal ArticleDOI
TL;DR: An improved version of TWSVM, named twin bounded support vector machines (TBSVM), is proposed, in which the structural risk minimization principle is implemented by introducing a regularization term.
Abstract: For classification problems, the generalized eigenvalue proximal support vector machine (GEPSVM) and twin support vector machine (TWSVM) are regarded as milestones in the development of the powerful SVMs, as they use the nonparallel hyperplane classifiers. In this brief, we propose an improved version, named twin bounded support vector machines (TBSVM), based on TWSVM. The significant advantage of our TBSVM over TWSVM is that the structural risk minimization principle is implemented by introducing the regularization term. This embodies the marrow of statistical learning theory, so this modification can improve the performance of classification. In addition, the successive overrelaxation technique is used to solve the optimization problems to speed up the training procedure. Experimental results show the effectiveness of our method in both computation time and classification accuracy, and therefore confirm the above conclusion further.
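
For orientation, the first of the two TBSVM quadratic programs (the second is symmetric) is roughly of the following form; the notation is our paraphrase rather than a quotation of the paper, with A and B holding the two classes' training points, e1 and e2 all-ones vectors, and the c3 term being the added regularization that implements structural risk minimization.

```latex
% Sketch of the first TBSVM subproblem (our paraphrase of the formulation).
\begin{aligned}
\min_{w_1,\,b_1,\,\xi}\quad & \tfrac{1}{2}\,c_3\!\left(\lVert w_1\rVert^2 + b_1^2\right)
  + \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1\rVert^2 + c_1\, e_2^{\top}\xi \\
\text{s.t.}\quad & -(B w_1 + e_2 b_1) + \xi \ge e_2, \qquad \xi \ge 0 .
\end{aligned}
```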

476 citations


Book ChapterDOI
11 Apr 2011
TL;DR: ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.
Abstract: The choice of input variables is a fundamental, and yet crucial consideration in identifying the optimal functional form of statistical models. The task of selecting input variables is common to the development of all statistical models, and is largely dependent on the discovery of relationships within the available data to identify suitable predictors of the model output. In the case of parametric, or semi-parametric empirical models, the difficulty of the input variable selection task is somewhat alleviated by the a priori assumption of the functional form of the model, which is based on some physical interpretation of the underlying system or process being modelled. However, in the case of artificial neural networks (ANNs), and other similarly data-driven statistical modelling approaches, there is no such assumption made regarding the structure of the model. Instead, the input variables are selected from the available data, and the model is developed subsequently. The difficulty of selecting input variables arises due to (i) the number of available variables, which may be very large; (ii) correlations between potential input variables, which creates redundancy; and (iii) variables that have little or no predictive power. Variable subset selection has been a longstanding issue in fields of applied statistics dealing with inference and linear regression (Miller, 1984), and the advent of ANN models has only served to create new challenges in this field. The non-linearity, inherent complexity and non-parametric nature of ANN regression make it difficult to apply many existing analytical variable selection methods. The difficulty of selecting input variables is further exacerbated during ANN development, since the task of selecting inputs is often delegated to the ANN during the learning phase of development. A popular notion is that an ANN is adequately capable of identifying redundant and noise variables during training, and that the trained network will use only the salient input variables. ANN architectures can be built with arbitrary flexibility and can be successfully trained using any combination of input variables (assuming they are good predictors). Consequently, allowances are often made for a large number of input variables, with the belief that the ability to incorporate such flexibility and redundancy creates a more robust model. Such pragmatism is perhaps symptomatic of the popularisation of ANN models through machine learning, rather than statistical learning theory. ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.
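
As one concrete (and deliberately simple) illustration of the filtering approaches this chapter surveys, the sketch below ranks candidate ANN inputs by estimated mutual information with the output and keeps the top few; the data and the cut-off k are illustrative assumptions, not a procedure prescribed by the chapter.

```python
# Rank candidate inputs by estimated mutual information with the output and
# keep the top-k. Synthetic data; k is an illustrative choice.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_inputs(X, y, k=3):
    mi = mutual_info_regression(X, y, random_state=0)
    ranked = np.argsort(mi)[::-1]          # most informative first
    return ranked[:k], mi

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 8))       # 8 candidate inputs
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(500)
    keep, scores = select_inputs(X, y, k=3)
    print("selected input indices:", keep)
```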

328 citations


Book ChapterDOI
05 Oct 2011
TL;DR: This chapter contains sections titled: Learning Machine; Statistical Learning Theory; Types of Learning Methods; Common Learning Tasks; Model Estimation; Review Questions and Problems; References for Further Study.
Abstract: This chapter contains sections titled: Learning Machine; Statistical Learning Theory; Types of Learning Methods; Common Learning Tasks; Model Estimation; Review Questions and Problems; References for Further Study.

245 citations


Book ChapterDOI
01 May 2011
TL;DR: Statistical learning theory, as discussed by the authors, is regarded as one of the most beautifully developed branches of artificial intelligence, and it provides the theoretical basis for many of today's machine learning algorithms for tasks such as classification.
Abstract: Statistical learning theory is regarded as one of the most beautifully developed branches of artificial intelligence. It provides the theoretical basis for many of today's machine learning algorithms. The theory helps to explore what permits us to draw valid conclusions from empirical data. This chapter provides an overview of the key ideas and insights of statistical learning theory. Statistical learning theory begins with a class of hypotheses and uses empirical data to select one hypothesis from the class. If the data-generating mechanism is benign, the difference between the training error and the test error of a hypothesis from the class is small. Statistical learning theory generally avoids metaphysical statements about the true underlying dependency, and is instead precise in referring to the difference between training and test error. The chapter also describes some other variants of machine learning.
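
The gap between training and test error referred to here is usually controlled uniformly over the hypothesis class by a VC-type bound; one standard form (our summary, with constants that differ between textbooks) is:

```latex
% For a hypothesis class H of VC dimension d and an i.i.d. sample of size n,
% with probability at least 1 - \delta, simultaneously for every h in H:
R(h) \;\le\; \widehat{R}_n(h) \;+\;
  \sqrt{\frac{d\left(\ln\tfrac{2n}{d} + 1\right) + \ln\tfrac{4}{\delta}}{n}} ,
```

where R(h) is the true (test) error and R_n-hat(h) the empirical (training) error; a benign data-generating mechanism and a not-too-rich hypothesis class keep the right-hand gap small.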

205 citations


Journal ArticleDOI
TL;DR: This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake and highlights the capability of the SVM over the ANN models.
Abstract: This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second technique uses the Support Vector Machine (SVM), a classification method firmly grounded in statistical learning theory. ANN and SVM models have been developed to predict liquefaction susceptibility using the corrected SPT value [(N1)60] and the cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models to require only two parameters [(N1)60 and peak ground acceleration (amax/g)] for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
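
To make the two-input formulation concrete, here is a minimal sketch of an SVM classifier on corrected SPT value (N1)60 and cyclic stress ratio CSR; the data, labelling rule and hyperparameters are fabricated for illustration and are not the Chi-Chi SPT records or the paper's settings.

```python
# Two-input SVM classifier for a liquefied / not-liquefied label.
# All values below are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
n1_60 = rng.uniform(2, 40, n)            # corrected SPT blow count
csr = rng.uniform(0.05, 0.5, n)          # cyclic stress ratio
# Toy labelling rule: low blow count combined with high CSR -> liquefied (1).
liquefied = (csr > 0.004 * n1_60 + 0.12).astype(int)

X = np.column_stack([n1_60, csr])
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
model.fit(X, liquefied)
print("training accuracy:", model.score(X, liquefied))
print("prediction for (N1)60=12, CSR=0.30:", model.predict([[12.0, 0.30]]))
```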

155 citations


Book
25 Jan 2011
TL;DR: The book clarifies the argument from the poverty of the stimulus (APS), examines the nature of the primary linguistic data, and evaluates formal models of learnability, from the Gold paradigm and probabilistic (PAC) learning theory to grammar induction through implemented machine learning.
Abstract: Preface. 1 Introduction: Nativism in Linguistic Theory. 1.1 Historical Development. 1.2 The Rationalist-Empiricist Debate. 1.3 Nativism and Cognitive Modularity. 1.4 Connectionism, Nonmodularity, and Antinativism. 1.5 Adaptation and the Evolution of Natural Language. 1.6 Summary and Conclusions. 2 Clarifying the Argument from the Poverty of the Stimulus. 2.1 Formulating the APS. 2.2 Empiricist Learning versus Nativist Learning. 2.3 Our Version of the APS. 2.4 A Theory-Internal APS. 2.5 Evidence for the APS: Auxiliary Inversion as a Paradigm Case. 2.6 Debate on the PLD. 2.7 Learning Theory and Indispensable Data. 2.8 A Second Empirical Case: Anaphoric One. 2.9 Summary and Conclusions. 3 The Stimulus: Determining the Nature of Primary Linguistic Data. 3.1 Primary Linguistic Data. 3.2 Negative Evidence. 3.3 Semantic, Contextual, and Extralinguistic Evidence. 3.4 Prosodic Information. 3.5 Summary and Conclusions. 4 Learning in the Limit: The Gold Paradigm. 4.1 Formal Models of Language Acquisition. 4.2 Mathematical Models of Learnability. 4.3 The Gold Paradigm of Learnability. 4.4 Critique of the Positive-Evidence-Only APS in IIL. 4.5 Proper Positive Results. 4.6 Variants of the Gold Model. 4.7 Implications of Gold's Results for Linguistic Nativism. 4.8 Summary and Conclusions. 5 Probabilistic Learning Theory for Language Acquisition. 5.1 Chomsky's View of Statistical Learning. 5.2 Basic Assumptions of Statistical Learning Theory. 5.3 Learning Distributions. 5.4 Probabilistic Versions of the IIL Framework. 5.5 PAC Learning. 5.6 Consequences of PAC Learnability. 5.7 Problems with the Standard Model. 5.8 Summary and Conclusions. 6 A Formal Model of Indirect Negative Evidence. 6.1 Introduction. 6.2 From Low Probability to Ungrammaticality. 6.3 Modeling the DDA. 6.4 Applying the Functional Lower Bound. 6.5 Summary and Conclusions. 7 Computational Complexity and Efficient Learning. 7.1 Basic Concepts of Complexity. 7.2 Efficient Learning. 7.3 Negative Results. 7.4 Interpreting Hardness Results. 7.5 Summary and Conclusions. 8 Positive Results in Efficient Learning. 8.1 Regular Languages. 8.2 Distributional Methods. 8.3 Distributional Learning of Context-Free Languages. 8.4 Lattice-Based Formalisms. 8.5 Arguments against Distributional Learning. 8.6 Summary and Conclusions. 9 Grammar Induction through Implemented Machine Learning. 9.1 Supervised Learning. 9.2 Unsupervised Learning. 9.3 Summary and Conclusions. 10 Parameters in Linguistic Theory and Probabilistic Language Models. 10.1 Learnability of Parametric Models of Syntax. 10.2 UG Parameters and Language Variation. 10.3 Parameters in Probabilistic Language Models. 10.4 Inferring Constraints on Hypothesis Spaces with Hierarchical Bayesian Models. 10.5 Summary and Conclusions. 11 A Brief Look at Some Biological and Psychological Evidence. 11.1 Developmental Arguments. 11.2 Genetic Factors: Inherited Language Disorders. 11.3 Experimental Learning of Artificial Languages. 11.4 Summary and Conclusions. 12 Conclusion. 12.1 Summary. 12.2 Conclusions. References. Author Index. Subject Index.

151 citations


Journal Article
TL;DR: The theoretical basis of support vector machines (SVM) is described systematically, the mainstream training algorithms of traditional SVM and some new learning models and algorithms are summed up in detail, and the research and development prospects of SVM are pointed out.
Abstract: Statistical learning theory is the statistical theory of small samples, and it focuses on the statistical laws and the nature of learning from small samples. The support vector machine is a machine learning method based on statistical learning theory, and it has become a major research focus in machine learning because of its excellent performance. This paper describes the theoretical basis of support vector machines (SVM) systematically, sums up the mainstream training algorithms of traditional SVM and some new learning models and algorithms in detail, and finally points out the research and development prospects of support vector machines.

144 citations


14 Dec 2011
TL;DR: One of the standard and thoroughly studied models for learning is the framework of statistical learning theory, and the authors begin by briefly reviewing this model.
Abstract: In a world where automatic data collection becomes ubiquitous, statisticians must update their paradigms to cope with new problems. Whether we discuss the Internet, consumer data sets, or financial markets, a common feature emerges: huge amounts of dynamic data that need to be understood and quickly processed. This state of affairs is dramatically different from classical statistical problems, which involve many observations and few variables of interest. Over the past decades, learning theory has tried to address this issue. One of the standard and thoroughly studied models for learning is the framework of statistical learning theory. We start by briefly reviewing this model.

137 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the general problem of model selection for active learning with a nested hierarchy of hypothesis classes and proposed an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions.
Abstract: We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.

114 citations


Journal ArticleDOI
TL;DR: An evolutionary scheme searches for optimal kernel types and parameters for automated seizure detection and considers the Lyapunov exponent, fractal dimension and wavelet entropy for possible feature extraction.
Abstract: Support vector machines (SVM) have in recent years been gainfully used in various pattern recognition applications. Based on statistical learning theory, this paradigm promises strong robustness to noise and generalization to unseen data. As in any classification technique, appropriate choice of the kernel and input features plays an important role in SVM performance. In this study, an evolutionary scheme searches for optimal kernel types and parameters for automated seizure detection. We consider the Lyapunov exponent, fractal dimension and wavelet entropy for possible feature extraction. The classification accuracy of this approach is examined on the MIT-BIH (Massachusetts Institute of Technology-Beth Israel Hospital) dataset, and the results are compared with those of a standard SVM. The MIT-BIH dataset contains the electrocardiographic (ECG) changes of patients with partial epilepsy, with two types of ECG beats (partial epilepsy and normal). A comparison of results shows that the performance of the evolutionary scheme outweighs that of the standard support vector machine. In the best condition, the proposed approach reaches 100% specificity and 96.29% sensitivity.
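
The following sketch conveys the flavour of such a search, not the authors' actual genetic algorithm or features: a tiny (1+lambda)-style evolutionary loop over kernel type, C and gamma, scored by cross-validated accuracy on synthetic placeholder data (the Lyapunov-exponent, fractal-dimension and wavelet-entropy features are not reproduced here).

```python
# Toy evolutionary search over SVM kernel type and parameters, scored by
# 5-fold cross-validated accuracy. Data are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

def score(cfg):
    kernel, C, gamma = cfg
    return cross_val_score(SVC(kernel=kernel, C=C, gamma=gamma), X, y, cv=5).mean()

def mutate(cfg):
    kernel, C, gamma = cfg
    if rng.random() < 0.3:
        kernel = rng.choice(["rbf", "poly", "sigmoid"])
    C = float(np.clip(C * np.exp(rng.normal(0, 0.5)), 1e-2, 1e3))
    gamma = float(np.clip(gamma * np.exp(rng.normal(0, 0.5)), 1e-4, 1e1))
    return (kernel, C, gamma)

best = ("rbf", 1.0, 0.1)
best_score = score(best)
for generation in range(15):
    for child in (mutate(best) for _ in range(4)):
        s = score(child)
        if s > best_score:
            best, best_score = child, s
print("best kernel/parameters:", best, "CV accuracy:", round(best_score, 3))
```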

01 Jan 2011
TL;DR: This paper introduces the concept of random forest and the latest research, then provides some important aspects of applications in economics, and a summary is given in the final section.
Abstract: Random forests are a statistical learning method that uses bootstrap re-sampling to form training sets and then combines tree predictors by majority voting, so that each tree is grown using a new bootstrap training set. They are widely applied in medicine, bioinformatics, economics and other fields because of their high prediction accuracy, good tolerance of noisy data and, by the law of large numbers, resistance to overfitting. In this paper we first introduce the concept of random forests and the latest research, then provide some important aspects of applications in economics; a summary is given in the final section.
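
A minimal sketch of the procedure the abstract summarizes, bootstrap-resampled trees combined by majority voting, is shown below using scikit-learn; the economics data are not available, so a toy classification dataset stands in.

```python
# Random forest on a toy dataset; the out-of-bag score is a by-product of the
# bootstrap resampling the abstract describes.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,      # number of bootstrap-grown trees
    oob_score=True,        # out-of-bag accuracy estimate
    random_state=0,
)
forest.fit(X, y)
print("out-of-bag accuracy:", round(forest.oob_score_, 3))
print("feature importances:", forest.feature_importances_.round(2))
```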

Book
02 Aug 2011
TL;DR: An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels, and serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.
Abstract: A thought-provoking look at statistical learning theory and its role in understanding human learning and inductive reasoning. A joint endeavor from leading researchers in the fields of philosophy and electrical engineering, An Elementary Introduction to Statistical Learning Theory is a comprehensive and accessible primer on the rapidly evolving fields of statistical pattern recognition and statistical learning theory. Explaining these areas at a level and in a way that is not often found in other books on the topic, the authors present the basic theory behind contemporary machine learning and uniquely utilize its foundations as a framework for philosophical thinking about inductive inference. Promoting the fundamental goal of statistical learning, knowing what is achievable and what is not, this book demonstrates the value of a systematic methodology when used along with the needed techniques for evaluating the performance of a learning system. First, an introduction to machine learning is presented that includes brief discussions of applications such as image recognition, speech recognition, medical diagnostics, and statistical arbitrage. To enhance accessibility, two chapters on relevant aspects of probability theory are provided. Subsequent chapters feature coverage of topics such as the pattern recognition problem, the optimal Bayes decision rule, the nearest neighbor rule, kernel rules, neural networks, support vector machines, and boosting. Appendices throughout the book explore the relationship between the discussed material and related topics from mathematics, philosophy, psychology, and statistics, drawing insightful connections between problems in these areas and statistical learning theory. All chapters conclude with a summary section, a set of practice questions, and a reference section that supplies historical notes and additional resources for further study. An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels. It also serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.

Proceedings Article
21 Dec 2011
TL;DR: The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis, presents two simple algorithms that incorporate association rules, and provides generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory.
Abstract: We consider a supervised learning problem in which data are revealed sequentially and the goal is to determine what will next be revealed. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, there has not previously been a theoretical foundation established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules, and provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition and has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
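
To illustrate why a minimum support threshold is usually needed and how an adjusted measure can relax it, the snippet below contrasts plain rule confidence with a pseudo-count-shrunk variant. The exact "adjusted confidence" is defined in the paper; the form used here, with a shrinkage parameter K, is our illustrative reading rather than a quotation.

```python
# Plain confidence of a rule a -> b versus a pseudo-count-shrunk variant.
def confidence(n_ab, n_a):
    # n_ab: transactions containing a and b; n_a: transactions containing a.
    return n_ab / n_a if n_a else 0.0

def adjusted_confidence(n_ab, n_a, K=10):
    # Shrinks the estimate toward 0 for rules with little support, so rare
    # rules need proportionally stronger evidence to rank highly.
    return n_ab / (n_a + K)

print(confidence(3, 3), adjusted_confidence(3, 3))        # rare rule: 1.0 vs ~0.23
print(confidence(80, 100), adjusted_confidence(80, 100))  # common rule: 0.8 vs ~0.73
```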

Journal ArticleDOI
TL;DR: In this article, a non-asymptotic version of the Wilks phenomenon in bounded contrast optimization procedures is introduced, where the difference between the empirical risk of the minimizer of the true risk in the model and the minimum of the empirically defined empirical risk (the excess empirical risk) satisfies a Bernstein-like inequality.
Abstract: A theorem by Wilks asserts that in smooth parametric density estimation the difference between the maximum likelihood and the likelihood of the sampling distribution converges toward a Chi-square distribution where the number of degrees of freedom coincides with the model dimension. This observation is at the core of some goodness-of-fit testing procedures and of some classical model selection methods. This paper describes a non-asymptotic version of the Wilks phenomenon in bounded contrast optimization procedures. Using concentration inequalities for general functions of independent random variables, it proves that in bounded contrast minimization (as for example in Statistical Learning Theory), the difference between the empirical risk of the minimizer of the true risk in the model and the minimum of the empirical risk (the excess empirical risk) satisfies a Bernstein-like inequality where the variance term reflects the dimension of the model and the scale term reflects the noise conditions. From a mathematical statistics viewpoint, the significance of this result comes from the recent observation that when using model selection via penalization, the excess empirical risk represents a minimum penalty if non-asymptotic guarantees concerning prediction error are to be provided. From the perspective of empirical process theory, this paper describes a concentration inequality for the supremum of a bounded non-centered (actually non-positive) empirical process. Combining the now classical analysis of M-estimation (building on Talagrand’s inequality for suprema of empirical processes) and versatile moment inequalities for functions of independent random variables, this paper develops a genuine Bernstein-like inequality that seems beyond the reach of traditional tools.
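
For reference, the classical parametric Wilks phenomenon that this paper carries over to bounded contrast minimization can be stated as follows, with ell_n the log-likelihood of an i.i.d. sample of size n, theta_0 the sampling parameter and theta-hat_n the maximum likelihood estimator in a smooth d-dimensional model:

```latex
% Classical Wilks phenomenon: the log-likelihood-ratio statistic is
% asymptotically chi-square with d degrees of freedom.
2\left(\ell_n(\hat{\theta}_n) - \ell_n(\theta_0)\right)
  \;\xrightarrow[\;n\to\infty\;]{\;d\;}\; \chi^2_d .
```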

Journal ArticleDOI
TL;DR: In this article, the authors provide a tutorial overview of some aspects of statistical learning theory, which also goes by other names such as statistical pattern recognition, nonparametric classification and estimation, and supervised learning.
Abstract: In this article, we provide a tutorial overview of some aspects of statistical learning theory, which also goes by other names such as statistical pattern recognition, nonparametric classification and estimation, and supervised learning. We focus on the problem of two-class pattern classification for various reasons. This problem is rich enough to capture many of the interesting aspects that are present in the cases of more than two classes and in the problem of estimation, and many of the results can be extended to these cases. Focusing on two-class pattern classification simplifies our discussion, and yet it is directly applicable to a wide range of practical settings. We begin with a description of the two-class pattern recognition problem. We then discuss various classical and state-of-the-art approaches to this problem, with a focus on fundamental formulations, algorithms, and theoretical results. In particular, we describe nearest neighbor methods, kernel methods, multilayer perceptrons, Vapnik-Chervonenkis theory, support vector machines, and boosting. WIREs Comp Stat 2011 3 543-556 DOI: 10.1002/wics.179

Journal ArticleDOI
19 Jul 2011
TL;DR: The proposed SVM-based model is applied to life prediction of a bearing, and the result shows that the model achieves high precision.
Abstract: Life prediction of rolling element bearings is an urgent demand in engineering practice, and effective life prediction techniques are beneficial to predictive maintenance. The support vector machine (SVM) is a machine learning method based on statistical learning theory and has advantages for prediction. This paper develops an SVM-based model for bearing life prediction. The inputs of the model are features of the bearing vibration signal, and the output is the ratio of bearing running time to bearing failure time. The model is built based on data from a few failed bearings, and it can fuse information from the bearing being predicted, so it is advantageous for bearing life prediction in practice. The model is applied to life prediction of a bearing, and the result shows that the proposed model achieves high precision.
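
A minimal sketch of the modelling idea, support vector regression from vibration-signal features to the running-time/failure-time ratio, is given below; the features (RMS, kurtosis), targets and hyperparameters are synthetic placeholders, not the paper's bearing data or settings.

```python
# SVR from two toy vibration features to a fraction-of-life-consumed target.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 120
rms = rng.uniform(0.1, 2.0, n)            # e.g. RMS of the vibration signal
kurtosis = rng.uniform(2.5, 8.0, n)       # e.g. kurtosis of the signal
X = np.column_stack([rms, kurtosis])
# Toy target in [0, 1]: running time / failure time, loosely tied to the features.
life_ratio = np.clip(0.3 * rms + 0.08 * (kurtosis - 3.0) +
                     0.05 * rng.standard_normal(n), 0.0, 1.0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.02))
model.fit(X, life_ratio)
print("predicted life fraction for rms=1.2, kurtosis=5.0:",
      model.predict([[1.2, 5.0]])[0].round(3))
```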

Proceedings Article
21 Dec 2011
TL;DR: In this paper, the authors extend Bayesian MAP and MDL by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case.
Abstract: We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting "safe" estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov's conditions, and reveals new situations in which we can penalize by (-log prior)/n rather than sqrt((-log prior)/n).

Proceedings ArticleDOI
09 Feb 2011
TL;DR: It is shown that in order to obtain a ranking in which each element is an average of O(n/C) positions away from its position in the optimal ranking, one needs to sample O(nC^2) pairs uniformly at random, for any C > 0.
Abstract: Obtaining judgments from human raters is a vital part in the design of search engines' evaluation. Today, a discrepancy exists between judgment acquisition from raters (training phase) and use of the responses for retrieval evaluation (evaluation phase). This discrepancy is due to the inconsistency between the representation of the information in both phases. During training, raters are requested to provide a relevance score for an individual result in the context of a query, whereas the evaluation is performed on ordered lists of search results, with the results' relative position (compared to other results) taken into account. As an alternative to the practice of learning to rank using relevance judgments for individual search results, more and more focus has recently been diverted to the theory and practice of learning from answers to combinatorial questions about sets of search results. That is, users, during training, are asked to rank small sets (typically pairs). Human rater responses to questions about the relevance of individual results are first compared to their responses to questions about the relevance of pairs of results. We empirically show that neither type of response can be deduced from the other, and that the added context created when results are shown together changes the raters' evaluation process. Since pairwise judgments are directly related to ranking, we conclude they are more accurate for that purpose. We go beyond pairs to show that triplets do not contain significantly more information than pairs for the purpose of measuring statistical preference. These two results establish good stability properties of pairwise comparisons for the purpose of learning to rank. We further analyze different scenarios, in which results of varying quality are added as "decoys". A recurring source of worry in papers focusing on pairwise comparison is the quadratic number of pairs in a set of results. Which preferences do we choose to solicit from paid raters? Can we provably eliminate a quadratic cost? We employ results from statistical learning theory to show that the quadratic cost can be provably eliminated in certain cases. More precisely, we show that in order to obtain a ranking in which each element is an average of O(n/C) positions away from its position in the optimal ranking, one needs to sample O(nC^2) pairs uniformly at random, for any C > 0. We also present an active learning algorithm which samples the pairs adaptively, and conjecture that it provides additional improvement.
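
The last paragraph's sampling claim is easy to illustrate empirically: draw a modest random subset of pairwise comparisons and rank items by their fraction of wins among sampled pairs. The sketch below (a Borda-style aggregation on a synthetic ground-truth ordering with noiseless raters) is our illustration of the flavour of the result, not the paper's algorithm or proof.

```python
# Rank n items from a random sample of pairwise comparisons by win fraction.
import random

def rank_from_sampled_pairs(n_items, n_pairs, true_rank, seed=0):
    rng = random.Random(seed)
    wins = [0] * n_items
    counts = [0] * n_items
    for _ in range(n_pairs):
        i, j = rng.sample(range(n_items), 2)
        # The "rater" prefers whichever item is better in the true ranking.
        better = i if true_rank[i] < true_rank[j] else j
        wins[better] += 1
        counts[i] += 1
        counts[j] += 1
    score = [wins[k] / counts[k] if counts[k] else 0.0 for k in range(n_items)]
    return sorted(range(n_items), key=lambda k: -score[k])

n = 50
true_rank = list(range(n))                 # item k has true position k
estimate = rank_from_sampled_pairs(n, n_pairs=8 * n, true_rank=true_rank)
avg_displacement = sum(abs(estimate.index(k) - k) for k in range(n)) / n
print("average displacement from true position:", round(avg_displacement, 2))
```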


Journal ArticleDOI
Pijush Samui
TL;DR: The study shows that RVM is the best model for the prediction of liquefaction potential of soil based on SPT data.
Abstract: The determination of liquefaction potential of soil is an imperative task in earthquake geotechnical engineering. The current research aims at proposing the least squares support vector machine (LSSVM) and the relevance vector machine (RVM) as novel classification techniques for the determination of liquefaction potential of soil from actual standard penetration test (SPT) data. The LSSVM is a statistical learning method that has a self-contained basis in statistical learning theory and excellent learning performance. The RVM is based on a Bayesian formulation; it can generalize well and provide inferences at low computational cost. Both models give probabilistic output. A comparative study has also been done between the two developed models and an artificial neural network model. The study shows that RVM is the best model for the prediction of liquefaction potential of soil based on SPT data.

Journal ArticleDOI
TL;DR: The classification results based on the new features have been compared with the classification based on a conventional method for feature extraction, and it was proved that the recognition rate of the substances used with the new feature type is higher.
Abstract: A new method for real time classification of volatile chemical substance traces is presented. The method is based on electrochemical signals of an array of semiconductor gas sensors. In these sensor signals characteristic patterns of different substances are hidden. There are non-linear correlative relationships between the measured sensor signals and the chemical substances which are treated using two methods derived from statistical learning theory (Support Vector Machine – SVM, Maximum Likelihood Estimation – MLE) for the detection of the substance characteristics in the sensor signals. A key criterion for the presented pattern recognition is a newly developed type of features, which is specially adapted to the low frequency signals of semiconductor sensors. The presented features are based on the evaluation of the range of the transient response in the sensor signals in the frequency domain. To derive the new features, both real measurement data and synthetic generated signals were used. In the experiments the focus was set on the creation of reproducible sensor signals to get characteristic signal patterns. Synthetic signals were derived from a Gaussian Plume Model. With the new features, training data sets were calculated using the classification methods SVM and MLE. With these training data sets new sensor measurements may be assigned to the substances which are to be sought. The advantage of the presented method is that no feature reduction is needed and no loss of information occurs in the learning process. The classification results based on the new features have been compared with the classification based on a conventional method for feature extraction. It was proved that the recognition rate of the substances used with the new feature type is higher. The substance classification is primarily limited by the sensitivity of the semiconductor sensors, because sufficiently large sensor signals must have been provided to obtain appropriate substance patterns. At the present stage of development the method presented is suitable for the classification of substance groups, such as nitro aromatics or alcohols, but not for specific substances.

Book
26 Sep 2011
TL;DR: The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar brings together two important but very different learning problems, learning functional mappings using neural networks and learning natural language grammars, within the same analytical framework.
Abstract: From the Publisher: Among other topics, The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar brings together two important but very different learning problems within the same analytical framework. The first concerns the problem of learning functional mappings using neural networks; the second concerns learning natural language grammars in the principles and parameters tradition of Chomsky. These two learning problems are seemingly very different. Neural networks are real-valued, infinite-dimensional, continuous mappings. On the other hand, grammars are boolean-valued, finite-dimensional, discrete (symbolic) mappings. Furthermore, the research communities that work in the two areas almost never overlap. The book's objective is to bridge this gap. It uses the formal techniques developed in statistical learning theory and theoretical computer science over the last decade to analyze both kinds of learning problems. By asking the same question - how much information does it take to learn - of both problems, it highlights their similarities and differences. Specific results include model selection in neural networks, active learning, language learning and evolutionary models of language change.

01 Jan 2011
TL;DR: This study shows that RVM is a more robust model than SVM for prediction of rainfall in Vellore (India); both SVM and RVM are used as regression techniques.
Abstract: This article adopts the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM) for prediction of rainfall in Vellore (India). SVM is firmly based on statistical learning theory. RVM is a probabilistic model. SVM and RVM use air temperature (T), sunshine, humidity and wind speed (Va) as input variables. This article uses SVM and RVM as regression techniques. Equations have also been developed for prediction of rainfall. The developed RVM gives the variance of the predicted rainfall. This study shows that RVM is a more robust model than SVM.

Proceedings ArticleDOI
13 Oct 2011
TL;DR: Through an analysis of emotion and recognition interaction in personalized E-Learning based on statistical learning theory and support vector machine technology, the paper demonstrates the correctness and feasibility of using support vector machines to build learning styles.
Abstract: In order to accurately model a learner's learning style in E-Learning, and to provide personalized learning materials and a harmonious human-computer interaction environment according to the learner's needs and preferences, this paper combines the Felder-Silverman learning style model with support vector machine technology and uses machine learning to build a dynamic model of learning style. Through an analysis of emotion and recognition interaction in personalized E-Learning based on statistical learning theory and support vector machine technology, it demonstrates the correctness and feasibility of using support vector machines to build learning styles. The combination of support vector machines, emotion and recognition interaction in personalized E-Learning makes a great contribution to building a human-computer interaction environment.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: The Support Vector Machine (SVM), a relatively new method used in this work, can overcome the deficiencies of neural network approaches and provides efficient and powerful classification algorithms that are capable of dealing with high-dimensional input features, with theoretical bounds on the generalization error and sparseness of the solution provided by statistical learning theory.
Abstract: In recent years, pattern recognition, data mining, decision making, and networking have been used as new technologies for automatic classification problems. Classification techniques are needed to predict group membership for data instances. All of these advances tend to process raw data and extract information to obtain knowledge in order to make decisions and solve problems with less human aid. Many of the studies proposed in the literature are based on artificial intelligence (AI) techniques such as Artificial Neural Networks (ANN), Fuzzy Logic (FL), Expert Systems (ES), etc. These techniques use feature vectors derived from disturbance waveforms to classify events. ANNs have attracted a great deal of attention among these techniques because of their ability to handle noisy data and their learning capabilities. A disadvantage of neural networks is that they are notoriously slow, especially in the training phase but also in the application phase. Another significant disadvantage is that it is very difficult to determine how the network makes its decisions. The Support Vector Machine (SVM), which is quite a new method and is used in this work, can overcome these deficiencies and provides efficient and powerful classification algorithms that are capable of dealing with high-dimensional input features, with theoretical bounds on the generalization error and sparseness of the solution provided by statistical learning theory.

Journal ArticleDOI
TL;DR: This paper proposes an analytical closed-form expression to calculate the PPs' weights for classification tasks, which directly calculates (without iterations) the weights using the training patterns and their desired outputs, without any search or numeric function optimization.
Abstract: Parallel perceptrons (PPs) are very simple and efficient committee machines (a single layer of perceptrons with threshold activation functions and binary outputs, and a majority voting decision scheme), which nevertheless behave as universal approximators. The parallel delta (P-Delta) rule is an effective training algorithm, which, following the ideas of statistical learning theory used by the support vector machine (SVM), raises its generalization ability by maximizing the difference between the perceptron activations for the training patterns and the activation threshold (which corresponds to the separating hyperplane). In this paper, we propose an analytical closed-form expression to calculate the PPs' weights for classification tasks. Our method, called Direct Parallel Perceptrons (DPPs), directly calculates (without iterations) the weights using the training patterns and their desired outputs, without any search or numeric function optimization. The calculated weights globally minimize an error function which simultaneously takes into account the training error and the classification margin. Given its analytical and noniterative nature, DPPs are computationally much more efficient than other related approaches (P-Delta and SVM), and its computational complexity is linear in the input dimensionality. Therefore, DPPs are very appealing, in terms of time complexity and memory consumption, and are very easy to use for high-dimensional classification tasks. On real benchmark datasets with two and multiple classes, DPPs are competitive with SVM and other approaches but they also allow online learning and, as opposed to most of them, have no tunable parameters.

Book ChapterDOI
01 Jan 2011
TL;DR: An enhanced support vector machines (ESVM) model is proposed which can integrate the abilities of data preprocessing, parameter selection and rule generation into a SVM model; and apply the ESVM model to solve real world problems.
Abstract: Based on statistical learning theory, the support vector machine (SVM) model is an emerging machine learning technique for solving classification problems with small samples, non-linearity and high dimension. Data preprocessing, parameter selection, and rule generation strongly influence the performance of SVM models. Thus, the main purpose of this chapter is to propose an enhanced support vector machines (ESVM) model which can integrate the abilities of data preprocessing, parameter selection and rule generation into a SVM model, and to apply the ESVM model to solve real world problems. The structure of this chapter is organized as follows. Section 11.1 presents the purpose of classification and the basic concept of SVM models. Sections 11.2 and 11.3 introduce data preprocessing techniques and metaheuristics for selecting SVM models. Rule extraction from SVM models is addressed in Section 11.4. An enhanced SVM scheme and numerical results are illustrated in Sections 11.5 and 11.6. Conclusions are made in Section 11.7.

01 Jan 2011
TL;DR: This work considered the Support Vector Machine (SVM) method for classifying the condition of a centrifugal pump into two types of faults using six features: flow, temperature, suction pressure, discharge pressure, velocity, and vibration, and confirmed the superiority of SVM with some specific kernel functions.
Abstract: Fault detection and diagnosis play an effective role in the safe operation and long life of systems. Condition monitoring is an appropriate maintenance technique applicable to the fault diagnosis of rotating machinery. We considered the Support Vector Machine (SVM) method for classifying the condition of a centrifugal pump into two types of faults using six features: flow, temperature, suction pressure, discharge pressure, velocity, and vibration. The SVM method is based on statistical learning theory (SLT) and is powerful for problems with small samples, nonlinearity and high dimensionality (L.V. Ganyun et al. 2005). The SVM classifier is implemented with four kernel functions and their results are compared. We use an Artificial Neural Network (ANN) as a second classification method to compare the performance of the two approaches. After applying the two methods to our data set, we add noise to the data and again apply the SVMs and the ANN to compare their robustness in noisy conditions; the results obtained from the two methods confirm the superiority of the SVM with some specific kernel functions.