Showing papers on "Statistical learning theory published in 2008"

PDF

Open Access

Journal Article•DOI•

Landslide susceptibility mapping based on Support Vector Machine: A case study on natural slopes of Hong Kong, China

[...]

X. Yao¹, L.G. Tham², Fuchu Dai¹•Institutions (2)

Chinese Academy of Sciences¹, University of Hong Kong²

01 Nov 2008-Geomorphology

TL;DR: An overview of the SVM, both one-class and two-class SVM methods, is first presented followed by its use in landslide susceptibility mapping, where it is concluded that two- class SVM possesses better prediction efficiency than logistic regression and one- Class SVM.

...read moreread less

450 citations

Journal Article•DOI•

Multispectral landuse classification using neural networks and support vector machines: one or the other, or both?

[...]

Barnali M. Dixon¹, N. Candade¹•Institutions (1)

University of South Florida¹

15 Feb 2008-Journal of remote sensing

TL;DR: This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote‐sensing image classification.

...read moreread less

Abstract: Land use classification is an important part of many remote sensing applications. A lot of research has gone into the application of statistical and neural network classifiers to remote-sensing images. This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote-sensing image classification. Standard classifiers such as Artificial Neural Network (ANN) need a number of training samples that exponentially increase with the dimension of the input feature space. With a limited number of training samples, the classification rate thus decreases as the dimensionality increases. SVMs are independent of the dimensionality of feature space as the main idea behind this classification technique is to separate the classes with a surface that maximizes the margin between them, using boundary pixels to create the decision surface. Results from SVMs are compared with traditional Maximum Likelihood Classification (MLC) and an ANN classifier. The findings suggest that the ANN and SVM classifiers perform better than the traditional MLC. The SVM and the ANN show comparable results. However, accuracy is dependent on factors such as the number of hidden nodes (in the case of ANN) and kernel parameters (in the case of SVM). The training time taken by the SVM is several magnitudes less.

...read moreread less

276 citations

Book•DOI•

Recent Advances in Learning and Control

[...]

Vincent D. Blondel, Stephen Boyd, Hidenori Kimura

01 Jan 2008

TL;DR: This paper presents Statistical Learning Theory, a Pack-based Strategy for Uncertain Feasibility and Optimization Problems, and Behaviors Described by Rational Symbols and the Parametrization of the Stabilizing Controllers.

...read moreread less

Abstract: Statistical Learning Theory: A Pack-based Strategy for Uncertain Feasibility and Optimization Problems.- UAV Formation Control: Theory and Application.- Electrical and Mechanical Passive Network Synthesis.- Output Synchronization of Nonlinear Systems with Relative Degree One.- On the Computation of Optimal Transport Maps Using Gradient Flows and Multiresolution Analysis.- Realistic Anchor Positioning for Sensor Localization.- Graph Implementations for Nonsmooth Convex Programs.- When Is a Linear Continuous-time System Easy or Hard to Control in Practice?.- Metrics and Morphing of Power Spectra.- A New Type of Neural Computation.- Getting Mobile Autonomous Robots to Form a Prescribed Geometric Arrangement.- Convex Optimization in Infinite Dimensional Spaces.- The Servomechanism Problem for SISO Positive LTI Systems.- Passivity-based Stability of Interconnection Structures.- Identification of Linear Continuous-time Systems Based on Iterative Learning Control.- A Pontryagin Maximum Principle for Systems of Flows.- Safe Operation and Control of Diesel Particulate Filters Using Level Set Methods.- Robust Control of Smart Material-based Actuators.- Behaviors Described by Rational Symbols and the Parametrization of the Stabilizing Controllers.

...read moreread less

262 citations

Journal Article•DOI•

Approximation and learning by greedy algorithms

[...]

Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, Ronald A. DeVore

12 Mar 2008-arXiv: Statistics Theory

TL;DR: This work improves on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm, and proves convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary.

...read moreread less

Abstract: We consider the problem of approximating a given element $f$ from a Hilbert space $\mathcal{H}$ by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning. In particular, we build upon the results in [IEEE Trans. Inform. Theory 42 (1996) 2118--2132] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.

...read moreread less

248 citations

Journal Article•DOI•

Approximation and learning by greedy algorithms

[...]

Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, Ronald A. DeVore

01 Feb 2008-Annals of Statistics

TL;DR: In this article, the authors consider the problem of approximating a given element f from a Hilbert space by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory.

...read moreread less

Abstract: We consider the problem of approximating a given element f from a Hilbert space $\mathcal{H}$ by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning. In particular, we build upon the results in [IEEE Trans. Inform. Theory 42 (1996) 2118–2132] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.

...read moreread less

239 citations

Journal Article•DOI•

An Objective Analysis of Support Vector Machine Based Classification for Remote Sensing

[...]

Thomas Oommen¹, Debasmita Misra², Navin K. C. Twarakavi³, Anupma Prakash², Bhaskar Sahoo², Sukumar Bandopadhyay² - Show less +2 more•Institutions (3)

Tufts University¹, University of Alaska Fairbanks², University of California, Riverside³

14 Mar 2008-Mathematical Geosciences

TL;DR: A comparative analysis of SVC with the Maximum Likelihood Classification (MLC) method, which is the most popular conventional supervised classification technique, illustrated that SVC improved the classification accuracy, was robust and did not suffer from dimensionality issues such as the Hughes Effect.

...read moreread less

Abstract: Accurate thematic classification is one of the most commonly desired outputs from remote sensing images Recent research efforts to improve the reliability and accuracy of image classification have led to the introduction of the Support Vector Classification (SVC) scheme SVC is a new generation of supervised learning method based on the principle of statistical learning theory, which is designed to decrease uncertainty in the model structure and the fitness of data We have presented a comparative analysis of SVC with the Maximum Likelihood Classification (MLC) method, which is the most popular conventional supervised classification technique SVC is an optimization technique in which the classification accuracy heavily relies on identifying the optimal parameters Using a case study, we verify a method to obtain these optimal parameters such that SVC can be applied efficiently We use multispectral and hyperspectral images to develop thematic classes of known lithologic units in order to compare the classification accuracy of both the methods We have varied the training to testing data proportions to assess the relative robustness and the optimal training sample requirement of both the methods to achieve comparable levels of accuracy The results of our study illustrated that SVC improved the classification accuracy, was robust and did not suffer from dimensionality issues such as the Hughes Effect

...read moreread less

185 citations

Journal Article•DOI•

Classifying Single-Trial EEG During Motor Imagery by Iterative Spatio-Spectral Patterns Learning (ISSPL)

[...]

Wei Wu¹, Xiaorong Gao¹, Bo Hong¹, Shangkai Gao¹•Institutions (1)

Tsinghua University¹

16 May 2008-IEEE Transactions on Biomedical Engineering

TL;DR: Experimental results on two datasets show that the proposed algorithm can correctly identify the discriminative frequency bands, demonstrating the algorithm's superiority over contemporary approaches in classification performance.

...read moreread less

Abstract: In most current motor-imagery-based brain-computer interfaces (BCIs), machine learning is carried out in two consecutive stages: feature extraction and feature classification. Feature extraction has focused on automatic learning of spatial filters, with little or no attention being paid to optimization of parameters for temporal filters that still require time-consuming, ad hoc manual tuning. In this paper, we present a new algorithm termed iterative spatio-spectral patterns learning (ISSPL) that employs statistical learning theory to perform automatic learning of spatio-spectral filters. In ISSPL, spectral filters and the classifier are simultaneously parameterized for optimization to achieve good generalization performance. A detailed derivation and theoretical analysis of ISSPL are given. Experimental results on two datasets show that the proposed algorithm can correctly identify the discriminative frequency bands, demonstrating the algorithm's superiority over contemporary approaches in classification performance.

...read moreread less

168 citations

Journal Article•DOI•

Property Testing: A Learning Theory Perspective

[...]

Dana Ron¹•Institutions (1)

Tel Aviv University¹

01 Mar 2008

TL;DR: This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community, and covers results forTesting algebraic properties of function such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.

...read moreread less

Abstract: Property testing deals with tasks where the goal is to distinguish between the case that an object (e.g., function or graph) has a prespecified property (e.g., the function is linear or the graph is bipartite) and the case that it differs significantly from any such object. The task should be performed by observing only a very small part of the object, in particular by querying the object, and the algorithm is allowed a small failure probability. One view of property testing is as a relaxation of learning the object (obtaining an approximate representation of the object). Thus property testing algorithms can serve as a preliminary step to learning. That is, they can be applied in order to select, very efficiently, what hypothesis class to use for learning. This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community. In particular, we cover results for testing algebraic properties of functions such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.

...read moreread less

157 citations

Journal Article•DOI•

Support Vector classifiers for Land Cover Classification

[...]

Mahesh Pal, Paul M. Mather¹•Institutions (1)

University of Nottingham¹

15 Feb 2008-arXiv: Neural and Evolutionary Computing

TL;DR: Results suggest that the infinite ensemble approach provides a significant increase in the classification accuracy in comparison to the radial basis function kernel‐based support vector machines.

...read moreread less

Abstract: Much research effort in the past ten years has been devoted to analysis of theperformance of artificial neural networks in image classification (Benediktsson etal., 1990; Heermann and Khazenie, 1992). The preferred algorithm is feed-forward multi-layer perceptron using back-propagation, due to its ability to handleany kind of numerical data, and to its freedom from distributional assumptions.Although neural networks may generally be used to classify data at least asaccurately as statistical classification approaches a number of studies havereported that users of neural classifiers have problems in setting the choice ofvarious parameters during training (Wilkinson, 1997). The choice of architectureof the network, the sample size for training, learning algorithms, and number ofiterations required for training are some of these problems. A new classificationsystem based on statistical learning theory (Vapnik, 1995), called the supportvector machine has recently been applied to the problem of remote sensing dataclassification (Huang et al., 2002; Zhu and Blumberg, 2002; Gualtieri and Cromp,1998). This technique is said to be independent of the dimensionality of featurespace as the main idea behind this classification technique is to separate theclasses with a surface that maximise the margin between them, using boundarypixels to create the decision surface. The data points that are closest to thehyperplane are termed "support vectors". The number of support vectors is thussmall as they are points close to the class boundaries (Vapnik, 1995). One majoradvantage of support vector classifiers is the use of quadratic programming, whichprovides global minima only. The absence of local minima is a significantdifference from the neural network classifiers.

...read moreread less

89 citations

Journal Article•DOI•

Feature selection for the SVM: An application to hypertension diagnosis

[...]

Chao-Ton Su¹, Chien-Hsin Yang²•Institutions (2)

National Tsing Hua University¹, National Chiao Tung University²

01 Jan 2008-Expert Systems With Applications

TL;DR: Implementation results show that the performance of combined kernel approach is better than the single kernel approach, and SVM based method was found to have a better performance based on two epidemiological indices such as sensitivity and specificity.

...read moreread less

Abstract: A support vector machine (SVM) is a novel classifier based on the statistical learning theory. To increase the performance of classification, the approach of SVM with kernel is usually used in classification tasks. In this study, we first attempted to investigate the performance of SVM with kernel. Several kernel functions, polynomial, RBF, summation, and multiplication were employed in the SVM and the feature selection approach developed [Hermes, L., & Buhmann, J. M. (2000). Feature selection for support vector machines. In Proceedings of the international conference on pattern recognition (ICPR'00) (Vol. 2, pp. 716-719)] was utilized to determine the important features. Then, a hypertension diagnosis case was implemented and 13 anthropometrical factors related to hypertension were selected. Implementation results show that the performance of combined kernel approach is better than the single kernel approach. Compared with backpropagation neural network method, SVM based method was found to have a better performance based on two epidemiological indices such as sensitivity and specificity.

...read moreread less

86 citations

Journal Article•DOI•

A learning theory approach to system identification and stochastic adaptive control

[...]

Mathukumalli Vidyasagar¹, Rajeeva L. Karandikar²•Institutions (2)

Tata Consultancy Services¹, Indian Statistical Institute²

01 Mar 2008-Journal of Process Control

TL;DR: This chapter presents an approach to system identification based on viewing identification as a problem in statistical learning theory, and a result is derived showing that in the case of systems with fading memory, it is possible to combine standard results in statisticallearning theory with some fading memory arguments to obtain finite time estimates of the desired kind.

...read moreread less

Journal Article•DOI•

Vapnik's learning theory applied to energy consumption forecasts in residential buildings

[...]

Florence Lai¹, Frédéric Magoulès¹, Fred Lherminier•Institutions (1)

École Centrale Paris¹

01 Oct 2008-International Journal of Computer Mathematics

TL;DR: An introduction to the use of support vector (SV) learning machines used as a data mining tool applied to buildings energy consumption data from a measurement campaign and introduces a perturbation in one of the influencing variables to detect a model change.

...read moreread less

Abstract: For the purpose of energy conservation, we present in this paper an introduction to the use of support vector (SV) learning machines used as a data mining tool applied to buildings energy consumption data from a measurement campaign. Experiments using a SVM-based software tool for the prediction of the electrical consumption of a residential building is performed. The data included 1 year and 3 months of daily recordings of electrical consumption and climate data such as temperatures and humidities. The learning stage was done for a first part of the data and the predictions were done for the last month. Performances of the model and contributions of significant factors were also derived. The results show good performances for the model. The second experiment consists of model re-estimations on a 1-year daily recording dataset lagged at 1-day time intervals in such a way that we derive temporal series of influencing factors weights along with model performance criteria. Finally, we introduce a perturbation in one of the influencing variables to detect a model change. Comparing contributing weights with and without the perturbation, the sudden contributing weight change could have diagnosed the perturbation. The important point is the ease of the production of many models. This method announces future research work in the exploitation of possibilities of this 'model factory'.

...read moreread less

Journal Article•DOI•

Eliciting Consumer Preferences Using Robust Adaptive Choice Questionnaires

[...]

Jacob Abernethy, Theodoros Evgeniou¹, Olivier Toubia², Jean-Philippe Vert³•Institutions (3)

INSEAD¹, Columbia University², PSL Research University³

01 Feb 2008-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes a framework for designing adaptive choice-based conjoint questionnaires that are robust to response error and formalizes within this framework the polyhedral methods recently proposed in marketing.

...read moreread less

Abstract: We propose a framework for designing adaptive choice-based conjoint questionnaires that are robust to response error. It is developed based on a combination of experimental design and statistical learning theory principles. We implement and test a specific case of this framework using Regularization Networks. We also formalize within this framework the polyhedral methods recently proposed in marketing. We use simulations as well as an online market research experiment with 500 participants to compare the proposed method to benchmark methods. Both experiments show that the proposed adaptive questionnaires outperform existing ones in most cases. This work also indicates the potential of using machine learning methods in marketing.

...read moreread less

Journal Article•DOI•

Improved Risk Tail Bounds for On-Line Algorithms

[...]

Nicolò Cesa-Bianchi, Claudio Gentile

01 Jan 2008-IEEE Transactions on Information Theory

TL;DR: In this paper, tight bounds on the risk of models in the ensemble generated by incremental training of an arbitrary learning algorithm are derived based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments, and improves on previous bounds published by the same authors.

...read moreread less

Abstract: Tight bounds are derived on the risk of models in the ensemble generated by incremental training of an arbitrary learning algorithm. The result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments, and improves on previous bounds published by the same authors.

...read moreread less

Proceedings Article•

Human Active Learning

[...]

Rui Castro¹, Charles W. Kalish², Robert Nowak², Ruichen Qian², Timothy T. Rogers², Xiaojin Zhu² - Show less +2 more•Institutions (2)

Columbia University¹, University of Wisconsin-Madison²

08 Dec 2008

TL;DR: This first quantitative study comparing human category learning in active versus passive settings indicates that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory.

...read moreread less

Abstract: We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings.

...read moreread less

Journal Article•DOI•

Prediction of friction capacity of driven piles in clay using the support vector machine

[...]

Pijush Samui¹•Institutions (1)

Indian Institute of Science¹

14 Mar 2008-Canadian Geotechnical Journal

TL;DR: The study shows that SVM has the potential to be a useful and practical tool for prediction of friction capacity of driven piles in clay and is proven to be better than ANN model.

...read moreread less

Abstract: The support vector machine (SVM) is an emerging machine learning technique where prediction error and model complexity are simultaneously minimized This paper examines the potential of SVM to predict the friction capacity of driven piles in clay This SVM is firmly based on the statistical learning theory and uses the regression technique by introducing accuracy (e) insensitive loss function The results are compared with those from a widely used artificial neural network (ANN) model Overall, the SVM showed good performance and is proven to be better than ANN model A sensitivity analysis has been also performed to investigate the importance of the input parameters The study shows that SVM has the potential to be a useful and practical tool for prediction of friction capacity of driven piles in clay

...read moreread less

Proceedings Article•DOI•

Robust learning of discriminative projection for multicategory classification on the Stiefel manifold

[...]

Duc-Son Pham¹, Svetha Venkatesh¹•Institutions (1)

Curtin University¹

23 Jun 2008

TL;DR: A framework formulated under statistical learning theory that facilitates robust learning of a discriminative projection is proposed and the experimental results suggest that the proposed method outperforms some recent regularized techniques when the number of training samples is small.

...read moreread less

Abstract: Learning a robust projection with a small number of training samples is still a challenging problem in face recognition, especially when the unseen faces have extreme variation in pose, illumination, and facial expression. To address this problem, we propose a framework formulated under statistical learning theory that facilitates robust learning of a discriminative projection. Dimensionality reduction using the projection matrix is combined with a linear classifier in the regularized framework of lasso regression. The projection matrix in conjunction with the classifier parameters are then found by solving an optimization problem over the Stiefel manifold. The experimental results on standard face databases suggest that the proposed method outperforms some recent regularized techniques when the number of training samples is small.

...read moreread less

Journal Article•DOI•

Discriminant Learning Analysis

[...]

Jing Peng¹, Peng Zhang, Norbert Riedel²•Institutions (2)

Montclair State University¹, Tulane University²

01 Dec 2008

TL;DR: This paper proposes to address the small sample size (SSS) problem in the framework of statistical learning theory by compute linear discriminants by regularized least squares regression, where the singularity problem is resolved.

...read moreread less

Abstract: Linear discriminant analysis (LDA) as a dimension reduction method is widely used in classification such as face recognition. However, it suffers from the small sample size (SSS) problem when data dimensionality is greater than the sample size, as in images where features are high dimensional and correlated. In this paper, we propose to address the SSS problem in the framework of statistical learning theory. We compute linear discriminants by regularized least squares regression, where the singularity problem is resolved. The resulting discriminants are complete in that they include both regular and irregular information. We show that our proposal and its nonlinear extension belong to the same framework where powerful classifiers such as support vector machines are formulated. In addition, our approach allows us to establish an error bound for LDA. Finally, our experiments validate our theoretical analysis results.

...read moreread less

Electrical Power Load Forecasting using Hybrid Self-Organizing Maps and Support Vector Machines

[...]

Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed

01 Jan 2008

TL;DR: A hybrid artificial intelligence scheme based on self-organizing maps (SOMs) and support vector machines (SVMs) based on statistical learning theory gives far better prediction accuracy for mid-term electricity load forecasting compared to previous research findings.

...read moreread less

Abstract: Forecasting of future electricity demand is very important for decision making in power system operation and planning. In recent years, due to the privatization and deregulation of the power industry, accurate forecasting of future electricity demand has become an important research area for secure operation, management of modern power systems and electricity production in the power generation sector. This paper presents a novel approach for mid-term electricity load forecasting. It uses a hybrid artificial intelligence scheme based on self-organizing maps (SOMs) and support vector machines (SVMs). According to the similarity degree of time series input samples, the SOM is used as a filtering scheme to cluster historical electricity load data into two subsets using the Kohonen rule in an unsupervised manner. As a novel learning machine, the SVM based on statistical learning theory is used for prediction, using support vector regression (SVR). Two epsilon-SVRs are employed to fit the training data of each SOM clustered subset individually in a supervised manner for load prediction. The proposed hybrid SOM-SVR model is evaluated in MATLAB on the electricity load dataset used in the European Network on Intelligent Technologies (EUNITE) competition, arranged by the Eastern Slovakian Electricity Corporation. This proposed model is robust with different data types and can deal well with non- stationarity of load series. Practical application results show that this hybrid technique gives far better prediction accuracy for mid-term electricity load forecasting compared to previous research findings.

...read moreread less

Journal Article•DOI•

Mim Capacitor Modeling by Support Vector Regression

[...]

Zuchong Yang¹, Tao Yang¹, Yingli Liu¹, Shijiao Han¹•Institutions (1)

University of Electronic Science and Technology of China¹

01 Jan 2008-Journal of Electromagnetic Waves and Applications

TL;DR: The support vector regression (SVR) method is introduced to model the MIM capacitor and can provides results approaching the accuracy of the EM simulated results without increasing the analysis time significantly, which proves the validity of the method.

...read moreread less

Abstract: The support vector regression (SVR) method is introduced to model the MIM capacitor in this paper. SVM is a type of learning machine based on the statistical learning theory, which implements the s...

...read moreread less

Journal Article•DOI•

Training Hard-Margin Support Vector Machines Using Greedy Stagewise Algorithm

[...]

Liefeng Bo¹, Ling Wang¹, Licheng Jiao¹•Institutions (1)

Xidian University¹

01 Aug 2008-IEEE Transactions on Neural Networks

TL;DR: This paper presents an alternative method, greedy stagewise algorithm for SVMs, named GS-SVMs, that can be faster than LIBSVM 2.83 without sacrificing the accuracy and employs statistical learning theory to analyze the empirical results, which shows that its success lies in that its early stopping rule can act as an implicit regularization term.

...read moreread less

Abstract: Hard-margin support vector machines (HM-SVMs) suffer from getting overfitting in the presence of noise. Soft-margin SVMs deal with this problem by introducing a regularization term and obtain a state-of-the-art performance. However, this disposal leads to a relatively high computational cost. In this paper, an alternative method, greedy stagewise algorithm for SVMs, named GS-SVMs, is presented to cope with the overfitting of HM-SVMs without employing the regularization term. The most attractive property of GS-SVMs is that its computational complexity in the worst case only scales quadratically with the size of training samples. Experiments on the large data sets with up to 400 000 training samples demonstrate that GS-SVMs can be faster than LIBSVM 2.83 without sacrificing the accuracy. Finally, we employ statistical learning theory to analyze the empirical results, which shows that the success of GS-SVMs lies in that its early stopping rule can act as an implicit regularization term.

...read moreread less

Proceedings Article•DOI•

Support vectors pre-extracting for support vector machine based on K nearest neighbour method

[...]

Li Zhang, Ning Ye, Weida Zhou, Licheng Jiao

20 Jun 2008

TL;DR: K-nearest neighbour method is used to extract a boundary vector set which may contain SVs, which reduces the training samples, speeds up the training of support vector machine.

...read moreread less

Abstract: Support vector machine, a universal method for learning from data, gains its development based on statistical learning theory. It shows many advantages in solving nonlinearly small sample and high dimensional problems of pattern recognition. Only a part of samples or support vectors (SVs) plays an important role in the final decision function. But SVs could not be obtained in advance until a quadratic programming is performed. In this paper, we use K-nearest neighbour method to extract a boundary vector set which may contain SVs. The number of the boundary set is smaller than the whole training set. Consequently it reduces the training samples, speeds up the training of support vector machine.

...read moreread less

New theoretical frameworks for machine learning

[...]

Avrim Blum¹, Maria-Florina Balcan¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 2008

TL;DR: This thesis introduces a new discriminative model (a PAC or Statistical Learning Theory style model) for semi-supervised learning, that can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community.

...read moreread less

Abstract: This thesis has two primary thrusts. The first is developing new models and algorithms for important modern and classic learning problems. The second is establishing new connections between Machine Learning and Algorithmic Game Theory. The formulation of the PAC learning model by Valiant [201] and the Statistical Learning Theory framework by Vapnik [203] have been instrumental in the development of machine learning and the design and analysis of algorithms for supervised learning. However, while extremely influential, these models do not capture or explain other important classic learning paradigms such as Clustering, nor do they capture important emerging learning paradigms such as Semi-Supervised Learning and other ways of incorporating unlabeled data in the learning process. In this thesis, we develop the first analog of these general discriminative models to the problems of Semi-Supervised Learning and Clustering, and we analyze both their algorithmic and sample complexity implications. We also provide the first generalization of the well-established theory of learning with kernel functions to case of general pairwise similarity functions and in addition provide new positive theoretical results for Active Learning Finally, this dissertation presents new applications of techniques from Machine Learning to Algorithmic Game Theory, which has been a major area of research at the intersection of Computer Science and Economics. In machine learning, there has been growing interest in using unlabeled data together with labeled data due to the availability of large amounts of unlabeled data in many contemporary applications. As a result, a number of different semi-supervised learning methods such as Co-training, transductive SVM, or graph based methods have been developed. However, the underlying assumptions of these methods are often quite distinct and not captured by standard theoretical models. This thesis introduces a new discriminative model (a PAC or Statistical Learning Theory style model) for semi-supervised learning, that can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community. This model provides a unified framework for analyzing when and why unlabeled data can help in the semi-supervised learning setting, in which one can analyze both sample-complexity and algorithmic issues. In particular, our model allows us to address in a unified way key issues such as "Under what conditions will unlabeled data help and by how much?" and "How much data should I expect to need in order to perform well?". Another important part of this thesis is Active Learning for which we provide several new theoretical results. In particular, this dissertation includes the first active learning algorithm which works in the presence of arbitrary forms of noise, as well as a few margin based active learning algorithms. In the context of Kernel methods (another flourishing area of machine learning research), this thesis shows how Random Projection techniques can be used to convert a given kernel function into an explicit, distribution dependent set of features, which can then be fed into more general (not necessarily kernelizable) learning algorithms In addition, this work shows how such methods can be extended to more general pairwise similarity functions and also gives a formal theory that matches the standard intuition that a good kernel function is one that acts as a good measure of similarity. We thus strictly generalize and simplify the existing theory of kernel methods. Our approach brings a new perspective as well as a much simpler explanation for the effectiveness of kernel methods, which can help in the design of good kernel functions for new learning problems. We also show how we can use this perspective to help thinking about Clustering in a novel way. While the study of clustering is centered around an intuitively compelling goal (and it has been a major tool in manydifferent fields), reasoning about it in a generic and unified way has been difficult, in part due to the lack of a general theoretical framework along the lines we have for supervised classification. In our work we develop the first general discriminative clustering framework for analyzing accuracy without probabilistic assumptions. This dissertation also contributes with new connections between Machine Learning and Mechanism Design. Specifically, this thesis presents the first general framework in which machine learning methods can be used for reducing mechanism design problems to standard algorithmic questions for a wide range of revenue maximization problems in an unlimited supply setting. Our results substantially generalize the previous work based on random sampling mechanisms–both by broadening the applicability of such mechanisms and by simplifying the analysis. From a learning perspective, these settings present several unique challenges: the loss function is discontinuous and asymmetric, and the range of bidders' valuations may be large.

...read moreread less

Book Chapter•DOI•

Statistical Learning Theory: A Pack-based Strategy for Uncertain Feasibility and Optimization Problems

[...]

Teodoro Alamo¹, Roberto Tempo, F Eduardo¹•Institutions (1)

University of Seville¹

01 Jan 2008-Lecture Notes in Control and Information Sciences

TL;DR: It is shown that the required sample size is inversely proportional to the accuracy for fixed confidence, which is a significant improvement when compared to the existing bounds which depend on 1/∈2 ln 1/ ∈2.

...read moreread less

Abstract: In this paper, a new powerful technique, denoted as pack-based strategy is introduced in the context of statistical learning theory. This strategy allows us to derive bounds on the number of required samples that are manageable for “reasonable” values of probabilistic confidence and accuracy. Using this technique for feasibility and optimization problems involving Boolean expressions consisting of polynomials, we prove that the number of required samples grows with the accuracy parameter ∈ as 1/∈ ln 1/∈. This is a significant improvement when compared to the existing bounds which depend on 1/∈2 ln 1/∈2. We also apply this strategy to convex optimization problems. In this case, we show that the required sample size is inversely proportional to the accuracy for fixed confidence.

...read moreread less

Proceedings Article•DOI•

A comparative study of RBF neural network and SVM classification techniques performed on real data for drinking water quality

[...]

M. Bouamar, M. Ladjal

20 Jul 2008

TL;DR: A comparative study of two techniques resulting from the field of the artificial intelligence namely: RBF neural network (RBF-NN) and support vector machine (SVM) is presented, developed from the statistical learning theory.

...read moreread less

Abstract: The control and monitoring of drinking water is becoming more and more interesting because of its effects on human life. Many techniques were developed in this field in order to ameliorate this process control attending to rigorous follow-ups of the quality of this vital resource. Several methods were implemented to achieve this goal. In this paper, a comparative study of two techniques resulting from the field of the artificial intelligence namely: RBF neural network (RBF-NN) and support vector machine (SVM), is presented. Developed from the statistical learning theory, these methods display optimal training performances and generalization in many fields of application, among others the field of pattern recognition. Applied as classification tools, these techniques should ensure within a multi-sensor monitoring system, a direct and quasi permanent control of water quality. In order to evaluate their performances, a simulation using real data, corresponding to the recognition rate, the training time, and the robustness, is carried out. To validate their functionalities, an application is presented.

...read moreread less

Journal Article•DOI•

The theoretical foundations of statistical learning theory based on fuzzy number samples

[...]

Ming-Hu Ha¹, Jing Tian¹•Institutions (1)

Hebei University¹

01 Aug 2008-Information Sciences

TL;DR: The concepts of fuzzy expected risk functional, fuzzy empiricalrisk functional and fuzzy empirical risk minimization principle are redefined and the key theorem of learning theory based on fuzzy number samples is proved.

...read moreread less

Proceedings Article•DOI•

Application of Particle Swarm Optimization-Based Support Vector Machine in Fault Diagnosis of Turbo-Generator

[...]

Sheng-wei Fei¹, Chengliang Liu¹, Qingbing Zeng¹, Yubin Miao¹•Institutions (1)

Shanghai Jiao Tong University¹

20 Dec 2008

TL;DR: The proposed PSO-SVM model is applied to fault diagnosis of turbo-generator, among which PSO is used to determine free parameters of support vector machine, and is validated by the results of fault diagnosis examples.

...read moreread less

Abstract: Support vector machine (SVM) is a new machine learning method based on statistical learning theory, which is a powerful tool for solving the problem with small sample, nonlinear and high dimension. However, the practicability of SVM is affected due to the difficulty of selecting appropriate SVM parameters. Particle swarm optimization (PSO) is a new optimization method, which is motivated by social behavior of bird flocking or fish schooling. The optimization method not only has strong global search capability, but also is very easy to implement. Thus, in the study, the proposed PSO-SVM model is applied to fault diagnosis of turbo-generator, among which PSO is used to determine free parameters of support vector machine. Finally, the effectiveness and correctness of this method are validated by the results of fault diagnosis examples. Consequently, PSO-SVM is a proper method in fault diagnosis of turbo-generator.

...read moreread less

Proceedings Article•DOI•

Application of RS-SVM in Construction Project Cost Forecasting

[...]

Feng Kong, Xiao-juan Wu, Li-ya Cai

18 Nov 2008

TL;DR: The research results show that the prediction accuracy of RS-SVM is better than that of standard SVM, and the theory of the Rough Set for good performance in attribute reduction is introduced.

...read moreread less

Abstract: Evaluation of construction projects is an important task for management of construction projects.An accurate forecast is required to enable supporting the investment decision and to ensure the project's feasible at the minimal cost. So controlling and rationally determining the project cost plays the most important roles in the budget management of the construction project. Ways and means have been explored to satisfy the requirements for prediction of construction projects. Recently a novel regression technique, called Support Vector Machines (SVM), based on the statistical learning theory is exploded in this paper for the prediction of construction project cost. Nevertheless, The standard SVM still has some difficults in attribute reduction and precision of prediction. This paper introduced the theory of the Rough Set (RS) for good performance in attribute reduction, considered and extracted substances components of construction project as parameters, and seted up the Model of the Construction Project Cost Forecasting based on the RS-SVM. The research results show that the prediction accuracy of RS-SVM is better than that of standard SVM.

...read moreread less

Journal Article•

Combination of Support Vector Machine and Evidence Theory in Information Fusion

[...]

LI Shao-hong¹•Institutions (1)

Beihang University¹

01 Jan 2008-Chinese Journal of Sensors and Actuators

TL;DR: Simulation results show that multi-sensor information fusion can be realized and the error rate can be greatly lowered through the algorithm proposed in this paper.

...read moreread less

Abstract: DS evidence theory is an important method in the field of multi-sensor information fusion, but its advantage is not fully utilized because its BPA is difficult to obtain. SVM is a new learning algorithm based on the statistical learning theory. However, its hard decision output does not adequately facilitate multi-sensor information fusion. In this paper, in order to apply SVM to information fusion, a two-class SVM with BPA output is proposed. By analyzing the essence and deficiency of the Platt’s model, the BPA is obtained through use of the lower bound of the SVM precision to weight the Platt’s probability model, which achieves the combination of SVM and the evidence theory in the information fusion. The simulation results show that multi-sensor information fusion can be realized and the error rate can be greatly lowered through the algorithm proposed in this paper.

...read moreread less

Journal Article•

The Key Theorem and the Bounds on the Rate of Uniform Convergence of Statistical Learning Theory on Quasi-Probability Spaces

[...]

Ha Ming

01 Jan 2008-Chinese Journal of Computers

TL;DR: In this paper, the key theorem of learning theory on quasi-probability spaces is proved, and the bounds on the rate of uniform convergence of learning process on a quasi-random variable and its distribution function, expected value and variance are presented.

...read moreread less

Abstract: Some properties of quasi-probability are further discussed.The definitions and properties of quasi-random variable and its distribution function,expected value and variance are then presented.Markov inequality,Chebyshev's inequality and the Khinchine's law of large numbers on quasi-probability spaces are also proved.Then the key theorem of learning theory on quasi-probability spaces is proved,and the bounds on the rate of uniform convergence of learning process on quasi-probability spaces are constructed.The investigations will help lay essential theoretical foundations for the systematic and comprehensive development of the quasi-statistical learning theory.

...read moreread less