
Showing papers on "Statistical learning theory published in 2009"


Book
Tie-Yan Liu1
27 Jun 2009
TL;DR: Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches; the relationship between the loss functions used in these approaches and widely used IR evaluation measures is analyzed; and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances in statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.

2,515 citations
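
To make the pairwise approach above concrete, here is a minimal, self-contained sketch (synthetic data and hypothetical parameters, not an algorithm from the tutorial) that learns a linear scoring function by gradient descent on a pairwise hinge loss:

```python
import numpy as np

def pairwise_hinge_rank(X, y, lr=0.1, epochs=200, margin=1.0, seed=0):
    """Learn a linear scorer s(x) = w.x by penalizing every document pair
    (i, j) with y[i] > y[j] whose scores violate the margin:
    loss = max(0, margin - (s(x_i) - s(x_j)))."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = X[i] - X[j]
            if margin - w @ diff > 0:   # pair mis-ordered or inside the margin
                w += lr * diff          # descend on the hinge loss
    return w

# Toy "query": four documents, two features, graded relevance labels.
X = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.2]])
y = np.array([2, 1, 1, 0])              # higher label = more relevant
w = pairwise_hinge_rank(X, y)
print(np.argsort(-X @ w))               # documents ordered by learned score
```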


Journal ArticleDOI
Tie-Yan Liu1
TL;DR: A statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities.
Abstract: Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then empirical evaluations of typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which suggest that the listwise approach is the most effective of the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.

591 citations


Journal ArticleDOI
TL;DR: A Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets, which makes use of Python’s ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages.
Abstract: Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python’s ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.

480 citations
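
PyMVPA's own API is not reproduced here. As a rough, library-agnostic illustration of the classifier-based analysis the paper describes, this sketch uses scikit-learn on synthetic stand-in data (all sizes and names hypothetical): rows are volumes, columns are voxels, and cross-validated decoding accuracy is the quantity of interest.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for an fMRI dataset: 120 volumes x 500 voxels, with a
# weak state-dependent signal planted in the first 20 voxels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 120)                 # two cognitive states
bold = rng.normal(size=(120, 500))
bold[labels == 1, :20] += 0.5

# Cross-validated decoding accuracy with a linear classifier.
scores = cross_val_score(LinearSVC(), bold, labels,
                         cv=StratifiedKFold(n_splits=5))
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```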


Journal ArticleDOI
TL;DR: Support vector machines are a family of machine learning methods originally introduced for the problem of classification and later generalized to various other situations, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision.
Abstract: Support vector machines (SVMs) are a family of machine learning methods, originally introduced for the problem of classification and later generalized to various other situations. They are based on principles of statistical learning theory and convex optimization, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision. Copyright © 2009 John Wiley & Sons, Inc. For further resources related to this article, please visit the WIREs website.

323 citations
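
As a minimal, hedged sketch of the methodology (scikit-learn, synthetic data, default-ish parameters), the following fits a kernel SVM and reports the support vectors that define the maximum-margin decision surface:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("support vectors per class:", clf.n_support_)  # only these define the margin
```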


Book
03 Aug 2009
TL;DR: Knowledge Discovery with Support Vector Machines provides an in-depth, easy-to-follow introduction to support vector machines, drawing only from minimal, carefully motivated technical and mathematical background material.
Abstract: An easy-to-follow introduction to support vector machines. This book provides an in-depth, easy-to-follow introduction to support vector machines, drawing only from minimal, carefully motivated technical and mathematical background material. It begins with a cohesive discussion of machine learning and goes on to cover: knowledge discovery environments; describing data mathematically; linear decision surfaces and functions; perceptron learning; maximum margin classifiers; support vector machines; elements of statistical learning theory; multi-class classification; regression with support vector machines; and novelty detection. Complemented with hands-on exercises, algorithm descriptions, and data sets, Knowledge Discovery with Support Vector Machines is an invaluable textbook for advanced undergraduate and graduate courses. It is also an excellent tutorial on support vector machines for professionals who are pursuing research in machine learning and related areas.

274 citations


Book
01 Aug 2009
TL;DR: The theory achieved here underpins accurate estimation techniques in the presence of singularities and lays the foundations for the use of algebraic geometry in statistical learning theory.
Abstract: Sure to be influential, Watanabe's book lays the foundations for the use of algebraic geometry in statistical learning theory. Many models/machines are singular: mixture models, neural networks, HMMs, Bayesian networks, stochastic context-free grammars are major examples. The theory achieved here underpins accurate estimation techniques in the presence of singularities.

255 citations


Journal ArticleDOI
TL;DR: It is proved that there exists a particular "elastic-net representation" of the regression function such that, as the number of data points increases, the elastic-net estimator is consistent not only for prediction but also for variable/feature selection.

227 citations


01 Jan 2009
TL;DR: The elastic-net regularization scheme proposed by Zou and Hastie is analyzed in a random-design regression setting, where the response variable is vector-valued and the prediction functions are linear combinations of elements (features) in an infinite-dimensional dictionary.
Abstract: Within the framework of statistical learning theory, we analyze in detail the so-called elastic-net regularization scheme proposed by Zou and Hastie [H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B, 67(2) (2005) 301-320] for the selection of groups of correlated variables. To investigate the statistical properties of this scheme, and in particular its consistency properties, we set up a suitable mathematical framework. Our setting is random-design regression, where we allow the response variable to be vector-valued and consider prediction functions which are linear combinations of elements (features) in an infinite-dimensional dictionary. Under the assumption that the regression function admits a sparse representation on the dictionary, we prove that there exists a particular "elastic-net representation" of the regression function such that, as the number of data points increases, the elastic-net estimator is consistent not only for prediction but also for variable/feature selection. Our results include finite-sample bounds and an adaptive scheme for selecting the regularization parameter. Moreover, using convex analysis tools, we derive an iterative thresholding algorithm for computing the elastic-net solution which differs from the optimization procedure originally proposed in the above-cited work.

208 citations
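
The paper's own iterative thresholding algorithm is not reproduced here, but the following proximal-gradient (iterative soft-thresholding) sketch, with hypothetical data and regularization parameters, shows the general shape of such an algorithm for the elastic-net objective:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def elastic_net_ista(X, y, lam1=0.1, lam2=0.1, iters=500):
    """Minimize 0.5*||Xw - y||^2 + lam1*||w||_1 + 0.5*lam2*||w||^2
    by proximal gradient descent (iterative soft-thresholding)."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam2)   # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) + lam2 * w           # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam1)
    return w

# Correlated design: the l2 term lets groups of correlated features enter together.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(100, 5)), rng.normal(size=(100, 15))])
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=100)
print(np.round(elastic_net_ista(X, y), 2))
```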


Journal ArticleDOI
TL;DR: The experimental results indicate that the SVMG method can achieve higher diagnostic accuracy than the IEC three-ratio method, a normal SVM classifier, and an artificial neural network.
Abstract: Diagnosis of potential faults concealed inside power transformers is key to ensuring a stable electrical power supply to consumers. The support vector machine (SVM) is a machine learning method based on statistical learning theory and a powerful tool for problems with small samples, nonlinearity, and high dimensionality. The selection of SVM parameters has an important influence on the classification accuracy of SVM; however, it is very difficult to select appropriate SVM parameters. In this study, a support vector machine with a genetic algorithm (SVMG) is applied to the fault diagnosis of a power transformer, in which the genetic algorithm (GA) is used to select appropriate free parameters for the SVM. Experimental data from several electric power companies in China are used to illustrate the performance of the proposed SVMG model. The experimental results indicate that the SVMG method can achieve higher diagnostic accuracy than the IEC three-ratio method, a normal SVM classifier, and an artificial neural network.

192 citations
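
The paper's specific GA encoding and operators are not reproduced here; the following sketch shows the general recipe with hypothetical population sizes, operators, and parameter ranges: a small genetic algorithm searching over log-scaled (C, gamma), with cross-validated accuracy as the fitness.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

def fitness(genome):
    """Cross-validated accuracy of an SVM with parameters decoded from the genome."""
    C, gamma = 10.0 ** genome            # genome holds log10(C) and log10(gamma)
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

rng = np.random.default_rng(0)
pop = rng.uniform([-1, -4], [3, 0], size=(12, 2))    # initial population
for gen in range(15):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-6:]]           # truncation selection
    kids = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(6, size=2)]
        child = np.where(rng.random(2) < 0.5, a, b)  # uniform crossover
        child += rng.normal(scale=0.2, size=2)       # Gaussian mutation
        kids.append(child)
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(g) for g in pop])]
print("best log10(C), log10(gamma):", best)
```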


Journal ArticleDOI
TL;DR: A randomized algorithm is proposed that provides a probabilistic solution circumventing the potential conservatism of the bounds previously derived, and it is proved that the required sample size is inversely proportional to the accuracy for fixed confidence.
Abstract: In this paper, we study two general semi-infinite programming problems by means of a randomized strategy based on statistical learning theory. The sample size results obtained with this approach are generally considered to be very conservative by the control community. The first main contribution of this paper is to demonstrate that this is not necessarily the case. Utilizing as a starting point one-sided results from statistical learning theory, we obtain bounds on the number of required samples that are manageable for "reasonable" values of probabilistic confidence and accuracy. In particular, we show that the number of required samples grows with the accuracy parameter ε as (1/ε)ln(1/ε), and this is a significant improvement when compared to the existing bounds, which depend on (1/ε²)ln(1/ε²). Secondly, we present new results for optimization and feasibility problems involving Boolean expressions consisting of polynomials. In this case, when the accuracy parameter is sufficiently small, an explicit bound that only depends on the number of decision variables, and on the confidence and accuracy parameters is presented. For convex optimization problems, we also prove that the required sample size is inversely proportional to the accuracy for fixed confidence. Thirdly, we propose a randomized algorithm that provides a probabilistic solution circumventing the potential conservatism of the bounds previously derived.

191 citations
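
To see why the improved rate matters, the following back-of-the-envelope computation (constants and confidence terms ignored) compares the growth of the two bounds as the accuracy parameter shrinks:

```python
import numpy as np

# Compare the growth of the improved bound (1/eps)*ln(1/eps) with the
# classical (1/eps^2)*ln(1/eps^2) as the accuracy parameter eps shrinks.
for eps in [0.1, 0.01, 0.001]:
    new = (1 / eps) * np.log(1 / eps)
    old = (1 / eps ** 2) * np.log(1 / eps ** 2)
    print(f"eps={eps:g}:  (1/eps)ln(1/eps)={new:,.0f}   (1/eps^2)ln(1/eps^2)={old:,.0f}")
```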


Book
09 Jun 2009
TL;DR: A textbook on machine learning for geospatial data, covering exploratory spatial data analysis, geostatistics, artificial neural networks, and support vector machines and kernel methods, with accompanying software.
Abstract: Contents:
Preface
Learning from Geospatial Data: problems and important concepts of machine learning; machine learning algorithms for geospatial data; contents of the book; software description; short review of the literature
Exploratory Spatial Data Analysis. Presentation of Data and Case Studies: exploratory spatial data analysis; data pre-processing; spatial correlations: variography; presentation of data; k-nearest neighbours algorithm: a benchmark model for regression and classification; conclusions to chapter
Geostatistics: spatial predictions; geostatistical conditional simulations; spatial classification; software; conclusions
Artificial Neural Networks: introduction; radial basis function neural networks; general regression neural networks; probabilistic neural networks; self-organising maps; Gaussian mixture models and mixture density networks; conclusions
Support Vector Machines and Kernel Methods: introduction to statistical learning theory; support vector classification; spatial data classification with SVM; support vector regression; advanced topics in kernel methods
References. Index

Journal ArticleDOI
TL;DR: A statistical analysis shows that the generalization error afforded agents by the collaborative training algorithm can be bounded in terms of the relationship between the network topology and the representational capacity of the relevant reproducing kernel Hilbert space.
Abstract: In this paper, an algorithm is developed for collaboratively training networks of kernel-linear least-squares regression estimators. The algorithm is shown to distributively solve a relaxation of the classical centralized least-squares regression problem. A statistical analysis shows that the generalization error afforded agents by the collaborative training algorithm can be bounded in terms of the relationship between the network topology and the representational capacity of the relevant reproducing kernel Hilbert space. Numerical experiments suggest that the algorithm is effective at reducing noise. The algorithm is relevant to the problem of distributed learning in wireless sensor networks by virtue of its exploitation of local communication. Several new questions for statistical learning theory are proposed.

Proceedings Article
02 May 2009
TL;DR: In this paper, the authors present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand: the first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the third is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier.
Abstract: An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical theory offers rank-based statistics for this task. However, these statistics depend on a fixed set of characteristics of the underlying distribution. Thus, they work well whenever the change in the underlying distribution affects the properties measured by the statistic, but they perform poorly if the drift influences the characteristics captured by the test statistic only to a small degree. To address this problem, we show how uniform convergence bounds in learning theory can be adjusted for adaptive concept drift detection. In particular, we present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand. The first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the third is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier. We compare these new approaches with the maximum mean discrepancy method, the StreamKrimp system, and the multivariate Wald–Wolfowitz test. The results indicate that the new methods are able to detect concept drift reliably and that they perform favorably in a precision-recall analysis. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 311-327, 2009
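
The margin-based test can be instantiated in several ways; one common reading, sketched below with synthetic data, is a classifier two-sample test: train a sparse (1-norm-style) linear classifier to separate the reference window from the current one, and flag drift when held-out separation exceeds chance. The paper's exact statistic and calibration differ.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def drift_score(ref, cur, seed=0):
    """Train an l1-regularized linear classifier to separate the reference
    window from the current one; held-out accuracy near 0.5 means the
    windows look alike, accuracy near 1.0 signals drift."""
    X = np.vstack([ref, cur])
    y = np.r_[np.zeros(len(ref)), np.ones(len(cur))]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = LinearSVC(penalty="l1", dual=False).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 10))
same = rng.normal(size=(200, 10))                   # no drift
drifted = rng.normal(size=(200, 10)) + [0.8] * 10   # mean shift
print("no drift:", drift_score(ref, same))          # expect ~0.5
print("drift:   ", drift_score(ref, drifted))       # expect close to 1.0
```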

Journal ArticleDOI
TL;DR: A robust new scheme is presented in this paper for optimally selecting values of the parameters especially that of the scale parameter of the Gaussian kernel function involved in the training of the SVDD model.

01 Jan 2009
TL;DR: Bounds on the rates of convergence achievable by active learning are derived, under various noise models and under general conditions on the hypothesis class.
Abstract: I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning, under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study generalizations of active learning to more general forms of interactive statistical learning.

Journal ArticleDOI
TL;DR: An overview of statistical learning theory methods is provided and their potential for greater use in risk analysis is discussed.

Journal ArticleDOI
TL;DR: The suitability of smooth support vector machines (SSVM) is investigated, along with how important factors such as the selection of appropriate accounting ratios (predictors), the length of the training period, and the structure of the training sample influence the precision of prediction.
Abstract: Variable Selection and Oversampling in the Use of Smooth Support Vector Machines for Predicting the Default Risk of Companies

Book ChapterDOI
09 Apr 2009
TL;DR: Support vector machines (SVM) are among the most robust and accurate of all well-known data mining algorithms; they have a sound theoretical foundation rooted in statistical learning theory, require as few as a dozen examples for training, and are insensitive to the number of dimensions.
Abstract: Support vector machines (SVMs), including the support vector classifier (SVC) and support vector regressor (SVR), are among the most robust and accurate of all well-known data mining algorithms. SVMs, which were originally developed by Vapnik in the 1990s [1-11], have a sound theoretical foundation rooted in statistical learning theory, require only as few as a dozen examples for training, and are often insensitive to the number of dimensions. In the past decade, SVMs have developed at a fast pace in both theory and practice.

Journal ArticleDOI
TL;DR: This paper compares the FA/regularization and VC/risk minimization methodologies in terms of their underlying theoretical assumptions, and empirically illustrates the differences between the two when data is sparse and/or the input distribution is non-uniform.

Proceedings ArticleDOI
02 Oct 2009
TL;DR: The experimental results of a negative feedback amplifier circuit indicate that the GA-SVM method can achieve higher diagnostic accuracy than a normal SVM classifier and an artificial neural network.
Abstract: A soft fault diagnosis method for analog circuits based on the support vector machine (SVM) is developed in this paper. The SVM is a machine learning method based on statistical learning theory and a powerful tool for problems with small samples, nonlinearity, and high dimensionality. Multi-classification SVM methods, including one-versus-rest, one-versus-one, and the decision directed acyclic graph (DDAG), have been applied to many areas, and some researchers have used them in the fault diagnosis of analog circuits. The selection of SVM parameters has an important influence on the classification accuracy of SVM; however, it is very difficult to select appropriate SVM parameters. In this study, a support vector machine with a genetic algorithm (GA-SVM) is applied to fault diagnosis, in which the genetic algorithm (GA) is used to select appropriate parameters of the SVM. The experimental results of a negative feedback amplifier circuit indicate that the GA-SVM method can achieve higher diagnostic accuracy than a normal SVM classifier and an artificial neural network.

Journal Article
TL;DR: A combination of support vector machines (SVM) and wavelet-based subband image decomposition is investigated; decision making is performed in two stages: feature extraction by computing the wavelet coefficients, and classification using a classifier trained on the extracted features.
Abstract: In this paper, we investigate an approach for the classification of mammographic masses as benign or malignant. This study relies on a combination of support vector machines (SVM) and wavelet-based subband image decomposition. Decision making was performed in two stages: feature extraction by computing the wavelet coefficients, and classification using the classifier trained on the extracted features. The SVM, a learning machine based on statistical learning theory, was trained through supervised learning to classify masses. The research involved 66 digitized mammographic images. The masses were segmented manually by radiologists prior to introduction to the classification system. Preliminary tests on the mammograms showed over 84.8% classification accuracy using the SVM with a Radial Basis Function (RBF) kernel. Confusion matrices and accuracy, sensitivity, and specificity analyses with different kernel types were also used to show the classification performance of the SVM.
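
As a hedged sketch of the two-stage pipeline (wavelet feature extraction followed by SVM classification), the following code uses the PyWavelets package and scikit-learn on synthetic stand-in images; the paper's wavelet family, feature set, and data are not reproduced.

```python
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def wavelet_energy_features(image, wavelet="db4", level=2):
    """Decompose an image into wavelet subbands and return the mean energy
    of each subband as a feature vector."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    feats = [np.mean(np.square(coeffs[0]))]           # approximation energy
    for detail in coeffs[1:]:                          # (cH, cV, cD) per level
        feats.extend(np.mean(np.square(band)) for band in detail)
    return np.array(feats)

# Synthetic stand-in for 66 segmented mass ROIs: label 1 adds extra structure.
rng = np.random.default_rng(0)
images, labels = [], []
for k in range(66):
    img = rng.normal(size=(64, 64))
    if k % 2:
        img += 0.5 * rng.normal(size=(64, 64)).cumsum(axis=0)
    images.append(img)
    labels.append(k % 2)
X = np.array([wavelet_energy_features(im) for im in images])

clf = SVC(kernel="rbf", gamma="scale")
print("CV accuracy:", cross_val_score(clf, X, np.array(labels), cv=5).mean())
```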

Journal ArticleDOI
TL;DR: It is discussed how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.
Abstract: We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.

Book ChapterDOI
TL;DR: In this chapter, one of the most popular and intuitive prototype-based classification algorithms, learning vector quantization (LVQ), is revisited, and recent extensions towards automatic metric adaptation are introduced.
Abstract: In this chapter, one of the most popular and intuitive prototype-based classification algorithms, learning vector quantization (LVQ), is revisited, and recent extensions towards automatic metric adaptation are introduced. Metric adaptation schemes extend LVQ in two respects. On the one hand, greater flexibility is achieved, since the metric, which is essential for the classification, is adapted to the given classification task at hand. On the other hand, better interpretability of the results is gained, since the metric parameters reveal the relevance of single dimensions, as well as correlations, which are important for the classification. The flexibility of the metric can thereby be scaled from a simple diagonal term to full matrices attached locally to the single prototypes. These choices result in a more complex form of the classification boundaries of the models, whereby the excellent inherent generalization ability of the classifier is maintained, as can be shown by means of statistical learning theory.
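
A minimal sketch of metric adaptation in LVQ, assuming a simple relevance-LVQ style update of a diagonal metric; the chapter's gradient-based schemes (e.g. GRLVQ/GMLVQ) are more principled, and everything below is an illustrative simplification:

```python
import numpy as np

def train_rlvq(X, y, lr_w=0.05, lr_l=0.01, epochs=30, seed=0):
    """LVQ1 with an adaptive diagonal metric: d(x, w) = sum_i lam_i (x_i - w_i)^2.
    Relevance heuristic: shrink the weight of dimensions that deviate on
    correct wins, grow it on incorrect wins."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    W = np.array([X[y == c][rng.integers((y == c).sum())] for c in classes], float)
    pl = classes.copy()                               # prototype labels
    lam = np.full(X.shape[1], 1.0 / X.shape[1])       # relevance weights
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = (lam * (X[i] - W) ** 2).sum(axis=1)   # weighted distances
            k = np.argmin(d)                          # winning prototype
            sign = 1.0 if pl[k] == y[i] else -1.0
            W[k] += sign * lr_w * (X[i] - W[k])       # LVQ1 prototype update
            lam -= sign * lr_l * np.abs(X[i] - W[k])  # relevance update
            lam = np.clip(lam, 1e-9, None)
            lam /= lam.sum()
    return W, pl, lam

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)            # only feature 0 carries class information
W, pl, lam = train_rlvq(X, y)
print("learned relevances:", np.round(lam, 2))  # should concentrate on feature 0
```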

Journal ArticleDOI
TL;DR: The paper solves stochastic optimization problems in reproducing kernel Hilbert spaces by sample average approximation combined with Tikhonov regularization, and establishes sufficient conditions for uniform convergence of approximate solutions with probability one, jointly with a rule for downward adjustment of the regularization factor with increasing sample size.
Abstract: The paper studies stochastic optimization problems in Reproducing Kernel Hilbert Spaces (RKHS). The objective function of such problems is a mathematical expectation functional depending on decision rules (or strategies), i.e. on functions of observed random parameters. Feasible rules are restricted to belong to an RKHS. This kind of problem arises in on-line decision making and in statistical learning theory. We solve the problem by sample average approximation combined with Tikhonov regularization and establish sufficient conditions for uniform convergence of approximate solutions with probability one, jointly with a rule for downward adjustment of the regularization factor with increasing sample size.
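
The setting can be mimicked, very loosely, with kernel ridge regression: the empirical objective is a sample average plus a Tikhonov term in the RKHS norm, and the regularization factor is adjusted downward as the sample grows. The n^(-1/2) schedule below is purely illustrative, not the paper's rule.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

def draw(n):
    """Sample n noisy observations of the unknown decision rule sin(x)."""
    x = rng.uniform(-3, 3, size=(n, 1))
    return x, np.sin(x).ravel() + 0.1 * rng.normal(size=n)

x_test = np.linspace(-3, 3, 200)[:, None]
f_true = np.sin(x_test).ravel()
for n in [20, 80, 320]:
    X, y = draw(n)
    alpha = 1.0 / np.sqrt(n)          # regularization adjusted downward with n
    model = KernelRidge(kernel="rbf", gamma=0.5, alpha=alpha).fit(X, y)
    mse = np.mean((model.predict(x_test) - f_true) ** 2)
    print(f"n={n:4d}  alpha={alpha:.3f}  test MSE={mse:.4f}")
```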

Proceedings ArticleDOI
04 Nov 2009
TL;DR: Support Vector Machines, a method derived from recent achievements in statistical learning theory, is introduced for the classification of geological units from Landsat multispectral images; initial experiments suggest the usefulness of the proposed classification approach.
Abstract: Quantitative techniques for spatial prediction and classification in geological surveys are developing rapidly, and recent applications of machine learning techniques confirm the possibilities for their use in this field of research. The paper introduces Support Vector Machines, a method derived from recent achievements in statistical learning theory, for the classification of geological units from Landsat multispectral images. The initial experiments suggest the usefulness of the proposed classification approach.

Proceedings ArticleDOI
21 Nov 2009
TL;DR: A support vector machine trained by a genetic algorithm (GA-SVM) is adopted to forecast electricity prices, in which the GA is used to select the parameters of the SVM; the model shows better prediction accuracy than a radial basis function neural network (RBFNN).
Abstract: Accurate electricity price forecasting can provide crucial information for electricity market participants to make reasonable competing strategies. The support vector machine (SVM) is a novel algorithm based on statistical learning theory which has greater generalization ability and, by building on the structural risk minimization principle, is superior to the empirical risk minimization principle adopted by traditional neural networks. However, its generalization performance depends on a good setting of the training parameters (such as C) for the nonlinear SVM. In this study, a support vector machine trained by a genetic algorithm (GA-SVM) is adopted to forecast electricity prices, in which the GA is used to select the parameters of the SVM. National electricity price data in China from 1996 to 2007 are used to study the forecasting performance of the GA-SVM model. The experimental results show that the GA-SVM algorithm has better prediction accuracy than a radial basis function neural network (RBFNN).

Dissertation
05 May 2009
TL;DR: This thesis develops a framework under which one can analyze the potential benefits, as measured by the sample complexity of semi-supervised learning, and concludes that unless the learner is absolutely certain there is some non-trivial relationship between labels and the unlabeled distribution, semi- supervised learning cannot provide significant advantages over supervised learning.
Abstract: The emergence of a new paradigm in machine learning known as semi-supervised learning (SSL) has seen benefits to many applications where labeled data is expensive to obtain. However, unlike supervised learning (SL), which enjoys a rich and deep theoretical foundation, semi-supervised learning, which uses additional unlabeled data for training, still remains a theoretical mystery lacking a sound fundamental understanding. The purpose of this research thesis is to take a first step towards bridging this theory-practice gap. We focus on investigating the inherent limitations of the benefits semi-supervised learning can provide over supervised learning. We develop a framework under which one can analyze the potential benefits, as measured by the sample complexity of semi-supervised learning. Our framework is utopian in the sense that a semi-supervised algorithm trains on a labeled sample and an unlabeled distribution, as opposed to an unlabeled sample in the usual semi-supervised model. Thus, any lower bound on the sample complexity of semi-supervised learning in this model implies lower bounds in the usual model. Roughly, our conclusion is that unless the learner is absolutely certain there is some non-trivial relationship between labels and the unlabeled distribution (“SSL type assumption”), semi-supervised learning cannot provide significant advantages over supervised learning. Technically speaking, we show that the sample complexity of SSL is no more than a constant factor better than SL for any unlabeled distribution, under a no-prior-knowledge setting (i.e. without SSL type assumptions). We prove that for the class of thresholds in the realizable setting the sample complexity of SL is at most twice that of SSL. Also, we prove that in the agnostic setting for the classes of thresholds and union of intervals the sample complexity of SL is at most a constant factor larger than that of SSL. We conjecture this to be a general phenomenon applying to any hypothesis class. We also discuss issues regarding SSL type assumptions, and in particular the popular cluster assumption. We give examples that show even in the most accommodating circumstances, learning under the cluster assumption can be hazardous and lead to prediction performance much worse than simply ignoring unlabeled data and doing supervised learning. This thesis concludes with a look into future research directions that builds on our investigation.

Book ChapterDOI
10 Apr 2009
TL;DR: It is proved that a parsimonious fitness ensures universal consistency, and a more complicated modification of the fitness is proposed in order to avoid unnecessary bloat while nevertheless preserving universal consistency.
Abstract: This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency. This means that empirical error minimization allows convergence to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of putting a hard limit on program size still results in programs whose size grows without bound as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the error rate in generalization also leads to bloat. A more complicated modification of the fitness is therefore proposed in order to avoid unnecessary bloat while nevertheless preserving universal consistency.
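
The flavor of a parsimonious fitness can be sketched in a few lines: empirical error plus a size penalty. The paper's penalty must be scaled appropriately with the number of test cases; the constant used below, and both candidate programs, are hypothetical.

```python
import numpy as np

def parsimonious_fitness(program, size, X, y, penalty=0.01):
    """Empirical error plus a size penalty. Minimizing the error alone lets
    program size grow without bound (bloat); the parsimony term penalizes
    complexity, in the spirit of structural risk minimization."""
    preds = np.array([program(x) for x in X])
    return np.mean((preds - y) ** 2) + penalty * size

# Two hypothetical candidate programs for the target y = x^2 on noisy cases.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 50)
y = X ** 2 + 0.05 * rng.normal(size=50)
small = (lambda x: x * x, 3)                             # 3 nodes
bloated = (lambda x: x * x + 1e-3 * np.sin(90 * x), 25)  # overgrown variant
for prog, size in (small, bloated):
    print(f"size={size:2d}  fitness={parsimonious_fitness(prog, size, X, y):.4f}")
```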

Journal ArticleDOI
TL;DR: This paper investigates an approach that reduces multi-class cost-sensitive learning to a standard classification task, based on the data space expansion technique developed by Abe et al.; the reduction coincides with Elkan's reduction for binary classification tasks.

Proceedings ArticleDOI
14 Jun 2009
TL;DR: The experimental results show that classification based on SVM with FD features performs well on EEG signals, which indicates that this classification method is valid and has promising applications.
Abstract: The support vector machine (SVM) is a machine learning technique widely applied to classification problems. SVMs are based on Vapnik's statistical learning theory and have been successively extended by a number of researchers. The electroencephalogram (EEG) signal, in turn, captures the electrical activity of the brain and is an important source of information for studying neurological disorders. In order to extract relevant information from EEG signals, a variety of computerized analysis methods have been developed. Recent studies indicate that methods based on nonlinear dynamics theory can extract valuable information from neuronal dynamics; however, many of these methods need large amounts of data and are computationally expensive. From chaos theory, a global value that is relatively simple to compute is the fractal dimension (FD), which can be used to measure the geometrical complexity of a time series. The FD of a waveform is a powerful tool for transient detection, and in the analysis of EEG this feature can be used to identify and distinguish specific states of physiological function. A variety of algorithms are available for the computation of the FD. In this work, we employ an SVM to classify the EEG signals of healthy subjects and epileptic subjects, using the FD as the feature vector. The experimental results show that classification based on SVM with FD features performs well, which indicates that this classification method is valid and has promising applications.
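
A hedged sketch of the pipeline (FD as the feature, SVM as the classifier) using Katz's estimator, one common FD algorithm that may differ from the one used in the paper, on synthetic stand-in epochs:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def katz_fd(signal):
    """Katz fractal dimension of a 1-D signal: FD = log10(n) / (log10(n) +
    log10(d/L)), with L the curve length and d the maximal distance from
    the first point."""
    L = np.hypot(1.0, np.diff(signal)).sum()           # total curve length
    d = np.hypot(np.arange(1, len(signal)), signal[1:] - signal[0]).max()
    n = len(signal) - 1
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

# Synthetic stand-in for EEG epochs: odd-indexed epochs are smoothed, giving
# them lower geometrical complexity than the raw-noise epochs.
rng = np.random.default_rng(0)
epochs, labels = [], []
for k in range(100):
    noise = rng.normal(size=512)
    sig = np.convolve(noise, np.ones(8) / 8, mode="same") if k % 2 else noise
    epochs.append(sig)
    labels.append(k % 2)
X = np.array([[katz_fd(e)] for e in epochs])           # FD as the feature vector

print("CV accuracy:", cross_val_score(SVC(), X, np.array(labels), cv=5).mean())
```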