
Showing papers on "Statistical learning theory" published in 2007


Book ChapterDOI
14 Feb 2007
TL;DR: Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner, and are a group of supervised learning methods that can be applied to classification or regression.
Abstract: Kernel-based techniques (such as support vector machines, Bayes point machines, kernel principal component analysis, and Gaussian processes) represent a major development in machine learning algorithms. Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. In a short period of time, SVM found numerous applications in chemistry, such as in drug design (discriminating between ligands and nonligands, inhibitors and noninhibitors, etc.), quantitative structure-activity relationships (QSAR, where SVM regression is used to predict various physical, chemical, or biological properties), chemometrics (optimization of chromatographic separation or compound concentration prediction from spectral data as examples), sensors (for qualitative and quantitative prediction from sensor data), chemical engineering (fault detection and modeling of industrial processes), and text mining (automatic recognition of scientific information). Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner. The SVM algorithm is based on statistical learning theory and the Vapnik–Chervonenkis (VC) dimension.

375 citations
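As a minimal, generic illustration of the two uses of SVMs described above (classification and regression), the following sketch uses scikit-learn on synthetic data; the dataset and parameter choices are illustrative and not taken from the chapter.

```python
# Minimal SVM classification and regression sketch (synthetic data, illustrative
# parameters; not the chemistry applications discussed in the chapter).
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Classification: two Gaussian clouds standing in for, e.g., ligands vs. nonligands.
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(1.5, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xtr, ytr)
print("classification accuracy:", clf.score(Xte, yte))

# Regression: a noisy nonlinear target standing in for a QSAR-type property.
x = np.linspace(-3, 3, 200).reshape(-1, 1)
t = np.sinc(x).ravel() + rng.normal(0, 0.1, 200)
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(x, t)
print("regression R^2:", reg.score(x, t))
```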


Book
30 Jun 2007
TL;DR: An alternative selection scheme based on relative bounds between estimators is described and studied, and a two-step localization technique which can handle the selection of a parametric model from a family of models is presented.
Abstract: This monograph deals with adaptive supervised classification, using tools borrowed from statistical mechanics and information theory, stemming from the PAC-Bayesian approach pioneered by David McAllester and applied to a conception of statistical learning theory forged by Vladimir Vapnik. Using convex analysis on the set of posterior probability measures, we show how to get local measures of the complexity of the classification model involving the relative entropy of posterior distributions with respect to Gibbs posterior measures. We then discuss relative bounds, comparing the generalization error of two classification rules, showing how the margin assumption of Mammen and Tsybakov can be replaced with some empirical measure of the covariance structure of the classification model. We show how to associate to any posterior distribution an effective temperature relating it to the Gibbs prior distribution with the same level of expected error rate, and how to estimate this effective temperature from data, resulting in an estimator whose expected error rate converges according to the best possible power of the sample size adaptively under any margin and parametric complexity assumptions. We describe and study an alternative selection scheme based on relative bounds between estimators, and present a two-step localization technique which can handle the selection of a parametric model from a family of those. We show how to extend systematically all the results obtained in the inductive setting to transductive learning, and use this to improve Vapnik's generalization bounds, extending them to the case when the sample is made of independent non-identically distributed pairs of patterns and labels. Finally we review briefly the construction of Support Vector Machines and show how to derive generalization bounds for them, measuring the complexity either through the number of support vectors or through the value of the transductive or inductive margin.

369 citations
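For orientation, one commonly quoted PAC-Bayesian bound of the kind this monograph refines and localizes is the McAllester-style inequality below (generic notation, stated from the general literature rather than from the monograph itself): with prior π, posterior ρ, sample size n and confidence 1−δ,

$$\mathbb{E}_{h\sim\rho}\,R(h) \;\le\; \mathbb{E}_{h\sim\rho}\,r_n(h) \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi)+\ln\frac{2\sqrt{n}}{\delta}}{2n}},$$

where R is the true risk and r_n the empirical risk; the monograph replaces such global complexity terms with local ones built from Gibbs posterior measures.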


Journal ArticleDOI
TL;DR: The object of this paper is to illustrate the utility of the data-driven approach to damage identification by means of a number of case studies.
Abstract: In broad terms, there are two approaches to damage identification. Model-driven methods establish a high-fidelity physical model of the structure, usually by finite element analysis, and then establish a comparison metric between the model and the measured data from the real structure. If the model is for a system or structure in normal (i.e. undamaged) condition, any departures indicate that the structure has deviated from normal condition and damage is inferred. Data-driven approaches also establish a model, but this is usually a statistical representation of the system, e.g. a probability density function of the normal condition. Departures from normality are then signalled by measured data appearing in regions of very low density. The algorithms that have been developed over the years for data-driven approaches are mainly drawn from the discipline of pattern recognition, or more broadly, machine learning. The object of this paper is to illustrate the utility of the data-driven approach to damage identification by means of a number of case studies.

342 citations
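A minimal sketch of the data-driven idea described above, assuming scikit-learn: fit a density model to normal-condition features only and flag new measurements that fall in low-density regions (the kernel density estimate and threshold are illustrative choices, not the paper's case studies).

```python
# Data-driven damage detection sketch: model the normal condition, flag low density.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
normal_features = rng.normal(0.0, 1.0, (500, 3))   # training data: undamaged condition only
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(normal_features)

# Alarm threshold taken from the training data, e.g. the 1st percentile of log-density.
threshold = np.percentile(kde.score_samples(normal_features), 1)

new_measurements = np.vstack([rng.normal(0.0, 1.0, (5, 3)),   # still normal
                              rng.normal(4.0, 1.0, (5, 3))])  # departed from normal condition
print("damage flagged:", kde.score_samples(new_measurements) < threshold)
```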


Journal ArticleDOI
TL;DR: A novel SVM classification system for voltage disturbances that achieves high classification accuracy with training data from one power network and unseen testing data from another, and lower accuracy when the classifier is trained on synthetic data and tested on data from a real power network.
Abstract: The support vector machine (SVM) is a powerful method for statistical classification of data used in a number of different applications. However, the usefulness of the method in a commercially available system depends very much on whether the SVM classifier can be pre-trained at the factory, since it is not realistic to expect customers to train the SVM classifier themselves before it can be used. This paper proposes a novel SVM classification system for voltage disturbances. The performance of the proposed SVM classifier is investigated when the voltage disturbance data used for training and testing originated from different sources. The data used in the experiments were obtained from both real disturbances recorded in two different power networks and from synthetic data. The experimental results showed high accuracy in classification with training data from one power network and unseen testing data from another. High accuracy was also achieved when the SVM classifier was trained on data from a real power network and tested on synthetic data. A lower accuracy resulted when the SVM classifier was trained on synthetic data and tested on data from the power network.

195 citations


Book ChapterDOI
01 Jan 2007

127 citations


Journal ArticleDOI
TL;DR: A novel multiclass support vector machine, which performs classification and variable selection simultaneously through an L1-norm penalized sparse representation, and is compared against some competitors in terms of accuracy of prediction.
Abstract: Binary support vector machines (SVMs) have been proven to deliver high performance. In multiclass classification, however, issues remain with respect to variable selection. One challenging issue is classification and variable selection in the presence of variables in the magnitude of thousands, greatly exceeding the size of the training sample. This often occurs in genomics classification. To meet the challenge, this article proposes a novel multiclass support vector machine, which performs classification and variable selection simultaneously through an L1-norm penalized sparse representation. The proposed methodology, together with the developed regularization solution path, permits variable selection in such a situation. For the proposed methodology, a statistical learning theory is developed to quantify the generalization error in an attempt to gain insight into the basic structure of sparse learning, permitting the number of variables to greatly exceed the sample size. The operating characteristics of the...

115 citations
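As a loose analogue of the sparse multiclass formulation described above (not the authors' method or its regularization path), an L1-penalized linear SVM in scikit-learn also drives most coefficients to zero when the variables far outnumber the samples:

```python
# Sparse variable selection with an L1-penalized linear SVM (a generic analogue,
# not the article's multiclass formulation or its solution path).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n, p, k = 60, 1000, 3                       # far more variables than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                              # only five variables carry signal
y = (X @ beta + rng.normal(size=n)).argsort().argsort() * k // n   # three crude classes

clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1,
                max_iter=5000).fit(X, y)
selected = np.unique(np.nonzero(clf.coef_)[1])
print("variables with nonzero weight:", selected)
```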


Posted Content
TL;DR: The main finding from this research is that whereas the 1AA technique is more predisposed to yielding unclassified and mixed pixels, the resulting classification accuracy is not significantly different from 1A1 approach.
Abstract: Support Vector Machines (SVMs) are a relatively new supervised classification technique to the land cover mapping community. They have their roots in Statistical Learning Theory and have gained prominence because they are robust, accurate and are effective even when using a small training sample. By their nature SVMs are essentially binary classifiers, however, they can be adopted to handle the multiple classification tasks common in remote sensing studies. The two approaches commonly used are the One-Against-One (1A1) and One-Against-All (1AA) techniques. In this paper, these approaches are evaluated in as far as their impact and implication for land cover mapping. The main finding from this research is that whereas the 1AA technique is more predisposed to yielding unclassified and mixed pixels, the resulting classification accuracy is not significantly different from the 1A1 approach. It is therefore the authors' conclusion that ultimately the choice of technique adopted boils down to personal preference and the uniqueness of the dataset at hand.

87 citations
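The two multiclass strategies compared in the paper can be reproduced generically with scikit-learn's wrappers; the sketch below uses synthetic data rather than the land cover imagery evaluated by the authors.

```python
# One-Against-One (1A1) versus One-Against-All (1AA) multiclass SVMs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for name, wrapper in [("1A1 (one-vs-one)", OneVsOneClassifier),
                      ("1AA (one-vs-all)", OneVsRestClassifier)]:
    model = wrapper(SVC(kernel="rbf", C=10.0, gamma="scale")).fit(Xtr, ytr)
    print(name, "accuracy:", round(model.score(Xte, yte), 3))
```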


Journal Article
TL;DR: A VC theory of large margin multi-category classifiers is introduced; central to this theory are generalized VC dimensions called the γ-Ψ-dimensions, which make it possible to apply the structural risk minimization inductive principle to those machines.
Abstract: In the context of discriminant analysis, Vapnik's statistical learning theory has mainly been developed in three directions: the computation of dichotomies with binary-valued functions, the computation of dichotomies with real-valued functions, and the computation of polytomies with functions taking their values in finite sets, typically the set of categories itself. The case of classes of vector-valued functions used to compute polytomies has seldom been considered independently, which is unsatisfactory, for three main reasons. First, this case encompasses the other ones. Second, it cannot be treated appropriately through a naive extension of the results devoted to the computation of dichotomies. Third, most of the classification problems met in practice involve multiple categories. In this paper, a VC theory of large margin multi-category classifiers is introduced. Central in this theory are generalized VC dimensions called the γ-Ψ-dimensions. First, a uniform convergence bound on the risk of the classifiers of interest is derived. The capacity measure involved in this bound is a covering number. This covering number can be upper bounded in terms of the γ-Ψ-dimensions thanks to generalizations of Sauer's lemma, as is illustrated in the specific case of the scale-sensitive Natarajan dimension. A bound on this latter dimension is then computed for the class of functions on which multi-class SVMs are based. This makes it possible to apply the structural risk minimization inductive principle to those machines.

68 citations
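For context, the classical binary VC bound that this multi-category, margin-based theory generalizes is usually quoted as follows (standard form from the general literature, not from the paper): with probability at least 1−η over a sample of size n, for every classifier f in a class of VC dimension h,

$$R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h}+1\right)+\ln\frac{4}{\eta}}{n}}.$$

The γ-Ψ-dimensions play the role of h when the functions are vector-valued and a margin is imposed.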


Book
01 Jan 2007
TL;DR: Reliable Reasoning provides an admirably clear account of the basic framework of SLT and its implications for inductive reasoning, and discusses various topics in machine learning, including nearest-neighbor methods, neural networks, and support vector machines.
Abstract: In Reliable Reasoning, Gilbert Harman and Sanjeev Kulkarni -- a philosopher and an engineer -- argue that philosophy and cognitive science can benefit from statistical learning theory (SLT), the theory that lies behind recent advances in machine learning. The philosophical problem of induction, for example, is in part about the reliability of inductive reasoning, where the reliability of a method is measured by its statistically expected percentage of errors -- a central topic in SLT. After discussing philosophical attempts to evade the problem of induction, Harman and Kulkarni provide an admirably clear account of the basic framework of SLT and its implications for inductive reasoning. They explain the Vapnik-Chervonenkis (VC) dimension of a set of hypotheses and distinguish two kinds of inductive reasoning. The authors discuss various topics in machine learning, including nearest-neighbor methods, neural networks, and support vector machines. Finally, they describe transductive reasoning and consider possible new models of human reasoning suggested by developments in SLT.

65 citations


Proceedings ArticleDOI
24 Sep 2007
TL;DR: In this article, a new method of fault diagnosis based on principal components analysis (PCA) and support vector machines is presented, building on statistical learning theory and feature analysis of the vibration signals of rolling bearings.
Abstract: A new method of fault diagnosis based on principal components analysis (PCA) and the support vector machine is presented, building on statistical learning theory and feature analysis of the vibration signals of rolling bearings. The key to diagnosing faulty bearings is feature extraction and feature classification. Multidimensional correlated variables are converted into low-dimensional independent eigenvectors by means of principal components analysis. Pattern recognition and nonlinear regression are achieved by the support vector machine (SVM). Based on the features of the vibration signals, eigenvectors are obtained using principal components analysis, and rolling-bearing faults are then recognized using a multi-class support vector machine fault classifier. Theory and experiment show that fault diagnosis of rolling bearings based on principal components analysis and support vector machine theory is effective for fault pattern recognition and provides a new approach to intelligent fault diagnosis.

56 citations
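A minimal sketch of the PCA-then-SVM pipeline described above, using scikit-learn on placeholder features (the extraction of features from raw bearing vibration signals is not shown and the data are synthetic):

```python
# PCA for dimensionality reduction followed by an SVM fault classifier
# (placeholder features stand in for vibration-signal statistics).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
classes = ["normal", "inner race fault", "outer race fault", "ball fault"]
n_per_class, n_features = 80, 20
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(classes))])
y = np.repeat(np.arange(len(classes)), n_per_class)

model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      SVC(kernel="rbf", C=10.0, gamma="scale"))
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```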


Journal ArticleDOI
TL;DR: This is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition in a hidden Markov model framework using the approximate test risk bound minimization principle.
Abstract: Inspired by the great success of margin-based classifiers, there is a trend to incorporate the margin concept into hidden Markov modeling for speech recognition. Several attempts based on margin maximization were proposed recently. In this paper, a new discriminative learning framework, called soft margin estimation (SME), is proposed for estimating the parameters of continuous-density hidden Markov models. The proposed method makes direct use of the successful ideas of soft margin in support vector machines to improve generalization capability and decision feedback learning in minimum classification error training to enhance model separation in classifier design. SME is illustrated from a perspective of statistical learning theory. By including a margin in formulating the SME objective function, SME is capable of directly minimizing an approximate test risk bound. Frame selection, utterance selection, and discriminative separation are unified into a single objective function that can be optimized using the generalized probabilistic descent algorithm. Tested on the TIDIGITS connected digit recognition task, the proposed SME approach achieves a string accuracy of 99.43%. On the 5 k-word Wall Street Journal task, SME obtains relative word error rate reductions of about 10% over our best baseline results in different experimental configurations. We believe this is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition in a hidden Markov model framework. Further improvements are expected because the approximate test risk bound minimization principle offers a flexible and rigorous framework to facilitate incorporation of new margin-based optimization criteria into hidden Markov model training.

Proceedings ArticleDOI
01 Apr 2007
TL;DR: It is proposed that, by using resolution of singularities, the likelihood function can be represented as the standard form, from which the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation can be proved.
Abstract: A learning machine is called singular if its Fisher information matrix is singular. Almost all learning machines used in information processing are singular; for example, layered neural networks, normal mixtures, binomial mixtures, Bayes networks, hidden Markov models, Boltzmann machines, stochastic context-free grammars, and reduced rank regressions are singular. In singular learning machines, the likelihood function cannot be approximated by any quadratic form of the parameter. Moreover, neither the distribution of the maximum likelihood estimator nor the Bayes a posteriori distribution converges to the normal distribution, even if the number of training samples tends to infinity. Therefore, conventional statistical learning theory does not hold for singular learning machines. This paper establishes a new mathematical foundation for singular learning machines. We propose that, by using resolution of singularities, the likelihood function can be represented as the standard form, from which we can prove the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation. The result provides a basis on which training algorithms for singular learning machines can be devised and optimized.
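The flavour of the asymptotics referred to above, stated from Watanabe's general singular learning theory rather than from this particular paper: after resolution of singularities the log likelihood ratio takes a standard form whose real log canonical threshold λ (with multiplicity m) controls both the Bayes stochastic complexity and the Bayes generalization error,

$$F_n \;=\; nS_n + \lambda\ln n - (m-1)\ln\ln n + O_p(1), \qquad \mathbb{E}\!\left[G_n\right] \;=\; \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),$$

where S_n is the empirical entropy; for regular models λ reduces to half the number of parameters, recovering the classical asymptotics.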

Journal ArticleDOI
TL;DR: The results based on support vector regression machine learning confirm that this approach provides a framework for general, accurate and computationally acceptable multi-layer buildup factor model.

Journal Article
TL;DR: This work combines the PAC-Bayes approach introduced by McAllester (1998), with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand, in a way that also takes into account the variance of the combined functions.
Abstract: There exist many different generalization error bounds in statistical learning theory. Each of these bounds contains an improvement over the others for certain situations or algorithms. Our goal is, first, to underline the links between these bounds, and second, to combine the different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester (1998), which is interesting for randomized predictions, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand (see Talagrand, 1996), in a way that also takes into account the variance of the combined functions. We also show how this connects to Rademacher based bounds.

Journal ArticleDOI
TL;DR: This study presents several analytic methods that rely on structural properties of the data rather than expensive re-sampling approaches commonly used in RVM applications and are found to yield robust estimates of parameters for kernel functions.
Abstract: Recent advances in statistical learning theory have yielded tools that are improving our capabilities for analyzing large and complex datasets. Among such tools, relevance vector machines (RVMs) are finding increasing applications in hydrology because of (1) their excellent generalization properties, and (2) the probabilistic interpretation associated with this technique that yields prediction uncertainty. RVMs combine the strengths of kernel-based methods and Bayesian theory to establish relationships between a set of input vectors and a desired output. However, a bias–variance analysis of RVM estimates revealed that a careful selection of kernel parameters is of paramount importance for achieving good performance from RVMs. In this study, several analytic methods are presented for selection of kernel parameters. These methods rely on structural properties of the data rather than expensive re-sampling approaches commonly used in RVM applications. An analytical expression for prediction risk in leave-one-out cross validation is derived. For brevity, the effectiveness of the proposed methods is assessed first by data generated from the benchmark sinc function, followed by an example involving estimation of hydraulic conductivity values over a field based on observations. It is shown that a straightforward maximization of likelihood function can lead to misleading results. The proposed methods are found to yield robust estimates of parameters for kernel functions.
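The appeal of analytic selection criteria like the leave-one-out expression mentioned above can be seen on a simpler linear smoother; the sketch below verifies the closed-form LOO residual for kernel ridge regression (a generic analogue, not the paper's RVM derivation), where refitting n times is replaced by a single formula.

```python
# Closed-form leave-one-out residuals for kernel ridge regression, checked against
# brute-force refitting (a linear-smoother analogue, not the paper's RVM result).
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 40)
y = np.sinc(x) + rng.normal(0, 0.1, 40)

gamma, lam = 1.0, 1e-2
K = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)                 # RBF Gram matrix
H = K @ np.linalg.solve(K + lam * np.eye(x.size), np.eye(x.size))   # smoother: y_hat = H y
loo_closed = (y - H @ y) / (1.0 - np.diag(H))                       # e_i = r_i / (1 - H_ii)

loo_brute = []
for i in range(x.size):
    keep = np.arange(x.size) != i
    alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(keep.sum()), y[keep])
    loo_brute.append(y[i] - K[i, keep] @ alpha)
print(np.allclose(loo_closed, loo_brute))                           # identical up to round-off
```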

Proceedings ArticleDOI
24 Aug 2007
TL;DR: A new method called curvefaces, based on the curvelet transform, is presented for face recognition, and simulations show that the proposed method outperforms the wavelet-based method.
Abstract: A new method called curvefaces, based on the curvelet transform, is presented for face recognition. The curvelet transform is a recent multiscale geometric analysis tool. In contrast to the wavelet transform, the curvelet transform directly takes edges as the basic representation elements and is anisotropic with strong directionality. It is a multiresolution, band-pass and directional function analysis method, useful for representing image edges and curved singularities in images more efficiently. It yields a sparser representation of the image than the wavelet and ridgelet transforms. In face recognition, the curvelet coefficients can better represent the main features of the faces. A support vector machine (SVM) can then be used to classify the images. The SVM is based on statistical learning theory, is especially suited to small sample sets, and achieves high recognition rates. A multi-class SVM is employed in this paper. Simulations show that the proposed method outperforms the wavelet-based method.

Journal ArticleDOI
TL;DR: This paper proposes simple extensions of existing formulations, based on the concept of regularization which has been introduced within the context of the statistical learning theory, to improve the performance of the proposed formulations over the ones traditionally used in preference disaggregation analysis.
Abstract: Disaggregation methods have been extensively used in multiple criteria decision making to infer preferential information from reference examples, using linear programming techniques. This paper proposes simple extensions of existing formulations, based on the concept of regularization which has been introduced within the context of the statistical learning theory. The properties of the resulting new formulations are analyzed for both ranking and classification problems and experimental results are presented demonstrating the improved performance of the proposed formulations over the ones traditionally used in preference disaggregation analysis.

Book ChapterDOI
01 Mar 2007
TL;DR: This work frames relational learning as a statistical classification problem and applies tools and concepts from statistical learning theory to design a new statistical first-order rule learning system, which is implemented as a stand-alone tool integrating a Prolog engine.
Abstract: Learning sets of first-order rules has a long tradition in machine learning and inductive logic programming. While most traditional systems follow a separate-and-conquer approach, many modern systems are based on statistical considerations, such as ensemble theory, large margin classification or graphical models. In this work, we frame relational learning as a statistical classification problem and apply tools and concepts from statistical learning theory to design a new statistical first-order rule learning system. The system's design is motivated by the goal of finding theoretically well-founded answers to some of the greatest challenges faced by first-order learning systems. First, using strict binary-valued logic as a representation language is known to be suboptimal for noisy, imprecise or uncertain data and background knowledge as frequently encountered in practice. As in many other state-of-the-art rule learning approaches [1], we therefore assign weights to the rules. In this way, a rule set represents a linear classifier and one can optimize margin-based optimization criteria, essentially reducing the misclassification error on noisy data. Since we aim at comprehensible models, we employ margins without the kernel trick. Second, the problem of finding a hypothesis that explains the training set is known to be NP-hard even for the simplest possible classifiers, from propositional monomials to linear classifiers. To avoid the computational complexity of optimizing the empirical training error directly, we use a feasible margin-based relaxation, margin minus variance (MMV), as introduced recently for propositional domains [2]. MMV minimization is linear in the number of instances and therefore well-suited for large datasets. Third, in multi-relational learning settings, one can formulate almost arbitrarily complex queries or clauses, to describe a training or test instance. Thus, there is a potentially unlimited number of features that can be used for classification and overfitting avoidance should be of great importance. We derived an error bound based on MMV, giving us a theoretically sound stopping criterion controlling the number of rules in a weighted rule set. The rule generation process is based on traditional first-order rule refinement and declarative language bias. It is possible to choose from a variety of search strategies, from a predefined order of clauses to rearranging the order based on the weights attached to clauses in the model so far. The system is implemented as a stand-alone tool integrating a Prolog engine.

Journal ArticleDOI
TL;DR: Modelling and compensation results indicate that the proposed WT-SVM model outperforms the NN and single SVM models, and is feasible and effective in temperature drift modelling and compensation of the DTG.
Abstract: Temperature drift is the main source of errors affecting the precision and performance of a dynamically tuned gyroscope (DTG). In this paper, the support vector machine (SVM), a novel learning machine based on statistical learning theory (SLT), is described and applied in the temperature drift modelling and compensation to reduce the influence of temperature variation on the output of the DTG and to enhance its precision. To improve the modelling and compensation capability, wavelet transform (WT) is introduced into the SVM model to eliminate any impactive noises. The real temperature drift data set from the long-term measurement system of a certain DTG is employed to validate the effectiveness of the proposed combination strategy. Moreover, the traditional neural network (NN) approach is also investigated as a comparison with the SVM based method. The modelling and compensation results indicate that the proposed WT-SVM model outperforms the NN and single SVM models, and is feasible and effective in temperature drift modelling and compensation of the DTG.
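A rough sketch of the WT-SVM idea on synthetic data, assuming the PyWavelets and scikit-learn packages (the wavelet, threshold rule and SVR settings are illustrative, not those of the paper):

```python
# Wavelet denoising followed by support vector regression (WT-SVM sketch on
# synthetic drift data, not the paper's gyroscope measurements).
import numpy as np
import pywt
from sklearn.svm import SVR

rng = np.random.default_rng(5)
temperature = np.linspace(-10, 50, 512)
drift = 0.02 * temperature + 0.5 * np.sin(temperature / 8.0)   # "true" temperature drift
noisy = drift + rng.normal(0, 0.1, temperature.size)

# Wavelet thresholding to suppress measurement noise before modelling.
coeffs = pywt.wavedec(noisy, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745                 # robust noise estimate
thr = sigma * np.sqrt(2 * np.log(noisy.size))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[: noisy.size]

# SVR maps temperature to (denoised) drift, to be subtracted for compensation.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(temperature.reshape(-1, 1), denoised)
print("R^2 against the noise-free drift:",
      model.score(temperature.reshape(-1, 1), drift))
```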

Journal ArticleDOI
TL;DR: This paper is a further contribution which extends the framework of the so-called kernel learning machines to time-frequency analysis, showing that some specific reproducing kernels allow these algorithms to operate in the time- frequency domain.
Abstract: Over the last decade, the theory of reproducing kernels has made a major breakthrough in the field of pattern recognition. It has led to new algorithms, with improved performance and lower computational cost, for nonlinear analysis in high dimensional feature spaces. Our paper is a further contribution which extends the framework of the so-called kernel learning machines to time-frequency analysis, showing that some specific reproducing kernels allow these algorithms to operate in the time-frequency domain. This link offers new perspectives in the field of non-stationary signal analysis, which can benefit from the developments of pattern recognition and statistical learning theory.

Posted Content
TL;DR: In this paper, a new general formulation of simulated annealing is introduced, which allows one to guarantee finite-time performance in the optimization of functions of continuous variables, and the results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and convergence of Markov chain Monte Carlo methods on continuous domains.
Abstract: Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing which allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
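A bare-bones simulated annealing loop on a bounded continuous domain, for illustration only; the proposal distribution and logarithmic cooling schedule below are ad hoc choices and carry none of the finite-time guarantees developed in the paper.

```python
# Simulated annealing on a bounded continuous domain (illustrative schedule only).
import numpy as np

def objective(x):
    return np.sum(x ** 2) + 2.0 * np.sin(5.0 * np.sum(x))    # a small multimodal test function

rng = np.random.default_rng(6)
lo, hi, dim = -2.0, 2.0, 2
x = rng.uniform(lo, hi, dim)
fx = objective(x)
best_x, best_f = x.copy(), fx

for k in range(1, 20001):
    T = 1.0 / np.log(k + 1)                                  # slowly decreasing temperature
    proposal = np.clip(x + rng.normal(0, 0.2, dim), lo, hi)  # stay inside the bounded domain
    fp = objective(proposal)
    if fp < fx or rng.random() < np.exp(-(fp - fx) / T):     # Metropolis acceptance rule
        x, fx = proposal, fp
        if fx < best_f:
            best_x, best_f = x.copy(), fx

print("best point:", best_x, "best value:", best_f)
```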

Journal ArticleDOI
TL;DR: The findings show that SRM outperforms traditional ICs, because generally a) it recognizes the model underlying the data with higher frequency, and b) it leads to lower errors in out-of-sample predictions, an advantage especially apparent with short time series.
Abstract: Statistically distinguishing density-dependent from density-independent populations and selecting the best demographic model for a given population are problems of primary importance. Traditional approaches are PBLR (parametric bootstrapping of likelihood ratios) and information criteria (ICs), such as the Schwarz information criterion (SIC), the Akaike information criterion (AIC) or the final prediction error (FPE). While PBLR is suitable for choosing between a couple of models, ICs select the best model from among a set of candidates. In this paper, we use the structural risk minimization (SRM) approach. SRM is the model selection criterion developed within statistical learning theory (SLT), a theory of great generality for modelling and learning with finite samples. SRM is almost unknown in the ecological literature and has never been used to analyze time series. First, we compare SRM with PBLR in terms of their ability to discriminate between the Malthusian and the density-dependent Ricker model. We rigorously repeat the experiments described in a previous study and find that SRM is equally powerful in detecting density-independence and much more powerful in detecting density-dependence. Then, we compare SRM against ICs in terms of their ability to select one of several candidate models; we generate, via stochastic simulation, a large number of artificial time series, both density-independent and density-dependent, with and without exogenous covariates, using different dataset sizes, noise levels and parameter values. Our findings show that SRM outperforms traditional ICs, because generally a) it recognizes the model underlying the data with higher frequency, and b) it leads to lower errors in out-of-sample predictions; SRM's superiority is especially apparent with short time series. We finally apply SRM to the population records of Alpine ibex Capra ibex living in the Gran Paradiso National Park (Italy), already investigated by other authors via traditional statistical methods; we both analyze their models and introduce some novel ones. We show that the models that are best according to SRM also show the lowest leave-one-out cross-validation error.
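On log growth rates, the two demographic models compared above reduce to a linear fit without and with an abundance term; the sketch below simulates a Ricker series and fits both (ordinary least squares on synthetic data, not the paper's SRM machinery).

```python
# Malthusian (density-independent) vs Ricker (density-dependent) growth:
# on log growth rates both are linear fits, without and with an abundance term.
import numpy as np

rng = np.random.default_rng(7)
T, a, b, sigma = 50, 0.8, -0.004, 0.1          # Ricker parameters used for simulation
N = np.empty(T)
N[0] = 50.0
for t in range(T - 1):
    N[t + 1] = N[t] * np.exp(a + b * N[t] + rng.normal(0, sigma))

growth = np.log(N[1:] / N[:-1])                # observed log growth rates

# Malthusian fit: a constant growth rate only.
rss_malthus = np.sum((growth - growth.mean()) ** 2)

# Ricker fit: growth rate linear in current abundance.
X = np.column_stack([np.ones(T - 1), N[:-1]])
coef, *_ = np.linalg.lstsq(X, growth, rcond=None)
rss_ricker = np.sum((growth - X @ coef) ** 2)

print("Malthusian RSS:", round(rss_malthus, 3), " Ricker RSS:", round(rss_ricker, 3))
print("estimated (a, b):", coef)
```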

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper illustrates SME from a perspective of statistical learning theory and shows that by including a margin in formulating the SME objective function it is capable of directly minimizing the approximate test risk, while most other training methods intend to minimize only the empirical risks.
Abstract: In a recent study, we proposed soft margin estimation (SME) to learn parameters of continuous density hidden Markov models (HMMs). Our earlier experiments with connected digit recognition have shown that SME offers great advantages over other state-of-the-art discriminative training methods. In this paper, we illustrate SME from a perspective of statistical learning theory and show that by including a margin in formulating the SME objective function it is capable of directly minimizing the approximate test risk, while most other training methods intend to minimize only the empirical risks. We test SME on the 5k-word Wall Street Journal task, and find the proposed approach achieves a relative word error rate reduction of about 10% over our best baseline results in different experimental configurations. We believe this is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition. We also expect further performance improvements in the future because the approximate test risk minimization principle offers a flexible and yet rigorous framework to facilitate easy incorporation of new margin-based optimization criteria into HMM training.

Proceedings ArticleDOI
11 Dec 2007
TL;DR: This paper investigates the recognition of partial discharge sources using the support vector machine (SVM), a method grounded in statistical learning theory, and concludes that the frequency-domain approach gives a better classification rate.
Abstract: This paper investigates the recognition of partial discharge sources by using a method from statistical learning theory, the support vector machine (SVM). SVM provides a new approach to pattern classification and has been proven successful in fields such as image identification and face recognition. To apply SVM learning to partial discharge classification, the data input is very important: the input should be able to fully represent the different patterns in an effective way. The determination of features that describe the characteristics of partial discharge signals and the extraction of reliable information from the raw data are the key to acquiring valuable patterns of partial discharge signals. In this paper, the data obtained from experiments are analyzed in both the time and frequency domains. By using an appropriate combination of kernel functions and parameters, it is concluded that the frequency-domain approach gives a better classification rate.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: This paper presents a comparative study of two techniques resulting from the field of the artificial intelligence namely: artificial neural networks (ANN), and support vector machines (SVM), developed from the statistical learning theory.
Abstract: The modern techniques in control and monitoring of drinking water, acquires a particular attention in the last few years. We attend more and more rigorous follow-ups of the quality of this resource, in order to master an effective control of the risks incurred for the public health. Several methods of control were implemented to meet this aim. In this paper, we present a comparative study of two techniques resulting from the field of the artificial intelligence namely: artificial neural networks (ANN), and support vector machines (SVM). Developed from the statistical learning theory, these methods display optimal training performances and generalization in many fields of application, among others the field of pattern recognition. Applied as classification tools, these techniques should ensure within a multi-sensor monitoring system, a direct and quasi permanent control of water quality. In order to evaluate their performances, a simulation corresponding to the recognition rate, the training time, and the robustness, is carried out. To validate their functionalities, an application of control of drinking water quality is presented.

09 Jun 2007
TL;DR: This Thesis develops theory and methods for computational data analysis, and applies the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Abstract: In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.

Proceedings ArticleDOI
05 Nov 2007
TL;DR: Owing to the strong self-learning and generalization ability of the SVM, the detection method can reliably diagnose oil pump faults by learning from the fault information of the oil pump.
Abstract: Statistical learning theory is introduced into the fault detection of oil pumps. The relationship between oil pump faults and the available fault information is complicated and nonlinear, and it is very difficult to build a process model that describes it. The support vector machine (SVM) has strong nonlinear function approximation ability, strong generalization ability, and the property of global optimization. In this paper, an SVM-based fault detection method for oil pumps is presented, in which a genetic algorithm (GA) is used to optimize the SVM parameters. Owing to the strong self-learning and generalization ability of the SVM, the detection method can reliably diagnose oil pump faults by learning from the fault information of the oil pump. The real detection results show that this method is feasible and effective.
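A compact, hand-rolled sketch of the GA-SVM idea, assuming scikit-learn: a small genetic algorithm searches (C, gamma) of an RBF-kernel SVM by cross-validated accuracy on toy data (population size, mutation scale and the data itself are all illustrative, not the paper's).

```python
# A small genetic algorithm tuning SVM hyperparameters (C, gamma) by cross-validated
# accuracy (toy data; a sketch of the GA-SVM idea, not the paper's implementation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X, y = make_classification(n_samples=300, n_features=10, n_informative=6, random_state=0)

def fitness(genes):                          # genes = (log2 C, log2 gamma)
    clf = SVC(C=2.0 ** genes[0], gamma=2.0 ** genes[1], kernel="rbf")
    return cross_val_score(clf, X, y, cv=3).mean()

pop = rng.uniform(-5, 5, size=(12, 2))       # initial population of parameter pairs
for generation in range(10):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]                     # keep the better half
    children = []
    for _ in range(6):
        p1, p2 = parents[rng.integers(6)], parents[rng.integers(6)]
        child = np.where(rng.random(2) < 0.5, p1, p2)               # uniform crossover
        children.append(np.clip(child + rng.normal(0, 0.5, 2), -5, 5))  # Gaussian mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
print("best C:", 2.0 ** best[0], " best gamma:", 2.0 ** best[1])
```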

Proceedings ArticleDOI
16 Dec 2007
TL;DR: This paper presents a novel method that applies the support vector machine (SVM) approach, grounded in statistical learning theory, to distinguish counterfeit banknotes from genuine ones.
Abstract: Distinct from conventional techniques where the neural network (NN) is employed to solve the problem of paper currency verification, in this paper, we shall present a novel method by applying the support vector machine (SVM) approach to distinguish counterfeit banknotes from genuine ones. On the basis of the statistical learning theory, SVM has better generalization ability and higher performance especially when it comes to pattern classification. Besides, discrete wavelet transformation (DWT) will also be applied so as to reduce the input scale of SVM. Finally, the results of our experiment will show that the proposed method does achieve very good performance.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: Using one-sided results from statistical learning theory as a starting point, bounds on the number of required samples are obtained that are manageable for "reasonable" values of the confidence δ and accuracy ε.
Abstract: In this paper, we study two general semi-infinite programming problems by means of statistical learning theory. The sample size results obtained with this approach are generally considered to be very conservative by the control community. The main contribution of this paper is to demonstrate that this is not necessarily the case. Using one-sided results from statistical learning theory as a starting point, we obtain bounds on the number of required samples that are manageable for "reasonable" values of the confidence δ and accuracy ε. In particular, we provide sample size bounds growing with (1/ε) ln(1/ε) instead of the usual (1/ε²) ln(1/ε²) dependence.
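To give a sense of scale for the improvement claimed above: the usual additive Chernoff/Hoeffding argument behind the 1/ε² dependence requires, for a single probability estimate with accuracy ε and confidence 1−δ (constants and the uniform-convergence factors of the actual paper omitted),

$$ n \;\ge\; \frac{1}{2\varepsilon^{2}}\ln\frac{2}{\delta}, \qquad \varepsilon=\delta=0.01 \;\Rightarrow\; n \;\ge\; \frac{\ln 200}{2\times 10^{-4}} \approx 2.6\times 10^{4}, $$

whereas a rate scaling as (1/ε) ln(1/ε) grows only like 100 ln 100 ≈ 460 at the same accuracy, up to constants.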