
Showing papers on "Statistical learning theory" published in 2007


Book ChapterDOI
14 Feb 2007
TL;DR: Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner, and are a group of supervised learning methods that can be applied to classification or regression.
Abstract: Kernel-based techniques (such as support vector machines, Bayes point machines, kernel principal component analysis, and Gaussian processes) represent a major development in machine learning algorithms. Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. In a short period of time, SVM found numerous applications in chemistry, such as in drug design (discriminating between ligands and nonligands, inhibitors and noninhibitors, etc.), quantitative structure-activity relationships (QSAR, where SVM regression is used to predict various physical, chemical, or biological properties), chemometrics (optimization of chromatographic separation or compound concentration prediction from spectral data as examples), sensors (for qualitative and quantitative prediction from sensor data), chemical engineering (fault detection and modeling of industrial processes), and text mining (automatic recognition of scientific information). Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner. The SVM algorithm is based on statistical learning theory and the Vapnik–Chervonenkis (VC) dimension.

375 citations
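As a minimal, generic illustration of the two uses of SVMs described above (classification and regression), the following sketch uses scikit-learn on synthetic data; the dataset and parameter choices are illustrative and not taken from the chapter.

```python
# Minimal SVM classification and regression sketch (synthetic data, illustrative
# parameters; not the chemistry applications discussed in the chapter).
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Classification: two Gaussian clouds standing in for, e.g., ligands vs. nonligands.
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(1.5, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xtr, ytr)
print("classification accuracy:", clf.score(Xte, yte))

# Regression: a noisy nonlinear target standing in for a QSAR-type property.
x = np.linspace(-3, 3, 200).reshape(-1, 1)
t = np.sinc(x).ravel() + rng.normal(0, 0.1, 200)
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(x, t)
print("regression R^2:", reg.score(x, t))
```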


Book
30 Jun 2007
TL;DR: An alternative selection scheme based on relative bounds between estimators is described and studied, and a two-step localization technique which can handle the selection of a parametric model from a family of models is presented.
Abstract: This monograph deals with adaptive supervised classification, using tools borrowed from statistical mechanics and information theory, stemming from the PAC-Bayesian approach pioneered by David McAllester and applied to a conception of statistical learning theory forged by Vladimir Vapnik. Using convex analysis on the set of posterior probability measures, we show how to get local measures of the complexity of the classification model involving the relative entropy of posterior distributions with respect to Gibbs posterior measures. We then discuss relative bounds, comparing the generalization error of two classification rules, showing how the margin assumption of Mammen and Tsybakov can be replaced with some empirical measure of the covariance structure of the classification model. We show how to associate to any posterior distribution an effective temperature relating it to the Gibbs prior distribution with the same level of expected error rate, and how to estimate this effective temperature from data, resulting in an estimator whose expected error rate converges according to the best possible power of the sample size adaptively under any margin and parametric complexity assumptions. We describe and study an alternative selection scheme based on relative bounds between estimators, and present a two-step localization technique which can handle the selection of a parametric model from a family of those. We show how to extend systematically all the results obtained in the inductive setting to transductive learning, and use this to improve Vapnik's generalization bounds, extending them to the case when the sample is made of independent non-identically distributed pairs of patterns and labels. Finally we review briefly the construction of Support Vector Machines and show how to derive generalization bounds for them, measuring the complexity either through the number of support vectors or through the value of the transductive or inductive margin.

369 citations
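For orientation, one commonly quoted PAC-Bayesian bound of the kind this monograph refines and localizes is the McAllester-style inequality below (generic notation, stated from the general literature rather than from the monograph itself): with prior π, posterior ρ, sample size n and confidence 1−δ,

$$\mathbb{E}_{h\sim\rho}\,R(h) \;\le\; \mathbb{E}_{h\sim\rho}\,r_n(h) \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi)+\ln\frac{2\sqrt{n}}{\delta}}{2n}},$$

where R is the true risk and r_n the empirical risk; the monograph replaces such global complexity terms with local ones built from Gibbs posterior measures.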


Journal ArticleDOI
TL;DR: The object of this paper is to illustrate the utility of the data-driven approach to damage identification by means of a number of case studies.
Abstract: In broad terms, there are two approaches to damage identification. Model-driven methods establish a high-fidelity physical model of the structure, usually by finite element analysis, and then establish a comparison metric between the model and the measured data from the real structure. If the model is for a system or structure in normal (i.e. undamaged) condition, any departures indicate that the structure has deviated from normal condition and damage is inferred. Data-driven approaches also establish a model, but this is usually a statistical representation of the system, e.g. a probability density function of the normal condition. Departures from normality are then signalled by measured data appearing in regions of very low density. The algorithms that have been developed over the years for data-driven approaches are mainly drawn from the discipline of pattern recognition, or more broadly, machine learning. The object of this paper is to illustrate the utility of the data-driven approach to damage identification by means of a number of case studies.

342 citations
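A minimal sketch of the data-driven idea described above, assuming scikit-learn: fit a density model to normal-condition features only and flag new measurements that fall in low-density regions (the kernel density estimate and threshold are illustrative choices, not the paper's case studies).

```python
# Data-driven damage detection sketch: model the normal condition, flag low density.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
normal_features = rng.normal(0.0, 1.0, (500, 3))   # training data: undamaged condition only
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(normal_features)

# Alarm threshold taken from the training data, e.g. the 1st percentile of log-density.
threshold = np.percentile(kde.score_samples(normal_features), 1)

new_measurements = np.vstack([rng.normal(0.0, 1.0, (5, 3)),   # still normal
                              rng.normal(4.0, 1.0, (5, 3))])  # departed from normal condition
print("damage flagged:", kde.score_samples(new_measurements) < threshold)
```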


Journal ArticleDOI
TL;DR: A novel SVM classification system for voltage disturbances that achieves high classification accuracy with training data from one power network and unseen testing data from another, and lower accuracy when the classifier is trained on synthetic data and tested on data from a real power network.
Abstract: The support vector machine (SVM) is a powerful method for statistical classification of data used in a number of different applications. However, the usefulness of the method in a commercially available system depends very much on whether the SVM classifier can be pre-trained at the factory, since it is not realistic to expect customers to train the SVM classifier themselves before it can be used. This paper proposes a novel SVM classification system for voltage disturbances. The performance of the proposed SVM classifier is investigated when the voltage disturbance data used for training and testing originated from different sources. The data used in the experiments were obtained from both real disturbances recorded in two different power networks and from synthetic data. The experimental results showed high accuracy in classification with training data from one power network and unseen testing data from another. High accuracy was also achieved when the SVM classifier was trained on data from a real power network and tested on synthetic data. A lower accuracy resulted when the SVM classifier was trained on synthetic data and tested on data from the power network.

195 citations


Book ChapterDOI
01 Jan 2007

127 citations


Journal ArticleDOI
TL;DR: A novel multiclass support vector machine, which performs classification and variable selection simultaneously through an L1-norm penalized sparse representation, and is compared against some competitors in terms of accuracy of prediction.
Abstract: Binary support vector machines (SVMs) have been proven to deliver high performance. In multiclass classification, however, issues remain with respect to variable selection. One challenging issue is classification and variable selection in the presence of variables in the magnitude of thousands, greatly exceeding the size of the training sample. This often occurs in genomics classification. To meet the challenge, this article proposes a novel multiclass support vector machine, which performs classification and variable selection simultaneously through an L1-norm penalized sparse representation. The proposed methodology, together with the developed regularization solution path, permits variable selection in such a situation. For the proposed methodology, a statistical learning theory is developed to quantify the generalization error in an attempt to gain insight into the basic structure of sparse learning, permitting the number of variables to greatly exceed the sample size. The operating characteristics of the...

115 citations
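As a loose analogue of the sparse multiclass formulation described above (not the authors' method or its regularization path), an L1-penalized linear SVM in scikit-learn also drives most coefficients to zero when the variables far outnumber the samples:

```python
# Sparse variable selection with an L1-penalized linear SVM (a generic analogue,
# not the article's multiclass formulation or its solution path).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n, p, k = 60, 1000, 3                       # far more variables than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                              # only five variables carry signal
y = (X @ beta + rng.normal(size=n)).argsort().argsort() * k // n   # three crude classes

clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1,
                max_iter=5000).fit(X, y)
selected = np.unique(np.nonzero(clf.coef_)[1])
print("variables with nonzero weight:", selected)
```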


Posted Content
TL;DR: The main finding from this research is that whereas the 1AA technique is more predisposed to yielding unclassified and mixed pixels, the resulting classification accuracy is not significantly different from 1A1 approach.
Abstract: Support Vector Machines (SVMs) are a relatively new supervised classification technique to the land cover mapping community. They have their roots in Statistical Learning Theory and have gained prominence because they are robust, accurate and are effective even when using a small training sample. By their nature SVMs are essentially binary classifiers, however, they can be adopted to handle the multiple classification tasks common in remote sensing studies. The two approaches commonly used are the One-Against-One (1A1) and One-Against-All (1AA) techniques. In this paper, these approaches are evaluated in as far as their impact and implication for land cover mapping. The main finding from this research is that whereas the 1AA technique is more predisposed to yielding unclassified and mixed pixels, the resulting classification accuracy is not significantly different from the 1A1 approach. It is therefore the authors' conclusion that ultimately the choice of technique adopted boils down to personal preference and the uniqueness of the dataset at hand.

87 citations
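The two multiclass strategies compared in the paper can be reproduced generically with scikit-learn's wrappers; the sketch below uses synthetic data rather than the land cover imagery evaluated by the authors.

```python
# One-Against-One (1A1) versus One-Against-All (1AA) multiclass SVMs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for name, wrapper in [("1A1 (one-vs-one)", OneVsOneClassifier),
                      ("1AA (one-vs-all)", OneVsRestClassifier)]:
    model = wrapper(SVC(kernel="rbf", C=10.0, gamma="scale")).fit(Xtr, ytr)
    print(name, "accuracy:", round(model.score(Xte, yte), 3))
```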


Journal Article
TL;DR: A VC theory of large margin multi-category classifiers is introduced; central to this theory are generalized VC dimensions called the γ-Ψ-dimensions, which make it possible to apply the structural risk minimization inductive principle to those machines.
Abstract: In the context of discriminant analysis, Vapnik's statistical learning theory has mainly been developed in three directions: the computation of dichotomies with binary-valued functions, the computation of dichotomies with real-valued functions, and the computation of polytomies with functions taking their values in finite sets, typically the set of categories itself. The case of classes of vector-valued functions used to compute polytomies has seldom been considered independently, which is unsatisfactory, for three main reasons. First, this case encompasses the other ones. Second, it cannot be treated appropriately through a naive extension of the results devoted to the computation of dichotomies. Third, most of the classification problems met in practice involve multiple categories. In this paper, a VC theory of large margin multi-category classifiers is introduced. Central in this theory are generalized VC dimensions called the γ-Ψ-dimensions. First, a uniform convergence bound on the risk of the classifiers of interest is derived. The capacity measure involved in this bound is a covering number. This covering number can be upper bounded in terms of the γ-Ψ-dimensions thanks to generalizations of Sauer's lemma, as is illustrated in the specific case of the scale-sensitive Natarajan dimension. A bound on this latter dimension is then computed for the class of functions on which multi-class SVMs are based. This makes it possible to apply the structural risk minimization inductive principle to those machines.

68 citations
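For context, the classical binary VC bound that this multi-category, margin-based theory generalizes is usually quoted as follows (standard form from the general literature, not from the paper): with probability at least 1−η over a sample of size n, for every classifier f in a class of VC dimension h,

$$R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h}+1\right)+\ln\frac{4}{\eta}}{n}}.$$

The γ-Ψ-dimensions play the role of h when the functions are vector-valued and a margin is imposed.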


Book
01 Jan 2007
TL;DR: Reliable Reasoning provides an admirably clear account of the basic framework of SLT and its implications for inductive reasoning, and discusses various topics in machine learning, including nearest-neighbor methods, neural networks, and support vector machines.
Abstract: In Reliable Reasoning, Gilbert Harman and Sanjeev Kulkarni -- a philosopher and an engineer -- argue that philosophy and cognitive science can benefit from statistical learning theory (SLT), the theory that lies behind recent advances in machine learning. The philosophical problem of induction, for example, is in part about the reliability of inductive reasoning, where the reliability of a method is measured by its statistically expected percentage of errors -- a central topic in SLT. After discussing philosophical attempts to evade the problem of induction, Harman and Kulkarni provide an admirably clear account of the basic framework of SLT and its implications for inductive reasoning. They explain the Vapnik-Chervonenkis (VC) dimension of a set of hypotheses and distinguish two kinds of inductive reasoning. The authors discuss various topics in machine learning, including nearest-neighbor methods, neural networks, and support vector machines. Finally, they describe transductive reasoning and consider possible new models of human reasoning suggested by developments in SLT.

65 citations


Proceedings ArticleDOI
24 Sep 2007
TL;DR: In this article, a new method of fault diagnosis based on principal components analysis (PCA) and support vector machines is presented, building on statistical learning theory and feature analysis of the vibration signals of rolling bearings.
Abstract: A new method of fault diagnosis based on principal components analysis (PCA) and the support vector machine is presented, building on statistical learning theory and feature analysis of the vibration signals of rolling bearings. The key to diagnosing faulty bearings is feature extraction and feature classification. Multidimensional correlated variables are converted into low-dimensional independent eigenvectors by means of principal components analysis. Pattern recognition and nonlinear regression are achieved by the support vector machine (SVM). Based on the features of the vibration signals, eigenvectors are obtained using principal components analysis, and rolling-bearing faults are then recognized using a multi-class support vector machine fault classifier. Theory and experiment show that fault diagnosis of rolling bearings based on principal components analysis and support vector machine theory is effective for fault pattern recognition and provides a new approach to intelligent fault diagnosis.

56 citations
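A minimal sketch of the PCA-then-SVM pipeline described above, using scikit-learn on placeholder features (the extraction of features from raw bearing vibration signals is not shown and the data are synthetic):

```python
# PCA for dimensionality reduction followed by an SVM fault classifier
# (placeholder features stand in for vibration-signal statistics).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
classes = ["normal", "inner race fault", "outer race fault", "ball fault"]
n_per_class, n_features = 80, 20
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(classes))])
y = np.repeat(np.arange(len(classes)), n_per_class)

model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      SVC(kernel="rbf", C=10.0, gamma="scale"))
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```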


Journal ArticleDOI
TL;DR: This is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition in a hidden Markov model framework using the approximate test risk bound minimization principle.
Abstract: Inspired by the great success of margin-based classifiers, there is a trend to incorporate the margin concept into hidden Markov modeling for speech recognition. Several attempts based on margin maximization were proposed recently. In this paper, a new discriminative learning framework, called soft margin estimation (SME), is proposed for estimating the parameters of continuous-density hidden Markov models. The proposed method makes direct use of the successful ideas of soft margin in support vector machines to improve generalization capability and decision feedback learning in minimum classification error training to enhance model separation in classifier design. SME is illustrated from a perspective of statistical learning theory. By including a margin in formulating the SME objective function, SME is capable of directly minimizing an approximate test risk bound. Frame selection, utterance selection, and discriminative separation are unified into a single objective function that can be optimized using the generalized probabilistic descent algorithm. Tested on the TIDIGITS connected digit recognition task, the proposed SME approach achieves a string accuracy of 99.43%. On the 5 k-word Wall Street Journal task, SME obtains relative word error rate reductions of about 10% over our best baseline results in different experimental configurations. We believe this is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition in a hidden Markov model framework. Further improvements are expected because the approximate test risk bound minimization principle offers a flexible and rigorous framework to facilitate incorporation of new margin-based optimization criteria into hidden Markov model training.

Proceedings ArticleDOI
01 Apr 2007
TL;DR: It is proposed that, by using resolution of singularities, the likelihood function can be represented as the standard form, from which the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation can be proved.
Abstract: A learning machine is called singular if its Fisher information matrix is singular. Almost all learning machines used in information processing are singular; for example, layered neural networks, normal mixtures, binomial mixtures, Bayes networks, hidden Markov models, Boltzmann machines, stochastic context-free grammars, and reduced rank regressions are singular. In singular learning machines, the likelihood function cannot be approximated by any quadratic form of the parameter. Moreover, neither the distribution of the maximum likelihood estimator nor the Bayes a posteriori distribution converges to the normal distribution, even if the number of training samples tends to infinity. Therefore, conventional statistical learning theory does not hold for singular learning machines. This paper establishes a new mathematical foundation for singular learning machines. We propose that, by using resolution of singularities, the likelihood function can be represented as the standard form, from which we can prove the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation. The result provides a basis on which training algorithms for singular learning machines can be devised and optimized.
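The flavour of the asymptotics referred to above, stated from Watanabe's general singular learning theory rather than from this particular paper: after resolution of singularities the log likelihood ratio takes a standard form whose real log canonical threshold λ (with multiplicity m) controls both the Bayes stochastic complexity and the Bayes generalization error,

$$F_n \;=\; nS_n + \lambda\ln n - (m-1)\ln\ln n + O_p(1), \qquad \mathbb{E}\!\left[G_n\right] \;=\; \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),$$

where S_n is the empirical entropy; for regular models λ reduces to half the number of parameters, recovering the classical asymptotics.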

Journal ArticleDOI
TL;DR: The results based on support vector regression machine learning confirm that this approach provides a framework for general, accurate and computationally acceptable multi-layer buildup factor model.

Journal Article
TL;DR: This work combines the PAC-Bayes approach introduced by McAllester (1998), with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand, in a way that also takes into account the variance of the combined functions.
Abstract: There exist many different generalization error bounds in statistical learning theory. Each of these bounds contains an improvement over the others for certain situations or algorithms. Our goal is, first, to underline the links between these bounds, and second, to combine the different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester (1998), which is interesting for randomized predictions, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand (see Talagrand, 1996), in a way that also takes into account the variance of the combined functions. We also show how this connects to Rademacher based bounds.

Journal ArticleDOI
TL;DR: This study presents several analytic methods that rely on structural properties of the data rather than expensive re-sampling approaches commonly used in RVM applications and are found to yield robust estimates of parameters for kernel functions.
Abstract: Recent advances in statistical learning theory have yielded tools that are improving our capabilities for analyzing large and complex datasets. Among such tools, relevance vector machines (RVMs) are finding increasing applications in hydrology because of (1) their excellent generalization properties, and (2) the probabilistic interpretation associated with this technique that yields prediction uncertainty. RVMs combine the strengths of kernel-based methods and Bayesian theory to establish relationships between a set of input vectors and a desired output. However, a bias–variance analysis of RVM estimates revealed that a careful selection of kernel parameters is of paramount importance for achieving good performance from RVMs. In this study, several analytic methods are presented for selection of kernel parameters. These methods rely on structural properties of the data rather than expensive re-sampling approaches commonly used in RVM applications. An analytical expression for prediction risk in leave-one-out cross validation is derived. For brevity, the effectiveness of the proposed methods is assessed first by data generated from the benchmark sinc function, followed by an example involving estimation of hydraulic conductivity values over a field based on observations. It is shown that a straightforward maximization of likelihood function can lead to misleading results. The proposed methods are found to yield robust estimates of parameters for kernel functions.
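The appeal of analytic selection criteria like the leave-one-out expression mentioned above can be seen on a simpler linear smoother; the sketch below verifies the closed-form LOO residual for kernel ridge regression (a generic analogue, not the paper's RVM derivation), where refitting n times is replaced by a single formula.

```python
# Closed-form leave-one-out residuals for kernel ridge regression, checked against
# brute-force refitting (a linear-smoother analogue, not the paper's RVM result).
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 40)
y = np.sinc(x) + rng.normal(0, 0.1, 40)

gamma, lam = 1.0, 1e-2
K = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)                 # RBF Gram matrix
H = K @ np.linalg.solve(K + lam * np.eye(x.size), np.eye(x.size))   # smoother: y_hat = H y
loo_closed = (y - H @ y) / (1.0 - np.diag(H))                       # e_i = r_i / (1 - H_ii)

loo_brute = []
for i in range(x.size):
    keep = np.arange(x.size) != i
    alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(keep.sum()), y[keep])
    loo_brute.append(y[i] - K[i, keep] @ alpha)
print(np.allclose(loo_closed, loo_brute))                           # identical up to round-off
```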

Proceedings ArticleDOI
24 Aug 2007
TL;DR: A new method called curvefaces, based on the curvelet transform, is presented for face recognition, and simulations show that the proposed method outperforms the wavelet-based method.
Abstract: A new method called curvefaces, based on the curvelet transform, is presented for face recognition. The curvelet transform is a recent multiscale geometric analysis tool. In contrast to the wavelet transform, the curvelet transform directly takes edges as the basic representation elements and is anisotropic with strong directionality. It is a multiresolution, band-pass and directional function analysis method, useful for representing image edges and curved singularities in images more efficiently. It yields a sparser representation of the image than the wavelet and ridgelet transforms. In face recognition, the curvelet coefficients can better represent the main features of the faces. A support vector machine (SVM) can then be used to classify the images. The SVM is based on statistical learning theory, is especially suited to small sample sets, and achieves high recognition rates. A multi-class SVM is employed in this paper. Simulations show that the proposed method outperforms the wavelet-based method.

Journal ArticleDOI
TL;DR: This paper proposes simple extensions of existing formulations, based on the concept of regularization which has been introduced within the context of the statistical learning theory, to improve the performance of the proposed formulations over the ones traditionally used in preference disaggregation analysis.
Abstract: Disaggregation methods have been extensively used in multiple criteria decision making to infer preferential information from reference examples, using linear programming techniques. This paper proposes simple extensions of existing formulations, based on the concept of regularization which has been introduced within the context of the statistical learning theory. The properties of the resulting new formulations are analyzed for both ranking and classification problems and experimental results are presented demonstrating the improved performance of the proposed formulations over the ones traditionally used in preference disaggregation analysis.

Book ChapterDOI
01 Mar 2007
TL;DR: This work frames relational learning as a statistical classification problem and applies tools and concepts from statistical learning theory to design a new statistical first-order rule learning system, which is implemented as a stand-alone tool integrating a Prolog engine.
Abstract: Learning sets of first-order rules has a long tradition in machine learning and inductive logic programming. While most traditional systems follow a separate-and-conquer approach, many modern systems are based on statistical considerations, such as ensemble theory, large margin classification or graphical models. In this work, we frame relational learning as a statistical classification problem and apply tools and concepts from statistical learning theory to design a new statistical first-order rule learning system. The system's design is motivated by the goal of finding theoretically well-founded answers to some of the greatest challenges faced by first-order learning systems. First, using strict binary-valued logic as a representation language is known to be suboptimal for noisy, imprecise or uncertain data and background knowledge as frequently encountered in practice. As in many other state-of-the-art rule learning approaches [1], we therefore assign weights to the rules. In this way, a rule set represents a linear classifier and one can optimize margin-based optimization criteria, essentially reducing the misclassification error on noisy data. Since we aim at comprehensible models, we employ margins without the kernel trick. Second, the problem of finding a hypothesis that explains the training set is known to be NP-hard even for the simplest possible classifiers, from propositional monomials to linear classifiers. To avoid the computational complexity of optimizing the empirical training error directly, we use a feasible margin-based relaxation, margin minus variance (MMV), as introduced recently for propositional domains [2]. MMV minimization is linear in the number of instances and therefore well-suited for large datasets. Third, in multi-relational learning settings, one can formulate almost arbitrarily complex queries or clauses, to describe a training or test instance. Thus, there is a potentially unlimited number of features that can be used for classification and overfitting avoidance should be of great importance. We derived an error bound based on MMV, giving us a theoretically sound stopping criterion controlling the number of rules in a weighted rule set. The rule generation process is based on traditional first-order rule refinement and declarative language bias. It is possible to choose from a variety of search strategies, from a predefined order of clauses to rearranging the order based on the weights attached to clauses in the model so far. The system is implemented as a stand-alone tool integrating a Prolog engine.

Journal ArticleDOI
TL;DR: Modelling and compensation results indicate that the proposed WT-SVM model outperforms the NN and single SVM models, and is feasible and effective in temperature drift modelling and compensation of the DTG.
Abstract: Temperature drift is the main source of errors affecting the precision and performance of a dynamically tuned gyroscope (DTG). In this paper, the support vector machine (SVM), a novel learning machine based on statistical learning theory (SLT), is described and applied in the temperature drift modelling and compensation to reduce the influence of temperature variation on the output of the DTG and to enhance its precision. To improve the modelling and compensation capability, wavelet transform (WT) is introduced into the SVM model to eliminate any impactive noises. The real temperature drift data set from the long-term measurement system of a certain DTG is employed to validate the effectiveness of the proposed combination strategy. Moreover, the traditional neural network (NN) approach is also investigated as a comparison with the SVM based method. The modelling and compensation results indicate that the proposed WT-SVM model outperforms the NN and single SVM models, and is feasible and effective in temperature drift modelling and compensation of the DTG.
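A rough sketch of the WT-SVM idea on synthetic data, assuming the PyWavelets and scikit-learn packages (the wavelet, threshold rule and SVR settings are illustrative, not those of the paper):

```python
# Wavelet denoising followed by support vector regression (WT-SVM sketch on
# synthetic drift data, not the paper's gyroscope measurements).
import numpy as np
import pywt
from sklearn.svm import SVR

rng = np.random.default_rng(5)
temperature = np.linspace(-10, 50, 512)
drift = 0.02 * temperature + 0.5 * np.sin(temperature / 8.0)   # "true" temperature drift
noisy = drift + rng.normal(0, 0.1, temperature.size)

# Wavelet thresholding to suppress measurement noise before modelling.
coeffs = pywt.wavedec(noisy, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745                 # robust noise estimate
thr = sigma * np.sqrt(2 * np.log(noisy.size))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[: noisy.size]

# SVR maps temperature to (denoised) drift, to be subtracted for compensation.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(temperature.reshape(-1, 1), denoised)
print("R^2 against the noise-free drift:",
      model.score(temperature.reshape(-1, 1), drift))
```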

Journal ArticleDOI
TL;DR: This paper is a further contribution which extends the framework of the so-called kernel learning machines to time-frequency analysis, showing that some specific reproducing kernels allow these algorithms to operate in the time- frequency domain.
Abstract: Over the last decade, the theory of reproducing kernels has made a major breakthrough in the field of pattern recognition. It has led to new algorithms, with improved performance and lower computational cost, for nonlinear analysis in high dimensional feature spaces. Our paper is a further contribution which extends the framework of the so-called kernel learning machines to time-frequency analysis, showing that some specific reproducing kernels allow these algorithms to operate in the time-frequency domain. This link offers new perspectives in the field of non-stationary signal analysis, which can benefit from the developments of pattern recognition and statistical learning theory.

Posted Content
TL;DR: In this paper, a new general formulation of simulated annealing is introduced, which allows one to guarantee finite-time performance in the optimization of functions of continuous variables, and the results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and convergence of Markov chain Monte Carlo methods on continuous domains.
Abstract: Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing which allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
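A bare-bones simulated annealing loop on a bounded continuous domain, for illustration only; the proposal distribution and logarithmic cooling schedule below are ad hoc choices and carry none of the finite-time guarantees developed in the paper.

```python
# Simulated annealing on a bounded continuous domain (illustrative schedule only).
import numpy as np

def objective(x):
    return np.sum(x ** 2) + 2.0 * np.sin(5.0 * np.sum(x))    # a small multimodal test function

rng = np.random.default_rng(6)
lo, hi, dim = -2.0, 2.0, 2
x = rng.uniform(lo, hi, dim)
fx = objective(x)
best_x, best_f = x.copy(), fx

for k in range(1, 20001):
    T = 1.0 / np.log(k + 1)                                  # slowly decreasing temperature
    proposal = np.clip(x + rng.normal(0, 0.2, dim), lo, hi)  # stay inside the bounded domain
    fp = objective(proposal)
    if fp < fx or rng.random() < np.exp(-(fp - fx) / T):     # Metropolis acceptance rule
        x, fx = proposal, fp
        if fx < best_f:
            best_x, best_f = x.copy(), fx

print("best point:", best_x, "best value:", best_f)
```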

Journal ArticleDOI
TL;DR: The findings show that SRM outperforms traditional ICs, because generally a) it recognizes the model underlying the data with higher frequency, and b) it leads to lower errors in out-of-sample predictions, an advantage especially apparent with short time series.
Abstract: Statistically distinguishing density-dependent from density-independent populations and selecting the best demographic model for a given population are problems of primary importance. Traditional approaches are PBLR (parametric bootstrapping of likelihood ratios) and information criteria (ICs), such as the Schwarz information criterion (SIC), the Akaike information criterion (AIC) or the final prediction error (FPE). While PBLR is suitable for choosing between a couple of models, ICs select the best model from among a set of candidates. In this paper, we use the structural risk minimization (SRM) approach. SRM is the model selection criterion developed within statistical learning theory (SLT), a theory of great generality for modelling and learning with finite samples. SRM is almost unknown in the ecological literature and has never been used to analyze time series. First, we compare SRM with PBLR in terms of their ability to discriminate between the Malthusian and the density-dependent Ricker model. We rigorously repeat the experiments described in a previous study and find that SRM is equally powerful in detecting density-independence and much more powerful in detecting density-dependence. Then, we compare SRM against ICs in terms of their ability to select one of several candidate models; we generate, via stochastic simulation, a large number of artificial time series, both density-independent and density-dependent, with and without exogenous covariates, using different dataset sizes, noise levels and parameter values. Our findings show that SRM outperforms traditional ICs, because generally a) it recognizes the model underlying the data with higher frequency, and b) it leads to lower errors in out-of-sample predictions; SRM's superiority is especially apparent with short time series. We finally apply SRM to the population records of Alpine ibex Capra ibex living in the Gran Paradiso National Park (Italy), already investigated by other authors via traditional statistical methods; we both analyze their models and introduce some novel ones. We show that the models that are best according to SRM also show the lowest leave-one-out cross-validation error.
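On log growth rates, the two demographic models compared above reduce to a linear fit without and with an abundance term; the sketch below simulates a Ricker series and fits both (ordinary least squares on synthetic data, not the paper's SRM machinery).

```python
# Malthusian (density-independent) vs Ricker (density-dependent) growth:
# on log growth rates both are linear fits, without and with an abundance term.
import numpy as np

rng = np.random.default_rng(7)
T, a, b, sigma = 50, 0.8, -0.004, 0.1          # Ricker parameters used for simulation
N = np.empty(T)
N[0] = 50.0
for t in range(T - 1):
    N[t + 1] = N[t] * np.exp(a + b * N[t] + rng.normal(0, sigma))

growth = np.log(N[1:] / N[:-1])                # observed log growth rates

# Malthusian fit: a constant growth rate only.
rss_malthus = np.sum((growth - growth.mean()) ** 2)

# Ricker fit: growth rate linear in current abundance.
X = np.column_stack([np.ones(T - 1), N[:-1]])
coef, *_ = np.linalg.lstsq(X, growth, rcond=None)
rss_ricker = np.sum((growth - X @ coef) ** 2)

print("Malthusian RSS:", round(rss_malthus, 3), " Ricker RSS:", round(rss_ricker, 3))
print("estimated (a, b):", coef)
```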

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper illustrates SME from a perspective of statistical learning theory and shows that by including a margin in formulating the SME objective function it is capable of directly minimizing the approximate test risk, while most other training methods intend to minimize only the empirical risks.
Abstract: In a recent study, we proposed soft margin estimation (SME) to learn parameters of continuous density hidden Markov models (HMMs). Our earlier experiments with connected digit recognition have shown that SME offers great advantages over other state-of-the-art discriminative training methods. In this paper, we illustrate SME from a perspective of statistical learning theory and show that by including a margin in formulating the SME objective function it is capable of directly minimizing the approximate test risk, while most other training methods intend to minimize only the empirical risks. We test SME on the 5k-word Wall Street Journal task, and find the proposed approach achieves a relative word error rate reduction of about 10% over our best baseline results in different experimental configurations. We believe this is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition. We also expect further performance improvements in the future because the approximate test risk minimization principle offers a flexible and yet rigorous framework to facilitate easy incorporation of new margin-based optimization criteria into HMM training.

Proceedings ArticleDOI
11 Dec 2007
TL;DR: This paper investigates the recognition of partial discharge sources using the support vector machine (SVM), a method grounded in statistical learning theory, and concludes that the frequency-domain approach gives a better classification rate.
Abstract: This paper investigates the recognition of partial discharge sources by using a method from statistical learning theory, the support vector machine (SVM). SVM provides a new approach to pattern classification and has been proven successful in fields such as image identification and face recognition. To apply SVM learning to partial discharge classification, the data input is very important: the input should be able to fully represent the different patterns in an effective way. The determination of features that describe the characteristics of partial discharge signals and the extraction of reliable information from the raw data are the key to acquiring valuable patterns of partial discharge signals. In this paper, the data obtained from experiments are analyzed in both the time and frequency domains. By using an appropriate combination of kernel functions and parameters, it is concluded that the frequency-domain approach gives a better classification rate.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: This paper presents a comparative study of two techniques resulting from the field of the artificial intelligence namely: artificial neural networks (ANN), and support vector machines (SVM), developed from the statistical learning theory.
Abstract: The modern techniques in control and monitoring of drinking water, acquires a particular attention in the last few years. We attend more and more rigorous follow-ups of the quality of this resource, in order to master an effective control of the risks incurred for the public health. Several methods of control were implemented to meet this aim. In this paper, we present a comparative study of two techniques resulting from the field of the artificial intelligence namely: artificial neural networks (ANN), and support vector machines (SVM). Developed from the statistical learning theory, these methods display optimal training performances and generalization in many fields of application, among others the field of pattern recognition. Applied as classification tools, these techniques should ensure within a multi-sensor monitoring system, a direct and quasi permanent control of water quality. In order to evaluate their performances, a simulation corresponding to the recognition rate, the training time, and the robustness, is carried out. To validate their functionalities, an application of control of drinking water quality is presented.

09 Jun 2007
TL;DR: This Thesis develops theory and methods for computational data analysis, and applies the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Abstract: In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.

Proceedings ArticleDOI
05 Nov 2007
TL;DR: Owing to the strong self-learning and generalization ability of the SVM, the detection method can reliably diagnose oil pump faults by learning from the fault information of the oil pump.
Abstract: Statistical learning theory is introduced into the fault detection of oil pumps. The relationship between oil pump faults and the available fault information is complicated and nonlinear, and it is very difficult to build a process model that describes it. The support vector machine (SVM) has strong nonlinear function approximation ability, strong generalization ability, and the property of global optimization. In this paper, an SVM-based fault detection method for oil pumps is presented, in which a genetic algorithm (GA) is used to optimize the SVM parameters. Owing to the strong self-learning and generalization ability of the SVM, the detection method can reliably diagnose oil pump faults by learning from the fault information of the oil pump. The real detection results show that this method is feasible and effective.
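A compact, hand-rolled sketch of the GA-SVM idea, assuming scikit-learn: a small genetic algorithm searches (C, gamma) of an RBF-kernel SVM by cross-validated accuracy on toy data (population size, mutation scale and the data itself are all illustrative, not the paper's).

```python
# A small genetic algorithm tuning SVM hyperparameters (C, gamma) by cross-validated
# accuracy (toy data; a sketch of the GA-SVM idea, not the paper's implementation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X, y = make_classification(n_samples=300, n_features=10, n_informative=6, random_state=0)

def fitness(genes):                          # genes = (log2 C, log2 gamma)
    clf = SVC(C=2.0 ** genes[0], gamma=2.0 ** genes[1], kernel="rbf")
    return cross_val_score(clf, X, y, cv=3).mean()

pop = rng.uniform(-5, 5, size=(12, 2))       # initial population of parameter pairs
for generation in range(10):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]                     # keep the better half
    children = []
    for _ in range(6):
        p1, p2 = parents[rng.integers(6)], parents[rng.integers(6)]
        child = np.where(rng.random(2) < 0.5, p1, p2)               # uniform crossover
        children.append(np.clip(child + rng.normal(0, 0.5, 2), -5, 5))  # Gaussian mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
print("best C:", 2.0 ** best[0], " best gamma:", 2.0 ** best[1])
```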

Proceedings ArticleDOI
16 Dec 2007
TL;DR: This paper presents a novel method that applies the support vector machine (SVM) approach, grounded in statistical learning theory, to distinguish counterfeit banknotes from genuine ones.
Abstract: Distinct from conventional techniques where the neural network (NN) is employed to solve the problem of paper currency verification, in this paper, we shall present a novel method by applying the support vector machine (SVM) approach to distinguish counterfeit banknotes from genuine ones. On the basis of the statistical learning theory, SVM has better generalization ability and higher performance especially when it comes to pattern classification. Besides, discrete wavelet transformation (DWT) will also be applied so as to reduce the input scale of SVM. Finally, the results of our experiment will show that the proposed method does achieve very good performance.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: Using one-sided results from statistical learning theory as a starting point, bounds on the number of required samples are obtained that are manageable for "reasonable" values of the confidence δ and accuracy ε.
Abstract: In this paper, we study two general semi-infinite programming problems by means of statistical learning theory. The sample size results obtained with this approach are generally considered to be very conservative by the control community. The main contribution of this paper is to demonstrate that this is not necessarily the case. Using one-sided results from statistical learning theory as a starting point, we obtain bounds on the number of required samples that are manageable for "reasonable" values of the confidence δ and accuracy ε. In particular, we provide sample size bounds growing with (1/ε) ln(1/ε) instead of the usual (1/ε²) ln(1/ε²) dependence.
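To give a sense of scale for the improvement claimed above: the usual additive Chernoff/Hoeffding argument behind the 1/ε² dependence requires, for a single probability estimate with accuracy ε and confidence 1−δ (constants and the uniform-convergence factors of the actual paper omitted),

$$ n \;\ge\; \frac{1}{2\varepsilon^{2}}\ln\frac{2}{\delta}, \qquad \varepsilon=\delta=0.01 \;\Rightarrow\; n \;\ge\; \frac{\ln 200}{2\times 10^{-4}} \approx 2.6\times 10^{4}, $$

whereas a rate scaling as (1/ε) ln(1/ε) grows only like 100 ln 100 ≈ 460 at the same accuracy, up to constants.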