
Showing papers on "Statistical learning theory published in 2005"


Journal ArticleDOI
TL;DR: Techniques and algorithms developed in the framework of Statistical Learning Theory are applied to the problem of determining the location of a wireless device by measuring the signal strength values from a set of access points (location fingerprinting), with the advantage of a low algorithmic complexity in the normal operating phase.

602 citations


Journal ArticleDOI
TL;DR: In this article, the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces are briefly described, with particular emphasis on a description of the so-called ν-SVM.
Abstract: We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called ν-SVM, including details of the algorithm and its implementation, theoretical results, and practical applications. Copyright © 2005 John Wiley & Sons, Ltd.
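The soft-margin idea underlying SVMs can be illustrated with a minimal hinge-loss classifier trained by subgradient descent. This is a sketch of the general principle only, not the ν-parameterized variant the paper details; the toy data, learning rate, and regularization constant below are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the regularized hinge loss
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), with y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                  # points violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable problem
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```

In the ν-SVM the paper describes, the parameter ν replaces the usual cost constant and bounds the fraction of margin errors; the subgradient sketch above only conveys the shared max-margin objective.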

410 citations


Journal ArticleDOI
TL;DR: This paper investigated several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), and formulated differentiation of malignant from benign MCs as a supervised learning problem, and applied these learning methods to develop the classification algorithm.
Abstract: In this paper, we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs). The classifier is part of a computer-aided diagnosis (CADx) scheme that is aimed at assisting radiologists in making more accurate diagnoses of breast cancer on mammograms. The methods we considered were: support vector machine (SVM), kernel Fisher discriminant (KFD), relevance vector machine (RVM), and committee machines (ensemble averaging and AdaBoost), of which most have been developed recently in statistical learning theory. We formulated differentiation of malignant from benign MCs as a supervised learning problem, and applied these learning methods to develop the classification algorithm. As input, these methods used image features automatically extracted from clustered MCs. We tested these methods using a database of 697 clinical mammograms from 386 cases, which included a wide spectrum of difficult-to-classify cases. We analyzed the distribution of the cases in this database using the multidimensional scaling technique, which reveals that in the feature space the malignant cases are not trivially separable from the benign ones. We used receiver operating characteristic (ROC) analysis to evaluate and to compare classification performance by the different methods. In addition, we also investigated how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD, and RVM) yielded the best performance (A_z = 0.85, SVM), significantly outperforming a well-established, clinically proven CADx approach that is based on a neural network (A_z = 0.80).
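The A_z values above are areas under the ROC curve. As a minimal sketch of how such an area can be computed from classifier scores, the equivalent Wilcoxon-Mann-Whitney rank statistic counts how often a positive case outscores a negative one; the scores below are invented for illustration, not taken from the paper.

```python
def auc_roc(scores_pos, scores_neg):
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney statistic:
    the probability that a random positive scores higher than a random
    negative, with ties counting half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical scores: malignant (positive) cases tend to score higher
az = auc_roc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2])
```

A perfect separation of the two score lists gives A_z = 1.0, while chance-level scoring gives 0.5.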

305 citations


Journal ArticleDOI
TL;DR: The two classifiers are compared in off-line signature verification using random, simple, and simulated forgeries, to observe their capability to absorb intrapersonal variability and highlight interpersonal similarity.

199 citations


Journal ArticleDOI
TL;DR: It is shown that the coarse-grained and fine-grained localization problems for ad hoc sensor networks can be posed and solved as a pattern recognition problem using kernel methods from statistical learning theory, and a simple and effective localization algorithm is derived.
Abstract: We show that the coarse-grained and fine-grained localization problems for ad hoc sensor networks can be posed and solved as a pattern recognition problem using kernel methods from statistical learning theory. This stems from an observation that the kernel function, which is a similarity measure critical to the effectiveness of a kernel-based learning algorithm, can be naturally defined in terms of the matrix of signal strengths received by the sensors. Thus we work in the natural coordinate system provided by the physical devices. This not only allows us to sidestep the difficult ranging procedure required by many existing localization algorithms in the literature, but also enables us to derive a simple and effective localization algorithm. The algorithm is particularly suitable for networks with densely distributed sensors, most of whose locations are unknown. The computations are initially performed at the base sensors, and the computation cost depends only on the number of base sensors. The localization step for each sensor of unknown location is then performed locally in linear time. We present an analysis of the localization error bounds, and provide an evaluation of our algorithm on both simulated and real sensor networks.
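The core idea — defining the kernel directly on received-signal-strength vectors — can be sketched with a Gaussian kernel and a simple kernel-weighted vote over training sensors with known regions. This is a toy illustration under assumed data; the paper's actual training algorithm and error analysis go further.

```python
import math

def gaussian_kernel(s1, s2, sigma=5.0):
    """Similarity of two signal-strength vectors (one entry per base sensor)."""
    d2 = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return math.exp(-d2 / (2 * sigma ** 2))

def locate(signal, train):
    """Coarse-grained localization: kernel-weighted vote over training
    sensors with known regions. `train` is a list of (rss_vector, region)."""
    votes = {}
    for rss, region in train:
        votes[region] = votes.get(region, 0.0) + gaussian_kernel(signal, rss)
    return max(votes, key=votes.get)

# Hypothetical RSS readings (dBm) from two base sensors, with known regions
train = [([-40, -70], "A"), ([-45, -72], "A"),
         ([-75, -42], "B"), ([-70, -38], "B")]
region = locate([-43, -69], train)   # strong signal from base 1
```

Working in the coordinate system of raw signal strengths is what lets the method sidestep the ranging step that many other localization algorithms require.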

198 citations


Journal ArticleDOI
TL;DR: Comparisons between the SVM model and the classical radial basis function (RBF) network demonstrate that the SVM is superior to the conventional RBF network in predicting air quality parameters across different time series, with better generalization performance than the RBF model.

197 citations


Journal ArticleDOI
TL;DR: A multi-layer SVM classifier is applied to fault diagnosis of power transformers for the first time in this paper, and results show that the classifier has excellent performance in training speed and reliability.

193 citations


Journal ArticleDOI
TL;DR: This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions except for a collection of independent and identically distributed (i.i.d.) training examples from each hypothesis and demonstrates that several concepts from statistical learning theory have counterparts in the NP context.
Abstract: The Neyman-Pearson (NP) approach to hypothesis testing is useful in situations where different types of error have different consequences or a priori probabilities are unknown. For any α > 0, the NP lemma specifies the most powerful test of size α, but assumes the distributions for each hypothesis are known or (in some cases) the likelihood ratio is monotonic in an unknown parameter. This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions except for a collection of independent and identically distributed (i.i.d.) training examples from each hypothesis. Building on a "fundamental lemma" of Cannon et al., we demonstrate that several concepts from statistical learning theory have counterparts in the NP context. Specifically, we consider constrained versions of empirical risk minimization (NP-ERM) and structural risk minimization (NP-SRM), and prove performance guarantees for both. General conditions are given under which NP-SRM leads to strong universal consistency. We also apply NP-SRM to (dyadic) decision trees to derive rates of convergence. Finally, we present explicit algorithms to implement NP-SRM for histograms and dyadic decision trees.
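NP-ERM can be illustrated on the simple class of threshold classifiers: among thresholds whose empirical false-alarm rate on the null-class samples stays at or below α, pick the one minimizing the empirical miss rate on the alternative-class samples. The training samples below are made up for illustration; the paper works with far richer classes such as dyadic decision trees.

```python
def np_erm_threshold(x0, x1, alpha):
    """NP-ERM over threshold classifiers h_t(x) = 1[x > t].

    x0: i.i.d. samples from the null (class 0); x1: from the alternative.
    Among thresholds whose empirical false-alarm rate on x0 is <= alpha,
    return the one minimizing the empirical miss rate on x1."""
    candidates = sorted(set(x0) | set(x1))
    best_t, best_miss = None, float("inf")
    for t in candidates:
        false_alarm = sum(x > t for x in x0) / len(x0)
        if false_alarm > alpha:
            continue                      # violates the size constraint
        miss = sum(x <= t for x in x1) / len(x1)
        if miss < best_miss:
            best_t, best_miss = t, miss
    return best_t, best_miss

x0 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # null samples
x1 = [0.8, 0.9, 1.1, 1.2, 1.3, 1.4]                      # alternative samples
t, miss = np_erm_threshold(x0, x1, alpha=0.2)
```

The constrained search mirrors the NP lemma: the false-alarm budget α is enforced empirically, and power (one minus the miss rate) is maximized subject to it.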

164 citations


Journal ArticleDOI
TL;DR: The main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces are described, with particular emphasis on a description of the so-called ν-SVM.
Abstract: We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called ν-SVM, in...

152 citations


Journal ArticleDOI
TL;DR: A method is presented for estimating preference models that can be highly nonlinear and robust to noise; based on computationally efficient optimization techniques, it can be useful for analyzing large amounts of noisy data and for estimating interactions among product features.
Abstract: We introduce methods from statistical learning theory to the field of conjoint analysis for preference modeling. We present a method for estimating preference models that can be highly nonlinear and robust to noise. Like recently developed polyhedral methods for conjoint analysis, our method is based on computationally efficient optimization techniques. We compare our method with standard logistic regression, hierarchical Bayes, and the polyhedral methods using standard, widely used simulation data. The experiments show that the proposed method handles noise significantly better than both logistic regression and the recent polyhedral methods and is never worse than the best method among the three mentioned above. It can also be used for estimating nonlinearities in preference models faster and better than all other methods. Finally, a simple extension for handling heterogeneity shows promising results relative to hierarchical Bayes. The proposed method can therefore be useful, for example, for analyzing large amounts of data that are noisy or for estimating interactions among product features.

123 citations


Journal ArticleDOI
TL;DR: The proposed PPSVM is a natural and analytical extension of regular SVMs based on statistical learning theory, and is closer to the Bayes optimal classifier without knowledge of the distributions.
Abstract: This paper proposes a complete framework of posterior probability support vector machines (PPSVMs) for weighted training samples using modified concepts of risks, linear separability, margin, and optimal hyperplane. Within this framework, a new optimization problem for unbalanced classification problems is formulated and a new concept of support vectors established. Furthermore, a soft PPSVM with an interpretable parameter ν is obtained which is similar to the ν-SVM developed by Schölkopf et al., and an empirical method for determining the posterior probability is proposed as a new approach to determine ν. The main advantage of a PPSVM classifier lies in the fact that it is closer to the Bayes optimal without knowing the distributions. To validate the proposed method, two synthetic classification examples are used to illustrate the logical correctness of PPSVMs and their relationship to regular SVMs and Bayesian methods. Several other classification experiments are conducted to demonstrate that the performance of PPSVMs is better than regular SVMs in some cases. Compared with fuzzy support vector machines (FSVMs), the proposed PPSVM is a natural and analytical extension of regular SVMs based on statistical learning theory.

Journal ArticleDOI
TL;DR: Support vector machines from statistical learning theory divide a set of labelled credit applicants into subsets of ‘typical’ and ‘critical’ patterns; linear discriminant analysis with prior training subset selection via SVM leads to improved generalization.
Abstract: Credit applicants are assigned to good or bad risk classes according to their record of defaulting. Each applicant is described by a high-dimensional input vector of situational characteristics and by an associated class label. A statistical model, which maps the inputs to the labels, can decide whether a new credit applicant should be accepted or rejected, by predicting the class label given the new inputs. Support vector machines (SVM) from statistical learning theory can build such models from the data, requiring extremely weak prior assumptions about the model structure. Furthermore, SVM divide a set of labelled credit applicants into subsets of ‘typical’ and ‘critical’ patterns. The correct class label of a typical pattern is usually very easy to predict, even with linear classification methods. Such patterns do not contain much information about the classification boundary. The critical patterns (the support vectors) contain the less trivial training examples. For instance, linear discriminant analysis with prior training subset selection via SVM also leads to improved generalization. Using non-linear SVM, more ‘surprising’ critical regions may be detected, but owing to the relative sparseness of the data, this potential seems to be limited in credit scoring practice.

Book ChapterDOI
Yu Zhao, Bing Li, Xiu Li, Wenhuang Liu, Shouju Ren
22 Jul 2005
TL;DR: An improved one-class Support Vector Machine (SVM) is introduced in this paper to predict customer churn in the wireless industry, and the method is shown to perform very well compared with other traditional methods: ANN, Decision Tree, and Naive Bayes.
Abstract: Customer Churn Prediction is an increasingly pressing issue in today's ever-competitive commercial arena. Although there is prior research on churn prediction, the accuracy rate, which is very important to business, is not high enough. Recently, Support Vector Machines (SVMs), based on statistical learning theory, have been gaining applications in the areas of data mining, machine learning, computer vision and pattern recognition because of their high accuracy and good generalization capability. However, there has been no report on applying SVMs to Customer Churn Prediction. Because of a characteristic of the churn data set, namely that the number of negative examples is very small, we introduce an improved one-class SVM. We have tested our method on a wireless-industry customer churn data set. Our method has been shown to perform very well compared with other traditional methods: ANN, Decision Tree, and Naive Bayes.

Journal ArticleDOI
TL;DR: New learning methods tolerant to imprecision are introduced and applied to fuzzy modeling based on the Takagi-Sugeno-Kang fuzzy system, resulting in a model with minimal Vapnik-Chervonenkis dimension, improved generalization ability, and robustness to outliers.
Abstract: In this paper, new learning methods tolerant to imprecision are introduced and applied to fuzzy modeling based on the Takagi-Sugeno-Kang fuzzy system. Fuzzy modeling has an intrinsic inconsistency: it may perform thinking tolerant to imprecision, but its learning methods are zero-tolerant to imprecision. The proposed methods make it possible to exclude this intrinsic inconsistency of fuzzy modeling, where zero-tolerance learning is used to obtain a fuzzy model tolerant to imprecision. These new methods can be called ε-insensitive learning or ε learning, where, in order to fit the fuzzy model to real data, the ε-insensitive loss function is used. This leads to a weighted or "fuzzified" version of Vapnik's support vector regression machine. This paper introduces two approaches to solving the ε-insensitive learning problem. The first approach leads to a quadratic programming problem with bound constraints and one linear equality constraint. The second approach leads to a problem of solving a system of linear inequalities. Two computationally efficient numerical methods for ε-insensitive learning are proposed. The ε-insensitive learning leads to a model with minimal Vapnik-Chervonenkis dimension, which results in an improved generalization ability of the model and robustness to outliers. Finally, numerical examples are given to demonstrate the validity of the introduced methods.
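The ε-insensitive loss itself can be sketched with a plain subgradient-descent fit of a linear model: residuals smaller than ε cost nothing, larger ones cost linearly. The paper instead solves a quadratic program or a system of linear inequalities; the toy data and step sizes here are illustrative assumptions.

```python
import numpy as np

def eps_insensitive_fit(X, y, eps=0.1, lam=0.01, lr=0.05, epochs=500):
    """Linear fit under Vapnik's eps-insensitive loss via subgradient descent.
    The subgradient of max(0, |r| - eps) w.r.t. the prediction is
    sign(r) outside the eps-tube and 0 inside it."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        r = X @ w + b - y
        s = np.where(r > eps, 1.0, np.where(r < -eps, -1.0, 0.0))
        w -= lr * (lam * w + (s @ X) / n)
        b -= lr * s.mean()
    return w, b

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])        # underlying line y = x
w, b = eps_insensitive_fit(X, y)
```

Because points inside the ε-tube contribute no gradient, the fit depends only on points at or outside the tube boundary — the analogue of support vectors in support vector regression.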

Proceedings ArticleDOI
07 Nov 2005
TL;DR: A set of unusual behavior detection algorithm is presented in this paper based on support vector machine (SVM) in order to take the place of traditional predefined-rule suspicious transaction data filtering system.
Abstract: Statistical learning theory (SLT) is introduced to improve the embarrassments of anti-money laundering (AML) intelligence collection. A set of unusual behavior detection algorithm is presented in this paper based on support vector machine (SVM) in order to take the place of traditional predefined-rule suspicious transaction data filtering system. It could efficiently surmount the worst forms of suspicious data analyzing and reporting mechanism among bank branches including enormous data volume, dimensionality disorder with massive variances and feature overload.

01 Jan 2005
TL;DR: This paper argues that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework and suggests that the main replacements of generalization bounds should be convergence proofs and stability considerations.
Abstract: The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular we argue that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements of generalization bounds should be convergence proofs and stability considerations. This paper should be considered as a road map paper which identifies important questions and potentially fruitful directions for future research about statistical clustering. We do not attempt to present a complete statistical theory of clustering.
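A toy version of the stability consideration advocated here: cluster two halves of a sample and measure how much the induced labelings of the full data agree, up to a permutation of cluster labels. The tiny 1-D k-means and data below are illustrative assumptions, not a method from the paper.

```python
import random

def kmeans1d(xs, k=2, iters=30, seed=0):
    """Tiny 1-D k-means, just enough to illustrate the stability idea."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def assign(xs, centers):
    return [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            for x in xs]

data = [0.0, 0.1, 0.2, 0.3, 9.7, 9.8, 9.9, 10.0]   # two well-separated blobs
half_a, half_b = data[::2], data[1::2]
la = assign(data, kmeans1d(half_a, seed=1))
lb = assign(data, kmeans1d(half_b, seed=2))
# Agreement up to label permutation (k=2): high agreement = stable clustering
agree = max(sum(a == b for a, b in zip(la, lb)),
            sum(a != b for a, b in zip(la, lb))) / len(data)
```

On well-separated blobs the two subsample clusterings induce the same partition, so the agreement is perfect; unstable structure would show up as agreement closer to chance.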

Journal ArticleDOI
TL;DR: This paper presents a systematic optimization-based approach for customer demand forecasting through support vector regression (SVR) analysis based on the recently developed statistical learning theory (Vapnik, 1998) and its applications on SVR.
Abstract: This paper presents a systematic optimization-based approach for customer demand forecasting through support vector regression (SVR) analysis. The proposed methodology is based on the recently developed statistical learning theory (Vapnik, 1998) and its applications on SVR. The proposed three-step algorithm comprises both nonlinear programming (NLP) and linear programming (LP) mathematical model formulations to determine the regression function while the final step employs a recursive methodology to perform customer demand forecasting. Based on historical sales data, the algorithm features an adaptive and flexible regression function able to identify the underlying customer demand patterns from the available training points so as to capture customer behaviour and derive an accurate forecast. The applicability of our proposed methodology is demonstrated by a number of illustrative examples.

01 Jan 2005
TL;DR: This paper describes the Support Vector Machine technology, its relation to the main ideas of Statistical Learning Theory, and shows a universal nature of SVMs.
Abstract: This paper describes the Support Vector Machine (SVM) technology, its relation to the main ideas of Statistical Learning Theory, and shows a universal nature of SVMs. It also contains examples that show a high level of generalization ability of SVMs.

Journal ArticleDOI
TL;DR: The experimental results show that finding the splitting hyperplane is not a trivial task, and that GSVM-AR does show significant improvement compared to building one single SVM in the whole feature space; the utility of GSVM-AR is also high because it is easy to implement.

Journal ArticleDOI
TL;DR: The SVM‐based reconstruction is used to develop time series forecasts for multiple lead times ranging from 2 weeks to several months and is able to extract the dynamics using only a few past observed data points out of the training examples.
Abstract: The reconstruction of low-order nonlinear dynamics from the time series of a state variable has been an active area of research in the last decade. The 154 year long, biweekly time series of the Great Salt Lake volume has been analyzed by many researchers from this perspective. In this study, we present the application of a powerful state space reconstruction methodology using the method of support vector machines (SVM) to this data set. SVM are machine learning systems that use a hypothesis space of linear functions in a kernel-induced higher-dimensional feature space. SVM are optimized by minimizing a bound on a generalized error (risk) measure rather than just the mean square error over a training set. Under Mercer's conditions on the kernels the corresponding optimization problems are convex; hence global optimal solutions can be readily computed. The SVM-based reconstruction is used to develop time series forecasts for multiple lead times ranging from 2 weeks to several months. Unlike previously reported methodologies, SVM are able to extract the dynamics using only a few past observed data points out of the training examples. The reliability of the algorithm in learning and forecasting the dynamics is tested using split sample sensitivity analysis, with a particular interest in forecasting extreme states. Efforts are also made to assess variations in predictability as a function of initial conditions and as a function of the degree of extrapolation from the state space used for learning the model.
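The state-space reconstruction step can be sketched as a delay embedding followed by a regression fit. In this sketch ordinary least squares stands in for the SVM regressor, and the noise-free autoregressive toy series is an illustrative assumption, not the Great Salt Lake data.

```python
import numpy as np

def delay_embed(series, dim, tau=1):
    """Build state vectors [x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}]
    with the next value x_{t+1} as the regression target."""
    X, y = [], []
    start = (dim - 1) * tau
    for t in range(start, len(series) - 1):
        X.append([series[t - k * tau] for k in range(dim)])
        y.append(series[t + 1])
    return np.array(X), np.array(y)

# Noise-free toy dynamics: x_{t+1} = 0.6 x_t + 0.3 x_{t-1}
x = [1.0, 0.5]
for _ in range(60):
    x.append(0.6 * x[-1] + 0.3 * x[-2])

X, y = delay_embed(x, dim=2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares stand-in for SVR
forecast = X[-1] @ coef                        # one-step-ahead prediction
```

Replacing the least-squares fit with support vector regression adds the ε-insensitive loss and kernel nonlinearity; the embedding construction is the same.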

Journal Article
LI Qinghua
TL;DR: The general picture and development of the SLT and SVM domain are reviewed, and the current state of research on SVM is introduced.
Abstract: The general picture and development of the SLT and SVM domain are reviewed, and the current state of research on SVM is introduced.

Proceedings ArticleDOI
06 Dec 2005
TL;DR: To increase the linguistic interpretability of the generated rules, a methodology is proposed for extracting fuzzy rules from a trained SVM, where the rules' antecedents are associated with fuzzy sets.
Abstract: This paper proposes a fuzzy rule extraction method from support vector machines. Support vector machines (SVM) are learning systems based on statistical learning theory that have been successfully applied to a wide variety of applications. However, SVM are "black box" models; that is, they generate a solution as a linear combination of kernel functions, which is quite difficult to interpret. Methods for rule extraction from trained SVM have already been proposed; however, the rules generated by these methods have, in their antecedents, intervals or functions. This format decreases the interpretability of the generated rules and jeopardizes the knowledge extraction capability. Hence, to increase the linguistic interpretability of the generated rules, we propose in this paper a methodology for extracting fuzzy rules from a trained SVM, where the rules' antecedents are associated with fuzzy sets.

Book ChapterDOI
TL;DR: This work shows that support vector machines are capable of extracting useful information from financial data, although extensive data sets are required in order to fully utilize their classification power.
Abstract: The purpose of this work is to introduce one of the most promising among recently developed statistical techniques – the support vector machine (SVM) – to corporate bankruptcy analysis. An SVM is implemented for analysing such predictors as financial ratios. A method of adapting it to default probability estimation is proposed. A survey of practically applied methods is given. This work shows that support vector machines are capable of extracting useful information from financial data, although extensive data sets are required in order to fully utilize their classification power.

Proceedings ArticleDOI
07 Nov 2005
TL;DR: This paper predicts elevator traffic flow using least squares support vector machines (LS-SVMs), a kind of SVM with a quadratic loss function, which has strong generalization ability and guarantees global minima for given training data.
Abstract: Elevator traffic flow is fundamental in elevator group control systems. Accurate elevator traffic flow prediction is crucial to the planning and dispatching of elevator group control systems. Support vector machines (SVM) based on statistical learning theory have shown their advantage in regression and prediction. In this paper, we predict elevator traffic flow using least squares support vector machines (LS-SVMs), a kind of SVM with a quadratic loss function. Since SVMs have strong generalization ability and guarantee global minima for given training data, good performance can be expected for elevator traffic flow time series prediction. Using LS-SVMs, we built three elevator traffic flow time series predictors. Experimental results show that the LS-SVM predictions achieve satisfactory performance. The proposed elevator traffic flow time series prediction method is of considerable practical value and can be used in other application fields.
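What distinguishes the LS-SVM is that the quadratic loss turns SVM training into solving one linear system instead of a quadratic program. A minimal regression sketch of that system under a Gaussian kernel follows; the sine-wave data and hyperparameters are illustrative assumptions, not the elevator traffic setup.

```python
import numpy as np

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """LS-SVM regression: solve the linear KKT system
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
               / (2 * sigma ** 2))
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                      # bias b, dual weights alpha

def lssvm_predict(Xq, X, b, alpha, sigma=1.0):
    Kq = np.exp(-np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=2)
                / (2 * sigma ** 2))
    return Kq @ alpha + b

X = np.linspace(0, 2 * np.pi, 20).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y)
pred = lssvm_predict(X, X, b, alpha)
```

The trade-off relative to the standard SVM is that every training point gets a nonzero dual weight, so LS-SVM solutions lose the sparsity of support vectors in exchange for the simpler training step.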

Journal ArticleDOI
01 Jan 2005
TL;DR: A support vector machine, a learning machine based on statistical learning theory, is trained through supervised learning to detect architectural distortion, and produces more accurate classification results in distinguishing architectural distortion abnormality from normal breast parenchyma.
Abstract: This paper investigates detection of architectural distortion in mammographic images using a support vector machine. The Hausdorff dimension is used to characterise the texture feature of mammographic images. A support vector machine, a learning machine based on statistical learning theory, is trained through supervised learning to detect architectural distortion. Compared to radial basis function neural networks, the SVM produced more accurate classification results in distinguishing architectural distortion abnormality from normal breast parenchyma.

Proceedings ArticleDOI
07 Nov 2005
TL;DR: Analysis of the experimental results proved that SVM could achieve greater accuracy and faster speed than the BP neural network.
Abstract: A novel method based on SVM for electric power system short-term load forecasting is presented. The proposed algorithm embodies the structural risk minimization (SRM) principle, giving more general and accurate performance compared to artificial neural networks, which embody the empirical risk minimization (ERM) principle. The theory of the SVM algorithm is based on statistical learning theory. Training of an SVM leads to a quadratic programming problem. In order to improve forecast accuracy, the SVM interpolates among the load and temperature data in a training data set. Analysis of the experimental results proved that SVM could achieve greater accuracy and faster speed than the BP neural network.

Journal ArticleDOI
TL;DR: This paper shows that ensemble generalization error can be calculated by using two order parameters, that is, the similarity between a teacher and a student, and the similarity among students.
Abstract: Ensemble learning of K nonlinear perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to theoretically obtain the generalization error. This paper shows that ensemble generalization error can be calculated by using two order parameters, that is, the similarity between a teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived in the case of general learning rules. The concrete forms of these differential equations are derived analytically in the cases of three well-known rules: Hebbian learning, perceptron learning, and AdaTron (adaptive perceptron) learning. Ensemble generalization errors of these three rules are calculated by using the results determined by solving their differential equations. As a result, these three rules show different characteristics in their affinity for ensemble learning, that is "maintaining variety among students." Results show that AdaTron learning is superior to the other two rules with respect to that affinity.
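The two order parameters can be sketched numerically as normalized overlaps of weight vectors. The teacher-plus-noise construction below is an illustrative assumption standing in for the paper's online-learning dynamics, which derive these overlaps from differential equations.

```python
import numpy as np

def overlap(u, v):
    """Order parameter: normalized overlap (cosine similarity) of two
    perceptron weight vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
N = 2000                                 # input dimension (thermodynamic limit in the paper)
teacher = rng.standard_normal(N)
# Hypothetical students: teacher plus independent unit-variance noise,
# so each is partially correlated with the teacher and with each other
students = [teacher + rng.standard_normal(N) for _ in range(3)]

R = np.mean([overlap(teacher, s) for s in students])   # teacher-student overlap
q = overlap(students[0], students[1])                  # student-student overlap
```

With unit-variance noise, R concentrates near 1/√2 and q near 1/2 for large N; the gap between R and q is the "variety among students" that determines how much an ensemble gains from averaging.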

Journal ArticleDOI
TL;DR: A linear two-layer feed-forward neural network with a back-propagation learning rule has been adapted for strain and displacement sensor fusion in a railway bridge load test, and the trained NN has been used for structural analysis and finite element (FE) model updating.
Abstract: Field testing of bridge vibrations induced by the passage of vehicles is an economic and practical form of bridge load testing. Data processing for this type of test is usually carried out in a system identification framework using output measurement techniques, which are categorized as parametric or nonparametric methods. These methods are based on the theory of probability. Learning theory, which has its origins in the two separate disciplines of statistical learning theory and neural networks, presents an efficient and robust framework for processing data from such tests. In this article, a linear two-layer feed-forward neural network (NN) with a back-propagation learning rule has been adapted for strain and displacement sensor fusion in a railway bridge load test. The trained NN has been used for structural analysis and finite element (FE) model updating.

Proceedings Article
05 Dec 2005
TL;DR: This work proposes a generative model based on the Delaunay graph of the prototypes, with the Expectation-Maximization algorithm used to learn the parameters, as a first step towards the construction of a topological model of a set of points grounded in statistics.
Abstract: Given a set of points and a set of prototypes representing them, how does one create a graph of the prototypes whose topology accounts for that of the points? This problem has not yet been explored in the framework of statistical learning theory. In this work, we propose a generative model based on the Delaunay graph of the prototypes and the Expectation-Maximization algorithm to learn the parameters. This work is a first step towards the construction of a topological model of a set of points grounded in statistics.

Proceedings ArticleDOI
25 Jul 2005
TL;DR: This paper demonstrates the success of the statistical learning theory-based support vector machine (SVM) and the sparse Bayesian learning-based relevance vector machine (RVM) in performing reliable predictions; their prognostic capability will be utilized to achieve high-level inference.
Abstract: There is much concurrent ongoing research to develop, advance and apply new techniques capable of addressing the diverse applications and complexities of data fusion. In this paper we demonstrate the success of the statistical learning theory-based support vector machine (SVM) and the sparse Bayesian learning-based relevance vector machine (RVM) in performing reliable predictions. The prognostic capability of SVM and RVM will be utilized to achieve high-level inference. The plausibility of these techniques is shown by their superior performance in forecasting soil moisture, providing exogenous knowledge.