
Showing papers on "Statistical learning theory published in 2009"


Book
Tie-Yan Liu1
27 Jun 2009
TL;DR: Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches; the relationship between the loss functions used in these approaches and widely used IR evaluation measures is analyzed; and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances in statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.

2,515 citations
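
To make the pairwise approach above concrete, here is a minimal, self-contained sketch (synthetic data and hypothetical parameters, not an algorithm from the tutorial) that learns a linear scoring function by gradient descent on a pairwise hinge loss:

```python
import numpy as np

def pairwise_hinge_rank(X, y, lr=0.1, epochs=200, margin=1.0, seed=0):
    """Learn a linear scorer s(x) = w.x by penalizing every document pair
    (i, j) with y[i] > y[j] whose scores violate the margin:
    loss = max(0, margin - (s(x_i) - s(x_j)))."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = X[i] - X[j]
            if margin - w @ diff > 0:   # pair mis-ordered or inside the margin
                w += lr * diff          # descend on the hinge loss
    return w

# Toy "query": four documents, two features, graded relevance labels.
X = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.2]])
y = np.array([2, 1, 1, 0])              # higher label = more relevant
w = pairwise_hinge_rank(X, y)
print(np.argsort(-X @ w))               # documents ordered by learned score
```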


Journal ArticleDOI
Tie-Yan Liu1
TL;DR: A statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities.
Abstract: Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then empirical evaluations of typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which suggest that the listwise approach is the most effective of the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.

591 citations


Journal ArticleDOI
TL;DR: A Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets, which makes use of Python’s ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages.
Abstract: Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python’s ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.

480 citations
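
PyMVPA's own API is not reproduced here. As a rough, library-agnostic illustration of the classifier-based analysis the paper describes, this sketch uses scikit-learn on synthetic stand-in data (all sizes and names hypothetical): rows are volumes, columns are voxels, and cross-validated decoding accuracy is the quantity of interest.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for an fMRI dataset: 120 volumes x 500 voxels, with a
# weak state-dependent signal planted in the first 20 voxels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 120)                 # two cognitive states
bold = rng.normal(size=(120, 500))
bold[labels == 1, :20] += 0.5

# Cross-validated decoding accuracy with a linear classifier.
scores = cross_val_score(LinearSVC(), bold, labels,
                         cv=StratifiedKFold(n_splits=5))
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```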


Journal ArticleDOI
TL;DR: Support vector machines are a family of machine learning methods originally introduced for the problem of classification and later generalized to various other situations, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision.
Abstract: Support vector machines (SVMs) are a family of machine learning methods, originally introduced for the problem of classification and later generalized to various other situations. They are based on principles of statistical learning theory and convex optimization, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision. Copyright © 2009 John Wiley & Sons, Inc. For further resources related to this article, please visit the WIREs website.

323 citations
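
As a minimal, hedged sketch of the methodology (scikit-learn, synthetic data, default-ish parameters), the following fits a kernel SVM and reports the support vectors that define the maximum-margin decision surface:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("support vectors per class:", clf.n_support_)  # only these define the margin
```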


Book
03 Aug 2009
TL;DR: Knowledge Discovery with Support Vector Machines provides an in-depth, easy-to-follow introduction to support vector machines, drawing only from minimal, carefully motivated technical and mathematical background material.
Abstract: An easy-to-follow introduction to support vector machines. This book provides an in-depth, easy-to-follow introduction to support vector machines, drawing only from minimal, carefully motivated technical and mathematical background material. It begins with a cohesive discussion of machine learning and goes on to cover: knowledge discovery environments; describing data mathematically; linear decision surfaces and functions; perceptron learning; maximum margin classifiers; support vector machines; elements of statistical learning theory; multi-class classification; regression with support vector machines; and novelty detection. Complemented with hands-on exercises, algorithm descriptions, and data sets, Knowledge Discovery with Support Vector Machines is an invaluable textbook for advanced undergraduate and graduate courses. It is also an excellent tutorial on support vector machines for professionals who are pursuing research in machine learning and related areas.

274 citations


Book
01 Aug 2009
TL;DR: The theory achieved here underpins accurate estimation techniques in the presence of singularities and lays the foundations for the use of algebraic geometry in statistical learning theory.
Abstract: Sure to be influential, Watanabe's book lays the foundations for the use of algebraic geometry in statistical learning theory. Many models/machines are singular: mixture models, neural networks, HMMs, Bayesian networks, stochastic context-free grammars are major examples. The theory achieved here underpins accurate estimation techniques in the presence of singularities.

255 citations


Journal ArticleDOI
TL;DR: It is proved that there exists a particular "elastic-net representation" of the regression function such that, as the number of data points increases, the elastic-net estimator is consistent not only for prediction but also for variable/feature selection.

227 citations


01 Jan 2009
TL;DR: The elastic-net regularization scheme proposed by Zou and Hastie is analyzed in a random-design regression setting, where the response variable is vector-valued and the prediction functions are linear combinations of elements (features) in an infinite-dimensional dictionary.
Abstract: Within the framework of statistical learning theory, we analyze in detail the so-called elastic-net regularization scheme proposed by Zou and Hastie [H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B, 67(2) (2005) 301-320] for the selection of groups of correlated variables. To investigate the statistical properties of this scheme, and in particular its consistency properties, we set up a suitable mathematical framework. Our setting is random-design regression, where we allow the response variable to be vector-valued and consider prediction functions which are linear combinations of elements (features) in an infinite-dimensional dictionary. Under the assumption that the regression function admits a sparse representation on the dictionary, we prove that there exists a particular "elastic-net representation" of the regression function such that, as the number of data points increases, the elastic-net estimator is consistent not only for prediction but also for variable/feature selection. Our results include finite-sample bounds and an adaptive scheme for selecting the regularization parameter. Moreover, using convex analysis tools, we derive an iterative thresholding algorithm for computing the elastic-net solution which differs from the optimization procedure originally proposed in the above-cited work.

208 citations
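
The paper's own iterative thresholding algorithm is not reproduced here, but the following proximal-gradient (iterative soft-thresholding) sketch, with hypothetical data and regularization parameters, shows the general shape of such an algorithm for the elastic-net objective:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def elastic_net_ista(X, y, lam1=0.1, lam2=0.1, iters=500):
    """Minimize 0.5*||Xw - y||^2 + lam1*||w||_1 + 0.5*lam2*||w||^2
    by proximal gradient descent (iterative soft-thresholding)."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam2)   # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) + lam2 * w           # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam1)
    return w

# Correlated design: the l2 term lets groups of correlated features enter together.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(100, 5)), rng.normal(size=(100, 15))])
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=100)
print(np.round(elastic_net_ista(X, y), 2))
```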


Journal ArticleDOI
TL;DR: The experimental results indicate that the SVMG method can achieve higher diagnostic accuracy than the IEC three-ratio method, a normal SVM classifier, and an artificial neural network.
Abstract: Diagnosis of potential faults concealed inside power transformers is key to ensuring a stable electrical power supply to consumers. The support vector machine (SVM) is a machine learning method based on statistical learning theory and a powerful tool for problems with small samples, nonlinearity, and high dimensionality. The selection of SVM parameters has an important influence on the classification accuracy of SVM; however, it is very difficult to select appropriate SVM parameters. In this study, a support vector machine with a genetic algorithm (SVMG) is applied to the fault diagnosis of a power transformer, in which the genetic algorithm (GA) is used to select appropriate free parameters for the SVM. Experimental data from several electric power companies in China are used to illustrate the performance of the proposed SVMG model. The experimental results indicate that the SVMG method can achieve higher diagnostic accuracy than the IEC three-ratio method, a normal SVM classifier, and an artificial neural network.

192 citations
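
The paper's specific GA encoding and operators are not reproduced here; the following sketch shows the general recipe with hypothetical population sizes, operators, and parameter ranges: a small genetic algorithm searching over log-scaled (C, gamma), with cross-validated accuracy as the fitness.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

def fitness(genome):
    """Cross-validated accuracy of an SVM with parameters decoded from the genome."""
    C, gamma = 10.0 ** genome            # genome holds log10(C) and log10(gamma)
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

rng = np.random.default_rng(0)
pop = rng.uniform([-1, -4], [3, 0], size=(12, 2))    # initial population
for gen in range(15):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-6:]]           # truncation selection
    kids = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(6, size=2)]
        child = np.where(rng.random(2) < 0.5, a, b)  # uniform crossover
        child += rng.normal(scale=0.2, size=2)       # Gaussian mutation
        kids.append(child)
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(g) for g in pop])]
print("best log10(C), log10(gamma):", best)
```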


Journal ArticleDOI
TL;DR: A randomized algorithm is proposed that provides a probabilistic solution circumventing the potential conservatism of the bounds previously derived, and it is proved that the required sample size is inversely proportional to the accuracy for fixed confidence.
Abstract: In this paper, we study two general semi-infinite programming problems by means of a randomized strategy based on statistical learning theory. The sample size results obtained with this approach are generally considered to be very conservative by the control community. The first main contribution of this paper is to demonstrate that this is not necessarily the case. Utilizing as a starting point one-sided results from statistical learning theory, we obtain bounds on the number of required samples that are manageable for "reasonable" values of probabilistic confidence and accuracy. In particular, we show that the number of required samples grows with the accuracy parameter ε as (1/ε)ln(1/ε), and this is a significant improvement when compared to the existing bounds, which depend on (1/ε²)ln(1/ε²). Secondly, we present new results for optimization and feasibility problems involving Boolean expressions consisting of polynomials. In this case, when the accuracy parameter is sufficiently small, an explicit bound that only depends on the number of decision variables, and on the confidence and accuracy parameters is presented. For convex optimization problems, we also prove that the required sample size is inversely proportional to the accuracy for fixed confidence. Thirdly, we propose a randomized algorithm that provides a probabilistic solution circumventing the potential conservatism of the bounds previously derived.

191 citations
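
To see why the improved rate matters, the following back-of-the-envelope computation (constants and confidence terms ignored) compares the growth of the two bounds as the accuracy parameter shrinks:

```python
import numpy as np

# Compare the growth of the improved bound (1/eps)*ln(1/eps) with the
# classical (1/eps^2)*ln(1/eps^2) as the accuracy parameter eps shrinks.
for eps in [0.1, 0.01, 0.001]:
    new = (1 / eps) * np.log(1 / eps)
    old = (1 / eps ** 2) * np.log(1 / eps ** 2)
    print(f"eps={eps:g}:  (1/eps)ln(1/eps)={new:,.0f}   (1/eps^2)ln(1/eps^2)={old:,.0f}")
```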


Book
09 Jun 2009
TL;DR: A textbook on machine learning for geospatial data, covering exploratory spatial data analysis, geostatistics, artificial neural networks, and support vector machines and kernel methods, with accompanying software.
Abstract: Contents:
Preface
Learning from Geospatial Data: problems and important concepts of machine learning; machine learning algorithms for geospatial data; contents of the book; software description; short review of the literature
Exploratory Spatial Data Analysis. Presentation of Data and Case Studies: exploratory spatial data analysis; data pre-processing; spatial correlations: variography; presentation of data; k-nearest neighbours algorithm: a benchmark model for regression and classification; conclusions to chapter
Geostatistics: spatial predictions; geostatistical conditional simulations; spatial classification; software; conclusions
Artificial Neural Networks: introduction; radial basis function neural networks; general regression neural networks; probabilistic neural networks; self-organising maps; Gaussian mixture models and mixture density networks; conclusions
Support Vector Machines and Kernel Methods: introduction to statistical learning theory; support vector classification; spatial data classification with SVM; support vector regression; advanced topics in kernel methods
References. Index

Journal ArticleDOI
TL;DR: A statistical analysis shows that the generalization error afforded agents by the collaborative training algorithm can be bounded in terms of the relationship between the network topology and the representational capacity of the relevant reproducing kernel Hilbert space.
Abstract: In this paper, an algorithm is developed for collaboratively training networks of kernel-linear least-squares regression estimators. The algorithm is shown to distributively solve a relaxation of the classical centralized least-squares regression problem. A statistical analysis shows that the generalization error afforded agents by the collaborative training algorithm can be bounded in terms of the relationship between the network topology and the representational capacity of the relevant reproducing kernel Hilbert space. Numerical experiments suggest that the algorithm is effective at reducing noise. The algorithm is relevant to the problem of distributed learning in wireless sensor networks by virtue of its exploitation of local communication. Several new questions for statistical learning theory are proposed.

Proceedings Article
02 May 2009
TL;DR: In this paper, the authors present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand: the first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the third is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier.
Abstract: An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical theory offers rank-based statistics for this task. However, these statistics depend on a fixed set of characteristics of the underlying distribution. Thus, they work well whenever the change in the underlying distribution affects the properties measured by the statistic, but they perform poorly if the drift influences the characteristics captured by the test statistic only to a small degree. To address this problem, we show how uniform convergence bounds in learning theory can be adjusted for adaptive concept drift detection. In particular, we present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand. The first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the third is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier. We compare these new approaches with the maximum mean discrepancy method, the StreamKrimp system, and the multivariate Wald–Wolfowitz test. The results indicate that the new methods are able to detect concept drift reliably and that they perform favorably in a precision-recall analysis. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 311-327, 2009
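
The margin-based test can be instantiated in several ways; one common reading, sketched below with synthetic data, is a classifier two-sample test: train a sparse (1-norm-style) linear classifier to separate the reference window from the current one, and flag drift when held-out separation exceeds chance. The paper's exact statistic and calibration differ.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def drift_score(ref, cur, seed=0):
    """Train an l1-regularized linear classifier to separate the reference
    window from the current one; held-out accuracy near 0.5 means the
    windows look alike, accuracy near 1.0 signals drift."""
    X = np.vstack([ref, cur])
    y = np.r_[np.zeros(len(ref)), np.ones(len(cur))]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = LinearSVC(penalty="l1", dual=False).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 10))
same = rng.normal(size=(200, 10))                   # no drift
drifted = rng.normal(size=(200, 10)) + [0.8] * 10   # mean shift
print("no drift:", drift_score(ref, same))          # expect ~0.5
print("drift:   ", drift_score(ref, drifted))       # expect close to 1.0
```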

Journal ArticleDOI
TL;DR: A robust new scheme is presented in this paper for optimally selecting values of the parameters especially that of the scale parameter of the Gaussian kernel function involved in the training of the SVDD model.

01 Jan 2009
TL;DR: Bounds on the rates of convergence achievable by active learning are derived, under various noise models and under general conditions on the hypothesis class.
Abstract: I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning, under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study generalizations of active learning to more general forms of interactive statistical learning.

Journal ArticleDOI
TL;DR: An overview of statistical learning theory methods is provided and their potential for greater use in risk analysis is discussed.

Journal ArticleDOI
TL;DR: The suitability of smooth support vector machines (SSVM) is investigated, along with how important factors such as the selection of appropriate accounting ratios (predictors), the length of the training period, and the structure of the training sample influence the precision of prediction.
Abstract: Variable Selection and Oversampling in the Use of Smooth Support Vector Machines for Predicting the Default Risk of Companies

Book ChapterDOI
09 Apr 2009
TL;DR: Support vector machines (SVM) are among the most robust and accurate of all well-known data mining algorithms; they have a sound theoretical foundation rooted in statistical learning theory, require as few as a dozen examples for training, and are insensitive to the number of dimensions.
Abstract: Support vector machines (SVMs), including the support vector classifier (SVC) and support vector regressor (SVR), are among the most robust and accurate of all well-known data mining algorithms. SVMs, which were originally developed by Vapnik in the 1990s [1-11], have a sound theoretical foundation rooted in statistical learning theory, require only as few as a dozen examples for training, and are often insensitive to the number of dimensions. In the past decade, SVMs have developed at a fast pace in both theory and practice.

Journal ArticleDOI
TL;DR: This paper compares the FA/regularization and VC/risk minimization methodologies in terms of their underlying theoretical assumptions, and empirically illustrates the differences between the two when data is sparse and/or the input distribution is non-uniform.

Proceedings ArticleDOI
02 Oct 2009
TL;DR: The experimental results of a negative feedback amplifier circuit indicate that the GA-SVM method can achieve higher diagnostic accuracy than a normal SVM classifier and an artificial neural network.
Abstract: A soft fault diagnosis method for analog circuits based on the support vector machine (SVM) is developed in this paper. The SVM is a machine learning method based on statistical learning theory and a powerful tool for problems with small samples, nonlinearity, and high dimensionality. Multi-classification SVM methods, including one-versus-rest, one-versus-one, and the decision directed acyclic graph (DDAG), have been applied to many areas, and some researchers have used them in the fault diagnosis of analog circuits. The selection of SVM parameters has an important influence on the classification accuracy of SVM; however, it is very difficult to select appropriate SVM parameters. In this study, a support vector machine with a genetic algorithm (GA-SVM) is applied to fault diagnosis, in which the genetic algorithm (GA) is used to select appropriate parameters of the SVM. The experimental results of a negative feedback amplifier circuit indicate that the GA-SVM method can achieve higher diagnostic accuracy than a normal SVM classifier and an artificial neural network.

Journal Article
TL;DR: A combination of support vector machines (SVM) and wavelet-based subband image decomposition is investigated; decision making is performed in two stages: feature extraction by computing the wavelet coefficients, and classification using a classifier trained on the extracted features.
Abstract: In this paper, we investigate an approach for the classification of mammographic masses as benign or malignant. This study relies on a combination of support vector machines (SVM) and wavelet-based subband image decomposition. Decision making was performed in two stages: feature extraction by computing the wavelet coefficients, and classification using the classifier trained on the extracted features. The SVM, a learning machine based on statistical learning theory, was trained through supervised learning to classify masses. The research involved 66 digitized mammographic images. The masses were segmented manually by radiologists prior to introduction to the classification system. Preliminary tests on the mammograms showed over 84.8% classification accuracy using the SVM with a Radial Basis Function (RBF) kernel. Confusion matrices and accuracy, sensitivity, and specificity analyses with different kernel types were also used to show the classification performance of the SVM.
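
As a hedged sketch of the two-stage pipeline (wavelet feature extraction followed by SVM classification), the following code uses the PyWavelets package and scikit-learn on synthetic stand-in images; the paper's wavelet family, feature set, and data are not reproduced.

```python
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def wavelet_energy_features(image, wavelet="db4", level=2):
    """Decompose an image into wavelet subbands and return the mean energy
    of each subband as a feature vector."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    feats = [np.mean(np.square(coeffs[0]))]           # approximation energy
    for detail in coeffs[1:]:                          # (cH, cV, cD) per level
        feats.extend(np.mean(np.square(band)) for band in detail)
    return np.array(feats)

# Synthetic stand-in for 66 segmented mass ROIs: label 1 adds extra structure.
rng = np.random.default_rng(0)
images, labels = [], []
for k in range(66):
    img = rng.normal(size=(64, 64))
    if k % 2:
        img += 0.5 * rng.normal(size=(64, 64)).cumsum(axis=0)
    images.append(img)
    labels.append(k % 2)
X = np.array([wavelet_energy_features(im) for im in images])

clf = SVC(kernel="rbf", gamma="scale")
print("CV accuracy:", cross_val_score(clf, X, np.array(labels), cv=5).mean())
```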

Journal ArticleDOI
TL;DR: It is discussed how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.
Abstract: We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.

Book ChapterDOI
TL;DR: In this chapter, one of the most popular and intuitive prototype-based classification algorithms, learning vector quantization (LVQ), is revisited, and recent extensions towards automatic metric adaptation are introduced.
Abstract: In this chapter, one of the most popular and intuitive prototype-based classification algorithms, learning vector quantization (LVQ), is revisited, and recent extensions towards automatic metric adaptation are introduced. Metric adaptation schemes extend LVQ in two respects. On the one hand, greater flexibility is achieved, since the metric, which is essential for the classification, is adapted to the given classification task at hand. On the other hand, better interpretability of the results is gained, since the metric parameters reveal the relevance of single dimensions, as well as correlations, which are important for the classification. The flexibility of the metric can thereby be scaled from a simple diagonal term to full matrices attached locally to the single prototypes. These choices result in a more complex form of the classification boundaries of the models, whereby the excellent inherent generalization ability of the classifier is maintained, as can be shown by means of statistical learning theory.
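
A minimal sketch of metric adaptation in LVQ, assuming a simple relevance-LVQ style update of a diagonal metric; the chapter's gradient-based schemes (e.g. GRLVQ/GMLVQ) are more principled, and everything below is an illustrative simplification:

```python
import numpy as np

def train_rlvq(X, y, lr_w=0.05, lr_l=0.01, epochs=30, seed=0):
    """LVQ1 with an adaptive diagonal metric: d(x, w) = sum_i lam_i (x_i - w_i)^2.
    Relevance heuristic: shrink the weight of dimensions that deviate on
    correct wins, grow it on incorrect wins."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    W = np.array([X[y == c][rng.integers((y == c).sum())] for c in classes], float)
    pl = classes.copy()                               # prototype labels
    lam = np.full(X.shape[1], 1.0 / X.shape[1])       # relevance weights
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = (lam * (X[i] - W) ** 2).sum(axis=1)   # weighted distances
            k = np.argmin(d)                          # winning prototype
            sign = 1.0 if pl[k] == y[i] else -1.0
            W[k] += sign * lr_w * (X[i] - W[k])       # LVQ1 prototype update
            lam -= sign * lr_l * np.abs(X[i] - W[k])  # relevance update
            lam = np.clip(lam, 1e-9, None)
            lam /= lam.sum()
    return W, pl, lam

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)            # only feature 0 carries class information
W, pl, lam = train_rlvq(X, y)
print("learned relevances:", np.round(lam, 2))  # should concentrate on feature 0
```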

Journal ArticleDOI
TL;DR: The paper solves stochastic optimization problems in reproducing kernel Hilbert spaces by sample average approximation combined with Tikhonov regularization, and establishes sufficient conditions for uniform convergence of approximate solutions with probability one, jointly with a rule for downward adjustment of the regularization factor with increasing sample size.
Abstract: The paper studies stochastic optimization problems in Reproducing Kernel Hilbert Spaces (RKHS). The objective function of such problems is a mathematical expectation functional depending on decision rules (or strategies), i.e. on functions of observed random parameters. Feasible rules are restricted to belong to an RKHS. This kind of problem arises in on-line decision making and in statistical learning theory. We solve the problem by sample average approximation combined with Tikhonov regularization and establish sufficient conditions for uniform convergence of approximate solutions with probability one, jointly with a rule for downward adjustment of the regularization factor with increasing sample size.
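
The setting can be mimicked, very loosely, with kernel ridge regression: the empirical objective is a sample average plus a Tikhonov term in the RKHS norm, and the regularization factor is adjusted downward as the sample grows. The n^(-1/2) schedule below is purely illustrative, not the paper's rule.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

def draw(n):
    """Sample n noisy observations of the unknown decision rule sin(x)."""
    x = rng.uniform(-3, 3, size=(n, 1))
    return x, np.sin(x).ravel() + 0.1 * rng.normal(size=n)

x_test = np.linspace(-3, 3, 200)[:, None]
f_true = np.sin(x_test).ravel()
for n in [20, 80, 320]:
    X, y = draw(n)
    alpha = 1.0 / np.sqrt(n)          # regularization adjusted downward with n
    model = KernelRidge(kernel="rbf", gamma=0.5, alpha=alpha).fit(X, y)
    mse = np.mean((model.predict(x_test) - f_true) ** 2)
    print(f"n={n:4d}  alpha={alpha:.3f}  test MSE={mse:.4f}")
```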

Proceedings ArticleDOI
04 Nov 2009
TL;DR: Support Vector Machines, a method derived from recent achievements in statistical learning theory, is introduced for the classification of geological units from Landsat multispectral images; initial experiments suggest the usefulness of the proposed classification approach.
Abstract: Quantitative techniques for spatial prediction and classification in geological surveys are developing rapidly, and recent applications of machine learning techniques confirm the possibilities for their use in this field of research. The paper introduces Support Vector Machines, a method derived from recent achievements in statistical learning theory, for the classification of geological units from Landsat multispectral images. The initial experiments suggest the usefulness of the proposed classification approach.

Proceedings ArticleDOI
21 Nov 2009
TL;DR: A support vector machine trained by a genetic algorithm (GA-SVM) is adopted to forecast electricity prices, in which the GA is used to select the parameters of the SVM; the model shows better prediction accuracy than a radial basis function neural network (RBFNN).
Abstract: Accurate electricity price forecasting can provide crucial information for electricity market participants to make reasonable competing strategies. The support vector machine (SVM) is a novel algorithm based on statistical learning theory which has greater generalization ability and, by building on the structural risk minimization principle, is superior to the empirical risk minimization principle adopted by traditional neural networks. However, its generalization performance depends on a good setting of the training parameters (such as C) for the nonlinear SVM. In this study, a support vector machine trained by a genetic algorithm (GA-SVM) is adopted to forecast electricity prices, in which the GA is used to select the parameters of the SVM. National electricity price data in China from 1996 to 2007 are used to study the forecasting performance of the GA-SVM model. The experimental results show that the GA-SVM algorithm has better prediction accuracy than a radial basis function neural network (RBFNN).

Dissertation
05 May 2009
TL;DR: This thesis develops a framework under which one can analyze the potential benefits, as measured by the sample complexity of semi-supervised learning, and concludes that unless the learner is absolutely certain there is some non-trivial relationship between labels and the unlabeled distribution, semi- supervised learning cannot provide significant advantages over supervised learning.
Abstract: The emergence of a new paradigm in machine learning known as semi-supervised learning (SSL) has seen benefits to many applications where labeled data is expensive to obtain. However, unlike supervised learning (SL), which enjoys a rich and deep theoretical foundation, semi-supervised learning, which uses additional unlabeled data for training, still remains a theoretical mystery lacking a sound fundamental understanding. The purpose of this research thesis is to take a first step towards bridging this theory-practice gap. We focus on investigating the inherent limitations of the benefits semi-supervised learning can provide over supervised learning. We develop a framework under which one can analyze the potential benefits, as measured by the sample complexity of semi-supervised learning. Our framework is utopian in the sense that a semi-supervised algorithm trains on a labeled sample and an unlabeled distribution, as opposed to an unlabeled sample in the usual semi-supervised model. Thus, any lower bound on the sample complexity of semi-supervised learning in this model implies lower bounds in the usual model. Roughly, our conclusion is that unless the learner is absolutely certain there is some non-trivial relationship between labels and the unlabeled distribution (“SSL type assumption”), semi-supervised learning cannot provide significant advantages over supervised learning. Technically speaking, we show that the sample complexity of SSL is no more than a constant factor better than SL for any unlabeled distribution, under a no-prior-knowledge setting (i.e. without SSL type assumptions). We prove that for the class of thresholds in the realizable setting the sample complexity of SL is at most twice that of SSL. Also, we prove that in the agnostic setting for the classes of thresholds and union of intervals the sample complexity of SL is at most a constant factor larger than that of SSL. We conjecture this to be a general phenomenon applying to any hypothesis class. We also discuss issues regarding SSL type assumptions, and in particular the popular cluster assumption. We give examples that show even in the most accommodating circumstances, learning under the cluster assumption can be hazardous and lead to prediction performance much worse than simply ignoring unlabeled data and doing supervised learning. This thesis concludes with a look into future research directions that builds on our investigation.

Book ChapterDOI
10 Apr 2009
TL;DR: It is proved that a parsimonious fitness ensures universal consistency, and a more complicated modification of the fitness is proposed in order to avoid unnecessary bloat while nevertheless preserving universal consistency.
Abstract: This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency. This means that empirical error minimization allows convergence to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of putting a hard limit on program size still results in programs whose size grows without bound as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the error rate in generalization also leads to bloat. A more complicated modification of the fitness is therefore proposed in order to avoid unnecessary bloat while nevertheless preserving universal consistency.
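
The flavor of a parsimonious fitness can be sketched in a few lines: empirical error plus a size penalty. The paper's penalty must be scaled appropriately with the number of test cases; the constant used below, and both candidate programs, are hypothetical.

```python
import numpy as np

def parsimonious_fitness(program, size, X, y, penalty=0.01):
    """Empirical error plus a size penalty. Minimizing the error alone lets
    program size grow without bound (bloat); the parsimony term penalizes
    complexity, in the spirit of structural risk minimization."""
    preds = np.array([program(x) for x in X])
    return np.mean((preds - y) ** 2) + penalty * size

# Two hypothetical candidate programs for the target y = x^2 on noisy cases.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 50)
y = X ** 2 + 0.05 * rng.normal(size=50)
small = (lambda x: x * x, 3)                             # 3 nodes
bloated = (lambda x: x * x + 1e-3 * np.sin(90 * x), 25)  # overgrown variant
for prog, size in (small, bloated):
    print(f"size={size:2d}  fitness={parsimonious_fitness(prog, size, X, y):.4f}")
```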

Journal ArticleDOI
TL;DR: This paper investigates an approach that reduces multi-class cost-sensitive learning to a standard classification task, based on the data space expansion technique developed by Abe et al.; the reduction coincides with Elkan's reduction for binary classification tasks.

Proceedings ArticleDOI
14 Jun 2009
TL;DR: The experimental results show that classification based on SVM with FD features performs well on EEG signals, which indicates that this classification method is valid and has promising applications.
Abstract: The support vector machine (SVM) is a machine learning technique widely applied to classification problems. SVMs are based on Vapnik's statistical learning theory and have been successively extended by a number of researchers. The electroencephalogram (EEG) signal, in turn, captures the electrical activity of the brain and is an important source of information for studying neurological disorders. In order to extract relevant information from EEG signals, a variety of computerized analysis methods have been developed. Recent studies indicate that methods based on nonlinear dynamics theory can extract valuable information from neuronal dynamics; however, many of these methods need large amounts of data and are computationally expensive. From chaos theory, a global value that is relatively simple to compute is the fractal dimension (FD), which can be used to measure the geometrical complexity of a time series. The FD of a waveform is a powerful tool for transient detection, and in the analysis of EEG this feature can be used to identify and distinguish specific states of physiological function. A variety of algorithms are available for the computation of the FD. In this work, we employ an SVM to classify the EEG signals of healthy subjects and epileptic subjects, using the FD as the feature vector. The experimental results show that classification based on SVM with FD features performs well, which indicates that this classification method is valid and has promising applications.
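
A hedged sketch of the pipeline (FD as the feature, SVM as the classifier) using Katz's estimator, one common FD algorithm that may differ from the one used in the paper, on synthetic stand-in epochs:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def katz_fd(signal):
    """Katz fractal dimension of a 1-D signal: FD = log10(n) / (log10(n) +
    log10(d/L)), with L the curve length and d the maximal distance from
    the first point."""
    L = np.hypot(1.0, np.diff(signal)).sum()           # total curve length
    d = np.hypot(np.arange(1, len(signal)), signal[1:] - signal[0]).max()
    n = len(signal) - 1
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

# Synthetic stand-in for EEG epochs: odd-indexed epochs are smoothed, giving
# them lower geometrical complexity than the raw-noise epochs.
rng = np.random.default_rng(0)
epochs, labels = [], []
for k in range(100):
    noise = rng.normal(size=512)
    sig = np.convolve(noise, np.ones(8) / 8, mode="same") if k % 2 else noise
    epochs.append(sig)
    labels.append(k % 2)
X = np.array([[katz_fd(e)] for e in epochs])           # FD as the feature vector

print("CV accuracy:", cross_val_score(SVC(), X, np.array(labels), cv=5).mean())
```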