Proceedings ArticleDOI

An approach to incremental SVM learning algorithm

13 Nov 2000, pp. 268-273
TL;DR: This paper proposes an intercross iterative approach that extends SVM training to incremental learning, taking into account the mutual impact between new training data and historical data, and shows that the approach achieves more satisfactory classification precision.
Abstract: The classification algorithm that is based on a support vector machine (SVM) is now attracting more attention, due to its strong theoretical properties and good empirical results. In this paper, we first analyze the properties of the support vector (SV) set thoroughly, then introduce a new learning method that extends the SVM classification algorithm to incremental learning. The theoretical basis of this algorithm is the classification equivalence of the SV set and the training set. In this algorithm, knowledge is accumulated in the process of incremental learning. In addition, unimportant samples are discarded optimally by a least-recently used (LRU) scheme. Theoretical analyses and experimental results show that this algorithm not only speeds up the training process but also reduces storage costs, while classification precision is guaranteed.
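The retraining loop described above is straightforward to sketch. The following minimal Python illustration is our own, not the authors' code: it builds on scikit-learn's SVC, retrains on the accumulated SV set plus each new batch (exploiting the classification equivalence of the SV set and the training set), and uses a simple recency-stamp discard as a stand-in for the paper's LRU scheme.

```python
# Minimal sketch of SV-set-based incremental SVM learning (illustrative only;
# the LRU policy here is a simplified stand-in for the paper's scheme).
import numpy as np
from sklearn.svm import SVC

class IncrementalSVM:
    def __init__(self, max_store=1000, **svc_params):
        self.clf = SVC(**svc_params)
        self.X_keep = None          # retained samples (approximately the SV set)
        self.y_keep = None
        self.last_used = None       # recency stamps for the LRU-style discard
        self.t = 0
        self.max_store = max_store

    def partial_fit(self, X_new, y_new):
        self.t += 1
        if self.X_keep is None:
            X, y = X_new, y_new
            stamps = np.full(len(X_new), self.t)
        else:
            # Accumulated knowledge = old SV set; merge it with the new batch.
            X = np.vstack([self.X_keep, X_new])
            y = np.concatenate([self.y_keep, y_new])
            stamps = np.concatenate([self.last_used, np.full(len(X_new), self.t)])
        self.clf.fit(X, y)
        keep = self.clf.support_    # indices of the current support vectors
        stamps[keep] = self.t       # SVs of the current model count as "used"
        if len(keep) > self.max_store:
            # Discard the least-recently-used samples first.
            keep = keep[np.argsort(stamps[keep])[-self.max_store:]]
        self.X_keep, self.y_keep = X[keep], y[keep]
        self.last_used = stamps[keep]
        return self

    def predict(self, X):
        return self.clf.predict(X)
```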
Citations
Proceedings ArticleDOI
13 Oct 2005
TL;DR: A comparison of least squares support vector machines (LS-SVM) with SVM for regression shows that LS-SVM is preferable, especially for large-scale problems, because its solution procedure is highly efficient and, after pruning, both its sparseness and performance are comparable with those of SVM.
Abstract: Support vector machines (SVM) have been widely used in classification and nonlinear function estimation. However, the major drawback of SVM is the high computational burden of its constrained optimization programming. This disadvantage is overcome by least squares support vector machines (LS-SVM), which solve linear equations instead of a quadratic programming problem. This paper compares LS-SVM with SVM for regression. According to the parallel test results, LS-SVM is preferable, especially for large-scale problems, because its solution procedure is highly efficient and, after pruning, both the sparseness and performance of LS-SVM are comparable with those of SVM.
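The "linear equations instead of a quadratic program" point can be made concrete. Below is a minimal LS-SVM regression sketch in NumPy (our own illustration; the RBF kernel and the parameters gamma and sigma are assumptions, not the paper's setup): the entire dual solution comes from one symmetric (n+1)-by-(n+1) linear system.

```python
# Minimal LS-SVM regression sketch (illustrative, not the paper's implementation).
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    n = len(X)
    K = rbf(X, X, sigma)
    # One (n+1) x (n+1) linear system replaces the SVM quadratic program:
    # [ 0   1^T         ] [b]     [0]
    # [ 1   K + I/gamma ] [alpha] [y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, alpha, b, X_test, sigma=1.0):
    return rbf(X_test, X_train, sigma) @ alpha + b

# Usage on toy data:
X = np.linspace(0, 6, 50)[:, None]
y = np.sin(X).ravel()
alpha, b = lssvm_fit(X, y)
print(lssvm_predict(X, alpha, b, X[:3]))
```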

277 citations

Journal ArticleDOI
TL;DR: This survey focuses on evolving fuzzy rule-based models and neuro-fuzzy networks for clustering, classification, regression, and system identification in online, real-time environments, where learning and model development should be performed incrementally.

231 citations

Journal ArticleDOI
TL;DR: This paper combines an evolving classifier with trie-based user profiling to obtain a powerful self-learning online scheme, and further develops the recursive formula for the potential of a data point to become a cluster center using cosine distance.
Abstract: Knowledge about computer users is very beneficial for assisting them, predicting their future actions, or detecting masqueraders. In this paper, a new approach for automatically creating and recognizing the behavior profile of a computer user is presented. In this case, a computer user's behavior is represented as the sequence of commands he or she types during work. This sequence is transformed into a distribution of relevant subsequences of commands in order to find a profile that defines the user's behavior. Also, because a user profile is not necessarily fixed but rather evolves/changes, we propose an evolving method to keep the created profiles up to date, using an Evolving Systems approach. In this paper, we combine the evolving classifier with trie-based user profiling to obtain a powerful self-learning online scheme. We also further develop the recursive formula for the potential of a data point to become a cluster center using cosine distance, which is provided in the Appendix. The novel approach proposed in this paper is applicable to any problem of dynamic/evolving user behavior modeling where the behavior can be represented as a sequence of actions or events. It has been evaluated on several real data streams.
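The first step of this pipeline, turning a command sequence into a distribution of subsequences and comparing profiles by cosine distance, is easy to illustrate. The sketch below is our own simplification (plain n-gram counting rather than the paper's trie, and hypothetical example sessions), not the authors' implementation.

```python
# Illustrative sketch: turn a command sequence into a distribution of
# length-n subsequences (a simplified stand-in for the trie-based profile).
from collections import Counter

def profile(commands, n=3):
    grams = Counter(tuple(commands[i:i + n])
                    for i in range(len(commands) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def cosine_similarity(p, q):
    # Cosine similarity between two subsequence distributions; the paper's
    # recursive potential formula is built on the corresponding distance.
    num = sum(p[g] * q.get(g, 0.0) for g in p)
    den = (sum(v * v for v in p.values()) ** 0.5 *
           sum(v * v for v in q.values()) ** 0.5)
    return num / den if den else 0.0

# Hypothetical sessions from the same user:
session_a = ["ls", "cd", "ls", "vim", "make", "ls", "cd", "ls"]
session_b = ["ls", "cd", "ls", "vim", "make", "gcc", "ls", "cd"]
print(cosine_similarity(profile(session_a), profile(session_b)))
```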

130 citations

Patent
25 Apr 2007
TL;DR: A method of identifying and localizing objects belonging to one of three or more classes is presented, which includes deriving vectors, each mapped to one of the objects, where each vector is an element of an N-dimensional space.
Abstract: A method of identifying and localizing objects belonging to one of three or more classes includes deriving vectors, each being mapped to one of the objects, where each of the vectors is an element of an N-dimensional space. The method includes training an ensemble of binary classifiers with a CISS technique, using an ECOC technique. For each object corresponding to a class, the method includes calculating the probability that the associated vector belongs to a particular class, using an ECOC probability estimation technique. In another embodiment, increased detection accuracy is achieved by using images obtained with different contrast methods. A nonlinear dimensionality reduction technique, Kernel PCA, is employed to extract features from the multi-contrast composite image. The Kernel PCA preprocessing shows improvements over traditional linear PCA preprocessing, possibly due to its ability to capture high-order, nonlinear correlations in the high-dimensional image space.
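Both building blocks named here, ECOC ensembles of binary classifiers and Kernel PCA feature extraction, are standard techniques. The sketch below shows the combination on a stock dataset using scikit-learn; the library, dataset, and all parameters are our assumptions, not the patent's implementation.

```python
# Illustrative ECOC + Kernel PCA pipeline (not the patent's implementation;
# dataset, library, and parameters are our own choices).
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier   # ECOC-style ensemble
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Nonlinear dimensionality reduction: capture higher-order correlations
# that linear PCA misses.
kpca = KernelPCA(n_components=30, kernel="rbf", gamma=1e-3).fit(X_tr)

# ECOC: each class gets a codeword; one binary SVM learns each bit.
ecoc = OutputCodeClassifier(SVC(kernel="linear"), code_size=2, random_state=0)
ecoc.fit(kpca.transform(X_tr), y_tr)
print(ecoc.score(kpca.transform(X_te), y_te))
```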

89 citations

Journal ArticleDOI
TL;DR: A comparative study of incremental learning and ensemble learning that reviews their histories and compares their performance w.r.t. accuracy and time efficiency under various concept drift scenarios.
Abstract: With the unlimited growth of real-world data and the increasing requirement for real-time processing, immediate processing of big stream data has become an urgent problem. In stream data, hidden patterns commonly evolve over time (i.e., concept drift), and many dynamic learning strategies have been proposed, such as incremental learning and ensemble learning. To the best of our knowledge, no work has systematically compared these two methods. In this paper, we conduct a comparative study of the two learning methods. We first introduce the concept of "concept drift" and propose how to quantitatively measure it. Then, we recall the histories of incremental learning and ensemble learning, introducing the milestones of their development. In experiments, we comprehensively compare and analyze their performance w.r.t. accuracy and time efficiency under various concept drift scenarios. We conclude with several possible future research problems.
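As one simple illustration of quantifying drift between two windows of a stream (our own example, not necessarily the measure proposed in the paper), drift can be scored as a distance between the windows' empirical distributions:

```python
# One simple way to quantify concept drift between two stream windows:
# total-variation distance between empirical histograms (an illustration,
# not necessarily the metric proposed in the paper).
import numpy as np

def drift_score(window_a, window_b, bins=20):
    lo = min(window_a.min(), window_b.min())
    hi = max(window_a.max(), window_b.max())
    p, _ = np.histogram(window_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(window_b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()   # 0 = identical, 1 = disjoint

rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, 1000)    # old concept
after = rng.normal(1.5, 1.0, 1000)     # shifted concept
print(drift_score(before, after))
```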

60 citations

References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


Additional excerpts

  • ...SVM [2] is a new approach to pattern recognition based on Structural Risk Minimization, which is suited to problems with high-dimensional features and a given finite amount of training data, and can construct decision rules that generalize well....


01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations


"An approach to incremental SVM lear..." refers background in this paper

  • ...O(N~+LjV~+d&N,) [ 7 ]. In the case of incremental learning (a = 1,p = 0), suppose the incremental set has...


Journal ArticleDOI
TL;DR: Several arguments that support the observed high accuracy of SVMs are reviewed, and numerous examples and proofs of most of the key theorems are given.
Abstract: The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.

15,696 citations

Proceedings Article
01 Jan 2001
TL;DR: Computational results indicate that test set correctness for the reduced support vector machine (RSVM), with a nonlinear separating surface that depends on a small randomly selected portion of the dataset, is better than that of a conventional support vector machine (SVM) with a nonlinear surface that explicitly depends on the entire dataset, and much better than a conventional SVM using a small random sample of the data.
Abstract: An algorithm is proposed which generates a nonlinear kernel-based separating surface that requires as little as 1% of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variables corresponding to the 1% of the data kept. The remainder of the data can be thrown away after solving the optimization problem. This is achieved by making use of a rectangular m × m̄ kernel K(A, Ā) that greatly reduces the size of the quadratic program to be solved and simplifies the characterization of the nonlinear separating surface. Here, the m rows of A represent the original m data points while the m̄ rows of Ā represent a greatly reduced m̄ data points. Computational results indicate that test set correctness for the reduced support vector machine (RSVM), with a nonlinear separating surface that depends on a small randomly selected portion of the dataset, is better than that of a conventional support vector machine (SVM) with a nonlinear surface that explicitly depends on the entire dataset, and much better than a conventional SVM using a small random sample of the data. Computational times, as well as memory usage, are much smaller for RSVM than for a conventional SVM using the entire dataset.
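The reduced-kernel idea is simple to demonstrate. The sketch below (our own simplification: RSVM itself solves a smooth SVM program, whereas here a regularized least-squares fit stands in for it, on hypothetical toy data) evaluates a rectangular kernel K(A, Ā) against a random ~1% subset Ā and fits a classifier in that reduced feature space.

```python
# Sketch of the reduced-kernel idea behind RSVM: a rectangular m x m_bar
# kernel against a small random subset, followed by a simple regularized
# least-squares fit (our simplification of RSVM's smooth SVM program).
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
m, d = 2000, 2
A = rng.normal(size=(m, d))
y = np.sign(A[:, 0] * A[:, 1])               # a nonlinear toy concept
y[y == 0] = 1

m_bar = max(1, m // 100)                     # keep roughly 1% of the data
A_bar = A[rng.choice(m, m_bar, replace=False)]

K = rbf(A, A_bar)                            # rectangular m x m_bar kernel
K1 = np.hstack([K, np.ones((m, 1))])         # absorb the bias term
w = np.linalg.solve(K1.T @ K1 + 1e-3 * np.eye(m_bar + 1), K1.T @ y)

pred = np.sign(K1 @ w)                       # surface depends only on A_bar
print("training accuracy:", (pred == y).mean())
```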

691 citations


"An approach to incremental SVM lear..." refers methods in this paper

  • ...Most incremental learning algorithms are based on improving the SVM training process by collecting more useful data as support vectors ([5]-[8])....


Journal ArticleDOI
TL;DR: An implicit Lagrangian for the dual of a simple reformulation of the standard quadratic program of a linear support vector machine is proposed, which leads to the minimization of an unconstrained differentiable convex function in a space of dimensionality equal to the number of classified points.
Abstract: An implicit Lagrangian for the dual of a simple reformulation of the standard quadratic program of a linear support vector machine is proposed. This leads to the minimization of an unconstrained differentiable convex function in a space of dimensionality equal to the number of classified points. This problem is solvable by an extremely simple linearly convergent Lagrangian support vector machine (LSVM) algorithm. LSVM requires the inversion at the outset of a single matrix of the order of the much smaller dimensionality of the original input space plus one. The full algorithm is given in this paper in 11 lines of MATLAB code without any special optimization tools such as linear or quadratic programming solvers. This LSVM code can be used "as is" to solve classification problems with millions of points. For example, 2 million points in 10-dimensional input space were classified by a linear surface in 82 minutes on a Pentium III 500 MHz notebook with 384 megabytes of memory (and additional swap space), and in 7 minutes on a 250 MHz UltraSPARC II processor with 2 gigabytes of memory. Other standard classification test problems were also solved. Nonlinear kernel classification can also be solved by LSVM. Although it does not scale up to very large problems, it can handle any positive semidefinite kernel and is guaranteed to converge. A short MATLAB code is also given for nonlinear kernels and tested on a number of problems.
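The iteration is short enough to transcribe. Below is a NumPy sketch of the published LSVM update (our transcription, with nu and the step factor alpha set to the paper's usual choices and our own stopping tolerance); note how the Sherman-Morrison-Woodbury identity confines the matrix inversion to input-space dimension plus one.

```python
# NumPy sketch of the LSVM iteration (a transcription under assumptions,
# not a drop-in replacement for the authors' MATLAB code).
import numpy as np

def lsvm(A, y, nu=1.0, itmax=100, tol=1e-5):
    m, n = A.shape
    e = np.ones((m, 1))
    D = y.reshape(-1, 1)                     # labels in {-1, +1}
    H = D * np.hstack([A, -e])               # H = D [A, -e]
    # Sherman-Morrison-Woodbury: only an (n+1) x (n+1) matrix is inverted.
    S = np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)
    Qinv = lambda v: nu * (v - H @ (S @ (H.T @ v)))
    alpha = 1.9 / nu                         # step factor, 0 < alpha < 2/nu
    u = Qinv(e)
    for _ in range(itmax):
        Qu = u / nu + H @ (H.T @ u)          # Q u without forming the m x m Q
        u_new = Qinv(e + np.maximum(Qu - e - alpha * u, 0.0))
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    w = A.T @ (D * u)                        # separating plane: x'w = gamma
    gamma = -float(e.T @ (D * u))
    return w.ravel(), gamma

# Usage: predictions are sign(X @ w - gamma).
```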

634 citations