Journal ArticleDOI

Least Squares Support Vector Machine Classifiers

01 Jun 1999-Neural Processing Letters (Kluwer Academic Publishers)-Vol. 9, Iss: 3, pp 293-300
TL;DR: A least squares version of support vector machine (SVM) classifiers is presented, whose solution follows from a set of linear equations instead of the quadratic programming required for classical SVM's.
Abstract: In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVM's. The approach is illustrated on a two-spiral benchmark classification problem.

Summary (1 min read)

1. Introduction

  • Recently, support vector machines (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b) have been introduced for solving pattern recognition problems.
  • In this method one maps the data into a higher dimensional input space and one constructs an optimal separating hyperplane in this space.
  • Later, the support vector method was extended for solving function estimation problems.
  • In Section 3 the authors discuss the least squares support vector machine classifiers.
  • In Section 4 examples are given to illustrate the support values and on a two-spiral benchmark problem.

2. Support Vector Machines for Classification

  • In this Section the authors shortly review some basic work on support vector machines (SVM) for classification problems.
  • ψ(x, x_k) = tanh[κ x_k^T x + θ] (two layer neural SVM), where σ, κ and θ are constants.
  • Because the matrix associated with this quadratic programming problem is not indefinite, the solution to (11) will be global (Fletcher, 1987).

3. Least Squares Support Vector Machines

  • L_3(w,b,e;α) = J_3(w,b,e) − Σ_{k=1}^N α_k{y_k[w^Tϕ(x_k) + b] − 1 + e_k} (17), where α_k are Lagrange multipliers (which can be either positive or negative now due to the equality constraints, as follows from the Kuhn-Tucker conditions (Fletcher, 1987)).
  • Hence, the classifier (1) is found by solving the linear set of Equations (20)–(21) instead of quadratic programming.
  • The parameters of the kernels such as σ for the RBF kernel can be optimally chosen according to (12).
  • The support values α_k are proportional to the errors at the data points (18), while in the case of (14) most values are equal to zero.
  • Hence, one could rather speak of a support value spectrum in the least squares case.

4. Examples

  • The size of the circles indicated at the training data is chosen proportionally to the absolute values of the support values.
  • This is different from SVM’s based on inequality constraints, where only points that are near the decision line have nonzero support values.
  • The training data are shown on Figure 2 with two classes indicated by 'o' and '*' (360 points with 180 for each class) in a two dimensional input space.
  • The excellent generalization performance is clear from the decision boundaries shown on the figures.
  • Other methods which have been applied to the two-spiral benchmark problem, such as the use of circular units (Ridella et al., 1997), have shown good performance as well.

5. Conclusions

  • The authors discussed a least squares version of support vector machine classifiers.
  • For a complicated two-spiral classification problem it is illustrated that a least squares SVM with RBF kernel is readily found with excellent generalization performance and low computational cost.


Neural Processing Letters 9: 293–300, 1999.
© 1999 Kluwer Academic Publishers. Printed in the Netherlands.
Least Squares Support Vector Machine Classifiers
J.A.K. SUYKENS and J. VANDEWALLE
Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT-SISTA Kardinaal
Mercierlaan 94, B–3001 Leuven (Heverlee), Belgium, e-mail: johan.suykens@esat.kuleuven.ac.be
Abstract. In this letter we discuss a least squares version for support vector machine (SVM) classi-
fiers. Due to equality type constraints in the formulation, the solution follows from solving a set of
linear equations, instead of quadratic programming for classical SVM’s. The approach is illustrated
on a two-spiral benchmark classification problem.
Key words: classification, support vector machines, linear least squares, radial basis function kernel
Abbreviations: SVM Support Vector Machines; VC Vapnik-Chervonenkis; RBF Radial Basis
Function
1. Introduction
Recently, support vector machines (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b)
have been introduced for solving pattern recognition problems. In this method one
maps the data into a higher dimensional input space and one constructs an optimal
separating hyperplane in this space. This basically involves solving a quadratic
programming problem, while gradient based training methods for neural network
architectures on the other hand suffer from the existence of many local minima
(Bishop, 1995; Cherkassky & Mulier, 1998; Haykin, 1994; Zurada, 1992). Kernel
functions and parameters are chosen such that a bound on the VC dimension is
minimized. Later, the support vector method was extended for solving function es-
timation problems. For this purpose Vapnik’s epsilon insensitive loss function and
Huber’s loss function have been employed. Besides the linear case, SVM’s based
on polynomials, splines, radial basis function networks and multilayer perceptrons
have been successfully applied. Being based on the structural risk minimization
principle and capacity concept with pure combinatorial definitions, the quality and
complexity of the SVM solution does not depend directly on the dimensionality of
the input space (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b).
In this paper we formulate a least squares version of SVM’s for classification
problems with two classes. For the function estimation problem a support vec-
tor interpretation of ridge regression (Golub & Van Loan, 1989) has been given
in (Saunders et al., 1998), which considers equality type constraints instead of
inequalities from the classical SVM approach. Here, we also consider equality
constraints for the classification problem with a formulation in least squares sense.
As a result the solution follows directly from solving a set of linear equations,
instead of quadratic programming. While in classical SVM’s many support values
are zero (nonzero values correspond to support vectors), in least squares SVM’s
the support values are proportional to the errors.
This paper is organized as follows. In Section 2 we review some basic work
about support vector machine classifiers. In Section 3 we discuss the least squares
support vector machine classifiers. In Section 4 examples are given to illustrate the
support values and on a two-spiral benchmark problem.
2. Support Vector Machines for Classification
In this Section we shortly review some basic work on support vector machines
(SVM) for classification problems. For all further details we refer to (Vapnik, 1995;
Vapnik, 1998a; Vapnik, 1998b).
Given a training set of $N$ data points $\{y_k, x_k\}_{k=1}^{N}$, where $x_k \in \mathbb{R}^{n}$ is the $k$th input
pattern and $y_k \in \mathbb{R}$ is the $k$th output pattern, the support vector method approach
aims at constructing a classifier of the form:
$$ y(x) = \mathrm{sign}\Big[ \sum_{k=1}^{N} \alpha_k\, y_k\, \psi(x, x_k) + b \Big], \qquad (1) $$
where $\alpha_k$ are positive real constants and $b$ is a real constant. For $\psi(\cdot,\cdot)$ one typically
has the following choices: $\psi(x, x_k) = x_k^T x$ (linear SVM); $\psi(x, x_k) = (x_k^T x + 1)^{d}$
(polynomial SVM of degree $d$); $\psi(x, x_k) = \exp\{-\|x - x_k\|_2^2 / \sigma^2\}$ (RBF SVM);
$\psi(x, x_k) = \tanh[\kappa\, x_k^T x + \theta]$ (two layer neural SVM), where $\sigma$, $\kappa$ and $\theta$ are constants.
The classifier is constructed as follows. One assumes that
$$ w^T \varphi(x_k) + b \ge 1, \quad \text{if } y_k = +1, $$
$$ w^T \varphi(x_k) + b \le -1, \quad \text{if } y_k = -1, \qquad (2) $$
which is equivalent to
$$ y_k [ w^T \varphi(x_k) + b ] \ge 1, \quad k = 1, \ldots, N, \qquad (3) $$
where $\varphi(\cdot)$ is a nonlinear function which maps the input space into a higher dimensional
space. However, this function is not explicitly constructed. In order to
have the possibility to violate (3), in case a separating hyperplane in this higher
dimensional space does not exist, variables $\xi_k$ are introduced such that
$$ y_k [ w^T \varphi(x_k) + b ] \ge 1 - \xi_k, \quad k = 1, \ldots, N, $$
$$ \xi_k \ge 0, \quad k = 1, \ldots, N. \qquad (4) $$
According to the structural risk minimization principle, the risk bound is minimized
by formulating the optimization problem
$$ \min_{w, \xi_k} J_1(w, \xi_k) = \frac{1}{2} w^T w + c \sum_{k=1}^{N} \xi_k \qquad (5) $$
subject to (4). Therefore, one constructs the Lagrangian
$$ \mathcal{L}_1(w, b, \xi_k; \alpha_k, \nu_k) = J_1(w, \xi_k) - \sum_{k=1}^{N} \alpha_k \big\{ y_k [ w^T \varphi(x_k) + b ] - 1 + \xi_k \big\} - \sum_{k=1}^{N} \nu_k \xi_k \qquad (6) $$
by introducing Lagrange multipliers $\alpha_k \ge 0$, $\nu_k \ge 0$ ($k = 1, \ldots, N$). The solution
is given by the saddle point of the Lagrangian by computing
$$ \max_{\alpha_k, \nu_k} \; \min_{w, b, \xi_k} \; \mathcal{L}_1(w, b, \xi_k; \alpha_k, \nu_k). \qquad (7) $$
One obtains
$$ \frac{\partial \mathcal{L}_1}{\partial w} = 0 \;\rightarrow\; w = \sum_{k=1}^{N} \alpha_k y_k \varphi(x_k), $$
$$ \frac{\partial \mathcal{L}_1}{\partial b} = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0, $$
$$ \frac{\partial \mathcal{L}_1}{\partial \xi_k} = 0 \;\rightarrow\; 0 \le \alpha_k \le c, \quad k = 1, \ldots, N, \qquad (8) $$
which leads to the solution of the following quadratic programming problem
$$ \max_{\alpha_k} Q_1(\alpha_k; \varphi(x_k)) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l\, \varphi(x_k)^T \varphi(x_l)\, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k, \qquad (9) $$
such that
$$ \sum_{k=1}^{N} \alpha_k y_k = 0, \qquad 0 \le \alpha_k \le c, \quad k = 1, \ldots, N. $$
The function $\varphi(x_k)$ in (9) is related then to $\psi(x, x_k)$ by imposing
$$ \varphi(x)^T \varphi(x_k) = \psi(x, x_k), \qquad (10) $$
which is motivated by Mercer’s Theorem. Note that for the two layer neural SVM,
Mercer’s condition only holds for certain parameter values of κ and θ.
The classifier (1) is designed by solving
$$ \max_{\alpha_k} Q_1(\alpha_k; \psi(x_k, x_l)) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l\, \psi(x_k, x_l)\, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k, \qquad (11) $$
subject to the constraints in (9). One does not have to calculate $w$ nor $\varphi(x_k)$ in order
to determine the decision surface. Because the matrix associated with this quadratic
programming problem is not indefinite, the solution to (11) will be global (Fletcher,
1987).
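The dual problem (11) is a standard quadratic program with box constraints and one equality constraint, so any generic QP solver can be used. The sketch below is one way to set it up with the cvxopt solver; the paper does not prescribe an implementation, and the helper name and its arguments are our own:

import numpy as np
from cvxopt import matrix, solvers

def svm_dual(K, y, c):
    # K : (N, N) numpy kernel matrix with K[k, l] = psi(x_k, x_l); y : (N,) labels in {-1, +1}.
    # cvxopt solves  min 1/2 a'Pa + q'a  s.t.  Ga <= h, Aa = b,  which matches the
    # maximization (11) with P = (y y^T) * K and q = -1 (sign of the objective flipped).
    N = len(y)
    P = matrix((np.outer(y, y) * K).astype(float))
    q = matrix(-np.ones(N))
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))        # 0 <= alpha_k <= c
    h = matrix(np.hstack([np.zeros(N), c * np.ones(N)]))
    A = matrix(y.reshape(1, -1).astype(float))            # sum_k alpha_k y_k = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).ravel()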
Furthermore, one can show that hyperplanes (3) satisfying the constraint $\|w\|_2 \le a$ have a
VC-dimension $h$ which is bounded by
$$ h \le \min([r^2 a^2], n) + 1, \qquad (12) $$
where $[\cdot]$ denotes the integer part and $r$ is the radius of the smallest ball containing
the points $\varphi(x_1), \ldots, \varphi(x_N)$. Finding this ball is done by defining the Lagrangian
$$ \mathcal{L}_2(r, q, \lambda_k) = r^2 - \sum_{k=1}^{N} \lambda_k \big( r^2 - \|\varphi(x_k) - q\|_2^2 \big), \qquad (13) $$
where $q$ is the center of the ball and $\lambda_k$ are positive Lagrange multipliers. In a
similar way as for (5) one finds that the center is equal to $q = \sum_k \lambda_k \varphi(x_k)$, where
the Lagrange multipliers follow from
$$ \max_{\lambda_k} Q_2(\lambda_k; \varphi(x_k)) = -\sum_{k,l=1}^{N} \varphi(x_k)^T \varphi(x_l)\, \lambda_k \lambda_l + \sum_{k=1}^{N} \lambda_k\, \varphi(x_k)^T \varphi(x_k), \qquad (14) $$
such that
$$ \sum_{k=1}^{N} \lambda_k = 1, \qquad \lambda_k \ge 0, \quad k = 1, \ldots, N. $$
Based on (10), $Q_2$ can also be expressed in terms of $\psi(x_k, x_l)$. Finally, one
selects a support vector machine with minimal VC dimension by solving (11) and
computing (12) from (14).
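As a rough illustration of how (12)-(14) could be used for kernel selection, the sketch below estimates the bound from a precomputed kernel matrix: ||w||^2 follows from the dual solution of (11) (taking a = ||w|| of the trained machine), and r^2 is the optimal value of (14), here obtained with a generic solver. This is our reading of the procedure, not code from the paper:

import numpy as np
from scipy.optimize import minimize

def vc_bound_estimate(K, alpha, y, n_inputs):
    # K : (N, N) kernel matrix, alpha : dual solution of (11), y : labels in {-1, +1},
    # n_inputs : dimension n of the input space.
    N = K.shape[0]
    # ||w||^2 = sum_{k,l} alpha_k alpha_l y_k y_l psi(x_k, x_l), since w = sum_k alpha_k y_k phi(x_k).
    w_norm_sq = (alpha * y) @ K @ (alpha * y)
    # Smallest enclosing ball: solve (14) over the simplex; the optimum of Q_2 equals r^2.
    neg_Q2 = lambda lam: lam @ K @ lam - lam @ np.diag(K)
    res = minimize(neg_Q2, np.full(N, 1.0 / N),
                   bounds=[(0.0, None)] * N,
                   constraints=[{"type": "eq", "fun": lambda lam: np.sum(lam) - 1.0}],
                   method="SLSQP")
    r_sq = -res.fun
    # Bound (12), with [.] the integer part.
    return min(int(np.floor(r_sq * w_norm_sq)), n_inputs) + 1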
3. Least Squares Support Vector Machines
Here we introduce a least squares version to the SVM classifier by formulating the
classification problem as
$$ \min_{w, b, e} J_3(w, b, e) = \frac{1}{2} w^T w + \gamma\, \frac{1}{2} \sum_{k=1}^{N} e_k^2, \qquad (15) $$
subject to the equality constraints
$$ y_k [ w^T \varphi(x_k) + b ] = 1 - e_k, \quad k = 1, \ldots, N. \qquad (16) $$
One defines the Lagrangian
$$ \mathcal{L}_3(w, b, e; \alpha) = J_3(w, b, e) - \sum_{k=1}^{N} \alpha_k \big\{ y_k [ w^T \varphi(x_k) + b ] - 1 + e_k \big\}, \qquad (17) $$
where $\alpha_k$ are Lagrange multipliers (which can be either positive or negative now
due to the equality constraints, as follows from the Kuhn-Tucker conditions (Fletcher, 1987)).
The conditions for optimality
$$ \frac{\partial \mathcal{L}_3}{\partial w} = 0 \;\rightarrow\; w = \sum_{k=1}^{N} \alpha_k y_k \varphi(x_k), $$
$$ \frac{\partial \mathcal{L}_3}{\partial b} = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0, $$
$$ \frac{\partial \mathcal{L}_3}{\partial e_k} = 0 \;\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \ldots, N, $$
$$ \frac{\partial \mathcal{L}_3}{\partial \alpha_k} = 0 \;\rightarrow\; y_k [ w^T \varphi(x_k) + b ] - 1 + e_k = 0, \quad k = 1, \ldots, N \qquad (18) $$
can be written immediately as the solution to the following set of linear equations
(Fletcher, 1987)
$$ \begin{bmatrix} I & 0 & 0 & -Z^T \\ 0 & 0 & 0 & -Y^T \\ 0 & 0 & \gamma I & -I \\ Z & Y & I & 0 \end{bmatrix} \begin{bmatrix} w \\ b \\ e \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vec{1} \end{bmatrix}, \qquad (19) $$
where $Z = [\varphi(x_1)^T y_1; \ldots; \varphi(x_N)^T y_N]$, $Y = [y_1; \ldots; y_N]$, $\vec{1} = [1; \ldots; 1]$, $e = [e_1; \ldots; e_N]$, $\alpha = [\alpha_1; \ldots; \alpha_N]$. The solution is also given by
$$ \begin{bmatrix} 0 & Y^T \\ Y & Z Z^T + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix}. \qquad (20) $$
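The step from (19) to (20) is not spelled out in the letter; it follows by eliminating $w$ and $e$ from the block rows of (19):
$$ w = Z^T \alpha, \qquad e = \gamma^{-1} \alpha \qquad \text{(first and third block rows)}, $$
$$ Z w + Y b + e = \vec{1} \;\Rightarrow\; (Z Z^T + \gamma^{-1} I)\, \alpha + Y b = \vec{1} \qquad \text{(fourth block row)}, $$
which, together with the second block row $Y^T \alpha = 0$, is exactly the reduced system (20).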
Mercer's condition can be applied again to the matrix $\Omega = Z Z^T$, where
$$ \Omega_{kl} = y_k y_l\, \varphi(x_k)^T \varphi(x_l) = y_k y_l\, \psi(x_k, x_l). \qquad (21) $$
Hence, the classifier (1) is found by solving the linear set of Equations (20)–(21)
instead of quadratic programming. The parameters of the kernels such as σ for
the RBF kernel can be optimally chosen according to (12). The support values α_k
are proportional to the errors at the data points (18), while in the case of (14) most
values are equal to zero. Hence, one could rather speak of a support value spectrum
in the least squares case.
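To make this computational route explicit, the following NumPy sketch trains an LS-SVM classifier with the RBF kernel by building the matrix of (21) and solving the linear system (20); it is a minimal illustration under our own naming, not the authors' code:

import numpy as np

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    # X : (N, n) inputs, y : (N,) labels in {-1, +1}.
    N = X.shape[0]
    # RBF kernel: psi(x_k, x_l) = exp(-||x_k - x_l||_2^2 / sigma^2)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    Omega = np.outer(y, y) * np.exp(-sq_dists / sigma ** 2)        # Equation (21)
    # Block system (20): [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.hstack([0.0, np.ones(N)])
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                                         # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma=1.0):
    # Classifier (1): y(x) = sign(sum_k alpha_k y_k psi(x, x_k) + b)
    sq = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    return np.sign(np.exp(-sq / sigma ** 2) @ (alpha * y_train) + b)

4. Examples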
4. Examples
In a first example (Figure 1) we illustrate the support values for a linearly separable
problem of two classes in a two dimensional space. The size of the circles indicated
at the training data is chosen proportionally to the absolute values of the support
values. A linear SVM has been taken with γ = 1. Clearly, points located close and
far from the decision line have the largest support values. This is different from
SVMs based on inequality constraints, where only points that are near the decision
line have nonzero support values. This can be understood from the fact that the
signed distance from a point x_k to the decision line is equal to (w^T x_k + b)/‖w‖ =
(1 − e_k)/(y_k ‖w‖) and α_k = γ e_k in the least squares SVM case.
In a second example (Figure 2) we illustrate a least squares support vector
machine RBF classifier on a two-spiral benchmark problem. The training data are
shown on Figure 2 with two classes indicated by 'o' and '*' (360 points with 180
for each class) in a two dimensional input space. Points in between the training
data located on the two spirals are often considered as test data for this problem but
are not shown on the figure. The excellent generalization performance is clear from
the decision boundaries shown on the figures. In this case σ = 1 and γ = 1 were
chosen as parameters. Other methods which have been applied to the two-spiral
benchmark problem, such as the use of circular units (Ridella et al., 1997), have
shown good performance as well.
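A usage sketch for this setting, reusing the lssvm_train and lssvm_predict helpers sketched in Section 3; the spiral parametrization below is a common one and not necessarily the exact benchmark data used in the paper:

import numpy as np

def two_spirals(n_per_class=180, turns=3.5):
    # Two interleaved spirals in a two dimensional input space, 180 points per class.
    t = np.linspace(0.25 * np.pi, turns * np.pi, n_per_class)
    c1 = np.column_stack([t * np.cos(t), t * np.sin(t)])
    X = np.vstack([c1, -c1])                        # second spiral rotated by 180 degrees
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

X, y = two_spirals()
alpha, b = lssvm_train(X, y, gamma=1.0, sigma=1.0)  # sigma = 1 and gamma = 1 as in this example
pred = lssvm_predict(X, y, alpha, b, X, sigma=1.0)
print("training accuracy:", np.mean(pred == y))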

Citations
Journal ArticleDOI
01 Apr 2012
TL;DR: ELM provides a unified learning platform with a widespread type of feature mappings and can be applied in regression and multiclass classification applications directly; in theory, ELM can approximate any target continuous function and classify any disjoint regions.
Abstract: Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and a unified learning framework of LS-SVM, PSVM, and other regularization algorithms referred to extreme learning machine (ELM) can be built. ELM works for the “generalized” single-hidden-layer feedforward networks (SLFNs), but the hidden layer (or called feature mapping) in ELM need not be tuned. Such SLFNs include but are not limited to SVM, polynomial network, and the conventional feedforward neural networks. This paper shows the following: 1) ELM provides a unified learning platform with a widespread type of feature mappings and can be applied in regression and multiclass classification applications directly; 2) from the optimization method point of view, ELM has milder optimization constraints compared to LS-SVM and PSVM; 3) in theory, compared to ELM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity; and 4) in theory, ELM can approximate any target continuous function and classify any disjoint regions. As verified by the simulation results, ELM tends to have better scalability and achieve similar (for regression and binary class cases) or much better (for multiclass cases) generalization performance at much faster learning speed (up to thousands times) than traditional SVM and LS-SVM.

4,835 citations


Cites background from "Least Squares Support Vector Machin..."

  • ...Index Terms—Extreme learning machine (ELM), feature mapping, kernel, least square support vector machine (LS-SVM), proximal support vector machine (PSVM), regularization network....


  • ...Cortes and Vapnik [1] study the relationship between SVM and multilayer feedforward neural networks and showed that SVM can be seen as a specific type of SLFNs, the so-called support vector networks....


Book
17 May 2013
TL;DR: This research presents a novel and scalable approach called “Smartfitting” that automates the very labor-intensive and therefore time-consuming and expensive process of designing and implementing statistical models for regression.
Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations

Journal ArticleDOI
TL;DR: The random forest is clearly the best family of classifiers (3 out of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
Abstract: We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the bests are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy overcoming 90% in the 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 out of 5 bests classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).

2,616 citations


Cites methods from "Least Squares Support Vector Machin..."

  • ...56. lssvmRadial t implements the least squares SVM (Suykens and Vandewalle, 1999), using the function lssvm in the kernlab package, with Gaussian kernel, tuning the kernel spread with values 10^-2..10^7....


Journal Article
TL;DR: It is argued that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines.
Abstract: We consider the problem of multiclass classification. Our main thesis is that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.

1,841 citations


Cites background from "Least Squares Support Vector Machin..."

  • ...…the name “proximal support vector machines” (Fung and Mangasarian, 2001b,a), and Suykens et al., under the name “least-squares support vector machines” (Suykens and Vandewalle, 1999a,b, Suykens et al., 1999), both derive essentially the same algorithm (we view the presence or absence of a bias…...


Journal ArticleDOI
TL;DR: A survey on extreme learning machine (ELM) and its variants, especially on (1) batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensemble of ELM.
Abstract: Computational intelligence techniques have been used in wide applications. Out of numerous computational intelligence techniques, neural networks and support vector machines (SVMs) have been playing the dominant roles. However, it is known that both neural networks and SVMs face some challenging issues such as: (1) slow learning speed, (2) trivial human intervene, and/or (3) poor computational scalability. Extreme learning machine (ELM) as emergent technology which overcomes some challenges faced by other techniques has recently attracted the attention from more and more researchers. ELM works for generalized single-hidden layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with those traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with least human intervene. This paper gives a survey on ELM and its variants, especially on (1) batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensemble of ELM.

1,767 citations


Cites background from "Least Squares Support Vector Machin..."

  • ...algorithms such as LS-SVM [49]....


  • ...[39] further extended this study to generalized SLFNs with different type of hidden nodes (feature mappings) as well as kernels and showed that the simple unified algorithm of ELM can be obtained for regression, binary and multi-label classification cases which, however, have to be handled separately by SVMs and its variants [2, 45–49]....


References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


"Least Squares Support Vector Machin..." refers background in this paper

  • ...Recently, support vector machines (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b) have been introduced for solving pattern recognition problems....


  • ...Being based on the structural risk minimization principle and capacity concept with pure combinatorial definitions, the quality and complexity of the SVM solution does not depend directly on the dimensionality of the input space (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b)....


  • ...The function φ(x_k) in (9) is related then to ψ(x, x_k) by imposing φ(x)^T φ(x_k) = ψ(x, x_k), (10)...


  • ...For all further details we refer to (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b)....


  • ...max_{α_k} Q_1(α_k; φ(x_k)) = −(1/2) Σ_{k,l=1}^N y_k y_l φ(x_k)^T φ(x_l) α_k α_l + Σ_{k=1}^N α_k, (9)...


Book
01 Jan 1983

34,729 citations

Book
16 Jul 1998
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Abstract: From the Publisher: This book represents the most comprehensive treatment available of neural networks from an engineering perspective. Thorough, well-organized, and completely up to date, it examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks. Written in a concise and fluid manner, by a foremost engineering textbook author, to make the material more accessible, this book is ideal for professional engineers and graduate students entering this exciting field. Computer experiments, problems, worked examples, a bibliography, photographs, and illustrations reinforce key concepts.

29,130 citations

01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations


"Least Squares Support Vector Machin..." refers background in this paper

  • ...Recently, support vector machines (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b) have been introduced for solving pattern recognition problems....


  • ...Being based on the structural risk minimization principle and capacity concept with pure combinatorial definitions, the quality and complexity of the SVM solution does not depend directly on the dimensionality of the input space (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b)....


  • ...The function φ(x_k) in (9) is related then to ψ(x, x_k) by imposing φ(x)^T φ(x_k) = ψ(x, x_k), (10)...


  • ...For all further details we refer to (Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b)....


  • ...Based on (10), Q_2 can also be expressed in terms of ψ(x_k, x_l)....


Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimalization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Frequently Asked Questions (1)
Q1. What have the authors contributed in "Least squares support vector machine classifiers" ?

In this letter the authors discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVM's.