
Book ChapterDOI

Adaptation, Performance and Vapnik-Chervonenkis Dimension of Straight Line Programs

10 Apr 2009-pp 315-326

TL;DR: An empirical comparison between model selection methods based on Linear Genetic Programming is presented: an upper bound for the Vapnik-Chervonenkis (VC) dimension of classes of programs representing linear code defined by arithmetic computations and sign tests is derived and used to define a VC-regularized fitness.

Abstract: We discuss here an empirical comparison between model selection methods based on Linear Genetic Programming. Two statistical methods are compared: model selection based on Empirical Risk Minimization (ERM) and model selection based on Structural Risk Minimization (SRM). For this purpose we have identified the main components which determine the capacity of some linear structures as classifiers, showing an upper bound for the Vapnik-Chervonenkis (VC) dimension of classes of programs representing linear code defined by arithmetic computations and sign tests. This upper bound is used to define a fitness based on VC regularization that performs significantly better than the fitness based on empirical risk.

Summary (2 min read)

1 Introduction

  • Throughout these pages the authors study some theoretical and empirical properties of a new structure for representing computer programs in the GP paradigm.
  • Another advantage with respect to trees is that the slp structure can describe multivariate functions by selecting a number of assignments as the output set.
  • The GP approach with slp’s can be seen as a particular case of LGP where the data structures representing the programs are lists of computational assignments.
  • The authors study the practical performance of ad-hoc recombination operators for slp’s.
  • This bound constitutes their basic tool in order to perform structural risk minimization of the slp structure.

2 Straight Line Programs: Basic Concepts and Properties

  • Straight line programs are commonly used for solving problems of algebraic and geometric flavor.
  • The formal definition of slp’s the authors provide in this section is taken from [2].

3 Vapnik-Chervonenkis Dimension of Families of slp’s

  • In recent years GP has been applied to a range of complex learning problems, including classification and symbolic regression, in a variety of fields like quantum computing, electronic design, sorting, searching, and game playing.
  • A common feature of both tasks is that they can be thought of as a supervised learning problem (see [5]) where the hypothesis class C is the search space described by the genotypes of the evolving structures.
  • In the seventies the work by Vapnik and Chervonenkis ([6], [7], [8]) provided a remarkable family of bounds relating the performance of a learning machine to its capacity (see [9] for a modern presentation of the theory).
  • The VCD depends on the class of classifiers.

3.1 Estimating the VC dimension of slp’s parameterized by real numbers

  • Next, make the following assumptions about the functions τ_i.
  • Form the s·v functions τ_i(w, α_j) from IR^k to IR.
  • With the above setup, the following result is proved in [10].
  • In the new class C the parameters α_i^j, β_i^j are allowed to take values in IR.

3.2 Estimating the Average Error of slp’s

  • The authors show how to apply the bound in Equation 8 to estimate the average error with respect to the unknown distribution from which the examples are drawn.
  • The average error of a classifier with parameters (α, β) is ε(α, β) = ∫ Q(t, α, β; y)dµ, (16) where Q measures the loss between the semantic function of Γ(α,β) and the target concept, and µ is the distribution from which examples {(ti, yi)}1≤i≤m are drawn to the GP machine.
  • Now, the results by Vapnik state that the average error ε(α, β) can be estimated independently of the distribution of µ(t, y) due to the following formula.
  • The constant η is the probability that the bound is violated. A generic bound of this form is sketched after this list.
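
The specific bound (Equation 17 of the paper) is not reproduced in this summary. As a rough orientation only, and not the paper's exact formula, a classical Vapnik-style bound for a classifier of VC dimension h trained on m i.i.d. examples has the following shape:

```latex
% Illustrative sketch of a classical VC generalization bound (a standard form,
% assumed here; not the paper's Equation 17): with probability at least 1 - \eta,
\varepsilon(\alpha,\beta) \;\le\; \varepsilon_{\mathrm{emp}}(\alpha,\beta)
  \;+\; \sqrt{\frac{h\left(\ln\tfrac{2m}{h}+1\right)-\ln\tfrac{\eta}{4}}{m}}
```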

4 SLP-Based Genetic Programming

  • The authors keep homogeneous populations of equal length slp’s.
  • Next, the authors describe the recombination operator.
  • Then a new random selection is made within the arguments of the function f ∈ F that constitutes the instruction u_i. An illustrative crossover sketch for fixed-length slp's follows this list.
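
The operator itself is only outlined in this summary. The following is a minimal illustrative sketch of a crossover for populations of equal-length slp's, assuming a simple encoding of each slp as a fixed-length list of instructions; the names (slp_crossover, Instruction) and the suffix-exchange logic are our own illustration, not the authors' exact operator.

```python
import random
from typing import List, Tuple

# An instruction is (function name, argument labels), e.g. ("*", ["x1", "u1"]).
Instruction = Tuple[str, List[str]]

def slp_crossover(parent1: List[Instruction],
                  parent2: List[Instruction]) -> Tuple[List[Instruction], List[Instruction]]:
    """Illustrative one-point crossover for two slp's of the same length:
    the instruction lists are cut at a random position and the suffixes are
    exchanged.  Because argument labels only reference earlier positions,
    both children remain syntactically valid slp's.  This is a generic
    sketch, not the recombination operator described in the paper."""
    assert len(parent1) == len(parent2), "fixed-length populations are assumed"
    cut = random.randint(1, len(parent1) - 1)
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2
```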

4.1 Fitness based on Structural Risk Minimization

  • In this situation one chooses the model that minimizes the right side of Equation 17.
  • For practical use of Equation 17 the authors adopt the following formula with appropriately chosen practical values of theoretical constants (see [12] for the derivation of this formula). A generic sketch of such a penalized fitness is given after this list.
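
Equation 17 and the authors' exact constants are not reproduced in this summary. The sketch below shows one common practical form of VC-based penalization (a multiplicative factor in the spirit of Cherkassky and Mulier), used here only as an assumed stand-in for the paper's formula:

```python
import math

def vc_penalized_fitness(emp_risk: float, vc_dim: float, n_samples: int) -> float:
    """SRM-style fitness sketch: the empirical risk is inflated by a VC-based
    penalization factor.  The constants follow a practical formula in the
    spirit of Cherkassky and Mulier; the paper's Equation 17 may use different
    practical constants.  Assumes vc_dim > 0 and n_samples > 0."""
    p = vc_dim / n_samples                                   # p = h / m
    arg = p - p * math.log(p) + math.log(n_samples) / (2.0 * n_samples)
    penalty = 1.0 - math.sqrt(max(arg, 0.0))
    if penalty <= 0.0:
        return float("inf")                                  # bound is vacuous: reject the model
    return emp_risk / penalty
```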

4.2 Experimentation

  • The authors consider instances of Symbolic Regression for their experimentation.
  • The authors adopt slp’s as the structures that evolve within the process.
  • In Table 2 the authors show the corresponding success rates for each crossover method and target function.
  • The above experimental procedure is repeated 100 times using 100 different random realizations of n training samples (from the same statistical distribution).
  • Accordingly, the values in the comparative rows that are bigger than or equal to 1 represent a better performance of VC-fitness.

5 Conclusions and Future Research

  • The authors have calculated a sharp bound for the VC dimension of the GP genotype defined by computer programs using straight line code.
  • The authors have used this bound to perform VC-based model selection under the GP paradigm showing that this model selection method consistently outperforms LGP algorithms based on empirical risk minimization.
  • A second goal in their research on SLP-based GP is to study the experimental behavior of the straight line program computation model under Vapnik-Chervonenkis regularization but without assuming previous knowledge of the length of the structure.
  • This investigation is crucial in practical applications for which the GP machine must be able to learn not only the shape but also the length of the evolved structures.
  • To this end new recombination operators must be designed since the crossover procedure employed in this paper only applies to populations having fixed length chromosomes.


Adaptation, Performance and Vapnik-Chervonenkis Dimension of Straight Line Programs

José L. Montaña¹, Cruz E. Borges¹, César L. Alonso², and José L. Crespo¹

¹ Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria
{montanjl, borgesce, luis.crespo}@unican.es

² Centro de Inteligencia Artificial, Universidad de Oviedo
Campus de Viesques, 33271 Gijón, Spain
calonso@aic.uniovi.es
Abstract. We discuss here an empirical comparison between model selection methods based on Linear Genetic Programming. Two statistical methods are compared: model selection based on Empirical Risk Minimization (ERM) and model selection based on Structural Risk Minimization (SRM). For this purpose we have identified the main components which determine the capacity of some linear structures as classifiers, showing an upper bound for the Vapnik-Chervonenkis (VC) dimension of classes of programs representing linear code defined by arithmetic computations and sign tests. This upper bound is used to define a fitness based on VC regularization that performs significantly better than the fitness based on empirical risk.

Key words: Genetic Programming, Linear Genetic Programming, Vapnik-Chervonenkis dimension
1 Introduction

Throughout these pages we study some theoretical and empirical properties of a new structure for representing computer programs in the GP paradigm. This data structure –called straight line program (slp) in the framework of Symbolic Computation ([1])– was introduced for the first time into the GP setting in [2]. An slp consists of a finite sequence of computational assignments. Each assignment is obtained by applying some functional (selected from a specified set) to a set of arguments that can be variables, constants or pre-computed results. The slp structure can describe complex computable functions using fewer computational resources than GP-trees. The key point for explaining this feature is the ability of slp's to reuse previously computed results during the evaluation process. Another advantage with respect to trees is that the slp structure can describe multivariate functions by selecting a number of assignments as the output set. Hence one single slp has the same representation capacity as a forest of trees (see [2] for a complete presentation of this structure).

Linear Genetic Programming (LGP) is a GP variant that evolves sequences of instructions from an imperative programming language or from a machine language. The structure of the program representation consists of assignments of operations over constants or memory variables, called registers, to other registers (see [3] for a complete overview of LGP). The GP approach with slp's can be seen as a particular case of LGP where the data structures representing the programs are lists of computational assignments.
We study the practical performance of ad-hoc recombination operators for slp's. We apply the SLP-based GP approach to solve some instances of the symbolic regression problem. Experimentation done over academic examples uses a weak form of structural risk minimization and suggests that the slp structure behaves very well when dealing with bounded length individuals directed to minimize a compromise between empirical risk and non-scalar length (i.e. the number of non-linear operations used by the structure). We have calculated an explicit upper bound for the Vapnik-Chervonenkis dimension (VCD) of some particular classes of slp's. This bound constitutes our basic tool in order to perform structural risk minimization of the slp structure.
2 Straight Line Programs: Basic Concepts and Properties
Straight line programs are commonly used for solving problems of algebraic and
geometric flavor. An extensive study of the use of slp’s in this context can be
found in [4]. The formal definition of slp’s we provide in this section is taken
from [2].
Definition 1. Let F = {f_1, . . . , f_n} be a set of functions, where each f_i has arity a_i, 1 ≤ i ≤ n, and let T = {t_1, . . . , t_m} be a set of terminals. A straight line program (slp) over F and T is a finite sequence of computational instructions Γ = {I_1, . . . , I_l}, where for each k ∈ {1, . . . , l},

I_k ≡ u_k := f_{j_k}(α_1, . . . , α_{a_{j_k}}); with f_{j_k} ∈ F,
α_i ∈ T for all i if k = 1, and α_i ∈ T ∪ {u_1, . . . , u_{k−1}} for 1 < k ≤ l.

The set of terminals T satisfies T = V ∪ C, where V = {x_1, . . . , x_p} is a finite set of variables and C = {c_1, . . . , c_q} is a finite set of constants. The number of instructions l is the length of Γ.

Usually an slp Γ = {I_1, . . . , I_l} will be identified with the set of variables u_i introduced at each instruction I_i, thus Γ = {u_1, . . . , u_l}. Each of the non-terminal variables u_i can be considered as an expression over the set of terminals T constructed by a sequence of recursive compositions from the set of functions F. Following [2] we denote by SLP(F, T) the set of all slp's over F and T.

Example 1. Let F be the set given by the three binary standard arithmetic operations F = {+, −, ∗} and let T = {1, x_1, x_2, . . . , x_n} be the set of terminals. Any slp Γ in SLP(F, T) represents an n-variate polynomial with integer coefficients.

An output set of an slp Γ = {u_1, . . . , u_l} is any set of non-terminal variables of Γ, that is, O(Γ) = {u_{i_1}, . . . , u_{i_t}}. The function computed by an slp Γ = {u_1, . . . , u_l} over F and T, with set of terminal variables V = {x_1, . . . , x_p} and with output set O(Γ) = {u_{i_1}, . . . , u_{i_t}}, denoted by Φ_Γ : I^p → O^t, is defined recursively in the natural way and satisfies Φ_Γ(a_1, . . . , a_p) = (b_1, . . . , b_t), where b_j stands for the value of the expression over V of the non-terminal variable u_{i_j} when we replace each variable x_k with a_k, 1 ≤ k ≤ p.
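
As an illustration of Definition 1 and of the semantic function Φ_Γ, the following minimal Python sketch evaluates an slp encoded as a list of instructions. The encoding and the names used (evaluate_slp, the "u<k>" string references) are our own illustrative choices, not taken from the paper.

```python
import operator
from typing import Callable, Dict, List, Sequence, Tuple, Union

# An argument is a variable name ("x1", ...), a reference to a previous
# result ("u1", "u2", ...), or a numeric constant.
Arg = Union[str, float, int]
Instruction = Tuple[Callable[..., float], List[Arg]]   # u_k := f(arguments)

def evaluate_slp(slp: List[Instruction],
                 variables: Dict[str, float],
                 output: Sequence[int]) -> Tuple[float, ...]:
    """Evaluate a straight line program: every instruction may reuse the
    previously computed results u_1, ..., u_{k-1} as well as variables and
    constants.  `output` lists the (1-based) indices of the instructions
    forming the output set O(Gamma)."""
    u: List[float] = []
    for f, args in slp:
        resolved = []
        for a in args:
            if isinstance(a, str) and a.startswith("u"):
                resolved.append(u[int(a[1:]) - 1])      # pre-computed result
            elif isinstance(a, str):
                resolved.append(variables[a])           # input variable
            else:
                resolved.append(float(a))               # constant
        u.append(f(*resolved))
    return tuple(u[i - 1] for i in output)

# Example: Gamma = {u1 := x1 * x1, u2 := u1 + x2} with output set {u2},
# i.e. the bivariate polynomial x1^2 + x2.
gamma: List[Instruction] = [(operator.mul, ["x1", "x1"]),
                            (operator.add, ["u1", "x2"])]
print(evaluate_slp(gamma, {"x1": 3.0, "x2": 1.0}, output=[2]))   # (10.0,)
```
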
3 Vapnik-Chervonenkis Dimension of Families of slp’s
In recent years GP has been applied to a range of complex learning problems including classification and symbolic regression in a variety of fields like quantum computing, electronic design, sorting, searching, game playing, etc. A common feature of both tasks is that they can be thought of as a supervised learning problem (see [5]) where the hypothesis class C is the search space described by the genotypes of the evolving structures. In the seventies the work by Vapnik and Chervonenkis ([6], [7], [8]) provided a remarkable family of bounds relating the performance of a learning machine to its capacity (see [9] for a modern presentation of the theory). The Vapnik-Chervonenkis dimension (VCD) is a measure of the capacity of a family of functions (or learning machines) as classifiers. The VCD depends on the class of classifiers. Hence, it does not make sense to calculate the VCD for GP in general; however, it does make sense if we choose a particular class of computer programs as classifiers. Our aim is to study in depth the formal properties of GP algorithms, focusing on the analysis of the classification complexity (VCD) of straight line programs.
3.1 Estimating the VC dimension of slp’s parameterized by real
numbers
The following definition of VC dimension is standard. See for instance [7].
Definition 2. Let C be a class of subsets of a set X. We say that C shatters a set A ⊆ X if for every subset E ⊆ A there exists S ∈ C such that E = S ∩ A. The VC dimension of C is the cardinality of the largest set that is shattered by C.
Through this section we deal with concept classes C_{k,n} such that concepts are represented by k real numbers, w = (w_1, ..., w_k), instances are represented by n real numbers, x = (x_1, ..., x_n), and the membership test to the family C_{k,n} is expressed by a formula Φ_{k,n}(w, x) taking as input the pair concept/instance (w, x) and returning the value 1 if x belongs to the concept represented by w and 0 otherwise.
We can think of Φ_{k,n} as a function from IR^{k+n} to {0, 1}. So for each concept w, define:

C_w := {x ∈ IR^n : Φ_{k,n}(w, x) = 1}.   (1)
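
(As a standard illustration of this setup, not an example from the paper: taking k = n + 1 and Φ_{k,n}(w, x) to be the test w_1 x_1 + · · · + w_n x_n + w_{n+1} ≥ 0, the sets C_w are the affine halfspaces of IR^n, a class whose VC dimension is exactly n + 1.)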

The goal is to obtain an upper bound on the VC dimension of the collection of sets

C_{k,n} = {C_w : w ∈ IR^k}.   (2)

Now assume that the formula Φ_{k,n} is a boolean combination of s atomic formulas, each of them having one of the following forms:

τ_i(w, x) > 0   (3)

or

τ_i(w, x) = 0   (4)

where {τ_i(w, x)}_{1≤i≤s} are infinitely differentiable functions from IR^{k+n} to IR.
Next, make the following assumptions about the functions τ_i. Let α_1, ..., α_v ∈ IR^n. Form the s·v functions τ_i(w, α_j) from IR^k to IR. Choose Θ_1, ..., Θ_r among these, and define

Θ : IR^k → IR^r   (5)

as

Θ(w) := (Θ_1(w), ..., Θ_r(w)).   (6)

Assume there is a bound B, independent of the α_i, r and ε_1, ..., ε_r, such that if Θ^{−1}(ε_1, ..., ε_r) is a (k − r)-dimensional C^∞-submanifold of IR^k, then Θ^{−1}(ε_1, ..., ε_r) has at most B connected components.
With the above setup, the following result is proved in [10].

Theorem 1. The VC dimension V of a family of concepts C_{k,n} whose membership test can be expressed by a formula Φ_{k,n} satisfying the above conditions satisfies:

V ≤ 2 log_2 B + 2k log_2(2es)   (7)
Next we state our main result concerning the VCD of a collection of subsets accepted by a family of slp's. We will say that a subset C ⊆ IR^n is accepted by an slp Γ if the function computed by Γ, Φ_Γ, expresses the membership test to C. For slp's Γ = (u_1, . . . , u_l) of length l accepting sets we assume that the output is the last instruction u_l and takes values in {0, 1}.

Theorem 2. Let T = {t_1, . . . , t_n} be a set of terminals and let F = {+, −, ∗, /, sign} be a set of functionals, where {+, −, ∗, /} denotes the set of standard arithmetic operations and sign(x) is a function that outputs 1 if its input x ∈ IR satisfies x ≥ 0 and outputs 0 otherwise. Let Γ_{n,L} be the collection of slp's Γ over F and T using at most L non-scalar operations (i.e. operations in {∗, /, sign}) and a free number of scalar operations (i.e. operations in {+, −}), whose output is obtained by applying the functional sign either to a previously computed result or to a terminal t_j, 1 ≤ j ≤ n. Let C_{n,L} be the class of concepts defined by the subsets of IR^n accepted by some slp belonging to Γ_{n,L}. Then

VCdim(C_{n,L}) ≤ 2(n + 1)(n + L)L(2L + log_2 L + 9)   (8)
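
For concreteness, the right-hand side of Equation 8 can be computed directly. The following small sketch (our own, not from the paper) evaluates the bound of Theorem 2 for given values of n (number of terminals) and L (number of non-scalar operations):

```python
import math

def vc_bound_slp(n: int, L: int) -> float:
    """Right-hand side of Equation 8: upper bound on the VC dimension of the
    class C_{n,L} of concepts accepted by slp's over n terminals using at
    most L non-scalar operations."""
    return 2 * (n + 1) * (n + L) * L * (2 * L + math.log2(L) + 9)

# For example, with n = 2 terminals and L = 5 non-scalar operations:
print(vc_bound_slp(2, 5))
```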

Sketch of the proof. The first step in the proof consists of constructing a universal slp Γ_U, over sets F_U and T_U, that parameterizes the elements of the family Γ_{n,L}. The definition of F_U and T_U depends only on the parameters n and L and will be clear after the construction. The key idea in the definition of Γ_U is the introduction of a set of parameters α, β taking values in {0, 1}^k, for a suitable natural number k, such that each specialization of this set of parameters yields a particular slp belonging to Γ_{n,L} and, conversely, each slp in Γ_{n,L} can be obtained by specializing the parameters α, β. For this purpose define u_{−n+m} = t_m for 1 ≤ m ≤ n. Note that any non-scalar assignment u_i, 1 ≤ i ≤ L, in an slp Γ belonging to Γ_{n,L} is a function of t = (t_1, . . . , t_n) that can be parameterized as follows:

u_i = U_i(α, β)(t) = α_i^{−n} (Σ_{j=−n+1}^{i−1} α_i^j u_j) · (Σ_{j=−n+1}^{i−1} β_i^j u_j) +   (9)

+ (1 − α_i^{−n}) [ β_i^{−n} (Σ_{j=−n+1}^{i−1} α_i^j u_j) / (Σ_{j=−n+1}^{i−1} β_i^j u_j) + (1 − β_i^{−n}) sign(Σ_{j=−n+1}^{i−1} β_i^j u_j) ],   (10)

for some suitable values α = (α_i^j), β = (β_i^j), with α_i^j, β_i^j ∈ {0, 1}.
Let us consider the family of parametric slp's {Γ_{(α,β)}}, where for each (α, β) the slp Γ_{(α,β)} := (U_1(α, β), . . . , U_L(α, β)). Next replace the family of concepts C_{n,L} with the class of subsets of IR^n, C := {C_{(α,β)}}, where for each (α, β) the set C_{(α,β)} is given as follows:

C_{(α,β)} := {t = (t_1, . . . , t_n) ∈ IR^n : t is accepted by Γ_{(α,β)}}   (11)

In the new class C the parameters α_i^j, β_i^j are allowed to take values in IR. Since C_{n,L} ⊆ C it is enough to bound the VC dimension of C.
Claim A. The number of parameters α, β is exactly

(n + 1)(n + L)L   (12)

Claim B. For each i, 1 ≤ i ≤ L, the following holds:

(1) The function U_i(α, β)(t) is a piecewise rational function in the variables α, β, t of formal degree bounded by 3 · 2^i − 2.

(2) U_i is defined up to a set of zero measure and there is a partition of the domain of definition of U_i into subsets (Δ_i^j)_{1≤j≤n_i}, with n_i ≤ 2^i, such that each Δ_i^j is defined by a conjunction of i rational inequalities of the form p ≥ 0 or p < 0 with degree deg p ≤ 3 · 2^i − 2. Moreover, the restriction of U_i to the set Δ_i^j, U_i|_{Δ_i^j}, is a rational function of degree bounded by 3 · 2^i − 2.

(3) The condition U_L(α, β)(t) = 1 can be expressed by a boolean formula of the following form:

∨_{1≤i≤2^L} ∧_{1≤j≤L} p_{i,j} ⋈_{i,j} 0;   (13)

Citations

Proceedings ArticleDOI
07 Jul 2010
TL;DR: Empirical comparisons between classical statistical methods adapted to Genetic Programming and the Structural Risk Minimization method based on Vapnik-Chervonenkis theory are presented and a new model complexity measure for the SRM method that tries to measure the non-linearity of the model is introduced.
Abstract: In this paper we discuss the problem of model selection in Genetic Programming. We present empirical comparisons between classical statistical methods (AIC, BIC) adapted to Genetic Programming and the Structural Risk Minimization method (SRM) based on Vapnik-Chervonenkis theory (VC), for symbolic regression problems with added noise. We also introduce a new model complexity measure for the SRM method that tries to measure the non-linearity of the model. The experimentation suggests practical advantages of using VC-based model selection with the new complexity measure, when using genetic training.

16 citations


Cites background from "Adaptation, Performance and Vapnik-..."

  • ...The exact relationship between non-scalar size of a GP-tree (more generally, a computer program) and its VC dimension is showed in [4]....



Book ChapterDOI
20 Jun 2011
TL;DR: VC theory provides methodological framework for complexity control in Genetic Programming even when its technical results seems not be directly applicable, and precise penalty functions founded on the notion of generalization error are proposed for evolving GP-trees.
Abstract: Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as better as possible but no notion of generalization error is considered. As a consequence, overfitting, code-bloat and noisy data are problems which are not satisfactorily solved under this approach. Motivated by this situation we review the problem of Symbolic Regression under the perspective of Machine Learning, a well founded mathematical toolbox for predictive learning. We perform empirical comparisons between classical statistical methods (AIC and BIC) and methods based on Vapnik-Chrevonenkis (VC) theory for regression problems under genetic training. Empirical comparisons of the different methods suggest practical advantages of VC-based model selection. We conclude that VC theory provides methodological framework for complexity control in Genetic Programming even when its technical results seems not be directly applicable. As main practical advantage, precise penalty functions founded on the notion of generalization error are proposed for evolving GP-trees.

11 citations


Cites background from "Adaptation, Performance and Vapnik-..."

  • ...A further development of this point of view, including some experimental discussion, can be found in [7]....



Proceedings ArticleDOI
02 Nov 2009
TL;DR: A general evolution strategy technique is proposed for approximating the optimal constants in a computer program representing the solution of a symbolic regression problem and the proposed algorithm improves such technique.
Abstract: Evolutionary computation methods have been used to solve several optimization and learning problems. This paper describes an application of evolutionary computation methods to constants optimization in Genetic Programming. A general evolution strategy technique is proposed for approximating the optimal constants in a computer program representing the solution of a symbolic regression problem. The new algorithm has been compared with a recent linear genetic programming approach based on straight-line programs. The experimental results show that the proposed algorithm improves such technique.

11 citations


Cites background from "Adaptation, Performance and Vapnik-..."

  • ...The exact relationship between non-scalar size of a slp (more generally, a computer program) and its VC dimension is showed in [6]....



Dissertation
07 Nov 2016
TL;DR: This dissertation explores the task of computational learning and the related concepts of generalization and overfitting, in the context of Genetic Programming (GP), a computational method inspired by natural evolution that considers a set of primitive functions and terminals that can be combined without any considerable constraints on the structure of the models being evolved.
Abstract: Computational learning refers to the task of inducing a general pattern from a provided set of examples. A learning method is expected to generalize to unseen examples of the same pattern. A common issue in computational learning is the possibility that the resulting models could be simply learning the provided set of examples, instead of learning the underlying pattern. A model that is incurring in such a behavior is commonly said to be overfitting. This dissertation explores the task of computational learning and the related concepts of generalization and overfitting, in the context of Genetic Programming (GP). GP is a computational method inspired by natural evolution that considers a set of primitive functions and terminals that can be combined without any considerable constraints on the structure of the models being evolved. This flexibility can help in learning complex patterns but it also increases the risk of overfitting. The contributions of this dissertation cover the most common form of GP (Standard GP), as well as the recently proposed Geometric Semantic GP (GSGP). The initial set of approaches relies on dynamically selecting different training data subsets during the evolutionary process. These approaches can avoid overfitting and improve the resulting generalization without restricting the flexibility of GP. Besides improving the generalization, these approaches also produce considerably smaller individuals. An analysis of the generalization ability of GSGP is performed, which shows that the generalization outcome is greatly dependent on particular characteristics of the mutation operator. It is shown that, as Standard GP, the original formulation of GSGP is prone to overfitting. The necessary conditions to avoid overfitting are presented. When such conditions are in place, GSGP can achieve a particularly competitive generalization. A novel geometric semantic mutation that substantially improves the effectiveness and efficiency of GSGP is proposed. Besides considerably improving the training data learning rate, it also achieves a competitive generalization with only a few applications of the mutation operator. The final set of contributions covers the domain of Neural Networks (NNs). These contributions originated as an extension of the research conducted within GSGP. This set of contributions includes the definition of a NN construction algorithm based on an extension of the mutation operator defined in GSGP. Similarly to GSGP, the proposed algorithm searches over a space without local optima. This allows for an effective and efficient stochastic search in the space of NNs, without the need to use backpropagation to adjust the weights of the network. Finally, two search stopping criteria are proposed, which can be directly used in the proposed NN construction algorithm and in GSGP. These stopping criteria are able to detect when the risk of overfitting increases significantly. It is shown that the stopping points detected result in a competitive generalization.

10 citations


Cites background from "Adaptation, Performance and Vapnik-..."

  • ...Connections with Statistical Learning Theory (Amil et al., 2009; Chen et al., 2016; Montana et al., 2009), and PAC learning (Kötzing et al....


  • ...Statistical Learning Theory (SLT) (Vapnik, 1995) is by now a mature field that provides theoretical considerations to guide the design of learning algorithms....



Book ChapterDOI
01 Jan 2012
TL;DR: The results show that in the presence of noise, the coevolutionary architecture with penalized fitness function outperforms the strategies where only the empirical error is considered in order to evaluate the symbolic expressions of the population.
Abstract: Frequently, when an evolutionary algorithm is applied to a population of symbolic expressions, the shapes of these symbolic expressions are very different at the first generations whereas they become more similar during the evolving process. In fact, when the evolutionary algorithm finishes most of the best symbolic expressions only differ in some of its coefficients. In this paper we present several coevolutionary strategies of a genetic program that evolves symbolic expressions represented by straight line programs and an evolution strategy that searches for good coefficients. The presented methods have been applied to solve instances of symbolic regression problem, corrupted by additive noise. A main contribution of the work is the introduction of a fitness function with a penalty term, besides the well known fitness function based on the empirical error over the sample set. The results show that in the presence of noise, the coevolutionary architecture with penalized fitness function outperforms the strategies where only the empirical error is considered in order to evaluate the symbolic expressions of the population.

1 citation


References

01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,121 citations


Book
John R. Koza
01 Jan 1992
TL;DR: This book discusses the evolution of architecture, primitive functions, terminals, sufficiency, and closure, and the role of representation and the lens effect in genetic programming.
Abstract: Background on genetic algorithms, LISP, and genetic programming hierarchical problem-solving introduction to automatically-defined functions - the two-boxes problem problems that straddle the breakeven point for computational effort Boolean parity functions determining the architecture of the program the lawnmower problem the bumblebee problem the increasing benefits of ADFs as problems are scaled up finding an impulse response function artificial ant on the San Mateo trail obstacle-avoiding robot the minesweeper problem automatic discovery of detectors for letter recognition flushes and four-of-a-kinds in a pinochle deck introduction to biochemistry and molecular biology prediction of transmembrane domains in proteins prediction of omega loops in proteins lookahead version of the transmembrane problem evolutionary selection of the architecture of the program evolution of primitives and sufficiency evolutionary selection of terminals evolution of closure simultaneous evolution of architecture, primitive functions, terminals, sufficiency, and closure the role of representation and the lens effect Appendices: list of special symbols list of special functions list of type fonts default parameters computer implementation annotated bibliography of genetic programming electronic mailing list and public repository

13,137 citations


Book ChapterDOI
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Abstract: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as: V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Teoriya Veroyatnostei i ee Primeneniya 16(2), 264–279 (1971).

3,669 citations


Journal ArticleDOI
01 Feb 1964
Abstract: PROOF. Approximate f_1, ..., f_m by real polynomials F_1, ..., F_m of the same degrees whose coefficients are algebraically independent. Now consider the variety V_C in complex Cartesian space defined by the equations F_1 = 0, ..., F_m = 0. It follows from van der Waerden [9, §41] that the number of points in V_C is equal to (deg f_1)(deg f_2) ··· (deg f_m). Since each point of V_0 lies close to some real point of V_C, this proves Lemma 1.

554 citations


Book
11 Dec 2006
TL;DR: Typical GP phenomena, such as non-effective code, neutral variations, and code growth are investigated from the perspective of linear GP.
Abstract: Linear Genetic Programming presents a variant of Genetic Programming that evolves imperative computer programs as linear sequences of instructions, in contrast to the more traditional functional expressions or syntax trees. Typical GP phenomena, such as non-effective code, neutral variations, and code growth are investigated from the perspective of linear GP. This book serves as a reference for researchers; it includes sufficient introductory material for students and newcomers to the field.

368 citations