Delft University of Technology
Hybrid single node genetic programming for symbolic regression
Kubalík, Jiří; Alibekov, Eduard; Žegklitz, Jan; Babuška, R.
DOI
10.1007/978-3-662-53525-7_4
Publication date
2016
Document Version
Accepted author manuscript
Published in
Transactions on Computational Collective Intelligence XXIV
Citation (APA)
Kubalík, J., Alibekov, E., Žegklitz, J., & Babuška, R. (2016). Hybrid single node genetic programming for symbolic regression. In N. T. Nguyen, R. Kowalczyk, & J. Filipe (Eds.), Transactions on Computational Collective Intelligence XXIV (Vol. LNCS 9770, pp. 61-82). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9770 LNCS). Springer. https://doi.org/10.1007/978-3-662-53525-7_4
Important note
To cite this publication, please use the final published version (if applicable).
Please check the document version above.
Copyright
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent
of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Takedown policy
Please contact us and provide details if you believe this document breaches copyrights.
We will remove access to the work immediately and investigate your claim.
This work is downloaded from Delft University of Technology.
For technical reasons the number of authors shown on this cover page is limited to a maximum of 10.

Hybrid Single Node Genetic Programming for
Symbolic Regression
Jiří Kubalík¹, Eduard Alibekov¹,², Jan Žegklitz¹,², and Robert Babuška¹,³

¹ Czech Institute of Informatics, Robotics, and Cybernetics, CTU in Prague, Prague, Czech Republic
  {kubalik,babuska}@ciirc.cvut.cz
² Department of Cybernetics, Faculty of Electrical Engineering, CTU in Prague, Prague, Czech Republic
³ Delft Center for Systems and Control, Delft University of Technology, Delft, the Netherlands
Abstract. This paper presents a first step of our research on designing
an effective and efficient GP-based method for symbolic regression. First,
we propose three extensions of the standard Single Node GP, namely (1)
a selection strategy for choosing nodes to be mutated based on depth and
performance of the nodes, (2) operators for placing a compact version
of the best-performing graph to the beginning and to the end of the
population, respectively, and (3) a local search strategy with multiple
mutations applied in each iteration. All the proposed modifications have
been experimentally evaluated on five symbolic regression benchmarks
and compared with standard GP and SNGP. The achieved results are
promising, showing the potential of the proposed modifications to improve
the performance of the SNGP algorithm. We then propose two variants of
hybrid SNGP utilizing a linear regression technique, LASSO, to improve
its performance. The proposed algorithms have been compared to the
state-of-the-art symbolic regression methods that also make use of the
linear regression techniques on four real-world benchmarks. The results
show the hybrid SNGP algorithms are at least competitive with or better
than the compared methods.
Keywords: Genetic Programming, Single Node Genetic Programming,
Symbolic Regression
1 Introduction
This paper presents a first step of our research on genetic programming (GP) for
the symbolic regression problem. The ultimate goal of our project is to design an
effective and efficient GP-based method for solving dynamic symbolic regression
problems where the target function evolves in time. Symbolic regression (SR) is
a type of regression analysis that searches the space of mathematical expressions
to find the model that best fits a given dataset, both in terms of accuracy and simplicity (https://en.wikipedia.org/wiki/Symbolic_regression).
Genetic programming is among the effective and efficient methods for solving the SR problem. Besides Koza's standard tree-based GP [12], many other variants have been proposed. They include, for instance, Grammatical Evolution (GE) [20], which evolves programs whose syntax is defined by a user-specified grammar (usually a grammar in Backus-Naur form). Gene Expression Programming (GEP) [4] is another GP variant successful in solving SR problems. Similarly to GE, it evolves linear chromosomes that are expressed as tree structures through a genotype-phenotype mapping. Graph-based Cartesian GP (CGP) [18] is a GP technique that uses a very simple integer-based genetic representation of a program in the form of a directed graph. In its classic form, CGP uses a variant of a simple algorithm called the (1 + λ)-Evolution Strategy with a point-mutation variation operator. When searching the space of candidate solutions, CGP makes use of so-called neutral mutations, meaning that a move to the new state is accepted if it does not worsen the quality of the current solution. This allows the introduction of new pieces of genetic code that can be plugged into the functional code later on, and it allows the search to traverse plateaus of the fitness landscape.
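To make this search scheme concrete, here is a small generic sketch in Python (our own illustration, not tied to any particular CGP implementation); the evaluate and mutate callables are assumed to be supplied by the problem, and lower fitness is taken to be better:

import random

def one_plus_lambda(genome, evaluate, mutate, lam=4, iterations=1000, rng=None):
    """Generic (1 + lambda) hill climber with neutral-move acceptance: the best
    of lam mutants replaces the parent whenever it is *no worse*, so moves
    across fitness plateaus (neutral mutations) are accepted."""
    rng = rng or random.Random()
    parent, parent_fit = genome, evaluate(genome)
    for _ in range(iterations):
        offspring = [mutate(parent, rng) for _ in range(lam)]
        fits = [evaluate(child) for child in offspring]
        best = min(range(lam), key=fits.__getitem__)   # lower fitness = better
        if fits[best] <= parent_fit:                   # accept equal or better (neutral move)
            parent, parent_fit = offspring[best], fits[best]
    return parent, parent_fit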
Single Node GP (SNGP) [9], [10] is a relatively new graph-based GP system that evolves a population of individuals, each consisting of a single program node. Similarly to CGP, the evolution is carried out via a hill-climbing mechanism using a single reversible mutation operator. The first experiments with SNGP were very promising, as they showed that SNGP significantly outperforms standard GP on various problems, including the SR problem. In this work we take
the standard SNGP as the baseline approach and propose several modifications
to further improve its performance.
The goals of this work are twofold. The first goal is to verify the performance of the vanilla SNGP compared to the standard GP on various SR benchmarks and to investigate the impact of the following three design aspects of the SNGP algorithm:
– the strategy used to select the nodes to be mutated,
– the strategy according to which the nodes of the best-performing expression are treated in the population,
– and the type of search strategy used to guide the optimization process.
The second goal is to propose a hybrid variant of SNGP which incorporates
the LASSO regression technique for creating linear-in-parameters nonlinear mod-
els. We compare its performance with other state-of-the-art symbolic regression
methods which also make use of linear regression techniques.
The paper is organized as follows. Section 2 describes the SNGP algorithm.
In Section 3, three modifications of the SNGP algorithm are proposed. Exper-
imental evaluation of the modified SNGP and its comparison to the standard
SNGP and standard Koza’s GP is presented in Section 4. Section 5 describes two
variants of the SNGP utilizing the linear regression technique, LASSO, to im-
prove its performance. The two versions of SNGP with LASSO are compared to
other symbolic regression methods making use of linear regression techniques in Section 6. Finally, Section 7 concludes the paper and proposes directions for further research on this topic.
2 Single Node Genetic Programming
2.1 Representation
Single Node Genetic Programming is a GP system that evolves a population of individuals, each consisting of a single program node. The node can be either a terminal, i.e. a constant or a variable node, or a function from a set of functions defined for the problem at hand. Importantly, individuals are not isolated in the population; they are interlinked in a graph structure similar to that of CGP, with population members acting as operands of other members [9].
Formally, a SNGP population is a set of N individuals M = {m_0, m_1, ..., m_{N−1}}, with each individual m_i being a single node represented by the tuple m_i = ⟨u_i, f_i, Succ_i, Pred_i, O_i⟩, where
– u_i ∈ T ∪ F is either an element chosen from a function set F or a terminal set T defined for the problem,
– f_i is the fitness of the individual,
– Succ_i is a set of successors of this node, i.e. the nodes whose output serves as the input to the node,
– Pred_i is a set of predecessors of this node, i.e. the nodes that use this individual as an operand,
– O_i is a vector of outputs produced by this node.
Typically, the population is partitioned so that the first N_term nodes, at positions 0 to N_term − 1, are terminals (variables and constants in case of the SR problem), followed by function nodes. Importantly, a function node at position i can use as its successor (i.e. the operand) any node that is positioned lower down in the population relative to the node i. This means that for each s ∈ Succ_i we have 0 ≤ s < i [9]. Similarly, predecessors of individual i must occupy higher positions in the population, i.e. for each p ∈ Pred_i we have i < p < N. Note that each function node is in fact the root of a directed acyclic graph that can be constructed by recursively traversing through successors down to the leaf terminal nodes.
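Purely as an illustration of this representation, a minimal Python sketch could look as follows; the names Node and make_population are our own, binary operators are assumed for simplicity, and the fitness and output fields are left to be filled in by evaluation:

from dataclasses import dataclass, field
from typing import List, Union
import random

@dataclass
class Node:
    u: Union[str, float]                                 # u_i: function symbol or terminal (variable / constant)
    succ: List[int] = field(default_factory=list)        # Succ_i: operand positions, each < own position
    pred: List[int] = field(default_factory=list)        # Pred_i: positions of nodes using this node as operand
    fitness: float = float("inf")                        # f_i, filled in by evaluation
    outputs: List[float] = field(default_factory=list)   # O_i: outputs over the training samples

def make_population(terminals, n_function_nodes, functions=('+', '-', '*', '/'), rng=None):
    """Terminals occupy positions 0..N_term-1; function nodes follow and may
    only reference lower-positioned nodes as successors (0 <= s < i)."""
    rng = rng or random.Random()
    pop = [Node(u=t) for t in terminals]
    for i in range(len(terminals), len(terminals) + n_function_nodes):
        succ = [rng.randrange(i), rng.randrange(i)]      # two operands chosen below position i
        pop.append(Node(u=rng.choice(functions), succ=succ))
        for s in succ:
            pop[s].pred.append(i)
    return pop

For example, make_population(['x1', 'x2', 1.0], 50) yields three terminal nodes followed by fifty randomly wired function nodes.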
2.2 Evolutionary model
In [9], a single evolutionary operator called successor mutate (smut) was proposed. It picks one individual of the population at random and then replaces one of its successors by a reference to another individual of the population, making sure that the constraint imposed on the successors is satisfied. Predecessor
lists of all affected individuals are updated accordingly. Moreover, all individuals
affected by this action must be reevaluated as well. For more details refer to [9].
The evolution is carried out via a hill-climbing mechanism using a smut
operator and an acceptance rule, which can have various forms. In [9], it was
based on fitness measurements across the whole population, rather than on sin-
gle individuals. This means that once the population has been changed by a
single application of the smut operator and all affected individuals have been
re-evaluated, the new population is accepted if and only if the sum of the fitness
values of all individuals in the population is no worse than the sum of fitness
values before the mutation. Otherwise, the modifications made by the mutation
are reversed. In [10] the acceptance rule is based only on the best fitness in the
population. The latter acceptance rule will be used in this work as well. The
reason for this choice is explained in Section 3.4.
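A rough sketch of this evolutionary model is given below (our own simplification, reusing the Node structure sketched in Section 2.1 and assuming an evaluate(pop) callback that refreshes node fitness values, with lower fitness meaning better; copying the whole population to make the move reversible is done here only for brevity):

import copy
import random

def run_sngp(pop, evaluate, n_terminals, iterations, rng=None):
    """Hill climbing with the smut operator and the acceptance rule of [10]:
    a mutation is kept if the best fitness in the population does not get worse."""
    rng = rng or random.Random()
    evaluate(pop)
    best = min(node.fitness for node in pop)
    for _ in range(iterations):
        backup = copy.deepcopy(pop)                  # allows reverting the move
        i = rng.randrange(n_terminals, len(pop))     # pick a random function node
        k = rng.randrange(len(pop[i].succ))          # which operand link to rewire
        old, new = pop[i].succ[k], rng.randrange(i)  # new successor keeps s < i
        pop[i].succ[k] = new
        pop[old].pred.remove(i)                      # keep predecessor lists consistent
        pop[new].pred.append(i)
        evaluate(pop)                                # re-evaluate affected nodes
        new_best = min(node.fitness for node in pop)
        if new_best <= best:                         # no worse: accept the mutation
            best = new_best
        else:                                        # worse: revert the mutation
            pop[:] = backup
    return pop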
3 Proposed Modifications
In this section, the following three modifications of the SNGP algorithm will be
proposed:
1. A selection strategy for choosing nodes to be mutated based on depth and
performance of nodes.
2. Operators for placing a compact version of the tree rooted in the best-performing node to the beginning and to the end of the population, respectively.
3. A local search strategy with multiple mutations applied in each iteration.
In the following text, the term "best tree" is used to denote the tree rooted in the best-performing node.
3.1 Depthwise Selection Strategy
The first modification focuses on the strategy for selecting the nodes to be mu-
tated. In the standard SNGP, the node to be mutated is chosen at random.
This means that all function nodes have the same probability of being selected, irrespective of (1) how well they are performing and (2) how well the trees of which they are a part are performing. This is not in line with the evolutionary paradigm, where well-fit individuals should have a higher chance to take part in the evolution of the population.
One way to remedy this is to select nodes according to their fitness.
However, this would prefer just the root nodes of trees with high fitness while
neglecting the nodes at the deeper levels of such well-performing trees which
themselves have rather poor fitness. In fact, imposing high selection pressure on
the root nodes might be counter-productive in the end as the mutations applied
on the root nodes are less likely to bring an improvement than mutations applied
on the deeper structures of the trees.
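Purely to illustrate the idea, and not necessarily with the authors' exact weighting, such a selection could bias a roulette wheel towards deeper nodes of well-performing trees:

import random

def depthwise_select(node_indices, tree_quality, depth, rng=None):
    """Roulette-wheel selection over function nodes, where a node's weight grows
    with the quality of the best tree it belongs to and with its depth below
    that tree's root (so root nodes of good trees are not over-selected).
    tree_quality and depth map node index -> value; the product used as the
    weight is an assumption made for this sketch."""
    rng = rng or random.Random()
    weights = [tree_quality[i] * (1.0 + depth[i]) for i in node_indices]
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for idx, w in zip(node_indices, weights):
        acc += w
        if acc >= r:
            return idx
    return node_indices[-1]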
We propose a selection strategy that takes into account the quality of the
mutated trees, so that better performing trees are preferred, as well as the depth

References
– Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 67(2), 301-320 (2005)
– Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22 (2010)
– Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press (1992)
Frequently Asked Questions (12)
Q1. What contributions have the authors mentioned in the paper "Hybrid single node genetic programming for symbolic regression"?

This paper presents a first step of their research on designing an effective and efficient GP-based method for symbolic regression. First, the authors propose three extensions of the standard Single Node GP, namely (1) a selection strategy for choosing nodes to be mutated based on depth and performance of the nodes, (2) operators for placing a compact version of the best-performing graph to the beginning and to the end of the population, respectively, and (3) a local search strategy with multiple mutations applied in each iteration. The authors then propose two variants of hybrid SNGP utilizing a linear regression technique, LASSO, to improve its performance. The achieved results are promising, showing the potential of the proposed modifications to improve the performance of the SNGP algorithm.

The next step of their research will be a thorough experimental evaluation of the modified SNGP algorithms, with the primary objectives being the speed of convergence and the ability to react quickly to changes in the environment, so that the algorithm can be deployed within the dynamic symbolic regression scenario.

The proposed modifications of the SNGP algorithm are configured with the following parameters:
– upToN ∈ {1, 5},
– selection is either random (denoted as 'r') or depthwise (denoted as 'd'),
– moveType is either moveLeft (denoted as 'l'), moveRight (denoted as 'r'), or no move (denoted as 'n').

Further investigations will include utilization of new mutation operators, identification of suitable "high-level" basic functions for the SNGP's function set, design of mechanisms to evolve inner constants of the models, and mechanisms for escaping from local optima.

Standard GP with a generational replacement strategy was used with the following parameters:
– Function set: {+, -, *, /}
– Terminal set: {x1, x2, 1.0}
– Population size: 500
– Initialization method: ramped half-and-half
– Tournament selection: 5 candidates
– Number of generations: 55, i.e. 54 generations plus initialization of the whole population
– Crossover probability: 90%
– Reproduction probability: 10%
– Probability of choosing an internal node as crossover point: 90%
For the experiments with GP, the authors used the Java-based Evolutionary Computation Research System ECJ 22.

Checked using the t-test with the significance level α = 0.05.
It has widely been reported in the literature that evolutionary algorithms work much better when hybridized with local search techniques, a concept known as memetic algorithms [7].

Several methods have emerged [1], [2], [15], [21], [22] that explicitly restrict the class of models to generalized linear models, i.e. to a linear combination of possibly non-linear basis functions.

The complexity of the LASSO model is controlled by (1) the maximal depth of features evolved in the population and (2) the maximum number of features the LASSO model can be composed of. 

This paper deals with the Single Node Genetic Programming method and proposes modifications and ways of hybridization to improve its performance.

Each feature f_i is evolved in a separate run of the SNGP (line 6) such that it correlates the most with the residuals R (i.e. the vector of error values over all training samples) produced by the current LASSO regression model composed of i−1 features.
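A compact sketch of this iterative scheme is given below (our own simplification, using scikit-learn's Lasso for the linear-regression step and a placeholder evolve_feature(X, residuals) standing in for the inner SNGP run that returns one evolved feature as a callable):

import numpy as np
from sklearn.linear_model import Lasso

def hybrid_sngp_lasso(X, y, max_features, evolve_feature, alpha=0.01):
    """Build a linear-in-parameters model one evolved feature at a time.
    After each new feature, LASSO re-fits the sparse linear combination and
    the residuals of that model drive the evolution of the next feature."""
    features = []                                        # evolved nonlinear basis functions
    residuals = y.copy()
    model = None
    for _ in range(max_features):                        # cap on model complexity
        features.append(evolve_feature(X, residuals))    # inner SNGP run
        Phi = np.column_stack([f(X) for f in features])  # feature matrix
        model = Lasso(alpha=alpha).fit(Phi, y)           # sparse linear combination
        residuals = y - model.predict(Phi)               # errors drive the next feature
    return model, features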

there is a clear trend showing that the SNGP without LASSO is doing well on rather simple benchmarks f1 and f2 (it is even better than both hybrid algorithms on f2), i.e. the polynomials that involve only trivial integer constants.