Open Access Journal Article

Learning to Solve Planning Problems Efficiently by Means of Genetic Programming

TL;DR
Empirical results show that the proposed approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance.
Abstract
Declarative problem solving, such as planning, poses interesting challenges for Genetic Programming (GP). There have been recent attempts to apply GP to planning that fit two approaches: (a) using GP to search in plan space or (b) to evolve a planner. In this article, we propose to evolve only the heuristics to make a particular planner more efficient. This approach is more feasible than (b) because it does not have to build a planner from scratch but can take advantage of already existing planning systems. It is also more efficient than (a) because once the heuristics have been evolved, they can be used to solve a whole class of different planning problems in a planning domain, instead of running GP for every new planning problem. Empirical results show that our approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance. Additionally, we experiment with a new genetic operator - Instance-Based Crossover - that is able to use traces of the base planner as raw genetic material to be injected into the evolving population.



Learning to Solve Planning Problems Efficiently
by Means of Genetic Programming
Ricardo Aler aler@inf.uc3m.es
Department of Computer Science, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Daniel Borrajo dborrajo@ia.uc3m.es
Department of Computer Science, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Pedro Isasi isasi@ia.uc3m.es
Department of Computer Science, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Abstract
Declarative problem solving, such as planning, poses interesting challenges for Genetic Programming (GP). There have been recent attempts to apply GP to planning that fit two approaches: (a) using GP to search in plan space or (b) to evolve a planner. In this article, we propose to evolve only the heuristics to make a particular planner more efficient. This approach is more feasible than (b) because it does not have to build a planner from scratch but can take advantage of already existing planning systems. It is also more efficient than (a) because once the heuristics have been evolved, they can be used to solve a whole class of different planning problems in a planning domain, instead of running GP for every new planning problem. Empirical results show that our approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance. Additionally, we experiment with a new genetic operator, Instance-Based Crossover, that is able to use traces of the base planner as raw genetic material to be injected into the evolving population.
Keywords
Genetic planning, genetic programming, evolving heuristics, planning, search.
1 Introduction
AI planners aim to achieve a set of goals, starting from an initial state, by using operators that represent the available actions of a task domain. Traditional approaches use domain-independent planners for generating plans (Bonet and Geffner, 1999; Blum and Furst, 1995; Penberthy and Weld, 1992; Veloso et al., 1995). Some recent approaches to planning use Genetic Programming (GP). The genetic planning approach was started by Koza, who evolved a planner that solved a very specific set of problems in the blocks world domain (Koza, 1989, 1992). The ways GP can be applied to planning can be summarized as follows: [1]

- To evolve a plan. In this context, a plan can be seen as a program that changes the initial state of a planning problem (given as input) into the desired or goal state. Considered as a program, it can be evolved by GP.
- To evolve a planning program for a particular domain. The planner should be, in principle, able to solve a set or a pre-defined subset of the problems in the domain. Taking this idea to the extreme, a truly domain-independent planner could be evolved, although this would require a daunting computational effort and seems currently unfeasible.
- To evolve heuristics to improve the efficiency of an already existing planner, which is able to solve planning problems on its own, but in a very inefficient way.

[1] We follow a classification proposed by Spector (1994). We have added another item to the classification where our own work is included.
We will now analyze each alternative in depth.
1.1 Evolving Plans
Handley (1994) used GP to evolve plans for a specific subset of problems in the blocks world domain. Muslea (1997) generalized, extended, and formalized this idea in his SINERGY system and showed how any STRIPS-like planning problem can be translated into an equivalent GP problem. He tested it successfully in several domains, obtaining better performance than UCPOP (Penberthy and Weld, 1992) on difficult problems. Westerberg and Levine (2000) followed a similar approach and also reported good results. An important advantage of this approach is its flexibility: planning operators need not be coded in the STRIPS formalism; they can be arbitrary state-transforming programs. This is also its main drawback, as such planners cannot use the underlying logical representation of the operators to reason about the world. For instance, the fitness function relies on a continuous measure of closeness between the goals and the current state of the world. If no such measure can be specified, the fitness function will be of little help to GP. This is the case for the "switch" goal mentioned by Handley (a switch can be on or off, but not "close to on").
Also, we believe that this kind of genetic planner would be easily deceived by simplistic fitness functions where closeness measures give a false idea of the actual distance to the desired goal. Consider for instance a robot moving in a labyrinth (the Euclidean distance to the goal position is very misleading) or the classical 8-puzzle. Some deliberative planning approaches based on search (McDermott, 1999; Bonet and Geffner, 1999) have to build complex heuristic functions from the domain information so that good distance estimations to the goal can be made. This shows that the problem is not a trivial one.
A more important problem is that search has to be done every time a problem needs to be solved. As GP is a weak search method and planning is NP-hard (Bylander, 1994), this approach is not expected to perform well on very big problems. It is possible, though, that the crossover operator is a good heuristic way to explore the space of plans in some domains, although we do not know of any explicit empirical support for this.
1.2 Evolving Domain-Dependent Planners
Koza was the first to follow this approach by evolving a planner that solved a very specific subset of problems in the blocks world domain (Koza, 1992). Spector built a better system that was able to achieve a range of goal conditions from a range of initial conditions (Spector, 1994). However, only problems with three and four blocks were tested. It seems that evolving a full-blown planner is a hard task. On the other hand, this approach solves a problem the previous one had, because its fitness function evaluates many different fitness cases (i.e., planning problems). Therefore, even if all the information we can get from a single fitness case is 1 or 0 (goal solved or not solved), the fitness function can still have a wide range of values to rank a population of individuals (i.e., one individual solves 3 out of 300 fitness cases, another 100 of them, etc.).
So, many fitness cases may be able to compensate for poor closeness measures. Also, once a planner for a domain has been learned, only a small amount of search should be necessary to solve particular problems in the domain, as opposed to the previous approach. For instance, it is well known that in the classical blocks world domain, no search is required to solve problems of any arbitrary size: all the planner has to do is move all the blocks to the table and then build the desired towers. Of course, finding optimal plans is another matter altogether (Gupta and Nau, 1992), although there are simple algorithms that can find near-optimal plans in the blocks world (Slaney and Thiebaux, 2001).
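That sweep strategy is simple enough to write down. The following is a minimal sketch under assumed names and a dict-based state encoding (illustrative only, not the article's representation):

```python
def sweep_plan(on, goal_towers):
    """Return a (generally suboptimal) plan: clear everything, then rebuild.

    on: dict mapping each block to its support (another block or "table").
    goal_towers: list of goal towers, each listed from bottom block to top.
    """
    on = dict(on)
    plan = []
    # Phase 1: while some block sits on another block, move a clear one down.
    while True:
        supports = {s for s in on.values() if s != "table"}
        movable = [b for b in on if b not in supports and on[b] != "table"]
        if not movable:
            break
        block = movable[0]
        plan.append(("move-to-table", block))
        on[block] = "table"
    # Phase 2: build each goal tower bottom-up (bottom blocks stay on the table).
    for tower in goal_towers:
        for below, above in zip(tower, tower[1:]):
            plan.append(("stack", above, below))
    return plan

# Example: A is on B; B and C are on the table; goal: the tower C-B-A.
print(sweep_plan({"A": "B", "B": "table", "C": "table"}, [["C", "B", "A"]]))
```

The plan is linear in the number of blocks, which is exactly why no search is needed, at the price of optimality.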
1.3 Evolving Planning Heuristics
This is the approach described in this article (Aler et al., 1998a; Aler et al., 1998b), which has been implemented in a system called EVOCK (Evolving Control Knowledge). Instead of evolving the whole domain-dependent planner, we start with a domain-independent planner. Domain-independent planning is known to be inefficient because of the unguided search it has to carry out. However, domain-dependent heuristics can be supplied to the planner so that it makes informed decisions during search (i.e., to prune the search graph).
In this work, we have used GP to evolve such heuristics. We believe that this task should be easier for GP because classical planners are not brute-force problem solvers but include powerful domain-independent heuristics (such as means-ends analysis). Therefore, GP has to do only part of the work. Besides, GP can indirectly use the reasoning abilities of such planners, which the plan evolvers cannot do. In practice, this approach is like the evolving-the-planner approach, but only a smaller part of the planner needs to be evolved: the heuristics. Therefore, besides (allegedly) being an easier problem for GP, it enjoys the advantages of the previous approach: once a domain-dependent planning system has been found, obtaining plans for individual problems in that domain should be computationally less expensive than carrying out the whole search process anew for every planning problem, as the plan evolvers must do. The basic intuition here is that problems of different sizes in a domain can use the same heuristics. For instance, solving a 50-block problem in the blocks world should need about the same heuristics as solving a 49-block problem. In particular, this is useful for solving large problems in the domain, where a pure search process would get bogged down. In previous experiments, for example, it has been shown that simple heuristics scale well in certain domains (Borrajo and Veloso, 1997).
In the GP context, a heuristic is best viewed as follows. Let us suppose that we already have a program p that could benefit from advice given by a function h at some points in its execution. For instance, if p is a planner that mindlessly searches the state space by applying planning operators forward, then p could call h to get some advice about which operator to apply next, instead of applying one at random. h's input is (part of) the internal state of p. If p is the forward planner mentioned before, h would benefit from having as inputs the current planning situation, the desired goal(s), and some additional information about the internal state of p (e.g., what planning situations or nodes have already been explored). At this point, standard GP could already be applied to the problem of evolving h. Conceptually, it is no different from evolving a function for a wall-following robot or a program for the Santa Fe trail. However, instead of evaluating h directly, p has to be evaluated instead. In GP jargon, p is a wrapper around h. In our work, we have utilized a planning system called PRODIGY4.0 (Veloso et al., 1995), which is much more sophisticated than the random-walk planner mentioned above and allows writing heuristics declaratively.
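To make the wrapper idea concrete, here is a minimal sketch, assuming a naive forward planner of the kind just mentioned (not PRODIGY4.0, whose search and heuristic language are far richer); the function and parameter names are illustrative:

```python
import random

def forward_planner(state, goals, applicable, apply_op, h=None, max_steps=1000):
    """A naive forward planner p. When h is supplied, it is consulted at the
    decision point; otherwise an applicable operator is chosen at random.

    h receives (part of) p's internal state: the current state, the goals,
    the applicable operators, and the history of choices made so far.
    """
    history = []
    for _ in range(max_steps):
        if goals <= state:            # all goals hold: success
            return history            # the plan built so far
        ops = applicable(state)
        if not ops:
            return None               # dead end
        op = h(state, goals, ops, history) if h else random.choice(ops)
        history.append(op)
        state = apply_op(state, op)
    return None

# GP never scores h directly: a candidate h is evaluated by running the wrapper,
# e.g. forward_planner(s0, goals, applicable, apply_op, h=candidate).
```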
Additionally, evolving heuristics has a characteristic that might be exploited. Complete domain-independent planners can solve any solvable planning problem, given enough time. Therefore, fitness cases (i.e., training planning problems) can be pre-solved before being supplied to GP for learning. This preprocessing can be useful for extracting information about the fitness cases that can be used in different ways. For instance, we can know how much memory or how long it takes the domain-independent planner to solve a fitness case. This can be used in the fitness function to compare the performance of an individual to the base planner's performance. However, pre-processing the fitness problems offers a more interesting opportunity for evolving heuristics. Once a fitness case has been solved, all the steps and decisions the planner followed when solving it (i.e., the trace) are available. By analyzing this trace, the advice h that the planner would have benefited from when solving this problem can be obtained. Of course, this advice is useful only for this particular fitness case, but it could be used by the GP engine as a starting point to build more general heuristics. In this paper, we study a way to inject such "raw advice" into a GP system without modifying the GP algorithm substantially. We use the standard crossover operator for this purpose, but the second parent individual is taken from a non-evolving population that contains raw-advice heuristics. We call this operator the instance-based crossover (IBC), because it uses instances previously acquired after analyzing several planning traces. [2]

[2] In Aler et al. (1998a), we referred to it as a knowledge-based operator, because in a general sense, it uses knowledge previously acquired by another learning tool.
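The following is a minimal sketch of IBC under assumed representations: individuals are nested-list GP trees (operator first, then children), and advice_pool is the fixed, non-evolving population distilled from planner traces. Only the origin of the second parent differs from standard subtree crossover:

```python
import copy
import random

def all_paths(tree):
    """Yield the path (tuple of child indices) to every node of a GP tree."""
    yield ()
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):  # tree[0] is the operator
            for path in all_paths(child):
                yield (i,) + path

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_node(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    tree = copy.deepcopy(tree)
    parent = tree
    for i in path[:-1]:
        parent = parent[i]
    parent[path[-1]] = subtree
    return tree

def instance_based_crossover(population, advice_pool):
    """Standard subtree crossover, except the donor is a raw-advice individual."""
    receiver = random.choice(population)   # evolving individual
    donor = random.choice(advice_pool)     # fixed, trace-derived individual
    cut = random.choice(list(all_paths(receiver)))
    graft = get_node(donor, random.choice(list(all_paths(donor))))
    return set_node(receiver, cut, copy.deepcopy(graft))
```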
2 Evolving Planning Heuristics for PRODIGY4.0
As explained before, the goal of this article is to evolve heuristics h for programs p that can use them. In particular, our framework can be used for those programs that contain decision or backtracking points. For instance, programs that carry out search in a state space are very representative of the latter definition.
At a decision point, a program must choose one alternative from a set of them before continuing its execution. Usually, p has little or no information about which alternative is preferable. Depending on the problem, if p makes the wrong decision, it might never find a solution. A decision point can also be a backtracking point. This means that if p made the wrong decision at this point, it will eventually backtrack to it so that another alternative can be tried. In that case, p would have saved time if it had made the right decision from the start. Heuristics can help such programs p in different ways. For this paper, we define a heuristic h as a function that can help a program p to find solutions and/or improve efficiency. More formally, if a program p can choose from a set of alternatives A_d at decision point d, a heuristic function h_d can be defined for that point as:

h_d : S -> A_d

where S represents the set of possible internal states of the program p. However, a heuristic function need not return a single decision, but a set of them:

h_d : S -> 2^(A_d)

where 2^(A_d) represents the set of all subsets of A_d. In the latter case, h_d would make p more efficient if |h_d(s)| < |A_d| for many s in S.
There are other definitions of h_d that could be used. For instance, a heuristic function could return an ordered set. Also, heuristics can have other purposes besides improving performance, like improving the quality of a solution, which might depend on decisions made at certain decision points. But as this article focuses on the kinds of heuristics just defined, we will not make the formalism more complex than necessary.
It is not difficult to pose the problem of evolving such heuristics in a GP setting. For instance:

- Individuals: programs that represent a heuristic h_d for a decision point d. They take as input (i.e., terminals) features that characterize the internal state s of a program p and return a decision (or a subset of them) from A_d.
- Fitness cases: several problems appropriate for p. For instance, if the goal is to solve a Rubik's cube, different starting points could be provided. If the goal is to solve blocks world problems, different problems made of pairs of initial situations and goals could be provided.
- Fitness function: if the goal is to solve as many problems as possible, the fitness function could evaluate individuals by counting how many fitness cases are solved when p is helped by the individual, as in the sketch below. If the goal is to improve the efficiency of p, then the time/space required to solve the fitness cases could be measured. Multi-objective fitness functions could try to achieve both goals at the same time.
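A minimal sketch of the counting scheme described in the last item above; solve_with, Result, and the cost measure are assumptions for illustration rather than EVOCK's actual fitness function:

```python
from collections import namedtuple

# Result of running the base program p on one fitness case with a candidate
# heuristic plugged in; "cost" could be time, memory, or nodes expanded.
Result = namedtuple("Result", "solved cost")

def fitness(individual, fitness_cases, solve_with):
    """Count solved fitness cases; also accumulate cost for efficiency goals.

    solve_with(case, heuristic) is an assumed wrapper that runs p helped by
    the candidate heuristic and reports whether the case was solved.
    """
    solved, total_cost = 0, 0.0
    for case in fitness_cases:
        result = solve_with(case, heuristic=individual)
        if result.solved:
            solved += 1
            total_cost += result.cost
    # A multi-objective scheme could trade off both numbers; here we simply
    # return them for the selection mechanism to combine.
    return solved, total_cost
```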
The aim of this paper is to evolve heuristics for a planning program called PRODIGY4.0. PRODIGY4.0 is a STRIPS-based, domain-independent planning system that carries out bidirectional search in a state space. PRODIGY4.0's inputs are a description of the domain and a planning problem (I, G), where I is the initial situation and G is the goal to be solved. A domain description has two main components:

- a taxonomy of objects in the domain. For instance, in a logistics transportation domain (Veloso, 1994), there can be carriers, locations, and packages. In turn, there can be several types of carriers (ships, planes, trucks, etc.) as well as several types of locations (airports, post-offices, ports, etc.).
- a list of schema operators for the domain. They are described using the STRIPS syntax, although PDL4.0 (PRODIGY4.0 Description Language) allows for more complex logical expressions that involve quantifiers. For instance, in a logistics transportation domain, there could be operators for loading a plane, for unloading it, for moving it to a different location, etc. Schema operators can contain free variables. A schema operator O whose free variables have been bound by a binding b is called a grounded operator and will be represented as Ob.
PRODIGY4.0's output is a plan. Formally, a plan is a sequence of grounded planning operators (O_1 b_1, O_2 b_2, ..., O_n b_n) that transforms I into another state S where the goal G is fulfilled: G(S) = True.
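To make that definition concrete, here is a minimal plan-validity check under plain STRIPS semantics (states as sets of ground literals with add/delete lists); the field names are assumptions, and PRODIGY4.0's extended representation is richer:

```python
from collections import namedtuple

# A grounded operator Ob: precondition, add, and delete sets of ground literals.
GroundOp = namedtuple("GroundOp", "name pre add delete")

def is_valid_plan(initial_state, goal, plan):
    """Check that applying the grounded operators in order makes G(S) true."""
    state = set(initial_state)
    for op in plan:
        if not op.pre <= state:          # preconditions must hold to apply op
            return False
        state = (state - op.delete) | op.add
    return goal <= state                 # the goal holds in the final state

# Example from a logistics-like domain: load a package onto a truck.
load = GroundOp("load(p1, t1, l1)",
                pre={"at(p1, l1)", "at(t1, l1)"},
                add={"in(p1, t1)"},
                delete={"at(p1, l1)"})
print(is_valid_plan({"at(p1, l1)", "at(t1, l1)"}, {"in(p1, t1)"}, [load]))  # True
```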
In PRODIGY4.0, states S, goals G, and operators are represented using extended STRIPS. We refer readers to Fikes and Nilsson (1971) for details about STRIPS and
Citations
Proceedings Article

Learning macro-actions for arbitrary planners and domains

TL;DR: This paper presents an offline macro learning method that works with arbitrarily chosen planners and domains, and generates macros from plans of some of the given problems under the guidance of a genetic algorithm.
Journal Article

Contrasting meta-learning and hyper-heuristic research: the role of evolutionary algorithms

TL;DR: The main contribution of the paper is to contrast meta-learning and hyper-heuristic methods and concepts, in order to promote awareness and cross-fertilisation of ideas across the (by and large, non-overlapping) communities of meta-learning and hyper-heuristic researchers.
Journal Article

Bounded rationality in agent‐based models: experiments with evolutionary programs

TL;DR: This paper reports on how changing parameters in one variant of evolutionary programming, genetic programming, affects the representation of bounded rationality in software agents.
Journal Article

Land use in the southern Yucatán peninsular region of Mexico: Scenarios of population and institutional change

TL;DR: This article examines an application of the Southern Yucatan Peninsular Region Integrated Assessment (SYPRIA), a scenario-based spatially explicit model designed to examine and project land use in Mexico that combines Geographic Information Systems with agent-based modeling, cellular modeling, and genetic programming.
Proceedings Article

Correcting and improving imitation models of humans for Robosoccer agents

TL;DR: The aim of this paper is to use machine learning techniques to obtain models of humans playing Robosoccer, and to adapt them against more difficult opponents than the ones beatable by the human.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Genetic Programming: On the Programming of Computers by Means of Natural Selection

TL;DR: This book discusses the evolution of architecture, primitive functions, terminals, sufficiency, and closure, and the role of representation and the lens effect in genetic programming.
Journal Article

STRIPS: A new approach to the application of theorem proving to problem solving

TL;DR: In this paper, the authors describe a problem solver called STRIPS that attempts to find a sequence of operators in a space of world models that transforms a given initial world model into one in which a given goal formula can be proven to be true.
Proceedings Article

Fast planning through planning graph analysis

TL;DR: A new approach to planning in STRIPS-like domains based on constructing and analyzing a compact structure the authors call a Planning Graph is introduced, and a new planner, Graphplan, is described that uses this paradigm.
Book

Genetic Programming II: Automatic Discovery of Reusable Programs

TL;DR: This book presents a method to automatically decompose a program into solvable components, called automatically defined functions (ADF), and then presents case studies of the application of this method to a variety of problems.
Frequently Asked Questions (10)
Q1. What are the contributions in "Learning to Solve Planning Problems Efficiently by Means of Genetic Programming"?

In this article, the authors propose to evolve only the heuristics to make a particular planner more efficient. Empirical results show that their approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance.

The problems the authors are solving at testing time are hard enough that other modern planners, like UCPOP and GRAPHPLAN, cannot solve them either. Additionally, the authors have experimented with a new genetic operator, the Instance-Based Crossover (IBC), that is able to use traces of the base planner as raw genetic material to be injected into the evolving population. Finally, although the authors have focused on STRIPS planning, they would like to extend their approach to evolving heuristics for other search problems. It may be possible to learn heuristics using the simpler problems as fitness cases that could be used to guide the search through program space to solve tougher problems, which is what the authors have done in this article to solve planning problems.

The most important requirement for the base planner is that it can use heuristics and that they can be loaded into the system easily. 

The related operators (i.e., disjoin and hierarchy specialization) are not included in the operator set because the authors believe that join and hierarchy generalization are good biases for EVOCK. 

Perhaps Pareto optimization techniques, where selection of the actual best individual is deferred until the end of the run, could be used to solve this problem.

The authors believe that co-evolution techniques (Berlanga, 2000) and dynamic training subset selection policies (Gathercole and Ross, 1994) could be used for that purpose. 

It was expected that injecting parts of these individuals into the main population could be useful, because it would add useful code that could be crossed over and mutated by the genetic operators.

AI Planners aim to achieve a set of goals, starting from an initial state, by using operators that represent the available actions of a task domain. 

Many such symbolic regression problems can be varied from simpler problems to more difficult problems; just as in the blocks world, problem difficulty can go from 3 blocks to 50 blocks.

If the actual difference is smaller than or equal to this threshold, then the probability of obtaining such a small difference by mere chance is smaller than 0.01, and that hypothesis is rejected.