Open Access Journal Article

Learning to Solve Planning Problems Efficiently by Means of Genetic Programming

TL;DR
Empirical results show that the proposed approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance.
Abstract
Declarative problem solving, such as planning, poses interesting challenges for Genetic Programming (GP). There have been recent attempts to apply GP to planning that fit two approaches: (a) using GP to search in plan space or (b) to evolve a planner. In this article, we propose to evolve only the heuristics to make a particular planner more efficient. This approach is more feasible than (b) because it does not have to build a planner from scratch but can take advantage of already existing planning systems. It is also more efficient than (a) because once the heuristics have been evolved, they can be used to solve a whole class of different planning problems in a planning domain, instead of running GP for every new planning problem. Empirical results show that our approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance. Additionally, we experiment with a new genetic operator - Instance-Based Crossover - that is able to use traces of the base planner as raw genetic material to be injected into the evolving population.



Learning to Solve Planning Problems Efficiently
by Means of Genetic Programming
Ricardo Aler aler@inf.uc3m.es
Department of Computer Science, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Daniel Borrajo dborrajo@ia.uc3m.es
Department of Computer Science, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Pedro Isasi isasi@ia.uc3m.es
Department of Computer Science, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Abstract
Declarative problem solving, such as planning, poses interesting challenges for Genetic Programming (GP). There have been recent attempts to apply GP to planning that fit two approaches: (a) using GP to search in plan space or (b) to evolve a planner. In this article, we propose to evolve only the heuristics to make a particular planner more efficient. This approach is more feasible than (b) because it does not have to build a planner from scratch but can take advantage of already existing planning systems. It is also more efficient than (a) because once the heuristics have been evolved, they can be used to solve a whole class of different planning problems in a planning domain, instead of running GP for every new planning problem. Empirical results show that our approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance. Additionally, we experiment with a new genetic operator, Instance-Based Crossover, that is able to use traces of the base planner as raw genetic material to be injected into the evolving population.
Keywords
Genetic planning, genetic programming, evolving heuristics, planning, search.
1 Introduction
AI planners aim to achieve a set of goals, starting from an initial state, by using operators that represent the available actions of a task domain. Traditional approaches use domain-independent planners for generating plans (Bonet and Geffner, 1999; Blum and Furst, 1995; Penberthy and Weld, 1992; Veloso et al., 1995). Some recent approaches to planning use Genetic Programming (GP). The genetic planning approach was started by Koza, who evolved a planner that solved a very specific set of problems in the blocks world domain (Koza, 1989, 1992). The ways GP can be applied to planning can be summarized as follows: [1]

- To evolve a plan. In this context, a plan can be seen as a program that changes the initial state of a planning problem (given as input) into the desired or goal state. Considered as a program, it can be evolved by GP.
- To evolve a planning program for a particular domain. The planner should be, in principle, able to solve a set or a pre-defined subset of the problems in the domain. Taking this idea to the extreme, a truly domain-independent planner could be evolved, although this would require a daunting computational effort and seems currently unfeasible.
- To evolve heuristics to improve the efficiency of an already existing planner, which is able to solve planning problems on its own, but in a very inefficient way.

[1] We follow a classification proposed by Spector (1994). We have added another item to the classification where our own work is included.
We will now analyze each alternative in depth.
1.1 Evolving Plans
Handley (1994) used GP to evolve plans for a specific subset of problems in the blocks world domain. Muslea (1997) generalized, extended, and formalized this idea in his SINERGY system and showed how any STRIPS-like planning problem can be translated into an equivalent GP problem. He tested it successfully in several domains, obtaining better performance than UCPOP (Penberthy and Weld, 1992) on difficult problems. Westerberg and Levine (2000) followed a similar approach and also reported good results. An important advantage of this approach is its flexibility: planning operators need not be coded in the STRIPS formalism; they can be arbitrary state-transforming programs. This is also its main drawback, as such planners cannot use the underlying logical representation of the operators to reason about the world. For instance, the fitness function relies on a continuous measure of closeness between the goals and the current state of the world. If no such measure can be specified, the fitness function will be of little help to GP. This is the case for the "switch" goal mentioned by Handley (a switch can be on or off, but not "close to on").
Also, we believe that this kind of genetic planner would be easily deceived by simplistic fitness functions where closeness measures give a false idea of the actual distance to the desired goal. Consider for instance a robot moving in a labyrinth (the Euclidean distance to the goal position is very misleading) or the classical 8-puzzle. Some deliberative planning approaches based on search (McDermott, 1999; Bonet and Geffner, 1999) have to build complex heuristic functions from the domain information so that good distance estimations to the goal can be made. This shows that the problem is not a trivial one.
A more important problem is that search has to be done every time a problem needs to be solved. As GP is a weak search method and planning is NP-hard (Bylander, 1994), this approach is not expected to perform well on very big problems. It is possible, though, that the crossover operator is a good heuristic way to explore the space of plans in some domains, although we do not know of any explicit empirical support for this.
1.2 Evolving Domain-Dependent Planners
Koza was the first to follow this approach by evolving a planner that solved a very specific subset of problems in the blocks world domain (Koza, 1992). Spector built a better system that was able to achieve a range of goal conditions from a range of initial conditions (Spector, 1994). However, only problems with three and four blocks were tested. It seems that evolving a full-blown planner is a hard task. On the other hand, this approach solves a problem the previous one had, because its fitness function evaluates many different fitness cases (i.e., planning problems). Therefore, even if all the information we can get from a single fitness case is 1 or 0 (goal solved or not solved), the fitness function can still have a wide range of values to rank a population of individuals (i.e., one individual solves 3 out of 300 fitness cases, another 100 of them, etc.).
So, many fitness cases may be able to compensate for poor closeness measures. Also, once a planner for a domain has been learned, only a small amount of search should be necessary to solve particular problems in the domain, as opposed to the previous approach. For instance, it is well known that in the classical blocks world domain, no search is required to solve problems of any arbitrary size: all the planner has to do is move all the blocks to the table and then build the desired towers. Of course, finding optimal plans is another matter altogether (Gupta and Nau, 1992), although there are simple algorithms that can find near-optimal plans in the blocks world (Slaney and Thiebaux, 2001).
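That sweep strategy is simple enough to write down. The following is a minimal sketch under assumed names and a dict-based state encoding (illustrative only, not the article's representation):

```python
def sweep_plan(on, goal_towers):
    """Return a (generally suboptimal) plan: clear everything, then rebuild.

    on: dict mapping each block to its support (another block or "table").
    goal_towers: list of goal towers, each listed from bottom block to top.
    """
    on = dict(on)
    plan = []
    # Phase 1: while some block sits on another block, move a clear one down.
    while True:
        supports = {s for s in on.values() if s != "table"}
        movable = [b for b in on if b not in supports and on[b] != "table"]
        if not movable:
            break
        block = movable[0]
        plan.append(("move-to-table", block))
        on[block] = "table"
    # Phase 2: build each goal tower bottom-up (bottom blocks stay on the table).
    for tower in goal_towers:
        for below, above in zip(tower, tower[1:]):
            plan.append(("stack", above, below))
    return plan

# Example: A is on B; B and C are on the table; goal: the tower C-B-A.
print(sweep_plan({"A": "B", "B": "table", "C": "table"}, [["C", "B", "A"]]))
```

The plan is linear in the number of blocks, which is exactly why no search is needed, at the price of optimality.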
1.3 Evolving Planning Heuristics
This is the approach described in this article (Aler et al., 1998a; Aler et al., 1998b), which has been implemented in a system called EVOCK (Evolving Control Knowledge). Instead of evolving the whole domain-dependent planner, we start with a domain-independent planner. Domain-independent planning is known to be inefficient because of the unguided search it has to carry out. However, domain-dependent heuristics can be supplied to the planner so that it makes informed decisions during search (i.e., to prune the search graph).
In this work, we have used GP to evolve such heuristics. We believe that this task should be easier for GP because classical planners are not brute-force problem solvers but include powerful domain-independent heuristics (such as means-ends analysis). Therefore, GP has to do only part of the work. Besides, GP can indirectly use the reasoning abilities of such planners, which the plan evolvers cannot do. In practice, this approach is like the evolving-the-planner approach, but only a smaller part of the planner needs to be evolved: the heuristics. Therefore, besides (allegedly) being an easier problem for GP, it enjoys the advantages of the previous approach: once a domain-dependent planning system has been found, obtaining plans for individual problems in that domain should be computationally less expensive than carrying out the whole search process anew for every planning problem, as the plan evolvers must do. The basic intuition here is that problems of different sizes in a domain can use the same heuristics. For instance, solving a 50-block problem in the blocks world should need about the same heuristics as solving a 49-block problem. In particular, this is useful for solving large problems in the domain, where a pure search process would get bogged down. In previous experiments, for example, it has been shown that simple heuristics scale well in certain domains (Borrajo and Veloso, 1997).
In the GP context, a heuristic is best viewed as follows. Let us suppose that we already have a program p that could benefit from advice given by a function h at some points in its execution. For instance, if p is a planner that mindlessly searches the state space by applying planning operators forward, then p could call h to get some advice about which operator to apply next, instead of applying one at random. h's input is (part of) the internal state of p. If p is the forward planner mentioned before, h would benefit from having as inputs the current planning situation, the desired goal(s), and some additional information about the internal state of p (e.g., what planning situations or nodes have already been explored). At this point, standard GP could already be applied to the problem of evolving h. Conceptually, it is no different from evolving a function for a wall-following robot or a program for the Santa Fe trail. However, instead of evaluating h directly, p has to be evaluated instead. In GP jargon, p is a wrapper around h. In our work, we have utilized a planning system called PRODIGY4.0 (Veloso et al., 1995), which is much more sophisticated than the random-walk planner mentioned above and allows writing heuristics declaratively.
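To make the wrapper idea concrete, here is a minimal sketch, assuming a naive forward planner of the kind just mentioned (not PRODIGY4.0, whose search and heuristic language are far richer); the function and parameter names are illustrative:

```python
import random

def forward_planner(state, goals, applicable, apply_op, h=None, max_steps=1000):
    """A naive forward planner p. When h is supplied, it is consulted at the
    decision point; otherwise an applicable operator is chosen at random.

    h receives (part of) p's internal state: the current state, the goals,
    the applicable operators, and the history of choices made so far.
    """
    history = []
    for _ in range(max_steps):
        if goals <= state:            # all goals hold: success
            return history            # the plan built so far
        ops = applicable(state)
        if not ops:
            return None               # dead end
        op = h(state, goals, ops, history) if h else random.choice(ops)
        history.append(op)
        state = apply_op(state, op)
    return None

# GP never scores h directly: a candidate h is evaluated by running the wrapper,
# e.g. forward_planner(s0, goals, applicable, apply_op, h=candidate).
```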
Additionally, evolving heuristics has a characteristic that might be exploited. Complete domain-independent planners can solve any solvable planning problem, given enough time. Therefore, fitness cases (i.e., training planning problems) can be pre-solved before being supplied to GP for learning. This preprocessing can be useful for extracting information about the fitness cases that can be used in different ways. For instance, we can know how much memory or how long it takes the domain-independent planner to solve a fitness case. This can be used in the fitness function to compare the performance of an individual to the base planner's performance. However, pre-processing the fitness problems offers a more interesting opportunity for evolving heuristics. Once a fitness case has been solved, all the steps and decisions the planner followed when solving it (i.e., the trace) are available. By analyzing this trace, the advice h that the planner would have benefited from when solving this problem can be obtained. Of course, this advice is useful only for this particular fitness case, but it could be used by the GP engine as a starting point to build more general heuristics. In this paper, we study a way to inject such "raw advice" into a GP system without modifying the GP algorithm substantially. We use the standard crossover operator for this purpose, but the second parent individual is taken from a non-evolving population that contains raw-advice heuristics. We call this operator the instance-based crossover (IBC), because it uses instances previously acquired after analyzing several planning traces. [2]

[2] In Aler et al. (1998a), we referred to it as a knowledge-based operator, because in a general sense, it uses knowledge previously acquired by another learning tool.
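The following is a minimal sketch of IBC under assumed representations: individuals are nested-list GP trees (operator first, then children), and advice_pool is the fixed, non-evolving population distilled from planner traces. Only the origin of the second parent differs from standard subtree crossover:

```python
import copy
import random

def all_paths(tree):
    """Yield the path (tuple of child indices) to every node of a GP tree."""
    yield ()
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):  # tree[0] is the operator
            for path in all_paths(child):
                yield (i,) + path

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_node(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    tree = copy.deepcopy(tree)
    parent = tree
    for i in path[:-1]:
        parent = parent[i]
    parent[path[-1]] = subtree
    return tree

def instance_based_crossover(population, advice_pool):
    """Standard subtree crossover, except the donor is a raw-advice individual."""
    receiver = random.choice(population)   # evolving individual
    donor = random.choice(advice_pool)     # fixed, trace-derived individual
    cut = random.choice(list(all_paths(receiver)))
    graft = get_node(donor, random.choice(list(all_paths(donor))))
    return set_node(receiver, cut, copy.deepcopy(graft))
```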
2 Evolving Planning Heuristics for PRODIGY4.0
As explained before, the goal of this article is to evolve heuristics h for programs p that can use them. In particular, our framework can be used for those programs that contain decision or backtracking points. For instance, programs that carry out search in a state space are very representative of the latter definition.
At a decision point, a program must choose one alternative from a set of them before continuing its execution. Usually, p has little or no information about which alternative is preferable. Depending on the problem, if p makes the wrong decision, it might never find a solution. A decision point can also be a backtracking point. This means that if p made the wrong decision at this point, it will eventually backtrack to it so that another alternative can be tried. In that case, p would have saved time if it had made the right decision from the start. Heuristics can help such programs p in different ways. For this paper, we define a heuristic h as a function that can help a program p to find solutions and/or improve efficiency. More formally, if a program p can choose from a set of alternatives A_d at decision point d, a heuristic function h_d can be defined for that point as:

h_d : S -> A_d

where S represents the set of possible internal states of the program p. However, a heuristic function need not return a single decision, but a set of them:

h_d : S -> 2^(A_d)

where 2^(A_d) represents the set of all subsets of A_d. In the latter case, h_d would make p more efficient if |h_d(s)| < |A_d| for many s in S.
There are other definitions of h_d that could be used. For instance, a heuristic function could return an ordered set. Also, heuristics can have other purposes besides improving performance, like improving the quality of a solution, which might depend on decisions made at certain decision points. But as this article focuses on the kinds of heuristics just defined, we will not make the formalism more complex than necessary.
It is not difficult to pose the problem of evolving such heuristics in a GP setting. For instance:

- Individuals: programs that represent a heuristic h_d for a decision point d. They take as input (i.e., terminals) features that characterize the internal state s of a program p and return a decision (or a subset of them) from A_d.
- Fitness cases: several problems appropriate for p. For instance, if the goal is to solve a Rubik's cube, different starting points could be provided. If the goal is to solve blocks world problems, different problems made of pairs of initial situations and goals could be provided.
- Fitness function: if the goal is to solve as many problems as possible, the fitness function could evaluate individuals by counting how many fitness cases are solved when p is helped by the individual, as in the sketch below. If the goal is to improve the efficiency of p, then the time/space required to solve the fitness cases could be measured. Multi-objective fitness functions could try to achieve both goals at the same time.
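A minimal sketch of the counting scheme described in the last item above; solve_with, Result, and the cost measure are assumptions for illustration rather than EVOCK's actual fitness function:

```python
from collections import namedtuple

# Result of running the base program p on one fitness case with a candidate
# heuristic plugged in; "cost" could be time, memory, or nodes expanded.
Result = namedtuple("Result", "solved cost")

def fitness(individual, fitness_cases, solve_with):
    """Count solved fitness cases; also accumulate cost for efficiency goals.

    solve_with(case, heuristic) is an assumed wrapper that runs p helped by
    the candidate heuristic and reports whether the case was solved.
    """
    solved, total_cost = 0, 0.0
    for case in fitness_cases:
        result = solve_with(case, heuristic=individual)
        if result.solved:
            solved += 1
            total_cost += result.cost
    # A multi-objective scheme could trade off both numbers; here we simply
    # return them for the selection mechanism to combine.
    return solved, total_cost
```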
The aim of this paper is to evolve heuristics for a planning program called PRODIGY4.0. PRODIGY4.0 is a STRIPS-based, domain-independent planning system that carries out bidirectional search in a state space. PRODIGY4.0's inputs are a description of the domain and a planning problem (I, G), where I is the initial situation and G is the goal to be solved. A domain description has two main components:

- a taxonomy of objects in the domain. For instance, in a logistics transportation domain (Veloso, 1994), there can be carriers, locations, and packages. In turn, there can be several types of carriers (ships, planes, trucks, etc.) as well as several types of locations (airports, post-offices, ports, etc.).
- a list of schema operators for the domain. They are described using the STRIPS syntax, although PDL4.0 (PRODIGY4.0 Description Language) allows for more complex logical expressions that involve quantifiers. For instance, in a logistics transportation domain, there could be operators for loading a plane, for unloading it, for moving it to a different location, etc. Schema operators can contain free variables. A schema operator O whose free variables have been bound by a binding b is called a grounded operator and will be represented as Ob.
PRODIGY4.0's output is a plan. Formally, a plan is a sequence of grounded planning operators (O_1 b_1, O_2 b_2, ..., O_n b_n) that transforms I into another state S where the goal G is fulfilled: G(S) = True.
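To make that definition concrete, here is a minimal plan-validity check under plain STRIPS semantics (states as sets of ground literals with add/delete lists); the field names are assumptions, and PRODIGY4.0's extended representation is richer:

```python
from collections import namedtuple

# A grounded operator Ob: precondition, add, and delete sets of ground literals.
GroundOp = namedtuple("GroundOp", "name pre add delete")

def is_valid_plan(initial_state, goal, plan):
    """Check that applying the grounded operators in order makes G(S) true."""
    state = set(initial_state)
    for op in plan:
        if not op.pre <= state:          # preconditions must hold to apply op
            return False
        state = (state - op.delete) | op.add
    return goal <= state                 # the goal holds in the final state

# Example from a logistics-like domain: load a package onto a truck.
load = GroundOp("load(p1, t1, l1)",
                pre={"at(p1, l1)", "at(t1, l1)"},
                add={"in(p1, t1)"},
                delete={"at(p1, l1)"})
print(is_valid_plan({"at(p1, l1)", "at(t1, l1)"}, {"in(p1, t1)"}, [load]))  # True
```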
In PRODIGY4.0, states S, goals G, and operators are represented using extended STRIPS. We refer readers to Fikes and Nilsson (1971) for details about STRIPS and
Citations
Proceedings Article

Learning macro-actions for arbitrary planners and domains

TL;DR: This paper presents an offline macro learning method that works with arbitrarily chosen planners and domains, and generates macros from plans of some of the given problems under the guidance of a genetic algorithm.
Journal Article

Contrasting meta-learning and hyper-heuristic research: the role of evolutionary algorithms

TL;DR: The main contribution of the paper is to contrast meta-learning and hyper-heuristic methods and concepts, in order to promote awareness and cross-fertilisation of ideas across the (by and large, non-overlapping) communities of meta-learning and hyper-heuristic researchers.
Journal Article

Bounded rationality in agent‐based models: experiments with evolutionary programs

TL;DR: This paper reports on how changing parameters in one variant of evolutionary programming, genetic programming, affects the representation of bounded rationality in software agents.
Journal Article

Land use in the southern Yucatán peninsular region of Mexico: Scenarios of population and institutional change

TL;DR: This article examines an application of the Southern Yucatan Peninsular Region Integrated Assessment (SYPRIA), a scenario-based spatially explicit model designed to examine and project land use in Mexico that combines Geographic Information Systems with agent-based modeling, cellular modeling, and genetic programming.
Proceedings Article

Correcting and improving imitation models of humans for Robosoccer agents

TL;DR: The aim of this paper is to use machine learning techniques to obtain models of humans playing Robosoccer, and to adapt them against more difficult opponents than the ones beatable by the human.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Genetic Programming: On the Programming of Computers by Means of Natural Selection

TL;DR: This book discusses the evolution of architecture, primitive functions, terminals, sufficiency, and closure, and the role of representation and the lens effect in genetic programming.
Journal Article

STRIPS: A new approach to the application of theorem proving to problem solving

TL;DR: In this paper, the authors describe a problem solver called STRIPS that attempts to find a sequence of operators in a space of world models that transforms a given initial world model into one in which a given goal formula can be proven to be true.
Proceedings Article

Fast planning through planning graph analysis

TL;DR: A new approach to planning in STRIPS-like domains based on constructing and analyzing a compact structure the authors call a Planning Graph is introduced, and a new planner, Graphplan, is described that uses this paradigm.
Book

Genetic Programming II: Automatic Discovery of Reusable Programs

TL;DR: This book presents a method to automatically decompose a program into solvable components, called automatically defined functions (ADF), and then presents case studies of the application of this method to a variety of problems.
Frequently Asked Questions (10)
Q1. What are the contributions in "Learning to Solve Planning Problems Efficiently by Means of Genetic Programming"?

In this article, the authors propose to evolve only the heuristics to make a particular planner more efficient. Empirical results show that their approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance.

The problems the authors are solving at testing time are hard enough that other modern planners, like UCPOP and GRAPHPLAN, cannot solve them either. Additionally, the authors have experimented with a new genetic operator, the Instance-Based Crossover (IBC), that is able to use traces of the base planner as raw genetic material to be injected into the evolving population. Finally, although the authors have focused on STRIPS planning, they would like to extend their approach to evolving heuristics for other search problems. It may be possible to learn heuristics using the simpler problems as fitness cases that could be used to guide the search through program space to solve tougher problems, which is what the authors have done in this article to solve planning problems.

The most important requirement for the base planner is that it can use heuristics and that they can be loaded into the system easily. 

The related operators (i.e., disjoin and hierarchy specialization) are not included in the operator set because the authors believe that join and hierarchy generalization are good biases for EVOCK. 

Perhaps Pareto optimization techniques, where selection of the actual best individual is deferred until the end of the run, could be used to solve this problem.

The authors believe that co-evolution techniques (Berlanga, 2000) and dynamic training subset selection policies (Gathercole and Ross, 1994) could be used for that purpose. 

It was expected that injecting parts of these individuals into the main population could be useful, because it would add useful code that could be crossed over and mutated by the genetic operators.

AI Planners aim to achieve a set of goals, starting from an initial state, by using operators that represent the available actions of a task domain. 

Many such symbolic regression problems can be varied from simpler problems to more difficult problems; just as in the blocks world, problem difficulty can go from 3 blocks to 50 blocks.

If the actual difference is smaller than or equal to this threshold, then the probability of obtaining such a small difference by mere chance is smaller than 0.01, and that hypothesis is rejected.