What are the contributions in "A bayesian optimization algorithm for the nurse scheduling problem" ?

A Bayesian optimization algorithm for the nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse ’ s assignment. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. It is also suggested that the learning mechanism in the proposed approach might be suitable for other scheduling problems.

(Open Access) A Bayesian Optimisation Algorithm for the Nurse Scheduling Problem (2003) | Jingpeng Li

A Bayesian Optimization Algorithm for the Nurse Scheduling Problem

Proceedings of 2003 Congress on Evolutionary Computation (CEC2003), pp. 2149-2156, IEEE Press, Canberra, Australia, 2003.

Jingpeng Li

School of Computer Science

University of Nottingham

NG8 1BB UK

jpl@cs.nott.ac.uk

Uwe Aickelin

School of Computer Science

University of Nottingham

NG8 1BB UK

uxa@cs.nott.ac.uk

Abstract- A Bayesian optimization algorithm for the

nurse scheduling problem is presented, which involves

choosing a suitable scheduling rule from a set for each

nurse’s assignment. Unlike our previous work that

used GAs to implement implicit learning, the learning

in the proposed algorithm is explicit, i.e. eventually, we

will be able to identify and mix building blocks

directly. The Bayesian optimization algorithm is

applied to implement such explicit learning by

building a Bayesian network of the joint distribution

of solutions. The conditional probability of each

variable in the network is computed according to an

initial set of promising solutions. Subsequently, each

new instance for each variable is generated by using

the corresponding conditional probabilities, until all

variables have been generated, i.e. in our case, a new

rule string has been obtained. Another set of rule

strings will be generated in this way, some of which

will replace previous strings based on fitness selection.

If stopping conditions are not met, the conditional

probabilities for all nodes in the Bayesian network are

updated again using the current set of promising rule

strings. Computational results from 52 real data

instances demonstrate the success of this approach. It

is also suggested that the learning mechanism in the

proposed approach might be suitable for other

scheduling problems.

1 Introduction

Scheduling problems are generally NP-hard combinatorial

problems, and a lot of research has been done to solve

these problems heuristically (Aickelin and Dowsland,

2002 and 2003; Li and Kwan, 2001a and 2003). However,

most previous approaches are problem-specific and

research into the development of a general scheduling

algorithm is still in its infancy.

Genetic Algorithms (GAs) (Holland 1975; Goldberg

1989), mimicking the natural evolutionary process of the

survival of the fittest, have attracted much attention in

solving difficult scheduling problems in recent years.

Some obstacles exist when using GAs: there is no

canonical mechanism to deal with constraints, which are

commonly met in most real-world scheduling problems,

and small changes to a solution are difficult. To overcome

both difficulties, indirect approaches have been presented

(Aickelin and Dowsland, 2003; Li and Kwan, 2001b and

2003) for nurse and driver scheduling. In these indirect

GAs, the solution space is mapped and then a separate

decoding routine builds solutions to the original problem.

In our previous indirect GAs, learning was implicit

(‘black-box’) and restricted to the efficient adjustment of

weights for a set of rules that are used to construct

schedules. The major limitation of those approaches is

that they learn in a non-human way. Like most existing

construction algorithms, once the best weight combination

is found, the rules used in the construction process are

fixed at each iteration. However, normally a long

sequence of moves is needed to construct a schedule and

using fixed rules at each move is thus unreasonable and

not coherent with the human learning processes.

When a human scheduler works, he normally builds a

schedule systematically following a set of rules. After

much practice, the scheduler gradually masters the

knowledge of which solution parts go well with others. He

can identify good parts and is aware of the solution

quality even if the scheduling process is not completed

yet, thus having the ability to finish a schedule by using

flexible, rather than fixed, rules. In this paper, we design a

more human-like scheduling algorithm, by using a

Bayesian optimization algorithm to implement explicit

learning from past solutions. A nurse scheduling problem

with 52 real data instances gathered from a UK hospital is

used as the test problem.

Nurse scheduling has been widely studied in recent

years, and an extensive summary of the approaches can be

found in Hung (1995) and Sitompul and Randhawa

(1990). This problem is highly constrained, making it

extremely difficult for most local search algorithms to

find feasible solutions, let alone optimal ones. In our

nurse scheduling problem, the number of nurses is

fixed (up to 30), and the target is to create a weekly

schedule by assigning each nurse one out of up to 411

shift patterns in the most efficient way. The proposed

Bayesian approach achieves this by choosing a suitable

rule, from a rule set containing a number of available

rules, for each nurse. A potential solution is therefore

represented as a rule string, or a sequence of rules

corresponding to nurses from the first one to the last.

As a model of the selected strings, a Bayesian network

(Pearl 1998) is used in the proposed Bayesian

optimization algorithm to solve the nurse scheduling

problem. A Bayesian network is a directed acyclic graph

with each node corresponding to one variable, and each

variable corresponding to the individual rule by which a

schedule will be constructed step by step. The causal

relationship between two variables is represented by a

directed edge between the two corresponding nodes.

The Bayesian optimization algorithm is applied to

learn to identify good partial solutions and to complete

them by building a Bayesian network of the joint

distribution of solutions (Pelikan et al, 1999; Pelikan and

Goldberg, 2000). The conditional probabilities are

computed according to an initial set of promising

solutions. Subsequently, each new instance for each node

is generated by using the corresponding conditional

probabilities, until values for all nodes have been

generated, i.e. a new rule string has been generated.

Another set of rule strings will be generated in the

same way, some of which will replace previous strings

based on roulette-wheel fitness selection. If stopping

conditions are not met, the conditional probabilities for all

nodes in the Bayesian network are updated again using

the current set of rule strings. The algorithm thereby tries

to explicitly identify and mix promising building blocks.

It should be noted that for most scheduling problems,

the structure of the network model is known and all

variables are fully observed. In this case, the goal of

learning is to find the rule values that maximize the

likelihood of the training data. Thus, learning can amount

to ‘counting’ in the case of multinomial distributions.

The rest of this paper is organized as follows. Section 2

gives an overview on the nurse scheduling problem, and

the following section 3 introduces the general concepts

about graphical models and Bayesian networks. Section 4

discusses the proposed Bayesian optimization algorithm,

describing the construction of a Bayesian network,

learning based on the Bayesian network, and the four

building rules in detail. Computational results using 52

data instances gathered from a UK hospital are presented

in section 5. Concluding remarks are in section 6.

2 The Nurse Scheduling Problem

2.1 General Problem

Our nurse scheduling problem is to create weekly

schedules for wards of nurses by assigning one of a

number of possible shift patterns to each nurse. These

schedules have to satisfy working contracts and meet the

demand for a given number of nurses of different grades

on each shift, while being seen to be fair by the staff

concerned. The latter objective is achieved by meeting as

many of the nurses’ requests as possible and considering

historical information to ensure that unsatisfied requests

and unpopular shifts are evenly distributed.

The problem is complicated by the fact that higher

qualified nurses can substitute less qualified nurses but

not vice versa. Thus scheduling the different grades

independently is not possible. Furthermore, the problem

has a special day-night structure as most of the nurses are

contracted to work either days or nights in one week but

not both. However due to working contracts, the number

of days worked is not usually the same as the number of

nights. Therefore, it becomes important to schedule the

‘correct’ nurses onto days and nights respectively. The

latter two characteristics make this problem challenging

for any local search algorithm, because finding and

maintaining feasible solutions is extremely difficult.

The numbers of days or nights to be worked by each

nurse defines the set of feasible weekly work patterns for

that nurse. These will be referred to as shift patterns or

shift pattern vectors in the following. For each nurse i and

each shift pattern j all the information concerning the

desirability of the pattern for this nurse is captured in a

single numeric preference cost $p_{ij}$. These costs were

determined in close consultation with the hospital and are

a weighted sum of the following factors: basic shift-

pattern cost, general day/night preferences, specific

requests, continuity problems, number of successive

working days, rotating nights/weekends and other working

history information. Patterns that violate mandatory

contractual requirements are marked as infeasible for a

particular nurse and week by giving them a suitably high

value.

2.2 Integer Programming

The problem can be formulated as an integer linear

program as follows.

Indices:

i = 1...n nurse index;

j = 1...m shift pattern index;

k = 1...14 day and night index (1...7 are days and 8...14

are nights);

s = 1...p grade index.

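Decision variables:

$$x_{ij} = \begin{cases} 1, & \text{nurse } i \text{ works shift pattern } j \\ 0, & \text{else} \end{cases}$$

Parameters:

m = Number of shift patterns;

n = Number of nurses;

p = Number of grades;

$$a_{jk} = \begin{cases} 1, & \text{shift pattern } j \text{ covers day/night } k \\ 0, & \text{else} \end{cases}$$

$$q_{is} = \begin{cases} 1, & \text{nurse } i \text{ is of grade } s \text{ or higher} \\ 0, & \text{else} \end{cases}$$

$p_{ij}$ = Preference cost of nurse i working shift pattern j;

$R_{ks}$ = Demand of nurses with grade s on day/night k;

$N_i$ = Working shifts per week of nurse i if night shifts are worked;

$D_i$ = Working shifts per week of nurse i if day shifts are worked;

$B_i$ = Working shifts per week of nurse i if both day and night shifts are worked {for special nurses};

F(i) = Set of feasible shift patterns for nurse i, where F(i) is defined as

$$F(i) = \left\{ j : \begin{array}{ll} \sum_{k=1}^{7} a_{jk} = D_i, & j \in \text{day shifts} \\ \sum_{k=8}^{14} a_{jk} = N_i, & j \in \text{night shifts} \\ \sum_{k=1}^{14} a_{jk} = B_i, & j \in \text{combined shifts} \end{array} \right\} \quad \forall i \qquad (1)$$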

Target function:

Minimize total preference cost of all nurses, denoted as

$$\sum_{i=1}^{n} \sum_{j \in F(i)} p_{ij} x_{ij} \rightarrow \min! \qquad (2)$$

Subject to:

1. Every nurse works exactly one feasible shift pattern:

$$\sum_{j \in F(i)} x_{ij} = 1 \quad \forall i; \qquad (3)$$

2. The demand for nurses is fulfilled for every grade on every day and night:

$$\sum_{i=1}^{n} \sum_{j \in F(i)} q_{is} a_{jk} x_{ij} \geq R_{ks} \quad \forall k, s. \qquad (4)$$

Constraint set (3) ensures that every nurse works

exactly one shift pattern from his/her feasible set, and

constraint set (4) ensures that the demand for nurses is

covered for every grade on every day and night. Note that

the definition of $q_{is}$ is such that higher graded nurses can

substitute those at lower grades if necessary.

Typical problem dimensions are 30 nurses of three

grades and 400 shift patterns. Thus, the Integer

Programming formulation has about 12000 binary

variables and 100 constraints. This is a moderately sized

problem. However, some problem cases remain unsolved

after overnight computation using professional software.
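To make the formulation concrete, the following minimal Python sketch evaluates objective (2) and checks constraints (3) and (4) for one candidate assignment. It is an illustration only: the function names, the 0-based indexing and the list-of-lists data layout are assumptions of this sketch, not part of the paper.

```python
from typing import Sequence, Set


def total_cost(assign: Sequence[int], p: Sequence[Sequence[float]]) -> float:
    """Objective (2): sum of p[i][j] over the pattern j chosen for each nurse i."""
    return sum(p[i][j] for i, j in enumerate(assign))


def is_feasible(
    assign: Sequence[int],
    feasible: Sequence[Set[int]],
    a: Sequence[Sequence[int]],
    q: Sequence[Sequence[int]],
    R: Sequence[Sequence[int]],
) -> bool:
    """Check constraints (3) and (4) for an assignment of one pattern per nurse.

    assign[i]   : pattern index chosen for nurse i
    feasible[i] : the set F(i) of allowed pattern indices for nurse i
    a[j][k]     : 1 if pattern j covers day/night k, else 0
    q[i][s]     : 1 if nurse i is of grade s or higher, else 0
    R[k][s]     : demand for nurses of grade s on day/night k
    """
    # Constraint (3): with exactly one pattern stored per nurse, it only
    # remains to check that the pattern belongs to the feasible set.
    if any(j not in feasible[i] for i, j in enumerate(assign)):
        return False
    # Constraint (4): demand must be met for every day/night and grade.
    for k in range(len(R)):
        for s in range(len(R[0])):
            cover = sum(q[i][s] * a[j][k] for i, j in enumerate(assign))
            if cover < R[k][s]:
                return False
    return True
```

With two nurses, two single-shift patterns and a demand of one nurse per shift, assigning the nurses to different patterns passes the check, whereas assigning both to the same pattern leaves one shift uncovered and fails it.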

3 Graphical Models and Bayesian Networks

In this section, we introduce concepts from graphical

models in general and Bayesian networks in particular.

Section 4 will then explain how we applied these concepts

to our nurse scheduling problem.

Graphical models are graphs in which nodes represent

random variables, and the lack of edges represents

conditional independence assumptions (Edwards 2000).

They have important applications in many multivariate

probabilistic systems in fields such as statistics, systems

engineering, information theory and pattern recognition.

In particular, they are playing an increasingly important

role in the design and analysis of machine learning

algorithms.

As described by Jordan (1999), graphical models are a

marriage between probability theory and graph theory.

They provide a natural tool for dealing with uncertainty

and complexity that occur throughout applied

mathematics and engineering. In a graphical model, the

fundamental notion of modularity is used to build a

complex system by combining simpler parts. Probability

theory provides the glue to combine the parts, ensuring

that the whole system is consistent, and providing ways to

interface models to data. The graph theory provides an

intuitively appealing interface by which humans can

model highly interacting sets of variables, and a data

structure that lends itself naturally to the design of

general-purpose algorithms.

There are two main kinds of graphical models:

undirected and directed. Undirected graphical models are

more popular with the physics and vision communities.

Directed graphical models, also called Bayesian networks,

are more popular with the artificial intelligence and

machine learning communities. Bayesian networks are

often used to model multinomial data with both discrete

and continuous variables by encoding the relationship

between the variables contained in the modelled data,

which represents the structure of a problem.

Moreover, Bayesian networks can be used to generate

new instances of the variables with similar properties as

those of given data. Each node in the network corresponds

to one variable, and each variable corresponds to one

position in the strings representing the solutions. The

relationship between two variables is represented by a

directed edge between the two corresponding nodes.

Any complete probabilistic model of a domain must

represent the joint distribution, the probability of every

possible event as defined by the values of all the variables.

The number of such events is exponential. To achieve

compactness, Bayesian networks factor the joint

distribution into local conditional distributions for each

variable given its parents.

Mathematically, an acyclic Bayesian network encodes

a full joint probability distribution by the product

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(X_i)), \qquad (5)$$

where $x_i$ denotes some value of the variable $X_i$, $pa(X_i)$ denotes a set of values for the parents of $X_i$ in the network (the set of nodes from which there exists an individual edge to $X_i$), and $P(x_i \mid pa(X_i))$ denotes the conditional probability of $X_i$ conditioned on the variables $pa(X_i)$. This

distribution can be used to generate new instances using

the marginal and conditional probabilities.
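As an illustration of the factorization in equation (5), the sketch below multiplies the local conditional probabilities of a small network to obtain a joint probability. The dictionary layout of `parents` and `cpt` and the two-variable example are assumptions made here for demonstration; they do not come from the paper.

```python
from math import prod


def joint_probability(assignment, parents, cpt):
    """Joint probability of a full assignment under equation (5).

    assignment : dict mapping each variable to its value
    parents    : dict mapping each variable to a tuple of its parent variables
    cpt        : cpt[var][(value, parent_values)] = P(value | parent_values)
    """
    return prod(
        cpt[var][(assignment[var], tuple(assignment[u] for u in parents[var]))]
        for var in assignment
    )


# Hypothetical two-node network A -> B with binary variables.
parents = {"A": (), "B": ("A",)}
cpt = {
    "A": {(1, ()): 0.3, (0, ()): 0.7},
    "B": {(1, (1,)): 0.9, (0, (1,)): 0.1, (1, (0,)): 0.2, (0, (0,)): 0.8},
}
p_both = joint_probability({"A": 1, "B": 1}, parents, cpt)  # 0.3 * 0.9
```

Storing one small table per node instead of one entry per joint event is what keeps the model compact: the tables grow with the number of parents of each node, not with the total number of variables.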

4 A Bayesian Optimization Algorithm for

Nurse Scheduling

This section discusses the proposed Bayesian optimization

algorithm for the nurse scheduling problem, including the

construction of a Bayesian network, learning based on the

Bayesian network and the four building rules used.

4.1 The Construction of a Bayesian Network

In our nurse scheduling problem, the number of nurses

is fixed (up to 30), and the target is to create a weekly

schedule by assigning each nurse one shift pattern in the

most efficient way. The proposed approach achieves this

by using one suitable rule, from a rule set that contains a

number of available rules, for each nurse’s assignment.

Thus, a potential solution is represented as a rule string, or

a sequence of rules corresponding to nurses from the first

one to the last one individually.

We chose this approach, as the longer-term aim of our

research is to model the explicit learning of a human

scheduler. Human schedulers can provide high quality

solutions, but the task is tedious and often requires a large

amount of time. Typically, they construct schedules based

on rules learnt during scheduling. Due to human

limitations, these rules are typically simple. Hence, our

rules will be relatively simple, too. Nevertheless, human

generated schedules are of high quality due to the ability

of the scheduler to switch between the rules, based on the

state of the current solution. We envisage the Bayesian

optimisation algorithm to perform this role.

Figure 1: A Bayesian network for nurse scheduling

Figure 1 is the Bayesian network constructed for the

nurse scheduling problem, which is a hierarchical and

acyclic directed graph representing the solution structure

of the problem.

The node $N_{ij}$ ($i \in \{1, 2, \ldots, m\}$; $j \in \{1, 2, \ldots, n\}$) in the network denotes that nurse i is assigned using rule j, where m is the number of nurses to be scheduled and n is the number of rules to be used in the building process. The directed edge from node $N_{ij}$ to node $N_{i+1,j'}$ denotes a causal relationship of “$N_{ij}$ causing $N_{i+1,j'}$”. In our particular implementation,

an edge denotes a construction unit (or rule sub-string) for

nurse i where the previous rule is j and the current rule is

j’. In this network, a possible solution (a complete rule

string) is represented as a directed path from nurse 1 to

nurse m connecting m nodes.

4.2 Learning based on the Bayesian Network

According to whether the structure (topology) of the

model is known or unknown, and whether all variables are

fully observed or some of them are hidden, there are four

kinds of learning (Heckerman 1998). According to

Heckerman, the learning process for the proposed

approach belongs to the category of “known structure and

full observation,” and the learning goal is to find the

variable values of all nodes $N_{ij}$ that maximize the likelihood of the training data containing T independent cases.

In the proposed approach, learning amounts to

counting and hence we use the symbol ‘#’ meaning ‘the

number of’ in the following equations. It calculates the

conditional probabilities of each possible value for each

node given all possible values of its parents. For example,

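for node $N_{i+1,j'}$ with a parent node $N_{ij}$, its conditional probability is

$$P(N_{i+1,j'} \mid N_{ij}) = \frac{P(N_{i+1,j'}, N_{ij})}{P(N_{ij})} = \frac{\#(N_{i+1,j'} = true, N_{ij} = true)}{\#(N_{i+1,j'} = true, N_{ij} = true) + \#(N_{i+1,j'} = false, N_{ij} = true)}. \qquad (6)$$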

Note that nodes $N_{1j}$ have no parents. In this circumstance, their probabilities are computed as

$$P(N_{1j}) = \frac{\#(N_{1j} = true)}{\#(N_{1j} = true) + \#(N_{1j} = false)} = \frac{\#(N_{1j} = true)}{T}. \qquad (7)$$
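Because learning amounts to counting, equations (6) and (7) can be sketched in a few lines of Python. The function name, the 0-based rule indices and the uniform fallback for parents that never occur in the training set are assumptions of this sketch; the paper does not state how unobserved parents are handled.

```python
from collections import Counter


def learn_probabilities(rule_strings, n_rules):
    """Estimate the probabilities of equations (6) and (7) by counting.

    rule_strings : promising solutions, each a list of rule indices, one per nurse
    Returns (first, trans) where first[j] estimates P(N_1j) and
    trans[i][j][j2] estimates P(N_{i+1,j2} | N_ij) for consecutive nurses.
    """
    m = len(rule_strings[0])   # number of nurses
    T = len(rule_strings)      # number of training cases

    # Equation (7): relative frequency of each rule for the first nurse.
    first_counts = Counter(s[0] for s in rule_strings)
    first = [first_counts[j] / T for j in range(n_rules)]

    # Equation (6): pair counts divided by parent counts.
    trans = []
    for i in range(m - 1):
        pair_counts = Counter((s[i], s[i + 1]) for s in rule_strings)
        parent_counts = Counter(s[i] for s in rule_strings)
        table = []
        for j in range(n_rules):
            if parent_counts[j] == 0:
                # Assumption: fall back to a uniform row for unseen parents.
                table.append([1.0 / n_rules] * n_rules)
            else:
                table.append(
                    [pair_counts[(j, j2)] / parent_counts[j] for j2 in range(n_rules)]
                )
        trans.append(table)
    return first, trans
```

For the four two-nurse rule strings `[0, 1]`, `[0, 1]`, `[1, 0]` and `[0, 0]`, rule 0 opens three of the four strings, so `first` is `[0.75, 0.25]`, and rule 0 is followed by rule 1 in two of those three cases.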

These probability values can be used to generate new

rule strings, or new solutions. Since the first rule in a

solution has no parents, it will be chosen from nodes $N_{1j}$ according to their probabilities. The next rule will be chosen from nodes $N_{ij}$ according to their probabilities conditioned on the previous nodes. This building process is repeated until the last node has been chosen from nodes $N_{mj}$, where m is the number of nurses. A link from nurse 1 to nurse m is thus created, representing a new possible solution. Since all the probability values are normalized, the roulette-wheel method is a good strategy for rule selection.
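The building process described above can be sketched as follows, assuming the probabilities are already available as plain Python lists (`first` for the first nurse, and one table per subsequent nurse indexed by the previous rule). These names and the data layout are illustrative assumptions, not the authors' implementation.

```python
import random


def roulette(probabilities, rng):
    """Roulette-wheel selection: return index j with probability probabilities[j]."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def sample_rule_string(first, trans, rng):
    """Generate one rule string, nurse by nurse.

    first : first[j] is the probability of rule j for nurse 1
    trans : trans[i][j][j2] is the probability of rule j2 for the next nurse
            given that the current nurse used rule j
    """
    rules = [roulette(first, rng)]          # first node has no parents
    for table in trans:                     # each later node depends on its parent
        rules.append(roulette(table[rules[-1]], rng))
    return rules


# Usage with a hypothetical deterministic model that alternates rules.
rng = random.Random(1)
alternate = [[0.0, 1.0], [1.0, 0.0]]
string = sample_rule_string([1.0, 0.0], [alternate, alternate], rng)
```

In this usage example every probability is 0 or 1, so the sampled string is always `[0, 1, 0]`; with learned probabilities the same call yields a random path from nurse 1 to nurse m.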

For clarity, consider the following toy example of

scheduling five nurses with two rules (1: random

allocation, 2: allocate nurse to low-cost shifts). At the beginning of the search, the probabilities of choosing rule 1 or 2 for each nurse are equal, i.e. 50%. After a few iterations, due to the selection pressure and reinforcement learning, we experience two solution pathways: because pure low-cost or random allocation produces low quality solutions, either rule 1 is used for the first 2-3 nurses and rule 2 on the remainder, or vice versa. In essence, BOA learns ‘use rule 2 after 2-3x using rule 1’ or vice versa.

4.3 A Bayesian Optimization Algorithm

Based on the estimation of conditional probabilities, this

section introduces a Bayesian optimization algorithm for

the nurse scheduling problem. It uses techniques from the

field of modelling data by Bayesian networks to estimate

the joint distribution of promising solutions. The nodes, or

variables, in the Bayesian network correspond to the

individual rules by which a schedule will be built step by

step.

In the proposed Bayesian optimization algorithm, the

first population of rule strings is generated at random.

From the current population, a set of better rule strings is

selected. Any selection method biased towards better

fitness can be used, and in this paper, the traditional

roulette-wheel selection is applied. The conditional

probabilities of each node in the Bayesian network are

computed. New rule strings are generated by using these

conditional probability values, and are added into the old

population, replacing some of the old rule strings. In more

detail, the steps of the Bayesian optimization algorithm

for nurse scheduling are:

1. Set t = 0, and generate an initial population P(0) at

random;

2. Use roulette-wheel to select a set of promising

rule strings S(t) from P(t);

3. Compute the conditional probabilities of each

node according to this set of promising solutions;

4. For the assignment of each nurse, the roulette-

wheel method is used to select one rule according

to the conditional probabilities of all available

nodes, thus obtaining a new rule string. A set of

new rule strings O(t) will be generated in this

way;

5. Create a new population P(t+1) by replacing some

rule strings from P(t) with O(t), and set t = t+1;

6. If the termination conditions are not met (we use

2000 generations), go to step 2.
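The six steps above can be sketched as a compact Python loop. This is a minimal illustration under stated assumptions rather than the authors' implementation: `fitness` is a caller-supplied function returning a positive value where higher is better (the paper's schedule decoder and cost function are not reproduced), the population sizes are arbitrary, and replacing the worst strings in step 5 is one possible reading of "replacing some rule strings".

```python
import random
from collections import Counter


def boa(fitness, n_nurses, n_rules, pop_size=20, n_select=10,
        n_offspring=10, generations=50, seed=0):
    rng = random.Random(seed)
    rules = range(n_rules)
    # Step 1: random initial population P(0).
    pop = [[rng.randrange(n_rules) for _ in range(n_nurses)]
           for _ in range(pop_size)]
    for _ in range(generations):           # Step 6: fixed generation budget
        # Step 2: roulette-wheel selection of promising rule strings S(t).
        weights = [fitness(s) for s in pop]
        selected = rng.choices(pop, weights=weights, k=n_select)
        # Step 3: conditional probabilities by counting (unnormalized counts
        # are enough because rng.choices normalizes the weights itself).
        first = Counter(s[0] for s in selected)
        pairs = [Counter((s[i], s[i + 1]) for s in selected)
                 for i in range(n_nurses - 1)]
        # Step 4: build new rule strings O(t) nurse by nurse.
        offspring = []
        for _ in range(n_offspring):
            s = [rng.choices(rules, weights=[first[j] for j in rules])[0]]
            for i in range(n_nurses - 1):
                w = [pairs[i][(s[-1], j)] for j in rules]
                s.append(rng.choices(rules, weights=w)[0])
            offspring.append(s)
        # Step 5 (assumption): the worst strings of P(t) are replaced by O(t).
        pop.sort(key=fitness, reverse=True)
        pop[-n_offspring:] = offspring
    return max(pop, key=fitness)


# Usage with a hypothetical fitness that simply rewards rule 1.
best = boa(lambda s: 1 + sum(s), n_nurses=5, n_rules=2, seed=3)
```

Sampling never meets an all-zero weight row here, because each sampled rule was observed at that position in the selected set and therefore has at least one observed successor.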

4.4 Four Building Rules

Similar to the working pattern of a human scheduler, the

proposed schedule-constructing process uses a set of rules

to build a schedule step by step. As far as the domain

knowledge of nurse scheduling is concerned, the

following four rules are currently investigated.

4.4.1 Random Rule

The first rule, called ‘Random’ rule, is used to select a

nurse’s shift pattern at random. Its purpose is to introduce

randomness into the search thus enlarging the search

space, and most importantly to ensure that the proposed

algorithm has the ability to escape from local optima.

This rule mirrors much of a scheduler’s creativeness to

come up with different solutions if required.

4.4.2 k-Cheapest Rule

The second rule is the ‘k-Cheapest’ rule. Disregarding the feasibility of the schedule, it randomly selects a shift pattern from a k-length list containing the patterns with the k cheapest costs $p_{ij}$, in an effort to reduce the total cost of a schedule as much as possible.
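A minimal sketch of the ‘k-Cheapest’ rule, assuming the preference costs of the current nurse are given as a plain list indexed by shift pattern (the function name and data layout are illustrative):

```python
import random


def k_cheapest(costs, k, rng):
    """Pick at random one of the k patterns with the lowest preference cost.

    costs : costs[j] is the preference cost of pattern j for the current nurse
    """
    ranked = sorted(range(len(costs)), key=costs.__getitem__)
    return rng.choice(ranked[:k])


# Usage: with costs 30, 10, 50, 20 and k = 2, only patterns 1 and 3 can be chosen.
rng = random.Random(0)
choice = k_cheapest([30, 10, 50, 20], 2, rng)
```

Setting k = 1 makes the rule purely greedy, while larger k trades some cost for diversity.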

4.4.3 Cover Rule

Compared with the first two rules, the ‘Cover’ rule and

the last ‘Contribution’ rule are relatively more complicated.

The third ‘Cover’ rule is designed to consider only the

feasibility of the schedule. It schedules one nurse at a time

in such a way as to cover those days and nights with the

highest number of uncovered shifts.

The ‘Cover’ rule constructs solutions as follows. For

each shift pattern in a nurse’s feasible set, calculate the

total number of uncovered shifts that would be covered if

the nurse worked that shift pattern. For simplicity, this

calculation does not take into account how many nurses

are still required in a particular shift. For instance, assume

that a shift pattern covers Monday to Friday nights.

Further assume that the current requirements for the

nights from Monday to Sunday are as follows: (-3, 0, +1, -2, -1, -2, 0), where a negative number means undercover and a positive number overcover. The Monday to Friday shift pattern hence has a cover value of 3, as it covers three uncovered nights (Monday, Thursday and Friday). In this example, a Tuesday to Saturday pattern would also have a value of 3 (Thursday, Friday and Saturday).

In order to ensure that high-grade nurses are not

‘wasted’ covering unnecessarily for nurses of lower

grades, for nurses of grade s, only the shifts requiring

grade s nurses are counted as long as there is a single

uncovered shift for this grade. If all these are covered,

shifts of the next lower grade are considered and once

these are filled those of the next lower grade. Due to the

nature of this approach, nurses’ preference costs $p_{ij}$ are not

taken into account by this rule. However, they will

influence decisions indirectly via the fitness function.

Hence, the ‘Cover’ rule can be summarised as finding

those shift patterns with corresponding largest amount of

undercover.
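The core of the ‘Cover’ rule can be sketched as below for a single grade. The sketch reads the rule as a count of uncovered shifts that ignores how many nurses are still missing on each shift, as the description states; the grade cascade is left out, and breaking ties in favour of the first pattern is an assumption of this sketch.

```python
def cover_value(pattern, balance):
    """Number of currently uncovered shifts that the pattern would cover.

    pattern : 0/1 list, 1 where the pattern works the shift
    balance : per-shift cover balance, negative meaning undercover
    """
    return sum(1 for works, b in zip(pattern, balance) if works and b < 0)


def cover_rule(patterns, balance):
    """Index of the feasible pattern covering the most uncovered shifts."""
    return max(range(len(patterns)), key=lambda j: cover_value(patterns[j], balance))


# Usage with the night requirements from the example in the text.
balance = [-3, 0, 1, -2, -1, -2, 0]
mon_fri = [1, 1, 1, 1, 1, 0, 0]
value = cover_value(mon_fri, balance)  # Monday, Thursday and Friday are uncovered
```

For the Monday to Friday pattern this gives the cover value of 3 quoted in the example.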

4.4.4 Contribution Rule

The fourth rule, called ‘Contribution’ rule, is biased

towards solution quality but includes some aspects of

feasibility by computing an overall score for each feasible

pattern for the nurse currently being scheduled.

The ‘Contribution’ rule is designed to take into account

the nurses’ preferences. It therefore works with shift

patterns rather than individual shifts. It also takes into

account some of the covering constraints in that it gives

preference to patterns that cover shifts that have not yet

been allocated sufficient nurses to meet their total

requirements. This is achieved by going through the entire

set of feasible shift patterns for a nurse and assigning each

one a score. The one with the highest (i.e. best) score is

chosen. If there is more than one shift pattern with the

best score, the first such shift pattern is chosen.

The score of a shift pattern is calculated as the

weighted sum of the nurse’s $p_{ij}$ value for that particular

shift pattern and its contribution to the cover of all three

grades. The latter is measured as a weighted sum of grade

one, two and three uncovered shifts that would be covered

if the nurse worked this shift pattern, i.e. the reduction in

shortfall. Obviously, nurses can only contribute to

uncovered demand of their own grade or below. More

precisely and using the same notation as before, the score

of shift pattern j for nurse i is calculated with the

following parameters:

• $d_{ks}$ = 1 if there are still nurses needed on day k of grade s, otherwise $d_{ks}$ = 0;

• $a_{jk}$ = 1 if shift pattern j covers day k, otherwise $a_{jk}$ = 0;

• $w_s$ is the weight of covering an uncovered shift of grade s;

• $w_p$ is the weight of the nurse’s $p_{ij}$ value for the shift pattern.

Finally, (100 - $p_{ij}$) must be used in the score, as higher $p_{ij}$ values are worse and the maximum for $p_{ij}$ is 100. Note that (-$w_p$) could also have been used, but would have led to some scores being negative. Thus, the scores are calculated as follows:
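The formula itself does not survive in this copy. One form consistent with the definitions above (a weighted preference term plus a weighted count of uncovered shifts per grade, restricted by $q_{is}$ to the nurse's own grade or below) is given here as a reconstruction, not as the authors' exact expression; the symbol $s_{ij}$ for the score is introduced only for this illustration.

```latex
s_{ij} = w_p \,(100 - p_{ij}) + \sum_{s=1}^{3} w_s \, q_{is} \sum_{k=1}^{14} a_{jk} \, d_{ks}
```

Here $w_p$ weights the preference term, $w_s$ weights each newly covered shift of grade s, $a_{jk}$ marks the days and nights covered by pattern j, and $d_{ks}$ marks the shifts that still need nurses of grade s.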

References

Genetic Algorithms

Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

Learning in graphical models

Nurse scheduling with knapsacks, networks and tabu search
