Load Forecasting Using Support Vector Machines: A Study on EUNITE Competition 2001

Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin
Department of Computer Science and Information Engineering
National Taiwan University, Taipei 106, Taiwan (cjlin@csie.ntu.edu.tw)
Abstract: Load forecasting is usually made by constructing models on relative information, such as climate and previous load demand data. In 2001, the EUNITE network organized a competition aiming at mid-term load forecasting (predicting the daily maximum load of the next 31 days). During the competition we proposed a support vector machine (SVM) model, which was the winning entry, to solve the problem. In this paper, we discuss in detail how SVM, a new learning technique, is successfully applied to load forecasting. In addition, motivated by the competition results and the approaches by other participants, more experiments and deeper analyses are conducted and presented here. Some important conclusions from the results are that temperature (or other types of climate information) might not be useful in such a mid-term load forecasting problem and that the introduction of the time-series concept may improve the forecasting.

Index Terms: Load forecasting, regression, support vector machines, time series.
I. INTRODUCTION

Electricity load forecasting has always been an important issue in the power industry. Load forecasting is usually made by constructing models on relative information, such as climate and previous load demand data. Such forecasts usually aim at short-term prediction (e.g. [10], [18], [12] and references therein), like one-day-ahead prediction, since longer-period prediction (mid-term or long-term) may not be reliable due to error propagation. Mid-term and long-term prediction of load demand, however, may still be very useful in some situations.
In 2001, the EUNITE network (EUropean Network on Intelligent TEchnologies for Smart Adaptive Systems, http://www.eunite.org; competition page: http://neuron.tuke.sk/competition/) organized a competition on a similar problem. The goal is to predict the daily load demand of a month. The given information includes the past two years' load demand data, the previous four years' daily temperatures, and the local holiday events. Dealing with such a mid-term forecast problem, reliable prediction and error propagation are both competitors' concerns. During the competition, we proposed a model, which was the winning entry, to solve the problem. Though the competition has closed, we find this topic interesting and useful. Moreover, we would like to figure out the performance of mid-term load forecasting with such limited information. In this paper, therefore, we present our approach and discuss this problem further. The main technique used in our solution is the support vector machine (SVM) [8], a new machine learning method. To the best of our knowledge, this is the first work to successfully apply SVM to load forecasting. Some important conclusions from our experiments are that temperature should not be considered and that the time-series modeling scheme is better. For the prediction of a 30-day period during which the temperature does not vary much, trying to predict the temperature and incorporate it into the model is not useful.
This paper is organized as follows. In Section II, we describe the goal of the competition task and the data provided; an analysis of the data is also presented. Section III demonstrates the techniques we employed. Experiments and results of different models are shown in Section IV. Finally, the conclusions of our research and the comparison with other competitors are in Section V.
II. DATA AND TASK DESCRIPTION
A. Competition Task Description
The organizer of the EUNITE load competition provides competitors with the following data:
- Electricity load demand recorded every half hour, from 1997 to 1998.
- Average daily temperature, from 1995 to 1998.
- Dates of holidays, from 1997 to 1999.
The task of the competitors is to supply predictions of the maximum daily values of electrical loads for January 1999. Evaluation of submissions depends mainly on the error metric of the results:
$$\text{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|L_i - \hat{L}_i|}{L_i} \times 100\% \qquad (1)$$

where $L_i$ and $\hat{L}_i$ are the real and the predicted values of the maximum daily electrical load on the $i$-th day of January 1999, respectively, and $N = 31$ is the number of days in January 1999. The goal of the competition is to forecast the electrical load with minimum MAPE. (Originally, two error metrics were used in the competition: one is MAPE; the other, ultimately not used, is the "maximal error" $\max_{i=1,\ldots,31} |L_i - \hat{L}_i|$.)
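For reference, a minimal Python version of the metric in (1); this is a sketch, and the array names are illustrative:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error of (1), in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return 100.0 * np.mean(np.abs(actual - predicted) / actual)
```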
B. Data Analysis
Before delving further into our proposed solution, we first examine some observations about the data. As in much of the load-forecasting literature, we investigate relations between load demand and other information, such as climate or local events. The following observations are somewhat specific to the competition, yet they also apply to general load forecasting.
1) Properties of Load Demand: The given load demand data are recorded every half hour. Figure 1 gives a simple description of the maximum daily load demand from 1997 to 1998. By simple analysis, one can easily observe some properties of the load demand. First, the demand has a seasonal pattern: high demand for electricity in the winter and low demand in the summer. This pattern implies a relation between electricity usage and weather conditions in different seasons.
Moreover, scrutinizing the data further reveals another load pattern: a load periodicity exists within every week.
[Fig. 1. Maximum daily load, 1997-1998 (x-axis: time in days; y-axis: max_load).]
Load demand on weekends is usually lower than that on weekdays (Monday through Friday). In addition, electricity demand on Saturday is a little higher than that on Sunday.
Further detailed patterns in the load data, such as daily patterns, can also be found, since the dataset contains finer detail (half-hour recordings). However, since the aim of the competition is the maximum value of the daily load demand, this paper will, without loss of generality, focus on the maximum values only.
2) Climate Influence: In load forecasting, climate conditions have always played an important role. Previous works on short-term load forecasting [10], [18], [12] also indicate the relation between climate and load demand. Climate conditions considered may include temperature, humidity, illumination, and special events like typhoon or sleet occurrences. These factors may be weighted differently in different localities. In general, however, more climate information usually gives load-demand predictions more confidence.
Taking our data as an example, as mentioned earlier, the load data show seasonal variation, which indicates a great influence of climate conditions. A negative correlation between load demand and daily temperature can easily be observed in Figure 2; the correlation coefficient between the maximum load and the temperature is -0.868. In our dataset, it is clear that, because of heating use, higher temperatures cause lower demands. Unfortunately, the only climate information provided in the competition is the daily temperature. This limitation of information also constrains solutions to the problem.
There is another interesting observation: the temperature on December 31, 1998 is the lowest of the whole 1997-1998 period. This might imply high uncertainty in the temperature and load of the incoming January 1999, and thus increase the difficulty of the load prediction.
[Fig. 2. Correlation between the maximum load and the temperature (x-axis: temperature; y-axis: max_load).]
3) Holiday Effects: Local events, including holidays and festivities, also affect the load demand. These events may lead to higher demand from extra usage of electricity, or the opposite. The influence of these events is usually local and depends highly on the customs of the area. From the two-year load data, it is easy to find that the load usually drops on holidays. On further scrutiny, the load also depends on which holiday it is: on major holidays such as Christmas or New Year, the demand for electricity may be affected more than on other holidays.
III. METHODS
[Fig. 3. Support vector regression.]
Support vector machine (SVM) is a new and promising technique for data classification and regression [23]. In this section we briefly introduce support vector regression (SVR), which can be used for time series prediction. Given training data $(x_1, y_1), \ldots, (x_l, y_l)$, where the $x_i$ are input vectors and $y_i$ is the associated output value of $x_i$, support vector regression solves the following optimization problem:

$$\min_{w,\, b,\, \xi,\, \xi^*} \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \qquad (2)$$

subject to

$$y_i - (w^T \phi(x_i) + b) \le \epsilon + \xi_i,$$
$$(w^T \phi(x_i) + b) - y_i \le \epsilon + \xi_i^*,$$
$$\xi_i,\, \xi_i^* \ge 0, \quad i = 1, \ldots, l,$$

where $x_i$ is mapped to a higher-dimensional space by the function $\phi$, and $\xi_i$ is the upper training error ($\xi_i^*$ is the lower) subject to the $\epsilon$-insensitive tube $|y_i - (w^T \phi(x_i) + b)| \le \epsilon$. The parameters which control the regression quality are the cost of error $C$, the width of the tube $\epsilon$, and the mapping function $\phi$.
The constraints of (2) imply that we would like to put most data $x_i$ in the tube $|y_i - (w^T \phi(x_i) + b)| \le \epsilon$. This can be clearly seen from Figure 3. If $x_i$ is not in the tube, there is an error $\xi_i$ or $\xi_i^*$, which we would like to minimize in the objective function. SVR avoids underfitting and overfitting the training data by minimizing the training error $C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$ as well as the regularization term $\frac{1}{2} w^T w$. For traditional least-squares regression, $\epsilon$ is always zero and data are not mapped into higher-dimensional spaces. Hence SVR is a more general and flexible treatment of regression problems. For the experiments in this paper, we use the software LIBSVM [6], which is a library for support vector machines.
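To make the formulation concrete, below is a minimal, hedged sketch of $\epsilon$-SVR with an RBF kernel. scikit-learn's SVR wraps LIBSVM, the library used in this paper; the synthetic data and parameter values are purely illustrative, not those of the paper.

```python
# Minimal sketch of epsilon-SVR with an RBF kernel (see problem (2)).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))                  # input vectors x_i
y = 5 + np.sin(X).ravel() + rng.normal(0, 0.1, 200)    # output values y_i

# C: cost of error; epsilon: width of the insensitive tube;
# gamma: RBF parameter (the mapping phi is implicit in the kernel).
model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

# Points outside the tube incur the errors xi_i, xi_i* in the objective.
outside = np.abs(y - model.predict(X)) > 0.1
print("fraction of points outside the tube: %.2f" % outside.mean())
```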
IV. EXPERIMENTS
Though the competition has closed, the problem still presents an interesting issue in load forecasting. In order to study mid-term load forecasting further, we conducted some experiments on the competition problem. In this section, these experiments and their results are described.
A. Data Preparation
In Section III, we described the SVM technique. We next need to prepare datasets to build SVM models. While preparing the datasets, we need to encode useful information into the data entries (i.e., $x_i$ in (2)). Also, different data encodings affect the selection of modeling schemes. Here we discuss these issues in detail.
1) Feature selection: Each component of the training data is called a feature (attribute). Here, we consider what kind of information should be included. Assuming that $y_i$ is the load of the $i$-th day, in general we incorporate information from the same day or earlier as features of $x_i$. There are a few choices for the features:
Basic information: calendar attributes. In Section II-B, we discussed the weekly periodicity of the load demand. Also, as pointed out earlier, the load demand on holidays is lower than that on non-holidays. Therefore, encoding this information (weekdays and holidays) in the training entries might be useful for modeling the problem. Indeed, in the load-forecasting literature, many works (e.g. [12] and [21]) have used calendar information (time, dates, or holidays) to model the load forecasting problem.
Temperature. Another possible feature is the temperature data. This is quite a straightforward choice, since load demand and temperature have a causal relation. In most short-term load forecasting (STLF) works, meteorological information, which includes temperature, wind speed, sky cover, etc., has also been used to predict the load demand. However, including the temperature in the training entries poses one difficulty: in this competition, the real temperature data of January 1999 are not provided. In other words, for such mid-term load forecasting, temperatures several weeks ahead are generally not available from weather forecasts. If we want to encode the temperature in our training entries, we also need to predict or estimate the temperature of January 1999. Yet temperature forecasting is not easy, especially with such limited data. The use of temperature, therefore, is a dilemma.
Time series style or not. Besides the weekdays, holidays, and temperature, there is other information we consider encoding as attributes: the past load demand. That is, we introduce the concept of time series into our models. To be more precise, if $y_i$ is the target value for prediction, the vector $x_i$ includes several previous target values $y_{i-1}, \ldots, y_{i-d}$ as attributes. In the training phase all $y_i$ are known, but for future prediction, $y_{i-1}, \ldots, y_{i-d}$ can be values from previous predictions. For example, after obtaining an approximate load for January 1, 1999, if $d = 7$, it is used with the loads of December 26-31, 1998 for predicting that of January 2. We continue this way until finding an approximate load for January 31 (see the sketch below). An earlier example using SVR for time series prediction is [17]. As we know, the past load demand can affect and imply the future load demand. Therefore, including such information in the models might help forecast the load demand. In fact, some load forecasting works have explored time series models ([1], [18]).
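A hedged sketch of this recursive scheme with $d = 7$, assuming a fitted regressor and pre-built per-day calendar attributes; the function name and argument layout are hypothetical:

```python
# Recursive (time-series) forecasting: each day's prediction is fed
# back as a feature for the next day, as described above.
import numpy as np

def forecast_month(model, last7_loads, calendar_rows, horizon=31):
    """Predict `horizon` daily maximum loads, feeding predictions back.

    last7_loads: loads of the 7 days before day 1 (Dec. 25-31, 1998).
    calendar_rows: per-day calendar attribute vectors for the horizon.
    """
    window = list(last7_loads)
    preds = []
    for day in range(horizon):
        x = np.concatenate([window[-7:], calendar_rows[day]])   # d = 7
        y_hat = model.predict(x.reshape(1, -1))[0]
        preds.append(y_hat)
        window.append(y_hat)        # predicted load becomes a feature
    return np.array(preds)
```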
2) Data segmentation: Besides the feature choices, Section II-B also shows the seasonal pattern of load demand. Previous works [10], [18], [21] on load forecasting also propose models built on data from different seasons. This inspires us to do some analyses for data segmentation.
Usually people model time-series data using the formulation

$$y_t = f(x_t), \quad t = d+1, \ldots, n,$$

where $x_t = (y_{t-1}, y_{t-2}, \ldots, y_{t-d})$ and $d$ is the embedding dimension. However, this formulation is not suitable for non-stationary time series, because the characteristics of the time series may change with time. For such time series, which alternate in time, we can consider a mixture model where

$$y_t = f_{i(t)}(x_t), \quad t = d+1, \ldots, n.$$

Note that this formulation allows different characteristic functions at different times. Given the series $y_t$, $t = 1, \ldots, n$, the task is to identify all $i(t)$. We call this unsupervised segmentation; earlier work can be found in, for example, [19]. In other words, the method breaks the series into different segments, where points in the same segment can be modeled by the same $f$. Recently, [7] presented a similar framework using SVRs with different parameter tunings. At any time point $t$, these methods consider different weights representing the probability that $y_t$ belongs to the corresponding function. The sum of the weights at any given time point $t$ is always fixed to one. The weights are iteratively updated until one weight is close to one and the others are close to zero. That means that eventually $y_t$ is associated with one particular time series.
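As a rough illustration of this idea (not the exact algorithm of [7]), the following sketch alternates between fitting two weighted SVR "experts" and reassigning each point's weights toward the expert that fits it better; the update rule and annealing schedule are our assumptions:

```python
# Much-simplified sketch of competing-SVR segmentation: two experts,
# per-point weights summing to one, iteratively hardened assignments.
import numpy as np
from sklearn.svm import SVR

def segment_two_series(X, y, n_iter=20, sigma=0.1):
    rng = np.random.default_rng(0)
    n = len(y)
    W = rng.uniform(size=(2, n))            # random asymmetric start
    W /= W.sum(axis=0, keepdims=True)       # weights sum to 1 per point
    experts = [SVR(kernel="rbf", C=10.0, epsilon=0.01) for _ in range(2)]
    for _ in range(n_iter):
        resid = np.empty((2, n))
        for k, m in enumerate(experts):
            m.fit(X, y, sample_weight=W[k] + 1e-6)
            resid[k] = (y - m.predict(X)) ** 2
        # push each point's weight toward the better-fitting expert
        W = np.exp(-(resid - resid.min(axis=0)) / sigma**2)
        W /= W.sum(axis=0, keepdims=True)
        sigma *= 0.95                        # anneal toward hard assignment
    return W
```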
[Fig. 4. Unsupervised segmentation of the EUNITE data: weights of the two time series (SVR_1, SVR_2) over time, days from January 1997 to December 1998.]
Now, we can consider the loads as a time series. First we linearly scale all load data to $[0, 1]$. Then we obtain time-series-style data by incorporating the loads of the last seven days and the weekday information as attributes. After that, we follow the framework of [7] to analyze the data. We consider two possible time series, so at each time point there are two weights. The experimental result is in Figure 4. The x-axis indicates days from January 1997 to December 1998, and the y-axis indicates the weights of the two time series. Interestingly, the "winter" and "summer" data are automatically separated without any seasonal information. The figure shows that the loads in the summer and in the winter have different characteristics.
Unsupervised data segmentation has been very useful for time series prediction (e.g. [19]). If the training data are associated with different time series, it is better to consider only the data segments related to the same series as the last segment. Since the objective of the competition is to predict the load demand of January 1999, we consider using only the winter segment for training. Here, we choose January to March and October to December as our "winter" period, as the analytical result indicates. That is, the "winter" dataset contains half of the data of 1997 and 1998. Also, we further extract the data of January and February to form another possible training dataset. This dataset is much smaller than the "winter" one, and it focuses more on the load pattern in the period of our target concern.
3) Data Representation: After selecting useful information and proper data segments for encoding, we can prepare several combinations of training datasets. In these datasets, we encode a training entry (i.e., $x_i$ in (2)) for the particular $i$-th day as follows:

(calendar, temperature (optional), past load (optional))

Here we use seven binary attributes to encode the calendar information, which includes weekdays, weekends, and holidays: six are for weekdays and weekends, and the remaining one is for holidays. The six binary attributes stand for Monday to Saturday, respectively, and Sunday is represented by setting all six to zero. Also, one numerical attribute is used for the normalized temperature data, if the temperature is encoded. As for the past load, if encoded, we use seven numerical attributes for the past seven daily maximum loads. The reason for using "seven" instead of other numbers is the complexity of model selection; we will elaborate on this later. Finally, for such an entry, its target value ($y_i$ in (2)) is assigned the maximum load demand of the $i$-th day. A sketch of this encoding follows.
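A minimal sketch of the encoding just described; the function and argument names are hypothetical:

```python
# Feature encoding: 7 calendar binaries (Mon..Sat one-hot, Sunday = all
# zeros, plus a holiday flag), an optional normalized temperature, and
# the 7 previous daily maximum loads.
import numpy as np

def encode_day(weekday, is_holiday, past7_loads, temperature=None):
    """weekday: 0=Monday ... 6=Sunday; past7_loads: scaled loads."""
    cal = np.zeros(7)
    if weekday < 6:                  # Monday..Saturday one-hot
        cal[weekday] = 1.0           # Sunday stays all zeros
    cal[6] = 1.0 if is_holiday else 0.0
    parts = [cal]
    if temperature is not None:      # optional normalized temperature
        parts.append([temperature])
    parts.append(past7_loads)        # time-series attributes
    return np.concatenate(parts)
```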
Yet, for the models built with temperature information, the lack of temperatures for January 1999 poses a problem in the prediction phase. As mentioned before, this leads us to the prediction or estimation of the temperature in 1999, which is a more difficult problem, especially with only the daily temperatures of the past four years (1995-1998). A straightforward idea is to use the average of the past temperature data for the estimation. That is, the temperature of each day in January 1999 is estimated by averaging the past January daily temperature data (from 1995 to 1998); competitors such as [9], [5], and [13] used the averaged temperature in their models. Alternatively, [14] uses the temperatures of other cities close to Slovakia for the estimation: in their report, the temperature for January 1999 was calculated through a linear combination of those of three other cities. This is a kind of cheating, as we are supposed to make the prediction on December 31, 1998, so temperature information of January 1999 at any place is not available. For the experiments here, these two estimations of temperature are employed in the prediction phase whenever the temperature is encoded in our training datasets.
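A minimal sketch of the averaging estimator, assuming the temperatures are stored per year; names and data layout are illustrative:

```python
# Estimate each day of January 1999 as the average of the same calendar
# day in January 1995-1998, as several competitors did.
import numpy as np

def estimate_january_1999(temps):
    """temps: dict mapping year -> array of 31 January temperatures."""
    januaries = np.vstack([temps[y] for y in (1995, 1996, 1997, 1998)])
    return januaries.mean(axis=0)    # day-by-day average over four years
```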
There is one remark about the testing data of January 1999. In this period there are two holidays, January 1 and 6. However, in the data entries prepared for the prediction, we remove the holiday flag of these two entries; in other words, we treat them as non-holidays. The reason is that our models cannot learn the load demand on holidays well: the number of holidays in the training data is too small to provide enough information. Moreover, for the time-series-based approach, an inaccurate prediction on one day could affect the succeeding forecasts.

TABLE IV.1
MAPE USING DIFFERENT DATA PREPARATION

(segment, TS style)    W/out T    W/ avg. T    W/ 3c T    W/ real T
(Winter, yes)           1.95%      3.14%        2.71%      2.70%
(Jan-Feb, yes)          2.54%      3.21%        2.93%      2.96%
(Winter, no)            2.86%      3.48%        2.97%      3.10%
B. Implementation and Results

With different schemes of model construction, a series of experiments was conducted. The MAPEs of the different model schemes are the main concern of our comparison.
On the datasets we prepared, SVM models are built for load forecasting. When training an SVM model, there are some parameters to choose, and they influence the performance of the model. Therefore, in order to get a "good" model, these parameters need to be selected properly. Some important ones are:
1) the cost of error $C$,
2) the width $\epsilon$ of the $\epsilon$-insensitive tube,
3) the mapping function $\phi$, and
4) how many previous days' loads are included in one training entry.
In our experiments, as mentioned earlier, for each training entry we simply include the maximum loads of the previous seven days. In addition, we consider only the Radial Basis Function (RBF) kernel, one of the most commonly used mapping functions. The RBF kernel has the property that $\phi(x_i)^T \phi(x_j) = e^{-\gamma \|x_i - x_j\|^2}$. Note that $\gamma$ is a parameter associated with the RBF kernel which has to be tuned. Also, we fix $\epsilon = 0.1$, which is the default of LIBSVM [6]. Thus, the parameters left are $C$ and $\gamma$. Choosing parameters can be time consuming, so in practice we decide some of them using knowledge or simply guessing; the search space is thereby reduced.
While searching for proper parameters, we need to assess the performance of the models; based on their performance, suitable parameters are then chosen. To do this, we usually divide the training data into two sets: one is used to train a model, while the other, called the validation set, is used to evaluate it. According to the performance on the validation set, we try to infer the proper values of $C$ and $\gamma$. Here, due to the different characteristics of the data encoding schemes, we employ two validation procedures.
For the time-series-based approaches, we respectively extract the data entries of January 1997 and January 1998 to form validation sets and evaluate the models on them. The performance is decided by averaging the errors of these two validations. As for the non-time-series models, we simply conduct 10-fold cross validation to infer the parameters. That is, we randomly divide the training set into 10 subsets; using each subset in turn as the validation set, we train a model on the rest. The performance of a model is the average over the 10 validation predictions. A sketch of such a parameter search follows.
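As an illustration, a hedged sketch of a grid search over $C$ and $\gamma$ (with $\epsilon$ fixed at 0.1), scoring candidates by MAPE on a held-out set; the grid values are illustrative, not the ones used in the paper:

```python
# Grid search for (C, gamma), scored by MAPE on a validation set.
import numpy as np
from sklearn.svm import SVR

def mape(y_true, y_pred):
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

def select_parameters(X_train, y_train, X_val, y_val):
    best_params, best_err = None, np.inf
    for C in [2.0**k for k in range(-2, 9)]:
        for gamma in [2.0**k for k in range(-6, 3)]:
            m = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=0.1)
            m.fit(X_train, y_train)
            err = mape(y_val, m.predict(X_val))
            if err < best_err:
                best_params, best_err = (C, gamma), err
    return best_params, best_err
```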
With this procedure, proper $C$ and $\gamma$ are selected to build a model for the future prediction. We then evaluate its performance by forecasting the load demand for January 1999.
Now we are ready to present the experimental results. Table IV.1 shows the prediction errors generated by different data encodings and segmentations. In the table, the first column shows the data segment used and whether the past load demand is encoded. The next four columns indicate the predictions with or without the temperature (T): "avg. T" for the average temperature, "3c T" for the estimation derived from the three other cities' data, and "real T" for the real temperature of January 1999.
1) Time-series Models with Winter Data: In Table IV.1, it can be observed that, using the "winter" data along with the past loads (time-series information), the model built without temperature outperforms all others. In fact, the winning entry in the competition was generated by such a model. (During the competition, our validation procedure was different from the one employed in this paper; thus the actual parameters that generated the winning entry are not the same as those presented here. The modeling scheme, however, is the same.) The MAPE using the temperatures of the three other cities is smaller than that using the average temperature. Moreover, after the competition closed, we obtained the real temperature data of January 1999 and applied the temperature-incorporated models to datasets with the real temperature. The result is shown in the last column of Table IV.1. Its performance, compared with the two using temperature estimations, is not better. That is, even assuming the real temperature is known, the forecasting result is still not satisfactory.
2) Time-series Models with Jan-Feb Data: With the training set containing only the data of January and February, the model built without temperature also performs better than all the others built with temperature, the same as with the "winter" data segment. Again, the prediction error with the temperature estimation coming from the other three cities' data is smaller than that with the other estimation, or even that with the real temperature.
3) Non-time-series Models with Winter Data: Besides the time-series-based approaches, we build models without taking the past load demand as attributes. The training set of these models is limited to the "winter" segment. In Table IV.1, we show the test errors of the forecasts with and without temperature information. As in the aforementioned experimental results, the MAPE of the prediction on the dataset without temperature is better than that with it. Actually, as one can see in Figure 8, the forecast generated by the model using only calendar attributes simply repeats the same weekly pattern; that is because, besides the calendar attributes, the dataset does not provide any other information to the model. Using the temperature estimation derived from the linear combination of the three cities' temperature data is again better than using the real temperature or the average estimation.
4) Remarks: From the results, an interesting observation can be made: models built without the temperature generally perform better than those built with it. This raises the issue of the usage of temperature information. As mentioned earlier, temperature information is important for load forecasting. However, models constructed with such information may be very sensitive to the temperature, and thus the estimation of the temperature for 1999 surely affects the performance of the models.
In Figure 5, the real temperature data and the two estimations are shown. Figures 6-8 plot the predicted values as well as the real load demand of January 1999. Observing these figures, we find that the higher the temperature is, indeed, the lower the demand is. That is, the models we build do capture the causal relation between the load demand and the temperature. Nevertheless, using the real temperature does not predict the load demand more precisely than using the estimation coming from the data of the three other cities. This result somewhat implies the inappropriateness of incorporating the temperature. In fact, when we compute the correlation coefficient between the maximum load and the temperature for each month, instead of over the whole two years, we find the coefficients vary widely (ranging from -0.64 to 0.32). This also indicates the fuzzy correlation between the load and the temperature over shorter periods.
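A sketch of this per-month correlation check; standard pandas usage, with the column names assumed for illustration:

```python
# Correlation between daily maximum load and temperature, per month.
import pandas as pd

def monthly_correlations(df):
    """df: DatetimeIndex with columns 'max_load' and 'temperature'."""
    return (df.groupby([df.index.year, df.index.month])
              .apply(lambda g: g["max_load"].corr(g["temperature"])))
```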
[Fig. 5. Estimates and real temperature in Jan. 1999 (x-axis: day; y-axis: temperature; curves: real 99, avg 99, 3 cities 99).]
Models built on the data segments of January and February do not perform as well as those built on the "winter" segments. We think the main reason is that, with the limited data given (only two years), such data segments cannot provide enough information for the models compared with the "winter" segments, which contain more entries. Therefore, though a dataset containing data only from January and February may represent the prediction period better, due to the limited data, models built on such segments are somehow not as competitive as those built with more information.
For comparison, models without time-series information are also built. The performance of such models is less competitive. This shows that models built without the past loads may not be able to learn the tendency of the load demand. Moreover, if the temperature is used to build the model, it becomes the main attribute affecting the predicted values. Due to the limited data and the fuzzy correlation between the temperature and the load demand, such incorporation of temperature information could introduce higher variance into the model and result in unreliable predictions. That is why, even using the real temperature, the error is higher than that of using only calendar information.
Recall that the temperature of the last day of 1998 is the lowest of the whole dataset, but the load is not extremely high. Moreover, the correlation coefficient between the load and the temperature in December 1998 is 0.092. Therefore, we suspect the behavior
References (partial list, as provided)

- C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines."
- C. Cortes and V. Vapnik, "Support-Vector Networks."
- V. Vapnik, "Statistical Learning Theory."
- W. N. Venables and B. D. Ripley, "Modern Applied Statistics with S."
- H. S. Hippert, C. E. Pedreira, and R. C. Souza, "Neural networks for short-term load forecasting: a review and evaluation."