Travel-time prediction with support vector regression

doi:10.1109/TITS.2004.837813

276 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 5, NO. 4, DECEMBER 2004

Travel-Time Prediction With Support Vector Regression

Chun-Hsin Wu, Member, IEEE, Jan-Ming Ho, Member, IEEE, and D. T. Lee, Fellow, IEEE

Abstract—Travel time is a fundamental measure in transporta-

tion. Accurate travel-time prediction also is crucial to the develop-

ment of intelligent transportation systems and advanced traveler

information systems. In this paper, we apply support vector regres-

sion (SVR) for travel-time prediction and compare its results to

other baseline travel-time prediction methods using real highway

trafﬁc data. Since support vector machines have greater gener-

alization ability and guarantee global minima for given training

data, it is believed that SVR will perform well for time series anal-

ysis. Compared to other baseline predictors, our results show that

the SVR predictor can signiﬁcantly reduce both relative mean er-

rors and root-mean-squared errors of predicted travel times. We

demonstrate the feasibility of applying SVR in travel-time predic-

tion and prove that SVR is applicable and performs well for trafﬁc

data analysis.

Index Terms—Intelligent transportation systems (ITSs), support

vector machines, support vector regression (SVR), time series anal-

ysis, travel-time prediction.

I. INTRODUCTION

TRAVEL-TIME data are the raw elements for a number of performance measures in many transportation analyses. They can be used in transportation planning, design and operations, and evaluation. In particular, travel-time data are critical pretrip and en route information in advanced traveler information systems. They help drivers and travelers make decisions or plan schedules. With precise travel-time prediction, a route-guidance system can suggest optimal alternate routes or warn users of potential traffic congestion; users can then decide the best departure time or estimate their expected arrival time based on predicted travel times.

Travel-time calculation depends on vehicle speed, traffic flow, and occupancy, which are highly sensitive to weather conditions and traffic incidents. These dependencies make travel-time prediction complex and make optimal accuracy difficult to reach. Nonetheless, daily, weekly, and seasonal patterns can still be observed at a large scale. For instance, daily patterns distinguish rush-hour and late-night traffic, weekly patterns distinguish weekday and weekend traffic, and seasonal patterns distinguish winter and summer traffic. The time-varying feature germane to traffic behavior is the key to travel-time modeling.

Manuscript received December 1, 2003; revised August 1, 2004. This work was supported in part by the Academia Sinica, Taiwan, under Thematic Program 2001–2003. The Associate Editor for this paper was F.-Y. Wang.

C. H. Wu is with the Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, and with the Institute of Information Science, Academia Sinica, Taipei 115, Taiwan (e-mail: wuch@iis.sinica.edu.tw).

J.-M. Ho and D. T. Lee are with the Institute of Information Science, Academia Sinica, Taipei 115, Taiwan (e-mail: hoho@iis.sinica.edu.tw; dtlee@iis.sinica.edu.tw).

Digital Object Identifier 10.1109/TITS.2004.837813

Since the creation of support vector machine (SVM) theory by Vapnik of the AT&T Bell Laboratories [1], [2], there have been intensive studies on SVM for classification and regression [3]–[5]. SVM is quite satisfying from a theoretical point of view and can lead to great potential and superior performance in practical applications. This is largely due to the structural risk minimization (SRM) principle in SVM, which has greater generalization ability and is superior to the empirical risk minimization (ERM) principle as adopted in neural networks. In SVM, the training solution is guaranteed to be a global minimum, whereas ERM can only locate local minima. For example, the training process of a neural network may end in any of a number of local minima, none of which is guaranteed to be the global minimum. Furthermore, SVM is adaptive to complex systems and robust in dealing with corrupted data. These properties give SVM greater generalization ability, which is the bottleneck of its predecessor, the neural network approach.

The rapid development of SVMs in statistical learning theory encourages researchers to actively apply SVM to various research fields. Traditionally, many studies focus on the application of SVM to document classification and pattern recognition [2]. For intelligent transportation systems (ITSs), there are also many works applying SVM to vision-based intelligent vehicles, such as vehicle detection [6], [7], traffic-pattern recognition [8], and head recognition [9]. These research results demonstrate the feasibility of SVM in ITS.

Recently, the application of SVM to time-series forecasting, called support vector regression (SVR), has also shown many breakthroughs and promising performance, such as forecasting of financial market [10], forecasting of electricity price [11], estimation of power consumption [12], and reconstruction of chaotic systems [13]. Except for traffic-flow prediction [14], however, there are few SVR results on time-series analysis for ITS. The many successful results of SVR prediction in time-varying applications motivate our research in using SVR for travel-time modeling.

In this paper, we use SVR to predict travel time for highway users. We demonstrate that SVR is applicable to travel-time prediction and outperforms many previous methods. In Section II, we describe the travel-time prediction problem more formally. In Section III, we introduce SVR briefly. In Section IV, we explain our experimental procedure. Then, we present the methods and results of different travel-time predictors in Sections V and VI, respectively. Section VII concludes this paper.

II. TRAVEL-TIME CALCULATION AND PREDICTION

Travel time is the time required to traverse a link or a route between any two points of interest. There are two approaches to calculating travel times: link measurement and point measurement [15]. In the link-measurement approach, link or route travel time is directly measured between two points of interest by using active test vehicles, passive probe vehicles, or license-plate matching. In the point-measurement approach, however, travel time is estimated or inferred indirectly from the traffic data measured by point-detection devices on the roadway or roadside, such as loop detectors, laser detectors, and video cameras. Generally speaking, link-measurement approaches can collect more precise and experienced travel-time data, but point-measurement approaches can be deployed more cost effectively to obtain real-time travel-time data.

Fig. 1. Travel-time prediction problem. Assume the current time is $t$.

There are three categories of traffic data: historical, current, and predictive [16]. Travel-time prediction methods usually fall into two main approaches: statistical models and analytical models. Statistical models can be characterized as data-driven methods that generally use a time series of historical and current traffic variables such as travel times, speeds, and volumes as input. In Fig. 1, suppose that it currently is time $t$. Given the historical travel-time data $y(t-1)$, $y(t-2)$, $\ldots$, and $y(t-n)$ at time $t-1$, $t-2$, $\ldots$, $t-n$, respectively, we can predict the future values of $y(t+1)$, $y(t+2)$, $\ldots$ by analyzing the historical data set. Hence, future values can be forecast based on the correlation between the time-variant historical data set and its outcomes. Numerous statistical methods for the accurate prediction of travel time have been proposed, such as the ARIMA model [17], linear model [18]–[21], and neural networks [22]–[24].

The main idea of traffic forecasting in statistical models is based on the fact that traffic behaviors possess both partially deterministic and partially chaotic properties. Forecasting results can be obtained by reconstructing the deterministic traffic motion and predicting the random behaviors caused by unanticipated factors. On the other hand, analytical models predict travel times by using microscopic or macroscopic traffic simulators, such as METANET [25], [26], NETCELL [27], and MITSIM [28]. They usually require dynamic origin-destination (OD) matrices as input, and the predicted travel times evolve naturally from the simulation results.

III. SVR

As shown in Fig. 2, the basic idea of SVM is to map the training data from the input space into a higher dimensional feature space via function $\Phi$ and then construct a separating hyperplane with maximum margin in the feature space. Given a training set of data $(x_i, y_i)$, $i = 1, \ldots, l$, where $l$ corresponds to the size of the training data and $y_i \in \{-1, +1\}$ are the class labels, SVM will find a hyperplane direction $w$ and an offset scalar $b$ such that $f(x) = w \cdot \Phi(x) + b \ge 0$ for positive examples and $f(x) = w \cdot \Phi(x) + b < 0$ for negative examples. Consequently, although we cannot find a linear function in the input space to decide what type the given data is, we can easily find an optimal hyperplane in the feature space that can clearly discriminate between the two types of data.

Fig. 2. Basic idea of SVM to solve the binary classification problem, separating circular balls from square tiles.

Consider a set of training data $\{(x_1, y_1), \ldots, (x_l, y_l)\}$, where each $x_i \in \mathbb{R}^n$ denotes an input sample and has a corresponding target value $y_i \in \mathbb{R}$ for $i = 1, \ldots, l$, where $l$ corresponds to the size of the training data [4], [5]. The idea of the regression problem is to determine a function that can approximate future values accurately.

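The generic SVR estimating function takes the form

$$f(x) = (w \cdot \Phi(x)) + b \tag{1}$$

where $w$ is the weight vector in the feature space, $b \in \mathbb{R}$ is the offset, and $\Phi$ denotes a nonlinear transformation from $\mathbb{R}^n$ to high-dimensional space. Our goal is to find the value of $w$ and $b$ such that values of $f(x)$ can be determined by minimizing the regression risk

$$R_{\mathrm{reg}}(f) = C \sum_{i=1}^{l} \Gamma(f(x_i) - y_i) + \frac{1}{2} \|w\|^2 \tag{2}$$

where $\Gamma(\cdot)$ is a cost function, $C$ is a constant, and vector $w$ can be written in terms of data points as

$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \Phi(x_i) \tag{3}$$

By substituting (3) into (1), the generic equation can be rewritten as

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) (\Phi(x_i) \cdot \Phi(x)) + b = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) k(x_i, x) + b \tag{4}$$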

In (4), the dot product can be replaced with function $k(x_i, x)$, known as the kernel function. Kernel functions enable the dot product to be performed in high-dimensional feature space using low-dimensional space data input without knowing the transformation $\Phi$. All kernel functions must satisfy Mercer's condition, which ensures that the kernel corresponds to the inner product of some feature space. The radial basis function (RBF) is commonly used as the kernel for regression

$$k(x_i, x) = \exp\{-\gamma \|x - x_i\|^2\} \tag{5}$$

Some common kernels are shown in Table I. In our studies, we have experimented with these three kernels.
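The kernel expansion in (4) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the value of `gamma`, and the support points and coefficients below are hypothetical and are not taken from the experiments in this paper; `coeffs[i]` stands for the difference $\alpha_i - \alpha_i^*$.

```python
import math


def linear_kernel(u, v):
    """Linear kernel: the plain dot product of two input vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * ||u - v||^2); gamma=0.5 is an arbitrary choice."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_x, coeffs, b, kernel):
    """Evaluate f(x) = sum_i coeffs[i] * k(x_i, x) + b, as in (4)."""
    return sum(c * kernel(xi, x) for c, xi in zip(coeffs, support_x)) + b


# Hypothetical support points and coefficients, chosen only for illustration.
support_x = [[1.0, 2.0], [3.0, 1.0]]
coeffs = [0.5, -0.25]
b = 0.1
x = [2.0, 2.0]

linear_value = svr_predict(x, support_x, coeffs, b, linear_kernel)
rbf_value = svr_predict(x, support_x, coeffs, b, rbf_kernel)
print(linear_value, rbf_value)
```

Note that the prediction never evaluates $\Phi$ explicitly; swapping the kernel argument is enough to change the feature space.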


TABLE I
COMMON KERNEL FUNCTIONS

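The $\varepsilon$-insensitive loss function is the most widely used cost function [5]. The function is in the form

$$\Gamma(f(x) - y) = \begin{cases} |f(x) - y| - \varepsilon, & \text{for } |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

The regression risk in (2) with the $\varepsilon$-insensitive loss function (6) can be minimized by solving the quadratic optimization problem

$$\min_{\alpha, \alpha^*} \; \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) k(x_i, x_j) - \sum_{i=1}^{l} \left[ \alpha_i (y_i - \varepsilon) - \alpha_i^* (y_i + \varepsilon) \right]$$

subject to

$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0, \quad \alpha_i, \alpha_i^* \in [0, C] \tag{7}$$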
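To make the roles of $\varepsilon$ and $C$ concrete, the loss in (6) and the risk in (2) can be evaluated directly. The sketch below is illustrative only: the function names, the predictions, the targets, and the weight vector are hypothetical values, not data from this study.

```python
def eps_insensitive_loss(prediction, target, eps):
    """Loss (6): zero inside the eps tube, linear in the residual outside it."""
    residual = abs(prediction - target)
    return residual - eps if residual >= eps else 0.0


def regression_risk(predictions, targets, w, C, eps):
    """Risk (2): C times the summed loss plus half the squared norm of w."""
    loss_sum = sum(
        eps_insensitive_loss(p, y, eps) for p, y in zip(predictions, targets)
    )
    return C * loss_sum + 0.5 * sum(wi * wi for wi in w)


# Hypothetical predictions, targets, and weight vector for illustration.
predictions = [10.0, 12.5, 9.0]
targets = [10.2, 12.0, 9.05]
risk = regression_risk(predictions, targets, w=[0.3, -0.4], C=1000.0, eps=0.1)
print(risk)
```

The third point lies inside the tube and contributes nothing to the risk, which is why points inside the tube do not become support vectors.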

The Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ represent solutions to the above quadratic problem, which act as forces pushing predictions toward target value $y_i$. Only the training points with nonzero Lagrange multipliers in (7) are useful in forecasting the regression line and are known as support vectors. For all points inside the $\varepsilon$ tube, the Lagrange multipliers equal zero and do not contribute to the regression function. Only if the requirement $|f(x) - y| \ge \varepsilon$ (see Fig. 3) is fulfilled may the Lagrange multipliers be nonzero and the corresponding points be used as support vectors.

The constant $C$ introduced in (2) determines penalties to estimation errors. A large $C$ assigns higher penalties to errors so that the regression is trained to minimize error with lower generalization, while a small $C$ assigns lower penalties to errors. The latter allows a wider margin at the cost of some errors and thus gives higher generalization ability. If $C$ goes to infinity, SVR would not allow the occurrence of any error and results in a complex model, whereas when $C$ goes to 0, the result would tolerate a large amount of errors and the model would be less complex.

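Now, we have solved the value of $w$ in terms of the Lagrange multipliers. The variable $b$ can be computed by applying the Karush–Kuhn–Tucker (KKT) conditions that, in this case, imply that the product of the Lagrange multipliers and constraints has to equal 0

$$\alpha_i (\varepsilon + \xi_i - y_i + (w \cdot \Phi(x_i)) + b) = 0$$
$$\alpha_i^* (\varepsilon + \xi_i^* + y_i - (w \cdot \Phi(x_i)) - b) = 0 \tag{8}$$

and

$$(C - \alpha_i) \xi_i = 0, \quad (C - \alpha_i^*) \xi_i^* = 0 \tag{9}$$

where $\xi_i$ and $\xi_i^*$ are slack variables used to measure errors outside the $\varepsilon$ tube. Since $\alpha_i \alpha_i^* = 0$, and $\xi_i = 0$ for $\alpha_i \in (0, C)$ and $\xi_i^* = 0$ for $\alpha_i^* \in (0, C)$, $b$ can be computed as

$$b = \begin{cases} y_i - (w \cdot \Phi(x_i)) - \varepsilon, & \text{for } \alpha_i \in (0, C) \\ y_i - (w \cdot \Phi(x_i)) + \varepsilon, & \text{for } \alpha_i^* \in (0, C) \end{cases} \tag{10}$$

Fig. 3. SVR to fit a tube with radius $\varepsilon$ to the data and positive slack variables $\xi_i$ measuring the points lying outside of the tube.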
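The recovery of the offset $b$ from the KKT conditions can be sketched as follows. The sketch assumes the sign convention $w = \sum_i (\alpha_i - \alpha_i^*) \Phi(x_i)$, a linear kernel, and hand-picked multipliers that are hypothetical and merely consistent with the conditions above; averaging over all multipliers strictly inside $(0, C)$ is a common numerical practice and not something stated in this paper.

```python
def linear_kernel(u, v):
    """Linear kernel: the plain dot product of two input vectors."""
    return sum(a * b for a, b in zip(u, v))


def offset_from_kkt(alphas, alpha_stars, xs, ys, kernel, C, eps, tol=1e-9):
    """Average the estimates of b given by every multiplier strictly in (0, C)."""
    coeffs = [a - a_star for a, a_star in zip(alphas, alpha_stars)]

    def w_dot_phi(x):
        # (w . Phi(x)) written through the kernel expansion of w.
        return sum(c * kernel(xi, x) for c, xi in zip(coeffs, xs))

    estimates = []
    for a, a_star, x, y in zip(alphas, alpha_stars, xs, ys):
        if tol < a < C - tol:        # point on the upper edge of the tube
            estimates.append(y - w_dot_phi(x) - eps)
        if tol < a_star < C - tol:   # point on the lower edge of the tube
            estimates.append(y - w_dot_phi(x) + eps)
    if not estimates:
        raise ValueError("no multiplier lies strictly inside (0, C)")
    return sum(estimates) / len(estimates)


# Hypothetical one-dimensional example in which w = 1 and b = 3.
xs = [[0.0], [1.0], [2.0]]
ys = [2.9, 4.0, 5.1]
alphas = [0.0, 0.0, 0.5]
alpha_stars = [0.5, 0.0, 0.0]
b = offset_from_kkt(alphas, alpha_stars, xs, ys, linear_kernel, C=1.0, eps=0.1)
print(b)
```

The middle point has both multipliers equal to zero, so it lies inside the tube and plays no part in the estimate.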

Putting it all together, we can use SVM and SVR without knowing the transformation $\Phi$. We need to experiment with kernel functions; the penalty $C$, which determines the penalties to estimation errors; and the radius $\varepsilon$, which determines the data inside the tube to be ignored in regression.

IV. EXPERIMENTAL PROCEDURE

A. Data Preparation

The traffic data is provided by the Intelligent Transportation Web Service Project (ITWS) [29], [30] at Academia Sinica, a governmental research center based in Taipei, Taiwan. The Taiwan Area National Freeway Bureau (TANFB) constantly collects vehicle speed information from loop detectors that are deployed at 1-km intervals along the Sun Yat-Sen Highway. The TANFB web site provides the raw traffic information source, which is updated once every 3 min. The loop detector data is employed to derive travel time indirectly: the travel-time information is computed from the variable speed and the known distance between detectors.

Since traffic data may be missing or corrupted, we select a cleaner portion of the dataset of the highway, between February 15 and March 21, 2003. During this five-week period, there are no special holidays and the data loss rate does not exceed a threshold value; holidays and heavy data loss could bias our results if not properly managed. We use data from the first 28 days as the training set and the last 7 days as our testing set. We examine the travel times over three different distances: from Taipei to Chungli, Taichung, and Kaohsiung, which cover 45-, 178-, and 350-km stretches, respectively. In addition, we further examine the travel times of the 45-km distance between 7:00 and 10:00 AM, since the travel time of a short distance in rush hour changes more dynamically. Fig. 4 shows the travel-time distribution of the short distance on a daily and weekly basis, respectively. We can find the daily similarities and the instant dynamics from the daily and weekly patterns.

Fig. 4. Daily and weekly travel-time distributions traveling from Taipei to Chungli, a 45-km stretch, between 7:00 and 10:00 AM for five Wednesdays and five weeks between February 15 and March 21, 2003.

B. Prediction Methodology and Error Measurements

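Suppose that the current time is $t$ and we want to predict $y(t + l)$ at the future time $t + l$ with the knowledge of the values $y(t - n)$, $y(t - n + 1)$, $\ldots$, $y(t)$ for past time $t - n$, $t - n + 1$, $\ldots$, $t$, respectively. The prediction function is expressed as

$$y(t + l) = f(t, l, y(t), y(t - 1), \ldots, y(t - n))$$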
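One way to turn a travel-time series into training samples for such a prediction function is a sliding window. The following sketch is an assumption about the data layout rather than the procedure used in this paper: the function name `make_windows`, the example series, and the choice of $n = 4$ (five past values per sample) with a one-step horizon are all hypothetical.

```python
def make_windows(series, n, l=1):
    """Build (inputs, target) pairs: inputs are y(t-n)..y(t), target is y(t+l)."""
    samples = []
    for t in range(n, len(series) - l):
        inputs = series[t - n:t + 1]   # n + 1 consecutive past values
        target = series[t + l]         # value l steps after the current time
        samples.append((inputs, target))
    return samples


# Hypothetical travel times in minutes, one value per 3-min update.
travel_times = [30.0, 31.5, 33.0, 36.0, 40.5, 38.0, 35.0, 32.0]
samples = make_windows(travel_times, n=4, l=1)
for inputs, target in samples:
    print(inputs, "->", target)
```

Each pair can then be passed to any regression method, with the window as the input vector $x_i$ and the future value as the target $y_i$.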

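We examine the travel times of different prediction methods for departing from 7:00–10:00 AM during the last week between March 15 and March 21, 2003. Relative mean errors (RME) and root-mean-squared errors (RMSE) are applied as performance indices

$$\mathrm{RME} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{y}_i}{y_i} \right)^2}$$

where $y_i$ is the observation value, $\hat{y}_i$ is the predicted value, and $N$ is the number of testing data points.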
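The two performance indices can be computed as in the sketch below. It assumes that both indices are taken over errors relative to the observed value, so that both can be read as percentages; the function names and the sample values are hypothetical.

```python
import math


def relative_mean_error(observed, predicted):
    """Mean of |y - y_hat| / y over all data points."""
    return sum(abs((y - p) / y) for y, p in zip(observed, predicted)) / len(observed)


def relative_rmse(observed, predicted):
    """Square root of the mean squared relative error (assumed relative form)."""
    squared = sum(((y - p) / y) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical observed and predicted travel times in minutes.
observed = [40.0, 50.0, 80.0]
predicted = [38.0, 52.5, 80.0]
rme = relative_mean_error(observed, predicted)
rmse = relative_rmse(observed, predicted)
print(f"RME = {rme:.4f}, RMSE = {rmse:.4f}")
```

Because the errors are squared before averaging, the second index weighs a few large misses more heavily than the first.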

V. TRAVEL-TIME PREDICTING METHODS

To evaluate the applicability of travel-time prediction with

SVR, some common baseline travel-time prediction methods

are exploited for performance comparison.

Fig. 5. Comparisons of predicted travel times over short distance in rush hour

using different predicting methods.

A. SVR Prediction Method

As discussed previously, there are many parameters that must be set for travel-time prediction with SVR. We have tried several combinations and finally chose a linear function as the kernel for performance comparison, with $\varepsilon = 0.01$ and $C = 1000$. In our experience, however, the RBF kernel also performed as well as a linear kernel in many cases. The SVR experiments were done by running the mySVM software kit with a training window size equal to five [31].

B. Current Travel-Time Prediction Method

This method computes travel time from the data available at the instant when prediction is performed [24]. The travel time is defined by

$$T(t, \Delta) = \sum_{i=0}^{L-1} \frac{x_{i+1} - x_i}{v(x_i, t - \Delta)}$$

where $\Delta$ is the data delay, $L$ is the number of sections, $x_{i+1} - x_i$ denotes the distance of a section of a highway, and $v(x_i, t - \Delta)$ is the speed at the start of the highway section.
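A minimal sketch of this predictor sums, over all sections, the section length divided by the most recently reported speed at the start of the section. The function name, the units, and the detector readings below are hypothetical; the speeds are assumed to be the latest values available at prediction time.

```python
def current_travel_time(positions_km, speeds_kmh):
    """Sum section length / speed at the section start; returns minutes."""
    if len(positions_km) != len(speeds_kmh) + 1:
        raise ValueError("need one more section boundary than speed readings")
    total_hours = 0.0
    for i, speed in enumerate(speeds_kmh):
        if speed <= 0:
            raise ValueError("speeds must be positive")
        total_hours += (positions_km[i + 1] - positions_km[i]) / speed
    return total_hours * 60.0


# Hypothetical readings: three 1-km sections with different speeds.
positions_km = [0.0, 1.0, 2.0, 3.0]
speeds_kmh = [60.0, 30.0, 120.0]
minutes = current_travel_time(positions_km, speeds_kmh)
print(minutes)
```

Since the predictor uses only the latest snapshot of speeds, it cannot anticipate how speeds will change while the vehicle is still on the road.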


TABLE II
PREDICTION RESULTS IN RME AND RMSE OF DIFFERENT PREDICTORS FOR TRAVELING DIFFERENT DISTANCES (ALL TESTING DATA POINTS)

TABLE III
PREDICTION RESULTS FOR THE TESTING DATA POINTS THAT HAVE GREATER PREDICTION ERRORS (>= 5%) IN ANY ONE OF THE PREDICTORS

C. Historical Mean Prediction Method

This is the travel time obtained from the average travel time of the historical traffic data at the same time of day and day of week

$$T(t) = \frac{1}{W} \sum_{w=1}^{W} T(t, w)$$

where $W$ is the number of weeks trained and $T(t, w)$ is the past travel time at time $t$ of historical week $w$.
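The historical-mean predictor amounts to averaging the travel times recorded in the same time slot over the training weeks. In the sketch below, the slot labels, the dictionary layout, and the travel times are hypothetical and serve only to illustrate the averaging.

```python
def historical_mean(past_weeks, slot):
    """Average the travel time recorded in the same slot over all past weeks."""
    values = [week[slot] for week in past_weeks]
    return sum(values) / len(values)


# Hypothetical travel times in minutes for four training weeks.
past_weeks = [
    {"Wed 08:00": 42.0, "Wed 08:03": 44.0},
    {"Wed 08:00": 46.0, "Wed 08:03": 47.0},
    {"Wed 08:00": 40.0, "Wed 08:03": 41.0},
    {"Wed 08:00": 44.0, "Wed 08:03": 48.0},
]
prediction = historical_mean(past_weeks, "Wed 08:00")
print(prediction)
```

The prediction is fixed once the training weeks are chosen, so it cannot react to conditions on the day of travel.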

VI. RESULTS

The experimental results of travel-time prediction over a short distance in rush hour are shown in Fig. 5. As expected, the historical-mean predictor cannot reflect the traffic patterns that are quite different from the past average, and the current-time predictor is usually slow to reflect the changes of traffic patterns. Since SVR can converge rapidly and avoid local minima, the SVR predictor performs very well in our experiments.

The results in Table II show the RME and RMSE of different predictors for different travel distances over all the data points of the testing set. They show that the SVR predictor reduces both RME and RMSE to less than half of those achieved by the current-time and historical-mean predictors for all different distances.

In our experiments, as the traveling distance increases, the number of free sections increases more than the number of busy sections, such that the travel time over a long distance is dominated by the time to travel the free sections. So it is not surprising that all three of the predictors predict well for the long distance (350 km), but this makes it difficult to compare the performances of the three predictors. For this reason, we specifically examine the testing data points where the prediction error of any predictor is larger than or equal to 5%. As shown in Table III, the SVR predictor not only improves the overall performance, but also significantly reduces the prediction errors for the cases where any one of the predictors has larger prediction errors.
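The selection of the harder testing data points can be sketched as a simple filter that keeps a point whenever at least one predictor has a relative error of 5% or more. The function name, the dictionary keys, and all values below are hypothetical and do not reproduce the data behind Table III.

```python
def hard_points(observed, predictions_by_method, threshold=0.05):
    """Return indices where any predictor's relative error reaches the threshold."""
    selected = []
    for i, y in enumerate(observed):
        errors = [
            abs((y - preds[i]) / y) for preds in predictions_by_method.values()
        ]
        if max(errors) >= threshold:
            selected.append(i)
    return selected


# Hypothetical observed travel times and predictions in minutes.
observed = [40.0, 50.0, 60.0, 45.0]
predictions_by_method = {
    "svr": [39.5, 50.5, 59.0, 45.2],
    "current": [40.5, 46.0, 60.5, 45.5],
    "historical": [41.0, 51.0, 66.0, 44.8],
}
selected = hard_points(observed, predictions_by_method)
print(selected)
```

Error indices computed only over the selected points then compare the predictors on the cases that are not dominated by free-flowing sections.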

VII. CONCLUSION

Support vector machine and SVR have demonstrated their success in time-series analysis and statistical learning. However, little work has been done for traffic data analysis. In this paper, we examine the feasibility of applying SVR to travel-time prediction. After numerous experiments, we propose a set of SVR parameters that can predict travel times very well. The results show that the SVR predictor significantly outperforms the other baseline predictors. This demonstrates the applicability of SVR to traffic data analysis.

REFERENCES

[1] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.

[2] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Networks, vol. 10, pp. 988–999, Sept. 1999.

[3] S. R. Gunn, “Support vector machines for classification and regression,” Tech. Rep., Univ. Southampton, Southampton, U.K., May 1998.

[4] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in Proc. Int. Conf. Artificial Neural Networks (ICANN’97), W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, Eds., 1997, pp. 999–1004. Springer LNCS 1327.

[5] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Using support vector machines for time series prediction,” in Advances in Kernel Methods, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 242–253.

[6] Z. Sun, G. Bebis, and R. Miller, “Improving the performance of on-road vehicle detection by combining Gabor and wavelet features,” in Proc. IEEE 5th Int. Conf. Intelligent Transportation Systems, 2002, pp. 130–135.

[7] D. Gao, J. Zhou, and L. Xin, “SVM-based detection of moving vehicles for automatic traffic monitoring,” in Proc. IEEE 4th Int. Conf. Intelligent Transportation Systems, 2001, pp. 745–749.

[8] J. T. Ren, X. L. Ou, Y. Zhang, and D. C. Hu, “Research on network-level traffic pattern recognition,” in Proc. IEEE 5th Int. Conf. Intelligent Transportation Systems, 2002, pp. 500–504.
