276 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 5, NO. 4, DECEMBER 2004
Travel-Time Prediction With Support Vector Regression
Chun-Hsin Wu, Member, IEEE, Jan-Ming Ho, Member, IEEE, and D. T. Lee, Fellow, IEEE
Abstract—Travel time is a fundamental measure in transportation. Accurate travel-time prediction is also crucial to the development of intelligent transportation systems and advanced traveler information systems. In this paper, we apply support vector regression (SVR) to travel-time prediction and compare its results to those of other baseline travel-time prediction methods using real highway traffic data. Since support vector machines have greater generalization ability and guarantee global minima for given training data, it is believed that SVR will perform well for time-series analysis. Compared to the other baseline predictors, our results show that the SVR predictor can significantly reduce both relative mean errors and root-mean-squared errors of predicted travel times. We demonstrate the feasibility of applying SVR to travel-time prediction and show that SVR is applicable to, and performs well for, traffic data analysis.
Index Terms—Intelligent transportation systems (ITSs), support vector machines, support vector regression (SVR), time-series analysis, travel-time prediction.
I. INTRODUCTION
TRAVEL-TIME data are the raw elements for a number of performance measures in many transportation analyses. They can be used in transportation planning, design and operations, and evaluation. In particular, travel-time data are critical pretrip and en route information in advanced traveler information systems, helping drivers and travelers make decisions or plan schedules. With precise travel-time prediction, a route-guidance system can suggest optimal alternate routes or warn users of potential traffic congestion; users can then decide on the best departure time or estimate their expected arrival time based on predicted travel times.
Travel-time calculation depends on vehicle speed, traffic flow, and occupancy, which are highly sensitive to weather conditions and traffic incidents. These features make travel-time prediction very complex and make optimal accuracy difficult to reach. Nonetheless, daily, weekly, and seasonal patterns can still be observed at a large scale. For instance, daily patterns distinguish rush-hour and late-night traffic, weekly patterns distinguish weekday and weekend traffic, and seasonal patterns distinguish winter and summer traffic. The time-varying nature of traffic behavior is the key to travel-time modeling.
Manuscript received December 1, 2003; revised August 1, 2004. This work
was supported in part by the Academia Sinica, Taiwan, under Thematic Program
2001–2003. The Associate Editor for this paper was F.-Y. Wang.
C. H. Wu is with the Department of Computer Science and Information
Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, and
with the Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
(e-mail: wuch@iis.sinica.edu.tw).
J.-M. Ho and D. T. Lee are with the Institute of Information Science,
Academia Sinica, Taipei 115, Taiwan (e-mail: hoho@iis.sinica.edu.tw;
dtlee@iis.sinica.edu.tw).
Digital Object Identifier 10.1109/TITS.2004.837813
Since the creation of support vector machine (SVM) theory by Vapnik of AT&T Bell Laboratories [1], [2], there have been intensive studies on SVM for classification and regression [3]–[5]. SVM is quite satisfying from a theoretical point of view and has shown great potential and superior performance in practical applications. This is largely due to the structural risk minimization (SRM) principle in SVM, which has greater generalization ability and is superior to the empirical risk minimization (ERM) principle adopted in neural networks. Training an SVM amounts to solving a convex quadratic program, so its solution is a global minimum, whereas the ERM-based training of neural networks can only locate local minima. For example, training a neural network can end in any of a number of local minima, with no guarantee that the global minimum is among them. Furthermore, SVM adapts to complex systems and is robust in dealing with corrupted data. These features give SVM greater generalization ability, which was the bottleneck of its predecessor, the neural network approach.
The rapid development of SVMs in statistical learning theory encourages researchers to actively apply SVM to various research fields. Traditionally, many studies focus on the application of SVM to document classification and pattern recognition [2]. In intelligent transportation systems (ITSs), there are also many works applying SVM to vision-based intelligent vehicles, such as vehicle detection [6], [7], traffic-pattern recognition [8], and head recognition [9]. These results demonstrate the feasibility of SVM in ITS.
Recently, the application of SVM to time-series forecasting, called support vector regression (SVR), has also shown many breakthroughs and promising performance, for example, in forecasting financial markets [10], forecasting electricity prices [11], estimating power consumption [12], and reconstructing chaotic systems [13]. Except for traffic-flow prediction [14], however, there are few SVR results on time-series analysis for ITS. The many successful applications of SVR to time-varying problems motivate our research in using SVR for travel-time modeling.
In this paper, we use SVR to predict travel time for highway users and demonstrate that SVR is applicable to travel-time prediction and outperforms the baseline methods compared here. In Section II, we describe the travel-time prediction problem more formally. In Section III, we briefly introduce SVR. In Section IV, we explain our experimental procedure. Then, we present the methods and results of different travel-time predictors in Sections V and VI, respectively. Section VII concludes this paper.
II. TRAVEL-TIME CALCULATION AND PREDICTION
Travel time is the time required to traverse a link or a route between any two points of interest. There are two approaches to calculating travel times: link measurement and point measurement [15]. In the link-measurement approach, link or route travel time is directly measured between two points of interest by using active test vehicles, passive probe vehicles, or license-plate matching. In the point-measurement approach, however, travel time is estimated or inferred indirectly from the traffic data measured by point-detection devices on the roadway or roadside, such as loop detectors, laser detectors, and video cameras. Generally speaking, link-measurement approaches collect more precise data on the travel times that vehicles actually experience, but point-measurement approaches can be deployed more cost-effectively to obtain real-time travel-time data.
1524-9050/04$20.00 © 2004 IEEE
WU et al.: TRAVEL-TIME PREDICTION WITH SUPPORT VECTOR REGRESSION 277
Fig. 1. Travel-time prediction problem. Assume the current time is $t$.
There are three categories of traffic data: historical, current, and predictive [16]. Travel-time prediction methods can generally be divided into two main approaches: statistical models and analytical models. Statistical models can be characterized as data-driven methods that generally use a time series of historical and current traffic variables, such as travel times, speeds, and volumes, as input. In Fig. 1, suppose that the current time is $t$. Given the historical travel-time data $T(t-n), \ldots, T(t-1)$, and $T(t)$ at times $t-n, \ldots, t-1$, and $t$, respectively, we can predict the future value $T(t+\Delta)$, $\Delta > 0$, by analyzing the historical data set. Hence, future values can be forecast based on the correlation between the time-variant historical data set and its outcomes. Numerous statistical methods for accurate travel-time prediction have been proposed, such as the ARIMA model [17], linear models [18]–[21], and neural networks [22]–[24].
The main idea of traffic forecasting in statistical models is based on the fact that traffic behavior possesses both partially deterministic and partially chaotic properties. Forecasting results can be obtained by reconstructing the deterministic traffic motion and predicting the random behavior caused by unanticipated factors. On the other hand, analytical models predict travel times by using microscopic or macroscopic traffic simulators, such as METANET [25], [26], NETCELL [27], and MITSIM [28]. They usually require dynamic origin-destination (OD) matrices as input, and the predicted travel times evolve naturally from the simulation results.
III. SVR
As shown in Fig. 2, the basic idea of SVM is to map the training data from the input space into a higher dimensional feature space via a function $\Phi$ and then construct a separating hyperplane with maximum margin in the feature space. Given a training set of data $(x_i, y_i)$, $i = 1, \ldots, l$, where $l$ corresponds to the size of the training data and $y_i \in \{+1, -1\}$ are the class labels, SVM will find a hyperplane direction $w$ and an offset scalar $b$ such that $f(x) = (w \cdot \Phi(x)) + b \ge 0$ for positive examples and $f(x) < 0$ for negative examples. Consequently, although we may not be able to find a linear function in the input space that decides the type of the given data, we can easily find an optimal hyperplane in the feature space that clearly discriminates between the two types of data.
Fig. 2. Basic idea of SVM to solve the binary classification problem, separating circular balls from square tiles.
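To make the feature-space idea concrete, the following minimal Python sketch uses an XOR-style toy data set (an assumption for illustration, not data from this paper) that no straight line in the two-dimensional input space can separate. With the hypothetical feature map $\Phi(x) = (x_1, x_2, x_1 x_2)$, the hyperplane $w = (0, 0, 1)$, $b = 0$ separates the two classes in the feature space. The weights are picked by hand; an actual SVM would find the maximum-margin hyperplane by optimization.

```python
# Toy illustration of the feature-space idea behind SVM classification.
# The data, the feature map `phi`, and the hand-picked (w, b) are
# hypothetical; a real SVM would learn w and b by maximizing the margin.

def phi(x):
    """Map a 2-D input (x1, x2) to the 3-D feature space (x1, x2, x1*x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def decision(x, w, b):
    """Return +1 if w . phi(x) + b >= 0, else -1."""
    score = sum(wi * fi for wi, fi in zip(w, phi(x))) + b
    return 1 if score >= 0 else -1

# XOR-style labels: no straight line in the input plane separates them.
data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

w, b = (0.0, 0.0, 1.0), 0.0  # hyperplane x1*x2 = 0 in the feature space
predictions = [decision(x, w, b) for x, _ in data]
print(predictions)  # [1, 1, -1, -1]
```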
Consider a set of training data $\{(x_1, y_1), \ldots, (x_l, y_l)\}$, where each $x_i \in \mathbb{R}^n$ denotes an input sample and has a corresponding target value $y_i \in \mathbb{R}$ for $i = 1, \ldots, l$, where $l$ corresponds to the size of the training data [4], [5]. The idea of the regression problem is to determine a function that can approximate future values accurately.
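The generic SVR estimating function takes the form
$$f(x) = (w \cdot \Phi(x)) + b \tag{1}$$
where $w$ is a weight vector in the feature space, $b \in \mathbb{R}$, and $\Phi$ denotes a nonlinear transformation from $\mathbb{R}^n$ to a high-dimensional space. Our goal is to find the values of $w$ and $b$ by minimizing the regression risk
$$R_{\mathrm{reg}}(f) = C \sum_{i=1}^{l} \Gamma\left(f(x_i) - y_i\right) + \frac{1}{2}\|w\|^2 \tag{2}$$
where $\Gamma(\cdot)$ is a cost function, $C$ is a constant, and the vector $w$ can be written in terms of the data points as
$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, \Phi(x_i) \tag{3}$$
By substituting (3) into (1), the generic equation can be rewritten as
$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, (\Phi(x_i) \cdot \Phi(x)) + b \tag{4}$$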
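As a numerical sanity check of substituting (3) into (1), the sketch below takes the identity map $\Phi(x) = x$ (an assumption that makes the feature space explicit) and arbitrary, hypothetical multiplier values. Evaluating (1) with the $w$ built from (3) and evaluating the expansion (4) give the same prediction.

```python
# Check that f(x) = (w . phi(x)) + b with w = sum_i (a_i - a*_i) phi(x_i)
# equals the expansion sum_i (a_i - a*_i) (phi(x_i) . phi(x)) + b.
# phi is taken as the identity map; all numbers are hypothetical.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

X = [(1.0, 2.0), (0.5, -1.0), (3.0, 0.0)]  # training inputs x_i
alpha = [0.2, 0.0, 0.7]                    # alpha_i
alpha_star = [0.0, 0.4, 0.1]               # alpha*_i
b = 0.3
x_new = (2.0, 1.0)

# Equation (3): w as a weighted sum of the training points.
coef = [a - a_s for a, a_s in zip(alpha, alpha_star)]
w = [sum(c * xi[d] for c, xi in zip(coef, X)) for d in range(len(x_new))]

f_primal = dot(w, x_new) + b                                     # equation (1)
f_dual = sum(c * dot(xi, x_new) for c, xi in zip(coef, X)) + b   # equation (4)
print(f_primal, f_dual)
```

In the dual form (4), only dot products between inputs appear, which is what allows the kernel substitution described next.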
In (4), the dot product can be replaced with the function $K(x_i, x) = (\Phi(x_i) \cdot \Phi(x))$, known as the kernel function. Kernel functions enable the dot product to be computed in the high-dimensional feature space using low-dimensional input data, without knowing the transformation $\Phi$. All kernel functions must satisfy Mercer's condition, which guarantees that the kernel corresponds to an inner product in some feature space. The radial basis function (RBF) is commonly used as the kernel for regression
$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \tag{5}$$
where $\gamma > 0$ controls the kernel width. Some common kernels are shown in Table I. In our studies, we have experimented with these three kernels.
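The sketch below writes out three kernels in their commonly used forms: linear, polynomial, and the RBF of (5). Table I's exact parametrization is not reproduced in this text, so the polynomial degree and offset and the RBF $\gamma$ are assumptions. The last lines check the kernel trick for the homogeneous degree-2 polynomial kernel, whose explicit feature map for 2-D inputs is known: evaluating $K$ in the input space equals the dot product of the mapped vectors.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, offset=0.0):
    # (x . z + offset)^degree; offset=0 gives the homogeneous form.
    # degree and offset are assumed defaults, not values from Table I.
    return (linear_kernel(x, z) + offset) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2), the form of (5); gamma=0.5 is an assumption.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def phi_poly2(x):
    # Explicit feature map of the homogeneous degree-2 kernel for 2-D inputs.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
direct = polynomial_kernel(x, z)                       # computed in input space
explicit = linear_kernel(phi_poly2(x), phi_poly2(z))   # computed in feature space
print(direct, explicit, rbf_kernel(x, x))
```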
TABLE I
COMMON KERNEL FUNCTIONS
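The ε-insensitive loss function is the most widely used cost function [5]. The function is in the form
$$\Gamma(f(x) - y) = \begin{cases} |f(x) - y| - \varepsilon, & \text{for } |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$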
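Equation (6) translates directly into code. In this minimal sketch the residual is $f(x) - y$, and $\varepsilon$ defaults to 0.01, the value used later in Section V-A; residuals inside the tube cost nothing, and larger ones are charged only for the part beyond $\varepsilon$.

```python
def eps_insensitive_loss(residual, epsilon=0.01):
    """Equation (6): |r| - epsilon if |r| >= epsilon, else 0, with r = f(x) - y."""
    magnitude = abs(residual)
    return magnitude - epsilon if magnitude >= epsilon else 0.0

# Illustrative residuals with a wider tube (epsilon = 0.5) for readability.
print([eps_insensitive_loss(r, epsilon=0.5) for r in (0.2, -0.5, 1.5, -2.0)])
# [0.0, 0.0, 1.0, 1.5]
```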
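The regression risk in (2) with the ε-insensitive loss function (6) can be minimized by solving the following quadratic optimization problem (in its dual form):
$$\max_{\alpha, \alpha^*} \; \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(x_i, x_j)$$
subject to
$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i, \alpha_i^* \in [0, C]. \tag{7}$$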
The Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ represent solutions to the above quadratic problem and act as forces pushing predictions toward the target values $y_i$. Only the data points with nonzero Lagrange multipliers in (7) contribute to the regression function; these points are known as support vectors. For all points strictly inside the ε-tube, the Lagrange multipliers equal zero, so these points do not contribute to the regression function. Only if the requirement $|f(x_i) - y_i| \ge \varepsilon$ (see Fig. 3) is fulfilled can the Lagrange multipliers take nonzero values, in which case the point serves as a support vector.
The constant $C$ introduced in (2) determines the penalty for estimation errors. A large $C$ assigns higher penalties to errors, so the regression is trained to minimize training error at the cost of lower generalization, while a small $C$ assigns smaller penalties to errors. A small $C$ therefore allows a flatter function (a smaller $\|w\|$) at the cost of more training errors, which gives higher generalization ability. If $C$ goes to infinity, SVR would not allow any error and would result in a complex model, whereas when $C$ goes to 0, the result would tolerate a large number of errors and the model would be less complex.
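The effect of $C$ can be seen by evaluating the regression risk (2) for two hypothetical linear fits to a two-point toy data set with $\varepsilon = 0.1$: a steep line that stays inside the ε-tube, and a flat line that leaves both points outside it. This is an illustrative sketch, not an SVR solver; it only compares the risk of two fixed candidates under an identity feature map.

```python
def eps_loss(residual, epsilon):
    # Equation (6).
    return max(0.0, abs(residual) - epsilon)

def regression_risk(w, b, xs, ys, C, epsilon):
    # Equation (2) for a 1-D linear model f(x) = w*x + b (identity feature map).
    errors = sum(eps_loss(w * x + b - y, epsilon) for x, y in zip(xs, ys))
    return C * errors + 0.5 * w * w

# Hypothetical toy data and candidate fits.
xs, ys, eps = [0.0, 1.0], [0.0, 2.0], 0.1
steep = (1.8, 0.1)  # f(0)=0.1, f(1)=1.9: both points on the tube boundary
flat = (0.0, 1.0)   # f(x)=1: each point lies 0.9 outside the tube

for C in (0.5, 10.0):
    r_steep = regression_risk(*steep, xs, ys, C, eps)
    r_flat = regression_risk(*flat, xs, ys, C, eps)
    better = "flat" if r_flat < r_steep else "steep"
    print(f"C={C}: steep={r_steep:.3f}, flat={r_flat:.3f} -> {better}")
```

With the small $C$ the flat, low-$\|w\|$ line has the lower risk despite its errors; with the large $C$ the error-free but steeper line wins.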
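Now, we have solved the value of $w$ in terms of the Lagrange multipliers. The variable $b$ can be computed by applying the Karush–Kuhn–Tucker (KKT) conditions, which, in this case, imply that the product of each Lagrange multiplier and its constraint must equal 0
$$\alpha_i \left(\varepsilon + \xi_i - y_i + (w \cdot \Phi(x_i)) + b\right) = 0, \qquad \alpha_i^* \left(\varepsilon + \xi_i^* + y_i - (w \cdot \Phi(x_i)) - b\right) = 0 \tag{8}$$
and
$$(C - \alpha_i)\, \xi_i = 0, \qquad (C - \alpha_i^*)\, \xi_i^* = 0 \tag{9}$$
where $\xi_i$ and $\xi_i^*$ are slack variables used to measure errors outside the ε-tube. Since $\alpha_i \alpha_i^* = 0$, and $\xi_i = 0$ and $\xi_i^* = 0$ for $\alpha_i, \alpha_i^* \in (0, C)$, $b$ can be computed as
$$b = \begin{cases} y_i - (w \cdot \Phi(x_i)) - \varepsilon, & \text{for } \alpha_i \in (0, C) \\ y_i - (w \cdot \Phi(x_i)) + \varepsilon, & \text{for } \alpha_i^* \in (0, C). \end{cases} \tag{10}$$
Fig. 3. SVR to fit a tube with radius ε to the data and positive slack variables $\xi_i$ measuring the points lying outside of the tube.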
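Equation (10) can be applied to any multiplier strictly inside $(0, C)$. A common practical choice, assumed here rather than taken from the paper, is to average the estimates over all such free support vectors for numerical stability. The toy problem below (two 1-D points, a linear kernel, $\varepsilon = 0.1$, $C = 10$) has the hand-derived optimum $w = 0.8$, $b = 0.1$, with $\alpha = (0, 0.8)$ and $\alpha^* = (0.8, 0)$; both free support vectors recover $b = 0.1$.

```python
def linear_kernel(u, v):
    return u * v  # 1-D inputs for this toy example

def compute_bias(xs, ys, alpha, alpha_star, C, epsilon, kernel=linear_kernel, tol=1e-8):
    """Average the estimates of b from (10) over free support vectors (assumed policy)."""
    coef = [a - a_s for a, a_s in zip(alpha, alpha_star)]

    def w_dot_phi(x):
        # (w . phi(x)) expanded via (3) and the kernel.
        return sum(c * kernel(xj, x) for c, xj in zip(coef, xs))

    estimates = []
    for x, y, a, a_s in zip(xs, ys, alpha, alpha_star):
        if tol < a < C - tol:        # alpha_i in (0, C): point on the upper tube edge
            estimates.append(y - w_dot_phi(x) - epsilon)
        elif tol < a_s < C - tol:    # alpha*_i in (0, C): point on the lower tube edge
            estimates.append(y - w_dot_phi(x) + epsilon)
    if not estimates:
        raise ValueError("no free support vectors; b is not determined by (10)")
    return sum(estimates) / len(estimates)

# Hypothetical toy problem whose optimal multipliers were derived by hand.
b = compute_bias(xs=[0.0, 1.0], ys=[0.0, 1.0], alpha=[0.0, 0.8],
                 alpha_star=[0.8, 0.0], C=10.0, epsilon=0.1)
print(b)
```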
Putting it all together, we can use SVM and SVR without knowing the transformation $\Phi$. We need to experiment with the kernel function; the penalty $C$, which determines the penalties for estimation errors; and the tube radius ε, which determines which data points inside the tube are ignored in the regression.
IV. EXPERIMENTAL PROCEDURE
A. Data Preparation
The traffic data are provided by the Intelligent Transportation Web Service Project (ITWS) [29], [30] at Academia Sinica, a governmental research center based in Taipei, Taiwan. The Taiwan Area National Freeway Bureau (TANFB) constantly collects vehicle speed information from loop detectors that are deployed at 1-km intervals along the Sun Yat-Sen Highway. The TANFB web site provides the raw traffic information, which is updated once every 3 min. The loop-detector data are used to derive travel time indirectly: the travel time is computed from the measured speeds and the known distances between detectors.
Because traffic data may be missing or corrupted, we selected a relatively clean portion of the highway dataset, from February 15 to March 21, 2003. This five-week period contains no special holidays, and its data loss rate stays below a threshold value; holidays and excessive data loss could bias our results if not properly managed.
We use data from the first 28 d as the training set and the last 7 d as the testing set. We examine the travel times over three different distances: from Taipei to Chungli, Taichung, and Kaohsiung, which cover 45-, 178-, and 350-km stretches, respectively. In addition, we examine more closely the travel times over the 45-km distance between 7:00 and 10:00 AM, since the travel time over a short distance in rush hour changes more dynamically.
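With one record every 3 min, a day contains 480 samples, so the 28-day/7-day split and the 7:00 to 10:00 AM window reduce to index arithmetic. The sketch assumes (these are assumptions, not details given in the paper) that the five-week series is a flat, gap-free list starting at midnight of February 15, and it treats the rush-hour window as the half-open interval [7:00, 10:00).

```python
MINUTES_PER_SAMPLE = 3
SAMPLES_PER_DAY = 24 * 60 // MINUTES_PER_SAMPLE  # 480

def split_train_test(series, train_days=28, test_days=7):
    """Split a gap-free series that starts at midnight into train/test days."""
    cut = train_days * SAMPLES_PER_DAY
    end = cut + test_days * SAMPLES_PER_DAY
    return series[:cut], series[cut:end]

def rush_hour_indices(n_samples, start_hour=7, end_hour=10):
    """Indices whose time of day falls in [start_hour, end_hour)."""
    lo = start_hour * 60 // MINUTES_PER_SAMPLE
    hi = end_hour * 60 // MINUTES_PER_SAMPLE
    return [i for i in range(n_samples) if lo <= i % SAMPLES_PER_DAY < hi]

series = list(range(35 * SAMPLES_PER_DAY))  # placeholder for 35 days of travel times
train, test = split_train_test(series)
rush = rush_hour_indices(len(test))
print(len(train), len(test), len(rush))  # 13440 3360 420
```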
Fig. 4 shows the daily and weekly travel-time distributions over the short distance. Both the daily similarities and the instantaneous dynamics can be seen in the daily and weekly patterns.
Fig. 4. Daily and weekly travel-time distributions traveling from Taipei to Chungli, a 45-km stretch, between 7:00 and 10:00 AM for five Wednesdays and five weeks between February 15 and March 21, 2003.
B. Prediction Methodology and Error Measurements
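Suppose that the current time is $t$ and we want to predict $T(t+\Delta)$ at the future time $t+\Delta$ with the knowledge of the values $T(t-n), T(t-n+1), \ldots, T(t)$ for past times $t-n, t-n+1, \ldots, t$, respectively. The prediction function is expressed as
$$T(t+\Delta) = f\left(T(t-n), T(t-n+1), \ldots, T(t)\right)$$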
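The prediction function above turns a single travel-time series into supervised (input, target) pairs. This sketch assumes that the "training window size equal to five" in Section V-A means that five consecutive past values $T(t-4), \ldots, T(t)$ form each input vector (our reading, not stated explicitly), and that $\Delta$ is measured in sampling steps.

```python
def make_windows(series, window=5, horizon=1):
    """Build (input, target) pairs: inputs T(t-window+1..t), target T(t+horizon).

    window=5 reflects our assumed reading of the paper's window size.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        t = start + window - 1                  # index of the current time t
        inputs.append(series[start:t + 1])
        targets.append(series[t + horizon])
    return inputs, targets

X, y = make_windows([10, 11, 12, 13, 14, 15, 16, 17], window=5, horizon=1)
print(X)  # [[10, 11, 12, 13, 14], [11, 12, 13, 14, 15], [12, 13, 14, 15, 16]]
print(y)  # [15, 16, 17]
```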
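We examine the travel times predicted by the different methods for departures between 7:00 and 10:00 AM during the last week, from March 15 to March 21, 2003. Relative mean errors (RME) and root-mean-squared errors (rmse) are used as performance indices
$$\mathrm{RME} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{X_i - \hat{X}_i}{X_i} \right|, \qquad \mathrm{rmse} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(X_i - \hat{X}_i\right)^2}$$
where $X_i$ is the observation value, $\hat{X}_i$ is the predicted value, and $N$ is the number of predictions.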
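The two indices can be computed as below, assuming the absolute-value form of RME written above and nonzero observed travel times. The sample values are hypothetical.

```python
import math

def rme(observed, predicted):
    """Relative mean error: mean of |X_i - X_hat_i| / X_i (X_i must be nonzero)."""
    return sum(abs(x - xh) / x for x, xh in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root-mean-squared error: sqrt(mean of (X_i - X_hat_i)^2)."""
    return math.sqrt(sum((x - xh) ** 2 for x, xh in zip(observed, predicted)) / len(observed))

obs = [30.0, 40.0, 50.0]   # hypothetical observed travel times (min)
pred = [33.0, 38.0, 50.0]  # hypothetical predictions (min)
print(rme(obs, pred), rmse(obs, pred))
```

RME is scale-free and comparable across the 45-, 178-, and 350-km routes, while rmse stays in the units of travel time and weights large misses more heavily.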
V. TRAVEL-TIME PREDICTING METHODS
To evaluate the applicability of travel-time prediction with SVR, we compare it against some common baseline travel-time prediction methods.
Fig. 5. Comparisons of predicted travel times over short distance in rush hour
using different predicting methods.
A. SVR Prediction Method
As discussed previously, several parameters must be set for travel-time prediction with SVR. We tried several combinations and finally chose a linear kernel for the performance comparison, with $\varepsilon = 0.01$ and $C = 1000$. In our experience, however, the RBF kernel performed as well as the linear kernel in many cases. The SVR experiments were run with the mySVM software kit [31], using a training window size of five.
B. Current Travel-Time Prediction Method
This method computes travel time from the data available at the instant when the prediction is performed [24]. The travel time is defined by
$$T_{\mathrm{cur}}(t) = \sum_{i=1}^{k} \frac{d_i}{v_i(t - \delta)}$$
where $\delta$ is the data delay, $k$ is the number of sections, $d_i$ denotes the length of section $i$ of the highway, and $v_i$ is the speed at the start of section $i$.
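A minimal sketch of the current travel-time predictor follows. The storage conventions are hypothetical: speeds are kept per section as lists indexed by sampling step, lengths are in km, speeds in km/h, and the delay $\delta$ is given in steps. The result is converted to minutes.

```python
def current_travel_time(lengths_km, speeds_kmh, t, delay):
    """Sum of d_i / v_i(t - delay) over all sections, returned in minutes.

    speeds_kmh[i][s] is the (hypothetically stored) speed of section i at step s.
    """
    idx = t - delay
    if idx < 0:
        raise IndexError("no speed data available `delay` steps before t")
    hours = sum(d / speeds[idx] for d, speeds in zip(lengths_km, speeds_kmh))
    return 60.0 * hours

# Three hypothetical 1-km sections; each inner list is one section's speed history.
lengths = [1.0, 1.0, 1.0]
speeds = [[60.0, 55.0], [30.0, 20.0], [60.0, 58.0]]
print(current_travel_time(lengths, speeds, t=1, delay=1))  # uses the step-0 speeds
```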
TABLE II
PREDICTION RESULTS IN RME AND RMSE OF DIFFERENT PREDICTORS FOR TRAVELING DIFFERENT DISTANCES (ALL TESTING DATA POINTS)
TABLE III
PREDICTION RESULTS FOR THE TESTING DATA POINTS THAT HAVE GREATER PREDICTION ERRORS (≥5%) IN ANY ONE OF THE PREDICTORS
C. Historical Mean Prediction Method
This is the travel time obtained by averaging the historical travel times at the same time of day and day of week
$$T_{\mathrm{hist}}(t) = \frac{1}{m} \sum_{j=1}^{m} T_j(t)$$
where $m$ is the number of weeks in the training data and $T_j(t)$ is the past travel time at time $t$ of historical week $j$.
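The historical-mean predictor reduces to averaging aligned time slots across the $m$ training weeks. The data layout used here, one list of time-of-day slots per week for a fixed weekday, is an assumption of this sketch.

```python
from statistics import mean

def historical_mean(history, slot):
    """Average travel time at `slot` (same time of day and weekday) over past weeks.

    history[j][slot] is the travel time observed in historical week j
    (a hypothetical layout chosen for this sketch).
    """
    return mean(week[slot] for week in history)

# Hypothetical travel times (min) for two Wednesday time slots over three weeks.
history = [[30.0, 40.0], [34.0, 44.0], [32.0, 42.0]]
print(historical_mean(history, slot=1))  # 42.0
```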
VI. RESULTS
The experimental results of travel-time prediction over a short distance in rush hour are shown in Fig. 5. As expected, the historical-mean predictor cannot reflect traffic patterns that differ considerably from the past average, and the current-time predictor is usually slow to reflect changes in traffic patterns. Since SVR converges rapidly and avoids local minima, the SVR predictor performs very well in our experiments.
The results in Table II show the RME and rmse of different
predictors for different travel distances over all the data points of
the testing set. They show that the SVR predictor reduces both
RME and rmse to less than half of those achieved by the current-
time and historical-mean predictors for all different distances.
In our experiments, as the travel distance increases, the number of free-flowing sections increases more than the number of busy sections, so the travel time over a long distance is dominated by the time spent traveling free-flowing sections. It is therefore not surprising that all three predictors predict well for the long distance (350 km), but this makes it difficult to compare their performance. For this reason, we specifically examine the testing data points where the prediction error of any predictor is greater than or equal to 5%. As shown in Table III, the SVR predictor not only improves the overall performance but also significantly reduces the prediction errors for the cases where at least one of the predictors has a large prediction error.
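The filtering step behind Table III can be sketched as follows: a testing point is kept if any predictor's relative error there reaches 5%, and the RME and rmse of each predictor can then be computed over the kept indices only. The predictor names and values below are hypothetical.

```python
def large_error_indices(observed, predictions, threshold=0.05):
    """Indices where at least one predictor's relative error is >= threshold."""
    selected = []
    for i, x in enumerate(observed):
        if any(abs(x - preds[i]) / x >= threshold for preds in predictions.values()):
            selected.append(i)
    return selected

# Hypothetical observed travel times and predictions from three predictors.
obs = [100.0, 100.0, 100.0]
preds = {
    "svr": [101.0, 103.0, 100.0],
    "current": [100.0, 106.0, 104.0],
    "historical": [96.0, 100.0, 100.0],
}
idx = large_error_indices(obs, preds)
print(idx)  # [1]
```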
VII. CONCLUSION
Support vector machines and SVR have demonstrated their success in time-series analysis and statistical learning. However, little work has been done on traffic data analysis. In this paper, we examine the feasibility of applying SVR to travel-time prediction. After numerous experiments, we propose a set of SVR parameters that predict travel times very well. The results show that the SVR predictor significantly outperforms the other baseline predictors. This demonstrates the applicability of SVR to traffic data analysis.
REFERENCES
[1] V. N. Vapnik, The Nature of Statistical Learning Theory. New York:
Springer, 1995.
[2] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Networks, vol. 10, pp. 988–999, Sept. 1999.
[3] S. R. Gunn, “Support vector machines for classification and regression,” Tech. Rep., Univ. Southampton, Southampton, U.K., May 1998.
[4] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in Proc. Int. Conf. Artificial Neural Networks (ICANN’97), W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, Eds., 1997, pp. 999–1004. Springer LNCS 1327.
[5] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Using support vector machines for time series prediction,” in Advances in Kernel Methods, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 242–253.
[6] Z. Sun, G. Bebis, and R. Miller, “Improving the performance of on-road
vehicle detection by combining Gabor and wavelet features,” in Proc.
IEEE 5th Int. Conf. Intelligent Transportation Systems, 2002, pp.
130–135.
[7] D. Gao, J. Zhou, and L. Xin, “SVM-based detection of moving vehicles
for automatic traffic monitoring,” in Proc. IEEE 4th Int. Conf. Intelligent
Transportation Systems, 2001, pp. 745–749.
[8] J. T. Ren, X. L. Ou, Y. Zhang, and D. C. Hu, “Research on network-
level traffic pattern recognition,” in Proc. IEEE 5th Int. Conf. Intelligent
Transportation Systems, 2002, pp. 500–504.