Chun-Hsin Wu, Jan-Ming Ho, and D. T. Lee, "Travel-Time Prediction With Support Vector Regression," IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 276–281, December 2004.

276 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 5, NO. 4, DECEMBER 2004
Travel-Time Prediction With Support
Vector Regression
Chun-Hsin Wu, Member, IEEE, Jan-Ming Ho, Member, IEEE, and D. T. Lee, Fellow, IEEE
Abstract—Travel time is a fundamental measure in transportation. Accurate travel-time prediction also is crucial to the development of intelligent transportation systems and advanced traveler information systems. In this paper, we apply support vector regression (SVR) for travel-time prediction and compare its results to other baseline travel-time prediction methods using real highway traffic data. Since support vector machines have greater generalization ability and guarantee global minima for given training data, it is believed that SVR will perform well for time series analysis. Compared to other baseline predictors, our results show that the SVR predictor can significantly reduce both relative mean errors and root-mean-squared errors of predicted travel times. We demonstrate the feasibility of applying SVR in travel-time prediction and prove that SVR is applicable and performs well for traffic data analysis.
Index Terms—Intelligent transportation systems (ITSs), support vector machines, support vector regression (SVR), time series analysis, travel-time prediction.
I. INTRODUCTION
TRAVEL-TIME data are the raw elements for a number of performance measures in many transportation analyses. They can be used in transportation planning, design, operations, and evaluation. In particular, travel-time data are critical pretrip and en route information in advanced traveler information systems; they are very informative to drivers and travelers making decisions or planning schedules. With precise travel-time prediction, a route-guidance system can suggest optimal alternate routes or warn users of potential traffic congestion; users can then choose the best departure time or estimate their expected arrival time based on predicted travel times.
Travel-time calculation depends on vehicle speed, traffic flow, and occupancy, which are highly sensitive to weather conditions and traffic incidents. These features make travel-time prediction complex, and optimal accuracy is difficult to reach. Nonetheless, daily, weekly, and seasonal patterns can still be observed at a large scale. For instance, daily patterns distinguish rush-hour and late-night traffic, weekly patterns distinguish weekday and weekend traffic, and seasonal patterns distinguish winter and summer traffic. The time-varying feature germane to traffic behavior is the key to travel-time modeling.
Manuscript received December 1, 2003; revised August 1, 2004. This work
was supported in part by the Academia Sinica, Taiwan, under Thematic Program
2001–2003. The Associate Editor for this paper was F.-Y. Wang.
C. H. Wu is with the Department of Computer Science and Information
Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, and
with the Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
(e-mail: wuch@iis.sinica.edu.tw).
J.-M. Ho and D. T. Lee are with the Institute of Information Science, Academia Sinica, Taipei 115, Taiwan (e-mail: hoho@iis.sinica.edu.tw; dtlee@iis.sinica.edu.tw).
Digital Object Identifier 10.1109/TITS.2004.837813
Since the creation of support vector machine (SVM) theory by Vapnik at AT&T Bell Laboratories [1], [2], there have been intensive studies on SVM for classification and regression [3]–[5]. SVM is quite satisfying from a theoretical point of view and shows great potential and superior performance in practical applications. This is largely due to the structural risk minimization (SRM) principle in SVM, which has greater generalization ability and is superior to the empirical risk minimization (ERM) principle adopted by neural networks. SVM training guarantees global minima, whereas ERM can only locate local minima: the training process of a neural network, for example, may settle in any number of local minima that are not guaranteed to include the global minimum. Furthermore, SVM is adaptive to complex systems and robust in dealing with corrupted data. These properties give SVM the greater generalization ability that is the bottleneck of its predecessor, the neural-network approach.
The rapid development of SVMs in statistical learning theory
encourages researchers to actively apply SVM to various re-
search fields. Traditionally, many studies focus on the applica-
tion of SVM to document classification and pattern recognition
[2]. For intelligent transportation systems (ITSs), there also are
many works applying SVM to vision-based intelligent vehicles,
such as vehicle detection [6], [7], traffic-pattern recognition [8],
and head recognition [9]. These research results evidence the
feasibility of SVM in ITS.
Recently, the application of SVM to time-series forecasting,
called support vector regression (SVR), has also shown many
breakthroughs and plausible performance, such as forecasting
of financial market [10], forecasting of electricity price [11],
estimation of power consumption [12], and reconstruction of
chaotic systems [13]. Except for traffic-flow prediction [14], however, there are few SVR results on time-series analysis for ITS. The many successful SVR predictions in time-varying applications motivate our research into using SVR for travel-time modeling.
In this paper, we use SVR to predict travel time for highway users. We demonstrate that SVR is applicable to travel-time prediction and outperforms many previous methods. In Section II, we describe the travel-time prediction problem more formally. In Section III, we introduce SVR briefly. In Section IV, we explain our experimental procedure. Then, we present the methods and results of different travel-time predictors in Sections V and VI, respectively. Section VII concludes this paper.
II. TRAVEL-TIME CALCULATION AND PREDICTION
Travel time is the time required to traverse a link or a route
between any two points of interest. There are two approaches
1524-9050/04$20.00 © 2004 IEEE

Fig. 1. Travel-time prediction problem. Assume the current time is t.
to calculating travel times: link measurement and point measurement [15]. In the link-measurement approach, link or route travel time is directly measured between two points of interest by using active test vehicles, passive probe vehicles, or license-plate matching. In the point-measurement approach, however, travel time is estimated or inferred indirectly from the traffic data measured by point-detection devices on the roadway or roadside, such as loop detectors, laser detectors, and video cameras. Generally speaking, link-measurement approaches can collect more precise, directly experienced travel-time data, but point-measurement approaches can be deployed more cost-effectively to obtain real-time travel-time data.
There are three categories of traffic data: historical, current, and predictive [16]. Usually, travel-time prediction can be distinguished into two main approaches: statistical models and analytical models. Statistical models can be characterized as data-driven methods that generally use a time series of historical and current traffic variables, such as travel times, speeds, and volumes, as input. In Fig. 1, suppose that the current time is t. Given the historical travel-time data x(t-1), x(t-2), ..., x(t-m) observed at times t-1, t-2, ..., t-m, respectively, we can predict the future values x(t+1), x(t+2), ... by analyzing the historical data set. Hence, future values can be forecast based on the correlation between the time-variant historical data set and its outcomes. Numerous statistical methods for the accurate prediction of travel time have been proposed, such as the ARIMA model [17], linear models [18]–[21], and neural networks [22]–[24].
The main idea of traffic forecasting in statistical models is based on the fact that traffic behaviors possess both partially deterministic and partially chaotic properties. Forecasting results can be obtained by reconstructing the deterministic traffic motion and predicting the random behaviors caused by unanticipated factors. On the other hand, analytical models predict travel times by using microscopic or macroscopic traffic simulators, such as METANET [25], [26], NETCELL [27], and MITSIM [28]. They usually require dynamic origin-destination (OD) matrices as input, and the predicted travel times evolve naturally from the simulation results.
III. SVR

As shown in Fig. 2, the basic idea of SVM is to map the training data from the input space into a higher dimensional feature space via a function Φ and then construct a separating hyperplane with maximum margin in the feature space. Given a training set of data (x_i, y_i), i = 1, ..., l, where l corresponds to the size of the training data and y_i ∈ {+1, -1} are class labels, SVM will find a hyperplane direction w and an offset scalar b such that w · Φ(x_i) + b > 0 for positive examples and w · Φ(x_i) + b < 0 for negative examples. Consequently, although we cannot find a linear function in the input space to decide what type the given data is, we can easily find an optimal hyperplane in the feature space that clearly discriminates between the two types of data.

Fig. 2. Basic idea of SVM to solve the binary classification problem, separating circular balls from square tiles.

Consider a set of training data {(x_1, y_1), ..., (x_l, y_l)}, where each x_i ∈ R^n denotes an input-space sample with a corresponding target value y_i ∈ R for i = 1, ..., l, and l corresponds to the size of the training data [4], [5]. The idea of the regression problem is to determine a function that can approximate future values accurately.
The generic SVR estimating function takes the form

f(x) = (w · Φ(x)) + b    (1)

where w ∈ R^n, b ∈ R, and Φ denotes a nonlinear transformation from R^n to a high-dimensional space. Our goal is to find values of w and b such that values of f(x) can be determined by minimizing the regression risk

R_reg(f) = C Σ_{i=1}^{l} Γ(f(x_i) - y_i) + (1/2) ||w||²    (2)

where Γ(·) is a cost function, C is a constant, and the vector w can be written in terms of the data points as

w = Σ_{i=1}^{l} (α_i - α_i*) Φ(x_i).    (3)

By substituting (3) into (1), the generic equation can be rewritten as

f(x) = Σ_{i=1}^{l} (α_i - α_i*) (Φ(x_i) · Φ(x)) + b = Σ_{i=1}^{l} (α_i - α_i*) k(x_i, x) + b.    (4)

In (4), the dot product can be replaced with the function k(x_i, x), known as the kernel function. Kernel functions enable the dot product to be performed in high-dimensional feature space using low-dimensional-space data input without knowing the transformation Φ. All kernel functions must satisfy Mercer's condition, which corresponds to the inner product of some feature space. The RBF is commonly used as the kernel for regression

k(x, y) = exp(-||x - y||² / (2σ²)).    (5)

Some common kernels are shown in Table I. In our studies, we have experimented with these three kernels.
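As a concrete illustration, the three common kernel types discussed above can be sketched in Python. This is a minimal sketch: the polynomial degree d and RBF width sigma are illustrative defaults, not values taken from the paper.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, d=2):
    """Polynomial kernel: k(x, y) = (x . y + 1)^d (degree d is illustrative)."""
    return (linear_kernel(x, y) + 1.0) ** d

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel as in (5): k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

Each function maps a pair of input-space vectors to the inner product that would be computed in the (possibly infinite-dimensional) feature space, without ever forming Φ explicitly.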

TABLE I
COMMON KERNEL FUNCTIONS
The ε-insensitive loss function is the most widely used cost function [5]. The function is in the form

Γ(f(x) - y) = |f(x) - y| - ε,  for |f(x) - y| ≥ ε
Γ(f(x) - y) = 0,  otherwise.    (6)

By solving the quadratic optimization problem

maximize  Σ_{i=1}^{l} y_i (α_i - α_i*) - ε Σ_{i=1}^{l} (α_i + α_i*) - (1/2) Σ_{i,j=1}^{l} (α_i - α_i*)(α_j - α_j*) k(x_i, x_j)

subject to  Σ_{i=1}^{l} (α_i - α_i*) = 0,  α_i, α_i* ∈ [0, C]    (7)

the regression risk in (2) under the ε-insensitive loss function (6) is minimized. The Lagrange multipliers α_i and α_i* represent solutions to the above quadratic problem that act as forces pushing predictions toward the target value y_i. Only the nonzero values of the Lagrange multipliers in (7) are useful in forecasting the regression line, and the corresponding points are known as support vectors. For all points inside the ε tube, the Lagrange multipliers equal zero and do not contribute to the regression function. Only if the requirement |f(x_i) - y_i| ≥ ε (see Fig. 3) is fulfilled may the Lagrange multipliers be nonzero and serve as support vectors.
The constant C introduced in (2) determines the penalties to estimation errors. A large C assigns higher penalties to errors, so the regression is trained to minimize error at the cost of lower generalization, while a small C assigns fewer penalties to errors, allowing a larger margin with errors and thus higher generalization ability. If C goes to infinity, SVR would not allow the occurrence of any error, resulting in a complex model, whereas when C goes to 0, the result would tolerate a large amount of error and the model would be less complex.
Now, we have solved for the value of w in terms of the Lagrange multipliers. The variable b can be computed by applying the Karush-Kuhn-Tucker (KKT) conditions which, in this case, imply that the product of the Lagrange multipliers and the constraints has to equal zero

α_i (ε + ξ_i - y_i + w · Φ(x_i) + b) = 0
α_i* (ε + ξ_i* + y_i - w · Φ(x_i) - b) = 0    (8)

and

(C - α_i) ξ_i = 0
(C - α_i*) ξ_i* = 0    (9)

Fig. 3. SVR fits a tube with radius ε to the data, with positive slack variables ξ_i, ξ_i* measuring the points lying outside of the tube.

where ξ_i and ξ_i* are slack variables used to measure errors outside the ε tube. Since α_i α_i* = 0, and ξ_i = 0 for α_i ∈ (0, C) (likewise ξ_i* = 0 for α_i* ∈ (0, C)), b can be computed as

b = y_i - w · Φ(x_i) - ε,  for α_i ∈ (0, C)
b = y_i - w · Φ(x_i) + ε,  for α_i* ∈ (0, C).    (10)

Putting it all together, we can use SVM and SVR without knowing the transformation Φ. We need to experiment with the kernel functions; the penalty C, which determines the penalties to estimation errors; and the radius ε, which determines the data inside the ε tube to be ignored in regression.
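The role of the radius ε can be made concrete with a minimal implementation of the ε-insensitive loss in (6): residuals inside the tube cost nothing, while residuals outside it are penalized linearly.

```python
def eps_insensitive_loss(prediction, target, eps=0.1):
    """Epsilon-insensitive loss as in (6): zero inside the tube, linear outside.

    The default eps=0.1 is illustrative only.
    """
    residual = abs(prediction - target)
    return max(0.0, residual - eps)

# A point whose error is within the tube contributes nothing to the
# regression risk, so it cannot become a support vector; a point outside
# the tube is penalized by how far it sticks out.
inside = eps_insensitive_loss(1.05, 1.0, eps=0.1)   # |error| = 0.05 <= eps
outside = eps_insensitive_loss(1.50, 1.0, eps=0.1)  # |error| = 0.50 > eps
```

Widening ε therefore ignores more data points and yields a sparser, flatter model, which matches the trade-off described above.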
IV. EXPERIMENTAL PROCEDURE
A. Data Preparation
The traffic data is provided by the Intelligent Transportation Web Service Project (ITWS) [29], [30] at Academia Sinica, a governmental research center based in Taipei, Taiwan. The Taiwan Area National Freeway Bureau (TANFB) constantly collects vehicle-speed information from loop detectors deployed at 1-km intervals along the Sun Yat-Sen Highway. The TANFB web site provides the raw traffic-information source, which is updated once every 3 min. The loop-detector data is employed to derive travel time indirectly: the travel-time information is computed from the variable speed and the known distance between detectors.
Since traffic data may be missing or corrupted, we select a better portion of the highway dataset, between February 15 and March 21, 2003. During this five-week period, there are no special holidays, and the data-loss rate never exceeds a threshold value beyond which our results could be biased if not properly managed. We use data from the first 28 days as the training set and the last 7 days as our testing set. We examine the travel times over three different distances, from Taipei to Chungli, Taichung, and Kaohsiung, which cover 45-, 178-, and 350-km stretches, respectively. In addition, we further examine the travel times of the 45-km distance between 7:00 and 10:00 AM, since the travel time of a short distance in rush hour changes more dynamically. Fig. 4 shows the travel-time distribution of the short distance on a daily and weekly basis, respectively. We can find the daily

Fig. 4. Daily and weekly travel-time distributions traveling from Taipei to Chungli, a 45-km stretch, between 7:00 and 10:00 AM, for five Wednesdays and five weeks between February 15 and March 21, 2003.
similarities and the instant dynamics from the daily and weekly
patterns.
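The windowed training examples used by the predictors below can be built from a travel-time series with a simple sliding-window transform. The sketch uses a short synthetic series of travel times in minutes, not the TANFB data; the window size of five matches the setting reported later.

```python
def make_windows(series, window=5):
    """Turn a travel-time series into (input window, next value) training pairs."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

# Synthetic travel times (minutes) at consecutive sampling instants.
travel_times = [30.0, 31.5, 33.0, 35.0, 34.0, 32.5, 31.0]
examples = make_windows(travel_times, window=5)
```

Each pair feeds the last five observed travel times to the model as input and the next observed travel time as the regression target.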
B. Prediction Methodology and Error Measurements
Suppose that the current time is t and we want to predict the travel time x(t+Δ) at the future time t+Δ, given knowledge of the values x(t), x(t-1), ..., x(t-m+1) for the past times t, t-1, ..., t-m+1, respectively. The prediction function is expressed as

x̂(t+Δ) = f(x(t), x(t-1), ..., x(t-m+1)).

We examine the travel times of the different prediction methods for departures from 7:00 to 10:00 AM during the last week, between March 15 and March 21, 2003. Relative mean errors (RME) and root-mean-squared errors (RMSE) are applied as performance indices

RME = (1/n) Σ_{i=1}^{n} |x_i - x̂_i| / x_i
RMSE = sqrt( (1/n) Σ_{i=1}^{n} ((x_i - x̂_i) / x_i)² )

where x_i is the observation value and x̂_i is the predicted value.
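Assuming the relative (observation-normalized) form of both indices, the two error measures can be computed as follows; the observation and prediction values are illustrative.

```python
import math

def rme(observed, predicted):
    """Relative mean error: mean of |x_i - xhat_i| / x_i."""
    return sum(abs(x - p) / x for x, p in zip(observed, predicted)) / len(observed)

def rmse_rel(observed, predicted):
    """Root-mean-squared relative error: sqrt of mean of ((x_i - xhat_i)/x_i)^2."""
    return math.sqrt(
        sum(((x - p) / x) ** 2 for x, p in zip(observed, predicted)) / len(observed)
    )

# Hypothetical travel times (minutes): both points are 10% off.
obs = [100.0, 200.0]
pred = [110.0, 180.0]
```

Both indices are dimensionless, so predictors can be compared across routes of very different lengths.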
V. TRAVEL-TIME PREDICTING METHODS
To evaluate the applicability of travel-time prediction with
SVR, some common baseline travel-time prediction methods
are exploited for performance comparison.
Fig. 5. Comparisons of predicted travel times over a short distance in rush hour using different predicting methods.
A. SVR Prediction Method
As discussed previously, there are several parameters that must be set for travel-time prediction with SVR. We tried several combinations and finally chose a linear function as the kernel for the performance comparison, with ε = 0.01 and C = 1000. In our experience, however, the RBF kernel also performed as well as the linear kernel in many cases. The SVR experiments were done by running the mySVM software kit with a training window size equal to five [31].
B. Current Travel-Time Prediction Method
This method computes travel time from the data available at the instant when the prediction is performed [24]. The travel time is defined by

T(t) = Σ_{i=1}^{n} d_i / v_i(t - δ)

where δ is the data delay, n is the number of sections, d_i denotes the distance of section i of the highway, and v_i is the speed at the start of highway section i.
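The current-time estimate above is just a sum of section traversal times at the latest available speeds. In this sketch the section lengths and speed readings are hypothetical numbers, not measured data.

```python
def current_travel_time(distances_km, speeds_kmh):
    """Sum of d_i / v_i over highway sections (result in hours)."""
    return sum(d / v for d, v in zip(distances_km, speeds_kmh))

# Three 1-km sections, matching the 1-km loop-detector spacing,
# with hypothetical speed readings at prediction time.
sections = [1.0, 1.0, 1.0]   # km
speeds = [90.0, 60.0, 90.0]  # km/h
minutes = current_travel_time(sections, speeds) * 60.0
```

Because the speeds are frozen at prediction time, this method reacts slowly when traffic conditions change between now and the actual traversal, which is the weakness observed in the results.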

TABLE II
PREDICTION RESULTS IN RME AND RMSE OF DIFFERENT PREDICTORS FOR TRAVELING DIFFERENT DISTANCES (ALL TESTING DATA POINTS)

TABLE III
PREDICTION RESULTS FOR THE TESTING DATA POINTS THAT HAVE GREATER PREDICTION ERRORS (≥ 5%) IN ANY ONE OF THE PREDICTORS
C. Historical Mean Prediction Method
This is the travel time obtained from the average travel time of the historical traffic data at the same time of day and day of week

T_h(t) = (1/N) Σ_{w=1}^{N} x_w(t)

where N is the number of weeks trained and x_w(t) is the past travel time at time t of historical week w.
VI. RESULTS
The experimental results of travel-time prediction over a short distance in rush hour are shown in Fig. 5. As expected, the historical-mean predictor cannot reflect traffic patterns that differ greatly from the past average, and the current-time predictor is usually slow to reflect changes in traffic patterns. Since SVR can converge rapidly and avoid local minima, the SVR predictor performs very well in our experiments.
The results in Table II show the RME and RMSE of different predictors for different travel distances over all the data points of the testing set. They show that the SVR predictor reduces both RME and RMSE to less than half of those achieved by the current-time and historical-mean predictors for all distances.

In our experiments, as the traveling distance increases, the number of free sections increases more than the number of busy sections, such that the travel time over a long distance is dominated by the time to travel free sections. So it is not surprising that all three predictors predict well for the long distance (350 km), but this makes it difficult to compare the performances of the three predictors. For this reason, we specifically examine the testing data points where the prediction error of any predictor is larger than or equal to 5%. As shown in Table III, the SVR predictor not only improves the overall performance, but also significantly reduces the prediction errors for the cases where any one of the predictors has worse prediction errors.
VII. CONCLUSION
Support vector machines and SVR have demonstrated their success in time-series analysis and statistical learning. However, little work has been done on traffic data analysis. In this paper, we examine the feasibility of applying SVR to travel-time prediction. After numerous experiments, we propose a set of SVR parameters that can predict travel times very well. The results show that the SVR predictor significantly outperforms the other baseline predictors. This evidences the applicability of SVR to traffic data analysis.
REFERENCES
[1] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
[2] V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Networks, vol. 10, pp. 988-999, Sept. 1999.
[3] S. R. Gunn, "Support vector machines for classification and regression," Tech. Rep., Univ. Southampton, Southampton, U.K., May 1998.
[4] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, "Predicting time series with support vector machines," in Proc. Int. Conf. Artificial Neural Networks (ICANN'97), W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, Eds., 1997, pp. 999-1004. Springer LNCS 1327.
[5] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, "Using support vector machines for time series prediction," in Advances in Kernel Methods, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 242-253.
[6] Z. Sun, G. Bebis, and R. Miller, "Improving the performance of on-road vehicle detection by combining Gabor and wavelet features," in Proc. IEEE 5th Int. Conf. Intelligent Transportation Systems, 2002, pp. 130-135.
[7] D. Gao, J. Zhou, and L. Xin, "SVM-based detection of moving vehicles for automatic traffic monitoring," in Proc. IEEE 4th Int. Conf. Intelligent Transportation Systems, 2001, pp. 745-749.
[8] J. T. Ren, X. L. Ou, Y. Zhang, and D. C. Hu, "Research on network-level traffic pattern recognition," in Proc. IEEE 5th Int. Conf. Intelligent Transportation Systems, 2002, pp. 500-504.