Deep Learning Empowered Task Offloading for Mobile Edge Computing in Urban Informatics

Ke Zhang, Yongxu Zhu, Member, IEEE, Supeng Leng, Member, IEEE, Yejun He, Senior Member, IEEE, Sabita Maharjan, Member, IEEE, and Yan Zhang, Senior Member, IEEE
Abstract—Led by the industrialization of smart cities, numerous interconnected mobile devices and novel applications have emerged in the urban environment, providing great opportunities to realize industrial automation. In this context, autonomous driving is an attractive issue, which leverages large amounts of sensory information for smart navigation while posing intensive computation demands on resource-constrained vehicles. Mobile Edge Computing (MEC) is a potential solution to alleviate the heavy burden on the devices. However, the varying states of multiple edge servers as well as the variety of vehicular offloading modes make efficient task offloading a challenge. To cope with this challenge, we adopt a deep Q-learning approach for designing optimal offloading schemes, jointly considering the selection of the target server and the determination of the data transmission mode. Furthermore, we propose an efficient redundant offloading algorithm to improve task offloading reliability in the case of vehicular data transmission failure. We evaluate the proposed schemes based on real traffic data. Results indicate that our offloading schemes have great advantages in optimizing system utilities and improving offloading reliability.

Index Terms—Offloading, Q-learning, reliability, vehicular edge computing.
I. INTRODUCTION

Along with the advancement of the Internet of Things (IoT) in industrial application scenarios, urban life patterns are undergoing a tremendous change [1]. A high level of interconnection between heterogeneous smart devices brings the possibility of industrial automation, which improves operation and increases productivity with little or no human intervention [2].

Autonomous driving is one of the most attractive industrial automation applications. With a large number of on-board sensors and actuators as well as advanced control systems, autonomous vehicles are capable of interpreting sensory information and identifying appropriate navigation paths. In addition, with the aid of the information infrastructure in urban areas and the introduction of Intelligent Transportation Systems (ITS), smart vehicles provide a pervasive and promising platform for a broad range of novel mobile applications, such as augmented reality, natural language processing, and interactive gaming [3]. However, understanding a highly dynamic and complex traffic environment while making real-time driving decisions involves processing a great volume of sensory data and requires intensive computation. Due to the constraint of on-board computation power, supporting these real-time and computationally intensive tasks and applications on vehicles is a big challenge.

K. Zhang and S. Leng are with the School of Information and Communication Engineering, University of Electronic Science and Technology of China (e-mail: {zhangke, spleng}@uestc.edu.cn).
Y. Zhu is with the Wolfson School of Mechanical, Electrical and Manufacturing Engineering, Loughborough University, U.K. (e-mail: y.zhu4@lboro.ac.uk).
Y. He is with the College of Electronics and Information Engineering, Shenzhen University, China (e-mail: heyejun@126.com).
S. Maharjan is with the Simula Metropolitan Center for Digital Engineering, and the University of Oslo, Norway (e-mail: sabita@simula.no).
Y. Zhang is with the Department of Informatics, University of Oslo, Norway. He is also with the Simula Metropolitan Center for Digital Engineering, Norway (e-mail: yanzhang@ieee.org).
Corresponding author: Y. Zhang (e-mail: yanzhang@ieee.org).

Fig. 1. MEC-enabled intelligent traffic applications.
Mobile Edge Computing (MEC), where heavy computation tasks are offloaded to cloud resources placed at the edge of mobile networks, has emerged as a promising approach to cope with growing computing demands [4]. Fig. 1 represents a typical scenario of applying the MEC technique to intelligent traffic applications. Aided by MEC, safety-oriented tasks in the context of autonomous driving, computation-intensive vehicular applications, and traffic sensory data analysis can be offloaded from vehicles to MEC servers on various types of proximate wireless access infrastructure, where real-time data processing and feedback can be achieved [5].

In the MEC process, the implementation efficiency is tightly coupled with the scheduling and management of computing and communication resources [6]. With the harmonization of the global deployment of Long Term Evolution (LTE) systems, LTE-V has become a paradigm for vehicular communications and plays a vital role in the design of task offloading schemes. However, inherent characteristics of vehicular networks, such as high-speed mobility, time-varying topology, and ephemeral interactions, bring unprecedented challenges in managing vehicular communication for MEC applications. In the context of vehicular networks with multiple MEC servers, the mutual effects between transmission mode selection and offloading target server determination make MEC scheduling even more complex. Furthermore, transmission may fail during the vehicular offloading process, which limits the performance and applicability of MEC services. Thus, novel solutions are necessary to guarantee both the reliability and the efficiency of vehicular task offloading. However, very few works have investigated the integrated management of computing and communication resources in a multi-server vehicular edge computing network, and task offloading reliability has not been incorporated in the recent literature.

To bridge this gap, in this paper we focus on task offloading in an MEC-enabled vehicular network and present an approach that optimizes MEC system performance while also improving offloading reliability. The main contributions of this paper are as follows:
- We present an MEC-enabled LTE-V network, where the influence of various vehicular communication modes on task offloading performance is qualitatively analyzed.
- By applying a deep Q-learning approach, we propose optimal target MEC server determination and transmission mode selection schemes, which maximize the utility of the offloading system under given delay constraints.
- To cope with transmission failure in vehicular networks, we design an efficient redundant offloading algorithm, which ensures offloading reliability while improving the gained utility.
The remainder of the paper is organized as follows. In Section II, we review related work. A vehicular edge computing system model is presented in Section III. A deep Q-learning based offloading scheme is described in Section IV. In Section V, we investigate offloading reliability. Performance evaluation is presented in Section VI. Finally, we conclude our work in Section VII.
II. RELATED WORK

To meet the demands of computationally intensive vehicular applications, some studies have investigated applying the MEC approach in vehicular networks. In [7], the authors proposed an energy-efficient resource allocation scheme for vehicular fog computing centers. In [8], an MEC-based architecture was used for urban traffic management in a distributed and adaptive service manner. In [9], the authors designed a fog vehicular computing framework that integrates resources from both edge servers and the remote cloud. In [10], the authors unveiled underutilized vehicular computing resources and put them to use in providing efficient computational support to MEC servers. To efficiently merge MEC technology into vehicular networks, the authors in [11] introduced a collaborative task offloading and output transmission mechanism. In [12], the authors designed an MEC service migration scheme that ensures vehicles always connect to the nearest MEC entities. Although these studies have provided some insights into MEC-enabled vehicular applications, the effects of vehicular communication on the design of task offloading strategies have not been thoroughly investigated.
Benefiting from the fast commercialization of LTE systems, LTE-V has become one of the key technologies in vehicular networks. Several recent works have focused on analytical models and implementations of LTE-V. In [13], direct vehicle communication was utilized to offload data transmission from vehicles with poor-quality links to infrastructure. Taking into account the high mobility of vehicles, the authors in [14] designed a wireless link formation mechanism in which the beamwidths between vehicular communication pairs were optimized. In [15], the authors discussed key building blocks of 5G networks in the context of vehicular communications. However, joint V2I and V2V transmission schemes in a multiple-MEC-server scenario have not been considered in these studies.
Learning is a branch of artificial intelligence that studies systems which acquire knowledge from data. Recently, various learning techniques have been deployed for scheduling task offloading. In [16], the authors proposed an online-learning-based workload offloading scheme for mobile edge computing systems with renewable power supply. In order to reduce resource consumption in task offloading, the authors in [17] formalized intelligent offloading metric prediction using a machine-learning-based approach. Deep Q-learning is a powerful tool for policy optimization and has been utilized in various decision processes. For instance, the authors in [18] designed an integrated resource management scheme for connected vehicles using a deep reinforcement learning approach. The authors in [19] used deep Q-learning to schedule voltage and frequency for real-time systems on embedded devices. In [20], this learning approach was adopted in designing a video streaming framework. Deep Q-learning can also be applied in the traffic area: to relieve traffic congestion at highway junctions, the authors in [21] used it in traffic simulation studies and vehicle pathway optimization. However, the potential of learning-based approaches has not been explored for designing scheduling algorithms for vehicular edge computing applications. Furthermore, the mobility characteristics of vehicles and the reliability of vehicular task offloading have not been considered in the previous studies.

Different from these studies, in this paper we concentrate on task offloading in an LTE-V network and propose optimal offloading schemes that jointly schedule vehicular communication and edge computing through a deep Q-learning approach.
III. SYSTEM MODEL

Fig. 2. Task offloading in an MEC-enabled vehicular network.

TABLE I
MAIN VARIABLES

Variable    Description
G           Number of task types
f_i         Size of task κ_i's input
g_i         Amount of task κ_i's required computation
t_i^max     Maximum delay tolerance of task κ_i
β_i         Probability of a task belonging to type i
ρ           Road traffic density
M           Number of MEC servers
W_m         Computing capacity of MEC server m
c_c         Cost of a unit spectrum of the cellular network
c_v         Cost of a unit spectrum of the vehicular network
P_g         Probability of generating a task in a time frame
P_tx,b      Vehicle transmission power in V2B mode
P_tx,v      Vehicle transmission power in V2V and V2R modes
q_c, q_v    Amount of spectrum allocated through the cellular network and the vehicular network, respectively

Fig. 2 shows the architecture of an MEC-enabled vehicular network in an urban area. The autonomous navigation of the vehicles requires various kinds of sensory data processing. Moreover, mobile applications aided by urban informatics infrastructure may also place computing requirements on the vehicles. We model these data processing and computing demands as computation tasks. Different tasks may have different characteristics; for instance, autonomous navigation has strict delay constraints, while entertainment applications do not impose a critical delay requirement. Accordingly, we classify the tasks into G types. A task is described by four terms as κ_i = {f_i, g_i, t_i^max, ς_i}, i ∈ G [22]. Here, f_i and g_i are the size of the task input data and the amount of required computation, respectively. To provide timely responses in various traffic contexts, the tasks are time sensitive; t_i^max is the maximum delay tolerance of task κ_i. The offloading system gains utility ς_i Δt from the completion of task κ_i, where Δt is the time saved in accomplishing κ_i compared to t_i^max. The probability of a task belonging to type i is denoted as β_i, with Σ_{i∈G} β_i = 1.
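To make the task model concrete, the sketch below encodes one task type and the utility rule just described (ς_i multiplied by the time saved against the deadline). It is a minimal illustration; the class and function names are ours, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class TaskType:
    """One of the G task types: kappa_i = {f_i, g_i, t_i^max, varsigma_i}."""
    f: float          # size of the task input data (bits)
    g: float          # amount of required computation (CPU cycles)
    t_max: float      # maximum delay tolerance (s)
    varsigma: float   # utility earned per second saved

def completion_utility(task: TaskType, t_total: float) -> float:
    """Utility varsigma_i * (t_i^max - t_total). The optimization in
    Section IV constrains t_total <= t_i^max, so the difference is the
    time saved relative to the deadline."""
    return task.varsigma * (task.t_max - t_total)
```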
The road is covered by a heterogeneous vehicular network. Besides the cellular network provided by a Base Station (BS), there is an LTE-V network composed of mobile vehicles and M Road Side Units (RSUs) deployed along the road. The set of these RSUs is denoted as M. The cellular network and the vehicular network operate on different, non-overlapping spectrum. Compared to the BS, which has seamless coverage and a high data transmission cost, the RSUs provide spotty coverage and inexpensive access service. The costs of using a unit spectrum of the cellular network and of the vehicular network for a unit time are c_c and c_v, respectively, with c_c > c_v.
The BS is equipped with an MEC server, denoted Serv_0, through a wired connection. In addition, each RSU hosts an MEC server; these servers are denoted Serv_1, Serv_2, ..., Serv_M, respectively. The MEC servers receive data directly from the BS or RSU to which they are attached. Let {W_0, W_1, W_2, ..., W_M} denote the computing capacities of these servers. As the server at the BS can serve vehicles located along the whole road, we consider the capacity of Serv_0 to be much higher than that of the servers deployed at the RSUs. Each MEC server is modeled as a queuing network whose input is the offloaded tasks. Arriving tasks are first cached at the MEC server and then served with a first-come-first-serve policy. A server devotes all of its computing resources to the task currently being served. The cost of a task using a unit of computing resource for a unit time is c_x.
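A short sketch of this first-come-first-serve server model, assuming (as specified in Section IV) that a task's waiting time equals the computation already queued divided by the server capacity; the class name and interface are illustrative.

```python
from collections import deque

class MECServer:
    """FCFS queuing model of an MEC server with capacity W (cycles/s)."""
    def __init__(self, W: float):
        self.W = W
        self.queue = deque()  # pending computation amounts g_i

    def backlog(self) -> float:
        """Total computation s_m currently queued (cycles)."""
        return sum(self.queue)

    def offload(self, g: float) -> float:
        """Enqueue a task needing g cycles; return its sojourn time:
        waiting time s_m / W_m plus its own execution time g / W_m."""
        wait = self.backlog() / self.W
        self.queue.append(g)
        return wait + g / self.W
```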
Each vehicle has a cellular and an LTE-V radio interface, which work on different spectrum and enable multiple communication paradigms. In the heterogeneous network formed by the overlapping coverage of the BS and the RSUs, vehicles can offload their tasks to MEC servers through multiple modes. We refer to task file transmission between a vehicle and the BS as vehicle-to-BS (V2B). When a vehicle turns to the LTE-V network for task offloading, the file can be transmitted to an MEC server in a mode with joint vehicle-to-vehicle (V2V) and vehicle-to-RSU (V2R) transmission.

In self-driving vehicles, real-time traffic information, such as position, speed, and heading direction, can be gathered by vehicular sensors [23]. Furthermore, channel state information can also be detected by these vehicles. All this information, together with the descriptions of the generated vehicular tasks, is transmitted to a control center through the cellular network. Spectrum is allocated for this information transmission separately from the spectrum used for task offloading. Based on the collected information, the control center can utilize the communication resources of the heterogeneous network as well as the computing resources of the MEC servers to efficiently schedule task offloading.
The scheduling and resource management operate in a discrete time model with fixed-length time frames. The length of a frame is denoted as τ. In each time frame, a vehicle generates a computing task with probability P_g. Enabled by advanced LTE technology, we consider that a task file transmission completes within one time frame. In addition, a task offloading vehicle can only choose one transmission mode.
The communication topology between vehicles and infrastructure remains constant during one frame. However, the topology may change across time frames due to the mobility of the vehicles. To facilitate the modeling of these dynamic relations, we divide the road into E segments. The position of a vehicle on the road can then be denoted by the index of its segment e, where 1 ≤ e ≤ E. We consider that vehicles in the same road segment have an identical distance to a communication infrastructure.
In assessing the network performance, we focus on the upstream communication process that offloads tasks from vehicles to MEC servers in the various modes. We consider that all vehicles use a fixed transmission power for a given transmission mode, i.e., power P_tx,b in V2B mode and power P_tx,v in V2R and V2V modes. In addition, these vehicles have enough storage for caching task files.
In the case of V2B mode, the assignment of spectrum to vehicles is orthogonal, and there are no collisions between V2B communication vehicles. For receiving a task file from a V2B-mode vehicle, the signal to interference plus noise ratio (SINR) at the BS is given as

$$\gamma_{v,b} = \frac{P_{tx,b}\, G_r}{L_0\, d_{v,b}^{\alpha}\, P_w}, \qquad (1)$$

where d_{v,b} is the distance between the transmitting vehicle and the BS, G_r is the antenna gain at the BS, L_0 and α are the path loss at a reference unit distance and the path loss exponent, respectively, and P_w is the power of the additive white Gaussian noise.
When vehicles choose LTE-V communication in V2R or V2V mode, collisions may occur due to spectrum reuse between communication pairs working in these modes. In such a case, the SINR at receiver r is calculated as

$$\gamma_{v,r} = \frac{P_{tx,v}/(L_0\, d_{v,r}^{\alpha})}{P_w + \sum_{j\in\mathcal{V}} P_{tx,v}/(L_0\, d_{j,r}^{\alpha})}, \qquad (2)$$

where V is the set of other vehicles that communicate on the same spectrum within the interference range. The receiver r can be either an RSU or a relay vehicle.

Let γ_min be the minimum SINR at a receiver under which the received data can still be decoded. Given a static network topology and spectrum resource allocation, we can determine the feasible communication pairs whose SINR is no less than γ_min. These pairs form the potential paths for offloading task files from vehicles to MEC servers.
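The following sketch evaluates (1) and (2) and filters links against γ_min. It is a toy rendition under assumed parameter values; all names are ours, and the base-2 logarithm in the rate helper is an assumption (the paper writes R = q log(1 + γ) without specifying the base).

```python
import math

def sinr_v2b(p_tx_b, g_r, l0, d_vb, alpha, p_w):
    """Eq. (1): SINR at the BS for a V2B transmission."""
    return (p_tx_b * g_r) / (l0 * d_vb**alpha * p_w)

def sinr_v2x(p_tx_v, l0, d_vr, alpha, p_w, interferer_distances):
    """Eq. (2): SINR at an RSU or relay vehicle, with co-channel
    interference from other vehicles reusing the same spectrum."""
    signal = p_tx_v / (l0 * d_vr**alpha)
    interference = sum(p_tx_v / (l0 * d**alpha) for d in interferer_distances)
    return signal / (p_w + interference)

def feasible(sinr, gamma_min):
    """A link is a potential offloading hop iff its SINR >= gamma_min."""
    return sinr >= gamma_min

def rate(q_spectrum, sinr):
    """Shannon-style rate used in Section IV: R = q * log(1 + gamma)."""
    return q_spectrum * math.log2(1 + sinr)
```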
IV. OPTIMAL OFFLOADING SCHEMES IN A LEARNING APPROACH

In this section, we formulate an optimal offloading problem and then model it as a Markov decision process. Based on deep Q-learning, an approach that incorporates a deep learning algorithm into Q-function estimation, joint MEC server selection and offloading mode determination strategies are obtained.
A. Problem Formulation

In a given time frame, for a vehicle that is located in road segment e and generates task κ_i, we use x_{i,e} = 1 to indicate that the task is offloaded to Serv_0 through V2B mode. Similarly, we use y_{i,e,m} = 1 and z_{i,e,m} = 1 to indicate that the task is offloaded to Serv_m in V2R mode and in joint V2V and V2R mode, respectively. Otherwise, these indicators are set to 0.
The proposed optimal task offloading problem, which maximizes the utility of the offloading system under the task delay constraints, is formulated as follows:

$$\begin{aligned}
\max_{\{x,y,z\}} U = \sum_{l=1}\sum_{j=1}^{n}\sum_{i=1}^{G} \beta_i \Big( & \varsigma_i\big(t_{i}^{max} - t_{i,e_j,l}^{total}\big) - x_{i,e_j}^{l}\big(q_c c_c f_i/R_{v,b,e_j} + g_i c_x/W_0\big) \\
& - y_{i,e_j,m}^{l}\big(q_v c_v f_i/R_{v,r,e_j} + g_i c_x/W_m\big) \\
& - z_{i,e_j,m}^{l}\Big(\sum_{h=1}^{H_{e_j}} q_v c_v f_i/R_{v,j,e_j} + g_i c_x/W_m\Big)\Big) \\
\text{s.t.}\quad & x_{i,e_j}^{l} \in \{0,1\},\; y_{i,e_j,m}^{l} \in \{0,1\},\; z_{i,e_j,m}^{l} \in \{0,1\} \\
& x_{i,e_j}^{l}\, y_{i,e_j,m}^{l} = x_{i,e_j}^{l}\, z_{i,e_j,m}^{l} = y_{i,e_j,m}^{l}\, z_{i,e_j,m'}^{l} = 0 \\
& x_{i,e_j}^{l} + y_{i,e_j,m}^{l} + z_{i,e_j,m}^{l} = 1 \\
& t_{i,e_j}^{total} \le t_{i}^{max}, \quad \forall i \in G,\; \forall m, m' \in \mathcal{M},
\end{aligned} \qquad (3)$$
where n is the number of tasks generated in a time frame, e_j is the road segment index of vehicle j's location, and H_{e_j} is the number of transmission hops. q_c and q_v are the amounts of spectrum resources allocated to each task file offloading through the cellular network and the LTE-V network, respectively. R_{v,b,e_j} is the transmission rate for offloading a task file from a vehicle at road segment e_j to the BS, given as R_{v,b,e_j} = q_c log(1 + γ_{v,b}). R_{v,r,e_j} and R_{v,j,e_j} can be calculated similarly based on the allocated spectrum q_v and the SINR γ_{v,r}.

In (3), the first three constraints indicate that a task file can only be transmitted through one mode. The fourth constraint requires that the time cost of offloading task κ_i stay within its delay tolerance. Here t_{i,e_j}^{total} is the total time cost for completing a type-i task generated by a vehicle located at road segment e_j. Given the task offloading strategies, t_{i,e_j}^{total} can be written as
$$\begin{aligned}
t_{i,e_j}^{total} = \; & x_{i,e_j}\big(f_i/R_{v,b,e_j} + t_0^{wait} + g_i/W_0\big) \\
& + y_{i,e_j,m}\big(f_i/R_{v,r,e_j} + t_m^{wait} + g_i/W_m\big) \\
& + z_{i,e_j,m}\Big(\sum_{h=1}^{H_{e_j}} f_i/R_{v,j,e_j} + t_m^{wait} + g_i/W_m\Big),
\end{aligned} \qquad (4)$$
where t_0^{wait} and t_m^{wait} are the waiting times of the task at Serv_0 and Serv_m, respectively. The value of the waiting time is discussed in the following subsection.
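A direct transcription of (4) may help fix the structure of the three modes; the function signature and mode labels below are our own, and the rates and hop count are assumed inputs.

```python
def total_time(mode, f, g, t_wait, W, R, hops=1):
    """Eq. (4): total completion time of one task.

    mode   -- 'v2b', 'v2r', or 'v2v+v2r'
    f, g   -- input size and required computation of the task
    t_wait -- queuing delay at the chosen server (s_m / W_m)
    W      -- capacity of the chosen server
    R      -- transmission rate of the chosen link(s)
    hops   -- H_{e_j}, number of hops in joint V2V/V2R mode
    """
    if mode in ("v2b", "v2r"):
        transmit = f / R            # single-hop upload
    elif mode == "v2v+v2r":
        transmit = hops * (f / R)   # file forwarded over H_{e_j} hops
    else:
        raise ValueError(mode)
    return transmit + t_wait + g / W
```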
B. Markov Decision Approach

As each MEC server is modeled as a queuing system, the current serving state of a server may affect the time cost of accomplishing subsequent tasks. To choose the offloading target server efficiently, the offloading strategy for each task in time frame l depends on the characteristics of the current vehicular network as well as the server states in frame l − 1. Thus, we can formulate (3) as a Markov decision process and solve it with a Markov decision approach [24].
The state of the offloading system at time frame l is defined as S^l = (s_0^l, s_1^l, ..., s_M^l), where s_0^l is the total computation required by the tasks queued at Serv_0 in frame l; similarly, s_1^l, ..., s_M^l denote the computation required by the tasks queued at Serv_1, Serv_2, ..., Serv_M at frame l, respectively. The actions taken by the control center at frame l are a^l = (X^l, Y^l, Z^l), where X^l = {x_{i,e}^l}, Y^l = {y_{i,e,m}^l}, and Z^l = {z_{i,e,m}^l} are the sets of task offloading strategies, covering the transmission modes and offloading targets of the tasks generated in frame l.
To facilitate the analysis of the effects of actions on the system states, we introduce the variable ĉ_m^l, m ∈ {0, 1, ..., M}, which denotes the amount of computation completed by Serv_m in time frame l. We define ĉ_m^l as

$$\hat{c}_m^l = \min\Big(s_m^l + \sum_{i=1}^{G}\sum_{j=1}^{n}\big(x_{i,e_j}^l + y_{i,e_j,m}^l + z_{i,e_j,m}^l\big)g_i,\; W_m \tau\Big). \qquad (5)$$
Then, the state transition between time frames l and l + 1 can be written as

$$\begin{aligned}
S^{l+1} = \Big( & s_0^l + \sum_{i=1}^{G}\sum_{j=1}^{n} x_{i,e_j}^l g_i - \hat{c}_0^l,\;
s_1^l + \sum_{i=1}^{G}\sum_{j=1}^{n}\big(y_{i,e_j,1}^l + z_{i,e_j,1}^l\big)g_i - \hat{c}_1^l,\; \ldots, \\
& s_M^l + \sum_{i=1}^{G}\sum_{j=1}^{n}\big(y_{i,e_j,M}^l + z_{i,e_j,M}^l\big)g_i - \hat{c}_M^l \Big). \qquad (6)
\end{aligned}$$
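To show how (5) and (6) evolve the server queues together, here is a minimal per-frame update; the list-based representation and names are illustrative, not from the paper.

```python
def step_queues(s, arrivals, W, tau):
    """One-frame queue transition, eqs. (5)-(6).

    s        -- queued computation per server, s_m^l
    arrivals -- newly offloaded computation per server, i.e. the sum
                over tasks of (x + y + z) * g_i routed to server m
    W        -- server capacities W_m
    tau      -- frame length
    Returns s_m^{l+1} for every server.
    """
    next_s = []
    for s_m, a_m, w_m in zip(s, arrivals, W):
        c_hat = min(s_m + a_m, w_m * tau)   # eq. (5): work done this frame
        next_s.append(s_m + a_m - c_hat)    # eq. (6): remaining backlog
    return next_s
```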
When action a^l is taken in state S^l, the average utility gained in time frame l is

$$\begin{aligned}
U^l = \sum_{j=1}^{n}\sum_{i=1}^{G} \beta_i \Big( & \varsigma_i\big(t_{i}^{max} - t_{i,e_j,l}^{total}\big) - x_{i,e_j}^{l}\big(q_c c_c f_i/R_{v,b,e_j} + g_i c_x/W_0\big) \\
& - y_{i,e_j,m}^{l}\big(q_v c_v f_i/R_{v,r,e_j} + g_i c_x/W_m\big) \\
& - z_{i,e_j,m}^{l}\Big(\sum_{h=1}^{H_{e_j}} q_v c_v f_i/R_{v,j,e_j} + g_i c_x/W_m\Big)\Big),
\end{aligned} \qquad (7)$$

where t_{i,e_j,l}^{total} is defined in (4) with t_0^{wait} = s_0^l/W_0 and t_m^{wait} = s_m^l/W_m.
In order to maximize the utility of the offloading system, we need to obtain an optimal strategy π*, which consists of offloading actions for the various tasks in different time frames. π* can be expressed as

$$\pi^{*} = \arg\max_{\pi}\; \mathbb{E}\Big(\sum_{l=1} \eta^{l} U^{l}\Big), \qquad (8)$$

where η is a discount factor that trades off immediate utility against later ones, 0 < η ≤ 1.
C. Deep Q-learning Based Offloading Scheme

To derive the optimal offloading strategy π*, we turn to reinforcement learning. Reinforcement learning is a main branch of machine learning, in which an agent takes a series of actions that maximize the discounted future reward under corresponding strategies in various states. A Markov decision process can thus be treated as a reinforcement learning problem. Under a given offloading strategy π, the average system utility gained from taking action a^l in state S^l can be expressed as a Q-function:

$$Q^{\pi}(S^l, a^l) = \mathbb{E}\big[U^l + \eta U^{l+1} + \eta^2 U^{l+2} + \cdots \mid S^l, a^l\big] = \mathbb{E}_{S^{l+1}}\big[U^l + \eta Q^{\pi}(S^{l+1}, a^{l+1}) \mid S^l, a^l\big]. \qquad (9)$$
Then the optimal value of the Q-function is

$$Q^{*}(S^l, a^l) = \mathbb{E}_{S^{l+1}}\big[U^l + \eta \max_{a^{l+1}} Q^{*}(S^{l+1}, a^{l+1}) \mid S^l, a^l\big], \qquad (10)$$

from which the maximum utility as well as the optimal offloading strategies can be derived by value and strategy iteration. Q-learning, a classical reinforcement learning algorithm, can be used to carry out these iterations. In each iteration, the value of the Q-function is updated as

$$Q(S^l, a^l) \leftarrow Q(S^l, a^l) + \alpha\big[U^l + \eta \max_{a^{l+1}} Q(S^{l+1}, a^{l+1}) - Q(S^l, a^l)\big], \qquad (11)$$

where α is the learning rate.
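For intuition, here is the update (11) in tabular form, before the neural approximation introduced below. The dictionary-keyed states and actions are an illustrative simplification.

```python
def q_update(Q, s, a, utility, s_next, actions, alpha, eta):
    """One tabular application of eq. (11).

    Q       -- dict mapping (state, action) -> estimated value
    utility -- observed one-frame utility U^l
    actions -- candidate actions in the next state
    alpha   -- learning rate; eta -- discount factor
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = utility + eta * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
```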
However, the states of the offloading system consist of the amounts of required computation queued at the MEC servers, which are continuous values. It is hard to find the optimal solution by discretizing the state space, so the Q-learning approach cannot be directly applied to our Markov decision problem. To address this issue, we replace the Q-function with a function approximator, a functional form that is easy to handle in the optimal action acquisition process. Here we choose a multi-layered neural network as a nonlinear approximator able to capture the complex interactions among the various states and actions. Based on this Q-function estimation, we utilize deep Q-learning to obtain the optimal offloading strategies π* [25].

We refer to the proposed neural-network-based approximator as the Q-network, where θ is the set of network parameters. With the help of the Q-network, the Q-function in (9) can be estimated as Q(S^l, a^l) ≈ Q′(S^l, a^l; θ) [26]. Q′ is trained to converge to the real Q values over the iterations. Based on Q′, the optimal offloading strategy in each state is derived from the actions that lead to the maximum utility. The chosen action at frame l can now be written as a^l = arg max_{a^l} Q′(S^l, a^l; θ).
In the learning process, an experience replay technique is utilized to improve learning efficiency, where the learning experience at each time frame is stored in a replay memory [25]. The experience consists of the observed state transitions as well as the utilities produced by the actions taken. The experience gained at time frame l is expressed as (S^l, a^l, U^l, S^{l+1}). During Q-learning updates, a batch of stored experiences drawn randomly from the replay memory is used as samples for training the parameters of the Q-network. The goal of the training is to minimize the difference between Q(S^l, a^l) and Q′(S^l, a^l; θ).
We define a loss function to measure this difference:

$$Loss(\theta^l) = \mathbb{E}\Big[\tfrac{1}{2}\big(Q_{tar}^{l} - Q'(S^l, a^l; \theta^l)\big)^2\Big], \qquad (12)$$
where θ^l denotes the Q-network parameters at time l, and Q_tar^l is the learning target, i.e., the optimal value of the Q-function in frame l:

$$Q_{tar}^{l} = U^l + \eta\, Q\Big(S^{l+1}, \arg\max_{a^{l+1}} Q'(S^{l+1}, a^{l+1}; \theta^l)\Big). \qquad (13)$$
We deploy a gradient descent approach to update θ. The gradient, derived by differentiating Loss(θ^l), is calculated as

$$\nabla_{\theta^l} Loss(\theta^l) = \mathbb{E}\big[\nabla_{\theta^l} Q'(S^l, a^l; \theta^l)\,\big(Q'(S^l, a^l; \theta^l) - Q_{tar}^{l}\big)\big]. \qquad (14)$$
Then θ^l is updated according to

$$\theta^l \leftarrow \theta^l - \varpi\, \nabla_{\theta^l} Loss(\theta^l), \qquad (15)$$

where ϖ is a scalar step size.
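Putting (12)-(15) together with the replay memory, the sketch below shows one training step of the Q-network. It is a minimal PyTorch rendition under assumed layer sizes and hyperparameters, not the authors' exact implementation; as in (13), the target bootstraps through the online network rather than a separate target network.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Q-network: maps a state vector (queued computation per server) to one
# Q-value per discrete offloading action. Sizes are illustrative.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 16))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)  # step size varpi
replay = deque(maxlen=10_000)  # stores tuples (S^l, a^l, U^l, S^{l+1})
eta = 0.9  # discount factor

def train_step(batch_size=32):
    """One Q-network update implementing (12)-(15) with experience replay."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)       # random minibatch
    s, a, u, s_next = zip(*batch)                   # states stored as tensors
    s, s_next = torch.stack(s), torch.stack(s_next)
    a, u = torch.tensor(a), torch.tensor(u)

    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q'(S^l, a^l; theta)
    with torch.no_grad():                                   # target, eq. (13)
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)
        q_tar = u + eta * q_net(s_next).gather(1, best_a).squeeze(1)

    loss = 0.5 * ((q_tar - q_pred) ** 2).mean()  # loss, eq. (12)
    optimizer.zero_grad()
    loss.backward()    # gradient, eq. (14)
    optimizer.step()   # parameter update, eq. (15)
```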