Deep Learning Empowered Task Offloading for Mobile Edge Computing in Urban Informatics

Ke Zhang, Yongxu Zhu, Member, IEEE, Supeng Leng, Member, IEEE, Yejun He, Senior Member, IEEE, Sabita Maharjan, Member, IEEE, and Yan Zhang, Senior Member, IEEE
Abstract—Led by the industrialization of smart cities, numerous interconnected mobile devices and novel applications have emerged in the urban environment, providing great opportunities to realize industrial automation. In this context, autonomous driving is an attractive issue, which leverages large amounts of sensory information for smart navigation while posing intensive computation demands on resource-constrained vehicles. Mobile Edge Computing (MEC) is a potential solution to alleviate the heavy burden on the devices. However, the varying states of multiple edge servers as well as the variety of vehicular offloading modes make efficient task offloading a challenge. To cope with this challenge, we adopt a deep Q-learning approach for designing optimal offloading schemes, jointly considering the selection of the target server and the determination of the data transmission mode. Furthermore, we propose an efficient redundant offloading algorithm to improve task offloading reliability in the case of vehicular data transmission failure. We evaluate the proposed schemes based on real traffic data. Results indicate that our offloading schemes have great advantages in optimizing system utilities and improving offloading reliability.

Index Terms—Offloading, Q-learning, reliability, vehicular edge computing.
I. INTRODUCTION

Along with the advancement of the Internet of Things (IoT) in industrial application scenarios, urban life patterns are undergoing a tremendous change [1]. A high level of interconnection between heterogeneous smart devices brings the possibility of industrial automation, which improves operation and increases productivity with little or no human intervention [2].

Autonomous driving is one of the most attractive industrial automation applications. With a large number of on-board sensors and actuators as well as advanced control systems, autonomous vehicles are capable of interpreting sensory information and identifying appropriate navigation paths. In addition, with the aid of the information infrastructure in urban areas and the introduction of Intelligent Transportation Systems (ITS), smart vehicles provide a pervasive and promising platform for a broad range of novel mobile applications, such as augmented reality, natural language processing, and interactive gaming [3]. However, understanding a highly dynamic and complex traffic environment while making real-time driving decisions involves processing a great volume of sensory data and requires intensive computation. Due to the constraint of on-board computation power, supporting these real-time and computationally intensive tasks and applications on vehicles is a big challenge.

K. Zhang and S. Leng are with the School of Information and Communication Engineering, University of Electronic Science and Technology of China (e-mail: {zhangke, spleng}@uestc.edu.cn).
Y. Zhu is with the Wolfson School of Mechanical, Electrical and Manufacturing Engineering, Loughborough University, U.K. (e-mail: y.zhu4@lboro.ac.uk).
Y. He is with the College of Electronics and Information Engineering, Shenzhen University, China (e-mail: heyejun@126.com).
S. Maharjan is with the Simula Metropolitan Center for Digital Engineering, and the University of Oslo, Norway (e-mail: sabita@simula.no).
Y. Zhang is with the Department of Informatics, University of Oslo, Norway. He is also with the Simula Metropolitan Center for Digital Engineering, Norway (e-mail: yanzhang@ieee.org).
Corresponding author: Y. Zhang (e-mail: yanzhang@ieee.org).

Fig. 1. MEC-enabled intelligent traffic applications.
Mobile Edge Computing (MEC), where heavy computation tasks are offloaded to cloud resources placed at the edge of mobile networks, has emerged as a promising approach to cope with growing computing demands [4]. Fig. 1 represents a typical scenario of applying the MEC technique to intelligent traffic applications. Aided by MEC, safety-oriented tasks in the context of autonomous driving, computation-intensive vehicular applications, and traffic sensory data analysis can be offloaded from vehicles to MEC servers on various types of proximate wireless access infrastructure, where real-time data processing and feedback can be achieved [5].

In the MEC process, the implementation efficiency is tightly coupled with the scheduling and management of computing and communication resources [6]. With the harmonization of the global deployment of Long Term Evolution (LTE) systems, LTE-V has become a paradigm for vehicular communications and plays a vital role in the design of task offloading schemes. However, inherent characteristics of vehicular networks, such as high-speed mobility, time-varying topology, and ephemeral interactions, bring unprecedented challenges in managing vehicular communication for MEC applications. In the context of vehicular networks with multiple MEC servers, the mutual effects between transmission mode selection and offloading target server determination make MEC scheduling even more complex. Furthermore, transmission may fail during the vehicular offloading process, which limits the performance and applicability of MEC services. Thus, novel solutions are necessary to guarantee both the reliability and the efficiency of vehicular task offloading. However, very few works have investigated the integrated management of computing and communication resources in a multi-server vehicular edge computing network, and task offloading reliability has not been incorporated in the recent literature.

To bridge this gap, in this paper we focus on task offloading in an MEC-enabled vehicular network and present an approach that optimizes MEC system performance while also improving offloading reliability. The main contributions of this paper are as follows:
- We present an MEC-enabled LTE-V network, where the influence of various vehicular communication modes on task offloading performance is qualitatively analyzed.
- By applying a deep Q-learning approach, we propose optimal target MEC server determination and transmission mode selection schemes, which maximize the utility of the offloading system under given delay constraints.
- To cope with transmission failure in vehicular networks, we design an efficient redundant offloading algorithm, which ensures offloading reliability while improving the gained utility.
The remainder of the paper is organized as follows. In Section II, we review related work. A vehicular edge computing system model is presented in Section III. A deep Q-learning based offloading scheme is described in Section IV. In Section V, we investigate offloading reliability. Performance evaluation is presented in Section VI. Finally, we conclude our work in Section VII.
II. RELATED WORK

To meet the demands of computationally intensive vehicular applications, some studies have investigated applying the MEC approach in vehicular networks. In [7], the authors proposed an energy-efficient resource allocation scheme for vehicular fog computing centers. In [8], an MEC-based architecture was used for urban traffic management in a distributed and adaptive service manner. In [9], the authors designed a fog vehicular computing framework that integrates resources from both edge servers and the remote cloud. In [10], the authors unveiled underutilized vehicular computing resources and put them to use in providing efficient computational support to MEC servers. To efficiently merge MEC technology into vehicular networks, the authors in [11] introduced a collaborative task offloading and output transmission mechanism. In [12], the authors designed an MEC service migration scheme that ensures vehicles always connect to the nearest MEC entities. Although these studies have provided some insights into MEC-enabled vehicular applications, the effects of vehicular communication on the design of task offloading strategies have not been thoroughly investigated.
Benefiting from the fast commercialization of LTE systems, LTE-V has become one of the key technologies in vehicular networks. Several recent works have focused on analytical models and implementations of LTE-V. In [13], direct vehicle communication was utilized to offload data transmission from vehicles with poor-quality links to infrastructure. Taking into account the high mobility of vehicles, the authors in [14] designed a wireless link formation mechanism in which the beamwidths between vehicular communication pairs were optimized. In [15], the authors discussed key building blocks of 5G networks in the context of vehicular communications. However, joint V2I and V2V transmission schemes in a multiple-MEC-server scenario have not been considered in these studies.
Learning is a branch of artificial intelligence that studies systems which acquire knowledge from data. Recently, various learning techniques have been deployed for scheduling task offloading. In [16], the authors proposed an online-learning-based workload offloading scheme for mobile edge computing systems with renewable power supply. In order to reduce resource consumption in task offloading, the authors in [17] formalized intelligent offloading metric prediction using a machine-learning-based approach. Deep Q-learning is a powerful tool for policy optimization and has been utilized in various decision processes. For instance, the authors in [18] designed an integrated resource management scheme for connected vehicles using a deep reinforcement learning approach. The authors in [19] used deep Q-learning to schedule voltage and frequency for real-time systems on embedded devices. In [20], this learning approach was adopted in designing a video streaming framework. Deep Q-learning can also be applied in the traffic area: to relieve traffic congestion at highway junctions, the authors in [21] used it in traffic simulation studies and vehicle pathway optimization. However, the potential of learning-based approaches has not been explored for designing scheduling algorithms for vehicular edge computing applications. Furthermore, the mobility characteristics of vehicles and the reliability of vehicular task offloading have not been considered in the previous studies.

Different from these studies, in this paper we concentrate on task offloading in an LTE-V network and propose optimal offloading schemes that jointly schedule vehicular communication and edge computing through a deep Q-learning approach.
III. SYSTEM MODEL

Fig. 2. Task offloading in an MEC-enabled vehicular network.

TABLE I
MAIN VARIABLES

Variable    Description
G           Number of task types
f_i         Size of task κ_i's input
g_i         Amount of task κ_i's required computation
t_i^max     Maximum delay tolerance of task κ_i
β_i         Probability of a task belonging to type i
ρ           Road traffic density
M           Number of MEC servers
W_m         Computing capacity of MEC server m
c_c         Cost of a unit spectrum of the cellular network
c_v         Cost of a unit spectrum of the vehicular network
P_g         Probability of generating a task in a time frame
P_tx,b      Vehicle transmission power in V2B mode
P_tx,v      Vehicle transmission power in V2V and V2R modes
q_c, q_v    Amount of spectrum allocated through the cellular network and the vehicular network, respectively

Fig. 2 shows the architecture of an MEC-enabled vehicular network in an urban area. The autonomous navigation of the vehicles requires various kinds of sensory data processing. Moreover, mobile applications aided by urban informatics infrastructure may also place computing requirements on the vehicles. We model these data processing and computing demands as computation tasks. Different tasks may have different characteristics; for instance, autonomous navigation has strict delay constraints, while entertainment applications do not impose a critical delay requirement. Accordingly, we classify the tasks into G types. A task is described by four terms as κ_i = {f_i, g_i, t_i^max, ς_i}, i ∈ G [22]. Here, f_i and g_i are the size of the task input data and the amount of required computation, respectively. To provide timely responses in various traffic contexts, the tasks are time sensitive; t_i^max is the maximum delay tolerance of task κ_i. The offloading system gains utility ς_i Δt from the completion of task κ_i, where Δt is the time saved in accomplishing κ_i compared to t_i^max. The probability of a task belonging to type i is denoted as β_i, with Σ_{i∈G} β_i = 1.
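To make the task model concrete, the sketch below encodes one task type and the utility rule just described (ς_i multiplied by the time saved against the deadline). It is a minimal illustration; the class and function names are ours, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class TaskType:
    """One of the G task types: kappa_i = {f_i, g_i, t_i^max, varsigma_i}."""
    f: float          # size of the task input data (bits)
    g: float          # amount of required computation (CPU cycles)
    t_max: float      # maximum delay tolerance (s)
    varsigma: float   # utility earned per second saved

def completion_utility(task: TaskType, t_total: float) -> float:
    """Utility varsigma_i * (t_i^max - t_total). The optimization in
    Section IV constrains t_total <= t_i^max, so the difference is the
    time saved relative to the deadline."""
    return task.varsigma * (task.t_max - t_total)
```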
The road is covered by a heterogeneous vehicular network. Besides the cellular network provided by a Base Station (BS), there is an LTE-V network composed of mobile vehicles and M Road Side Units (RSUs) deployed along the road. The set of these RSUs is denoted as M. The cellular network and the vehicular network operate on different, non-overlapping spectrum. Compared to the BS, which has seamless coverage and a high data transmission cost, the RSUs provide spotty coverage and inexpensive access service. The costs of using a unit spectrum of the cellular network and of the vehicular network for a unit time are c_c and c_v, respectively, with c_c > c_v.
The BS is equipped with an MEC server, denoted Serv_0, through a wired connection. In addition, each RSU hosts an MEC server; these servers are denoted Serv_1, Serv_2, ..., Serv_M, respectively. The MEC servers receive data directly from the BS or RSU to which they are attached. Let {W_0, W_1, W_2, ..., W_M} denote the computing capacities of these servers. As the server at the BS can serve vehicles located along the whole road, we consider the capacity of Serv_0 to be much higher than that of the servers deployed at the RSUs. Each MEC server is modeled as a queuing network whose input is the offloaded tasks. Arriving tasks are first cached at the MEC server and then served with a first-come-first-serve policy. A server devotes all of its computing resources to the task currently being served. The cost of a task using a unit of computing resource for a unit time is c_x.
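A short sketch of this first-come-first-serve server model, assuming (as specified in Section IV) that a task's waiting time equals the computation already queued divided by the server capacity; the class name and interface are illustrative.

```python
from collections import deque

class MECServer:
    """FCFS queuing model of an MEC server with capacity W (cycles/s)."""
    def __init__(self, W: float):
        self.W = W
        self.queue = deque()  # pending computation amounts g_i

    def backlog(self) -> float:
        """Total computation s_m currently queued (cycles)."""
        return sum(self.queue)

    def offload(self, g: float) -> float:
        """Enqueue a task needing g cycles; return its sojourn time:
        waiting time s_m / W_m plus its own execution time g / W_m."""
        wait = self.backlog() / self.W
        self.queue.append(g)
        return wait + g / self.W
```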
Each vehicle has a cellular and an LTE-V radio interface, which work on different spectrum and enable multiple communication paradigms. In the heterogeneous network formed by the overlapping coverage of the BS and the RSUs, vehicles can offload their tasks to MEC servers through multiple modes. We refer to task file transmission between a vehicle and the BS as vehicle-to-BS (V2B). When a vehicle turns to the LTE-V network for task offloading, the file can be transmitted to an MEC server in a mode with joint vehicle-to-vehicle (V2V) and vehicle-to-RSU (V2R) transmission.

In self-driving vehicles, real-time traffic information, such as position, speed, and heading direction, can be gathered by vehicular sensors [23]. Furthermore, channel state information can also be detected by these vehicles. All this information, together with the descriptions of the generated vehicular tasks, is transmitted to a control center through the cellular network. Spectrum is allocated for this information transmission separately from the spectrum used for task offloading. Based on the collected information, the control center can utilize the communication resources of the heterogeneous network as well as the computing resources of the MEC servers to efficiently schedule task offloading.
The scheduling and resource management operate in a discrete time model with fixed-length time frames. The length of a frame is denoted as τ. In each time frame, a vehicle generates a computing task with probability P_g. Enabled by advanced LTE technology, we consider that a task file transmission completes within one time frame. In addition, a task offloading vehicle can only choose one transmission mode.
The communication topology between vehicles and infrastructure remains constant during one frame. However, the topology may change across time frames due to the mobility of the vehicles. To facilitate the modeling of these dynamic relations, we divide the road into E segments. The position of a vehicle on the road can then be denoted by the index of its segment e, where 1 ≤ e ≤ E. We consider that vehicles in the same road segment have an identical distance to a communication infrastructure.
In assessing the network performance, we focus on the upstream communication process that offloads tasks from vehicles to MEC servers in the various modes. We consider that all vehicles use a fixed transmission power for a given transmission mode, i.e., power P_tx,b in V2B mode and power P_tx,v in V2R and V2V modes. In addition, these vehicles have enough storage for caching task files.
In the case of V2B mode, the assignment of spectrum to vehicles is orthogonal, and there are no collisions between V2B communication vehicles. For receiving a task file from a V2B-mode vehicle, the signal to interference plus noise ratio (SINR) at the BS is given as

$$\gamma_{v,b} = \frac{P_{tx,b}\, G_r}{L_0\, d_{v,b}^{\alpha}\, P_w}, \qquad (1)$$

where d_{v,b} is the distance between the transmitting vehicle and the BS, G_r is the antenna gain at the BS, L_0 and α are the path loss at a reference unit distance and the path loss exponent, respectively, and P_w is the power of the additive white Gaussian noise.
When vehicles choose LTE-V communication in V2R or V2V mode, collisions may occur due to spectrum reuse between communication pairs working in these modes. In such a case, the SINR at receiver r is calculated as

$$\gamma_{v,r} = \frac{P_{tx,v}/(L_0\, d_{v,r}^{\alpha})}{P_w + \sum_{j\in\mathcal{V}} P_{tx,v}/(L_0\, d_{j,r}^{\alpha})}, \qquad (2)$$

where V is the set of other vehicles that communicate on the same spectrum within the interference range. The receiver r can be either an RSU or a relay vehicle.

Let γ_min be the minimum SINR at a receiver under which the received data can still be decoded. Given a static network topology and spectrum resource allocation, we can determine the feasible communication pairs whose SINR is no less than γ_min. These pairs form the potential paths for offloading task files from vehicles to MEC servers.
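The following sketch evaluates (1) and (2) and filters links against γ_min. It is a toy rendition under assumed parameter values; all names are ours, and the base-2 logarithm in the rate helper is an assumption (the paper writes R = q log(1 + γ) without specifying the base).

```python
import math

def sinr_v2b(p_tx_b, g_r, l0, d_vb, alpha, p_w):
    """Eq. (1): SINR at the BS for a V2B transmission."""
    return (p_tx_b * g_r) / (l0 * d_vb**alpha * p_w)

def sinr_v2x(p_tx_v, l0, d_vr, alpha, p_w, interferer_distances):
    """Eq. (2): SINR at an RSU or relay vehicle, with co-channel
    interference from other vehicles reusing the same spectrum."""
    signal = p_tx_v / (l0 * d_vr**alpha)
    interference = sum(p_tx_v / (l0 * d**alpha) for d in interferer_distances)
    return signal / (p_w + interference)

def feasible(sinr, gamma_min):
    """A link is a potential offloading hop iff its SINR >= gamma_min."""
    return sinr >= gamma_min

def rate(q_spectrum, sinr):
    """Shannon-style rate used in Section IV: R = q * log(1 + gamma)."""
    return q_spectrum * math.log2(1 + sinr)
```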
IV. OPTIMAL OFFLOADING SCHEMES IN A LEARNING APPROACH

In this section, we formulate an optimal offloading problem and then model it as a Markov decision process. Based on deep Q-learning, an approach that incorporates a deep learning algorithm into Q-function estimation, joint MEC server selection and offloading mode determination strategies are obtained.
A. Problem Formulation

In a given time frame, for a vehicle that is located in road segment e and generates task κ_i, we use x_{i,e} = 1 to indicate that the task is offloaded to Serv_0 through V2B mode. Similarly, we use y_{i,e,m} = 1 and z_{i,e,m} = 1 to indicate that the task is offloaded to Serv_m in V2R mode and in joint V2V and V2R mode, respectively. Otherwise, these indicators are set to 0.
The proposed optimal task offloading problem, which maximizes the utility of the offloading system under the task delay constraints, is formulated as follows:

$$\begin{aligned}
\max_{\{x,y,z\}} U = \sum_{l=1}\sum_{j=1}^{n}\sum_{i=1}^{G} \beta_i \Big( & \varsigma_i\big(t_{i}^{max} - t_{i,e_j,l}^{total}\big) - x_{i,e_j}^{l}\big(q_c c_c f_i/R_{v,b,e_j} + g_i c_x/W_0\big) \\
& - y_{i,e_j,m}^{l}\big(q_v c_v f_i/R_{v,r,e_j} + g_i c_x/W_m\big) \\
& - z_{i,e_j,m}^{l}\Big(\sum_{h=1}^{H_{e_j}} q_v c_v f_i/R_{v,j,e_j} + g_i c_x/W_m\Big)\Big) \\
\text{s.t.}\quad & x_{i,e_j}^{l} \in \{0,1\},\; y_{i,e_j,m}^{l} \in \{0,1\},\; z_{i,e_j,m}^{l} \in \{0,1\} \\
& x_{i,e_j}^{l}\, y_{i,e_j,m}^{l} = x_{i,e_j}^{l}\, z_{i,e_j,m}^{l} = y_{i,e_j,m}^{l}\, z_{i,e_j,m'}^{l} = 0 \\
& x_{i,e_j}^{l} + y_{i,e_j,m}^{l} + z_{i,e_j,m}^{l} = 1 \\
& t_{i,e_j}^{total} \le t_{i}^{max}, \quad \forall i \in G,\; \forall m, m' \in \mathcal{M},
\end{aligned} \qquad (3)$$
where n is the number of tasks generated in a time frame, e_j is the road segment index of vehicle j's location, and H_{e_j} is the number of transmission hops. q_c and q_v are the amounts of spectrum resources allocated to each task file offloading through the cellular network and the LTE-V network, respectively. R_{v,b,e_j} is the transmission rate for offloading a task file from a vehicle at road segment e_j to the BS, given as R_{v,b,e_j} = q_c log(1 + γ_{v,b}). R_{v,r,e_j} and R_{v,j,e_j} can be calculated similarly based on the allocated spectrum q_v and the SINR γ_{v,r}.

In (3), the first three constraints indicate that a task file can only be transmitted through one mode. The fourth constraint requires that the time cost of offloading task κ_i stay within its delay tolerance. Here t_{i,e_j}^{total} is the total time cost for completing a type-i task generated by a vehicle located at road segment e_j. Given the task offloading strategies, t_{i,e_j}^{total} can be written as
$$\begin{aligned}
t_{i,e_j}^{total} = \; & x_{i,e_j}\big(f_i/R_{v,b,e_j} + t_0^{wait} + g_i/W_0\big) \\
& + y_{i,e_j,m}\big(f_i/R_{v,r,e_j} + t_m^{wait} + g_i/W_m\big) \\
& + z_{i,e_j,m}\Big(\sum_{h=1}^{H_{e_j}} f_i/R_{v,j,e_j} + t_m^{wait} + g_i/W_m\Big),
\end{aligned} \qquad (4)$$
where t_0^{wait} and t_m^{wait} are the waiting times of the task at Serv_0 and Serv_m, respectively. The value of the waiting time is discussed in the following subsection.
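A direct transcription of (4) may help fix the structure of the three modes; the function signature and mode labels below are our own, and the rates and hop count are assumed inputs.

```python
def total_time(mode, f, g, t_wait, W, R, hops=1):
    """Eq. (4): total completion time of one task.

    mode   -- 'v2b', 'v2r', or 'v2v+v2r'
    f, g   -- input size and required computation of the task
    t_wait -- queuing delay at the chosen server (s_m / W_m)
    W      -- capacity of the chosen server
    R      -- transmission rate of the chosen link(s)
    hops   -- H_{e_j}, number of hops in joint V2V/V2R mode
    """
    if mode in ("v2b", "v2r"):
        transmit = f / R            # single-hop upload
    elif mode == "v2v+v2r":
        transmit = hops * (f / R)   # file forwarded over H_{e_j} hops
    else:
        raise ValueError(mode)
    return transmit + t_wait + g / W
```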
B. Markov Decision Approach

As each MEC server is modeled as a queuing system, the current serving state of a server may affect the time cost of accomplishing subsequent tasks. To choose the offloading target server efficiently, the offloading strategy for each task in time frame l depends on the characteristics of the current vehicular network as well as the server states in frame l − 1. Thus, we can formulate (3) as a Markov decision process and solve it with a Markov decision approach [24].
The state of the offloading system at time frame l is defined as S^l = (s_0^l, s_1^l, ..., s_M^l), where s_0^l is the total computation required by the tasks queued at Serv_0 in frame l; similarly, s_1^l, ..., s_M^l denote the computation required by the tasks queued at Serv_1, Serv_2, ..., Serv_M at frame l, respectively. The actions taken by the control center at frame l are a^l = (X^l, Y^l, Z^l), where X^l = {x_{i,e}^l}, Y^l = {y_{i,e,m}^l}, and Z^l = {z_{i,e,m}^l} are the sets of task offloading strategies, covering the transmission modes and offloading targets of the tasks generated in frame l.
To facilitate the analysis of the effects of actions on the system states, we introduce the variable ĉ_m^l, m ∈ {0, 1, ..., M}, which denotes the amount of computation completed by Serv_m in time frame l. We define ĉ_m^l as

$$\hat{c}_m^l = \min\Big(s_m^l + \sum_{i=1}^{G}\sum_{j=1}^{n}\big(x_{i,e_j}^l + y_{i,e_j,m}^l + z_{i,e_j,m}^l\big)g_i,\; W_m \tau\Big). \qquad (5)$$
Then, the state transition between time frames l and l + 1 can be written as

$$\begin{aligned}
S^{l+1} = \Big( & s_0^l + \sum_{i=1}^{G}\sum_{j=1}^{n} x_{i,e_j}^l g_i - \hat{c}_0^l,\;
s_1^l + \sum_{i=1}^{G}\sum_{j=1}^{n}\big(y_{i,e_j,1}^l + z_{i,e_j,1}^l\big)g_i - \hat{c}_1^l,\; \ldots, \\
& s_M^l + \sum_{i=1}^{G}\sum_{j=1}^{n}\big(y_{i,e_j,M}^l + z_{i,e_j,M}^l\big)g_i - \hat{c}_M^l \Big). \qquad (6)
\end{aligned}$$
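To show how (5) and (6) evolve the server queues together, here is a minimal per-frame update; the list-based representation and names are illustrative, not from the paper.

```python
def step_queues(s, arrivals, W, tau):
    """One-frame queue transition, eqs. (5)-(6).

    s        -- queued computation per server, s_m^l
    arrivals -- newly offloaded computation per server, i.e. the sum
                over tasks of (x + y + z) * g_i routed to server m
    W        -- server capacities W_m
    tau      -- frame length
    Returns s_m^{l+1} for every server.
    """
    next_s = []
    for s_m, a_m, w_m in zip(s, arrivals, W):
        c_hat = min(s_m + a_m, w_m * tau)   # eq. (5): work done this frame
        next_s.append(s_m + a_m - c_hat)    # eq. (6): remaining backlog
    return next_s
```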
When action a^l is taken in state S^l, the average utility gained in time frame l is

$$\begin{aligned}
U^l = \sum_{j=1}^{n}\sum_{i=1}^{G} \beta_i \Big( & \varsigma_i\big(t_{i}^{max} - t_{i,e_j,l}^{total}\big) - x_{i,e_j}^{l}\big(q_c c_c f_i/R_{v,b,e_j} + g_i c_x/W_0\big) \\
& - y_{i,e_j,m}^{l}\big(q_v c_v f_i/R_{v,r,e_j} + g_i c_x/W_m\big) \\
& - z_{i,e_j,m}^{l}\Big(\sum_{h=1}^{H_{e_j}} q_v c_v f_i/R_{v,j,e_j} + g_i c_x/W_m\Big)\Big),
\end{aligned} \qquad (7)$$

where t_{i,e_j,l}^{total} is defined in (4) with t_0^{wait} = s_0^l/W_0 and t_m^{wait} = s_m^l/W_m.
In order to maximize the utility of the offloading system, we need to obtain an optimal strategy π*, which consists of offloading actions for the various tasks in different time frames. π* can be expressed as

$$\pi^{*} = \arg\max_{\pi}\; \mathbb{E}\Big(\sum_{l=1} \eta^{l} U^{l}\Big), \qquad (8)$$

where η is a discount factor that trades off immediate utility against later ones, 0 < η ≤ 1.
C. Deep Q-learning Based Offloading Scheme

To derive the optimal offloading strategy π*, we turn to reinforcement learning. Reinforcement learning is a main branch of machine learning, in which an agent takes a series of actions that maximize the discounted future reward under corresponding strategies in various states. A Markov decision process can thus be treated as a reinforcement learning problem. Under a given offloading strategy π, the average system utility gained from taking action a^l in state S^l can be expressed as a Q-function:

$$Q^{\pi}(S^l, a^l) = \mathbb{E}\big[U^l + \eta U^{l+1} + \eta^2 U^{l+2} + \cdots \mid S^l, a^l\big] = \mathbb{E}_{S^{l+1}}\big[U^l + \eta Q^{\pi}(S^{l+1}, a^{l+1}) \mid S^l, a^l\big]. \qquad (9)$$
Then the optimal value of the Q-function is

$$Q^{*}(S^l, a^l) = \mathbb{E}_{S^{l+1}}\big[U^l + \eta \max_{a^{l+1}} Q^{*}(S^{l+1}, a^{l+1}) \mid S^l, a^l\big], \qquad (10)$$

from which the maximum utility as well as the optimal offloading strategies can be derived by value and strategy iteration. Q-learning, a classical reinforcement learning algorithm, can be used to carry out these iterations. In each iteration, the value of the Q-function is updated as

$$Q(S^l, a^l) \leftarrow Q(S^l, a^l) + \alpha\big[U^l + \eta \max_{a^{l+1}} Q(S^{l+1}, a^{l+1}) - Q(S^l, a^l)\big], \qquad (11)$$

where α is the learning rate.
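For intuition, here is the update (11) in tabular form, before the neural approximation introduced below. The dictionary-keyed states and actions are an illustrative simplification.

```python
def q_update(Q, s, a, utility, s_next, actions, alpha, eta):
    """One tabular application of eq. (11).

    Q       -- dict mapping (state, action) -> estimated value
    utility -- observed one-frame utility U^l
    actions -- candidate actions in the next state
    alpha   -- learning rate; eta -- discount factor
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = utility + eta * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
```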
However, the states of the offloading system consist of the amounts of required computation queued at the MEC servers, which are continuous values. It is hard to find the optimal solution by discretizing the state space, so the Q-learning approach cannot be directly applied to our Markov decision problem. To address this issue, we replace the Q-function with a function approximator, a functional form that is easy to handle in the optimal action acquisition process. Here we choose a multi-layered neural network as a nonlinear approximator able to capture the complex interactions among the various states and actions. Based on this Q-function estimation, we utilize deep Q-learning to obtain the optimal offloading strategies π* [25].

We refer to the proposed neural-network-based approximator as the Q-network, where θ is the set of network parameters. With the help of the Q-network, the Q-function in (9) can be estimated as Q(S^l, a^l) ≈ Q′(S^l, a^l; θ) [26]. Q′ is trained to converge to the real Q values over the iterations. Based on Q′, the optimal offloading strategy in each state is derived from the actions that lead to the maximum utility. The chosen action at frame l can now be written as a^l = arg max_{a^l} Q′(S^l, a^l; θ).
In the learning process, an experience replay technique is utilized to improve learning efficiency, where the learning experience at each time frame is stored in a replay memory [25]. The experience consists of the observed state transitions as well as the utilities produced by the actions taken. The experience gained at time frame l is expressed as (S^l, a^l, U^l, S^{l+1}). During Q-learning updates, a batch of stored experiences drawn randomly from the replay memory is used as samples for training the parameters of the Q-network. The goal of the training is to minimize the difference between Q(S^l, a^l) and Q′(S^l, a^l; θ).
We define a loss function to measure this difference:

$$Loss(\theta^l) = \mathbb{E}\Big[\tfrac{1}{2}\big(Q_{tar}^{l} - Q'(S^l, a^l; \theta^l)\big)^2\Big], \qquad (12)$$
where θ^l denotes the Q-network parameters at time l, and Q_tar^l is the learning target, i.e., the optimal value of the Q-function in frame l:

$$Q_{tar}^{l} = U^l + \eta\, Q\Big(S^{l+1}, \arg\max_{a^{l+1}} Q'(S^{l+1}, a^{l+1}; \theta^l)\Big). \qquad (13)$$
We deploy a gradient descent approach to update θ. The gradient, derived by differentiating Loss(θ^l), is calculated as

$$\nabla_{\theta^l} Loss(\theta^l) = \mathbb{E}\big[\nabla_{\theta^l} Q'(S^l, a^l; \theta^l)\,\big(Q'(S^l, a^l; \theta^l) - Q_{tar}^{l}\big)\big]. \qquad (14)$$
Then θ^l is updated according to

$$\theta^l \leftarrow \theta^l - \varpi\, \nabla_{\theta^l} Loss(\theta^l), \qquad (15)$$

where ϖ is a scalar step size.
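Putting (12)-(15) together with the replay memory, the sketch below shows one training step of the Q-network. It is a minimal PyTorch rendition under assumed layer sizes and hyperparameters, not the authors' exact implementation; as in (13), the target bootstraps through the online network rather than a separate target network.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Q-network: maps a state vector (queued computation per server) to one
# Q-value per discrete offloading action. Sizes are illustrative.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 16))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)  # step size varpi
replay = deque(maxlen=10_000)  # stores tuples (S^l, a^l, U^l, S^{l+1})
eta = 0.9  # discount factor

def train_step(batch_size=32):
    """One Q-network update implementing (12)-(15) with experience replay."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)       # random minibatch
    s, a, u, s_next = zip(*batch)                   # states stored as tensors
    s, s_next = torch.stack(s), torch.stack(s_next)
    a, u = torch.tensor(a), torch.tensor(u)

    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q'(S^l, a^l; theta)
    with torch.no_grad():                                   # target, eq. (13)
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)
        q_tar = u + eta * q_net(s_next).gather(1, best_a).squeeze(1)

    loss = 0.5 * ((q_tar - q_pred) ** 2).mean()  # loss, eq. (12)
    optimizer.zero_grad()
    loss.backward()    # gradient, eq. (14)
    optimizer.step()   # parameter update, eq. (15)
```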