Reinforcement Learning driven Energy Efficient Mobile Communication and Applications

doi:10.1109/ISSPIT47144.2019.9001888

Asad, S. M., Ozturk, M., Rais, R. N. B., Zoha, A., Hussain, S. , Abbasi, Q. H. and Imran,

M. A. (2019) Reinforcement Learning Driven Energy Efficient Mobile Communication

and Applications. In: 2019 IEEE International Symposium on Signal Processing and

Information Technology (ISSPIT), Ajman, United Arab Emirates, 10-12 Dec 2019,

ISBN 9781728153414.

There may be differences between this version and the published version. You are

advised to consult the publisher’s version if you wish to cite from it.

http://eprints.gla.ac.uk/202348/

Deposited on: 4 November 2019

Enlighten – Research publications by members of the University of Glasgow

http://eprints.gla.ac.uk

Reinforcement Learning driven Energy Efficient Mobile Communication and Applications

Syed Muhammad Asad*, Metin Ozturk*, Rao Naveed Bin Rais†, Ahmed Zoha*, Sajjad Hussain*, Qammer H. Abbasi*, Muhammad Ali Imran*

*James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
s.asad.1@research.gla.ac.uk, m.ozturk.1@research.gla.ac.uk,
{Ahmed.Zoha, Sajjad.Hussain, Qammer.Abbasi, Muhammad.Imran}@glasgow.ac.uk

†Electrical and Computer Engineering, Ajman University, UAE
r.rais@ajman.ac.ae

Abstract—Smart city planning is envisaged as an advanced-technology-based, independent and autonomous environment enabled by optimal utilisation of resources to meet the short- and long-run needs of its citizens. Improving energy consumption in multi-tier 5G Heterogeneous Networks (HetNets) is therefore a preeminent area of research. This article focuses predominantly on energy consumption, coupled with CO₂ emissions, in cellular networks in the context of smart cities. We use a Reinforcement Learning (RL) vertical traffic offloading algorithm to optimize energy consumption in Base Stations (BSs) and to reduce the carbon footprint by applying the widely accepted strategy of cell switching and traffic offloading. The algorithm relies on the traffic load information of a macro cell and multiple small cells to determine the cell offloading strategy in the most energy efficient way while maintaining quality of service demands and fulfilling users' applications. Spatio-temporal simulations are performed to determine the cell switch on/off operation and offload strategy under varying traffic conditions in a control data separated architecture. The simulation results show that the proposed scheme achieves a reasonable percentage of energy and CO₂ reduction.

Index Terms—Smart City Planning, Green Communications, Energy Efficiency, Vertical Offloading, Machine Learning, 5G.

I. INTRODUCTION

Mobile communication is responsible for 2% of global CO₂ emissions, with the potential to increase to approximately 4% by 2020 [1], [2], as the demand for data is likely to increase manifold. This would result in a potential rise in energy consumption. To mitigate the environmental impact of such increased energy consumption, effective cell switching and traffic offloading are required, which would have a direct impact on overall operational expenditure, cell power and energy consumption, and CO₂ emissions.

Nowadays, the increased demand for mobile communications and its applications means that the number of mobile subscribers continues to grow, along with high data traffic demands. The problem is compounded by the limited amount of available resources in cellular networks [3]. Therefore, traditional Macro Base Stations (MBSs) encounter several challenges in offering high data rates in highly dense environments, since an MBS has a limited number of mobile network channels, allocated by the regulatory authority, with which to serve its users [4]. Similarly, with the increase in the deployment of Small Cells (SCs), energy consumption increases dramatically, which challenges mobile network operators when dimensioning their networks in order to control cost and support the smart city planning and green communications agenda¹. These challenges lead to a conclusion discussed in the literature, such as [5], that traditional Macro Cells (MCs) with large coverage footprints would be broken into multiple SCs.

A logical separation between the MBS and Data Base Stations (DBSs) is provided by the control data separated architecture (CDSA), where the control and data planes are separated [6]. The key concept behind this approach is to separate the signalling functions required to ensure coverage from those needed to support high data rate transmissions, and to take advantage of spatial reuse. In this Radio Access Network (RAN) architecture, MBSs are dedicated to providing signalling and supporting efficient Radio Resource Control (RRC) procedures, whereas DBSs are responsible for high data rate transmissions. This approach provides a stringent measure to meet high data traffic demands and maintain Quality of Service (QoS) within the boundaries set by regulatory authorities. Such an architecture is heralded as the most promising way to increase coverage and capacity efficiently, as described in [6]. However, the growing number of BSs directly increases energy consumption and CO₂ emissions.

In order to maintain QoS, three offloading schemes are discussed in the literature: vertical, horizontal and joint traffic offloading [7], [8]. Vertical traffic offloading shifts the SC load to the MC, whereas horizontal traffic offloading offloads the SC traffic to a neighbouring SC. In joint traffic offloading, both vertical and horizontal schemes are used. Some works, such as [9], consider the use of RL to switch cells and offload traffic.

Many works, such as [10], [11], discuss the concept that the Energy Efficiency (EE) of the network can be improved by traffic offloading and cell on/off switching, but none of them calculate the impact of energy consumption on CO₂ emissions. In this paper, our focus is to determine an energy-aware methodology and its impact on the CO₂ emissions of the entire HetNet model, which comprises an MC and multiple SCs, by using an RL vertical offloading method. Finally, we assess the impact of this approach on overall energy consumption and CO₂ emissions.

¹ Mayor of London Transport Strategy can be found online at: https://www.london.gov.uk/sites/default/files/mayorstransport-strategy-2018.pdf.

Fig. 1. HetNet Architecture with SCs uniformly distributed around MC.

Our major contribution is a novel RL-based cell switching (CS) scheme that depends on BS static and dynamic load profiles and considers live BSs in a dense city environment, in order to establish the carbon footprint reduction associated with BS energy consumption. Real traffic and user mobility data obtained from Mobile Network Operators (MNOs) in the UK, along with the locations of operational SCs in the city of London, are used to verify the proposed approach.

II. SYSTEM MODEL

A. HetNet Architecture

Densifying the network by deploying multiple SCs under one MC footprint has proven to be an effective method to improve capacity. A holistic view of ultra-dense SCs and HetNets is presented in [11]. Because SCs have a smaller coverage radius than a conventional MC, their transmission power is reduced, which eventually enhances capacity, reduces cost and improves the EE of the network. In order to analyse HetNet energy performance and its impact on CO₂ emissions, a multi-tier cellular network comprising an MC and multiple SCs located within the MC coverage footprint is shown in Fig. 1. However, this approach raises several technical challenges, including unplanned deployment, intercell interference, non-seamless handovers, back-haul overload and inefficient energy consumption.

Our main goal, as a first step, is to design a wireless network model from which the overall energy consumption can be derived; therefore, a two-tier HetNet model is considered. The MC provides low data rate services, continuous coverage and signalling in its footprint, whereas the SCs provide high-capacity data rates to the users within their coverage footprints. All SCs are connected to the MC by a back-haul link.

TABLE I
POWER CONSUMPTION OF A TYPICAL BS

Equipment                      Abbreviation   Value
Power amplifier (MIMO)         $BS_{amp}$     600 W
Power amplifier efficiency     $PA_{eff}$     10%
Antenna input power (MIMO)     $A_i$          40 W
Transceiver                    $Pr_t$         100 W
Digital signal processor       $Pr_d$         100 W
Signal generator               $Pr_g$         400 W
AC-DC converter                $Pr_c$         100 W
Back-haul link                 $Pr_l$         100 W
Others                         $Pr_o$         100 W

The RL algorithm driving vertical offloading monitors low traffic activity, switches off the lightly loaded SCs and offloads their traffic to the MC. Vertical offloading is a technique that provides continuous service across all SCs within the HetNet, such that the user does not perceive any transfer of service during the offloading procedure. Vertical offloading plays a vital role in reducing energy consumption, as users are seamlessly migrated to the MC. SCs may overlap, provided the total sum of their areas does not exceed the MC coverage radius. Finally, CO₂ emissions are analysed for the proposed cell switching and traffic offloading approach.

B. Energy Consumption Model

For wireless network performance evaluation, the broadly accepted practice is to analyse the components of the RAN at system level. A typical BS has multiple components, each contributing a certain level of power consumption that depends on the traffic load profile. These components include power amplifiers, back-haul links, signal processing and generation, air conditioning and others, together with the amplifier efficiency. The power consumption of typical BS components is summarised in Table I.

The total power consumption of a typical BS with all of its components is:

$$BS_{tot} = S\left[(A_{Tx}\,BS_{amp}) + Pr_t + Pr_d + Pr_g + Pr_c + Pr_o\right] + Pr_l + Pr_a, \tag{1}$$

$$BS_{amp} = \frac{A_i}{PA_{eff}}, \tag{2}$$

where $S$ is the number of sectors in a cell and $A_{Tx}$ is the number of transmitting antennas per sector. The power consumption of the typical BS components is represented by $BS_{amp}$ for the power amplifier, $Pr_t$ for the transceiver, $Pr_d$ for the digital signal processor, $Pr_g$ for the signal generator, $Pr_c$ for the AC-DC converter, $Pr_l$ for the back-haul link and $Pr_a$ for air conditioning. In (2), $A_i$ is the antenna input power and $PA_{eff}$ is the power amplifier efficiency, as listed in Table I. Other components that contribute to the total BS power consumption are denoted by $Pr_o$. We can calculate the total power consumed by the HetNet as:

$$P_{HetNet} = P_{mc} + \sum_{k=2}^{K} P^{k}_{sc}, \tag{3}$$

where $P_{HetNet}$ is the total HetNet power consumption, and $P_{mc}$ and $P^{k}_{sc}$ are the power consumptions of the MC and the $k$-th SC, respectively. The total power consumption of an MC is expressed as:

$$P_{mc} = BS^{mc}_{tot} + \Delta_{mc}\,P^{mc}_{tx}, \tag{4}$$

where $P_{mc}$ is the total power consumption of the MC, $BS^{mc}_{tot}$ is the power calculated in (1) for the MC, $\Delta_{mc}$ is the component that depends on the load profile of the MC and $P^{mc}_{tx}$ is the load of the MC per 15 minutes. Similarly, the total power consumption of an SC can be calculated as:

$$P^{k}_{sc} = BS^{sc}_{tot} + \Delta_{sc}\,P^{sc}_{tx}, \tag{5}$$

where $P^{k}_{sc}$ denotes the total power consumption of the $k$-th SC, $k = \{2, 3, 4, \ldots, n\}$ indexes the SCs surrounded by the MC, $BS^{sc}_{tot}$ is the power calculated in (1) for each individual SC, $\Delta_{sc}$ is the load-dependent component of the SC power consumption and $P^{sc}_{tx}$ represents the load of the SC per 15 minutes. Therefore, from (3), the total energy consumption $E_{HetNet}$ for each time interval $t$ can be determined.
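As an illustration only, (1) to (5) can be evaluated with a short Python sketch. The component defaults are taken from Table I, with the amplifier power derived from (2) rather than read from the table. The air-conditioning term is not listed in Table I and is assumed to be zero here, and the sector count, antenna count, load-dependent coefficient and transmit power in the usage example are hypothetical values chosen for the example, not parameters reported in this paper.

```python
from dataclasses import dataclass


@dataclass
class BSComponents:
    """Per-component power draw of a typical BS in watts (defaults from Table I)."""
    antenna_input_w: float = 40.0       # A_i
    pa_efficiency: float = 0.10         # PA_eff
    transceiver_w: float = 100.0        # Pr_t
    dsp_w: float = 100.0                # Pr_d
    signal_generator_w: float = 400.0   # Pr_g
    acdc_converter_w: float = 100.0     # Pr_c
    backhaul_w: float = 100.0           # Pr_l
    others_w: float = 100.0             # Pr_o
    air_conditioning_w: float = 0.0     # Pr_a: not in Table I, assumed 0 here


def amplifier_power(c: BSComponents) -> float:
    """Eq. (2): amplifier power from antenna input power and amplifier efficiency."""
    return c.antenna_input_w / c.pa_efficiency


def static_power(c: BSComponents, sectors: int, antennas_per_sector: int) -> float:
    """Eq. (1): per-sector terms scaled by S, plus back-haul and air conditioning."""
    per_sector = (antennas_per_sector * amplifier_power(c)
                  + c.transceiver_w + c.dsp_w + c.signal_generator_w
                  + c.acdc_converter_w + c.others_w)
    return sectors * per_sector + c.backhaul_w + c.air_conditioning_w


def cell_power(static_w: float, load_coeff: float, tx_power_w: float) -> float:
    """Eqs. (4) and (5): static power plus the load-dependent term."""
    return static_w + load_coeff * tx_power_w


def hetnet_power(mc_power_w: float, sc_powers_w: list[float]) -> float:
    """Eq. (3): MC power plus the sum over all SC powers."""
    return mc_power_w + sum(sc_powers_w)
```

For example, with the hypothetical choice of 3 sectors and 2 antennas per sector, `static_power(BSComponents(), 3, 2)` evaluates (1), and the result can be passed to `cell_power` and `hetnet_power` to obtain the per-interval HetNet power.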

In order to assess the EE performance of the entire network for each time interval, the ratio of the expected capacity of the HetNet, $C_m$, to the maximum HetNet power consumption, $P_{HetNet}$, is calculated as:

$$EE_{tot} = \frac{C_m}{P_{HetNet}}. \tag{6}$$
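The ratio in (6) is evaluated once per time interval. A minimal sketch, assuming capacity is given in bit/s and power in watts so that the result is in bit/J (the function name and the input values in the usage note are hypothetical):

```python
def energy_efficiency(capacity_bps: list[float], power_w: list[float]) -> list[float]:
    """Return capacity divided by power for each interval, as in (6)."""
    if len(capacity_bps) != len(power_w):
        raise ValueError("one capacity and one power value are needed per interval")
    if any(p <= 0 for p in power_w):
        raise ValueError("power must be positive")
    return [c / p for c, p in zip(capacity_bps, power_w)]
```

For instance, a capacity of 100 Mb/s delivered at 5000 W gives `energy_efficiency([100e6], [5000.0])`, that is 20000 bit/J for that interval.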

C. Cell Load

There is a rich literature on Handover (HO) decision algorithms for small cells, e.g. [12], which incorporate several radio parameters such as channel capacity, signal strength, signal quality, speed and transmit power. The Shannon capacity is expressed as $C = BW \cdot \log(1 + SINR)$, where $C$ represents the capacity associated with the channel bandwidth $BW$ and the Signal to Interference and Noise Ratio (SINR). We propose traffic offloading (TO) based on the cell load profile associated with the expected capacity of the MC, $E(C_{mc})$, and of the SCs, $E(C_{sc})$, every 15 minutes over a 24-hour duration $t$:

$$C_m = \lim_{t \to 24} \frac{E(C_{mc})\,t_{mc} + E(C_{sc})\,t_{sc}}{t}, \tag{7}$$

where $C_m$ is the measured capacity, and $t_{mc}$ and $t_{sc}$ represent the time of user association with the MC and the SC, respectively. The expected capacities of the MC and the SCs are given by (8) and (9), where $x$ denotes the SINR of the BSs in the HetNet:

$$E(C_{mc}) = BW \int_{0}^{n} \log(1 + x)\,dx, \tag{8}$$

$$E(C_{sc}) = BW \sum_{i=2}^{N} \int_{0}^{n} \log(1 + x_i)\,dx. \tag{9}$$
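The integral in (8) has a closed form, which may help when checking numerical results. The base of the logarithm is not stated above; the sketch below gives the natural-logarithm result and, as an assumption, the base-2 form usually used for Shannon capacity in bit/s. The value $n = 3$ is a hypothetical upper limit used only for the worked number.

```latex
\int_{0}^{n} \ln(1+x)\,dx
  = \Big[(1+x)\ln(1+x) - x\Big]_{0}^{n}
  = (1+n)\ln(1+n) - n ,
\qquad
\int_{0}^{n} \log_2(1+x)\,dx
  = \frac{(1+n)\ln(1+n) - n}{\ln 2} .

% Worked number for the hypothetical limit n = 3:
% natural log: 4\ln 4 - 3 \approx 2.545, \qquad base 2: 8 - 3/\ln 2 \approx 3.672
```

Under the base-2 assumption, (8) would then read $E(C_{mc}) = \frac{BW}{\ln 2}\left[(1+n)\ln(1+n) - n\right]$.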

Therefore, the cell load $CL$ of the MC and the SCs is the ratio of the measured capacity $C_m$ to the maximum capacity $C_{max}$, represented as $CL(\%) = C_m / C_{max}$. Normalising $CL$ yields the load factor $\rho_i$, from which the transmitted power $P_{tx}$ is calculated as:

$$P_{tx} = \rho_i\,P_{max}, \tag{10}$$

where $P_{max}$ is the maximum output power of a BS and $i = \{1, 15, 30, \ldots, n\}$ is represented in minutes.
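The chain from measured capacity to load factor to transmitted power in (10) can be sketched as follows. Clamping the load factor to the range 0 to 1 is an assumption of this sketch, and the numbers in the usage note are hypothetical.

```python
def load_factor(measured_capacity: float, max_capacity: float) -> float:
    """Cell load as the ratio of measured to maximum capacity, clamped to [0, 1]."""
    if max_capacity <= 0:
        raise ValueError("maximum capacity must be positive")
    ratio = measured_capacity / max_capacity
    return min(max(ratio, 0.0), 1.0)  # clamping is an assumption of this sketch


def transmit_power(rho: float, p_max_w: float) -> float:
    """Eq. (10): transmitted power as load factor times maximum output power."""
    return rho * p_max_w
```

For example, a cell measured at 30 Mb/s against a 120 Mb/s maximum has a load factor of 0.25, so a BS with a 20 W maximum output would transmit at `transmit_power(0.25, 20.0)`, that is 5 W, in that interval.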

The on/off state of an SC depends on a number of factors, such as the distance of a user from the associated SC, the user's movement out of the SC's radius range, the load of the SC and the time of day (peak and off-peak hours). This can be represented as:

$$\begin{cases} \varphi_0, & t > T_{th},\ \rho < \rho_{th},\ d_i > R_i, \\ \varphi_1, & t < T_{th},\ \rho > \rho_{th},\ d_i < R_i, \end{cases} \tag{11}$$

where $\varphi_0$ and $\varphi_1$ denote the two states in which an SC is off and on, respectively, $t$ is the time at which the SC is switched on/off, which depends on the threshold time $T_{th}$, $\rho$ is the load factor of the SC, which is compared with the load threshold $\rho_{th}$ when deciding to switch on/off, $d_i$ is the distance of a user from the BS and $R_i$ represents the BS radius.
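The switching rule in (11) can be read as a small decision function. The sketch below is one possible reading: (11) only covers the two cases in which all three conditions agree, so keeping the current state in every other case is an assumption made here, as are the function and parameter names.

```python
from enum import Enum


class SCState(Enum):
    OFF = 0
    ON = 1


def next_sc_state(current: SCState, t: float, t_th: float,
                  rho: float, rho_th: float, d: float, r: float) -> SCState:
    """Apply the two cases of (11); otherwise keep the current state (assumption)."""
    if t > t_th and rho < rho_th and d > r:
        return SCState.OFF   # late, lightly loaded, user outside the SC radius
    if t < t_th and rho > rho_th and d < r:
        return SCState.ON    # early, loaded, user inside the SC radius
    return current           # mixed conditions are not specified by (11)
```

With hypothetical thresholds of hour 22 and a load threshold of 0.3, an SC observed at hour 23 with load 0.1 and a user at 120 m from a 100 m radius cell would be switched off.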

D. Carbon Emissions

The carbon footprint (CO₂ emissions) is based on the total energy of the HetNet and can be calculated with the help of the conversion factor described in [1], [13]. Therefore, from (3) we have:

$$\Lambda_{CO_2} = \int_{0}^{T} \lambda\,E_{HetNet}(P^{mc}_{t}, P^{sc}_{t})\,dt, \tag{12}$$

where $\Lambda_{CO_2}$ is the carbon footprint associated with the total energy consumption $E_{HetNet}$, $\lambda$ refers to the emissions per unit of energy (the conversion factor) and $t$ represents the time duration over which $E_{HetNet}$ has been calculated.
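Numerically, (12) can be approximated by summing the energy of each 15-minute interval and scaling by the conversion factor. The sketch below uses a rectangle-rule sum over per-interval power samples, which is a simplification of the integral. The conversion factor used in the usage note is a hypothetical placeholder, not the value from [1], [13].

```python
def energy_kwh(power_w_per_interval: list[float], interval_minutes: float = 15.0) -> float:
    """Sum power samples (W) over fixed-length intervals and return energy in kWh."""
    hours = interval_minutes / 60.0
    return sum(p * hours for p in power_w_per_interval) / 1000.0


def co2_kg(power_w_per_interval: list[float], kg_co2_per_kwh: float,
           interval_minutes: float = 15.0) -> float:
    """Approximate (12): conversion factor times the energy consumed."""
    return kg_co2_per_kwh * energy_kwh(power_w_per_interval, interval_minutes)
```

For example, four consecutive intervals at 4000 W amount to 4 kWh, so with a hypothetical factor of 0.5 kg CO₂ per kWh, `co2_kg([4000.0] * 4, 0.5)` gives 2 kg.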

III. PROPOSED METHODOLOGY

The reinforcement learning driven vertical offloading method proposed in this work uses the Q-Learning (QL) algorithm for sequential decision making under varying cell load conditions. We analysed the maximum ratio of under-loaded SCs in the time domain, where users within the lightly loaded SCs are offloaded only to the MC, which is called vertical offloading. Because SCs have low transmit powers and hence a limited range, horizontal offloading cannot always be realised between SCs. Therefore, for SCs to go into sleep mode when no neighbouring SCs are in proximity, vertical offloading becomes the only choice. The action to offload traffic is taken when the information collected by the agent indicates an under-loading situation. This action is rewarded or penalised based on the particular state in a given time period.

The QL algorithm is a model-free form of RL. In other words, it is a method of asynchronous dynamic programming that gives agents the opportunity to learn an estimate of the optimal action-value function by experiencing sequences of actions [14].

Since RL can handle a wide range of tasks associated with actions, we have chosen an RL algorithm in which the MC interacts with the network environment, collects live user traffic information and, through its back-haul connectivity, compares this information with the energy consumption levels and operational loads of the operational SCs. After learning from the network environment, the MC decides whether and which SCs are to be switched off in a given period of time when they are either idle or lightly loaded. Hence, RL can cope with this challenging environment because it adapts to changing needs through continuous learning. The QL algorithm has also proven capable of interacting with dynamic environments [10], with six main components: (i) agent, (ii) environment, (iii) action, (iv) state, (v) reward/penalty and (vi) action-value table. The agent's actions depend on the environment and aim to maximise the reward or minimize the penalty. After the execution of each action, the resulting state and reward/penalty are evaluated. The following rule is applied once the execution is completed:

$$Q(s_t, a_t) := Q(s_t, a_t) + \alpha\left[\xi_{t+1} + \gamma \min_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right], \tag{13}$$

where $s_t$ and $s_{t+1}$ are the current and next states, $\gamma$ is the discount factor, $\xi_{t+1}$ is the expected penalty for the next step, $a_t$ is the action taken after the MC has learned from the environment, $a$ ranges over the set of all possible actions and $\alpha$ is the learning rate.
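The update in (13) is the tabular Q-learning rule written for penalties, so the best next action is the one with the minimum Q-value. A minimal sketch, assuming a dictionary-backed Q-table keyed by (state, action) pairs; the default learning rate and discount factor are hypothetical, not the values used in the simulations.

```python
from collections import defaultdict


def q_update(q, state, action, penalty, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """One penalty-minimising Q-learning update, following (13)."""
    # Lowest expected penalty reachable from the next state.
    best_next = min(q[(next_state, a)] for a in actions)
    td_error = penalty + discount * best_next - q[(state, action)]
    q[(state, action)] += learning_rate * td_error
    return q[(state, action)]


# Unseen (state, action) pairs start at 0.
q_table = defaultdict(float)
```

Calling `q_update(q_table, "s1", 0, 10.0, "s2", [0, 1], learning_rate=0.5)` on the empty table moves the entry for `("s1", 0)` halfway towards the observed penalty of 10.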

QL is an off-policy, model-free algorithm: it follows different policies when determining the next action and when updating the action-value table, and the agent has no knowledge of prior actions taken in the environment; instead, it takes actions to obtain environment information. Owing to its low computational overhead for BS switching, the QL algorithm has proved to be the most commonly chosen solution [14].

Our scenario comprises 1 MC and 9 SCs, and the Q-table is updated for every state-action pair. In different time intervals, the MC obtains and records the varying traffic conditions of the SCs in order to make decisions and eventually select the set of SCs to be switched off.

The MC state space depends on the availability of capacity and resources when it performs traffic monitoring, offloading and switching. The two possible states, $\sigma_1$ and $\sigma_2$, are described as follows:

$$\begin{cases} \sigma_1, & M_c, R_m < O_c, \\ \sigma_2, & M_c, R_m > O_c, \end{cases} \tag{14}$$

where $M_c$ is the monitoring capacity, $R_m$ is the resources available after the capacity has been monitored and $O_c$ is the offloading capacity. The first state, $\sigma_1$, signifies the case in which the constraint on monitoring capacity and available resources is not satisfied, whereas $\sigma_2$ signifies the case in which it is satisfied. The total power consumption $\Psi$ of the network in (3) can now be represented as:

$$\Psi(a) = P_{HetNet}(P_{mc}, P^{k}_{sc}). \tag{15}$$

Therefore, based on the power consumption of the BSs, the total energy consumption is calculated in each time interval. Finally, the CO₂ emissions are determined by using (12).
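To make the dependence of the HetNet power on the chosen action concrete, the sketch below evaluates the power of every on/off combination of the SCs and picks the cheapest feasible one. This is an exhaustive search used as a reference, not the Q-learning agent described above. All constants, the zero sleep-mode power and the rule for converting offloaded SC load into MC load are hypothetical assumptions of the sketch.

```python
import math
from itertools import product

# Hypothetical cell parameters: static power (W), load coefficient, max output (W).
MC = {"static_w": 4900.0, "coeff": 4.7, "p_max_w": 20.0}
SC = {"static_w": 6.8, "coeff": 4.0, "p_max_w": 0.5}
SC_TO_MC_LOAD = 0.1  # hypothetical: a fully loaded SC adds 10% load to the MC


def cell_power(cell: dict, rho: float) -> float:
    """Static power plus load-dependent power for load factor rho."""
    return cell["static_w"] + cell["coeff"] * rho * cell["p_max_w"]


def penalty(action: tuple, sc_loads: list, mc_load: float) -> float:
    """HetNet power for an on/off vector; infinite if the MC would be overloaded."""
    offloaded = sum(rho for on, rho in zip(action, sc_loads) if not on)
    new_mc_load = mc_load + SC_TO_MC_LOAD * offloaded
    if new_mc_load > 1.0:
        return math.inf  # MC cannot absorb the offloaded traffic
    # Switched-off SCs are assumed to draw no power.
    sc_power = sum(cell_power(SC, rho) for on, rho in zip(action, sc_loads) if on)
    return cell_power(MC, new_mc_load) + sc_power


def best_action(sc_loads: list, mc_load: float) -> tuple:
    """Search all 2^n on/off vectors (512 for 9 SCs) for the lowest penalty."""
    actions = product((0, 1), repeat=len(sc_loads))
    return min(actions, key=lambda a: penalty(a, sc_loads, mc_load))
```

With nine lightly loaded SCs and a lightly loaded MC, this search switches every SC off. When the MC is close to full load, every offloading action becomes infeasible and all SCs stay on.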

IV. PERFORMANCE EVALUATION

A. Data Set

This section describes the distribution of users within each cell (either MC or SC), which is used to produce the expected capacity over time in the HetNet architecture. The number of active users in each cell varies over the day and is distributed over quarter-hour intervals to cover a 24-hour duration. More specifically, within the 24-hour duration we have modelled 21 hours, from 05:00 am to 02:00 am, because negligible traffic is recorded in the remaining hours of the night. The cell load is then normalised to produce the load factor $\rho_i$ in order to calculate the transmitted power $P_{tx}$ from (10). The static power consumption of a typical BS is calculated from the multiple BS components as in (1). Therefore, by using the BS static power, the BS transmitted power and the load-dependent component, the total power consumption of all cells is calculated from (4) and (5). We plotted the mean, over 100 iterations, of the calculated energy consumption of all cells, together with the gain percentages, as shown in Fig. 2. The overall EE from (6) has also been plotted after running 100 iterations and averaging the values: the measured capacity of each BS in a specific time frame (15-minute intervals) is divided by the power consumption of the associated BS. The plot is shown in Fig. 3, where we have assumed that 50% of the subscribers are heavy data users with an average data rate of 2 Mb/s, multiplied by the number of users in each interval. There are many ways to calculate user demand by adjusting the ratio of low, medium and heavy users. However, our main focus is on CO₂ emissions; we therefore show the EE graphs only to relate the overall EE to our proposed methods. Finally, the CO₂ emissions associated with energy consumption are presented in Fig. 4. Simulation parameters are listed in Table II [10], [15].
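The time grid and the averaging over iterations described above can be sketched as follows. The 21-hour window starting at 05:00 with 15-minute steps comes from the text; the function names and the list-of-lists layout for the per-iteration results are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def interval_starts(start: str = "05:00", hours: int = 21, step_minutes: int = 15) -> list:
    """Return the start time (HH:MM) of every interval in the modelled window."""
    t0 = datetime.strptime(start, "%H:%M")
    count = hours * 60 // step_minutes
    return [(t0 + timedelta(minutes=i * step_minutes)).strftime("%H:%M")
            for i in range(count)]


def mean_per_interval(runs: list) -> list:
    """Average several runs (one list of per-interval values each), interval by interval."""
    n_runs = len(runs)
    return [sum(values) / n_runs for values in zip(*runs)]
```

The default window yields 84 intervals, the first starting at 05:00 and the last at 01:45. Passing the per-interval energy values of 100 runs to `mean_per_interval` gives one averaged curve of that length.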

B. Benchmarking

In addition to the proposed Q-learning based CS approach, three more techniques are developed to compare and assess the performance of the proposed method. Note that the MC is always on for all the methods explained in the following paragraphs.

a) All-On: In this CS method, all the SCs are always kept on, meaning that no switching is implemented. Including this method in the results is important, since it is currently the case for the majority of networks. Even though this method does not offer any saving in power consumption and/or CO₂ emissions, it does not suffer from reduced quality of service (QoS), given that all users remain connected to their best serving BS because there is no switching or offloading.
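Because All-On involves no switching, it is a natural reference against which savings can be expressed. One plausible way to compute a percentage saving relative to such a baseline is sketched below; the formula and function name are assumptions of this sketch and are not necessarily the gain definition used for the figures.

```python
def saving_percent(baseline: float, candidate: float) -> float:
    """Percentage reduction of a candidate method relative to a baseline such as All-On."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline
```

For example, a method consuming 150 kWh where All-On consumes 200 kWh corresponds to `saving_percent(200.0, 150.0)`, a 25% saving. The same function applies to CO₂ figures, since they scale with energy through the conversion factor.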
