arXiv:1911.01046v2 [cs.LG] 3 Feb 2020
A Crowdsourcing Framework for On-Device Federated Learning
Shashi Raj Pandey, Student Member, IEEE, Nguyen H. Tran, Senior Member, IEEE, Mehdi Bennis, Senior Member, IEEE, Yan Kyaw Tun, Aunas Manzoor, and Choong Seon Hong, Senior Member, IEEE
Abstract—Federated learning (FL) rests on the notion of training a global model in a decentralized manner. Under this setting, mobile devices perform computations on their local data before uploading the required updates to improve the global model. However, when the participating clients implement an uncoordinated computation strategy, the difficulty is to handle the communication efficiency (i.e., the number of communications per iteration) while exchanging the model parameters during aggregation. Therefore, a key challenge in FL is how users participate to build a high-quality global model with communication efficiency. We tackle this issue by formulating a utility maximization problem, and propose a novel crowdsourcing framework to leverage FL that considers the communication efficiency during parameters exchange. First, we show an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit. We formulate a two-stage Stackelberg game to analyze such a scenario and find the game's equilibria. Second, we formalize an admission control scheme for participating clients to ensure a level of local accuracy. Simulated results demonstrate the efficacy of our proposed solution with up to 22% gain in the offered reward.

Index Terms—Decentralized machine learning, federated learning (FL), mobile crowdsourcing, incentive mechanism, Stackelberg game.

Manuscript received May 19, 2019; revised September 7, 2019, December 17, 2019 and January 13, 2020; accepted January 28, 2020. Date of publication ......; date of current version ..... This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01287, Evolvable Deep Learning Model Generation Platform for Edge Computing) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2017R1A2A2A05000995). A preliminary version of this work was presented at IEEE GLOBECOM 2019 [1]. (Corresponding author: Choong Seon Hong.)

Shashi Raj Pandey, Yan Kyaw Tun, Aunas Manzoor, and Choong Seon Hong are with the Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyeonggi-do 17104, Rep. of Korea, e-mail: {shashiraj, ykyawtun7, aunasmanzoor, cshong}@khu.ac.kr.

Nguyen H. Tran is with the School of Computer Science, The University of Sydney, NSW 2006, Australia, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 17104, South Korea (email: nguyen.tran@sydney.edu.au).

Mehdi Bennis is with the Center for Wireless Communications, University of Oulu, 90014 Oulu, Finland, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 17104, South Korea (email: mehdi.bennis@oulu.fi).

I. INTRODUCTION

A. Background and Motivation

Recent years have admittedly witnessed tremendous growth in the use of Machine Learning (ML) techniques and their applications on mobile devices. On the one hand, according to the International Data Corporation, shipments of smartphones reached 3 billion in 2018 [2], which implies a large crowd of mobile users generating personalized data via interaction with mobile applications, or with the use of built-in sensors (e.g., cameras, microphones and GPS) exploited efficiently by the mobile crowdsensing paradigm (e.g., for indoor localization, traffic monitoring, navigation [3], [4], [5], [6]). On the other hand, mobile devices are being extensively empowered with specialized hardware architectures and computing engines such as the CPU, GPU and DSP (e.g., the energy-efficient Qualcomm Hexagon Vector eXtensions on the Snapdragon 835 [7]) for solving diverse machine learning problems. Gartner predicts that 80 percent of smartphones will have on-device AI capabilities by 2022. With dedicated chipsets, smartphone makers can achieve market gains by offering more secure facial recognition systems, the ability to understand user behaviors, and predictive features [8]. This means on-device intelligence will be ubiquitous!
Against the backdrop of these exciting possibilities with on-device intelligence, a White House report on the principle of data minimization was published in 2012 to advocate the privacy of consumer data [9]. A direct application of this principle is the ML technique that leaves the training data distributed on the mobile devices, called Federated Learning [7], [10], [11], [12], [13]. This technique unleashes a new collaborative ecosystem in ML to build a shared learning model while keeping the training data locally on user devices, which complies with the data minimization principle and protects user data privacy. Unlike the conventional approach of collecting all the training data in one place to train a learning model, the mobile users (participating clients) compute updates on their local training data with the current global model parameters, which are then aggregated and broadcast back by the centralized coordinating server. This iterative process continues until a target accuracy level of the learning model is reached. In this way, FL decouples the training of a global model from the need to move the local training data.
In another report, research organizations estimate that over 90% of the data will be stored and processed locally [14] (e.g., at the network edge), which provides an immense opportunity to extract the benefits of FL. Also, because of the huge market potential of untapped private data, FL is a promising tool to enable more personalized, service-oriented applications.
Local computations at the devices and their communication with the centralized coordinating server are interleaved in a complex manner to build a global learning model. Therefore, a communication-efficient FL framework [12], [15] requires solving several challenges. Furthermore, because of the limited data per device available to train a high-quality learning model, the difficulty is to incentivize a large number of mobile users to ensure cooperation. This important aspect of FL has been overlooked so far, where the question is: how can we motivate a number of participating clients, collectively providing a large number of data samples, to enable FL without sharing their private data? Note that both the participating clients and the server can benefit from training a global model. However, to fully reap the benefits of high-quality updates, the multi-access edge computing (MEC) server has to incentivize clients for participation. In particular, under heterogeneous scenarios, such as an adaptive and cognitive-communication network, clients' participation in FL can spur collaboration and provide benefits for operators to accelerate and deliver network-wide services [16]. Similarly, clients in general are not concerned with the reliability and scalability issues of FL [17]. Therefore, to incentivize users to participate in the collaborative training, we require a marketplace. For this purpose, we present a value-based compensation mechanism for the participating clients, such as a bounty (e.g., a data discount package), according to their level of participation in the crowdsourcing framework. This is reflected in terms of the local accuracy level, i.e., the quality of the solution to the local subproblem, whereby the framework protects the model from imperfect updates by restricting clients that try to compromise the model (for instance, with skewed data because of its non-i.i.d. nature, or data poisoning) [3]. Moreover, we cast the global loss minimization problem as a primal-dual optimization problem, instead of adopting the traditional gradient descent learning algorithm in the federated learning setting (e.g., FedAvg [15]). This enables (a) proper assessment of the quality of the local solution to improve personalization and fairness amongst the participating clients while training a global model, and (b) effective decoupling of the local solvers, thereby balancing communication and computation in the distributed setting.
The goal of this paper is two-fold. First, we formalize an incentive mechanism to develop a participatory framework for mobile clients to perform FL for improving the global model. Second, we address the challenge of maintaining communication efficiency while exchanging the model parameters with a number of participating clients during aggregation. Specifically, communication efficiency in this scenario accounts for the number of communications per iteration with an arbitrary algorithm to maintain an acceptable accuracy level for the global model.
B. Contributions

In this work, we design and analyze a novel crowdsourcing framework to realize the FL vision. Specifically, our contributions are summarized as follows:

A crowdsourcing framework to enable communication-efficient FL. We design a crowdsourcing framework in which FL participating clients iteratively solve the local learning subproblems for an accuracy level subject to an offered incentive. We then establish a communication-efficient cost model for the participating clients, and formulate an incentive mechanism to induce the necessary interaction between the MEC server and the participating clients for FL in Section IV.
Solution approach using Stackelberg game. With the offered incentive, the participating clients independently choose their strategies to solve the local subproblem for a certain accuracy level in order to minimize their participation costs. Correspondingly, the MEC server builds a high-quality centralized model, characterized by its utility function, with the data distributed over the participating clients by offering the reward. We exploit these tightly coupled motives of the participating clients and the MEC server as a two-stage Stackelberg game. The equivalent optimization problem is characterized as a mixed-boolean program which requires exponential complexity to solve. We analyze the game's equilibria and propose a linear-complexity algorithm to obtain the optimal solution.
Participants' response analysis and case study. We next analyze the response behavior of the participating clients via the solutions of the Stackelberg game, and establish the efficacy of our proposed framework via case studies. We show that the linear-complexity solution approach attains the same performance as the mixed-boolean programming problem. Furthermore, we show that our mechanism design can achieve the optimal solution while outperforming a heuristic approach for attaining the maximal utility, with up to 22% gain in the offered reward.
Admission control strategy. Finally, we show that it is significant to have a certain number of participating clients to guarantee the communication efficiency for an accuracy level in FL. We formulate a probabilistic model for threshold accuracy estimation and find the corresponding number of participants required to build a high-quality learning model. We analyze the impact of the number of participants in FL while determining the threshold accuracy level with closed-form solutions. Finally, with numerical results we demonstrate the structure of the admission control model for different configurations.
The remainder of this paper is organized as follows. We review related work in Section II, and present the system model in Section III. In Section IV, we formulate an incentive mechanism with a two-stage Stackelberg game, and investigate the Nash equilibrium of the game with simulation results in Section V. An admission control strategy is formulated to define a minimum local accuracy level, and numerical analysis is presented in Section VI. Finally, conclusions are drawn in Section VII.
II. RELATED WORK
The unprecedented amount of data necessitates the use of distributed computational frameworks to provide solutions for various machine learning applications [11]–[15]. Using distributed optimization techniques, research on decentralized machine learning has largely focused on competitive algorithms to train learning models across a number of cluster nodes [18], [19], [20], [21], with balanced and i.i.d. data.
Setting a different motivation, FL has recently attracted increasing interest [7], [11], [12], [13], [15], [22], in which the collaboration of a number of devices with non-i.i.d. and unbalanced data is adapted to train a learning model. In the pioneering works [11], [12], the authors presented the setting for federated optimization and the related technical challenges in understanding the convergence properties of FL. Existing work has studied these issues. For example, Wang et al. [16] theoretically analyzed the convergence rate of distributed gradient descent. In this detailed work, the authors focus on deducing the optimal global aggregation frequency in a distributed learning setting to minimize the loss function of the global problem. Their problem considers a resource-constrained edge computing system. However, the setting differs from our proposed model, where we have introduced the notion of participation and proposed a game-theoretic interaction between the workers (participating clients) and the master (MEC server) to attain a cost-effective FL framework. Earlier than this work, McMahan et al. [15] proposed a practical variant of FL where the global aggregation was synchronous with a fixed frequency. The authors confirmed the effectiveness of this approach using various datasets. Furthermore, the authors in [18] extended the theoretical training convergence analysis of [15] to general classes of distributed learning approaches with communication and computation costs. For deep learning architectures where the objectives are non-convex, the authors in [23] proposed an algorithm named FedProx, a generalization of FedAvg, in which a surrogate of the global objective function is used to efficiently ensure an empirical performance bound in the FL setting. In that work, the authors demonstrated the improvement in performance, consistent with their theoretical assumptions, both in terms of robustness and convergence, through a set of experiments.
Recent works adapt and extend the core concepts in [11], [12], [15] to develop communication-efficient FL algorithms, where each participating client in the federated learning setting independently computes its local updates on the current model and communicates with a central server that aggregates the parameters for the computation of a global model. The framework uses the Federated Averaging (FedAvg) algorithm to reduce communication costs. In this regard, to characterize the communication and computation trade-off during model updates, distributed machine learning based on gradient descent is widely used. In the aforementioned work [11], a variant of distributed stochastic gradient descent (SGD) was used to attain parallelism and improved computation. Similarly, in [12], the authors discussed a family of new randomized methods combining SGD with primal and dual variants such as Stochastic Variance Reduced Gradient (SVRG), Federated Stochastic Variance Reduced Gradient (FSVRG) and Stochastic Dual Coordinate Ascent (SDCA). Further, in [24] the authors explained the redundancy of gradient exchanges in distributed SGD and proposed a Deep Gradient Compression (DGC) algorithm to enhance communication efficiency in the FL setting. The performance of parallel SGD and mini-batch parallel SGD has been discussed in [25], [23] for fast convergence and effective communication rounds. However, the authors in their recent work [25] argue for a sufficient improvement in generalization performance with a variant of local SGD rather than large mini-batch sizes, even in a non-convex setting. In [26], the authors proposed the Distributed Approximate Newton (DANE) algorithm for precisely solving a general subproblem available locally before averaging the solutions. In the recent work [27], the authors designed a robust method which applies the proposed periodic-averaging SGD (PASGD) technique to prevent communication delay in the distributed SGD setting. The idea in this work was to adapt the communication period such that it minimizes the optimization error at each wall-clock time. To this end, interestingly, some of the latest works, such as [28], have studied and demonstrated the privacy risks under collaborative learning mechanisms such as FL.
In contrast to the above research, which has overlooked the participatory method to build a high-quality central ML model and its criticality, and has primarily focused on the convergence of learning time with variants of learning algorithms, our work addresses the challenge of designing a communication- and computation-cost-effective FL framework by exploring a crowdsourcing structure. In this regard, a few recent studies have discussed participation to build a global ML model with FL, as in [29], [30]. Basically, in [29] the authors proposed a novel distributed approach based on FL to learn the network-wide queue dynamics in vehicular networks for achieving ultra-reliable low-latency communication (URLLC) via a joint power and resource allocation problem. The vehicles participate in FL to provide information related to sample events (i.e., queue lengths) to parameterize the distribution of extremes. In [30], the authors provided new design principles to characterize edge learning and highlighted important research opportunities and applications of the new philosophy for wireless communication called learning-driven communication. The authors also presented several significant case studies and demonstrated the effectiveness of the design principles in this regard. Further, the recent work [17] studied a blockchained FL architecture, proposing a data reward and mining reward mechanism for FL. However, these works largely provide a latency analysis for the related applications. Our paper focuses on a Stackelberg game-based incentive mechanism design to reveal the iteration strategy of the participating clients in solving the local subproblems for building a high-quality centralized learning model. Interestingly, incentive mechanisms have been studied for years in mobile crowdsourcing/crowdsensing systems, especially with auction mechanisms (e.g., [31], [32], [33]), contract and tournament models (e.g., [34], [35]) and Stackelberg game-based incentive mechanisms such as in [36] and [37]. However, the design goals were specific to fair and truthful data trading of distributed sensing tasks. In this regard, the novelty of our model is that we untangle and analyze the complex interaction scenario between the participating clients and the aggregating edge server in the crowdsourcing framework to obtain a cost-effective global learning model without sharing local datasets. Moreover, the proposed incentive mechanism models such interactions to enable communication-efficient FL, which is able to achieve a target accuracy, taking the performance metrics into consideration. Further, we adopt the dual formulation of the learning problem to better decompose the global problem into distributed subproblems for federated computation across the participating clients.

Fig. 1: Crowdsourcing framework for decentralized machine learning (MEC server hosting the platform, global model and aggregator; participating clients with local data training local models; local parameters are passed to the server and global model parameters are fed back over the MBS-MUs association and backhaul).
III. SYSTEM MODEL
Fig. 1 illustrates our proposed system model for the crowdsourcing framework to enable FL. The model consists of a number of mobile clients associated with a base station that hosts a central coordinating server (MEC server), acting as a central entity. The server facilitates the aggregation of the model parameters and feeds back the global model updates in each global iteration. We consider a set of participating clients $\mathcal{K} = \{1, 2, \ldots, K\}$ in the crowdsourcing framework. The crowdsourcer (platform) can interact with mobile clients via an application interface, and aims at leveraging FL to build a global ML model. As an example, consider a case where the crowdsourcer (referred to as the MEC server hereafter, to avoid any confusion) wants to build an ML model. Instead of relying only on the local data available at the MEC server to train the global model, the global model is constructed utilizing the local training data available across several distributed mobile clients. Here, the global model parameter is first shared by the MEC server to train the local models in each participating client. The local models' parameters minimizing the local loss functions are then sent back as feedback, and are aggregated to update the global model parameter. The process continues iteratively until convergence.
A. Federated Learning Background

For FL, we consider unevenly partitioned training data over a large number of participating clients to train the local models under any arbitrary learning algorithm. Each client $k$ stores its local dataset $\mathcal{D}_k$ of size $D_k$. Then, we define the training data size $D = \sum_{k=1}^{K} D_k$. In a typical supervised learning setting, $\mathcal{D}_k$ defines the collection of data samples given as a set of input-output pairs $\{x_i, y_i\}_{i=1}^{D_k}$, where $x_i \in \mathbb{R}^d$ is an input sample vector with $d$ features, and $y_i \in \mathbb{R}$ is the labeled output value for the sample $x_i$. The learning problem, for an input sample vector $x_i$ (e.g., the pixels of an image), is to find the model parameter vector $w \in \mathbb{R}^d$ that characterizes the output $y_i$ (e.g., the labeled output of the image, such as the corresponding product names in a store) with the loss function $f_i(w)$. Some examples of loss functions include $f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$, $y_i \in \mathbb{R}$, for a linear regression problem and $f_i(w) = \max\{0, 1 - y_i x_i^T w\}$, $y_i \in \{-1, 1\}$, for support vector machines. The term $x_i^T w$ is often called a linear mapping function. Therefore, the loss function based on the local data of client $k$, termed the local subproblem, is formulated as

$$J_k(w) = \frac{1}{D_k} \sum_{i=1}^{D_k} f_i(w) + \lambda g(w), \qquad (1)$$

where $w \in \mathbb{R}^d$ is the local model parameter, and $g(\cdot)$ is a regularizer function, commonly expressed as $g(\cdot) = \frac{1}{2}\|\cdot\|^2$; $\lambda \in [0, 1]$. This characterizes the local model in the FL setting.

Algorithm 1 Federated Learning Framework
1: Input: Initialize dual variable $\alpha^0 \in \mathbb{R}^D$, $D_k$, $\forall k \in \mathcal{K}$.
2: for each aggregation round do
3:   for $k \in \mathcal{K}$ do
4:     Solve local subproblems (5) in parallel.
5:     Update local variables as in (7).
6:   end for
7:   Aggregate to update the global parameter as in (8).
8: end for
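To make the control flow of Algorithm 1 concrete, the following is a minimal, self-contained sketch of one possible realization for a ridge-regression instance ($f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$, $g(w) = \frac{1}{2}\|w\|^2$), written in Python/NumPy. It instantiates the local update (5), (7) with a simple fixed-step dual-gradient solver and the aggregation rule (8) introduced below; all variable names, step sizes and the synthetic data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 10, 5                                    # feature dimension, number of clients
D_k = [200] * K                                 # local dataset sizes
D, lam = sum(D_k), 0.1                          # total samples, regularization weight

# Synthetic local datasets X_[k] (d x D_k) and labels y_[k] (assumed for illustration).
w_true = rng.normal(size=d)
X = [rng.normal(size=(d, n)) for n in D_k]
y = [Xk.T @ w_true + 0.1 * rng.normal(size=n) for Xk, n in zip(X, D_k)]

alpha = [np.zeros(n) for n in D_k]              # local dual variables alpha_[k]
phi = np.zeros(d)                               # global parameter phi = (1/(lam*D)) X alpha

def solve_local_subproblem(Xk, yk, alpha_k, phi, local_iters=20, eta=0.1):
    """Approximate local solver for (5): a few fixed-step dual-gradient updates.

    For the ridge case, the dual gradient w.r.t. alpha_i is proportional to
    (y_i - alpha_i - x_i^T w); stopping after a fixed iteration budget
    corresponds to some relative accuracy theta_k in the sense of (6).
    """
    delta = np.zeros_like(alpha_k)
    for _ in range(local_iters):
        w = phi + Xk @ delta / (lam * D)        # client's local view of w(alpha)
        grad = yk - (alpha_k + delta) - Xk.T @ w
        delta += eta * grad
    return delta

for t in range(30):                             # aggregation rounds, steps (S2)-(S8)
    updates = []
    for k in range(K):                          # in practice, clients run in parallel
        delta_k = solve_local_subproblem(X[k], y[k], alpha[k], phi)
        alpha[k] = alpha[k] + delta_k           # local dual update, as in (7)
        updates.append(X[k] @ delta_k / (lam * D))   # phi^t_[k] sent to the MEC server
    phi = phi + sum(updates) / K                # server aggregation, as in (8)

print("distance of recovered model w(alpha) = phi from w_true:",
      round(float(np.linalg.norm(phi - w_true)), 3))
```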
Global Problem: At the MEC server, the global problem can be represented as a finite-sum objective of the form

$$\min_{w \in \mathbb{R}^d} J(w) \quad \text{where} \quad J(w) \triangleq \frac{\sum_{k=1}^{K} D_k J_k(w)}{D}. \qquad (2)$$

Problems with the structure of (2), where we aim to minimize an average of $K$ local objectives, are well known as distributed consensus problems [38].
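As a brief sanity check (a reasoning step we add here; it is implicit in the text), substituting the definition of $J_k(w)$ from (1) into (2) and using that the local datasets are disjoint with $\sum_{k=1}^{K} D_k = D$ gives

$$J(w) = \sum_{k=1}^{K} \frac{D_k}{D}\left(\frac{1}{D_k}\sum_{i \in \mathcal{D}_k} f_i(w) + \lambda g(w)\right) = \frac{1}{D}\sum_{i=1}^{D} f_i(w) + \lambda g(w),$$

which is exactly the per-sample form that is recast in (3) below and then dualized.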
Solution Framework under Federated Learning: We recast the regularized global problem in (2) as

$$\min_{w \in \mathbb{R}^d} J(w) := \frac{1}{D} \sum_{i=1}^{D} f_i(w) + \lambda g(w), \qquad (3)$$

and decompose it as a dual optimization problem in a distributed scenario [39] amongst the $K$ participating clients (the duality gap provides a certificate of the quality of the local solutions and facilitates distributed training). For this, at first, we define $X \in \mathbb{R}^{d \times D_k}$ as a matrix with columns holding the data points for $i \in \mathcal{D}_k$, $\forall k$. Then, the corresponding dual optimization problem of (3) for a convex loss function $f$ is

$$\max_{\alpha \in \mathbb{R}^D} G(\alpha) := \frac{1}{D} \sum_{i=1}^{D} -f_i^*(-\alpha_i) - \lambda g^*(\phi(\alpha)), \qquad (4)$$

where $\alpha \in \mathbb{R}^D$ is the dual variable mapping to the primal candidate vector, $f_i^*$ and $g^*$ are the convex conjugates of $f_i$ and $g$ respectively [40], and $\phi(\alpha) = \frac{1}{\lambda D} X\alpha$. With the optimal value of the dual variable $\alpha^*$ in (4), we have $w(\alpha^*) = \nabla g^*(\phi(\alpha^*))$ as the optimal solution of (3) [39]. For ease of representation, we will use $\phi \in \mathbb{R}^d$ for $\phi(\alpha)$ hereafter. We consider that $g$ is a strongly convex function, i.e., $g^*(\cdot)$ is continuously differentiable. Then, the solution is obtained following an iterative approach to attain a global accuracy $0 \leq \epsilon \leq 1$ (i.e., $\mathbb{E}[G(\alpha) - G(\alpha^*)] < \epsilon$).

Under the distributed setting, we further define data partitioning notations for clients $k \in \mathcal{K}$ to represent the working principle of the framework. Let us define a weight vector $\Delta\alpha_{[k]} \in \mathbb{R}^D$ at the local subproblem $k$ with its elements zero for the unavailable data points. Following the assumption that each $f_i$ is $1$-smooth and $g$ is $1$-strongly convex to ensure convergence, the approximate solution to the local problem $k$ is defined by the dual variables $\alpha_{[k]}$, $\Delta\alpha_{[k]}$, characterized as

$$\max_{\Delta\alpha_{[k]} \in \mathbb{R}^D} \; G_k(\Delta\alpha_{[k]}; \phi, \alpha_{[k]}), \qquad (5)$$

where $G_k(\Delta\alpha_{[k]}; \phi, \alpha_{[k]}) = -\frac{1}{K}\big\langle \nabla\big(\lambda g^*(\phi(\alpha))\big), \Delta\alpha_{[k]} \big\rangle - \frac{\lambda}{2}\big\| \frac{1}{\lambda D} X_{[k]} \Delta\alpha_{[k]} \big\|^2$ is defined with a matrix $X_{[k]}$ whose columns hold the data points for $i \in \mathcal{D}_k$, and are zero-padded otherwise. Each participating client $k \in \mathcal{K}$ iterates over its computational resources using any arbitrary solver to solve its local problem (5) with a local relative accuracy $\theta_k$ that characterizes the quality of the local solution, and produces a random output $\Delta\alpha_{[k]}$ satisfying

$$\mathbb{E}\left[ G_k(\Delta\alpha^*_{[k]}) - G_k(\Delta\alpha_{[k]}) \right] \leq \theta_k \left[ G_k(\Delta\alpha^*_{[k]}) - G_k(0) \right]. \qquad (6)$$
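For concreteness, consider an illustrative case (our example, not spelled out in the text): the linear-regression loss $f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$ with the quadratic regularizer $g(w) = \frac{1}{2}\|w\|^2$. Their convex conjugates are

$$f_i^*(u) = \frac{1}{2}u^2 + u\,y_i, \qquad g^*(\phi) = \frac{1}{2}\|\phi\|^2,$$

so the dual (4) specializes to

$$G(\alpha) = \frac{1}{D}\sum_{i=1}^{D}\Big(y_i\,\alpha_i - \frac{1}{2}\alpha_i^2\Big) - \frac{\lambda}{2}\,\|\phi(\alpha)\|^2,$$

and the primal model is recovered simply as $w(\alpha^*) = \nabla g^*(\phi(\alpha^*)) = \phi(\alpha^*)$; this is the instance used in the sketch after Algorithm 1.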
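Criterion (6) can also be read as an empirical certificate that a client computes for its own solver. The helper below is a hypothetical sketch (the callable `G_k` and both arguments are assumptions for illustration), measuring the relative accuracy actually achieved by an approximate local update when a near-optimal reference update is available.

```python
import numpy as np

def relative_accuracy(G_k, delta_opt, delta_hat):
    """Empirical theta_k in the sense of (6): the fraction of the attainable
    local improvement that the approximate update leaves unrealized.

    G_k       -- callable evaluating the local dual objective in (5)
    delta_opt -- (near-)optimal local update Delta-alpha*_[k]
    delta_hat -- update actually returned by the local solver
    """
    gap_left = G_k(delta_opt) - G_k(delta_hat)
    gap_total = G_k(delta_opt) - G_k(np.zeros_like(delta_opt))
    return gap_left / gap_total   # values near 0 indicate a near-optimal local solution
```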
Note that, with local (relative) accuracy $\theta_k \in [0, 1]$, the value $\theta_k = 1$ suggests that no improvement was made by the local solvers during successive local iterations. Then, the local dual variable is updated as follows:

$$\alpha^{t+1}_{[k]} := \alpha^{t}_{[k]} + \Delta\alpha^{t}_{[k]}, \quad \forall k \in \mathcal{K}. \qquad (7)$$
Correspondingly, each participating client broadcasts the local parameter, defined as $\phi^{t}_{[k]} := \frac{1}{\lambda D} X_{[k]} \Delta\alpha^{t}_{[k]}$, during each round of communication to the MEC server. The MEC server aggregates the local parameters (averaging) with the following rule:

$$\phi^{t+1} := \phi^{t} + \frac{1}{K} \sum_{k=1}^{K} \phi^{t}_{[k]}, \qquad (8)$$

and distributes the global change in $\phi$ to the participating clients, which is used to solve (5) in the next round of local iterations. In this way, we observe the decoupling of the global model parameter from the local clients' data for training a global model (note that we consider the availability of quality data at each participating client for solving the corresponding local subproblem; a related demonstration of the dependency between the normalized data size and accuracy can be found in [41]).

Algorithm 1 briefly summarizes the FL framework as an iterative process to solve the global problem characterized in (3) for a global accuracy level. The iterative process (S2)-(S8) of Algorithm 1 terminates when the global accuracy $\epsilon$ is reached. A participating client $k$ strategically iterates over its local training data $\mathcal{D}_k$ to solve the local subproblem (5) up to an accuracy $\theta_k$ (fewer iterations might not be sufficient to obtain an optimal local solution [16]). In each communication round with the MEC server, the participating clients synchronously pass on their parameters $\phi_{[k]}$ using a shared wireless channel. The MEC server then aggregates the local model parameters $\phi$ as in (8), and broadcasts the global parameters required for the participating clients to solve their local subproblems for the next communication round.

Fig. 2: Interaction environment of the federated learning setting under the crowdsourcing framework (participating clients 1, ..., K hold the local models and exchange parameters with the global model at the MEC server).

Within the framework, consider that each participating client uses any arbitrary optimization algorithm (such as Stochastic Gradient Descent (SGD), Stochastic Average Gradient (SAG), or Stochastic Variance Reduced Gradient (SVRG)) to attain a relative accuracy $\theta$ per local subproblem. Then, for strongly convex objectives, the general upper bound on the number of iterations depends on the local relative accuracy $\theta$ of the local subproblem and the global model's accuracy $\epsilon$ as [12]:

$$I^{g}(\epsilon, \theta) = \frac{\zeta \cdot \log(\frac{1}{\epsilon})}{1 - \theta}, \qquad (9)$$

where the local relative accuracy measures the quality of the local solution as defined in the earlier paragraphs. Further, in this formulation, we have replaced the term $O(\log(\frac{1}{\epsilon}))$ in the numerator with $\zeta \cdot \log(\frac{1}{\epsilon})$, for a constant $\zeta > 0$. For a fixed number of iterations $I^{g}$ at the MEC server to solve the global problem, we observe in (9) that a very high local accuracy (small $\theta$) can significantly improve the global accuracy $\epsilon$. However, each client $k$ has to spend excessive resources in terms of local iterations $I^{l}_{k}$ to attain a small $\theta_k$ accuracy, as

$$I^{l}_{k}(\theta_k) = \gamma_k \log\left(\frac{1}{\theta_k}\right), \qquad (10)$$

where $\gamma_k > 0$ is a parameter choice of client $k$ that depends on the data size and condition number of the local subproblem [42]. Therefore, to address this trade-off, the MEC server can set up an economic interaction environment (a crowdsourcing framework) to motivate the participating clients to improve the local relative accuracy $\theta_k$. Correspondingly, with the increased reward, the participating clients are motivated to attain a better local accuracy $\theta_k$, which, as observed in (9), can improve the global accuracy $\epsilon$ for a fixed number of iterations $I^{g}$ of the MEC server to solve the global problem. In this scenario, to capture the statistical and system-level heterogeneity, the corresponding performance bound in (9) for heterogeneous responses $\theta_k$ can be modified by considering the worst-case response of the participating clients as

$$I^{g}(\epsilon, \theta_k) = \frac{\zeta \cdot \log(\frac{1}{\epsilon})}{1 - \max_{k} \theta_k}, \quad \forall k \in \mathcal{K}. \qquad (11)$$
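As a quick numerical illustration of the trade-off captured by (9)-(11), the short script below (illustrative parameter values; $\zeta$, $\gamma_k$ and the $\theta_k$ values are assumptions, not taken from the paper) evaluates how lowering the relative accuracy target reduces the number of global rounds while inflating the per-client local iteration count, and how the worst-case client dominates the bound in (11).

```python
import math

zeta = 1.0                 # constant replacing the O(.) factor in (9) (assumed)
eps = 0.01                 # target global accuracy
gamma_k = 50.0             # per-client parameter in (10) (assumed)

def global_rounds(eps, theta, zeta=zeta):
    """Upper bound (9) on the number of global iterations."""
    return zeta * math.log(1.0 / eps) / (1.0 - theta)

def local_iters(theta_k, gamma_k=gamma_k):
    """Local iteration count (10) needed to reach relative accuracy theta_k."""
    return gamma_k * math.log(1.0 / theta_k)

for theta in (0.9, 0.5, 0.1):
    print(f"theta={theta:>4}: I_g ~ {global_rounds(eps, theta):6.1f} rounds, "
          f"I_l_k ~ {local_iters(theta):6.1f} local iterations")

# Heterogeneous responses: the worst (largest) theta_k dominates the bound (11).
thetas = [0.2, 0.35, 0.8]
print("worst-case bound (11):", global_rounds(eps, max(thetas)))
```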
