arXiv:1911.01046v2 [cs.LG] 3 Feb 2020
A Crowdsourcing Framework for On-Device Federated Learning
Shashi Raj Pandey, Student Member, IEEE, Nguyen H. Tran, Senior Member, IEEE, Mehdi Bennis, Senior Member, IEEE, Yan Kyaw Tun, Aunas Manzoor, and Choong Seon Hong, Senior Member, IEEE
Abstract—Federated learning (FL) rests on the notion of training a global model in a decentralized manner. Under this setting, mobile devices perform computations on their local data before uploading the required updates to improve the global model. However, when the participating clients implement an uncoordinated computation strategy, the difficulty is to handle the communication efficiency (i.e., the number of communications per iteration) while exchanging the model parameters during aggregation. Therefore, a key challenge in FL is how users participate to build a high-quality global model with communication efficiency. We tackle this issue by formulating a utility maximization problem, and propose a novel crowdsourcing framework to leverage FL that considers the communication efficiency during parameters exchange. First, we show an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit. We formulate a two-stage Stackelberg game to analyze such a scenario and find the game's equilibria. Second, we formalize an admission control scheme for participating clients to ensure a level of local accuracy. Simulated results demonstrate the efficacy of our proposed solution with up to 22% gain in the offered reward.

Index Terms—Decentralized machine learning, federated learning (FL), mobile crowdsourcing, incentive mechanism, Stackelberg game.

Manuscript received May 19, 2019; revised September 7, 2019, December 17, 2019 and January 13, 2020; accepted January 28, 2020. Date of publication ......; date of current version ..... This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01287, Evolvable Deep Learning Model Generation Platform for Edge Computing) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2017R1A2A2A05000995). A preliminary version of this work was presented at IEEE GLOBECOM 2019 [1]. (Corresponding author: Choong Seon Hong.)

Shashi Raj Pandey, Yan Kyaw Tun, Aunas Manzoor, and Choong Seon Hong are with the Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyeonggi-do 17104, Rep. of Korea, e-mail: {shashiraj, ykyawtun7, aunasmanzoor, cshong}@khu.ac.kr.

Nguyen H. Tran is with the School of Computer Science, The University of Sydney, NSW 2006, Australia, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 17104, South Korea (email: nguyen.tran@sydney.edu.au).

Mehdi Bennis is with the Center for Wireless Communications, University of Oulu, 90014 Oulu, Finland, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 17104, South Korea (email: mehdi.bennis@oulu.fi).

I. INTRODUCTION

A. Background and Motivation

Recent years have admittedly witnessed tremendous growth in the use of Machine Learning (ML) techniques and their applications on mobile devices. On the one hand, according to the International Data Corporation, shipments of smartphones reached 3 billion in 2018 [2], which implies a large crowd of mobile users generating personalized data via interaction with mobile applications, or with the use of built-in sensors (e.g., cameras, microphones and GPS) exploited efficiently by the mobile crowdsensing paradigm (e.g., for indoor localization, traffic monitoring, navigation [3], [4], [5], [6]). On the other hand, mobile devices are being extensively empowered with specialized hardware architectures and computing engines such as the CPU, GPU and DSP (e.g., the energy-efficient Qualcomm Hexagon Vector eXtensions on the Snapdragon 835 [7]) for solving diverse machine learning problems. Gartner predicts that 80 percent of smartphones will have on-device AI capabilities by 2022. With dedicated chipsets, smartphone makers can achieve market gains by offering more secure facial recognition systems, the ability to understand user behaviors, and predictive features [8]. This means on-device intelligence will be ubiquitous!
Against the backdrop of these exciting possibilities with on-device intelligence, a White House report on the principle of data minimization was published in 2012 to advocate the privacy of consumer data [9]. A direct application of this principle is the ML technique that leaves the training data distributed on the mobile devices, called Federated Learning [7], [10], [11], [12], [13]. This technique unleashes a new collaborative ecosystem in ML to build a shared learning model while keeping the training data locally on user devices, which complies with the data minimization principle and protects user data privacy. Unlike the conventional approach of collecting all the training data in one place to train a learning model, the mobile users (participating clients) compute updates on their local training data with the current global model parameters, which are then aggregated and broadcast back by the centralized coordinating server. This iterative process continues until a target accuracy level of the learning model is reached. In this way, FL decouples the training of a global model from the need to move the local training data.
In another report, research organizations estimate that over 90% of the data will be stored and processed locally [14] (e.g., at the network edge), which provides an immense opportunity to extract the benefits of FL. Also, because of the huge market potential of untapped private data, FL is a promising tool to enable more personalized, service-oriented applications.
Local computations at the devices and their communication with the centralized coordinating server are interleaved in a complex manner to build a global learning model. Therefore, a communication-efficient FL framework [12], [15] requires solving several challenges. Furthermore, because of the limited data per device available to train a high-quality learning model, the difficulty is to incentivize a large number of mobile users to ensure cooperation. This important aspect of FL has been overlooked so far, where the question is: how can we motivate a number of participating clients, collectively providing a large number of data samples, to enable FL without sharing their private data? Note that both the participating clients and the server can benefit from training a global model. However, to fully reap the benefits of high-quality updates, the multi-access edge computing (MEC) server has to incentivize clients for participation. In particular, under heterogeneous scenarios, such as an adaptive and cognitive-communication network, clients' participation in FL can spur collaboration and provide benefits for operators to accelerate and deliver network-wide services [16]. Similarly, clients in general are not concerned with the reliability and scalability issues of FL [17]. Therefore, to incentivize users to participate in the collaborative training, we require a marketplace. For this purpose, we present a value-based compensation mechanism for the participating clients, such as a bounty (e.g., a data discount package), according to their level of participation in the crowdsourcing framework. This is reflected in terms of the local accuracy level, i.e., the quality of the solution to the local subproblem, whereby the framework protects the model from imperfect updates by restricting clients that try to compromise the model (for instance, with skewed data because of its non-i.i.d. nature, or data poisoning) [3]. Moreover, we cast the global loss minimization problem as a primal-dual optimization problem, instead of adopting the traditional gradient descent learning algorithm in the federated learning setting (e.g., FedAvg [15]). This enables (a) proper assessment of the quality of the local solution to improve personalization and fairness amongst the participating clients while training a global model, and (b) effective decoupling of the local solvers, thereby balancing communication and computation in the distributed setting.
The goal of this paper is two-fold. First, we formalize an incentive mechanism to develop a participatory framework for mobile clients to perform FL for improving the global model. Second, we address the challenge of maintaining communication efficiency while exchanging the model parameters with a number of participating clients during aggregation. Specifically, communication efficiency in this scenario accounts for the number of communications per iteration with an arbitrary algorithm to maintain an acceptable accuracy level for the global model.
B. Contributions

In this work, we design and analyze a novel crowdsourcing framework to realize the FL vision. Specifically, our contributions are summarized as follows:

A crowdsourcing framework to enable communication-efficient FL. We design a crowdsourcing framework in which FL participating clients iteratively solve the local learning subproblems for an accuracy level subject to an offered incentive. We then establish a communication-efficient cost model for the participating clients, and formulate an incentive mechanism to induce the necessary interaction between the MEC server and the participating clients for FL in Section IV.
Solution approach using Stackelberg game. With the offered incentive, the participating clients independently choose their strategies to solve the local subproblem for a certain accuracy level in order to minimize their participation costs. Correspondingly, the MEC server builds a high-quality centralized model, characterized by its utility function, with the data distributed over the participating clients by offering the reward. We exploit these tightly coupled motives of the participating clients and the MEC server as a two-stage Stackelberg game. The equivalent optimization problem is characterized as a mixed-boolean program which requires exponential complexity to solve. We analyze the game's equilibria and propose a linear-complexity algorithm to obtain the optimal solution.
Participants' response analysis and case study. We next analyze the response behavior of the participating clients via the solutions of the Stackelberg game, and establish the efficacy of our proposed framework via case studies. We show that the linear-complexity solution approach attains the same performance as the mixed-boolean programming problem. Furthermore, we show that our mechanism design can achieve the optimal solution while outperforming a heuristic approach for attaining the maximal utility, with up to 22% gain in the offered reward.
Admission control strategy. Finally, we show that it is significant to have a certain number of participating clients to guarantee the communication efficiency for an accuracy level in FL. We formulate a probabilistic model for threshold accuracy estimation and find the corresponding number of participants required to build a high-quality learning model. We analyze the impact of the number of participants in FL while determining the threshold accuracy level with closed-form solutions. Finally, with numerical results we demonstrate the structure of the admission control model for different configurations.
The remainder of this paper is organized as follows. We review related work in Section II, and present the system model in Section III. In Section IV, we formulate an incentive mechanism with a two-stage Stackelberg game, and investigate the Nash equilibrium of the game with simulation results in Section V. An admission control strategy is formulated to define a minimum local accuracy level, and numerical analysis is presented in Section VI. Finally, conclusions are drawn in Section VII.
II. RELATED WORK
The unprecedented amount of data necessitates the use of distributed computational frameworks to provide solutions for various machine learning applications [11]–[15]. Using distributed optimization techniques, research on decentralized machine learning has largely focused on competitive algorithms to train learning models across a number of cluster nodes [18], [19], [20], [21], with balanced and i.i.d. data.
Setting a different motivation, FL has recently attracted increasing interest [7], [11], [12], [13], [15], [22], in which the collaboration of a number of devices with non-i.i.d. and unbalanced data is adapted to train a learning model. In the pioneering works [11], [12], the authors presented the setting for federated optimization and the related technical challenges in understanding the convergence properties of FL. Existing work has studied these issues. For example, Wang et al. [16] theoretically analyzed the convergence rate of distributed gradient descent. In this detailed work, the authors focus on deducing the optimal global aggregation frequency in a distributed learning setting to minimize the loss function of the global problem. Their problem considers a resource-constrained edge computing system. However, the setting differs from our proposed model, where we have introduced the notion of participation and proposed a game-theoretic interaction between the workers (participating clients) and the master (MEC server) to attain a cost-effective FL framework. Earlier than this work, McMahan et al. [15] proposed a practical variant of FL where the global aggregation was synchronous with a fixed frequency. The authors confirmed the effectiveness of this approach using various datasets. Furthermore, the authors in [18] extended the theoretical training convergence analysis of [15] to general classes of distributed learning approaches with communication and computation costs. For deep learning architectures where the objectives are non-convex, the authors in [23] proposed an algorithm named FedProx, a generalization of FedAvg, in which a surrogate of the global objective function is used to efficiently ensure an empirical performance bound in the FL setting. In that work, the authors demonstrated the improvement in performance, consistent with their theoretical assumptions, both in terms of robustness and convergence, through a set of experiments.
Recent works adapt and extend the core concepts in [11], [12], [15] to develop communication-efficient FL algorithms, where each participating client in the federated learning setting independently computes its local updates on the current model and communicates with a central server that aggregates the parameters for the computation of a global model. The framework uses the Federated Averaging (FedAvg) algorithm to reduce communication costs. In this regard, to characterize the communication and computation trade-off during model updates, distributed machine learning based on gradient descent is widely used. In the aforementioned work [11], a variant of distributed stochastic gradient descent (SGD) was used to attain parallelism and improved computation. Similarly, in [12], the authors discussed a family of new randomized methods combining SGD with primal and dual variants such as Stochastic Variance Reduced Gradient (SVRG), Federated Stochastic Variance Reduced Gradient (FSVRG) and Stochastic Dual Coordinate Ascent (SDCA). Further, in [24] the authors explained the redundancy of gradient exchanges in distributed SGD and proposed a Deep Gradient Compression (DGC) algorithm to enhance communication efficiency in the FL setting. The performance of parallel SGD and mini-batch parallel SGD has been discussed in [25], [23] for fast convergence and effective communication rounds. However, the authors in their recent work [25] argue for a sufficient improvement in generalization performance with a variant of local SGD rather than large mini-batch sizes, even in a non-convex setting. In [26], the authors proposed the Distributed Approximate Newton (DANE) algorithm for precisely solving a general subproblem available locally before averaging the solutions. In the recent work [27], the authors designed a robust method which applies the proposed periodic-averaging SGD (PASGD) technique to prevent communication delay in the distributed SGD setting. The idea in this work was to adapt the communication period such that it minimizes the optimization error at each wall-clock time. To this end, interestingly, some of the latest works, such as [28], have studied and demonstrated the privacy risks under collaborative learning mechanisms such as FL.
In contrast to the above research, which has overlooked the participatory method to build a high-quality central ML model and its criticality, and has primarily focused on the convergence of learning time with variants of learning algorithms, our work addresses the challenge of designing a communication- and computation-cost-effective FL framework by exploring a crowdsourcing structure. In this regard, a few recent studies have discussed participation to build a global ML model with FL, as in [29], [30]. Basically, in [29] the authors proposed a novel distributed approach based on FL to learn the network-wide queue dynamics in vehicular networks for achieving ultra-reliable low-latency communication (URLLC) via a joint power and resource allocation problem. The vehicles participate in FL to provide information related to sample events (i.e., queue lengths) to parameterize the distribution of extremes. In [30], the authors provided new design principles to characterize edge learning and highlighted important research opportunities and applications of the new philosophy for wireless communication called learning-driven communication. The authors also presented several significant case studies and demonstrated the effectiveness of the design principles in this regard. Further, the recent work [17] studied a blockchained FL architecture, proposing a data reward and mining reward mechanism for FL. However, these works largely provide a latency analysis for the related applications. Our paper focuses on a Stackelberg game-based incentive mechanism design to reveal the iteration strategy of the participating clients in solving the local subproblems for building a high-quality centralized learning model. Interestingly, incentive mechanisms have been studied for years in mobile crowdsourcing/crowdsensing systems, especially with auction mechanisms (e.g., [31], [32], [33]), contract and tournament models (e.g., [34], [35]) and Stackelberg game-based incentive mechanisms such as in [36] and [37]. However, the design goals were specific to fair and truthful data trading of distributed sensing tasks. In this regard, the novelty of our model is that we untangle and analyze the complex interaction scenario between the participating clients and the aggregating edge server in the crowdsourcing framework to obtain a cost-effective global learning model without sharing local datasets. Moreover, the proposed incentive mechanism models such interactions to enable communication-efficient FL, which is able to achieve a target accuracy, taking the performance metrics into consideration. Further, we adopt the dual formulation of the learning problem to better decompose the global problem into distributed subproblems for federated computation across the participating clients.

Fig. 1: Crowdsourcing framework for decentralized machine learning (MEC server hosting the platform, global model and aggregator; participating clients with local data training local models; local parameters are passed to the server and global model parameters are fed back over the MBS-MUs association and backhaul).
III. SYSTEM MODEL
Fig. 1 illustrates our proposed system model for the crowdsourcing framework to enable FL. The model consists of a number of mobile clients associated with a base station that hosts a central coordinating server (MEC server), acting as a central entity. The server facilitates the aggregation of the model parameters and feeds back the global model updates in each global iteration. We consider a set of participating clients $\mathcal{K} = \{1, 2, \ldots, K\}$ in the crowdsourcing framework. The crowdsourcer (platform) can interact with mobile clients via an application interface, and aims at leveraging FL to build a global ML model. As an example, consider a case where the crowdsourcer (referred to as the MEC server hereafter, to avoid any confusion) wants to build an ML model. Instead of relying only on the local data available at the MEC server to train the global model, the global model is constructed utilizing the local training data available across several distributed mobile clients. Here, the global model parameter is first shared by the MEC server to train the local models in each participating client. The local models' parameters minimizing the local loss functions are then sent back as feedback, and are aggregated to update the global model parameter. The process continues iteratively until convergence.
A. Federated Learning Background

For FL, we consider unevenly partitioned training data over a large number of participating clients to train the local models under any arbitrary learning algorithm. Each client $k$ stores its local dataset $\mathcal{D}_k$ of size $D_k$. Then, we define the training data size $D = \sum_{k=1}^{K} D_k$. In a typical supervised learning setting, $\mathcal{D}_k$ defines the collection of data samples given as a set of input-output pairs $\{x_i, y_i\}_{i=1}^{D_k}$, where $x_i \in \mathbb{R}^d$ is an input sample vector with $d$ features, and $y_i \in \mathbb{R}$ is the labeled output value for the sample $x_i$. The learning problem, for an input sample vector $x_i$ (e.g., the pixels of an image), is to find the model parameter vector $w \in \mathbb{R}^d$ that characterizes the output $y_i$ (e.g., the labeled output of the image, such as the corresponding product names in a store) with the loss function $f_i(w)$. Some examples of loss functions include $f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$, $y_i \in \mathbb{R}$, for a linear regression problem and $f_i(w) = \max\{0, 1 - y_i x_i^T w\}$, $y_i \in \{-1, 1\}$, for support vector machines. The term $x_i^T w$ is often called a linear mapping function. Therefore, the loss function based on the local data of client $k$, termed the local subproblem, is formulated as

$$J_k(w) = \frac{1}{D_k} \sum_{i=1}^{D_k} f_i(w) + \lambda g(w), \qquad (1)$$

where $w \in \mathbb{R}^d$ is the local model parameter, and $g(\cdot)$ is a regularizer function, commonly expressed as $g(\cdot) = \frac{1}{2}\|\cdot\|^2$; $\lambda \in [0, 1]$. This characterizes the local model in the FL setting.

Algorithm 1 Federated Learning Framework
1: Input: Initialize dual variable $\alpha^0 \in \mathbb{R}^D$, $D_k$, $\forall k \in \mathcal{K}$.
2: for each aggregation round do
3:   for $k \in \mathcal{K}$ do
4:     Solve local subproblems (5) in parallel.
5:     Update local variables as in (7).
6:   end for
7:   Aggregate to update the global parameter as in (8).
8: end for
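To make the control flow of Algorithm 1 concrete, the following is a minimal, self-contained sketch of one possible realization for a ridge-regression instance ($f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$, $g(w) = \frac{1}{2}\|w\|^2$), written in Python/NumPy. It instantiates the local update (5), (7) with a simple fixed-step dual-gradient solver and the aggregation rule (8) introduced below; all variable names, step sizes and the synthetic data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 10, 5                                    # feature dimension, number of clients
D_k = [200] * K                                 # local dataset sizes
D, lam = sum(D_k), 0.1                          # total samples, regularization weight

# Synthetic local datasets X_[k] (d x D_k) and labels y_[k] (assumed for illustration).
w_true = rng.normal(size=d)
X = [rng.normal(size=(d, n)) for n in D_k]
y = [Xk.T @ w_true + 0.1 * rng.normal(size=n) for Xk, n in zip(X, D_k)]

alpha = [np.zeros(n) for n in D_k]              # local dual variables alpha_[k]
phi = np.zeros(d)                               # global parameter phi = (1/(lam*D)) X alpha

def solve_local_subproblem(Xk, yk, alpha_k, phi, local_iters=20, eta=0.1):
    """Approximate local solver for (5): a few fixed-step dual-gradient updates.

    For the ridge case, the dual gradient w.r.t. alpha_i is proportional to
    (y_i - alpha_i - x_i^T w); stopping after a fixed iteration budget
    corresponds to some relative accuracy theta_k in the sense of (6).
    """
    delta = np.zeros_like(alpha_k)
    for _ in range(local_iters):
        w = phi + Xk @ delta / (lam * D)        # client's local view of w(alpha)
        grad = yk - (alpha_k + delta) - Xk.T @ w
        delta += eta * grad
    return delta

for t in range(30):                             # aggregation rounds, steps (S2)-(S8)
    updates = []
    for k in range(K):                          # in practice, clients run in parallel
        delta_k = solve_local_subproblem(X[k], y[k], alpha[k], phi)
        alpha[k] = alpha[k] + delta_k           # local dual update, as in (7)
        updates.append(X[k] @ delta_k / (lam * D))   # phi^t_[k] sent to the MEC server
    phi = phi + sum(updates) / K                # server aggregation, as in (8)

print("distance of recovered model w(alpha) = phi from w_true:",
      round(float(np.linalg.norm(phi - w_true)), 3))
```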
Global Problem: At the MEC server, the global problem can be represented as a finite-sum objective of the form

$$\min_{w \in \mathbb{R}^d} J(w) \quad \text{where} \quad J(w) \triangleq \frac{\sum_{k=1}^{K} D_k J_k(w)}{D}. \qquad (2)$$

Problems with the structure of (2), where we aim to minimize an average of $K$ local objectives, are well known as distributed consensus problems [38].
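As a brief sanity check (a reasoning step we add here; it is implicit in the text), substituting the definition of $J_k(w)$ from (1) into (2) and using that the local datasets are disjoint with $\sum_{k=1}^{K} D_k = D$ gives

$$J(w) = \sum_{k=1}^{K} \frac{D_k}{D}\left(\frac{1}{D_k}\sum_{i \in \mathcal{D}_k} f_i(w) + \lambda g(w)\right) = \frac{1}{D}\sum_{i=1}^{D} f_i(w) + \lambda g(w),$$

which is exactly the per-sample form that is recast in (3) below and then dualized.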
Solution Framework under Federated Learning: We recast the regularized global problem in (2) as

$$\min_{w \in \mathbb{R}^d} J(w) := \frac{1}{D} \sum_{i=1}^{D} f_i(w) + \lambda g(w), \qquad (3)$$

and decompose it as a dual optimization problem in a distributed scenario [39] amongst the $K$ participating clients (the duality gap provides a certificate of the quality of the local solutions and facilitates distributed training). For this, at first, we define $X \in \mathbb{R}^{d \times D_k}$ as a matrix with columns holding the data points for $i \in \mathcal{D}_k$, $\forall k$. Then, the corresponding dual optimization problem of (3) for a convex loss function $f$ is

$$\max_{\alpha \in \mathbb{R}^D} G(\alpha) := \frac{1}{D} \sum_{i=1}^{D} -f_i^*(-\alpha_i) - \lambda g^*(\phi(\alpha)), \qquad (4)$$

where $\alpha \in \mathbb{R}^D$ is the dual variable mapping to the primal candidate vector, $f_i^*$ and $g^*$ are the convex conjugates of $f_i$ and $g$ respectively [40], and $\phi(\alpha) = \frac{1}{\lambda D} X\alpha$. With the optimal value of the dual variable $\alpha^*$ in (4), we have $w(\alpha^*) = \nabla g^*(\phi(\alpha^*))$ as the optimal solution of (3) [39]. For ease of representation, we will use $\phi \in \mathbb{R}^d$ for $\phi(\alpha)$ hereafter. We consider that $g$ is a strongly convex function, i.e., $g^*(\cdot)$ is continuously differentiable. Then, the solution is obtained following an iterative approach to attain a global accuracy $0 \leq \epsilon \leq 1$ (i.e., $\mathbb{E}[G(\alpha) - G(\alpha^*)] < \epsilon$).

Under the distributed setting, we further define data partitioning notations for clients $k \in \mathcal{K}$ to represent the working principle of the framework. Let us define a weight vector $\Delta\alpha_{[k]} \in \mathbb{R}^D$ at the local subproblem $k$ with its elements zero for the unavailable data points. Following the assumption that each $f_i$ is $1$-smooth and $g$ is $1$-strongly convex to ensure convergence, the approximate solution to the local problem $k$ is defined by the dual variables $\alpha_{[k]}$, $\Delta\alpha_{[k]}$, characterized as

$$\max_{\Delta\alpha_{[k]} \in \mathbb{R}^D} \; G_k(\Delta\alpha_{[k]}; \phi, \alpha_{[k]}), \qquad (5)$$

where $G_k(\Delta\alpha_{[k]}; \phi, \alpha_{[k]}) = -\frac{1}{K}\big\langle \nabla\big(\lambda g^*(\phi(\alpha))\big), \Delta\alpha_{[k]} \big\rangle - \frac{\lambda}{2}\big\| \frac{1}{\lambda D} X_{[k]} \Delta\alpha_{[k]} \big\|^2$ is defined with a matrix $X_{[k]}$ whose columns hold the data points for $i \in \mathcal{D}_k$, and are zero-padded otherwise. Each participating client $k \in \mathcal{K}$ iterates over its computational resources using any arbitrary solver to solve its local problem (5) with a local relative accuracy $\theta_k$ that characterizes the quality of the local solution, and produces a random output $\Delta\alpha_{[k]}$ satisfying

$$\mathbb{E}\left[ G_k(\Delta\alpha^*_{[k]}) - G_k(\Delta\alpha_{[k]}) \right] \leq \theta_k \left[ G_k(\Delta\alpha^*_{[k]}) - G_k(0) \right]. \qquad (6)$$
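For concreteness, consider an illustrative case (our example, not spelled out in the text): the linear-regression loss $f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$ with the quadratic regularizer $g(w) = \frac{1}{2}\|w\|^2$. Their convex conjugates are

$$f_i^*(u) = \frac{1}{2}u^2 + u\,y_i, \qquad g^*(\phi) = \frac{1}{2}\|\phi\|^2,$$

so the dual (4) specializes to

$$G(\alpha) = \frac{1}{D}\sum_{i=1}^{D}\Big(y_i\,\alpha_i - \frac{1}{2}\alpha_i^2\Big) - \frac{\lambda}{2}\,\|\phi(\alpha)\|^2,$$

and the primal model is recovered simply as $w(\alpha^*) = \nabla g^*(\phi(\alpha^*)) = \phi(\alpha^*)$; this is the instance used in the sketch after Algorithm 1.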
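Criterion (6) can also be read as an empirical certificate that a client computes for its own solver. The helper below is a hypothetical sketch (the callable `G_k` and both arguments are assumptions for illustration), measuring the relative accuracy actually achieved by an approximate local update when a near-optimal reference update is available.

```python
import numpy as np

def relative_accuracy(G_k, delta_opt, delta_hat):
    """Empirical theta_k in the sense of (6): the fraction of the attainable
    local improvement that the approximate update leaves unrealized.

    G_k       -- callable evaluating the local dual objective in (5)
    delta_opt -- (near-)optimal local update Delta-alpha*_[k]
    delta_hat -- update actually returned by the local solver
    """
    gap_left = G_k(delta_opt) - G_k(delta_hat)
    gap_total = G_k(delta_opt) - G_k(np.zeros_like(delta_opt))
    return gap_left / gap_total   # values near 0 indicate a near-optimal local solution
```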
Note that, with local (relative) accuracy $\theta_k \in [0, 1]$, the value $\theta_k = 1$ suggests that no improvement was made by the local solvers during successive local iterations. Then, the local dual variable is updated as follows:

$$\alpha^{t+1}_{[k]} := \alpha^{t}_{[k]} + \Delta\alpha^{t}_{[k]}, \quad \forall k \in \mathcal{K}. \qquad (7)$$
Correspondingly, each participating client broadcasts the local parameter, defined as $\phi^{t}_{[k]} := \frac{1}{\lambda D} X_{[k]} \Delta\alpha^{t}_{[k]}$, during each round of communication to the MEC server. The MEC server aggregates the local parameters (averaging) with the following rule:

$$\phi^{t+1} := \phi^{t} + \frac{1}{K} \sum_{k=1}^{K} \phi^{t}_{[k]}, \qquad (8)$$

and distributes the global change in $\phi$ to the participating clients, which is used to solve (5) in the next round of local iterations. In this way, we observe the decoupling of the global model parameter from the local clients' data for training a global model (note that we consider the availability of quality data at each participating client for solving the corresponding local subproblem; a related demonstration of the dependency between the normalized data size and accuracy can be found in [41]).

Algorithm 1 briefly summarizes the FL framework as an iterative process to solve the global problem characterized in (3) for a global accuracy level. The iterative process (S2)-(S8) of Algorithm 1 terminates when the global accuracy $\epsilon$ is reached. A participating client $k$ strategically iterates over its local training data $\mathcal{D}_k$ to solve the local subproblem (5) up to an accuracy $\theta_k$ (fewer iterations might not be sufficient to obtain an optimal local solution [16]). In each communication round with the MEC server, the participating clients synchronously pass on their parameters $\phi_{[k]}$ using a shared wireless channel. The MEC server then aggregates the local model parameters $\phi$ as in (8), and broadcasts the global parameters required for the participating clients to solve their local subproblems for the next communication round.

Fig. 2: Interaction environment of the federated learning setting under the crowdsourcing framework (participating clients 1, ..., K hold the local models and exchange parameters with the global model at the MEC server).

Within the framework, consider that each participating client uses any arbitrary optimization algorithm (such as Stochastic Gradient Descent (SGD), Stochastic Average Gradient (SAG), or Stochastic Variance Reduced Gradient (SVRG)) to attain a relative accuracy $\theta$ per local subproblem. Then, for strongly convex objectives, the general upper bound on the number of iterations depends on the local relative accuracy $\theta$ of the local subproblem and the global model's accuracy $\epsilon$ as [12]:

$$I^{g}(\epsilon, \theta) = \frac{\zeta \cdot \log(\frac{1}{\epsilon})}{1 - \theta}, \qquad (9)$$

where the local relative accuracy measures the quality of the local solution as defined in the earlier paragraphs. Further, in this formulation, we have replaced the term $O(\log(\frac{1}{\epsilon}))$ in the numerator with $\zeta \cdot \log(\frac{1}{\epsilon})$, for a constant $\zeta > 0$. For a fixed number of iterations $I^{g}$ at the MEC server to solve the global problem, we observe in (9) that a very high local accuracy (small $\theta$) can significantly improve the global accuracy $\epsilon$. However, each client $k$ has to spend excessive resources in terms of local iterations $I^{l}_{k}$ to attain a small $\theta_k$ accuracy, as

$$I^{l}_{k}(\theta_k) = \gamma_k \log\left(\frac{1}{\theta_k}\right), \qquad (10)$$

where $\gamma_k > 0$ is a parameter choice of client $k$ that depends on the data size and condition number of the local subproblem [42]. Therefore, to address this trade-off, the MEC server can set up an economic interaction environment (a crowdsourcing framework) to motivate the participating clients to improve the local relative accuracy $\theta_k$. Correspondingly, with the increased reward, the participating clients are motivated to attain a better local accuracy $\theta_k$, which, as observed in (9), can improve the global accuracy $\epsilon$ for a fixed number of iterations $I^{g}$ of the MEC server to solve the global problem. In this scenario, to capture the statistical and system-level heterogeneity, the corresponding performance bound in (9) for heterogeneous responses $\theta_k$ can be modified by considering the worst-case response of the participating clients as

$$I^{g}(\epsilon, \theta_k) = \frac{\zeta \cdot \log(\frac{1}{\epsilon})}{1 - \max_{k} \theta_k}, \quad \forall k \in \mathcal{K}. \qquad (11)$$
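As a quick numerical illustration of the trade-off captured by (9)-(11), the short script below (illustrative parameter values; $\zeta$, $\gamma_k$ and the $\theta_k$ values are assumptions, not taken from the paper) evaluates how lowering the relative accuracy target reduces the number of global rounds while inflating the per-client local iteration count, and how the worst-case client dominates the bound in (11).

```python
import math

zeta = 1.0                 # constant replacing the O(.) factor in (9) (assumed)
eps = 0.01                 # target global accuracy
gamma_k = 50.0             # per-client parameter in (10) (assumed)

def global_rounds(eps, theta, zeta=zeta):
    """Upper bound (9) on the number of global iterations."""
    return zeta * math.log(1.0 / eps) / (1.0 - theta)

def local_iters(theta_k, gamma_k=gamma_k):
    """Local iteration count (10) needed to reach relative accuracy theta_k."""
    return gamma_k * math.log(1.0 / theta_k)

for theta in (0.9, 0.5, 0.1):
    print(f"theta={theta:>4}: I_g ~ {global_rounds(eps, theta):6.1f} rounds, "
          f"I_l_k ~ {local_iters(theta):6.1f} local iterations")

# Heterogeneous responses: the worst (largest) theta_k dominates the bound (11).
thetas = [0.2, 0.35, 0.8]
print("worst-case bound (11):", global_rounds(eps, max(thetas)))
```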
