A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
Hien To
Computer Science Dept.
Univ. of Southern California
hto@usc.edu
Gabriel Ghinita
Dept. of Computer Science
UMass Boston
Gabriel.Ghinita@umb.edu
Cyrus Shahabi
Computer Science Dept.
Univ. of Southern California
shahabi@usc.edu
ABSTRACT
Spatial Crowdsourcing (SC) is a transformative platform that engages individuals, groups and communities in the act of collecting, analyzing, and disseminating environmental, social and other spatio-temporal information. The objective of SC is to outsource a set of spatio-temporal tasks to a set of workers, i.e., individuals with mobile devices that perform the tasks by physically traveling to specified locations of interest. However, current solutions require the workers, who in many cases are simply volunteering for a cause, to disclose their locations to untrustworthy entities. In this paper, we introduce a framework for protecting the location privacy of workers participating in SC tasks. We argue that existing location privacy techniques are not sufficient for SC, and we propose a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers. We investigate analytical models and task assignment strategies that balance multiple crucial aspects of SC functionality, such as task completion rate, worker travel distance and system overhead. Extensive experimental results on real-world datasets show that the proposed technique protects workers' location privacy without incurring significant penalties in performance metrics.
1. INTRODUCTION
Recent years have witnessed significant growth in the number of mobile smartphone users, as well as fast development in phone hardware performance, software functionality and communication features. Today's mobile phones are powerful devices that can act as multi-modal sensors, collecting and sharing various types of data, e.g., pictures, video, location, movement speed, direction and acceleration. In this context, Spatial Crowdsourcing (SC) [14] is emerging as a novel and transformative platform that engages individuals, groups and communities in the act of collecting, analyzing, and disseminating environmental, social and other information for which spatio-temporal features are relevant. With SC, task requesters outsource their spatio-temporal tasks to a set of workers, i.e., individuals with mobile devices that perform the tasks by physically traveling to specified locations of interest. The nature of tasks may vary from environmental sensing to capturing images at social or entertainment events. Typically, requesters and workers register with a centralized spatial crowdsourcing server (SC-server) that acts as a broker between parties, and often also plays a role in how tasks are assigned to workers (i.e., scheduling according to some performance criteria). SC has numerous applications in domains such as environmental sensing, journalism, crisis response and urban planning.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 40th International Conference on Very Large Data Bases, September 1st - 5th 2014, Hangzhou, China.
Proceedings of the VLDB Endowment, Vol. 7, No. 10
Copyright 2014 VLDB Endowment 2150-8097/14/06.
Consider an emergency response scenario where the Red Cross (i.e., the requester) is interested in collecting pictures and videos of disaster areas from various locations in a country (e.g., after typhoon Haiyan in the Philippines in 2013). The requester issues a query to an SC-server, and the request is forwarded to workers situated in proximity to the zones of interest. The workers record photos and videos using their mobile phones, and send the results back to the requester. Participatory sensing is another domain where SC is very suitable: mobile users can leverage their sensor-equipped mobile devices to collect environmental or traffic data.
SC is feasible only if workers and tasks are matched effectively, i.e., tasks are completed in a timely fashion, and workers do not need to travel very long distances. To that end, matching at the SC-server must take into account the locations of workers. However, the SC-server may not be trusted, and disclosing individual locations has serious privacy implications [9, 20, 7, 3]. Knowing worker locations, an adversary can stage a broad spectrum of attacks such as physical surveillance and stalking, identity theft, and breach of sensitive information (e.g., an individual's health status, alternative lifestyles, political and religious views). Thus, ensuring location privacy is an essential aspect of SC, because mobile users will not agree to engage in spatial tasks if their privacy is violated.
Several solutions [9, 20, 7] have been proposed to protect location-based queries, i.e., given an individual's location, find points of interest in the proximity without disclosing the actual coordinates. However, in SC, a worker's location is no longer part of the query, but rather the result of a spatial query around the task. In addition, while some work considers queries on private locations in the context of outsourced databases [28, 27], it is assumed that the data owner entity and the querying entity trust each other, with protection being offered only against intermediate service provider entities. This scenario does not apply in SC, as there is no inherent trust relationship between requesters and workers.

We propose a framework for protecting the privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy (DP) [5]. In practice, there may be many SC-servers run by diverse organizations that do not have an established trust relationship with the workers. On the other hand, every worker subscribes to a cellular service provider (CSP) that already has access to the worker locations (e.g., through cell tower triangulation). The CSP signs a contract with its subscribers, which stipulates the terms and conditions of location disclosure. Thus, the CSP can release worker locations to third-party SC-servers in noisy form, according to DP. However, using DP introduces two difficult challenges, as discussed next.
First, the SC-server must match workers to tasks using noisy data, which requires complex strategies to ensure effective task assignment. To create sanitized data releases at the CSP, we adopt the Private Spatial Decomposition (PSD) approach, first introduced in [3]. A PSD is a sanitized spatial index, where each index node contains a noisy count of the workers located within that node's extent. Specifically, we devise a mechanism to create a Worker PSD by extending the Adaptive Grid (AG) technique [23]. To ensure that task assignment has a high success rate, we introduce an analytical model that determines a PSD partition around the task location that, with high probability, includes enough workers to complete the task.
Second, by the nature of the DP protection model, fake entries may need to be created in the PSD. Thus, the SC-server cannot directly contact workers, not even if pseudonyms are used, as merely establishing a network connection to an entity would allow the SC-server to learn whether an entry is real or not, and breach privacy. To address this challenge, we propose the use of geocasting [22] as a means to deliver task requests to workers. Once a PSD partition is identified by the analytical model outlined above, the task request is geocast to all the workers within the partition. Geocast introduces overhead that must be carefully accounted for in the framework design.
Our specific contributions are:
(i). We identify the specific challenges of location privacy in the context of SC, and we propose a framework that achieves differentially-private protection guarantees. To the best of our knowledge, this is the first work to study location privacy for SC.
(ii). We propose an analytical model that measures the probability of task completion with uncertain worker locations, and we devise a search strategy that finds appropriate PSD partitions to ensure a high success rate of task assignment.
(iii). We introduce a geocast mechanism for task request dissemination that is necessary to overcome the restrictions imposed by DP, and we factor the geocast system overhead into the PSD partition search strategy.
(iv). We conduct an extensive set of experiments on real-world datasets, which show that the proposed framework is able to protect workers' location privacy without significantly affecting the effectiveness and efficiency of the SC system.
The remainder of this paper is organized as follows: Section 2 presents necessary background. Section 3 introduces the proposed privacy framework, whereas Sections 4 and 5 detail the proposed solution. Experimental results are presented in Section 6, followed by a survey of related work in Section 7, and conclusions in Section 8.
2. BACKGROUND
2.1 Spatial Crowdsourcing
Spatial Crowdsourcing (SC) [14] is a type of online crowdsourcing where performing a task requires the worker to travel to the location of the task (termed a spatial task). According to the taxonomy in [14], there are two categories of SC, based on how workers are matched to tasks. In Worker Selected Tasks (WST) mode, the SC-server publishes the spatial tasks online, and workers can autonomously choose any tasks in their vicinity without the need to coordinate with the SC-server. In Server Assigned Tasks (SAT) mode, online workers send their locations to the SC-server, and the SC-server assigns tasks to nearby workers.
WST is the simpler protocol, and it does not require workers to share their locations with the SC-server. However, the assignment is often sub-optimal, as workers do not have a global view of the system. Workers typically choose the task closest to them, which may cause multiple workers to travel to the same task while many other tasks remain unassigned. The SAT mode incurs the overhead of running complex matching algorithms at the SC-server, but the best-suited worker is selected for each task. This requires the SC-server to know the workers' locations, which poses a privacy threat.
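The following Python sketch makes the WST/SAT contrast concrete. It is a toy illustration, not the paper's assignment algorithm, and the function names and coordinates are hypothetical. Under WST, each worker greedily picks her closest task. Under SAT, a server that knows every location chooses the one-to-one assignment with minimum total travel. The brute-force search is only meant for tiny inputs.

```python
import math
from itertools import permutations


def wst_choices(workers, tasks):
    """WST: every worker independently picks the task closest to her."""
    return {
        w: min(range(len(tasks)), key=lambda t: math.dist(workers[w], tasks[t]))
        for w in range(len(workers))
    }


def sat_assignment(workers, tasks):
    """SAT: the server sees all locations and chooses the one-to-one
    worker-to-task assignment with minimum total travel distance.
    Brute force over permutations; assumes len(workers) <= len(tasks)."""
    best, best_cost = None, math.inf
    for perm in permutations(range(len(tasks)), len(workers)):
        cost = sum(math.dist(workers[w], tasks[t]) for w, t in enumerate(perm))
        if cost < best_cost:
            best, best_cost = dict(enumerate(perm)), cost
    return best, best_cost


workers = [(0.0, 0.0), (1.0, 0.0)]
tasks = [(0.5, 0.0), (5.0, 0.0)]
print(wst_choices(workers, tasks))     # {0: 0, 1: 0}: both workers pick task 0
print(sat_assignment(workers, tasks))  # each task gets its own worker
```

Under WST, task 1 is left unassigned while both workers converge on task 0. SAT avoids this, but only because the server sees the exact coordinates.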
In our work, we consider the SAT mode, but we also provide location privacy protection for the workers. Instead of directly disclosing their coordinates to the SC-server, workers' locations are first pooled together by a CSP and sanitized according to differential privacy. This introduces significant challenges, as the SC-server has to employ far more complex task assignment strategies that must take into account the uncertain nature of the received location data.
2.2 Differential Privacy
Differential Privacy (DP) [5] has emerged as the de facto standard in data privacy, thanks to its strong protection guarantees rooted in statistical analysis. DP is a semantic model which provides protection against realistic adversaries with access to background information. DP ensures that an adversary is not able to learn from the sanitized data whether a particular individual is present or not in the original data, regardless of the adversary's prior knowledge.

DP allows interaction with a database only by means of aggregate (e.g., count, sum) queries. Random noise is added to each query result to preserve privacy, such that an adversary that attempts to attack the privacy of some individual worker w will not be able to distinguish from the set of query results (called a transcript) whether a record representing w is present or not in the database.
Definition 1 (ε-indistinguishability). Consider that a database produces transcript U on the set of queries $QS = \{Q_1, Q_2, \ldots, Q_q\}$, and let ε > 0 be an arbitrarily small real constant. Then, transcript U satisfies ε-indistinguishability if for every pair of sibling datasets $D_1$, $D_2$ such that $|D_1| = |D_2|$ and $D_1$, $D_2$ differ in only one record, it holds that

$$\ln \frac{\Pr[QS^{D_1} = U]}{\Pr[QS^{D_2} = U]} \le \varepsilon$$
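The Python sketch below illustrates Definition 1 numerically for the simplest case, a single count query (q = 1). It compares the log-densities of Laplace-noised answers on two sibling datasets whose true counts differ by the sensitivity. The function names, the example counts and the use of Laplace noise (introduced formally below) are assumptions for illustration. Because the noisy outputs are continuous, the sketch compares densities rather than the probabilities of Definition 1.

```python
import math


def laplace_log_pdf(x, mean, scale):
    """Log-density of a Laplace(mean, scale) distribution at x."""
    return -abs(x - mean) / scale - math.log(2 * scale)


def worst_log_ratio(count1, count2, sensitivity, epsilon, outputs):
    """Largest |ln(p1(u) / p2(u))| over the given outputs u, where p1 and p2
    are the densities of the noisy answers on two sibling datasets."""
    scale = sensitivity / epsilon
    return max(
        abs(laplace_log_pdf(u, count1, scale) - laplace_log_pdf(u, count2, scale))
        for u in outputs
    )


# Sibling datasets whose true counts differ by the sensitivity (2).
outputs = [x / 10 for x in range(-200, 400)]
ratio = worst_log_ratio(10, 12, sensitivity=2, epsilon=0.5, outputs=outputs)
print(ratio)  # never exceeds epsilon = 0.5
```

The bound is reached for outputs outside the interval between the two true counts. This is why the noise scale must grow with the sensitivity.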

In other words, an attacker cannot learn whether the transcript was obtained by answering the query set QS on dataset $D_1$ or $D_2$. Parameter ε is called the privacy budget, and specifies the amount of protection required, with smaller values corresponding to stricter privacy protection. To achieve ε-indistinguishability, DP injects noise into each query result, and the amount of noise required is proportional to the sensitivity of the query set QS, formally defined as:
Definition 2 ($L_1$-Sensitivity). Given any arbitrary sibling datasets $D_1$ and $D_2$, the sensitivity of query set QS is the maximum change in the query results of $D_1$ and $D_2$:

$$\sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| Q_i^{D_1} - Q_i^{D_2} \right|$$
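The sketch below estimates Definition 2 empirically for one query set: all cell counts of a single grid level. Siblings are formed by substitution, as in Definition 1 ($|D_1| = |D_2|$). The sketch only explores siblings of one example dataset, so it yields a lower bound on σ(QS), not the maximum over all pairs. The helper names, the unit-square domain and the example points are assumptions for illustration.

```python
def grid_counts(points, m):
    """Counts of points per cell of an m x m grid over the unit square."""
    counts = [0] * (m * m)
    for x, y in points:
        i = min(int(x * m), m - 1)
        j = min(int(y * m), m - 1)
        counts[i * m + j] += 1
    return counts


def empirical_sensitivity(points, replacements, m):
    """Max L1 change of the grid counts when one record of `points` is
    replaced by one of `replacements` (a sibling of the same size)."""
    base = grid_counts(points, m)
    worst = 0
    for k in range(len(points)):
        for r in replacements:
            sibling = points[:k] + [r] + points[k + 1:]
            change = sum(abs(a - b) for a, b in zip(base, grid_counts(sibling, m)))
            worst = max(worst, change)
    return worst


points = [(0.1, 0.1), (0.6, 0.6)]
centers = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
print(empirical_sensitivity(points, centers, m=2))  # 2: one cell loses, one gains
```

Moving a single worker can decrement one cell and increment another. This is the source of the per-level sensitivity of 2 used for PSDs in Section 2.3.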
A sufficient condition to achieve differential privacy with parameter ε is to add to each query result randomly distributed Laplace noise with mean 0 and scale λ = σ(QS)/ε [6].
Typically, the interaction with a dataset consists of a series of analyses (i.e., transcripts) $A_i$, each required to satisfy $\varepsilon_i$-differential privacy. Then, the privacy level of the resulting analysis can be computed as follows:

Theorem 1 (Sequential Composition [19]). Let $A_i$ be a set of analyses such that each provides $\varepsilon_i$-DP. Then, running in sequence all analyses $A_i$ provides $(\sum_i \varepsilon_i)$-DP.

Theorem 2 (Parallel Composition [19]). If $D_i$ are disjoint subsets of the original database, and $A_i$ is a set of analyses each providing $\varepsilon_i$-DP, then applying each analysis $A_i$ on partition $D_i$ provides $(\max_i \varepsilon_i)$-DP.
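The two theorems combine into simple budget accounting for a hierarchical PSD. Parallel composition applies within a level, where cells are disjoint, and sequential composition applies across levels. The Python sketch below uses hypothetical function names and assumes every level has at least one cell.

```python
def sequential(epsilons):
    """Theorem 1: budgets of analyses run in sequence on the same data add up."""
    return sum(epsilons)


def parallel(epsilons):
    """Theorem 2: analyses on disjoint partitions cost the maximum budget."""
    return max(epsilons)


def hierarchy_budget(levels):
    """`levels` holds, for each index level, the budgets spent on its
    disjoint cells; the levels are queried one after another."""
    return sequential(parallel(cells) for cells in levels)


# Two-level grid with epsilon = 0.5 and alpha = 0.5 (as in Figure 2):
# 4 level-1 cells and 21 level-2 cells, each level spending 0.25.
print(hierarchy_budget([[0.25] * 4, [0.25] * 21]))  # 0.5
```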
2.3 Private Spatial Decompositions (PSD)
The work in [3] introduced the concept of Private Spatial Decompositions (PSD) to release spatial datasets in a DP-compliant manner. A PSD is a spatial index transformed according to DP, where each index node is obtained by releasing a noisy count of the data points enclosed by that node's extent. Various index types such as grids, quad-trees or k-d trees [24] can be used as a basis for PSD.
The accuracy of a PSD is heavily influenced by the type of PSD structure and its parameters (e.g., height, fan-out). With space-based partitioning PSDs, the split position for a node does not depend on worker locations. This category includes flat structures such as grids, or hierarchical ones such as BSP-trees (Binary Space Partitioning) and quad-trees [24]. The privacy budget needs to be consumed only when counting the workers in each index node. Typically, all nodes at the same index level have non-overlapping extents, which yields a constant and low sensitivity of 2 per level (i.e., changing a single location in the data may affect at most two partitions in a level). The budget is best distributed across levels according to the geometric allocation [3], where leaf nodes receive more budget than higher levels. The sequential composition theorem applies across nodes on the same root-to-leaf path, whereas parallel composition applies to disjoint paths in the hierarchy. Space-based PSDs are simple to construct, but can become unbalanced.
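One way to realize a geometric allocation is sketched below in Python. Each level receives a constant factor more budget than its parent level, and the shares are normalized to sum to ε, as sequential composition along a root-to-leaf path requires. The factor $2^{1/3}$ is an illustrative assumption, not necessarily the allocation derived in [3].

```python
def geometric_allocation(epsilon, height, ratio=2 ** (1 / 3)):
    """Split epsilon over levels 0 (root) .. height (leaves) so that each level
    gets `ratio` times the budget of its parent level; shares sum to epsilon.
    The default ratio is an assumption for illustration."""
    weights = [ratio ** level for level in range(height + 1)]
    total = sum(weights)
    return [epsilon * w / total for w in weights]


budgets = geometric_allocation(1.0, height=3)
print([round(b, 3) for b in budgets])  # increasing from root to leaves
```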
Object-based structures such as k-d trees and R-trees [3] perform splits of nodes based on the placement of data points. To ensure privacy, split decisions must also be made according to DP, and a significant part of the budget may be used in the process. Typically, the exponential mechanism [3] is used to assign a merit score to each candidate split point according to some cost function (e.g., distance from the median in the case of k-d trees), and one value is randomly picked with a probability that increases with its score. The budget must be split between protecting node counts and building the index structure. Object-based PSDs are more balanced in theory, but they are not very robust, in the sense that accuracy can decrease abruptly with only slight changes of the PSD parameters, or for certain input dataset distributions.
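The following Python sketch illustrates the exponential mechanism for choosing a k-d tree split along one axis. It rests on two simplifying assumptions of ours:

- Candidate split points come from a fixed, data-independent set of evenly spaced positions over a public domain [lo, hi].
- The score of a candidate is minus the distance between its rank and the median rank.

Replacing one record changes this score by at most 1. Candidates are therefore sampled with probability proportional to exp(ε·score/2).

```python
import math
import random


def private_split(values, lo, hi, epsilon, num_candidates=100, rng=None):
    """Pick a split position near the median of `values` using the
    exponential mechanism over evenly spaced candidates in [lo, hi]."""
    rng = rng or random.Random()
    n = len(values)
    candidates = [lo + (hi - lo) * k / num_candidates for k in range(num_candidates + 1)]
    # Score: minus the distance of the candidate's rank from the median rank.
    scores = [-abs(sum(v < c for v in values) - n / 2) for c in candidates]
    # The score has sensitivity 1, hence the factor epsilon / 2.
    weights = [math.exp(epsilon * s / 2) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


values = list(range(100))
print(private_split(values, 0, 100, epsilon=10, rng=random.Random(1)))
```

A large ε concentrates the choice near the true median. A small ε spreads it over the whole domain, which is the robustness issue noted above.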
The recent work in [23] compares tree-based methods with multi-level grids, and shows that two-level grids tend to perform better than their recursive partitioning counterparts. The paper also proposes an Adaptive Grid (AG) approach, where the granularity of the second-level grid is chosen based on the noisy counts obtained in the first level (sequential composition is applied). AG is a hybrid which inherits the simplicity and robustness of space-based PSDs, but still uses a small amount of data-dependent information in choosing the granularity for the second level. In our work, we adapt the AG method to address SC-specific requirements.
3. PRIVACY FRAMEWORK
Section 3.1 presents the system model and the workflow for privacy-preserving SC. Section 3.2 outlines the privacy model and assumptions. Section 3.3 discusses design challenges and associated performance metrics.
3.1 System Model
We consider the problem of privacy-preserving SC task assignment in the SAT mode. Figure 1 shows the proposed system architecture. Workers send their locations (Step 0) to a trusted cellular service provider (CSP), which collects updates and releases a PSD according to a privacy budget mutually agreed upon with the workers. The PSD is accessed by the SC-server (Step 1), which also receives tasks from a number of requesters (Step 2). For simplicity, we focus on the single-SC-server case, but our system model can support multiple SC-servers.
When the SC-server receives a task t, it queries the PSD to determine a geocast region (GR) that, with high probability, encloses workers in relative proximity to t. Due to the uncertain nature of the PSD, this is a challenging process, which will be detailed later in Section 5. Next, the SC-server initiates a geocast communication [22] process (Step 3) to disseminate t to all workers within GR. According to DP, sanitizing a dataset requires the creation of fake locations in the PSD. If the SC-server were allowed to directly contact workers, then a failure to establish a communication channel would breach privacy, as the SC-server would be able to distinguish fake workers from real ones. Using geocast is a unique feature of our framework, which is necessary to achieve protection.
Geocast can be performed either with the help of the CSP infrastructure, or through a mobile ad-hoc network where the CSP contacts a single worker in the GR, and the message is then disseminated on a hop-by-hop basis to the entire GR. The latter approach keeps CSP overhead low, and can reduce operation costs for workers.
Upon receiving request t, a worker w decides whether to perform the task or not. If yes (Step 4), she sends a consent message to the SC-server confirming w's availability (alternatively, the consent can be sent directly to the requester). If w is not willing to participate in the task, then no consent is sent, and no information about the worker is disclosed.
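Steps 3 and 4 of the workflow can be mimicked in a few lines of Python. In this toy sketch (all names are hypothetical), geocast delivery runs over the real worker population, standing in for the CSP or ad-hoc network. The SC-server therefore never addresses individual PSD entries, and the only information it gets back is the list of consenting workers.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    wid: int
    x: float
    y: float
    willing: bool  # whether the worker would accept the task


def in_region(worker, region):
    """True if the worker lies in the rectangle region = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return x0 <= worker.x < x1 and y0 <= worker.y < y1


def geocast(real_workers, region):
    """Step 3: the message reaches every real worker inside GR. Fake PSD
    entries are never contacted, so no failed connection can reveal them."""
    return [w for w in real_workers if in_region(w, region)]


def consents(notified):
    """Step 4: only willing workers reply; silent workers reveal nothing."""
    return [w.wid for w in notified if w.willing]


workers = [Worker(1, 0.2, 0.3, True), Worker(2, 0.4, 0.1, False),
           Worker(3, 0.9, 0.9, True)]
notified = geocast(workers, region=(0.0, 0.0, 0.5, 0.5))
print(len(notified), consents(notified))  # 2 [1]
```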

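[Diagram: Workers report their locations to the Cell Service Provider, which holds the worker database (0. Report Locations); the CSP publishes a PSD to the SC-Server (1. Sanitized Release); Requesters send a task to the SC-Server (2. Task Request t); the task is geocast to the workers in GR (3. Geocast {t, GR}); willing workers reply to the SC-Server (4. Consent).]
Figure 1: Privacy framework for spatial crowdsourcing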
3.2 Privacy Model and Assumptions
Our specific objective is to protect both the location and the identity of workers during task assignment. Once a worker consents to a task, the worker herself may directly disclose information to the task requester (e.g., to enable a communication channel between worker and requester). However, such additional disclosure is outside our scope, as each worker has the right to disclose his or her individual information. Our focus is on what happens prior to consent, when worker location and identity must be protected from both task requesters and the SC-server.
Focusing on the SC assignment step is important, given the fact that SC workers have to travel to the task location. Mere completion of a task discloses the fact that some worker must have been at that location, and this sort of disclosure is unavoidable in SC. To protect her location after consent, a worker can still enjoy some form of identity protection (e.g., using pseudonyms and anonymous routing), for which solutions are already available (e.g., Tor). On the other hand, no solution exists to date for the more challenging problem of privacy-preserving task assignment; hence, we direct our efforts in this direction.
Furthermore, focusing on task assignment also makes sense from a disclosure volume standpoint. During assignment, all workers are candidates for participation; therefore, the locations of all workers would be exposed in the absence of a privacy-preserving mechanism. On the other hand, after task request dissemination, only a few workers will participate in task completion, and only if they give their explicit consent.
Workers cannot trust the SC-server, especially as there may be many such entities with diverse backgrounds, e.g., private companies, non-profits, government organizations, academic institutions. On the other hand, the CSP already has a signed agreement with workers through the service contract, so a trust relationship is already established, as well as mutually agreed-upon rules for data disclosure. Furthermore, the CSP already knows where subscribers are, e.g., using cell tower triangulation, so worker location reporting does not introduce additional disclosure.
However, the CSP has no expertise, and perhaps no financial interest, to host an SC service, which needs to deal with a diverse set of issues such as interacting with various task requester categories, managing profiles (e.g., some workers may only volunteer for environmental tasks), etc. The role of the CSP is to aggregate locations from subscribed workers, transform them according to DP, and release the data in sanitized form to one or more SC-servers for assignment. As multiple SC-servers can use the same PSD, it is practical for the CSP to provide PSDs for a small fee, e.g., a percentage of the workers' payment, or a tax incentive in the case of public-interest SC applications.
3.3 Design Goals and Performance Metrics
Protecting worker locations significantly complicates task assignment, and may reduce the effectiveness and efficiency of worker-task matching. Due to the nature of DP, it is possible for a region to contain no workers, even if the PSD shows a positive count. Therefore, no workers (or an insufficient number thereof) may be notified of the task request, and the task may not be completed. Alternatively, a worker may be notified of the task even though she is far away from the task location, whereas a nearer worker does not receive the request. Finally, in the non-private SAT case, only one selected worker, whose location and identity are known, is notified of the task request. With location protection, many redundant messages may need to be sent, increasing system overhead.
Therefore, we focus on the following performance metrics:
Assignment Success Rate (ASR). Due to PSD data uncertainty, the SC-server may fail to assign workers to tasks (e.g., no worker is reached, or the task is too far and workers do not accept it). ASR measures the ratio of tasks accepted by a worker¹ to the total number of task requests. The challenge is to keep ASR close to 100%.
Worker Travel Distance (WTD). The SC-server is no longer able to accurately evaluate worker-task distance; hence, workers may have to travel long distances to tasks. The challenge is to keep the worker travel distance low, even when exact worker locations are not known.
System Overhead. Dealing with imprecise locations increases the complexity of assignment algorithms, which poses scalability problems. A significant metric to measure overhead is the average number of notified workers (ANW). This number affects both the communication overhead required to geocast task requests and the computational overhead of the matching algorithm, which depends on how many workers need to be notified of a task request.
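Given per-task outcomes of an experiment, the three metrics can be computed as in the Python sketch below. The record layout is a hypothetical one of ours. We assume WTD is averaged over the tasks that were actually accepted, and ANW over all task requests.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class TaskOutcome:
    task: Point
    notified: int               # workers reached by the geocast
    acceptor: Optional[Point]   # location of the accepting worker, if any


def performance_metrics(outcomes):
    """Return (ASR, WTD, ANW) for a non-empty list of task outcomes."""
    accepted = [o for o in outcomes if o.acceptor is not None]
    asr = len(accepted) / len(outcomes)
    wtd = (sum(math.dist(o.task, o.acceptor) for o in accepted) / len(accepted)
           if accepted else float("nan"))
    anw = sum(o.notified for o in outcomes) / len(outcomes)
    return asr, wtd, anw


outcomes = [TaskOutcome((0.0, 0.0), 4, (3.0, 4.0)),
            TaskOutcome((1.0, 1.0), 6, None)]
print(performance_metrics(outcomes))  # (0.5, 5.0, 5.0)
```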
4. BUILDING THE WORKER PSD
The first step consists of building a PSD (at the CSP side) to be later used for task assignment at the SC-server. Building the PSD is an essential step, because it determines how accurate the released data is, which in turn affects ASR, WTD and ANW. In this section, we modify the state-of-the-art Adaptive Grid (AG) method proposed in [23] to address the specific requirements of the SC framework. Table 1 summarizes the notations used in our paper.
PSDs based on uniform grids treat all regions in the dataset identically, despite large variances in location density. As a result, they over-partition the space in sparse regions, and under-partition in dense regions. AG avoids these drawbacks by using a two-level grid and variable cell granularity. At the first level, AG creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain. AG uses a data-independent heuristic to choose the level-1 granularity as

$$m_1 = \max\left(10, \left\lceil \frac{1}{4} \sqrt{\frac{N \times \varepsilon}{k_1}} \right\rceil \right)$$

where N is the total number of locations and $k_1 = 10$ [23]. Next, AG issues $m_1^2$ count queries, one for each level-1 cell, using a fraction of the total privacy budget: $\varepsilon_1 = \varepsilon \times \alpha$, where $0 < \alpha < 1$. AG then partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell:

$$m_2 = \left\lceil \sqrt{\frac{N' \times \varepsilon_2}{k_2}} \right\rceil \qquad (1)$$

where $\varepsilon_2 = \varepsilon - \varepsilon_1$ is the remaining budget, and the constant is set empirically to $k_2 = 5$. Parameter α determines how the privacy budget is divided between the two levels.

Symbol | Definition
$\varepsilon$, $\varepsilon_i$ | Total privacy budget and level-i budget
α | AG budget split; α = 0.5 means $\varepsilon_1 = \varepsilon_2$
N | Total number of workers
$N'$ | Noisy worker count of a level-1 cell
$m_i \times m_i$ | Level-i grid granularity
$\bar{n}$ | Expected noisy worker count of a level-2 cell
t | A task or its location, used interchangeably
$c_i$ | A level-2 cell
$n_{c_i}$ | Noisy worker count of $c_i$
$p_a^{c_i}$ | Acceptance rate of workers within $c_i$
$c'_i$ | Sub-cell of cell $c_i$
Table 1: Summary of Notations

¹ ASR does not capture worker reliability: tasks may still fail to complete after being accepted. Our focus is on assignment success; reliability is outside our scope.
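The two granularity heuristics translate directly into code. The Python sketch below follows the equations literally, including the ceiling in Eq. (1). Treating a non-positive noisy count as a single 1×1 cell is our own assumption, since Laplace noise can drive $N'$ below zero. For $k_2 = 5$, the worked values in Figure 2 and Table 2a match rounding down rather than the ceiling, so this sketch can return a grid one step finer for those inputs.

```python
import math


def level1_granularity(n_locations, epsilon, k1=10):
    """m1 = max(10, ceil(1/4 * sqrt(N * epsilon / k1)))."""
    return max(10, math.ceil(0.25 * math.sqrt(n_locations * epsilon / k1)))


def level2_granularity(noisy_count, epsilon2, k2=5):
    """Eq. (1): m2 = ceil(sqrt(N' * epsilon2 / k2)). A non-positive noisy
    count (possible under Laplace noise) yields a single cell (assumption)."""
    if noisy_count <= 0:
        return 1
    return math.ceil(math.sqrt(noisy_count * epsilon2 / k2))


print(level1_granularity(1_000_000, 1.0))              # 80
print(level2_granularity(100, 0.5, k2=math.sqrt(2)))   # 6
```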
Figure 2 shows a snapshot of an adaptive grid, with four level-1 cells A, B, C, D. Constructing a differentially private AG requires two steps. First, the noisy counts $N'$ of A, B, C, D are computed by adding random Laplace noise with mean 0 and scale $\lambda_1 = 2/\varepsilon_1$ to the actual counts of these cells. Second, based on the noisy counts, level-1 cells are further split into level-2 cells. According to Eq. (1), cell D, which has noisy count 200, is partitioned according to a 3×3 grid, while the granularity for the other cells is 2×2. Thereafter, AG adds to each level-2 cell ($c_i$, i = 1..21) random Laplace noise with mean 0 and scale $\lambda_2 = 2/\varepsilon_2$. Finally, their corresponding noisy counts $n_{c_i}$, together with the structure of the AG, are published. By Theorem 2, each level of the AG provides $\varepsilon_i$-DP, since its cells are disjoint; by Theorem 1, the sanitized release of both levels provides $(\varepsilon_1 + \varepsilon_2)$-DP, i.e., ε-DP.
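The following Python sketch puts these pieces together and builds a two-level adaptive grid in three steps:

1. Add Laplace noise with scale $2/\varepsilon_1$ to each level-1 count.
2. Apply Eq. (1) to each noisy count to pick the level-2 granularity.
3. Add noise with scale $2/\varepsilon_2$ to each level-2 count.

It is an illustration under our own assumptions, not the authors' implementation. The domain is a rectangle given as bounds, N is treated as public, a non-positive noisy count yields a single cell, and the published structure is a plain dictionary.

```python
import math
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def bin_of(v, lo, hi, m):
    """Index of the equal-width bin (one of m over [lo, hi]) containing v."""
    return max(0, min(int((v - lo) / (hi - lo) * m), m - 1))


def build_adaptive_grid(points, bounds, epsilon, alpha=0.5, k1=10, k2=5, rng=None):
    """Return the published AG: per level-1 cell, its noisy count N', its
    level-2 granularity m2 and the noisy level-2 counts keyed by (a, b)."""
    rng = rng or random.Random()
    x0, y0, x1, y1 = bounds
    eps1 = alpha * epsilon            # level-1 budget
    eps2 = epsilon - eps1             # level-2 budget
    m1 = max(10, math.ceil(0.25 * math.sqrt(len(points) * epsilon / k1)))
    w, h = (x1 - x0) / m1, (y1 - y0) / m1

    # Group the true locations by level-1 cell (never published).
    members = {}
    for x, y in points:
        key = (bin_of(x, x0, x1, m1), bin_of(y, y0, y1, m1))
        members.setdefault(key, []).append((x, y))

    release = {}
    for i in range(m1):
        for j in range(m1):
            inside = members.get((i, j), [])
            noisy1 = len(inside) + laplace(2 / eps1, rng)   # sensitivity 2
            m2 = 1 if noisy1 <= 0 else math.ceil(math.sqrt(noisy1 * eps2 / k2))
            cx0, cy0 = x0 + i * w, y0 + j * h
            counts = {(a, b): 0 for a in range(m2) for b in range(m2)}
            for x, y in inside:
                counts[(bin_of(x, cx0, cx0 + w, m2), bin_of(y, cy0, cy0 + h, m2))] += 1
            noisy2 = {cell: c + laplace(2 / eps2, rng) for cell, c in counts.items()}
            release[(i, j)] = {"noisy_count": noisy1, "m2": m2, "level2": noisy2}
    return release


rng = random.Random(3)
pts = [(rng.random(), rng.random()) for _ in range(1000)]
ag = build_adaptive_grid(pts, (0.0, 0.0, 1.0, 1.0), epsilon=0.5, rng=random.Random(4))
print(len(ag), max(cell["m2"] for cell in ag.values()))
```

Choosing $m_2$ from the noisy count, rather than the true one, is what keeps the level-2 structure itself differentially private (post-processing).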
[Diagram: level-1 cells A, B, C, D with noisy counts $N'_A = 100$, $N'_B = 100$, $N'_C = 100$, $N'_D = 200$; D is split into 3×3 and A, B, C into 2×2 level-2 cells, labeled $c_1, \ldots, c_{21}$.]
Figure 2: A snapshot of adaptive grid (ε = 0.5, α = 0.5)
Although AG was shown to yield good results for general-purpose spatial queries [23], it is not directly applicable to SC, due to its rigidity in choosing its parameters. Specifically, the granularity $m_2$ of the level-2 grid is too coarse, leading to large geocast areas and high communication overhead, as we show next. According to Eq. (1), the expected number of workers (i.e., noisy count) in a level-2 cell is:

$$\bar{n} = N'/m_2^2 \approx k_2/\varepsilon_2$$
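The approximation follows from Eq. (1) by ignoring the rounding of $m_2$ to an integer:

```latex
m_2 \approx \sqrt{\frac{N' \times \varepsilon_2}{k_2}}
\;\Longrightarrow\;
m_2^2 \approx \frac{N' \times \varepsilon_2}{k_2}
\;\Longrightarrow\;
\bar{n} = \frac{N'}{m_2^2} \approx \frac{N' \, k_2}{N' \times \varepsilon_2} = \frac{k_2}{\varepsilon_2}
```

For example, with $k_2 = 5$, ε = 0.1 and α = 0.5 (so $\varepsilon_2 = 0.05$), this gives $\bar{n} \approx 100$, independently of $N'$.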
Table 2a presents different values of $m_2$ and $\bar{n}$ when varying the total budget ε with α = 0.5. Note that the values of $\bar{n}$ are rather large, especially for more restrictive privacy settings (i.e., lower ε). For ε = 0.1, $\bar{n}$ is 100. In practice, a geocast region is likely to include multiple PSD cells; hence, 100 is a lower bound on the ANW, while its typical values can grow much higher, leading to prohibitive communication cost.
(a) Original AG ($k_2 = 5$)
ε | $\varepsilon_2$ | $m_2$ | $\bar{n}$
1 | 0.5 | 3 | 11
0.5 | 0.25 | 2 | 25
0.1 | 0.05 | 1 | 100

(b) Modified AG ($k_2 = \sqrt{2}$)
ε | $\varepsilon_2$ | $m_2$ | $\bar{n}$
1 | 0.5 | 6 | 2.8
0.5 | 0.25 | 5 | 5.6
0.1 | 0.05 | 2 | 28.2

Table 2: Granularity $m_2$ and average count per cell $\bar{n}$ ($N' = 100$)
We propose a more suitable heuristic for choosing $k_2$. Recall that the primary requirement of SC task assignment is to achieve high ASR. To that end, we want to ensure that the task request is geocast in a non-empty region, i.e., a region whose real worker count is strictly positive. According to the Laplace mechanism of DP, each PSD count is the sum of the real count and the added noise. Given the level-2 privacy budget $\varepsilon_2$, we can also quantify the distribution of the added noise, which has standard deviation $\mu = \sqrt{2}/\varepsilon_2$. Therefore, if the PSD count is larger than μ, then with high probability there will be at least one worker in the level-2 cell.
We increase the granularity $m_2$ in order to decrease overhead, but only to the point where there is at least one worker in a cell. Denote by $count_{PSD}$ the value reported by the PSD for a certain level-2 cell. Given a $Lap(1/\varepsilon_2)$ noise distribution, the probability that the real count is larger than zero (for $count_{PSD} \ge 0$) is:

$$p_h = 1 - \frac{1}{2} \exp\left(-\frac{count_{PSD}}{1/\varepsilon_2}\right)$$
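The expression for $p_h$ is the probability that the Laplace noise η is smaller than the published count, i.e., $\Pr[\eta < count_{PSD}]$ for $\eta \sim Lap(1/\varepsilon_2)$ and $count_{PSD} \ge 0$. The Python sketch below evaluates the closed form and cross-checks it by Monte Carlo sampling. The function names are ours.

```python
import math
import random


def p_nonempty(count_psd, epsilon2):
    """Closed form 1 - 1/2 * exp(-count_psd * epsilon2), for count_psd >= 0."""
    return 1 - 0.5 * math.exp(-count_psd * epsilon2)


def p_nonempty_mc(count_psd, epsilon2, trials=200_000, seed=0):
    """Monte Carlo estimate of Pr[eta < count_psd] with eta ~ Lap(1/epsilon2)."""
    rng = random.Random(seed)
    scale = 1 / epsilon2
    hits = sum(
        scale * (rng.expovariate(1.0) - rng.expovariate(1.0)) < count_psd
        for _ in range(trials)
    )
    return hits / trials


eps2 = 0.25
count = math.sqrt(2) / eps2  # published count equal to the noise std. deviation
print(round(p_nonempty(count, eps2), 3), round(p_nonempty_mc(count, eps2), 3))
```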
Furthermore, we want the PSD count to be larger than the noise, i.e., $\bar{n} = k_2/\varepsilon_2 \ge \sqrt{2}/\varepsilon_2$, so at the limit we set $k_2 = \sqrt{2}$. The resulting probability of having non-empty cells is $p_h = 1 - \frac{1}{2}\exp(-\sqrt{2}) \approx 0.88$. According to Eq. (1), the corresponding granularity is $m_2 = \left\lceil \sqrt{N' \varepsilon_2 / \sqrt{2}} \right\rceil$.
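The modified setting can be tabulated with the short Python sketch below. It applies Eq. (1) with $k_2 = \sqrt{2}$ and α = 0.5 to a level-1 cell with $N' = 100$, computes $\bar{n}$ as $k_2/\varepsilon_2$, and evaluates $p_h$ at $count_{PSD} = \bar{n}$. The function name is ours. Its output agrees with Table 2b (with $\bar{n}$ truncated to one decimal), and $p_h$ stays at about 0.88 for every ε.

```python
import math


def modified_ag_row(epsilon, noisy_count=100, alpha=0.5, k2=math.sqrt(2)):
    """Return (eps2, m2, n_bar, p_h) for the modified AG heuristic."""
    eps2 = (1 - alpha) * epsilon
    m2 = math.ceil(math.sqrt(noisy_count * eps2 / k2))   # Eq. (1)
    n_bar = k2 / eps2                                     # expected count per cell
    p_h = 1 - 0.5 * math.exp(-n_bar * eps2)              # = 1 - exp(-sqrt(2)) / 2
    return eps2, m2, n_bar, p_h


for eps in (1, 0.5, 0.1):
    eps2, m2, n_bar, p_h = modified_ag_row(eps)
    print(eps, eps2, m2, round(n_bar, 2), round(p_h, 2))
```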
In summary, we modify AG by carefully reducing the granularity threshold at level 2 such that ANW is reduced, while the probability for each level-2 cell to contain a real worker is at least 88%. Table 2b shows that this new setting significantly reduces $\bar{n}$ and, as a result, ANW. Next, we present a search strategy which groups cells together such that the achieved ASR is above a given threshold.
5. TASK ASSIGNMENT
When a request for a task t is posted, the SC-server
queries the PSD and determines a geocast region GR where
the task is disseminated. The goal of the SC-server is to
obtain a high success rate for task assignment, while at the
same time reducing the worker travel distance WTD and
request dissemination overhead ANW.
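As a simplified, hypothetical illustration of grouping PSD cells into a geocast region (not the authors' GR construction algorithm), the Python sketch below adds cells in order of distance from the task until the probability that at least one selected cell contains a real worker reaches a threshold. It reuses the $p_h$ formula above, additionally assumes that cells are independent, and ignores worker acceptance, WTD and ANW; the cell representation is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Cell = Tuple[str, Tuple[float, float], float]  # (cell id, cell center, PSD noisy count)


def prob_nonempty(count_psd: float, eps2: float) -> float:
    """p_h: probability that the real count is positive given the noisy PSD count."""
    if count_psd >= 0:
        return 1.0 - 0.5 * math.exp(-eps2 * count_psd)
    return 0.5 * math.exp(eps2 * count_psd)


def greedy_geocast_region(
    task: Tuple[float, float], cells: Sequence[Cell], eps2: float, threshold: float
) -> Tuple[List[str], float]:
    """Hypothetical greedy: add cells nearest-first until P(some cell non-empty) >= threshold.

    Assumes cell emptiness events are independent. If the threshold is never reached,
    all cells are returned. Returns (selected cell ids, achieved probability).
    """
    tx, ty = task
    ordered = sorted(cells, key=lambda c: math.hypot(c[1][0] - tx, c[1][1] - ty))
    chosen: List[str] = []
    p_all_empty = 1.0
    for cell_id, _center, count in ordered:
        chosen.append(cell_id)
        p_all_empty *= 1.0 - prob_nonempty(count, eps2)
        if 1.0 - p_all_empty >= threshold:
            break
    return chosen, 1.0 - p_all_empty


if __name__ == "__main__":
    cells = [("a", (0.1, 0.1), 0.5), ("b", (0.3, 0.1), 3.0), ("c", (0.9, 0.9), 10.0)]
    region, p = greedy_geocast_region((0.0, 0.0), cells, eps2=0.5, threshold=0.9)
    print(region, round(p, 3))
```

In this toy input, the nearest cell alone gives a probability of about 0.61, so the second cell is added and the far cell "c" is never included.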

Frequently Asked Questions
Q1. What have the authors contributed in "A framework for protecting worker location privacy in spatial crowdsourcing" ?

In this paper, the authors introduce a framework for protecting location privacy of workers participating in SC tasks. The authors argue that existing location privacy techniques are not sufficient for SC, and they propose a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers. The authors investigate analytical models and task assignment strategies that balance multiple crucial aspects of SC functionality, such as task completion rate, worker travel distance and system overhead. 

As future work, the authors will extend their framework to also protect privacy of task locations. 

Due to PSD data uncertainty, the SC-server may fail to assign workers to tasks (e.g., no worker is reached, or the task is too far and workers do not accept it). 

As expected, a higher acceptance rate yields lower overhead and shorter travel distance, as workers are more willing to accept tasks. 

To obtain a higher probability of task acceptance, the GR construction algorithm will generate a larger geocast region, leading to increased overhead, as measured by ANW, HOP and WTD. 

An efficient solution to find the smallest enclosing circle is a randomized algorithm [26] that runs in expected linear time in the number of data points in the region. 

One widely accepted measure proposed in [17] is the Digital Compactness Measurement (DCM), which measures region compactness as the ratio between the area of the region and the area of its smallest circumscribing circle. 
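To make the compactness measure concrete, the Python sketch below computes DCM for a region made of grid cells: the region's area divided by the area of the smallest circle enclosing all cell corners. The enclosing circle uses the classic randomized incremental (Welzl-style) algorithm, which runs in expected linear time; whether it matches the exact variant in [26] is an assumption, as is the grid-cell representation of a region.

```python
import math
import random
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center x, center y, radius)


def _contains(c: Circle, p: Point) -> bool:
    return math.hypot(p[0] - c[0], p[1] - c[1]) <= c[2] + 1e-9 * max(1.0, c[2])


def _circle_two(a: Point, b: Point) -> Circle:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.hypot(a[0] - b[0], a[1] - b[1]) / 2)


def _circle_three(a: Point, b: Point, c: Point) -> Circle:
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:  # (near-)collinear: fall back to the widest pair
        return max((_circle_two(a, b), _circle_two(a, c), _circle_two(b, c)), key=lambda k: k[2])
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy, math.hypot(ax - ux, ay - uy))


def min_enclosing_circle(points: Iterable[Point], seed: int = 0) -> Circle:
    """Randomized incremental smallest enclosing circle (expected O(n) after shuffling)."""
    pts: List[Point] = list(points)
    random.Random(seed).shuffle(pts)
    c: Circle = (0.0, 0.0, -1.0)  # empty circle: contains no point
    for i, p in enumerate(pts):
        if c[2] >= 0 and _contains(c, p):
            continue
        c = (p[0], p[1], 0.0)  # p must lie on the boundary
        for j in range(i):
            q = pts[j]
            if _contains(c, q):
                continue
            c = _circle_two(p, q)  # p and q must lie on the boundary
            for k in range(j):
                if not _contains(c, pts[k]):
                    c = _circle_three(p, q, pts[k])
    return c


def dcm(cells: Iterable[Tuple[int, int]], side: float = 1.0) -> float:
    """Hypothetical helper: region area / area of the smallest circle around its cell corners."""
    cell_set = set(cells)
    corners = {((i + dx) * side, (j + dy) * side)
               for i, j in cell_set for dx in (0, 1) for dy in (0, 1)}
    _, _, r = min_enclosing_circle(sorted(corners))
    return len(cell_set) * side * side / (math.pi * r * r)


if __name__ == "__main__":
    square = [(i, j) for i in range(2) for j in range(2)]
    line = [(i, 0) for i in range(4)]
    print(f"2x2 block: {dcm(square):.3f}, 1x4 line: {dcm(line):.3f}")
```

With equal area, the 2x2 block scores about 0.64 ($2/\pi$) and the 1x4 strip about 0.30 ($16/(17\pi)$), in line with the observation that line-shaped regions are less compact.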

To evaluate the effectiveness of using compactness in the GR search strategy, the authors use as metric an estimation of the hop count required to disseminate the task request to all workers, given the communication range of the wireless network (e.g., 50-100 meters for WiFi). 
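The hop-count estimator itself is not spelled out in this excerpt; as a hypothetical illustration only, the Python sketch below counts the hops needed to flood a request from one worker to all others when two workers can communicate directly if they are within the communication range (a unit-disk graph), using breadth-first search. The coordinates in meters, the choice of source worker and the unit-disk model are all assumptions.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple


def flood_hops(workers: Sequence[Tuple[float, float]], source: int,
               comm_range: float) -> Optional[int]:
    """Hypothetical helper: hops to reach every worker from `source`, or None if disconnected."""
    hops: List[Optional[int]] = [None] * len(workers)
    hops[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, w in enumerate(workers):
            # Two workers are linked if they are within radio range of each other
            if hops[v] is None and math.dist(workers[u], w) <= comm_range:
                hops[v] = hops[u] + 1
                queue.append(v)
    if any(h is None for h in hops):
        return None
    return max(hops)


if __name__ == "__main__":
    line = [(0.0, 0.0), (60.0, 0.0), (120.0, 0.0), (180.0, 0.0)]
    print(flood_hops(line, 0, 100.0))  # 3: each hop covers one 60 m gap
    print(flood_hops(line, 0, 50.0))  # None: 60 m gaps exceed a 50 m range
```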

Protecting worker locations significantly complicates task assignment, and may reduce the effectiveness and efficiency of worker-task matching. 

To create sanitized data releases at the CSP, the authors adopt the Private Spatial Decomposition (PSD) approach, first introduced in [3]. 

Building the PSD is an essential step, because it determines how accurate the released data is, which in turn affects ASR, WTD and ANW. 

As multiple SC-servers can use the same PSD, it is practical for the CSP to provide PSDs for a small fee, e.g., a percentage of the workers’ payment, or a tax incentive in the case of public-interest SC applications. 

Ensuring location privacy is an essential aspect of SC, because mobile users will not agree to engage in spatial tasks if their privacy is violated. 

Workers cannot trust the SC-server, especially as there may be many such entities with diverse backgrounds, e.g., private companies, non-profits, government organizations, academic institutions. 

Object-based PSDs are more balanced in theory, but they are not very robust, in the sense that accuracy can decrease abruptly with only slight changes of the PSD parameters, or for certain input dataset distributions. 

It is cheaper to geocast within a shape with less skew, such as a circle or a square, as opposed to skewed regions such as line-shaped areas, which have a large network diameter.