Journal ArticleDOI

Truth inference in crowdsourcing: is the problem solved?

01 Jan 2017-Vol. 10, Iss: 5, pp 541-552
TL;DR: The authors believe that the truth inference problem is not fully solved; they identify the limitations of existing algorithms and point out promising research directions.
Abstract: Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates addressing problems that are hard for computers, e.g., entity resolution and sentiment analysis. However, due to the openness of crowdsourcing, workers may yield low-quality answers, and a redundancy-based method is widely employed, which first assigns each task to multiple workers and then infers the correct answer (called truth) for the task based on the answers of the assigned workers. A fundamental problem in this method is Truth Inference, which decides how to effectively infer the truth. Recently, the database community and the data mining community have independently studied this problem and proposed various algorithms. However, these algorithms have not been compared extensively under the same framework, and it is hard for practitioners to select appropriate algorithms. To alleviate this problem, we provide a detailed survey on 17 existing algorithms and perform a comprehensive evaluation using 5 real datasets. We make all codes and datasets public for future research. Through experiments we find that existing algorithms are not stable across different datasets and there is no algorithm that outperforms others consistently. We believe that the truth inference problem is not fully solved, and identify the limitations of existing algorithms and point out promising research directions.

Summary (3 min read)

1. INTRODUCTION

  • Crowdsourcing solutions have been proposed to address tasks that are hard for machines, e.g., entity resolution [8] and sentiment analysis [32].
  • To address this problem, one can label the ground truth for a small portion of tasks (called golden tasks) and use them to estimate workers’ quality.
  • These algorithms are not compared under the same experimental framework and it is hard for practitioners to select appropriate algorithms.
  • To summarize, the authors make the following contributions: We survey 17 existing algorithms, summarize a framework (Section 3), and provide an in-depth analysis and summary of the 17 algorithms from different perspectives (Sections 4-5), which can help practitioners easily grasp existing truth inference algorithms.

2. PROBLEM DEFINITION

  • Each task asks workers to answer the task.
  • A direct extension of single-choice task is multiple-choice task, where workers can select multiple choices (not only a single choice) out of a set of candidate choices.
  • In image tagging, given a set of candidate tags for an image, it asks workers to select the tags that the image contains.
  • Let v_i^w denote the worker w’s answer for task t_i; the set of answers V = {v_i^w} contains the collected workers’ answers for all tasks.

3. SOLUTION FRAMEWORK

  • A naive solution is Majority Voting (MV) [20, 39, 37], which regards the choice answered by the majority of workers as the truth.
  • The authors discuss how existing works model a task in Section 4.1.
  • The two steps iterate until convergence (lines 9-11).
  • Finally the inferred truth and workers’ qualities are returned.
  • For the 1st iteration, in step 1, it computes each task’s truth from workers’ answers by considering which choice receives the highest aggregated workers’ qualities.

4.1 Task Modeling

  • 4.1.1 Task Difficulty: Different from most existing works, which assume that a worker has the same quality for answering different tasks, some recent works [53, 35] model the difficulty of each task.
  • They assume that each task has its own difficulty level, and the more difficult a task is, the harder it is for a worker to answer it correctly.
  • The basic idea is to exploit the diverse topics in a task, where the number of topics (i.e., K) is pre-defined.
  • Existing studies [19, 35] make use of the text description in each task and adopt topic model techniques [6, 56] to generate a vector of size K for the task, while Multi [51] learns a K-size vector without referring to external information (e.g., text descriptions).
  • Based on these task models, a worker is likely to answer a task correctly if the worker has high qualities on the task’s related topics.

4.2 Worker Modeling

  • 4.2.1 Worker Probability: Worker probability uses a single real number (between 0 and 1) to model a worker w’s quality q_w ∈ [0, 1], which represents the probability that worker w answers a task correctly.
  • Some recent works [53, 31] extend the worker probability to model a worker’s quality in a wider range, e.g., q_w ∈ (−∞, +∞), where a higher q_w means a higher quality of worker w in answering tasks.
  • 4.2.3 Worker Bias and Worker Variance: Worker bias and variance [51, 41] are proposed to handle numeric tasks, where worker bias captures the effect that a worker may underestimate (or overestimate) the truth of a task, and worker variance captures the variation of errors around the bias.
  • A worker may have various levels of expertise for different topics.
  • A sports fan who rarely pays attention to entertainment may answer tasks related to sports more correctly than tasks related to entertainment.

5.2 Optimization

  • The basic idea of optimization methods is to set a self-defined optimization function that captures the relations between workers’ qualities and tasks’ truth, and then derive an iterative method to compute these two sets of parameters collectively.
  • The differences among existing works [5, 31, 30, 61] are that they model workers’ qualities differently and apply different optimization functions to capture the relations between the two sets of parameters.
  • By capturing the intuitions, similar to Algorithm 1, PM [5, 31] develops an iterative approach, and in each iteration, it adopts the two steps as illustrated in Section 3.
  • Finally, [61] devises an iterative approach to infer the two sets of parameters {v_i^*} and {π^w}.

5.3 Probabilistic Graphical Model (PGM)

  • A probabilistic graphical model (PGM) [28] is a graph which expresses the conditional dependency structure (represented by edges) between random variables (represented by nodes).
  • Figure 1 shows the general PGM adopted in existing works.
  • Thus ZC [16] applies the EM (Expectation-Maximization) framework [17] and iteratively updates qw and v∗i to approximate its optimal value.
  • The above method D&S [15], which models a worker as a confusion matrix, is also a widely used model.
  • [35] combines topic modeling (i.e., TwitterLDA [56]) and truth inference, and [59] leverages entity linking and a knowledge base to exploit a worker’s diverse skills.

6. EXPERIMENTS

  • The authors evaluate 17 existing methods (Table 4) on real datasets.
  • The authors first introduce the experimental setup (Section 6.1), and then analyze the quality of collected crowdsourced data (Section 6.2).
  • Finally the authors compare with existing methods (Section 6.3).
  • The authors implement the experiments in Python on a server with a 2.40GHz CPU and 60GB of memory.

6.1 Experimental Setup

  • There are many public crowdsourcing datasets [13].
  • In Table 5, for each selected dataset, the authors list five statistics: #tasks (n), #collected answers (|V|), the average number of answers per task (|V|/n), #truth (some large datasets only provide a subset as ground truth), and #workers (|W|).
  • Each task in the dataset contains two products (with descriptions) and two choices (T, F), and it asks workers to identify whether the claim “the two products are the same” is true (‘T’) or false (‘F’).
  • A higher score means a higher degree for the emotion.
  • The authors use different metrics for different task types.

6.2 Crowdsourced Data Quality

  • In Figure 3, for each dataset, the authors show each worker’s quality, computed based on comparing worker’s answers with tasks’ truth.
  • (1) The Quality of Different Methods in Different Datasets.
  • Other methods, with more complicated task and worker models, do not show clear benefits in quality.
  • (1) In terms of quality, the methods with Optimization and PGM are more effective than the methods with Direct Computation, as they consider more parameters and study how to infer them iteratively.

7. CONCLUSION & FUTURE DIRECTIONS

  • The authors provide a detailed survey on truth inference in crowdsourcing and perform an in-depth analysis of 17 existing methods.
  • The authors also conduct sufficient experiments to compare these methods on 5 datasets with varying task types and sizes.
  • In order to collect high-quality crowdsourced data in an efficient way, it is important to design tasks with a friendly User Interface (UI) at a feasible price.


Truth Inference in Crowdsourcing: Is the Problem Solved?

Yudian Zheng†, Guoliang Li#, Yuanbing Li#, Caihua Shan†, Reynold Cheng†
#Department of Computer Science, Tsinghua University
†Department of Computer Science, The University of Hong Kong
ydzheng2@cs.hku.hk, liguoliang@tsinghua.edu.cn
yb-li16@mails.tsinghua.edu.cn, chshan@cs.hku.hk, ckcheng@cs.hku.hk
ABSTRACT

Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates addressing problems that are hard for computers, e.g., entity resolution and sentiment analysis. However, due to the openness of crowdsourcing, workers may yield low-quality answers, and a redundancy-based method is widely employed, which first assigns each task to multiple workers and then infers the correct answer (called truth) for the task based on the answers of the assigned workers. A fundamental problem in this method is Truth Inference, which decides how to effectively infer the truth. Recently, the database community and the data mining community have independently studied this problem and proposed various algorithms. However, these algorithms have not been compared extensively under the same framework, and it is hard for practitioners to select appropriate algorithms. To alleviate this problem, we provide a detailed survey on 17 existing algorithms and perform a comprehensive evaluation using 5 real datasets. We make all codes and datasets public for future research. Through experiments we find that existing algorithms are not stable across different datasets and there is no algorithm that outperforms others consistently. We believe that the truth inference problem is not fully solved, and identify the limitations of existing algorithms and point out promising research directions.
1. INTRODUCTION

Crowdsourcing solutions have been proposed to address tasks that are hard for machines, e.g., entity resolution [8] and sentiment analysis [32]. Due to the wide deployment of public crowdsourcing platforms, e.g., Amazon Mechanical Turk (AMT) [2] and CrowdFlower [12], access to the crowd has become much easier. As reported in [1], more than 500K workers from 190 countries have performed tasks on AMT [2]. The database community has shown great interest in crowdsourcing (see a survey [29]). Several crowdsourced databases (e.g., CrowdDB [20], Deco [39], Qurk [37]) have been built to incorporate the crowd into query processing, and there are many studies on implementing crowdsourced operators, e.g., Join [50, 36, 52, 11], Max [47, 22], Top-k [14, 55], Group-by [14], etc.

Due to the openness of crowdsourcing, the crowd (called workers) may yield low-quality or even noisy answers. Thus it is important to control the quality in crowdsourcing. To address this problem, most existing crowdsourcing studies employ a redundancy-based strategy, which assigns each task to multiple workers and aggregates the answers given by different workers to infer the correct answer (called truth) of each task. A fundamental problem, called Truth Inference, is widely studied in existing crowdsourcing works [34, 16, 15, 53, 51, 41, 26, 33, 61, 19, 35, 30, 27, 10, 46, 5, 31], which decides how to effectively infer the truth for each task.

To address the problem, a straightforward approach is Majority Voting (MV), which takes the answer given by the majority of workers as the truth. However, the biggest limitation of MV is that it regards all workers as equal. In reality, workers may have different levels of quality: a high-quality worker carefully answers tasks; a low-quality worker (or spammer) may answer tasks randomly in order to earn money with little effort; a malicious worker may even intentionally give wrong answers. Thus it is important to capture each worker’s quality, so that the truth of each task can be better inferred by placing more trust in the answers given by workers with higher qualities.

However, the ground truth of each task is unknown, and it is hard to estimate a worker’s quality. To address this problem, one can label the ground truth for a small portion of tasks (called golden tasks) and use them to estimate workers’ quality. There are two types of methods to utilize golden tasks. The first is qualification test: each worker is required to perform a set of golden tasks before she can really answer tasks, and her quality is computed based on her answering performance on these golden tasks. The second is hidden test: the golden tasks are mixed into the tasks, the workers do not know which tasks are golden, and a worker’s quality is computed based on her answering performance on these golden tasks. However, the two approaches have some limitations. (1) For qualification test, workers are required to answer these “extra” tasks without pay, and many workers do not want to answer such tasks. (2) For hidden test, it is a waste to pay for the “extra” tasks. (3) The two techniques may not improve the quality (see Section 6).

Considering these limitations, the database community [34, 19, 35, 24, 30, 31, 58] and the data mining community [16, 53, 15, 61, 27, 46, 41, 51, 26, 33, 5] have independently studied this problem and proposed various algorithms. However, these algorithms have not been compared under the same experimental framework, and it is hard for practitioners to select appropriate algorithms. To alleviate this problem, we provide a comprehensive survey on existing truth inference algorithms. We summarize them in terms of task types, task modeling, worker modeling, and inference techniques. We conduct a comprehensive comparison of 17 existing representative methods [16, 53, 15, 61, 27, 46, 41, 30, 5, 31, 51, 26, 33], experimentally compare them on 5 real datasets with varying sizes and task types collected from real crowdsourcing platforms, make a deep analysis of the experimental results, and provide extensive experimental findings.

Table 1: A Product Dataset.
ID | Product Name
r1 | iPad Two 16GB WiFi White
r2 | iPad 2nd generation 16GB WiFi White
r3 | Apple iPhone 4 16GB White
r4 | iPhone 4th generation White 16GB

Table 2: Collected Workers’ Answers for All Tasks.
   | t1: (r1=r2) | t2: (r1=r3) | t3: (r1=r4) | t4: (r2=r3) | t5: (r2=r4) | t6: (r3=r4)
w1 | F           | T           | T           | F           | F           | F
w2 |             | F           | F           | T           | T           | F
w3 | T           | F           | F           | F           | F           | T
To summarize, we make the following contributions:
  • We survey 17 existing algorithms, summarize a framework (Section 3), and provide an in-depth analysis and summary of the 17 algorithms from different perspectives (Sections 4-5), which can help practitioners easily grasp existing truth inference algorithms.
  • We experimentally conduct a thorough comparison of these methods on 5 datasets with varying sizes, publicize our codes and datasets [40], and provide experimental findings, which give guidance for selecting appropriate methods under various scenarios (Section 6).
  • We find that the truth inference problem is not fully solved, identify the limitations of existing algorithms, and point out several promising research directions (Section 7).
2. PROBLEM DEFINITION

DEFINITION 1 (TASK). A task set T contains n tasks, i.e., T = {t1, t2, ..., tn}. Each task asks workers to answer the task.

Existing studies mainly focus on three types of tasks.

Decision-Making Tasks. A decision-making task has a claim and asks workers to decide whether the claim is true (denoted as ‘T’) or false (denoted as ‘F’). Decision-making tasks are widely used and studied in existing crowdsourcing works [34, 16, 15, 53, 51, 41, 26, 33, 61, 19, 35, 30, 27, 46, 5] because of their conceptual simplicity.

Next we take entity resolution as an example, which tries to find pairs of products in Table 1 that refer to the same real-world entity. A straightforward way is to generate a task set T = {(r1=r2), (r1=r3), (r1=r4), (r2=r3), (r2=r4), (r3=r4)} with n = 6 decision-making tasks, where each task has two choices (true, false) and asks workers to select a choice for the task. For example, t2 (or r1=r3) asks whether the claim “iPad Two 16GB WiFi White = Apple iPhone 4 16GB White” is true (‘T’) or false (‘F’). Tasks are then published to crowdsourcing platforms (e.g., AMT [2]) and workers’ answers are collected.
Single-Choice (and Multiple-Choice) Tasks. A single-choice task contains a question and a set of candidate choices, and asks workers to select a single choice out of the candidate choices. For example, in sentiment analysis, a task asks workers to select the sentiment (‘positive’, ‘neutral’, ‘negative’) of a given tweet. A decision-making task is a special case of a single-choice task, with two special choices (‘T’ and ‘F’). Single-choice tasks are especially studied in [34, 16, 15, 53, 41, 61, 35, 30, 27, 46, 5]. A direct extension of the single-choice task is the multiple-choice task, where workers can select multiple choices (not only a single choice) out of a set of candidate choices. For example, in image tagging, given a set of candidate tags for an image, a task asks workers to select the tags that the image contains. However, as addressed in [60, 38], a multiple-choice task can be easily transformed into a set of decision-making tasks, e.g., for an image tagging task (multiple-choice), each transformed decision-making task asks whether or not a tag is contained in an image. Thus the methods for decision-making tasks can be directly extended to handle multiple-choice tasks.
Table 3: Notations.
Notation | Description
t_i | the i-th task (1 ≤ i ≤ n); T = {t1, t2, ..., tn}
w | a worker; W = {w} is the set of workers
W_i | the set of workers that have answered task t_i
T_w | the set of tasks that have been answered by worker w
v_i^w | the answer given by worker w for task t_i
V | the set of workers’ answers for all tasks, i.e., V = {v_i^w}
v_i^* | the (ground) truth for task t_i (1 ≤ i ≤ n)
Numeric Tasks. A numeric task asks workers to provide a value. For example, a task asks about the height of Mount Everest. Different from the tasks above, workers’ inputs are numeric values, which have inherent orderings (e.g., compared with 8800m, 8845m is closer to 8848m). Existing works [41, 30] especially study such tasks by considering the inherent orderings between values.

Others. Besides the above tasks, there are other types of tasks, e.g., translating a sentence from one language to another [10], or asking workers to collect data (e.g., the name of a celebrity) [20, 48]. However, it is hard to control the quality for such “open” tasks. Thus they are rarely studied in existing works [10, 20, 48]. In this paper, we focus only on the above three task types and leave other tasks for future work.
DEFINITION 2 (WORKER). A worker set W contains a set of workers, i.e., W = {w}. Let W_i denote the set of workers that have answered task t_i, and T_w denote the set of tasks that have been answered by worker w.

DEFINITION 3 (ANSWER). Each task t_i can be answered by a subset of the workers in W. Let v_i^w denote the worker w’s answer for task t_i; the set of answers V = {v_i^w} contains the collected workers’ answers for all tasks.
Table 2 shows an example, with answers to T given by three workers W = {w1, w2, w3}. (An empty cell means that the worker does not answer the task.) For example, v_4^{w1} = F means that worker w1 answers t4 (i.e., r2 = r3) with ‘F’, i.e., w1 thinks that r2 ≠ r3. The set of workers that answer t1 is W_1 = {w1, w3}, and the set of tasks answered by worker w2 is T_{w2} = {t2, t3, t4, t5, t6}.
DEFINITION 4 (TRUTH). Each task t_i has a true answer, called the ground truth (or truth), denoted as v_i^*.

For the example task set T in Table 1, only the pairs (r1 = r2) and (r3 = r4) are true, and thus v_1^* = v_6^* = T, while the truth of the other tasks is F.

Based on the above notations, the truth inference problem is to infer the (unknown) truth v_i^* for each task t_i based on V.

DEFINITION 5 (TRUTH INFERENCE IN CROWDSOURCING). Given workers’ answers V, infer the truth v_i^* of each task t_i ∈ T.

Table 3 summarizes the notations used in the paper.
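For concreteness, the toy instance of Tables 2-3 can be encoded with plain Python dictionaries. The sketch below is one possible encoding (the names `answers` and `truth` are our own, not from the paper); the later code sketches in this survey reuse it.

```python
# Workers' answers V as a nested dict: answers[worker][task] = 'T' or 'F'.
# A missing key means the worker did not answer that task (empty cell in Table 2).
answers = {
    'w1': {'t1': 'F', 't2': 'T', 't3': 'T', 't4': 'F', 't5': 'F', 't6': 'F'},
    'w2': {'t2': 'F', 't3': 'F', 't4': 'T', 't5': 'T', 't6': 'F'},
    'w3': {'t1': 'T', 't2': 'F', 't3': 'F', 't4': 'F', 't5': 'F', 't6': 'T'},
}

# Ground truth v_i^* (hidden from the algorithms; listed here for evaluation only).
truth = {'t1': 'T', 't2': 'F', 't3': 'F', 't4': 'F', 't5': 'F', 't6': 'T'}
```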
3. SOLUTION FRAMEWORK

A naive solution is Majority Voting (MV) [20, 39, 37], which regards the choice answered by the majority of workers as the truth. Based on Table 2, the truth derived by MV is v_i^* = F for 2 ≤ i ≤ 6, and it randomly infers v_1^* to break the tie. MV infers v_6^* incorrectly, and has a 50% chance to infer v_1^* wrongly. The reason is that MV assumes that each worker has the same quality, while in reality workers have different qualities: some are experts or ordinary workers, while others are spammers (who answer tasks randomly in order to earn money with little effort) or even malicious workers (who intentionally give wrong answers). Taking a closer look at Table 2, we can observe that w3 has a higher quality: if we do not consider t1 (which receives one ‘T’ and one ‘F’), then w3 gives 4 out of 5 answers that agree with the majority of workers, while w1 and w2 each give only 3 out of 5. Thus we should place higher trust in w3’s answers, and in this way we can infer all tasks’ truth correctly.
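Majority Voting as described above takes only a few lines over the dictionary encoding sketched in Section 2. This is an illustrative sketch, not the authors' released code; ties are broken randomly, matching the t1 example.

```python
import random
from collections import Counter

def majority_vote(answers):
    """Infer each task's truth as its most frequent answer; break ties randomly."""
    votes = {}  # votes[task] = Counter over answered choices
    for worker_answers in answers.values():
        for task, ans in worker_answers.items():
            votes.setdefault(task, Counter())[ans] += 1
    truth = {}
    for task, counter in votes.items():
        ranked = counter.most_common()
        tied = [choice for choice, cnt in ranked if cnt == ranked[0][1]]
        truth[task] = random.choice(tied)  # random tie-breaking (e.g., t1 above)
    return truth
```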

Based on the above discussions, existing works [16, 15, 53, 51, 41, 33, 26, 61, 62, 19, 35, 30, 46, 27, 5, 34] propose various ways to model a worker’s quality. Although qualification test and hidden test can help to estimate a worker’s quality, they require tasks to be labeled with truth beforehand, and a worker is also required to answer these “extra” tasks. To address this problem, existing works [16, 15, 53, 51, 41, 33, 26, 61, 62, 19, 35, 30, 46, 27, 5, 34] estimate each worker’s quality purely based on the workers’ answers V. Intuitively, they capture the inherent relations between workers’ qualities and tasks’ truth: for a task, the answer given by a high-quality worker is highly likely to be the truth; conversely, for a worker, if the worker often answers tasks correctly, then the worker will be assigned a high quality. By capturing such relations, they adopt an iterative approach, which jointly infers both the workers’ qualities and the tasks’ truth.

The general approach adopted by most existing works [16, 15, 53, 51, 41, 33, 26, 61, 62, 19, 35, 30, 46, 27, 5, 34] is shown in Algorithm 1. The quality of each worker w ∈ W is denoted as q_w. Algorithm 1 first initializes the workers’ qualities randomly or using a qualification test (line 1), and then adopts an iterative approach with two steps (lines 3-11):

Step 1: Inferring the Truth (lines 3-5): it infers each task’s truth based on workers’ answers and qualities. In this step, different task types are handled differently. Furthermore, some existing works [53, 51] explicitly model each task, e.g., [53] regards that different tasks may have different difficulties. We discuss how existing works model a task in Section 4.1.

Step 2: Estimating Worker Quality (lines 6-8): based on workers’ answers and each task’s truth (derived from step 1), it estimates each worker’s quality. In this step, existing works model each worker w’s quality q_w differently. For example, [16, 26, 33, 5] model q_w as a single value, while [15, 41, 33, 27, 46] model q_w as a matrix. We discuss worker models in Section 4.2.

Convergence (lines 9-11): the two steps iterate until convergence. To identify convergence, existing works typically check whether the change in the two sets of parameters (i.e., workers’ qualities and tasks’ truth) is below some defined threshold (e.g., 10^−3). Finally, the inferred truth and workers’ qualities are returned.

Algorithm 1: Solution Framework
Input: workers’ answers V
Output: inferred truth v_i^* (1 ≤ i ≤ n), worker quality q_w (w ∈ W)
1:  Initialize all workers’ qualities (q_w for w ∈ W);
2:  while true do
3:      // Step 1: Inferring the Truth
4:      for 1 ≤ i ≤ n do
5:          Infer the truth v_i^* based on V and {q_w | w ∈ W};
6:      // Step 2: Estimating Worker Quality
7:      for w ∈ W do
8:          Estimate the quality q_w based on V and {v_i^* | 1 ≤ i ≤ n};
9:      // Check for Convergence
10:     if converged then
11:         break;
12: return v_i^* for 1 ≤ i ≤ n and q_w for w ∈ W;

Running Example. Let us show how the method PM [31, 5] works on Table 2. PM models each worker w as a single value q_w ∈ [0, +∞), where a higher value implies a higher quality. Initially, each worker w ∈ W is assigned the same quality q_w = 1. The two steps devised in PM are as follows:

Step 1 (line 5): v_i^* = argmax_v Σ_{w∈W_i} q_w · 1{v = v_i^w};
Step 2 (line 8): q_w = −log( Σ_{t_i∈T_w} 1{v_i^* ≠ v_i^w} / max_{w′∈W} { Σ_{t_i∈T_{w′}} 1{v_i^* ≠ v_i^{w′}} } ).

The indicator function 1{·} returns 1 if the statement is true and 0 otherwise; for example, 1{5=3} = 0 and 1{5=5} = 1. In the first iteration, step 1 computes each task’s truth from the workers’ answers by considering which choice receives the highest aggregated workers’ qualities. Intuitively, an answer given by many high-quality workers is likely to be the truth. For example, task t2 receives one ‘T’ and two ‘F’s from workers, and each worker has the same quality, so v_2^* = F. Similarly, we get v_1^* = T and v_i^* = F for 2 ≤ i ≤ 6. In step 2, based on the truth computed in step 1, a worker is given a high (low) quality if the worker makes few (many) mistakes. For example, as the numbers of mistakes (i.e., Σ_{t_i∈T_w} 1{v_i^* ≠ v_i^w}) for workers w1, w2, w3 are 3, 2, 1, respectively, the computed qualities are q_{w1} = −log(3/3) = 0, q_{w2} = −log(2/3) = 0.41 and q_{w3} = −log(1/3) = 1.10. Following these two steps, the process iterates until convergence. In the converged results, the truth are v_1^* = v_6^* = T and v_i^* = F (2 ≤ i ≤ 5); the qualities are q_{w1} = 4.9 × 10^−15, q_{w2} = 0.29 and q_{w3} = 16.09. We can observe that PM derives the truth correctly, and w3 has a higher quality compared with w1 and w2.
4. IMPORTANT FACTORS

In this section, we categorize existing works [16, 15, 53, 51, 41, 33, 26, 61, 62, 19, 35, 30, 46, 27, 5, 34] along two factors:

• Task Modeling (Section 4.1): how existing works model a task (e.g., task difficulty, latent topics).
• Worker Modeling (Section 4.2): how existing works model a worker’s quality (e.g., worker probability, diverse skills).

Table 4 summarizes how existing works [16, 15, 53, 51, 41, 33, 26, 61, 62, 19, 35, 30, 46, 27, 5, 34] can be categorized based on the above factors. Next we analyze each factor.
4.1 Task Modeling

4.1.1 Task Difficulty
Different from most existing works, which assume that a worker has the same quality for answering different tasks, some recent works [53, 35] model the difficulty of each task. They assume that each task has its own difficulty level, and the more difficult a task is, the harder it is for a worker to answer it correctly. For example, [53] models the probability that worker w correctly answers task t_i as follows: Pr(v_i^w = v_i^* | d_i, q_w) = 1 / (1 + e^(−d_i · q_w)), where d_i ∈ (0, +∞) represents the difficulty of task t_i: the higher d_i is, the easier task t_i is. Intuitively, for a fixed worker quality q_w > 0, an easier task (higher d_i) leads to a higher probability that the worker answers the task correctly.
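In code, this difficulty-weighted accuracy is just a logistic function of the product d_i · q_w. A one-line sketch (the function name is illustrative):

```python
import math

def p_correct(d_i: float, q_w: float) -> float:
    """GLAD-style Pr(v_i^w = v_i^* | d_i, q_w) = 1 / (1 + exp(-d_i * q_w))."""
    return 1.0 / (1.0 + math.exp(-d_i * q_w))  # easier task or better worker -> closer to 1
```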
4.1.2 Latent Topics
Different from modeling each task as a single value (e.g., difficulty), some recent works [19, 35, 57, 51] model each task as a vector of K values. The basic idea is to exploit the diverse topics in a task, where the number of topics (i.e., K) is pre-defined. For example, existing studies [19, 35] make use of the text description of each task and adopt topic model techniques [6, 56] to generate a vector of size K for the task, while Multi [51] learns a K-size vector without referring to external information (e.g., text descriptions). Based on these task models, a worker is likely to answer a task correctly if the worker has high qualities on the task’s related topics.
4.2 Worker Modeling

4.2.1 Worker Probability
Worker probability uses a single real number (between 0 and 1) to model a worker w’s quality q_w ∈ [0, 1], which represents the probability that worker w answers a task correctly. The higher q_w is, the higher worker w’s ability to answer tasks correctly.

Table 4: Comparisons of Different Methods that Address the Truth Inference Problem in Crowdsourcing.
Method | Task Types | Task Modeling | Worker Modeling | Techniques
MV | Decision-Making, Single-Choice | No Model | No Model | Direct Computation
ZC [16] | Decision-Making, Single-Choice | No Model | Worker Probability | Probabilistic Graphical Model
GLAD [53] | Decision-Making, Single-Choice | Task Difficulty | Worker Probability | Probabilistic Graphical Model
D&S [15] | Decision-Making, Single-Choice | No Model | Confusion Matrix | Probabilistic Graphical Model
Minimax [61] | Decision-Making, Single-Choice | No Model | Diverse Skills | Optimization
BCC [27] | Decision-Making, Single-Choice | No Model | Confusion Matrix | Probabilistic Graphical Model
CBCC [46] | Decision-Making, Single-Choice | No Model | Confusion Matrix | Probabilistic Graphical Model
LFC [41] | Decision-Making, Single-Choice | No Model | Confusion Matrix | Probabilistic Graphical Model
CATD [30] | Decision-Making, Single-Choice, Numeric | No Model | Worker Probability, Confidence | Optimization
PM [5, 31] | Decision-Making, Single-Choice, Numeric | No Model | Worker Probability | Optimization
Multi [51] | Decision-Making | Latent Topics | Diverse Skills, Worker Bias, Worker Variance | Probabilistic Graphical Model
KOS [26] | Decision-Making | No Model | Worker Probability | Probabilistic Graphical Model
VI-BP [33] | Decision-Making | No Model | Confusion Matrix | Probabilistic Graphical Model
VI-MF [33] | Decision-Making | No Model | Confusion Matrix | Probabilistic Graphical Model
LFC_N [41] | Numeric | No Model | Worker Variance | Probabilistic Graphical Model
Mean | Numeric | No Model | No Model | Direct Computation
Median | Numeric | No Model | No Model | Direct Computation
The worker probability model has been widely used in existing works [16, 26, 33, 5]. Some recent works [53, 31] extend the worker probability to model a worker’s quality in a wider range, e.g., q_w ∈ (−∞, +∞), where a higher q_w means a higher quality of worker w in answering tasks.
4.2.2 Confusion Matrix
A confusion matrix [15, 41, 33, 27, 46] is used to model a worker’s quality for answering single-choice tasks. Suppose each task in T has ℓ fixed choices; then the confusion matrix q_w is an ℓ × ℓ matrix, where the j-th (1 ≤ j ≤ ℓ) row, i.e., q_{j,·}^w = [q_{j,1}^w, q_{j,2}^w, ..., q_{j,ℓ}^w], represents the probability distribution of worker w’s possible answers for a task whose truth is the j-th choice. Each element q_{j,k}^w (1 ≤ j ≤ ℓ, 1 ≤ k ≤ ℓ) is the probability that worker w selects the k-th choice given that the truth of a task is the j-th choice, i.e., q_{j,k}^w = Pr(v_i^w = k | v_i^* = j) for any t_i ∈ T. For example, decision-making tasks ask workers to select ‘T’ (1st choice) or ‘F’ (2nd choice) for each claim (ℓ = 2); an example confusion matrix for w is

q_w = [ 0.8  0.2
        0.3  0.7 ],

where q_{1,2}^w = 0.2 means that if the truth of a task is ‘T’, the probability that the worker answers ‘F’ is 0.2.
4.2.3 Worker Bias and Worker Variance
Worker bias and variance [51, 41] are proposed to handle numeric tasks: worker bias captures the effect that a worker may underestimate (or overestimate) the truth of a task, and worker variance captures the variation of errors around the bias. For example, given a set of photos of people, each numeric task asks workers to estimate the height of the person in it. Suppose a worker w is modeled with bias τ_w and variance σ_w; then the answer v_i^w given by worker w is modeled as drawn from the Gaussian distribution v_i^w ~ N(v_i^* + τ_w, σ_w). That is, (1) a worker with bias τ_w > 0 (τ_w < 0) will overestimate (underestimate) the height, while τ_w ≈ 0 leads to a more accurate estimate; (2) a worker with variance σ_w ≫ 0 has a large variation of error, while σ_w ≈ 0 leads to a small variation of error.
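Under this model a worker's numeric answer is a single Gaussian draw. A sketch of simulating one answer (note: the paper writes N(v_i^* + τ_w, σ_w); for simplicity this sketch passes σ as a standard deviation):

```python
import random

def simulate_answer(true_value: float, bias: float, sigma: float) -> float:
    """Draw v_i^w ~ N(v_i^* + tau_w, sigma_w) for a numeric task."""
    return random.gauss(true_value + bias, sigma)

# A worker who tends to overestimate heights by ~2cm, with moderate noise:
ans = simulate_answer(175.0, bias=2.0, sigma=3.0)
```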
4.2.4 Confidence
Existing works [30, 25] observe that if a worker answers plenty of tasks, the estimated quality for the worker is confident; otherwise, if a worker answers only a few tasks, the estimated quality is not confident. Inspired by this observation, CATD [30] assigns higher qualities to workers who answer plenty of tasks than to workers who answer only a few. To be specific, for a worker w, it uses the Chi-Squared distribution [3] with a 95% confidence interval, i.e., χ²_(0.975, |T_w|), as a coefficient to scale up the worker’s quality, where |T_w| is the number of tasks that worker w has answered. χ²_(0.975, |T_w|) increases with |T_w|, i.e., the more tasks w has answered, the higher worker w’s quality is scaled.
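One plausible reading of this coefficient uses SciPy's chi-squared quantile function (our interpretation of the notation, not the authors' code):

```python
from scipy.stats import chi2

def confidence_coefficient(num_answered: int) -> float:
    """chi^2_(0.975, |T_w|): the 0.975 quantile with |T_w| degrees of freedom."""
    return chi2.ppf(0.975, df=num_answered)

# The coefficient grows with the number of answered tasks:
assert confidence_coefficient(100) > confidence_coefficient(5)
```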
4.2.5 Diverse Skills
A worker may have various levels of expertise for different topics. For example, a sports fan who rarely pays attention to entertainment may answer tasks related to sports more correctly than tasks related to entertainment. Different from most of the above models, which assume that a worker has the same quality for answering different tasks, existing works [19, 35, 61, 51, 57, 59] model the diverse skills of a worker and capture a worker’s diverse qualities for different tasks. The basic idea of [19, 61] is to model a worker w’s quality as a vector of size n, i.e., q_w = [q_1^w, q_2^w, ..., q_n^w], where q_i^w indicates worker w’s quality for task t_i. Different from [19, 61], some recent works [35, 51, 57, 59] model a worker’s quality over different latent topics, i.e., q_w = [q_1^w, q_2^w, ..., q_K^w], where the pre-defined number K indicates the number of latent topics. They [35, 51, 57, 59] assume that each task is related to one or more of these K latent topics, and a worker is highly likely to answer a task correctly if the worker has a high quality on the task’s related topics.
5. TRUTH INFERENCE ALGORITHMS

Existing works [61, 19, 30, 5, 34, 16, 15, 53, 51, 41, 26, 33, 35, 27, 46] usually adopt the framework in Algorithm 1. Based on the techniques used, they can be classified into three categories: direct computation [20, 39], optimization methods [61, 19, 30, 5], and probabilistic graphical model methods [34, 16, 15, 53, 51, 41, 26, 33, 35, 27, 46]. We discuss each category in turn.

5.1 Direct Computation
Some baseline methods directly estimate v_i^* (1 ≤ i ≤ n) based on V, without modeling each worker or task. For decision-making and single-label tasks, Majority Voting (MV) regards the truth of each task as the answer given by the most workers; for numeric tasks, Mean and Median are two baseline methods that regard the mean and median of workers’ answers as the truth of each task.
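The numeric baselines are one-liners with the standard library (the illustrative values echo the Mount Everest example):

```python
from statistics import mean, median

answers_for_task = [8800.0, 8845.0, 8848.0]  # all numeric answers for one task
truth_mean = mean(answers_for_task)          # Mean baseline
truth_median = median(answers_for_task)      # Median baseline (robust to outliers)
```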
5.2 Optimization
The basic idea of optimization methods is to set a self-defined
optimization function that captures the relations between workers’
qualities and tasks’ truth, and then derive an iterative method to
compute these two sets of parameters collectively. The differences
among existing works [5, 31, 30, 61] are that they model workers’
qualities differently and apply different optimization functions to
capture the relations between the two sets of parameters.
(1) Worker Probability. PM [5, 31] models each worker’s quality
as a single value, and the optimization function is defined as:

min_{{q_w},{v_i^*}} f({q_w}, {v_i^*}) = Σ_{w∈W} q_w · Σ_{t_i∈T_w} d(v_i^w, v_i^*),

where {q_w} represents the set of all workers’ qualities, and similarly {v_i^*} represents the set of all truth. It models a worker w’s quality as q_w ≥ 0, and d(v_i^w, v_i^*) ≥ 0 defines the distance between a worker’s answer v_i^w and the truth v_i^*: the more similar v_i^w is to v_i^*, the lower the value of d(v_i^w, v_i^*). Intuitively, to minimize f({q_w}, {v_i^*}), a high quality q_w must be paired with a low distance d(v_i^w, v_i^*), i.e., worker w’s answers should be close to the truth. Capturing these intuitions, and similarly to Algorithm 1, PM [5, 31] develops an iterative approach; in each iteration, it adopts the two steps illustrated in Section 3.
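For decision-making tasks a natural choice of distance is the 0/1 indicator d(v_i^w, v_i^*) = 1{v_i^w ≠ v_i^*}. A sketch of evaluating PM's objective under that assumed distance (the paper leaves d pluggable):

```python
def pm_objective(answers, truth, quality):
    """f({q_w}, {v_i^*}) = sum_w q_w * sum_{t_i in T_w} d(v_i^w, v_i^*), 0/1 distance."""
    total = 0.0
    for w, ans in answers.items():
        mistakes = sum(a != truth[t] for t, a in ans.items())  # sum of d over T_w
        total += quality[w] * mistakes
    return total
```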
(2) Worker Probability and Confidence. Different from the above, CATD [30] considers both worker probability and confidence in modeling a worker’s quality. As discussed in Section 4.2.4, each worker w’s quality is scaled up by a coefficient of χ²_(0.975, |T_w|), i.e., the more tasks w has answered, the higher worker w’s quality is scaled. It develops an objective function with the intuition that a worker w who gives answers close to the truth and answers plenty of tasks should have a high quality q_w. Similarly, it adopts an iterative approach and iterates the two steps until convergence.

(3) Diverse Skills. Minimax [61] leverages the idea of minimax entropy [63]. To be specific, it models the diverse skills of a worker w across different tasks and focuses on single-label tasks (with ℓ choices). It assumes that for a task t_i, the answers given by w are generated by a probability distribution π_{i,·}^w = [π_{i,1}^w, π_{i,2}^w, ..., π_{i,ℓ}^w], where each π_{i,j}^w is the probability that worker w answers task t_i with the j-th choice. Following this, an objective function is defined by considering two constraints, on tasks and on workers: for a task t_i, the number of answers collected for a choice equals the sum of the corresponding generated probabilities; for a worker w, among all tasks answered by w whose truth is the j-th choice, the number of answers collected for the k-th choice equals the sum of the corresponding generated probabilities. Finally, [61] devises an iterative approach to infer the two sets of parameters {v_i^*} and {π^w}.
5.3 Probabilistic Graphical Model (PGM)

A probabilistic graphical model (PGM) [28] is a graph that expresses the conditional dependency structure (represented by edges) between random variables (represented by nodes). Figure 1 shows the general PGM adopted in existing works. Each node represents a variable. There are two plates, for workers and for tasks, each of which represents repeating variables; for example, the plate for workers represents |W| repeating variables, where each variable corresponds to a worker w ∈ W. Among the variables, α, β, and v_i^w are known (α and β are priors for q_w and v_i^*, which can be set based on prior knowledge); q_w and v_i^* are latent or unknown variables, which are the two desired variables to compute. The directed edges model the conditional dependence between a child node and its associated parent node(s), in the sense that the child node follows a probability distribution conditioned on the values taken by the parent node(s). For example, the three conditional distributions in Figure 1 are Pr(q_w | α), Pr(v_i^* | β) and Pr(v_i^w | q_w, v_i^*).

Next we illustrate the details (optimization goal and the two steps) of each method using PGM. In general the methods differ in the worker model used, which can be classified into three categories: worker probability [16, 53, 26, 33], confusion matrix [15, 41, 27, 46] and diverse skills [19, 35, 51]. For each category, we first introduce its basic method, e.g., ZC [16], and then summarize how other methods [53, 26, 33] extend the basic method.

(1) Worker Probability: ZC [16] and its extensions [53, 26, 33].
[Figure 1: A General PGM (Probabilistic Graphical Model). Plate diagram: the priors α and β generate the worker quality q_w (worker plate, repeated |W| times) and the truth v_i^* (task plate, repeated n times); the observed answer v_i^w depends on both q_w and v_i^*.]
ZC [16] adopts a PGM similar to Figure 1, with the simplification that it does not consider the priors (i.e., α, β). Suppose all tasks are decision-making tasks (v_i^* ∈ {T, F}) and each worker’s quality is modeled as a worker probability q_w ∈ [0, 1]. Then

Pr(v_i^w | q_w, v_i^*) = (q_w)^{1{v_i^w = v_i^*}} · (1 − q_w)^{1{v_i^w ≠ v_i^*}},

which means that the probability that worker w answers a task correctly (incorrectly) is q_w (1 − q_w). For decision-making tasks, ZC [16] tries to maximize the probability of the occurrence of the workers’ answers, called the likelihood, i.e., max_{{q_w}} Pr(V | {q_w}), which regards {v_i^*} as latent variables:

Pr(V | {q_w}) = ∏_{i=1}^{n} Σ_{z∈{T,F}} (1/2) · ∏_{w∈W_i} Pr(v_i^w | q_w, v_i^* = z).   (1)

However, Equation 1 is hard to optimize due to its non-convexity. Thus ZC [16] applies the EM (Expectation-Maximization) framework [17] and iteratively updates q_w and v_i^* to approximate its optimal value. Note that ZC [16] develops a system to address entity linking for online pages; in this paper we focus on the part that leverages the crowd’s answers to infer the truth (i.e., Section 4.3 in [16]), and we omit the other parts (e.g., constraints on its probabilistic model).
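A minimal EM loop for this one-coin worker-probability model looks as follows. This is a hedged sketch of the E/M updates implied by Equation 1 (uniform 1/2 prior, illustrative initialization, fixed iteration count); ZC's actual system differs in its details.

```python
def zc_em(answers, n_iters=50):
    """E-step: Pr(v_i^* = z | V, {q_w}); M-step: re-estimate each q_w."""
    tasks = {t for ans in answers.values() for t in ans}
    q = {w: 0.8 for w in answers}                    # illustrative initialization
    post = {t: {'T': 0.5, 'F': 0.5} for t in tasks}  # posterior per task
    for _ in range(n_iters):
        # E-step: posterior of each task's truth under current qualities.
        for t in tasks:
            like = {'T': 0.5, 'F': 0.5}              # uniform prior Pr(v_i^* = z) = 1/2
            for w, ans in answers.items():
                if t in ans:
                    for z in ('T', 'F'):
                        like[z] *= q[w] if ans[t] == z else 1.0 - q[w]
            total = like['T'] + like['F']
            post[t] = {z: like[z] / total for z in ('T', 'F')}
        # M-step: quality = expected fraction of the worker's answers matching the truth.
        for w, ans in answers.items():
            q[w] = sum(post[t][ans[t]] for t in ans) / len(ans)
    return post, q
```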
There are several extensions of ZC, e.g., GLAD [53], KOS [26], VI-BP [33] and VI-MF [33], which focus on different perspectives:

Task Model. GLAD [53] extends ZC [16] in the task model. Rather than assuming that each task is the same, it [53] models each task t_i’s difficulty d_i ∈ (0, +∞) (the higher, the easier). It then models the worker’s answer as Pr(v_i^w = v_i^* | d_i, q_w) = 1 / (1 + e^(−d_i · q_w)), integrates this into Equation 1, and approximates the optimal value using Gradient Descent [28] (an iterative method).

Optimization Function. KOS [26], VI-BP [33] and VI-MF [33] extend ZC [16] in the optimization goal. Recall that ZC tries to compute the optimal {q_w} that maximizes Pr(V | {q_w}), which is a Point Estimate. Instead, [26, 33] leverage Bayesian Estimators to integrate over all possible q_w, and the target is to estimate the truth v_i^* = argmax_{z∈{T,F}} Pr(v_i^* = z | V), where

Pr(v_i^* = z | V) = ∫_{{q_w}} Pr(v_i^* = z, {q_w} | V) d{q_w}.   (2)

It is hard to compute Equation 2 directly, so existing works [26, 33] resort to Variational Inference (VI) techniques [49] to approximate the value: KOS [26] first leverages Belief Propagation (one typical VI technique) to iteratively approximate the value in Equation 2; [33] then proposes a more general model based on KOS, called VI-BP. Moreover, [33] also applies Mean Field (another VI technique) in VI-MF to iteratively approach Equation 2.
(2) Confusion Matrix: D&S [15] and its extensions [41, 27, 46].
D&S [15] focuses on single-label tasks (with fixed ℓ choices) and models each worker as a confusion matrix q_w of size ℓ × ℓ (Section 4.2.2). The worker w’s answer follows the probability Pr(v_i^w | q_w, v_i^*) = q^w_{v_i^*, v_i^w}. Similar to Equation 1, D&S [15] tries to optimize the function argmax_{{q_w}} Pr(V | {q_w}), where

Pr(V | {q_w}) = ∏_{i=1}^{n} Σ_{1≤z≤ℓ} Pr(v_i^* = z) · ∏_{w∈W_i} q^w_{z, v_i^w},

and it applies the EM framework [17] to devise two iterative steps.

The above method D&S [15], which models a worker as a confusion matrix, is a widely used model. There are several extensions, e.g., LFC [41], LFC_N [41], BCC [27] and CBCC [46].
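The per-task E-step of this confusion-matrix model multiplies, for each candidate truth z, the matrix entries of all answering workers. A sketch under an assumed prior (the names and toy values are ours):

```python
import numpy as np

def ds_posterior(task_answers, confusions, prior):
    """Posterior over the l choices for one task, D&S-style.
    task_answers: {worker: answered choice, 0-indexed}; confusions: {worker: l x l matrix}."""
    post = prior.copy()                  # Pr(v_i^* = z) for each choice z
    for w, k in task_answers.items():
        post *= confusions[w][:, k]      # multiply by q^w_{z, k} for every candidate z
    return post / post.sum()

# Toy usage: two workers both answer choice 0 on a binary task.
conf = {'w1': np.array([[0.8, 0.2], [0.3, 0.7]]),
        'w2': np.array([[0.6, 0.4], [0.4, 0.6]])}
print(ds_posterior({'w1': 0, 'w2': 0}, conf, np.array([0.5, 0.5])))  # -> [0.8, 0.2]
```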

Citations
01 Jan 2013
TL;DR: This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems and presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research.
Abstract: Considerable progress has been made in recent years in the development of dialogue systems that support robust and efficient human–machine interaction using spoken language. Spoken dialogue technology allows various interactive applications to be built and used for practical purposes, and research focuses on issues that aim to increase the system’s communicative competence by including aspects of error correction, cooperation, multimodality, and adaptation in context. This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems. It provides an overview of the basic issues such as system architectures, various dialogue management methods, system evaluation, and also surveys advanced topics concerning extensions of the basic model to more conversational setups. The goal of the book is to provide an introduction to the methods, problems, and solutions that are used in dialogue system development and evaluation. It presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research. Keywords: spoken dialogue systems, multimodality, evaluation, error-handling, dialogue management, statistical methods.

304 citations

Book ChapterDOI
08 Sep 2018
TL;DR: This work proposes an Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework to train a FER model from multiple inconsistently labeled datasets and large scale unlabeled data and shows that the method outperforms other state-of-the-art and optional methods under a rigorous evaluation protocol involving 7 FER datasets.
Abstract: Annotation errors and bias are inevitable among different facial expression datasets due to the subjectiveness of annotating facial expressions. Ascribe to the inconsistent annotations, performance of existing facial expression recognition (FER) methods cannot keep improving when the training set is enlarged by merging multiple datasets. To address the inconsistency, we propose an Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework to train a FER model from multiple inconsistently labeled datasets and large scale unlabeled data. In IPA2LT, we assign each sample more than one labels with human annotations or model predictions. Then, we propose an end-to-end LTNet with a scheme of discovering the latent truth from the inconsistent pseudo labels and the input face images. To our knowledge, IPA2LT serves as the first work to solve the training problem with inconsistently labeled FER datasets. Experiments on synthetic data validate the effectiveness of the proposed method in learning from inconsistent labels. We also conduct extensive experiments in FER and show that our method outperforms other state-of-the-art and optional methods under a rigorous evaluation protocol involving 7 FER datasets.

257 citations

Journal ArticleDOI
01 Jan 2020
TL;DR: A comprehensive and systematic review of existing research on four core algorithmic issues in spatial crowdsourcing: (1) task assignment, (2) quality control, (3) incentive mechanism design, and (4) privacy protection.
Abstract: Crowdsourcing is a computing paradigm where humans are actively involved in a computing task, especially for tasks that are intrinsically easier for humans than for computers. Spatial crowdsourcing is an increasing popular category of crowdsourcing in the era of mobile Internet and sharing economy, where tasks are spatiotemporal and must be completed at a specific location and time. In fact, spatial crowdsourcing has stimulated a series of recent industrial successes including sharing economy for urban services (Uber and Gigwalk) and spatiotemporal data collection (OpenStreetMap and Waze). This survey dives deep into the challenges and techniques brought by the unique characteristics of spatial crowdsourcing. Particularly, we identify four core algorithmic issues in spatial crowdsourcing: (1) task assignment, (2) quality control, (3) incentive mechanism design, and (4) privacy protection. We conduct a comprehensive and systematic review of existing research on the aforementioned four issues. We also analyze representative spatial crowdsourcing applications and explain how they are enabled by these four technical issues. Finally, we discuss open questions that need to be addressed for future spatial crowdsourcing research and applications.

185 citations

Proceedings ArticleDOI
01 Apr 2017
TL;DR: This paper surveys and synthesizes a wide spectrum of existing studies on crowdsourced data management and outlines key factors that need to be considered to improve crowdsourcing data management.
Abstract: Many important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition can be enhanced through the use of human cognitive ability. Crowdsouring is an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation for such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results so effective techniques are required to achieve high quality, (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost, (3) Latency Control: The human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. We survey and synthesize a wide spectrum of existing studies on crowdsourced data management.

130 citations

Journal Article
TL;DR: This research, which explores the prevalence of dishonesty among crowdworkers, how workers respond to both monetary incentives and intrinsic forms of motivation, and how crowdworkers interact with each other, has immediate implications that are distill into best practices that researchers should follow when using crowdsourcing in their own research.
Abstract: This survey provides a comprehensive overview of the landscape of crowdsourcing research, targeted at the machine learning community. We begin with an overview of the ways in which crowdsourcing can be used to advance machine learning research, focusing on four application areas: 1) data generation, 2) evaluation and debugging of models, 3) hybrid intelligence systems that leverage the complementary strengths of humans and machines to expand the capabilities of AI, and 4) crowdsourced behavioral experiments that improve our understanding of how humans interact with machine learning systems and technology more broadly. We next review the extensive literature on the behavior of crowdworkers themselves. This research, which explores the prevalence of dishonesty among crowdworkers, how workers respond to both monetary incentives and intrinsic forms of motivation, and how crowdworkers interact with each other, has immediate implications that we distill into best practices that researchers should follow when using crowdsourcing in their own research. We conclude with a discussion of additional tips and best practices that are crucial to the success of any project that uses crowdsourcing, but rarely mentioned in the literature.

126 citations


Cites background from "Truth inference in crowdsourcing: i..."

  • ...As mentioned above, entire surveys could be written on the topic of crowdsourced data generation alone, and indeed some have [217, 219]....

  • ...Unlike other surveys, which go into greater depth on algorithms for aggregating crowdsourced labels [217, 219], we address the label aggregation problem only briefly, devoting relatively more attention and detail to applications that are less well known within the machine learning community, in the hope of inspiring new connections and directions of research....

  • ..., 54, 79, 82, 107, 160, 198, 218, 220], still an active area of research [219]....

  • ...[219] provide a thorough survey and empirical comparison of seventeen algorithms that are based on this general framework, characterizing them in terms of the way in which instances and workers are modeled as well as the specifics of how the calculations of quality parameters and label assignments are made (through what they call direct computation, using optimization methods, or using probabilistic graphical models)....

References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations


"Truth inference in crowdsourcing: i..." refers methods in this paper

  • ...For example, existing studies [19, 35] make use of the text description in each task and adopt topic model techniques [6, 56] to generate a vector of size K for the task; while Multi [51] learns a K-size vector without referring to external information (e.g., text descriptions)....

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Book
01 Jan 1948
TL;DR: The Mathematical Theory of Communication (MTOC) as discussed by the authors was originally published as a paper on communication theory more than fifty years ago and has since gone through four hardcover and sixteen paperback printings.
Abstract: Scientific knowledge grows at a phenomenal pace--but few books have had as lasting an impact or played as important a role in our modern world as The Mathematical Theory of Communication, published originally as a paper on communication theory more than fifty years ago. Republished in book form shortly thereafter, it has since gone through four hardcover and sixteen paperback printings. It is a revolutionary work, astounding in its foresight and contemporaneity. The University of Illinois Press is pleased and honored to issue this commemorative reprinting of a classic.

10,215 citations

Frequently Asked Questions (13)
Q1. What have the authors contributed in "Truth inference in crowdsourcing: is the problem solved?" ?

Recently, the database community and the data mining community have independently studied this problem and proposed various algorithms. To alleviate this problem, the authors provide a detailed survey on 17 existing algorithms and perform a comprehensive evaluation using 5 real datasets. Through experiments the authors find that existing algorithms are not stable across different datasets and there is no algorithm that outperforms others consistently. The authors believe that the truth inference problem is not fully solved, and identify the limitations of existing algorithms and point out promising research directions.

The authors also point out the following future research directions. It is also interesting to study the relations between the design of UI, price, worker ’ s latency and quality. Not all methods can benefit from qualification test, and the quality of some methods even decrease. Although most methods can benefit from them, the improvements vary in different datasets and methods. 

Based on the used techniques, they can be classified into the following three categories: direct computation [20, 39], optimization methods [61, 19, 30, 5] and probabilistic graphical model methods [34, 16, 15, 53, 51, 41, 26, 33, 35, 27, 46]. 

(3) On S_Rel, the quality of the methods CATD and ZC decreases when r ≥ 4, probably because they are sensitive to low-quality workers’ answers.

Due to the wide deployment of public crowdsourcing platforms, e.g., Amazon Mechanical Turk (AMT) [2], CrowdFlower [12], the access to crowd becomes much easier. 

In terms of worker models, in general, methods with a confusion matrix (D&S, BCC, CBCC, LFC, VI-BP, VI-MF) perform better than methods with worker probability (ZC, GLAD, CATD, PM, KOS), since a confusion matrix is more expressive than a worker probability.

(5) In terms of task models, the methods that model task difficulty (GLAD) or latent topics (Multi) in tasks do not perform significantly better in quality; moreover, they often take more time to converge. 

Different optimization functions often vary significantly in efficiency, e.g., Bayesian Estimator is less efficient than Point Estimation, and some techniques (e.g., Gibbs Sampling, Variational Inference) often take a long time to converge. 

The authors find that there are only 8 methods (i.e., ZC, GLAD, D&S, LFC, CATD, PM, VI-MF and LFC N) that can initialize workers’ qualities using qualification test. 

As the answers obtained for each task have inherent orderings, in order to capture the consistency of workers’ answers, for a task t_i the authors first compute the median v_i (a robust metric in statistics, not sensitive to outliers) over all its collected answers; then the consistency (C) is defined as the average deviation from the median, i.e., C = (1/n) · Σ_{i=1}^{n} √( Σ_{w∈W_i} (v_i^w − v_i)² / |W_i| ), where ...

For dataset D Product, i.e., Figures 4(a), (b), the authors can observe that (1) as the data redundancy r is varied in [1, 3], the quality increases with r for different methods. 

Suppose all tasks are decision-making tasks (v_i^* ∈ {T, F}) and each worker’s quality is modeled as a worker probability q_w ∈ [0, 1].

(4) The quality of methods on single-label tasks is lower than on decision-making tasks, since workers are not good at answering tasks with multiple choices, and the methods for single-label tasks are sensitive to low-quality workers.
