Open Access · Journal Article

Pushing the boundaries of crowd-enabled databases with query-driven schema expansion

TL;DR
This paper extends crowd-enabled databases with flexible query-driven schema expansion, which allows new attributes to be added to the database at query time, and leverages the user-generated data found in the Social Web to build perceptual spaces.
Abstract
By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities such as completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, we extend crowd-enabled databases with flexible query-driven schema expansion, allowing new attributes to be added to the database at query time. However, the number of crowd-sourced mini-tasks needed to fill in the missing values may often be prohibitively large, and the quality of the resulting data is doubtful. Instead of simply crowd-sourcing every value individually, we leverage the user-generated data found in the Social Web: by exploiting user ratings, we build perceptual spaces, i.e., highly compressed representations of the opinions, impressions, and perceptions of large numbers of users. Using a few training samples obtained by expert crowd-sourcing, we can then extract all missing data automatically from the perceptual space with high quality and at low cost. Extensive experiments show that our approach can boost both the performance and the quality of crowd-enabled databases, while also providing the flexibility to expand schemas in a query-driven fashion.
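The abstract describes two steps: compressing user ratings into a perceptual space, then filling a newly added attribute for every item from only a handful of crowd-sourced labels. Below is a minimal sketch of that pipeline, assuming a plain SGD matrix factorization as the perceptual space and a nearest-centroid rule as the extractor. Both are stand-ins chosen for brevity, not necessarily the models the authors use, and the function names (`build_perceptual_space`, `expand_schema`, `toy_ratings`) and the comedy/drama toy data are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, item, rating)
Space = Dict[str, List[float]]   # item -> coordinates in the perceptual space


def build_perceptual_space(ratings: List[Rating], dims: int = 2, epochs: int = 200,
                           lr: float = 0.02, reg: float = 0.05, seed: int = 0) -> Space:
    """Assumed model: factor the sparse user-item rating matrix by SGD so that
    rating(u, i) ~ mean + P[u] . Q[i]; the item vectors Q form the space."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(dims)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(dims)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - (mu + sum(a * b for a, b in zip(pu, qi)))
            for k in range(dims):
                pk, qk = pu[k], qi[k]  # use old values for both updates
                pu[k] += lr * (err * qk - reg * pk)
                qi[k] += lr * (err * pk - reg * qk)
    return Q


def _centroid(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def expand_schema(space: Space, labels: Dict[str, bool]) -> Dict[str, bool]:
    """Fill a new Boolean attribute for every item from a few crowd-sourced
    labels, using a nearest-centroid rule in the perceptual space."""
    pos = _centroid([space[i] for i, y in labels.items() if y])
    neg = _centroid([space[i] for i, y in labels.items() if not y])

    def dist2(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Crowd labels are kept as-is; every other item gets the nearer class.
    return {i: labels[i] if i in labels else dist2(v, pos) < dist2(v, neg)
            for i, v in space.items()}


def toy_ratings() -> List[Rating]:
    """Hypothetical data: users u0-u3 like comedies (c*), u4-u7 like dramas (d*)."""
    ratings = []
    for u in range(8):
        fan_of_comedy = u < 4
        for item in ["c1", "c2", "c3", "c4", "d1", "d2", "d3", "d4"]:
            liked = item.startswith("c") == fan_of_comedy
            ratings.append((f"u{u}", item, 5.0 if liked else 1.0))
    return ratings


if __name__ == "__main__":
    space = build_perceptual_space(toy_ratings())
    # Two expert labels stand in for the crowd-sourced training samples.
    is_comedy = expand_schema(space, {"c1": True, "d1": False})
    print(sorted(i for i, flag in is_comedy.items() if flag))
```

The point of the design is cost: the rating matrix is mined once, and each new attribute then needs only a few labeled items rather than one crowd task per tuple. A real system would likely use far more dimensions and a stronger classifier than a nearest-centroid rule.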



Citations
Proceedings Article

Pick-a-crowd: tell me what you like, and I'll tell you what to do

TL;DR: This paper proposes and extensively evaluates a different crowdsourcing approach based on a push methodology, which selects the workers who should perform a given task using worker profiles extracted from social networks, and shows that this approach consistently yields better results than the usual pull strategies.
Proceedings Article

Using the crowd for top-k and group-by queries

TL;DR: The problem of evaluating top-k and group-by queries using the crowd to answer either type or value questions is studied, and efficient algorithms that are guaranteed to achieve good results with high probability are given.
Journal Article

Large-scale linked data integration using probabilistic reasoning and crowdsourcing

TL;DR: The ZenCrowd system uses a three-stage blocking technique in order to obtain the best possible instance matches while minimizing both computational complexity and latency, and identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the linked open data cloud.
Proceedings Article

Skyline queries in crowd-enabled databases

TL;DR: It is shown that, by assessing the individual risk a tuple poses with respect to the overall result quality, crowd-sourcing efforts for eliciting missing values can be focused on only those tuples that may degrade the expected quality most strongly, which leads to an algorithm for computing skyline sets on incomplete data with maximum result quality.
Proceedings Article

An online cost sensitive decision-making method in crowdsourcing systems

TL;DR: A cost-sensitive quantitative analysis method is proposed to estimate the profit of a crowdsourcing job, so that questions with no future profit from crowdsourcing can be terminated; the experimental results show that the proposed method outperforms all the state-of-the-art methods.
References

Likability-Based Genres: Analysis and Evaluation of the Netflix Dataset

TL;DR: A new approach is presented that defines genre based on likability ratings rather than on features of the content itself, and evidence is given that likability-based features can predict human-annotated genre labels more successfully than content-based features for the same data.