Pushing the boundaries of crowd-enabled databases with query-driven schema expansion

doi:10.14778/2168651.2168655

Open Access, Journal Article

Joachim Selke, +2 more

Vol. 5, Iss. 6, pp. 538-549

TLDR

This paper extends crowd-enabled databases with flexible query-driven schema expansion, which allows new attributes to be added to the database at query time, and leverages the user-generated data found in the Social Web to build perceptual spaces.

Abstract:

By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities such as completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, we extend crowd-enabled databases with flexible query-driven schema expansion, allowing the addition of new attributes to the database at query time. However, the number of crowd-sourced mini-tasks needed to fill in missing values may often be prohibitively large, and the resulting data quality is doubtful. Instead of simple crowd-sourcing to obtain all values individually, we leverage the user-generated data found in the Social Web: by exploiting user ratings, we build perceptual spaces, i.e., highly compressed representations of the opinions, impressions, and perceptions of large numbers of users. Using a few training samples obtained by expert crowd-sourcing, we can then extract all missing data automatically from the perceptual space with high quality and at low cost. Extensive experiments show that our approach can boost both the performance and the quality of crowd-enabled databases, while also providing the flexibility to expand schemas in a query-driven fashion.
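The pipeline described in the abstract (ratings, then a perceptual space, then a few labeled samples, then values for every item) can be illustrated with a minimal sketch. The abstract does not say which factorization or learner is used, so this sketch makes its own assumptions: a truncated SVD stands in for the perceptual space and ridge regression stands in for the extraction step. The function names, the toy rating matrix, and the "funny" attribute are all hypothetical.

```python
import numpy as np


def build_perceptual_space(ratings: np.ndarray, dims: int) -> np.ndarray:
    """Compress an items-by-users rating matrix into `dims` coordinates per item.

    Assumption: truncated SVD is used here only as a stand-in factor model.
    """
    # Center each item's ratings so the factors capture taste differences
    # rather than how highly an item is rated overall.
    centered = ratings - ratings.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dims] * s[:dims]


def expand_attribute(space, labeled_idx, labels, ridge=1e-3):
    """Fit ridge regression on a few labeled items, then predict all items.

    Assumption: ridge regression is a simple substitute for whatever learner
    a real system would train on the expert-provided samples.
    """
    # Append a constant column so the model can learn an offset.
    features = np.hstack([space, np.ones((space.shape[0], 1))])
    train = features[labeled_idx]
    gram = train.T @ train + ridge * np.eye(features.shape[1])
    weights = np.linalg.solve(gram, train.T @ np.asarray(labels, dtype=float))
    return features @ weights


# Hypothetical toy data: 6 movies rated by 4 users on a 1-5 scale.
ratings = np.array([
    [5, 5, 1, 1],
    [5, 4, 2, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 5],
    [1, 2, 4, 5],
    [2, 1, 5, 4],
], dtype=float)

space = build_perceptual_space(ratings, dims=2)

# A query asks for a new attribute "funny"; only movies 0 and 3 get
# crowd-sourced labels, and the remaining values are predicted.
funny = expand_attribute(space, labeled_idx=[0, 3], labels=[1.0, 0.0])
print(np.round(funny, 2))
```

In this toy setup, two crowd-sourced labels are enough to score all six items, which is the cost argument the abstract makes: the number of human tasks grows with the number of training samples, not with the number of rows.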

Citations

Patent

System and method for implementing an artificially intelligent virtual assistant using machine learning

Jason Mars, +4 more

TL;DR: This patent describes a slot identification machine learning model that segments the text of a query and labels each of its slots, generates a slot value for each slot, uses the slot values to identify an external data source relevant to the user query, fetches user data from that source, and applies one or more operations to the query to generate response data.

Proceedings Article

CrowdSeed: query processing on microblogs

Zhou Zhao, +2 more

TL;DR: CrowdSeed is a system that automatically integrates human input for processing queries imposed on microblogs; the paper demonstrates the effectiveness and efficiency of the system using real-world data and presents interesting results from a game called "Who is in the CrowdSeed?".

Closing Information Gaps with Need-driven Knowledge Sharing

Hans-Jörg Happel

TL;DR: This work describes a novel approach called need-driven knowledge sharing (NKS), which consists of three elements, including indicators of information need, which are aggregated to derive continuous forecasts of organizational information needs, and inverse Search, a tool that identifies documents in the private information space of information providers that may help close organizational information gaps if moved to a shared information space.

Book Chapter

Towards Mobile Sensor-Aware Crowdsourcing: Architecture, Opportunities and Challenges

Jiyin He, +4 more

TL;DR: Switching to mobile clients has the potential to radically change the way crowdsourcing is performed, and allows for a new breed of crowdsourcing tasks.

Proceedings Article

Just ask a human? - Controlling Quality in Relational Similarity and Analogy Processing using the Crowd.

Christoph Lofi

TL;DR: This paper employs human workers via crowd-sourcing to establish a performance baseline and develops novel techniques that respect the intrinsically consensual nature of the task at hand, paving the way for true hybrid systems in which human workers and heuristic algorithms combine their individual strengths.

References

Journal Article

Indexing by Latent Semantic Analysis

Scott Deerwester, +4 more

- 01 Sep 1990 -

Journal of the Association for Informati...

TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.

Journal Article

A tutorial on support vector regression

Alexander J. Smola, +1 more

- 01 Aug 2004 -

Statistics and Computing

TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.

Journal Article

Learning from Imbalanced Data

Haibo He, +1 more

- 01 Sep 2009 -

IEEE Transactions on Knowledge and Data ...

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.

Proceedings Article

Support Vector Regression Machines

Harris Drucker, +4 more

TL;DR: This work compares support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and with ridge regression done in feature space, and expects that SVR will have advantages in high-dimensional spaces because SVR optimization does not depend on the dimensionality of the input space.

Book

Semi-Supervised Learning

Olivier Chapelle, +2 more

TL;DR: Semi-supervised learning (SSL) is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given).

Related Papers (5)

CrowdDB: answering queries with crowdsourcing

CrowdER: crowdsourcing entity resolution

Human-powered sorts and joins

Crowdsourcing systems on the World-Wide Web

CDAS: a crowdsourcing data analytics system