Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6,513 publications have been published on this topic, receiving 146,057 citations. The topic is also known as: tuple & ordered tuplet.


Papers
Journal ArticleDOI
01 Nov 2014
TL;DR: Empirical evaluation shows that QuickFOIL can scale to large datasets consisting of hundreds of millions of tuples, and is often more than an order of magnitude more efficient than other existing approaches.
Abstract: Inductive Logic Programming (ILP) is a classic machine learning technique that learns first-order rules from relational-structured data. However, to date most ILP systems can only be applied to small datasets (tens of thousands of examples). A long-standing challenge in the field is to scale ILP methods to larger datasets. This paper presents a method called QuickFOIL that addresses this limitation. QuickFOIL employs a new scoring function and a novel pruning strategy that enable the algorithm to find high-quality rules. QuickFOIL can also be implemented as an in-RDBMS algorithm. Such an implementation presents a host of query processing and optimization challenges that we address in this paper. Our empirical evaluation shows that QuickFOIL can scale to large datasets consisting of hundreds of millions of tuples, and is often more than an order of magnitude more efficient than other existing approaches.

71 citations
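
For context on what a rule "scoring function" does in FOIL-style learners, the sketch below computes the classic FOIL information gain used to rank candidate literals. This is an assumption for illustration only: the abstract above says QuickFOIL uses a new scoring function and does not specify it, so this is not QuickFOIL's score. The function name and parameters are hypothetical.

```python
import math


def foil_gain(p0, n0, p1, n1, t):
    """Classic FOIL gain for adding one literal to a rule (illustrative only).

    p0, n0: positive / negative tuples covered before adding the literal.
    p1, n1: positive / negative tuples covered after adding the literal.
    t:      positive tuples covered before that are still covered after.
    """
    if p0 <= 0:
        raise ValueError("the current rule must cover at least one positive tuple")
    if p1 <= 0 or t <= 0:
        # Simplification: a literal that keeps no positives is scored as zero.
        return 0.0
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)


if __name__ == "__main__":
    # A literal that keeps 3 of 4 positives and removes all 4 negatives.
    print(foil_gain(p0=4, n0=4, p1=3, n1=0, t=3))
```

A literal scores higher when it keeps many positive tuples while raising the fraction of covered tuples that are positive.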

Posted Content
TL;DR: In this article, a co-attention model is proposed to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine.
Abstract: One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and counting, to segmentation and reconstruction. To train a method to perform even one of these operations accurately from {image,question,answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. We propose here instead a more general and scalable approach which exploits the fact that very good methods to achieve these operations already exist, and thus do not need to be trained. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decision, and can still be trained end-to-end without ground truth reasons being given. We demonstrate the effectiveness on two publicly available datasets, Visual Genome and VQA, and show that it produces the state-of-the-art results in both cases.

71 citations

Proceedings Article
01 Jan 2005
TL;DR: In this paper, the authors propose to continuously adapt the database organization by making reorganization an integral part of the query evaluation process, such that both the required subset is easily retrieved and subsequent queries may benefit from the new partitioning structure.
Abstract: Query performance strongly depends on finding an execution plan that touches as few superfluous tuples as possible. The access structures deployed for this purpose, however, are non-discriminative. They assume every subset of the domain being indexed is equally important, and their structures cause a high maintenance overhead during updates. This approach often fails in decision support or scientific environments where index selection represents a weak compromise amongst many plausible plans. An alternative route, explored here, is to continuously adapt the database organization by making reorganization an integral part of the query evaluation process. Every query is first analyzed for its contribution to breaking the database into multiple pieces, such that both the required subset is easily retrieved and subsequent queries may benefit from the new partitioning structure. To study the potential of this approach, we developed a small representative multi-query benchmark and ran experiments against several open-source DBMSs. The results obtained indicate a significant reduction in system complexity with clear performance benefits.

71 citations
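
The idea in the abstract above, where each query reorganizes the data as a side effect of being answered, can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the authors' implementation: the class `CrackedColumn`, the helper `crack_in_two`, and the restriction to half-open range queries on a single column are all hypothetical choices made for the example.

```python
import bisect


def crack_in_two(column, lo, hi, pivot):
    """Partition column[lo:hi] in place so values < pivot come first.

    Returns the index of the first element >= pivot within [lo, hi].
    """
    i, j = lo, hi - 1
    while i <= j:
        if column[i] < pivot:
            i += 1
        else:
            column[i], column[j] = column[j], column[i]
            j -= 1
    return i


class CrackedColumn:
    """A column that is partitioned a little more by every range query."""

    def __init__(self, values):
        self.values = list(values)
        self.pivots = []     # sorted pivot values seen so far
        self.positions = []  # positions[k]: index of first value >= pivots[k]

    def _crack(self, pivot):
        k = bisect.bisect_left(self.pivots, pivot)
        if k < len(self.pivots) and self.pivots[k] == pivot:
            return self.positions[k]  # piece boundary already exists
        # Only the single piece that contains the pivot is touched.
        lo = self.positions[k - 1] if k > 0 else 0
        hi = self.positions[k] if k < len(self.pivots) else len(self.values)
        split = crack_in_two(self.values, lo, hi, pivot)
        self.pivots.insert(k, pivot)
        self.positions.insert(k, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


if __name__ == "__main__":
    col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
    print(sorted(col.range_query(5, 13)))
    print(col.pivots)
```

In this sketch the first query pays for partitioning the whole column, while later queries only repartition the piece that contains a new boundary, which is the sense in which subsequent queries benefit from earlier ones.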

Proceedings ArticleDOI
09 May 2017
TL;DR: A crowd-powered database system, CDB, is developed that supports crowd-based query optimizations, with a focus on join and selection, and adopts a unified framework to perform multi-goal optimization based on a graph model.
Abstract: Crowdsourcing database systems have been proposed to leverage crowd-powered operations to encapsulate the complexities of interacting with the crowd. Existing systems suffer from two major limitations. Firstly, in order to optimize a query, they often adopt the traditional tree model to select an optimized table-level join order. However, the tree model provides a coarse-grained optimization, which generates the same order for different joined tuples and forgoes the potential to optimize different joined tuples with different orders. Secondly, they mainly focus on optimizing the monetary cost. In fact, there are three optimization goals (i.e., smaller monetary cost, lower latency, and higher quality) in crowdsourcing, which calls for a system that enables multi-goal optimization. To address the limitations, we develop a crowd-powered database system CDB that supports crowd-based query optimizations, with a focus on join and selection. CDB has fundamental differences from existing systems. First, CDB employs a graph-based query model that provides more fine-grained query optimization. Second, CDB adopts a unified framework to perform the multi-goal optimization based on the graph model. We have implemented our system and deployed it on AMT, CrowdFlower and ChinaCrowd. We have also created a benchmark for evaluating crowd-powered databases. We have conducted both simulated and real experiments, and the experimental results demonstrate the performance superiority of CDB on cost, latency and quality.

71 citations

Patent
15 Sep 2009
TL;DR: The authors convert data from atomic tuples found in data sources such as spreadsheets (e.g., raw numbers, words, and formatted dates) into semantically enriched schemas and associated tuples.
Abstract: Embodiments of the invention convert data from atomic tuples found in data sources such as spreadsheets (e.g., raw numbers, words, and formatted dates) into semantically enriched schemas and associated tuples. In addition to the data content, visual content, such as font and background color, is also analyzed as a part of the interpretation process. Embodiments of the invention also provide methods of interacting with the raw data via the semantically enriched schema tuples.

71 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Time complexity: 36K papers, 879.5K citations, 85% related
Server: 79.5K papers, 1.4M citations, 83% related
Scalability: 50.9K papers, 931.6K citations, 83% related
Polynomial: 52.6K papers, 853.1K citations, 81% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    203
2022    459
2021    210
2020    285
2019    306
2018    266