Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6,513 publications have been published within this topic, receiving 146,057 citations. The topic is also known as: tuple and ordered tuplet.


Papers
Patent
21 Oct 2010
TL;DR: In this article, an apparatus and method for packet classification using a Bloom filter is described. But the method is not suitable for the classification of large data sets, as it requires a large number of tuples to search for the best matching rules.
Abstract: The present disclosure provides an apparatus and method for packet classification using a Bloom filter. The method includes determining, by a field-by-field search on the fields of an input packet, the length over which each field value matches the value of the corresponding field stored in a rule set; generating a tuple list made up of combinations of the matching lengths for the respective fields; selecting the tuples from the tuple list that exist in the rule set; filtering each of the selected tuples using the Bloom filter; and searching for the best matching rule exclusively within the tuples that pass the filter. According to the present disclosure, the number of tuples to search can be substantially reduced, which improves search performance.

33 citations
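
The steps listed in the abstract above (matching lengths per field, tuple list, Bloom filter pre-check, search within the surviving tuples) can be sketched roughly as follows. This is only a minimal illustration, not the patented apparatus: the two-field rules, the bit-string prefixes, the `BloomFilter` class, and the helper names `build`, `matching_lengths`, and `classify` are all assumptions made for the example.

```python
import hashlib
from itertools import product


class BloomFilter:
    """Tiny Bloom filter (hypothetical sizing, for illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def build(rules):
    """Group rules into one hash table per tuple (pair of prefix lengths)."""
    tables, bloom = {}, BloomFilter()
    for src, dst, priority, name in rules:
        tup = (len(src), len(dst))
        tables.setdefault(tup, {})[(src, dst)] = (priority, name)
        bloom.add((tup, src, dst))
    return tables, bloom


def matching_lengths(field_bits, prefixes):
    """Prefix lengths in the rule set that match this field of the packet."""
    return sorted({len(p) for p in prefixes if field_bits.startswith(p)})


def classify(packet, rules, tables, bloom):
    src_bits, dst_bits = packet
    src_lens = matching_lengths(src_bits, [r[0] for r in rules])
    dst_lens = matching_lengths(dst_bits, [r[1] for r in rules])
    best = None
    for tup in product(src_lens, dst_lens):      # candidate tuple list
        if tup not in tables:                    # tuple absent from the rule set
            continue
        key = (src_bits[:tup[0]], dst_bits[:tup[1]])
        if not bloom.might_contain((tup, *key)): # Bloom filter pre-check
            continue
        hit = tables[tup].get(key)               # search only surviving tuples
        if hit and (best is None or hit[0] > best[0]):
            best = hit
    return best[1] if best else None
```

In this sketch a higher `priority` value wins; a real classifier would define "best matching rule" according to its own rule semantics.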

Proceedings Article
06 Jun 2010
TL;DR: This paper addresses the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties, and develops optimization techniques to process a batch of lineages by sharing computation across formulas, and to exploit any independence relationships that may exist in the data.
Abstract: In this paper, we address the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties. Like previous work, we adopt a two-phase approach where we first compute lineages of the output tuples, and then compute the probabilities of the lineage formulas. However, unlike previous work, we allow for arbitrary and complex correlations to be present in the data, captured via a forest of junction trees. We observe that evaluating even read-once (tree structured) lineages (e.g., those generated by hierarchical conjunctive queries), polynomially computable over tuple independent probabilistic databases, is #P-complete for lightly correlated probabilistic databases like Markov sequences. We characterize the complexity of exact computation of the probability of the lineage formula on a correlated database using a parameter called lwidth (analogous to the notion of treewidth). For lineages that result in low lwidth, we compute exact probabilities using a novel message passing algorithm, and for lineages that induce large lwidths, we develop approximate Monte Carlo algorithms to estimate the result probabilities. We scale our algorithms to very large correlated probabilistic databases using the previously proposed INDSEP data structure. To mitigate the complexity of lineage evaluation, we develop optimization techniques to process a batch of lineages by sharing computation across formulas, and to exploit any independence relationships that may exist in the data. Our experimental study illustrates the benefits of using our algorithms for processing lineage formulas over correlated probabilistic databases.

33 citations
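
To make the second phase of the two-phase approach above concrete (computing the probability of a lineage formula), here is a small sketch for the simpler tuple-independent case only. It does not implement the paper's junction-tree message passing or its handling of correlations; the DNF encoding of the lineage and the function names are assumptions made for illustration.

```python
import itertools
import random


def lineage_holds(clauses, world):
    """Lineage in DNF: a list of clauses, each a list of tuple variables."""
    return any(all(world[v] for v in clause) for clause in clauses)


def exact_probability(clauses, probs):
    """Sum the weights of all possible worlds in which the lineage is true.

    Assumes independent tuples; exponential in the number of variables.
    """
    names = sorted(probs)
    total = 0.0
    for values in itertools.product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if lineage_holds(clauses, world):
            weight = 1.0
            for n in names:
                weight *= probs[n] if world[n] else 1.0 - probs[n]
            total += weight
    return total


def monte_carlo_probability(clauses, probs, samples=20000, seed=0):
    """Estimate the same probability by sampling possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {n: rng.random() < p for n, p in probs.items()}
        hits += lineage_holds(clauses, world)
    return hits / samples
```

For the lineage (a AND b) OR (a AND c) with P(a) = 0.5, P(b) = 0.4, P(c) = 0.5, the exact value is 0.5 * (1 - 0.6 * 0.5) = 0.35, and the sampled estimate should land close to it.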

Proceedings Article
01 Apr 2017
TL;DR: A joint solution with a neural sequence model is proposed, and it is shown that it outperforms the pipeline in a cross-lingual open information extraction setting by 1-4 BLEU and 0.5-0.8 F1.
Abstract: Cross-lingual information extraction is the task of distilling facts from foreign language (e.g. Chinese text) into representations in another language that is preferred by the user (e.g. English tuples). Conventional pipeline solutions decompose the task as machine translation followed by information extraction (or vice versa). We propose a joint solution with a neural sequence model, and show that it outperforms the pipeline in a cross-lingual open information extraction setting by 1-4 BLEU and 0.5-0.8 F1.

33 citations

Journal Article
TL;DR: A new partitioning strategy, multiattribute grid declustering (MAGIC), which can use two or more attributes of a relation to decluster its tuples across multiple processors and disks, unlike other multiattribute partitioning mechanisms that have been proposed.
Abstract: During the past decade, parallel database systems have gained increased popularity due to their high performance, scalability, and availability characteristics. With the predicted future database sizes and complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. Several studies have repeatedly demonstrated that both the performance and scalability of a parallel database system are contingent on the physical layout of the data across the processors of the system. If the data are not declustered appropriately, the execution of an operation might waste system resources, reducing the overall processing capability of the system. With earlier, single-attribute partitioning mechanisms such as those found in the Tandem, Teradata, Gamma, and Bubba parallel database systems, range selections on any attribute other than the partitioning attribute must be sent to all processors containing tuples of the relation, while range selections on the partitioning attribute can be directed to only a subset of the processors. Although using all the processors for an operation is reasonable for resource intensive operations, directing a query with minimal resource requirements to processors that contain no relevant tuples wastes CPU cycles, communication bandwidth, and I/O bandwidth. As a solution, this paper describes a new partitioning strategy, multiattribute grid declustering (MAGIC), which can use two or more attributes of a relation to decluster its tuples across multiple processors and disks. In addition, MAGIC declustering, unlike other multiattribute partitioning mechanisms that have been proposed, is able to support range selections as well as exact match selections on each of the partitioning attributes. This capability enables a greater variety of selection operations to be directed to a restricted subset of the processors in the system. 
Finally, MAGIC partitions each relation based on the resource requirements of the queries that constitute the workload for the relation and the processing capacity of the system in order to ensure that the proper number of processors are used to execute queries that reference the relation.

33 citations
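
The idea of declustering on two attributes, so that a range or exact-match selection on either attribute reaches only a subset of processors, can be illustrated with a toy grid directory. This is a sketch under stated assumptions: the split points, the `GridDirectory` class, and the block-wise cell-to-processor assignment are hypothetical and are not MAGIC's workload-driven assignment.

```python
from bisect import bisect_right


class GridDirectory:
    """Two-attribute grid; each cell is mapped to one processor."""

    def __init__(self, a_bounds, b_bounds, assign):
        self.a_bounds = a_bounds      # sorted split points on attribute A
        self.b_bounds = b_bounds      # sorted split points on attribute B
        self.assign = assign          # (row, col) -> processor id

    def processor_for(self, a, b):
        """Processor that stores the tuple with attribute values (a, b)."""
        row = bisect_right(self.a_bounds, a)
        col = bisect_right(self.b_bounds, b)
        return self.assign(row, col)

    def processors_for_range(self, a_range=None, b_range=None):
        """Processors to contact for an inclusive range on A and/or B."""
        rows = self._slices(self.a_bounds, a_range)
        cols = self._slices(self.b_bounds, b_range)
        return {self.assign(r, c) for r in rows for c in cols}

    @staticmethod
    def _slices(bounds, value_range):
        if value_range is None:       # no predicate: every slice qualifies
            return range(len(bounds) + 1)
        lo, hi = value_range
        return range(bisect_right(bounds, lo), bisect_right(bounds, hi) + 1)


def block_assign(row, col):
    # Hypothetical assignment: a 4x4 grid folded onto 4 processors in 2x2 blocks.
    return (row // 2) * 2 + (col // 2)


directory = GridDirectory([25, 50, 75], [25, 50, 75], block_assign)
```

With this layout, `directory.processors_for_range(a_range=(0, 20))` contacts two of the four processors, and so does a range on the second attribute alone, whereas a single-attribute partitioning would have to broadcast the latter to all processors.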

Proceedings Article
11 Apr 2016
TL;DR: This work introduces the list recommendation problem and proposes a novel two-layered framework that builds upon existing CF algorithms to optimize a list's click probability and evaluates the approach using a novel adaptation of Inverse Propensity Scoring which facilitates off-policy estimation of the method's CTR and showcases its effectiveness in real-world settings.
Abstract: Most Collaborative Filtering (CF) algorithms are optimized using a dataset of isolated user-item tuples. However, in commercial applications recommended items are usually served as an ordered list of several items and not as isolated items. In this setting, inter-item interactions have an effect on the list's Click-Through Rate (CTR) that is unaccounted for using traditional CF approaches. Most CF approaches also ignore additional important factors like click propensity variation, item fatigue, etc. In this work, we introduce the list recommendation problem. We present useful insights gleaned from user behavior and consumption patterns from a large scale real world recommender system. We then propose a novel two-layered framework that builds upon existing CF algorithms to optimize a list's click probability. Our approach accounts for inter-item interactions as well as additional information such as item fatigue, trendiness patterns, contextual information etc. Finally, we evaluate our approach using a novel adaptation of Inverse Propensity Scoring (IPS) which facilitates off-policy estimation of our method's CTR and showcases its effectiveness in real-world settings.

33 citations
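
For readers unfamiliar with the off-policy evaluation mentioned in the abstract above, the textbook Inverse Propensity Scoring estimator can be sketched as below. This is the standard estimator, not the paper's adaptation of it; the log format and the names `ips_estimate` and `always_show` are assumptions made for the example.

```python
def ips_estimate(logs, target_prob):
    """Standard IPS estimate of a target policy's click-through rate.

    logs: iterable of (context, action, click, logging_prob), where
          logging_prob is the probability with which the logging policy
          chose `action` in `context` (must be > 0).
    target_prob: function (context, action) -> probability that the
          target policy would choose `action` in `context`.
    """
    logs = list(logs)
    total = 0.0
    for context, action, click, logging_prob in logs:
        # Reweight each logged click by how much more (or less) likely
        # the target policy is to take the logged action.
        total += click * target_prob(context, action) / logging_prob
    return total / len(logs)


def always_show(item):
    """Hypothetical deterministic target policy that always shows `item`."""
    return lambda context, action: 1.0 if action == item else 0.0


# Logged by a uniform-random policy over two items, A and B.
logs = [
    ("u1", "A", 1, 0.5),
    ("u1", "B", 0, 0.5),
    ("u2", "A", 1, 0.5),
    ("u2", "B", 1, 0.5),
]
```

Here `ips_estimate(logs, always_show("A"))` gives 1.0 and `ips_estimate(logs, always_show("B"))` gives 0.5, matching the click rates each item actually received in the log.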


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Time complexity
36K papers, 879.5K citations
85% related
Server
79.5K papers, 1.4M citations
83% related
Scalability
50.9K papers, 931.6K citations
83% related
Polynomial
52.6K papers, 853.1K citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    203
2022    459
2021    210
2020    285
2019    306
2018    266