Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6,513 publications have been published within this topic, receiving 146,057 citations. The topic is also known as: tuple & ordered tuplet.
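For concreteness, here is a minimal sketch of the tuple concept in Python (an illustration only; the papers below use tuples in relational, functional-programming, and streaming settings):

# A tuple is an ordered, fixed-size grouping of values, possibly of mixed types.
point = (3, 4)                   # a 2-tuple (ordered pair)
row = ("Alice", 1984, 72.5)      # a 3-tuple mixing str, int, and float

# Tuples are destructured by position.
name, year, weight = row
x, y = point
print(name, year, x + y)         # -> Alice 1984 7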


Papers
Book
01 Jun 1994
TL;DR: This book offers a perspective on ML and SML/NJ, covers programming with datatypes, and provides solutions to selected exercises.
Abstract: Contents: 0. A Perspective on ML and SML/NJ.
I. INTRODUCTION TO PROGRAMMING IN ML: 1. Expressions. 2. Type Consistency. 3. Variables and Environments. 4. Tuples and Lists. 5. It's Easy; It's "fun". 6. Patterns in Function Definitions. 7. Local Environments Using "let". 8. Exceptions. 9. Side Effects: Input and Output.
II. ADVANCED FEATURES OF ML: 10. Polymorphic Functions. 11. Higher-Order Functions. 12. Defining New Types. 13. Programming with Datatypes. 14. The ML Module System. 15. Software Design Using Modules. 16. Arrays. 17. References.
III. ADDITIONAL DETAILS AND FEATURES: 18. Record Structures. 19. Matches and Patterns. 20. More About Exceptions. 21. Counting with Functions as Values. 22. More About Input and Output. 23. Creating Executable Files. 24. Controlling Operator Grouping. 25. Built-In Functions of SML/NJ. 26. Summary of ML Syntax.
Solutions to Selected Exercises. Index.

206 citations

Journal Article
TL;DR: This paper presents an attribute clustering method which is able to group genes based on their interdependence so as to mine meaningful patterns from the gene expression data, and introduces a methodology to solving it.
Abstract: This paper presents an attribute clustering method that groups genes based on their interdependence so as to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. The partitioning of a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. The reduction of search dimension is especially important to data mining in gene expression data because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale to the number of tuples instead of the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case the likelihood of reporting patterns that are actually irrelevant due to chance becomes rather high. It is for the aforementioned reasons that gene grouping and selection are important preprocessing steps for many data mining algorithms to be effective when applied to gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, meaningful clusters of genes are discovered. The grouping of genes based on attribute interdependence within each group helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification. To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes that have high multiple interdependence with others within clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From the pool, gene expressions of different categories can be identified.
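A rough sketch of the idea in Python, assuming the attribute columns are already discretized and using normalized mutual information as a stand-in for the paper's interdependence measure (the function names and the k-modes-style loop are illustrative choices, not the authors' algorithm):

import math
import random
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # I(X;Y) for two equally long sequences of discrete values.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def interdependence(xs, ys):
    # Mutual information normalized by joint entropy (one common normalization).
    h = entropy(list(zip(xs, ys)))
    return mutual_information(xs, ys) / h if h > 0 else 0.0

def cluster_attributes(columns, k, iterations=20, seed=0):
    # k-modes-style grouping: each cluster is represented by a "mode" attribute,
    # and every attribute joins the mode it is most interdependent with.
    rng = random.Random(seed)
    modes = rng.sample(range(len(columns)), k)
    for _ in range(iterations):
        clusters = {m: [] for m in modes}
        for a in range(len(columns)):
            best = max(modes, key=lambda m: interdependence(columns[a], columns[m]))
            clusters[best].append(a)
        new_modes = []
        for m, members in clusters.items():
            if not members:                  # keep an emptied cluster's old mode
                new_modes.append(m)
                continue
            # The new mode is the member most interdependent with the rest.
            new_modes.append(max(members, key=lambda a: sum(
                interdependence(columns[a], columns[b]) for b in members if b != a)))
        if set(new_modes) == set(modes):
            break
        modes = new_modes
    return clusters

Significant attributes would then be picked per cluster (for instance, the modes themselves) before feeding a classifier, in the spirit of the gene-selection step the abstract describes.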

205 citations

Book Chapter
31 Aug 2004
TL;DR: It is shown formally that neither approximation can be addressed effectively for a sliding-window join of arbitrary input streams, and a broad class of applications for which an age-based model of stream arrival is more appropriate is pointed out.
Abstract: We address the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, e.g., if the output of the join is being aggregated. We show formally that neither approximation can be addressed effectively for a sliding-window join of arbitrary input streams. Previous work has addressed only the maximum-subset problem, and has implicitly used a frequency-based model of stream arrival. We address the sampling problem for this model. More importantly, we point out a broad class of applications for which an age-based model of stream arrival is more appropriate, and we address both approximation scenarios under this new model. Finally, for the case of multiple joins being executed with an overall memory constraint, we provide an algorithm for memory allocation across the joins that optimizes a combined measure of approximation in all scenarios considered. All of our algorithms are implemented and experimental results demonstrate their effectiveness.
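A minimal sketch of the setting in Python: a symmetric sliding-window join whose per-side state is capped, with random eviction standing in for an approximation policy (this illustrates the problem the paper studies, not its frequency- or age-based algorithms; the class and field names are made up for the example):

import random
from collections import deque

class SlidingWindowJoin:
    # Joins two streams of (timestamp, key, payload) tuples on key, keeping only
    # tuples inside the sliding window and at most max_state tuples per side.
    def __init__(self, window, max_state, seed=0):
        self.window = window
        self.max_state = max_state
        self.state = {0: deque(), 1: deque()}
        self.rng = random.Random(seed)

    def _expire(self, side, now):
        buf = self.state[side]
        while buf and buf[0][0] <= now - self.window:
            buf.popleft()

    def insert(self, side, tup):
        # Insert a tuple arriving on stream `side` (0 or 1); return the matches
        # it produces against the retained state of the other stream.
        now, key, _ = tup
        self._expire(0, now)
        self._expire(1, now)
        matches = [(tup, other) for other in self.state[1 - side] if other[1] == key]
        buf = self.state[side]
        buf.append(tup)
        if len(buf) > self.max_state:
            # Under memory pressure something must be dropped. Random eviction is
            # the simplest policy; the paper analyzes when a maximum subset or a
            # uniform sample of the join result is (and is not) achievable.
            del buf[self.rng.randrange(len(buf))]
        return matches

A caller would construct, say, SlidingWindowJoin(window=60, max_state=1000) and feed each arrival to insert(), collecting whatever result tuples the bounded state can still produce.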

204 citations

Proceedings Article
05 Apr 2005
TL;DR: This work proposes two novel criteria that enable characterization of fuzzy duplicates more accurately than is possible with existing techniques, and proposes a novel framework for the fuzzy duplicate elimination problem.
Abstract: Detecting and eliminating fuzzy duplicates is a critical data cleaning task that is required by many applications. Fuzzy duplicates are multiple seemingly distinct tuples, which represent the same real-world entity. We propose two novel criteria that enable characterization of fuzzy duplicates more accurately than is possible with existing techniques. Using these criteria, we propose a novel framework for the fuzzy duplicate elimination problem. We show that solutions within the new framework result in better accuracy than earlier approaches. We present an efficient algorithm for solving instantiations within the framework. We evaluate it on real datasets to demonstrate the accuracy and scalability of our algorithm.
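As a baseline illustration only (not the paper's two criteria or its elimination framework), the sketch below links tuples whose string similarity crosses a threshold and reports connected components as candidate duplicate groups; the similarity function and threshold are arbitrary choices:

from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    # Rough string similarity in [0, 1] over the concatenated fields.
    return SequenceMatcher(None, " ".join(a).lower(), " ".join(b).lower()).ratio()

def duplicate_groups(tuples, threshold=0.85):
    # Union-find over tuples; an edge links any pair above the threshold.
    parent = list(range(len(tuples)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)
    for i, j in combinations(range(len(tuples)), 2):
        if similarity(tuples[i], tuples[j]) >= threshold:
            union(i, j)
    groups = {}
    for i in range(len(tuples)):
        groups.setdefault(find(i), []).append(tuples[i])
    return [g for g in groups.values() if len(g) > 1]

# Two spellings of the same customer end up in one candidate group.
rows = [("Jon Smith", "12 Oak St"), ("John Smith", "12 Oak Street"), ("Ann Lee", "5 Elm Rd")]
print(duplicate_groups(rows, threshold=0.7))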

203 citations

Proceedings Article
03 Apr 2006
TL;DR: This work rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database, and experimentally study the performance of the rewritten queries.
Abstract: The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on ad-hoc and often manual solutions. We propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database. We rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database. Our rewritten queries are sensitive to the semantics of duplication and help a user understand which query answers are most likely to be present in the clean database. The semantics that we adopt is independent of the way the probabilities are produced, but is able to effectively exploit them during query answering. In the absence of external knowledge that associates each database tuple with a probability, we offer a technique, based on tuple summaries, that automates this task. We experimentally study the performance of our rewritten queries. Our studies show that the rewriting does not introduce a significant overhead in query execution time. This work is done in the context of the ConQuer project at the University of Toronto, which focuses on the efficient management of inconsistent and dirty databases.
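One way to read the semantics, sketched in Python under the assumption that each duplicate cluster contributes exactly one tuple to the clean database and that per-tuple probabilities within a cluster sum to 1 (this illustrates the probability calculation for a simple selection query, not ConQuer's SQL rewriting):

from collections import defaultdict

def probabilistic_select(clusters, predicate, answer_of):
    # clusters: list of [(tuple, probability), ...], one list per duplicate cluster.
    # Returns {answer: probability that the answer is in the clean query result},
    # treating clusters as independent choices of one clean tuple each:
    # P(answer) = 1 - product over clusters of (1 - P(cluster's clean tuple yields answer)).
    miss = defaultdict(lambda: 1.0)
    for cluster in clusters:
        per_answer = defaultdict(float)
        for tup, p in cluster:
            if predicate(tup):
                per_answer[answer_of(tup)] += p
        for ans, p in per_answer.items():
            miss[ans] *= (1.0 - p)
    return {ans: 1.0 - m for ans, m in miss.items()}

# Example: two conflicting records for customer c1, one clean record for c2.
clusters = [
    [(("c1", "Toronto", 500), 0.7), (("c1", "Ottawa", 500), 0.3)],
    [(("c2", "Toronto", 900), 1.0)],
]
result = probabilistic_select(
    clusters,
    predicate=lambda t: t[2] > 400,   # WHERE balance > 400
    answer_of=lambda t: t[1],         # SELECT city
)
print({ans: round(p, 3) for ans, p in result.items()})   # {'Toronto': 1.0, 'Ottawa': 0.3}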

200 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Time complexity: 36K papers, 879.5K citations, 85% related
Server: 79.5K papers, 1.4M citations, 83% related
Scalability: 50.9K papers, 931.6K citations, 83% related
Polynomial: 52.6K papers, 853.1K citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 203
2022: 459
2021: 210
2020: 285
2019: 306
2018: 266