Topic

Joins

About: Joins is a research topic. Over its lifetime, 2216 publications have been published within this topic, receiving 64149 citations.

Papers

Proceedings Article

Access path selection in a relational database management system

P. Griffiths Selinger¹, Morton M. Astrahan¹, Donald D. Chamberlin¹, Raymond A. Lorie¹, T. G. Price¹•Institutions (1)

IBM¹

30 May 1979

TL;DR: This paper describes how System R, an experimental database management system developed to carry out research on the relational model of data, chooses access paths for both simple (single-relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates.

Abstract: In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research Laboratory.

2,082 citations

Proceedings Article

Structural joins: a primitive for efficient XML query pattern matching

Shurug Al-Khalifa¹, H. V. Jagadish, Nick Koudas¹, Jignesh M. Patel¹, Divesh Srivastava¹, Yuqing Wu¹•Institutions (1)

University of Michigan¹

07 Aug 2002

TL;DR: It is shown that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results demonstrating that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.

Abstract: XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate merge joins, while the stack-tree algorithms have no counterpart in traditional relational join processing. We present experimental results on a range of data and queries using the TIMBER native XML query engine built on top of SHORE. We show that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.
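The stack-based idea mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from TIMBER: it assumes each element is given as a hypothetical (start, end) region pair from a single document, so that regions either nest or are disjoint, and that both input lists are sorted by start position. The function name `stack_tree_join` is an assumption made for illustration.

```python
def stack_tree_join(ancestors, descendants):
    """Return all (ancestor, descendant) pairs.

    Both inputs are lists of (start, end) regions sorted by start.
    Assumption: regions come from one document, so any two regions
    are either properly nested or disjoint.
    """
    out = []
    stack = []  # nested chain of candidate ancestors, outermost first
    i = j = 0
    while j < len(descendants) and (i < len(ancestors) or stack):
        d = descendants[j]
        if i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            # Drop candidates that closed before this one opens.
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
            i += 1
        else:
            # Drop candidates that closed before the descendant opens.
            while stack and stack[-1][1] < d[0]:
                stack.pop()
            # Everything left on the stack contains d.
            for a in stack:
                out.append((a, d))
            j += 1
    return out


if __name__ == "__main__":
    # <a 1..10> <b 2..5> <a 3..4/> </b> <b 6..9/> </a>
    a_nodes = [(1, 10), (3, 4)]
    b_nodes = [(2, 5), (6, 9)]
    print(stack_tree_join(a_nodes, b_nodes))
```

Each input element is pushed or visited once and popped at most once, so the work in this sketch is proportional to the input sizes plus the number of output pairs, which is the kind of bound the abstract attributes to the stack-tree family.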

895 citations

Journal Article

SCOPE: easy and efficient parallel processing of massive data sets

Ronnie Chaiken¹, Bob Jenkins¹, Per-Ake Larson¹, Bill Ramsey¹, Darren A. Shakib¹, Simon Weaver¹, Jingren Zhou¹•Institutions (1)

Microsoft¹

01 Aug 2008

TL;DR: This paper presents SCOPE (Structured Computations Optimized for Parallel Execution), a new declarative and extensible scripting language targeted at massive data analysis and designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters.

Abstract: Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programming model that hides the complexity of the underlying system but provides flexibility by allowing users to extend functionality to meet a variety of requirements. In this paper, we present a new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis. The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL. Data is modeled as sets of rows composed of typed columns. The select statement is retained with inner joins, outer joins, and aggregation allowed. Users can easily define their own functions and implement their own versions of operators: extractors (parsing and constructing rows from a file), processors (row-wise processing), reducers (group-wise processing), and combiners (combining rows from two inputs). SCOPE supports nesting of expressions but also allows a computation to be specified as a series of steps, in a manner often preferred by programmers. We also describe how scripts are compiled into efficient, parallel execution plans and executed on large clusters.

872 citations

Proceedings Article

Map-reduce-merge: simplified relational data processing on large clusters

Hung-chih Yang¹, Ali Dasdan¹, Ruey-Lung Hsiao², D. Stott Parker²•Institutions (2)

Yahoo!¹, University of California, Los Angeles²

11 Jun 2007

TL;DR: Map-Reduce-Merge adds to Map-Reduce a Merge phase that can efficiently merge data already partitioned and sorted by map and reduce modules; the paper demonstrates that this new model can express relational algebra operators as well as implement several join algorithms.

Abstract: Map-Reduce is a programming model that enables easy development of scalable parallel applications to process a vast amount of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model facilitates parallel implementation of many real-world tasks such as data processing jobs for search engines and machine learning. However, this model does not directly support processing multiple related heterogeneous datasets. While processing relational data is a common need, this limitation causes difficulties and/or inefficiency when Map-Reduce is applied on relational operations like joins. We improve Map-Reduce into a new model called Map-Reduce-Merge. It adds to Map-Reduce a Merge phase that can efficiently merge data already partitioned and sorted (or hashed) by map and reduce modules. We also demonstrate that this new model can express relational algebra operators as well as implement several join algorithms.
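To make the idea of merging already-sorted inputs concrete, here is a minimal single-machine sketch of a sort-merge equi-join, a standard technique that fits inputs sorted by key. It is an illustration only and does not reproduce the paper's Merge-phase interface; the function name `merge_join` and the (key, value) tuple layout are assumptions.

```python
def merge_join(left, right):
    """Equi-join two lists of (key, value) pairs, each sorted by key.

    Returns (key, left_value, right_value) triples. Duplicate keys on
    either side produce the cross product of the matching runs.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    employees = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    departments = [(2, "x"), (3, "y"), (4, "z")]
    print(merge_join(employees, departments))
```

Because both inputs arrive sorted, each side is scanned once; in a partitioned setting the same routine could be run independently on each pair of matching partitions, which is an assumption about deployment rather than a claim about the paper's design.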

821 citations

Book Chapter

Query processing in spatial network databases

Dimitris Papadias¹, Jun Zhang¹, Nikos Mamoulis², Yufei Tao³•Institutions (3)

Hong Kong University of Science and Technology¹, University of Hong Kong², City University of Hong Kong³

09 Sep 2003

TL;DR: A Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space are developed and applied to the most popular spatial queries.

Abstract: Despite the importance of spatial networks in real-life applications, most of the spatial database literature focuses on Euclidean spaces. In this paper we propose an architecture that integrates network and Euclidean information, capturing pragmatic constraints. Based on this architecture, we develop a Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space. These frameworks are successfully applied to the most popular spatial queries, namely nearest neighbors, range search, closest pairs and e-distance joins, in the context of spatial network databases.

675 citations

Performance Metrics

Papers: 2,691

Citations: 68,596

No. of papers in the topic in previous years
Year	Papers
2023	177
2022	282
2021	80
2020	116
2019	99
2018	103
