Patterns for Indexing Large Datasets

doi:10.1145/3282308.3282314

Proceedings Article


TLDR

This work presents a few basic, reusable indexing structures that can be used to create advanced and complex indexing structures with less effort and time.

Abstract:

Searching is one of the fundamental tasks in computer science. The intuitive approach is linear search: start at the beginning of the dataset and examine each item in turn until the searched-for item is found or the end is reached without a match. Because the cost of a linear scan grows in proportion to the size of the dataset, its response time becomes unacceptable as data volumes increase. Indexes are designed to search through massive datasets quickly, and there are many different ways of building complex and advanced indexes. Appropriate selection and modification of indexing structures according to dynamic business requirements is crucial for data-intensive applications. In this work, we present a few basic reusable indexing structures. These structures can be used to create advanced and complex indexing structures with less effort and time.
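To make the contrast between a linear scan and an index concrete, here is a minimal Python sketch. It is an illustration only and is not one of the indexing patterns presented in the paper: the names `linear_search` and `SortedIndex` are hypothetical, records are assumed to be `(key, value)` pairs held in memory, and the index is a plain sorted array of keys searched with binary search through the standard `bisect` module.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

Record = Tuple[int, str]  # assumed record shape: (key, value)


def linear_search(records: Sequence[Record], key: int) -> Optional[str]:
    """Scan from the first record until the key is found: O(n) per lookup."""
    for k, value in records:
        if k == key:
            return value
    return None  # reached the end of the dataset without a match


class SortedIndex:
    """Hypothetical basic index: keys kept sorted so lookups use binary search."""

    def __init__(self, records: Sequence[Record]) -> None:
        # One-time O(n log n) build; each entry maps a key to its record's position.
        pairs = sorted((k, pos) for pos, (k, _) in enumerate(records))
        self._keys: List[int] = [k for k, _ in pairs]
        self._positions: List[int] = [pos for _, pos in pairs]

    def lookup(self, key: int) -> Optional[int]:
        """Return the position of a record with this key, or None: O(log n)."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._positions[i]
        return None

    def range(self, lo: int, hi: int) -> List[int]:
        """Positions of all records with lo <= key < hi, in key order."""
        start = bisect_left(self._keys, lo)
        end = bisect_left(self._keys, hi)
        return self._positions[start:end]


# Usage: both approaches find the same record; only the index supports fast ranges.
records = [(42, "alice"), (7, "bob"), (19, "carol"), (3, "dave")]
index = SortedIndex(records)
print(linear_search(records, 19))   # carol
print(records[index.lookup(19)][1]) # carol
print(index.range(5, 20))           # [1, 2] (positions of keys 7 and 19)
```

The sketch trades a one-time build step and extra memory for logarithmic lookups, which is the basic bargain any index makes over linear search. Keeping keys ordered also lets the same structure answer range queries, something a hash-based index would not support directly.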


