Some approaches to best-match file searching

doi:10.1145/362003.362025

Journal Article•DOI•

W. A. Burkhard¹, Robert M. Keller²•Institutions (2)

University of California, San Diego¹, Princeton University²

01 Apr 1973-Communications of The ACM (ACM)-Vol. 16, Iss: 4, pp 230-236

TL;DR: Three file structures are presented together with their corresponding search algorithms, which are intended to reduce the number of comparisons required to achieve the desired result.

Abstract: The problem of searching the set of keys in a file to find a key which is closest to a given query key is discussed. After “closest,” in terms of a metric on the key space, is suitably defined, three file structures are presented together with their corresponding search algorithms, which are intended to reduce the number of comparisons required to achieve the desired result. These methods are derived using certain inequalities satisfied by metrics and by graph-theoretic concepts. Some empirical results are presented which compare the efficiency of the methods.
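To illustrate how an inequality satisfied by every metric can reduce the number of comparisons, here is a minimal Python sketch of a tree indexed by integer distances, the kind of structure that later literature calls a BK-tree (BKT). The Hamming metric, the names `BKNode`, `BKTree`, and `best_match`, and the sample keys are all assumptions made for illustration; the sketch is not claimed to reproduce any of the paper's three file structures exactly.

```python
import math
from typing import Callable, Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (an integer-valued metric)."""
    if len(a) != len(b):
        raise ValueError("keys must have equal length")
    return sum(x != y for x, y in zip(a, b))


class BKNode:
    def __init__(self, key: str) -> None:
        self.key = key
        # children[d] roots the subtree of keys at distance exactly d from self.key
        self.children: Dict[int, "BKNode"] = {}


class BKTree:
    def __init__(self, dist: Callable[[str, str], int]) -> None:
        self.dist = dist
        self.root: Optional[BKNode] = None

    def insert(self, key: str) -> None:
        if self.root is None:
            self.root = BKNode(key)
            return
        node = self.root
        while True:
            d = self.dist(key, node.key)
            if d == 0:
                return  # key already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(key)
                return
            node = child

    def best_match(self, query: str) -> Tuple[str, int, int]:
        """Return (closest key, its distance, number of distance computations)."""
        if self.root is None:
            raise ValueError("empty tree")
        best_key, best_d = self.root.key, math.inf
        comparisons = 0
        stack = [self.root]
        while stack:
            node = stack.pop()
            d = self.dist(query, node.key)
            comparisons += 1
            if d < best_d:
                best_key, best_d = node.key, d
            for edge, child in node.children.items():
                # Triangle inequality: every key x below `child` satisfies
                # dist(x, node.key) == edge, hence dist(query, x) >= |d - edge|.
                if abs(d - edge) < best_d:
                    stack.append(child)
        return best_key, best_d, comparisons


# Usage with hypothetical 5-bit keys.
tree = BKTree(hamming)
for k in ["00000", "11100", "00111", "11111"]:
    tree.insert(k)
print(tree.best_match("00110"))
```

A subtree is skipped whenever the lower bound |d - edge| already meets or exceeds the best distance found so far, which is where the savings in comparisons come from.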

Citations

Proceedings Article•DOI•

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk¹, Rajeev Motwani¹•Institutions (1)

Stanford University¹

23 May 1998

TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R^d, which require space that is only polynomial in n and d.

Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors' STOC'98 and FOCS'01 papers. It unifies, generalizes and simplifies the results from those papers.

4,478 citations

Journal Article•DOI•

An Algorithm for Finding Best Matches in Logarithmic Expected Time

Jerome H. Friedman¹, Jon Louis Bentley², Raphael A. Finkel¹•Institutions (2)

Stanford University¹, University of North Carolina at Chapel Hill²

01 Sep 1977-ACM Transactions on Mathematical Software

TL;DR: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.

Abstract: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods.
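The search for the m closest matches among records with k real-valued keys can be sketched with a small k-d tree in Python. This is a simplified illustration under stated assumptions: it cycles through the coordinates by depth and stores one record per node, which is a common textbook variant and not necessarily the partitioning rule or bucket layout used in the paper. The names `build_kd`, `m_nearest`, and `sq_dist` are hypothetical.

```python
import heapq
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# A node is (record, split axis, left subtree, right subtree); None is an empty tree.
Node = Optional[tuple]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (monotone in the true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kd(points: Sequence[Point], depth: int = 0) -> Node:
    if not points:
        return None
    axis = depth % len(points[0])  # assumption: cycle through the k keys
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kd(pts[:mid], depth + 1),
            build_kd(pts[mid + 1:], depth + 1))


def m_nearest(node: Node, q: Point, m: int,
              heap: Optional[List[Tuple[float, Point]]] = None
              ) -> List[Tuple[float, Point]]:
    """Return a heap of (-squared distance, record) for the m closest records."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, axis, left, right = node
    d2 = sq_dist(point, q)
    if len(heap) < m:
        heapq.heappush(heap, (-d2, point))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, point))  # evict the current worst match
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    m_nearest(near, q, m, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current m-th best match (or fewer than m matches are known).
    if len(heap) < m or diff * diff < -heap[0][0]:
        m_nearest(far, q, m, heap)
    return heap


# Usage on hypothetical random records with k = 3 keys.
rng = random.Random(0)
records = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
kd_root = build_kd(records)
closest = sorted((-nd, p) for nd, p in m_nearest(kd_root, (0.5, 0.5, 0.5), 5))
print(closest)
```

Sorting at every level makes this construction slower than the kN log N organization cost stated in the abstract; a median-selection routine would be needed to match that bound.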

2,910 citations

Cites background from "Some approaches to best-match file ..."

...Burkhard and Keller [2] and later Fukunaga and Narendra [6] described heuristic strategies based on clustering techniques....

Journal Article•DOI•

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi¹, Xiang Bai¹, Cong Yao¹•Institutions (1)

Huazhong University of Science and Technology¹

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The authors propose a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework, and that achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.

Abstract: Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.

2,184 citations

Journal Article•DOI•

Searching in metric spaces

Edgar Chávez¹, Gonzalo Navarro², Ricardo Baeza-Yates², Jose L. Marroquin³•Institutions (3)

Universidad Michoacana de San Nicolás de Hidalgo¹, University of Chile², Centro de Investigación en Matemáticas³

01 Sep 2001-ACM Computing Surveys

TL;DR: A unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework, and presents a quantitative definition of the elusive concept of "intrinsic dimensionality".

Abstract: The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of "intrinsic dimensionality." We also present a unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework. Most approaches turn out to be variations on a few different concepts. We organize those works in a taxonomy that allows us to devise new algorithms from combinations of concepts not noticed before because of the lack of communication between different communities. We present experiments validating our results and comparing the existing approaches. We finish with recommendations for practitioners and open questions for future development.

1,337 citations

Cites background or methods or result from "Some approaches to best-match file ..."

...For example, in BKTs and FQTs we can begin at the root and measure i = d(p, q)....
[...]
...On the right, the first level of a BKT with u11 as root....
[...]
...BKT....
[...]
...The same effect would be obtained if we had a mixture between BKTs and FQTs, so that for k levels we had fixed keys per level, and then we allowed a different key per node of the level k + 1, continuing the process recursively on each subtree of the level k + 1....
[...]
...Note that, historically, FQTs and FHQTs are an evolution over BKTs....
[...]

Proceedings Article•DOI•

Data structures and algorithms for nearest neighbor search in general metric spaces

Peter N. Yianilos¹•Institutions (1)

Princeton University¹

01 Jan 1993

TL;DR: The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.

Abstract: We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, kd-tree performance is compared.
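A vantage point tree can be sketched in a few lines of Python. The version below is a simplified illustration, not the paper's construction: it picks each vantage point at random, splits the remaining points at the median distance, and uses `statistics.median` (which sorts), so it does not attain the O(n log(n)) construction bound quoted above. The names `VPNode`, `build_vp`, and `nearest_vp` are assumptions.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclid(a: Point, b: Point) -> float:
    return math.dist(a, b)


@dataclass
class VPNode:
    point: Point                  # the vantage point
    radius: float                 # median distance from the vantage point
    inside: Optional["VPNode"]    # points with dist(vp, p) <= radius
    outside: Optional["VPNode"]   # points with dist(vp, p) > radius


def build_vp(points: Sequence[Point], dist: Metric,
             rng: random.Random) -> Optional[VPNode]:
    pts = list(points)
    if not pts:
        return None
    vp = pts.pop(rng.randrange(len(pts)))  # assumption: random vantage point
    if not pts:
        return VPNode(vp, 0.0, None, None)
    ds = [dist(vp, p) for p in pts]
    radius = statistics.median(ds)
    inside = [p for p, d in zip(pts, ds) if d <= radius]
    outside = [p for p, d in zip(pts, ds) if d > radius]
    return VPNode(vp, radius,
                  build_vp(inside, dist, rng),
                  build_vp(outside, dist, rng))


def nearest_vp(node: Optional[VPNode], q: Point, dist: Metric,
               best: Tuple[Optional[Point], float] = (None, math.inf)
               ) -> Tuple[Optional[Point], float]:
    """Return (nearest point, distance), pruning with the triangle inequality."""
    if node is None:
        return best
    d = dist(q, node.point)
    if d < best[1]:
        best = (node.point, d)
    if d <= node.radius:
        best = nearest_vp(node.inside, q, dist, best)
        # Outside points lie farther than radius - d from q.
        if d + best[1] > node.radius:
            best = nearest_vp(node.outside, q, dist, best)
    else:
        best = nearest_vp(node.outside, q, dist, best)
        # Inside points lie at least d - radius from q.
        if d - best[1] < node.radius:
            best = nearest_vp(node.inside, q, dist, best)
    return best


# Usage on hypothetical random points in the unit square.
rng = random.Random(0)
cloud = [(rng.random(), rng.random()) for _ in range(200)]
vp_root = build_vp(cloud, euclid, random.Random(1))
print(nearest_vp(vp_root, (0.25, 0.75), euclid))
```

Only the metric is ever evaluated on the data, so the same code would work for any distance function satisfying the triangle inequality, which is the setting the abstract describes.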

1,145 citations

Cites background from "Some approaches to best-match file ..."

...The ZPS distribution restriction is key to achieving them; and our overall outlook in which finite cases are imagined to be drawn from a larger more continuous space, distinguishes in part this work from the discrete distance setting of [7, 11]....
[...]
...This work is thus highly related to the constructions of [7]....
[...]
...Burkhard and Keller in [7] present three file structures for nearest neighbor retrieval....
[...]

References

Book•

Perceptrons: An Introduction to Computational Geometry

Marvin Minsky, Seymour A. Papert

01 Jan 1969

TL;DR: The aim of this book is to seek general results from the close study of abstract versions of devices known as perceptrons.

Abstract: Cambridge, Mass.: MIT Press, 1972. 2nd ed. The book's aim is to seek general results from the close study of abstract versions of devices known as perceptrons.

3,004 citations

"Some approaches to best-match file ..." refers background in this paper

...The problem has been discussed in [3], but no solutions proposed....
[...]
...Minsky and Papert [3] refer to this as the "best match" problem and comment on its...
[...]
...One use concerns keys which are possible outcomes of tests in large switching networks, such as the Bell System No. 1 ESS [3]....
[...]

Book•

Shift register sequences

Solomon W. Golomb

01 Jun 1981

TL;DR: The Revised Edition of Shift Register Sequences contains a comprehensive bibliography of some 400 entries which cover the literature concerning the theory and applications of shift register sequences.

Abstract: From the Publisher: Shift register sequences are used in a broad range of applications, particularly in random number generation, multiple access and polling techniques, secure and privacy communication systems, error detecting and correcting codes, and synchronization pattern generation, as well as in modern cryptographic systems. The first edition of Shift Register Sequences, published in 1967, has been for many years the definitive work on this subject. In the revised edition, Dr. Golomb has added valuable supplemental material. The Revised Edition contains a comprehensive bibliography of some 400 entries which cover the literature concerning the theory and applications of shift register sequences. Written in a clear and lucid style, Dr. Golomb's approach is completely mathematical with rigorous proofs of all assertions. The proofs, however, may be omitted without loss of continuity by the reader who is interested only in results. Dr. Golomb is considered one of the foremost experts in the world with respect to combinatorial and geometrical aspects of coded communications.

2,501 citations

Journal Article•DOI•

An Analysis of Some Graph Theoretical Cluster Techniques

J. Gary Augustson¹, Jack Minker¹•Institutions (1)

University of Maryland, College Park¹

01 Oct 1970-Journal of the ACM

TL;DR: Several graph theoretic cluster techniques aimed at the automatic generation of thesauri for information retrieval systems are explored and two algorithms have been tested that find maximal complete subgraphs.

Abstract: Several graph theoretic cluster techniques aimed at the automatic generation of thesauri for information retrieval systems are explored. Experimental cluster analysis is performed on a sample corpus of 2267 documents. A term-term similarity matrix is constructed for the 3950 unique terms used to index the documents. Various threshold values, T, are applied to the similarity matrix to provide a series of binary threshold matrices. The corresponding graph of each binary threshold matrix is used to obtain the term clusters. Three definitions of a cluster are analyzed: (1) the connected components of the threshold matrix; (2) the maximal complete subgraphs of the connected components of the threshold matrix; (3) clusters of the maximal complete subgraphs of the threshold matrix, as described by Gotlieb and Kumar. Algorithms are described and analyzed for obtaining each cluster type. The algorithms are designed to be useful for large document and index collections. Two algorithms have been tested that find maximal complete subgraphs. An algorithm developed by Bierstone offers a significant time improvement over one suggested by Bonner. For threshold levels T ≥ 0.6, basically the same clusters are developed regardless of the cluster definition used. In such situations one need only find the connected components of the graph to develop the clusters.
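The first cluster definition, connected components of the thresholded similarity graph, can be sketched as follows in Python. The convention that two terms are linked when their similarity is at least T, the function name `threshold_clusters`, and the 4-term similarity matrix are assumptions for illustration; the abstract does not state how ties at the threshold are handled.

```python
from collections import deque
from typing import List, Sequence


def threshold_clusters(sim: Sequence[Sequence[float]], T: float) -> List[List[int]]:
    """Connected components of the graph with an edge i-j when sim[i][j] >= T."""
    n = len(sim)
    seen = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component = []
        queue = deque([start])
        while queue:  # breadth-first search over the implicit threshold graph
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if j != i and not seen[j] and sim[i][j] >= T:
                    seen[j] = True
                    queue.append(j)
        clusters.append(sorted(component))
    return clusters


# Usage on a hypothetical symmetric term-term similarity matrix.
similarity = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
for T in (0.05, 0.6, 0.75):
    print(T, threshold_clusters(similarity, T))
```

Raising T removes edges, so components can only split as the threshold grows; the binary threshold matrix is never materialized here because the comparison `sim[i][j] >= T` is evaluated on demand.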

241 citations

"Some approaches to best-match file ..." refers methods in this paper

...An algorithm known as the Bierstone algorithm for computing the set of all cliques of an undirected graph is given in [8, 9]....
[...]

Journal Article•DOI•

Scatter storage techniques

Robert Morris¹•Institutions (1)

Bell Labs¹

01 Jan 1968-Communications of The ACM

TL;DR: The article gives a tutorial presentation of the known methods used by writers of assemblers and compilers to reduce search times in symbol tables.

Abstract: From time to time one encounters an article that summarizes a new field of research, highlights its principal results, and makes them more accessible. Morris's article is of this type. It gives a tutorial presentation of the known methods used by writers of assemblers and compilers to reduce search times in symbol tables.

218 citations

Journal Article•DOI•

Corrections to Bierstone's Algorithm for Generating Cliques

Gordon D. Mulligan¹, Derek G. Corneil¹•Institutions (1)

University of Toronto¹

01 Apr 1972-Journal of the ACM

TL;DR: Counterexamples to Augustson and Minker's version of the Bierstone algorithm for finding the set of cliques of a finite undirected linear graph are presented, together with a modified version of the algorithm.

Abstract: Recently Augustson and Minker presented a version of the Bierstone algorithm for finding the set of cliques of a finite undirected linear graph. Their version contains two errors. In this paper the counterexamples to their version and the modified version of the Bierstone algorithm are presented.

75 citations