# Nearest Neighbors Search Using Point Location in Balls with Applications to Approximate Voronoi Decompositions

IBM

12 Dec 2002, pp. 311-323

TL;DR: Improved reductions of the nearest neighbor searching problem to Point Location in Balls are presented by constructing linear size Approximate Voronoi Diagrams while maintaining the logarithmic search time.

Abstract: We present improved reductions of the nearest neighbor searching problem to Point Location in Balls (PLEB) by constructing linear-size Approximate Voronoi Diagrams while maintaining logarithmic search time. We do this first by simplifying the construction of Har-Peled [9], reducing the number of balls generated by a logarithmic factor, to O(n log n). We then further reduce the number of balls via a new hierarchical decomposition scheme and a generalization of PLEBs, achieving a linear-space decomposition for nearest neighbor searching.
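As a rough illustration of the reduction idea (the names `build_pleb` and `query` below are made up, not from the paper, and a real construction keeps the total number of balls near-linear via the hierarchical decomposition described above): an approximate nearest neighbor query can be answered by locating the query point in the smallest ball containing it, among balls of geometrically increasing radii centered at the data points.

```python
import math

def build_pleb(points, eps, scales=8):
    """Toy stand-in for a PLEB structure: around every data point,
    generate balls whose radii grow by a (1 + eps) factor. Real
    constructions keep the total number of balls near-linear."""
    balls = []
    for p in points:
        r = 1.0
        for _ in range(scales):
            balls.append((p, r))
            r *= (1 + eps)
    return balls

def query(balls, q):
    """Point location in balls: return the center of the smallest
    ball containing q (that center is the approximate NN), or None."""
    best = None
    for center, r in balls:
        if math.dist(center, q) <= r and (best is None or r < best[1]):
            best = (center, r)
    return best[0] if best else None
```

A linear scan is used here only for brevity; the point of the reduction is that locating a point among the balls at each scale can be delegated to a fast point-location structure.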

##### Citations


TL;DR: Two algorithms for the approximate nearest neighbor problem in high dimensional spaces are presented for data sets of size n living in R^d, achieving query times that are sub-linear in n and polynomial in d.

Abstract: We present two algorithms for the approximate nearest neighbor problem in high dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree.

834 citations


TL;DR: There is a single approach to nearest neighbor searching, which both improves upon existing results and spans the spectrum of space-time tradeoffs, and new algorithms for constructing AVDs and tools for analyzing their total space requirements are provided.

Abstract: Nearest neighbor searching is the problem of preprocessing a set of n points in d-dimensional space so that, given any query point q, it is possible to report the closest point to q rapidly. In approximate nearest neighbor searching, a parameter e > 0 is given, and a multiplicative error of (1 + e) is allowed. We assume that the dimension d is a constant and treat n and e as asymptotic quantities. Numerous solutions have been proposed, ranging from low-space solutions having space O(n) and query time O(log n + 1/e^(d−1)) to high-space solutions having space roughly O((n log n)/e^d) and query time O(log(n/e)). We show that there is a single approach to this fundamental problem, which both improves upon existing results and spans the spectrum of space-time tradeoffs. Given a tradeoff parameter γ, where 2 ≤ γ ≤ 1/e, we show that there exists a data structure of space O(nγ^(d−1) log(1/e)) that can answer queries in time O(log(nγ) + 1/(eγ)^((d−1)/2)). When γ = 2, this yields a data structure of space O(n log(1/e)) that can answer queries in time O(log n + 1/e^((d−1)/2)). When γ = 1/e, it provides a data structure of space O((n/e^(d−1)) log(1/e)) that can answer queries in time O(log(n/e)). Our results are based on a data structure called a (t,e)-AVD, which is a hierarchical quadtree-based subdivision of space into cells. Each cell stores up to t representative points of the set, such that for any query point q in the cell at least one of these points is an approximate nearest neighbor of q. We provide new algorithms for constructing AVDs and tools for analyzing their total space requirements. We also establish lower bounds on the space complexity of AVDs, and show that, up to a factor of O(log(1/e)), our space bounds are asymptotically tight in the two extremes, γ = 2 and γ = 1/e.
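The cell-plus-representatives idea can be sketched with a uniform grid standing in for the hierarchical quadtree (the names and the grid simplification below are ours, not the paper's): locate the query's cell, then scan only that cell's few stored representatives.

```python
import math

def build_avd_grid(points, cell_size, extent, t=2):
    """Toy stand-in for a (t, e)-AVD: a uniform grid over
    [0, extent)^2 instead of a quadtree. Each cell keeps the t
    input points nearest to its center as representatives."""
    cells = {}
    n = int(extent / cell_size)
    for i in range(n):
        for j in range(n):
            center = ((i + 0.5) * cell_size, (j + 0.5) * cell_size)
            reps = sorted(points, key=lambda p: math.dist(p, center))
            cells[(i, j)] = reps[:t]
    return cells

def avd_query(cells, cell_size, q):
    """Locate q's cell, then scan only that cell's representatives."""
    key = (int(q[0] // cell_size), int(q[1] // cell_size))
    return min(cells[key], key=lambda p: math.dist(p, q))
```

This grid carries no approximation guarantee; the paper's contribution is choosing the quadtree cells and representatives so that the answer is always a (1 + e)-approximate nearest neighbor within the stated space bounds.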

260 citations


01 Nov 2011

TL;DR: A new data-driven framework for smoothing trajectory data is presented and an algorithm based on this framework is implemented to smooth an entire collection of trajectories and it is shown that it performs well on both synthetic data and massive collections of GPS traces.

Abstract: Motivated by the increasing availability of large collections of noisy GPS traces, we present a new data-driven framework for smoothing trajectory data. The framework, which can be viewed as a generalization of the classical moving average technique, naturally leads to efficient algorithms for various smoothing objectives. We analyze an algorithm based on this framework and provide connections to previous smoothing techniques. We implement a variation of the algorithm to smooth an entire collection of trajectories and show that it performs well on both synthetic data and massive collections of GPS traces.
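Since the framework generalizes the classical moving average, the baseline it starts from is easy to state (the function name is illustrative; the paper's data-driven neighborhoods replace this fixed window):

```python
def smooth_trajectory(traj, window=3):
    """Classical moving average over a 2-D trace: each output
    point is the mean of up to `window` surrounding input points
    (truncated at the ends). The cited framework replaces this
    fixed window with data-driven neighborhoods."""
    half = window // 2
    out = []
    for i in range(len(traj)):
        lo, hi = max(0, i - half), min(len(traj), i + half + 1)
        seg = traj[lo:hi]
        out.append((sum(p[0] for p in seg) / len(seg),
                    sum(p[1] for p in seg) / len(seg)))
    return out
```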

33 citations

### Cites background from "Nearest Neighbors Search Using Poin..."


TL;DR: A data structure is presented that achieves logarithmic query time with storage of only O(1/e^((d−1)/2)), which matches the worst-case lower bound on the complexity of any e-approximating polytope.

Abstract: In the polytope membership problem, a convex polytope $K$ in $R^d$ is given, and the objective is to preprocess $K$ into a data structure so that, given a query point $q \in R^d$, it is possible to determine efficiently whether $q \in K$. We consider this problem in an approximate setting and assume that $d$ is a constant. Given an approximation parameter $\varepsilon > 0$, the query can be answered either way if the distance from $q$ to $K$'s boundary is at most $\varepsilon$ times $K$'s diameter. Previous solutions to the problem were in the form of a space-time trade-off, where logarithmic query time demands $O(1/\varepsilon^{d-1})$ storage, whereas storage $O(1/\varepsilon^{(d-1)/2})$ admits roughly $O(1/\varepsilon^{(d-1)/8})$ query time. In this paper, we present a data structure that achieves logarithmic query time with storage of only $O(1/\varepsilon^{(d-1)/2})$, which matches the worst-case lower bound on the complexity of any $\varepsilon$-approximating polytope. Our data structure is based on a new technique, a hierarchy of ellipsoids defined as approximations to Macbeath regions.
As an application, we obtain major improvements to approximate Euclidean nearest neighbor searching. Notably, the storage needed to answer $\varepsilon$-approximate nearest neighbor queries for a set of $n$ points in $O(\log \frac{n}{\varepsilon})$ time is reduced to $O(n/\varepsilon^{d/2})$. This halves the exponent in the $\varepsilon$-dependency of the existing space bound of roughly $O(n/\varepsilon^d)$, which has stood for 15 years (Har-Peled, 2001).
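The relaxed query semantics described above can be sketched directly, for a polytope given as unit-normal halfspaces (a hypothetical interface of our own; the paper's actual structure is a hierarchy of ellipsoids over Macbeath regions, not a constraint scan):

```python
def approx_inside(halfspaces, q, eps, diam):
    """Approximate membership for K = {x : a·x <= b} with unit
    normals a: accept q unless some constraint is violated by
    more than eps * diam. Within that slack zone near K's
    boundary, either answer is acceptable, as in the abstract."""
    return all(
        sum(ai * qi for ai, qi in zip(a, q)) <= b + eps * diam
        for a, b in halfspaces
    )
```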

17 citations

### Cites background from "Nearest Neighbors Search Using Poin..."


TL;DR: The purpose of WCOID-DG is to reduce both storage requirements and search time while balancing case retrieval efficiency and competence for a case base (CB); it is mainly based on the idea of transforming a large CB with weighted features into a small CB of improved quality.

Abstract: The success of a Case Based Reasoning system depends on the quality of the case data and the speed of the retrieval process, which can be costly in time, especially when the number of cases gets bulky. To guarantee the system's quality, maintaining the contents of a case base (CB) becomes unavoidable. In this paper, we propose a novel case base maintenance policy named WCOID-DG: Weighting, Clustering, Outliers and Internal cases Detection based on DBSCAN and Gaussian means. Our WCOID-DG policy uses, in addition to feature weights and outlier detection methods, a new efficient clustering technique, named DBSCAN-GM (DG), which is a combination of the DBSCAN and Gaussian-Means algorithms. The purpose of WCOID-DG is to reduce both the storage requirements and search time and to balance case retrieval efficiency and competence for a CB. WCOID-DG is mainly based on the idea that a large CB with weighted features is transformed into a small CB of improved quality. We support our approach with empirical evaluation using different benchmark data sets to show its competence in terms of shrinking the size of the CB and the search time, as well as getting satisfying classification accuracy.
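Stripped of the feature weighting, outlier detection, and the DBSCAN-GM clustering itself, the core transformation amounts to replacing each cluster of cases by a single prototype (a toy sketch of our own, taking a precomputed cluster assignment as input):

```python
from collections import defaultdict

def reduce_case_base(cases, assignment):
    """Shrink a case base by replacing each cluster of cases with
    its centroid. `assignment` maps each case to a cluster id; the
    cited policy obtains it with DBSCAN-GM and additionally applies
    feature weighting and outlier removal, omitted here."""
    groups = defaultdict(list)
    for case, label in zip(cases, assignment):
        groups[label].append(case)
    return [tuple(sum(vals) / len(vals) for vals in zip(*group))
            for group in groups.values()]
```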

15 citations

### Cites methods from "Nearest Neighbors Search Using Poin..."


##### References


TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R^d, which require space that is only polynomial in n and d.

Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors' STOC'98 and FOCS'01 papers. It unifies, generalizes and simplifies the results from those papers.

4,288 citations


TL;DR: This survey covers the Voronoi diagram, which divides the plane among a given set of points according to the nearest-neighbor rule, along with its structure, construction algorithms, and applications across computational geometry and the natural sciences.

Abstract: Computational geometry is concerned with the design and analysis of algorithms for geometrical problems. In addition, other, more practically oriented areas of computer science, such as computer graphics, computer-aided design, robotics, pattern recognition, and operations research, give rise to problems that inherently are geometrical. This is one reason computational geometry has attracted enormous research interest in the past decade and is a well-established area today. (For standard sources, we refer to the survey article by Lee and Preparata [1984] and to the textbooks by Preparata and Shamos [1985] and Edelsbrunner [1987b].) Readers familiar with the literature of computational geometry will have noticed, especially in the last few years, an increasing interest in a geometrical construct called the Voronoi diagram. This trend can also be observed in combinatorial geometry and in a considerable number of articles in natural science journals that address the Voronoi diagram under different names specific to the respective area. Given some number of points in the plane, their Voronoi diagram divides the plane according to the nearest-neighbor rule: each point is associated with the region of the plane closest to it.
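The nearest-neighbor rule that defines the diagram is one line of code: the Voronoi cell containing a query point belongs to the site closest to it (an illustrative sketch of ours, not from the survey):

```python
import math

def voronoi_cell_owner(sites, q):
    """The Voronoi cell containing q belongs to q's nearest site,
    so locating q in the diagram answers a nearest-neighbor query."""
    return min(sites, key=lambda s: math.dist(s, q))
```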

3,981 citations

### "Nearest Neighbors Search Using Poin..." refers background in this paper


TL;DR: In this paper, it was shown that given an integer k ≥ 1, (1 + ϵ)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.

Abstract: Consider a set S of n data points in real d-dimensional space, R^d, where distances are measured using any Minkowski metric. In nearest neighbor searching, we preprocess S into a data structure so that, given any query point q ∈ R^d, the closest point of S to q can be reported quickly. Given any positive real ϵ, a data point p is a (1 + ϵ)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ϵ) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R^d in O(dn log n) time and O(dn) space, so that given a query point q ∈ R^d and ϵ > 0, a (1 + ϵ)-approximate nearest neighbor of q can be computed in O(c_{d,ϵ} log n) time, where c_{d,ϵ} ≤ d⌈1 + 6d/ϵ⌉^d is a factor depending only on dimension and ϵ. In general, we show that given an integer k ≥ 1, (1 + ϵ)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
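The (1 + ϵ)-approximation guarantee from the abstract can be stated directly as a predicate (a brute-force check for illustration only; the paper's contribution is answering queries without scanning all points):

```python
import math

def is_approx_nn(points, q, p, eps):
    """Check the definition: p is a (1 + eps)-approximate nearest
    neighbor of q iff dist(p, q) is within a (1 + eps) factor of
    the true nearest-neighbor distance."""
    true_dist = min(math.dist(s, q) for s in points)
    return math.dist(p, q) <= (1 + eps) * true_dist
```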

2,713 citations


TL;DR: The notion of a well-separated pair decomposition of points in d-dimensional space is defined and the resulting decomposition is applied to the efficient computation of k-nearest neighbors and n-body potential fields.

Abstract: We define the notion of a well-separated pair decomposition of points in d-dimensional space. We then develop efficient sequential and parallel algorithms for computing such a decomposition. We apply the resulting decomposition to the efficient computation of k-nearest neighbors and n-body potential fields.
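The separation condition behind the decomposition can be checked roughly using a bounding ball for each set (a simplified reading of the definition; `well_separated` and the centroid-ball choice are ours, and actual constructions use tighter enclosing balls):

```python
import math

def well_separated(A, B, s):
    """s-well-separated check via centroid bounding balls: the gap
    between the two enclosing balls must be at least s times the
    larger radius (a simplified stand-in for the definition)."""
    def ball(P):
        cx = sum(p[0] for p in P) / len(P)
        cy = sum(p[1] for p in P) / len(P)
        return (cx, cy), max(math.dist((cx, cy), p) for p in P)
    (ca, ra), (cb, rb) = ball(A), ball(B)
    gap = math.dist(ca, cb) - ra - rb
    return gap >= s * max(ra, rb)
```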

562 citations

### "Nearest Neighbors Search Using Poin..." refers result in this paper


TL;DR: Significantly improving and extending recent results of Kleinberg, the authors construct data structures whose size is polynomial in the size of the database, with search algorithms that run in time nearly linear or nearly quadratic in the dimension.

Abstract: We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L1 norm and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database and search algorithms that run in time nearly linear or nearly quadratic in the dimension. (Depending on the case, the extra factors are polylogarithmic in the size of the database.)

390 citations

### "Nearest Neighbors Search Using Poin..." refers background in this paper

