ε-nets and simplex range queries

doi:10.1007/BF02187876

Journal Article•DOI•

ε-nets and simplex range queries

David Haussler¹, Emo Welzl²•Institutions (2)

University of California, Santa Cruz¹, University of Graz²

01 Dec 1987-Discrete and Computational Geometry (Springer New York)-Vol. 2, Iss: 1, pp 127-151

TL;DR: The concept of an ε-net of a set of points for an abstract set of ranges is introduced, and sufficient conditions that a random sample is an ε-net with any desired probability are given.

Abstract: We demonstrate the existence of data structures for half-space and simplex range queries on finite point sets in d-dimensional space, d ≥ 2, with linear storage and O(n^α) query time, where $\alpha = \frac{d(d-1)}{d(d-1)+1} + \gamma$ for all $\gamma > 0$. These bounds are better than those previously published for all d ≥ 2. Based on ideas due to Vapnik and Chervonenkis, we introduce the concept of an ε-net of a set of points for an abstract set of ranges and give sufficient conditions that a random sample is an ε-net with any desired probability. Using these results, we demonstrate how random samples can be used to build a partition-tree structure that achieves the above query time.
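
To make the query-time exponent concrete, the following minimal sketch evaluates the fraction d(d-1)/(d(d-1)+1) from the abstract for a few small dimensions. The function name `query_exponent_base` is an illustrative assumption, and the additive γ > 0 is left out, so the actual exponent α is slightly larger than each printed value.

```python
from fractions import Fraction

def query_exponent_base(d):
    """Return d(d-1) / (d(d-1) + 1), i.e. the exponent alpha without the additive gamma."""
    if d < 2:
        raise ValueError("the bound in the abstract is stated for d >= 2")
    k = d * (d - 1)
    return Fraction(k, k + 1)

# Print the exact fraction for the first few dimensions.
for d in (2, 3, 4):
    print(d, query_exponent_base(d))
```

The fraction tends to 1 as d grows, so the improvement over a linear scan shrinks in higher dimensions.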

Citations

Journal Article•DOI•

Sign rank versus Vapnik-Chervonenkis dimension

Noga Alon, Shay Moran¹, Amir Yehudayoff•Institutions (1)

Max Planck Society¹

31 Dec 2017-Sbornik Mathematics

TL;DR: This work studies the maximum possible sign rank of sign (N × N)-matrices with a given Vapnik-Chervonenkis dimension d, and designs an efficient algorithm that provides an O(N/log(N)) multiplicative approximation for the sign rank.

Abstract: This work studies the maximum possible sign rank of sign (N × N)-matrices with a given Vapnik-Chervonenkis dimension d. For d = 1, this maximum is three. For d = 2, this maximum is Θ̃(N^{1/2}). For d > 2, similar but slightly less accurate statements hold. The lower bounds improve on previous ones by Ben-David et al., and the upper bounds are novel. The lower bounds are obtained by probabilistic constructions, using a theorem of Warren in real algebraic topology. The upper bounds are obtained using a result of Welzl about spanning trees with low stabbing number, and using the moment curve. The upper bound technique is also used to: (i) provide estimates on the number of classes of a given Vapnik-Chervonenkis dimension, and the number of maximum classes of a given Vapnik-Chervonenkis dimension, answering a question of Frankl from 1989, and (ii) design an efficient algorithm that provides an O(N/log(N)) multiplicative approximation for the sign rank. We also observe a general connection between sign rank and spectral gaps which is based on Forster's argument. Consider the adjacency (N × N)-matrix of a Δ-regular graph with a second eigenvalue of absolute value λ and Δ ≤ N/2. We show that the sign rank of the signed version of this matrix is at least Δ/λ. We use this connection to prove the existence of a maximum class C ⊆ {±1}^N with Vapnik-Chervonenkis dimension 2 and sign rank Θ̃(N^{1/2}). This answers a question of Ben-David et al. regarding the sign rank of large Vapnik-Chervonenkis classes. We also describe limitations of this approach, in the spirit of the Alon-Boppana theorem. We further describe connections to communication complexity, geometry, learning theory, and combinatorics. Bibliography: 69 titles.

2 citations

Cites background from "ε-nets and simplex range queries"

...The Vapnik-Chervonenkis dimension captures the size of the minimum ε-net for the underlying set system (see [38] and [42])....
[...]

Dissertation•

Geometric Optimization for Classification Problems.

Pablo Pérez Lantero

07 Jun 2013

TL;DR: This thesis solves some natural variants of the so-called "Maximum Box Problem" by considering: two boxes (one per class), the minimum number of boxes to cover a class, or the maximum box in kinetic scenarios.

Abstract: Data Mining is a relevant discipline in Computer Science, the main goal of which is to explore data and extract information that is potentially useful and previously unknown. By using mathematical tools, such as Operations Research, Statistics, Artificial Intelligence and more recently Computational Geometry, Data Mining solves problems in many areas where there are big databases. Within Computational Geometry, the techniques of Geometric Optimization can be applied to solve many problems in this field. Typically, problems in Data Mining concern data belonging to two classes, say red and blue, and mainly appear in important subareas such as the classification of new data and the recognition of patterns. This thesis focuses on the study of optimization problems with application in data classification and pattern recognition. In all of them, we are given a two-class data set represented as red and blue points in the plane, and the objective is to find simple geometric shapes meeting some requirements for classification. The problems are approached from the Computational Geometry point of view, and efficient algorithms that use the inherent geometry of the problems are proposed. A crucial problem in Data Mining is the so-called "Maximum Box Problem", where the geometric shape to be found is a maximum box, that is, an axis-aligned rectangle containing the maximum number of elements of only one class in the given data set. This thesis solves some natural variants of this basic problem by considering: two boxes (one per class), the minimum number of boxes to cover a class, or the maximum box in kinetic scenarios. Commonly, classification methods suppose a "good" data distribution, so a clustering procedure can be applied. However, if the classes are "well mixed", a clustering for selecting prototypes that represent a class is not possible. In that sense, this thesis studies a new parameter to measure, a priori, if a given two-class data set is suitable or not for classification.

2 citations

Cites background or methods from "ε-nets and simplex range queries"

...In fact, a random sample of X of this size is an ε-net with high probability [70]....
[...]
...This result is based on the fact that, for every range space with finite VC-dimension d, there exists an ε-net of size O((d/ε) log(d/ε)) [70]....
[...]
...We review the theory of ε-nets, which has strong applications to the class cover problem [25, 35, 70, 105], and show that our problem admits an O(log c)-approximation, where c is the size of an optimal covering....
[...]
...We proved the NP-hardness by a reduction from the Rectilinear Polygon Covering Problem [39], and showed that there is an O(log c)-approximation algorithm due to known results on ε-nets [25, 70, 105], where c is the size of an optimal covering....
[...]
...nets, as candidate hitting sets, and it works for range spaces with finite VC-dimension [25, 70, 105]....
[...]

Journal Article•DOI•

On the VC-Dimension of Unique Round-Trip Shortest Path Systems.

Chun Jiang Zhu¹, Kam-Yiu Lam², Joseph Kee-Yin Ng³, Jinbo Bi¹•Institutions (3)

University of Connecticut¹, City University of Hong Kong², Hong Kong Baptist University³

10 Jan 2019-Information Processing Letters

TL;DR: This paper proves that the VC-dimension of URTSP is at most 32, and applies the result to the minimum k-round-trip shortest path cover problem (k-RTSPC), which is to find, for a directed graph, a minimum vertex set that intersects every round-trip shortest path containing at least k vertices, and derives an upper bound on the size of the vertex set.

2 citations

Posted Content•

Small Approximate Pareto Sets for Bi-objective Shortest Paths and Other Problems

Ilias Diakonikolas¹, Mihalis Yannakakis¹•Institutions (1)

Columbia University¹

17 May 2008-arXiv: Data Structures and Algorithms

TL;DR: It is shown that for a broad class of bi-objective problems, one can compute in polynomial time an ε-Pareto set that contains at most twice as many solutions as the minimum such set, and that the factor of 2 is tight for these problems, i.e., it is NP-hard to do better.

Abstract: We investigate the problem of computing a minimum set of solutions that approximates within a specified accuracy $\epsilon$ the Pareto curve of a multiobjective optimization problem. We show that for a broad class of bi-objective problems (containing many important widely studied problems such as shortest paths, spanning tree, and many others), we can compute in polynomial time an $\epsilon$-Pareto set that contains at most twice as many solutions as the minimum such set. Furthermore we show that the factor of 2 is tight for these problems, i.e., it is NP-hard to do better. We present upper and lower bounds for three or more objectives, as well as for the dual problem of computing a specified number $k$ of solutions which provide a good approximation to the Pareto curve.

2 citations

Additional excerpts

...A set N ⊆ T is called a 1/r-net for (T, R) [HW], if N ∩ S ≠ ∅ for all S ∈ R having |S| > |T|/r....
[...]
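
The definition in the excerpt above can be checked directly in a simple special case. The sketch below assumes the ranges are intervals on the real line and that the points are distinct; the function name `is_eps_net_for_intervals` and the parameter `eps` (playing the role of 1/r) are illustrative assumptions, not taken from the cited papers. For intervals, a sample misses some range with more than eps·n points exactly when more than eps·n consecutive points (in sorted order) are all unsampled.

```python
import random

def is_eps_net_for_intervals(points, sample, eps):
    """Return True if `sample` hits every interval holding more than eps * len(points) points.

    Assumes distinct points on the real line and interval ranges (an illustrative special case).
    """
    chosen = set(sample)
    threshold = eps * len(points)
    run = 0  # length of the current run of consecutive unsampled points
    for p in sorted(points):
        if p in chosen:
            run = 0
        else:
            run += 1
            if run > threshold:
                return False  # some interval covers this run and misses the sample
    return True

points = list(range(100))
evenly_spaced = list(range(9, 100, 10))           # every tenth point
random_sample = random.Random(0).sample(points, 30)
print(is_eps_net_for_intervals(points, evenly_spaced, 0.1))
print(is_eps_net_for_intervals(points, random_sample, 0.1))
```

Whether a particular random sample passes depends on the draw; the cited results state conditions under which a sample of sufficient size is an ε-net with any desired probability.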

Proceedings Article•DOI•

Predicting {0,1}-functions on randomly drawn points

David Haussler, Nick Littlestone, Manfred K. Warmuth

01 Dec 1988

TL;DR: In this paper, the authors consider the problem of predicting {0, 1}-valued functions on R^n and smaller domains, based on their values on randomly drawn points, and construct prediction strategies that are optimal to within a constant factor for any reasonable class F of target functions.

Abstract: We consider the problem of predicting {0, 1}-valued functions on R^n and smaller domains, based on their values on randomly drawn points. Our model is related to Valiant's PAC learning model, but does not require the hypotheses used for prediction to be represented in any specified form. In our main result we show how to construct prediction strategies that are optimal to within a constant factor for any reasonable class F of target functions. This result is based on new combinatorial results about classes of functions of finite VC dimension. We also discuss more computationally efficient algorithms for predicting indicator functions of axis-parallel rectangles, more general intersection closed concept classes, and halfspaces in R^n. These are also optimal to within a constant factor. Finally, we compare the general performance of prediction strategies derived by our method to that of those derived from methods in PAC learning theory.

2 citations

References

Book Chapter•DOI•

On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities

Vladimir Vapnik, A. Ya. Chervonenkis

01 Jan 1971-Theory of Probability and Its Applications

TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.

Abstract: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as Вапник В. Н. and Червоненкис А. Я. О равномерноЙ сходимости частот появления событиЙ к их вероятностям. Теория вероятностеЙ и ее применения 16(2), 264–279 (1971).

3,939 citations

"ε-nets and simplex range queries" refers background or methods or result in this paper

...The drawback is that the constants, if derived from the results in [17], can be quite large....
[...]
...More generally, we characterize the classes of ranges for which there exists a function f(ε) for ε > 0 such that any finite point set A has an ε-net of size f(ε), independently of the size of A. These are precisely the classes of ranges with finite Vapnik-Chervonenkis dimension, known as Vapnik-Chervonenkis classes [17], [9], [19], [1]....
[...]
...The key concepts and proof techniques of this section are based on the pioneering work of Vapnik and Chervonenkis [17]....
[...]
...Example 5. Let A be a set of n points in E^2. Since the dimension of (E^2, H_2^+) is 2, the results in [17, Theorem 2] show that there exists a 0.01-approximation V of A for positive half-planes (and thus for all half-planes) with |V| = 2,525,039....
[...]
...Using the related notion of an ε-approximation (directly from [17]), we also point out trivial data structures of constant size that give approximate solutions to the counting problem for halfspaces in constant time (compare [13])....
[...]

Book•

Algorithms in Combinatorial Geometry

Herbert Edelsbrunner¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Jan 1987

TL;DR: This book offers a modern approach to computational geometry, an area that studies the computational complexity of geometric problems and in which combinatorial investigations play an important role.

Abstract: This book offers a modern approach to computational geometry, an area that studies the computational complexity of geometric problems. Combinatorial investigations play an important role in this study.

2,284 citations

"ε-nets and simplex range queries" refers background in this paper

...We conclude this section by examining the relationship between the notion of an ε-net and the established notion of a centerpoint [21], [11] in combinatorial geometry....
[...]
..., [11] for a general treatment of arrangements....
[...]

Journal Article•DOI•

On the density of families of sets

Norbert Sauer¹•Institutions (1)

University of Calgary¹

01 Jul 1972-Journal of Combinatorial Theory, Series A

TL;DR: This paper answers the question in the affirmative by determining the exact upper bound: if T is a family of subsets of some infinite set S, then either there exists for each number n a set A ⊂ S with |A| = n such that |T ∩ A| = 2^n, or there exists some number N such that |T ∩ A| ≤ |A|^c for each A with |A| ≥ N and some constant c.

1,029 citations

"ε-nets and simplex range queries" refers background in this paper

...Now the assertion can be seen as the dual formulation of Carathéodory's theorem (see [15], Theorem 2.3.5), which states that if a point x is in the convex hull of a set A in E^d, then there exists a subset A' of A such that |A'| ≤ d + 1 and x is in the convex hull of A'....
[...]

Journal Article•DOI•

Central Limit Theorems for Empirical Measures

Richard M. Dudley

01 Dec 1978-Annals of Probability

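TL;DR: This article studies the convergence in law of the normalized empirical measure, viewed as a stochastic process indexed by a class of sets, to a Gaussian process indexed by the same class; a class for which convergence holds with respect to the supremum norm is called a Donsker class.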

Abstract: Let $(X, \mathscr{A}, P)$ be a probability space. Let $X_1, X_2, \cdots$ be independent $X$-valued random variables with distribution $P$. Let $P_n := n^{-1}(\delta_{X_1} + \cdots + \delta_{X_n})$ be the empirical measure and let $\nu_n := n^{1/2}(P_n - P)$. Given a class $\mathscr{C} \subset \mathscr{A}$, we study the convergence in law of $\nu_n$, as a stochastic process indexed by $\mathscr{C}$, to a certain Gaussian process indexed by $\mathscr{C}$. If convergence holds with respect to the supremum norm $\sup_{C \in \mathscr{C}}|f(C)|$, in a suitable (usually nonseparable) function space, we call $\mathscr{C}$ a Donsker class. For measurability, $X$ may be a complete separable metric space, $\mathscr{A} =$ Borel sets, and $\mathscr{C}$ a suitable collection of closed sets or open sets. Then for the Donsker property it suffices that for some $m$, and every set $F \subset X$ with $m$ elements, $\mathscr{C}$ does not cut all subsets of $F$ (Vapnik-Červonenkis classes). Another sufficient condition is based on metric entropy with inclusion. If $\mathscr{C}$ is a sequence $\{C_m\}$ independent for $P$, then $\mathscr{C}$ is a Donsker class if and only if for some $r$, $\sum_m (P(C_m)(1 - P(C_m)))^r < \infty$.

555 citations

Journal Article•DOI•

The power of geometric duality

Bernard Chazelle¹, Leonidas J. Guibas², Der-Tsai Lee³•Institutions (3)

Brown University¹, PARC², Northwestern University³

01 Jun 1985-BIT Numerical Mathematics

TL;DR: A new formulation of the notion of duality that allows the unified treatment of a number of geometric problems is used to solve two long-standing problems of computational geometry, including obtaining a quadratic algorithm for computing the minimum-area triangle with vertices chosen among n points in the plane.

Abstract: This paper uses a new formulation of the notion of duality that allows the unified treatment of a number of geometric problems. In particular, we are able to apply our approach to solve two long-standing problems of computational geometry: one is to obtain a quadratic algorithm for computing the minimum-area triangle with vertices chosen among n points in the plane; the other is to produce an optimal algorithm for the half-plane range query problem. This problem is to preprocess n points in the plane, so that given a test half-plane, one can efficiently determine all points lying in the half-plane. We describe an optimal O(k + log n) time algorithm for answering such queries, where k is the number of points to be reported. The algorithm requires O(n) space and O(n log n) preprocessing time. Both of these results represent significant improvements over the best methods previously known. In addition, we give a number of new combinatorial results related to the computation of line arrangements.
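
For readers unfamiliar with the half-plane range query problem stated in this abstract, the sketch below shows only the naive baseline: no preprocessing and a linear scan per query. It is not the O(k + log n) algorithm of the paper. The function name `report_half_plane` and the representation of a half-plane as a·x + b·y ≤ c are assumptions made for illustration.

```python
def report_half_plane(points, a, b, c):
    """Return all points (x, y) with a*x + b*y <= c by scanning every point (O(n) per query)."""
    return [(x, y) for (x, y) in points if a * x + b * y <= c]

points = [(0, 0), (1, 2), (3, 1), (-1, 4)]
# Query the closed half-plane x + y <= 3.
print(report_half_plane(points, 1, 1, 3))
```

A data structure for this problem aims to beat the linear scan: the abstract's bound of O(k + log n) depends on the output size k rather than on n.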

286 citations

"ε-nets and simplex range queries" refers methods in this paper

...It should be noted that better bounds are possible for reporting in two dimensions (specifically O(log n + t) time, where t is the number of points reported [3]), but these techniques only work for half-planes....
[...]