Journal Article

Property Testing and its connection to Learning and Approximation

TL;DR: In this paper, the authors consider the question of determining whether a function f has property P or is ε-far from any function with property P. In some cases, the algorithm is also allowed to query f on instances of its choice.
Abstract: In this paper, we consider the question of determining whether a function f has property P or is ε-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation. In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a ρ-clique (a clique of density ρ with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.
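The bipartiteness tester described above boils down to sampling a constant-size set of vertices, querying all pairs inside the sample, and checking 2-colorability of the induced subgraph. The Python sketch below captures that flavor only; the adjacency oracle query_edge and the sample-size constant are illustrative assumptions, not the paper's exact algorithm or bounds.

```python
import random
from collections import deque

def induced_subgraph_is_bipartite(sample, query_edge):
    """Check 2-colorability of the subgraph induced on `sample` by BFS,
    querying the edge oracle for every pair of sampled vertices."""
    color = {}
    for start in sample:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in sample:
                if v != u and query_edge(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False
    return True

def test_bipartite(n, query_edge, eps, sample_size=None):
    """One-sided tester sketch: always accepts bipartite graphs; rejects
    graphs far from bipartite with probability depending on the (illustrative)
    sample-size constant.  The number of queries is independent of n."""
    if sample_size is None:
        sample_size = min(n, int(200 / eps ** 2))  # constant is illustrative
    sample = random.sample(range(n), sample_size)
    return induced_subgraph_is_bipartite(sample, query_edge)
```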
Citations
Journal ArticleDOI
25 Jun 2004
TL;DR: This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.
Abstract: We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements we give a PTAS, building on ideas of Goldreich, Goldwasser, and Ron (1998) and de la Vega (1996). We also show how to extend some of these results to graphs with edge labels in [−1, +1], and give some results for the case of random noise.
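The objective is simple to state in code. Below is a small, purely illustrative Python helper that counts disagreements for a given edge labeling and clustering; the data layout (a dict of pairwise '+'/'-' labels) is an assumption made for the example.

```python
def disagreements(labels, clustering):
    """Count correlation-clustering disagreements.

    labels: dict mapping frozenset({u, v}) -> '+' or '-' for every pair.
    clustering: dict mapping each vertex -> cluster id.
    A '-' edge inside a cluster or a '+' edge across clusters is a disagreement.
    """
    bad = 0
    for pair, sign in labels.items():
        u, v = tuple(pair)
        same = clustering[u] == clustering[v]
        if (sign == '-' and same) or (sign == '+' and not same):
            bad += 1
    return bad

# Toy instance: u and v similar, w different from both.
labels = {frozenset({'u', 'v'}): '+',
          frozenset({'u', 'w'}): '-',
          frozenset({'v', 'w'}): '-'}
print(disagreements(labels, {'u': 0, 'v': 0, 'w': 1}))  # 0 disagreements
print(disagreements(labels, {'u': 0, 'v': 1, 'w': 1}))  # 2 disagreements
```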

996 citations

Book
12 Dec 2012
TL;DR: Laszlo Lovasz has written an admirable treatise on the exciting new theory of graph limits and graph homomorphisms, an area of great importance in the study of large networks.
Abstract: Recently, it became apparent that a large number of the most interesting structures and phenomena of the world can be described by networks. To develop a mathematical theory of very large networks is an important challenge. This book describes one recent approach to this theory, the limit theory of graphs, which has emerged over the last decade. The theory has rich connections with other approaches to the study of large networks, such as "property testing" in computer science and regularity partitions in graph theory. It has several applications in extremal graph theory, including exact formulations and partial answers to very general questions, such as which problems in extremal graph theory are decidable. It also has less obvious connections with other parts of mathematics (classical and non-classical), such as probability theory, measure theory, tensor algebras, and semidefinite optimization. This book explains many of these connections, first at an informal level to emphasize the need to apply more advanced mathematical methods, and then gives an exact development of the algebraic theory of graph homomorphisms and of the analytic theory of graph limits.

This is an amazing book: readable, deep, and lively. It sets out this emerging area, makes connections between old classical graph theory and graph limits, and charts the course of the future. --Persi Diaconis, Stanford University

This book is a comprehensive study of the active topic of graph limits and an updated account of its present status. It is a beautiful volume written by an outstanding mathematician who is also a great expositor. --Noga Alon, Tel Aviv University, Israel

Modern combinatorics is by no means an isolated subject in mathematics, but has many rich and interesting connections to almost every area of mathematics and computer science. The research presented in Lovasz's book exemplifies this phenomenon. This book presents a wonderful opportunity for a student in combinatorics to explore other fields of mathematics, or conversely for experts in other areas of mathematics to become acquainted with some aspects of graph theory. --Terence Tao, University of California, Los Angeles, CA

Laszlo Lovasz has written an admirable treatise on the exciting new theory of graph limits and graph homomorphisms, an area of great importance in the study of large networks. It is an authoritative, masterful text that reflects Lovasz's position as the main architect of this rapidly developing theory. The book is a must for combinatorialists, network theorists, and theoretical computer scientists alike. --Bela Bollobas, Cambridge University, UK
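For a concrete handle on the central quantity of this limit theory, the homomorphism density t(F, G) can be computed by brute force for small graphs. The sketch below is an illustration only; the adjacency-matrix representation and the tiny graph sizes are assumptions.

```python
from itertools import product

def hom_density(F_edges, F_vertices, G_adj):
    """Homomorphism density t(F, G): the probability that a uniformly random
    map V(F) -> V(G) sends every edge of F to an edge of G.
    G_adj is assumed to be a symmetric 0/1 adjacency matrix (undirected graph)."""
    n = len(G_adj)
    total = 0
    for phi in product(range(n), repeat=len(F_vertices)):
        assignment = dict(zip(F_vertices, phi))
        if all(G_adj[assignment[u]][assignment[v]] for u, v in F_edges):
            total += 1
    return total / n ** len(F_vertices)

# Edge density of a 4-cycle: t(K2, C4) = 8/16 = 0.5
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
print(hom_density([('a', 'b')], ['a', 'b'], C4))  # 0.5
```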

896 citations

Journal ArticleDOI
TL;DR: In Part I of this series, we showed that left convergence is equivalent to convergence in metric, both for simple graphs and for graphs with nodeweights and edgeweights as discussed by the authors.

702 citations

MonographDOI
01 Jan 2014

575 citations

Journal ArticleDOI
TL;DR: This paper considers the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters, studies a continuous relaxation of this discrete problem (find the k-dimensional subspace V that minimizes the sum of squared distances of the m points to V), and argues that the relaxation provides a generalized clustering which is useful in its own right.
Abstract: We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we show that the SVD of a random submatrix—chosen according to a suitable probability distribution—of a given matrix provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.
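The continuous relaxation is easy to reproduce with an off-the-shelf SVD. The sketch below, assuming the points are stacked as rows of a NumPy array and the subspace passes through the origin, returns an orthonormal basis of the best-fit k-dimensional subspace and the corresponding sum of squared distances; it illustrates only the relaxation, not the paper's 2-approximation or its fast randomized sampling algorithm.

```python
import numpy as np

def best_fit_subspace_cost(points, k):
    """Best-fit k-dimensional subspace via the SVD: the top-k right singular
    vectors span the subspace minimizing the sum of squared distances."""
    A = np.asarray(points, dtype=float)    # m x n matrix of points (rows)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    V_k = Vt[:k]                           # orthonormal basis of the subspace
    projections = A @ V_k.T @ V_k          # project each point onto the subspace
    cost = np.sum((A - projections) ** 2)  # equals the sum of the discarded s_i^2
    return V_k, cost

# Example: 100 random points in R^5, best 2-dimensional subspace.
pts = np.random.randn(100, 5)
V2, cost = best_fit_subspace_cost(pts, 2)
print(cost)
```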

523 citations

References
Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
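To make the learning-protocol idea concrete, here is a minimal sketch of the textbook elimination learner for conjunctions over Boolean variables, one of the concept classes studied in this line of work; the function names and the positive-examples-only interface are assumptions for illustration, not Valiant's exact formulation.

```python
def learn_conjunction(positive_examples, n):
    """Elimination learner for conjunctions over n Boolean variables: start
    with the conjunction of all 2n literals and delete any literal falsified
    by a positive example.  The hypothesis is the most specific conjunction
    consistent with the positive examples seen so far."""
    literals = {(i, True) for i in range(n)} | {(i, False) for i in range(n)}
    for x in positive_examples:                 # x is a tuple of n Booleans
        literals = {(i, val) for (i, val) in literals if x[i] == val}

    def hypothesis(x):
        return all(x[i] == val for (i, val) in literals)

    return hypothesis

# Target: x0 AND NOT x2 over 3 variables; two positive examples suffice here.
h = learn_conjunction([(True, False, False), (True, True, False)], 3)
print(h((True, True, False)), h((False, True, False)))  # True False
```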

5,311 citations

Book ChapterDOI
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Abstract: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as: Vapnik, V. N. and Chervonenkis, A. Ya., "On the uniform convergence of relative frequencies of events to their probabilities", Teoriya Veroyatnostei i ee Primeneniya [Theory of Probability and Its Applications] 16(2), 264–279 (1971).

3,939 citations

Journal ArticleDOI
TL;DR: In this paper, it is shown that the likelihood ratio test for fixed sample size can be reduced to a test of this form (rejecting when a sum of independent observations falls below a threshold), and that for large samples a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
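The index $\rho$ is straightforward to evaluate numerically. The sketch below, for a Bernoulli example with illustrative parameters of my choosing, computes $m = \min_t E[\exp(t(X - a))]$ on a grid and compares $(1/n)\log P(S_n \leq na)$ against $\log m$ for a moderate $n$.

```python
import numpy as np
from math import comb, log

def chernoff_index(p, a, ts=np.linspace(-20.0, 0.0, 4001)):
    """m = min_t E[exp(t (X - a))] for X ~ Bernoulli(p); for a < p the
    minimizer has t < 0, so we search a grid of non-positive t."""
    mgf = lambda t: (1 - p) * np.exp(-t * a) + p * np.exp(t * (1 - a))
    return min(mgf(t) for t in ts)

def exact_lower_tail(p, n, a):
    """P(S_n <= n a) for S_n ~ Binomial(n, p), computed exactly."""
    k_max = int(np.floor(n * a + 1e-12))   # guard against float round-off in n*a
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

p, a, n = 0.5, 0.3, 200
m = chernoff_index(p, a)
tail = exact_lower_tail(p, n, a)
# (1/n) log P(S_n <= na) should be close to log m for large n.
print(log(tail) / n, log(m))
```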

3,760 citations

Book
01 Jan 1988
TL;DR: This book develops geometric techniques, based on the ellipsoid method and basis reduction, for proving polynomial time solvability of problems in convexity theory, geometry, and combinatorial optimization; it continues and extends work for which the authors received the Fulkerson Prize, awarded by the Mathematical Programming Society and the American Mathematical Society.
Abstract: This book develops geometric techniques for proving the polynomial time solvability of problems in convexity theory, geometry, and - in particular - combinatorial optimization. It offers a unifying approach based on two fundamental geometric algorithms: - the ellipsoid method for finding a point in a convex set and - the basis reduction method for point lattices. The ellipsoid method was used by Khachiyan to show the polynomial time solvability of linear programming. The basis reduction method yields a polynomial time procedure for certain diophantine approximation problems. A combination of these techniques makes it possible to show the polynomial time solvability of many questions concerning polyhedra - for instance, of linear programming problems having possibly exponentially many inequalities. Utilizing results from polyhedral combinatorics, it provides short proofs of the polynomial time solvability of many combinatorial optimization problems. For a number of these problems, the geometric algorithms discussed in this book are the only techniques known to derive polynomial time solvability. This book is a continuation and extension of previous research of the authors for which they received the Fulkerson Prize, awarded by the Mathematical Programming Society and the American Mathematical Society.
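The core of the ellipsoid method is a single step: given an ellipsoid containing the feasible region and a violated constraint through the center (a central cut), replace it by the smallest ellipsoid containing the remaining half. A sketch of that standard update in NumPy follows; it ignores the numerical-precision and stopping-rule issues that the book treats carefully.

```python
import numpy as np

def ellipsoid_central_cut(center, A, c):
    """One central-cut update of the ellipsoid method (n >= 2).

    Current ellipsoid: E = {x : (x - center)^T A^{-1} (x - center) <= 1},
    with A symmetric positive definite.  Given a cut keeping the half-space
    {x : c^T x <= c^T center}, return the center and matrix of the
    minimum-volume ellipsoid containing E intersected with that half-space.
    """
    center = np.asarray(center, dtype=float)
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    n = len(center)
    b = A @ c / np.sqrt(c @ A @ c)                      # step direction in the ellipsoid metric
    new_center = center - b / (n + 1)
    new_A = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(b, b))
    return new_center, new_A

# Example: unit ball in R^2, cut x_1 <= 0.
ctr, mat = ellipsoid_central_cut(np.zeros(2), np.eye(2), np.array([1.0, 0.0]))
print(ctr, np.linalg.det(mat) < 1.0)  # center moves to negative x_1; volume shrinks
```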

3,676 citations

Journal ArticleDOI
TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space $E^n$. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
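The Vapnik-Chervonenkis dimension mentioned above can be computed by brute force for small finite classes, which is a handy way to build intuition. The sketch below is illustrative only; the set-based representation of concepts (each concept as its positive region restricted to the given points) is an assumption.

```python
from itertools import combinations

def vc_dimension(points, concepts):
    """Brute-force VC dimension of a finite concept class.

    A subset S of `points` is shattered if every one of its 2^|S| labelings
    is realized by some concept; the VC dimension is the size of the largest
    shattered subset."""
    concepts = [frozenset(c) for c in concepts]
    d = 0
    for size in range(1, len(points) + 1):
        shattered_some = any(
            len({frozenset(S) & c for c in concepts}) == 2 ** size
            for S in combinations(points, size)
        )
        if not shattered_some:
            break
        d = size
    return d

# Threshold concepts {x : x >= t} on four points have VC dimension 1.
points = [1, 2, 3, 4]
thresholds = [{x for x in points if x >= t} for t in [0, 1.5, 2.5, 3.5, 5]]
print(vc_dimension(points, thresholds))  # 1
```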

1,967 citations