Journal Article

Property Testing and its connection to Learning and Approximation

TL;DR: In this paper, the authors consider the question of determining whether a function f has property P or is ε-far from any function with property P. In some cases, the algorithm is also allowed to query f on instances of its choice.
Abstract: In this paper, we consider the question of determining whether a function f has property P or is ε-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation. In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a ρ-clique (a clique of density ρ with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.
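The bipartiteness tester described above boils down to sampling a constant-size set of vertices, querying all pairs inside the sample, and checking 2-colorability of the induced subgraph. The Python sketch below captures that flavor only; the adjacency oracle query_edge and the sample-size constant are illustrative assumptions, not the paper's exact algorithm or bounds.

```python
import random
from collections import deque

def induced_subgraph_is_bipartite(sample, query_edge):
    """Check 2-colorability of the subgraph induced on `sample` by BFS,
    querying the edge oracle for every pair of sampled vertices."""
    color = {}
    for start in sample:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in sample:
                if v != u and query_edge(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False
    return True

def test_bipartite(n, query_edge, eps, sample_size=None):
    """One-sided tester sketch: always accepts bipartite graphs; rejects
    graphs far from bipartite with probability depending on the (illustrative)
    sample-size constant.  The number of queries is independent of n."""
    if sample_size is None:
        sample_size = min(n, int(200 / eps ** 2))  # constant is illustrative
    sample = random.sample(range(n), sample_size)
    return induced_subgraph_is_bipartite(sample, query_edge)
```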
Citations
Journal ArticleDOI
25 Jun 2004
TL;DR: This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.
Abstract: We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements we give a PTAS, building on ideas of Goldreich, Goldwasser, and Ron (1998) and de la Vega (1996). We also show how to extend some of these results to graphs with edge labels in [−1, +1], and give some results for the case of random noise.
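The objective is simple to state in code. Below is a small, purely illustrative Python helper that counts disagreements for a given edge labeling and clustering; the data layout (a dict of pairwise '+'/'-' labels) is an assumption made for the example.

```python
def disagreements(labels, clustering):
    """Count correlation-clustering disagreements.

    labels: dict mapping frozenset({u, v}) -> '+' or '-' for every pair.
    clustering: dict mapping each vertex -> cluster id.
    A '-' edge inside a cluster or a '+' edge across clusters is a disagreement.
    """
    bad = 0
    for pair, sign in labels.items():
        u, v = tuple(pair)
        same = clustering[u] == clustering[v]
        if (sign == '-' and same) or (sign == '+' and not same):
            bad += 1
    return bad

# Toy instance: u and v similar, w different from both.
labels = {frozenset({'u', 'v'}): '+',
          frozenset({'u', 'w'}): '-',
          frozenset({'v', 'w'}): '-'}
print(disagreements(labels, {'u': 0, 'v': 0, 'w': 1}))  # 0 disagreements
print(disagreements(labels, {'u': 0, 'v': 1, 'w': 1}))  # 2 disagreements
```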

996 citations

Book
12 Dec 2012
TL;DR: Laszlo Lovasz has written an admirable treatise on the exciting new theory of graph limits and graph homomorphisms, an area of great importance in the study of large networks.
Abstract: Recently, it became apparent that a large number of the most interesting structures and phenomena of the world can be described by networks. To develop a mathematical theory of very large networks is an important challenge. This book describes one recent approach to this theory, the limit theory of graphs, which has emerged over the last decade. The theory has rich connections with other approaches to the study of large networks, such as "property testing" in computer science and regularity partitions in graph theory. It has several applications in extremal graph theory, including exact formulations and partial answers to very general questions, such as which problems in extremal graph theory are decidable. It also has less obvious connections with other parts of mathematics (classical and non-classical), such as probability theory, measure theory, tensor algebras, and semidefinite optimization. This book explains many of these connections, first at an informal level to emphasize the need to apply more advanced mathematical methods, and then gives an exact development of the algebraic theory of graph homomorphisms and of the analytic theory of graph limits.

This is an amazing book: readable, deep, and lively. It sets out this emerging area, makes connections between old classical graph theory and graph limits, and charts the course of the future. --Persi Diaconis, Stanford University

This book is a comprehensive study of the active topic of graph limits and an updated account of its present status. It is a beautiful volume written by an outstanding mathematician who is also a great expositor. --Noga Alon, Tel Aviv University, Israel

Modern combinatorics is by no means an isolated subject in mathematics, but has many rich and interesting connections to almost every area of mathematics and computer science. The research presented in Lovasz's book exemplifies this phenomenon. This book presents a wonderful opportunity for a student in combinatorics to explore other fields of mathematics, or conversely for experts in other areas of mathematics to become acquainted with some aspects of graph theory. --Terence Tao, University of California, Los Angeles, CA

Laszlo Lovasz has written an admirable treatise on the exciting new theory of graph limits and graph homomorphisms, an area of great importance in the study of large networks. It is an authoritative, masterful text that reflects Lovasz's position as the main architect of this rapidly developing theory. The book is a must for combinatorialists, network theorists, and theoretical computer scientists alike. --Bela Bollobas, Cambridge University, UK
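For a concrete handle on the central quantity of this limit theory, the homomorphism density t(F, G) can be computed by brute force for small graphs. The sketch below is an illustration only; the adjacency-matrix representation and the tiny graph sizes are assumptions.

```python
from itertools import product

def hom_density(F_edges, F_vertices, G_adj):
    """Homomorphism density t(F, G): the probability that a uniformly random
    map V(F) -> V(G) sends every edge of F to an edge of G.
    G_adj is assumed to be a symmetric 0/1 adjacency matrix (undirected graph)."""
    n = len(G_adj)
    total = 0
    for phi in product(range(n), repeat=len(F_vertices)):
        assignment = dict(zip(F_vertices, phi))
        if all(G_adj[assignment[u]][assignment[v]] for u, v in F_edges):
            total += 1
    return total / n ** len(F_vertices)

# Edge density of a 4-cycle: t(K2, C4) = 8/16 = 0.5
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
print(hom_density([('a', 'b')], ['a', 'b'], C4))  # 0.5
```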

896 citations

Journal ArticleDOI
TL;DR: In Part I of this series, we showed that left convergence is equivalent to convergence in metric, both for simple graphs and for graphs with nodeweights and edgeweights as discussed by the authors.

702 citations

MonographDOI
01 Jan 2014

575 citations

Journal ArticleDOI
TL;DR: This paper considers the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters, studies a continuous relaxation of this discrete problem (find the k-dimensional subspace V that minimizes the sum of squared distances of the m points to V), and argues that the relaxation provides a generalized clustering which is useful in its own right.
Abstract: We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we show that the SVD of a random submatrix—chosen according to a suitable probability distribution—of a given matrix provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.
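The continuous relaxation is easy to reproduce with an off-the-shelf SVD. The sketch below, assuming the points are stacked as rows of a NumPy array and the subspace passes through the origin, returns an orthonormal basis of the best-fit k-dimensional subspace and the corresponding sum of squared distances; it illustrates only the relaxation, not the paper's 2-approximation or its fast randomized sampling algorithm.

```python
import numpy as np

def best_fit_subspace_cost(points, k):
    """Best-fit k-dimensional subspace via the SVD: the top-k right singular
    vectors span the subspace minimizing the sum of squared distances."""
    A = np.asarray(points, dtype=float)    # m x n matrix of points (rows)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    V_k = Vt[:k]                           # orthonormal basis of the subspace
    projections = A @ V_k.T @ V_k          # project each point onto the subspace
    cost = np.sum((A - projections) ** 2)  # equals the sum of the discarded s_i^2
    return V_k, cost

# Example: 100 random points in R^5, best 2-dimensional subspace.
pts = np.random.randn(100, 5)
V2, cost = best_fit_subspace_cost(pts, 2)
print(cost)
```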

523 citations

References
Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
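To make the learning-protocol idea concrete, here is a minimal sketch of the textbook elimination learner for conjunctions over Boolean variables, one of the concept classes studied in this line of work; the function names and the positive-examples-only interface are assumptions for illustration, not Valiant's exact formulation.

```python
def learn_conjunction(positive_examples, n):
    """Elimination learner for conjunctions over n Boolean variables: start
    with the conjunction of all 2n literals and delete any literal falsified
    by a positive example.  The hypothesis is the most specific conjunction
    consistent with the positive examples seen so far."""
    literals = {(i, True) for i in range(n)} | {(i, False) for i in range(n)}
    for x in positive_examples:                 # x is a tuple of n Booleans
        literals = {(i, val) for (i, val) in literals if x[i] == val}

    def hypothesis(x):
        return all(x[i] == val for (i, val) in literals)

    return hypothesis

# Target: x0 AND NOT x2 over 3 variables; two positive examples suffice here.
h = learn_conjunction([(True, False, False), (True, True, False)], 3)
print(h((True, True, False)), h((False, True, False)))  # True False
```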

5,311 citations

Book ChapterDOI
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Abstract: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as: Vapnik, V. N. and Chervonenkis, A. Ya., "On the uniform convergence of relative frequencies of events to their probabilities", Teoriya Veroyatnostei i ee Primeneniya [Theory of Probability and Its Applications] 16(2), 264–279 (1971).

3,939 citations

Journal ArticleDOI
TL;DR: In this paper, it is shown that the likelihood ratio test for fixed sample size can be reduced to a test of this form (rejecting when a sum of independent observations falls below a threshold), and that for large samples a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
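The index $\rho$ is straightforward to evaluate numerically. The sketch below, for a Bernoulli example with illustrative parameters of my choosing, computes $m = \min_t E[\exp(t(X - a))]$ on a grid and compares $(1/n)\log P(S_n \leq na)$ against $\log m$ for a moderate $n$.

```python
import numpy as np
from math import comb, log

def chernoff_index(p, a, ts=np.linspace(-20.0, 0.0, 4001)):
    """m = min_t E[exp(t (X - a))] for X ~ Bernoulli(p); for a < p the
    minimizer has t < 0, so we search a grid of non-positive t."""
    mgf = lambda t: (1 - p) * np.exp(-t * a) + p * np.exp(t * (1 - a))
    return min(mgf(t) for t in ts)

def exact_lower_tail(p, n, a):
    """P(S_n <= n a) for S_n ~ Binomial(n, p), computed exactly."""
    k_max = int(np.floor(n * a + 1e-12))   # guard against float round-off in n*a
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

p, a, n = 0.5, 0.3, 200
m = chernoff_index(p, a)
tail = exact_lower_tail(p, n, a)
# (1/n) log P(S_n <= na) should be close to log m for large n.
print(log(tail) / n, log(m))
```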

3,760 citations

Book
01 Jan 1988
TL;DR: This book develops geometric techniques, based on the ellipsoid method and basis reduction, for proving polynomial time solvability of problems in convexity theory, geometry, and combinatorial optimization; it continues and extends work for which the authors received the Fulkerson Prize, awarded by the Mathematical Programming Society and the American Mathematical Society.
Abstract: This book develops geometric techniques for proving the polynomial time solvability of problems in convexity theory, geometry, and - in particular - combinatorial optimization. It offers a unifying approach based on two fundamental geometric algorithms: - the ellipsoid method for finding a point in a convex set and - the basis reduction method for point lattices. The ellipsoid method was used by Khachiyan to show the polynomial time solvability of linear programming. The basis reduction method yields a polynomial time procedure for certain diophantine approximation problems. A combination of these techniques makes it possible to show the polynomial time solvability of many questions concerning polyhedra - for instance, of linear programming problems having possibly exponentially many inequalities. Utilizing results from polyhedral combinatorics, it provides short proofs of the polynomial time solvability of many combinatorial optimization problems. For a number of these problems, the geometric algorithms discussed in this book are the only techniques known to derive polynomial time solvability. This book is a continuation and extension of previous research of the authors for which they received the Fulkerson Prize, awarded by the Mathematical Programming Society and the American Mathematical Society.
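The core of the ellipsoid method is a single step: given an ellipsoid containing the feasible region and a violated constraint through the center (a central cut), replace it by the smallest ellipsoid containing the remaining half. A sketch of that standard update in NumPy follows; it ignores the numerical-precision and stopping-rule issues that the book treats carefully.

```python
import numpy as np

def ellipsoid_central_cut(center, A, c):
    """One central-cut update of the ellipsoid method (n >= 2).

    Current ellipsoid: E = {x : (x - center)^T A^{-1} (x - center) <= 1},
    with A symmetric positive definite.  Given a cut keeping the half-space
    {x : c^T x <= c^T center}, return the center and matrix of the
    minimum-volume ellipsoid containing E intersected with that half-space.
    """
    center = np.asarray(center, dtype=float)
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    n = len(center)
    b = A @ c / np.sqrt(c @ A @ c)                      # step direction in the ellipsoid metric
    new_center = center - b / (n + 1)
    new_A = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(b, b))
    return new_center, new_A

# Example: unit ball in R^2, cut x_1 <= 0.
ctr, mat = ellipsoid_central_cut(np.zeros(2), np.eye(2), np.array([1.0, 0.0]))
print(ctr, np.linalg.det(mat) < 1.0)  # center moves to negative x_1; volume shrinks
```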

3,676 citations

Journal ArticleDOI
TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space $E^n$. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
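The Vapnik-Chervonenkis dimension mentioned above can be computed by brute force for small finite classes, which is a handy way to build intuition. The sketch below is illustrative only; the set-based representation of concepts (each concept as its positive region restricted to the given points) is an assumption.

```python
from itertools import combinations

def vc_dimension(points, concepts):
    """Brute-force VC dimension of a finite concept class.

    A subset S of `points` is shattered if every one of its 2^|S| labelings
    is realized by some concept; the VC dimension is the size of the largest
    shattered subset."""
    concepts = [frozenset(c) for c in concepts]
    d = 0
    for size in range(1, len(points) + 1):
        shattered_some = any(
            len({frozenset(S) & c for c in concepts}) == 2 ** size
            for S in combinations(points, size)
        )
        if not shattered_some:
            break
        d = size
    return d

# Threshold concepts {x : x >= t} on four points have VC dimension 1.
points = [1, 2, 3, 4]
thresholds = [{x for x in points if x >= t} for t in [0, 1.5, 2.5, 3.5, 5]]
print(vc_dimension(points, thresholds))  # 1
```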

1,967 citations