scispace - formally typeset
Search or ask a question
Author

Joshua Brody

Bio: Joshua Brody is an academic researcher from Swarthmore College. The author has contributed to research in topics: Upper and lower bounds & Communication complexity. The author has an hindex of 16, co-authored 38 publications receiving 833 citations. Previous affiliations of Joshua Brody include Tsinghua University & Massachusetts Institute of Technology.

Papers
More filters
Journal ArticleDOI
08 Jun 2011
TL;DR: In this article, a technique for proving lower bounds in property testing, by showing a strong connection between testing and communication complexity, was developed, which is general and implies a number of new testing bounds, as well as simpler proofs of several known bounds.
Abstract: We develop a new technique for proving lower bounds in property testing, by showing a strong connection between testing and communication complexity. We give a simple scheme for reducing communication problems to testing problems, thus allowing us to use known lower bounds in communication complexity to prove lower bounds in testing. This scheme is general and implies a number of new testing bounds, as well as simpler proofs of several known bounds. For the problem of testing whether a boolean function is k-linear (a parity function on k variables), we achieve a lower bound of Omega(k) queries, even for adaptive algorithms with two-sided error, thus confirming a conjecture of Goldreich (2010). The same argument behind this lower bound also implies a new proof of known lower bounds for testing related classes such as k-juntas. For some classes, such as the class of monotone functions and the class of s-sparse GF(2) polynomials, we significantly strengthen the best known bounds.

150 citations

Journal Article
TL;DR: A new technique for proving lower bounds in property testing is developed, by showing a strong connection between testing and communication complexity, and significantly strengthens the best known bounds.
Abstract: We develop a new technique for proving lower bounds in property testing, by showing a strong connection between testing and communication complexity. We give a simple scheme for reducing communication problems to testing problems, thus allowing us to use known lower bounds in communication complexity to prove lower bounds in testing. This scheme is general and implies a number of new testing bounds, as well as simpler proofs of several known bounds. For the problem of testing whether a boolean function is k-linear (a parity function on k variables), we achieve a lower bound of Omega(k) queries, even for adaptive algorithms with two-sided error, thus confirming a conjecture of Goldreich (2010). The same argument behind this lower bound also implies a new proof of known lower bounds for testing related classes such as k-juntas. For some classes, such as the class of monotone functions and the class of s-sparse GF(2) polynomials, we significantly strengthen the best known bounds.

107 citations

Book ChapterDOI
06 Jul 2009
TL;DR: These are the first nontrivial algorithms for distributed monitoring of non-monotone functions when f is either H, the empirical Shannon entropy of a stream, or any of a related class of entropy functions (Tsallis entropies).
Abstract: The notion of distributed functional monitoring was recently introduced by Cormode, Muthukrishnan and Yi to initiate a formal study of the communication cost of certain fundamental problems arising in distributed systems, especially sensor networks. In this model, each of k sites reads a stream of tokens and is in communication with a central coordinator, who wishes to continuously monitor some function f of *** , the union of the k streams. The goal is to minimize the number of bits communicated by a protocol that correctly monitors f (*** ), to within some small error. As in previous work, we focus on a threshold version of the problem, where the coordinator's task is simply to maintain a single output bit, which is 0 whenever f (*** ) ≤ *** (1 *** *** ) and 1 whenever f (*** ) *** *** . Following Cormode et al., we term this the (k ,f ,*** ,*** ) functional monitoring problem. In previous work, some upper and lower bounds were obtained for this problem, with f being a frequency moment function, e.g., F 0 , F 1 , F 2 . Importantly, these functions are monotone . Here, we further advance the study of such problems, proving three new classes of results. First, we provide nontrivial monitoring protocols when f is either H , the empirical Shannon entropy of a stream, or any of a related class of entropy functions (Tsallis entropies). These are the first nontrivial algorithms for distributed monitoring of non-monotone functions. Second, we study the effect of non-monotonicity of f on our ability to give nontrivial monitoring protocols, by considering f = F p with deletions allowed, as well as f = H . Third, we prove new lower bounds on this problem when f = F p , for several values of p .

100 citations

Proceedings ArticleDOI
23 Oct 2010
TL;DR: It is proved that in order to succeed in this model of read-once width-$w branching programs, $\beta$ must be at least $1/ (\log n)^{\Theta(w)}$.
Abstract: The \emph{Coin Problem} is the following problem: a coin is given, which lands on head with probability either $1/2 + \beta$ or $1/2 - \beta$. We are given the outcome of $n$ independent tosses of this coin, and the goal is to guess which way the coin is biased, and to answer correctly with probability $\ge 2/3$. When our computational model is unrestricted, the majority function is optimal, and succeeds when $\beta \ge c /\sqrt{n}$ for a large enough constant $c$. The coin problem is open and interesting in models that cannot compute the majority function. In this paper we study the coin problem in the model of \emph{read-once width-$w$ branching programs}. We prove that in order to succeed in this model, $\beta$ must be at least $1/ (\log n)^{\Theta(w)}$. For constant $w$ this is tight by considering the recursive tribes function, and for other values of $w$ this is nearly tight by considering other read-once AND-OR trees. We generalize this to a \emph{Dice Problem}, where instead of independent tosses of a coin we are given independent tosses of one of two $m$-sided dice. We prove that if the distributions are too close and the mass of each side of the dice is not too small, then the dice cannot be distinguished by small-width read-once branching programs. We suggest one application for this kind of theorems: we prove that Nisan's Generator fools width-$w$ read-once \emph{regular} branching programs, using seed length $O(w^4 \log n \log \log n + \log n \log (1/\eps))$. For $w=\eps=\Theta(1)$, this seed length is $O(\log n \log \log n)$. The coin theorem and its relatives might have other connections to PRGs. This application is related to the independent, but chronologically-earlier, work of Braver man, Rao, Raz and Yehudayoff~\cite{BRRY}.

96 citations

Proceedings ArticleDOI
15 Jul 2014
TL;DR: A smooth communication/round tradeoff is given which shows that with O(log* k) rounds, O(k) bits of communication is possible, which improves upon the trivial protocol by an order of magnitude.
Abstract: We consider the following fundamental communication problem - there is data that is distributed among servers, and the servers want to compute the intersection of their data sets, e.g., the common records in a relational database. They want to do this with as little communication and as few messages (rounds) as possible. They are willing to use randomization, and fail with a tiny probability. Given a protocol for computing the intersection, it can also be used to compute the exact Jaccard similarity, the rarity, the number of distinct elements, and joins between databases. Computing the intersection is at least as hard as the set disjointness problem, which asks whether the intersection is empty. Formally, in the two-server setting, the players hold subsets S, T ⊆ [n]. In many realistic scenarios, the sizes of S and T are significantly smaller than n, so we impose the constraint that |S|, |T| ≤ k. We study the minimum number of bits the parties need to communicate in order to compute the intersection set S ∩ T, given a certain number r of messages that are allowed to be exchanged. While O(k log (n/k)) bits is achieved trivially and deterministically with a single message, we ask what is possible with more than one message and with randomization. We give a smooth communication/round tradeoff which shows that with O(log* k) rounds, O(k) bits of communication is possible, which improves upon the trivial protocol by an order of magnitude. This is in contrast to other basic problems such as computing the union or symmetric difference, for which Ω(k log(n/k)) bits of communication is required for any number of rounds. For two players, known lower bounds for the easier problem of set disjointness imply our algorithms are optimal up to constant factors in communication and number of rounds. We extend our protocols to $m$-player protocols, obtaining an optimal O(mk) bits of communication with a similarly small number of rounds.

34 citations


Cited by
More filters
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal Article
TL;DR: In this paper, the authors consider the question of determining whether a function f has property P or is e-far from any function with property P. In some cases, it is also allowed to query f on instances of its choice.
Abstract: In this paper, we consider the question of determining whether a function f has property P or is e-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation.In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-Colorable, or having a p-Clique (clique of density p with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.

870 citations

Proceedings ArticleDOI
06 Jun 2010
TL;DR: The first optimal algorithm for estimating the number of distinct elements in a data stream is given, closing a long line of theoretical research on this problem, and has optimal O(1) update and reporting times.
Abstract: We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1,...,n}, our algorithm computes a (1 ± e)-approximation using an optimal O(1/e-2 + log(n)) bits of space with 2/3 success probability, where 0

378 citations

Book ChapterDOI
01 Jan 1996

378 citations

Book
01 Nov 2017
TL;DR: In this article, a wide range of algorithmic techniques for the design and analysis of tests for algebraic properties, properties of Boolean functions, graph properties, and properties of distributions are presented.
Abstract: Property testing is concerned with the design of super-fast algorithms for the structural analysis of large quantities of data. The aim is to unveil global features of the data, such as determining whether the data has a particular property or estimating global parameters. Remarkably, it is possible for decisions to be made by accessing only a small portion of the data. Property testing focuses on properties and parameters that go beyond simple statistics. This book provides an extensive and authoritative introduction to property testing. It provides a wide range of algorithmic techniques for the design and analysis of tests for algebraic properties, properties of Boolean functions, graph properties, and properties of distributions.

343 citations