
Showing papers by "Eric Blais published in 2016"


Proceedings ArticleDOI
19 Jun 2016
TL;DR: In this paper, it was shown that every (possibly adaptive) algorithm for testing monotonicity has query complexity Ω(n^{1/4}), and that there is an exponential gap between the query complexity of adaptive and non-adaptive algorithms for testing regular linear threshold functions (LTFs) for monotonicity.
Abstract: We show that every algorithm for testing n-variate Boolean functions for monotonicity has query complexity Ω(n^{1/4}). All previous lower bounds for this problem were designed for non-adaptive algorithms and, as a result, the best previous lower bound for general (possibly adaptive) monotonicity testers was only Ω(log n). Combined with the query complexity of the non-adaptive monotonicity tester of Khot, Minzer, and Safra (FOCS 2015), our lower bound shows that adaptivity can result in at most a quadratic reduction in the query complexity for testing monotonicity. By contrast, we show that there is an exponential gap between the query complexity of adaptive and non-adaptive algorithms for testing regular linear threshold functions (LTFs) for monotonicity. Chen, De, Servedio, and Tan (STOC 2015) recently showed that non-adaptive algorithms require almost Ω(n^{1/2}) queries for this task. We introduce a new adaptive monotonicity testing algorithm which has query complexity O(log n) when the input is a regular LTF.
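
For intuition about the problem setting, here is a minimal sketch of the classic non-adaptive "edge tester" for monotonicity on the hypercube. This is an illustration of the technique the lower bounds apply to, not the algorithm from the paper; the function interface and query count are our own assumptions.

```python
import random

def edge_tester(f, n, num_queries=1000):
    """Non-adaptive edge tester for monotonicity of f: {0,1}^n -> {0,1}.

    Samples random hypercube edges (x with bit i set to 0 vs. 1) and
    rejects if any sampled edge violates monotonicity. Always accepts
    monotone functions; the rejection probability for far-from-monotone
    functions depends on num_queries (illustrative, not tuned).
    """
    for _ in range(num_queries):
        x = [random.randint(0, 1) for _ in range(n)]
        i = random.randrange(n)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):  # f decreases along coordinate i
            return False  # reject: witnessed a violated edge
    return True  # accept
```

For example, `edge_tester(lambda x: int(sum(x) >= n // 2), n)` always accepts, since majority is monotone.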

63 citations


Proceedings ArticleDOI
16 Sep 2016
TL;DR: A novel mathematical model for performance-relevant (or, more generally, quantitative) feature interactions is proposed, based on the theory of Boolean functions, along with two algorithms for detecting all such interactions with little measurement effort and potentially guaranteed accuracy and confidence levels.
Abstract: Modern software systems have grown significantly in size and complexity, so understanding how a software system behaves when it has many configuration options, also called features, is no longer a trivial task. This is primarily due to the potentially complex interactions among the features. In this paper, we propose a novel mathematical model for performance-relevant (or, more generally, quantitative) feature interactions, based on the theory of Boolean functions. Moreover, we provide two algorithms for detecting all such interactions with little measurement effort and potentially guaranteed accuracy and confidence level. Empirical results on real-world configurable systems demonstrate the feasibility and effectiveness of our approach.
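
To make the Boolean-function view concrete, here is a minimal sketch (our own illustration; the paper's detection algorithms are not reproduced here) of how an interaction among a set of features shows up as a Fourier coefficient of the performance function. The `perf` interface, the ±1 encoding, and the sample count are assumptions.

```python
import random

def estimate_fourier_coefficient(perf, n, S, num_samples=2000):
    """Monte Carlo estimate of the Fourier coefficient of `perf` on the
    feature set S, where perf maps a configuration in {-1,+1}^n
    (feature off/on) to a measured performance value.

    A coefficient bounded away from zero on a set S with |S| >= 2
    witnesses a (quantitative) interaction among the features in S.
    """
    total = 0.0
    for _ in range(num_samples):
        x = [random.choice((-1, 1)) for _ in range(n)]
        chi = 1
        for i in S:
            chi *= x[i]  # character chi_S(x) = prod_{i in S} x_i
        total += perf(x) * chi
    return total / num_samples
```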

14 citations


Proceedings Article
06 Jun 2016
TL;DR: It is shown that it is possible to learn k-junta distributions with respect to the uniform distribution over the Boolean hypercube {0,1}^n in time poly(n, 1/ε).
Abstract: We consider the problem of learning distributions in the presence of irrelevant features. This problem is formalized by introducing a new notion of k-junta distributions. Informally, a distribution D over the domain X is a k-junta distribution with respect to another distribution U over the same domain if there is a set J ⊆ [n] of size |J| ≤ k that captures the difference between D and U. We show that it is possible to learn k-junta distributions with respect to the uniform distribution over the Boolean hypercube {0,1}^n in time poly(n, 1/ε). This result is obtained via a new Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993). We also consider the problem of testing whether an unknown distribution is a k-junta distribution with respect to the uniform distribution. We give a nearly-optimal algorithm for this task. Both the analysis of the algorithm and the lower bound showing its optimality are obtained by establishing connections between the problem of testing junta distributions and testing uniformity of weighted collections of distributions.
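
For intuition about the Fourier-based approach, here is a minimal sketch of the coefficient-estimation step in the spirit of the Low-Degree Algorithm: an illustration of the general technique, not the paper's algorithm. The interface and sample handling are our own.

```python
from itertools import combinations

def estimate_density_coefficients(samples, n, k):
    """Estimate the Fourier coefficients of the density of an unknown
    distribution D over {-1,+1}^n on all sets of size <= k, using the
    identity hat{D}(S) = E_{x ~ D}[chi_S(x)], estimated by sample means.

    `samples` is a list of tuples in {-1,+1}^n drawn from D.
    """
    coeffs = {}
    m = len(samples)
    for size in range(k + 1):
        for S in combinations(range(n), size):
            total = 0
            for x in samples:
                chi = 1
                for i in S:
                    chi *= x[i]  # chi_S(x) = prod_{i in S} x_i
                total += chi
            coeffs[S] = total / m
    return coeffs
```

For a k-junta distribution, the coefficients supported outside the relevant set J are zero, so large estimated coefficients point to the relevant coordinates.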

14 citations


Posted Content
TL;DR: In this article, the authors characterize the set of properties of Boolean-valued functions on a finite domain that are testable with a constant number of samples, and obtain a number of corollaries.
Abstract: We characterize the set of properties of Boolean-valued functions on a finite domain $\mathcal{X}$ that are testable with a constant number of samples. Specifically, we show that a property $\mathcal{P}$ is testable with a constant number of samples if and only if it is (essentially) a $k$-part symmetric property for some constant $k$, where a property is {\em $k$-part symmetric} if there is a partition $S_1,\ldots,S_k$ of $\mathcal{X}$ such that whether $f:\mathcal{X} \to \{0,1\}$ satisfies the property is determined solely by the densities of $f$ on $S_1,\ldots,S_k$. We use this characterization to obtain a number of corollaries, namely: (i) A graph property $\mathcal{P}$ is testable with a constant number of samples if and only if whether a graph $G$ satisfies $\mathcal{P}$ is (essentially) determined by the edge density of $G$. (ii) An affine-invariant property $\mathcal{P}$ of functions $f:\mathbb{F}_p^n \to \{0,1\}$ is testable with a constant number of samples if and only if whether $f$ satisfies $\mathcal{P}$ is (essentially) determined by the density of $f$. (iii) For every constant $d \geq 1$, monotonicity of functions $f : [n]^d \to \{0, 1\}$ on the $d$-dimensional hypergrid is testable with a constant number of samples.
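
Since a $k$-part symmetric property is determined by the densities of $f$ on the parts, a sample-based tester only needs to estimate those $k$ densities. A minimal sketch of that estimation step, assuming labeled samples and a known partition (both our own illustration):

```python
def estimate_part_densities(samples, parts):
    """Estimate the density of f on each part S_1,...,S_k from labeled
    samples (x, f(x)) drawn uniformly from the domain.

    `parts` maps each domain point x to its part index in 0..k-1.
    Returns, per part, the fraction of sampled points in that part
    with f(x) = 1.
    """
    counts = {}  # part -> number of samples landing in the part
    ones = {}    # part -> number of those samples with f(x) = 1
    for x, fx in samples:
        j = parts(x)
        counts[j] = counts.get(j, 0) + 1
        ones[j] = ones.get(j, 0) + fx
    return {j: ones[j] / counts[j] for j in counts}
```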

9 citations


Journal Article
TL;DR: It is proved that the sample complexity of testing identity to a fixed distribution is essentially determined by Peetre's K-functional, a fundamental operator in the theory of interpolation of Banach spaces; the result stems from an unexpected connection to functional analysis and refined concentration-of-measure inequalities that arise naturally in the reduction.
Abstract: We present a new methodology for proving distribution testing lower bounds, establishing a connection between distribution testing and the simultaneous message passing (SMP) communication model. Extending the framework of Blais, Brody, and Matulef [BBM12], we show a simple way to reduce (private-coin) SMP problems to distribution testing problems. This method allows us to prove new distribution testing lower bounds, as well as to provide simple proofs of known lower bounds. Our main result is concerned with testing identity to a specific distribution p, given as a parameter. In a recent and influential work, Valiant and Valiant [VV14] showed that the sample complexity of the aforementioned problem is closely related to the ℓ_{2/3}-quasinorm of p. We obtain alternative bounds on the complexity of this problem in terms of an arguably more intuitive measure and using simpler proofs. More specifically, we prove that the sample complexity is essentially determined by a fundamental operator in the theory of interpolation of Banach spaces, known as Peetre's K-functional. We show that this quantity is closely related to the size of the effective support of p (loosely speaking, the number of supported elements that constitute the vast majority of the mass of p). This result, in turn, stems from an unexpected connection to functional analysis and refined concentration of measure inequalities, which arise naturally in our reduction.
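
Loosely following the description above (the paper's precise definition may differ), here is a minimal sketch of computing the size of the effective support of a known distribution p: the fewest elements whose combined mass covers all but a delta fraction.

```python
def effective_support_size(p, delta):
    """Size of the smallest set of elements of the distribution p
    (given as a list of probabilities) whose total mass is >= 1 - delta.

    Greedily taking elements in decreasing order of mass is optimal
    for this objective.
    """
    mass = 0.0
    for count, prob in enumerate(sorted(p, reverse=True), start=1):
        mass += prob
        if mass >= 1 - delta:
            return count
    return len(p)
```

For example, `effective_support_size([0.5, 0.3, 0.1, 0.1], 0.2)` returns 2, since the two heaviest elements already carry 0.8 of the mass.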

7 citations


Book ChapterDOI
18 Feb 2016
TL;DR: This chapter focuses on a formalization of approximate solutions that has been widely studied in an area of theoretical computer science known as property testing, and on the close connection between property testing and general parameter estimation.
Abstract: What computational problems can we solve when we only have time to look at a tiny fraction of the data? This general question has been studied from many different angles in statistics. More recently, with the proliferation of massive datasets, it has also become a central question in computer science. Essentially all of the research on this question starts with a simple observation: except for a very small number of special cases, the only problems that can be solved in this very restrictive setting are those that admit approximate solutions. In this chapter, we focus on a formalization of approximate solutions that has been widely studied in an area of theoretical computer science known as property testing. Let X denote the underlying dataset, and consider the setting where this dataset represents a combinatorial object. Let P be any property of this type of combinatorial object. We say that X is ε-close to having property P if we can modify at most an ε fraction of X to obtain the description X′ of an object that does have property P; otherwise we say that X is ε-far from having the property. A randomized algorithm A is an ε-tester for P if it can distinguish with large constant probability between datasets that represent objects with the property P and those that are ε-far from having the same property. (The algorithm A is free to output anything on inputs that don't have the property P but are also not ε-far from having this property; it is this leeway that enables property testers to be so efficient.) There is a close connection between property testing and general parameter estimation. Let X be a dataset and θ = θ(X) be any parameter of this dataset. For every threshold t, we can define the property of having θ ≤ t. If we have an efficient algorithm for testing this property, we can also use it to efficiently obtain an estimate θ̂ that is close to θ in the sense that θ̂ ≤ θ and the underlying object X is ε-close to another dataset X′ with θ(X′) = θ̂. Note that this notion of closeness is very different from the notions usually considered in parameter estimation; instead of measuring the quality of the estimate as a function L(θ, θ̂) of the true and estimated values of the parameter, here it is a function of the underlying dataset.
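
A minimal sketch of the testing-to-estimation reduction described above; the tester interface and the finite grid of candidate thresholds are our own assumptions.

```python
def estimate_parameter(tester, X, thresholds):
    """Turn a property tester into a parameter estimator.

    `tester(X, t)` is assumed to accept (return True) whenever
    theta(X) <= t and to reject whenever X is eps-far from every
    dataset X' with theta(X') <= t. Scanning candidate thresholds in
    increasing order and returning the first accepted one yields an
    estimate theta_hat that is close to theta in the dataset-distance
    sense described above.
    """
    for t in sorted(thresholds):
        if tester(X, t):
            return t
    return None  # no candidate threshold was accepted
```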

6 citations



Posted Content
TL;DR: It is shown that for any constant $\epsilon > 0$ and $p \ge 1$, it is possible to distinguish functions that are submodular from those that are $\epsilon$-far from every submodular function in $\ell_p$ distance with a constant number of queries.
Abstract: We show that for any constant $\epsilon > 0$ and $p \ge 1$, it is possible to distinguish functions $f : \{0,1\}^n \to [0,1]$ that are submodular from those that are $\epsilon$-far from every submodular function in $\ell_p$ distance with a constant number of queries. More generally, we extend the testing-by-implicit-learning framework of Diakonikolas et al. (2007) to show that every property of real-valued functions that is well-approximated in $\ell_2$ distance by a class of $k$-juntas for some $k = O(1)$ can be tested in the $\ell_p$-testing model with a constant number of queries. This result, combined with a recent junta theorem of Feldman and Vondrak (2016), yields the constant-query testability of submodularity. It also yields constant-query testing algorithms for a variety of other natural properties of valuation functions, including fractionally additive (XOS) functions, OXS functions, unit demand functions, coverage functions, and self-bounding functions.
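
For intuition about the $\ell_p$-testing model, here is a generic sampling sketch (not the paper's tester) that estimates the normalized $\ell_p$ distance between two query-accessible functions; the interface and sample count are our own assumptions.

```python
import random

def estimate_lp_distance(f, g, n, p=2, num_samples=5000):
    """Monte Carlo estimate of the normalized l_p distance
    (E_x |f(x) - g(x)|^p)^{1/p} between f, g : {0,1}^n -> [0,1],
    where x is uniform over the Boolean hypercube.
    """
    total = 0.0
    for _ in range(num_samples):
        x = tuple(random.randint(0, 1) for _ in range(n))
        total += abs(f(x) - g(x)) ** p
    return (total / num_samples) ** (1.0 / p)
```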

2 citations


Posted Content
TL;DR: An algorithm is designed for tolerant isomorphism testing of two unknown Boolean functions $f$ and $g$ whose query complexity only depends on the (unknown) smallest $k$ such that either $f$ or $g$ is close to being a $k$-junta.
Abstract: A function $f\colon \{-1,1\}^n \to \{-1,1\}$ is a $k$-junta if it depends on at most $k$ of its variables. We consider the problem of tolerant testing of $k$-juntas, where the testing algorithm must accept any function that is $\epsilon$-close to some $k$-junta and reject any function that is $\epsilon'$-far from every $k'$-junta for some $\epsilon' = O(\epsilon)$ and $k' = O(k)$. Our first result is an algorithm that solves this problem with query complexity polynomial in $k$ and $1/\epsilon$. This result is obtained via a new polynomial-time approximation algorithm for submodular function minimization (SFM) under large cardinality constraints, which holds even when only given approximate oracle access to the function. Our second result considers the case where $k' = k$. We show how to obtain a smooth tradeoff between the amount of tolerance and the query complexity in this setting. Specifically, we design an algorithm that, given $\rho\in(0,1/2)$, accepts any function that is $\frac{\epsilon\rho}{16}$-close to some $k$-junta and rejects any function that is $\epsilon$-far from every $k$-junta. The query complexity of the algorithm is $O\big( \frac{k\log k}{\epsilon\rho(1-\rho)^k} \big)$. Finally, we show how to apply the second result to the problem of tolerant isomorphism testing between two unknown Boolean functions $f$ and $g$. We give an algorithm for this problem whose query complexity only depends on the (unknown) smallest $k$ such that either $f$ or $g$ is close to being a $k$-junta.
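
A standard primitive behind junta testers is estimating how much $f$ depends on the coordinates outside a candidate set $J$ by re-randomizing those coordinates. Here is a minimal sketch of that primitive (an illustration of the general technique, not the tolerant tester of this paper):

```python
import random

def estimate_dependence_outside(f, n, J, num_samples=2000):
    """Estimate Pr[f(x) != f(y)], where x is uniform over {-1,+1}^n and
    y agrees with x on the coordinates in J but re-randomizes the rest.

    A small estimate suggests f is close to a junta on J; this is a
    standard primitive in junta testing, not the paper's algorithm.
    """
    J = set(J)
    disagreements = 0
    for _ in range(num_samples):
        x = [random.choice((-1, 1)) for _ in range(n)]
        y = [x[i] if i in J else random.choice((-1, 1)) for i in range(n)]
        if f(tuple(x)) != f(tuple(y)):
            disagreements += 1
    return disagreements / num_samples
```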