
Showing papers in "Journal of the ACM in 2012"


Journal ArticleDOI
TL;DR: It is proved that obfuscation is impossible, by constructing a family of efficient programs that are unobfuscatable, in the sense that given any efficient program, the “source code” of that program can be efficiently reconstructed.
Abstract: Informally, an obfuscator O is an (efficient, probabilistic) “compiler” that takes as input a program (or circuit) P and produces a new program O(P) that has the same functionality as P yet is “unintelligible” in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic and complexity-theoretic applications, ranging from software protection to homomorphic encryption to complexity-theoretic analogues of Rice's theorem. Most of these applications are based on an interpretation of the “unintelligibility” condition in obfuscation as meaning that O(P) is a “virtual black box,” in the sense that anything one can efficiently compute given O(P), one could also efficiently compute given oracle access to P. In this work, we initiate a theoretical investigation of obfuscation. Our main result is that, even under very weak formalizations of the above intuition, obfuscation is impossible. We prove this by constructing a family 𝒫 of efficient programs that are unobfuscatable in the sense that (a) given any efficient program P' that computes the same function as a program P ∈ 𝒫, the “source code” P can be efficiently reconstructed, yet (b) given oracle access to a (randomly selected) program P ∈ 𝒫, no efficient algorithm can reconstruct P (or even distinguish a certain bit in the code from random) except with negligible probability. We extend our impossibility result in a number of ways, including to obfuscators that (a) are not necessarily computable in polynomial time, (b) only approximately preserve the functionality, and (c) only need to work for very restricted models of computation (TC0). We also rule out several potential applications of obfuscators, by constructing “unobfuscatable” signature schemes, encryption schemes, and pseudorandom function families.
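For reference, the “virtual black box” condition sketched above is usually formalized along the following lines (a standard rendering of the definition, not a quotation from the paper): for every efficient adversary A there should exist an efficient simulator S such that, for every program P in the family,

\[
\Big|\, \Pr[\,A(O(P)) = 1\,] \;-\; \Pr[\,S^{P}(1^{|P|}) = 1\,] \,\Big| \;\le\; \mathrm{negl}(|P|),
\]

where S^P denotes oracle access to P. The unobfuscatable family 𝒫 defeats even substantially weakened versions of this requirement.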

598 citations


Journal ArticleDOI
TL;DR: This paper addresses closure properties under the Boolean operators union, intersection and complementation and algorithmic aspects, such as checking emptiness or language containment, and provides a comparison of probabilistic ω-automata concerning expressiveness and efficiency.
Abstract: Probabilistic ω-automata are variants of nondeterministic automata over infinite words where all choices are resolved by probability distributions. Acceptance of a run for an infinite input word can be defined using traditional acceptance criteria for ω-automata, such as Büchi, Rabin or Streett conditions. The accepted language of a probabilistic ω-automaton is then defined by imposing a constraint on the probability measure of the accepting runs. In this paper, we study a series of fundamental properties of probabilistic ω-automata with three different language-semantics: (1) the probable semantics that requires positive acceptance probability, (2) the almost-sure semantics that requires acceptance with probability 1, and (3) the threshold semantics that relies on an additional parameter λ ∈ (0,1) that specifies a lower probability bound for the acceptance probability. We provide a comparison of probabilistic ω-automata under these three semantics and nondeterministic ω-automata concerning expressiveness and efficiency. Furthermore, we address closure properties under the Boolean operators union, intersection and complementation, and algorithmic aspects, such as checking emptiness or language containment.
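Writing Pr_A(w) for the probability measure of the accepting runs of automaton A on the infinite word w, the three semantics correspond to the following accepted languages (a compact restatement of the definitions just given):

\[
L_{>0}(\mathcal{A}) = \{\, w : \Pr_{\mathcal{A}}(w) > 0 \,\}, \qquad
L_{=1}(\mathcal{A}) = \{\, w : \Pr_{\mathcal{A}}(w) = 1 \,\}, \qquad
L_{>\lambda}(\mathcal{A}) = \{\, w : \Pr_{\mathcal{A}}(w) > \lambda \,\}, \quad \lambda \in (0,1).
\]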

396 citations


Journal ArticleDOI
TL;DR: A non-interactive zap for all NP is constructed, the first based on a standard cryptographic security assumption; the new techniques also allow a dramatic reduction in the length of the common reference string and the size of the proofs.
Abstract: Noninteractive zero-knowledge (NIZK) proof systems are fundamental primitives used in many cryptographic constructions, including public-key encryption secure against chosen ciphertext attack, digital signatures, and various other cryptographic protocols. We introduce new techniques for constructing NIZK proofs based on groups with a bilinear map. Compared to previous constructions of NIZK proofs, our techniques yield a dramatic reduction in the length of the common reference string (proportional to the security parameter) and the size of the proofs (proportional to the security parameter times the circuit size). Our novel techniques allow us to answer several long-standing open questions in the theory of noninteractive proofs. We construct the first perfect NIZK argument system for all NP. We construct the first universally composable NIZK argument for all NP in the presence of an adaptive adversary. We construct a non-interactive zap for all NP, which is the first that is based on a standard cryptographic security assumption.

196 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that the simplest possible tabulation hashing provides unexpectedly strong guarantees, such as Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
Abstract: Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Zobrist in 1970, who used it for game-playing programs. Keys are viewed as consisting of c characters. We initialize c tables H1, ..., Hc mapping characters to random hash codes. A key x = (x1, ..., xc) is hashed to H1[x1] ⊕ ⋯ ⊕ Hc[xc], where ⊕ denotes bit-wise exclusive-or. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, for example, Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
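The scheme is simple enough to state in a few lines. Below is a minimal Python sketch, with illustrative parameters (c = 4 eight-bit characters, 32-bit hash codes) that are choices made here, not prescribed by the paper:

```python
import random

# Simple tabulation hashing: c tables of random codes, combined by XOR.
C, TABLE_SIZE, BITS = 4, 256, 32
H = [[random.getrandbits(BITS) for _ in range(TABLE_SIZE)] for _ in range(C)]

def tab_hash(x: int) -> int:
    """Hash a 32-bit key x viewed as c = 4 characters of 8 bits each."""
    h = 0
    for i in range(C):
        char = (x >> (8 * i)) & 0xFF   # extract the i-th 8-bit character
        h ^= H[i][char]                # XOR in the i-th table's entry
    return h
```

Each evaluation is just c table lookups and XORs, which is why the scheme is attractive in practice despite not even being 4-independent.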

137 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe and analyze sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls, and give lower bounds which show the running times of many of their algorithms to be nearly best possible in the unit-cost RAM model.
Abstract: In this article we describe and analyze sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and L2-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of novel sampling techniques and a new multiplicative update algorithm. We give lower bounds which show the running times of many of our algorithms to be nearly best possible in the unit-cost RAM model.

112 citations


Journal ArticleDOI
TL;DR: It is as hard to approximate the partition function as it is to find approximate solutions to a wide range of counting problems, including that of determining the number of independent sets in a bipartite graph.
Abstract: We provide evidence that it is computationally difficult to approximate the partition function of the ferromagnetic q-state Potts model when q > 2. Specifically, we show that the partition function is hard for the complexity class #RHΠ1 under approximation-preserving reducibility. Thus, it is as hard to approximate the partition function as it is to find approximate solutions to a wide range of counting problems, including that of determining the number of independent sets in a bipartite graph. Our proof exploits the first-order phase transition of the “random cluster” model, which is a probability distribution on graphs that is closely related to the q-state Potts model.
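For concreteness, the partition function in question is the following standard object (stated here in its usual spin form; the paper works with the closely related random-cluster formulation):

\[
Z_{\text{Potts}}(G; q, \beta) \;=\; \sum_{\sigma : V \to \{1,\dots,q\}} \exp\Big( \beta \sum_{\{u,v\} \in E} \delta\big(\sigma(u), \sigma(v)\big) \Big),
\]

where G = (V, E), δ is the Kronecker delta, and the ferromagnetic case corresponds to β > 0.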

102 citations


Journal ArticleDOI
TL;DR: Lower bounds on the communication cost of algorithms are shown to be closely related to the expansion properties of the corresponding computation graphs, and these bounds are attainable both for sequential and for parallel algorithms and hence optimal.
Abstract: The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication algorithms, and obtain the first lower bounds on their communication costs. In the sequential case, where the processor has a fast memory of size M, too small to store three n-by-n matrices, the lower bound on the number of words moved between fast and slow memory is, for a large class of matrix multiplication algorithms, Ω((n/√M)^ω0 · M), where ω0 is the exponent in the arithmetic count (e.g., ω0 = lg 7 for Strassen, and ω0 = 3 for conventional matrix multiplication). With p parallel processors, each with fast memory of size M, the lower bound is asymptotically lower by a factor of p. These bounds are attainable both for sequential and for parallel algorithms and hence optimal.
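Instantiating the bound for Strassen's algorithm makes the shape of the tradeoff explicit (a routine rewriting of the expression above):

\[
\Omega\!\Big( \big(\tfrac{n}{\sqrt{M}}\big)^{\lg 7} \cdot M \Big) \;=\; \Omega\!\Big( \frac{n^{\lg 7}}{M^{\lg 7/2 - 1}} \Big) \;\approx\; \Omega\!\Big( \frac{n^{2.81}}{M^{0.40}} \Big),
\]

so enlarging the fast memory reduces communication, but more slowly than for conventional multiplication, where ω0 = 3 gives Ω(n³/√M).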

89 citations


Journal ArticleDOI
TL;DR: It is proved that graphs with excluded minors can be decomposed into pieces arranged in a treelike structure, together with a linear order of each of the pieces, and the decomposition and the linear orders on the pieces are definable in fixed-point logic (without counting).
Abstract: We give a logical characterization of the polynomial-time properties of graphs embeddable in some surface. For every surface S, a property P of graphs embeddable in S is decidable in polynomial time if and only if it is definable in fixed-point logic with counting. It is a consequence of this result that for every surface S there is a k such that a simple combinatorial algorithm, namely “the k-dimensional Weisfeiler-Lehman algorithm”, decides isomorphism of graphs embeddable in S in polynomial time.We also present (without proof) generalizations of these results to arbitrary classes of graphs with excluded minors.
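For readers unfamiliar with the algorithm family named above, here is a hedged Python sketch of the k = 1 case, usually called color refinement; the k-dimensional version used in the paper refines colors of k-tuples of vertices instead of single vertices:

```python
# 1-dimensional Weisfeiler-Lehman (color refinement), for illustration
# only; the paper relies on the k-dimensional generalization.
def color_refinement(adj):
    """adj: dict mapping each vertex to an iterable of its neighbors."""
    color = {v: 0 for v in adj}  # start with all vertices the same color
    while True:
        # New signature = own color plus the multiset of neighbor colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        refined = {v: palette[sig[v]] for v in adj}
        if refined == color:     # partition is stable; stop
            return color
        color = refined
```

Two graphs with different stable color histograms are certainly non-isomorphic; the theorem above says that for graphs embeddable in a fixed surface S, some fixed dimension k makes this test complete.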

88 citations


Journal ArticleDOI
TL;DR: In this paper, a randomized O(log k)-competitive algorithm for the weighted online paging problem is proposed, where k is the cache size; its competitive ratio matches the known lower bound for the problem up to constant factors.
Abstract: We study the weighted version of the classic online paging problem where there is a weight (cost) for fetching each page into the cache. We design a randomized O(log k)-competitive online algorithm for this problem, where k is the cache size. This is the first randomized o(k)-competitive algorithm and its competitive ratio matches the known lower bound for the problem, up to constant factors. More generally, we design an O(log(k/(k − h + 1)))-competitive online algorithm for the version of the problem where the online algorithm has cache size k and it is compared to an optimal offline solution with cache size h ≤ k. Our solution is based on a two-step approach. We first obtain an O(log k)-competitive fractional algorithm based on an online primal-dual approach. Next, we obtain a randomized algorithm by rounding, in an online manner, the fractional solution to a probability distribution on the possible cache states. We also give an online primal-dual randomized O(log N)-competitive algorithm for the Metrical Task System problem (MTS) on a weighted star metric on N leaves.
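To give a flavor of the first, fractional step, here is a schematic Python sketch of a primal-dual-style eviction update in the spirit described above. It is a discretized simplification under assumed conventions (eviction fractions y, a hypothetical step size), not the paper's exact algorithm:

```python
# Schematic fractional weighted caching: y[q] in [0, 1] is the evicted
# fraction of page q (q is fully cached when y[q] == 0), w[q] its fetch
# cost, k the cache size. Illustration only, not the paper's algorithm.
def request(p, pages, y, w, k, step=1e-3):
    y[p] = 0.0  # the requested page must be fully present
    # While more than k units of page mass sit in the cache, evict
    # fractionally: cheap pages are pushed out faster than costly ones.
    while sum(1.0 - y[q] for q in pages) > k:
        for q in pages:
            if q != p and y[q] < 1.0:
                # Multiplicative-plus-additive growth scaled by 1/weight,
                # the pattern behind O(log k)-style primal-dual analyses.
                y[q] = min(1.0, y[q] + step * (y[q] + 1.0 / k) / w[q])
```

The second step, online rounding, then turns the fractional cache state into a probability distribution over actual cache contents.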

81 citations


Journal ArticleDOI
TL;DR: This article presents communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams, and shows that these protocols are optimal (up to logarithmic factors), not just in terms of the communication used, but also the time and space costs for each participant.
Abstract: A fundamental problem in data management is to draw and maintain a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The main challenge is to ensure that a sample is drawn uniformly across the union of the data while minimizing the communication needed to run the protocol on the evolving data. At the same time, it is also necessary to make the protocol lightweight, by keeping the space and time costs low for each participant. In this article, we present communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams. These apply to the case when we want a sample from the full streams, and to the sliding window cases of only the W most recent elements, or arrivals within the last w time units. We show that our protocols are optimal (up to logarithmic factors), not just in terms of the communication used, but also the time and space costs for each participant.
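One standard way to realize such a protocol is level-based sampling, sketched below in Python. This is a hedged simplification under assumed parameters (target sample size s, overflow factor 2), illustrating the communication pattern rather than reproducing the article's exact protocol:

```python
import random

# Level-based distributed sampling (illustrative sketch). Each arriving
# item draws a geometric "level"; only items at or above the current
# global level are sent to the coordinator, keeping communication low.
class Coordinator:
    def __init__(self, s):
        self.s, self.level, self.sample = s, 0, []

    def receive(self, item, lvl):
        if lvl >= self.level:
            self.sample.append((item, lvl))
        if len(self.sample) > 2 * self.s:
            # Too many survivors: raise the level and subsample.
            self.level += 1
            self.sample = [(x, l) for (x, l) in self.sample
                           if l >= self.level]
            # A real protocol would now broadcast the new level to sites.

class Site:
    def __init__(self, coordinator):
        self.coordinator = coordinator

    def arrival(self, item):
        lvl = 0
        while random.random() < 0.5:   # item reaches level >= j w.p. 2^-j
            lvl += 1
        if lvl >= self.coordinator.level:
            self.coordinator.receive(item, lvl)
```

Items surviving at the current level form a uniform sample of everything seen so far, and each level increment roughly halves the expected communication per arrival.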

78 citations


Journal ArticleDOI
TL;DR: The main theorem states that there exists a wait-free renaming protocol for K < 2n if and only if n + 1 is not a prime power (equivalently, if and only if the binomial coefficients C(n+1, i+1), 0 ≤ i ≤ ⌊(n−1)/2⌋, are relatively prime).
Abstract: In the renaming task, n+1 processes start with unique input names from a large space and must choose unique output names taken from a smaller name space, 0, 1, …, K. To rule out trivial solutions, a protocol must be anonymous: the value chosen by a process can depend on its input name and on the execution, but not on the specific process ID. Attiya et al. [1990] showed that renaming has a wait-free solution when K ≥ 2n. Several algebraic topology proofs of a lower bound stating that no such protocol exists when K < 2n have been published.

Journal ArticleDOI
TL;DR: New worst-case bounds for the size and treewidth of the result Q(D) of a conjunctive query Q applied to a database D are provided, based on a novel “coloring” of the query variables that associates a coloring number C(Q) to each query Q.
Abstract: This article provides new worst-case bounds for the size and treewidth of the result Q(D) of a conjunctive query Q applied to a database D. We derive bounds for the result size |Q(D)| in terms of structural properties of Q, both in the absence and in the presence of keys and functional dependencies. These bounds are based on a novel “coloring” of the query variables that associates a coloring number C(Q) to each query Q. Intuitively, each color used represents some possible entropy of that variable. Using this coloring number, we derive tight bounds for the size of Q(D) in case (i) no functional dependencies or keys are specified, and (ii) simple functional dependencies (keys) are given. These results generalize recent size-bounds for join queries obtained by Atserias et al. [2008]. In the case of arbitrary (compound) functional dependencies, we use tools from information theory to provide lower and upper bounds, establishing a close connection between size bounds and a basic question in information theory. Our new coloring scheme also allows us to precisely characterize (both in the absence of keys and with simple keys) the treewidth-preserving queries---the queries for which the treewidth of the output relation is bounded by a function of the treewidth of the input database. Finally, we give some results on the computational complexity of determining the size bounds, and of deciding whether the treewidth is preserved.
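For context, the Atserias-Grohe-Marx size bound that these results generalize can be stated as follows (a standard formulation, assuming Q is a join query over relations R_1, …, R_m):

\[
|Q(D)| \;\le\; \prod_{j=1}^{m} |R_j|^{x_j} \qquad \text{for every fractional edge cover } (x_1, \dots, x_m) \text{ of } Q,
\]

and the bound is worst-case tight for the optimal fractional edge cover. The coloring number C(Q) above plays the analogous role once keys and functional dependencies enter the picture.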

Journal ArticleDOI
TL;DR: This work defines a new Nash bargaining game, called ADNB, which is derived from the linear case of the Arrow-Debreu market model, and shows that the convex program for ADNB is a logarithmic RCP, but unlike other known members of this class, it is nontotal.
Abstract: We introduce the notion of a rational convex program (RCP) and we classify the known RCPs into two classes: quadratic and logarithmic. The importance of rationality is that it opens up the possibility of computing an optimal solution to the program via an algorithm that is either combinatorial or uses an LP-oracle. Next, we define a new Nash bargaining game, called ADNB, which is derived from the linear case of the Arrow-Debreu market model. We show that the convex program for ADNB is a logarithmic RCP, but unlike other known members of this class, it is nontotal. Our main result is a combinatorial, polynomial-time algorithm for ADNB. It turns out that the reason for infeasibility of logarithmic RCPs is quite different from that for LPs and quadratic RCPs. We believe that our ideas for surmounting the new difficulties will be useful for dealing with other nontotal RCPs as well. We give an application of our combinatorial algorithm for ADNB to an important “fair” throughput allocation problem on a wireless channel. Finally, we present a number of interesting questions that the new notion of RCP raises.
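As a point of reference for the “logarithmic” class, the textbook Eisenberg-Gale program for linear Fisher markets is a canonical logarithmic RCP (this is not the ADNB program itself, just the best-known member of the class):

\[
\max \;\sum_{i} m_i \log u_i \quad \text{s.t.} \quad u_i = \sum_{j} u_{ij} x_{ij} \;\;\forall i, \qquad \sum_{i} x_{ij} \le 1 \;\;\forall j, \qquad x_{ij} \ge 0,
\]

whose optimal solutions are rational. ADNB shares this logarithmic objective shape but, unlike the program above, can be infeasible, which is what “nontotal” refers to.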

Journal ArticleDOI
TL;DR: It is shown that assuming uniform power transmissions, the reception zones in the SINR model are convex and relatively well-rounded, which is used to develop an efficient approximation algorithm for a fundamental point location problem in wireless networks.
Abstract: The rules governing the availability and quality of connections in a wireless network are described by physical models such as the signal-to-interference-plus-noise ratio (SINR) model. For a collection of simultaneously transmitting stations in the plane, it is possible to identify a reception zone for each station, consisting of the points where its transmission is received correctly. The resulting SINR diagram partitions the plane into a reception zone per station and the remaining plane where no station can be heard. SINR diagrams appear to be fundamental to understanding the behavior of wireless networks, and may play a key role in the development of suitable algorithms for such networks, analogous perhaps to the role played by Voronoi diagrams in the study of proximity queries and related issues in computational geometry. So far, however, the properties of SINR diagrams have not been studied systematically, and most algorithmic studies in wireless networking rely on simplified graph-based models such as the unit disk graph (UDG) model, which conveniently abstract away interference-related complications and make it easier to handle algorithmic issues, but consequently fail to capture accurately some important aspects of wireless networks. This article focuses on obtaining some basic understanding of SINR diagrams, their properties, and their usability in algorithmic applications. Specifically, we show that, assuming uniform power transmissions, the reception zones are convex and relatively well-rounded. These results are then used to develop an efficient approximation algorithm for a fundamental point location problem in wireless networks.
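In the standard form of this model (stated here for reference in the usual notation, not quoted from the paper), a point p lies in the reception zone of station s_i exactly when

\[
\mathrm{SINR}_i(p) \;=\; \frac{P_i \cdot d(s_i, p)^{-\alpha}}{N + \sum_{j \ne i} P_j \cdot d(s_j, p)^{-\alpha}} \;\ge\; \beta,
\]

where P_i is the transmission power of s_i, d is Euclidean distance, α is the path-loss exponent, N is the background noise, and β is the reception threshold; “uniform power” means all P_i are equal.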

Journal ArticleDOI
TL;DR: This article presents constructions of useful concurrent data structures, including max registers and counters, with step complexity that is sublinear in the number of processes, and it is shown that the upper bounds are almost optimal.
Abstract: This article presents constructions of useful concurrent data structures, including max registers and counters, with step complexity that is sublinear in the number of processes, n. This result avoids a well-known lower bound by having step complexity that is polylogarithmic in the number of values the object can take or the number of operations applied to it. The key step in these implementations is a method for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register is constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈log m⌉ atomic register operations per write or read. An unbounded max register is constructed with cost O(min(log v, n)) to read or write a value v. Max registers are used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. One application is a simple, linearizable, wait-free counter with a cost of O(min(log n log v, n)) to perform an increment and O(min(log v, n)) to perform a read, where v is the current value of the counter. For polynomially-many increments, this becomes O(log² n), an exponential improvement on the best previously known upper bounds of O(n) for exact counting and O(n^{4/5+ε}) for approximate counting. Finally, it is shown that the upper bounds are almost optimal. It is shown that for deterministic implementations, even if they are only required to satisfy solo-termination, min(⌈log m⌉, n−1) is a lower bound on the worst-case complexity for an m-valued bounded max register, which is exactly equal to the upper bound for m ≤ 2^{n−1}, and min(n−1, ⌈log m⌉ − log(⌈log m⌉ + k)) is a lower bound for the read operation of an m-valued k-additive-accurate counter, which is a bounded counter in which a read operation is allowed to return a value within an additive error of ±k of the number of increment operations linearized before it. Furthermore, even in a solo-terminating randomized implementation of an n-valued max register with an oblivious adversary and global coins, there exist simple schedules in which, with high probability, the worst-case step complexity of a read operation is Ω(log n/log log n) if the write operations have polylogarithmic step complexity.
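The recursive max-register construction is concrete enough to sketch. The Python rendition below mirrors the ⌈log m⌉-operation structure described above, with ordinary fields standing in for the one-bit multi-writer multi-reader registers (so it illustrates the recursion, not the concurrency):

```python
# An m-valued max register built from a one-bit switch and two
# recursively smaller max registers. Each read or write touches at
# most ceil(log2 m) registers along one root-to-leaf path.
class MaxRegister:
    def __init__(self, m):
        self.m = m
        if m > 1:
            self.half = (m + 1) // 2
            self.switch = 0                          # one-bit register
            self.left = MaxRegister(self.half)       # values 0 .. half-1
            self.right = MaxRegister(m - self.half)  # values half .. m-1

    def write(self, v):
        if self.m <= 1:
            return
        if v < self.half:
            if self.switch == 0:      # small writes are ignored once a
                self.left.write(v)    # larger value has set the switch
        else:
            self.right.write(v - self.half)
            self.switch = 1           # route all later reads rightward

    def read(self):
        if self.m <= 1:
            return 0
        if self.switch == 1:
            return self.half + self.right.read()
        return self.left.read()
```

In the concurrent version, linearizability hinges on writing the right subtree before setting the switch, so any read that sees switch = 1 already finds the larger value in place.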

Journal ArticleDOI
TL;DR: In this paper, the authors put forward a general theory of goal-oriented communication, where communication is not an end in itself, but rather a means to achieving some goals of the communicating parties.
Abstract: We put forward a general theory of goal-oriented communication, where communication is not an end in itself, but rather a means to achieving some goals of the communicating parties. Focusing on goals provides a framework for addressing the problem of potential “misunderstanding” during communication, where the misunderstanding arises from lack of initial agreement on what protocol and/or language is being used in communication. In this context, “reliable communication” means overcoming any initial misunderstanding between parties towards achieving a given goal. Despite the enormous diversity among the goals of communication, we propose a simple model that captures all goals. In the simplest form of communication we consider, two parties, a user and a server, attempt to communicate with each other in order to achieve some goal of the user. We show that any goal of communication can be modeled mathematically by introducing a third party, which we call the referee, who hypothetically monitors the conversation between the user and the server and determines whether or not the goal has been achieved. Potential misunderstanding between the players is captured by allowing each player (the user/server) to come from a (potentially infinite) class of players such that each player is unaware which instantiation of the other it is talking to. We identify a main concept, which we call sensing, that allows goals to be achieved even under misunderstanding. Informally, sensing captures the user's ability (potentially using help from the server) to simulate the referee's assessment on whether the communication is achieving the goal. We show that when the user can sense progress, the goal of communication can be achieved despite initial misunderstanding. We also show that in certain settings sensing is necessary for overcoming such initial misunderstanding. Our results significantly extend the scope of the investigation started by Juba and Sudan (STOC 2008), who studied the foregoing phenomenon in the case of a single specific goal. Our study shows that their main suggestion, that misunderstanding can be detected and possibly corrected by focusing on the goal, can be proved in full generality.

Journal ArticleDOI
TL;DR: It is shown that finite groups whose Cayley graphs have large girth, even with respect to a discounted distance measure that contracts arbitrarily long sequences of edges from the same colour class and only counts transitions between colour classes, are useful in the construction of finite bisimilar hypergraph covers.
Abstract: We construct finite groups whose Cayley graphs have large girth even with respect to a discounted distance measure that contracts arbitrarily long sequences of edges from the same color class (subgroup), and only counts transitions between color classes (cosets). These groups are shown to be useful in the construction of finite bisimilar hypergraph covers that avoid any small cyclic configurations. We present two applications to the finite model theory of the guarded fragment: a strengthening of the known finite model property for GF and the characterization of GF as the guarded bisimulation invariant fragment of first-order logic in the sense of finite model theory.

Journal ArticleDOI
TL;DR: It turns out that the fairness properties are the sets that are “large” from a topological point of view, that is, they are the co-meager sets in the natural topology of runs of a given system.
Abstract: We define when a linear-time temporal property is a fairness property with respect to a given system. This captures the essence shared by most fairness assumptions that are used in the specification and verification of reactive and concurrent systems, such as weak fairness, strong fairness, k-fairness, and many others. We provide three characterizations of fairness: a language-theoretic, a game-theoretic, and a topological characterization. It turns out that the fairness properties are the sets that are “large” from a topological point of view, that is, they are the co-meager sets in the natural topology of runs of a given system. This insight provides a link to probability theory where a set is “large” when it has measure 1. While these two notions of largeness are similar, they do not coincide in general. However, we show that they coincide for ω-regular properties and bounded Borel measures. That is, an ω-regular temporal property of a finite-state system has measure 1 under a bounded Borel measure if and only if it is a fairness property with respect to that system. The definition of fairness leads to a generic relaxation of correctness of a system in linear-time semantics. We define a system to be fairly correct if there exists a fairness assumption under which it satisfies its specification. Equivalently, a system is fairly correct if the set of runs satisfying the specification is topologically large. We motivate this notion of correctness and show how it can be verified in a system.

Journal ArticleDOI
TL;DR: A novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset.
Abstract: As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology.

Journal ArticleDOI
TL;DR: In this article, the Lovász Local Lemma (LLL) is extended to a more general geometric setting, where events are replaced with subspaces and probability is replaced with relative dimension, allowing one to lower-bound the dimension of the intersection of vector spaces under certain independence conditions.
Abstract: The Lovász Local Lemma (LLL) is a powerful tool in probability theory to show the existence of combinatorial objects meeting a prescribed collection of “weakly dependent” criteria. We show that the LLL extends to a much more general geometric setting, where events are replaced with subspaces and probability is replaced with relative dimension, which makes it possible to lower-bound the dimension of the intersection of vector spaces under certain independence conditions. Our result immediately applies to the k-qsat problem (the quantum analogue of k-sat): for instance, we show that any collection of rank-1 projectors, with the property that each qubit appears in at most 2^k/(e·k) of them, has a joint satisfying state. We then apply our results to the recently studied model of random k-qsat. Recent works have shown that the satisfiable region extends up to a density of 1 in the large-k limit, where the density is the ratio of projectors to qubits. Using a hybrid approach building on work by Laumann et al. [2009, 2010], we greatly extend the known satisfiable region for random k-qsat to a density of Ω(2^k/k^2). Since our tool shows the existence of joint satisfying states without the need to construct them, we are able to penetrate into regions where the satisfying states are conjectured to be entangled, a barrier that has limited previous approaches to product states.
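For orientation, the symmetric form of the classical lemma being generalized reads as follows (a standard statement, not quoted from the paper): if each “bad” event A_i has probability at most p and depends on at most d of the others, then

\[
e \, p \, (d+1) \le 1 \;\Longrightarrow\; \Pr\Big[\, \bigcap_i \overline{A_i} \,\Big] > 0.
\]

The geometric version replaces the events by subspaces and probability by relative dimension, which is what yields the 2^k/(e·k) degree bound for rank-1 projectors above.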

Journal ArticleDOI
TL;DR: Generalizations and variants of the equivalence (L(C)inv captures C if and only if there is an almost C-optimal algorithm in C for the set Taut of tautologies of propositional logic) are shown.
Abstract: Let C denote one of the complexity classes “polynomial time,” “logspace,” or “nondeterministic logspace.” We introduce a logic L(C)inv and show generalizations and variants of the equivalence (L(C)inv captures C if and only if there is an almost C-optimal algorithm in C for the set Taut of tautologies of propositional logic). These statements are also equivalent to the existence of a listing of subsets in C of Taut by corresponding Turing machines and equivalent to the fact that a certain parameterized halting problem is in the parameterized complexity class XCuni.

Journal ArticleDOI
TL;DR: The lower bound is the first of its kind that simultaneously captures the inherent tradeoff between the three important parameters of a PRMT protocol, namely, the network connectivity, the round complexity, and the communication complexity.
Abstract: Perfectly reliable message transmission (PRMT) is one of the fundamental problems in distributed computing. It allows a sender to reliably transmit a message to a receiver in an unreliable network, even in the presence of a computationally unbounded adversary. In this article, we study the inherent tradeoff between the three important parameters of PRMT protocols, namely, the network connectivity (n), the round complexity (r), and the communication complexity (c), by considering the following generic question (which can be considered as the holy grail problem) in the context of PRMT protocols. Given an n-connected network, a message of size e (to be reliably communicated) and a limit c for the total communication allowed between the sender and the receiver, what is the minimum number of communication rounds required by a PRMT protocol to send the message, such that the communication complexity of the protocol is O(c)? We answer this interesting question by deriving a nontrivial lower bound on the round complexity. Moreover, we show that the lower bound is tight in the amortized sense, by designing a PRMT protocol whose round complexity matches the lower bound. The lower bound is the first of its kind that simultaneously captures the inherent tradeoff between the three important parameters of a PRMT protocol.

Journal ArticleDOI
TL;DR: This article proves that the query-update tradeoffs of all the known dynamic B-trees are optimal when log_B(N/M) is a constant, which implies that one should not hope for substantially better solutions for all practical values of the parameters.
Abstract: One-dimensional range queries, as one of the most basic types of queries in databases, have been studied extensively in the literature. For large databases, the goal is to build an external index that is optimized for disk block accesses (or I/Os). The problem is well understood in the static case. Theoretically, there exists an index of linear size that can answer a range query in O(1 + K/B) I/Os, where K is the output size and B is the disk block size, but it is highly impractical. In practice, the standard solution is the B-tree, which answers a query in O(log_B(N/M) + K/B) I/Os on a data set of size N, where M is the main memory size. For typical values of N, M, and B, log_B(N/M) can be considered a constant. However, the problem is still wide open in the dynamic setting, when insertions and deletions of records are to be supported. With smart buffering, it is possible to speed up updates significantly to o(1) I/Os amortized. Indeed, several dynamic B-trees have been proposed, but they all cause certain levels of degradation in the query performance, with the most interesting tradeoff point at O((1/B) log(N/M)) I/Os for updates and O(log(N/M) + K/B) I/Os for queries. In this article, we prove that the query-update tradeoffs of all the known dynamic B-trees are optimal when log_B(N/M) is a constant. This implies that one should not hope for substantially better solutions for all practical values of the parameters. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest. Dynamic indexability is a clean yet powerful model for studying dynamic indexing problems, and can potentially lead to more interesting lower bound results.
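A quick back-of-the-envelope calculation shows why log_B(N/M) is effectively constant (illustrative numbers, not taken from the article): with N = 10^9 records, M = 10^6 of them fitting in memory, and B = 10^3 records per disk block,

\[
\log_B\!\big(N/M\big) \;=\; \log_{10^3}\!\big(10^9 / 10^6\big) \;=\; \log_{10^3} 10^3 \;=\; 1.
\]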