Journal ArticleDOI

A threshold of ln n for approximating set cover

01 Jul 1998 - Journal of the ACM (ACM) - Vol. 45, Iss. 4, pp. 634-652
TL;DR: It is proved that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms.
Abstract: Given a collection ℱ of subsets of S = {1,…,n}, set cover is the problem of selecting as few as possible subsets from ℱ such that their union covers S, and max k-cover is the problem of selecting k subsets from ℱ such that their union has maximum cardinality. Both these problems are NP-hard. We prove that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low-order terms) between the approximation ratio achievable by the greedy algorithm (which is (1 - o(1)) ln n) and previous results of Lund and Yannakakis, which showed hardness of approximation within a ratio of (log₂ n)/2 ≈ 0.72 ln n. For max k-cover, we show an approximation threshold of (1 - 1/e) (up to low-order terms), under the assumption that P ≠ NP.
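The ln n upper bound in this statement is achieved by the classical greedy rule: repeatedly pick the subset that covers the most still-uncovered elements (stopping after k picks gives the standard (1 - 1/e) guarantee for max k-cover). The following is a minimal illustrative sketch of that rule, not code from the paper; the function name and the toy instance are our own.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly take the subset covering the most
    still-uncovered elements. Achieves a ratio of roughly ln n."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset with maximum marginal coverage.
        best = max(subsets, key=lambda s: len(uncovered & set(s)))
        if not uncovered & set(best):
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= set(best)
    return chosen

# Toy instance (ours, for illustration only).
U = range(1, 10)
F = [{1, 2, 3, 4}, {4, 5, 6}, {6, 7, 8, 9}, {1, 5, 9}, {2, 3, 7, 8}]
print(greedy_set_cover(U, F))
```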


Citations
Proceedings ArticleDOI
13 Aug 2016
TL;DR: In this article, the authors propose LIME, a technique that explains the predictions of any classifier by learning an interpretable model locally around the prediction, together with a method for explaining models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
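As a rough illustration of the local-surrogate idea described in the abstract above, the sketch below perturbs an input, weights the perturbed samples by proximity, and fits a weighted linear model to the black-box predictions. It is our own minimal rendering of that general recipe, not the authors' implementation; the toy black-box function, kernel width, and other parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def black_box(X):
    # Stand-in for an arbitrary classifier's probability output (toy, assumed).
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.5 * X[:, 2])))

def lime_like_explanation(x, predict_fn, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around instance x."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise around x.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    y = predict_fn(Z)
    # Proximity weights: closer perturbations count more.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=w)
    return surrogate.coef_  # local feature attributions

x = np.array([0.2, 1.0, -0.4, 0.0])
print(lime_like_explanation(x, black_box))
```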

11,104 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: CLIQUE is presented, a clustering algorithm that satisfies the requirements data mining applications place on clustering: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.
Abstract: Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
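To make the grid-based idea concrete, the sketch below shows the bottom-most step that CLIQUE-style subspace clustering builds on: partition each dimension into equal-width intervals, count the points falling into each unit, and keep the units whose density exceeds a threshold. This is a minimal sketch of the dense-unit identification step only, not the full algorithm; the interval count, density threshold, and data are our assumptions.

```python
import numpy as np
from collections import Counter

def dense_units_1d(X, xi=10, tau=0.05):
    """For each dimension, split its range into xi equal-width intervals and
    keep the intervals holding more than a tau fraction of the points."""
    n, d = X.shape
    dense = {}
    for j in range(d):
        lo, hi = X[:, j].min(), X[:, j].max()
        width = (hi - lo) / xi or 1.0
        bins = np.minimum(((X[:, j] - lo) / width).astype(int), xi - 1)
        counts = Counter(bins)
        dense[j] = sorted(b for b, c in counts.items() if c / n > tau)
    return dense

# Two well-separated Gaussian blobs (toy data, ours).
X = np.vstack([np.random.normal([0, 0], 0.3, (200, 2)),
               np.random.normal([3, 3], 0.3, (200, 2))])
print(dense_units_1d(X))
```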

2,782 citations


Cites background from "A threshold of ln n for approximati..."

  • ...of [16] [28], would be the obvious choice....


  • ...approximating the smallest set cover gives an approximation factor of ln n where n is the size of the universe being covered [16] [28]....


Journal ArticleDOI
TL;DR: Optimal inapproximability results, up to an arbitrary ε > 0, are proved for Max-E k-Sat for k ≥ 3, for maximizing the number of satisfied linear equations in an over-determined system of linear equations modulo a prime p, and for Set Splitting.
Abstract: We prove optimal, up to an arbitrary ε > 0, inapproximability results for Max-E k-Sat for k ≥ 3, maximizing the number of satisfied linear equations in an over-determined system of linear equations modulo a prime p, and Set Splitting. As a consequence of these results we get improved lower bounds for the efficient approximability of many optimization problems studied previously, in particular Max-E2-Sat, Max-Cut, Max-di-Cut, and Vertex Cover.

1,938 citations


Cites background or methods from "A threshold of ln n for approximati..."

  • ...It has later been established [Feige 1998] that we can make each variable appear exactly 5 times even if we require each clause to be of length exactly 3....


  • ...has later been established [13] that we can make each variable appear exactly 5 times even if we require each clause to be of length exactly 3....


Proceedings ArticleDOI
25 Jul 2010
TL;DR: The results from extensive simulations demonstrate that the proposed algorithm is currently the best scalable solution to the influence maximization problem and significantly outperforms all other scalable heuristics, by as much as a 100%-260% increase in influence spread.
Abstract: Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread --- it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics to as much as 100%--260% increase in influence spread.
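For context on the greedy baseline the abstract compares against, the sketch below is a minimal version of the Kempe-Kleinberg-Tardos hill-climbing approach under the independent cascade model: estimate spread by Monte Carlo simulation and repeatedly add the node with the largest estimated marginal gain. It also illustrates why that baseline is slow (simulation nested inside a greedy loop). The graph, propagation probability, and trial count are our assumptions; this is not the scalable heuristic proposed in the cited paper.

```python
import random

def simulate_spread(graph, seeds, p=0.1, trials=200):
    """Monte Carlo estimate of expected spread under the independent cascade model."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_influence_max(graph, k, p=0.1):
    """Hill-climbing: add the node with the largest estimated marginal gain."""
    seeds = set()
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: simulate_spread(graph, seeds | {v}, p))
        seeds.add(best)
    return seeds

# Tiny directed graph as an adjacency list (toy, ours).
G = {0: [1, 2], 1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_influence_max(G, 2))
```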

1,709 citations


Cites background from "A threshold of ln n for approximati..."

  • ...3 of [7] is sufficient to show the following....


Journal ArticleDOI
TL;DR: Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and their applications; the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory, and communication complexity.
Abstract: In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [1].
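As a concrete example of the one-pass, sublinear-space algorithms the survey covers, the sketch below implements the classic Misra-Gries frequent-items summary: it keeps at most k-1 counters, and any item occurring more than n/k times in a stream of length n is guaranteed to survive in the summary. The toy stream and the choice of k are our own; this is an illustration of the streaming model, not an algorithm from the survey itself.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k-1 counters.
    Any item with true frequency > n/k is guaranteed to appear in the output."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = list("abacabadabacabae")
print(misra_gries(stream, k=3))
```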

1,598 citations

References
Journal ArticleDOI
TL;DR: A computational complexity theory of the "knowledge" contained in a proof is developed and examples of zero-knowledge proof systems are given for the languages of quadratic residuosity and quadratic nonresiduosity.
Abstract: Usually, a proof of a theorem contains more knowledge than the mere fact that the theorem is true. For instance, to prove that a graph is Hamiltonian it suffices to exhibit a Hamiltonian tour in it; however, this seems to contain more knowledge than the single bit Hamiltonian/non-Hamiltonian. In this paper a computational complexity theory of the "knowledge" contained in a proof is developed. Zero-knowledge proofs are defined as those proofs that convey no additional knowledge other than the correctness of the proposition in question. Examples of zero-knowledge proof systems are given for the languages of quadratic residuosity and quadratic nonresiduosity. These are the first examples of zero-knowledge proofs for languages not known to be efficiently recognizable.
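To make the quadratic-residuosity example concrete, the sketch below simulates one standard interactive round of the kind of protocol the abstract refers to: the prover, who knows a square root w of x modulo N, commits to a random square, and the verifier's challenge bit decides whether the root of the commitment, or of the commitment times x, gets revealed. This is a textbook-style rendering with toy, insecure parameters (small N, no coprimality checks), not a faithful transcript of the paper's construction.

```python
import random

def qr_zk_round(N, x, w):
    """One round of a zero-knowledge proof that x is a quadratic residue mod N.
    The prover knows w with w*w = x (mod N); soundness error is 1/2 per round."""
    r = random.randrange(2, N)          # prover's random mask
    y = (r * r) % N                     # commitment: a random square
    b = random.randrange(2)             # verifier's challenge bit
    z = (r * pow(w, b, N)) % N          # response reveals r or r*w
    return pow(z, 2, N) == (y * pow(x, b, N)) % N

# Toy, insecure parameters (ours): N = 7 * 11, witness w = 9, statement x = w^2 mod N.
N, w = 77, 9
x = (w * w) % N
print(all(qr_zk_round(N, x, w) for _ in range(20)))
```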

3,117 citations

Journal ArticleDOI
TL;DR: It turns out that the ratio between the two grows at most logarithmically in the largest column sum of A; when all the components of cᵀ are the same, the result reduces to a theorem established previously by Johnson and Lovász.
Abstract: Let A be a binary matrix of size m × n, let cᵀ be a positive row vector of length n, and let e be the column vector, all of whose m components are ones. The set-covering problem is to minimize cᵀx subject to Ax ≥ e and x binary. We compare the value of the objective function at a feasible solution found by a simple greedy heuristic to the true optimum. It turns out that the ratio between the two grows at most logarithmically in the largest column sum of A. When all the components of cᵀ are the same, our result reduces to a theorem established previously by Johnson and Lovász.
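The greedy heuristic analyzed here is usually read as the natural weighted extension of the greedy rule: at each step pick the set (column) minimizing cost per newly covered element (row). The sketch below is a minimal rendering under that reading, distinct from the unweighted version shown earlier; the set-of-indices representation, function name, and example costs are our own.

```python
def weighted_greedy_set_cover(universe, sets, costs):
    """Chvatal-style greedy: repeatedly pick the set with the smallest
    cost per newly covered element."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Cost-effectiveness = cost / number of still-uncovered elements covered.
        best = min((i for i, s in enumerate(sets) if uncovered & s),
                   key=lambda i: costs[i] / len(uncovered & sets[i]))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

U = {1, 2, 3, 4, 5, 6}
S = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
c = [3.0, 1.0, 4.0, 2.0]
print(weighted_greedy_set_cover(U, S, c))   # indices of the chosen sets
```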

2,645 citations

Book
01 Jan 1996
TL;DR: This book reviews the design techniques for approximation algorithms, the developments in this area since its inception about three decades ago, and the "closeness" to optimum that is achievable in polynomial time.
Abstract: Approximation algorithms have developed in response to the impossibility of solving a great variety of important optimization problems. Too frequently, when attempting to get a solution for a problem, one is confronted with the fact that the problem is NP-hard. This, in the words of Garey and Johnson, means "I can't find an efficient algorithm, but neither can all of these famous people." While this is a significant theoretical step, it hardly qualifies as a cheering piece of news.If the optimal solution is unattainable then it is reasonable to sacrifice optimality and settle for a "good" feasible solution that can be computed efficiently. Of course, we would like to sacrifice as little optimality as possible, while gaining as much as possible in efficiency. Trading-off optimality in favor of tractability is the paradigm of approximation algorithms.The main themes of this book revolve around the design of such algorithms and the "closeness" to optimum that is achievable in polynomial time. To evaluate the limits of approximability, it is important to derive lower bounds or inapproximability results. In some cases, approximation algorithms must satisfy additional structural requirements such as being on-line, or working within limited space. This book reviews the design techniques for such algorithms and the developments in this area since its inception about three decades ago.

2,488 citations

Journal ArticleDOI
TL;DR: For the problem of finding the maximum clique in a graph, no algorithm has been found for which the approximation ratio does not grow at least as fast as n^ε, where n is the problem size and ε > 0 depends on the algorithm.

2,472 citations

Journal ArticleDOI
TL;DR: It follows that such a complete problem has a polynomial-time approximation scheme iff the whole class does, and that a number of common optimization problems are complete for MAX SNP under a kind of careful transformation that preserves approximability.

1,919 citations