Journal ArticleDOI

A threshold of ln n for approximating set cover

01 Jul 1998 - Journal of the ACM (ACM) - Vol. 45, Iss. 4, pp. 634-652
TL;DR: It is proved that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms.
Abstract: Given a collection ℱ of subsets of S = {1,…,n}, set cover is the problem of selecting as few as possible subsets from ℱ such that their union covers S, and max k-cover is the problem of selecting k subsets from ℱ such that their union has maximum cardinality. Both these problems are NP-hard. We prove that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low-order terms) between the approximation ratio achievable by the greedy algorithm (which is (1 - o(1)) ln n) and previous results of Lund and Yannakakis, which showed hardness of approximation within a ratio of (log₂ n)/2 ≈ 0.72 ln n. For max k-cover, we show an approximation threshold of (1 - 1/e) (up to low-order terms), under the assumption that P ≠ NP.
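The ln n upper bound in this statement is achieved by the classical greedy rule: repeatedly pick the subset that covers the most still-uncovered elements (stopping after k picks gives the standard (1 - 1/e) guarantee for max k-cover). The following is a minimal illustrative sketch of that rule, not code from the paper; the function name and the toy instance are our own.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly take the subset covering the most
    still-uncovered elements. Achieves a ratio of roughly ln n."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset with maximum marginal coverage.
        best = max(subsets, key=lambda s: len(uncovered & set(s)))
        if not uncovered & set(best):
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= set(best)
    return chosen

# Toy instance (ours, for illustration only).
U = range(1, 10)
F = [{1, 2, 3, 4}, {4, 5, 6}, {6, 7, 8, 9}, {1, 5, 9}, {2, 3, 7, 8}]
print(greedy_set_cover(U, F))
```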


Citations
Proceedings ArticleDOI
13 Aug 2016
TL;DR: In this article, the authors propose LIME, a technique that explains the predictions of any classifier by learning an interpretable model locally around the prediction, together with a method for explaining models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
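As a rough illustration of the local-surrogate idea described in the abstract above, the sketch below perturbs an input, weights the perturbed samples by proximity, and fits a weighted linear model to the black-box predictions. It is our own minimal rendering of that general recipe, not the authors' implementation; the toy black-box function, kernel width, and other parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def black_box(X):
    # Stand-in for an arbitrary classifier's probability output (toy, assumed).
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.5 * X[:, 2])))

def lime_like_explanation(x, predict_fn, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around instance x."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise around x.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    y = predict_fn(Z)
    # Proximity weights: closer perturbations count more.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=w)
    return surrogate.coef_  # local feature attributions

x = np.array([0.2, 1.0, -0.4, 0.0])
print(lime_like_explanation(x, black_box))
```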

11,104 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: CLIQUE is presented, a clustering algorithm that satisfies the requirements data mining applications place on clustering: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.
Abstract: Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
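To make the grid-based idea concrete, the sketch below shows the bottom-most step that CLIQUE-style subspace clustering builds on: partition each dimension into equal-width intervals, count the points falling into each unit, and keep the units whose density exceeds a threshold. This is a minimal sketch of the dense-unit identification step only, not the full algorithm; the interval count, density threshold, and data are our assumptions.

```python
import numpy as np
from collections import Counter

def dense_units_1d(X, xi=10, tau=0.05):
    """For each dimension, split its range into xi equal-width intervals and
    keep the intervals holding more than a tau fraction of the points."""
    n, d = X.shape
    dense = {}
    for j in range(d):
        lo, hi = X[:, j].min(), X[:, j].max()
        width = (hi - lo) / xi or 1.0
        bins = np.minimum(((X[:, j] - lo) / width).astype(int), xi - 1)
        counts = Counter(bins)
        dense[j] = sorted(b for b, c in counts.items() if c / n > tau)
    return dense

# Two well-separated Gaussian blobs (toy data, ours).
X = np.vstack([np.random.normal([0, 0], 0.3, (200, 2)),
               np.random.normal([3, 3], 0.3, (200, 2))])
print(dense_units_1d(X))
```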

2,782 citations


Cites background from "A threshold of ln n for approximati..."

  • ...of [16] [28], would be the obvious choice....


  • ...approximating the smallest set cover gives an approximation factor of ln n where n is the size of the universe being covered [16] [28]....


Journal ArticleDOI
TL;DR: Optimal inapproximability results, up to an arbitrary ε > 0, are proved for Max-E k-Sat for k ≥ 3, for maximizing the number of satisfied linear equations in an over-determined system of linear equations modulo a prime p, and for Set Splitting.
Abstract: We prove optimal, up to an arbitrary ε > 0, inapproximability results for Max-E k-Sat for k ≥ 3, maximizing the number of satisfied linear equations in an over-determined system of linear equations modulo a prime p, and Set Splitting. As a consequence of these results we get improved lower bounds for the efficient approximability of many optimization problems studied previously, in particular Max-E2-Sat, Max-Cut, Max-di-Cut, and Vertex Cover.

1,938 citations


Cites background or methods from "A threshold of ln n for approximati..."

  • ...It has later been established [Feige 1998] that we can make each variable appear exactly 5 times even if we require each clause to be of length exactly 3....


  • ...has later been established [13] that we can make each variable appear exactly 5 times even if we require each clause to be of length exactly 3....


Proceedings ArticleDOI
25 Jul 2010
TL;DR: The results from extensive simulations demonstrate that the proposed algorithm is currently the best scalable solution to the influence maximization problem and significantly outperforms all other scalable heuristics, by as much as a 100%-260% increase in influence spread.
Abstract: Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread --- it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics to as much as 100%--260% increase in influence spread.
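For context on the greedy baseline the abstract compares against, the sketch below is a minimal version of the Kempe-Kleinberg-Tardos hill-climbing approach under the independent cascade model: estimate spread by Monte Carlo simulation and repeatedly add the node with the largest estimated marginal gain. It also illustrates why that baseline is slow (simulation nested inside a greedy loop). The graph, propagation probability, and trial count are our assumptions; this is not the scalable heuristic proposed in the cited paper.

```python
import random

def simulate_spread(graph, seeds, p=0.1, trials=200):
    """Monte Carlo estimate of expected spread under the independent cascade model."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_influence_max(graph, k, p=0.1):
    """Hill-climbing: add the node with the largest estimated marginal gain."""
    seeds = set()
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: simulate_spread(graph, seeds | {v}, p))
        seeds.add(best)
    return seeds

# Tiny directed graph as an adjacency list (toy, ours).
G = {0: [1, 2], 1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_influence_max(G, 2))
```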

1,709 citations


Cites background from "A threshold of ln n for approximati..."

  • ...3 of [7] is sufficient to show the following....


Journal ArticleDOI
TL;DR: Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and their applications; the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory, and communication complexity.
Abstract: In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [1].
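As a concrete example of the one-pass, sublinear-space algorithms the survey covers, the sketch below implements the classic Misra-Gries frequent-items summary: it keeps at most k-1 counters, and any item occurring more than n/k times in a stream of length n is guaranteed to survive in the summary. The toy stream and the choice of k are our own; this is an illustration of the streaming model, not an algorithm from the survey itself.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k-1 counters.
    Any item with true frequency > n/k is guaranteed to appear in the output."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = list("abacabadabacabae")
print(misra_gries(stream, k=3))
```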

1,598 citations

References
Journal ArticleDOI
TL;DR: A computational complexity theory of the "knowledge" contained in a proof is developed and examples of zero-knowledge proof systems are given for the languages of quadratic residuosity and quadratic nonresiduosity.
Abstract: Usually, a proof of a theorem contains more knowledge than the mere fact that the theorem is true. For instance, to prove that a graph is Hamiltonian it suffices to exhibit a Hamiltonian tour in it; however, this seems to contain more knowledge than the single bit Hamiltonian/non-Hamiltonian. In this paper a computational complexity theory of the "knowledge" contained in a proof is developed. Zero-knowledge proofs are defined as those proofs that convey no additional knowledge other than the correctness of the proposition in question. Examples of zero-knowledge proof systems are given for the languages of quadratic residuosity and quadratic nonresiduosity. These are the first examples of zero-knowledge proofs for languages not known to be efficiently recognizable.
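To make the quadratic-residuosity example concrete, the sketch below simulates one standard interactive round of the kind of protocol the abstract refers to: the prover, who knows a square root w of x modulo N, commits to a random square, and the verifier's challenge bit decides whether the root of the commitment, or of the commitment times x, gets revealed. This is a textbook-style rendering with toy, insecure parameters (small N, no coprimality checks), not a faithful transcript of the paper's construction.

```python
import random

def qr_zk_round(N, x, w):
    """One round of a zero-knowledge proof that x is a quadratic residue mod N.
    The prover knows w with w*w = x (mod N); soundness error is 1/2 per round."""
    r = random.randrange(2, N)          # prover's random mask
    y = (r * r) % N                     # commitment: a random square
    b = random.randrange(2)             # verifier's challenge bit
    z = (r * pow(w, b, N)) % N          # response reveals r or r*w
    return pow(z, 2, N) == (y * pow(x, b, N)) % N

# Toy, insecure parameters (ours): N = 7 * 11, witness w = 9, statement x = w^2 mod N.
N, w = 77, 9
x = (w * w) % N
print(all(qr_zk_round(N, x, w) for _ in range(20)))
```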

3,117 citations

Journal ArticleDOI
TL;DR: It turns out that the ratio between the two grows at most logarithmically in the largest column sum of A; when all the components of cᵀ are the same, the result reduces to a theorem established previously by Johnson and Lovász.
Abstract: Let A be a binary matrix of size m × n, let cᵀ be a positive row vector of length n, and let e be the column vector, all of whose m components are ones. The set-covering problem is to minimize cᵀx subject to Ax ≥ e and x binary. We compare the value of the objective function at a feasible solution found by a simple greedy heuristic to the true optimum. It turns out that the ratio between the two grows at most logarithmically in the largest column sum of A. When all the components of cᵀ are the same, our result reduces to a theorem established previously by Johnson and Lovász.
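The greedy heuristic analyzed here is usually read as the natural weighted extension of the greedy rule: at each step pick the set (column) minimizing cost per newly covered element (row). The sketch below is a minimal rendering under that reading, distinct from the unweighted version shown earlier; the set-of-indices representation, function name, and example costs are our own.

```python
def weighted_greedy_set_cover(universe, sets, costs):
    """Chvatal-style greedy: repeatedly pick the set with the smallest
    cost per newly covered element."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Cost-effectiveness = cost / number of still-uncovered elements covered.
        best = min((i for i, s in enumerate(sets) if uncovered & s),
                   key=lambda i: costs[i] / len(uncovered & sets[i]))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

U = {1, 2, 3, 4, 5, 6}
S = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
c = [3.0, 1.0, 4.0, 2.0]
print(weighted_greedy_set_cover(U, S, c))   # indices of the chosen sets
```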

2,645 citations

Book
01 Jan 1996
TL;DR: This book reviews the design techniques for approximation algorithms, the developments in this area since its inception about three decades ago, and the "closeness" to optimum that is achievable in polynomial time.
Abstract: Approximation algorithms have developed in response to the impossibility of solving a great variety of important optimization problems. Too frequently, when attempting to get a solution for a problem, one is confronted with the fact that the problem is NP-hard. This, in the words of Garey and Johnson, means "I can't find an efficient algorithm, but neither can all of these famous people." While this is a significant theoretical step, it hardly qualifies as a cheering piece of news.If the optimal solution is unattainable then it is reasonable to sacrifice optimality and settle for a "good" feasible solution that can be computed efficiently. Of course, we would like to sacrifice as little optimality as possible, while gaining as much as possible in efficiency. Trading-off optimality in favor of tractability is the paradigm of approximation algorithms.The main themes of this book revolve around the design of such algorithms and the "closeness" to optimum that is achievable in polynomial time. To evaluate the limits of approximability, it is important to derive lower bounds or inapproximability results. In some cases, approximation algorithms must satisfy additional structural requirements such as being on-line, or working within limited space. This book reviews the design techniques for such algorithms and the developments in this area since its inception about three decades ago.

2,488 citations

Journal ArticleDOI
TL;DR: For the problem of finding the maximum clique in a graph, no algorithm has been found for which the approximation ratio does not grow at least as fast as n^ε, where n is the problem size and ε > 0 depends on the algorithm.

2,472 citations

Journal ArticleDOI
TL;DR: It follows that such a complete problem has a polynomial-time approximation scheme iff the whole class does, and that a number of common optimization problems are complete for MAX SNP under a kind of careful transformation that preserves approximability.

1,919 citations