Showing papers on "Approximation algorithm" published in 2013


Journal ArticleDOI
TL;DR: A method for efficiently constructing polar codes is presented and analyzed, proving that for any fixed ε > 0 and all sufficiently large code lengths n, polar codes whose rate is within ε of channel capacity can be constructed in time and space that are both linear in n.
Abstract: A method for efficiently constructing polar codes is presented and analyzed. Although polar codes are explicitly defined, straightforward construction is intractable since the resulting polar bit-channels have an output alphabet that grows exponentially with the code length. Thus, the core problem that needs to be solved is that of faithfully approximating a bit-channel with an intractably large alphabet by another channel having a manageable alphabet size. We devise two approximation methods which “sandwich” the original bit-channel between a degraded and an upgraded version thereof. Both approximations can be efficiently computed and turn out to be extremely close in practice. We also provide theoretical analysis of our construction algorithms, proving that for any fixed ε > 0 and all sufficiently large code lengths n, polar codes whose rate is within ε of channel capacity can be constructed in time and space that are both linear in n.
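
For concreteness, here is a minimal sketch of the degrading-merge idea behind such constructions: output symbols of a binary-input bit-channel, represented by their probability pairs (W(y|0), W(y|1)), are sorted by likelihood ratio and adjacent pairs are greedily merged so as to lose as little symmetric capacity as possible, until the alphabet size drops to a target μ. The representation, the parameter mu, and the naive O(μ²) merge loop are illustrative simplifications, not the paper's exact procedure.

```python
import math

def sym_cap_term(a, b):
    """Contribution of an output symbol with W(y|0)=a, W(y|1)=b to I(W)."""
    t = 0.0
    for p in (a, b):
        if p > 0:
            t += 0.5 * p * math.log2(2.0 * p / (a + b))
    return t

def degrading_merge(channel, mu):
    """Greedily merge output symbols until at most mu remain.

    `channel` is a list of (W(y|0), W(y|1)) pairs; merging adjacent symbols
    in likelihood-ratio order yields a degraded channel whose capacity
    lower-bounds that of the original bit-channel.
    """
    syms = sorted(channel, key=lambda ab: ab[0] / (ab[1] + 1e-300))
    while len(syms) > mu:
        best_i, best_loss = None, float("inf")
        for i in range(len(syms) - 1):
            a1, b1 = syms[i]
            a2, b2 = syms[i + 1]
            # capacity lost by collapsing the two symbols into one
            loss = (sym_cap_term(a1, b1) + sym_cap_term(a2, b2)
                    - sym_cap_term(a1 + a2, b1 + b2))
            if loss < best_loss:
                best_i, best_loss = i, loss
        a1, b1 = syms[best_i]
        a2, b2 = syms.pop(best_i + 1)
        syms[best_i] = (a1 + a2, b1 + b2)
    return syms
```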

755 citations


Journal ArticleDOI
TL;DR: This work investigates variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and proposes and justifies a clusterability criterion for data sets.
Abstract: We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
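
As an illustration of the flavor of such seeding, here is a minimal sketch that picks starting centers with probability proportional to squared distance from the centers chosen so far and then runs plain Lloyd iterations. This is a generic D²-style seeding written for this summary, not the paper's exact seeding procedure, and the in-memory NumPy distance computations are chosen for brevity rather than speed.

```python
import numpy as np

def d2_seed(X, k, rng):
    """Pick k starting centers; each new center is sampled with probability
    proportional to its squared distance from the centers chosen so far."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        C = np.array(centers)
        d2 = np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(X, k, iters=50, seed=0):
    """D2-seeded Lloyd iteration: assign points to nearest center, recenter."""
    rng = np.random.default_rng(seed)
    C = d2_seed(X, k, rng)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(axis=0)
    return C, labels
```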

398 citations


Proceedings Article
24 Sep 2013
TL;DR: Several population-based meta-heuristics in continuous (real) and discrete (binary) search spaces are explained in detail, covering the design, main algorithm, advantages, and disadvantages of each.
Abstract: Exact optimization algorithms are not able to provide an appropriate solution for optimization problems with a high-dimensional search space. In these problems, the search space grows exponentially with the problem size; therefore, exhaustive search is not practical. Also, classical approximate optimization methods like greedy-based algorithms make several assumptions to solve the problems, and the validity of these assumptions can be difficult to establish for each problem. Hence, meta-heuristic algorithms, which make few or no assumptions about a problem and can search very large spaces of candidate solutions, have been extensively developed to solve optimization problems. Among these algorithms, population-based meta-heuristics are well suited to global search because they combine global exploration with local exploitation. In this paper, a survey on meta-heuristic algorithms is performed, and several population-based meta-heuristics in continuous (real) and discrete (binary) search spaces are explained in detail, covering the design, main algorithm, advantages, and disadvantages of each.

294 citations


Journal ArticleDOI
Shi Li1
TL;DR: It is shown that if γ is randomly selected, the approximation ratio can be improved to 1.488, cutting the gap with the 1.463 approximability lower bound by almost 1/3.
Abstract: We present a 1.488-approximation algorithm for the metric uncapacitated facility location (UFL) problem. Previously, the best algorithm was due to Byrka (2007). Byrka proposed an algorithm parametrized by γ and used it with γ ≈ 1.6774. By either running his algorithm or the algorithm proposed by Jain, Mahdian and Saberi (STOC'02), Byrka obtained an algorithm that gives expected approximation ratio 1.5. We show that if γ is randomly selected, the approximation ratio can be improved to 1.488. Our algorithm cuts the gap with the 1.463 approximability lower bound by almost 1/3.

293 citations


Journal ArticleDOI
TL;DR: This article presents an LP-based approximation algorithm for Steiner tree with an improved approximation factor, based on a seemingly novel iterative randomized rounding technique, and shows that the integrality gap of the LP is at most 1.55, answering the mentioned open question.
Abstract: The Steiner tree problem is one of the most fundamental NP-hard problems: given a weighted undirected graph and a subset of terminal nodes, find a minimum-cost tree spanning the terminals. In a sequence of papers, the approximation ratio for this problem was improved from 2 to 1.55 [Robins and Zelikovsky 2005]. All these algorithms are purely combinatorial. A long-standing open problem is whether there is an LP relaxation of Steiner tree with integrality gap smaller than 2 [Rajagopalan and Vazirani 1999]. In this article we present an LP-based approximation algorithm for Steiner tree with an improved approximation factor. Our algorithm is based on a seemingly novel iterative randomized rounding technique. We consider an LP relaxation of the problem, which is based on the notion of directed components. We sample one component with probability proportional to the value of the associated variable in a fractional solution: the sampled component is contracted and the LP is updated accordingly. We iterate this process until all terminals are connected. Our algorithm delivers a solution of cost at most ln(4) + ε < 1.39 times the cost of an optimal Steiner tree.
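
The outer loop of this iterative randomized rounding scheme can be sketched as follows; solve_component_lp and contract are hypothetical helpers standing in for the directed-component LP solver and the contraction step, which are the technically involved parts omitted here.

```python
import random

def iterative_randomized_rounding(graph, terminals, solve_component_lp, contract):
    """Outer loop of iterative randomized rounding (a sketch).

    `solve_component_lp(graph, terminals)` is assumed to return a list of
    (component, lp_value) pairs for the directed-component relaxation, and
    `contract(graph, terminals, component)` to return the updated instance.
    Each round samples one component with probability proportional to its
    LP value, contracts it, and re-solves the LP.
    """
    solution = []
    while len(terminals) > 1:
        frac = solve_component_lp(graph, terminals)   # [(component, x_C), ...]
        total = sum(x for _, x in frac)
        r, acc, chosen = random.uniform(0, total), 0.0, None
        for comp, x in frac:
            acc += x
            if acc >= r:
                chosen = comp
                break
        solution.append(chosen)
        graph, terminals = contract(graph, terminals, chosen)
    return solution
```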

291 citations


Proceedings ArticleDOI
11 Aug 2013
TL;DR: This paper defines a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by the method are compact, dense, and of smaller diameter.
Abstract: Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
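
A minimal peeling heuristic for an objective of this flavor is sketched below: it scores a vertex set S by its edge surplus e[S] - α·|S|(|S|-1)/2, repeatedly removes a minimum-degree vertex, and keeps the best intermediate set. Both the exact objective and the choice α = 1/3 are illustrative assumptions rather than the paper's precise definitions or algorithms.

```python
def peel_best_subgraph(adj, alpha=1.0 / 3.0):
    """Greedy peeling for an edge-surplus objective f(S) = e[S] - alpha*|S|(|S|-1)/2.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    We repeatedly delete a minimum-degree vertex and remember the intermediate
    vertex set with the best objective value.
    """
    adj = {u: set(vs) for u, vs in adj.items()}          # local mutable copy
    edges = sum(len(vs) for vs in adj.values()) // 2
    n = len(adj)
    best_set, best_val = set(adj), edges - alpha * n * (n - 1) / 2
    while len(adj) > 1:
        u = min(adj, key=lambda v: len(adj[v]))          # min-degree vertex
        for w in adj[u]:
            adj[w].discard(u)
        edges -= len(adj[u])
        del adj[u]
        n = len(adj)
        val = edges - alpha * n * (n - 1) / 2
        if val > best_val:
            best_set, best_val = set(adj), val
    return best_set, best_val
```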

290 citations


Proceedings ArticleDOI
01 Jun 2013
TL;DR: This paper presents the first improvement over the diameter approximation algorithm of Aingworth et al.
Abstract: The diameter and the radius of a graph are fundamental topological parameters that have many important practical applications in real-world networks. The fastest combinatorial algorithm for both parameters works by solving the all-pairs shortest paths problem (APSP) and has a running time of Õ(mn) in m-edge, n-node graphs. In a seminal paper, Aingworth, Chekuri, Indyk and Motwani [SODA'96 and SICOMP'99] presented an algorithm that computes in Õ(m√n + n²) time an estimate D̂ for the diameter D such that ⌊2D/3⌋ ≤ D̂ ≤ D. Their paper spawned a long line of research on approximate APSP. For the specific problem of diameter approximation, however, no improvement has been achieved in over 15 years. Our paper presents the first improvement over the diameter approximation algorithm of Aingworth et al., producing an algorithm with the same estimate but with an expected running time of Õ(m√n). We thus show that for all sufficiently sparse graphs, the diameter can be 3/2-approximated in o(n²) time. Our algorithm is obtained using a surprisingly simple method of neighborhood depth estimation that is strong enough to also approximate, in the same running time, the radius and, more generally, all of the eccentricities, i.e., for every node the distance to its furthest node. We also provide strong evidence that our diameter approximation result may be hard to improve. We show that if for some constant ε > 0 there is an O(m^(2-ε))-time (3/2-ε)-approximation algorithm for the diameter of undirected unweighted graphs, then there is an O*((2-δ)^n)-time algorithm for CNF-SAT on n variables for some constant δ > 0, and the strong exponential time hypothesis of [Impagliazzo, Paturi, Zane JCSS'01] is false. Motivated by this negative result, we give several improved diameter approximation algorithms for special cases. We show, for instance, that for unweighted graphs of constant diameter D not divisible by 3, there is an O(m^(2-ε))-time algorithm that gives a (3/2-ε)-approximation for constant ε > 0. This is interesting since the diameter approximation problem is hardest to solve for small D.
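
A simplified sketch of a sample-based estimator in this spirit is shown below: it runs BFS from about √n random nodes plus one BFS from the node farthest from the sample and reports the largest eccentricity seen, which is always a lower bound on the diameter. The actual algorithm adds further BFS runs around the chosen node to certify the ⌊2D/3⌋ ≤ D̂ ≤ D guarantee, and the sketch assumes a connected, unweighted, undirected graph given as an adjacency dict.

```python
import collections
import math
import random

def bfs_ecc(adj, s):
    """Return BFS distances from s and the eccentricity of s."""
    dist = {s: 0}
    q = collections.deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist, max(dist.values())

def estimate_diameter(adj, seed=0):
    """Sample-based diameter estimate (lower bound on the true diameter)."""
    rng = random.Random(seed)
    nodes = list(adj)
    sample = rng.sample(nodes, min(len(nodes), math.isqrt(len(nodes)) + 1))
    dist_to_sample = {v: math.inf for v in nodes}
    best = 0
    for s in sample:
        dist, ecc = bfs_ecc(adj, s)
        best = max(best, ecc)
        for v, d in dist.items():
            dist_to_sample[v] = min(dist_to_sample[v], d)
    # BFS from the node farthest from the sample, which tends to be
    # close to one endpoint of a diameter path
    w = max(nodes, key=lambda v: dist_to_sample[v])
    return max(best, bfs_ecc(adj, w)[1])
```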

276 citations



Proceedings ArticleDOI
26 Oct 2013
TL;DR: This work presents two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates; the regret of both is optimal up to polylogarithmic factors.
Abstract: Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising. In many of these application domains the learner may be constrained by one or more supply (or budget) limits, in addition to the customary limitation on the time horizon. The literature lacks a general model encompassing these sorts of problems. We introduce such a model, called "bandits with knapsacks", that combines aspects of stochastic integer programming with online learning. A distinctive feature of our problem, in comparison to the existing regret-minimization literature, is that the optimal policy for a given latent distribution may significantly outperform the policy that plays the optimal fixed arm. Consequently, achieving sublinear regret in the bandits-with-knapsacks problem is significantly more challenging than in conventional bandit problems. We present two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates. Further, we prove that the regret achieved by both algorithms is optimal up to polylogarithmic factors. We illustrate the generality of the problem by presenting applications in a number of different domains including electronic commerce, routing, and scheduling. As one example of a concrete application, we consider the problem of dynamic posted pricing with limited supply and obtain the first algorithm whose regret, with respect to the optimal dynamic policy, is sublinear in the supply.
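
A toy policy in the primal-dual spirit is sketched below: it keeps a multiplicative weight ("price") per resource, raised whenever that resource is consumed, and plays the arm with the best empirical reward per unit of weighted cost. This is only meant to convey the multiplicative-update idea; the paper's algorithms additionally use confidence bounds and come with the stated regret guarantees, which this toy version does not.

```python
import numpy as np

def toy_bwk_primal_dual(pull, k, d, budgets, horizon, eta=0.1):
    """Toy sketch of a primal-dual bandits-with-knapsacks policy.

    `pull(arm)` is assumed to return (reward, cost_vector) with entries in
    [0, 1]; there are k arms and d resources, with per-resource `budgets`.
    A multiplicative weight per resource acts as a price; each round plays
    the arm with the best empirical reward per unit of weighted cost.
    """
    w = np.ones(d)
    counts = np.zeros(k)
    rew_sum = np.zeros(k)
    cost_sum = np.zeros((k, d))
    spent = np.zeros(d)
    total = 0.0
    for t in range(horizon):
        if np.any(spent >= budgets):
            break                               # some resource is exhausted
        if t < k:
            arm = t                             # pull each arm once to initialise
        else:
            avg_r = rew_sum / counts
            avg_c = cost_sum / counts[:, None]
            price = avg_c @ (w / w.sum())       # weighted cost per arm
            arm = int(np.argmax(avg_r / (price + 1e-9)))
        r, c = pull(arm)
        counts[arm] += 1
        rew_sum[arm] += r
        cost_sum[arm] += c
        spent += c
        total += r
        w *= np.exp(eta * c / budgets)          # raise the price of scarce resources
    return total
```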

247 citations


Journal ArticleDOI
TL;DR: This paper defines a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing.
Abstract: Modern massive datasets create a fundamental problem at the intersection of the computational and statistical sciences: how to provide guarantees on the quality of statistical inference given bounds on computational resources, such as time or space. Our approach to this problem is to define a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing. We illustrate this approach in the setting of denoising problems, using convex relaxation as the core inferential tool. Hierarchies of convex relaxations have been widely used in theoretical computer science to yield tractable approximation algorithms to many computationally intractable tasks. In the current paper, we show how to endow such hierarchies with a statistical characterization and thereby obtain concrete tradeoffs relating algorithmic runtime to amount of data.

234 citations


Journal ArticleDOI
TL;DR: In this paper, a low-rank approximation of the space-wavenumber wave propagation matrix is proposed for wave propagation in 3D heterogeneous isotropic or anisotropic media.
Abstract: We consider the problem of constructing a wave extrapolation operator in a variable and possibly anisotropic medium. Our construction involves Fourier transforms in space combined with a low-rank approximation of the space-wavenumber wave-propagator matrix. A low-rank approximation implies selecting a small set of representative spatial locations and a small set of representative wavenumbers. We present a mathematical derivation of this method, a description of the low-rank approximation algorithm, and numerical examples that confirm the validity of the proposed approach. Wave extrapolation using low-rank approximation can be applied to seismic imaging by reverse-time migration in 3D heterogeneous isotropic or anisotropic media.
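
As a generic illustration of why a low-rank factorization makes the propagator cheap to apply, here is a randomized range-finder sketch (in the Halko-Martinsson-Tropp style); the paper's construction instead selects representative rows (spatial locations) and columns (wavenumbers), which this sketch does not reproduce.

```python
import numpy as np

def randomized_lowrank(W, rank, oversample=10, seed=0):
    """Randomized range-finder giving a low-rank factorisation W ~ Q @ (Q.T @ W).

    Once the factorisation is available, applying W to a vector x costs two
    thin matrix-vector products: Q @ ((Q.T @ W) @ x).
    """
    rng = np.random.default_rng(seed)
    Y = W @ rng.normal(size=(W.shape[1], rank + oversample))  # sample the range of W
    Q, _ = np.linalg.qr(Y)                                    # orthonormal basis of the range
    return Q, Q.T @ W
```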

Journal ArticleDOI
TL;DR: Comparison of methods to prevent multi-layer perceptron neural networks from overfitting of the training data in the case of daily catchment runoff modelling shows that the elaborated noise injection method may prevent overfitting slightly better than the most popular early stopping approach.

Journal ArticleDOI
TL;DR: In this paper, a robust penalty function involving the sum of unsquared deviations and a relaxation that leads to a convex optimization problem are introduced, and the alternating direction method is applied to minimize the penalty function.
Abstract: The rotation synchronization problem, that of recovering a set of rotations from noisy measurements of their relative rotations, has found applications in computer vision, computer graphics, and sensor network localization, among others. Its least squares solution can be approximated by either spectral relaxation or semidefinite programming followed by a rounding procedure, analogous to the approximation algorithms for MAX-CUT. The contribution of this paper is threefold: first, we introduce a robust penalty function involving the sum of unsquared deviations and derive a relaxation that leads to a convex optimization problem; second, we apply the alternating direction method to minimize the penalty function; finally, under a specific model of the measurement noise and for both complete and random measurement graphs, we prove that the rotations are exactly and stably recovered, exhibiting a phase transition behavior in terms of the proportion of noisy measurements. Numerical simulations confirm the phase transition behavior for our method as well as its improved accuracy compared to existing methods.
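
For reference, the spectral-relaxation baseline mentioned in the abstract can be written in a few lines for the planar (angular synchronization) case: build the Hermitian matrix of noisy relative-angle measurements and read the angles off its top eigenvector. This is the least-squares baseline, not the paper's unsquared-deviation/ADMM method, and the restriction to planar rotations is an illustrative simplification.

```python
import numpy as np

def spectral_angle_sync(pairs, n):
    """Spectral relaxation for angular synchronization.

    `pairs` maps (i, j) to a noisy measurement of theta_i - theta_j.
    The top eigenvector of the Hermitian measurement matrix recovers the
    angles up to a global rotation.
    """
    H = np.zeros((n, n), dtype=complex)
    for (i, j), delta in pairs.items():
        H[i, j] = np.exp(1j * delta)
        H[j, i] = np.exp(-1j * delta)
    vals, vecs = np.linalg.eigh(H)
    v = vecs[:, -1]                        # eigenvector of the largest eigenvalue
    return np.angle(v) - np.angle(v[0])    # fix the global rotation at node 0
```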

Proceedings Article
05 Dec 2013
TL;DR: It is shown that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other, and hardness results for both problems are provided, thus showing that the approximation factors are tight up to log-factors.
Abstract: We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 25] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms.

Proceedings ArticleDOI
01 Jun 2013
TL;DR: The connection between hereditary discrepancy and the privacy mechanism enables the first polylogarithmic approximation to the hereditary discrepancy of a matrix A to be derived.
Abstract: We study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries and has been the focus of a long line of work. For a given set of d linear queries over a database x ∈ R^N, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [5, 32] give an O(log² d) approximation to the optimal mechanism. Our first contribution is to give an efficient O(log² d) approximation guarantee for the case of (ε,δ)-differential privacy. Our mechanism adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [44], using tools from convex geometry. We next consider the sparse case, when the number of queries exceeds the number of individuals in the database, i.e., when d > n, where n = ||x||_1. The lower bounds used in the previous approximation algorithm no longer apply; in fact, better mechanisms are known in this setting [7, 27, 28, 31, 49]. Our second main contribution is to give an efficient (ε,δ)-differentially private mechanism that, for any given query set A and an upper bound n on ||x||_1, has mean squared error within polylog(d, N) of the optimal for A and n. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the l1 ball. Additionally, we show a similar polylogarithmic approximation guarantee for the optimal ε-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e., A with entries in {0,1}, there is an ε-differentially private mechanism with expected error Õ(√n) per query, improving on the Õ(n^(2/3)) bound of [7] and matching the lower bound implied by [15] up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.
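
As a point of reference, the plain spherical Gaussian mechanism for answering the query workload Ax is sketched below; the paper's mechanism improves on this baseline by correlating the noise with the geometry of A. The sensitivity convention (neighboring databases differ by 1 in one coordinate of x) and the sigma calibration are the standard ones, stated here as assumptions.

```python
import numpy as np

def gaussian_mechanism(A, x, eps, delta, rng=None):
    """Answer the linear queries A @ x under (eps, delta)-differential privacy
    with spherical Gaussian noise.

    Neighbouring databases are assumed to differ by at most 1 in one entry of
    x, so the l2 sensitivity of x -> A @ x is the largest column norm of A.
    """
    rng = np.random.default_rng() if rng is None else rng
    sens = np.linalg.norm(A, axis=0).max()                    # max column l2 norm
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps  # standard calibration
    return A @ x + rng.normal(0.0, sigma, size=A.shape[0])
```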

Proceedings ArticleDOI
08 Apr 2013
TL;DR: A scalable influence approximation algorithm, Independent Path Algorithm (IPA) for Independent Cascade (IC) diffusion model, which efficiently approximates influence by considering an independent influence path as an influence evaluation unit and is implemented in the demo application for influence maximization.
Abstract: As social network services connect people across the world, influence maximization, i.e., finding the most influential nodes (or individuals) in the network, is being actively researched with applications to viral marketing. One crucial challenge in scalable influence maximization processing is evaluating influence, which is #P-hard and thus hard to solve in polynomial time. We propose a scalable influence approximation algorithm, the Independent Path Algorithm (IPA), for the Independent Cascade (IC) diffusion model. IPA efficiently approximates influence by treating an independent influence path as the unit of influence evaluation. IPA is also easily parallelized by simply adding a few lines of OpenMP meta-programming expressions. In addition, the overhead of maintaining influence paths in memory is relieved by safely discarding insignificant influence paths. Extensive experiments conducted on large-scale real social networks show that IPA is an order of magnitude faster and uses less memory than the state-of-the-art algorithms. Our experimental results also show that parallel versions of IPA speed up further as the number of CPU cores increases, and more speed-up is achieved for larger datasets. The algorithms have been implemented in our demo application for influence maximization (available at http://dm.postech.ac.kr/ipa demo), which efficiently finds the most influential nodes in a social network.
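
A compact sketch of the independent-path idea is given below: enumerate influence paths from the seeds by depth-first search, prune paths whose probability falls below a threshold theta, and combine each node's surviving path probabilities as if the paths were independent. The combining rule, the threshold value, and the dictionary-based graph representation are illustrative choices, not the paper's exact implementation (which also covers parallelization).

```python
def ipa_style_influence(probs, seeds, theta=0.001):
    """Approximate the influence spread of `seeds` under independent cascade.

    `probs[u]` maps each out-neighbour v to the propagation probability p(u, v).
    Each node's activation probability is combined from its influence paths as
    if the paths were independent: ap(v) = 1 - prod(1 - p(path)).
    """
    path_probs = {}

    def walk(u, p, visited):
        for v, puv in probs.get(u, {}).items():
            if v in visited:
                continue
            pv = p * puv
            if pv < theta:                       # prune insignificant influence paths
                continue
            path_probs.setdefault(v, []).append(pv)
            walk(v, pv, visited | {v})

    for s in seeds:
        walk(s, 1.0, {s})

    spread = float(len(seeds))                   # seeds are active with probability 1
    for v, plist in path_probs.items():
        if v in seeds:
            continue
        q = 1.0
        for p in plist:
            q *= (1.0 - p)
        spread += 1.0 - q
    return spread
```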

Journal ArticleDOI
TL;DR: A Bayesian approximate message passing algorithm is proposed for solving the multiple measurement vector (MMV) problem in compressive sensing, in which a collection of sparse signal vectors that share a common support are recovered from undersampled noisy measurements.
Abstract: In this work, a Bayesian approximate message passing algorithm is proposed for solving the multiple measurement vector (MMV) problem in compressive sensing, in which a collection of sparse signal vectors that share a common support are recovered from undersampled noisy measurements. The algorithm, AMP-MMV, is capable of exploiting temporal correlations in the amplitudes of non-zero coefficients, and provides soft estimates of the signal vectors as well as the underlying support. Central to the proposed approach is an extension of recently developed approximate message passing techniques to the amplitude-correlated MMV setting. Aided by these techniques, AMP-MMV offers a computational complexity that is linear in all problem dimensions. In order to allow for automatic parameter tuning, an expectation-maximization algorithm that complements AMP-MMV is described. Finally, a detailed numerical study demonstrates the power of the proposed approach and its particular suitability for application to high-dimensional problems.

Journal ArticleDOI
TL;DR: This tutorial paper focuses on the research that has been conducted to improve the Karnik-Mendel (KM) algorithms, understand the KM algorithms, leading to further improved algorithms, and eliminate the need for KM algorithms.
Abstract: Computing the centroid and performing type-reduction for type-2 fuzzy sets and systems are operations that must be taken into consideration. Karnik-Mendel (KM) algorithms are the standard ways to do these operations; however, because these algorithms are iterative, much research has been conducted during the past decade about centroid and type-reduction computations. This tutorial paper focuses on the research that has been conducted to 1) improve the KM algorithms; 2) understand the KM algorithms, leading to further improved algorithms; 3) eliminate the need for KM algorithms; 4) use the KM algorithms to solve other (nonfuzzy logic system) problems; and 5) use (or not use) KM algorithms for general type-2 fuzzy sets and fuzzy logic systems.
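
For readers unfamiliar with the KM iteration itself, a sketch of the computation of the left centroid endpoint of an interval type-2 fuzzy set is shown below (the right endpoint is symmetric, with the roles of the upper and lower memberships swapped around the switch point). The tolerance value is illustrative, and this is the basic KM form rather than any of the enhanced variants the paper surveys.

```python
def km_left_endpoint(x, w_lo, w_hi, tol=1e-9):
    """Karnik-Mendel iteration for the left endpoint y_l of the centroid.

    `x` are the domain points in ascending order; `w_lo` and `w_hi` are the
    lower and upper membership grades at those points.
    """
    # initialise y with the midpoints of the membership intervals
    w = [(a + b) / 2.0 for a, b in zip(w_lo, w_hi)]
    y = sum(xi * wi for xi, wi in zip(x, w)) / sum(w)
    while True:
        # switch point: use upper memberships left of y, lower ones right of y
        weights = [hi if xi <= y else lo for xi, lo, hi in zip(x, w_lo, w_hi)]
        y_new = sum(xi * wi for xi, wi in zip(x, weights)) / sum(weights)
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
```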

Proceedings ArticleDOI
08 Jul 2013
TL;DR: This article addresses the Least Cost Rumor Blocking (LCRB) problem where rumors originate from a community Cr in the network and a notion of protectors are used to limit the bad influence of rumors, and proposes a Set Cover Based Greedy (SCBG) algorithm which achieves a O(ln n)-approximation ratio.
Abstract: In many real-world scenarios, a social network serves as a platform for information diffusion; alongside the dissemination of positive information (truth), negative information (rumor) also spreads among the public. To make the social network a reliable medium, it is necessary to have strategies to control rumor diffusion. In this article, we address the Least Cost Rumor Blocking (LCRB) problem, where rumors originate from a community Cr in the network and a notion of protectors is used to limit the bad influence of rumors. The problem can be summarized as identifying a minimal subset of individuals as initial protectors to minimize the number of people infected in neighbor communities of Cr at the end of both diffusion processes. Observing the community structure property, we pay attention to a kind of vertex set, called the bridge end set, in which each node has at least one direct in-neighbor in Cr and is reachable from rumors. Under the OOAO model, we study the LCRB-P problem, in which an α (0 < α < 1) fraction of bridge ends is required to be protected. We prove that the objective function of this problem is submodular, and a greedy algorithm is adopted to derive a (1-1/e)-approximation. Furthermore, we study the LCRB-D problem over the DOAA model, in which all the bridge ends are required to be protected; we prove that there is no polynomial time o(ln n)-approximation for the LCRB-D problem unless P = NP, and propose a Set Cover Based Greedy (SCBG) algorithm which achieves an O(ln n)-approximation ratio. Finally, to evaluate the efficiency and effectiveness of our algorithm, we conduct extensive comparison simulations on three real-world datasets, and the results show that our algorithm outperforms other heuristics.
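
The greedy step for the LCRB-P variant can be sketched as follows, abstracting away how the diffusion model determines which bridge ends a candidate protector saves: the mapping from candidates to protected bridge ends is a hypothetical input here, and the loop keeps adding the candidate with the largest marginal coverage until the required alpha fraction is protected.

```python
def greedy_protectors(candidates, bridge_ends, alpha):
    """Greedy protector selection for an LCRB-P-style coverage objective.

    `candidates` maps each candidate protector to the set of bridge ends it
    would protect (how that set is derived from the diffusion model is
    abstracted away).  Stops once an alpha fraction of `bridge_ends` is covered.
    """
    target = alpha * len(bridge_ends)
    protected, chosen = set(), []
    while len(protected) < target:
        best = max(candidates, key=lambda c: len(candidates[c] - protected))
        gain = candidates[best] - protected
        if not gain:
            break                    # no candidate adds new coverage
        chosen.append(best)
        protected |= gain
    return chosen
```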

Journal ArticleDOI
TL;DR: It is shown that when using the log-sum-exp function to approximate the optimal value of any combinatorial problem, the solution can be interpreted as the stationary probability distribution of a class of time-reversible Markov chains.
Abstract: Many important network design problems are fundamentally combinatorial optimization problems. A large number of such problems, however, cannot readily be tackled by distributed algorithms. The Markov approximation framework studied in this paper is a general technique for synthesizing distributed algorithms. We show that when using the log-sum-exp function to approximate the optimal value of any combinatorial problem, we end up with a solution that can be interpreted as the stationary probability distribution of a class of time-reversible Markov chains. Selected Markov chains among this class yield distributed algorithms that solve the log-sum-exp approximated combinatorial network optimization problem. By examining three applications, we illustrate that the Markov approximation technique not only provides fresh perspectives to existing distributed solutions, but also provides clues leading to the construction of new distributed algorithms in various domains with provable performance. We believe the Markov approximation techniques will find applications in many other network optimization problems.
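
The core approximation is easy to state concretely: for a finite set of configurations with objective values W_f, max_f W_f is replaced by (1/β)·log Σ_f exp(β·W_f), which overshoots the true maximum by at most log(number of configurations)/β. A tiny numerically stable example:

```python
import math

def log_sum_exp_max(values, beta=50.0):
    """Log-sum-exp approximation of max(values): (1/beta) * log(sum(exp(beta*v))).

    The gap to the true maximum is at most log(len(values)) / beta, which is
    exactly the approximation the Markov-chain construction targets.
    """
    m = max(values)            # subtract the max for numerical stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

values = [3.0, 5.0, 4.5]
print(max(values), log_sum_exp_max(values))   # 5.0 vs. 5.0 plus a small gap
```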

Proceedings ArticleDOI
Honglei Zhuang1, Yihan Sun1, Jie Tang1, Jialin Zhang, Xiaoming Sun 
01 Dec 2013
TL;DR: This paper proposes a novel algorithm to approximate the optimal solution to the problem of maximizing influence diffusion in a dynamic social network, through probing a small portion of the network, and minimizes the possible error between the observed network and the real network.
Abstract: Social influence and influence diffusion has been widely studied in online social networks. However, most existing works on influence diffusion focus on static networks. In this paper, we study the problem of maximizing influence diffusion in a dynamic social network. Specifically, the network changes over time and the changes can be only observed by periodically probing some nodes for the update of their connections. Our goal then is to probe a subset of nodes in a social network so that the actual influence diffusion process in the network can be best uncovered with the probing nodes. We propose a novel algorithm to approximate the optimal solution. The algorithm, through probing a small portion of the network, minimizes the possible error between the observed network and the real network. We evaluate the proposed algorithm on both synthetic and real large networks. Experimental results show that our proposed algorithm achieves a better performance than several alternative algorithms.

Proceedings ArticleDOI
26 Oct 2013
TL;DR: This paper introduces a new framework which is a two-stage stochastic optimization model designed to leverage the potential that typically lies in neighboring nodes of arbitrary samples of social networks, and provides a constant factor approximation to the optimal adaptive policy in the Triggering model.
Abstract: The algorithmic challenge of maximizing information diffusion through word-of-mouth processes in social networks has been heavily studied in the past decade. While there has been immense progress and an impressive arsenal of techniques has been developed, the algorithmic frameworks make idealized assumptions regarding access to the network that can often result in poor performance of state-of-the-art techniques. In this paper we introduce a new framework which we call Adaptive Seeding. The framework is a two-stage stochastic optimization model designed to leverage the potential that typically lies in neighboring nodes of arbitrary samples of social networks. Our main result is an algorithm which provides a constant factor approximation to the optimal adaptive policy for any influence function in the Triggering model.

Journal ArticleDOI
TL;DR: This paper studies alternative optimization problems which are naturally motivated by resource and time constraints on viral marketing campaigns and establishes the value of the approximation algorithms, by conducting an experimental evaluation, comparing their quality against that achieved by various heuristics.
Abstract: In recent years, study of influence propagation in social networks has gained tremendous attention. In this context, we can identify three orthogonal dimensions—the number of seed nodes activated at the beginning (known as budget), the expected number of activated nodes at the end of the propagation (known as expected spread or coverage), and the time taken for the propagation. We can constrain one or two of these and try to optimize the third. In their seminal paper, Kempe et al. constrained the budget, left time unconstrained, and maximized the coverage: this problem is known as Influence Maximization (or MAXINF for short). In this paper, we study alternative optimization problems which are naturally motivated by resource and time constraints on viral marketing campaigns. In the first problem, termed minimum target set selection (or MINTSS for short), a coverage threshold η is given and the task is to find the minimum size seed set such that by activating it, at least η nodes are eventually activated in the expected sense. This naturally captures the problem of deploying a viral campaign on a budget. In the second problem, termed MINTIME, the goal is to minimize the time in which a predefined coverage is achieved. More precisely, in MINTIME, a coverage threshold η and a budget threshold k are given, and the task is to find a seed set of size at most k such that by activating it, at least η nodes are activated in the expected sense, in the minimum possible time. This problem addresses the issue of timing when deploying viral campaigns. Both these problems are NP-hard, which motivates our interest in their approximation. For MINTSS, we develop a simple greedy algorithm and show that it provides a bicriteria approximation. We also establish a generic hardness result suggesting that improving this bicriteria approximation is likely to be hard. For MINTIME, we show that even bicriteria and tricriteria approximations are hard under several conditions. We show, however, that if we allow the budget for number of seeds k to be boosted by a logarithmic factor and allow the coverage to fall short, then the problem can be solved exactly in PTIME, i.e., we can achieve the required coverage within the time achieved by the optimal solution to MINTIME with budget k and coverage threshold η. Finally, we establish the value of the approximation algorithms, by conducting an experimental evaluation, comparing their quality against that achieved by various heuristics.
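
The MINTSS greedy described here reduces to a few lines once spread estimation is abstracted away; estimate_spread below is a hypothetical helper (e.g., a Monte Carlo estimate under the chosen diffusion model), and the loop simply adds the seed with the largest marginal gain until the coverage threshold eta is reached.

```python
def greedy_mintss(nodes, estimate_spread, eta):
    """Greedy sketch for MINTSS: grow the seed set until the estimated
    expected spread reaches the coverage threshold eta.

    `estimate_spread(seed_set)` is a hypothetical helper returning the
    estimated expected number of activated nodes.
    """
    seeds, spread = set(), 0.0
    while spread < eta:
        gains = {v: estimate_spread(seeds | {v}) - spread
                 for v in nodes if v not in seeds}
        if not gains:
            break
        best = max(gains, key=gains.get)   # largest marginal gain
        seeds.add(best)
        spread += gains[best]
    return seeds
```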

Journal ArticleDOI
01 Sep 2013
TL;DR: A new variant of LSH is described, called Parallel LSH (PLSH), designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput streaming of new data.
Abstract: Finding nearest neighbors has become an important operation on databases, with applications to text search, multimedia indexing, and many other areas. One popular algorithm for similarity search, especially for high-dimensional data (where spatial indexes like kd-trees do not perform well), is Locality Sensitive Hashing (LSH), an approximation algorithm for finding similar objects. In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH), designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and supporting high-throughput streaming of new data. Our approach employs several novel ideas, including: a cache-conscious hash table layout, using a 2-level merge algorithm for hash table construction; an efficient algorithm for duplicate elimination during hash-table querying; an insert-optimized hash table structure and efficient data expiration algorithm for streaming data; and a performance model that accurately estimates performance of the algorithm and can be used to optimize parameter settings. We show that on a workload where we perform similarity search on a dataset of more than 1 billion tweets, with hundreds of millions of new tweets per day, we can achieve query times of 1-2.5 ms. We show that this is an order of magnitude faster than existing indexing schemes, such as inverted indexes. To the best of our knowledge, this is the fastest implementation of LSH, with table construction times up to 3.7× faster and query times that are 8.3× faster than a basic implementation.
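
To make the underlying primitive concrete, here is a minimal random-hyperplane LSH index for cosine similarity with multiple tables; this is the textbook building block only, not PLSH itself, whose contributions (cache-conscious layout, two-level merge construction, streaming inserts and expiration) sit on top of such an index. The bit width and table count below are arbitrary illustrative values.

```python
import numpy as np
from collections import defaultdict

class CosineLSH:
    """Minimal random-hyperplane LSH index: L tables of b-bit signatures.
    Query candidates are the points that collide with the query in at least
    one table."""

    def __init__(self, dim, bits=16, tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(tables, bits, dim))   # one hyperplane set per table
        self.tables = [defaultdict(list) for _ in range(tables)]

    def _keys(self, x):
        # sign pattern of the projections gives the bucket key in each table
        return [tuple((p @ x > 0).astype(int)) for p in self.planes]

    def insert(self, idx, x):
        for table, key in zip(self.tables, self._keys(x)):
            table[key].append(idx)

    def query(self, x):
        out = set()
        for table, key in zip(self.tables, self._keys(x)):
            out.update(table[key])
        return out
```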

Proceedings ArticleDOI
26 Oct 2013
TL;DR: In this article, the authors present an O(m^(10/7))-time algorithm for the maximum s-t flow and minimum s-t cut problems in directed graphs with unit capacities.
Abstract: We present an O(m^(10/7)) = O(m^1.43)-time algorithm for the maximum s-t flow and the minimum s-t cut problems in directed graphs with unit capacities. This is the first improvement over the sparse-graph case of the long-standing O(m·min{√m, n^(2/3)}) running time bound due to Even and Tarjan [16]. By well-known reductions, this also establishes an O(m^(10/7))-time algorithm for the maximum-cardinality bipartite matching problem. That, in turn, gives an improvement over the celebrated O(m√n) running time bound of Hopcroft and Karp [25] whenever the input graph is sufficiently sparse. At a very high level, our results stem from acquiring a deeper understanding of interior-point methods - a powerful tool in convex optimization - in the context of flow problems, as well as utilizing a certain interplay between maximum flows and bipartite matchings.

Journal ArticleDOI
TL;DR: An improved differential evolution algorithm that is empowered by a covariance matrix adaptation evolution strategy as a local search shows superior performance in comparison with other algorithms that have also solved these problems.
Abstract: Many real-world optimization problems are difficult to solve as they do not possess the nice mathematical properties required by exact algorithms. Evolutionary algorithms are proven to be appropriate for such problems. In this paper, we propose an improved differential evolution algorithm that uses a mix of different mutation operators. In addition, the algorithm is empowered by a covariance matrix adaptation evolution strategy algorithm as a local search. To judge the performance of the algorithm, we have solved well-known benchmark problems as well as a variety of real-world optimization problems. The real-life problems were taken from different sources and disciplines. According to the results obtained, the algorithm shows superior performance in comparison with other algorithms that have also solved these problems.
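
As a reference point for the kind of baseline the proposed algorithm extends, a classic DE/rand/1/bin loop is sketched below; the paper's mix of mutation operators and its CMA-ES local search are not reproduced here, and all control parameters shown are conventional illustrative values.

```python
import numpy as np

def de_rand_1_bin(f, bounds, pop_size=30, F=0.7, CR=0.9, gens=200, seed=0):
    """Classic DE/rand/1/bin minimisation of f over box constraints `bounds`."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    d = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, d))
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # mutation: base vector plus scaled difference of two others
            idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # binomial crossover with at least one mutant coordinate
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True
            trial = np.where(cross, mutant, pop[i])
            ft = f(trial)
            if ft <= fit[i]:                    # greedy selection
                pop[i], fit[i] = trial, ft
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```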

Proceedings ArticleDOI
26 Oct 2013
TL;DR: The main result is a data structure that maintains a (1+ε) approximation of maximum matching under edge insertions/deletions in worst-case Õ(√m·ε⁻²) time per update, improving on the 3/2 approximation of Neiman and Solomon [20], which runs in similar time.
Abstract: We present the first data structures that maintain near-optimal maximum cardinality and maximum weighted matchings on sparse graphs in sublinear time per update. Our main result is a data structure that maintains a (1+ε) approximation of maximum matching under edge insertions/deletions in worst-case O(√m·ε⁻²) time per update. This improves the 3/2 approximation given by Neiman and Solomon [20], which runs in similar time. The result is based on two ideas. The first is to re-run a static algorithm after a chosen number of updates to ensure approximation guarantees. The second is to judiciously trim the graph to a smaller equivalent one whenever possible. We also study extensions of our approach to the weighted setting, and combine it with known frameworks to obtain arbitrary approximation ratios. For a constant ε and for graphs with edge weights between 1 and N, we design an algorithm that maintains a (1+ε)-approximate maximum weighted matching in O(√m log N) time per update. The only previous result for maintaining weighted matchings on dynamic graphs has an approximation ratio of 4.9108, and was shown by Anand et al. [2], [3].
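
The first of these ideas (periodically re-running a static algorithm) can be sketched in isolation: after computing a near-maximum matching of size s, roughly ε·s further updates can change the optimum by at most that amount, so the stored matching, minus any deleted edges, stays near-optimal until the update budget runs out. static_matching below is a hypothetical helper for the static near-optimal matching computation, and the bookkeeping is deliberately simplified (no trimming of the graph, which is the paper's second ingredient).

```python
def lazy_dynamic_matching(updates, static_matching, eps=0.1):
    """Maintain an approximate matching by periodically recomputing it.

    `updates` yields ("+", edge) for insertions and ("-", edge) for deletions,
    with edges as hashable objects (e.g. frozensets of endpoints).
    `static_matching(edges)` is a hypothetical helper returning a near-maximum
    matching of the current edge set.  Yields the maintained matching after
    each update.
    """
    edges, matching, budget = set(), set(), 0
    for op, e in updates:
        if op == "+":
            edges.add(e)
        else:
            edges.discard(e)
            matching.discard(e)          # a deleted matched edge leaves the matching
        budget -= 1
        if budget <= 0:                  # budget exhausted: recompute from scratch
            matching = set(static_matching(edges))
            budget = max(1, int(eps * len(matching) / 3))
        yield set(matching)
```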

Journal ArticleDOI
TL;DR: Using the information theoretic linear program introduced in Blasiak (2011), a polynomial-time algorithm is given for recognizing instances with β = 2 and pinpoint β precisely for various classes of graphs (e.g., various Cayley graphs of cyclic groups).
Abstract: Index coding has received considerable attention recently motivated in part by applications such as fast video-on-demand and efficient communication in wireless networks and in part by its connection to network coding. Optimal encoding schemes and efficient heuristics were studied in various settings, while also leading to new results for network coding such as improved gaps between linear and non-linear capacity as well as hardness of approximation. The problem of broadcasting with side information, a generalization of the index coding problem, begins with a sender and sets of users and messages. Each user possesses a subset of the messages and desires an additional message from the set. The sender wishes to broadcast a message so that on receipt of the broadcast each user can compute her desired message. The fundamental parameter of interest is the broadcast rate, β, the average communication cost for sufficiently long broadcasts. Though there have been many new nontrivial bounds on β by Bar-Yossef (2006), Lubetzky and Stav (2007), Alon (2008), and Blasiak (2011) there was no known polynomial-time algorithm for approximating β within a nontrivial factor, and the exact value of β remained unknown for all nontrivial instances. Using the information theoretic linear program introduced in Blasiak (2011), we give a polynomial-time algorithm for recognizing instances with β = 2 and pinpoint β precisely for various classes of graphs (e.g., various Cayley graphs of cyclic groups). Further, extending ideas from Ramsey theory, we give a polynomial-time algorithm with a nontrivial approximation ratio for computing β. Finally, we provide insight into the quality of previous bounds by giving constructions showing separations between β and the respective bounds. In particular, we construct graphs where β is uniformly bounded while its upper bound derived from the naive encoding scheme is polynomially worse.

Journal ArticleDOI
TL;DR: Four different approximation algorithms are empirically investigated on four different prediction problems, and the quality of the predictions obtained as a function of the compute time taken are assessed.
Abstract: Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n²) space and O(n³) time for a data set of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons.
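
Of the baselines recommended here, Subset of Data is the simplest to write down: run exact GP regression on a random subset of m points, cutting the cost from O(n³) to O(m³). The squared-exponential kernel and the hyperparameter values in this sketch are illustrative assumptions; FITC and the other approximations compared in the paper are not shown.

```python
import numpy as np

def sod_gp_predict(X, y, Xstar, m=200, ell=1.0, sf=1.0, noise=0.1, seed=0):
    """Subset-of-Data GP regression: exact GPR on a random subset of m points.

    Uses a squared-exponential kernel with lengthscale `ell`, signal scale
    `sf` and observation noise `noise`; returns the predictive mean at Xstar.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    Xs, ys = X[idx], y[idx]

    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

    K = k(Xs, Xs) + noise ** 2 * np.eye(len(Xs))
    alpha = np.linalg.solve(K, ys)          # exact GP weights on the subset
    return k(Xstar, Xs) @ alpha
```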

Journal ArticleDOI
TL;DR: This work presents a general approach to deriving inapproximability results in the value oracle model, based on the notion of symmetry gap, and unifies several known hardness results for submodular maximization.
Abstract: A number of recent results on optimization problems involving submodular functions have made use of the multilinear relaxation of the problem. These results hold typically in the value oracle model, where the objective function is accessible via a black box returning f(S) for a given S. We present a general approach to deriving inapproximability results in the value oracle model, based on the notion of symmetry gap. Our main result is that for any fixed instance that exhibits a certain symmetry gap in its multilinear relaxation, there is a naturally related class of instances for which a better approximation factor than the symmetry gap would require exponentially many oracle queries. This unifies several known hardness results for submodular maximization, e.g., the optimality of the (1-1/e)-approximation for monotone submodular maximization under a cardinality constraint and the impossibility of a (1/2+ε)-approximation for unconstrained (nonmonotone) submodular maximization. As a new applica