Showing papers in "arXiv: Data Structures and Algorithms" in 2013

Posted Content

From Theory to Practice: Plug and Play with Succinct Data Structures

Simon Gog¹, Timo Beller², Alistair Moffat¹, Matthias Petri¹

University of Melbourne¹, University of Ulm²

05 Nov 2013-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements.

Abstract: Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is a difficult task, since older baseline implementations may not rely on the same basic components, and reimplementing from scratch can be very time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing succinct solutions for document retrieval.
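
A typical building block in such a library is rank support over a bitvector. The Python sketch below is a hypothetical, single-level illustration (class and method names are assumptions, not the framework's API): it stores the number of 1 bits before each fixed-size block, so `rank1(i)` costs one table lookup plus a scan of at most one block. Production rank structures use far more compact, word-level layouts.

```python
class RankBitVector:
    """Bitvector with rank support via precomputed per-block prefix counts.

    Hypothetical illustration only: block_size trades extra space for
    shorter in-block scans.
    """

    def __init__(self, bits, block_size=64):
        self.bits = [1 if b else 0 for b in bits]
        self.block_size = block_size
        # counts[k] = number of 1 bits in bits[0 : k * block_size]
        self.counts = [0]
        for start in range(0, len(self.bits), block_size):
            block = self.bits[start:start + block_size]
            self.counts.append(self.counts[-1] + sum(block))

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("rank position out of range")
        k = i // self.block_size
        start = k * self.block_size
        # One table lookup plus a scan of at most one partial block.
        return self.counts[k] + sum(self.bits[start:i])

    def rank0(self, i):
        """Return the number of 0 bits in positions [0, i)."""
        return i - self.rank1(i)


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
print(bv.rank1(4), bv.rank1(10), bv.rank0(10))  # 3 6 4
```

Larger blocks shrink the count table but lengthen the in-block scan; composable libraries let the user pick such trade-offs per component.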

291 citations

Posted Content

Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling

Takuya Akiba¹, Yoichi Iwata¹, Yuichi Yoshida²

University of Tokyo¹, National Institute of Informatics²

17 Apr 2013-arXiv: Data Structures and Algorithms

TL;DR: This work proposes a new exact method for shortest-path distance queries on large-scale networks that can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods.

Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Although this seems too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, pruning surprisingly reduces the search space and the sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously by exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with query times comparable to those of previous methods.
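
The pruned labeling idea can be sketched in Python for unweighted, undirected graphs. This is a minimal illustration under stated assumptions: it omits the bit-parallel searches, stores labels as dictionaries, and takes the vertex processing order from the caller (here, high degree first). Function names are hypothetical.

```python
from collections import deque

INF = float("inf")


def pll_query(labels, u, v):
    """Distance via common hubs: min over h of d(u, h) + d(h, v)."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu
    best = INF
    for hub, d in lu.items():
        dv = lv.get(hub)
        if dv is not None and d + dv < best:
            best = d + dv
    return best


def build_pll(adj, order):
    """Build 2-hop distance labels with pruned breadth-first searches.

    adj: list of neighbor lists (unweighted, undirected graph).
    order: all vertices, in the order in which BFS roots are processed.
    """
    labels = [dict() for _ in adj]  # labels[v][hub] = dist(hub, v)
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: labels from earlier roots already certify a distance <= d,
            # so u gets no entry for this root and is not expanded.
            if pll_query(labels, root, u) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels


# Cycle 1-2-3-4-1, pendant vertex 0 attached to 1, isolated vertex 5.
adj = [[1], [0, 2, 4], [1, 3], [2, 4], [1, 3], []]
order = sorted(range(len(adj)), key=lambda v: -len(adj[v]))  # degree order
labels = build_pll(adj, order)
print(pll_query(labels, 0, 3), pll_query(labels, 0, 5))  # 3 inf
```

The vertex order matters for label size, not correctness: roots processed early act as hubs that prune later searches.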

278 citations

Posted Content

Navigating Central Path with Electrical Flows: from Flows to Matchings, and Back

Aleksander Madry¹

École Polytechnique Fédérale de Lausanne¹

08 Jul 2013-arXiv: Data Structures and Algorithms

TL;DR: The algorithm builds on a deeper understanding of interior-point methods, a powerful tool in convex optimization, in the context of flow problems, and exploits a certain interplay between maximum flows and bipartite matchings.

Abstract: We present an $\tilde{O}(m^{10/7})=\tilde{O}(m^{1.43})$-time algorithm for the maximum s-t flow and the minimum s-t cut problems in directed graphs with unit capacities. This is the first improvement over the sparse-graph case of the long-standing $O(m \min(\sqrt{m},n^{2/3}))$ time bound due to Even and Tarjan [EvenT75]. By well-known reductions, this also establishes an $\tilde{O}(m^{10/7})$-time algorithm for the maximum-cardinality bipartite matching problem. That, in turn, gives an improvement over the celebrated $O(m \sqrt{n})$ time bound of Hopcroft and Karp [HK73] whenever the input graph is sufficiently sparse.

191 citations

Posted Content

An efficient reconciliation algorithm for social networks

Nitish Korula¹, Silvio Lattanzi¹

Google¹

05 Jul 2013-arXiv: Data Structures and Algorithms

TL;DR: A small fraction of individuals explicitly link their accounts across multiple online social networks (e.g., Facebook, Twitter, Google+, LinkedIn); this work leverages these connections to identify a very large fraction of the network.

Abstract: People today typically use multiple online social networks (Facebook, Twitter, Google+, LinkedIn, etc.). Each online network represents a subset of their "real" ego-networks. An interesting and challenging problem is to reconcile these online networks, that is, to identify all the accounts belonging to the same individual. Besides providing a richer understanding of social dynamics, the problem has a number of practical applications. At first sight, this problem appears algorithmically challenging. Fortunately, a small fraction of individuals explicitly link their accounts across multiple networks; our work leverages these connections to identify a very large fraction of the network. Our main contributions are to mathematically formalize the problem for the first time, and to design a simple, local, and efficient parallel algorithm to solve it. We are able to prove strong theoretical guarantees on the algorithm's performance on well-established network models (Random Graphs, Preferential Attachment). We also experimentally confirm the effectiveness of the algorithm on synthetic and real social network data sets.
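
To make "leveraging explicitly linked accounts" concrete, here is a simplified Python sketch of witness-based matching. It is an assumption-laden illustration, not necessarily the paper's exact procedure: a candidate pair (u, v) earns one witness for every already-matched pair (a, b) with a adjacent to u and b adjacent to v, and a pair is matched only if it clears a threshold and each endpoint is the other's best candidate. The threshold value and all names are hypothetical.

```python
from collections import defaultdict


def reconcile(adj1, adj2, seeds, threshold=2, max_rounds=10):
    """Extend a seed matching between two networks, round by round.

    adj1, adj2: dict mapping each node to the set of its neighbors.
    seeds: dict mapping nodes of network 1 to their known accounts in network 2.
    """
    matched = dict(seeds)                          # network 1 -> network 2
    matched_rev = {v: u for u, v in matched.items()}
    for _ in range(max_rounds):
        # Count witnesses: matched pairs (a, b) with a ~ u in network 1
        # and b ~ v in network 2, for unmatched candidates u and v.
        scores = defaultdict(int)
        for a, b in matched.items():
            for u in adj1.get(a, ()):
                if u in matched:
                    continue
                for v in adj2.get(b, ()):
                    if v not in matched_rev:
                        scores[(u, v)] += 1
        # Best candidate for each side, among pairs clearing the threshold.
        best1, best2 = {}, {}
        for (u, v), s in scores.items():
            if s < threshold:
                continue
            if s > best1.get(u, (0, None))[0]:
                best1[u] = (s, v)
            if s > best2.get(v, (0, None))[0]:
                best2[v] = (s, u)
        # Accept only mutual best candidates.
        new_pairs = [(u, v) for u, (_, v) in best1.items() if best2[v][1] == u]
        if not new_pairs:
            break
        for u, v in new_pairs:
            matched[u] = v
            matched_rev[v] = u
    return matched


# Two copies of the same small network; copy 2 renames node i to "xi".
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 0), (4, 1), (5, 1), (5, 2), (5, 3)]
adj1, adj2 = defaultdict(set), defaultdict(set)
for a, b in edges:
    adj1[a].add(b)
    adj1[b].add(a)
    adj2[f"x{a}"].add(f"x{b}")
    adj2[f"x{b}"].add(f"x{a}")
result = reconcile(adj1, adj2, seeds={0: "x0", 1: "x1", 2: "x2"})
print(sorted(result.items()))  # every node i ends up matched to "xi"
```

Requiring mutual best candidates and a minimum witness count is what keeps early rounds from committing to ambiguous pairs; those are resolved later, once more neighbors are matched.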

189 citations

Posted Content

A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time

Jonathan A. Kelner¹, Lorenzo Orecchia¹, Aaron Sidford¹, Zeyuan Allen-Zhu¹

Massachusetts Institute of Technology¹

28 Jan 2013-arXiv: Data Structures and Algorithms

TL;DR: A simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time and has the fastest known running time under the standard unit-cost RAM model.

Abstract: In this paper, we present a simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time. It uses very little of the machinery that previously appeared to be necessary for such an algorithm. It does not require recursive preconditioning, spectral sparsification, or even the Chebyshev Method or Conjugate Gradient. After constructing a "nice" spanning tree of a graph associated with the linear system, the entire algorithm consists of the repeated application of a simple (non-recursive) update rule, which it implements using a lightweight data structure. The algorithm is numerically stable and can be implemented without the increased bit-precision required by previous solvers. As such, the algorithm has the fastest known running time under the standard unit-cost RAM model. We hope that the simplicity of the algorithm and the insights yielded by its analysis will be useful in both theory and practice.
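
The update rule can be sketched in Python for a Laplacian system $Lx = b$ of a connected graph with edge resistances. This is a simplified illustration under stated assumptions: it uses a BFS spanning tree instead of a low-stretch tree, sweeps the off-tree edges in a fixed cyclic order instead of sampling them with stretch-proportional probabilities, and walks each tree path explicitly instead of using a lightweight data structure. Each update cancels the potential drop around one tree cycle while preserving the flow's demands; potentials are then read off the tree. The function name is hypothetical.

```python
from collections import deque


def solve_laplacian_cycles(n, edges, resistances, b, sweeps=200):
    """Approximately solve L x = b for a connected graph via tree-cycle updates.

    edges[i] = (u, v) with resistance resistances[i]; flow f[i] > 0 runs u -> v.
    b[u] is the net flow leaving vertex u (entries must sum to zero).
    """
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))

    # BFS spanning tree rooted at vertex 0.
    parent, parent_edge, depth = [-1] * n, [-1] * n, [0] * n
    order, seen, queue = [0], [False] * n, deque([0])
    seen[0] = True
    while queue:
        u = queue.popleft()
        for v, i in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v], parent_edge[v], depth[v] = u, i, depth[u] + 1
                order.append(v)
                queue.append(v)

    def sign(i, frm):
        # +1 if traversing edge i starting at frm follows its stored orientation.
        return 1.0 if edges[i][0] == frm else -1.0

    # Initial flow: route the demands along the tree, deepest vertices first.
    f = [0.0] * len(edges)
    subtree = [float(x) for x in b]
    for v in reversed(order[1:]):
        p, i = parent[v], parent_edge[v]
        f[i] = -subtree[v] * sign(i, p)   # flow p -> v = -(demand in subtree of v)
        subtree[p] += subtree[v]

    # Each off-tree edge (u, v) closes the cycle u -> v -> (tree path) -> u.
    tree_edges = set(parent_edge[1:])
    cycles = []
    for i, (u, v) in enumerate(edges):
        if i in tree_edges:
            continue
        up, down, a, c = [], [], v, u
        while depth[a] > depth[c]:
            up.append((parent_edge[a], a))
            a = parent[a]
        while depth[c] > depth[a]:
            down.append((parent_edge[c], parent[c]))
            c = parent[c]
        while a != c:
            up.append((parent_edge[a], a))
            a = parent[a]
            down.append((parent_edge[c], parent[c]))
            c = parent[c]
        steps = [(i, u)] + up + down[::-1]
        cycles.append([(j, sign(j, frm)) for j, frm in steps])

    for _ in range(sweeps):
        for cyc in cycles:
            drop = sum(s * resistances[j] * f[j] for j, s in cyc)
            total_r = sum(resistances[j] for j, _ in cyc)
            for j, s in cyc:  # push a circulation that zeroes the cycle's drop
                f[j] -= s * drop / total_r

    # Potentials from tree edges: x[v] = x[p] - r * (flow p -> v); then center.
    x = [0.0] * n
    for v in order[1:]:
        p, i = parent[v], parent_edge[v]
        x[v] = x[p] - resistances[i] * f[i] * sign(i, p)
    mean = sum(x) / n
    return [xi - mean for xi in x]


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
resistances = [1.0, 2.0, 1.0, 1.0, 0.5]
b = [1.0, 0.0, 0.0, -1.0]
x = solve_laplacian_cycles(4, edges, resistances, b)
```

Because every update adds a circulation, flow conservation holds exactly throughout; only the cycle conditions (zero potential drop) are approached iteratively.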

177 citations

Posted Content

Optimal Algorithms for Testing Closeness of Discrete Distributions

Siu On Chan¹, Ilias Diakonikolas², Gregory Valiant³, Paul Valiant⁴

Microsoft¹, University of Edinburgh², Stanford University³, Brown University⁴

19 Aug 2013-arXiv: Data Structures and Algorithms

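TL;DR: This work presents simple testers for the closeness of two discrete distributions in both $\ell_1$ and $\ell_2$ distance, with sample complexity that is information-theoretically optimal up to constant factors; for $\ell_1$ testing, the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2\})$.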

Abstract: We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions $p$ and $q$ over an $n$-element set, we wish to distinguish whether $p=q$ versus $p$ is at least $\epsilon$-far from $q$, in either $\ell_1$ or $\ell_2$ distance. Batu et al. gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant up to a logarithmic factor in $n$, and a polynomial factor of $\epsilon$. In this work, we present simple (and new) testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal, to constant factors, both in the dependence on $n$, and the dependence on $\epsilon$; for the $\ell_1$ testing problem we establish that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2 \})$.
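
A statistic of the kind used for $\ell_2$ closeness testing can be sketched in a few lines of Python. Assume Poissonized sampling: the count of element $i$ is $X_i \sim \mathrm{Poi}(m p_i)$ from $p$ and $Y_i \sim \mathrm{Poi}(m q_i)$ from $q$, all independent. Then each term $(X_i - Y_i)^2 - X_i - Y_i$ has expectation $m^2 (p_i - q_i)^2$, so the sum is an unbiased estimate of $m^2 \lVert p - q \rVert_2^2$. This is an illustrative sketch; the paper's exact statistics, normalizations, and acceptance thresholds are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def l2_statistic(x_counts, y_counts):
    """Sum over elements of (X_i - Y_i)^2 - X_i - Y_i."""
    x = np.asarray(x_counts, dtype=float)
    y = np.asarray(y_counts, dtype=float)
    return float(np.sum((x - y) ** 2 - x - y))


def estimate_l2_distance_sq(p, q, m, seed=0):
    """Estimate ||p - q||_2^2 from Poissonized samples of expected size m."""
    rng = np.random.default_rng(seed)
    x = rng.poisson(m * np.asarray(p, dtype=float))
    y = rng.poisson(m * np.asarray(q, dtype=float))
    return l2_statistic(x, y) / m ** 2


# True squared distance here is 0.3**2 + 0.3**2 = 0.18.
print(estimate_l2_distance_sq([0.5, 0.5], [0.8, 0.2], m=10_000))
```

Subtracting $X_i + Y_i$ removes the Poisson variance from $(X_i - Y_i)^2$, which is why the statistic is centered at zero when $p = q$.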

148 citations

Posted Content

Faster all-pairs shortest paths via circuit complexity

Ryan Williams

23 Dec 2013-arXiv: Data Structures and Algorithms

TL;DR: In this paper, a randomized algorithm for computing the min-plus product (a.k.a., tropical product) of two $n \times n$ matrices was presented, yielding a faster algorithm for solving the all-pairs shortest path problem (APSP) in dense $n$-node directed graphs with arbitrary edge weights.

Abstract: We present a new randomized method for computing the min-plus product (a.k.a., tropical product) of two $n \times n$ matrices, yielding a faster algorithm for solving the all-pairs shortest path problem (APSP) in dense $n$-node directed graphs with arbitrary edge weights. On the real RAM, where additions and comparisons of reals are unit cost (but all other operations have typical logarithmic cost), the algorithm runs in time \[\frac{n^3}{2^{\Omega(\log n)^{1/2}}}\] and is correct with high probability. On the word RAM, the algorithm runs in $n^3/2^{\Omega(\log n)^{1/2}} + n^{2+o(1)}\log M$ time for edge weights in $([0,M] \cap {\mathbb Z})\cup\{\infty\}$. Prior algorithms used either $n^3/(\log^c n)$ time for various $c \leq 2$, or $O(M^{\alpha}n^{\beta})$ time for various $\alpha > 0$ and $\beta > 2$. The new algorithm applies a tool from circuit complexity, namely the Razborov-Smolensky polynomials for approximately representing ${\sf AC}^0[p]$ circuits, to efficiently reduce a matrix product over the $(\min,+)$ algebra to a relatively small number of rectangular matrix products over ${\mathbb F}_2$, each of which is computable using a particularly efficient method due to Coppersmith. We also give a deterministic version of the algorithm running in $n^3/2^{\log^{\delta} n}$ time for some $\delta > 0$, which utilizes the Yao-Beigel-Tarui translation of ${\sf AC}^0[m]$ circuits into "nice" depth-two circuits.
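
For reference, the object being accelerated can be written down directly. The Python sketch below computes the min-plus product $C_{ij} = \min_k (A_{ik} + B_{kj})$ by brute force, $O(n^3)$ time per product, and uses repeated squaring to obtain all-pairs distances, assuming no negative cycles. It illustrates the problem only; it is not the randomized circuit-complexity method described above, and the function names are hypothetical.

```python
INF = float("inf")


def min_plus(A, B):
    """Min-plus (tropical) product: C[i][j] = min_k A[i][k] + B[k][j]."""
    inner, cols = len(B), len(B[0])
    return [[min(row[k] + B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def apsp(W):
    """All-pairs distances from a weight matrix W (W[i][i] = 0, INF = no edge).

    Assumes no negative cycles. After t squarings, D covers all paths with
    at most 2**t edges, so ceil(log2(n - 1)) squarings suffice.
    """
    n = len(W)
    D = [row[:] for row in W]
    covered = 1
    while covered < n - 1:
        D = min_plus(D, D)
        covered *= 2
    return D


W = [[0, 3, INF, 10],
     [INF, 0, -2, INF],
     [INF, INF, 0, 4],
     [1, INF, INF, 0]]
D = apsp(W)
print(D[0][3], D[3][2])  # 5 2
```

Since APSP reduces to $O(\log n)$ min-plus products this way, any speedup for the product transfers to APSP.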

143 citations

Posted Content

The Noisy Power Method: A Meta Algorithm with Applications

Moritz Hardt¹, Eric Price¹

IBM¹

11 Nov 2013-arXiv: Data Structures and Algorithms

TL;DR: This work provides a new robust convergence analysis of the noisy power method, a variant of the well-known power method for computing the dominant singular vectors of a matrix in which noise is introduced after each matrix-vector multiplication, and shows that the algorithm's error dependence on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix.

Abstract: We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix that we call the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount of noise is introduced after each matrix-vector multiplication. The noisy power method can be seen as a meta-algorithm that has recently found a number of important applications in a broad range of machine learning problems including alternating minimization for matrix completion, streaming principal component analysis (PCA), and privacy-preserving spectral analysis. Our general analysis subsumes several existing ad-hoc convergence bounds and resolves a number of open problems in multiple applications including streaming PCA and privacy-preserving singular vector computation.
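
The iteration being analyzed is short enough to state as code. The NumPy sketch below runs block power iteration in which Gaussian noise of a chosen scale is added after each multiplication and the iterate is re-orthonormalized with a QR decomposition. The Gaussian noise model, the parameter names, and the test matrix are assumptions for illustration; in the applications above, the noise would come from sources such as streaming updates or privacy mechanisms.

```python
import numpy as np


def noisy_power_method(A, k, iters, noise_scale, seed=0):
    """Block power iteration with additive Gaussian noise after each product."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random orthonormal start
    for _ in range(iters):
        Y = A @ X + noise_scale * rng.standard_normal((d, k))
        X, _ = np.linalg.qr(Y)                          # re-orthonormalize
    return X


def sin_theta(U, X):
    """Sine of the largest principal angle between span(U) and span(X)."""
    return np.linalg.norm(U - X @ (X.T @ U), 2)


# Symmetric test matrix with eigenvalues 10, 9, then a gap down to 1, 1/2, ...
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
eigs = np.array([10.0, 9.0] + [1.0 / (i + 1) for i in range(18)])
A = Q @ np.diag(eigs) @ Q.T
X = noisy_power_method(A, k=2, iters=50, noise_scale=1e-3)
print(sin_theta(Q[:, :2], X))  # small when the noise is small relative to the gap
```

The final accuracy is governed by the noise level relative to the eigengap between the $k$-th and $(k+1)$-th eigenvalues, which is the trade-off a convergence analysis of this kind quantifies.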

137 citations

Posted Content

Parallel Algorithms for Geometric Graph Problems

Alexandr Andoni¹, Aleksandar Nikolov², Krzysztof Onak³, Grigory Yaroslavtsev⁴

Microsoft¹, Rutgers University², IBM³, Brown University⁴

30 Dec 2013-arXiv: Data Structures and Algorithms

TL;DR: The authors develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem, and that has implications beyond the MapReduce model.

Abstract: We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

135 citations

Posted Content

Nearly Maximum Flows in Nearly Linear Time

Jonah Sherman¹

University of California, Berkeley¹

07 Apr 2013-arXiv: Data Structures and Algorithms

TL;DR: This work introduces a new approach to the maximum flow problem in undirected, capacitated graphs using congestion-approximators: easy-to-compute functions that approximate the congestion required to route single-commodity demands in a graph to within some factor α.

Abstract: We introduce a new approach to the maximum flow problem in undirected, capacitated graphs using $\alpha$-\emph{congestion-approximators}: easy-to-compute functions that approximate the congestion required to route single-commodity demands in a graph to within a factor of $\alpha$. Our algorithm maintains an arbitrary flow that may have some residual excess and deficits, while taking steps to minimize a potential function measuring the congestion of the current flow plus an over-estimate of the congestion required to route the residual demand. Since the residual term over-estimates, the descent process gradually moves the contribution to our potential function from the residual term to the congestion term, eventually achieving a flow routing the desired demands with nearly minimal congestion after $\tilde{O}(\alpha\epsilon^{-2}\log^2 n)$ iterations. Our approach is similar in spirit to that used by Spielman and Teng (STOC 2004) for solving Laplacian systems, and we summarize our approach as trying to do for $\ell_\infty$-flows what they do for $\ell_2$-flows. Together with a nearly linear time construction of an $n^{o(1)}$-congestion-approximator, we obtain $(1+\epsilon)$-optimal single-commodity flows in undirected graphs in time $m^{1+o(1)}\epsilon^{-2}$, yielding the fastest known algorithm for that problem. Our requirements of a congestion-approximator are quite low, suggesting even faster and simpler algorithms for certain classes of graphs. For example, an $\alpha$-competitive oblivious routing tree meets our definition, \emph{even without knowing how to route the tree back in the graph}. For graphs of conductance $\phi$, a trivial $\phi^{-1}$-congestion-approximator gives an extremely simple algorithm for finding $(1+\epsilon)$-optimal flows in time $\tilde{O}(m\phi^{-1})$.

132 citations

Posted Content

Beyond Locality-Sensitive Hashing

Alexandr Andoni¹, Piotr Indyk², Huy Nguyen³, Ilya Razenshteyn²

Microsoft¹, Massachusetts Institute of Technology², Princeton University³

06 Jun 2013-arXiv: Data Structures and Algorithms

TL;DR: By a standard reduction, a new data structure is presented for the Hamming space and the $\ell_1$ norm with $\rho \le 7/(8c) + O(1/c^{3/2}) + o(1)$, which is the first improvement over the result of Indyk and Motwani (STOC 1998).

Abstract: We present a new data structure for the $c$-approximate near neighbor problem (ANN) in the Euclidean space. For $n$ points in $\mathbb{R}^d$, our algorithm achieves $O(n^{\rho} + d \log n)$ query time and $O(n^{1+\rho} + d \log n)$ space, where $\rho \le 7/(8c^2) + O(1/c^3) + o(1)$. This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and the $\ell_1$ norm with $\rho \le 7/(8c) + O(1/c^{3/2}) + o(1)$, which is the first improvement over the result of Indyk and Motwani (STOC 1998).
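
For context, the Indyk and Motwani baseline for the Hamming space uses bit-sampling locality-sensitive hashing. The Python sketch below illustrates that baseline only, not the new data structure: each table keys a point by a random subset of its coordinates, and a query checks the candidates that collide with it in any table. Because hashing is randomized, a query can miss a near neighbor with some probability. Class names and parameter values are assumptions.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """Bit-sampling LSH for near-neighbor queries in Hamming space."""

    def __init__(self, dim, bits_per_key, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table keys a point by a fixed random subset of its coordinates.
        self.coords = [rng.sample(range(dim), bits_per_key) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, t, x):
        return tuple(x[i] for i in self.coords[t])

    def insert(self, x):
        idx = len(self.points)
        self.points.append(tuple(x))
        for t, table in enumerate(self.tables):
            table[self._key(t, x)].append(idx)

    def query(self, q, radius):
        """Return some stored point within Hamming distance `radius` of q, or None."""
        for t, table in enumerate(self.tables):
            for idx in table.get(self._key(t, q), []):
                p = self.points[idx]
                if sum(a != b for a, b in zip(p, q)) <= radius:
                    return p
        return None


def bits(value, dim=32):
    return [(value >> j) & 1 for j in range(dim)]


index = BitSamplingLSH(dim=32, bits_per_key=8, num_tables=10)
for v in range(100):
    index.insert(bits(v))
print(index.query(bits(37), radius=0) == tuple(bits(37)))  # True
```

Longer keys make far points collide less often, while more tables raise the chance that a near point collides at least once; the exponent $\rho$ captures this trade-off.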

Posted Content

Optimal approximation for submodular and supermodular optimization with bounded curvature

Maxim Sviridenko, Jan Vondrák, Justin Ward

19 Nov 2013-arXiv: Data Structures and Algorithms

TL;DR: It is shown how to reduce both submodular maximization and supermodular minimization to this general problem when the objective function has bounded total curvature, and it is proved that the approximation results obtained are the best possible in the value oracle model.

Abstract: We design new approximation algorithms for the problems of optimizing submodular and supermodular functions subject to a single matroid constraint. Specifically, we consider the case in which we wish to maximize a nondecreasing submodular function or minimize a nonincreasing supermodular function in the setting of bounded total curvature $c$. In the case of submodular maximization with curvature $c$, we obtain a $(1-c/e)$-approximation --- the first improvement over the greedy $(1-e^{-c})/c$-approximation of Conforti and Cornuéjols from 1984, which holds for a cardinality constraint, as well as recent approaches that hold for an arbitrary matroid constraint. Our approach is based on modifications of the continuous greedy algorithm and non-oblivious local search, and allows us to approximately maximize the sum of a nonnegative, nondecreasing submodular function and a (possibly negative) linear function. We show how to reduce both submodular maximization and supermodular minimization to this general problem when the objective function has bounded total curvature. We prove that the approximation results we obtain are the best possible in the value oracle model, even in the case of a cardinality constraint. We define an extension of the notion of curvature to general monotone set functions and show a $(1-c)$-approximation for maximization and a $1/(1-c)$-approximation for minimization cases. Finally, we give two concrete applications of our results in the settings of maximum entropy sampling and the column-subset selection problem.
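
Total curvature is easy to compute by brute force for small examples. Using the standard definition $c = 1 - \min_{j} f(j \mid V \setminus \{j\}) / f(\{j\})$ for a monotone submodular $f$ with $f(\emptyset) = 0$ (skipping elements with $f(\{j\}) = 0$), the Python sketch below evaluates it for a toy coverage function. The example function and names are assumptions for illustration.

```python
def total_curvature(f, ground):
    """c = 1 - min_j [f(V) - f(V - {j})] / f({j}) over j with f({j}) > 0."""
    V = frozenset(ground)
    f_V = f(V)
    ratios = []
    for j in V:
        singleton = f(frozenset({j}))
        if singleton > 0:
            ratios.append((f_V - f(V - {j})) / singleton)
    return 1.0 - min(ratios)


# Toy coverage function: f(S) = number of items covered by the sets in S.
SETS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}


def coverage(S):
    return len(set().union(*(SETS[j] for j in S)))


c = total_curvature(coverage, SETS)
print(c)  # 0.5
```

With $c = 0.5$, the $(1 - c/e)$ guarantee is about $0.816$, compared with about $0.787$ for the greedy bound $(1 - e^{-c})/c$.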

Posted Content

Smoothed Analysis of Tensor Decompositions

Aditya Bhaskara¹, Moses Charikar², Ankur Moitra³, Aravindan Vijayaraghavan⁴

Google¹, Princeton University², Massachusetts Institute of Technology³, Carnegie Mellon University⁴

14 Nov 2013-arXiv: Data Structures and Algorithms

TL;DR: The authors introduce a smoothed analysis model for tensor decomposition and develop an efficient algorithm for the highly overcomplete case (rank polynomial in the dimension), which they apply to learning generative models such as multi-view models and mixtures of axis-aligned Gaussians.

Abstract: Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges and tensor analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error -- a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, our main contribution is in analyzing their stability in the framework of smoothed analysis. Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e. the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this is an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.
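
The robust linear independence statement can be probed numerically. The NumPy sketch below stacks the vectors $u_i \otimes v_i$ as columns and reports the smallest singular value, first for an adversarially degenerate instance (every $u_i = v_i = e_1$) and then for the same instance after a small random Gaussian perturbation. The dimensions, perturbation size, Gaussian perturbation model, and function name are assumptions; this illustrates the quantity the result controls, not the proof or the decomposition algorithm.

```python
import numpy as np


def min_singular_value_of_products(U, V):
    """Smallest singular value of the matrix with columns kron(U[:, i], V[:, i])."""
    M = np.stack([np.kron(U[:, i], V[:, i]) for i in range(U.shape[1])], axis=1)
    return np.linalg.svd(M, compute_uv=False).min()


n, m = 5, 15                       # m > n components, but m <= n**2
base = np.zeros((n, m))
base[0, :] = 1.0                   # degenerate: every component equals e_1

rng = np.random.default_rng(0)
U = base + 0.1 * rng.standard_normal((n, m))
V = base + 0.1 * rng.standard_normal((n, m))

print(min_singular_value_of_products(base, base))  # ~0: the matrix has rank one
print(min_singular_value_of_products(U, V))        # bounded away from zero
```

Even though there are more components than dimensions, the perturbed tensor products span a 15-dimensional space, which is what makes overcomplete decomposition stable in the smoothed setting.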

Posted Content

Primal Beats Dual on Online Packing LPs in the Random-Order Model

Thomas Kesselheim, Klaus Radke, Andreas Tönnis, Berthold Vöcking

11 Nov 2013-arXiv: Data Structures and Algorithms

TL;DR: The authors study packing LPs in an online model where columns arrive in random order and give a $1-O(\sqrt{(\log d)/B})$-competitive algorithm that, instead of using dual prices, solves a scaled version of the partially known primal program for each request; they also obtain an incentive-compatible $(1-\epsilon)$-competitive mechanism and the first $O(1)$-competitive online algorithm for the generalized assignment problem.

Abstract: We study packing LPs in an online model where the columns are presented to the algorithm in random order. This natural problem was investigated in various recent studies motivated, e.g., by online ad allocations and yield management where rows correspond to resources and columns to requests specifying demands for resources. Our main contribution is a $1-O(\sqrt{(\log{d})/B})$-competitive online algorithm, where $d$ denotes the column sparsity, i.e., the maximum number of resources that occur in a single column, and $B$ denotes the capacity ratio, i.e., the ratio between the capacity of a resource and the maximum demand for this resource. In other words, we achieve a $(1 - \epsilon)$-approximation if the capacity ratio satisfies $B=\Omega((\log d)/\epsilon^2)$, which is known to be best possible for any (randomized) online algorithm. Our result improves exponentially on previous work with respect to the capacity ratio. In contrast to existing results on packing LP problems, our algorithm does not use dual prices to guide the allocation of resources. Instead, it simply solves, for each request, a scaled version of the partially known primal program and randomly rounds the obtained fractional solution to obtain an integral allocation for this request. We show that this simple algorithmic technique is not restricted to packing LPs with large capacity ratio: We prove an upper bound on the competitive ratio of $\Omega(d^{-1/(B-1)})$, for any $B \ge 2$. In addition, we show that our approach can be combined with VCG payments and obtain an incentive compatible $(1-\epsilon)$-competitive mechanism for packing LPs with $B=\Omega((\log m)/\epsilon^2)$, where $m$ is the number of constraints. Finally, we apply our technique to the generalized assignment problem for which we obtain the first online algorithm with competitive ratio $O(1)$.

Posted Content

Shortest Path and Distance Queries on Road Networks: Towards Bridging Theory and Practice

Andy Diwen Zhu¹, Hui Ma¹, Xiaokui Xiao¹, Siqiang Luo², Youze Tang¹, Shuigeng Zhou²

Nanyang Technological University¹, Fudan University²

09 Apr 2013-arXiv: Data Structures and Algorithms

TL;DR: Arterial Hierarchy is presented, an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks and outperforms the state of the art in terms of query time and space and pre-computation overheads.

Abstract: Given two locations $s$ and $t$ in a road network, a distance query returns the minimum network distance from $s$ to $t$, while a shortest path query computes the actual route that achieves the minimum distance. These two types of queries find important applications in practice, and a plethora of solutions have been proposed in the past few decades. The existing solutions, however, are optimized for either practical or asymptotic performance, but not both. In particular, the techniques with enhanced practical efficiency are mostly heuristic-based, and they offer unattractive worst-case guarantees in terms of space and time. On the other hand, the methods that are worst-case efficient often entail prohibitive preprocessing or space overheads, which render them inapplicable for the large road networks (with millions of nodes) commonly used in modern map applications. This paper presents Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks. On the theoretical side, we show that, under a realistic assumption, AH answers any distance query in $\tilde{O}(\log \alpha)$ time, where $\alpha = d_{max}/d_{min}$, and $d_{max}$ (resp. $d_{min}$) is the largest (resp. smallest) $L_\infty$ distance between any two nodes in the road network. In addition, any shortest path query can be answered in $\tilde{O}(k + \log \alpha)$ time, where $k$ is the number of nodes on the shortest path. On the practical side, we experimentally evaluate AH on a large set of real road networks with up to twenty million nodes, and we demonstrate that (i) AH outperforms the state of the art in terms of query time, and (ii) its space and pre-computation overheads are moderate.

Posted Content

Efficient Computation of Representative Sets with Applications in Parameterized and Exact Algorithms

Fedor V. Fomin, Daniel Lokshtanov, Fahad Panolan, Saket Saurabh

16 Apr 2013-arXiv: Data Structures and Algorithms

TL;DR: The authors give two algorithms for computing representative families of linear and uniform matroids and use them to design single-exponential parameterized and exact exponential-time algorithms, with applications including Longest Directed Cycle, Minimum Equivalent Graph, algorithms on graphs of bounded treewidth, and k-Subgraph Isomorphism when the k-vertex pattern graph has constant treewidth.

Abstract: We give two algorithms computing representative families of linear and uniform matroids and demonstrate how to use representative families for designing single-exponential parameterized and exact exponential time algorithms. The applications of our approach include:
- LONGEST DIRECTED CYCLE
- MINIMUM EQUIVALENT GRAPH (MEG)
- Algorithms on graphs of bounded treewidth
- k-PATH, k-TREE, and more generally, k-SUBGRAPH ISOMORPHISM, where the k-vertex pattern graph is of constant treewidth.

Posted Content

Fast Semidifferential-based Submodular Function Optimization

Rishabh Iyer¹, Stefanie Jegelka², Jeff A. Bilmes¹

University of Washington¹, University of California, Berkeley²

05 Aug 2013-arXiv: Data Structures and Algorithms

TL;DR: In this article, a framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and super-differentials) is presented.

Abstract: We present a practical and powerful new framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and super-differentials). The resulting algorithms, which repeatedly compute and then efficiently optimize submodular semigradients, offer new methods and generalize many old ones for submodular optimization. Our approach, moreover, takes steps towards providing a unifying paradigm applicable to both submodular minimization and maximization, problems that historically have been treated quite distinctly. The practicality of our algorithms is important since interest in submodularity, owing to its natural and wide applicability, has recently been in ascendance within machine learning. We analyze theoretical properties of our algorithms for minimization and maximization, and show that many state-of-the-art maximization algorithms are special cases. Lastly, we complement our theoretical analyses with supporting empirical experiments.
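
One concrete instance of optimizing via semigradients is minimization by repeatedly minimizing a modular upper bound. The Python sketch below assumes $f$ is submodular and uses the classical bound $f(Y) \le f(X) - \sum_{j \in X \setminus Y} f(j \mid X \setminus \{j\}) + \sum_{j \in Y \setminus X} f(j \mid \emptyset)$. Minimizing the right-hand side over $Y$ simply drops elements with positive $f(j \mid X \setminus \{j\})$ and adds elements with negative $f(j \mid \emptyset)$, so each step strictly decreases $f$ until it stops at a local optimum. The function names and the toy objective are hypothetical, and the paper's algorithms may use different semigradients.

```python
def minimize_via_modular_upper_bound(f, ground, start, max_iters=100):
    """Descend on a submodular f by minimizing a modular upper bound at X."""
    V = frozenset(ground)
    X = frozenset(start)
    f_empty = f(frozenset())
    for _ in range(max_iters):
        f_X = f(X)
        # Dropping j lowers the bound by f(j | X - {j}); do it when that is positive.
        drop = {j for j in X if f_X - f(X - {j}) > 0}
        # Adding j raises the bound by f(j | empty set); do it when that is negative.
        add = {j for j in V - X if f(frozenset({j})) - f_empty < 0}
        Y = (X - drop) | add
        if Y == X:
            break                      # local optimum for this bound
        X = Y
    return X


# Toy submodular objective: concave in |S| plus a modular reward on {0, 1, 2}.
def toy(S):
    return min(len(S), 2) - 0.8 * len(S & {0, 1, 2})


result = minimize_via_modular_upper_bound(toy, range(5), start={3})
print(sorted(result), toy(result))  # [] 0.0
```

Here the descent reaches the local minimum at the empty set, while the global minimum is at $\{0, 1, 2\}$; different starting points or semigradients can lead to different local optima.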

Posted Content

Active Self-Assembly of Algorithmic Shapes and Patterns in Polylogarithmic Time

Damien Woods¹, Ho-Lin Chen¹, Scott Goodfriend¹, Nadine L. Dabby¹, Erik Winfree¹, Peng Yin¹

California Institute of Technology¹

11 Jan 2013-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors describe a computational model for studying the complexity of self-assembled structures with active molecular components, inspired by biology's ability to assemble biomolecules that form systems with complicated structure and dynamics.

Abstract: We describe a computational model for studying the complexity of self-assembled structures with active molecular components. Our model captures notions of growth and movement ubiquitous in biological systems. The model is inspired by biology's fantastic ability to assemble biomolecules that form systems with complicated structure and dynamics, from molecular motors that walk on rigid tracks and proteins that dynamically alter the structure of the cell during mitosis, to embryonic development where large-scale complicated organisms efficiently grow from a single cell. Using this active self-assembly model, we show how to efficiently self-assemble shapes and patterns from simple monomers. For example, we show how to grow a line of monomers in time and number of monomer states that is merely logarithmic in the length of the line. Our main results show how to grow arbitrary connected two-dimensional geometric shapes and patterns in expected time that is polylogarithmic in the size of the shape, plus roughly the time required to run a Turing machine deciding whether or not a given pixel is in the shape. We do this while keeping the number of monomer types logarithmic in shape size, plus those monomers required by the Kolmogorov complexity of the shape or pattern. This work thus highlights the efficiency advantages of active self-assembly over passive self-assembly and motivates experimental effort to construct general-purpose active molecular self-assembly systems.

Posted Content•

Submodular Maximization Meets Streaming: Matchings, Matroids, and More

Amit Chakrabarti¹, Sagar Kale¹•Institutions (1)

Dartmouth College¹

09 Sep 2013-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors studied the problem of finding a maximum matching in a graph given by an input stream listing its edges in some arbitrary order, where the quantity to be maximized is given by a monotone submodular function on subsets of edges.

Abstract: We study the problem of finding a maximum matching in a graph given by an input stream listing its edges in some arbitrary order, where the quantity to be maximized is given by a monotone submodular function on subsets of edges. This problem, which we call maximum submodular-function matching (MSM), is a natural generalization of maximum weight matching (MWM), which is in turn a generalization of maximum cardinality matching (MCM). We give two incomparable algorithms for this problem with space usage falling in the semi-streaming range---they store only $O(n)$ edges, using $O(n\log n)$ working memory---that achieve approximation ratios of $7.75$ in a single pass and $(3+\epsilon)$ in $O(\epsilon^{-3})$ passes respectively. The operations of these algorithms mimic those of Zelke's and McGregor's respective algorithms for MWM; the novelty lies in the analysis for the MSM setting. In fact we identify a general framework for MWM algorithms that allows this kind of adaptation to the broader setting of MSM. In the sequel, we give generalizations of these results where the maximization is over "independent sets" in a very general sense. This generalization captures hypermatchings in hypergraphs as well as independence in the intersection of multiple matroids.
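
To make the semi-streaming setting concrete, the Python sketch below shows a simple one-pass replacement rule for plain maximum weight matching, a common baseline in semi-streaming work: an arriving edge replaces the stored edges it conflicts with only if its weight is at least (1 + gamma) times their total weight. The function name and the gamma parameter are hypothetical, and this is an illustration under our own assumptions, not Zelke's algorithm and not the paper's submodular (MSM) analysis.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def one_pass_weighted_matching(stream: Iterable[Edge], gamma: float = 1.0) -> List[Edge]:
    """One-pass replacement rule for weighted matching (illustrative sketch).

    At most one stored edge covers each vertex, so memory stays at O(n) edges.
    """
    matched: Dict[int, Edge] = {}  # vertex -> stored edge covering it
    for u, v, w in stream:
        if u == v:
            continue  # ignore self-loops
        # Stored edges sharing an endpoint with (u, v); at most two of them.
        conflicts = {matched[x] for x in (u, v) if x in matched}
        if w >= (1 + gamma) * sum(e[2] for e in conflicts):
            for a, b, _ in conflicts:
                del matched[a]
                del matched[b]
            edge = (u, v, w)
            matched[u] = edge
            matched[v] = edge
    # Each stored edge is registered under both endpoints; deduplicate.
    return sorted(set(matched.values()))
```

Because every vertex is covered by at most one stored edge, the dictionary never holds more than n entries, which is the O(n)-edge storage budget of the semi-streaming model.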

Posted Content•

Distributed Minimum Cut Approximation

Mohsen Ghaffari¹, Fabian Kuhn²•Institutions (2)

Massachusetts Institute of Technology¹, University of Freiburg²

23 May 2013-arXiv: Data Structures and Algorithms

TL;DR: It is shown that the same lower bound holds for unweighted multigraphs, or equivalently for weighted graphs in which $O(w\log n)$ bits can be transmitted in each round over an edge of weight $w$, and that in the CONGEST model computing an $\alpha$-approximate minimum cut requires at least $\tilde{\Omega}(D + \sqrt{n}/\alpha^{1/4})$ rounds.

Abstract: We study the problem of computing approximate minimum edge cuts by distributed algorithms. We use a standard synchronous message passing model where in each round, $O(\log n)$ bits can be transmitted over each edge (a.k.a. the CONGEST model). We present a distributed algorithm that, for any weighted graph and any $\epsilon \in (0, 1)$, with high probability finds a cut of size at most $O(\epsilon^{-1}\lambda)$ in $O(D) + \tilde{O}(n^{1/2 + \epsilon})$ rounds, where $\lambda$ is the size of the minimum cut. This algorithm is based on a simple approach for analyzing random edge sampling, which we call the random layering technique. In addition, we also present another distributed algorithm, which is based on a centralized algorithm due to Matula [SODA '93], that with high probability computes a cut of size at most $(2+\epsilon)\lambda$ in $\tilde{O}((D+\sqrt{n})/\epsilon^5)$ rounds for any $\epsilon>0$. The time complexities of both of these algorithms almost match the $\tilde{\Omega}(D + \sqrt{n})$ lower bound of Das Sarma et al. [STOC '11], thus leading to an answer to an open question raised by Elkin [SIGACT-News '04] and Das Sarma et al. [STOC '11]. Furthermore, we also strengthen the lower bound of Das Sarma et al. by extending it to unweighted graphs. We show that the same lower bound also holds for unweighted multigraphs (or equivalently for weighted graphs in which $O(w\log n)$ bits can be transmitted in each round over an edge of weight $w$), even if the diameter is $D=O(\log n)$. For unweighted simple graphs, we show that even for networks of diameter $\tilde{O}(\frac{1}{\lambda}\cdot \sqrt{\frac{n}{\alpha\lambda}})$, finding an $\alpha$-approximate minimum cut in networks of edge connectivity $\lambda$ or computing an $\alpha$-approximation of the edge connectivity requires $\tilde{\Omega}(D + \sqrt{\frac{n}{\alpha\lambda}})$ rounds.

Posted Content•

Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

Rishabh Iyer¹, Jeff A. Bilmes¹•Institutions (1)

University of Washington¹

08 Nov 2013-arXiv: Data Structures and Algorithms

TL;DR: In this paper, the authors investigate two new optimization problems, namely, minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack), and provide hardness results showing that their approximation factors are tight up to log factors.

Abstract: We investigate two new optimization problems -- minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [14, 35] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms.

Posted Content•

A Stochastic Probing Problem with Applications

Anupam Gupta¹, Viswanath Nagarajan²•Institutions (2)

Carnegie Mellon University¹, IBM²

24 Feb 2013-arXiv: Data Structures and Algorithms

TL;DR: The first polynomial-time Ω(1/k)-approximate "Sequential Posted Price Mechanism" under k-matroid intersection feasibility constraints is obtained, improving on prior work.

Abstract: We study a general stochastic probing problem defined on a universe V, where each element e in V is "active" independently with probability p_e. Elements have weights {w_e} and the goal is to maximize the weight of a chosen subset S of active elements. However, we are given only the p_e values: to determine whether or not an element e is active, our algorithm must probe e. If element e is probed and happens to be active, then e must irrevocably be added to the chosen set S; if e is not active then it is not included in S. Moreover, the following conditions must hold in every random instantiation: (1) the set Q of probed elements satisfies an "outer" packing constraint, and (2) the set S of chosen elements satisfies an "inner" packing constraint. The kinds of packing constraints we consider are intersections of matroids and knapsacks. Our results provide a simple and unified view of results in stochastic matching and Bayesian mechanism design, and can also handle more general constraints. As an application, we obtain the first polynomial-time $\Omega(1/k)$-approximate "Sequential Posted Price Mechanism" under k-matroid intersection feasibility constraints.
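
The probe-and-commit semantics can be illustrated with a deliberately naive policy. The sketch below assumes the simplest packing constraints, plain cardinality limits outer_k on the probed set Q and inner_k on the chosen set S (uniform matroids), and probes elements in decreasing order of expected value p_e·w_e. The function and parameter names are hypothetical; this greedy order is not the paper's algorithm and no approximation guarantee is claimed for it.

```python
import random
from typing import Dict, List, Optional, Set


def greedy_probe(p: Dict[str, float], w: Dict[str, float], outer_k: int, inner_k: int,
                 rng: Optional[random.Random] = None) -> Set[str]:
    """Naive probe-and-commit policy under two cardinality constraints (sketch)."""
    rng = rng if rng is not None else random.Random(0)
    probed: List[str] = []  # the probed set Q
    chosen: Set[str] = set()  # the chosen set S
    for e in sorted(p, key=lambda x: p[x] * w[x], reverse=True):
        # Stop if probing e would exceed the outer limit on Q, or if an
        # active e could not be added to S without exceeding the inner limit.
        if len(probed) >= outer_k or len(chosen) >= inner_k:
            break
        probed.append(e)
        if rng.random() < p[e]:  # e turns out to be active with probability p_e
            chosen.add(e)  # committed irrevocably
    return chosen
```

An active element is committed as soon as it is probed, so the policy stops probing once S is full; probing further could force an infeasible addition.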

Posted Content•

Order Preserving Matching

Jinil Kim¹, Peter Eades², Rudolf Fleischer³, Seok-Hee Hong², Costas S. Iliopoulos⁴, Kunsoo Park¹, Simon J. Puglisi⁵, Takeshi Tokuyama⁶•Institutions (6)

Seoul National University¹, University of Sydney², German University of Technology in Oman³, King's College London⁴, University of Helsinki⁵, Tohoku University⁶

17 Feb 2013-arXiv: Data Structures and Algorithms

TL;DR: Order-preserving matching on numeric strings was introduced in this article, where a pattern matches a text if the text contains a substring whose relative orders coincide with those of the pattern.

Abstract: We introduce a new string matching problem called order-preserving matching on numeric strings where a pattern matches a text if the text contains a substring whose relative orders coincide with those of the pattern. Order-preserving matching is applicable to many scenarios such as stock price analysis and musical melody matching in which the order relations should be matched instead of the strings themselves. Solving order-preserving matching has to do with representations of order relations of a numeric string. We define prefix representation and nearest neighbor representation, which lead to efficient algorithms for order-preserving matching. We present efficient algorithms for single and multiple pattern cases. For the single pattern case, we give an O(n log m) time algorithm and optimize it further to obtain O(n + m log m) time. For the multiple pattern case, we give an O(n log m) time algorithm.
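
The matching condition itself is easy to state in code. The naive Python sketch below maps every length-m window of the text to its dense-rank pattern and compares it with the pattern's own; it runs in O(nm log m) time and is only a reference baseline, not the prefix-representation or nearest-neighbor algorithms described in the abstract. Treating equal values as ties of equal rank, and the function names, are assumptions of this sketch.

```python
from typing import List, Sequence


def order_pattern(xs: Sequence[float]) -> List[int]:
    """Dense rank of each element: 0 for the smallest distinct value, and so on."""
    rank = {v: r for r, v in enumerate(sorted(set(xs)))}
    return [rank[x] for x in xs]


def naive_order_preserving_match(text: Sequence[float], pattern: Sequence[float]) -> List[int]:
    """Start positions i where text[i:i+m] has the same relative order as pattern."""
    m = len(pattern)
    target = order_pattern(pattern)
    return [i for i in range(len(text) - m + 1)
            if order_pattern(text[i:i + m]) == target]
```

For example, naive_order_preserving_match([5, 12, 8, 20, 30, 25, 3], [10, 22, 15]) returns [0, 3], since the windows [5, 12, 8] and [20, 30, 25] share the pattern's low, high, middle shape.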

Posted Content•

Local algorithms for interactive clustering

Pranjal Awasthi¹, Maria-Florina Balcan², Konstantin Voevodski³•Institutions (3)

Rutgers University¹, Carnegie Mellon University², Google³

24 Dec 2013-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors study the design of interactive clustering algorithms for data sets satisfying natural stability assumptions and show that in this constrained setting one can still design provably efficient algorithms that produce accurate clusterings.

Abstract: We study the design of interactive clustering algorithms for data sets satisfying natural stability assumptions. Our algorithms start with any initial clustering and only make local changes in each step; both are desirable features in many applications. We show that in this constrained setting one can still design provably efficient algorithms that produce accurate clusterings. We also show that our algorithms perform well on real-world data.

Posted Content•

Tight Bounds for Set Disjointness in the Message Passing Model

Mark Braverman, Faith Ellen, Rotem Oshman, Toniann Pitassi, Vinod Vaikuntanathan

21 May 2013-arXiv: Data Structures and Algorithms

TL;DR: In this article, a lower bound of the form $\Omega(n \cdot k)$ for the set disjointness problem in the message passing model was shown.

Abstract: In a multiparty message-passing model of communication, there are $k$ players. Each player has a private input, and they communicate by sending messages to one another over private channels. While this model has been used extensively in distributed computing and in multiparty computation, lower bounds on communication complexity in this model and related models have been somewhat scarce. In recent work \cite{phillips12,woodruff12,woodruff13}, strong lower bounds of the form $\Omega(n \cdot k)$ were obtained for several functions in the message-passing model; however, a lower bound on the classical Set Disjointness problem remained elusive. In this paper, we prove tight lower bounds of the form $\Omega(n \cdot k)$ for the Set Disjointness problem in the message passing model. Our bounds are obtained by developing information complexity tools in the message-passing model, and then proving an information complexity lower bound for Set Disjointness. As a corollary, we show a tight lower bound for the task allocation problem \cite{DruckerKuhnOshman} via a reduction from Set Disjointness.

Posted Content•

Sample-Optimal Average-Case Sparse Fourier Transform in Two Dimensions

Badih Ghazi¹, Haitham Hassanieh¹, Piotr Indyk¹, Dina Katabi¹, Eric Price¹, Lixin Shi¹•Institutions (1)

Massachusetts Institute of Technology¹

05 Mar 2013-arXiv: Data Structures and Algorithms

TL;DR: The first sample-optimal sublinear time algorithms for the sparse Discrete Fourier Transform over a two-dimensional $\sqrt{n} \times \sqrt{n}$ grid are presented; they match the lower bounds on sample complexity for their respective signal models.

Abstract: We present the first sample-optimal sublinear time algorithms for the sparse Discrete Fourier Transform over a two-dimensional $\sqrt{n} \times \sqrt{n}$ grid. Our algorithms are analyzed for average-case signals. For signals whose spectrum is exactly sparse, our algorithms use $O(k)$ samples and run in $O(k \log k)$ time, where $k$ is the expected sparsity of the signal. For signals whose spectrum is approximately sparse, our algorithm uses $O(k \log n)$ samples and runs in $O(k \log^2 n)$ time; the latter algorithm works for $k = \Theta(\sqrt{n})$. The number of samples used by our algorithms matches the known lower bounds for the respective signal models. By a known reduction, our algorithms give similar results for the one-dimensional sparse Discrete Fourier Transform when $n$ is a power of a small composite number (e.g., $n = 6^t$).

Posted Content•

Using cascading Bloom filters to improve the memory usage for de Brujin graphs

Kamil Salikhov¹, Gustavo Sacomoto²,³, Gregory Kucherov⁴•Institutions (4)

Moscow State University¹, University of Lyon², French Institute for Research in Computer Science and Automation³, Ben-Gurion University of the Negev⁴

28 Feb 2013-arXiv: Data Structures and Algorithms

TL;DR: This work shows how to reduce the memory required by the data structure of Chikhi and Rizk (WABI’12) that represents de Bruijn graphs using Bloom filters; to the authors' knowledge, the result is the most efficient practical representation of de Bruijn graphs.

Abstract: De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3] that represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs.
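
As background for the Bloom-filter approach, the Python sketch below stores the k-mers of a de Bruijn graph in a Bloom filter and answers successor queries by testing the four one-letter extensions. Extensions of true k-mers that the filter wrongly accepts ("critical false positives") are kept in an exact set, which is roughly the idea of the Chikhi and Rizk structure that the abstract improves on; the cascading method would instead store that set in a further chain of Bloom filters. All names, the filter size, and the salted SHA-256 hashing are assumptions, and only forward extensions are handled for brevity.

```python
import hashlib
from typing import Iterable, Iterator, List, Set, Tuple


class BloomFilter:
    """Plain Bloom filter over strings using salted SHA-256 hashes."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # No false negatives; false positives are possible.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


def forward_extensions(kmer: str) -> List[str]:
    return [kmer[1:] + c for c in "ACGT"]


def build_graph(kmers: Iterable[str], num_bits: int = 1024,
                num_hashes: int = 3) -> Tuple[BloomFilter, Set[str]]:
    true_kmers = set(kmers)
    bf = BloomFilter(num_bits, num_hashes)
    for s in true_kmers:
        bf.add(s)
    # Critical false positives: extensions of true k-mers that the filter wrongly accepts.
    cfp = {x for s in true_kmers for x in forward_extensions(s)
           if x in bf and x not in true_kmers}
    return bf, cfp


def successors(kmer: str, bf: BloomFilter, cfp: Set[str]) -> List[str]:
    # Exact when kmer is a true k-mer: every filter hit that is not a true k-mer is in cfp.
    return [x for x in forward_extensions(kmer) if x in bf and x not in cfp]
```

Queries that start from a true k-mer are exact by construction; the memory saving reported in the abstract comes from not storing the false-positive set explicitly.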
