scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Data Structures and Algorithms in 2013"


Posted Content
TL;DR: In this article, the authors present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements.
Abstract: Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to- use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is a difficult task, since older base- line implementations may not rely on the same basic components, and reimplementing from scratch can be very time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing succinct solutions for document retrieval.

291 citations


Posted Content
TL;DR: This work proposes a new exact method for shortest-path distance queries on large-scale networks that can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods.
Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.

278 citations


Posted Content
TL;DR: A deeper understanding of interior-point methods is acquired - a powerful tool in convex optimization - in the context of flow problems, as well as, utilizing certain interplay between maximum flows and bipartite matchings.
Abstract: We present an $\tilde{O}(m^{10/7})=\tilde{O}(m^{1.43})$-time algorithm for the maximum s-t flow and the minimum s-t cut problems in directed graphs with unit capacities. This is the first improvement over the sparse-graph case of the long-standing $O(m \min(\sqrt{m},n^{2/3}))$ time bound due to Even and Tarjan [EvenT75]. By well-known reductions, this also establishes an $\tilde{O}(m^{10/7})$-time algorithm for the maximum-cardinality bipartite matching problem. That, in turn, gives an improvement over the celebrated celebrated $O(m \sqrt{n})$ time bound of Hopcroft and Karp [HK73] whenever the input graph is sufficiently sparse.

191 citations


Posted Content
Nitish Korula1, Silvio Lattanzi1
TL;DR: In this article, a small fraction of individuals explicitly link their accounts across multiple online social networks (e.g., Facebook, Twitter, Google+, LinkedIn, etc.) and leverage these connections to identify a very large fraction of the network.
Abstract: People today typically use multiple online social networks (Facebook, Twitter, Google+, LinkedIn, etc.). Each online network represents a subset of their "real" ego-networks. An interesting and challenging problem is to reconcile these online networks, that is, to identify all the accounts belonging to the same individual. Besides providing a richer understanding of social dynamics, the problem has a number of practical applications. At first sight, this problem appears algorithmically challenging. Fortunately, a small fraction of individuals explicitly link their accounts across multiple networks; our work leverages these connections to identify a very large fraction of the network. Our main contributions are to mathematically formalize the problem for the first time, and to design a simple, local, and efficient parallel algorithm to solve it. We are able to prove strong theoretical guarantees on the algorithm's performance on well-established network models (Random Graphs, Preferential Attachment). We also experimentally confirm the effectiveness of the algorithm on synthetic and real social network data sets.

189 citations


Posted Content
TL;DR: A simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time and has the fastest known running time under the standard unit-cost RAM model.
Abstract: In this paper, we present a simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time. It uses very little of the machinery that previously appeared to be necessary for a such an algorithm. It does not require recursive preconditioning, spectral sparsification, or even the Chebyshev Method or Conjugate Gradient. After constructing a "nice" spanning tree of a graph associated with the linear system, the entire algorithm consists of the repeated application of a simple (non-recursive) update rule, which it implements using a lightweight data structure. The algorithm is numerically stable and can be implemented without the increased bit-precision required by previous solvers. As such, the algorithm has the fastest known running time under the standard unit-cost RAM model. We hope that the simplicity of the algorithm and the insights yielded by its analysis will be useful in both theory and practice.

177 citations


Posted Content
TL;DR: The first sub-linear time algorithm for this problem was presented in this paper, which matched the lower bounds of Valiant up to a logarithmic factor in $n, and a polynomial factor in O(n 2 ).
Abstract: We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions $p$ and $q$ over an $n$-element set, we wish to distinguish whether $p=q$ versus $p$ is at least $\eps$-far from $q$, in either $\ell_1$ or $\ell_2$ distance. Batu et al. gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant up to a logarithmic factor in $n$, and a polynomial factor of $\eps.$ In this work, we present simple (and new) testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal, to constant factors, both in the dependence on $n$, and the dependence on $\eps$; for the $\ell_1$ testing problem we establish that the sample complexity is $\Theta(\max\{n^{2/3}/\eps^{4/3}, n^{1/2}/\eps^2 \}).$

148 citations


Posted Content
TL;DR: In this paper, a randomized algorithm for computing the min-plus product (a.k.a., tropical product) of two $n \times n$ matrices was presented, yielding a faster algorithm for solving the all-pairs shortest path problem (APSP) in dense $n$-node directed graphs with arbitrary edge weights.
Abstract: We present a new randomized method for computing the min-plus product (a.k.a., tropical product) of two $n \times n$ matrices, yielding a faster algorithm for solving the all-pairs shortest path problem (APSP) in dense $n$-node directed graphs with arbitrary edge weights. On the real RAM, where additions and comparisons of reals are unit cost (but all other operations have typical logarithmic cost), the algorithm runs in time \[\frac{n^3}{2^{\Omega(\log n)^{1/2}}}\] and is correct with high probability. On the word RAM, the algorithm runs in $n^3/2^{\Omega(\log n)^{1/2}} + n^{2+o(1)}\log M$ time for edge weights in $([0,M] \cap {\mathbb Z})\cup\{\infty\}$. Prior algorithms used either $n^3/(\log^c n)$ time for various $c \leq 2$, or $O(M^{\alpha}n^{\beta})$ time for various $\alpha > 0$ and $\beta > 2$. The new algorithm applies a tool from circuit complexity, namely the Razborov-Smolensky polynomials for approximately representing ${\sf AC}^0[p]$ circuits, to efficiently reduce a matrix product over the $(\min,+)$ algebra to a relatively small number of rectangular matrix products over ${\mathbb F}_2$, each of which are computable using a particularly efficient method due to Coppersmith. We also give a deterministic version of the algorithm running in $n^3/2^{\log^{\delta} n}$ time for some $\delta > 0$, which utilizes the Yao-Beigel-Tarui translation of ${\sf AC}^0[m]$ circuits into "nice" depth-two circuits.

143 citations


Posted Content
Moritz Hardt1, Eric Price1
TL;DR: A new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix that is called the noisy power method is provided and shows that the error dependence of the algorithm on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix.
Abstract: We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix that we call the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount noise is introduced after each matrix-vector multiplication. The noisy power method can be seen as a meta-algorithm that has recently found a number of important applications in a broad range of machine learning problems including alternating minimization for matrix completion, streaming principal component analysis (PCA), and privacy-preserving spectral analysis. Our general analysis subsumes several existing ad-hoc convergence bounds and resolves a number of open problems in multiple applications including streaming PCA and privacy-preserving singular vector computation.

137 citations


Posted Content
TL;DR: A general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem, and has implications beyond the MapReduce model.
Abstract: We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

135 citations


Posted Content
TL;DR: This work introduces a new approach to the maximum flow problem in undirected, capacitated graphs using congestion-approximators: easy-to-compute functions that approximate the congestion required to route single-commodity demands in a graph to within some factor α.
Abstract: We introduce a new approach to the maximum flow problem in undirected, capacitated graphs using $\alpha$-\emph{congestion-approximators}: easy-to-compute functions that approximate the congestion required to route single-commodity demands in a graph to within a factor of $\alpha$. Our algorithm maintains an arbitrary flow that may have some residual excess and deficits, while taking steps to minimize a potential function measuring the congestion of the current flow plus an over-estimate of the congestion required to route the residual demand. Since the residual term over-estimates, the descent process gradually moves the contribution to our potential function from the residual term to the congestion term, eventually achieving a flow routing the desired demands with nearly minimal congestion after $\tilde{O}(\alpha\eps^{-2}\log^2 n)$ iterations. Our approach is similar in spirit to that used by Spielman and Teng (STOC 2004) for solving Laplacian systems, and we summarize our approach as trying to do for $\ell_\infty$-flows what they do for $\ell_2$-flows. Together with a nearly linear time construction of a $n^{o(1)}$-congestion-approximator, we obtain $1+\eps$-optimal single-commodity flows undirected graphs in time $m^{1+o(1)}\eps^{-2}$, yielding the fastest known algorithm for that problem. Our requirements of a congestion-approximator are quite low, suggesting even faster and simpler algorithms for certain classes of graphs. For example, an $\alpha$-competitive oblivious routing tree meets our definition, \emph{even without knowing how to route the tree back in the graph}. For graphs of conductance $\phi$, a trivial $\phi^{-1}$-congestion-approximator gives an extremely simple algorithm for finding $1+\eps$-optimal-flows in time $\tilde{O}(m\phi^{-1})$.

132 citations


Posted Content
TL;DR: By a standard reduction, a new data structure is presented for the Hamming space and e1 norm with ρ ≤ 7/(8c)+ O(1/c3/2)+ oc(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
Abstract: We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O(n^{\rho} + d log n) query time and O(n^{1 + \rho} + d log n) space, where \rho <= 7/(8c^2) + O(1 / c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and \ell_1 norm with \rho <= 7/(8c) + O(1/c^{3/2}) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).

Posted Content
TL;DR: It is shown how to reduce both submodular maximization and supermodular minimization to this general problem when the objective function has bounded total curvature, and it is proved that the approximation results obtained are the best possible in the value oracle model.
Abstract: We design new approximation algorithms for the problems of optimizing submodular and supermodular functions subject to a single matroid constraint. Specifically, we consider the case in which we wish to maximize a nondecreasing submodular function or minimize a nonincreasing supermodular function in the setting of bounded total curvature $c$. In the case of submodular maximization with curvature $c$, we obtain a $(1-c/e)$-approximation --- the first improvement over the greedy $(1-e^{-c})/c$-approximation of Conforti and Cornuejols from 1984, which holds for a cardinality constraint, as well as recent approaches that hold for an arbitrary matroid constraint. Our approach is based on modifications of the continuous greedy algorithm and non-oblivious local search, and allows us to approximately maximize the sum of a nonnegative, nondecreasing submodular function and a (possibly negative) linear function. We show how to reduce both submodular maximization and supermodular minimization to this general problem when the objective function has bounded total curvature. We prove that the approximation results we obtain are the best possible in the value oracle model, even in the case of a cardinality constraint. We define an extension of the notion of curvature to general monotone set functions and show $(1-c)$-approximation for maximization and $1/(1-c)$-approximation for minimization cases. Finally, we give two concrete applications of our results in the settings of maximum entropy sampling, and the column-subset selection problem.

Posted Content
TL;DR: In this article, a smoothed analysis model is introduced for learning generative models and an efficient tensor decomposition in the highly overcomplete case (rank polynomial in the dimension) is developed.
Abstract: Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges and tensors analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error -- a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, our main contribution is in analyzing their stability in the framework of smoothed analysis. Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e. the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.

Posted Content
TL;DR: In this paper, the authors considered the problem of packing LP in an online model where the columns are presented to the algorithm in random order, and they obtained an incentive compatible (1 − ϵ)-competitive algorithm with competitive ratio O(1) for any (randomized) online algorithm.
Abstract: We study packing LPs in an online model where the columns are presented to the algorithm in random order. This natural problem was investigated in various recent studies motivated, e.g., by online ad allocations and yield management where rows correspond to resources and columns to requests specifying demands for resources. Our main contribution is a $1-O(\sqrt{(\log{d})/B})$-competitive online algorithm, where $d$ denotes the column sparsity, i.e., the maximum number of resources that occur in a single column, and $B$ denotes the capacity ratio $B$, i.e., the ratio between the capacity of a resource and the maximum demand for this resource. In other words, we achieve a $(1 - \epsilon)$-approximation if the capacity ratio satisfies $B=\Omega((\log d)/\epsilon^2)$, which is known to be best-possible for any (randomized) online algorithms. Our result improves exponentially on previous work with respect to the capacity ratio. In contrast to existing results on packing LP problems, our algorithm does not use dual prices to guide the allocation of resources. Instead, it simply solves, for each request, a scaled version of the partially known primal program and randomly rounds the obtained fractional solution to obtain an integral allocation for this request. We show that this simple algorithmic technique is not restricted to packing LPs with large capacity ratio: We prove an upper bound on the competitive ratio of $\Omega(d^{-1/(B-1)})$, for any $B \ge 2$. In addition, we show that our approach can be combined with VCG payments and obtain an incentive compatible $(1-\epsilon)$-competitive mechanism for packing LPs with $B=\Omega((\log m)/\epsilon^2)$, where $m$ is the number of constraints. Finally, we apply our technique to the generalized assignment problem for which we obtain the first online algorithm with competitive ratio $O(1)$.

Posted Content
TL;DR: Arterial Hierarchy is presented, an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks and outperforms the state of the art in terms of query time and space and pre-computation overheads.
Abstract: Given two locations $s$ and $t$ in a road network, a distance query returns the minimum network distance from $s$ to $t$, while a shortest path query computes the actual route that achieves the minimum distance. These two types of queries find important applications in practice, and a plethora of solutions have been proposed in past few decades. The existing solutions, however, are optimized for either practical or asymptotic performance, but not both. In particular, the techniques with enhanced practical efficiency are mostly heuristic-based, and they offer unattractive worst-case guarantees in terms of space and time. On the other hand, the methods that are worst-case efficient often entail prohibitive preprocessing or space overheads, which render them inapplicable for the large road networks (with millions of nodes) commonly used in modern map applications. This paper presents {\em Arterial Hierarchy (AH)}, an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks. On the theoretical side, we show that, under a realistic assumption, AH answers any distance query in $\tilde{O}(\log \r)$ time, where $\r = d_{max}/d_{min}$, and $d_{max}$ (resp.\ $d_{min}$) is the largest (resp.\ smallest) $L_\infty$ distance between any two nodes in the road network. In addition, any shortest path query can be answered in $\tilde{O}(k + \log \r)$ time, where $k$ is the number of nodes on the shortest path. On the practical side, we experimentally evaluate AH on a large set of real road networks with up to twenty million nodes, and we demonstrate that (i) AH outperforms the state of the art in terms of query time, and (ii) its space and pre-computation overheads are moderate.

Posted Content
TL;DR: In this paper, representative families of linear matroids have been used for designing single-exponential parameterized and exact exponential time algorithms on graphs of bounded treewidth. But these algorithms are not suitable for graphs with constant trewidth, such as k-vertex patterns.
Abstract: We give two algorithms computing representative families of linear and uniform matroids and demonstrate how to use representative families for designing single-exponential parameterized and exact exponential time algorithms. The applications of our approach include - LONGEST DIRECTED CYCLE - MINIMUM EQUIVALENT GRAPH (MEG) - Algorithms on graphs of bounded treewidth -k-PATH, k-TREE, and more generally, k-SUBGRAPH ISOMORPHISM, where the k-vertex pattern graph is of constant treewidth.

Posted Content
TL;DR: In this article, a framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and super-differentials) is presented.
Abstract: We present a practical and powerful new framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and super-differentials). The resulting algorithms, which repeatedly compute and then efficiently optimize submodular semigradients, offer new and generalize many old methods for submodular optimization. Our approach, moreover, takes steps towards providing a unifying paradigm applicable to both submodular min- imization and maximization, problems that historically have been treated quite distinctly. The practicality of our algorithms is important since interest in submodularity, owing to its natural and wide applicability, has recently been in ascendance within machine learning. We analyze theoretical properties of our algorithms for minimization and maximization, and show that many state-of-the-art maximization algorithms are special cases. Lastly, we complement our theoretical analyses with supporting empirical experiments.

Posted Content
TL;DR: In this article, the authors describe a computational model for studying the complexity of self-assembled structures with active molecular components, inspired by biology's ability to assemble biomolecules that form systems with complicated structure and dynamics.
Abstract: We describe a computational model for studying the complexity of self-assembled structures with active molecular components. Our model captures notions of growth and movement ubiquitous in biological systems. The model is inspired by biology's fantastic ability to assemble biomolecules that form systems with complicated structure and dynamics, from molecular motors that walk on rigid tracks and proteins that dynamically alter the structure of the cell during mitosis, to embryonic development where large-scale complicated organisms efficiently grow from a single cell. Using this active self-assembly model, we show how to efficiently self-assemble shapes and patterns from simple monomers. For example, we show how to grow a line of monomers in time and number of monomer states that is merely logarithmic in the length of the line. Our main results show how to grow arbitrary connected two-dimensional geometric shapes and patterns in expected time that is polylogarithmic in the size of the shape, plus roughly the time required to run a Turing machine deciding whether or not a given pixel is in the shape. We do this while keeping the number of monomer types logarithmic in shape size, plus those monomers required by the Kolmogorov complexity of the shape or pattern. This work thus highlights the efficiency advantages of active self-assembly over passive self-assembly and motivates experimental effort to construct general-purpose active molecular self-assembly systems.

Posted Content
TL;DR: In this article, the authors studied the problem of finding a maximum matching in a graph given by an input stream listing its edges in some arbitrary order, where the quantity to be maximized is given by a monotone submodular function on subsets of edges.
Abstract: We study the problem of finding a maximum matching in a graph given by an input stream listing its edges in some arbitrary order, where the quantity to be maximized is given by a monotone submodular function on subsets of edges. This problem, which we call maximum submodular-function matching (MSM), is a natural generalization of maximum weight matching (MWM), which is in turn a generalization of maximum cardinality matching (MCM). We give two incomparable algorithms for this problem with space usage falling in the semi-streaming range---they store only $O(n)$ edges, using $O(n\log n)$ working memory---that achieve approximation ratios of $7.75$ in a single pass and $(3+\epsilon)$ in $O(\epsilon^{-3})$ passes respectively. The operations of these algorithms mimic those of Zelke's and McGregor's respective algorithms for MWM; the novelty lies in the analysis for the MSM setting. In fact we identify a general framework for MWM algorithms that allows this kind of adaptation to the broader setting of MSM. In the sequel, we give generalizations of these results where the maximization is over "independent sets" in a very general sense. This generalization captures hypermatchings in hypergraphs as well as independence in the intersection of multiple matroids.

Posted Content
TL;DR: It is shown that the same lower bound holds for unweighted multigraphs or equivalently for weighted graphs in which Owlogn bits can be transmitted in each round over an edge of weight w.r.t. the CONGEST model, and that computing an α-approximate minimum cut requires time at least $\tilde{\Omega}D + \sqrt{n}/\alpha^{1/4}$ .
Abstract: We study the problem of computing approximate minimum edge cuts by distributed algorithms. We use a standard synchronous message passing model where in each round, $O(\log n)$ bits can be transmitted over each edge (a.k.a. the CONGEST model). We present a distributed algorithm that, for any weighted graph and any $\epsilon \in (0, 1)$, with high probability finds a cut of size at most $O(\epsilon^{-1}\lambda)$ in $O(D) + \tilde{O}(n^{1/2 + \epsilon})$ rounds, where $\lambda$ is the size of the minimum cut. This algorithm is based on a simple approach for analyzing random edge sampling, which we call the random layering technique. In addition, we also present another distributed algorithm, which is based on a centralized algorithm due to Matula [SODA '93], that with high probability computes a cut of size at most $(2+\epsilon)\lambda$ in $\tilde{O}((D+\sqrt{n})/\epsilon^5)$ rounds for any $\epsilon>0$. The time complexities of both of these algorithms almost match the $\tilde{\Omega}(D + \sqrt{n})$ lower bound of Das Sarma et al. [STOC '11], thus leading to an answer to an open question raised by Elkin [SIGACT-News '04] and Das Sarma et al. [STOC '11]. Furthermore, we also strengthen the lower bound of Das Sarma et al. by extending it to unweighted graphs. We show that the same lower bound also holds for unweighted multigraphs (or equivalently for weighted graphs in which $O(w\log n)$ bits can be transmitted in each round over an edge of weight $w$), even if the diameter is $D=O(\log n)$. For unweighted simple graphs, we show that even for networks of diameter $\tilde{O}(\frac{1}{\lambda}\cdot \sqrt{\frac{n}{\alpha\lambda}})$, finding an $\alpha$-approximate minimum cut in networks of edge connectivity $\lambda$ or computing an $\alpha$-approximation of the edge connectivity requires $\tilde{\Omega}(D + \sqrt{\frac{n}{\alpha\lambda}})$ rounds.

Posted Content
TL;DR: In this paper, the authors investigate two new optimization problems, namely, minimizing a submodular function subject to a sub-modular lower bound constraint (submodular cover) and maximizing a subMODULE subject to an upper bound constraint, and provide hardness results for both problems up to log-factors.
Abstract: We investigate two new optimization problems -- minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [14, 35] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms.

Posted Content
TL;DR: The first polynomial-time Ω(1/k)-approximate "Sequential Posted Price Mechanism" under k-matroid intersection feasibility constraints is obtained, improving on prior work.
Abstract: We study a general stochastic probing problem defined on a universe V, where each element e in V is "active" independently with probability p_e. Elements have weights {w_e} and the goal is to maximize the weight of a chosen subset S of active elements. However, we are given only the p_e values-- to determine whether or not an element e is active, our algorithm must probe e. If element e is probed and happens to be active, then e must irrevocably be added to the chosen set S; if e is not active then it is not included in S. Moreover, the following conditions must hold in every random instantiation: (1) the set Q of probed elements satisfy an "outer" packing constraint, and (2) the set S of chosen elements satisfy an "inner" packing constraint. The kinds of packing constraints we consider are intersections of matroids and knapsacks. Our results provide a simple and unified view of results in stochastic matching and Bayesian mechanism design, and can also handle more general constraints. As an application, we obtain the first polynomial-time $\Omega(1/k)$-approximate "Sequential Posted Price Mechanism" under k-matroid intersection feasibility constraints.

Posted Content
TL;DR: Order-preserving matching on numeric strings was introduced in this article, where a pattern matches a text if the text contains a substring whose relative orders coincide with those of the pattern.
Abstract: We introduce a new string matching problem called order-preserving matching on numeric strings where a pattern matches a text if the text contains a substring whose relative orders coincide with those of the pattern. Order-preserving matching is applicable to many scenarios such as stock price analysis and musical melody matching in which the order relations should be matched instead of the strings themselves. Solving order-preserving matching has to do with representations of order relations of a numeric string. We define prefix representation and nearest neighbor representation, which lead to efficient algorithms for order-preserving matching. We present efficient algorithms for single and multiple pattern cases. For the single pattern case, we give an O(n log m) time algorithm and optimize it further to obtain O(n + m log m) time. For the multiple pattern case, we give an O(n log m) time algorithm.

Posted Content
TL;DR: In this article, the authors study the design of interactive clustering algorithms for data sets satisfying natural stability assumptions and show that in this constrained setting one can still design provably efficient algorithms that produce accurate clusterings.
Abstract: We study the design of interactive clustering algorithms for data sets satisfying natural stability assumptions. Our algorithms start with any initial clustering and only make local changes in each step; both are desirable features in many applications. We show that in this constrained setting one can still design provably efficient algorithms that produce accurate clusterings. We also show that our algorithms perform well on real-world data.

Posted Content
TL;DR: In this article, a lower bound of the form Ω(n \cdot k)$ for the set disjointness problem in the message passing model was shown.
Abstract: In a multiparty message-passing model of communication, there are $k$ players. Each player has a private input, and they communicate by sending messages to one another over private channels. While this model has been used extensively in distributed computing and in multiparty computation, lower bounds on communication complexity in this model and related models have been somewhat scarce. In recent work \cite{phillips12,woodruff12,woodruff13}, strong lower bounds of the form $\Omega(n \cdot k)$ were obtained for several functions in the message-passing model; however, a lower bound on the classical Set Disjointness problem remained elusive. In this paper, we prove tight lower bounds of the form $\Omega(n \cdot k)$ for the Set Disjointness problem in the message passing model. Our bounds are obtained by developing information complexity tools in the message-passing model, and then proving an information complexity lower bound for Set Disjointness. As a corollary, we show a tight lower bound for the task allocation problem \cite{DruckerKuhnOshman} via a reduction from Set Disjointness.

Posted Content
TL;DR: The first sample-optimal sublinear time algorithms for the sparse Discrete Fourier Transform over a two-dimensional√n × √n grid are presented and match the lower bounds on sample complexity for their respective signal models.
Abstract: We present the first sample-optimal sublinear time algorithms for the sparse Discrete Fourier Transform over a two-dimensional sqrt{n} x sqrt{n} grid. Our algorithms are analyzed for /average case/ signals. For signals whose spectrum is exactly sparse, our algorithms use O(k) samples and run in O(k log k) time, where k is the expected sparsity of the signal. For signals whose spectrum is approximately sparse, our algorithm uses O(k log n) samples and runs in O(k log^2 n) time; the latter algorithm works for k=Theta(sqrt{n}). The number of samples used by our algorithms matches the known lower bounds for the respective signal models. By a known reduction, our algorithms give similar results for the one-dimensional sparse Discrete Fourier Transform when n is a power of a small composite number (e.g., n = 6^t).

Posted Content
TL;DR: This work shows how to reduce the memory required by the data structure of Chikhi and Rizk (WABI’12) that represents de Brujin graphs using Bloom filters, which constitutes the most efficient practical representation of de Bruijn graphs.
Abstract: De Brujin graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3] that represents de Brujin graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to the method of [3], with insignificant impact to construction time. At the same time, our experiments showed a better query time compared to [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs.

Posted Content
TL;DR: This work shows how computing the sparse DFT X is equivalent to decoding of these sparse-graph codes and is low in both sample complexity and decoding complexity, and dubs the algorithm the FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm.
Abstract: Given an $n$-length input signal $\mbf{x}$, it is well known that its Discrete Fourier Transform (DFT), $\mbf{X}$, can be computed in $O(n \log n)$ complexity using a Fast Fourier Transform (FFT). If the spectrum $\mbf{X}$ is exactly $k$-sparse (where $k<

Posted Content
TL;DR: Recent trends in practical algorithms for balanced graph partitioning are surveyed, applications are pointed to and future research directions are discussed.
Abstract: We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.

Posted Content
TL;DR: In this paper, it was shown that linear kernels on graphs of bounded expansion have almost-linear kernels, when parameterized by the size of a modulator to constant-treedepth graphs.
Abstract: Meta-theorems for polynomial (linear) kernels have been the subject of intensive research in parameterized complexity. Heretofore, meta-theorems for linear kernels exist on graphs of bounded genus, $H$-minor-free graphs, and $H$-topological-minor-free graphs. To the best of our knowledge, no meta-theorems for polynomial kernels are known for any larger sparse graph classes; e.g., for classes of bounded expansion or for nowhere dense ones. In this paper we prove such meta-theorems for the two latter cases. More specifically, we show that graph problems that have finite integer index (FII) have linear kernels on graphs of bounded expansion when parameterized by the size of a modulator to constant-treedepth graphs. For nowhere dense graph classes, our result yields almost-linear kernels. While our parameter may seem rather strong, we argue that a linear kernelization result on graphs of bounded expansion with a weaker parameter (than treedepth modulator) would fail to include some of the problems covered by our framework. Moreover, we only require the problems to have FII on graphs of constant treedepth. This allows us to prove linear kernels for problems such as Longest Path/Cycle, Exact $s,t$-Path, Treewidth, and Pathwidth, which do not have FII on general graphs (and the first two not even on bounded treewidth graphs).