
Showing papers on "Time complexity published in 2014"


Journal ArticleDOI
TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) are developed and shown to substantially accelerate t-SNE and to make it possible to learn embeddings of data sets with millions of objects.
Abstract: The paper investigates the acceleration of t-SNE--an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots--using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
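
As a concrete illustration (not part of the paper), scikit-learn ships a Barnes-Hut t-SNE implementation along these lines; a minimal usage sketch with illustrative data and parameter values:

```python
# Minimal sketch: Barnes-Hut t-SNE via scikit-learn, whose implementation uses an
# O(N log N) tree-based gradient approximation as described above.
# The dataset and parameter values are illustrative, not taken from the paper.
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(5000, 50)   # stand-in for high-dimensional data

emb = TSNE(
    n_components=2,
    method="barnes_hut",   # tree-based gradient approximation
    angle=0.5,             # Barnes-Hut accuracy/speed trade-off (theta)
    perplexity=30.0,
    random_state=0,
).fit_transform(X)

print(emb.shape)           # (5000, 2) low-dimensional embedding for plotting
```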

2,079 citations


Journal ArticleDOI
TL;DR: This work shows that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples, and an exponential speedup is obtained.
Abstract: Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

1,078 citations


Posted Content
TL;DR: TIM is presented, an algorithm that aims to bridge the theory and practice in influence maximization and outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time.
Abstract: Given a social network G and a constant k, the influence maximization problem asks for k nodes in G that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in O((k+\ell) (n+m) \log n / \epsilon^2) expected time and returns a (1-1/e-\epsilon)-approximate solution with at least 1 - n^{-\ell} probability. The time complexity of TIM is near-optimal under the IC model, as it is only a \log n factor larger than the \Omega(m + n) lower-bound established in previous work (for fixed k, \ell, and \epsilon). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when k = 50, \epsilon = 0.2, and \ell = 1, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.
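
For illustration, TIM builds on reverse-influence sampling: generate many random reverse-reachable (RR) sets and greedily pick the k nodes that cover the most of them. A minimal sketch of that core idea under the IC model follows; the graph encoding, the fixed sample count theta, and the helper names are assumptions, and TIM itself chooses theta from its theoretical analysis and adds further heuristics.

```python
import random
from collections import defaultdict

def random_rr_set(n, in_neighbors):
    """One random reverse-reachable set under the independent cascade model.
    in_neighbors[v] is a list of (u, p) pairs: edge u->v with propagation probability p."""
    root = random.randrange(n)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_neighbors.get(v, []):
            if u not in rr and random.random() < p:   # reverse-simulate edge u->v
                rr.add(u)
                stack.append(u)
    return rr

def greedy_seed_selection(n, in_neighbors, k, theta=10000):
    """Pick k seeds covering the most RR sets (standard max-coverage greedy)."""
    rr_sets = [random_rr_set(n, in_neighbors) for _ in range(theta)]
    covers = defaultdict(set)                 # node -> indices of RR sets containing it
    for i, rr in enumerate(rr_sets):
        for v in rr:
            covers[v].add(i)
    seeds, covered = [], set()
    for _ in range(k):
        best = max(range(n), key=lambda v: len(covers[v] - covered))
        seeds.append(best)
        covered |= covers[best]
    return seeds
```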

609 citations


Proceedings ArticleDOI
18 Oct 2014
TL;DR: In this article, the authors provide new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.
Abstract: Convex empirical risk minimization is a basic tool in machine learning and statistics. We provide new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal non-private running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for (ε, 0)- and (ε, δ)-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contributions of each data point to the loss function are smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.

587 citations


Proceedings ArticleDOI
18 Jun 2014
TL;DR: TIM as discussed by the authors is an algorithm for influence maximization that runs in O((k+ℓ)(n+m) log n / ε²) expected time and returns a (1 − 1/e − ε)-approximate solution with at least 1 − n^{−ℓ} probability.
Abstract: Given a social network G and a constant k, the influence maximization problem asks for k nodes in G that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in O((k+ℓ)(n+m) log n / ε²) expected time and returns a (1 − 1/e − ε)-approximate solution with at least 1 − n^{−ℓ} probability. The time complexity of TIM is near-optimal under the IC model, as it is only a log n factor larger than the Ω(m + n) lower-bound established in previous work (for fixed k, ℓ, and ε). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when k = 50, ε = 0.2, and ℓ = 1, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges. This demonstrates that influence maximization algorithms can be made practical while still offering strong theoretical guarantees.

506 citations


Journal ArticleDOI
09 Jul 2014
TL;DR: This work presents Super 4PCS for global pointcloud registration that is optimal, i.e., runs in linear time and is also output sensitive in the complexity of the alignment problem based on the (unknown) overlap across scan pairs.
Abstract: Data acquisition in large-scale scenes regularly involves accumulating information across multiple scans. A common approach is to locally align scan pairs using the Iterative Closest Point (ICP) algorithm (or its variants), but this requires static scenes and small motion between scan pairs. This prevents accumulating data across multiple scan sessions and/or different acquisition modalities (e.g., stereo, depth scans). Alternatively, one can use a global registration algorithm allowing scans to be in arbitrary initial poses. The state-of-the-art global registration algorithm, 4PCS, however, has a quadratic time complexity in the number of data points. This vastly limits its applicability to acquisition of large environments. We present Super 4PCS for global pointcloud registration that is optimal, i.e., runs in linear time (in the number of data points) and is also output sensitive in the complexity of the alignment problem based on the (unknown) overlap across scan pairs. Technically, we map the algorithm as an 'instance problem' and solve it efficiently using a smart indexing data organization. The algorithm is simple, memory-efficient, and fast. We demonstrate that Super 4PCS results in significant speedup over alternative approaches and allows unstructured efficient acquisition of scenes at scales previously not possible. Complete source code and datasets are available for research use at http://geometry.cs.ucl.ac.uk/projects/2014/super4PCS/

479 citations


Journal ArticleDOI
Hamid Arabnejad1, Jorge G. Barbosa1
TL;DR: The analysis and experiments show that the PEFT algorithm outperforms the state-of-the-art list-based algorithms for heterogeneous systems in terms of schedule length ratio, efficiency, and frequency of best results.
Abstract: Efficient application scheduling algorithms are important for obtaining high performance in heterogeneous computing systems. In this paper, we present a novel list-based scheduling algorithm called Predict Earliest Finish Time (PEFT) for heterogeneous computing systems. The algorithm has the same time complexity as the state-of-the-art algorithm for the same purpose, that is, O(v²·p) for v tasks and p processors, but offers significant makespan improvements by introducing a look-ahead feature without increasing the time complexity associated with computation of an optimistic cost table (OCT). The calculated value is an optimistic cost because processor availability is not considered in the computation. Our algorithm is only based on an OCT that is used to rank tasks and for processor selection. The analysis and experiments based on randomly generated graphs with various characteristics and graphs of real-world applications show that the PEFT algorithm outperforms the state-of-the-art list-based algorithms for heterogeneous systems in terms of schedule length ratio, efficiency, and frequency of best results.
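
The abstract does not spell out the OCT recurrence; the sketch below shows one plausible backward-pass formulation that ignores processor availability (hence "optimistic"). It follows our reading of PEFT and is illustrative rather than the authors' reference implementation; succ, w, and comm are assumed inputs.

```python
# Hedged sketch: an optimistic cost table (OCT) for a task DAG, computed by a
# backward traversal that ignores processor availability. Illustrative only.
def optimistic_cost_table(tasks, procs, succ, w, comm):
    """tasks: topologically ordered task list; succ[t]: successors of t;
    w[t][p]: execution time of t on processor p; comm[(t, s)]: average
    communication cost of edge t->s (charged only when processors differ)."""
    oct_ = {t: {p: 0.0 for p in procs} for t in tasks}
    for t in reversed(tasks):                    # exit tasks keep OCT = 0
        for p in procs:
            worst = 0.0
            for s in succ.get(t, []):
                best = min(
                    oct_[s][q] + w[s][q] + (comm[(t, s)] if q != p else 0.0)
                    for q in procs
                )
                worst = max(worst, best)
            oct_[t][p] = worst
    return oct_
```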

460 citations


Journal ArticleDOI
TL;DR: This paper is a review of AFSA algorithm and describes the evolution of this algorithm along with all improvements, its combination with various methods as well as its applications.
Abstract: AFSA (artificial fish-swarm algorithm) is one of the best methods of optimization among the swarm intelligence algorithms. This algorithm is inspired by the collective movement of the fish and their various social behaviors. Based on a series of instinctive behaviors, the fish always try to maintain their colonies and accordingly demonstrate intelligent behaviors. Searching for food, immigration and dealing with dangers all happen in a social form and interactions between all fish in a group will result in an intelligent social behavior. This algorithm has many advantages including high convergence speed, flexibility, fault tolerance and high accuracy. This paper is a review of AFSA algorithm and describes the evolution of this algorithm along with all improvements, its combination with various methods as well as its applications. There are many optimization methods which have an affinity with this method and the result of this combination will improve the performance of this method. Its disadvantages include high time complexity, lack of balance between global and local search, in addition to lack of benefiting from the experiences of group members for the next movements.

333 citations


Journal ArticleDOI
TL;DR: This study proves that the time complexity of the EMD/EEMD is actually equivalent to that of the Fourier Transform.
Abstract: It has been claimed that the empirical mode decomposition (EMD) and its improved version the ensemble EMD (EEMD) are computation intensive. In this study we will prove that the time complexity of the EMD/EEMD, which has never been analyzed before, is actually equivalent to that of the Fourier Transform. Numerical examples are presented to verify that EMD/EEMD is, in fact, a computationally efficient method.

324 citations


Journal ArticleDOI
TL;DR: It is shown that even approximating the minimum number of variables that need to be affected within a multiplicative factor of c log n is NP-hard for some positive c, and that it is possible to find sets of variables matching this inapproximability barrier in polynomial time.
Abstract: Given a linear system, we consider the problem of finding a small set of variables to affect with an input so that the resulting system is controllable. We show that this problem is NP-hard; indeed, we show that even approximating the minimum number of variables that need to be affected within a multiplicative factor of c log n is NP-hard for some positive c. On the positive side, we show it is possible to find sets of variables matching this inapproximability barrier in polynomial time. This can be done with a simple greedy heuristic which sequentially picks variables to maximize the rank increase of the controllability matrix. Experiments on Erdős–Rényi random graphs demonstrate that this heuristic almost always succeeds at finding the minimum number of variables.
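
A minimal numpy sketch of the greedy heuristic described above: repeatedly add the variable whose actuation most increases the rank of the controllability matrix. The system matrix and helper names are illustrative.

```python
import numpy as np

def controllability_rank(A, cols):
    """Rank of [B, AB, ..., A^{n-1}B] where B has a unit column for each index in cols."""
    n = A.shape[0]
    if not cols:
        return 0
    B = np.eye(n)[:, sorted(cols)]
    blocks, M = [B], B
    for _ in range(n - 1):
        M = A @ M
        blocks.append(M)
    return np.linalg.matrix_rank(np.hstack(blocks))

def greedy_actuators(A):
    """Greedily pick variables until the system is controllable (rank n)."""
    n = A.shape[0]
    chosen = set()
    while controllability_rank(A, chosen) < n:
        gains = {v: controllability_rank(A, chosen | {v}) for v in range(n) if v not in chosen}
        chosen.add(max(gains, key=gains.get))
    return sorted(chosen)

A = np.random.RandomState(1).randn(6, 6)   # illustrative system matrix
print(greedy_actuators(A))
```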

320 citations


Posted Content
TL;DR: In this article, the authors consider several well-studied problems in dynamic algorithms and prove that sufficient progress on any of them would imply a breakthrough on one of five major open problems in the theory of algorithms.
Abstract: We consider several well-studied problems in dynamic algorithms and prove that sufficient progress on any of them would imply a breakthrough on one of five major open problems in the theory of algorithms: 1. Is the 3SUM problem on $n$ numbers in $O(n^{2-\epsilon})$ time for some $\epsilon>0$? 2. Can one determine the satisfiability of a CNF formula on $n$ variables in $O((2-\epsilon)^n poly n)$ time for some $\epsilon>0$? 3. Is the All Pairs Shortest Paths problem for graphs on $n$ vertices in $O(n^{3-\epsilon})$ time for some $\epsilon>0$? 4. Is there a linear time algorithm that detects whether a given graph contains a triangle? 5. Is there an $O(n^{3-\epsilon})$ time combinatorial algorithm for $n\times n$ Boolean matrix multiplication? The problems we consider include dynamic versions of bipartite perfect matching, bipartite maximum weight matching, single source reachability, single source shortest paths, strong connectivity, subgraph connectivity, diameter approximation and some nongraph problems such as Pagh's problem defined in a recent paper by Patrascu [STOC 2010].

Proceedings ArticleDOI
05 Jan 2014
TL;DR: Improved approximations for two variants of the cardinality constraint for non-monotone functions are presented, along with a simple randomized greedy approach in which each step chooses a random element from a set of "reasonably good" elements.
Abstract: We consider the problem of maximizing a (non-monotone) submodular function subject to a cardinality constraint. In addition to capturing well-known combinatorial optimization problems, e.g., Max-k-Coverage and Max-Bisection, this problem has applications in other more practical settings such as natural language processing, information retrieval, and machine learning. In this work we present improved approximations for two variants of the cardinality constraint for non-monotone functions. When at most k elements can be chosen, we improve the current best 1/e − o(1) approximation to a factor that is in the range [1/e + 0.004, 1/2], achieving a tight approximation of 1/2 − o(1) for k = n/2 and breaking the 1/e barrier for all values of k. When exactly k elements must be chosen, our algorithms improve the current best 1/4 − o(1) approximation to a factor that is in the range [0.356, 1/2], again achieving a tight approximation of 1/2 − o(1) for k = n/2. Additionally, some of the algorithms we provide are very fast with time complexities of O(nk), as opposed to previous known algorithms which are continuous in nature, and thus, too slow for applications in the practical settings mentioned above. Our algorithms are based on two new techniques. First, we present a simple randomized greedy approach where in each step a random element is chosen from a set of "reasonably good" elements. This approach might be considered a natural substitute for the greedy algorithm of Nemhauser, Wolsey and Fisher [45], as it retains the same tight guarantee of 1 − 1/e for monotone objectives and the same time complexity of O(nk), while giving an approximation of 1/e for general non-monotone objectives (while the greedy algorithm of Nemhauser et al. fails to provide any constant guarantee). Second, we extend the double greedy technique, which achieves a tight 1/2 approximation for unconstrained submodular maximization, to the continuous setting. This allows us to manipulate the natural rates by which elements change, thus bounding the total number of elements chosen.
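
A hedged sketch of the randomized greedy idea: each step samples uniformly from the (up to) k elements of largest marginal gain, which is one common way to formalize the "reasonably good" candidate set; the dummy-element padding used in the paper's analysis is only crudely approximated here.

```python
import random

def random_greedy(f, ground_set, k):
    """Randomized greedy for max f(S) subject to |S| <= k, with f given as a
    value oracle. Each step samples uniformly from the (up to) k elements of
    largest marginal gain; the paper pads the candidate set with dummy elements
    so that negative-gain additions can be skipped."""
    S = set()
    for _ in range(k):
        remaining = [e for e in ground_set if e not in S]
        if not remaining:
            break
        gains = {e: f(S | {e}) - f(S) for e in remaining}
        top = sorted(remaining, key=gains.get, reverse=True)[:k]
        chosen = random.choice(top)
        if gains[chosen] > 0:          # crude stand-in for the dummy-element padding
            S.add(chosen)
    return S

# Example: max-k-coverage, a submodular objective mentioned in the abstract.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 5, 6}}
cover = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(random_greedy(cover, list(sets), k=2))
```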

Posted Content
TL;DR: This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.
Abstract: In this paper, we initiate a systematic investigation of differentially private algorithms for convex empirical risk minimization. Various instantiations of this problem have been studied before. We provide new algorithms and matching lower bounds for private ERM assuming only that each data point's contribution to the loss function is Lipschitz bounded and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal non-private running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for $(\epsilon,0)$- and $(\epsilon,\delta)$-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contributions of each data point to the loss function is smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.

Journal ArticleDOI
TL;DR: This work improves DeLong's algorithm by reducing the order of time complexity from quadratic down to linearithmic (the product of sample size and its logarithm).
Abstract: Among algorithms for comparing the areas under two or more correlated receiver operating characteristic (ROC) curves, DeLong's algorithm is perhaps the most widely used one due to its simplicity of implementation in practice. Unfortunately, however, the time complexity of DeLong's algorithm is of quadratic order (the product of sample sizes), thus making it time-consuming and impractical when the sample sizes are large. Based on an equivalent relationship between the Heaviside function and mid-ranks of samples, we improve DeLong's algorithm by reducing the order of time complexity from quadratic down to linearithmic (the product of sample size and its logarithm). Monte Carlo simulations verify the computational efficiency of our algorithmic findings in this work.
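
A minimal sketch of the mid-rank idea for the AUC itself (the variance and covariance terms of DeLong's test are omitted); scipy's rankdata supplies the average ranks in O(n log n):

```python
import numpy as np
from scipy.stats import rankdata   # average (mid-) ranks via sorting

def auc_midrank(scores_pos, scores_neg):
    """AUC via mid-ranks: equivalent to averaging the Heaviside placement values
    (ties counted as 1/2) but in O(n log n) instead of the O(n_pos * n_neg) loop."""
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    m, n = len(scores_pos), len(scores_neg)
    ranks = rankdata(np.concatenate([scores_pos, scores_neg]))   # mid-ranks over pooled sample
    return (ranks[:m].sum() - m * (m + 1) / 2) / (m * n)

# Tiny check against the pairwise definition.
print(auc_midrank([0.9, 0.8, 0.4], [0.5, 0.3]))   # 5/6 of positive/negative pairs correctly ordered
```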

Journal ArticleDOI
TL;DR: This article gives an algorithm that computes a (1 − ε)-approximate maximum weight matching in O(mε⁻¹ log ε⁻¹) time, that is, optimal linear time for any fixed ε, and should be appealing in all applications that can tolerate a negligible relative error.
Abstract: The maximum cardinality and maximum weight matching problems can be solved in O(m√n) time, a bound that has resisted improvement despite decades of research. (Here m and n are the number of edges and vertices.) In this article, we demonstrate that this "m√n barrier" can be bypassed by approximation. For any ε > 0, we give an algorithm that computes a (1 − ε)-approximate maximum weight matching in O(mε⁻¹ log ε⁻¹) time, that is, optimal linear time for any fixed ε. Our algorithm is dramatically simpler than the best exact maximum weight matching algorithms on general graphs and should be appealing in all applications that can tolerate a negligible relative error.

Proceedings ArticleDOI
18 Oct 2014
TL;DR: It is proved that sufficient progress on any of several well-studied dynamic problems (including dynamic versions of bipartite perfect matching, bipartite maximum weight matching, single source reachability, single source shortest paths, strong connectivity, subgraph connectivity, diameter approximation and some nongraph problems) would imply a breakthrough on one of five major open problems in the theory of algorithms.
Abstract: We consider several well-studied problems in dynamic algorithms and prove that sufficient progress on any of them would imply a breakthrough on one of five major open problems in the theory of algorithms: 1) Is the 3SUM problem on n numbers in O(n^{2−ε}) time for some ε > 0? 2) Can one determine the satisfiability of a CNF formula on n variables and poly n clauses in O((2 − ε)^n poly n) time for some ε > 0? 3) Is the All Pairs Shortest Paths problem for graphs on n vertices in O(n^{3−ε}) time for some ε > 0? 4) Is there a linear time algorithm that detects whether a given graph contains a triangle? 5) Is there an O(n^{3−ε}) time combinatorial algorithm for n × n Boolean matrix multiplication? The problems we consider include dynamic versions of bipartite perfect matching, bipartite maximum weight matching, single source reachability, single source shortest paths, strong connectivity, subgraph connectivity, diameter approximation and some nongraph problems such as Pagh's problem defined in a recent paper by Pătraşcu [STOC 2010].

Journal ArticleDOI
TL;DR: A new combinatorial method is proposed that builds a system of equations connecting counts of orbits from graphlets with up to five nodes, which allows all orbit counts to be computed by enumerating just a single one.
Abstract: Motivation: Small induced subgraphs called graphlets are emerging as a possible tool for exploration of global and local structure of networks and for analysis of roles of individual nodes. One of the obstacles to their wider use is the computational complexity of algorithms for their discovery and counting. Results: We propose a new combinatorial method for counting graphlets and orbit signatures of network nodes. The algorithm builds a system of equations that connect counts of orbits from graphlets with up to five nodes, which allows all orbit counts to be computed by enumerating just a single one. This reduces its practical time complexity in sparse graphs by an order of magnitude as compared with the existing pure enumeration-based algorithms. Availability and implementation: Source code is available freely at http://www.biolab.si/supp/orca/orca.html.

Proceedings ArticleDOI
14 Dec 2014
TL;DR: It is shown that a recent result can be exploited to allow meaningful averaging of 'warped' time series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.
Abstract: Recent years have seen significant progress in improving both the efficiency and effectiveness of time series classification. However, because the best solution is typically the Nearest Neighbor algorithm with the relatively expensive Dynamic Time Warping as the distance measure, successful deployments on resource constrained devices remain elusive. Moreover, the recent explosion of interest in wearable devices, which typically have limited computational resources, has created a growing need for very efficient classification algorithms. A commonly used technique to glean the benefits of the Nearest Neighbor algorithm, without inheriting its undesirable time complexity, is to use the Nearest Centroid algorithm. However, because of the unique properties of (most) time series data, the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this work we show that we can exploit a recent result to allow meaningful averaging of 'warped' time series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.
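
A minimal sketch of the nearest-'centroid' scheme: one averaged series per class, one DTW comparison per class at test time. The paper's key ingredient is averaging under DTW alignment (DBA); the per-class arithmetic mean below is only a placeholder for it, and equal-length series are assumed.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def nearest_centroid_predict(X_train, y_train, query):
    """One 'centroid' per class, then a single DTW comparison per class at test
    time (versus one per training instance for 1-NN). NOTE: np.mean is only a
    placeholder; the paper averages under DTW alignment (DBA)."""
    centroids = {c: np.mean([x for x, y in zip(X_train, y_train) if y == c], axis=0)
                 for c in set(y_train)}
    return min(centroids, key=lambda c: dtw(query, centroids[c]))
```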

Proceedings ArticleDOI
05 Jan 2014
TL;DR: A new framework for approximately solving flow problems in capacitated, undirected graphs is introduced and it is applied to provide asymptotically faster algorithms for the maximum s-t flow and maximum concurrent multicommodity flow problems.
Abstract: In this paper, we introduce a new framework for approximately solving flow problems in capacitated, undirected graphs and apply it to provide asymptotically faster algorithms for the maximum s-t flow and maximum concurrent multicommodity flow problems. For graphs with n vertices and m edges, it allows us to find an ε-approximate maximum s-t flow in time O(m^{1+o(1)} ε^{−2}), improving on the previous best bound of O(mn^{1/3} poly(ε^{−1})). Applying the same framework in the multicommodity setting solves a maximum concurrent multicommodity flow problem with k commodities in O(m^{1+o(1)} ε^{−2} k²) time, improving on the existing bound of O(m^{4/3} poly(k, ε^{−1})). Our algorithms utilize several new technical tools that we believe may be of independent interest: • We give a non-Euclidean generalization of gradient descent and provide bounds on its performance. Using this, we show how to reduce approximate maximum flow and maximum concurrent flow to oblivious routing. • We define and provide an efficient construction of a new type of flow sparsifier. Previous sparsifier constructions approximately preserved the size of cuts and, by duality, the value of the maximum flows as well. However, they did not provide any direct way to route flows in the sparsifier G' back in the original graph G, leading to a longstanding gap between the efficacy of sparsification on flow and cut problems. We ameliorate this by constructing a sparsifier G' that can be embedded (very efficiently) into G with low congestion, allowing one to transfer flows from G' back to G. • We give the first almost-linear-time construction of an O(m^{o(1)})-competitive oblivious routing scheme. No previous such algorithm ran in time better than Ω(mn). By reducing the running time to almost-linear, our work provides a powerful new primitive for constructing very fast graph algorithms. The interested reader is referred to the full version of the paper [8] for a more complete treatment of these results.

Proceedings ArticleDOI
Moritz Hardt1
18 Oct 2014
TL;DR: A new algorithm based on alternating minimization is given that provably recovers an unknown low-rank matrix from a random subsample of its entries under a standard incoherence assumption and gives the strongest sample bounds among all subquadratic time algorithms that we are aware of.
Abstract: Alternating minimization is a widely used and empirically successful heuristic for matrix completion and related low-rank optimization problems. Theoretical guarantees for alternating minimization have been hard to come by and are still poorly understood. This is in part because the heuristic is iterative and non-convex in nature. We give a new algorithm based on alternating minimization that provably recovers an unknown low-rank matrix from a random subsample of its entries under a standard incoherence assumption. Our results reduce the sample size requirements of the alternating minimization approach by at least a quartic factor in the rank and the condition number of the unknown matrix. These improvements apply even if the matrix is only close to low-rank in the Frobenius norm. Our algorithm runs in nearly linear time in the dimension of the matrix and, in a broad range of parameters, gives the strongest sample bounds among all subquadratic time algorithms that we are aware of. Underlying our work is a new robust convergence analysis of the well-known Power Method for computing the dominant singular vectors of a matrix. This viewpoint leads to a conceptually simple understanding of alternating minimization. In addition, we contribute a new technique for controlling the coherence of intermediate solutions arising in iterative algorithms based on a smoothed analysis of the QR factorization. These techniques may be of interest beyond their application here.
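
For orientation, a sketch of plain alternating least squares for matrix completion; the paper's additional ingredients (careful initialization, resampling, and the coherence control of intermediate solutions) are omitted, and parameter values are illustrative.

```python
import numpy as np

def altmin_complete(M_obs, mask, rank, iters=30, reg=1e-6):
    """Alternating least squares for matrix completion.
    M_obs: observed values (arbitrary where mask is False); mask: boolean matrix
    of observed entries; returns a rank-`rank` approximation U @ V.T."""
    n1, n2 = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n1, rank))
    V = rng.standard_normal((n2, rank))
    for _ in range(iters):
        # Fix V, solve a small ridge-regularized least squares per row of U.
        for i in range(n1):
            idx = mask[i]
            A = V[idx]
            U[i] = np.linalg.solve(A.T @ A + reg * np.eye(rank), A.T @ M_obs[i, idx])
        # Fix U, update each row of V symmetrically.
        for j in range(n2):
            idx = mask[:, j]
            A = U[idx]
            V[j] = np.linalg.solve(A.T @ A + reg * np.eye(rank), A.T @ M_obs[idx, j])
    return U @ V.T
```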

Posted Content
TL;DR: In this article, the authors provide evidence that the edit distance (the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another) cannot be computed in time O(n^{2−δ}) for any constant δ > 0 unless the Strong Exponential Time Hypothesis fails.
Abstract: The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classical computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time. In this paper we provide evidence that the near-quadratic running time bounds known for the problem of computing edit distance might be tight. Specifically, we show that, if the edit distance can be computed in time $O(n^{2-\delta})$ for some constant $\delta>0$, then the satisfiability of conjunctive normal form formulas with $N$ variables and $M$ clauses can be solved in time $M^{O(1)} 2^{(1-\epsilon)N}$ for a constant $\epsilon>0$. The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist.
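
For reference, the well-known quadratic dynamic program the abstract refers to; the paper's contribution is evidence that this bound cannot be improved to strongly subquadratic time without violating SETH.

```python
def edit_distance(s, t):
    """Classic dynamic program: O(len(s) * len(t)) time, the running time the
    paper argues is essentially optimal under the Strong Exponential Time Hypothesis."""
    n, m = len(s), len(t)
    prev = list(range(m + 1))                 # distances from the empty prefix of s
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitution / match
        prev = cur
    return prev[m]

print(edit_distance("kitten", "sitting"))   # 3
```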

Proceedings ArticleDOI
05 Jan 2014
TL;DR: It is shown how to construct a q-representative family of size at most (p+q choose p) in time bounded by a polynomial in (p+q choose p), t, and the time required for field operations, demonstrating how the efficient construction of representative families can be a powerful tool for designing single-exponential parameterized and exact exponential time algorithms.
Abstract: Let M = (E, I) be a matroid and let S = {S1, ..., St} be a family of subsets of E of size p. A subfamily Ŝ ⊆ S is q-representative for S if for every set Y ⊆ E of size at most q, if there is a set X ∈ S disjoint from Y with X ∪ Y ∈ I, then there is a set X̂ ∈ Ŝ disjoint from Y with X̂ ∪ Y ∈ I. By the classical result of Bollobás, in a uniform matroid, every family of sets of size p has a q-representative family with at most (p+q choose p) sets. In his famous "two families theorem" from 1977, Lovász proved that the same bound also holds for any matroid representable over a field F. As observed by Marx, Lovász's proof is constructive. In this paper we show how Lovász's proof can be turned into an algorithm constructing a q-representative family of size at most (p+q choose p) in time bounded by a polynomial in (p+q choose p), t, and the time required for field operations. We demonstrate how the efficient construction of representative families can be a powerful tool for designing single-exponential parameterized (2^{O(k)}) and exact exponential time (2^{O(n)}) algorithms. The applications of our approach include the following. • In the Long Directed Cycle problem the input is a directed n-vertex graph G and the positive integer k. The task is to find a directed cycle of length at least k in G, if such a cycle exists. As a consequence of our 8^{k+o(k)} n^{O(1)} time algorithm, we have that a directed cycle of length at least log n, if such a cycle exists, can be found in polynomial time. As it was shown by Björklund, Husfeldt, and Khanna [ICALP 2004], under an appropriate complexity assumption, it is impossible to improve this guarantee by more than a constant factor. Thus our algorithm not only improves over the best previous log n/log log n bound of Gabow and Nie [SODA 2004] but also closes the gap between known lower and upper bounds for this problem. • In the Minimum Equivalent Graph (MEG) problem we are seeking a spanning subdigraph D' of a given n-vertex digraph D with as few arcs as possible in which the reachability relation is the same as in the original digraph D. The existence of a single-exponential c^n-time algorithm for some constant c > 1 for MEG was open since the work of Moyles and Thompson [JACM 1969]. • To demonstrate the diversity of applications of the approach, we provide an alternative proof of the results recently obtained by Bodlaender, Cygan, Kratsch and Nederlof for algorithms on graphs of bounded treewidth, who showed that many "connectivity" problems such as Hamiltonian Cycle or Steiner Tree can be solved in time 2^{O(t)} n on n-vertex graphs of treewidth at most t. We believe that expressing graph problems in "matroid language" sheds light on what makes it possible to solve connectivity problems in single-exponential time parameterized by treewidth. For the special case of uniform matroids on n elements, we give a faster algorithm computing a representative family in time O(((p+q)/q)^q · 2^{o(p+q)} · t · log n). We use this algorithm to provide the fastest known deterministic parameterized algorithms for k-Path, k-Tree, and more generally, for k-Subgraph Isomorphism, where the k-vertex pattern graph is of constant treewidth. For example, our k-Path algorithm runs in time O(2.851^k n log² n log W) on weighted graphs with maximum edge weight W.

Proceedings ArticleDOI
01 Jun 2014
TL;DR: A much faster model whose time complexity is linear in the number of sentences is developed, with two linear-chain CRFs applied in cascade as local classifiers; a novel post-editing approach, which modifies a fully-built tree by considering information from constituents on upper levels, can further improve the accuracy.
Abstract: Text-level discourse parsing remains a challenge. The current state-of-the-art overall accuracy in relation assignment is 55.73%, achieved by Joty et al. (2013). However, their model has a high order of time complexity, and thus cannot be applied in practice. In this work, we develop a much faster model whose time complexity is linear in the number of sentences. Our model adopts a greedy bottom-up approach, with two linear-chain CRFs applied in cascade as local classifiers. To enhance the accuracy of the pipeline, we add additional constraints in the Viterbi decoding of the first CRF. In addition to efficiency, our parser also significantly outperforms the state of the art. Moreover, our novel approach of post-editing, which modifies a fully-built tree by considering information from constituents on upper levels, can further improve the accuracy.

Journal ArticleDOI
TL;DR: Priority-Flood as discussed by the authors uses a priority queue to determine the next cell to be flooded, which is optimal for both integer and floating-point data, working in O(n) and O(n log₂ n) time, respectively.
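
A minimal sketch of the basic priority-queue flooding idea for depression-filling a DEM (seed the queue with the border cells, repeatedly pop the lowest cell, and raise unvisited neighbours to at least its level); the O(n) integer-data variant and the paper's other improvements are not shown.

```python
import heapq

def priority_flood_fill(dem):
    """Fill depressions in a 2-D DEM (list of lists of elevations) with the basic
    Priority-Flood: process cells from a min-priority queue seeded with the border."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    seen = [[False] * cols for _ in range(rows)]
    pq = []
    for r in range(rows):                      # seed with all edge cells
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(pq, (filled[r][c], r, c))
                seen[r][c] = True
    while pq:
        elev, r, c = heapq.heappop(pq)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                seen[nr][nc] = True
                filled[nr][nc] = max(filled[nr][nc], elev)   # raise pits to the spill level
                heapq.heappush(pq, (filled[nr][nc], nr, nc))
    return filled
```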

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a new efficient algorithm to maintain the core number for every node in a dynamic graph, where only certain nodes need to update their core numbers given the graph is changed by inserting/deleting an edge.
Abstract: The k-core decomposition in a graph is a fundamental problem for social network analysis. The problem of k-core decomposition is to calculate the core number for every node in a graph. Previous studies mainly focus on k-core decomposition in a static graph. There exists a linear time algorithm for k-core decomposition in a static graph. However, in many real-world applications such as online social networks and the Internet, the graph typically evolves over time. Under such applications, a key issue is to maintain the core numbers of nodes given the graph changes over time. In this paper, we propose a new efficient algorithm to maintain the core number for every node in a dynamic graph. Our main result is that only certain nodes need to update their core numbers given the graph is changed by inserting/deleting an edge. We devise an efficient algorithm to identify and recompute the core numbers of such nodes. The complexity of our algorithm is independent of the graph size. In addition, to further accelerate the algorithm, we develop two pruning strategies by exploiting the lower and upper bounds of the core number. Finally, we conduct extensive experiments over both real-world and synthetic datasets, and the results demonstrate the efficiency of the proposed algorithm.
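
For context, a compact sketch of the static peeling baseline mentioned above; the paper's contribution is maintaining these core numbers under edge insertions and deletions without recomputing them. The heap-based version below trades the linear-time bucket queue for brevity.

```python
import heapq

def core_numbers(adj):
    """Static k-core decomposition by min-degree peeling.
    adj: dict node -> set of neighbours. With a bucket queue this is O(n + m);
    the heap used here is O(m log n) but keeps the sketch short."""
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    core, removed = {}, set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                               # stale heap entry
        core[v] = d                                # degree at removal = core number
        removed.add(v)
        for u in adj[v]:
            if u not in removed and deg[u] > d:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return core

# A triangle with a pendant vertex: the triangle is the 2-core, the pendant has core 1.
print(core_numbers({0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}))
```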

Posted Content
TL;DR: In this article, the authors proposed Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix, which enables the use of Fast Fourier Transformation to speed up the computation.
Abstract: Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix. The circulant structure enables the use of Fast Fourier Transformation to speed up the computation. Compared to methods that use unstructured matrices, the proposed method improves the time complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d\log{d})$, and the space complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$ where $d$ is the input dimensionality. We also propose a novel time-frequency alternating optimization to learn data-dependent circulant projections, which alternatively minimizes the objective in original and Fourier domains. We show by extensive experiments that the proposed approach gives much better performance than the state-of-the-art approaches for fixed time, and provides much faster computation with no performance degradation for fixed number of bits.
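
A minimal sketch of the randomized (data-independent) variant of CBE: the circulant projection reduces to an FFT-domain multiplication, giving O(d log d) time and O(d) parameters. The learned, data-dependent projection from the paper's time-frequency optimization is not shown.

```python
import numpy as np

def cbe_randomized(X, seed=0):
    """Circulant Binary Embedding, randomized variant.
    Projects by a circulant matrix C(r) after random sign flips, computed via
    FFT in O(d log d) per point instead of a dense O(d^2) matrix multiply."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    r = rng.standard_normal(d)                 # first column of the circulant matrix
    signs = rng.choice([-1.0, 1.0], size=d)    # random Bernoulli sign flips
    R = np.fft.fft(r)
    proj = np.fft.ifft(np.fft.fft(X * signs, axis=1) * R, axis=1).real
    return (proj >= 0).astype(np.uint8)        # d-bit binary code per point

codes = cbe_randomized(np.random.default_rng(1).standard_normal((4, 16)))
print(codes.shape)   # (4, 16)
```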

Journal ArticleDOI
TL;DR: Many of the results also rule out the existence of compression algorithms, a notion similar to kernelization defined by Harnik and Naor [2007], for the problems in question.
Abstract: In parameterized complexity, each problem instance comes with a parameter k, and a parameterized problem is said to admit a polynomial kernel if there are polynomial time preprocessing rules that reduce the input instance to an instance with size polynomial in k. Many problems have been shown to admit polynomial kernels, but it is only recently that a framework for showing the nonexistence of polynomial kernels for specific problems has been developed by Bodlaender et al. [2009] and Fortnow and Santhanam [2008]. With few exceptions, all known kernelization lower bound results have been obtained by directly applying this framework. In this article, we show how to combine these results with combinatorial reductions that use colors and IDs in order to prove kernelization lower bounds for a variety of basic problems. A summary of our main results follows. All results are under the assumption that the polynomial hierarchy does not collapse to the third level. — We show that the Steiner Tree problem parameterized by the number of terminals and solution size k, and the Connected Vertex Cover and Capacitated Vertex Cover problems do not admit a polynomial kernel. The two latter results are surprising because the closely related Vertex Cover problem admits a kernel with at most 2k vertices. — Alon and Gutner [2008] obtain a k^{poly(h)} kernel for Dominating Set in H-Minor Free Graphs parameterized by h = |V(H)| and solution size k, and ask whether kernels of smaller size exist. We partially resolve this question by showing that Dominating Set in H-Minor Free Graphs does not admit a kernel with size polynomial in k + h. — Harnik and Naor [2007] obtain a "compression algorithm" for the Sparse Subset Sum problem. We show that their algorithm is essentially optimal by showing that the instances cannot be compressed further. — The Hitting Set and Set Cover problems are among the most-studied problems in algorithmics. Both problems admit a kernel of size k^{O(d)} when parameterized by solution size k and maximum set size d. We show that neither of them, along with the Unique Coverage and Bounded Rank Disjoint Sets problems, admits a polynomial kernel. The existence of polynomial kernels for several of the problems mentioned previously was an open problem explicitly stated in the literature [Alon and Gutner 2008; Betzler 2006; Guo and Niedermeier 2007; Guo et al. 2007; Moser et al. 2007]. Many of our results also rule out the existence of compression algorithms, a notion similar to kernelization defined by Harnik and Naor [2007], for the problems in question.

Proceedings ArticleDOI
31 May 2014
TL;DR: The time complexity of approximating weighted (undirected) shortest paths on distributed networks with an O(log n) bandwidth restriction on edges is studied, yielding a sublinear-time algorithm with an almost optimal solution.
Abstract: A distributed network is modeled by a graph having n nodes (processors) and diameter D. We study the time complexity of approximating weighted (undirected) shortest paths on distributed networks with an O(log n) bandwidth restriction on edges (the standard synchronous CONGEST model). The question whether approximation algorithms help speed up the shortest paths and distance computation (more precisely distance computation) was raised since at least 2004 by Elkin (SIGACT News 2004). The unweighted case of this problem is well-understood while its weighted counterpart is a fundamental problem in the area of distributed approximation algorithms and remains widely open. We present new algorithms for computing both single-source shortest paths (SSSP) and all-pairs shortest paths (APSP) in the weighted case. Our main result is an algorithm for SSSP. Previous results are the classic O(n)-time Bellman-Ford algorithm and an O(n^{1/2+1/(2k)} + D)-time (8k⌈log(k + 1)⌉ − 1)-approximation algorithm, for any integer k ≥ 1, which follows from the result of Lenzen and Patt-Shamir (STOC 2013). (Note that Lenzen and Patt-Shamir in fact solve a harder problem, and we use O(·) to hide the O(poly log n) term.) We present an O(n^{1/2} D^{1/4} + D)-time (1 + o(1))-approximation algorithm for SSSP. This algorithm is sublinear-time as long as D is sublinear, thus yielding a sublinear-time algorithm with almost optimal solution. When D is small, our running time matches the lower bound of Ω(n^{1/2} + D) by Das Sarma et al. (SICOMP 2012), which holds even when D = Θ(log n), up to a poly log n factor. As a by-product of our technique, we obtain a simple O(n)-time (1 + o(1))-approximation algorithm for APSP, improving the previous O(n)-time O(1)-approximation algorithm following from the results of Lenzen and Patt-Shamir. We also prove a matching lower bound. Our techniques also yield an O(n^{1/2}) time algorithm on fully-connected networks, which guarantees an exact solution for SSSP and a (2 + o(1))-approximate solution for APSP. All our algorithms rely on two new simple tools: a light-weight algorithm for bounded-hop SSSP and shortest-path diameter reduction via shortcuts. These tools might be of independent interest and useful in designing other distributed algorithms.

Journal ArticleDOI
TL;DR: It is argued that combining previously known preprocessing rules with the most straightforward branching algorithm yields an O*(2.618^k) algorithm for the problem, and a kernel is obtained for the standard parameterization of Vertex Cover with at most 2k − c log k vertices, simpler than previously known kernels achieving the same size bound.
Abstract: We investigate the parameterized complexity of Vertex Cover parameterized by the difference between the size of the optimal solution and the value of the linear programming (LP) relaxation of the problem. By carefully analyzing the change in the LP value in the branching steps, we argue that combining previously known preprocessing rules with the most straightforward branching algorithm yields an O*(2.618^k) algorithm for the problem. Here, k is the excess of the vertex cover size over the LP optimum, and we write O*(f(k)) for a time complexity of the form O(f(k) n^{O(1)}). We proceed to show that a more sophisticated branching algorithm achieves a running time of O*(2.3146^k). Following this, using previously known as well as new reductions, we give O*(2.3146^k) algorithms for the parameterized versions of Above Guarantee Vertex Cover, Odd Cycle Transversal, Split Vertex Deletion, and Almost 2-SAT, and O*(1.5214^k) algorithms for König Vertex Deletion and Vertex Cover parameterized by the size of the smallest odd cycle transversal and König vertex deletion set. These algorithms significantly improve the best known bounds for these problems. The most notable improvement among these is the new bound for Odd Cycle Transversal — this is the first algorithm that improves on the dependence on k of the seminal O*(3^k) algorithm of Reed, Smith, and Vetta. Finally, using our algorithm, we obtain a kernel for the standard parameterization of Vertex Cover with at most 2k − c log k vertices. Our kernel is simpler than previously known kernels achieving the same size bound.
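
The parameter here is k = (size of a minimum vertex cover) − (LP optimum). A small sketch of computing the LP relaxation value with scipy, the quantity against which the excess k is measured (graph and solver choice are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_lp(n, edges):
    """LP relaxation of Vertex Cover: minimize sum(x_v) subject to
    x_u + x_v >= 1 for every edge and 0 <= x_v <= 1."""
    A_ub = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        A_ub[i, u] = A_ub[i, v] = -1.0        # encode -(x_u + x_v) <= -1
    res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=-np.ones(len(edges)),
                  bounds=[(0, 1)] * n, method="highs")
    return res.fun                            # LP optimum; the integral optimum lies between this and twice this value

# Example: a triangle has LP value 1.5 while the smallest vertex cover has size 2,
# so the above-LP excess k equals 0.5 here.
print(vertex_cover_lp(3, [(0, 1), (1, 2), (0, 2)]))   # 1.5
```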

Posted Content
TL;DR: A very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon is considered.
Abstract: In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model, and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al.[2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the MAB model. We demonstrate that a natural and simple extension of the UCB family of algorithms for MAB provides a polynomial time algorithm that has near-optimal regret guarantees for this substantially more general model, and matches the bounds provided by Badanidiyuru et al.[2013] for the special case of BwK, which is quite surprising. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well studied problems/algorithms such as the Blackwell approachability problem, online convex optimization, and the Frank-Wolfe technique for convex optimization. We give examples of several concrete applications, where this more general model of bandits allows for richer and/or more efficient formulations of the problem.
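
For reference, a compact sketch of the plain UCB1 index policy that the paper's approach extends to concave rewards and convex constraints; the extended algorithm itself operates on the more general model and is not reproduced here. The Bernoulli arms are illustrative.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Plain UCB1: play each arm once, then always pull the arm with the largest
    empirical mean plus confidence radius sqrt(2 ln t / n_pulls)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                        # initialization round for each arm
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
    return means, counts

# Illustrative bandit with Bernoulli arms of unknown means.
arm_probs = [0.2, 0.5, 0.7]
means, counts = ucb1(lambda a: float(random.random() < arm_probs[a]), 3, 5000)
print(counts)   # the 0.7 arm should dominate the pull counts
```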