
Showing papers on "Time complexity published in 2019"


Journal ArticleDOI
TL;DR: This paper proposes a parallel diffusion method that maximizes the parallelism of the diffusion stage and achieves a qualitative improvement in efficiency over traditional streaming diffusion methods.

301 citations


Proceedings ArticleDOI
30 Jan 2019
TL;DR: This work proposes SimGNN, a novel neural-network-based approach to the classic yet challenging problem of graph similarity computation, aiming to alleviate the computational burden while preserving good performance, and suggests a new direction for future research on graph similarity computation and graph similarity search.
Abstract: Graph similarity search is among the most important graph-based applications, e.g., finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but is very costly to compute in practice. Inspired by the recent success of neural network approaches to several graph applications, such as node or graph classification, we propose a novel neural network based approach to address this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving a good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into an embedding vector, which provides a global summary of a graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model achieves better generalization on unseen graphs, and in the worst case runs in quadratic time with respect to the number of nodes in the two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves a smaller error rate and a large time reduction compared with a series of baselines, including several approximation algorithms for GED computation and many existing graph neural network based models. Our study suggests that SimGNN provides a new direction for future research on graph similarity computation and graph similarity search.
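As a hedged illustration of SimGNN's two strategies, the NumPy sketch below shows an attention-pooled graph-level embedding and the quadratic-time pairwise node comparison. The weight matrix W, the sigmoid attention, and the histogram binning are simplified stand-ins for the learned components described above, not the paper's exact model.

```python
# Minimal sketch of SimGNN's two strategies; all shapes and parameters are
# illustrative assumptions, not the reference implementation.
import numpy as np

def graph_embedding(node_embs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Attention-pooled graph-level embedding from node embeddings (n x d)."""
    context = np.tanh(node_embs.mean(axis=0) @ W)      # nonlinear global context (d,)
    att = 1.0 / (1.0 + np.exp(-node_embs @ context))   # sigmoid attention per node
    return (att[:, None] * node_embs).sum(axis=0)      # weighted sum -> (d,) vector

def pairwise_node_features(h1: np.ndarray, h2: np.ndarray, bins: int = 16) -> np.ndarray:
    """Histogram over all n1*n2 node-pair similarities (the worst-case
    quadratic-time part); assumes unit-normalized node embeddings."""
    sims = h1 @ h2.T
    hist, _ = np.histogram(sims, bins=bins, range=(-1.0, 1.0))
    return hist / sims.size
```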

203 citations


Posted Content
TL;DR: This paper determines the amortized computational complexity of the planar dynamic convex hull problem, presenting a data structure with O(log n) amortized time per operation and a matching lower bound on the amortized asymptotic time complexity.
Abstract: In this article, we determine the amortized computational complexity of the planar dynamic convex hull problem by querying. We present a data structure that maintains a set of n points in the plane under the insertion and deletion of points in amortized O(log n) time per operation. The space usage of the data structure is O(n). The data structure supports extreme point queries in a given direction, tangent queries through a given point, and queries for the neighboring points on the convex hull in O(log n) time. The extreme point queries can be used to decide whether or not a given line intersects the convex hull, and the tangent queries to determine whether a given point is inside the convex hull. We give a lower bound on the amortized asymptotic time complexity that matches the performance of this data structure.

184 citations


Journal ArticleDOI
TL;DR: Simulation and performance analysis verify that the new 2D-SLIM modulation map based on the improved two-dimensional closed-loop modulation coupling model has acceptable compression, high security and low time complexity.

171 citations


Journal ArticleDOI
01 Jan 2019
TL;DR: Wang et al. propose a novel graph structure called the Monotonic Relative Neighborhood Graph (MRNG), which guarantees very low search complexity (close to logarithmic time).
Abstract: Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still cannot scale to billion-node databases. In this paper, to further improve the search efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the e-commerce scenario of Taobao (Alibaba Group) and has been integrated into their billion-scale search engine.
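The search side of such graph-based indices is a greedy best-first traversal. The sketch below is a hedged approximation of that traversal, not the NSG reference implementation; it assumes a prebuilt out-edge adjacency list `graph`, a data matrix `data`, and a fixed entry point `start` (the NSG starts from its navigating node).

```python
import heapq
import numpy as np

def greedy_search(graph, data, query, start, k=10, pool=64):
    """Best-first traversal of a proximity graph (adjacency list of out-edges).
    Expands the closest unexpanded candidate until it cannot improve the pool."""
    dist = lambda i: float(np.linalg.norm(data[i] - query))
    visited = {start}
    cand = [(dist(start), start)]          # min-heap: candidates to expand
    best = [(-dist(start), start)]         # max-heap (negated): current result pool
    while cand:
        d, u = heapq.heappop(cand)
        if len(best) >= pool and d > -best[0][0]:
            break                          # nearest candidate is farther than the
                                           # worst pooled result: stop searching
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                dv = dist(v)
                heapq.heappush(cand, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > pool:
                    heapq.heappop(best)    # evict the current farthest result
    return sorted((-nd, v) for nd, v in best)[:k]
```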

156 citations


Proceedings ArticleDOI
23 Jun 2019
TL;DR: A framework for reducing the security of protocols based on the learning with errors (LWE) problem to qualitatively simpler and weaker computational hardness assumptions is presented.
Abstract: We give new instantiations of the Fiat-Shamir transform using explicit, efficiently computable hash functions. We improve over prior work by reducing the security of these protocols to qualitatively simpler and weaker computational hardness assumptions. As a consequence of our framework, we obtain the following concrete results. 1) There exists a succinct publicly verifiable non-interactive argument system for log-space uniform computations, under the assumption that any one of a broad class of fully homomorphic encryption (FHE) schemes has almost optimal security against polynomial-time adversaries. The class includes all FHE schemes in the literature that are based on the learning with errors (LWE) problem. 2) There exists a non-interactive zero-knowledge argument system for NP in the common reference string model, under either of the following two assumptions: (i) Almost optimal hardness of search-LWE against polynomial-time adversaries, or (ii) The existence of a circular-secure FHE scheme with a standard (polynomial time, negligible advantage) level of security. 3) The classic quadratic residuosity protocol of [Goldwasser, Micali, and Rackoff, SICOMP '89] is not zero knowledge when repeated in parallel, under any of the hardness assumptions above.

142 citations


Journal ArticleDOI
TL;DR: Using statistical analyses, it is shown that this approach can protect the image against statistical attacks; the entropy test results illustrate that the entropy values are close to ideal, and hence the proposed algorithm is secure against entropy attacks.
Abstract: In this paper, a novel image encryption algorithm is proposed based on the combination of a chaos sequence and a modified AES algorithm. In this method, the encryption key is generated by an Arnold chaos sequence. Then, the original image is encrypted using the modified AES algorithm with the round keys produced by the chaos system. The proposed approach not only reduces the time complexity of the algorithm but also adds diffusion ability, which makes images encrypted by the proposed algorithm resistant to differential attacks. The key space of the proposed method is large enough to resist brute-force attacks. The method is so sensitive to the initial values and the input image that small changes in these values lead to significant changes in the encrypted image. Using statistical analyses, we show that this approach can protect the image against statistical attacks. The entropy test results illustrate that the entropy values are close to ideal; hence, the proposed algorithm is secure against entropy attacks. The simulation results confirm that small changes in the original image or the key result in significant changes in the encrypted image, and that the original image cannot be recovered.
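As a hedged sketch of the key-generation step only, the snippet below iterates a generalized Arnold cat map and quantizes the trajectory into key bytes. The parameters (a, b, x0, y0) and the byte-extraction rule are illustrative assumptions, not the paper's exact construction, and the modified AES rounds themselves are omitted.

```python
def arnold_key_stream(x0: float, y0: float, a: int = 1, b: int = 1, n: int = 16) -> bytes:
    """Iterate the generalized Arnold cat map on the unit square and quantize
    the x-coordinate of each state into one key byte (illustrative rule)."""
    x, y = x0, y0
    out = []
    for _ in range(n):
        # generalized cat map: classic Arnold map when a = b = 1
        x, y = (x + a * y) % 1.0, (b * x + (a * b + 1) * y) % 1.0
        out.append(int(x * 256) % 256)
    return bytes(out)

round_key = arnold_key_stream(0.3, 0.7)  # 16 bytes, the size of one AES-128 round key
```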

124 citations


Proceedings ArticleDOI
23 Jun 2019
TL;DR: A non-elementary lower bound is established, i.e., the reachability problem requires a tower of exponentials of time and space, which implies that a plethora of problems from formal languages, logic, concurrent systems, process calculi and other areas, that are known to admit reductions from the Petri nets reachability problem, are also not elementary.
Abstract: Petri nets, also known as vector addition systems, are a long established model of concurrency with extensive applications in modelling and analysis of hardware, software and database systems, as well as chemical, biological and business processes. The central algorithmic problem for Petri nets is reachability: whether from the given initial configuration there exists a sequence of valid execution steps that reaches the given final configuration. The complexity of the problem has remained unsettled since the 1960s, and it is one of the most prominent open questions in the theory of verification. Decidability was proved by Mayr in his seminal STOC 1981 work, and the currently best published upper bound, due to Leroux and Schmitz from LICS 2019, is non-primitive recursive (Ackermannian). We establish a non-elementary lower bound, i.e. that the reachability problem needs a tower of exponentials of time and space. Until this work, the best lower bound had been exponential space, due to Lipton in 1976. The new lower bound is a major breakthrough for several reasons. Firstly, it shows that the reachability problem is much harder than the coverability (i.e., state reachability) problem, which is also ubiquitous but has been known to be complete for exponential space since the late 1970s. Secondly, it implies that a plethora of problems from formal languages, logic, concurrent systems, process calculi and other areas, that are known to admit reductions from the Petri nets reachability problem, are also not elementary. Thirdly, it makes obsolete the currently best lower bounds for the reachability problems for two key extensions of Petri nets: with branching and with a pushdown stack. At the heart of our proof is a novel gadget, the so-called factorial amplifier, that, assuming availability of counters that are zero testable and bounded by k, guarantees to produce arbitrarily large pairs of values whose ratio is exactly the factorial of k. We also develop a novel construction that uses arbitrarily large pairs of values with ratio R to provide zero testable counters that are bounded by R. Repeatedly composing the factorial amplifier with itself by means of the construction then enables us to compute in linear time Petri nets that simulate Minsky machines whose counters are bounded by a tower of exponentials, which yields the non-elementary lower bound. By refining this scheme further, we in fact establish hardness for h-exponential space already for Petri nets with h + 13 counters.

119 citations


Journal ArticleDOI
01 Jul 2019
TL;DR: This paper defines the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability, and develops an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity.
Abstract: Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2^N) model evaluations for exact computation and O(N log N) for (ϵ, δ)-approximation. In this paper, we focus on one popular family of ML models relying on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley value of all N data points can be computed, exactly, in O(N log N) time, an exponential improvement on computational complexity! Moreover, for (ϵ, δ)-approximation, we are able to develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity O(N^{h(ϵ, K)} log N) when ϵ is not too small and K is not too large. We empirically evaluate our algorithms on up to 10 million data points, and even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm. The LSH-based approximation algorithm can accelerate the value calculation process even further. We then extend our algorithm to other scenarios such as (1) weighted KNN classifiers, (2) different data points being clustered by different data curators, and (3) data analysts providing computation who also require proper valuation. Some of these extensions, although also exponentially improved, are less practical for exact computation (e.g., O(N^K) complexity for weighted KNN). We thus propose a Monte Carlo approximation algorithm, which is O(N(log N)^2/(log K)^2) times more efficient than the baseline approximation algorithm.
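The exact O(N log N) result for unweighted KNN rests on a simple recursion over training points sorted by distance to the test point. The sketch below follows the closed-form recursion reported in the paper for a single test point; variable names, indexing, and tie-breaking are our own illustrative choices.

```python
import numpy as np

def knn_shapley(X, y, x_test, y_test, K):
    """Exact Shapley value of each training point for an unweighted KNN
    classifier and a single test point. Sorting dominates: O(N log N)."""
    N = len(y)
    order = np.argsort(np.linalg.norm(X - x_test, axis=1))  # nearest first
    match = (y[order] == y_test).astype(float)              # 1 if label agrees
    s = np.zeros(N)
    s[order[N - 1]] = match[N - 1] / N                      # farthest point first
    for i in range(N - 2, -1, -1):                          # 0-based rank i
        s[order[i]] = (s[order[i + 1]]
                       + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1))
    return s
```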

115 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that most of the results relating submodularity and convexity for set-functions can be extended to all submodular functions and provide a new interpretation of existing results for set functions.
Abstract: Submodular set-functions have many applications in combinatorial optimization, as they can be minimized and approximately maximized in polynomial time. A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization. Submodularity goes beyond set-functions and has naturally been considered for problems with multiple labels or for functions defined on continuous domains, where it corresponds essentially to cross second-derivatives being nonpositive. In this paper, we show that most results relating submodularity and convexity for set-functions can be extended to all submodular functions. In particular, (a) we naturally define a continuous extension in a set of probability measures, (b) show that the extension is convex if and only if the original function is submodular, (c) prove that the problem of minimizing a submodular function is equivalent to a typically non-smooth convex optimization problem, and (d) propose another convex optimization problem with better computational properties (e.g., a smooth dual problem). Most of these extensions from the set-function situation are obtained by drawing links with the theory of multi-marginal optimal transport, which also provides a new interpretation of existing results for set-functions. We then provide practical algorithms to minimize generic submodular functions on discrete domains, with associated convergence rates.
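For the classic set-function case that this paper generalizes, the convex extension is the Lovász extension, computable with a single sort. A minimal sketch, assuming F maps a frozenset of indices to a float with F(∅) = 0:

```python
import numpy as np

def lovasz_extension(F, w: np.ndarray) -> float:
    """Lovász extension of a set function F at w in R^n: sort coordinates
    decreasingly and accumulate the marginal gains of F along the resulting
    chain of sets. The extension is convex iff F is submodular."""
    order = np.argsort(-w)
    val, prev, S = 0.0, 0.0, set()
    for j in order:
        S.add(int(j))
        cur = F(frozenset(S))      # value on the next set in the chain
        val += w[j] * (cur - prev) # weight times marginal gain
        prev = cur
    return val
```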

114 citations


Journal ArticleDOI
TL;DR: New criteria for globally uniformly exponential stability and globally uniformly asymptotic stability of the corresponding closed-loop system are derived and an algorithm is presented to solve the gain synthesis problem.

Posted Content
TL;DR: This work introduces a systematic technique for minimizing requisite state preparations by exploiting the simultaneous measurability of partitions of commuting Pauli strings, and encompasses algorithms for efficiently approximating a MIN-COMMUTING-PARTITION, as well as a synthesis tool for compiling simultaneous measurement circuits.
Abstract: Variational quantum eigensolver (VQE) is a promising algorithm suitable for near-term quantum machines. VQE aims to approximate the lowest eigenvalue of an exponentially sized matrix in polynomial time. It minimizes quantum resource requirements both by co-processing with a classical processor and by structuring computation into many subproblems. Each quantum subproblem involves a separate state preparation terminated by the measurement of one Pauli string. However, the number of such Pauli strings scales as $N^4$ for typical problems of interest, a daunting growth rate that poses a serious limitation for emerging applications such as quantum computational chemistry. We introduce a systematic technique for minimizing requisite state preparations by exploiting the simultaneous measurability of partitions of commuting Pauli strings. Our work encompasses algorithms for efficiently approximating a MIN-COMMUTING-PARTITION, as well as a synthesis tool for compiling simultaneous measurement circuits. For representative problems, we achieve 8-30x reductions in state preparations, with minimal overhead in measurement circuit cost. We demonstrate experimental validation of our techniques by estimating the ground state energy of deuteron on an IBM Q 20-qubit machine. We also investigate the underlying statistics of simultaneous measurement and devise an adaptive strategy for mitigating harmful covariance terms.
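A hedged sketch of the partitioning idea: strings that commute qubit-wise can share one measurement circuit, so even a first-fit greedy pass shrinks the number of state preparations. The paper's tools use stronger commutativity criteria and better partition heuristics than this toy version.

```python
def qubitwise_commute(p: str, q: str) -> bool:
    """True if two Pauli strings commute qubit-wise: at every position the
    operators are equal or one of them is the identity 'I'."""
    return all(a == b or a == 'I' or b == 'I' for a, b in zip(p, q))

def greedy_commuting_partition(paulis):
    """First-fit greedy approximation to MIN-COMMUTING-PARTITION: place each
    string into the first group whose members it pairwise commutes with."""
    groups = []
    for p in paulis:
        for g in groups:
            if all(qubitwise_commute(p, q) for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# e.g. greedy_commuting_partition(['XXI', 'XIZ', 'ZZI', 'IIZ'])
# -> [['XXI', 'XIZ', 'IIZ'], ['ZZI']]: two state preparations instead of four
```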

Proceedings ArticleDOI
06 Jan 2019
TL;DR: In this paper, the authors studied the problem of high-dimensional linear regression in a robust model where an e-fraction of the samples can be adversarially corrupted and gave a sample near-optimal and computationally efficient algorithm that draws O(d/e2) labeled examples and outputs a candidate hypothesis vector that approximates the unknown regression vector β within e2-norm O(e log(1/e)σ, where σ is the standard deviation of the random observation noise.
Abstract: We study the prototypical problem of high-dimensional linear regression in a robust model where an e-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution N(0, Σ) on Rd. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically our main contributions are as follows:• For the case that the covariance matrix is known to be the identity we give a sample near-optimal and computationally efficient algorithm that draws O(d/e2) labeled examples and outputs a candidate hypothesis vector [MATH HERE] that approximates the unknown regression vector β within e2-norm O(e log(1/e)σ), where σ is the standard deviation of the random observation noise. An error of Ω(eσ) is information-theoretically necessary even with infinite sample size. Hence, the error guarantee of our algorithm is optimal, up to a logarithmic factor in 1/e. Prior work gave an algorithm for this problem with sample complexity [MATH HERE] whose error guarantee scales with the e2-norm of β.• For the case of unknown covariance Σ, we show that we can efficiently achieve the same error guarantee of O(e log(1/e)σ), as in the known covariance case, using an additional O(d2 / e2) unlabeled examples. On the other hand, an error of O(eσ) can be information-theoretically attained with O(d/e2) samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity O(d2−c), where c > 0 is an arbitrarily small constant, must incur an error of [MATH HERE].

Proceedings ArticleDOI
15 Oct 2019
TL;DR: This work proposes a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image- text instance.
Abstract: A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification, the first one mapping image and text instances into a common embedding space for distance measuring, and the second one regarding image-text matching as a binary classification problem. Neither of these approaches can, however, balance the matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves the state-of-the-art matching performance with much lower time complexity.

Proceedings ArticleDOI
06 Jan 2019
TL;DR: This work gives the first nearly-linear time algorithms for high-dimensional robust mean estimation, for distributions with (i) known covariance and sub-gaussian tails or (ii) unknown bounded covariance, and exploits the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time.
Abstract: We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions. In this work, we give the first nearly-linear time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with (i) known covariance and subgaussian tails, and (ii) unknown bounded covariance. Given N samples on R^d, an ε-fraction of which may be arbitrarily corrupted, our algorithms run in time O(Nd)/poly(ε) and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times [MATH HERE], for ε = Ω(1). Our algorithms rely on a natural family of SDPs parameterized by our current guess v for the unknown mean μ*. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for μ*, independent of our current guess v, or a near-optimal solution to the dual SDP yields a new guess v' whose distance from μ* is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time. Our approach is quite general, and we believe it can also be applied to obtain nearly-linear time algorithms for other high-dimensional robust learning problems.
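As a hedged, simplified stand-in for the SDP-based algorithm (in the known, near-identity covariance case), the sketch below performs spectral filtering: while the top eigenvalue of the empirical covariance is suspiciously large, it discards the points projecting farthest onto the top eigenvector. The 1.5 eigenvalue threshold and the trimming quantile are heuristic assumptions, not the paper's choices.

```python
import numpy as np

def filtered_mean(X, eps, iters=10):
    """Spectral-filtering heuristic for robust mean estimation under
    near-identity inlier covariance; the paper instead solves a sequence of
    primal/dual SDPs in nearly-linear time."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(iters):
        mu = X.mean(axis=0)
        w, V = np.linalg.eigh(np.cov(X, rowvar=False))
        if w[-1] <= 1.5:                     # covariance near-isotropic: accept
            break
        proj = np.abs((X - mu) @ V[:, -1])   # magnitude along the top direction
        X = X[proj <= np.quantile(proj, 1.0 - eps / 2)]
    return X.mean(axis=0)
```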

Journal ArticleDOI
TL;DR: A novel approach is proposed that prunes the data points by removing those with an extremely small chance of becoming support vectors; this can reduce the grid-search time both for the standard SVM and for approximate methods that first search heuristically over a small range of the kernel parameters.

Proceedings ArticleDOI
06 Jan 2019
TL;DR: A remarkably simple meta-algorithm for the (Δ + 1) coloring problem: sample O(log n) colors for each vertex independently and uniformly at random from the Δ + 1 colors; find a proper coloring of the graph using only the sampled colors of each vertex.
Abstract: Any graph with maximum degree Δ admits a proper vertex coloring with Δ + 1 colors that can be found via a simple sequential greedy algorithm in linear time and space. But can one find such a coloring via a sublinear algorithm? We answer this fundamental question in the affirmative for several canonical classes of sublinear algorithms including graph streaming, sublinear time, and massively parallel computation (MPC) algorithms. In particular, we design: • A single-pass semi-streaming algorithm in dynamic streams using O(n) space. The only known semi-streaming algorithm prior to our work was a folklore O(log n)-pass algorithm obtained by simulating classical distributed algorithms in the streaming model. • A sublinear-time algorithm in the standard query model that allows neighbor queries and pair queries using [MATH HERE] time. We further show that any algorithm that outputs a valid coloring with sufficiently large constant probability requires [MATH HERE] time. No non-trivial sublinear time algorithms were known prior to our work. • A parallel algorithm in the massively parallel computation (MPC) model using O(n) memory per machine and O(1) MPC rounds. Our number of rounds significantly improves upon the recent O(log log Δ · log* (n))-round algorithm of Parter [ICALP 2018]. At the core of our results is a remarkably simple meta-algorithm for the (Δ + 1) coloring problem: Sample O(log n) colors for each vertex independently and uniformly at random from the Δ + 1 colors; find a proper coloring of the graph using only the sampled colors of each vertex. As our main result, we prove that the sampled set of colors with high probability contains a proper coloring of the input graph. The sublinear algorithms are then obtained by designing efficient algorithms for finding a proper coloring of the graph from the sampled colors in each model. We note that all our upper bound results for (Δ + 1) coloring are either optimal or close to best possible in each model studied. We also establish new lower bounds that rule out the possibility of achieving similar results in these models for the closely related problems of maximal independent set and maximal matching. Collectively, our results highlight a sharp contrast between the complexity of (Δ + 1) coloring vs maximal independent set and maximal matching in various models of sublinear computation even though all three problems are solvable by a simple greedy algorithm in the classical setting.
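The meta-algorithm is short enough to state in code. The sketch below samples the palettes and then attempts a simple greedy list-coloring pass; the paper proves the sampled lists admit a proper coloring with high probability and designs model-specific algorithms to find it, whereas the greedy pass here is only a heuristic that may need resampling.

```python
import math
import random

def palette_sparsification_coloring(adj, Delta, c=4):
    """Meta-algorithm sketch: each vertex samples ~c*log(n) colors from
    {0, ..., Delta}; a greedy pass then colors vertices from their sampled
    lists only. The constant c = 4 is an illustrative assumption."""
    n = len(adj)
    k = min(Delta + 1, max(1, int(c * math.log(max(n, 2)))))
    palette = {v: random.sample(range(Delta + 1), k) for v in range(n)}
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        free = [col for col in palette[v] if col not in used]
        if not free:
            return None              # heuristic pass failed; caller can resample
        color[v] = free[0]
    return color
```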

Journal ArticleDOI
TL;DR: An attack-resilient, provably correct state estimation algorithm is developed that admits a fully distributed implementation, and a notion of "strong-robustness" that captures both measurement and communication redundancy is introduced.

Proceedings ArticleDOI
01 Jan 2019
TL;DR: The main objective of this paper is the application of a Particle Swarm Optimization (PSO) based algorithm to the MC-CDMA framework, evaluated in terms of space and time.
Abstract: A design for achieving high data rates is proposed, addressing one of the key study areas for upcoming-generation systems. The strongest structures for satisfying the demand for high data rates in next-generation systems are multi-carrier transmission designs such as Multi-Carrier Code Division Multiple Access (MC-CDMA) and Orthogonal Frequency Division Multiple Access (OFDMA). MC-CDMA spreads every user in the frequency domain. The main objective of this paper is the application of the proposed Particle Swarm Optimization (PSO) algorithm to the MC-CDMA framework, evaluated in terms of space and time.

Proceedings Article
10 Feb 2019
TL;DR: In this article, the authors proposed an approximate fairlet decomposition algorithm that runs in nearly linear time for the fair variant of the classic $k$-median problem, where the points are colored and the goal is to minimize the same average distance objective while ensuring that all clusters have an "approximately equal" number of points of each color.
Abstract: We study the fair variant of the classic $k$-median problem introduced by Chierichetti et al. [2017]. In the standard $k$-median problem, given an input pointset $P$, the goal is to find $k$ centers $C$ and assign each input point to one of the centers in $C$ such that the average distance of points to their cluster center is minimized. In the fair variant of $k$-median, the points are colored, and the goal is to minimize the same average distance objective while ensuring that all clusters have an "approximately equal" number of points of each color. Chierichetti et al. proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the $k$-median objective. In the second step, fairlets are merged into $k$ clusters by one of the existing $k$-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time. Our algorithm additionally allows for finer control over the balance of resulting clusters than the original work. We complement our theoretical bounds with empirical evaluation.
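For intuition on what a fairlet is, the toy sketch below builds a (1,1)-balanced decomposition by greedily pairing each red point with its nearest unused blue point. This quadratic-time pairing is purely illustrative (and assumes equal color counts); the paper's contribution is performing the decomposition in nearly linear time via approximate nearest-neighbor structures.

```python
import numpy as np

def fairlets_11(reds, blues):
    """Toy fairlet decomposition for exact balance (1,1): each fairlet is one
    red point paired with its nearest still-unused blue point."""
    reds, blues = np.asarray(reds), np.asarray(blues)
    assert len(reds) == len(blues), "assumes equal numbers of each color"
    used = np.zeros(len(blues), dtype=bool)
    pairs = []
    for i, r in enumerate(reds):
        d = np.linalg.norm(blues - r, axis=1)
        d[used] = np.inf                     # skip already-paired blue points
        j = int(np.argmin(d))
        used[j] = True
        pairs.append((i, j))                 # one fairlet = {red i, blue j}
    return pairs
```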

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This paper designs a unified objective without accessing the source domain data and adopts an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids; the resulting method could be regarded as a well-performing baseline for domain adaptation tasks.
Abstract: Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods highly rely on large-size source domains and are computationally expensive to train, while subspace learning methods always have a quadratic time complexity that suffers from large domain sizes. This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to the multi-source setting and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our methods yield state-of-the-art results in various domain adaptation tasks.

Journal ArticleDOI
TL;DR: This work presents the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure, and it leads to significantly more accurate predictions on the longest sequence families in the test database, as well as improved accuracies for long-range base pairs.
Abstract: Motivation Predicting the secondary structure of a ribonucleic acid (RNA) sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results We present a novel alternative O(n^3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability and implementation Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000 nt). Supplementary information Supplementary data are available at Bioinformatics online.

Proceedings Article
22 Jun 2019
TL;DR: TEASER as mentioned in this paper uses a Truncated Least Squares (TLS) cost that makes the estimation insensitive to a large fraction of spurious point-to-point correspondences.
Abstract: We propose a robust approach for the registration of two sets of 3D points in the presence of a large amount of outliers. Our first contribution is to reformulate the registration problem using a Truncated Least Squares (TLS) cost that makes the estimation insensitive to a large fraction of spurious point-to-point correspondences. The second contribution is a general framework to decouple rotation, translation, and scale estimation, which allows solving in cascade for the three transformations. Since each subproblem (scale, rotation, and translation estimation) is still non-convex and combinatorial in nature, our third contribution is to show that (i) TLS scale and (component-wise) translation estimation can be solved exactly and in polynomial time via an adaptive voting scheme, (ii) TLS rotation estimation can be relaxed to a semidefinite program and the relaxation is tight in practice, even in the presence of an extreme amount of outliers. We validate the proposed algorithm, named TEASER (Truncated least squares Estimation And SEmidefinite Relaxation), in standard registration benchmarks showing that the algorithm outperforms RANSAC and robust local optimization techniques, and favorably compares with Branch-and-Bound methods, while being a polynomial-time algorithm. TEASER can tolerate up to 99% outliers and returns highly-accurate solutions.
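The adaptive voting idea for the scalar (component-wise) TLS subproblems can be sketched as an interval-stabbing sweep: each measurement votes for an interval around its value, and the point covered by the most intervals maximizes the consensus set. This is a hedged reconstruction rather than TEASER's exact scheme; `beta` stands in for the inlier noise bound.

```python
import numpy as np

def tls_estimate_1d(vals, beta):
    """Interval-stabbing sweep: each measurement votes for [v - beta, v + beta];
    find the point covered by the most intervals, then average those inliers."""
    events = sorted([(v - beta, +1) for v in vals] + [(v + beta, -1) for v in vals],
                    key=lambda e: (e[0], -e[1]))  # at ties, open before close
    best = cur = 0
    best_x = vals[0]
    for x, delta in events:
        cur += delta
        if cur > best:
            best, best_x = cur, x                 # deepest point of the sweep
    vals = np.asarray(vals, dtype=float)
    inliers = vals[np.abs(vals - best_x) <= beta]
    return float(inliers.mean())
```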

Proceedings ArticleDOI
01 Nov 2019
TL;DR: In this article, a message-passing algorithm was proposed to find the ground state of the Sherrington-Kirkpatrick model of spin glasses, which is a special case of the problem of maximizing the quadratic form associated to A over binary vectors.
Abstract: Let A be a symmetric random matrix with independent and identically distributed Gaussian entries above the diagonal. We consider the problem of maximizing the quadratic form associated to A over binary vectors. In the language of statistical physics, this amounts to finding the ground state of the Sherrington-Kirkpatrick model of spin glasses. The asymptotic value of this optimization problem was characterized by Parisi via a celebrated variational principle, subsequently proved by Talagrand. We give an algorithm that, for any ε > 0, outputs a feasible solution whose value is at least (1 − ε) of the optimum, with probability converging to one as the dimension n of the matrix diverges. The algorithm's time complexity is of order n^2. It is a message-passing algorithm, but the specific structure of its update rules is new. As a side result, we prove that, at (low) non-zero temperature, the algorithm constructs approximate solutions of the Thouless-Anderson-Palmer equations.

Journal ArticleDOI
TL;DR: A novel ranking algorithm is proposed that improves closeness centrality by taking advantage of the local structure of nodes, while aiming to decrease the computational complexity of the method.

Journal ArticleDOI
TL;DR: Geometric inhomogeneous random graphs (GIRGs) are a generalization of hyperbolic random graphs, and they have small separators, i.e., it suffices to delete a sublinear number of edges to break the giant component into two large pieces.

Posted Content
TL;DR: This method is the first attempt to make Gromov-Wasserstein discrepancy applicable to large-scale graph analysis and to unify graph partitioning and matching in the same framework, and it outperforms state-of-the-art graph partitioning and matching methods, achieving a trade-off between accuracy and efficiency.
Abstract: We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis. The proposed method is based on the fact that Gromov-Wasserstein discrepancy is a pseudometric on graphs. Given two graphs, the optimal transport associated with their Gromov-Wasserstein discrepancy provides the correspondence between their nodes and achieves graph matching. When one of the graphs has isolated but self-connected nodes ($i.e.$, a disconnected graph), the optimal transport indicates the clustering structure of the other graph and achieves graph partitioning. Using this concept, we extend our method to multi-graph partitioning and matching by learning a Gromov-Wasserstein barycenter graph for multiple observed graphs; the barycenter graph plays the role of the disconnected graph, and since it is learned, so is the clustering. Our method combines a recursive $K$-partition mechanism with a regularized proximal gradient algorithm, whose time complexity is $\mathcal{O}(K(E+V)\log_K V)$ for graphs with $V$ nodes and $E$ edges. To our knowledge, our method is the first attempt to make Gromov-Wasserstein discrepancy applicable to large-scale graph analysis and unify graph partitioning and matching into the same framework. It outperforms state-of-the-art graph partitioning and matching methods, achieving a trade-off between accuracy and efficiency.

Journal ArticleDOI
TL;DR: It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning 10-million-level nonlinearly-separable datasets on a PC with 64 GB memory.
Abstract: This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generation via multiple U-SPECs, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64 GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at this https URL.

Journal ArticleDOI
TL;DR: An optimal offline algorithm is proposed that leverages dynamic and linear programming techniques under the assumption of exact knowledge of the workload on objects, along with two online algorithms that make a trade-off between residential and migration costs and dynamically select storage classes across CSPs.
Abstract: Cloud Storage Providers (CSPs) offer geographically distributed data stores providing several storage classes with different prices. An important problem faced by cloud users is how to exploit these storage classes to serve an application with a time-varying workload on its objects at minimum cost. This cost consists of residential cost (i.e., storage, Put and Get costs) and potential migration cost (i.e., network cost). To address this problem, we first propose an optimal offline algorithm that leverages dynamic and linear programming techniques under the assumption of exact knowledge of the workload on objects. Due to the high time complexity of this algorithm and its requirement for a priori knowledge, we propose two online algorithms that make a trade-off between residential and migration costs and dynamically select storage classes across CSPs. The first online algorithm is deterministic, needs no knowledge of the workload, and incurs no more than 2γ − 1 times the minimum cost obtained by the optimal offline algorithm, where γ is the ratio of the residential cost in the most expensive data store to the cheapest one in either network or storage cost. The second online algorithm is randomized and leverages "Receding Horizon Control" (RHC) with the exploitation of available future workload information for w time slots. This algorithm incurs at most 1 + γ/w times the optimal cost. The effectiveness of the proposed algorithms is demonstrated through simulations using a workload synthesized based on characteristics of the Facebook workload.
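The deterministic online algorithm's trade-off can be illustrated with a ski-rental-style rule: stay in the current storage class and migrate only once the accumulated extra residential cost relative to the alternative exceeds the migration cost. The toy sketch below is our own simplification to two classes with a symmetric migration cost, not the paper's algorithm.

```python
def online_storage_cost(costs_a, costs_b, mig):
    """Ski-rental-style rule for two storage classes: accumulate the regret
    (extra residential cost paid versus the alternative class) and migrate once
    it reaches the migration cost, so every move is amortized against regret."""
    total, cur, regret = 0.0, 'a', 0.0
    for ca, cb in zip(costs_a, costs_b):          # per-time-slot residential costs
        mine, other = (ca, cb) if cur == 'a' else (cb, ca)
        total += mine
        regret = max(0.0, regret + (mine - other))
        if regret >= mig:                         # time to switch classes
            cur = 'b' if cur == 'a' else 'a'
            total += mig
            regret = 0.0
    return total
```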

Journal ArticleDOI
TL;DR: An analog all-optical implementation of a coherent Ising machine (CIM) is presented, based on a network of injection-locked multicore fiber lasers that uses spatial light modulators (SLMs) to solve several Ising Hamiltonians.
Abstract: Combinatorial optimization problems over large and complex systems have many applications in social networks, image processing, artificial intelligence, computational biology and a variety of other areas. Finding the optimal solution to such problems is, in general, NP-hard. Some NP-hard problems can be easily mapped to minimizing an Ising energy function. Here, we present an analog all-optical implementation of a coherent Ising machine (CIM) based on a network of injection-locked multicore fiber (MCF) lasers. The Zeeman terms and the mutual couplings appearing in the Ising Hamiltonians are implemented using spatial light modulators (SLMs). As a proof-of-principle, we demonstrate the use of optics to solve several Ising Hamiltonians for up to thirteen nodes. Overall, the average accuracy of the CIM in finding the ground state energy was ~90% over 120 trials. The fundamental bottlenecks for the scalability and programmability of the presented CIM are discussed as well. For specific computational problems that electronic digital processors struggle to simulate, an analog optical system may be a solution. Here, the authors present an analog all-optical implementation of a coherent Ising machine based on a network of injection-locked multicore fiber lasers.
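At the thirteen-node scale demonstrated here, the Ising ground state the CIM searches for can be verified by brute force. A small sketch of the energy being minimized (J holds the mutual couplings, assumed symmetric with zero diagonal; h holds the Zeeman terms):

```python
import itertools
import numpy as np

def ising_ground_state(J, h):
    """Brute-force the minimum of E(s) = -1/2 s^T J s - h^T s over s in {-1,+1}^n.
    Feasible at the experiment's scale: n = 13 gives only 2^13 = 8192 states."""
    n = len(h)
    best_e, best_s = np.inf, None
    for bits in itertools.product((-1, 1), repeat=n):
        s = np.array(bits)
        e = -0.5 * s @ J @ s - h @ s    # 0.5 avoids double-counting i < j pairs
        if e < best_e:
            best_e, best_s = e, s
    return best_e, best_s
```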