
Showing papers on "Time complexity published in 2015"


Proceedings ArticleDOI
27 May 2015
TL;DR: The core of the proposed influence maximization algorithm is a set of estimation techniques based on martingales, a classic statistical tool; it provides the same worst-case guarantees as the state of the art but offers significantly improved empirical efficiency.
Abstract: Given a social network G and a positive integer k, the influence maximization problem asks for k nodes (in G) whose adoptions of a certain idea or product can trigger the largest expected number of follow-up adoptions by the remaining nodes. This problem has been extensively studied in the literature, and the state-of-the-art technique runs in O((k+ℓ)(n+m) log n / ε²) expected time and returns a (1 − 1/e − ε)-approximate solution with at least 1 − 1/n^ℓ probability. This paper presents an influence maximization algorithm that provides the same worst-case guarantees as the state of the art, but offers significantly improved empirical efficiency. The core of our algorithm is a set of estimation techniques based on martingales, a classic statistical tool. Those techniques not only provide accurate results with small computation overheads, but also enable our algorithm to support a larger class of information diffusion models than existing methods do. We experimentally evaluate our algorithm against the states of the art under several popular diffusion models, using real social networks with up to 1.4 billion edges. Our experimental results show that the proposed algorithm consistently outperforms the states of the art in terms of computation efficiency, and is often orders of magnitude faster.
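
The martingale estimators themselves are not reproduced here, but the reverse-reachable (RR) set sampling that this family of algorithms builds on can be sketched in a few lines. The following is a minimal, hypothetical Python sketch under the independent cascade model; `in_edges`, `reverse_reachable_set`, and `greedy_seeds` are illustrative names, and the paper's actual contribution (a martingale-based rule for deciding how many RR sets suffice) is omitted.

```python
import random
from collections import defaultdict

def reverse_reachable_set(in_edges, n):
    """Sample one random reverse-reachable (RR) set under the independent
    cascade model. in_edges[v] lists (u, p): edge u -> v is live with
    probability p (the graph is stored by incoming edges)."""
    root = random.randrange(n)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and random.random() < p:
                rr.add(u)
                stack.append(u)
    return rr

def greedy_seeds(rr_sets, k):
    """Choose k nodes covering the most RR sets (standard max-coverage greedy);
    the fraction of covered RR sets estimates the normalized influence."""
    node_to_sets = defaultdict(set)
    for i, rr in enumerate(rr_sets):
        for v in rr:
            node_to_sets[v].add(i)
    covered, seeds = set(), []
    for _ in range(k):
        best = max(node_to_sets, key=lambda v: len(node_to_sets[v] - covered))
        seeds.append(best)
        covered |= node_to_sets.pop(best)
    return seeds, len(covered) / len(rr_sets)
```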

682 citations


Journal ArticleDOI
TL;DR: In this article, the performance of spectral clustering for community extraction in stochastic block models is analyzed and a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality, is established.
Abstract: We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$, with $n$ the number of nodes. This result applies to some popular polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical $k$-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
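
As a concrete illustration of the procedure analyzed here, a minimal adjacency spectral clustering sketch in Python (assuming scipy and scikit-learn); the paper's degree-corrected variant uses spherical k-median on the eigenvector rows, which is not shown:

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def adjacency_spectral_clustering(A, k):
    """Take the k leading eigenvectors of the (symmetric) adjacency matrix
    and cluster their rows with k-means."""
    vals, vecs = eigsh(A.astype(float), k=k, which="LM")
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)
```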

452 citations


Journal ArticleDOI
TL;DR: The focus is on the original interactive proof model where no assumptions are made on the computational power or adaptiveness of dishonest provers, and an open question regarding the expressive power of proof systems with such verifiers is settled.
Abstract: In this work we study interactive proofs for tractable languages. The (honest) prover should be efficient and run in polynomial time or, in other words, a “muggle”. The verifier should be super-efficient and run in nearly linear time. These proof systems can be used for delegating computation: a server can run a computation for a client and interactively prove the correctness of the result. The client can verify the result’s correctness in nearly linear time (instead of running the entire computation itself). Previously, related questions were considered in the holographic proof setting by Babai et al. [1991b], in the argument setting under computational assumptions by Kilian, and in the random oracle model by Micali [1994]. Our focus, however, is on the original interactive proof model where no assumptions are made on the computational power or adaptiveness of dishonest provers. Our main technical theorem gives a public coin interactive proof for any language computable by a log-space uniform Boolean circuit with depth d and input length n. The verifier runs in time n · poly(d, log(n)) and space O(log(n)), the communication complexity is poly(d, log(n)), and the prover runs in time poly(n). In particular, for languages computable by log-space uniform NC (circuits of polylog(n) depth), the prover is efficient, the verifier runs in time n · polylog(n) and space O(log(n)), and the communication complexity is polylog(n). Using this theorem we make progress on several questions. --- We show how to construct 1-round computationally sound arguments with polylog communication for any log-space uniform NC computation. The verifier runs in quasi-linear time. This result uses a recent transformation of Kalai and Raz from public coin interactive proofs to 1-round arguments. The soundness of the argument system is based on the existence of a PIR scheme with polylog communication. --- We construct interactive proofs with public coin, log-space, poly-time verifiers for all of P. This settles an open question regarding the expressive power of proof systems with such verifiers. --- We construct zero-knowledge interactive proofs with communication complexity quasi-linear in the witness length for any NP language verifiable in NC, based on the existence of 1-way functions. --- We construct probabilistically checkable arguments (a model due to Kalai and Raz) of size polynomial in the witness length (rather than instance length) for any NP language verifiable in NC, under computational assumptions.

306 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: This work explores the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection, which substantially reduces memory footprint and enables the use of the Fast Fourier Transform to speed up the computation.
Abstract: We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection. The circulant structure substantially reduces memory footprint and enables the use of the Fast Fourier Transform to speed up the computation. Considering a fully-connected neural network layer with d input nodes and d output nodes, this method improves the time complexity from O(d^2) to O(d log d) and space complexity from O(d^2) to O(d). The space savings are particularly important for modern deep convolutional neural network architectures, where fully-connected layers typically contain more than 90% of the network parameters. We further show that the gradient computation and optimization of the circulant projections can be performed very efficiently. Our experiments on three standard datasets show that the proposed approach achieves this significant gain in storage and efficiency with minimal increase in error rate compared to neural networks with unstructured projections.
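
A minimal numpy sketch of the core trick, multiplying by a circulant matrix via the FFT in O(d log d) rather than forming the d x d matrix; additional components of the actual layer (e.g., the nonlinearity) are omitted:

```python
import numpy as np
from scipy.linalg import circulant

def circulant_projection(w, x):
    """y = circ(w) @ x computed as a circular convolution via the FFT:
    O(d log d) time and O(d) parameters instead of O(d^2)."""
    return np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

d = 1024
rng = np.random.default_rng(0)
w, x = rng.normal(size=d), rng.normal(size=d)

# sanity check against the explicit O(d^2) circulant matrix
assert np.allclose(circulant_projection(w, x), circulant(w) @ x)
```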

299 citations


Journal ArticleDOI
TL;DR: Compared to local execution and remote execution, collaborative task execution can significantly reduce energy consumption on the mobile device, prolonging its battery life; the LARAC algorithm is applied to solve the optimization problem approximately with lower complexity than the enumeration algorithm.
Abstract: This paper investigates collaborative task execution between a mobile device and a cloud clone for mobile applications under a stochastic wireless channel. A mobile application is modeled as a sequence of tasks that can be executed on the mobile device or on the cloud clone. We aim to minimize the energy consumption on the mobile device while meeting a time deadline, by strategically offloading tasks to the cloud. We formulate the collaborative task execution as a constrained shortest path problem. We derive a one-climb policy by characterizing the optimal solution and then propose an enumeration algorithm for the collaborative task execution in polynomial time. Further, we apply the LARAC algorithm to solve the optimization problem approximately, which has lower complexity than the enumeration algorithm. Simulation results show that the approximate solution of the LARAC algorithm is close to the optimal solution of the enumeration algorithm. In addition, we consider a probabilistic time deadline, which is transformed into a hard deadline using the Markov inequality. Moreover, compared to the local execution and the remote execution, the collaborative task execution can significantly save the energy consumption on the mobile device, prolonging its battery life.
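
For the LARAC step mentioned above, a generic Lagrangian-relaxation sketch for a constrained shortest path is given below, assuming networkx and hypothetical edge attributes "energy" and "time"; this is not the authors' exact formulation, which additionally involves the one-climb policy and a stochastic channel model.

```python
import networkx as nx

def larac(G, src, dst, deadline):
    """LARAC-style sketch: minimize total 'energy' subject to total 'time'
    <= deadline by adjusting a Lagrange multiplier lam and running Dijkstra
    on the aggregated weight energy + lam * time."""
    def sp(weight):
        return nx.shortest_path(G, src, dst, weight=weight)

    def total(path, key):
        return sum(G[u][v][key] for u, v in zip(path, path[1:]))

    p_c = sp(lambda u, v, d: d["energy"])        # cheapest path, may miss deadline
    if total(p_c, "time") <= deadline:
        return p_c
    p_d = sp(lambda u, v, d: d["time"])          # fastest path, may be expensive
    if total(p_d, "time") > deadline:
        return None                              # no feasible path exists

    for _ in range(50):
        lam = (total(p_c, "energy") - total(p_d, "energy")) / \
              (total(p_d, "time") - total(p_c, "time"))
        r = sp(lambda u, v, d: d["energy"] + lam * d["time"])
        agg_r = total(r, "energy") + lam * total(r, "time")
        agg_c = total(p_c, "energy") + lam * total(p_c, "time")
        if agg_r >= agg_c - 1e-12:               # no further improvement
            break
        if total(r, "time") <= deadline:
            p_d = r                              # better feasible path
        else:
            p_c = r                              # cheaper infeasible path
    return p_d
```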

248 citations


Proceedings ArticleDOI
14 Jun 2015
TL;DR: In this article, it is conjectured that there is no truly subcubic (O(n^{3-ε})) time algorithm for the online Boolean matrix-vector multiplication problem, and this conjecture is shown to imply tight hardness results for many dynamic problems.
Abstract: Consider the following Online Boolean Matrix-Vector Multiplication problem: We are given an n x n matrix M and will receive n column-vectors of size n, denoted by v1, ..., vn, one by one. After seeing each vector vi, we have to output the product Mvi before we can see the next vector. A naive algorithm can solve this problem using O(n^3) time in total, and its running time can be slightly improved to O(n^3/log^2 n) [Williams SODA'07]. We show that a conjecture that there is no truly subcubic (O(n^{3-ε})) time algorithm for this problem can be used to exhibit the underlying polynomial time hardness shared by many dynamic problems. For a number of problems, such as subgraph connectivity, Pagh's problem, d-failure connectivity, decremental single-source shortest paths, and decremental transitive closure, this conjecture implies tight hardness results. Thus, proving or disproving this conjecture will be very interesting as it will either imply several tight unconditional lower bounds or break through a common barrier that blocks progress with these problems. This conjecture might also be considered as strong evidence against any further improvement for these problems since refuting it will imply a major breakthrough for combinatorial Boolean matrix multiplication and other long-standing problems if the term "combinatorial algorithms" is interpreted as "Strassen-like algorithms" [Ballard et al. SPAA'11]. The conjecture also leads to hardness results for problems that were previously based on diverse problems and conjectures -- such as 3SUM, combinatorial Boolean matrix multiplication, triangle detection, and multiphase -- thus providing a uniform way to prove polynomial hardness results for dynamic algorithms; some of the new proofs are also simpler or even become trivial. The conjecture also leads to stronger and new, non-trivial, hardness results, e.g., for the fully-dynamic densest subgraph and diameter problems.
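
The naive O(n^3) baseline referred to above is easy to state in code; a small, illustrative numpy sketch of the online setting:

```python
import numpy as np

def online_bmv(M, vectors):
    """Naive online Boolean matrix-vector multiplication: each product
    M v_i must be output before the next vector arrives; O(n^2) work per
    vector, O(n^3) in total over n vectors."""
    Mi = M.astype(np.int64)
    answers = []
    for v in vectors:                                 # vectors revealed one by one
        answers.append((Mi @ v.astype(np.int64)) > 0) # OR of ANDs per row
    return answers

n = 256
rng = np.random.default_rng(0)
M = rng.random((n, n)) < 0.5
vs = [rng.random(n) < 0.5 for _ in range(n)]
out = online_bmv(M, vs)
```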

238 citations


Journal ArticleDOI
TL;DR: In this paper, the authors formulate the sensor selection problem as the design of a sparse vector, which in its original form is a nonconvex $\ell_{0}$-(quasi) norm optimization problem.
Abstract: The problem of choosing the best subset of sensors that guarantees a certain estimation performance is referred to as sensor selection. In this paper, we focus on observations that are related to a general non-linear model. The proposed framework is valid as long as the observations are independent, and its likelihood satisfies the regularity conditions. We use several functions of the Cramer–Rao bound (CRB) as a performance measure. We formulate the sensor selection problem as the design of a sparse vector, which in its original form is a nonconvex $\ell_{0}$-(quasi) norm optimization problem. We present relaxed sensor selection solvers that can be efficiently solved in polynomial time. The proposed solvers result in sparse sensing techniques. We also propose a projected subgradient algorithm that is attractive for large-scale problems. The developed theory is applied to sensor placement for localization.

220 citations


Proceedings ArticleDOI
18 May 2015
TL;DR: An interesting consequence of this work is that triangle counting, a well-studied computational problem in the context of social network analysis, can be used to detect large near-cliques.
Abstract: Numerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem which maximizes the average degree over all possible subgraphs "lies at the core of large scale data mining" [10]. However, frequently the densest subgraph problem fails in detecting large near-cliques in networks. In this work, we introduce the k-clique densest subgraph problem, k ≥ 2. This generalizes the well-studied densest subgraph problem which is obtained as a special case for k=2. For k=3 we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph G(V,E), find a subset of vertices S* such that τ(S*) = max_{S ⊆ V} t(S)/|S|, where t(S) is the number of triangles induced by the set S. On the theory side, we prove that for any constant k, there exists an exact polynomial time algorithm for the k-clique densest subgraph problem. Furthermore, we propose an efficient 1/k-approximation algorithm which generalizes the greedy peeling algorithm of Asahiro and Charikar [8,18] for k=2. Finally, we show how to implement this peeling framework efficiently on MapReduce for any k ≥ 3, generalizing the work of Bahmani, Kumar and Vassilvitskii for the case k=2 [10]. On the empirical side, our two main findings are that (i) the triangle densest subgraph is consistently closer to being a large near-clique compared to the densest subgraph and (ii) the peeling approximation algorithms for both k=2 and k=3 achieve on real-world networks approximation ratios closer to 1 rather than the pessimistic 1/k guarantee. An interesting consequence of our work is that triangle counting, a well-studied computational problem in the context of social network analysis, can be used to detect large near-cliques. Finally, we evaluate our proposed method on a popular graph mining application.
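
A minimal and deliberately unoptimized Python sketch of the greedy peeling idea for k = 3 (the triangle densest subgraph), assuming networkx; an efficient implementation would maintain triangle counts incrementally instead of recomputing them at every step.

```python
import networkx as nx

def triangle_densest_peeling(G):
    """Greedy peeling sketch: repeatedly remove the vertex contained in the
    fewest triangles and return the intermediate vertex set maximizing
    t(S)/|S| (a 1/3-approximation for the triangle densest subgraph)."""
    H = G.copy()
    best_set, best_density = set(G.nodes), 0.0
    while H.number_of_nodes() > 0:
        tri = nx.triangles(H)                     # per-vertex triangle counts
        density = (sum(tri.values()) // 3) / H.number_of_nodes()
        if density > best_density:
            best_density, best_set = density, set(H.nodes)
        H.remove_node(min(tri, key=tri.get))      # peel the "lightest" vertex
    return best_set, best_density
```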

218 citations


Journal ArticleDOI
TL;DR: The research shows that the complexity of SVM (LibSVM) is O(n^3) and that the C++ implementation is faster than Java in both training and testing; moreover, computation time increases as the data grows.
Abstract: Support Vector Machines (SVM) is one of the machine learning methods that can be used to perform classification tasks. Many researchers use SVM libraries to accelerate their research development. Using such a library saves time and avoids writing code from scratch. LibSVM is one SVM library that has been widely used by researchers to solve their problems; it is also integrated into WEKA, a popular data mining tool. This article contains the results of our work on the complexity analysis of Support Vector Machines. Our work focuses on the SVM algorithm and its implementation in LibSVM. We also use two popular programming languages, C++ and Java, with three different datasets to test our analysis and experiments. The results show that the complexity of SVM (LibSVM) is O(n^3), that the C++ implementation is faster than Java in both training and testing, and that computation time increases as the data grows.
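
A small, hypothetical scaling experiment in the spirit of the paper, using scikit-learn's SVC (which wraps LibSVM) on synthetic data rather than the paper's C++/Java setups; training time should grow super-linearly with n.

```python
import time
import numpy as np
from sklearn.svm import SVC

# Rough empirical check of how LibSVM training time scales with n:
# train an RBF-kernel SVC on nested subsets and record wall-clock time.
rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 20))
y = (X[:, 0] + 0.3 * rng.normal(size=20000) > 0).astype(int)

for n in (2500, 5000, 10000, 20000):
    t0 = time.perf_counter()
    SVC(kernel="rbf", C=1.0).fit(X[:n], y[:n])
    print(f"n={n:6d}  train time {time.perf_counter() - t0:6.2f}s")
```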

201 citations


Proceedings ArticleDOI
27 May 2015
TL;DR: It is proved that for d ≥ 3, the DBSCAN problem requires Ω(n^{4/3}) time to solve, unless very significant breakthroughs---ones widely believed to be impossible---could be made in theoretical computer science, and the running time can be dramatically brought down to O(n) in expectation regardless of the dimensionality d.
Abstract: DBSCAN is a popular method for clustering multi-dimensional objects. Just as notable as the method's vast success is the research community's quest for its efficient computation. The original KDD'96 paper claimed an algorithm with O(n log n) running time, where n is the number of objects. Unfortunately, this is a mis-claim; and that algorithm actually requires O(n^2) time. There has been a fix in 2D space, where a genuine O(n log n)-time algorithm has been found. Looking for a fix for dimensionality d ≥ 3 is currently an important open problem. In this paper, we prove that for d ≥ 3, the DBSCAN problem requires Ω(n^{4/3}) time to solve, unless very significant breakthroughs---ones widely believed to be impossible---could be made in theoretical computer science. This (i) explains why the community's search for fixing the aforementioned mis-claim has been futile for d ≥ 3, and (ii) indicates (sadly) that all DBSCAN algorithms must be intolerably slow even on moderately large n in practice. Surprisingly, we show that the running time can be dramatically brought down to O(n) in expectation regardless of the dimensionality d, as soon as slight inaccuracy in the clustering results is permitted. We formalize our findings into the new notion of ρ-approximate DBSCAN, which we believe should replace DBSCAN on big data due to the latter's computational intractability.

196 citations


Proceedings ArticleDOI
17 Oct 2015
TL;DR: In this article, it was shown that these measures do not have strongly subquadratic time algorithms, i.e., no algorithms with running time O(n^{2-ε}) for any ε > 0, unless the Strong Exponential Time Hypothesis fails.
Abstract: Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple O(n^2) dynamic programming algorithms, and despite much effort no algorithms with significantly better running time are known. We prove that, even restricted to binary strings or one-dimensional curves, respectively, these measures do not have strongly subquadratic time algorithms, i.e., no algorithms with running time O(n^{2-ε}) for any ε > 0, unless the Strong Exponential Time Hypothesis fails. We generalize the result to edit distance for arbitrary fixed costs of the four operations (deletion in one of the two strings, matching, substitution), by identifying trivial cases that can be solved in constant time, and proving quadratic-time hardness on binary strings for all other cost choices. This improves and generalizes the known hardness result for Levenshtein distance [Backurs, Indyk STOC'15] by the restriction to binary strings and the generalization to arbitrary costs, and adds important problems to a recent line of research showing conditional lower bounds for a growing number of quadratic time problems. As our main technical contribution, we introduce a framework for proving quadratic-time hardness of similarity measures. To apply the framework it suffices to construct a single gadget, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability. Finally, we prove quadratic-time hardness for longest palindromic subsequence and longest tandem subsequence via reductions from longest common subsequence, showing that conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.
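
The "simple O(n^2) dynamic programming algorithm" for one of these measures, Levenshtein distance, is sketched below; under SETH the paper shows this quadratic behavior cannot be improved to O(n^{2-ε}) even for binary strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic O(|a|*|b|) dynamic program for edit distance, using two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # match / substitute
        prev = cur
    return prev[-1]

assert levenshtein("0110", "1001") == 3
```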

Journal ArticleDOI
TL;DR: In this article, the authors presented the first provably accurate feature selection method for $k$-means clustering and, in addition, they presented two feature extraction methods for clustering.
Abstract: We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: 1) feature selection and 2) feature extraction. A feature selection-based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction-based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress toward a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.
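
A minimal sketch of the random-projection flavor of feature extraction for k-means, assuming scikit-learn; the choice of projection dimension and the paper's precise approximation guarantees are not reproduced here.

```python
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

def kmeans_with_random_projection(X, k, target_dim):
    """Feature extraction sketch: project the data to a low dimension with a
    Gaussian random projection, then run k-means on the projected points."""
    Z = GaussianRandomProjection(n_components=target_dim).fit_transform(X)
    return KMeans(n_clusters=k, n_init=10).fit_predict(Z)
```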

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A novel model is presented for the task of joint mention extraction and classification that is able to effectively capture overlapping mentions with unbounded lengths and can be extended to additionally capture mention heads explicitly in a joint manner under the same time complexity.
Abstract: We present a novel model for the task of joint mention extraction and classification. Unlike existing approaches, our model is able to effectively capture overlapping mentions with unbounded lengths. The model is highly scalable, with a time complexity that is linear in the number of words in the input sentence and linear in the number of possible mention classes. Our model can be extended to additionally capture mention heads explicitly in a joint manner under the same time complexity. We demonstrate the effectiveness of our model through extensive experiments on standard datasets.

Proceedings ArticleDOI
14 Jun 2015
TL;DR: In this paper, the Sum of Squares (SoS) semidefinite programming hierarchy is used to give an algorithm for dictionary learning via noisy tensor decomposition that works in the constant spectral-norm noise regime.
Abstract: We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown n x m matrix A (for m ≥ n) from examples of the form y = Ax + e, where x is a random vector in R^m with at most τm nonzero coordinates, and e is a random noise vector in R^n with bounded magnitude. For the case m = O(n), our algorithm recovers every column of A within arbitrarily good constant accuracy in time m^{O(log m/log(1/τ))}, in particular achieving polynomial time if τ = m^{-δ} for any δ > 0, and time m^{O(log m)} if τ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser---at most √n nonzero coordinates---and there were intrinsic barriers preventing these algorithms from applying for denser x. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor T, given access to a tensor T' that is τ-close to T in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of T and T' have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: Evidence is provided that a problem $A$ with a running time O(n^k) that has not been improved in decades also requires n^{k-o(1)} time, thus explaining the lack of progress on the problem.
Abstract: Algorithmic research strives to develop fast algorithms for fundamental problems. Despite its many successes, however, many problems still do not have very efficient algorithms. For years researchers have explained the hardness for key problems by proving NP-hardness, utilizing polynomial time reductions to base the hardness of key problems on the famous conjecture P != NP. For problems that already have polynomial time algorithms, however, it does not seem that one can show any sort of hardness based on P != NP. Nevertheless, we would like to provide evidence that a problem $A$ with a running time O(n^k) that has not been improved in decades, also requires n^{k-o(1)} time, thus explaining the lack of progress on the problem. Such unconditional time lower bounds seem very difficult to obtain, unfortunately. Recent work has concentrated on an approach mimicking NP-hardness: (1) select a few key problems that are conjectured to require T(n) time to solve, (2) use special, fine-grained reductions to prove time lower bounds for many diverse problems in P based on the conjectured hardness of the key problems. In this abstract we outline the approach, give some examples of hardness results based on the Strong Exponential Time Hypothesis, and present an overview of some of the recent work on the topic.

Journal ArticleDOI
TL;DR: A framework for pricing data on the Internet that allows the price of any query to be derived automatically, and proves that pricing all other queries is NP-complete, thus establishing a dichotomy on the complexity of the pricing problem when all views are selection queries.
Abstract: Data is increasingly being bought and sold online, and Web-based marketplace services have emerged to facilitate these activities. However, current mechanisms for pricing data are very simple: buyers can choose only from a set of explicit views, each with a specific price. In this article, we propose a framework for pricing data on the Internet that, given the price of a few views, allows the price of any query to be derived automatically. We call this capability query-based pricing. We first identify two important properties that the pricing function must satisfy, the arbitrage-free and discount-free properties. Then, we prove that there exists a unique function that satisfies these properties and extends the seller's explicit prices to all queries. Central to our framework is the notion of query determinacy, and in particular instance-based determinacy: we present several results regarding the complexity and properties of it. When both the views and the query are unions of conjunctive queries or conjunctive queries, we show that the complexity of computing the price is high. To ensure tractability, we restrict the explicit prices to be defined only on selection views (which is the common practice today). We give algorithms with polynomial time data complexity for computing the price of two classes of queries: chain queries (by reducing the problem to network flow), and cyclic queries. Furthermore, we completely characterize the class of conjunctive queries without self-joins that have PTIME data complexity, and prove that pricing all other queries is NP-complete, thus establishing a dichotomy on the complexity of the pricing problem when all views are selection queries.

Proceedings ArticleDOI
24 Aug 2015
TL;DR: In this paper, the authors investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay.
Abstract: We investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay. Here, content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access a piece of content, a user must decide whether to route its request to a cache or to the back-end server. Additionally, caches must decide which content to cache. We investigate the problem complexity of two problem formulations, where the direct path to the back-end server is modeled as i) a congestion-sensitive or ii) a congestion-insensitive path, reflecting whether or not the delay of the uncached path to the back-end server depends on the user request load, respectively. We show that the problem is NP-complete in both cases. We prove that under the congestion-insensitive model the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify a structural property of the user-cache graph that potentially makes the problem NP-complete. For the congestion-sensitive model, we prove that the problem remains NP-complete even if there is only one cache in the network and each content is requested by only one user. We show that approximate solutions can be found for both models within a (1 − 1/e) factor of the optimal solution, and demonstrate a greedy algorithm that is found to be within 1% of optimal for small problem sizes. Through trace-driven simulations we evaluate the performance of our greedy algorithms, which show up to a 50% reduction in average delay over solutions based on LRU content caching.

Journal ArticleDOI
TL;DR: It is analytically prove that the memory properties of UMMs endow them with universal computing power (they are Turing-complete), intrinsic parallelism, functional polymorphism, and information overhead, namely, their collective states can support exponential data compression directly in memory.
Abstract: We introduce the notion of universal memcomputing machines (UMMs): a class of brain-inspired general-purpose computing machines based on systems with memory, whereby processing and storing of information occur on the same physical location. We analytically prove that the memory properties of UMMs endow them with universal computing power (they are Turing-complete), intrinsic parallelism, functional polymorphism, and information overhead , namely, their collective states can support exponential data compression directly in memory. We also demonstrate that a UMM has the same computational power as a nondeterministic Turing machine, namely, it can solve nondeterministic polynomial (NP)-complete problems in polynomial time. However, by virtue of its information overhead, a UMM needs only an amount of memory cells (memprocessors) that grows polynomially with the problem size. As an example, we provide the polynomial-time solution of the subset-sum problem and a simple hardware implementation of the same. Even though these results do not prove the statement NP = P within the Turing paradigm, the practical realization of these UMMs would represent a paradigm shift from the present von Neumann architectures, bringing us closer to brain-like neural computation.

Journal ArticleDOI
TL;DR: This paper shows that for voters who follow the most central political-science model of electorates—single-peaked preferences—those protections vanish, and shows that NP-hard bribery problems—including those for Kemeny and Llull elections—fall to polynomial time.
Abstract: For many election systems, bribery (and related) attacks have been shown NP-hard using constructions on combinatorially rich structures such as partitions and covers. This paper shows that for voters who follow the most central political-science model of electorates -- single-peaked preferences -- those hardness protections vanish. By using single-peaked preferences to simplify combinatorial covering challenges, we for the first time show that NP-hard bribery problems -- including those for Kemeny and Llull elections -- fall to polynomial time for single-peaked electorates. By using single-peaked preferences to simplify combinatorial partition challenges, we for the first time show that NP-hard partition-of-voters problems fall to polynomial time for single-peaked electorates. We show that for single-peaked electorates, the winner problems for Dodgson and Kemeny elections, though Θ_2^p-complete in the general case, fall to polynomial time. And we completely classify the complexity of weighted coalition manipulation for scoring protocols in single-peaked electorates.

Journal ArticleDOI
TL;DR: A polynomial time algorithm that provably finds the ground state of any 1D quantum system described by a gapped local Hamiltonian with constant ground-state energy is developed.
Abstract: The density matrix renormalization group method has been extensively used to study the ground state of 1D many-body systems since its introduction two decades ago. In spite of its wide use, this heuristic method is known to fail in certain cases and no certifiably correct implementation is known, leaving researchers faced with an ever-growing toolbox of heuristics, none of which is guaranteed to succeed. Here we develop a polynomial time algorithm that provably finds the ground state of any 1D quantum system described by a gapped local Hamiltonian with constant ground-state energy. The algorithm is based on a framework that combines recently discovered structural features of gapped 1D systems with an efficient construction of a class of operators called approximate ground-state projections (AGSPs). The combination of these tools yields a method that is guaranteed to succeed in all 1D gapped systems. An AGSP-centric approach may help guide the search for algorithms for more general quantum systems, including for the central challenge of 2D systems, where even heuristic methods have had more limited success.

Journal ArticleDOI
TL;DR: An improved analysis of the Simple Genetic Algorithm for the OneMax problem overcomes some limitations and presents a technique to bound the diversity of the population that does not require a bound on its bandwidth.
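
For reference, OneMax and a bare-bones Simple Genetic Algorithm can be sketched as follows; the parameter choices are illustrative and not the regime analyzed in the paper.

```python
import random

def onemax(bits):
    """OneMax fitness: the number of ones in the bit string."""
    return sum(bits)

def simple_genetic_algorithm(n=100, pop_size=50, generations=500):
    """Minimal Simple-GA sketch on OneMax: fitness-proportional selection,
    uniform crossover, and bitwise mutation with rate 1/n."""
    p_mut = 1.0 / n
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [onemax(ind) + 1e-9 for ind in pop]   # avoid zero selection weights
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = random.choices(pop, weights=fits, k=2)
            child = [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [b ^ 1 if random.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        if any(onemax(ind) == n for ind in pop):
            break
    return max(pop, key=onemax)
```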

Proceedings ArticleDOI
24 Aug 2015
TL;DR: The concept of the ε-dominant dataset is defined: a small dataset that can represent the vast information carried by big sensory data with an information loss rate of less than ε, where ε can be arbitrarily small.
Abstract: The amount of sensory data manifests an explosive growth due to the increasing popularity of Wireless Sensor Networks. The scale of the sensory data in many applications already exceeds several petabytes annually, which is beyond the computation and transmission capabilities of conventional WSNs. On the other hand, the information carried by big sensory data has high redundancy because of strong correlation among sensory data. In this paper, we define the concept of the ε-dominant dataset, which is only a small dataset yet can represent the vast information carried by big sensory data with an information loss rate of less than ε, where ε can be arbitrarily small. We prove that drawing the minimum ε-dominant dataset is polynomial time solvable and provide a centralized algorithm with O(n^3) time complexity. Furthermore, a distributed algorithm with constant complexity (O(1)) is also designed. It is shown that the result returned by the distributed algorithm can satisfy the ε requirement with a near optimal size. Finally, extensive real-world experiments and simulations are carried out. The results indicate that all the proposed algorithms have high performance in terms of accuracy and energy efficiency.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, considering a sequence of discretized models which are asymptotically equivalent to the Gaussian model.
Abstract: This paper studies the minimax detection of a small submatrix of elevated mean in a large matrix contaminated by additive Gaussian noise. To investigate the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, we consider a sequence of discretized models which are asymptotically equivalent to the Gaussian model. Under the hypothesis that the planted clique detection problem cannot be solved in randomized polynomial time when the clique size is of smaller order than the square root of the graph size, the following phase transition phenomenon is established: when the size of the large matrix $p\to\infty$, if the submatrix size $k=\Theta(p^{\alpha})$ for any $\alpha\in(0,{2}/{3})$, computational complexity constraints can incur a severe penalty on the statistical performance in the sense that any randomized polynomial-time test is minimax suboptimal by a polynomial factor in $p$; if $k=\Theta(p^{\alpha})$ for any $\alpha\in({2}/{3},1)$, minimax optimal detection can be attained within constant factors in linear time. Using Schatten norm loss as a representative example, we show that the hardness of attaining the minimax estimation rate can crucially depend on the loss function. Implications on the hardness of support recovery are also obtained.

Proceedings ArticleDOI
14 Jun 2015
TL;DR: This work constructs succinct randomized encodings where the time to encode a computation, given by a program Π and input x, is essentially independent of Π's time complexity, and only depends on its space complexity, as well as the size of its input, output, and description.
Abstract: A randomized encoding allows one to express a "complex" computation, given by a function f and input x, by a "simple to compute" randomized representation f̂(x) whose distribution encodes f(x), while revealing nothing else regarding f and x. Existing randomized encodings, geared mostly to allow encoding with low parallel-complexity, have proven instrumental in various strong applications such as multiparty computation and parallel cryptography. This work focuses on another natural complexity measure: the time required to encode. We construct succinct randomized encodings where the time to encode a computation, given by a program Π and input x, is essentially independent of Π's time complexity, and only depends on its space complexity, as well as the size of its input, output, and description. The scheme guarantees computational privacy of (Π,x), and is based on indistinguishability obfuscation for a relatively simple circuit class, for which there exist instantiations based on polynomial hardness assumptions on multi-linear maps. We then invoke succinct randomized encodings to obtain several strong applications, including: Succinct indistinguishability obfuscation, where the obfuscated program IObf({Π}) computes the same function as Π for inputs x of a priori bounded size. Obfuscating Π is roughly as fast as encoding the computation of Π on any such input x. Here we also require subexponentially-secure indistinguishability obfuscation for circuits. Succinct functional encryption, where a functional decryption key corresponding to Π allows decrypting Π(x) from encryptions of any plaintext x of a priori bounded size. Key derivation is as fast as encoding the corresponding computation. Succinct reusable garbling, a stronger form of randomized encodings where any number of inputs x can be encoded separately of Π, independently of Π's time and space complexity. Publicly-verifiable 2-message delegation where verifying the result of a long computation given by Π and input x is as fast as encoding the corresponding computation. We also show how to transform any 2-message delegation scheme to an essentially non-interactive system where the verifier message is reusable. Previously, succinct randomized encodings or any of the above applications were only known based on various non-standard knowledge assumptions. At the heart of our techniques is a generic method of compressing a piecemeal garbled computation, without revealing anything about the secret randomness utilized for garbling.

Proceedings Article
06 Jul 2015
TL;DR: This paper considers the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros, and proposes a "shifted matrix completion" method that recovers M using only a subset of indices corresponding to ones.
Abstract: In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem is an instance of PU (positive-unlabeled) learning, i.e. learning from only positive and unlabeled examples that has been studied in the context of binary classification. Under the assumption that M has bounded nuclear norm, we provide recovery guarantees for two different observation models: 1) M parameterizes a distribution that generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For the first case, we propose a "shifted matrix completion" method that recovers M using only a subset of indices corresponding to ones; for the second case, we propose a "biased matrix completion" method that recovers the (thresholded) binary matrix. Both methods yield strong error bounds -- if M ∈ R^{n×n}, the error is bounded as O(1/((1-ρ)n)), where 1 - ρ denotes the fraction of ones observed. This implies a sample complexity of O(n log n) ones to achieve a small error, when M is dense and n is large. We extend our analysis to the inductive matrix completion problem, where rows and columns of M have associated features. We develop efficient and scalable optimization procedures for both the proposed methods and demonstrate their effectiveness for link prediction (on real-world networks consisting of over 2 million nodes and 90 million links) and semi-supervised clustering tasks.

Proceedings ArticleDOI
01 Jun 2015
TL;DR: In this paper, it was shown that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c).
Abstract: The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)-approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NP-hardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d. In this paper we provide the first hardness of approximation for the Euclidean k-means problem. Concretely, we show that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.

Proceedings Article
07 Dec 2015
TL;DR: In this paper, it is shown that it is possible to solve unstructured random quadratic systems in n variables exactly from O(n) equations in linear time, that is, in time proportional to reading the data {ai} and {yi}.
Abstract: This paper is concerned with finding a solution x to a quadratic system of equations y_i = |⟨a_i, x⟩|^2, i = 1, ..., m. We demonstrate that it is possible to solve unstructured random quadratic systems in n variables exactly from O(n) equations in linear time, that is, in time proportional to reading the data {a_i} and {y_i}. This is accomplished by a novel procedure, which, starting from an initial guess given by a spectral initialization procedure, attempts to minimize a nonconvex objective. The proposed algorithm distinguishes itself from prior approaches by regularizing the initialization and descent procedures in an adaptive fashion, which discards terms bearing too much influence on the initial estimate or search directions. These careful selection rules—which effectively serve as a variance reduction scheme—provide a tighter initial guess, more robust descent directions, and thus enhanced practical performance. Further, this procedure also achieves a near-optimal statistical accuracy in the presence of noise. Empirically, we demonstrate that the computational cost of our algorithm is about four times that of solving a least-squares problem of the same size.
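
A plain (untruncated) Wirtinger-flow-style numpy sketch of the two stages described above, spectral initialization followed by nonconvex gradient descent; the paper's adaptive truncation of the initialization and of the gradient terms, which is its key ingredient, is omitted here.

```python
import numpy as np

def wirtinger_flow(A, y, iters=500, mu=0.2):
    """Sketch for real-valued y_i = (a_i^T x)^2: spectral initialization from
    the leading eigenvector of (1/m) sum_i y_i a_i a_i^T, then gradient
    descent on the quadratic loss with the usual step scaling mu/||z_0||^2."""
    m, n = A.shape
    Y = (A.T * y) @ A / m                     # (1/m) sum_i y_i a_i a_i^T
    vals, vecs = np.linalg.eigh(Y)
    z = vecs[:, -1] * np.sqrt(y.mean())       # spectral initial guess
    step = mu / y.mean()
    for _ in range(iters):
        Az = A @ z
        grad = (A.T @ ((Az ** 2 - y) * Az)) / m
        z = z - step * grad
    return z

# toy instance; recovery is up to a global sign
n, m = 50, 400
rng = np.random.default_rng(0)
x = rng.normal(size=n)
A = rng.normal(size=(m, n))
y = (A @ x) ** 2
xhat = wirtinger_flow(A, y)
print(min(np.linalg.norm(xhat - x), np.linalg.norm(xhat + x)) / np.linalg.norm(x))
```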

Book ChapterDOI
01 Jan 2015
TL;DR: A Lagrangian relaxation approach is presented and used for computational results on 40 classical test instances as well as a 500-node instance derived from the most populous counties in the contiguous United States.
Abstract: The p-median problem is central to much of discrete location modeling and theory. While the p-median problem is $\mathcal{NP}$-hard on a general graph, it can be solved in polynomial time on a tree. A linear time algorithm for the 1-median problem on a tree is described. We also present a classical formulation of the problem. Basic construction and improvement algorithms are outlined. Results from the literature using various metaheuristics including tabu search, heuristic concentration, genetic algorithms, and simulated annealing are summarized. A Lagrangian relaxation approach is presented and used for computational results on 40 classical test instances as well as a 500-node instance derived from the most populous counties in the contiguous United States. We conclude with a discussion of multi-objective extensions of the p-median problem.
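
The linear-time 1-median on a tree can be sketched with Goldman's classic leaf-folding argument; the version below (assuming networkx and a hypothetical per-vertex weight attribute "w") favors clarity over the careful O(n) bookkeeping, and may differ from the chapter's exact presentation.

```python
import networkx as nx

def tree_1_median(T, weight="w"):
    """Goldman-style sketch: repeatedly fold a leaf of accumulated weight
    less than half the total into its neighbor; a leaf that reaches half
    the total weight (or the last remaining vertex) is a 1-median."""
    w = {v: T.nodes[v].get(weight, 1.0) for v in T}
    half = sum(w.values()) / 2.0
    H = T.copy()
    while H.number_of_nodes() > 1:
        v = next(u for u in H.nodes if H.degree(u) == 1)   # any leaf
        if w[v] >= half:
            return v                       # this leaf already carries half the weight
        u = next(iter(H.neighbors(v)))
        w[u] += w[v]                       # fold the leaf into its neighbor
        H.remove_node(v)
    return next(iter(H.nodes))
```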

Journal ArticleDOI
TL;DR: An accurate and scalable Nyström scheme that first samples a large column subset from the input matrix, but then only performs an approximate SVD on the inner submatrix using the recent randomized low-rank matrix approximation algorithms is proposed.
Abstract: The Nyström method is an efficient technique for the eigenvalue decomposition of large kernel matrices. However, to ensure an accurate approximation, a sufficient number of columns have to be sampled. On very large data sets, the singular value decomposition (SVD) step on the resultant data submatrix can quickly dominate the computations and become prohibitive. In this paper, we propose an accurate and scalable Nyström scheme that first samples a large column subset from the input matrix, but then only performs an approximate SVD on the inner submatrix using the recent randomized low-rank matrix approximation algorithms. Theoretical analysis shows that the proposed algorithm is as accurate as the standard Nyström method that directly performs a large SVD on the inner submatrix. On the other hand, its time complexity is only as low as performing a small SVD. Encouraging results are obtained on a number of large-scale data sets for low-rank approximation. Moreover, as the most computationally expensive steps can be easily distributed and there is minimal data transfer among the processors, significant speedup can be further obtained with the use of multiprocessor and multi-GPU systems.
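
A minimal numpy/scikit-learn sketch of the scheme described here: sample a large column subset, then apply a randomized (approximate) SVD only to the small inner submatrix W to obtain a low-rank Nyström factor. Function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def nystrom_randomized(K, num_cols, rank, seed=0):
    """Nyström sketch for a PSD kernel matrix K: K ≈ L @ L.T with
    L = C @ U_r / sqrt(s_r), where C holds the sampled columns and
    (U_r, s_r) is a randomized rank-r SVD of the inner submatrix W."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=num_cols, replace=False)
    C = K[:, idx]                        # n x m sampled columns
    W = K[np.ix_(idx, idx)]              # m x m inner submatrix
    U, s, _ = randomized_svd(W, n_components=rank, random_state=seed)
    return C @ (U / np.sqrt(s))          # K ≈ C W_r^+ C^T = L L^T
```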

Journal ArticleDOI
TL;DR: In this paper, it was shown that nonconvex quadratically constrained quadratic programs can be solved in polynomial time when their underlying graph is acyclic, provided the constraints satisfy a certain technical condition.
Abstract: This paper proves that nonconvex quadratically constrained quadratic programs can be solved in polynomial time when their underlying graph is acyclic, provided the constraints satisfy a certain technical condition. We demonstrate this theory on optimal power-flow problems over tree networks.