Showing papers on "Time complexity published in 2007"

PDF

Open Access

Journal Article•DOI•

Near linear time algorithm to detect community structures in large-scale networks.

[...]

Usha Nandini Raghavan¹, Réka Albert¹, Soundar R. T. Kumara¹•Institutions (1)

11 Sep 2007-Physical Review E

TL;DR: This paper investigates a simple label propagation algorithm that uses the network structure alone as its guide and requires neither optimization of a predefined objective function nor prior information about the communities.

...read moreread less

Abstract: Community detection and analysis is an important methodology for understanding the organization of various real-world networks and has applications in problems as diverse as consensus formation in social communities or the identification of functional modules in biochemical networks. Currently used algorithms that identify the community structures in large-scale real-world networks require a priori information such as the number and sizes of communities or are computationally expensive. In this paper we investigate a simple label propagation algorithm that uses the network structure alone as its guide and requires neither optimization of a predefined objective function nor prior information about the communities. In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities. We validate the algorithm by applying it to networks whose community structures are known. We also demonstrate that the algorithm takes an almost linear time and hence it is computationally less expensive than what was possible so far.

...read moreread less

3,095 citations

Journal Article•DOI•

Toward accurate dynamic time warping in linear time and space

[...]

Stan Salvador¹, Philip K. Chan²•Institutions (2)

General Dynamics¹, Florida Institute of Technology²

01 Oct 2007

TL;DR: This paper introduces FastDTW, an approximation of DTW that has a linear time and space complexity and shows a large improvement in accuracy over existing methods.

...read moreread less

Abstract: Dynamic Time Warping (DTW) has a quadratic time and space complexity that limits its use to small time series. In this paper we introduce FastDTW, an approximation of DTW that has a linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution. We prove the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW by comparing it to two other types of existing approximate DTW algorithms: constraints (such as Sakoe-Chiba Bands) and abstraction. Our results show a large improvement in accuracy over existing methods.

...read moreread less

1,363 citations

Journal Article•DOI•

A faster circular binary segmentation algorithm for the analysis of array CGH data

[...]

Ennapadam Venkatraman¹, Adam B. Olshen¹•Institutions (1)

Memorial Sloan Kettering Cancer Center¹

20 Feb 2007-Bioinformatics

TL;DR: A hybrid approach to obtain the P-value of the test statistic in linear time is presented and it is shown that the substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed.

...read moreread less

Abstract: Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands markers and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data. Availability: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher. Contact: venkatre@mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online.

...read moreread less

952 citations

Journal Article•DOI•

Sharing Visual Features for Multiclass and Multiview Object Detection

[...]

Antonio Torralba¹, Kevin Murphy², William T. Freeman•Institutions (2)

Massachusetts Institute of Technology¹, University of British Columbia²

01 May 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views) and considerably reduce the computational cost of multiclass object detection.

...read moreread less

Abstract: We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. We present a multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required and, therefore, the runtime cost of the classifier, is observed to scale approximately logarithmically with the number of classes. The features selected by joint training are generic edge-like features, whereas the features chosen by training each class separately tend to be more object-specific. The generic features generalize better and considerably reduce the computational cost of multiclass object detection

...read moreread less

812 citations

Journal Article•DOI•

The complexity of homomorphism and constraint satisfaction problems seen from the other side

[...]

Martin Grohe¹•Institutions (1)

Humboldt University of Berlin¹

01 Mar 2007-Journal of the ACM

TL;DR: It is proved that, under some complexity theoretic assumption from parameterized complexity theory, HOM(C,−) is in polynomial time if and only if C has bounded tree width modulo homomorphic equivalence.

...read moreread less

Abstract: We give a complexity theoretic classification of homomorphism problems for graphs and, more generally, relational structures obtained by restricting the left hand side structure in a homomorphism. For every class C of structures, let HOM(C,−) be the problem of deciding whether a given structure A ∈C has a homomorphism to a given (arbitrary) structure s. We prove that, under some complexity theoretic assumption from parameterized complexity theory, HOM(C,−) is in polynomial time if and only if C has bounded tree width modulo homomorphic equivalence.Translated into the language of constraint satisfaction problems, our result yields a characterization of the tractable structural restrictions of constraint satisfaction problems. Translated into the language of database theory, it implies a characterization of the tractable instances of the evaluation problem for conjunctive queries over relational databases.

...read moreread less

501 citations

Journal Article•DOI•

Control of Boolean networks: hardness results and algorithms for tree structured networks.

[...]

Tatsuya Akutsu¹, Morihiro Hayashida¹, Wai-Ki Ching², Michael K. Ng³•Institutions (3)

Kyoto University¹, University of Hong Kong², Hong Kong Baptist University³

21 Feb 2007-Journal of Theoretical Biology

TL;DR: It is shown that finding a control strategy leading to the desired global state is computationally intractable (NP-hard) in general and this hardness result is extended for BNs with considerably restricted network structures.

...read moreread less

475 citations

Posted Content•

Settling the Complexity of Computing Two-Player Nash Equilibria

[...]

Xi Chen¹, Xiaotie Deng², Shang-Hua Teng³•Institutions (3)

Tsinghua University¹, City University of Hong Kong², Akamai Technologies³

12 Apr 2007-arXiv: Computer Science and Game Theory

TL;DR: It is proved that Bimatrix, the problem of finding a Nash equilibrium in a two-player game, is complete for the complexity class PPAD (Polynomial Parity Argument, Directed version) introduced by Papadimitriou in 1991.

...read moreread less

Abstract: We settle a long-standing open question in algorithmic game theory. We prove that Bimatrix, the problem of finding a Nash equilibrium in a two-player game, is complete for the complexity class PPAD Polynomial Parity Argument, Directed version) introduced by Papadimitriou in 1991. This is the first of a series of results concerning the complexity of Nash equilibria. In particular, we prove the following theorems: Bimatrix does not have a fully polynomial-time approximation scheme unless every problem in PPAD is solvable in polynomial time. The smoothed complexity of the classic Lemke-Howson algorithm and, in fact, of any algorithm for Bimatrix is not polynomial unless every problem in PPAD is solvable in randomized polynomial time. Our results demonstrate that, even in the simplest form of non-cooperative games, equilibrium computation and approximation are polynomial-time equivalent to fixed point computation. Our results also have two broad complexity implications in mathematical economics and operations research: Arrow-Debreu market equilibria are PPAD-hard to compute. The P-Matrix Linear Complementary Problem is computationally harder than convex programming unless every problem in PPAD is solvable in polynomial time.

...read moreread less

469 citations

Journal Article•DOI•

An Efficient Earth Mover's Distance Algorithm for Robust Histogram Comparison

[...]

Haibin Ling¹, Kazunori Okada•Institutions (1)

University of Maryland, College Park¹

01 May 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The proposed EMD-L1 significantly simplifies the original linear programming formulation of EMD, and empirically shows that this new algorithm has an average time complexity of O(N2), which significantly improves the best reported supercubic complexity of the original EMD.

...read moreread less

Abstract: We propose EMD-L1: a fast and exact algorithm for computing the earth mover's distance (EMD) between a pair of histograms. The efficiency of the new algorithm enables its application to problems that were previously prohibitive due to high time complexities. The proposed EMD-L1 significantly simplifies the original linear programming formulation of EMD. Exploiting the L1 metric structure, the number of unknown variables in EMD-L1 is reduced to O(N) from O(N2) of the original EMD for a histogram with N bins. In addition, the number of constraints is reduced by half and the objective function of the linear program is simplified. Formally, without any approximation, we prove that the EMD-L1 formulation is equivalent to the original EMD with a L1 ground distance. To perform the EMD-L1 computation, we propose an efficient tree-based algorithm, Tree-EMD. Tree-EMD exploits the fact that a basic feasible solution of the simplex algorithm-based solver forms a spanning tree when we interpret EMD-L1 as a network flow optimization problem. We empirically show that this new algorithm has an average time complexity of O(N2), which significantly improves the best reported supercubic complexity of the original EMD. The accuracy of the proposed methods is evaluated by experiments for two computation-intensive problems: shape recognition and interest point matching using multidimensional histogram-based local features. For shape recognition, EMD-L1 is applied to compare shape contexts on the widely tested MPEG7 shape data set, as well as an articulated shape data set. For interest point matching, SIFT, shape context and spin image are tested on both synthetic and real image pairs with large geometrical deformation, illumination change, and heavy intensity noise. The results demonstrate that our EMD-L1-based solutions outperform previously reported state-of-the-art features and distance measures in solving the two tasks

...read moreread less

456 citations

Journal Article•DOI•

Model-free Q-learning designs for discrete-time zero-sum games with application to H-infinity control

[...]

A. Al-Tamimi¹, Frank L. Lewis¹, Murad Abu-Khalaf¹•Institutions (1)

University of Texas at Arlington¹

01 Mar 2007

TL;DR: It is proven that the algorithm ends up to be a model-free iterative algorithm to solve the (GARE) of the linear quadratic discrete-time zero-sum game.

...read moreread less

Abstract: In this paper, the optimal strategies for discrete-time linear system quadratic zero-sum games related to the H-infinity optimal control problem are solved in forward time without knowing the system dynamical matrices. The idea is to solve for an action dependent value function Q(x,u,w) of the zero-sum game instead of solving for the state dependent value function V(x) which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and actions spaces are continuous, two action networks and one critic network are used that are adaptively tuned in forward time using adaptive critic methods. The result is a Q-learning approximate dynamic programming model-free approach that solves the zero-sum game forward in time. It is shown that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game. Proofs of convergence of the algorithm are shown. It is proven that the algorithm ends up to be a model-free iterative algorithm to solve the (GARE) of the linear quadratic discrete-time zero-sum game. The effectiveness of this method is shown by performing an H-infinity control autopilot design for an F-16 aircraft.

...read moreread less

441 citations

Book Chapter•DOI•

Small Approximate Pareto Sets for Bi-objective Shortest Paths and Other Problems

[...]

Ilias Diakonikolas¹, Mihalis Yannakakis¹•Institutions (1)

Columbia University¹

20 Aug 2007

TL;DR: The problem of computing a minimum set of solutions that approximates within a specified accuracy the Pareto curve of a multi-objective optimization problem was studied in this article, where it was shown that it is NP-hard to do better than 2.

...read moreread less

Abstract: We investigate the problem of computing a minimum set of solutions that approximates within a specified accuracy i¾?the Pareto curve of a multiobjective optimization problem. We show that for a broad class of bi-objective problems (containing many important widely studied problems such as shortest paths, spanning tree, and many others), we can compute in polynomial time an i¾?-Pareto set that contains at most twice as many solutions as the minimum such set. Furthermore we show that the factor of 2 is tight for these problems, i.e., it is NP-hard to do better. We present further results for three or more objectives, as well as for the dual problem of computing a specified number kof solutions which provide a good approximation to the Pareto curve.

...read moreread less

429 citations

Proceedings Article•DOI•

Selecting Stars: The k Most Representative Skyline Operator

[...]

Xuemin Lin¹, Yidong Yuan¹, Qing Zhang², Ying Zhang³•Institutions (3)

University of New South Wales¹, Commonwealth Scientific and Industrial Research Organisation², NICTA³

15 Apr 2007

TL;DR: An efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique and a comprehensive performance evaluation demonstrates that the randomized technique is very efficient, highly accurate, and scalable.

...read moreread less

Abstract: Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and it can be approximately solved by a polynomial time algorithm with the guaranteed approximation ratio 1-1/e. To speed-up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.

...read moreread less

Journal Article•DOI•

Identity-based key agreement protocols from pairings

[...]

Liqun Chen¹, Zhaohui Cheng², Nigel P. Smart³•Institutions (3)

Hewlett-Packard¹, Middlesex University², University of Bristol³

25 Jun 2007-International Journal of Information Security

TL;DR: In this article, the authors present a method incorporating a built-in decisional function into the protocols, which transfers a hard decisional problem in the proof to an easy decisional problems.

...read moreread less

Abstract: In recent years, a large number of identity- based key agreement protocols from pairings have been proposed. Some of them are elegant and practical. However, the security of this type of protocol has been surprisingly hard to prove, even in the random oracle model. The main issue is that a simulator is not able to deal with reveal queries, because it requires solving either a computational problem or a decisional problem, both of which are generally believed to be hard (i.e., computationally infeasible). The best solution so far for security proofs uses the gap assumption, which means assuming that the existence of a decisional oracle does not change the hardness of the corresponding computational problem. The disadvantage of using this solution to prove security is that such decisional oracles, on which the security proof relies, cannot be performed by any polynomial time algorithm in the real world, because of the hardness of the decisional problem. In this paper we present a method incorporating a built-in decisional function into the protocols. The function transfers a hard decisional problem in the proof to an easy decisional problem. We then discuss the resulting efficiency of the schemes and the relevant security reductions, in the random oracle model, in the context of different pairings one can use. We pay particular attention, unlike most other papers in the area, to the issues which arise when using asymmetric pairings.

...read moreread less

Proceedings Article•DOI•

Finding Top-k Min-Cost Connected Trees in Databases

[...]

Bolin Ding¹, J. Xu Yu¹, Shan Wang¹, Lu Qin², Xiao Zhang², Xuemin Lin³ - Show less +2 more•Institutions (3)

The Chinese University of Hong Kong¹, Renmin University of China², University of New South Wales³

15 Apr 2007

TL;DR: This paper proposes a novel parameterized solution, with l as a parameter, to find the optimal GST-1, in time complexity O(3ln + 2l ((l + logn)n + m), where n and m are the numbers of nodes and edges in graph G, which can handle graphs with a large number of nodes.

...read moreread less

Abstract: It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p1, ..., pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi ⊆ V be a set of nodes that contain the keyword pi. We study finding top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote our problem as GST-k When k = 1, it is known as a minimum cost group Steiner tree problem which is NP-complete. We observe that the number of keywords, l, is small, and propose a novel parameterized solution, with l as a parameter, to find the optimal GST-1, in time complexity O(3ln + 2l ((l + logn)n + m)), where n and m are the numbers of nodes and edges in graph G. Our solution can handle graphs with a large number of nodes. Our GST-1 solution can be easily extended to support GST-k, which outperforms the existing GST-k solutions over both weighted undirected/directed graphs. We conducted extensive experimental studies, and report our finding.

...read moreread less

Journal Article•DOI•

A taxonomy of suffix array construction algorithms

[...]

Simon J. Puglisi¹, William F. Smyth¹, Andrew Turpin²•Institutions (2)

Curtin University¹, RMIT University²

06 Jul 2007-ACM Computing Surveys

TL;DR: A survey of suffix array construction algorithms can be found in this article, with a comparison of the algorithms' worst-case time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.

...read moreread less

Abstract: In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abundance. This survey paper attempts to provide simple high-level descriptions of these numerous algorithms that highlight both their distinctive features and their commonalities, while avoiding as much as possible the complexities of implementation details. New hybrid algorithms are also described. We provide comparisons of the algorithms' worst-case time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.

...read moreread less

Proceedings Article•DOI•

Faster integer multiplication

[...]

Martin Fürer¹•Institutions (1)

Pennsylvania State University¹

11 Jun 2007

TL;DR: A major step towards closing the gap from above is presented by presenting an algorithm running in time n log n, 2O(log* n) for boolean circuits as well as for multitape Turing machines, but it has consequences to other models of computation as well.

...read moreread less

Abstract: For more than 35 years, the fastest known method for integer multiplication has been the Schonhage-Strassen algorithm running in time O(n log n log log n). Under certain restrictive conditions there is a corresponding Ω(n log n) lower bound. The prevailing conjecture has always been that the complexity of an optimal algorithm is Θ(n log n). We present a major step towards closing the gap from above by presenting an algorithm running in time n log n, 2O(log* n).The main result is for boolean circuits as well as for multitape Turing machines, but it has consequences to other models of computation as well.

...read moreread less

Proceedings Article•DOI•

P3 & Beyond: Solving Energies with Higher Order Cliques

[...]

Pushmeet Kohli¹, M.P. Kumar¹, Philip H. S. Torr¹•Institutions (1)

Oxford Brookes University¹

17 Jun 2007

TL;DR: A class of higher order clique potentials is introduced and it is shown that the expansion and swap moves for any energy function composed of these potentials can be found by minimizing a submodular function.

...read moreread less

Abstract: In this paper we extend the class of energy functions for which the optimal alpha-expansion and alphabeta-swap moves can be computed in polynomial time. Specifically, we introduce a class of higher order clique potentials and show that the expansion and swap moves for any energy function composed of these potentials can be found by minimizing a submodular function. We also show that for a subset of these potentials, the optimal move can be found by solving an st-mincut problem. We refer to this subset as the P3 Potts model. Our results enable the use of powerful move making algorithms i.e. alpha-expansion and alphabeta-swap for minimization of energy functions involving higher order cliques. Such functions have the capability of modelling the rich statistics of natural scenes and can be used for many applications in computer vision. We demonstrate their use on one such application i.e. the texture based video segmentation problem.

...read moreread less

Journal Article•DOI•

Theory of semidefinite programming for Sensor Network Localization

[...]

Anthony Man-Cho So¹, Yinyu Ye¹•Institutions (1)

Stanford University¹

30 Jan 2007-Mathematical Programming

TL;DR: In this paper, a semidefinite programming (SDP) based model and method for the position estimation problem in sensor network localization and other Euclidean distance geometry applications is presented.

...read moreread less

Abstract: We analyze the semidefinite programming (SDP) based model and method for the position estimation problem in sensor network localization and other Euclidean distance geometry applications. We use SDP duality and interior-point algorithm theories to prove that the SDP localizes any network or graph that has unique sensor positions to fit given distance measures. Therefore, we show, for the first time, that these networks can be localized in polynomial time. We also give a simple and efficient criterion for checking whether a given instance of the localization problem has a unique realization in $$\mathcal{R}^2$$ using graph rigidity theory. Finally, we introduce a notion called strong localizability and show that the SDP model will identify all strongly localizable sub-networks in the input network.

...read moreread less

Journal Article•DOI•

Complexity of Self-Assembled Shapes

[...]

David Soloveichik, Erik Winfree

01 Feb 2007-SIAM Journal on Computing

TL;DR: In this paper, it was shown that the size of the smallest self-assembly program that builds a shape and the shape's descriptional (Kolmogorov) complexity should be related.

...read moreread less

Abstract: The connection between self-assembly and computation suggests that a shape can be considered the output of a self-assembly “program,” a set of tiles that fit together to create a shape. It seems plausible that the size of the smallest self-assembly program that builds a shape and the shape’s descriptional (Kolmogorov) complexity should be related. We show that when using a notion of a shape that is independent of scale, this is indeed so: in the tile assembly model, the minimal number of distinct tile types necessary to self-assemble a shape, at some scale, can be bounded both above and below in terms of the shape’s Kolmogorov complexity. As part of the proof, we develop a universal constructor for this model of self-assembly that can execute an arbitrary Turing machine program specifying how to grow a shape. Our result implies, somewhat counterintuitively, that self-assembly of a scaled-up version of a shape often requires fewer tile types. Furthermore, the independence of scale in self-assembly theory appears to play the same crucial role as the independence of running time in the theory of computability. This leads to an elegant formulation of languages of shapes generated by self-assembly. Considering functions from bit strings to shapes, we show that the running-time complexity, with respect to Turing machines, is polynomially equivalent to the scale complexity of the same function implemented via self-assembly by a finite set of tile types. Our results also hold for shapes defined by Wang tiling—where there is no sense of a self-assembly process—except that here time complexity must be measured with respect to nondeterministic Turing machines.

...read moreread less

Journal Article•DOI•

Generalized Compact Knapsacks, Cyclic Lattices, and Efficient One-Way Functions

[...]

Daniele Micciancio¹•Institutions (1)

University of California, San Diego¹

01 Dec 2007-Computational Complexity

TL;DR: It is proved that for any unbounded function m = ω(1) with arbitrarily slow growth rate, solving the generalized compact knapsack problems on the average is at least as hard as the worst-case instance of various approximation problems over cyclic lattices.

...read moreread less

Abstract: We investigate the average-case complexity of a generalization of the compact knapsack problem to arbitrary rings: given m (random) ring elements a 1,..., a m ? R and a (random) target value b ? R, find coefficients x 1, ..., x m ? S (where S is an appropriately chosen subset of R) such that ? a i · x i = b. We consider compact versions of the generalized knapsack where the set S is large and the number of weights m is small. Most variants of this problem considered in the past (e.g., when $$R={\mathbb{Z}}$$ is the ring of the integers) can be easily solved in polynomial time even in the worst case. We propose a new choice of the ring R and subset S that yields generalized compact knapsacks that are seemingly very hard to solve on the average, even for very small values of m. Namely, we prove that for any unbounded function m = ?(1) with arbitrarily slow growth rate, solving our generalized compact knapsack problems on the average is at least as hard as the worst-case instance of various approximation problems over cyclic lattices. Specific worst-case lattice problems considered in this paper are the shortest independent vector problem SIVP and the guaranteed distance decoding problem GDD (a variant of the closest vector problem, CVP) for approximation factors n 1+? almost linear in the dimension of the lattice. Our results yield very efficient and provably secure one-way functions (based on worst-case complexity assumptions) with key size and time complexity almost linear in the security parameter n. Previous constructions with similar security guarantees required quadratic key size and computation time. Our results can also be formulated as a connection between the worst-case and average-case complexity of various lattice problems over cyclic and quasi-cyclic lattices.

...read moreread less

Proceedings Article•DOI•

An improved algorithm to accelerate regular expression evaluation

[...]

Michela Becchi¹, Patrick Crowley¹•Institutions (1)

Washington University in St. Louis¹

03 Dec 2007

TL;DR: This paper introduces a general compression technique that results in at most 2N state traversals when processing a string of length N, and describes a novel alphabet reduction scheme for DFA-based structures that can yield further dramatic reductions in data structure size.

...read moreread less

Abstract: Modern network intrusion detection systems need to perform regular expression matching at line rate in order to detect the occurrence of critical patterns in packet payloads. While deterministic finite automata (DFAs) allow this operation to be performed in linear time, they may exhibit prohibitive memory requirements. In [9], Kumar et al. propose Delayed Input DFAs (D2FAs), which provide a trade-off between the memory requirements of the compressed DFA and the number of states visited for each character processed, which corresponds directly to the memory bandwidth required to evaluate regular expressions.In this paper we introduce a general compression technique that results in at most 2N state traversals when processing a string of length N. In comparison to the D2FA approach, our technique achieves comparable levels of compression, with lower provable bounds on memory bandwidth (or greater compression for a given bandwidth bound). Moreover, our proposed algorithm has lower complexity, is suitable for scenarios where a compressed DFA needs to be dynamically built or updated, and fosters locality in the traversal process. Finally, we also describe a novel alphabet reduction scheme for DFA-based structures that can yield further dramatic reductions in data structure size.

...read moreread less

Journal Article•DOI•

Genetic algorithms for supply-chain scheduling: A case study in the distribution of ready-mixed concrete

[...]

David Naso¹, Michele Surico¹, Biagio Turchiano¹, Uzay Kaymak²•Institutions (2)

Instituto Politécnico Nacional¹, Erasmus University Rotterdam²

16 Mar 2007-European Journal of Operational Research

TL;DR: This paper proposes a novel meta-heuristic approach based on a hybrid genetic algorithm combined with constructive heuristics for ready-mixed concrete delivery of just-in-time production and transportation to distributed customers.

...read moreread less

Journal Article•DOI•

Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions

[...]

Sven Verdoolaege¹, Rachid Seghir, Kristof Beyls², Vincent Loechner, Maurice Bruynooghe¹ - Show less +1 more•Institutions (2)

Katholieke Universiteit Leuven¹, Ghent University²

23 Mar 2007-Algorithmica

TL;DR: This work extends an existing method, based on Barvinok's decomposition, for counting the number of integer points in a non-parametric polytope and computes polynomially-sized enumerators in polynomial time (for fixed dimensions).

...read moreread less

Abstract: Many compiler optimization techniques depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric polytopes. It is well known that the enumerator of such a set can be represented by an explicit function consisting of a set of quasi-polynomials, each associated with a chamber in the parameter space. Previously, interpolation was used to obtain these quasi-polynomials, but this technique has several disadvantages. Its worst-case computation time for a single quasi-polynomial is exponential in the input size, even for fixed dimensions. The worst-case size of such a quasi-polynomial (measured in bits needed to represent the quasi-polynomial) is also exponential in the input size. Under certain conditions this technique even fails to produce a solution. Our main contribution is a novel method for calculating the required quasi-polynomials analytically. It extends an existing method, based on Barvinok's decomposition, for counting the number of integer points in a non-parametric polytope. Our technique always produces a solution and computes polynomially-sized enumerators in polynomial time (for fixed dimensions).

...read moreread less

Book Chapter•DOI•

Computability of models for sequence assembly

[...]

Paul Medvedev¹, Konstantinos Georgiou¹, Gene Myers², Michael Brudno¹•Institutions (2)

University of Toronto¹, Howard Hughes Medical Institute²

08 Sep 2007

TL;DR: This work shows sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs, and gives the first, to the knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA.

...read moreread less

Abstract: Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest doublestranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.

...read moreread less

Journal Article•DOI•

Quickest Flows Over Time

[...]

Lisa Fleischer, Martin Skutella¹•Institutions (1)

Technical University of Dortmund¹

01 Feb 2007-SIAM Journal on Computing

TL;DR: The approach yields fully polynomial-time approximation schemes for the NP-hard quickest min-cost and multicommodity flow problems and shows that storage of flow at intermediate nodes is unnecessary, and the approximation schemes do not use any.

...read moreread less

Abstract: Flows over time (also called dynamic flows) generalize standard network flows by introducing an element of time. They naturally model problems where travel and transmission are not instantaneous. Traditionally, flows over time are solved in time-expanded networks that contain one copy of the original network for each discrete time step. While this method makes available the whole algorithmic toolbox developed for static flows, its main and often fatal drawback is the enormous size of the time-expanded network. We present several approaches for coping with this difficulty. First, inspired by the work of Ford and Fulkerson on maximal $s$-$t$-flows over time (or “maximal dynamic $s$-$t$-flows”), we show that static length-bounded flows lead to provably good multicommodity flows over time. Second, we investigate “condensed” time-expanded networks which rely on a rougher discretization of time. We prove that a solution of arbitrary precision can be computed in polynomial time through an appropriate discretization leading to a condensed time-expanded network of polynomial size. In particular, our approach yields fully polynomial-time approximation schemes for the NP-hard quickest min-cost and multicommodity flow problems. For single commodity problems, we show that storage of flow at intermediate nodes is unnecessary, and our approximation schemes do not use any.

...read moreread less

Proceedings Article•

Memory-bounded dynamic programming for DEC-POMDPs

[...]

Sven Seuken¹, Shlomo Zilberstein²•Institutions (2)

Harvard University¹, University of Massachusetts Amherst²

06 Jan 2007

TL;DR: This work presents the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs, which can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality.

...read moreread less

Abstract: Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.

...read moreread less

Proceedings Article•DOI•

Approximation algorithm for the temperature-aware scheduling problem

[...]

Sushu Zhang¹, Karam S. Chatha¹•Institutions (1)

Arizona State University¹

05 Nov 2007

TL;DR: It is proved that the problem of performance optimization for a set of periodic tasks with discrete voltage/frequency states under thermal constraints is NP-hard, and a pseudo-polynomial optimal algorithm and a fully polynomial time approximation technique (FPTAS) are presented.

...read moreread less

Abstract: The paper addresses the problem of performance optimization for a set of periodic tasks with discrete voltage/frequency states under thermal constraints. We prove that the problem is NP-hard, and present a pseudo-polynomial optimal algorithm and a fully polynomial time approximation technique (FPTAS) for the problem. The FPTAS technique is able to generate solutions in polynomial time that are guaranteed to be within a designer specified quality bound (QB) (say within 1% of the optimal). We evaluate our techniques by experimentation with multimedia and synthetic benchmarks mapped on the 70 nm CMOS technology processor. The experimental results demonstrate our techniques are able to match optimal solutions when QB is set at 5%, can generate solutions that arc quite close to optimal ( 25%) for large task sets with 120 nodes (while the optimal solution takes several hundred seconds). We also analyze the effect of different thermal parameters, such as the initial temperature, the final temperature and the thermal resistance.

...read moreread less

Journal Article•DOI•

Fast Nearest Neighbor Condensation for Large Data Sets Classification

[...]

Fabrizio Angiulli

01 Nov 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The fast condensed nearest neighbor (FCNN) rule was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.

...read moreread less

Abstract: This work has two main objectives, namely, to introduce a novel algorithm, called the fast condensed nearest neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows for the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even here-enhanced variants of existing competence preservation methods both in terms of learning speed and learning scaling behavior and, often, in terms of the size of the model while it guaranteed the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.

...read moreread less

Journal Article•DOI•

Development of Two-Stage SVM-RFE Gene Selection Strategy for Microarray Expression Data Analysis

[...]

Yuchun Tang¹, Yan-Qing Zhang², Zhen Huang²•Institutions (2)

Secure Computing¹, Georgia State University²

01 Jul 2007-IEEE/ACM Transactions on Computational Biology and Bioinformatics

TL;DR: It is demonstrated that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM - Recursive Feature Elimination and three correlation-based methods based on the analysis of three publicly available microarray expression datasets.

...read moreread less

Abstract: Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Though many algorithms have been developed, the Support Vector Machine - Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms. It assumes that a smaller "filter-out" factor in the SVM-RFE, which results in a smaller number of gene features eliminated in each recursion, should lead to extraction of a better gene subset. Because the SVM-RFE is highly sensitive to the "filter-out" factor, our simulations have shown that this assumption is not always correct and that the SVM-RFE is an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and other applications, a new two-stage SVM-RFE algorithm has been developed. It is designed to effectively eliminate most of the irrelevant, redundant and noisy genes while keeping information loss small at the first stage. A fine selection for the final gene subset is then performed at the second stage. The two-stage SVM-RFE overcomes the instability problem of the SVM-RFE to achieve better algorithm utility. We have demonstrated that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM-RFE and three correlation-based methods based on our analysis of three publicly available microarray expression datasets. Furthermore, the two-stage SVM-RFE is computationally efficient because its time complexity is $O(d * \log{_2d})$, where $d$ is the size of the original gene set.

...read moreread less

Proceedings Article•DOI•

Detecting time series motifs under uniform scaling

[...]

Dragomir Yankov¹, Eamonn Keogh¹, Jose Medina¹, Bill Chiu¹, Victor Zordan¹ - Show less +1 more•Institutions (1)

University of California, Riverside¹

12 Aug 2007

TL;DR: This work introduces a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and shows that it produces objectively superior results in several important domains.

...read moreread less

Abstract: Time series motifs are approximately repeated patterns foundwithin the data Such motifs have utility for many data mining algorithms, including rule-discovery,novelty-detection, summarization and clustering Since the formalization of the problem and the introduction of efficient linear time algorithms, motif discovery has been successfully applied tomany domains, including medicine, motion capture, robotics and meteorologyIn this work we show that most previous applications of time series motifs have been severely limited by the definition's brittleness to even slight changes of uniform scaling, the speed at which the patterns develop We introduce a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and show that it produces objectively superior results in several important domains Apart from being more general than all other motifdiscovery algorithms, a further contribution of our work isthat it is simpler than previous approaches, in particular we have drastically reduced the number of parameters that need to be specified

...read moreread less

Book Chapter•DOI•

On acyclic conjunctive queries and constant delay enumeration

[...]

Guillaume Bagan¹, Arnaud Durand², Etienne Grandjean¹•Institutions (2)

University of Caen Lower Normandy¹, University of Paris²

11 Sep 2007

TL;DR: The enumeration complexity of the natural extension of acyclic conjunctive queries with disequalities is studied and it is shown that for each query of free-connex treewidth bounded by some constant k, enumeration of results can be done with O(|M|k+1) precomputation steps and constant delay.

...read moreread less

Abstract: We study the enumeration complexity of the natural extension of acyclic conjunctive queries with disequalities. In this language, a number of NP-complete problems can be expressed. We first improve a previous result of Papadimitriou and Yannakakis by proving that such queries can be computed in time c.|M|ċ|ϕ(M)| where M is the structure, ϕ(M) is the result set of the query and c is a simple exponential in the size of the formula ϕ. A consequence of our method is that, in the general case, tuples of such queries can be enumerated with a linear delay between two tuples. We then introduce a large subclass of acyclic formulas called CCQ≠ and prove that the tuples of a CCQ≠ query can be enumerated with a linear time precomputation and a constant delay between consecutive solutions. Moreover, under the hypothesis that the multiplication of two n×n boolean matrices cannot be done in time O(n2), this leads to the following dichotomy for acyclic queries: either such a query is in CCQ≠ or it cannot be enumerated with linear precomputation and constant delay. Furthermore we prove that testing whether an acyclic formula is in CCQ≠ can be performed in polynomial time. Finally, the notion of free-connex treewidth of a structure is defined. We show that for each query of free-connex treewidth bounded by some constant k, enumeration of results can be done with O(|M|k+1) precomputation steps and constant delay.

...read moreread less

Collapse