
Showing papers on "Time complexity published in 2004"


Journal ArticleDOI
TL;DR: A hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the community structure.
Abstract: The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m ≈ n and d ≈ log n, in which case our algorithm runs in essentially linear time, O(n log² n). As an example of the application of this algorithm we use it to analyze a network of items for sale on the web site of a large on-line retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400,000 vertices and 2 × 10⁶ edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers.
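
As a concrete illustration, this greedy modularity agglomeration is available in NetworkX as greedy_modularity_communities; a minimal usage sketch, with the toy karate-club graph standing in for the co-purchasing network:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy stand-in for the co-purchasing network described in the abstract.
G = nx.karate_club_graph()

# Communities are merged greedily by modularity gain; a heap of candidate
# merges is what yields the O(md log n) running time quoted above.
communities = greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")
```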

6,599 citations


Proceedings ArticleDOI
14 Jun 2004
TL;DR: It is proved that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation.
Abstract: The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
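
For reference, the k-anonymity property that these optimization problems target is simple to state and to check; a minimal sketch (table contents and quasi-identifier names are made up for illustration):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """True iff every combination of quasi-identifier values occurs in
    at least k rows -- the basic k-anonymity property.  The NP-hard part
    is choosing the cheapest suppressions that make this hold."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

# Toy relation with generalized values; 'age' and 'zip' are quasi-identifiers.
table = [
    {"age": "3*", "zip": "021**", "disease": "flu"},
    {"age": "3*", "zip": "021**", "disease": "cold"},
    {"age": "4*", "zip": "021**", "disease": "flu"},
]
print(is_k_anonymous(table, ["age", "zip"], k=2))  # False: the ('4*', '021**') group has one row
```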

853 citations


Journal ArticleDOI
TL;DR: The smoothed analysis of algorithms is introduced, which continuously interpolates between the worst-case and average-case analyses of algorithms, and it is shown that the simplex algorithm has smoothed complexity polynomial in the input size and the standard deviation of Gaussian perturbations.
Abstract: We introduce the smoothed analysis of algorithms, which continuously interpolates between the worst-case and average-case analyses of algorithms. In smoothed analysis, we measure the maximum over inputs of the expected performance of an algorithm under small random perturbations of that input. We measure this performance in terms of both the input size and the magnitude of the perturbations. We show that the simplex algorithm has smoothed complexity polynomial in the input size and the standard deviation of Gaussian perturbations.
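
The smoothed measure itself is easy to estimate empirically. The sketch below perturbs an LP's data with Gaussian noise and averages solver iteration counts; it uses SciPy's linprog (the HiGHS solver, not the classical simplex rule analyzed in the paper) purely as an illustrative stand-in:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def smoothed_iterations(A, b, c, sigma, trials=20):
    """Estimate the expected iteration count of the LP solver under
    N(0, sigma^2) perturbations of (A, b), as in the smoothed model."""
    its = []
    for _ in range(trials):
        A_p = A + sigma * rng.standard_normal(A.shape)
        b_p = b + sigma * rng.standard_normal(b.shape)
        res = linprog(c, A_ub=A_p, b_ub=b_p, bounds=(0, 1), method="highs")
        if res.status == 0:
            its.append(res.nit)  # iterations as a proxy for running time
    return float(np.mean(its)) if its else float("nan")

A = rng.standard_normal((30, 10)); b = np.ones(30); c = rng.standard_normal(10)
for sigma in (0.0, 0.01, 0.1):
    print(sigma, smoothed_iterations(A, b, c, sigma))
```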

802 citations


Proceedings ArticleDOI
01 Sep 2004
TL;DR: In this article, the stability properties of continuous-time linear time-varying systems whose system matrix is Metzler with zero row sums were studied, and sufficient conditions for uniform exponential stability of the equilibrium set are presented.
Abstract: We study the stability properties of linear time-varying systems in continuous time whose system matrix is Metzler with zero row sums. This class of systems arises naturally in the context of distributed decision problems, coordination and rendezvous tasks and synchronization problems. The equilibrium set contains all states with identical state components. We present sufficient conditions guaranteeing uniform exponential stability of this equilibrium set, implying that all state components converge to a common value as time grows unbounded. Furthermore it is shown that this convergence result is robust with respect to an arbitrary delay, provided that the delay affects only the off-diagonal terms in the differential equation.
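
A minimal simulation illustrates the convergence claim. The sketch below fixes a single Metzler matrix with zero row sums (the paper treats the time-varying, delayed case) and integrates ẋ = Ax with forward Euler; all state components approach a common value:

```python
import numpy as np

# Metzler with zero row sums: nonnegative off-diagonal coupling weights,
# diagonal entries chosen so each row sums to zero.
A = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.5,  0.5, -1.0]])
assert np.allclose(A.sum(axis=1), 0.0)

x = np.array([1.0, 5.0, -2.0])   # initially disagreeing state components
dt = 0.01
for _ in range(2000):            # forward-Euler integration of x' = A x
    x = x + dt * (A @ x)

print(x)  # all three components are now close to a common consensus value
```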

750 citations


Journal ArticleDOI
TL;DR: This paper presents a distributed algorithm that outperforms the existing algorithms for minimum CDS, and establishes an Ω(n log n) lower bound on the message complexity of any distributed algorithm for nontrivial CDS, showing the proposed algorithm to be message-optimal.
Abstract: Connected dominating set (CDS) has been proposed as virtual backbone or spine of wireless ad hoc networks. Three distributed approximation algorithms have been proposed in the literature for minimum CDS. In this paper, we first reinvestigate their performances. None of these algorithms have constant approximation factors. Thus these algorithms cannot guarantee to generate a CDS of small size. Their message complexities can be as high as O(n²), and their time complexities may also be as large as O(n²) and O(n³). We then present our own distributed algorithm that outperforms the existing algorithms. This algorithm has an approximation factor of at most 8, O(n) time complexity and O(n log n) message complexity. By establishing the Ω(n log n) lower bound on the message complexity of any distributed algorithm for nontrivial CDS, our algorithm is thus message-optimal.

652 citations


01 Jan 2004
TL;DR: This paper introduces FastDTW, an approximation of DTW with linear time and space complexity; it uses a multilevel approach that recursively projects a solution from a coarse resolution and refines the projected solution.
Abstract: The dynamic time warping (DTW) algorithm is able to find the optimal alignment between two time series. It is often used to determine time series similarity, classification, and to find corresponding regions between two time series. DTW has a quadratic time and space complexity that limits its use to only small time series data sets. In this paper we introduce FastDTW, an approximation of DTW that has a linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarse resolution and refines the projected solution. We show the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW compared to two other existing approximate DTW algorithms: Sakoe-Chiba Bands and Data Abstraction. Our results show a large improvement in accuracy over the existing methods.
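
For context, the exact quadratic algorithm that FastDTW approximates is a small dynamic program; a minimal sketch over univariate series:

```python
import numpy as np

def dtw(a, b):
    """Exact dynamic time warping distance.  Filling the (n+1) x (m+1)
    table is the quadratic time and space cost that FastDTW's
    coarse-to-fine projection avoids."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0, 1, 2, 3, 2], [0, 1, 1, 2, 3, 2]))  # 0.0: same shape, different pacing
```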

524 citations


Journal ArticleDOI
TL;DR: This method avoids edge cutting and is based on notions of voltage drops across networks, which are both intuitive and easy to compute regardless of the complexity of the graph involved, and it allows for the swift discovery of the community surrounding a given node.
Abstract: We present a method that allows for the discovery of communities within graphs of arbitrary size in times that scale linearly with their size. This method avoids edge cutting and is based on notions of voltage drops across networks that are both intuitive and easy to solve regardless of the complexity of the graph involved. We additionally show how this algorithm allows for the swift discovery of the community surrounding a given node without having to extract all the communities out of a graph.
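
To convey the flavor of the voltage idea (a bare sketch, not the authors' full algorithm): fix unit voltage at one node and zero at another, solve Kirchhoff's equations, which reduce to a linear system in the graph Laplacian, and split the graph by thresholding the resulting voltages:

```python
import numpy as np

def voltages(adj, source, sink):
    """Fix V[source] = 1 and V[sink] = 0, then require zero net current at
    every other node: a linear system in the graph Laplacian."""
    n = len(adj)
    L = np.diag(adj.sum(axis=1)) - adj            # graph Laplacian
    free = [i for i in range(n) if i not in (source, sink)]
    v = np.zeros(n)
    v[source] = 1.0
    rhs = -L[np.ix_(free, [source])].ravel()      # move fixed voltages to the RHS
    v[free] = np.linalg.solve(L[np.ix_(free, free)], rhs)
    return v

# Two triangles joined by the single edge (2, 3): a natural two-community graph.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
print(np.round(voltages(adj, source=0, sink=5), 2))  # thresholding at 0.5 separates the triangles
```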

488 citations


Proceedings ArticleDOI
26 Apr 2004
TL;DR: In this article, three approximation algorithms are presented for a variation of the set k-cover problem, in which the objective is to partition the sensors into covers such that the number of covers that include an area, summed over all areas, is maximized.
Abstract: Wireless sensor networks (WSNs) are emerging as an effective means for environment monitoring. This paper investigates a strategy for energy efficient monitoring in WSNs that partitions the sensors into covers, and then activates the covers iteratively in a round-robin fashion. This approach takes advantage of the overlap created when many sensors monitor a single area. Our work builds upon previous work by Slijepcevic and Potkonjak (2001), where the model is first formulated. We have designed three approximation algorithms for a variation of the set k-cover problem, where the objective is to partition the sensors into covers such that the number of covers that include an area, summed over all areas, is maximized. The first algorithm is randomized and partitions the sensors, in expectation, within a fraction 1 − 1/e (≈ 0.63) of the optimum. We present two other deterministic approximation algorithms. One is a distributed greedy algorithm with a 1/2 approximation ratio and the other is a centralized greedy algorithm with a 1 − 1/e approximation ratio. We show that it is NP-complete to guarantee better than 15/16 of the optimal coverage, indicating that all three algorithms perform well with respect to the best approximation algorithm possible in polynomial time, assuming P ≠ NP. Simulations indicate that in practice, the deterministic algorithms perform far above their worst case bounds, consistently covering more than 72% of what is covered by an optimum solution. Simulations also indicate that the increase in longevity is proportional to the amount of overlap amongst the sensors. The algorithms are fast, easy to use, and according to simulations, significantly increase the longevity of sensor networks. The randomized algorithm in particular seems quite practical.
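
The randomized algorithm is remarkably simple, which is part of its appeal: assign every sensor to a cover chosen uniformly at random. A minimal sketch on a made-up instance (sensor and area names are illustrative):

```python
import random

def randomized_k_cover(areas_covered, k, seed=0):
    """Place each sensor in one of k covers uniformly at random.  The
    expected objective -- the number of (cover, area) pairs with the
    area monitored -- is within 1 - 1/e of the optimum."""
    rng = random.Random(seed)
    covers = [[] for _ in range(k)]
    for sensor in areas_covered:
        covers[rng.randrange(k)].append(sensor)
    objective = sum(len({a for s in cover for a in areas_covered[s]})
                    for cover in covers)
    return covers, objective

# Toy instance: sensor -> areas it monitors.
areas_covered = {"s1": {"A", "B"}, "s2": {"B"}, "s3": {"A", "C"}, "s4": {"C"}}
print(randomized_k_cover(areas_covered, k=2))
```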

386 citations


Journal ArticleDOI
TL;DR: It is shown that there are graphs with optimal labels of length 3 log n, such that if any labels with fewer than n bits per label are used, computing the distance function requires exponential time.

369 citations


Journal ArticleDOI
TL;DR: A surprisingly simple framework for the random generation of combinatorial configurations, based on what the authors call Boltzmann models, is proposed; the resulting algorithms can be implemented easily, analysed mathematically with great precision, and, when suitably tuned, tend to be very efficient in practice.
Abstract: This article proposes a surprisingly simple framework for the random generation of combinatorial configurations based on what we call Boltzmann models. The idea is to perform random generation of possibly complex structured objects by placing an appropriate measure spread over the whole of a combinatorial class – an object receives a probability essentially proportional to an exponential of its size. As demonstrated here, the resulting algorithms based on real-arithmetic operations often operate in linear time. They can be implemented easily, be analysed mathematically with great precision, and, when suitably tuned, tend to be very efficient in practice.
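
A minimal concrete instance, assuming the textbook construction for plane binary trees counted by leaves, where the class satisfies B(x) = x + B(x)²: draw a leaf with probability x/B(x), otherwise recurse on two independent children. Each tree of size n then appears with probability proportional to xⁿ, and tuning x toward the singularity 1/4 makes large trees likely:

```python
import math, random

def boltzmann_binary_tree(x, rng=random):
    """Boltzmann sampler for plane binary trees counted by leaves,
    where B(x) = x + B(x)^2, so B(x) = (1 - sqrt(1 - 4x)) / 2."""
    B = (1 - math.sqrt(1 - 4 * x)) / 2
    if rng.random() < x / B:                 # leaf with probability x / B(x)
        return "leaf"
    return (boltzmann_binary_tree(x, rng),   # internal node: two independent
            boltzmann_binary_tree(x, rng))   # recursive Boltzmann draws

def size(t):
    return 1 if t == "leaf" else size(t[0]) + size(t[1])

random.seed(1)
# x = 0.24 is near the singularity x = 1/4: B(0.24) = 0.4, so a leaf is
# drawn with probability 0.6 and sampled sizes fluctuate widely.
print([size(boltzmann_binary_tree(0.24)) for _ in range(10)])
```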

365 citations


Proceedings ArticleDOI
06 Dec 2004
TL;DR: This paper considers basestation allocation of subcarriers and power to each user to maximize the sum of user data rates, subject to constraints on total power, bit error rate, and proportionality among user data rates.
Abstract: Orthogonal frequency division multiple access (OFDMA) basestations allow multiple users to transmit simultaneously on different subcarriers during the same symbol period. This paper considers basestation allocation of subcarriers and power to each user to maximize the sum of user data rates, subject to constraints on total power, bit error rate, and proportionality among user data rates. Previous allocation methods have been iterative nonlinear methods suitable for offline optimization. In the special high subchannel SNR case, an iterative root-finding method has linear-time complexity in the number of users and N log N complexity in the number of subchannels. We propose a non-iterative method that is made possible by our relaxation of strict user rate proportionality constraints. Compared to the root-finding method, the proposed method waives the restriction of high subchannel SNR, has significantly lower complexity, and in simulation, yields higher user data rates.

Journal ArticleDOI
TL;DR: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm, and the adequacy of both the canonization approach and the algorithm is demonstrated.
Abstract: Background: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n⁶) time and O(n⁴) space algorithm by Rivas and Eddy is currently the best available program.

Journal ArticleDOI
TL;DR: This paper identifies classes of decentralized control problems whose complexity ranges between NEXP and P, and distinguishes between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents.
Abstract: Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving such problems optimally arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to optimally solve useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication, and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.

Journal ArticleDOI
TL;DR: A novel search principle for optimal feature subset selection with the branch & bound method is introduced, based on a simple mechanism for predicting criterion values, and two implementations of the prediction mechanism are proposed, suitable for nonrecursive and recursive criterion forms, respectively.
Abstract: A novel search principle for optimal feature subset selection using the branch & bound method is introduced. Thanks to a simple mechanism for predicting criterion values, a considerable amount of time can be saved by avoiding many slow criterion evaluations. We propose two implementations of this prediction mechanism that are suitable for use with nonrecursive and recursive criterion forms, respectively. Both algorithms find the optimum usually several times faster than any other known branch & bound algorithm. As the algorithm's computational efficiency is crucial, due to the exponential nature of the search problem, we also investigate other factors that affect the search performance of all branch & bound algorithms. Using a set of synthetic criteria, we show that the speed of the branch & bound algorithms strongly depends on the diversity among features, feature stability with respect to different subsets, and criterion function dependence on feature set size. We identify the scenarios where the search is accelerated the most dramatically (finishing in linear time), as well as the worst conditions. We verify our conclusions experimentally on three real data sets using traditional probabilistic distance criteria.

Proceedings ArticleDOI
17 Oct 2004
TL;DR: This work presents the first linear time (1 + ε)-approximation algorithm for the k-means problem for fixed k and ε, which runs in O(nd) time.
Abstract: We present the first linear time (1 + ε)-approximation algorithm for the k-means problem for fixed k and ε. Our algorithm runs in O(nd) time, which is linear in the size of the input. Another feature of our algorithm is its simplicity: the only technique involved is random sampling.

Journal ArticleDOI
TL;DR: It is demonstrated that with careful implementation it is possible, in most cases, to maintain the O(n³) complexity or, in a few cases, increase the time complexity to O(n³ log n).
Abstract: Insertion heuristics have proven to be popular methods for solving a variety of vehicle routing and scheduling problems. In this paper, we focus on the impact of incorporating complicating constraints on the efficiency of insertion heuristics. The basic insertion heuristic for the standard vehicle routing problem has a time complexity of O(n³). However, straightforward implementations of handling complicating constraints lead to an undesirable time complexity of O(n⁴). We demonstrate that with careful implementation it is possible, in most cases, to maintain the O(n³) complexity or, in a few cases, increase the time complexity to O(n³ log n). The complicating constraints we consider in this paper are time windows, shift time limits, variable delivery quantities, fixed and variable delivery times, and multiple routes per vehicle. Little attention has been given to some of these complexities (with time windows being the notable exception), which are common in practice and have a significant impact on the feasibility of a schedule as well as the efficiency of insertion heuristics.
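
The unconstrained baseline is the cheapest-insertion loop below (a minimal single-route sketch with Euclidean costs; the paper's subject is how complicating constraints change the cost of exactly this loop). Three nested loops, over insertions, candidates, and positions, give the O(n³) bound:

```python
import math

def cheapest_insertion(points):
    """Grow a single route by repeatedly inserting the unrouted point
    whose best insertion position adds the least cost (O(n^3) overall)."""
    d = lambda a, b: math.dist(points[a], points[b])
    route = [0, 0]                            # start and end at the depot, point 0
    unrouted = set(range(1, len(points)))
    while unrouted:
        best = None
        for c in unrouted:                    # candidate customer
            for i in range(len(route) - 1):   # candidate edge (route[i], route[i+1])
                extra = d(route[i], c) + d(c, route[i + 1]) - d(route[i], route[i + 1])
                if best is None or extra < best[0]:
                    best = (extra, c, i + 1)
        _, c, pos = best
        route.insert(pos, c)
        unrouted.remove(c)
    return route

print(cheapest_insertion([(0, 0), (2, 0), (2, 2), (0, 2), (1, 3)]))
```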

Journal ArticleDOI
TL;DR: An algorithm is presented for constructing an explicit piecewise linear state feedback approximation to nonlinear constrained receding horizon control, which allows such controllers to be implemented via an efficient binary tree search, avoiding real-time optimization.

Journal ArticleDOI
TL;DR: A subsystem of second-order linear logic with restricted rules for exponentials so that proofs correspond to polynomial time algorithms, and vice versa, is presented.

Journal ArticleDOI
TL;DR: A new schedulability test is derived which can be tuned through a parameter to balance complexity versus acceptance ratio, so that it can be used on line to better exploit the processor, based on the available computational power.
Abstract: Feasibility analysis of fixed priority systems has been widely studied in the real-time literature and several acceptance tests have been proposed to guarantee a set of periodic tasks. They can be divided into two main classes: polynomial time tests and exact tests. Polynomial time tests can efficiently be used for online guarantee of real-time applications, where tasks are activated at runtime. These tests introduce a negligible overhead when executed upon a new task arrival; however, they provide only a sufficient schedulability condition, which may cause poor processor utilization. On the other hand, exact tests, which are based on response time analysis, provide a necessary and sufficient schedulability condition, but are too complex to be executed on line for large task sets. As a consequence, for large task sets, they are often executed off line. This paper proposes a novel approach for analyzing the schedulability of periodic task sets on a single processor under an arbitrary fixed priority assignment. Using this approach, we derive a new schedulability test which can be tuned through a parameter to balance complexity versus acceptance ratio, so that it can be used on line to better exploit the processor, based on the available computational power. Extensive simulations show that our test, when used in its exact form, is significantly faster than the current response time analysis methods. Moreover, for its elegance and compactness, the proposed approach offers an explanation of some known phenomena of fixed priority scheduling and could be helpful for further work on schedulability analysis.
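
For reference, the exact test this paper speeds up is the classical response-time recurrence, iterated to a fixed point for each task; a minimal sketch, assuming deadlines equal to periods and tasks sorted by decreasing priority:

```python
import math

def response_time(tasks, i):
    """Classical response-time analysis for fixed-priority periodic tasks,
    each given as (C, T): iterate R = C_i + sum over higher-priority j of
    ceil(R / T_j) * C_j until it converges or exceeds the deadline."""
    C_i, T_i = tasks[i]
    R = C_i
    while True:
        R_next = C_i + sum(math.ceil(R / T) * C for C, T in tasks[:i])
        if R_next > T_i:
            return None                       # task i misses its deadline
        if R_next == R:
            return R                          # fixed point: worst-case response time
        R = R_next

tasks = [(1, 4), (2, 6), (3, 13)]             # (C, T), highest priority first
for i in range(len(tasks)):
    print(f"task {i}: R = {response_time(tasks, i)}")
```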

Book ChapterDOI
16 Jul 2004
TL;DR: It is shown that no polynomial-time algorithm can solve the localization problem for sensor networks in the worst case, even for sets of distance pairs for which a unique solution exists, unless RP = NP.
Abstract: Determining the positions of the sensor nodes in a network is essential to many network functionalities such as routing, coverage and tracking, and event detection. The localization problem for sensor networks is to reconstruct the positions of all of the sensors in a network, given the distances between all pairs of sensors that are within some radius r of each other. In the past few years, many algorithms for solving the localization problem were proposed, without knowing the computational complexity of the problem. In this paper, we show that no polynomial-time algorithm can solve this problem in the worst case, even for sets of distance pairs for which a unique solution exists, unless RP = NP. We also discuss the consequences of our result and present open problems.

Journal ArticleDOI
TL;DR: It is proved that if there is a galled-tree, then the one produced by the algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations.
Abstract: A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. In a seminal paper, Wang et al.(1) studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galled-tree". They gave a polynomial-time algorithm that was intended to determine whether or not a set of sequences could be generated on a galled-tree. Unfortunately, the algorithm by Wang et al.(1) is incomplete and does not constitute a necessary test for the existence of a galled-tree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galled-tree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations. We also prove that when there is a galled-tree for the data, the galled-tree minimizing the number of recombinations is "essentially unique". We also note two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work.(8) PowerPoint slides of the conference talk can be found at our website.(7)

Book ChapterDOI
02 Oct 2004
TL;DR: In this article, a prefix-preserving closure extension of closed patterns is proposed, which enables searching all frequent closed patterns in a depth-first manner, in time linear in the number of frequent closed patterns.
Abstract: The class of closed patterns is a well known condensed representation of frequent patterns, and has recently attracted considerable interest. In this paper, we propose an efficient algorithm LCM (Linear time Closed pattern Miner) for mining frequent closed patterns from large transaction databases. The main theoretical contribution is our proposed prefix-preserving closure extension of closed patterns, which enables us to search all frequent closed patterns in a depth-first manner, in time linear in the number of frequent closed patterns. Our algorithm does not need any storage space for the previously obtained patterns, while the existing algorithms do. Performance comparisons of LCM with straightforward algorithms demonstrate the advantages of our prefix-preserving closure extension.
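
The closure operation underlying all of this is worth seeing concretely (a bare sketch of the definition, not LCM's prefix-preserving extension itself): the closure of an itemset is the intersection of all transactions containing it, and a pattern is closed exactly when it equals its own closure:

```python
def closure(itemset, transactions):
    """Intersection of all transactions containing the itemset.
    A pattern P is closed iff closure(P) == P."""
    occurrences = [t for t in transactions if itemset <= t]
    result = set(occurrences[0])
    for t in occurrences[1:]:
        result &= t
    return frozenset(result)

transactions = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"})]
# {'b'} is not closed: it never occurs without 'a', so its closure is {'a', 'b'}.
print(sorted(closure(frozenset({"b"}), transactions)))  # ['a', 'b']
```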

Proceedings ArticleDOI
27 Sep 2004
TL;DR: The paper shows that response time based schedulability tests with linear time bounds do not need to consider all tasks but just a small subset, which may lead to substantial speed-ups, and goes a step further with respect to other recent works in the literature by considering a more complete task model.
Abstract: As the bandwidth of CPUs and networks continues to grow, it becomes more attractive, for efficiency reasons, to share such resources among several applications with the minimum level of interference. This can be achieved using temporal partitions, with each application assigned to its own partition and executing as if it were executing alone on a resource with lower bandwidth. The partitions are associated to servers that execute the application tasks according to a given application-level scheduler. On the other hand, the set of servers is scheduled by a system-level scheduler. This paper addresses the particular case of fixed-priority application-level schedulers together with a periodic server model at the system level. It starts with an adequate response time analysis based on the notion of server availability for a known server. Then it addresses the inverse problem of designing a server with minimum system-level resource requirements to fulfill the application time constraints. In this context, the paper shows that response time based schedulability tests with linear time bounds do not need to consider all tasks but just a small subset, which may lead to substantial speed-ups. The proposed method goes a step further with respect to other recent works in the literature by considering a more complete task model, effectively computing the server parameters and establishing a better trade-off concerning complexity and tightness.

Journal ArticleDOI
TL;DR: It is shown that the optimal steady state can be found using a linear programming approach and, thus, in polynomial time, and a theoretical comparison of the computing power of tree-based versus arbitrary platforms is provided.
Abstract: We consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous computing platform. We use a nonoriented graph to model the platform, where resources can have different speeds of computation and communication. Because the number of tasks is large, we focus on the question of determining the optimal steady state scheduling strategy for each processor (the fraction of time spent computing and the fraction of time spent communicating with each neighbor). In contrast to minimizing the total execution time, which is NP-hard in most formulations, we show that finding the optimal steady state can be solved using a linear programming approach and, thus, in polynomial time. Our result holds for a quite general framework, allowing for cycles and multiple paths in the interconnection graph, and allowing for several masters. We also consider the simpler case where the platform is a tree. While this case can also be solved via linear programming, we show how to derive a closed-form formula to compute the optimal steady state, which gives rise to a bandwidth-centric scheduling strategy. The advantage of this approach is that it can directly support autonomous task scheduling based only on information local to each node; no global information is needed. Finally, we provide a theoretical comparison of the computing power of tree-based versus arbitrary platforms.

Proceedings ArticleDOI
07 Mar 2004
TL;DR: This work is the first attempt to derive a performance-guaranteed polynomial time approximation algorithm for jointly solving the routing, power control, and scheduling problems for energy-efficient communication over a multi-hop wireless network.
Abstract: With increasing interest in energy constrained multi-hop wireless networks (Bambos, N. et al., 1991), a fundamental problem is one of determining energy efficient communication strategies over these multi-hop networks. The simplest problem is one where a given source node wants to communicate with a given destination, with a given rate over a multi-hop wireless network, using minimum power. Here the power refers to the total amount of power consumed over the entire network in order to achieve this rate between the source and the destination. There are three decisions that have to be made (jointly) in order to minimize the power requirement: (1) the path(s) that the data has to take between the source and the destination (routing); (2) the power with which each link transmission is done (power control); (3) depending on the interference or the MAC characteristics, the time slots in which specific link transmissions have to take place (scheduling). To the best of our knowledge, ours is the first attempt to derive a performance guaranteed polynomial time approximation algorithm for jointly solving these three problems. We formulate the overall problem as an optimization problem with non-linear objective function and non-linear constraints. We then derive a polynomial time 3-approximation algorithm to solve this problem. We also present a simple version of the algorithm, with the same performance bound, which involves solving only shortest path problems and which is quite efficient in practice. Our approach readily extends to the case where there are multiple source-destination pairs that have to communicate simultaneously over the multi-hop network.

Journal ArticleDOI
Ali Dasdan
TL;DR: This article focuses on the fastest OCR algorithms only, provides a unified theoretical framework and a few new results, and runs these algorithms on the largest circuit benchmarks available.
Abstract: Optimum cycle ratio (OCR) algorithms are fundamental to the performance analysis of (digital or manufacturing) systems with cycles. Some applications in the computer-aided design field include cycle time and slack optimization for circuits, retiming, timing separation analysis, and rate analysis. There are many OCR algorithms, and since a superior time complexity in theory does not mean a superior time complexity in practice, or vice-versa, it is important to know how these algorithms perform in practice on real circuit benchmarks. A recent published study experimentally evaluated almost all the known OCR algorithms, and determined the fastest one among them. This article improves on that study in the following ways: (1) it focuses on the fastest OCR algorithms only; (2) it provides a unified theoretical framework and a few new results; (3) it runs these algorithms on the largest circuit benchmarks available; (4) it compares the algorithms in terms of many properties in addition to running times such as operation counts, convergence behavior, space requirements, generality, simplicity, and robustness; (5) it analyzes the experimental results using statistical techniques and provides asymptotic time complexity of each algorithm in practice; and (6) it provides clear guidance to the use and implementation of these algorithms together with our algorithmic improvements.

01 Jan 2004
TL;DR: This paper presents an efficient algorithm for constructing Bayesian belief networks from databases that guarantees that the perfect map of the underlying dependency model is generated, and enjoys a time complexity of O(N²) on conditional independence (CI) tests.
Abstract: This paper presents an efficient algorithm for constructing Bayesian belief networks from databases. The algorithm takes a database and an attribute ordering (i.e., the causal attributes of an attribute should appear earlier in the order) as input and constructs a belief network structure as output. The construction process is based on the computation of mutual information of attribute pairs. Given a data set which is large enough and has a DAG-isomorphic probability distribution, this algorithm guarantees that the perfect map [1] of the underlying dependency model is generated, and at the same time, enjoys a time complexity of O(N²) on conditional independence (CI) tests. To evaluate this algorithm, we present the experimental results on three versions of the well-known ALARM network database, which has 37 attributes and 10,000 records. The correctness proof and the analysis of computational complexity are also presented. We also discuss the features of our work and relate it to previous works.

Journal ArticleDOI
TL;DR: It is shown that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time.
Abstract: It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern-the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
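
The pattern of a string, as used above, is simple to compute: replace each symbol by the order of its first appearance, discarding the symbols' identities. A minimal sketch:

```python
def pattern(s):
    """Pattern of a string: each symbol is replaced by the index of its
    first appearance, so only the order structure remains."""
    first_seen = {}
    return [first_seen.setdefault(ch, len(first_seen) + 1) for ch in s]

# Strings over different alphabets can share a pattern; it is this part
# that can be compressed with diminishing redundancy for any alphabet.
print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
print(pattern("ZQXZ"))         # [1, 2, 3, 1]
```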

Book ChapterDOI
01 Jan 2004
TL;DR: The reader may have noticed that for all the considered variants of the knapsack problem, no polynomial time algorithm has been presented which solves the problem to optimality.
Abstract: The reader may have noticed that for all the considered variants of the knapsack problem, no polynomial time algorithm has been presented which solves the problem to optimality. Indeed all the algorithms described are based on some kind of search and prune methods, which in the worst case may take exponential time. It would be a satisfying result if we somehow could prove that it is not possible to find an algorithm which runs in polynomial time, thus having evidence that the presented methods are "as good as we can do". However, no proof has been found showing that the considered variants of the knapsack problem cannot be solved to optimality in polynomial time.
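
A worked example makes the subtlety concrete: the classical dynamic program below runs in O(nW) time, which looks polynomial but is not, since the capacity W is written in binary in the input, so the running time is exponential in the input length:

```python
def knapsack(values, weights, W):
    """Classical 0/1 knapsack dynamic program in O(n * W) time.  This is
    pseudo-polynomial, not polynomial: W contributes only log W bits to
    the input, so the bound is exponential in the input size."""
    dp = [0] * (W + 1)
    for v, w in zip(values, weights):
        for cap in range(W, w - 1, -1):   # descending: each item used at most once
            dp[cap] = max(dp[cap], dp[cap - w] + v)
    return dp[W]

print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], W=50))  # 220
```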

Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work introduces a framework, which is called Divide-by-2 (DB2), for extending support vector machines (SVM) to multi-class problems and shows that, DB2 is faster than one-against-one and one- against-rest algorithms in terms of testing time, significantly faster than the standard one- Against-Rest algorithms interms of training time, and the cross-validation accuracy ofDB2 is comparable to these two methods.
Abstract: We introduce a framework, which we call Divide-by-2 (DB2), for extending support vector machines (SVM) to multi-class problems. DB2 offers an alternative to the standard one-against-one and one-against-rest algorithms. For an N class problem, DB2 produces an N − 1 node binary decision tree where nodes represent decision boundaries formed by N − 1 SVM binary classifiers. This tree structure allows us to present a generalization and a time complexity analysis of DB2. Our analysis and related experiments show that DB2 is faster than one-against-one and one-against-rest algorithms in terms of testing time, significantly faster than one-against-rest in terms of training time, and that the cross-validation accuracy of DB2 is comparable to these two methods.