Showing papers in "Journal of the ACM in 2002"
••
TL;DR: The degradation in network performance due to unregulated traffic is quantified and it is proved that if the latency of each edge is a linear function of its congestion, then the total latency of the routes chosen by selfish network users is at most 4/3 times the minimum possible total latency.
Abstract: We consider the problem of routing traffic to optimize the performance of a congested network. We are given a network, a rate of traffic between each pair of nodes, and a latency function for each edge specifying the time needed to traverse the edge given its congestion; the objective is to route traffic such that the sum of all travel times---the total latency---is minimized. In many settings, it may be expensive or impossible to regulate network traffic so as to implement an optimal assignment of routes. In the absence of regulation by some central authority, we assume that each network user routes its traffic on the minimum-latency path available to it, given the network congestion caused by the other users. In general such a "selfishly motivated" assignment of traffic to paths will not minimize the total latency; hence, this lack of regulation carries the cost of decreased network performance. In this article, we quantify the degradation in network performance due to unregulated traffic. We prove that if the latency of each edge is a linear function of its congestion, then the total latency of the routes chosen by selfish network users is at most 4/3 times the minimum possible total latency (subject to the condition that all traffic must be routed). We also consider the more general setting in which edge latency functions are assumed only to be continuous and nondecreasing in the edge congestion. Here, the total latency of the routes chosen by unregulated selfish network users may be arbitrarily larger than the minimum possible total latency; however, we prove that it is no more than the total latency incurred by optimally routing twice as much traffic.
1,703 citations
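A small worked example makes the 4/3 bound concrete. The sketch below evaluates Pigou's classic two-link network (one link with constant latency 1, one with latency equal to its congestion), which is the standard instance achieving the 4/3 ratio for linear latencies; the code and names are illustrative, not from the paper.

```python
# Pigou's example: one unit of traffic from s to t over two parallel links.
#   link A: constant latency 1
#   link B: latency equal to the fraction x of traffic using it
# Selfish routing sends everything on link B (it is never worse than link A),
# while the optimum splits traffic to minimize total latency.

def total_latency(x_b: float) -> float:
    """Total latency when a fraction x_b of the unit flow uses link B."""
    x_a = 1.0 - x_b
    return x_a * 1.0 + x_b * x_b   # x_a * l_A(x_a) + x_b * l_B(x_b)

nash_cost = total_latency(1.0)                                   # selfish outcome: all flow on B
opt_cost = min(total_latency(i / 1000) for i in range(1001))     # brute-force optimum

print(nash_cost, opt_cost, nash_cost / opt_cost)
# ~ 1.0, 0.75 (half the flow on each link), ratio 4/3, matching the linear-latency bound
```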
••
TL;DR: This work introduces a third, more general variety of temporal logic: alternating-time temporal logic, which offers selective quantification over those paths that are possible outcomes of games, such as the game in which the system and the environment alternate moves.
Abstract: Temporal logic comes in two varieties: linear-time temporal logic assumes implicit universal quantification over all paths that are generated by the execution of a system; branching-time temporal logic allows explicit existential and universal quantification over all paths. We introduce a third, more general variety of temporal logic: alternating-time temporal logic offers selective quantification over those paths that are possible outcomes of games, such as the game in which the system and the environment alternate moves. While linear-time and branching-time logics are natural specification languages for closed systems, alternating-time logics are natural specification languages for open systems. For example, by preceding the temporal operator "eventually" with a selective path quantifier, we can specify that in the game between the system and the environment, the system has a strategy to reach a certain state. The problems of receptiveness, realizability, and controllability can be formulated as model-checking problems for alternating-time formulas. Depending on whether or not we admit arbitrary nesting of selective path quantifiers and temporal operators, we obtain the two alternating-time temporal logics ATL and ATL*. ATL and ATL* are interpreted over concurrent game structures. Every state transition of a concurrent game structure results from a choice of moves, one for each player. The players represent individual components and the environment of an open system. Concurrent game structures can capture various forms of synchronous composition for open systems, and if augmented with fairness constraints, also asynchronous composition. Over structures without fairness constraints, the model-checking complexity of ATL is linear in the size of the game structure and length of the formula, and the symbolic model-checking algorithm for CTL extends with few modifications to ATL. Over structures with weak-fairness constraints, ATL model checking requires the solution of 1-pair Rabin games, and can be done in polynomial time. Over structures with strong-fairness constraints, ATL model checking requires the solution of games with Boolean combinations of Büchi conditions, and can be done in PSPACE. In the case of ATL*, the model-checking problem is closely related to the synthesis problem for linear-time formulas, and requires doubly exponential time.
1,449 citations
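As a small illustration of the selective path quantifiers described above (the proposition grant is a made-up example, not from the paper): the first formula says the system component has a strategy to eventually reach a state satisfying grant no matter how the environment moves, while the second, branching-time formula only asserts that some joint execution reaches such a state.

```latex
% ATL: the coalition {sys} can force "eventually grant", whatever env does
\langle\langle \mathit{sys} \rangle\rangle \Diamond\, \mathit{grant}
% compare the branching-time (existential) formula, which quantifies over the
% paths jointly produced by sys and env:
\exists \Diamond\, \mathit{grant}
```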
••
TL;DR: It is shown that the GVA payment scheme does not provide for a truth revealing mechanism, and another scheme is introduced that does guarantee truthfulness for a restricted class of players.
Abstract: Some important classical mechanisms considered in Microeconomics and Game Theory require the solution of a difficult optimization problem. This is true of mechanisms for combinatorial auctions, which have in recent years assumed practical importance, and in particular of the gold standard for combinatorial auctions, the Generalized Vickrey Auction (GVA). Traditional analysis of these mechanisms---in particular, their truth revelation properties---assumes that the optimization problems are solved precisely. In reality, these optimization problems can usually be solved only in an approximate fashion. We investigate the impact on such mechanisms of replacing exact solutions by approximate ones. Specifically, we look at a particular greedy optimization method. We show that the GVA payment scheme does not provide for a truth revealing mechanism. We introduce another scheme that does guarantee truthfulness for a restricted class of players. We demonstrate the latter property by identifying natural properties for combinatorial auctions and showing that, for our restricted class of players, they imply that truthful strategies are dominant. Those properties have applicability beyond the specific auction studied.
598 citations
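The abstract does not spell out the particular greedy method, so the sketch below uses a generic stand-in often seen in this literature: rank single-minded bids by amount per requested item and accept greedily subject to feasibility. It only illustrates what replacing exact winner determination by a greedy approximation means; it is not the paper's exact allocation or payment rule.

```python
# Greedy winner determination for single-minded bidders (illustrative stand-in).
# Each bid is (bidder, bundle_of_items, amount). Rank by amount per item and
# accept a bid whenever its bundle is disjoint from everything accepted so far.

def greedy_allocation(bids):
    ranked = sorted(bids, key=lambda b: b[2] / len(b[1]), reverse=True)
    taken, winners = set(), []
    for bidder, bundle, amount in ranked:
        if taken.isdisjoint(bundle):
            winners.append((bidder, bundle, amount))
            taken |= set(bundle)
    return winners

bids = [
    ("a", {1, 2}, 10.0),
    ("b", {2, 3}, 9.0),
    ("c", {3},    5.0),
]
print(greedy_allocation(bids))
# [('a', {1, 2}, 10.0), ('c', {3}, 5.0)]  -- bid b loses because item 2 is taken
```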
••
TL;DR: The first nontrivial polynomial-time approximation algorithms are provided for a general family of classification problems of this type, formulated as the metric labeling problem, which contains as special cases a number of standard classification frameworks, including several arising from the theory of Markov random fields.
Abstract: In a traditional classification problem, we wish to assign one of k labels (or classes) to each of n objects, in a way that is consistent with some observed data that we have about the problem. An active line of research in this area is concerned with classification when one has information about pairwise relationships among the objects to be classified; this issue is one of the principal motivations for the framework of Markov random fields, and it arises in areas such as image processing, biometry, and document analysis. In its most basic form, this style of analysis seeks to find a classification that optimizes a combinatorial function consisting of assignment costs---based on the individual choice of label we make for each object---and separation costs---based on the pair of choices we make for two "related" objects. We formulate a general classification problem of this type, the metric labeling problem; we show that it contains as special cases a number of standard classification frameworks, including several arising from the theory of Markov random fields. From the perspective of combinatorial optimization, our problem can be viewed as a substantial generalization of the multiway cut problem, and equivalent to a type of uncapacitated quadratic assignment problem. We provide the first nontrivial polynomial-time approximation algorithms for a general family of classification problems of this type. Our main result is an O(log k log log k)-approximation algorithm for the metric labeling problem, with respect to an arbitrary metric on a set of k labels, and an arbitrary weighted graph of relationships on a set of objects. For the special case in which the labels are endowed with the uniform metric---all distances are the same---our methods provide a 2-approximation algorithm.
502 citations
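To make the objective concrete, here is a minimal sketch that evaluates the metric labeling cost of a given assignment: per-object assignment costs plus, for every related pair, the edge weight times the metric distance between the chosen labels. The instance and the names c, w, d are made up for illustration.

```python
# Metric labeling objective: assignment costs + separation costs.
# c[v][l]   : cost of giving label l to object v
# w[(u, v)] : weight of the relationship edge {u, v}
# d[a][b]   : metric distance between labels a and b

def labeling_cost(assignment, c, w, d):
    cost = sum(c[v][assignment[v]] for v in assignment)
    cost += sum(wt * d[assignment[u]][assignment[v]] for (u, v), wt in w.items())
    return cost

c = {"x": [0, 2], "y": [1, 0], "z": [3, 1]}          # two labels: 0 and 1
w = {("x", "y"): 2.0, ("y", "z"): 1.0}
d = [[0, 1], [1, 0]]                                  # uniform metric on {0, 1}

best = min(
    ({"x": a, "y": b, "z": cl} for a in (0, 1) for b in (0, 1) for cl in (0, 1)),
    key=lambda asg: labeling_cost(asg, c, w, d),
)
print(best, labeling_cost(best, c, w, d))
```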
••
TL;DR: It is shown that a new algorithm, Harmonic++, has asymptotic performance ratio at most 1.58889, and that the analysis of Harmonic+1 presented in Richey [1991] is incorrect; the flaw is a fundamental logical one, not a calculation error.
Abstract: A new framework for analyzing online bin packing algorithms is presented. This framework presents a unified way of explaining the performance of algorithms based on the Harmonic approach. Within this framework, it is shown that a new algorithm, Harmonic++, has asymptotic performance ratio at most 1.58889. It is also shown that the analysis of Harmonic+1 presented in Richey [1991] is incorrect; this is a fundamental logical flaw, not an error in calculation or an omitted case. The asymptotic performance ratio of Harmonic+1 is at least 1.59217. Thus, Harmonic++ provides the best upper bound for the online bin packing problem to date.
313 citations
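For readers unfamiliar with the Harmonic approach that the framework generalizes, here is a minimal sketch of the classical Harmonic_k algorithm: items are classified by size into intervals (1/(j+1), 1/j], and each class is packed into its own bins, j items per class-j bin. Harmonic++ refines this classification and packing considerably; the code below is only the baseline idea, with a made-up item sequence.

```python
# Classical Harmonic_k online bin packing (the baseline the Harmonic family refines).
# An item of size s in (0, 1] belongs to class j (1 <= j < k) if 1/(j+1) < s <= 1/j,
# and to class k if s <= 1/k. Class-j bins hold j items of class j; class-k items
# are packed into class-k bins by Next Fit.

def classify(s, k):
    if s <= 1 / k:
        return k
    for j in range(1, k):
        if 1 / (j + 1) < s <= 1 / j:
            return j

def harmonic_k(items, k=6):
    bins, open_bin = [], {}          # open_bin[class] = index of that class's open bin
    for s in items:
        j = classify(s, k)
        b = open_bin.get(j)
        fits = b is not None and (
            len(bins[b]) < j if j < k else sum(bins[b]) + s <= 1
        )
        if fits:
            bins[b].append(s)
        else:
            bins.append([s])
            open_bin[j] = len(bins) - 1
    return bins

print(len(harmonic_k([0.6, 0.3, 0.6, 0.3, 0.15, 0.15, 0.15])))   # 4 bins
```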
••
TL;DR: A multilink multisource model of the TCP Vegas congestion control mechanism is described, which implies that Vegas stabilizes around a weighted proportionally fair allocation of network capacity when there is sufficient buffering in the network.
Abstract: We view congestion control as a distributed primal--dual algorithm carried out by sources and links over a network to solve a global optimization problem. We describe a multilink multisource model of the TCP Vegas congestion control mechanism. The model provides a fundamental understanding of delay, fairness and loss properties of TCP Vegas. It implies that Vegas stabilizes around a weighted proportionally fair allocation of network capacity when there is sufficient buffering in the network. It clarifies the mechanism through which persistent congestion may arise and its consequences, and suggests how we might use REM active queue management to prevent it. We present simulation results that validate our conclusions.
306 citations
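For readers who want the optimization view in symbols, here is a sketch following the standard duality interpretation of Vegas (the exact constants and assumptions are in the paper, not in this abstract): each source s with Vegas parameter α_s and round-trip propagation delay d_s behaves as if it were maximizing a logarithmic utility, with per-link queueing delays playing the role of dual prices.

```latex
% Network utility maximization implicitly solved by TCP Vegas (sketch):
%   x_s = rate of source s,  d_s = propagation delay,  \alpha_s = Vegas parameter,
%   c_l = capacity of link l,  L(s) = set of links used by source s
\max_{x \ge 0} \; \sum_{s} \alpha_s d_s \log x_s
\quad \text{subject to} \quad
\sum_{s:\, l \in L(s)} x_s \le c_l \quad \text{for every link } l .
% The optimum is a weighted proportionally fair allocation; the Lagrange
% multipliers of the capacity constraints correspond to per-link queueing delays.
```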
••
TL;DR: It is shown that the algorithmic complexity of the minimum spanning tree problem is equal to its decision-tree complexity, and a deterministic algorithm is presented that finds a minimum spanning tree of a graph with n vertices and m edges in time O(T*(m,n)), where T* is the minimum number of edge-weight comparisons needed to determine the solution.
Abstract: We establish that the algorithmic complexity of the minimum spanning tree problem is equal to its decision-tree complexity. Specifically, we present a deterministic algorithm to find a minimum spanning tree of a graph with n vertices and m edges that runs in time O(T*(m,n)) where T* is the minimum number of edge-weight comparisons needed to determine the solution. The algorithm is quite simple and can be implemented on a pointer machine. Although our time bound is optimal, the exact function describing it is not known at present. The current best bounds known for T* are T*(m,n) = Ω(m) and T*(m,n) = O(m ∙ α(m,n)), where α is a certain natural inverse of Ackermann's function. Even under the assumption that T* is superlinear, we show that if the input graph is selected from G_{n,m}, our algorithm runs in linear time with high probability, regardless of n, m, or the permutation of edge weights. The analysis uses a new martingale for G_{n,m} similar to the edge-exposure martingale for G_{n,p}.
296 citations
••
TL;DR: Two new algorithms for solving the All Pairs Shortest Paths (APSP) problem for weighted directed graphs using fast matrix multiplication algorithms are presented.
Abstract: We present two new algorithms for solving the All Pairs Shortest Paths (APSP) problem for weighted directed graphs. Both algorithms use fast matrix multiplication algorithms. The first algorithm solves the APSP problem for weighted directed graphs in which the edge weights are integers of small absolute value in O(n^(2+μ)) time, where μ satisfies the equation ω(1, μ, 1) = 1 + 2μ and ω(1, μ, 1) is the exponent of the multiplication of an n × n^μ matrix by an n^μ × n matrix. Currently, the best available bounds on ω(1, μ, 1), obtained by Coppersmith, imply that μ < 0.575, so the running time of the first algorithm is O(n^2.575). The second algorithm solves the APSP problem almost exactly for directed graphs with arbitrary non-negative real edge weights. It runs in Õ((n^ω/ϵ) log(W/ϵ)) time, where ϵ > 0 is an error parameter and W is the largest edge weight in the graph, after the edge weights are scaled so that the smallest non-zero edge weight in the graph is 1. It returns estimates of all the distances in the graph with a stretch of at most 1 + ϵ. Corresponding paths can also be found efficiently.
286 citations
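The starting point for such algorithms is the folklore reduction of APSP to repeated "min-plus" matrix products; fast matrix multiplication then accelerates these products for small integer weights. The sketch below shows only the baseline reduction (cubic per product), with a made-up example graph.

```python
# All-pairs shortest paths by repeated min-plus squaring of the weight matrix.
# D[i][j] starts as the direct edge weight (infinity if absent, 0 on the diagonal);
# squaring it about log2(n) times in the (min, +) semiring yields all distances.

INF = float("inf")

def min_plus(A, B):
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def apsp(D):
    n, reach = len(D), 1
    while reach < n - 1:
        D = min_plus(D, D)
        reach *= 2
    return D

W = [
    [0,   2,   INF, INF],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   1],
    [4,   INF, INF, 0],
]
print(apsp(W))   # e.g. the distance from 0 to 3 is 2 + 3 + 1 = 6
```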
••
TL;DR: It is shown that the acyclic and bounded tree-width fragments have the same expressive power as the well-known guarded fragment and the finite-variable fragments of first-order logic, respectively.
Abstract: A number of efficient methods for evaluating first-order and monadic second-order queries on finite relational structures are based on tree-decompositions of structures or queries. We systematically study these methods. In the first part of the article, we consider arbitrary formulas on tree-like structures. We generalize a theorem of Courcelle [1990] by showing that on structures of bounded tree-width a monadic second-order formula (with free first- and second-order variables) can be evaluated in time linear in the structure size plus the size of the output. In the second part, we study tree-like formulas on arbitrary structures. We generalize the notions of acyclicity and bounded tree-width from conjunctive queries to arbitrary first-order formulas in a straightforward way and analyze the complexity of evaluating formulas of these fragments. Moreover, we show that the acyclic and bounded tree-width fragments have the same expressive power as the well-known guarded fragment and the finite-variable fragments of first-order logic, respectively.
265 citations
••
TL;DR: The technique is applied to show the surprising result that there are languages for which quantum finite automata take exponentially more states than those of corresponding classical automata.
Abstract: We consider the possibility of encoding m classical bits into many fewer n quantum bits (qubits) so that an arbitrary bit from the original m bits can be recovered with good probability. We show that nontrivial quantum codes exist that have no classical counterparts. On the other hand, we show that quantum encoding cannot save more than a logarithmic additive factor over the best classical encoding. The proof is based on an entropy coalescence principle that is obtained by viewing Holevo's theorem from a new perspective. In the existing implementations of quantum computing, qubits are a very expensive resource. Moreover, it is difficult to reinitialize existing bits during the computation. In particular, reinitialization is impossible in NMR quantum computing, which is perhaps the most advanced implementation of quantum computing at the moment. This motivates the study of quantum computation with restricted memory and no reinitialization, that is, of quantum finite automata. It was known that there are languages that are recognized by quantum finite automata with sizes exponentially smaller than those of corresponding classical automata. Here, we apply our technique to show the surprising result that there are languages for which quantum finite automata take exponentially more states than those of corresponding classical automata.
263 citations
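A concrete instance of the "nontrivial quantum codes" mentioned here is the well-known 2-into-1 quantum random access code: two classical bits are stored in a single qubit so that either one of them (but only one) can be recovered with probability cos²(π/8) ≈ 0.85, which no encoding into a single classical bit can match. The numpy check below verifies that success probability; it illustrates the phenomenon and is not the paper's construction.

```python
import numpy as np

# 2 -> 1 quantum random access code: encode bits (b0, b1) as the qubit whose Bloch
# vector is ((-1)**b1, 0, (-1)**b0) / sqrt(2).  Measuring Z recovers b0 and
# measuring X recovers b1, each with probability cos^2(pi/8) ~= 0.8536.

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def encode(b0, b1):
    rz, rx = (-1) ** b0 / np.sqrt(2), (-1) ** b1 / np.sqrt(2)
    return (I + rx * X + rz * Z) / 2          # density matrix of the code state

def success_prob(rho, basis, bit):
    # projector onto the measurement outcome that is decoded as `bit`
    eigvals, eigvecs = np.linalg.eigh(basis)
    v = eigvecs[:, 1] if bit == 0 else eigvecs[:, 0]   # +1 eigenvector decodes 0
    proj = np.outer(v, v.conj())
    return float(np.real(np.trace(rho @ proj)))

for b0 in (0, 1):
    for b1 in (0, 1):
        rho = encode(b0, b1)
        p0 = success_prob(rho, Z, b0)   # receiver who wants bit 0 measures Z
        p1 = success_prob(rho, X, b1)   # receiver who wants bit 1 measures X
        assert abs(p0 - np.cos(np.pi / 8) ** 2) < 1e-12
        assert abs(p1 - np.cos(np.pi / 8) ** 2) < 1e-12
print("each bit recoverable with probability", round(np.cos(np.pi / 8) ** 2, 4))
```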
••
TL;DR: It is found that under heavy-tailed workloads, TAGS can outperform all task assignment policies known to us by several orders of magnitude with respect to both mean response time and mean slowdown, provided the system load is not too high.
Abstract: We consider a distributed server system and ask which policy should be used for assigning tasks to hosts. In our server, tasks are not preemptible. Also, the task's service demand is not known a priori. We are particularly concerned with the case where the workload is heavy-tailed, as is characteristic of many empirically measured computer workloads. We analyze several natural task assignment policies and propose a new one, TAGS (Task Assignment based on Guessing Size). The TAGS algorithm is counterintuitive in many respects, including its deliberate load unbalancing, its non-work-conserving behavior, and its treatment of fairness. We find that under heavy-tailed workloads, TAGS can outperform all task assignment policies known to us by several orders of magnitude with respect to both mean response time and mean slowdown, provided the system load is not too high.
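The core of TAGS is easy to state even though its analysis is not: each host is given a size cutoff, every task starts at host 1, and any task that exceeds the cutoff of its current host is killed and restarted from scratch at the next host, so a task's size is "guessed" by how far up the chain it survives. The sketch below shows only this dispatch rule with made-up cutoffs; queueing, which the paper analyzes, is ignored.

```python
# TAGS dispatch rule (Task Assignment based on Guessing Size), queueing ignored.
# cutoffs[i] is the largest amount of service host i will give a task before
# killing it and restarting it from scratch at host i + 1.

def tags_run(task_size, cutoffs):
    wasted = 0.0
    for host, cutoff in enumerate(cutoffs):
        if task_size <= cutoff or host == len(cutoffs) - 1:
            return host, wasted + task_size     # finishes here
        wasted += cutoff                        # ran up to the cutoff, then killed
    raise AssertionError("unreachable")

cutoffs = [1.0, 10.0, float("inf")]
for size in (0.4, 3.0, 250.0):
    host, total_work = tags_run(size, cutoffs)
    print(f"size {size:>6}: finishes at host {host}, total processing {total_work}")
# Short tasks never queue behind the rare huge ones, which is where the win
# under heavy-tailed size distributions comes from.
```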
••
TL;DR: Two polynomial-time approximation algorithms with approximation ratio 1 + ε for any small ε are presented, settling both the Closest String problem and the Closest Substring problem.
Abstract: The problem of finding a center string that is "close" to every given string arises in computational molecular biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring problem. Given a set of strings S = {s_1, s_2, ..., s_n}, each of length m, the Closest String problem is to find the smallest d and a string s of length m which is within Hamming distance d to each s_i ∈ S. This problem comes from coding theory when we are looking for a code not too far away from a given set of codes. The Closest Substring problem, with an additional input integer L, asks for the smallest d and a string s, of length L, which is within Hamming distance d away from a substring, of length L, of each s_i. This problem is much more elusive than the Closest String problem. The Closest Substring problem is formulated from applications in finding conserved regions, identifying genetic drug targets and generating genetic probes in molecular biology. Whether there are efficient approximation algorithms for both problems is a major open question in this area. We present two polynomial-time approximation algorithms with approximation ratio 1 + ε for any small ε to settle both questions.
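As a small illustration of the objective (and of why a ratio close to 1 is the interesting regime), the sketch below evaluates the Closest String radius of a candidate center and uses the folklore factor-2 baseline of simply returning one of the input strings, which the triangle inequality for Hamming distance justifies. The strings are made up.

```python
# Closest String objective: radius(s) = max Hamming distance from s to the inputs.
# Returning any input string s_1 is a classical 2-approximation, since for every i
# d(s_1, s_i) <= d(s_1, s*) + d(s*, s_i) <= 2 d*, where s* is an optimal center.
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def radius(center, strings):
    return max(hamming(center, s) for s in strings)

strings = ["ACCTG", "AGGTG", "ACGTC"]

baseline = strings[0]                       # trivial 2-approximation
print("baseline radius:", radius(baseline, strings))   # 2

# Brute force over all length-5 strings just to see the true optimum on this toy input.
best = min(("".join(c) for c in product("ACGT", repeat=5)),
           key=lambda s: radius(s, strings))
print("optimal radius: ", radius(best, strings), "at", best)   # 1, e.g. ACGTG
```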
••
TL;DR: This article defines timed regular expressions, a formalism for specifying discrete behaviors augmented with timing information, and proves that their expressive power is equivalent to that of the timed automata of Alur and Dill, a timed analogue of Kleene's theorem.
Abstract: In this article, we define timed regular expressions, a formalism for specifying discrete behaviors augmented with timing information, and prove that their expressive power is equivalent to that of the timed automata of Alur and Dill. This result is the timed analogue of Kleene's theorem and, similarly to that result, the hard part in the proof is the translation from automata to expressions. This result is extended from finite to infinite (in the sense of Büchi) behaviors. In addition to these fundamental results, we give a clean algebraic framework for two commonly accepted formalisms for timed behaviors, time-event sequences and piecewise-constant signals.
••
TL;DR: This work shows how to use an interactive theorem prover, HOL, together with a model checker, SPIN, to prove key properties of distance vector routing protocols, and develops verification techniques suited to routing protocols generally.
Abstract: We show how to use an interactive theorem prover, HOL, together with a model checker, SPIN, to prove key properties of distance vector routing protocols. We do three case studies: correctness of the RIP standard, a sharp real-time bound on RIP stability, and preservation of loop-freedom in AODV, a distance vector protocol for wireless networks. We develop verification techniques suited to routing protocols generally. These case studies show significant benefits from automated support in reduced verification workload and assistance in finding new insights and gaps for standard specifications.
••
TL;DR: This article investigates XML document specifications with DTDs and integrity constraints, such as keys and foreign keys, and establishes complexity bounds on the implication problem, which is shown to be coNP-complete for unary keys and foreign keys.
Abstract: The article investigates XML document specifications with DTDs and integrity constraints, such as keys and foreign keys. We study the consistency problem of checking whether a given specification is meaningful: that is, whether there exists an XML document that both conforms to the DTD and satisfies the constraints. We show that DTDs interact with constraints in a highly intricate way and as a result, the consistency problem in general is undecidable. When it comes to unary keys and foreign keys, the consistency problem is shown to be NP-complete. This is done by coding DTDs and integrity constraints with linear constraints on the integers. We consider the variations of the problem (by both restricting and enlarging the class of constraints), and identify a number of tractable cases, as well as a number of additional NP-complete ones. By incorporating negations of constraints, we establish complexity bounds on the implication problem, which is shown to be coNP-complete for unary keys and foreign keys.
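A tiny hypothetical instance shows how the coding into integer constraints works. Suppose the DTD gives the root the content model (a, a, b), that a.@f is a key for a, and that a.@f is also a foreign key referencing the key b.@k. Counting extensions already yields unsatisfiable linear constraints, so the specification is inconsistent:

```latex
% Hypothetical specification.  DTD:  <!ELEMENT root (a, a, b)>
% Constraints: a.@f is a key for a;  a.@f is a foreign key referencing the key b.@k.
% |ext(\tau)| = number of \tau-elements, |ext(\tau.@l)| = number of distinct @l-values.
|ext(a)| = 2, \qquad |ext(b)| = 1                        % forced by the DTD
|ext(a.@f)| = |ext(a)| = 2                                % key: distinct a's carry distinct @f
|ext(a.@f)| \le |ext(b.@k)| \le |ext(b)| = 1              % inclusion into the key of b
\Rightarrow\; 2 \le 1, \text{ a contradiction: no document satisfies the specification.}
```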
••
TL;DR: It is explained why there has been little progress in developing practical, substantially subcubic general CFG parsers since Valiant showed that Boolean matrix multiplication can be used for parsing context-free grammars.
Abstract: In 1975, Valiant showed that Boolean matrix multiplication can be used for parsing context-free grammars (CFGs), yielding the asymptotically fastest (although not practical) CFG parsing algorithm known. We prove a dual result: any CFG parser with time complexity O(gn^(3-ε)), where g is the size of the grammar and n is the length of the input string, can be efficiently converted into an algorithm to multiply m × m Boolean matrices in time O(m^(3-ε/3)). Given that practical, substantially subcubic Boolean matrix multiplication algorithms have been quite difficult to find, we thus explain why there has been little progress in developing practical, substantially subcubic general CFG parsers. In proving this result, we also develop a formalization of the notion of parsing.
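To fix terminology, here is a minimal sketch of the cubic CYK recognizer for a grammar in Chomsky normal form; Valiant's construction expresses its innermost loops as Boolean matrix products, and the result above says that, conversely, any substantial improvement over cubic parsing would yield faster Boolean matrix multiplication. The toy grammar is made up.

```python
# CYK recognition for a CNF grammar, O(g * n^3): the dynamic program whose inner
# loops Valiant's construction turns into Boolean matrix multiplications.

def cyk(word, binary_rules, unary_rules, start="S"):
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]   # table[i][j]: span word[i:j]
    for i, ch in enumerate(word):
        table[i][i + 1] = {lhs for lhs, t in unary_rules if t == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, (b, c) in binary_rules:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n]

# Toy CNF grammar for the language { a^m b^m : m >= 1 }:
#   S -> A T | A B,  T -> S B,  A -> a,  B -> b
binary = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B"))]
unary = [("A", "a"), ("B", "b")]
print(cyk("aabb", binary, unary))   # True
print(cyk("aab", binary, unary))    # False
```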
••
TL;DR: This work presents new differencing algorithms that operate at a fine granularity (the atomic unit of change), make no assumptions about the format or alignment of input data, and in practice use linear time, use constant space, and give good compression.
Abstract: The subject of this article is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice use linear time, use constant space, and give good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains ⌈log n⌉ bits), as compared to algorithms that run in O(n) time and O(n) space or in O(n^2) time and O(1) space. We introduce two new techniques for differential compression and apply these to give additional algorithms that improve compression and time performance. We experimentally explore the properties of our algorithms by running them on actual versioned data. Finally, we present theoretical results that limit the compression power of differencing algorithms that are restricted to making only a single pass over the data.
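To make "describing one version as a set of changes from its companion" concrete, here is a minimal greedy sketch: fingerprint fixed-length seeds of the reference version in a hash table, then scan the new version, emitting COPY operations for matches (extended greedily) and ADD operations for the rest. It is a simplified illustration of the general approach, not one of the article's specific algorithms.

```python
# Minimal greedy differencing: encode `target` as COPY/ADD operations against `reference`.

SEED = 4   # length of the substrings used as match candidates

def diff(reference, target):
    index = {}
    for i in range(len(reference) - SEED + 1):
        index.setdefault(reference[i:i + SEED], i)      # first occurrence wins
    ops, literal, j = [], "", 0
    while j < len(target):
        i = index.get(target[j:j + SEED])
        if i is None:
            literal += target[j]; j += 1
            continue
        length = 0                                       # extend the match greedily
        while (i + length < len(reference) and j + length < len(target)
               and reference[i + length] == target[j + length]):
            length += 1
        if literal:
            ops.append(("ADD", literal)); literal = ""
        ops.append(("COPY", i, length)); j += length
    if literal:
        ops.append(("ADD", literal))
    return ops

def apply(reference, ops):
    out = []
    for op in ops:
        out.append(op[1] if op[0] == "ADD" else reference[op[1]:op[1] + op[2]])
    return "".join(out)

ref = "the quick brown fox jumps over the lazy dog"
new = "the quick red fox jumps over the very lazy dog"
delta = diff(ref, new)
assert apply(ref, delta) == new
print(delta)
```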
••
TL;DR: An efficient heuristic for solving the Contig Scaffolding Problem, the greedy-path merging algorithm, is described; it was originally developed as a key component of the compartmentalized assembly strategy used at Celera Genomics.
Abstract: Given a collection of contigs and mate-pairs, the Contig Scaffolding Problem is to order and orientate the given contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic called the greedy-path merging algorithm for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy developed at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly based on preliminary whole genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project.
••
TL;DR: It is argued that for many algorithms, and static analysis algorithms in particular, bottom-up logic program presentations are clearer and simpler to analyze, for both correctness and complexity, than classical pseudo-code presentations.
Abstract: This paper argues that for many algorithms, and static analysis algorithms in particular, bottom-up logic program presentations are clearer and simpler to analyze, for both correctness and complexity, than classical pseudo-code presentations. The main technical contribution consists of two theorems which allow, in many cases, the asymptotic running time of a bottom-up logic program to be determined by inspection. It is well known that a datalog program runs in O(n^k) time where k is the largest number of free variables in any single rule. The theorems given here are significantly more refined. A variety of algorithms are presented and analyzed as examples.
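A standard example of the style being advocated is transitive closure written as a bottom-up, datalog-like program. The second rule has three variables, so the generic bound mentioned above gives O(n^3) time; the naive fixpoint evaluation below is only a sketch of what "bottom-up" means, not an optimized implementation.

```python
# Bottom-up evaluation of:  path(x, y) :- edge(x, y).
#                           path(x, z) :- edge(x, y), path(y, z).
# Three variables in the second rule  =>  O(n^3) by the generic datalog bound.

def transitive_closure(edges):
    path = set(edges)
    changed = True
    while changed:                       # iterate the rules to a fixpoint
        changed = False
        for (x, y) in edges:
            for (y2, z) in list(path):
                if y2 == y and (x, z) not in path:
                    path.add((x, z)); changed = True
    return path

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```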
••
[...]
TL;DR: An NP characterization is given for map graphs, which arise from a modified notion of planarity in which two nations of a map are considered adjacent when they share any point of their boundaries (not necessarily an edge, as planarity requires).
Abstract: We consider a modified notion of planarity, in which two nations of a map are considered adjacent when they share any point of their boundaries (not necessarily an edge, as planarity requires). Such adjacencies define a map graph. We give an NP characterization for such graphs, derive some consequences regarding sparsity and coloring, and survey some algorithmic results.
••
TL;DR: A lower-bound theorem for deriving the minimum redundancy is proved and interesting upper and lower bounds and trade-offs between A and r are shown in the case of multidimensional range queries and set queries.
Abstract: We develop a theoretical framework to characterize the hardness of indexing data sets on block-access memory devices like hard disks. We define an indexing workload by a data set and a set of potential queries. For a workload, we can construct an indexing scheme, which is a collection of fixed-sized subsets of the data. We identify two measures of efficiency for an indexing scheme on a workload: storage redundancy, r (how many times each item in the data set is stored), and access overhead, A (how many times more blocks than necessary does a query retrieve). For many interesting families of workloads, there exists a trade-off between storage redundancy and access overhead. Given a desired access overhead A, there is a minimum redundancy that any indexing scheme must exhibit. We prove a lower-bound theorem for deriving the minimum redundancy. By applying this theorem, we show interesting upper and lower bounds and trade-offs between A and r in the case of multidimensional range queries and set queries.
••
TL;DR: It is shown that the assumption of ordinal invariance enforces a qualitative decision procedure that presupposes a comparative possibility representation of uncertainty, originally due to Lewis, and usual in nonmonotonic reasoning.
Abstract: This paper investigates to what extent a purely symbolic approach to decision making under uncertainty is possible, in the scope of artificial intelligence. Contrary to classical approaches to decision theory, we try to rank acts without resorting to any numerical representation of utility or uncertainty, and without using any scale on which both uncertainty and preference could be mapped. Our approach is a variant of Savage's where the setting is finite, and the strict preference on acts is a partial order. It is shown that although many axioms of Savage theory are preserved and despite the intuitive appeal of the ordinal method for constructing a preference over acts, the approach is inconsistent with a probabilistic representation of uncertainty. The latter leads to the kind of paradoxes encountered in the theory of voting. It is shown that the assumption of ordinal invariance enforces a qualitative decision procedure that presupposes a comparative possibility representation of uncertainty, originally due to Lewis, and usual in nonmonotonic reasoning. Our axiomatic investigation thus provides decision-theoretic foundations to the preferential inference of Lehmann and colleagues. However, the obtained decision rules are sometimes either not very decisive or may lead to overconfident decisions, although their basic principles look sound. This paper points out some limitations of purely ordinal approaches to Savage-like decision making under uncertainty, in perfect analogy with similar difficulties in voting theory.
••
TL;DR: It is shown that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube, and these transformations are used to solve NP-hard clustering problems in the cube as well as in geometric settings.
Abstract: The Johnson--Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n) dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given n points in a larger set (e.g., ℝ^d) endowed with a distance function (e.g., L_2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and ℝ^d either with L_1 distance, or with L_2 distance, or with the square of L_2 distance. In all these settings, the best previous results were constant factor approximation guarantees. We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
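The numpy sketch below illustrates the Johnson--Lindenstrauss phenomenon the work starts from: a random Gaussian linear map into a much lower dimension keeps pairwise Euclidean distances within a small factor with high probability. This is the standard real-valued transform; the article's contribution concerns analogous transformations over the Hamming cube and their use in clustering.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 10_000, 500                  # n points in dimension d, projected to k dims

points = rng.normal(size=(n, d))
proj = rng.normal(size=(d, k)) / np.sqrt(k)   # random Gaussian projection
low = points @ proj

def pairwise(P):
    sq = (P ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2 * P @ P.T
    return np.sqrt(np.maximum(d2, 0))

orig, reduced = pairwise(points), pairwise(low)
mask = ~np.eye(n, dtype=bool)
ratios = reduced[mask] / orig[mask]
print(ratios.min(), ratios.max())          # typically within roughly 10% of 1.0 here
```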
••
TL;DR: This work shows that the queries expressible by RAGs are precisely those definable by first-order inductions of linear depth, or, equivalently, those computable in linear time on a parallel machine with polynomially many processors, and that RAGs are more expressive than monadic second-order logic for queries of any arity.
Abstract: Structured document databases can be naturally viewed as derivation trees of a context-free grammar. Under this view, the classical formalism of attribute grammars becomes a formalism for structured document query languages. From this perspective, we study the expressive power of BAGs: Boolean-valued attribute grammars with propositional logic formulas as semantic rules, and RAGs: relation-valued attribute grammars with first-order logic formulas as semantic rules. BAGs can express only unary queries; RAGs can express queries of any arity. We first show that the (unary) queries expressible by BAGs are precisely those definable in monadic second-order logic. We then show that the queries expressible by RAGs are precisely those definable by first-order inductions of linear depth, or, equivalently, those computable in linear time on a parallel machine with polynomially many processors. Further, we show that RAGs that only use synthesized attributes are strictly weaker than RAGs that use both synthesized and inherited attributes. We show that RAGs are more expressive than monadic second-order logic for queries of any arity. Finally, we discuss relational attribute grammars in the context of BAGs and RAGs. We show that in the case of BAGs this does not increase the expressive power, while different semantics for relational RAGs capture the complexity classes NP, coNP and UP ∩ coUP.
••
TL;DR: In this article, the authors present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters.
Abstract: We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter's I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-efficient algorithms in the single-level cache model for fundamental problems like sorting, FFT, and an important subclass of permutations. We also analyze the average-case cache behavior of mergesort, show that ignoring associativity concerns could lead to inferior performance, and present supporting experimental evidence. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and for dealing with the hitherto unresolved problem of limited associativity.
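As a toy illustration of why limited associativity matters (this is a simplified direct-mapped simulator, not the article's model): two access patterns that touch the same number of distinct addresses can incur very different miss counts when addresses collide on the same cache set.

```python
# Toy direct-mapped cache: SETS sets, BLOCK-byte blocks. Count misses for two
# access patterns that touch the same number of distinct addresses.

BLOCK, SETS = 64, 256          # a 16 KB direct-mapped cache

def misses(addresses):
    cache = {}                                 # set index -> tag currently stored
    count = 0
    for a in addresses:
        block = a // BLOCK
        s, tag = block % SETS, block // SETS
        if cache.get(s) != tag:
            count += 1
            cache[s] = tag
    return count

n = 4096
sequential = [8 * i for i in range(n)]                 # unit stride over 8-byte words
colliding = [BLOCK * SETS * i for i in range(n)]       # stride of one full cache size

print(misses(sequential), misses(colliding))
# ~ n / 8 misses for the sequential scan, but n misses when every address maps to
# the same set: the kind of conflict behavior a limited-associativity model captures.
```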
••
TL;DR: It is shown that the degradation reduces and finally disappears in the limit as the intermediately distributed decision scheme tends to a completely distributed one.
Abstract: In completely symmetric systems that have homogeneous nodes (hosts, computers, or processors) with identical arrival processes, an optimal static load balancing scheme does not involve the forwarding of jobs among nodes. Using an appropriate analytic model of a distributed computer system, we examine the following three decision schemes for load balancing: completely distributed, intermediately distributed, and completely centralized. We show that there is no forwarding of jobs in the completely centralized and completely distributed schemes, but that in an intermediately distributed decision scheme, mutual forwarding of jobs among nodes is possible, leading to degradation in system performance for every decision maker. This result appears paradoxical, because by adding communication capacity to the system for the sharing of jobs between nodes, the overall system performance is degraded. We characterize conditions under which such paradoxical behavior occurs, and we give examples in which the degradation of performance may increase without bound. We show that the degradation reduces and finally disappears in the limit as the intermediately distributed decision scheme tends to a completely distributed one.
••
TL;DR: An exponential lower bound on the circuit complexity of deciding the weak monadic second-order theory of one successor (WS1S) is proved and the result is extended to probabilistic circuits.
Abstract: An exponential lower bound on the circuit complexity of deciding the weak monadic second-order theory of one successor (WS1S) is proved. Circuits are built from binary operations, or 2-input gates, which compute arbitrary Boolean functions. In particular, to decide the truth of logical formulas of length at most 610 in this second-order language requires a circuit containing at least 10^125 gates. So even if each gate were the size of a proton, the circuit would not fit in the known universe. This result and its proof, due to both authors, originally appeared in 1974 in the Ph.D. thesis of the first author. In this article, the proof is given, the result is put in historical perspective, and the result is extended to probabilistic circuits.
••
TL;DR: In this paper, a very efficient wait-free construction of bounded concurrent timestamp systems from 1-writer shared registers is proposed; it finalizes, corrects, and extends a preliminary bounded multiwriter construction proposed by the second author in 1986.
Abstract: Shared registers are basic objects used as communication mediums in asynchronous concurrent computation. A concurrent timestamp system is a higher typed communication object, and has been shown to be a powerful tool to solve many concurrency control problems. It has turned out to be possible to construct such higher typed objects from primitive lower typed ones. The next step is to find efficient constructions. We propose a very efficient wait-free construction of bounded concurrent timestamp systems from 1-writer shared registers. This finalizes, corrects, and extends a preliminary bounded multiwriter construction proposed by the second author in 1986. That work partially initiated the current interest in wait-free concurrent objects, and introduced a notion of discrete vector clocks in distributed algorithms.
••
TL;DR: Tight bounds on cache misses are derived for the evaluation of explicit stencil operators on rectangular grids, and it is shown that stencil calculations on grids whose interference lattices have a short vector exhibit abnormally high numbers of cache misses; such grids are called unfavorable.
Abstract: We derive tight bounds on cache misses for evaluation of explicit stencil operators on rectangular grids. Our lower bound is based on the isoperimetric property of the discrete crosspolytope. Our upper bound is based on a good surface-to-volume ratio of a parallelepiped spanned by a reduced basis of the interference lattice of a grid. Measurements show that our algorithm typically reduces the number of cache misses by a factor of three, relative to a compiler-optimized code. We show that stencil calculations on grids whose interference lattices have a short vector feature abnormally high numbers of cache misses. We call such grids unfavorable and suggest avoiding them in computations by appropriate padding. By direct measurements on a MIPS R10000 processor we show a good correlation between abnormally high numbers of cache misses and unfavorable three-dimensional grids.
••
TL;DR: This work studies a correctness property of programs written in a standard shared-memory parallel imperative language: a semantic equivalence, defined by the authors, between the parallel program and its sequential version.
Abstract: We study a correctness property of programs written in a shared-memory parallel language. This property is a semantic equivalence, which we define, between the parallel program and its sequential version. We consider a standard parallel imperative language. Within this language, this correctness property follows from the preservation of data dependences by the control flow and the synchronizations. Our result makes use of the semantics of the sequential version only. Hence, through our result, checking the correctness of a parallel program boils down to verifying properties of a sequential program.