# Showing papers in "Journal of the ACM in 1989"

••

TL;DR: The Information Dispersal Algorithm (IDA) has numerous applications to secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission of information in networks, and to communications between processors in parallel computers.

Abstract: An Information Dispersal Algorithm (IDA) is developed that breaks a file $F$ of length $L = |F|$ into $n$ pieces $F_i$, $1 \le i \le n$, each of length $|F_i| = L/m$, so that every $m$ pieces suffice for reconstructing $F$. Dispersal and reconstruction are computationally efficient. The sum of the lengths $|F_i|$ is $(n/m) \cdot L$. Since $n/m$ can be chosen to be close to 1, the IDA is space efficient. IDA has numerous applications to secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission of information in networks, and to communications between processors in parallel computers. For the latter problem provably time-efficient and highly fault-tolerant routing on the n-cube is achieved, using just constant size buffers.
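The dispersal idea can be illustrated with a minimal sketch. It assumes arithmetic modulo the prime 257 and Vandermonde-style encoding vectors, which are illustrative choices and not necessarily those of the paper; the function names `disperse`, `solve_mod`, and `reconstruct` are hypothetical. Each block of m bytes is treated as the coefficients of a polynomial, and piece i stores the value of that polynomial at the point i + 1, so any m pieces determine the block.

```python
P = 257  # assumed prime field; every byte value 0..255 is a field element


def disperse(data: bytes, n: int, m: int) -> list[list[int]]:
    """Split data into n pieces of ceil(len(data)/m) symbols each."""
    padded = list(data) + [0] * (-len(data) % m)  # zero-pad to a multiple of m
    pieces = [[] for _ in range(n)]
    for start in range(0, len(padded), m):
        block = padded[start:start + m]
        for i in range(n):
            x = i + 1  # distinct evaluation point for piece i
            value = 0
            for coeff in reversed(block):  # Horner evaluation modulo P
                value = (value * x + coeff) % P
            pieces[i].append(value)
    return pieces


def solve_mod(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve a square linear system modulo P by Gauss-Jordan elimination."""
    size = len(rhs)
    rows = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def reconstruct(available: dict[int, list[int]], m: int, length: int) -> bytes:
    """Rebuild the file from any m pieces, given as {piece index: symbols}."""
    indices = sorted(available)[:m]
    matrix = [[pow(i + 1, j, P) for j in range(m)] for i in indices]
    out = []
    for pos in range(len(available[indices[0]])):
        out.extend(solve_mod(matrix, [available[i][pos] for i in indices]))
    return bytes(out[:length])
```

With n = 5 and m = 3, a 21-byte file yields five pieces of 7 symbols each, and any three of them recover the file. Symbols range over 0..256, so a practical encoding would need slightly more than one byte per symbol or a different field.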

2,453 citations

••

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.

Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space En. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
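As a small illustration of the Vapnik-Chervonenkis dimension, the following sketch brute-forces shattering for one assumed concept class, closed intervals on the real line, which is chosen here only as an example. A point set is shattered when every labeling of it is realized by some concept; the helper names are hypothetical.

```python
def interval_labelings(points):
    """All labelings of distinct `points` realizable by closed intervals [a, b]."""
    labelings = {tuple(False for _ in points)}  # the empty interval
    candidates = sorted(points)
    # Endpoints taken from the points themselves realize every achievable labeling.
    for a in candidates:
        for b in candidates:
            if a <= b:
                labelings.add(tuple(a <= x <= b for x in points))
    return labelings


def is_shattered(points):
    """True if intervals realize all 2^k labelings of the k given points."""
    return len(interval_labelings(points)) == 2 ** len(points)
```

Two points are shattered, but no three points are, because the labeling that includes the outer two and excludes the middle one is not an interval. The VC dimension of this class is therefore 2, which is finite.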

1,835 citations

••

TL;DR: It is proved that any routing scheme for general n-vertex networks that achieves a stretch factor k ≥ 1 must use a total of $\Omega(n^{1+1/(2k+4)})$ bits of routing information in the network, exhibiting a trade-off between the efficiency of a routing scheme and its space requirements.

Abstract: Two conflicting goals play a crucial role in the design of routing schemes for communication networks. A routing scheme should use paths that are as short as possible for routing messages in the network, while keeping the routing information stored in the processors' local memory as succinct as possible. The efficiency of a routing scheme is measured in terms of its stretch factor, the maximum ratio between the length of a route computed by the scheme and that of a shortest path connecting the same pair of vertices. Most previous work has concentrated on finding good routing schemes (with a small fixed stretch factor) for special classes of network topologies. In this paper the problem for general networks is studied, and the entire range of possible stretch factors is examined. The results exhibit a trade-off between the efficiency of a routing scheme and its space requirements. Almost tight upper and lower bounds for this trade-off are presented. Specifically, it is proved that any routing scheme for general n-vertex networks that achieves a stretch factor k ≥ 1 must use a total of $\Omega(n^{1+1/(2k+4)})$ bits of routing information in the network. This lower bound is complemented by a family K(k) of hierarchical routing schemes (for every k ≥ 1) for unit-cost general networks, which guarantee a stretch factor of $O(k)$, require storing a total of $O(k^3 n^{1+1/k} \log n)$ bits of routing information in the network, name the vertices with $O(\log^2 n)$-bit names, and use $O(\log n)$-bit headers.

380 citations

••

TL;DR: It is shown that a judicious choice of cycles for canceling leads to a polynomial bound on the number of iterations of the classical cycle-canceling algorithm for minimum-cost circulation, and a variant using dynamic trees achieves a time bound comparable to those of the fastest previously known algorithms.

Abstract: A classical algorithm for finding a minimum-cost circulation consists of repeatedly finding a residual cycle of negative cost and canceling it by pushing enough flow around the cycle to saturate an arc. We show that a judicious choice of cycles for canceling leads to a polynomial bound on the number of iterations in this algorithm. This gives a very simple strongly polynomial algorithm that uses no scaling. A variant of the algorithm that uses dynamic trees runs in O(nm(log n)min{log(nC), m log n}) time on a network of n vertices, m arcs, and arc costs of maximum absolute value C. This bound is comparable to those of the fastest previously known algorithms.

283 citations

••

TL;DR: A serializability theory is developed that can be used to prove the correctness of concurrency control algorithms for nested transactions and for multilevel database systems, and axioms are presented that express the basic properties that programs that manage or access data need to satisfy.

Abstract: Today's standard model for database concurrency control, called serializability theory, represents executions of transactions as partial orders of operations. The theory tells when an execution is serializable, that is, when the set of operations of a transaction execute atomically with respect to those of other transactions. It has been used successfully to prove correctness of most database concurrency control algorithms. Its most serious limitation is its inability to represent nested computations conveniently. This paper presents a more general model that permits nested transactions. In this model, transactions may execute subtransactions, giving rise to tree-structured computations. A serializability theory is developed for this model, which can be used to prove the correctness of concurrency control algorithms for nested transactions and for multilevel database systems. The theory is based on an abstract model of computation that allows arbitrary operations, and parallel and even nondeterministic programs. Axioms are presented that express the basic properties that programs that manage or access data need to satisfy. We use these axioms to derive proof techniques. One new technique, substitution, shows the equivalence of two executions by substituting one subcomputation by another, usually shallower (i.e., less nested), one. Our proof techniques are illustrated by applying them to several well-known concurrency control problems.

191 citations

••

TL;DR: To construct a short tour through points in the plane, the points are sequenced as they appear along a spacefilling curve; the heuristic consists essentially of sorting, so it is easily coded and requires only O(N) memory and O(N log N) operations.

Abstract: To construct a short tour through points in the plane, the points are sequenced as they appear along a spacefilling curve. This heuristic consists essentially of sorting, so it is easily coded and requires only O(N) memory and O(N log N) operations. Its performance is competitive with that of other fast methods.
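A minimal sketch of the heuristic follows. It assumes the points lie in the unit square and uses a Hilbert curve, which is an illustrative choice and may differ from the curve used in the paper; the names `hilbert_index` and `spacefilling_tour` are hypothetical. Each point is mapped to its position along the curve, and the tour visits the points in sorted order of that position.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    side = 1 << order
    d = 0
    s = side >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or reflect the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = side - 1 - x
                y = side - 1 - y
            x, y = y, x
        s >>= 1
    return d


def spacefilling_tour(points, order: int = 16):
    """Order points in the unit square by their position along the curve."""
    side = 1 << order

    def key(p):
        # Quantize each coordinate to a grid cell, clamping 1.0 into the last cell.
        x = min(int(p[0] * side), side - 1)
        y = min(int(p[1] * side), side - 1)
        return hilbert_index(order, x, y)

    return sorted(points, key=key)
```

The only nontrivial cost is the sort, which matches the O(N log N) operation count stated in the abstract; the tour is closed by returning from the last point to the first.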

173 citations

••

TL;DR: The class of acyclic fork-join queuing networks that arise in various applications, including parallel processing and flexible manufacturing, is studied; stability conditions are obtained and upper and lower bounds on the network response times are developed.

Abstract: In this paper the class of acyclic fork-join queuing networks that arise in various applications, including parallel processing and flexible manufacturing, is studied. In such queuing networks, a fork describes the simultaneous creation of several new customers, which are sent to different queues. The corresponding join occurs when the services of all these new customers are completed. The evolution equations that govern the behavior of such networks are derived. From this, the stability conditions are obtained and upper and lower bounds on the network response times are developed. These bounds are based on various stochastic ordering principles and on the notion of association of random variables.

158 citations

••

TL;DR: Efficient algorithms are given for inferring sequences produced by certain pseudo-random number generators and specific examples of generators having this form are shown to be cryptographically insecure.

Abstract: In this paper, efficient algorithms are given for inferring sequences produced by certain pseudo-random number generators. The generators considered are all of the form $X_n = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{n-1}) \pmod{m}$. In each case, we assume that the functions $\phi_j$ are known and polynomial time computable, but that the coefficients $\alpha_j$ and the modulus $m$ are unknown. Using this general method, specific examples of generators having this form, the linear congruential method, linear congruences with n terms in the recurrence, and quadratic congruences are shown to be cryptographically insecure.
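To make the insecurity of the linear congruential case concrete, here is a minimal sketch of a well-known gcd-based attack on a generator $X_{n+1} = aX_n + c \pmod{m}$ with all three parameters unknown. This is a swapped-in textbook technique, not the inference algorithm of the paper, and the function name `recover_lcg` is hypothetical. With differences $t_i = X_{i+1} - X_i$, every value $t_{i+2} t_i - t_{i+1}^2$ is a multiple of $m$, so the gcd of several such values is a multiple of $m$ and usually $m$ itself.

```python
from math import gcd


def recover_lcg(outputs):
    """Guess (modulus, multiplier, increment) from consecutive LCG outputs.

    Assumes enough outputs are given for the gcd to collapse to the true
    modulus, and that the first difference is invertible modulo it;
    otherwise pow() raises ValueError.
    """
    diffs = [b - a for a, b in zip(outputs, outputs[1:])]
    modulus = 0
    for t0, t1, t2 in zip(diffs, diffs[1:], diffs[2:]):
        # t2 * t0 - t1^2 is congruent to 0 modulo the hidden modulus.
        modulus = gcd(modulus, abs(t2 * t0 - t1 * t1))
    multiplier = diffs[1] * pow(diffs[0], -1, modulus) % modulus
    increment = (outputs[1] - multiplier * outputs[0]) % modulus
    return modulus, multiplier, increment
```

For the toy generator with m = 31, a = 3, c = 5 and seed 7, the outputs 7, 26, 21, 6, 23, 12, 10, 4 are enough to recover all three parameters, after which every later output can be predicted.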

142 citations

••

TL;DR: Optimal lower bounds on the time for CRCW PRAMs with polynomially bounded numbers of processors or memory cells to compute parity and a number of related problems are proven, and it is shown that almost all Boolean functions of n bits require log n - log log n + O(1) time when the number of processors is at most polynomial in n.

Abstract: Optimal $\Omega(\log n/\log\log n)$ lower bounds on the time for CRCW PRAMs with polynomially bounded numbers of processors or memory cells to compute parity and a number of related problems are proven. A strict time hierarchy of explicit Boolean functions of n bits on such machines that holds up to O(log n/log log n) time is also exhibited. That is, for every time bound T within this range a function is exhibited that can be easily computed using polynomial resources in time T but requires more than polynomial resources to be computed in time T - 1. Finally, it is shown that almost all Boolean functions of n bits require log n - log log n + O(1) time when the number of processors is at most polynomial in n. The bounds do not place restrictions on the uniformity of the algorithms nor on the instruction sets of the machines.

138 citations

••

TL;DR: The randomization method is used to calculate various measures over a finite observation period related to availability modeling of repairable computer systems, and is extended to calculate performability distributions.

Abstract: Repairable computer systems are considered, the availability behavior of which can be modeled as a homogeneous Markov process. The randomization method is used to calculate various measures over a finite observation period related to availability modeling of these systems. These measures include the distribution of the number of events of a certain type, the distribution of the length of time in a set of states, and the probability of a near-coincident fault. The method is then extended to calculate performability distributions. The method relies on coloring subintervals of the finite observation period based on the particular application, and then calculating the measure of interest using these colored intervals.
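The basic randomization step (also known as uniformization) can be sketched for the simplest measure, the transient state distribution of a homogeneous Markov process. The interval-coloring extensions described in the abstract are not reproduced here, and the function name and the two-state model below are illustrative assumptions. The process with generator $Q$ is replaced by a discrete-time chain $P = I + Q/\Lambda$ observed at the events of a Poisson process of rate $\Lambda$, so the distribution at time $t$ is a Poisson-weighted sum of powers of $P$.

```python
import math


def transient_distribution(Q, initial, t, terms=200):
    """State probabilities at time t for generator matrix Q via randomization."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))  # randomization rate Lambda
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    vec = list(initial)                 # initial distribution times P^k, starting at k = 0
    weight = math.exp(-rate * t)        # Poisson probability of k = 0 events
    result = [weight * v for v in vec]
    for k in range(1, terms):
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k          # Poisson probability of k events
        result = [r + weight * v for r, v in zip(result, vec)]
    return result
```

For a single repairable component with failure rate 0.1 and repair rate 1.0 (state 0 up, state 1 down), `transient_distribution([[-0.1, 0.1], [1.0, -1.0]], [1.0, 0.0], 2.0)[0]` agrees with the closed-form availability $\mu/(\lambda+\mu) + \lambda/(\lambda+\mu)\,e^{-(\lambda+\mu)t}$. The fixed truncation at 200 terms is a simplification; a careful implementation would choose the number of terms from an error bound.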

130 citations

••

TL;DR: The time needed to find the modular decomposition of a graph is reduced from $O(n^3)$ to $O(n^2)$ by inserting each vertex successively into the decomposition tree, using $O(n)$ time to insert each vertex.

Abstract: Modular decomposition is a form of graph decomposition that has been discovered independently by researchers in graph theory, game theory, network theory, and other areas. This paper reduces the time needed to find the modular decomposition of a graph from $O(n^3)$ to $O(n^2)$. Together with a new algorithm for transitive orientation given in [21], this leads to fast new algorithms for a number of problems in graph recognition and isomorphism, including recognition of comparability graphs and permutation graphs. The new algorithm works by inserting each vertex successively into the decomposition tree, using $O(n)$ time to insert each vertex.

••

TL;DR: The periodic balanced sorting network, which consists of log n identical blocks, is introduced; each block, called a balanced merging block, merges elements on the even input lines with those on the odd input lines.

Abstract: A periodic sorting network consists of a sequence of identical blocks. In this paper, the periodic balanced sorting network, which consists of $\log n$ blocks, is introduced. Each block, called a balanced merging block, merges elements on the even input lines with those on the odd input lines. The periodic balanced sorting network sorts $n$ items in $O((\log n)^2)$ time using $(n/2)(\log n)^2$ comparators. Although these bounds are comparable to many existing sorting networks, the periodic structure enables a hardware implementation consisting of only one block with the output of the block recycled back as input until the output is sorted. An implementation of our network on the shuffle exchange interconnection model in which the directions of the comparators are all identical and fixed is also presented.
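The periodic structure can be sketched in software as follows. The sketch assumes the usual description of the balanced merging block, in which phase i splits the lines into groups of size n / 2^(i-1) and compares each line with its mirror image within the group; the function names are hypothetical, and n must be a power of two.

```python
def balanced_merging_block(values):
    """One block: log n phases of mirror-image compare-exchanges."""
    n = len(values)
    out = list(values)
    group = n
    while group >= 2:
        for start in range(0, n, group):
            for j in range(group // 2):
                lo, hi = start + j, start + group - 1 - j
                if out[lo] > out[hi]:  # comparator puts the smaller value on the lower line
                    out[lo], out[hi] = out[hi], out[lo]
        group //= 2
    return out


def periodic_balanced_sort(values):
    """Sort by feeding the output of the same block back in log n times."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    for _ in range(max(n.bit_length() - 1, 0)):
        out = balanced_merging_block(out)
    return out
```

Each block uses (n/2) log n comparators, so log n passes account for the (n/2)(log n)^2 comparator count in the abstract. Recycling the output of one block mirrors the single-block hardware implementation the abstract describes.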

••

TL;DR: The main result of this paper is an $O(|V| \times |E|)$ time algorithm for deciding whether a given graph is a circle graph, that is, the intersection graph of a set of chords on a circle.

Abstract: The main result of this paper is an $O(|V| \times |E|)$ time algorithm for deciding whether a given graph is a circle graph, that is, the intersection graph of a set of chords on a circle. The algorithm utilizes two new graph-theoretic results, regarding necessary induced subgraphs of graphs having neither articulation points nor similar pairs of vertices. Furthermore, as a substep of the algorithm, it is shown how to find in $O(|V| \times |E|)$ time a decomposition of a graph into prime graphs, thereby improving on a result of Cunningham.

••

TL;DR: It is shown that any class of functions that can be inferred from examples with probability exceeding 1/2 can be inferred deterministically, and that for probabilities p there is a discrete hierarchy of inferability parameterized by p.

Abstract: Inductive inference machines construct programs for total recursive functions given only example values of the functions. Probabilistic inductive inference machines are defined, and for various criteria of successful inference, it is asked whether a probabilistic inductive inference machine can infer larger classes of functions if the inference criterion is relaxed to allow inference with probability at least p, (0

••

TL;DR: The efficient solutions to the component merging problem and the new observation about F-heaps lead to an $O(n(t(m, n) + n \log n))$ algorithm for finding a maximum weighted matching in general graphs; packets also improve the F-heap minimum spanning tree algorithm, giving the fastest algorithm currently known for that problem.

Abstract: The (component) merging problem is a new graph problem. Versions of this problem appear as bottlenecks in various graph algorithms. A new data structure solves this problem efficiently, and two special cases of the problem have even more efficient solutions based on other data structures. The performance of the data structures is sped up by introducing a new algorithmic tool called packets. The algorithms that use these solutions to the component merging problem also exploit new properties of two existing data structures. Specifically, B-trees can be used simultaneously as a priority queue and a concatenable queue. Similarly, F-heaps support some kinds of split operations with no loss of efficiency. An immediate application of the solution to the simplest version of the merging problem is an $O(t(m, n))$ algorithm for finding minimum spanning trees in undirected graphs without using F-heaps, where $t(m, n) = m \log_2 \log_2 \log_d n$, the graph has $n$ vertices and $m$ edges, and $d = \max(m/n, 2)$. Packets also improve the F-heap minimum spanning tree algorithm, giving the fastest algorithm currently known for this problem. The efficient solutions to the merging problem and the new observation about F-heaps lead to an $O(n(t(m, n) + n \log n))$ algorithm for finding a maximum weighted matching in general graphs. This settles an open problem posed by Tarjan [15, p. 123], where the weaker bound of $O(nm \log(n^2/m))$ was conjectured.

••

TL;DR: A new equivalence between concurrent processes is proposed, which generalizes the well-known bisimulation equivalence to take into account the distributed nature of processes and is a noninterleaving semantic theory.

Abstract: A new equivalence between concurrent processes is proposed. It generalizes the well-known bisimulation equivalence to take into account the distributed nature of processes. The result is a noninterleaving semantic theory; concurrent processes are differentiated from processes that are non-deterministic but sequential. The new equivalence, together with its observational version, is investigated for a subset of the language CCS, and various algebraic characterizations are obtained.

••

TL;DR: A method is described that takes a constraint C and a class of updates, and either proves that an update in the class cannot violate C, or produces a formula that is satisfied before the update if and only if C would continue to be satisfied were the update to occur.

Abstract: If a relational database is required to satisfy a set of integrity constraints, then when the database is updated, one must ensure that it continues to satisfy the constraints. It is desirable not to have to evaluate each constraint after each update. A method is described that takes a constraint C and a class of updates, and either proves that an update in the class cannot violate C, or produces a formula C' (a complete test) that is satisfied before the update if and only if C would continue to be satisfied were the update to occur. C' is frequently much easier to evaluate than C. In addition, a formula D (a sufficient test) is sometimes produced such that if D is satisfied before the update, then C would continue to be satisfied were the update to occur. The method is proved correct. The method is substantially more general than other reported techniques for this problem. The method has been implemented, and a number of experiments with the implementation are presented.

••

TL;DR: Using simple protocols based on distributively flipping a coin, it is shown how to achieve consensus in constant expected time within a variety of fail-stop and omission failure models.

Abstract: Using simple protocols, it is shown how to achieve consensus in constant expected time, within a variety of fail-stop and omission failure models. Significantly, the strongest models considered are completely asynchronous. All of the results are based on distributively flipping a coin, which is usable by a significant majority of the processors. Finally, a nearly matching lower bound is also given for randomized protocols for consensus.

••

TL;DR: A hierarchical graph model that permits taking advantage of the hierarchy is presented and algorithms are given that test planarity of a hierarchically described graph in linear time in the length of the hierarchical description.

Abstract: Using hierarchical definitions, one can describe very large graphs in small space. The blow-up from the length of the hierarchical description to the size of the graph can be as large as exponential. If the efficiency of graph algorithms is measured in terms of the length of the hierarchical description rather than in terms of the graph size, algorithms that do not exploit the hierarchy become hopelessly inefficient. Whether the hierarchy can be exploited to speed up the solution of graph problems depends on the hierarchical graph model. In the literature, hierarchical graph models have been described that allow almost no exploitation of the hierarchy [16]. In this paper, a hierarchical graph model that permits taking advantage of the hierarchy is presented. For this model algorithms are given that test planarity of a hierarchically described graph in linear time in the length of the hierarchical description.

••

Brown University

TL;DR: A set of techniques is described for organizing temporal information by exploiting the local and global structure inherent in a wide class of temporal reasoning problems; the techniques have been used to support a variety of powerful inference mechanisms.

Abstract: Many real-world applications involve the management of large amounts of time-dependent information. Temporal database systems maintain this information in order to support various sorts of inference (e.g., answering questions involving propositions that are true over some intervals and false over others). For any given proposition, there are typically many different occasions on which that proposition becomes true and persists for some length of time. In this paper, these occasions are referred to as time tokens. Many routine database operations must search through the database for time tokens satisfying certain temporal constraints. To expedite these operations, this paper describes a set of techniques for organizing temporal information by exploiting the local and global structure inherent in a wide class of temporal reasoning problems. The global structure of time is exemplified in conventions for partitioning time according to the calendar and the clock. This global structure is used to partition the set of time tokens to facilitate retrieval. The local structure of time is exemplified in the causal relationships between events and the dependencies between planned activities. This local structure is used as part of a strategy for reducing the computation required during constraint propagation. The organizational techniques described in this paper are quite general, and have been used to support a variety of powerful inference mechanisms. Integrating these techniques into an existing temporal database system has increased, by an order of magnitude or more in most applications, the number of time tokens that can be efficiently handled.

••

TL;DR: P-uniform NC (PUNC) is characterized in terms of space-bounded AuxPDAs and alternating Turing Machines with bounded access to the input and the notions of general-purpose and special-purpose computation are considered.

Abstract: Much complexity-theoretic work on parallelism has focused on the class NC, which is defined in terms of logspace-uniform circuits. Yet P-uniform circuit complexity is in some ways a more natural setting for studying feasible parallelism. In this paper, P-uniform NC (PUNC) is characterized in terms of space-bounded AuxPDAs and alternating Turing Machines with bounded access to the input. The notions of general-purpose and special-purpose computation are considered, and a general-purpose parallel computer for PUNC is presented. It is also shown that NC = PUNC if all tally languages in P are in NC; this implies that the NC = PUNC question and the NC = P question are both instances of the $\mathrm{ASPACE}(S(n)) = \mathrm{ASPACE,TIME}(S(n), S(n)^{O(1)})$ question. As a corollary, it follows that NC = PUNC implies $\mathrm{PSPACE} = \mathrm{DTIME}(2^{n^{O(1)}})$.

••

TL;DR: It is shown that n + k - O(1) comparisons are necessary, on average, to find the kth smallest of n numbers (k ≤ n/2), and this lower bound matches the behavior of the technique of Floyd and Rivest to within a lower-order term.

Abstract: It is shown that n + k - O(1) comparisons are necessary, on average, to find the kth smallest of n numbers (k ≤ n/2). This lower bound matches the behavior of the technique of Floyd and Rivest to within a lower-order term. 7n/4 ± o(n) comparisons, on average, are shown to be necessary and sufficient to find the maximum and median of a set. An upper bound of 9n/4 ± o(n) and a lower bound of 2n - o(n) are shown for the max-min-median problem.

••

TL;DR: Almost-optimum algorithms for the lopsided case of unbounded searching are obtained and some extensions to nonconstant costs are briefly sketched.

Abstract: Binary search trees with costs $a$ and $b$, respectively, on the left and right edges (lopsided search trees) are considered. The exact shape, minimum worst-case cost, and minimum average cost of lopsided trees of $n$ internal nodes are determined for nonnegative $a$ and $b$; the costs are both roughly $\log_p(n + 1)$ where $p$ is the unique real number in the interval $(1, 2]$ satisfying $1/p^a + 1/p^b = 1$. Search procedures are given that come within a small additive constant of the lower bounds. Almost-optimum algorithms for the lopsided case of unbounded searching are also obtained. Some extensions to nonconstant costs are briefly sketched.
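The constant $p$ has no closed form in general, but it is easy to compute numerically. The sketch below uses bisection on the interval $(1, 2]$, assuming positive costs for which the root lies in that interval; the function name `lopsided_base` is hypothetical. Because $1/p^a + 1/p^b$ decreases as $p$ grows, bisection converges to the unique root.

```python
def lopsided_base(a: float, b: float, iterations: int = 100) -> float:
    """Solve 1/p**a + 1/p**b = 1 for p in (1, 2] by bisection."""
    def f(p):
        return p ** -a + p ** -b

    lo, hi = 1.0, 2.0
    if f(hi) > 1.0:
        raise ValueError("root is not in (1, 2] for these costs")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 1.0:
            lo = mid  # sum still too large, so the root is to the right
        else:
            hi = mid
    return (lo + hi) / 2
```

With equal unit costs, a = b = 1, the equation gives p = 2 and the cost estimate reduces to the familiar $\log_2(n + 1)$. With a = 1 and b = 2 the equation becomes $p^2 = p + 1$, so $p$ is the golden ratio, about 1.618.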

••

TL;DR: A new probabilistic failure model for networks of gates is formulated and supports the proofs of both the positive and negative results appearing in the literature.

Abstract: A new probabilistic failure model for networks of gates is formulated. Although this model has not been used previously, it supports the proofs of both the positive and negative results appearing in the literature. Furthermore, with respect to this new model, the complexity measures of both size and depth are affected by at most constant multiplicative factors when the set of functions that can be computed by gates is changed from one finite and complete basis to another, or when the bound on the failure probability of the gates is changed (within the limits allowed by the basis), or when the bound on the error probability of the network is changed (within the limits allowed by the basis and the failure probability of the gates).

••

TL;DR: It is shown that testing for minimality is, in general, undecidable, and an efficient algorithm for a useful class of recursive rules is presented, and it is used to transform a recursive definition to a minimal recursive definition.

Abstract: Recursive inference rules arise in recursive definitions in logic programming systems and in database systems with recursive query languages. Let D be a recursive definition of a relation t. D is considered minimal if for any predicate p in a recursive rule in D, p must appear in a recursive rule in any definition of t. It is shown that testing for minimality is, in general, undecidable. However, an efficient algorithm for a useful class of recursive rules is presented, and it is used to transform a recursive definition to a minimal recursive definition. Evaluating the minimized definition avoids redundant computation without the overhead of caching intermediate results and run-time checking for duplicate goals.

••

TL;DR: Although it is shown that transaction-based specification and constraint-based specification are incomparable, constraints of practical interest that have corresponding transactional schemas are identified and the preservation of constraints by transactions is studied.

Abstract: An operational approach to database specification is proposed and investigated. Valid database states are described as the states resulting from the application of admissible transactions, specified by a transactional schema. The approach is similar in spirit to the modeling of behavior by methods and encapsulation in object-oriented systems. The transactions considered are line programs consisting of insertions, deletions, and modifications, using simple selection conditions. The results concern basic properties of transactional schemas, as well as the connection with traditional constraint schemas. In particular, the expressive power of transactional schemas is characterized. Although it is shown that transaction-based specification and constraint-based specification are incomparable, constraints of practical interest that have corresponding transactional schemas are identified. The preservation of constraints by transactions is also studied.

••

TL;DR: A new model for dynamic programming and branch and bound algorithms is presented that views these algorithms as utilizing computationally feasible dominance relations to infer the orderings of application objects, thereby implicitly enumerating a finite solution space.

Abstract: A new model for dynamic programming and branch and bound algorithms is presented. The model views these algorithms as utilizing computationally feasible dominance relations to infer the orderings of application objects, thereby implicitly enumerating a finite solution space. The formalism is broad enough to apply the computational strategies of dynamic programming and branch and bound to problems with nonassociative objects, and can model both oblivious and nonoblivious algorithms, as well as parallel algorithms. The model is used to classify computations based, in part, on the types of computationally feasible dominances that they employ. It is demonstrated that the model is computationally precise enough to support the derivation of lower bounds on the number of operations required to solve various types of problems.

••

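TL;DR: For the number $M_q(n)$ of multiplications required by bilinear algorithms to multiply two polynomials of degree n over a q-element field, it is shown that if $q/2 < n \le q + 1$, the tight bound $M_q(n) = 3n + 1 - \lfloor q/2 \rfloor$ holds.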

Abstract: Let $M_q(n)$ denote the number of multiplications required to compute the coefficients of the product of two polynomials of degree n over a q-element field by means of bilinear algorithms. It is shown that $M_q(n) \ge 3n - o(n)$. In particular, if q/2

••

TL;DR: A new computational algorithm called distribution analysis by chain (DAC) is developed that computes joint queue-length distributions for product-form queuing networks with single-server fixed rate, infinite server, and queue-dependent service centers.

Abstract: A new computational algorithm called distribution analysis by chain (DAC) is developed. This algorithm computes joint queue-length distributions for product-form queuing networks with single-server fixed rate, infinite server, and queue-dependent service centers. Joint distributions are essential in problems such as the calculation of availability measures using queuing network models. The algorithm is efficient since the cost to evaluate joint queue-length probabilities is of the same order as the number of these probabilities. This contrasts with the cost of evaluating these probabilities using previous algorithms. The DAC algorithm also computes mean queue lengths and throughputs more efficiently than the recently proposed RECAL and MVAC algorithms. Furthermore, the algorithm is numerically stable and its recursion is surprisingly simple.

••

TL;DR: An algebra is proposed that does allow us to simplify relations by disregarding the internal structure of a certain class of information, based on a careful manipulation of attribute names.

Abstract: The algebras and query languages for nested relations defined thus far do not allow us to “flatten” a relation scheme by disregarding the internal representation of data. In real life, however, the degree to which the structure of certain information, such as addresses, phone numbers, etc., is taken into account depends on the particular application and may even vary in time. Therefore, an algebra is proposed that does allow us to simplify relations by disregarding the internal structure of a certain class of information. This algebra is based on a careful manipulation of attribute names. Furthermore, the key operator in this algebra, called “copying,” allows us to deal with various other common queries in a very uniform manner, provided these queries are interpreted as operations on classes of semantically equivalent relations rather than individual relations. Finally, it is shown that the proposed algebra is complete in the sense of Bancilhon and Paredaens.