
Showing papers in "Journal of the ACM in 2003"


Journal ArticleDOI
TL;DR: An automatic iterative abstraction-refinement methodology that extends symbolic model checking to large hardware designs, with new symbolic techniques that analyze spurious counterexamples and refine the abstract model accordingly.
Abstract: The state explosion problem remains a major hurdle in applying symbolic model checking to large hardware designs. State space abstraction, having been essential for verifying designs of industrial complexity, is typically a manual process, requiring considerable creativity and insight. In this article, we present an automatic iterative abstraction-refinement methodology that extends symbolic model checking. In our method, the initial abstract model is generated by an automatic analysis of the control structures in the program to be verified. Abstract models may admit erroneous (or "spurious") counterexamples. We devise new symbolic techniques that analyze such counterexamples and refine the abstract model correspondingly. We describe aSMV, a prototype implementation of our methodology in NuSMV. Practical experiments including a large Fujitsu IP core design with about 500 latches and 10000 lines of SMV code confirm the effectiveness of our approach.

1,040 citations
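The abstraction-refinement loop described above can be illustrated on a toy explicit-state system. This is a minimal sketch under stated assumptions, not the paper's symbolic algorithm: states are 3-bit integers, the abstraction simply masks bits away, a spurious abstract counterexample is detected by layered simulation, and refinement naively tracks one more bit.

```python
# Toy CEGAR loop: abstract, model check, check the counterexample
# concretely, refine, repeat. The transition relation, bit-mask
# abstraction, and refinement rule are all illustrative assumptions.

INIT, BAD, NBITS = {0}, {7}, 3

def successors(s):
    # Hypothetical concrete transition relation on states 0..7.
    return {(s + 1) % 8, (s * 2) % 8}

def blocks_of(mask):
    # Partition concrete states into abstract blocks s & mask.
    blocks = {}
    for s in range(2 ** NBITS):
        blocks.setdefault(s & mask, set()).add(s)
    return blocks

def abstract_path_to_bad(mask):
    # BFS in the abstract graph; returns a block path from an initial
    # block to a block containing a bad state, or None if none exists.
    blocks = blocks_of(mask)
    start = {s & mask for s in INIT}
    frontier = [[a] for a in start]
    seen = set(start)
    while frontier:
        path = frontier.pop(0)
        if blocks[path[-1]] & BAD:
            return path
        for s in blocks[path[-1]]:
            for t in successors(s):
                if (t & mask) not in seen:
                    seen.add(t & mask)
                    frontier.append(path + [t & mask])
    return None

def is_concrete(path, mask):
    # Layered simulation: does a real execution follow this block path?
    layer = {s for s in INIT if s & mask == path[0]}
    for a in path[1:]:
        layer = {t for s in layer for t in successors(s) if t & mask == a}
    return bool(layer & BAD)

def cegar():
    mask = 0  # initially every state bit is abstracted away
    while True:
        path = abstract_path_to_bad(mask)
        if path is None:
            return "safe"
        if is_concrete(path, mask):
            return "bad state reachable"
        # Spurious counterexample: refine by tracking one more bit.
        mask |= 1 << min(i for i in range(NBITS) if not mask & (1 << i))
```

On this example the loop refines twice before the abstraction is exact enough to confirm that the bad state really is reachable.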


Journal ArticleDOI
TL;DR: The algorithm runs in polynomial time for the case of parity functions that depend on only the first O(log n log log n) bits of input, which provides the first known instance of an efficient noise-tolerant algorithm for a concept class that is not learnable in the Statistical Query model of Kearns [1998].
Abstract: We describe a slightly subexponential time algorithm for learning parity functions in the presence of random classification noise, a problem closely related to several cryptographic and coding problems. Our algorithm runs in polynomial time for the case of parity functions that depend on only the first O(log n log log n) bits of input, which provides the first known instance of an efficient noise-tolerant algorithm for a concept class that is not learnable in the Statistical Query model of Kearns [1998]. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In coding-theory terms, what we give is a poly(n)-time algorithm for decoding linear k × n codes in the presence of random noise for the case of k = c log n log log n for some c > 0. (The case of k = O(log n) is trivial since one can just individually check each of the 2^k possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve t-tuples of examples, as opposed to just single examples. The second result of this article is to show that any class of functions learnable (strongly or weakly) with t-wise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence, this natural extension to the statistical query model does not increase the set of weakly learnable functions.

689 citations
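The parenthetical remark about the trivial k = O(log n) regime can be made concrete: with only 2^k candidate messages, nearest-codeword decoding is a brute-force loop. The systematic generator matrix and parameters below are illustrative assumptions.

```python
import itertools
import random

# Brute-force nearest-codeword decoding for a small random linear [n, k]
# code over GF(2): enumerate all 2^k messages and keep the one whose
# codeword is closest in Hamming distance to the received word.

rng = random.Random(0)
n, k = 32, 5

# Systematic k x n generator: identity block followed by a random block,
# so distinct messages always yield distinct codewords.
G = [[1 if j == i else 0 for j in range(k)] +
     [rng.randrange(2) for _ in range(n - k)] for i in range(k)]

def encode(msg):
    # Codeword = msg * G over GF(2).
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

def decode(received):
    # Try all 2^k messages; pick the nearest codeword's message.
    return min(
        (list(m) for m in itertools.product((0, 1), repeat=k)),
        key=lambda m: sum(a != b for a, b in zip(encode(m), received)),
    )

msg = [1, 0, 1, 1, 0]
noisy = encode(msg)
noisy[7] ^= 1  # flip one bit of the codeword
```

The loop is exponential in k, which is exactly why it is only viable when k = O(log n); the paper's contribution is the regime beyond that.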


Journal ArticleDOI
TL;DR: In this paper, the authors formalize dual fitting and the idea of factor-revealing LPs for the metric uncapacitated facility location problem, obtaining greedy algorithms with running times of O(m log m) and O(n^3), where n is the total number of vertices and m is the number of edges in the underlying complete bipartite graph.
Abstract: In this article, we will formalize the method of dual fitting and the idea of factor-revealing LP. This combination is used to design and analyze two greedy algorithms for the metric uncapacitated facility location problem. Their approximation factors are 1.861 and 1.61, with running times of O(m log m) and O(n^3), respectively, where n is the total number of vertices and m is the number of edges in the underlying complete bipartite graph between cities and facilities. The algorithms are used to improve recent results for several variants of the problem.

441 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a new approach to inference in Bayesian networks, which is based on representing the network using a polynomial and then retrieving answers to probabilistic queries by evaluating and differentiating the polynomial.
Abstract: We present a new approach to inference in Bayesian networks, which is based on representing the network using a polynomial and then retrieving answers to probabilistic queries by evaluating and differentiating the polynomial. The network polynomial itself is exponential in size, but we show how it can be computed efficiently using an arithmetic circuit that can be evaluated and differentiated in time and space linear in the circuit size. The proposed framework for inference subsumes one of the most influential methods for inference in Bayesian networks, known as the tree-clustering or jointree method; this subsumption provides a deeper understanding of the classical method and lifts its desirable characteristics to a much more general setting. We discuss some theoretical and practical implications of this subsumption.

440 citations
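The network-polynomial idea can be seen on a two-node network A → B. This sketch uses a made-up CPT and evaluates the (multilinear) polynomial directly by enumeration rather than via an arithmetic circuit; since the polynomial is linear in each evidence indicator, fixing an indicator vector to a one-hot value plays the role of the partial derivative.

```python
from itertools import product

# Network polynomial for a hypothetical two-node Bayesian network A -> B:
# f(lambda) = sum over worlds (a, b) of lambda_a * lambda_b * theta_a *
# theta_{b|a}. Evaluating f under evidence yields P(e); derivatives with
# respect to the indicators yield joint marginals.

P_A = {0: 0.6, 1: 0.4}                      # theta_a
P_B_given_A = {(0, 0): 0.9, (1, 0): 0.1,    # theta_{b|a}, keyed (b, a)
               (0, 1): 0.2, (1, 1): 0.8}

def f(lam_a, lam_b):
    # The network polynomial, multilinear in the evidence indicators.
    return sum(lam_a[a] * lam_b[b] * P_A[a] * P_B_given_A[(b, a)]
               for a, b in product((0, 1), repeat=2))

# Evidence B = 1: clamp lambda_b = [0, 1], leave A unobserved.
lam_a, lam_b = [1, 1], [0, 1]
p_e = f(lam_a, lam_b)  # P(B = 1)

# d f / d lambda_a[a] at the evidence equals P(A = a, B = 1); because f
# is linear in each indicator, a one-hot evaluation computes it exactly.
joint = [f([1 if i == a else 0 for i in (0, 1)], lam_b) for a in (0, 1)]
```

Here p_e = 0.6·0.1 + 0.4·0.8 = 0.38, and the two derivatives recover the joint terms 0.06 and 0.32; the paper's point is that an arithmetic circuit delivers all such derivatives in one evaluation/differentiation pass.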


Journal ArticleDOI
Tony Hoare
TL;DR: This contribution proposes a set of criteria that distinguish a grand challenge in science or engineering from the many other kinds of short-term or long-term research problems that engage the interest of scientists and engineers.
Abstract: This contribution proposes a set of criteria that distinguish a grand challenge in science or engineering from the many other kinds of short-term or long-term research problems that engage the interest of scientists and engineers. As an example drawn from Computer Science, it revives an old challenge: the construction and application of a verifying compiler that guarantees correctness of a program before running it.

280 citations


Journal ArticleDOI
TL;DR: The upper and lower bounds on the maximum load are tight up to additive constants, proving that the Always-Go-Left algorithm achieves an almost optimal load balancing among all sequential multiple-choice algorithms.
Abstract: This article deals with randomized allocation processes placing sequentially n balls into n bins. We consider multiple-choice algorithms that choose d locations (bins) for each ball at random, inspect the content of these locations, and then place the ball into one of them, for example, in a location with a minimum number of balls. The goal is to achieve good load balancing. This objective is measured in terms of the maximum load, that is, the maximum number of balls in the same bin. Multiple-choice algorithms have been studied extensively in the past. Previous analyses typically assume that the d locations for each ball are drawn uniformly and independently from the set of all bins. We investigate whether a nonuniform or dependent selection of the d locations of a ball may lead to better load balancing. Three types of selection, resulting in three classes of algorithms, are distinguished: (1) uniform and independent, (2) nonuniform and independent, and (3) nonuniform and dependent. Our first result shows that the well-studied uniform greedy algorithm (class 1) does not obtain the smallest possible maximum load. In particular, we introduce a nonuniform algorithm (class 2) that obtains a better load balancing. Surprisingly, this algorithm uses an unfair tie-breaking mechanism, called Always-Go-Left, resulting in an asymmetric assignment of the balls to the bins. Our second result is a lower bound showing that a dependent allocation (class 3) cannot yield significant further improvement. Our upper and lower bounds on the maximum load are tight up to additive constants, proving that the Always-Go-Left algorithm achieves an almost optimal load balancing among all sequential multiple-choice algorithms. Furthermore, we show that the results for the Always-Go-Left algorithm can be generalized to allocation processes with more balls than bins and even to infinite processes in which balls are inserted and deleted by an oblivious adversary.

260 citations
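A quick simulation sketch of the Always-Go-Left scheme as commonly described (bins split into d groups, one uniform choice per group, ties broken toward the leftmost group); the parameters and the comparison against single-choice allocation are illustrative choices, not the paper's analysis.

```python
import random

# d-choice allocation with Always-Go-Left tie breaking versus plain
# single-choice allocation. n and d are arbitrary demo parameters;
# n is chosen divisible by d so every group has the same size.

def always_go_left(n, d, rng):
    load = [0] * n
    group = n // d
    for _ in range(n):  # place n balls
        # One uniformly random bin from each of the d groups.
        choices = [rng.randrange(g * group, (g + 1) * group)
                   for g in range(d)]
        # min() keeps the earliest choice on ties, i.e. the bin from
        # the leftmost group: the Always-Go-Left rule.
        best = min(choices, key=lambda b: load[b])
        load[best] += 1
    return max(load)

def single_choice(n, rng):
    load = [0] * n
    for _ in range(n):
        load[rng.randrange(n)] += 1
    return max(load)

rng = random.Random(1)
n = 4998
left = always_go_left(n, 3, rng)
single = single_choice(n, rng)
```

At this scale the multiple-choice maximum load is typically a small constant while single-choice grows like ln n / ln ln n, which is the gap the paper's bounds make precise.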


Journal ArticleDOI
TL;DR: A class of approximation algorithms that extend the idea of bounded-complexity inference, inspired by successful constraint propagation algorithms, to probabilistic inference and combinatorial optimization by bounding the dimensionality of dependencies created by inference algorithms.
Abstract: This article presents a class of approximation algorithms that extend the idea of bounded-complexity inference, inspired by successful constraint propagation algorithms, to probabilistic inference and combinatorial optimization. The idea is to bound the dimensionality of dependencies created by inference algorithms. This yields a parameterized scheme, called mini-buckets, that offers adjustable trade-off between accuracy and efficiency. The mini-bucket approach to optimization problems, such as finding the most probable explanation (MPE) in Bayesian networks, generates both an approximate solution and bounds on the solution quality. We present empirical results demonstrating successful performance of the proposed approximation scheme for the MPE task, both on randomly generated problems and on realistic domains such as medical diagnosis and probabilistic decoding.

215 citations


Journal ArticleDOI
TL;DR: The first complete problem for SZK, the class of promise problems possessing statistical zero-knowledge proofs (against an honest verifier), is presented: Statistical Difference, the problem of deciding whether two efficiently samplable distributions are statistically close or far apart.
Abstract: We present the first complete problem for SZK, the class of promise problems possessing statistical zero-knowledge proofs (against an honest verifier). The problem, called Statistical Difference, is to decide whether two efficiently samplable distributions are either statistically close or far apart. This gives a new characterization of SZK that makes no reference to interaction or zero knowledge. We propose the use of complete problems to unify and extend the study of statistical zero knowledge. To this end, we examine several consequences of our Completeness Theorem and its proof, such as:
- A way to make every (honest-verifier) statistical zero-knowledge proof very communication efficient, with the prover sending only one bit to the verifier (to achieve soundness error 1/2).
- Simpler proofs of many of the previously known results about statistical zero knowledge, such as the Fortnow and Aiello--Håstad upper bounds on the complexity of SZK and Okamoto's result that SZK is closed under complement.
- Strong closure properties of SZK that amount to constructing statistical zero-knowledge proofs for complex assertions built out of simpler assertions already shown to be in SZK.
- New results about the various measures of "knowledge complexity," including a collapse in the hierarchy corresponding to knowledge complexity in the "hint" sense.
- Algorithms for manipulating the statistical difference between efficiently samplable distributions, including transformations that "polarize" and "reverse" the statistical relationship between a pair of distributions.

165 citations


Journal ArticleDOI
TL;DR: This article provides the final step in the classification of complexity for satisfiability problems over constraints expressed in Allen's interval algebra, showing that this algebra contains exactly eighteen maximal tractable subalgebras, and that reasoning in any fragment not entirely contained in one of these subalgebras is NP-complete.
Abstract: Allen's interval algebra is one of the best established formalisms for temporal reasoning. This article provides the final step in the classification of complexity for satisfiability problems over constraints expressed in this algebra. When the constraints are chosen from the full Allen's algebra, this form of satisfiability problem is known to be NP-complete. However, eighteen tractable subalgebras have previously been identified; we show here that these subalgebras include all possible tractable subsets of Allen's algebra. In other words, we show that this algebra contains exactly eighteen maximal tractable subalgebras, and reasoning in any fragment not entirely contained in one of these subalgebras is NP-complete. We obtain this dichotomy result by giving a new uniform description of the known maximal tractable subalgebras, and then systematically using a general algebraic technique for identifying maximal subalgebras with a given property.

162 citations
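For readers unfamiliar with the formalism, the 13 basic relations of Allen's algebra are easy to compute for concrete intervals; the tractability results above concern constraint satisfaction over disjunctions of these relations, which this sketch does not attempt.

```python
# Determine which of Allen's 13 basic interval relations holds between
# two concrete intervals, each given as a (start, end) pair with
# start < end. Exactly one basic relation holds for any such pair.

def allen_relation(i, j):
    (a, b), (c, d) = i, j
    if b < c:
        return "before"
    if d < a:
        return "after"
    if b == c:
        return "meets"
    if d == a:
        return "met-by"
    if a == c and b == d:
        return "equal"
    if a == c:
        return "starts" if b < d else "started-by"
    if b == d:
        return "finishes" if a > c else "finished-by"
    if c < a and b < d:
        return "during"
    if a < c and d < b:
        return "contains"
    if a < c < b < d:
        return "overlaps"
    return "overlapped-by"
```

A constraint network in the algebra labels each pair of interval variables with a set of these basic relations; satisfiability asks for concrete intervals realizing one relation from each set, which is where the NP-completeness and the eighteen tractable subalgebras enter.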


Journal ArticleDOI
TL;DR: A simple new randomized primality testing algorithm is given by reducing primality testing for a number n to testing whether a specific univariate identity over Z_n holds, and two new randomized algorithms are given for testing whether a multivariate polynomial, over a finite field or over the rationals, is identically zero.
Abstract: We give a simple and new randomized primality testing algorithm by reducing primality testing for a number n to testing if a specific univariate identity over Z_n holds. We also give new randomized algorithms for testing if a multivariate polynomial, over a finite field or over the rationals, is identically zero. The first of these algorithms also works over Z_n for any n. The running time of the algorithms is polynomial in the size of the arithmetic circuit representing the input polynomial and the error parameter. These algorithms use fewer random bits and work for a larger class of polynomials than all the previously known methods, for example, the Schwartz--Zippel test [Schwartz 1980; Zippel 1979] and the Chen--Kao and Lewin--Vadhan tests [Chen and Kao 1997; Lewin and Vadhan 1998].

121 citations
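The univariate identity in question can be illustrated with the classical fact that (1 + x)^n ≡ 1 + x^n over Z_n exactly when n is prime (whether this matches the paper's exact formulation is an assumption here). Expanding the left side coefficient by coefficient is exponential in general; the paper's point is to test such an identity efficiently, which this sketch does not do.

```python
# Primality via the polynomial identity (1 + x)^n = 1 + x^n over Z_n:
# the identity holds iff every binomial coefficient C(n, k) for
# 0 < k < n vanishes mod n, which happens exactly when n is prime.

def is_prime_via_identity(n):
    if n < 2:
        return False
    c = 1  # c = C(n, 0); updated incrementally to C(n, k)
    for k in range(1, n):
        # C(n, k) = C(n, k-1) * (n - k + 1) / k, with exact division.
        c = c * (n - k + 1) // k
        if c % n != 0:  # coefficient of x^k fails to vanish mod n
            return False
    return True
```

Unlike a Fermat-style test, this check is exact (there are no "polynomial Carmichael numbers"), but it takes time linear in n; the algorithm in the paper instead evaluates the identity modulo a random small-degree polynomial to get a fast randomized test.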


Journal ArticleDOI
TL;DR: The condition-based approach to solving the consensus problem in asynchronous systems is introduced; two examples of realistic acceptable conditions are presented and proved to be maximal, in the sense that they cannot be extended and remain acceptable.
Abstract: This article introduces and explores the condition-based approach to solve the consensus problem in asynchronous systems. The approach studies conditions that identify sets of input vectors for which it is possible to solve consensus despite the occurrence of up to f process crashes. The first main result defines acceptable conditions and shows that these are exactly the conditions for which a consensus protocol exists. Two examples of realistic acceptable conditions are presented, and proved to be maximal, in the sense that they cannot be extended and remain acceptable. The second main result is a generic consensus shared-memory protocol for any acceptable condition. The protocol always guarantees agreement and validity, and terminates (at least) when the inputs satisfy the condition with which the protocol has been instantiated, or when there are no crashes. An efficient version of the protocol is then designed for the message passing model that works when f

Journal ArticleDOI
Bruce Reed
TL;DR: It is shown that there exist constants α = 4.311… and β = 1.953… such that E(Hn) = α ln n − β ln ln n + O(1) and Var(Hn) = O(1), where Hn is the height of a random binary search tree on n nodes.
Abstract: Let Hn be the height of a random binary search tree on n nodes. We show that there exist constants α = 4.311… and β = 1.953… such that E(Hn) = α ln n − β ln ln n + O(1). We also show that Var(Hn) = O(1).
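The constant α ≈ 4.311 can be checked empirically by inserting random permutations into an unbalanced binary search tree; the sample sizes below are arbitrary demo choices, and at moderate n the ratio E(Hn)/ln n sits noticeably below α because of the −β ln ln n term.

```python
import math
import random

# Empirical look at E(H_n) ~ alpha * ln n for random binary search
# trees: build trees from random permutations and average the height
# (counted as the number of nodes on the longest root-to-leaf path).

def bst_height(keys):
    root = None  # node representation: [key, left, right]
    height = 0
    for k in keys:
        if root is None:
            root = [k, None, None]
            depth = 1
        else:
            depth, node = 1, root
            while True:
                idx = 1 if k < node[0] else 2
                depth += 1
                if node[idx] is None:
                    node[idx] = [k, None, None]
                    break
                node = node[idx]
        height = max(height, depth)
    return height

rng = random.Random(2)
n, trials = 2000, 30
avg = sum(bst_height(rng.sample(range(n), n)) for _ in range(trials)) / trials
ratio = avg / math.log(n)  # approaches 4.311... slowly as n grows
```

With n = 2000 the ratio typically lands in the high 3s, consistent with α ln n − β ln ln n + O(1) at this scale.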

Journal ArticleDOI
TL;DR: The Turing Test is a test for computational intelligence in which a human judgment must be made concerning whether a set of observed behaviors is sufficiently similar to human behaviors that the same word—intelligent—can justifiably be used.
Abstract: When the terms "intelligence" or "intelligent" are used by scientists, they are referring to a large collection of human cognitive behaviors—people thinking. When life scientists speak of the intelligence of animals, they are asking us to call to mind a set of human behaviors that they are asserting the animals are (or are not) capable of. When computer scientists speak of artificial intelligence, machine intelligence, intelligent agents, or (as I chose to do in the title of this essay) computational intelligence, we are also referring to that set of human behaviors. Although intelligence means people thinking, we might be able to replicate the same set of behaviors using computation. Indeed, one branch of modern cognitive psychology is based on the model that the human mind and brain are complex computational "engines," that is, we ourselves are examples of computational intelligence. 2. Turing's Vision and the Turing Test for Humanoid Behavior. The idea, of course, is not new. It was discussed by Turing in the 1940s. In the play about Turing's life, Breaking the Code [Whitemore 1987], Turing is shown visiting his old grammar school and delivering a talk to the boys, in which he offers a vision of the thinking computer. The memories of those of Turing's colleagues of the 1940s who are still alive confirm that he spoke often of this vision. In 1950, he wrote of it, in a famous article [Turing 1950], in which he proposed a test (now called the Turing Test (TT)) for computational intelligence. In the test, a human judgment must be made concerning whether a set of observed behaviors is sufficiently similar to human behaviors that the same word—intelligent—can justifiably be used. The judgment is about behavior, not mechanism. Computers are not like human brains, but if they perform the same acts and one performer (the human) is labeled intelligent, then the other must be labeled intelligent also. I have always liked the Turing Test because it gave a clear and tangible vision, was reasonably objective, and made concrete the tie to human behavior by using the unarticulated criteria of a human judge. Turing Award winner Jim Gray, who works in fields of Computer Science other than AI, appears to agree. His list of challenges for the future includes: "The Turing test: Win the imitation game 30% of the time." Significantly, he adds: "Read and understand as well as a human. Think and write as well as a human" [Gray 2003]. I will have more to say about necessary conditions for these human activities later. But there are problems with the Turing Test (TT). Human intelligence is very multidimensional. However, the judge must fuse all of these dimensions into a

Journal ArticleDOI
TL;DR: The first time-space lower bound trade-offs for randomized computation of decision problems are proved, and the bounds hold even in the case that the computation is allowed to have arbitrary probability of error on a small fraction of inputs.
Abstract: We prove the first time-space lower bound trade-offs for randomized computation of decision problems. The bounds hold even in the case that the computation is allowed to have arbitrary probability of error on a small fraction of inputs. Our techniques are extensions of those used by Ajtai and by Beame, Jayram, and Saks for deterministic branching programs. Our results also give a quantitative improvement over the previous results. Previous time-space trade-off results for decision problems can be divided naturally into results for functions with Boolean domain, that is, each input variable is {0,1}-valued, and the case of large domain, where each input variable takes on values from a set whose size grows with the number of variables. In the case of Boolean domain, Ajtai exhibited an explicit class of functions, and proved that any deterministic Boolean branching program or RAM using space S = o(n) requires superlinear time T to compute them. The functional form of the superlinear bound is not given in his paper, but optimizing the parameters in his arguments gives T = Ω(n log log n / log log log n) for S = O(n^(1−ϵ)). For the same functions considered by Ajtai, we prove a time-space trade-off (for randomized branching programs with error) of the form T = Ω(n √(log(n/S) / log log(n/S))). In particular, for space O(n^(1−ϵ)), this improves the lower bound on time to Ω(n √(log n / log log n)). In the large domain case, we prove lower bounds of the form T = Ω(n √(log(n/S) / log log(n/S))) for randomized computation of the element distinctness function and lower bounds of the form T = Ω(n log(n/S)) for randomized computation of Ajtai's Hamming closeness problem and of certain functions associated with quadratic forms over large fields.

Journal ArticleDOI
TL;DR: The fundamental biological structures are rich enough to repay study and yet simple enough that there is hope of making real progress on an information theory of structure, and the most fundamental gap in the theoretical underpinnings of information science and of computer science is addressed.
Abstract: and the associated concept of noise, have proved rich sources of further theory and of applications galore. We have no theory, however, that gives us a metric for the information embodied in structure, especially physical structure. We know that an automobile is a more complex structure than a rowboat. We cannot yet say it is x times more complex, where x is some number. Yet we know that the complexity is related to the Shannon information that would be required to specify the structures of the car and the boat. I consider this missing metric to be the most fundamental gap in the theoretical underpinnings of information science and of computer science. Recent developments, however, make it timely to address it. The fundamental biological structures are rich enough to repay study and yet simple enough that there is hope of making real progress on an information theory of structure. (Rowboats and automobiles are much too hard.) The coding of genetic information by DNA is apparently simple enough that it can be handled with existing communication theory. The folded structure of proteins is not. Entropic and energetic considerations are necessary, but not yet sufficient, for explaining and predicting that structure, even after the amino acid sequence is known. Yet proteins are relatively simple structures. A young information theory scholar willing to spend years on a deeply fundamental problem need look no further.

Journal ArticleDOI
Jim Gray
TL;DR: In this article, the authors define a set of fundamental research problems that broaden the Babbage, Bush, and Turing visions of computing, including highly secure, highly available, self-programming, selfmanaging, and self-replicating systems.
Abstract: Charles Babbage's vision of computing has largely been realized. We are on the verge of realizing Vannevar Bush's Memex. But, we are some distance from passing the Turing Test. These three visions and their associated problems have provided long-range research goals for many of us. For example, the scalability problem has motivated me for several decades. This talk defines a set of fundamental research problems that broaden the Babbage, Bush, and Turing visions. They extend Babbage's computational goal to include highly-secure, highly-available, self-programming, self-managing, and self-replicating systems. They extend Bush's Memex vision to include a system that automatically organizes, indexes, digests, evaluates, and summarizes information (as well as a human might). Another group of problems extends Turing's vision of intelligent machines to include prosthetic vision, speech, hearing, and other senses. Each problem is simply stated and each is orthogonal to the others, though they share some common core technologies.

Journal ArticleDOI
TL;DR: It is proved that three apparently unrelated fundamental problems in distributed computing, cryptography, and complexity theory, are essentially the same problem, and the definition of weak zero-knowledge is obtained by a sequence of weakenings of the standard definition, forming a hierarchy.
Abstract: We prove that three apparently unrelated fundamental problems in distributed computing, cryptography, and complexity theory, are essentially the same problem. These three problems and brief descriptions of them follow. (1) The selective decommitment problem. An adversary is given commitments to a collection of messages, and the adversary can ask for some subset of the commitments to be opened. The question is whether seeing the decommitments to these open plaintexts allows the adversary to learn something unexpected about the plaintexts that are unopened. (2) The power of 3-round weak zero-knowledge arguments. The question is what can be proved in (a possibly weakened form of) zero-knowledge in a 3-round argument. In particular, is there a language outside of BPP that has a 3-round public-coin weak zero-knowledge argument? (3) The Fiat-Shamir methodology. This is a method for converting a 3-round public-coin argument (viewed as an identification scheme) to a 1-round signature scheme. The method requires what we call a "magic function" that the signer applies to the first-round message of the argument to obtain a second-round message (queries from the verifier). An open question here is whether every 3-round public-coin argument for a language outside of BPP has a magic function. It follows easily from definitions that if a 3-round public-coin argument system is zero-knowledge in the standard (fairly strong) sense, then it has no magic function. We define a weakening of zero-knowledge such that zero-knowledge ⇒ no-magic-function still holds. For this weakened form of zero-knowledge, we give a partial converse: informally, if a 3-round public-coin argument system is not weakly zero-knowledge, then some form of magic is possible for this argument system. We obtain our definition of weak zero-knowledge by a sequence of weakenings of the standard definition, forming a hierarchy.
Intermediate forms of zero-knowledge in this hierarchy are reasonable ones, and they may be useful in applications. Finally, we relate the selective decommitment problem to public-coin proof systems and arguments at an intermediate level of the hierarchy, and obtain several positive security results for selective decommitment.

Journal ArticleDOI
Peter W. Shor1
TL;DR: The question of why so few classes of quantum algorithms have been discovered is examined, and some thoughts about what lines of research might lead to the discovery of more quantum algorithms are given.
Abstract: I examine the question of why so few classes of quantum algorithms have been discovered. I give two possible explanations for this, and some thoughts about what lines of research might lead to the discovery of more quantum algorithms.

Journal ArticleDOI
TL;DR: The sieve is the cornerstone of the first wait-free algorithms that adapt to point contention using only read and write operations; efficient algorithms for long-lived renaming, timestamping, and collecting information are presented.
Abstract: This article introduces the sieve, a novel building block that makes it possible to adapt to the number of simultaneously active processes (the point contention) during the execution of an operation. We present an implementation of the sieve in which each sieve operation requires O(k log k) steps, where k is the point contention during the operation. The sieve is the cornerstone of the first wait-free algorithms that adapt to point contention using only read and write operations. Specifically, we present efficient algorithms for long-lived renaming, timestamping and collecting information.

Journal ArticleDOI
TL;DR: This article studies analogs of classical relational calculus in the context of strings, and shows that by choosing the string vocabulary carefully, one gets string logics that have desirable properties: computable evaluation and normal forms.
Abstract: We study analogs of classical relational calculus in the context of strings. We start by studying string logics. Taking a classical model-theoretic approach, we fix a set of string operations and look at the resulting collection of definable relations. These form an algebra---a class of n-ary relations for every n, closed under projection and Boolean operations. We show that by choosing the string vocabulary carefully, we get string logics that have desirable properties: computable evaluation and normal forms. We identify five distinct models and study the differences in their model-theory and complexity of evaluation. We identify a subset of these models that have additional attractive properties, such as finite VC dimension and quantifier elimination. Once you have a logic, the addition of free predicate symbols gives you a string query language. The resulting languages have attractive closure properties from a database point of view: while SQL does not allow the full composition of string pattern-matching expressions with relational operators, these logics yield compositional query languages that can capture common string-matching queries while remaining tractable. For each of the logics studied in the first part of the article, we study properties of the corresponding query languages. We give bounds on the data complexity of queries, extend the normal form results from logics to queries, and show that the languages have corresponding algebras expressing safe queries.

Journal ArticleDOI
TL;DR: Three problems that each have a history of research that goes back to the early days of computer science are described, and it is believed that they are well-posed scientific questions that will be solved eventually.
Abstract: We describe three problems that each have a history of research that goes back to the early days of computer science. These past efforts, which we do not review here, have succeeded in uncovering some of the essence of these otherwise elusive challenges, and make it possible to focus on these questions more clearly now. Difficult as they may appear to be, they are well-posed scientific questions, we believe, and therefore will be solved eventually. While their solutions can be expected to have mathematical content, the problems as we state them are not purely mathematical. In each case, a part of the solution would be the emergence of some consensus on the nature of the right mathematical formulation.

Journal ArticleDOI
TL;DR: It is shown here that differentiation and integration on distributions are computable operators, and various types of Fourier transforms and convolutions are also computable operators.
Abstract: The theory of generalized functions is the foundation of the modern theory of partial differential equations (PDE). As computers are playing an ever-larger role in solving PDEs, it is important to know those operations involving generalized functions in analysis and PDE that can be computed on digital computers. In this article, we introduce natural concepts of computability on test functions and generalized functions, as well as computability on Schwartz test functions and tempered distributions. Type-2 Turing machines are used as the machine model [Weihrauch 2000]. It is shown here that differentiation and integration on distributions are computable operators, and various types of Fourier transforms and convolutions are also computable operators. As an application, it is shown that the solution operator of the distributional inhomogeneous three-dimensional wave equation is computable.

Journal ArticleDOI
TL;DR: In this article, it is observed that there is fundamental tension between the Extended Church--Turing Thesis and the existence of numerous seemingly intractable computational problems arising from classical physics.
Abstract: Would physical laws permit the construction of computing machines that are capable of solving some problems much faster than the standard computational model? Recent evidence suggests that this might be the case in the quantum world. But the question is of great interest even in the realm of classical physics. In this article, we observe that there is fundamental tension between the Extended Church--Turing Thesis and the existence of numerous seemingly intractable computational problems arising from classical physics. Efforts to resolve this incompatibility could both advance our knowledge of the theory of computation, as well as serve the needs of scientific computing.

Journal ArticleDOI
TL;DR: This article gives a randomized nonclairvoyant algorithm, RMLF, that has competitive ratio O(log n log log n) against an oblivious adversary, further justifying the adoption of this algorithm.
Abstract: We consider the problem of scheduling a collection of dynamically arriving jobs with unknown execution times so as to minimize the average flow time. This is the classic CPU scheduling problem faced by time-sharing operating systems where preemption is allowed. It is easy to see that every algorithm that doesn't unnecessarily idle the processor is at worst n-competitive, where n is the number of jobs. Yet there was no known nonclairvoyant algorithm, deterministic or randomized, with a competitive ratio provably O(n^(1−ε)). In this article, we give a randomized nonclairvoyant algorithm, RMLF, that has competitive ratio O(log n log log n) against an oblivious adversary. RMLF is a slight variation of the multilevel feedback (MLF) algorithm used by the UNIX operating system, further justifying the adoption of this algorithm. It is known that every randomized nonclairvoyant algorithm is Ω(log n)-competitive, and that every deterministic nonclairvoyant algorithm is Ω(n^(1/3))-competitive.
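The multilevel feedback mechanism that RMLF varies can be illustrated with a minimal sketch. This is a plain deterministic MLF simulation (quantum doubling per level, all jobs assumed to arrive at time 0), not the randomized RMLF analyzed in the article; the function and variable names are ours:

```python
from collections import deque

def mlf_schedule(jobs):
    """Simulate a deterministic multilevel feedback (MLF) scheduler.

    jobs: list of (arrival_time, size); simplified so that all jobs
    are assumed to arrive at time 0. The quantum at level i is 2**i.
    Returns total flow time (sum over jobs of completion - arrival).
    """
    queues = [deque()]                 # queues[i] holds jobs at level i
    for arrival, size in jobs:
        queues[0].append([size, arrival, 0])   # [remaining, arrival, level]
    t = 0
    total_flow = 0
    while any(queues):
        # Run a job from the lowest (highest-priority) nonempty level.
        i = next(i for i, q in enumerate(queues) if q)
        job = queues[i].popleft()
        run = min(2 ** i, job[0])      # run for one quantum or to completion
        t += run
        job[0] -= run
        if job[0] == 0:
            total_flow += t - job[1]
        else:
            job[2] += 1                # demote: exhausted its quantum
            if len(queues) <= job[2]:
                queues.append(deque())
            queues[job[2]].append(job)
    return total_flow
```

The nonclairvoyant aspect is visible in the code: the scheduler never inspects a job's remaining size except by running it until the quantum expires.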

Journal ArticleDOI
TL;DR: It is shown that all centralized absolute moments of the height Hn of binary search trees of size n and of the saturation level Hn′ are bounded.
Abstract: It is shown that all centralized absolute moments E|Hn − EHn|^α (α ≥ 0) of the height Hn of binary search trees of size n and of the saturation level Hn′ are bounded. The methods used rely on the analysis of a retarded differential equation of the form Φ′(u) = −α^(−2)Φ(u/α)^2 with α > 1. The method can also be extended to prove the same result for the height of m-ary search trees. Finally, the limiting behaviour of the distribution of the height of binary search trees is precisely determined.
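The concentration behind bounded centralized moments is easy to observe empirically: heights of independently grown random binary search trees of the same size cluster in a narrow band around their mean. A small simulation sketch (the helper name and the choice of n are ours):

```python
import random

def bst_height(keys):
    """Height (maximum node depth, root at depth 1) of the binary
    search tree built by inserting `keys` in the given order."""
    root = None                        # node representation: [key, left, right]
    height = 0
    for k in keys:
        if root is None:
            root = [k, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            depth += 1
            idx = 1 if k < node[0] else 2   # 1 = left child, 2 = right child
            if node[idx] is None:
                node[idx] = [k, None, None]
                height = max(height, depth)
                break
            node = node[idx]
    return height

# Heights of 20 independent random BSTs on n = 2000 keys: the values
# stay in a narrow band, as the bounded-moments result predicts.
random.seed(1)
n = 2000
heights = [bst_height(random.sample(range(n), n)) for _ in range(20)]
```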

Journal ArticleDOI
TL;DR: It is shown that for any polyhedral norm, the problem of finding a tour of maximum length can be solved in polynomial time, and it is proved that, for the case of Euclidean distances in ℝ^d for d ≥ 3, the Maximum TSP is NP-hard.
Abstract: We consider the traveling salesman problem when the cities are points in ℝ^d for some fixed d and distances are computed according to geometric distances, determined by some norm. We show that for any polyhedral norm, the problem of finding a tour of maximum length can be solved in polynomial time. If arithmetic operations are assumed to take unit time, our algorithms run in time O(n^(f−2) log n), where f is the number of facets of the polyhedron determining the polyhedral norm. Thus, for example, we have O(n^2 log n) algorithms for the cases of points in the plane under the Rectilinear and Sup norms. This is in contrast to the fact that finding a minimum length tour in each case is NP-hard. Our approach can be extended to the more general case of quasi-norms with a not necessarily symmetric unit ball, where we get a complexity of O(n^(2f−2) log n). For the special case of two-dimensional metrics with f = 4 (which includes the Rectilinear and Sup norms), we present a simple algorithm with O(n) running time. The algorithm does not use any indirect addressing, so its running time remains valid even in comparison-based models in which sorting requires Ω(n log n) time. The basic mechanism of the algorithm provides some intuition on why polyhedral norms allow fast algorithms. Complementing the results on simplicity for polyhedral norms, we prove that, for the case of Euclidean distances in ℝ^d for d ≥ 3, the Maximum TSP is NP-hard. This sheds new light on the well-studied difficulties of Euclidean distances.
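For intuition about the objective, here is a brute-force reference for the planar Maximum TSP under the Rectilinear norm. It is exponential and only useful as a check on tiny instances, unlike the polynomial and linear-time algorithms of the article; the names are ours:

```python
from itertools import permutations

def l1(p, q):
    """Rectilinear (L1) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def max_tsp_bruteforce(points):
    """Maximum-length closed tour under the Rectilinear norm, found by
    exhaustive search over all tours (exponential; tiny inputs only)."""
    first, rest = points[0], points[1:]   # fix the first city to kill rotations
    best = 0
    for perm in permutations(rest):
        tour = [first, *perm]
        length = sum(l1(tour[i], tour[(i + 1) % len(tour)])
                     for i in range(len(tour)))
        best = max(best, length)
    return best
```

On the unit square, the maximum tour uses both diagonals plus two opposite sides, illustrating how a longest tour prefers "crossing" edges that a shortest tour avoids.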

Journal ArticleDOI
TL;DR: In this article, the authors study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. They provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates.
Abstract: We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [2000], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on this theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35–55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard.
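The column-partitioning step can be sketched as a small interval dynamic program: choose contiguous column intervals so that the total size after compressing each interval separately is minimized. This toy version (our names, with zlib standing in for gzip) illustrates the idea, not the article's on-line or reordering algorithms:

```python
import zlib

def interval_cost(table, i, j):
    """Compressed size (bytes) of columns i..j, taken together."""
    data = "\n".join(",".join(row[i:j + 1]) for row in table)
    return len(zlib.compress(data.encode()))

def best_partition(table):
    """Partition the columns into contiguous intervals, each compressed
    separately, minimizing total size (simple O(m^2)-interval DP)."""
    m = len(table[0])
    cost = [0] * (m + 1)   # cost[j]: best total size for columns 0..j-1
    cut = [0] * (m + 1)    # cut[j]: start of the last interval in that optimum
    for j in range(1, m + 1):
        cost[j], cut[j] = min(
            (cost[i] + interval_cost(table, i, j - 1), i) for i in range(j))
    parts, j = [], m       # recover the chosen intervals by walking the cuts
    while j > 0:
        parts.append((cut[j], j - 1))
        j = cut[j]
    return cost[m], parts[::-1]

# Toy table: two highly repetitive columns next to a noisier one.
table = [["2003", "alpha", str(i * 37 % 100)] for i in range(50)]
total, parts = best_partition(table)
```

By construction the DP can always fall back to one interval covering every column, so its result is never worse than compressing the whole table at once.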

Journal ArticleDOI
TL;DR: This article proposes a framework within which properties of the SSA form and φ-placement algorithms are derived, based on a new relation called merge which captures succinctly the structure of a program's control flow graph that is relevant to its SSA form.
Abstract: The Static Single Assignment (SSA) form is a program representation used in many optimizing compilers. The key step in converting a program to SSA form is called φ-placement. Many algorithms for φ-placement have been proposed in the literature, but the relationships between these algorithms are not well understood. In this article, we propose a framework within which we systematically derive (i) properties of the SSA form and (ii) φ-placement algorithms. This framework is based on a new relation called merge which captures succinctly the structure of a program's control flow graph that is relevant to its SSA form. The φ-placement algorithms we derive include most of the ones described in the literature, as well as several new ones. We also evaluate experimentally the performance of some of these algorithms on the SPEC92 benchmarks. Some of the algorithms described here are optimal for a single variable. However, their repeated application is not necessarily optimal for multiple variables. We conclude the article by describing such an optimal algorithm, based on the transitive reduction of the merge relation, for multi-variable φ-placement in structured programs. The problem for general programs remains open.
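One of the best-known φ-placement algorithms in the literature places φ-functions on the iterated dominance frontier of a variable's definition blocks. A compact, simplified sketch (naive iterative dominator computation rather than a fast one; the names are ours):

```python
def dominators(succ, entry):
    """dom[n]: the set of nodes that dominate n (iterative dataflow)."""
    nodes = set(succ)
    pred = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = ({n} | set.intersection(*(dom[p] for p in pred[n]))
                   if pred[n] else {n})
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def dominance_frontiers(succ, dom):
    """DF[n]: nodes y such that n dominates a predecessor of y
    but does not strictly dominate y itself."""
    df = {n: set() for n in succ}
    for y in succ:
        for p in (p for p in succ if y in succ[p]):
            for n in dom[p]:
                if n not in dom[y] or n == y:   # n does not strictly dominate y
                    df[n].add(y)
    return df

def phi_nodes(succ, entry, defs):
    """Blocks needing a φ for a variable defined in blocks `defs`:
    the iterated dominance frontier of the defining blocks."""
    dom = dominators(succ, entry)
    df = dominance_frontiers(succ, dom)
    work, phis = list(defs), set()
    while work:
        for y in df[work.pop()]:
            if y not in phis:
                phis.add(y)
                work.append(y)
    return phis

# Diamond CFG: a variable assigned in both branches needs a φ at the join.
succ = {"entry": ["a", "b"], "a": ["join"], "b": ["join"], "join": []}
```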

Journal ArticleDOI
TL;DR: The P versus NP problem is to determine whether every language accepted by some nondeterministic Turing machine in polynomial time is also accepted by some deterministic Turing machine in polynomial time.
Abstract: The P versus NP problem is to determine whether every language accepted by some nondeterministic Turing machine in polynomial time is also accepted by some deterministic Turing machine in polynomial time. Unquestionably this problem has caught the interest of the mathematical community. For example, it is the first of seven million-dollar “Millennium Prize Problems” listed by the Clay Mathematics Institute [www.claymath.org]. The Riemann Hypothesis and Poincare Conjecture, both mathematical classics, are farther down the list. On the other hand, Fields Medalist Steve Smale lists P versus NP as problem number three, after Riemann and Poincare, in “Mathematical Problems for the Next Century” [Smale 1998]. But P versus NP is also a problem of central interest in computer science. It was posed thirty years ago [Cook 1971; Levin 1973] as a problem concerned with the fundamental limits of feasible computation. Although this question is front and center in complexity theory, NP-completeness proofs have become pervasive in many other areas of computer science, including artificial intelligence, databases, programming languages, and computer networks (see Garey and Johnson [1979] for 300 early examples). If the question is resolved, what would be the consequences? Consider first a proof of P=NP. It is possible that the proof is nonconstructive, in the sense that it does not yield an algorithm for any NP-complete problem. Or it might give an impractical algorithm, for example, running in time n100. In either of these cases, the proof would probably have few practical consequences other than to disappoint complexity theorists. However, experience has shown that when natural problems are proved to be in P, a feasible algorithm can be found. 
There are potential counterexamples to this assertion; most famously, the deep results of Robertson and Seymour [1993–1995], who prove that every minor-closed family of graphs can be recognized in time O(n^3), but their algorithm has such huge constants that it is not practical. But practical algorithms are known for some specific minor-closed families (such as planar graphs), and possibly could be found for other examples if sufficient effort is expended. If P=NP is proved by exhibiting a truly feasible algorithm for an NP-complete problem such as SATISFIABILITY (deciding whether a collection of propositional clauses has a satisfying assignment), the practical consequences would be stunning. First, most of the hundreds of problems shown to be NP-complete can be efficiently reduced to SATISFIABILITY, so many of the optimization problems important to industry could be solved. Second, mathematics would be transformed, because computers could find a formal proof of any theorem which has a proof of reasonable length. This is because formal proofs (say in Zermelo–Fraenkel set theory) are
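For concreteness, the obvious algorithm for SATISFIABILITY tries all 2^n assignments; a truly feasible algorithm would have to do fundamentally better than this exhaustive search. A minimal sketch (DIMACS-style integer literals; the names are ours):

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force SATISFIABILITY over n_vars variables.

    Clauses are lists of nonzero DIMACS-style literals: v means
    variable v is true, -v means it is false. Tries all 2^n
    assignments, so only tiny instances are feasible.
    """
    for bits in product([False, True], repeat=n_vars):
        # A clause is satisfied when some literal matches the assignment.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False
```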

Journal ArticleDOI
TL;DR: Once a computer can read one book and prove that it understands it by answering questions about it correctly, then, in principle, it can read all the books that have ever been written.
Abstract: Reading and understanding books is a quintessentially human activity. It is the process by which much knowledge transfer occurs from generation to generation. For example, we test students’ understanding of a given subject by asking them to answer questions at the end of the chapter in a textbook. In general, we are satisfied if a student correctly answers 80 to 90% of the questions. When a computer can do the same task, we will have arrived at a significant milestone. A human being who starts reading at age 4, lives to be 100 years old, and reads a book a day every day could complete 35,000 books in a lifetime. By many estimates, the total number of books ever written in all languages is under 100 million. Harvard library has around 12 million volumes. The Library of Congress has fewer than 30 million volumes. The number of unique titles in the OCLC member libraries is under 42 million. However, US libraries do not have most of the books published in other languages internationally, thus leading to an approximate estimate of 100 million books ever published. Once a computer can read one book and prove that it understands it by answering questions about it correctly, then, in principle, it can read all the books that have ever been written. This has led to the speculation that once computers can read, understand, and share knowledge with each other, without the limitations that biology imposes, they will begin to exhibit superhuman intelligence. For a machine to read a book, understand it and answer questions about it, it needs mechanisms