
Showing papers in "The Computer Journal in 1999"


Journal ArticleDOI
TL;DR: TANE is an efficient algorithm for finding functional dependencies from large databases based on partitioning the set of rows with respect to their attribute values, which makes testing the validity of functional dependencies fast even for a large number of tuples.
Abstract: The discovery of functional dependencies from relations is an important database analysis technique. We present TANE, an efficient algorithm for finding functional dependencies from large databases. TANE is based on partitioning the set of rows with respect to their attribute values, which makes testing the validity of functional dependencies fast even for a large number of tuples. The use of partitions also makes the discovery of approximate functional dependencies easy and efficient, and erroneous or exceptional rows can be identified easily. Experiments show that TANE is fast in practice. For benchmark databases the running times are improved by several orders of magnitude over previously published results. The algorithm is also applicable to much larger datasets than the previous methods.
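The partition-based validity test at the heart of TANE can be illustrated with a small sketch. This is a toy construction (the attribute names and row layout are invented), not the paper's algorithm or its optimized partition representation: a dependency X → A holds exactly when partitioning the rows by X yields as many equivalence classes as partitioning by X ∪ {A}.

```python
# Toy sketch of the partition-refinement test for a functional dependency
# X -> A: compare the number of equivalence classes induced by X and by X u {A}.
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on the given attributes."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())

def fd_holds(rows, lhs, rhs):
    """X -> A holds iff partitioning by X gives exactly as many
    equivalence classes as partitioning by X u {A}."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + [rhs]))

rows = [
    {"city": "Oslo",  "country": "Norway"},
    {"city": "Oslo",  "country": "Norway"},
    {"city": "Turku", "country": "Finland"},
]
print(fd_holds(rows, ["city"], "country"))  # True: city -> country
```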

602 citations


Journal ArticleDOI
TL;DR: This work attempts to establish a parallel between a restricted (two-part) version of the Kolmogorov model and the minimum message length approach to statistical inference and machine learning of Wallace and Boulton (1968), in which an ‘explanation’ of a data string is modelled as a two-part message.
Abstract: The notion of algorithmic complexity was developed by Kolmogorov (1965) and Chaitin (1966) independently of one another and of Solomonoff’s notion (1964) of algorithmic probability. Given a Turing machine T , the (prefix) algorithmic complexity of a string S is the length of the shortest input to T which would cause T to output S and stop. The Solomonoff probability of S given T is the probability that a random binary string of 0s and 1s will result in T producing an output having S as a prefix. We attempt to establish a parallel between a restricted (two-part) version of the Kolmogorov model and the minimum message length approach to statistical inference and machine learning of Wallace and Boulton (1968), in which an ‘explanation’ of a data string is modelled as a two-part message, the first part stating a general hypothesis about the data and the second encoding details of the data not implied by the hypothesis. Solomonoff’s model is tailored to prediction rather than inference in that it considers not just the most likely explanation, but it also gives weights to all explanations depending upon their posterior probability. However, as the amount of data increases, we typically expect the most likely explanation to have a dominant weighting in the prediction.
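Stated compactly in standard notation (a formalization of the two verbal definitions above, not an equation quoted from the paper), the two notions are:

```latex
% Prefix algorithmic complexity of a string S relative to a Turing machine T,
% and Solomonoff's algorithmic probability of S relative to T (one common
% formalization of the verbal definitions above):
K_T(S) \;=\; \min\{\, |p| : T(p) = S \,\},
\qquad
M_T(S) \;=\; \sum_{p \;:\; S \text{ is a prefix of } T(p)} 2^{-|p|} .
```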

371 citations


Journal ArticleDOI
TL;DR: It is shown experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access.
Abstract: Fast access to files of integers is crucial for the efficient resolution of queries to databases. Integers are the basis of indexes used to resolve queries, for example, in large internet search systems, and numeric data forms a large part of most databases. Disk access costs can be reduced by compression, if the cost of retrieving a compressed representation from disk and the CPU cost of decoding such a representation are less than those of retrieving uncompressed data. In this paper we show experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare different approaches to compressing integers, including the Elias gamma and delta codes, Golomb coding, and a variable-byte integer scheme. As a conclusion, we recommend that, for fast access to integers, files be stored compressed.
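As an illustration of the simplest of the compared schemes, here is a hedged sketch of a variable-byte integer coder using one common convention (7 data bits per byte, high bit marking the final byte); the paper's exact byte layout may differ.

```python
# Variable-byte coding sketch: 7 data bits per byte, high bit set on the
# final byte of each code. One convention among several in use.

def vbyte_encode(n: int) -> bytes:
    out = []
    while n >= 0x80:
        out.append(n & 0x7F)        # low 7 bits, continuation implied by clear flag
        n >>= 7
    out.append(n | 0x80)            # high bit set marks the last byte
    return bytes(out)

def vbyte_decode(data: bytes) -> int:
    n, shift = 0, 0
    for b in data:
        if b & 0x80:                # final byte
            return n | ((b & 0x7F) << shift)
        n |= b << shift
        shift += 7
    raise ValueError("truncated code")

assert vbyte_decode(vbyte_encode(123456)) == 123456
```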

248 citations


Journal ArticleDOI
TL;DR: The central idea of the MDL (Minimum Description Length) principle is to represent a class of models (hypotheses) by a universal model capable of imitating the behavior of any model in the class.
Abstract: The central idea of the MDL (Minimum Description Length) principle is to represent a class of models (hypotheses) by a universal model capable of imitating the behavior of any model in the class. The principle calls for a model class whose representative assigns the largest probability or density to the observed data. Two examples of universal models for parametric classes M are the normalized maximum likelihood (NML) model
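For reference, the NML model for a parametric class with maximum-likelihood estimator is conventionally defined as follows; this is the standard formulation, not text recovered from the truncated abstract.

```latex
% Standard definition of the normalized maximum likelihood (NML) model for a
% parametric class M = { p(x ; \theta) } with maximum likelihood estimator
% \hat{\theta}(x); the sum runs over all data sequences of the same length
% and becomes an integral in the continuous case.
p_{\mathrm{NML}}(x) \;=\;
  \frac{p\bigl(x ;\, \hat{\theta}(x)\bigr)}
       {\sum_{y} p\bigl(y ;\, \hat{\theta}(y)\bigr)}
```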

164 citations


Journal ArticleDOI
TL;DR: The Digestor system automatically converts web-based documents designed for desktop viewing into formats appropriate for handheld devices with small display screens, such as Palm-PCs, PDAs, and cellular phones.
Abstract: The Digestor system automatically converts web-based documents designed for desktop viewing into formats appropriate for handheld devices with small display screens, such as Palm-PCs, PDAs, and cellular phones. Digestor employs a heuristic planning algorithm and a set of structural page transformations to produce the ‘best’ looking document for a given display size. Digestor can also be instructed, via a scripting language, to render portions of documents, thereby avoiding navigation through many screens of information. Two versions of Digestor have been deployed, one that re-authors HTML into HTML for conventional browsers and one that converts HTML into HDML for Phone.com’s micro-browsers. Digestor provides a crucial technology for rapidly accessing, scanning and processing information from arbitrary web-based documents from any location reachable by wired or unwired communication.

156 citations


Journal ArticleDOI
TL;DR: Four efficient heuristics to predict the location of a mobile user in the cellular mobile environment are presented; they assume a hierarchy of location areas, which might change dynamically with changing traffic patterns.
Abstract: We present four efficient heuristics (one basic scheme and three of its variants) to predict the location of a mobile user in the cellular mobile environment. The proposed location management schemes assume a hierarchy of location areas, which might change dynamically with changing traffic patterns. A method to compute this hierarchical tree is also proposed. Depending on the profile of the user movements for the last time units, the most probable (and the future probable) location areas are computed for the user in the basic scheme and its first variant. The second variant predicts the location probabilities of the user in the future cells, combining them with those already traversed in the last time units to form the most probable location area. The third variant is a hybrid of the first and second variants. Finally, the proposed heuristics are validated by extensive simulation of a real-time cellular mobile system, where all four schemes are compared under various traffic patterns.

61 citations


Journal ArticleDOI
TL;DR: This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resource-bounded complexity, and also considers a new type of complexity, statistical complexity, closely related to mathematical statistics.
Abstract: This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resource-bounded complexity. We also consider a new type of complexity, statistical complexity, which is closely related to mathematical statistics. Unlike the other discoverers of algorithmic complexity, A. N. Kolmogorov’s leading motive was to develop on its basis a mathematical theory that more adequately substantiates the applications of probability theory, mathematical statistics and information theory. Kolmogorov wanted to deduce the properties of a random object from its complexity characteristics without using the notion of probability. In the first part of this paper we present several results in this direction. Though the subsequent development of algorithmic complexity and randomness took a different course, algorithmic complexity has found successful applications in the traditional probabilistic framework. The second part of the paper is a survey of applications to parameter estimation and the definition of Bernoulli sequences. All considerations have a finite combinatorial character.

60 citations


Journal ArticleDOI
TL;DR: A number of different scenarios and applications within which a redistribution of shares in a secret sharing scheme might be required are described, some techniques for conducting a redistribution are given, and the optimisation of the efficiency of such a process is discussed.
Abstract: We consider the problem of redistributing shares in a secret sharing scheme in such a way that shareholders of a scheme with one access structure can transfer information to a new set of shareholders, resulting in a sharing of the old secret among a new access structure. We describe a number of different scenarios and applications within which such a redistribution might be required, give some techniques for conducting a redistribution, and discuss the optimisation of the efficiency of such a process.

45 citations


Journal ArticleDOI
TL;DR: A few simple observations are pointed out that are worth keeping in mind while discussing the connection between Kolmogorov (algorithmic) complexity and the minimum description length (minimum message length) principle.
Abstract: The question of why and how probability theory can be applied to real-world phenomena has been discussed for several centuries. When algorithmic information theory was created, it became possible to discuss these problems in a more specific way. In particular, Li and Vitányi [6], Rissanen [3], Wallace and Dowe [7] have discussed the connection between Kolmogorov (algorithmic) complexity and the minimum description length (minimum message length) principle. In this note we try to point out a few simple observations that (we believe) are worth keeping in mind while discussing these topics.

39 citations


Journal ArticleDOI
TL;DR: Simulation results show that the DDCR can significantly improve system performance under different workloads and workload distributions, and that its performance is consistently better than the base protocol and the Opt protocols in both main-memory-resident and disk-resident DRTDBS.
Abstract: In a distributed real-time database system (DRTDBS), a commit protocol is required to ensure transaction failure atomicity. If data conflicts occur between executing and committing transactions, the performance of the system may be greatly affected. In this paper, we propose a new protocol, called deadline-driven conflict resolution (DDCR), which integrates concurrency control and transaction commitment management for resolving executing and committing data conflicts amongst firm real-time transactions. With the DDCR, a higher degree of concurrency can be achieved, as many data conflicts of this kind can be alleviated, and executing transactions can access data items which are being held by committing transactions in conflicting modes. Also, the impact on other transactions of temporary failures which occur during the commitment of a transaction, and of the dependencies created by sharing data items, is much reduced by reversing the dependencies between the transactions. A simulation model has been developed and extensive simulation experiments have been performed to compare the performance of the DDCR with other protocols such as the Opt [1], the Healthy-Opt [2], and the base protocol, which use priority inheritance and blocking to resolve data conflicts. The simulation results show that the DDCR can significantly improve system performance under different workloads and workload distributions. Its performance is consistently better than the base protocol and the Opt protocols in both main-memory-resident and disk-resident DRTDBS.

36 citations


Journal ArticleDOI
TL;DR: The design aim is to produce a tool which makes step-by-step proof calculations so straightforward that novices can learn by exploring the use of a pre-encoded logic.
Abstract: Jape is a program which supports the step-by-step interactive development of proofs in formal logics, in the style of proofs-on-paper. It is uncommitted to any particular logic and is customized by a description of a collection of inference rules and the syntax of judgements. It works purely at the surface syntactic level, as a person working on paper might. In that spirit it makes use of explicit visible provisos rather than a conventional encoding of logical properties. Its principal mechanism is unification, employed as a search tool to defer decisions about how to proceed at difficult points in a proof. The design aim is to produce a tool which makes step-by-step proof calculations so straightforward that novices can learn by exploring the use of a pre-encoded logic. Examples of proof development are given in several small logics.

Journal ArticleDOI
TL;DR: Rissanen’s scheme of ‘complete coding’ in which a two-part data code is further shortened by conditioning the second part not only on the estimates, but also on the fact that these estimates were preferred to any others is discussed.
Abstract: We discuss Rissanen’s scheme of ‘complete coding’ in which a two-part data code is further shortened by conditioning the second part not only on the estimates, but also on the fact that these estimates were preferred to any others. We show that the scheme does not lead to improved estimates of parameters. The resulting message lengths may validly be employed to select among competing model classes in a global hypothesis space, but not to select a single member of the chosen class. A related coding scheme is introduced in which the message commences by encoding an ancillary statistic, and then states parameter estimates using a code conditioned on this statistic. The use of Jeffreys priors in MDL codes is questioned and the resulting normalization difficulties and violations of the likelihood principle are discussed. We argue that the MDL objective of avoiding Bayesian priors may be better pursued by other means.

Journal ArticleDOI
TL;DR: The results of the project show that the method provides an effective way to model the business processes rigorously and yet the business users still have the control and flexibility to handle exceptions.
Abstract: A modelling approach for handling business rules and exceptions in the development of information systems (IS) is presented; it enables business rules to be captured rigorously while still allowing exceptions to be handled. The approach is developed by extending workflow modelling with Norm Analysis. Workflow modelling is one of the methods of object-oriented information engineering (OOIE), whereas Norm Analysis is one of the methods of a semiotic approach towards IS. Workflow modelling is concerned with the behavioural perspective of IS and describes the sequence of events of a business process. Norm Analysis, for its part, identifies responsibilities and rules that govern human behaviour. It also recognizes conditions and constraints of the actions driven by those responsibilities. The paper describes the base methods briefly and illustrates the application of these two methods on their own. The added value of extending workflow modelling with Norm Analysis is then identified, and the extended method is presented. The extended method is applied in a case study of an equipment servicing company for analysing the work processes and designing a computer support system. The results of the project show that the method provides an effective way to model the business processes rigorously while the business users still have the control and flexibility to handle exceptions.

Journal ArticleDOI
TL;DR: This paper presents a mobile database system model that takes into account the timing requirements of applications supported by mobile computing systems and provides a transaction execution model with two alternative execution strategies for mobile transactions.
Abstract: A critical issue in mobile data management is to respond to real-time data access requirements of the supported application. However, it is difficult to handle real-time constraints in a mobile computing environment due to the physical constraints imposed by the mobile computer hardware and the wireless network technology. In this paper, we present a mobile database system model that takes into account the timing requirements of applications supported by mobile computing systems. We provide a transaction execution model with two alternative execution strategies for mobile transactions and evaluate the performance of the system considering various mobile system characteristics, such as the number of mobile hosts in the system, the handoff process, disconnection, coordinator site relocation and wireless link failure. Performance results are provided in terms of the fraction of real-time requirements that are satisfied.

Journal ArticleDOI
TL;DR: This special issue contains both material on non-computable aspects of Kolmogorov complexity and material on many fascinating applications based on different ways of approximating Kolmogorov complexity.
Abstract: 1. UNIVERSALITY. The theory of Kolmogorov complexity is based on the discovery, by Alan Turing in 1936, of the universal Turing machine. After proposing the Turing machine as an explanation of the notion of a computing machine, Turing found that there exists one Turing machine which can simulate any other Turing machine. Complexity, according to Kolmogorov, can be measured by the length of the shortest program for a universal Turing machine that correctly reproduces the observed data. It has been shown that, although there are many universal Turing machines (and therefore many possible ‘shortest’ programs), the corresponding complexities differ by at most an additive constant. The main thrust of the theory of Kolmogorov complexity is its ‘universality’; it strives to construct universal learning methods based on universal coding methods. This approach was originated by Solomonoff and made more appealing to mathematicians by Kolmogorov. Typically these universal methods will be computable only in some weak sense. In applications, therefore, we can only hope to approximate Kolmogorov complexity and related notions (such as randomness deficiency and algorithmic information mentioned below). This special issue contains both material on non-computable aspects of Kolmogorov complexity and material on many fascinating applications based on different ways of approximating Kolmogorov complexity. 2. BEGINNINGS. As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903–1987). The motivations behind their work were completely different; Solomonoff was interested in inductive inference and artificial intelligence and Kolmogorov was interested in the foundations of probability theory and, also, of information theory. They arrived, nevertheless, at the same mathematical notion, which is now known as Kolmogorov complexity. In 1964 Solomonoff published his model of inductive inference. He argued that any inference problem can be presented as a problem of extrapolating a very long sequence of binary symbols; ‘given a very long sequence, represented by T, what is the probability that it will be followed by a
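The ‘additive constant’ statement is the invariance theorem; in standard notation (a restatement, not a quotation from the editorial):

```latex
% Invariance theorem: for any two universal Turing machines U and V there is
% a constant c_{U,V}, independent of x, such that
\bigl| K_U(x) - K_V(x) \bigr| \;\le\; c_{U,V} \qquad \text{for all strings } x .
```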

Journal ArticleDOI
TL;DR: The idea of a combinatorial Gray code is formalized by a twisted lexico tree, obtained from the lexicographic tree for the given set of combinatorial objects by twisting branches depending on the parity of the nodes; the mechanisms of tree traversal and of computing the changing places are new in this paper.
Abstract: There are algorithms for generating combinatorial objects such as combinations, permutations and wellformed parenthesis strings in O(1) time per object in the worst case. Those known algorithms are designed based on the intrinsic nature of each problem, causing difficulty in applying a method in one area to the other. On the other hand, there are many results on combinatorial generation with minimal change order, in which just a few changes, one or two, are allowed from object to object. These results are classified in a general framework of combinatorial Gray code, many of which are based on a recursive algorithm, causing O(n) time from object to object. To derive O(1) time algorithms for combinatorial generation systematically, we formalize the idea of combinatorial Gray code by a twisted lexico tree, which is obtained from the lexicographic tree for the given set of combinatorial objects by twisting branches depending on the parity of the nodes. An iterative algorithm which traverses this tree will generate the given set of combinatorial objects in O(1) time as well as with a fixed number of changes from the present combinatorial object to the next. Although the idea of twisted lexico tree is not new, the mechanism of tree traversal and computation of changing places are new in this paper. As examples of this approach, we present new algorithms for generating well-formed parenthesis strings and combinations in O(1) time per object. The generation of combinations is done “in-place”, that is, taking O(n) space to generate combinations of n elements out of r elements. Previous algorithms take O(r) space to represent a combination by a binary vector of size r.
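For concreteness, the minimal-change orders such generators produce can be illustrated with the classic revolving-door recursion for combinations. This sketch is not the paper's twisted-lexico-tree traversal; in particular it takes more than O(1) time per object and serves only to show the kind of Gray-code order being generated.

```python
# Revolving-door Gray code for combinations: consecutive k-subsets of
# {1, ..., n} differ by removing one element and adding another.
# Illustrative recursion only, not the paper's O(1)-per-object algorithm.

def revolving_door(n: int, k: int):
    """Return all k-subsets of {1, ..., n} so that consecutive subsets
    differ by exactly one element swap."""
    if k == 0:
        return [[]]
    if k == n:
        return [list(range(1, n + 1))]
    first = revolving_door(n - 1, k)                                   # subsets without n
    second = [s + [n] for s in reversed(revolving_door(n - 1, k - 1))] # subsets with n
    return first + second

for subset in revolving_door(5, 3):
    print(subset)
```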

Journal ArticleDOI
TL;DR: This work presents two general solutions for unordered data and shows how machines can be constructed to summarize sequential and unordered data in optimum ways.
Abstract: Problems in probabilistic induction are of two general kinds. In the first, we have a linearly ordered sequence of symbols that must be extrapolated. In the second we want to extrapolate an unordered set of finite strings. A very general formal solution to the first kind of problem is well known and much work has been done in obtaining good approximations to it [LI 93, Ris 78, Ris 89, Sol 64a, Sol 64b, Sol 78, Wal 68]. Though the second kind of problem is of much practical importance, no general solution has been published. We present two general solutions for unordered data. We also show how machines can be constructed to summarize sequential and unordered data in optimum ways.

Journal ArticleDOI
TL;DR: A review of existing research is given with an identification of some open problems and a taxonomy of various possible partitioning schemes and a unified view of the vertical partitioning problem are presented.
Abstract: In this paper, some interesting issues related to vertical partitioning in object oriented database systems are presented. A review of existing research is given with an identification of some open problems. A taxonomy of various possible partitioning schemes and a unified view of the vertical partitioning problem are also presented. Existing vertical partitioning algorithms have been studied for their use in both parallel and distributed object-oriented databases.

Journal ArticleDOI
TL;DR: This paper addresses the problem of building reliable computing programs over remote procedure call (RPC) systems by using replication and transaction techniques by establishing the computational model: the RPC transactions.
Abstract: This paper addresses the problem of building reliable computing programs over remote procedure call (RPC) systems by using replication and transaction techniques. We first establish the computational model: the RPC transactions. Based on this RPC transaction model, we present the design of our system for managing RPC transactions in the replicated-server environment. Finally, we present some results of a correctness study on the system and two examples of the system.

Journal ArticleDOI
TL;DR: A natural significance test is shown to be rarely fooled by apparent similarities between two sequences that are merely typical of all or most members of the population, even unrelated members.
Abstract: A population of sequences is called non-random if there is a statistical model and an associated compression algorithm that allow members of the population to be compressed, on average. Any available statistical model of the population should be incorporated into algorithms for alignment of the sequences, and doing so in general changes the rank order of possible alignments. The model should also be used in deciding whether a resulting approximate match between two sequences is significant or not. It is shown how to do this for two plausible interpretations involving pairs of sequences that might or might not be related. Efficient alignment algorithms are described for quite general statistical models of sequences. The new alignment algorithms are more sensitive to what might be termed 'features' of the sequences. A natural significance test is shown to be rarely fooled by apparent similarities between two sequences that are merely typical of all or most members of the population, even unrelated members.
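A hedged sketch of the underlying idea that alignment can be driven by a statistical model: a standard global-alignment dynamic program in which match, mismatch and gap scores are toy log-odds values against a background model. The scoring constants and function name are invented for illustration; the paper's algorithms handle far more general models of the sequence population.

```python
# Global alignment (Needleman-Wunsch style) with model-derived scores.
# The constants below are toy log-odds values, not taken from the paper.
import math

MATCH, MISMATCH, GAP = math.log(4), math.log(0.5), math.log(0.25)

def align_score(a: str, b: str) -> float:
    """Best global alignment score of a and b under the toy scoring model."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + GAP
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + GAP
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # substitute / match
                           dp[i - 1][j] + GAP,       # gap in b
                           dp[i][j - 1] + GAP)       # gap in a
    return dp[-1][-1]

print(align_score("ACGT", "AGT"))
```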

Journal ArticleDOI
TL;DR: A new inductive principle, which is a natural generalization of Rissanen’s minimum description length (MDL) principle and Wallace's minimum message length (MML) principle, based on the notion of predictive complexity, a recentgeneralization of Kolmogorov complexity.
Abstract: We propose a new inductive principle, which we call the complexity approximation principle (CAP). This principle is a natural generalization of Rissanen’s minimum description length (MDL) principle and Wallace’s minimum message length (MML) principle and is based on the notion of predictive complexity, a recent generalization of Kolmogorov complexity. Like the MDL principle, CAP can be regarded as an implementation of Occam’s razor.

Journal ArticleDOI
TL;DR: Two new algorithms for computing the reliability of a distributed computing system with imperfect nodes are proposed based on a symbolic approach and a general factoring technique on both nodes and edges.
Abstract: The reliability of a distributed computing system depends on the reliability of its communication links and nodes and on the distribution of its resources, such as programs and data files. Many algorithms have been proposed for computing the reliability of distributed computing systems, but they have been applied mostly to distributed computing systems with perfect nodes. However, in real problems, nodes as well as links may fail. This paper proposes two new algorithms for computing the reliability of a distributed computing system with imperfect nodes. Algorithm I is based on a symbolic approach that includes two passes of computation. Algorithm II employs a general factoring technique on both nodes and edges. Comparisons with existing methods show the usefulness of the proposed algorithms for computing the reliability of large distributed computing systems.
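To make the computed quantity concrete, here is a brute-force baseline (a sketch of our own, exponential in system size, and not the paper's Algorithm I or II): the reliability of a small system with imperfect nodes and links, taken as the probability that a source node can still reach a target node. The symbolic and factoring algorithms in the paper compute the same quantity far more efficiently.

```python
# Exact reliability by enumerating all up/down states of imperfect nodes and
# links, summing the probability of states in which s can reach t.
from itertools import product

def reliability(nodes, edges, s, t):
    """nodes: {name: up_prob}; edges: {(u, v): up_prob} (undirected)."""
    names, links = list(nodes), list(edges)
    total = 0.0
    for node_up in product([True, False], repeat=len(names)):
        for edge_up in product([True, False], repeat=len(links)):
            p = 1.0
            for name, up in zip(names, node_up):
                p *= nodes[name] if up else 1 - nodes[name]
            for link, up in zip(links, edge_up):
                p *= edges[link] if up else 1 - edges[link]
            alive = {n for n, up in zip(names, node_up) if up}
            adj = {n: set() for n in alive}
            for (u, v), up in zip(links, edge_up):
                if up and u in alive and v in alive:
                    adj[u].add(v); adj[v].add(u)
            seen, stack = set(), [s] if s in alive else []
            while stack:                      # reachability over surviving parts
                x = stack.pop()
                if x not in seen:
                    seen.add(x)
                    stack.extend(adj[x] - seen)
            if t in seen:
                total += p
    return total

print(reliability({"A": 0.9, "B": 0.9, "C": 0.9},
                  {("A", "B"): 0.95, ("B", "C"): 0.95, ("A", "C"): 0.9},
                  "A", "C"))
```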

Journal ArticleDOI
TL;DR: This study argues that hypergraph topologies, where a channel connects any number of nodes, thus providing total bypasses within a dimension, represent potential candidates for future high-performance networks.
Abstract: Low-dimensional k-ary n-cubes have been popular in recent multicomputers. These networks, however, suffer from high switching delays due to their high message distance. To overcome this problem, Dally has proposed a variation, called express k-ary n-cubes, with express channels that allow non-local messages to partially bypass clusters of nodes within a dimension. K-ary n-cubes are graph topologies where a channel connects exactly two nodes. This study argues that hypergraph topologies, where a channel connects any number of nodes, thus providing total bypasses within a dimension, represent potential candidates for future high-performance networks. It presents a comparative analysis of a regular hypergraph, referred to as the distributed crossbar switch hypermesh (DCSH), and the express k-ary n-cube. The analysis considers channel bandwidth constraints which apply in different implementation technologies. The results conclude that the DCSH’s total bypass strategy yields superior performance characteristics to the partial bypassing of its express cube counterpart.

Journal ArticleDOI
TL;DR: A new algorithm for division in residue number system, which can be applied to any moduli set, is introduced, and simulation results indicated that the algorithm is faster than the most competitive published work.
Abstract: In this paper we introduce a new algorithm for division in the residue number system, which can be applied to any moduli set. Simulation results indicated that the algorithm is faster than the most competitive published work. To further improve this speed, we customize this algorithm to serve two specific moduli sets: (2^k, 2^k − 1, 2^(k−1) − 1) and (2^k + 1, 2^k, 2^k − 1). The customization results in eliminating memory devices (ROMs), thus increasing the speed of operation. A semi-custom VLSI design for this algorithm for the moduli (2^k + 1, 2^k, 2^k − 1) has been implemented, fabricated and tested.
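As background only (this is not the division algorithm of the paper), a residue number system over the moduli set (2^k + 1, 2^k, 2^k − 1) represents an integer by its residues and recovers it with the Chinese Remainder Theorem; a minimal sketch, assuming Python 3.8+ for pow(..., -1, m) and math.prod:

```python
# Residue number system representation and CRT reconstruction for the
# pairwise-coprime moduli set (2^k + 1, 2^k, 2^k - 1).
from math import prod

def to_rns(x, moduli):
    return [x % m for m in moduli]

def from_rns(residues, moduli):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m) is the modular inverse
    return x % M

k = 4
moduli = [2**k + 1, 2**k, 2**k - 1]    # (17, 16, 15), pairwise coprime
x = 1234
print(to_rns(x, moduli))                      # residues of 1234
print(from_rns(to_rns(x, moduli), moduli))    # 1234
```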

Journal ArticleDOI
TL;DR: This paper describes a much more powerful recursion removal and introduction operation which describes its source and target in the form of an action system (a collection of labels and calls to labels).
Abstract: The transformation of a recursive program to an iterative equivalent is a fundamental operation in Computer Science. In the reverse direction, the task of reverse engineering (analysing a given program in order to determine its specification) can be greatly ameliorated if the program can be re-expressed in a suitable recursive form. But the existing recursion removal transformations, such as the techniques discussed by Knuth [1] and Bird [2], can only be applied in the reverse direction if the source program happens to match the structure produced by a particular recursion removal operation. In this paper we describe a much more powerful recursion removal and introduction operation which describes its source and target in the form of an action system (a collection of labels and calls to labels). A simple, mechanical, restructuring operation can be applied to a great many iterative programs which will put them in a suitable form for recursion

Journal ArticleDOI
TL;DR: The Dijkstra/Scholten calculational logic is adapted to partial functions in a way that preserves the fixed point rule.
Abstract: We adapt the Dijkstra/Scholten calculational logic to partial functions in a way that preserves the fixed point rule.

Journal ArticleDOI
TL;DR: This paper identifies some typical verification problems in process control specifications, such as reachability, termination, freedom of deadlock and livelock, and determines their computational complexity, to provide computational lower bounds for any technique using these concepts for process control.
Abstract: Many techniques in many diverse areas in computer science, such as process modelling, process programming, decision support systems and workflow systems, use concepts for the specification of process control. These typically include concepts for the specification of sequential execution, parallelism, synchronization and moments of choice. This paper identifies some typical verification problems in process control specifications, such as reachability, termination, freedom of deadlock and livelock, and determines their computational complexity. These results then provide computational lower bounds for any technique using these concepts for process control.

Journal ArticleDOI
TL;DR: It is shown that the incompressibility method is particularly suited to obtain average-case computational complexity lower bounds, which have been difficult to obtain in the past by other methods.
Abstract: The incompressibility method is an elementary yet powerful proof technique based on Kolmogorov complexity [13]. We show that it is particularly suited to obtain average-case computational complexity lower bounds. Such lower bounds have been difficult to obtain in the past by other methods. In this paper we present four new results and also give four new proofs of known results to demonstrate the power and elegance of the new method.
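The elementary fact behind the method is the counting argument that incompressible strings exist; in standard notation (not quoted from the paper):

```latex
% There are 2^n strings of length n but only 2^n - 1 descriptions shorter
% than n bits, so for every n at least one string x of length n satisfies
C(x) \;\ge\; |x| \;=\; n ,
% and, more generally, at least 2^n - 2^{n-c} + 1 strings of length n
% satisfy C(x) \ge n - c.
```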


Journal ArticleDOI
TL;DR: This paper presents a framework for network-layer multicasting while keeping a low overhead in adapting multicast routes to mobile host locations by partitioning the mobile environment into non-overlapping regions, so that changes in the multicasting routes due to host intra-region movements are hidden from other regions.
Abstract: A mobile computing environment allows hosts to roam while retaining access to the Internet. Multicasting is one of the most important facilities for constructing reliable distributed systems and cooperative applications. Host mobility, however, challenges multicasting in this environment: the established multicast delivery paths may frequently restructure along with host migrations, incurring expensive overheads. This paper presents a framework for network-layer multicasting while keeping a low overhead in adapting multicast routes to mobile host locations. This is achieved by partitioning the mobile environment into non-overlapping regions, so that changes in the multicast routes due to host intra-region movements are hidden from other regions. An analytical model was developed for performance evaluation. It shows that, compared with the best known proposal, our scheme reduces the average multicast latency by more than 66%, while causing less than 7% overhead.