
Showing papers in "The Computer Journal in 1996"


Journal ArticleDOI
TL;DR: This paper investigates the fundamental operation of the block sorting algorithm and presents some improvements based on that analysis, which relates the compression to the proportion of zeros after the MTF stage.
Abstract: A recent development in text compression is a 'block sorting' algorithm which permutes the input text according to a special sort procedure and then processes the permuted text with Move-To-Front (MTF) and a final statistical compressor. The technique combines good speed with excellent compression performance. This paper investigates the fundamental operation of the algorithm and presents some improvements based on that analysis. Although block sorting is clearly related to previous compression techniques, it appears that it is best described by techniques derived from work by Shannon on the prediction and entropy of English text. A simple model is developed which relates the compression to the proportion of zeros after the MTF stage.

130 citations
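The MTF stage the abstract refers to is easy to reproduce. The following sketch (not the authors' code) pairs a naive rotation-sort transform with Move-To-Front coding and measures the proportion of zeros that the paper's model is built around; the `bwt` helper and the sample text are illustrative only.

```python
# Illustrative sketch only: a naive block-sorting transform followed by
# Move-To-Front coding, used to measure the proportion of zeros after MTF.
# The quadratic rotation sort is for demonstration, not production use.

def bwt(text: str) -> str:
    """Return the last column of the sorted rotations of text + sentinel."""
    s = text + "\0"                      # sentinel guarantees unique rotations
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def move_to_front(data: str) -> list[int]:
    """Encode each symbol as its current position in a self-organizing list."""
    alphabet = sorted(set(data))
    out = []
    for ch in data:
        idx = alphabet.index(ch)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))  # move the symbol to the front
    return out

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 20
    codes = move_to_front(bwt(sample))
    zero_ratio = codes.count(0) / len(codes)
    print(f"proportion of zeros after MTF: {zero_ratio:.3f}")
```

On highly repetitive input the zero proportion is typically large, which is what the final statistical compressor exploits.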


Journal ArticleDOI
TL;DR: An infinite family of efficient and practical algorithms for generating order preserving minimal perfect hash functions is presented; it is shown that almost all members of the family construct space and time optimal order preserving minimal perfect hash functions, and the one with minimum constants is identified.
Abstract: Minimal perfect hash functions are used for memory efficient storage and fast retrieval of items from static sets. We present an infinite family of efficient and practical algorithms for generating order preserving minimal perfect hash functions. We show that almost all members of the family construct space and time optimal order preserving minimal perfect hash functions, and we identify the one with minimum constants. Members of the family generate a hash function in two steps. First a special kind of function into an r-graph is computed probabilistically. Then this function is refined deterministically to a minimal perfect hash function. We give strong theoretical evidence that the first step uses linear random time. The second step runs in linear deterministic time. The family not only has theoretical importance, but also offers the fastest known method for generating perfect hash functions.

105 citations
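The abstract's two-step construction (a probabilistic mapping into an r-graph followed by a deterministic refinement) can be illustrated for r = 2 with a CHM-style sketch. The constant c, the salted hash functions and the retry limit below are illustrative assumptions, not the paper's choices.

```python
# Hedged sketch of a CHM-style order-preserving minimal perfect hash function:
# step 1 probabilistically maps keys to edges of a graph, step 2 deterministically
# assigns vertex values so each key hashes to its own index.
import random

def _h(key, seed, m):
    """Salted hash into [0, m); a stand-in for the paper's random functions."""
    return hash((seed, key)) % m

def build_mphf(keys, c=2.1, max_tries=100):
    """Return (seed1, seed2, g, m) defining an order-preserving MPHF for keys."""
    n = len(keys)
    m = max(int(c * n) + 1, n + 2)             # enough vertices for acyclicity
    for _ in range(max_tries):
        s1, s2 = random.getrandbits(32), random.getrandbits(32)
        edges = [(_h(k, s1, m), _h(k, s2, m)) for k in keys]
        if any(u == v for u, v in edges):       # self-loops force a retry
            continue
        adj = {v: [] for v in range(m)}
        for i, (u, v) in enumerate(edges):
            adj[u].append((v, i))
            adj[v].append((u, i))
        g = _assign(adj, n, m)
        if g is not None:
            return s1, s2, g, m
    raise RuntimeError("no suitable graph found; try a larger c")

def _assign(adj, n, m):
    """Deterministic step: choose g so that g[u] + g[v] = key index (mod n)."""
    g = [None] * m
    for root in range(m):
        if g[root] is not None or not adj[root]:
            continue
        g[root] = 0
        stack = [root]
        while stack:
            u = stack.pop()
            for v, i in adj[u]:
                if g[v] is None:
                    g[v] = (i - g[u]) % n       # key i must hash to index i
                    stack.append(v)
                elif (g[u] + g[v]) % n != i:
                    return None                 # conflicting cycle: retry
    return g

def lookup(key, s1, s2, g, m, n):
    return (g[_h(key, s1, m)] + g[_h(key, s2, m)]) % n

keys = ["apple", "banana", "cherry", "date", "elderberry"]
s1, s2, g, m = build_mphf(keys)
print([lookup(k, s1, s2, g, m, len(keys)) for k in keys])   # [0, 1, 2, 3, 4]
```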


Journal ArticleDOI
TL;DR: This paper provides a tutorial introduction to the belief network framework and highlights some issues of ongoing research in applying the framework for real-life problem solving.
Abstract: In artificial intelligence research, the belief network framework for automated reasoning with uncertainty is rapidly gaining in popularity. The framework provides a powerful formalism for representing a joint probability distribution on a set of statistical variables. In addition, it offers algorithms for efficient probabilistic inference. At present, more and more knowledge-based systems employing the framework are being developed for various domains of application ranging from probabilistic information retrieval to medical diagnosis. This paper provides a tutorial introduction to the belief network framework and highlights some issues of ongoing research in applying the framework for real-life problem solving.

92 citations
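For readers new to the framework, here is a toy example (not taken from the paper) of what a belief network encodes: a joint distribution factored into local conditional probability tables, with posterior inference done by summing over the joint. The variables and numbers are invented.

```python
# Toy illustration (not from the paper): a two-node belief network
# Flu -> Fever, represented by local conditional probability tables, and
# posterior inference P(Flu | Fever=True) by enumeration.

P_flu = {True: 0.05, False: 0.95}                       # prior P(Flu)
P_fever_given_flu = {True: {True: 0.9, False: 0.1},     # P(Fever | Flu)
                     False: {True: 0.2, False: 0.8}}

def joint(flu: bool, fever: bool) -> float:
    """Chain-rule factorization encoded by the network structure."""
    return P_flu[flu] * P_fever_given_flu[flu][fever]

def posterior_flu(fever: bool) -> float:
    """P(Flu=True | Fever=fever) by summing out the hidden variable."""
    num = joint(True, fever)
    den = joint(True, fever) + joint(False, fever)
    return num / den

print(f"P(Flu | Fever) = {posterior_flu(True):.3f}")    # about 0.191
```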


Journal ArticleDOI
TL;DR: This paper provides a mathematical definition of the Transputer Instruction Set architecture for executing Occam together with a correctness proof for a general compilation schema of Occam programs into Transputer code, making the proof applicable to a large class of compilers.
Abstract: This paper contributes to the development of a rigorous mathematical framework for the study of provably correct compilation techniques. The proposed method is developed through an implementation of a real-life non-toy imperative programming language with non-determinism and parallelism, i.e. Occam, to a commercial machine, namely the Transputer. We provide a mathematical definition of the Transputer Instruction Set architecture for executing Occam together with a correctness proof for a general compilation schema of Occam programs into Transputer code. We start from the ground model, an abstract processor, running a high and a low priority queue of Occam processes, which formalizes the semantics of Occam at the abstraction level of atomic Occam instructions. We develop here increasingly more refined levels of Transputer semantics, proving correctness (and when possible also completeness) for each refinement step. Along the way we collect our proof assumptions, a set of natural conditions for a compiler to be correct, thus making our proof applicable to a large class of compilers. As a by-product our construction provides a challenging realistic case study for proof verification by theorem provers.

86 citations


Journal ArticleDOI
TL;DR: This article develops a strategy to cope with the problem of users being overwhelmed when formulating ad hoc queries, based on ideas from the information retrieval world, in particular the query by navigation mechanism and the stratified hypermedia architecture.
Abstract: Query formulation in the context of large conceptual schemata is known to be a hard problem. When formulating ad hoc queries users may become overwhelmed by the vast amount of information that is stored in the information system, leading to a feeling of being 'lost in conceptual space'. In this article we develop a strategy to cope with this problem. This strategy is based on ideas from the information retrieval world, in particular the query by navigation mechanism and the stratified hypermedia architecture. The stratified hypermedia architecture is used to describe the information contained in the information system on multiple levels of abstraction. When using our approach to the formulation of queries, a user will first formulate a number of simple queries corresponding to linear paths through the information structure. The formulation of the linear paths is the result of the explorative phase of query formulation. Once users have specified a number of these linear paths, they may combine them to form more complex queries. This last process is referred to as query by construction and corresponds to the constructive phase of the query formulation process.

69 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel protocol to achieve non-repudiation of receipt by using two simple ideas, a conditional signature and a public notice-board, and shows how this security service can be achieved in a simple but effective manner.
Abstract: A designer of secure Electronic Data Interchange (EDI) or other emerging electronic commerce systems must consider an EDI-specific threat-repudiation of receipt-in addition to general security services such as authentication, confidentiality and integrity. We address this security issue by first examining the previous work done in this area, and then proposing a novel protocol to achieve non-repudiation of receipt. By using two simple ideas, a conditional signature and a public notice-board, the new protocol can achieve this security service in a simple but effective manner.

56 citations


Journal ArticleDOI
Xindong Wu
TL;DR: A new discretization method, called the Bayesian discretizer, is designed and its performance is compared with the information gain methods implemented in C4.5 and HCV (Version 2.0).
Abstract: Discretization of real-valued attributes into nominal intervals has been an important area for symbolic induction systems because many real world classification tasks involve both symbolic and numerical attributes. Among various supervised and unsupervised discretization methods, the information gain-based methods have been widely used and cited. This paper designs a new discretization method, called the Bayesian discretizer, and compares its performance with the information gain methods implemented in C4.5 and HCV (Version 2.0). Over the seven datasets tested, the Bayesian discretizer has the best results on four of them in terms of predictive accuracy.

51 citations
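The abstract does not reproduce the Bayesian discretizer itself, so the sketch below only illustrates the information gain-based baseline it is compared against: choosing the cut point on a numeric attribute that minimizes the weighted class entropy of the resulting intervals, in the spirit of C4.5. The data and helper names are hypothetical.

```python
# Illustrative sketch of information gain-based discretization (the baseline
# the Bayesian discretizer is compared against): pick the cut point on a
# numeric attribute that minimizes the weighted class entropy of the two bins.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_cut(values, labels):
    """Return the threshold with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # no cut between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = cut, gain
    return best, best_gain

# Hypothetical data: attribute values with symbolic class labels.
vals = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
labs = ["low", "low", "low", "high", "high", "high"]
print(best_cut(vals, labs))                   # cut at 4.0 with gain 1.0
```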


Journal ArticleDOI
TL;DR: This work considers linear interval routing schemes studied by [3,5] from a graph-theoretic perspective and establishes an Ω(n^{1/3}) lower bound on the minimum number of intervals needed to achieve shortest path routings in the network considered.
Abstract: We consider linear interval routing schemes studied by [3,5] from a graph-theoretic perspective. We examine how the number of linear intervals needed to obtain shortest path routings in networks is affected by the product, join and composition operations on graphs. This approach allows us to generalize some of the results of [3,5] concerning the minimum number of intervals needed to achieve shortest path routings in certain special classes of networks. We also establish an Ω(n^{1/3}) lower bound on the minimum number of intervals needed to achieve shortest path routings in the network considered.

50 citations


Journal ArticleDOI
TL;DR: A comparison is presented of five systems, two based on attribute counting and three using metrics based on structure, and it is found that the systems based on structural information consistently equal or better the performance of systems based of attribute counting metrics.
Abstract: Early automated systems for detecting plagiarism in student programs employed attribute counting techniques in their comparisons of program texts, while more recent systems use encoded structural information. Whales claims that the latter are more effective in their detection of plagiarisms than systems based on attribute counting. To explore the validity of these claims, a comparison is presented of five systems, two based on attribute counting and three using metrics based on structure. The major result of this study is that the systems based on structural information consistently equal or better the performance of systems based on attribute counting metrics. A second conclusion is that of the structure metric systems, one using approximate tokenization of input texts (YAP) is as effective as a system that undertakes a complete parse (Plague). Approximate tokenization offers a considerable reduction in the costs of porting to new languages. A distinction is also made between forms of plagiarism common among novice programmers and those employed by more experienced programmers.

46 citations
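To make the distinction between the two families of metrics concrete, here is a toy contrast (none of the five systems studied): an attribute-counting vector compared component-wise, versus a structure metric that compares the token sequences themselves. The tokenizer and similarity measures are illustrative assumptions.

```python
# Toy contrast (none of the five systems studied) between the two metric
# families: attribute counting compares summary counts of program features,
# while a structure metric compares the token sequences themselves.
import keyword
import re
from difflib import SequenceMatcher

def tokens(src: str):
    """Very rough lexer: identifiers/keywords, or single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", src)

def attribute_vector(src: str):
    """Crude counts: total tokens, distinct identifiers, keywords, other tokens."""
    toks = tokens(src)
    idents = {t for t in toks if t.isidentifier() and not keyword.iskeyword(t)}
    kwords = [t for t in toks if keyword.iskeyword(t)]
    others = [t for t in toks if not t.isidentifier()]
    return (len(toks), len(idents), len(kwords), len(others))

def attribute_similarity(a: str, b: str) -> float:
    va, vb = attribute_vector(a), attribute_vector(b)
    return sum(min(x, y) for x, y in zip(va, vb)) / sum(max(x, y) for x, y in zip(va, vb))

def structure_similarity(a: str, b: str) -> float:
    """Similarity of the raw token sequences (a crude structure metric)."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_up(nums):\n    acc = 0\n    for n in nums:\n        acc += n\n    return acc\n"
print("attribute:", attribute_similarity(original, renamed))
print("structure:", structure_similarity(original, renamed))
```

Real structure-metric systems such as YAP and Plague tokenize more aggressively (for example, typically mapping identifiers to a common token class), which is what makes them robust to simple renaming.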


Journal ArticleDOI
TL;DR: A richer class of admission control algorithms that make acceptance/rejection decisions not only to satisfy the hardware requirements of client requests but also to optimize the reward of the system based on a performance criterion as it services clients of different priority classes are considered.
Abstract: Traditional admission control algorithms for on-demand multimedia servers concern the acceptance decisions for new clients' requests so as to guarantee that continuous services to all clients are executed. These algorithms determine whether a new client can be accepted, based on the consideration whether the underlying hardware can satisfy the quality of service (QoS) requirements of admitted client requests. In this paper, we consider a richer class of admission control algorithms that make acceptance/rejection decisions not only to satisfy the hardware requirements of client requests but also to optimize the reward of the system based on a performance criterion as it services clients of different priority classes. We divide the server capacity into

45 citations


Journal ArticleDOI
TL;DR: An operator for the composition of two processes, where one process has priority over the other process, is studied and it is argued that this operator is adequate for modelling priorities as found in programming languages and operating systems.
Abstract: An operator for the composition of two processes, where one process has priority over the other process, is studied. Processes are described by action systems, and data refinement is used for transforming processes. The operator is shown to be compositional, i.e. monotonic with respect to refinement. It is argued that this operator is adequate for modelling priorities as found in programming languages and operating systems. Rules for introducing priorities and for raising and lowering priorities of processes are given. Dynamic priorities are modelled with special priority variables which can be freely mixed with other variables and the prioritising operator in program development. A number of applications show the use of prioritising composition for modelling and specification in general.

Journal ArticleDOI
TL;DR: It is shown that ignoring concurrent site and link failure/repair events or repair dependency can grossly overestimate the availability of replicated data.
Abstract: Pessimistic control algorithms for replicated data permit only one partition to perform update operations at any time so as to ensure mutual exclusion of the replicated data object. Existing availability modelling and analyses of pessimistic control algorithms for replicated data management are constrained to either site-failure-only or link-failure-only models, but not both, because of the large state space which needs to be considered. Moreover, the assumption of having an independent repairman for each link and each site has been made to reduce the complexity of analysis. In this paper, we remove these restrictions with the help of stochastic Petri nets. In addition to including both site and link failure/repair events in our analysis, we investigate the effect of repair dependency which occurs when many sites and links may have to share the same repairman due to repair constraints. Four repairman models are examined in the paper: (a) independent repairman with one repairman assigned to each link and each node; (b) dependent repairman with FIFO servicing discipline; (c) dependent repairman with linear-order servicing discipline; and (d) dependent repairman with best-first servicing discipline. Using dynamic voting as a case study, we compare and contrast the resulting availabilities due to the use of these four different repairman models and give a physical interpretation of the differences. We show that ignoring concurrent site and link failure/repair events or repair dependency can grossly overestimate the availability of replicated data.

Journal ArticleDOI
TL;DR: A more generalized algorithm is provided which allows a uniform and simplified solution to find all possible keys of a relational database schema when the attribute graph of Functional Dependencies (FDs) is not strongly connected.
Abstract: We provide an efficient algorithm for computing the candidate keys of a relational database schema. The algorithm exploits the 'arrangement' of attributes in the functional dependencies to determine which attributes are essential and useful for determining the keys and which attributes should not be considered. A more generalized algorithm using attribute graphs is then provided which allows a uniform and simplified solution to find all possible keys of a relational database schema when the attribute graph of Functional Dependencies (FDs) is not strongly connected.
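The abstract does not give the paper's algorithm itself; as a baseline, the standard closure-based search that such algorithms improve upon can be sketched as follows (the schema and FDs are hypothetical).

```python
# Sketch of a standard closure-based candidate-key search (the textbook
# approach the paper's algorithm improves on, not the paper's algorithm).
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    # Attributes never appearing on a right-hand side must be in every key.
    core = schema - {a for _, rhs in fds for a in rhs}
    rest = sorted(schema - core)
    keys = []
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            cand = core | set(extra)
            if closure(cand, fds) == schema and not any(k <= cand for k in keys):
                keys.append(cand)
    return keys

# Hypothetical schema R(A, B, C, D) with F = {A -> B, B -> C, C -> A}.
fds = [("A", "B"), ("B", "C"), ("C", "A")]
print(candidate_keys("ABCD", fds))   # three candidate keys: AD, BD and CD
```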

Journal ArticleDOI
TL;DR: This paper presents another algorithm which, given at most 2n - 3 faulty nodes such that the faulty nodes can be covered by n - 1 subgraphs of diameter 1, finds a fault-free path s → t of length at most d(s, t) + 4 in O(n) time.
Abstract: In this paper, we give an algorithm which, given at most n - 1 faulty nodes and non-faulty nodes s and t in the n-dimensional hypercube, H_n, finds a fault-free path s → t of length at most d(s, t) + 2 in O(n) time, where d(s, t) is the distance between s and t in H_n. Using this algorithm as a subroutine, we present another algorithm which, given at most 2n - 3 faulty nodes such that the faulty nodes can be covered by n - 1 subgraphs of diameter 1, finds a fault-free path s → t of length at most d(s, t) + 4 in O(n) time. The algorithms are optimal in the sense that both the upper bounds on the length of s → t and the time complexity are optimal.
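The O(n) algorithms themselves are not reproduced in the abstract; the sketch below is only a brute-force breadth-first search over H_n that avoids faulty nodes, which is handy for checking the d(s, t) + 2 length bound on small instances. The example fault set is made up.

```python
# Brute-force illustration only (not the paper's O(n) algorithm): breadth-first
# search in the n-dimensional hypercube H_n avoiding faulty nodes, usable for
# checking the d(s, t) + 2 bound on small examples.
from collections import deque

def neighbours(node: int, n: int):
    """Nodes differing from `node` in exactly one of the n bit positions."""
    return [node ^ (1 << i) for i in range(n)]

def fault_free_path(n: int, s: int, t: int, faulty: set[int]):
    """Shortest fault-free path s -> t in H_n, or None if disconnected."""
    queue = deque([[s]])
    seen = {s}
    while queue:
        path = queue.popleft()
        u = path[-1]
        if u == t:
            return path
        for v in neighbours(u, n):
            if v not in seen and v not in faulty:
                seen.add(v)
                queue.append(path + [v])
    return None

# Example in H_4: distance d(s, t) = bin(s ^ t).count("1").
n, s, t = 4, 0b0000, 0b1111
faulty = {0b0001, 0b0010, 0b0100}          # three faulty nodes (at most n - 1)
path = fault_free_path(n, s, t, faulty)
print(path, "length", len(path) - 1, "d(s,t)", bin(s ^ t).count("1"))
```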

Journal ArticleDOI
TL;DR: This work extends algorithms that represent the problem of minimizing the testing against a finite state automaton model as a max flow/min cost problem for an associated network by introducing the use of invertibility to utilize test sequence overlap.
Abstract: Finite state automata can be used to model a system; in particular they can be used to model the control section of a communications protocol. A number of authors have produced algorithms that represent the problem of minimizing the testing against a finite state automaton model as a max flow/min cost problem for an associated network. We extend this work by introducing the use of invertibility to utilize test sequence overlap.

Journal ArticleDOI
TL;DR: Quantitative results include parallel runtime comparisons with the decomposition approach for several images of varying complexity and an evaluation of attained compression ratios and SNR for reconstructed images; experimental results obtained on an nCUBE-2 supercomputer are presented.
Abstract: In this paper we present a model and experimental results for performing parallel fractal image compression using circulating pipeline computation and employing a new quadtree recomposition approach. A circular linear array of processors is employed and utilized in a pipelined fashion. In this approach, a modification of the scheme given by Jackson and Blom, quadtree sub-blocks of an image are iteratively recombined into larger blocks for fractal coding. For complex images, this approach exhibits superior parallel runtime performance when compared to a classical quadtree decomposition scheme for fractal image compression, while maintaining high fidelity for reconstructed images. Quantitative results include parallel runtime comparisons with the decomposition approach for several images of varying complexity, an evaluation of attained compression ratios and SNR for reconstructed images. Experimental results using an nCUBE-2 supercomputer are presented.

Journal ArticleDOI
TL;DR: It is argued that direct networks based on hypergraph topologies have characteristics which make them particularly appropriate for use in future high-performance parallel systems, and the Distributed Crossbar Switch Hypermesh is compared to multi-stage networks under both VLSI and multiple-chip technological constraints.
Abstract: In a multicomputer network, the channel bandwidth is greatly constrained by implementation technology. Two important constraints have been identified in the literature, each appropriate for a particular technology: wiring density limits dominate in VLSI and pin-out in multiple-chip implementations. Most existing interconnection networks are categorized as either direct or indirect (multi-stage) topologies, the mesh and binary n-cube being examples of the former category, the omega and banyan networks of the latter. This paper argues that direct networks based on hypergraph topologies have characteristics which make them particularly appropriate for use in future high-performance parallel systems. The authors have recently introduced a regular multidimensional hypergraph network, called the Distributed Crossbar Switch Hypermesh (DCSH), which has topological properties that permit a relaxation of bandwidth constraints, and which has important topological and performance advantages over direct graph networks. This paper compares the DCSH to multi-stage networks, under both VLSI and multiple-chip technological constraints. The results suggest that in both cases, with a realistic model which includes routing delays through intermediate nodes, the DCSH exhibits superior performance across a wide range of traffic loads.

Journal ArticleDOI
TL;DR: The experiences of developing and offering a hypermedia course in computer architecture, in which lectures were replaced by on-line courseware, using the Hyper-G system are described.
Abstract: Hypermedia technology provides both an opportunity for universities to provide a better learning experience for their students, and a way to cope with funding reductions. Second-generation hypermedia systems make it cost-effective to develop and deliver multimedia courseware, while permitting learning to occur within a community. We illustrate by describing our experiences developing and offering a hypermedia course in computer architecture, in which lectures were replaced by on-line courseware, using the Hyper-G system.

Journal ArticleDOI
Martin Ward
TL;DR: Supported by an EPSRC project: "A proof theory for program refinement and equivalence: extensions".
Abstract: Supported by an EPSRC project: "A proof theory for program refinement and equivalence: extensions".

Journal ArticleDOI
TL;DR: The outcomes of this work are both a recognition of the important relationship between the disciplines of urban planning and the design of information visualizations as well as more concrete algorithms to be used by the designers of such visualizations.
Abstract: This paper shows how previous research into navigation through urban environments, which has emerged from the discipline of urban planning, can be adapted to enhance the design of information visualizations. The paper draws on Kevin Lynch's seminal work on the legibility of urban landscapes in order to propose a set of general techniques which can be applied to the task of information visualization. It describes a specific instantiation of these techniques called LEADS, a legibility system which post-processes the output of a range of existing visualization systems in order to enhance their legibility. The paper provides four examples of the application of LEADS to different information visualizations. Following this, it discusses experimental work, the conclusions of which provide some tentative support for the likely success of this approach. The outcomes of this work are both a recognition of the important relationship between the disciplines of urban planning and the design of information visualizations as well as more concrete algorithms to be used by the designers of such visualizations.

Journal ArticleDOI
TL;DR: This paper describes window inference in terms of a sequent formulation of natural deduction, demonstrating the soundness of window inference itself and illustrating how mechanized support for window inference can be implemented using existing sequent-based theorem provers.
Abstract: This paper presents a generalization of Robinson and Staples's window inference system of hierarchical reasoning. The generalization enhances window inference so that it is capable of supporting transformational proofs. In addition, while Robinson and Staples proposed window inference as an alternative to existing styles of reasoning, this paper describes window inference in terms of a sequent formulation of natural deduction. Expressing window inference in terms of natural deduction, a style of reasoning already known to be sound, demonstrates the soundness of window inference itself. It also illustrates how mechanized support for window inference can be implemented using existing sequent-based theorem provers. The paper also examines the use of contextual assumptions with window inference. Two definitions of what may be assumed in a context are presented. The first is a general definition, while the second has a simpler form. These definitions are shown to be equivalent for contexts that do not bind variables.

Journal ArticleDOI
TL;DR: Well-known conceptual data modelling concepts, such as relationship types, generalization, specialization, collection types and constraint types, such as the total role constraint and the uniqueness constraint, are discussed from a categorical point of view.
Abstract: For successful information systems development, conceptual data modelling is essential. Nowadays many conceptual data modelling techniques exist. In-depth comparisons of concepts of these techniques are very difficult as the mathematical formalizations of these techniques, if they exist at all, are very different. Consequently, there is a need for a unifying formal framework providing a sufficiently high level of abstraction. In this paper the use of category theory for this purpose is addressed. Well-known conceptual data modelling concepts, such as relationship types, generalization, specialization, collection types and constraint types, such as the total role constraint and the uniqueness constraint, are discussed from a categorical point of view. An important advantage of this framework is its 'configurable semantics'. Features such as null values, uncertainty and temporal behavior can be added by selecting appropriate instance categories. The addition of these features usually requires a complete redesign of the formalization in traditional set-based approaches to semantics.

Journal ArticleDOI
TL;DR: By extending the popular hypercube, a family of parameterized network topologies called Gaussian Cubes (GCs) is presented and it is demonstrated that these new networks can approximate the concurrency offered by hypercubes while lowering the cost of interconnection.
Abstract: The interconnection topology of a network plays a key role in determining the performance and cost of routing messages in the system. Presently, there are many known results about individual network topologies and their applications. However, relatively few results exist about the relations among these topologies; for instance, it is rather difficult to relate quantitatively the cost of interconnection with the routing efficiency. As such, the theory is incomplete for assisting a designer to choose a suitable topology for a required routing performance under a given cost of interconnection. Here, by extending the popular hypercube, a family of parameterized network topologies called Gaussian Cubes (GCs) is presented. By varying the parameter that controls the interconnection density, the routing performance of a GC can be scaled according to the traffic loads without changing the routing algorithm. It is demonstrated that these new networks can approximate the concurrency offered by hypercubes while lowering the cost of interconnection; common communication primitives such as Unicast, Multicast and Broadcast can all be supported efficiently on GCs. Therefore, the new network topologies can be applied to distributed systems with scalable messaging performance. Because GCs are also subgraphs of the conventional hypercubes, our results may also be useful to fault-tolerant computing based on hypercubes.

Journal ArticleDOI
TL;DR: An algorithm is given which, given at most n - 1 faulty clusters of diameter at most 1 with 2n - 3 faulty nodes in total and non-faulty nodes s and t in H_n, finds a fault-free path s → t of length at most n + 2 in O(n) optimal time.
Abstract: In this paper, we study the node-to-node fault tolerant routing problem in n-dimensional hypercubes H_n based on the cluster fault tolerant model. For a graph G, a faulty cluster is a connected subgraph of G such that all its nodes are faulty. In cluster fault tolerant routing problems, how many faulty clusters and how large those clusters can be tolerated are studied. It was proved that for node-to-node routing, H_n can tolerate as many as n - 1 faulty clusters of diameter at most 1 with at most 2n - 3 faulty nodes in total. In this paper, we give an algorithm which, given at most n - 1 faulty clusters of diameter at most 1 with 2n - 3 faulty nodes in total and non-faulty nodes s and t in H_n, finds a fault-free path s → t of length at most n + 2 in O(n) optimal time. The upper bound on the length of the path is optimal when the distance between s and t is n - 2.

Journal ArticleDOI
TL;DR: A model called HyperSoft is presented, which can be used for viewing programs as hypertext, and provides for a systematic and automated way of representing programs as different kinds of dependency graphs.
Abstract: A model called HyperSoft is presented, which can be used for viewing programs as hypertext. The main goal in developing the model has been to offer a framework for new program browsing tools to support the maintenance of legacy software in particular. The model consists of four layers: source code as such, its syntactic structure, hypertextual access structures based on the source code and its syntax, and the user interface for viewing and manipulating the source code and the access structures. The access structures are based on a general relational model of program dependencies. Both the hypertextual software model and the program dependency model are language independent and provide for a systematic and automated way of representing programs as different kinds of dependency graphs. The models are implemented in a program browsing tool which analyses C programs and automatically generates relevant hypertextual representations for them, according to requests of the maintainer.

Journal ArticleDOI
TL;DR: The new logic allows one to predicate and quantify over propositional terms while according a special status to time; for example, assertions such as ‘effects cannot precede their causes’ are ensured in the logic, and some problematic temporal aspects, including the delay time between events and their effects, can be conveniently expressed.
Abstract: This paper presents a reified temporal logic for representing and reasoning about temporal and non-temporal relationships between non-temporal assertions. A clear syntax and semantics for the logic is formally provided. Three types of predicates, temporal predicates, non-temporal predicates and meta-predicates, are introduced. Terms of the proposed language are partitioned into three types, temporal terms, non-temporal terms and propositional terms. Reified propositions consist of formulae with each predicate being either a temporal predicate or a meta-predicate. Meta-predicates may take both temporal terms and propositional terms together as arguments or take propositional terms alone. A standard formula of the classical first-order language with each predicate being a non-temporal predicate taking only non-temporal terms as arguments is reified as just a propositional term. A general time ontology has been provided which can be specialized to a variety of existing temporal systems. The new logic allows one to predicate and quantify over propositional terms while according a special status to time; for example, assertions such as ‘effects cannot precede their causes’ are ensured in the logic, and some problematic temporal aspects, including the delay time between events and their effects, can be conveniently expressed. Applications of the logic are presented, including the characterization of the negation of properties and their contextual sentences, and the expression of temporal relations between actions and effects.

Journal ArticleDOI
TL;DR: This schedule is shown to be optimal in terms of time, outperforming all linear schedules; the method also includes a heuristic refinement of the free schedule which 'shuffles' computations in time, without loss of the optimal time performance, to improve mean processor utilization.
Abstract: The most important issue when parallelizing sequential programs is the efficient assignment of computations to different processing elements. The most time-consuming part of a program is usually its nested loops. Many approaches have been devoted to parallelizing nested loops and to assigning the concurrent partitions of such a loop to different processors. In the past, all methods have focused on linear schedules produced by manipulating the reduced dependence graph, which in some cases achieve near optimal solutions. This paper presents a new method of freely scheduling loop computations in time, based on task graph scheduling techniques. It will be shown that this schedule is optimal in terms of time, outperforming all linear schedules. Furthermore, in terms of the total number of processors, the presented method includes a heuristic refinement of the free schedule which 'shuffles' computations in time, without loss of the optimal time performance, to improve the mean processor utilization. In all cases, the proposed method uses fewer processors, while preserving the optimal total execution time. The 'shuffling' of computations is based on graph theory approaches and uses PERT techniques. Such scheduling is convenient for parallelizing tools (such as compilers), and is of practical interest for shared memory multiprocessor systems, where the communication delay imposed by such non-regular scheduling is not a concern.
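A much simplified illustration of the 'free schedule' idea (assuming a 2D loop nest, uniform dependence vectors, unit execution times and no communication cost, none of which are spelled out in the abstract): each iteration is started at the earliest step its dependence chain allows, which yields the familiar wavefront execution.

```python
# Simplified sketch of a free (ASAP) schedule for a 2D loop nest with uniform
# dependence vectors, unit execution times and no communication cost; each
# iteration is scheduled at the earliest step its dependences allow.
def free_schedule(n1, n2, deps):
    """Earliest start time for every iteration (i, j) of the nest."""
    time = {}
    for i in range(n1):
        for j in range(n2):
            preds = [(i - di, j - dj) for di, dj in deps
                     if 0 <= i - di < n1 and 0 <= j - dj < n2]
            time[(i, j)] = 1 + max((time[p] for p in preds), default=0)
    return time

# Hypothetical dependences: iteration (i, j) needs (i-1, j) and (i, j-1).
schedule = free_schedule(4, 4, deps=[(1, 0), (0, 1)])
steps = max(schedule.values())
for t in range(1, steps + 1):
    wavefront = [p for p, when in schedule.items() if when == t]
    print(f"step {t}: {wavefront}")    # anti-diagonal wavefronts of iterations
```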

Journal ArticleDOI
TL;DR: It is proved that, when the input domain is large enough with respect to the numbers of failure-causing inputs and test cases, a safe partition testing strategy cannot deviate from the proportional sampling strategy other than by rounding due to integral constraints.
Abstract: Although previous studies have shown that partition testing strategies are not always very effective, with appropriate restrictions on the test allocation they can be guaranteed to be safe, in the sense that they will never be less reliable in detecting at least one failure than random testing. Several sufficient conditions for this have already been established in the literature. In particular, the proportional sampling strategy, which allocates test cases in proportion to the size of the subdomains from which they are selected, has been proved to be safe for all programs. In practice, since the numbers of test cases must be positive integers, the proportional sampling strategy can often only be approximated. This paper examines the necessary conditions for safe partition testing strategies. We also prove that, when the input domain is large enough with respect to the numbers of failure-causing inputs and test cases, a safe partition testing strategy cannot deviate from the proportional sampling strategy other than by rounding due to integral constraints.
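The integral constraint mentioned above is easy to picture with a small sketch (the helper and the numbers are hypothetical): test cases are allocated in proportion to subdomain sizes and the fractional shares are rounded while preserving the total budget.

```python
# Sketch of proportional test allocation under the integral constraint:
# each subdomain gets a share of the test budget proportional to its size,
# with largest-remainder rounding so the counts still sum to the budget.
def proportional_allocation(subdomain_sizes, total_tests):
    total_size = sum(subdomain_sizes)
    exact = [total_tests * s / total_size for s in subdomain_sizes]
    counts = [int(x) for x in exact]                  # floor of each share
    shortfall = total_tests - sum(counts)
    # Hand the remaining tests to the subdomains with the largest remainders.
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:shortfall]:
        counts[i] += 1
    return counts

# Hypothetical input domain split into subdomains of sizes 500, 300 and 200,
# with a budget of 7 test cases: exact shares 3.5, 2.1, 1.4 round to 4, 2, 1.
print(proportional_allocation([500, 300, 200], 7))    # [4, 2, 1]
```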

Journal ArticleDOI
TL;DR: The chase procedure for an imprecise relation r over R is redefined as a means of maintaining consistency of r with respect to F, and its output chase(r, F) is shown to be the most precise approximation of r that is consistent with respect to F.
Abstract: We extend functional dependencies (FDs), which are the most fundamental integrity constraints that arise in practice in relational databases, to be satisfied in an imprecise relation. The problem we tackle is the following: given an imprecise relation r over a relation schema R and a set of FDs F over R, what is the most precise approximation of r which is also consistent with respect to F? We formalize the notion of an imprecise relation by defining tuple values to be sets of values rather than just single values, as is the case when the information is precise. We interpret each value in such a set as being equally likely to be the true value. This gives rise to equivalence classes of equally likely values, thus allowing us to define the merge of an imprecise relation r, which replaces values in r by their equivalence class. We also define a partial order on merged imprecise relations, leading to the notion of an imprecise relation being less precise than another imprecise relation. This partial order induces a lattice on the set of merged imprecise relations. An imprecise relation is consistent with respect to a set of FDs F if it satisfies F. Satisfaction of an FD in an imprecise relation is defined in terms of values being equally likely rather than equal. We show that Armstrong's axiom system is sound and complete for FDs being satisfied in imprecise relations. We redefine the chase procedure for an imprecise relation r over R as a means of maintaining consistency of r with respect to F. Our main result is that the output chase(r, F) of the chase procedure is the most precise approximation of r with respect to F, in the following sense: it is the join of all consistent imprecise relations s such that s is a merged imprecise relation that is less precise than r. It is also shown that chase(r, F) can be computed in polynomial time in the sizes of r and F.
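The paper's formal definitions of equal likelihood, merging and the chase are not reproduced in the abstract; the sketch below is only one plausible simplification of the idea, in which an FD X → Y forces the Y value-sets of tuples whose X value-sets overlap to be unioned until a fixpoint is reached. Treat it as an assumption-laden illustration, not the paper's procedure.

```python
# Heavily simplified, assumption-laden sketch of a chase-like repair for an
# imprecise relation: tuple values are sets of possible values, and for an FD
# X -> Y, tuples whose X-sets overlap get their Y-sets unioned until a fixpoint.
# This is one plausible simplification, not the paper's formal semantics.
def chase(tuples, fd):
    """tuples: list of dicts mapping attribute -> frozenset of possible values."""
    x, y = fd
    changed = True
    while changed:
        changed = False
        for i in range(len(tuples)):
            for j in range(i + 1, len(tuples)):
                if tuples[i][x] & tuples[j][x]:          # possibly equal on X
                    merged = tuples[i][y] | tuples[j][y]
                    if merged != tuples[i][y] or merged != tuples[j][y]:
                        tuples[i][y] = tuples[j][y] = merged
                        changed = True
    return tuples

# Hypothetical imprecise relation over EMP(name, dept) with FD name -> dept.
r = [{"name": frozenset({"ann"}), "dept": frozenset({"sales"})},
     {"name": frozenset({"ann", "anne"}), "dept": frozenset({"hr"})}]
for t in chase(r, ("name", "dept")):
    print(t)            # both tuples end up with dept = {'sales', 'hr'}
```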

Journal ArticleDOI
TL;DR: This paper considers the programming of passive (cold and warm standbys) and active replicated systems in Ada 95 and considers two extensions to the Distributed Systems Annex to help give the application programmer more control.
Abstract: This paper considers the programming of passive (cold and warm standbys) and active replicated systems in Ada 95. We show that it is relatively easy to develop systems which act as standbys using the facilities provided by the language and the Distributed Systems Annex. Arguably, active replication in Ada 95 can be supported in a manner which is transparent to the application. However, this is implementation-dependent, requires a complex distributed consensus algorithm (or a carefully chosen subset of the language to be used) and has little flexibility. We therefore consider two extensions to the Distributed Systems Annex to help give the application programmer more control. The first is via a new categorization pragma which specifies that an RCI package can be replicated in more than one partition. The second is through the introduction of a coordinated type which has a single primitive operation. Objects which are created from extensions to coordinated types can be freely replicated across the distributed system. When the primitive operation is called, the call is posted to all sites where a replica resides, effectively providing a broadcast (multicast) facility. We also consider extensions to the partition communication subsystem which implement these new features.