Showing papers on "Tuple published in 1994"


Proceedings Article
01 Aug 1994
TL;DR: A new filtering algorithm is presented that achieves the generalized arc-consistency condition for these non-binary constraints and has been successfully used in the system RESYN to solve the subgraph isomorphism problem.
Abstract: Many real-life Constraint Satisfaction Problems (CSPs) involve some constraints similar to the alldifferent constraints. These constraints are called constraints of difference. They are defined on a subset of variables by a set of tuples for which the values occurring in the same tuple are all different. In this paper, a new filtering algorithm for these constraints is presented. It achieves the generalized arc-consistency condition for these non-binary constraints. It is based on matching theory and its complexity is low. In fact, for a constraint defined on a subset of p variables having domains of cardinality at most d, its space complexity is O(pd) and its time complexity is O(p²d²). This filtering algorithm has been successfully used in the system RESYN (Vismara et al. 1992), to solve the subgraph isomorphism problem.

823 citations
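
The pruning condition the paper achieves can be stated directly: a value survives filtering iff it appears in at least one matching that covers all variables of the constraint. The sketch below checks this condition by brute force with a small augmenting-path matcher; it is far slower than the paper's O(p²d²) matching-plus-components algorithm, but computes the same fixpoint on small examples. All names are illustrative.

```python
# Generalized arc-consistency for "alldifferent" via bipartite matching.
# Brute-force variant of the condition Regin-style filtering enforces.

def max_matching(domains):
    """Kuhn's augmenting-path algorithm on the variable-value graph.
    Returns {variable_index: value} for a maximum matching."""
    match_of_value = {}  # value -> variable currently matched to it

    def try_augment(var, seen):
        for val in domains[var]:
            if val in seen:
                continue
            seen.add(val)
            if val not in match_of_value or try_augment(match_of_value[val], seen):
                match_of_value[val] = var
                return True
        return False

    for var in range(len(domains)):
        try_augment(var, set())
    return {var: val for val, var in match_of_value.items()}

def filter_alldifferent(domains):
    """Remove every value that appears in no matching covering all variables."""
    n = len(domains)
    if len(max_matching(domains)) < n:
        raise ValueError("alldifferent is unsatisfiable")
    filtered = []
    for var, dom in enumerate(domains):
        kept = set()
        for val in dom:
            # Fix var := val, then check the rest can still be fully matched.
            rest = [d - {val} if i != var else {val} for i, d in enumerate(domains)]
            if len(max_matching(rest)) == n:
                kept.add(val)
        filtered.append(kept)
    return filtered

# Example: x0 in {1,2}, x1 in {1,2}, x2 in {1,2,3} forces x2 = 3.
print(filter_alldifferent([{1, 2}, {1, 2}, {1, 2, 3}]))
```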


Proceedings Article
Lothar F. Mackert1, Guy M. Lohman1
01 Jul 1994
TL;DR: This paper extends an earlier optimizer validation and performance evaluation of R* to distributed queries, i.e. single SQL statements having tables at multiple sites, confirming the accuracy of R*'s message cost model and the significant contribution of local (CPU and I/O) costs, even for a medium-speed network.
Abstract: Few database query optimizer models have been validated against actual performance. This paper extends an earlier optimizer validation and performance evaluation of R* to distributed queries, i.e. single SQL statements having tables at multiple sites. Actual R* message, I/O, and CPU resources consumed and the corresponding costs estimated by the optimizer were written to database tables using new SQL commands, permitting automated control from application programs for collecting, reducing, and comparing test data. A number of tests were run over a wide variety of dynamically-created test databases, SQL queries, and system parameters. Both high-speed networks (comparable to a local area network) and medium-speed long-haul networks (for linking geographically dispersed hosts) were evaluated. The tests confirmed the accuracy of R*'s message cost model and the significant contribution of local (CPU and I/O) costs, even for a medium-speed network. Although distributed queries consume more resources overall, the response time for some execution strategies improves disproportionately by exploiting both concurrency and reduced contention for buffers. For distributed joins in which a copy of the inner table must be transferred to the join site, shipping the whole inner table dominated the strategy of fetching only those inner tuples that matched each outer-table value, even though the former strategy may require additional I/O. Bloomjoins (hashed semijoins) consistently performed better than semijoins and the best R* strategies. Few of the distributed optimizer models proposed over the last decade [APER 83, BERN 81a, CHAN 82, CHU 82, EPST 78, HEVN 79, KERS 82, ONUE 83, PERR 84, WONG 83, YAO 79, YU 83] have been validated by comparison with actual performance. The only known validations, for Distributed INGRES [STON 82] and the Crystal multicomputer [LU 85], have assumed only a high-speed local-area network linking the distributed systems. Also, the Distributed INGRES study focused primarily on reducing response time by exploiting parallelism using table partitioning and broadcast messages. In contrast, R* seeks to minimize total resources consumed, has not implemented table partitioning, and does not presume a network broadcast capability. There are many important questions that a thorough validation should answer: Under what circumstances (regions of the parameter space) does the optimizer choose a suboptimal plan, or, worse, a particularly bad plan? To which parameters is the actual performance most sensitive? Are these parameters being modeled accurately by the optimizer? What is the impact of variations from the optimizer's simplifying assumptions? Is it possible to simplify the optimizer's model (by using heuristics, for example)?

272 citations
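
For readers unfamiliar with the Bloomjoin strategy that wins in these experiments, here is a minimal single-process sketch of the idea: the join site ships a fixed-size bit vector of hashed join keys instead of the keys themselves, and only the inner tuples that pass the filter travel back. Tables, hash count, and filter size are illustrative assumptions.

```python
# Bloomjoin (hashed semijoin) sketch: filter the remote inner table with a
# Bloom filter built from the outer table's join keys.
import hashlib

def _positions(key, m, k):
    """k hash positions in an m-bit filter for a given join key."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        yield int.from_bytes(digest[:4], "big") % m

def bloom_bits(keys, m=1024, k=3):
    bits = bytearray(m // 8)
    for key in keys:
        for h in _positions(key, m, k):
            bits[h // 8] |= 1 << (h % 8)
    return bits

def maybe_member(bits, key, m=1024, k=3):
    """False => definitely absent; True => present or a false positive."""
    return all(bits[h // 8] >> (h % 8) & 1 for h in _positions(key, m, k))

# Outer table at the join site; inner table at a remote site.
outer = [(1, "a"), (2, "b"), (4, "d")]
inner = [(1, "x"), (2, "y"), (3, "z")]

# Site 1 -> site 2: ship only the filter (fixed size, not O(|outer|)).
f = bloom_bits(k0 for k0, _ in outer)
# Site 2: forward only inner tuples that might match.
survivors = [t for t in inner if maybe_member(f, t[0])]
# Site 1: final join against the reduced inner table.
print([(k, v, w) for k, v in outer for k2, w in survivors if k == k2])
```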


Book
01 Jun 1994
TL;DR: A perspective on ML and SML/NJ and how to program with Datatypes and Solutions to Selected Exercises are presented.
Abstract: 0. A Perspective on ML and SML/NJ. I. INTRODUCTION TO PROGRAMMING IN ML. 1. Expressions. 2. Type Consistency. 3. Variables and Environments. 4. Tuples and Lists. 5. It's Easy It's "fun." 6. Patterns in Function Definitions. 7. Local Environments Using "let." 8. Exceptions. 9. Side Effects: Input and Output. II. ADVANCED FEATURES OF ML. 10. Polymorphic Functions. 11. Higher-Order Functions. 12. Defining New Types. 13. Programming with Datatypes. 14. The ML Module System. 15. Software Design Using Modules. 16. Arrays. 17. References. III. ADDITIONAL DETAILS AND FEATURES. 18. Record Structures. 19. Matches and Patterns. 20. More About Exceptions. 21. Counting with Functions as Values. 22. More About Input and Output. 23. Creating Executable Files. 24. Controlling Operator Grouping. 25. Built-In Functions of SML/NJ. 26. Summary of ML Syntax. Solutions to Selected Exercises. Index.

206 citations


Journal ArticleDOI
TL;DR: A new temporal data model designed with the single purpose of capturing the time-dependent semantics of data is described, using the notion of snapshot equivalence to map temporal relation instances and temporal operators of one existing model to equivalent instances and operators of another.

106 citations


Proceedings ArticleDOI
24 May 1994
TL;DR: This paper proposes a shift in the intuition behind outerjoin: Instead of computing the join while also preserving its arguments, outerjoin delivers tuples that come either from the join or from the arguments.
Abstract: The outerjoin operator is currently available in the query language of several major DBMSs, and it is included in the proposed SQL2 standard draft. However, “associativity problems” of the operator have been pointed out since its introduction. In this paper we propose a shift in the intuition behind outerjoin: Instead of computing the join while also preserving its arguments, outerjoin delivers tuples that come either from the join or from the arguments. Queries with joins and outerjoins deliver tuples that come from one out of several joins, where a single relation is a trivial join. An advantage of this view is that, in contrast to preservation, disjunction is commutative and associative, which is a significant property for intuition, formalisms, and generation of execution plans. Based on a disjunctive normal form, we show that some data merging queries cannot be evaluated by means of binary outerjoins, and give alternative procedures to evaluate those queries. We also explore several evaluation strategies for outerjoin queries, including the use of semijoin programs to reduce base relations.

105 citations
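
A small executable reading of the disjunctive view: the result of a full outerjoin is the union of tuples that come from the join and null-padded tuples that come from each argument alone. Relations and attribute names below are illustrative assumptions, not the paper's notation.

```python
# Full outerjoin as a disjunction: tuples from the join, plus tuples from
# each argument (null-padded) that matched nothing.
def full_outerjoin(r, s, join_attr):
    matched_r, matched_s, out = set(), set(), []
    for i, t in enumerate(r):
        for j, u in enumerate(s):
            if t[join_attr] == u[join_attr]:      # tuples "from the join"
                out.append({**t, **u})
                matched_r.add(i)
                matched_s.add(j)
    # Tuples "from the arguments": unmatched operand tuples, null-padded.
    r_attrs = set().union(*(t.keys() for t in r)) if r else set()
    s_attrs = set().union(*(u.keys() for u in s)) if s else set()
    out += [{**t, **{a: None for a in s_attrs - t.keys()}}
            for i, t in enumerate(r) if i not in matched_r]
    out += [{**{a: None for a in r_attrs - u.keys()}, **u}
            for j, u in enumerate(s) if j not in matched_s]
    return out

emp = [{"eno": 1, "dept": 10}, {"eno": 2, "dept": 20}]
dep = [{"dept": 10, "dname": "toys"}, {"dept": 30, "dname": "books"}]
for row in full_outerjoin(emp, dep, "dept"):
    print(row)
```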


Journal ArticleDOI
TL;DR: This approach implements a mechanism of transaction processing to ensure that tuples are properly handled in the event of a node or communications failure; the fault-tolerance mechanism is implemented at the system level and requires little programmer effort or expertise.
Abstract: To simplify the difficult task of writing fault-tolerant parallel software, we implemented extensions to the basic functionality of the LINDA or tuple-space programming model. Our approach implements a mechanism of transaction processing to ensure that tuples are properly handled in the event of a node or communications failure. If a process retrieving a tuple fails to complete processing or a tuple posting or retrieval message is lost, the system is automatically rolled back to a previous stable state. Processing failures and lost messages are detected by time-out alarms. Roll-back is accomplished by reposting pertinent tuples. Intermediate tuples produced during partial processing are not committed or made available until a process completes. In the absence of faults, system overhead is low. The fault-tolerance mechanism is implemented at the system level and requires little programmer effort or expertise. Two implementations of the model are discussed, one using a UNIX network of workstations and one using a Transputer network. Data measuring model overhead and some aspects of system performance in the presence of faults are presented for an example system.

79 citations
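
A rough sketch of the transactional behavior described above, assuming a single-process tuple space with timeout-based recovery; the class and method names are invented for illustration and are not the paper's API.

```python
# Tuple space where retrieval runs under a transaction: a taken tuple stays
# recoverable until commit, and is reposted if its timeout expires.
import time

class TransactionalTupleSpace:
    def __init__(self, timeout=5.0):
        self.available = []      # committed tuples
        self.in_flight = {}      # txn id -> (tuple, deadline)
        self.timeout = timeout
        self._next_txn = 0

    def out(self, tup):
        self.available.append(tup)

    def txn_in(self, pattern):
        """Remove a matching tuple, but keep it recoverable until commit."""
        self._recover_expired()
        for i, tup in enumerate(self.available):
            if all(p is None or p == v for p, v in zip(pattern, tup)):
                self._next_txn += 1
                deadline = time.monotonic() + self.timeout
                self.in_flight[self._next_txn] = (self.available.pop(i), deadline)
                return self._next_txn, tup
        return None

    def commit(self, txn, results=()):
        """Worker finished: discard the consumed tuple, publish its outputs."""
        self.in_flight.pop(txn)
        self.available.extend(results)   # intermediate tuples appear only now

    def _recover_expired(self):
        """Time-out alarm: roll back by reposting tuples of dead transactions."""
        now = time.monotonic()
        for txn, (tup, deadline) in list(self.in_flight.items()):
            if now > deadline:
                del self.in_flight[txn]
                self.available.append(tup)

ts = TransactionalTupleSpace(timeout=0.01)
ts.out(("task", 42))
txn, tup = ts.txn_in(("task", None))
time.sleep(0.02)                 # simulate a crashed worker
ts.txn_in(("nothing",))          # any later access triggers recovery
assert ("task", 42) in ts.available
```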


Journal ArticleDOI
TL;DR: This paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation, and the presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded.
Abstract: A standard relation has two dimensions: attributes and tuples. A temporal relation contains two additional orthogonal time dimensions: valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation. Although there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications the valid and transaction times exhibit restricted interrelationships that define several types of specialized temporal relations. This paper examines areas where different specialized temporal relations are present. In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation. The presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded. The framework's comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.

78 citations


Proceedings ArticleDOI
14 Feb 1994
TL;DR: A new temporal-join algorithm based on tuple partitioning is introduced that avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting.
Abstract: Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors require new techniques to efficiently evaluate valid-time joins. The authors address this need for efficient join evaluation in databases supporting valid-time. A new temporal-join algorithm based on tuple partitioning is introduced. This algorithm avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting. Performance comparisons between the partition-based algorithm and other evaluation methods are provided. While the paper focuses on the important valid-time natural join, the techniques presented are also applicable to other valid-time joins.

76 citations
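
To make the partitioning idea concrete, here is a toy in-memory version: tuples are bucketed by every fixed-width time range they overlap, and the join probes only within shared buckets, avoiding both sorting and a full nested-loop scan. Bucket width and relations are assumptions; the paper's algorithm is disk-based and more refined.

```python
# Partition-based valid-time join over (key, start, end) tuples with
# closed intervals [start, end].
from collections import defaultdict

BUCKET = 10  # width of each time partition

def buckets(start, end):
    return range(start // BUCKET, end // BUCKET + 1)

def valid_time_join(r, s):
    parts = defaultdict(list)
    for u in s:
        for b in buckets(u[1], u[2]):
            parts[b].append(u)
    out = set()
    for t in r:
        for b in buckets(t[1], t[2]):
            for u in parts[b]:
                overlap = (max(t[1], u[1]), min(t[2], u[2]))
                if t[0] == u[0] and overlap[0] <= overlap[1]:
                    out.add((t[0], t, u))  # set-dedup: long tuples meet in several buckets
    return out

r = [("a", 1, 25), ("b", 5, 8)]
s = [("a", 20, 40), ("b", 9, 12)]
print(valid_time_join(r, s))   # "a" tuples overlap on [20, 25]; "b" tuples do not
```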


Journal ArticleDOI
TL;DR: This paper shows that ESP is based on a new programming model called PoliS, that extends Linda with Multiple Tuple Spaces, and shows how the distributed implementation of ESP uses the network version of Linda’s tuple space.
Abstract: From the point of view of multiparadigm distributed programming one of the most interesting communication mechanisms is associative communication based on a shared dataspace, as exemplified in the Linda coordination language. In fact, Linda has been used as coordination layer to parallelize several sequential programming languages, like C and Scheme. In this paper we study the combination of Linda with a logic language, whose result is the language Extended Shared Prolog (ESP). We show that ESP is based on a new programming model called PoliS, that extends Linda with Multiple Tuple Spaces. A class of applications for ESP is discussed, introducing the concept of "open multiple tuple spaces". Finally, we show how the distributed implementation of ESP effectively uses the network version of Linda's tuple space.

68 citations
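
The multiple-tuple-spaces idea is easy to model: Linda's out/in operations simply name the space they act on, so groups of processes can coordinate through private or shared spaces. The sketch below is an illustrative single-process model, not ESP or PoliS syntax.

```python
# Named tuple spaces: each space supports Linda-style out (post) and a
# non-blocking in (remove first match, None wildcards allowed).
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        self.tuples.append(tup)

    def inp(self, pattern):
        for i, tup in enumerate(self.tuples):
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return self.tuples.pop(i)
        return None

spaces = {"jobs": TupleSpace(), "results": TupleSpace()}  # multiple named spaces

spaces["jobs"].out(("render", "frame-1"))
job = spaces["jobs"].inp(("render", None))   # worker takes from one space...
spaces["results"].out((job[1], "done"))      # ...and reports into another
print(spaces["results"].inp((None, "done")))
```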


Patent
17 Oct 1994
TL;DR: In this paper, a method for generating an indexed database stored in a computer system is presented, where each data object is defined by a respective tuple of attributes, and the attributes include at least one attribute having a domain of values that includes handwritten objects.
Abstract: A method is provided for generating an indexed database stored in a computer system. A database is established. The database includes a plurality of data objects. Each data object is defined by a respective tuple of attributes. The attributes include at least one attribute having a domain of values that includes handwritten objects. Each handwritten object includes a plurality of symbols ordered in an output sequence. An index is established, having a root node and a plurality of leaf nodes. Each leaf node is connected to the root node by a respective path, such that each path from the root node to one of the plurality of leaf nodes corresponds to a respective input sequence of symbols. Each leaf node includes a set of pointers to a subset of the tuples. A respective Hidden Markov Model (HMM) is executed to analyze the output sequence of each handwritten object and to determine a respective probability that each input sequence matches the output sequence. A pointer to any tuple for which the respective output sequence has at least a threshold probability of matching the input sequence (corresponding to the leaf node) is included in the respective set of pointers in each respective leaf node. The probability is determined by the respective HMM for the output sequence of each handwritten object.

Proceedings ArticleDOI
14 Feb 1994
TL;DR: This paper proposes a complete extended relational algebra with multi-set semantics, having a clear formal background and a close connection to the standard relational algebra, that includes constructs that extend the algebra to a complete sequential database manipulation language.
Abstract: The relational data model is based on sets of tuples, i.e. it does not allow duplicate tuples in a relation. Many database languages and systems do require multi-set semantics though, either because of functional requirements or because of the high costs of duplicate removal in database operations. Several proposals have been presented that discuss multi-set semantics. As these proposals tend to be either rather practical, lacking the formal background, or rather formal, lacking the connection to database practice, the gap between theory and practice has not been spanned yet. This paper proposes a complete extended relational algebra with multi-set semantics, having a clear formal background and a close connection to the standard relational algebra. It includes constructs that extend the algebra to a complete sequential database manipulation language that can either be used as a formal background to other multi-set languages like SQL, or as a database manipulation language on its own. The practical usability of the latter option has been demonstrated in the PRISMA/DB database project, where a variant of the language has been used as the primary database language.
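
A compact illustration of bag semantics for a few operators, using Python's Counter as the multi-set: multiplicities add under union, subtract (floored at zero) under difference, and are preserved by projection. The operator selection is illustrative, not the paper's full algebra.

```python
# Multi-set (bag) versions of a few relational operators.
from collections import Counter

def bag_union(r, s):        # additive union: multiplicities add
    return r + s

def bag_difference(r, s):   # monus: multiplicities subtract, floored at 0
    return r - s

def bag_project(r, attrs):  # projection WITHOUT duplicate elimination
    out = Counter()
    for row, n in r.items():
        out[tuple(row[i] for i in attrs)] += n
    return out

r = Counter({("ann", 30): 2, ("bob", 30): 1})
s = Counter({("ann", 30): 1})
print(bag_union(r, s))        # ann now has multiplicity 3
print(bag_difference(r, s))   # one ann row survives
print(bag_project(r, (1,)))   # age column keeps all 3 rows: {(30,): 3}
```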

Journal ArticleDOI
TL;DR: A rigorous system model is developed to facilitate the mapping between an object-oriented model and the relational model and reduces the number of left outer joins and the filters so that the query can be processed more efficiently.
Abstract: One of the approaches for integrating object-oriented programs with databases is to instantiate objects from relational databases by evaluating view queries. In that approach, it is often necessary to evaluate some joins of the query by left outer joins to prevent information loss caused by the tuples discarded by inner joins. It is also necessary to filter some relations with selection conditions to prevent the retrieval of unwanted nulls. The system should automatically prescribe joins as inner or left outer joins and generate the filters, rather than letting them be specified manually for every view definition. We develop such a mechanism in this paper. We first develop a rigorous system model to facilitate the mapping between an object-oriented model and the relational model. The system model provides a well-defined context for developing a simple mechanism. The mechanism requires only one piece of information from users: null options on an object attribute. The semantics of these options are mapped to non-null constraints on the query result. Then the system prescribes joins and generates filters accordingly. We also address reducing the number of left outer joins and the filters so that the query can be processed more efficiently.
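
The prescription step can be pictured as a per-join decision driven by the user-supplied null options. The simplified rule below (null allowed: left outer join to preserve parents; null forbidden: inner join, which doubles as the filter) is a loose sketch of the mechanism, with invented names.

```python
# Decide join operators for a view's relation path from null options.
def prescribe_joins(path_attrs, null_options):
    """path_attrs: join attributes along the view's relation path.
    null_options: attr -> True if the object attribute may be null."""
    plan = []
    for attr in path_attrs:
        if null_options.get(attr, False):
            # Nulls allowed: preserve parent tuples that have no match.
            plan.append((attr, "LEFT OUTER JOIN"))
        else:
            # Nulls forbidden: an inner join doubles as the non-null filter.
            plan.append((attr, "INNER JOIN"))
    return plan

view = prescribe_joins(["dept_id", "manager_id"], {"manager_id": True})
for attr, join in view:
    print(f"{join} ... ON {attr}")
```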

Proceedings ArticleDOI
24 May 1994
TL;DR: This work presents one possible architecture for performing complex object reclustering in an on-line manner that is adaptive to changing usage patterns and shows that the average miss rate for object accesses can be effectively reduced using a combination of rules developed for deciding when cluster analyses and reorganizations should be performed.
Abstract: A likely trend in the development of future CAD, CASE and office information systems will be the use of object-oriented database systems to manage their internal data stores. The entities that these applications will retrieve, such as electronic parts and their connections or customer service records, are typically large complex objects composed of many interconnected heterogeneous objects, not thousands of tuples. These applications may exhibit widely shifting usage patterns due to their interactive mode of operation. Such a class of applications would demand clustering methods that are appropriate for clustering large complex objects and that can adapt on-line to the shifting usage patterns. While most object-oriented clustering methods allow grouping of heterogeneous objects, they are usually static and can only be changed off-line. We present one possible architecture for performing complex object reclustering in an on-line manner that is adaptive to changing usage patterns. Our architecture involves the decomposition of a clustering method into concurrently operating components that each handle one of the fundamental tasks involved in reclustering, namely statistics collection, cluster analysis, and reorganization. We present the results of an experiment performed to evaluate its behavior. These results show that the average miss rate for object accesses can be effectively reduced using a combination of rules that we have developed for deciding when cluster analyses and reorganizations should be performed.

Patent
Rakesh Agrawal1, Arun N. Swami1
14 Apr 1994
TL;DR: In this paper, a database management system determines, in a single pass over an unordered database, the quantile information, and then selectively inserts the tuple in a test set having a cardinality less than the cardinality of the data set based upon the comparison.
Abstract: A database management system determines, in a single pass over an unordered database, the quantile information. The system sequentially compares each tuple in the data set to a test value, and then selectively inserts the tuple in a test set having a cardinality less than the cardinality of the data set based upon the comparison. The system next uses the quantile information to estimate the number of tuples in the database which satisfy a user-defined predicate to generate an efficient query plan.
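
In the same spirit as the claim, a bounded-memory single pass can be sketched with reservoir sampling standing in for the patent's specific insertion rule, which the abstract does not detail:

```python
# One pass over an unordered stream, bounded "test set", quantile estimate.
import random

def streaming_quantile(stream, q, capacity=100, seed=0):
    rng = random.Random(seed)
    sample = []
    for n, value in enumerate(stream, start=1):
        if len(sample) < capacity:
            sample.append(value)
        else:
            j = rng.randrange(n)      # keep each element with prob capacity/n
            if j < capacity:
                sample[j] = value
    sample.sort()
    return sample[min(int(q * len(sample)), len(sample) - 1)]

# Estimate the median of a million-row column in one pass and bounded memory;
# an optimizer would use this to estimate predicate selectivity.
est = streaming_quantile(iter(range(1_000_000)), q=0.5)
print(est)   # statistically close to 500000
```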

Proceedings Article
12 Sep 1994
TL;DR: In this paper, the authors propose a general strategy for the optimization of nested OOSQL queries in the algebraic language ADL, and by means of algebraic rewriting nested queries are transformed into join queries as far as possible.
Abstract: Most declarative SQL-like query languages for object-oriented database systems are orthogonal languages allowing for arbitrary nesting of expressions in the select-, from-, and where-clause. Expressions in the from-clause may be base tables as well as set-valued attributes. In this paper, we propose a general strategy for the optimization of nested OOSQL queries. As in the relational model, the translation/optimization goal is to move from tuple- to set-oriented query processing. Therefore, OOSQL is translated into the algebraic language ADL, and by means of algebraic rewriting nested queries are transformed into join queries as far as possible. Three different optimization options are described, and a strategy to assign priorities to options is proposed.

Patent
19 Oct 1994
TL;DR: In this paper, the authors describe a method for performing a right outer join of database tables without sorting the inner table (T2), where the processing of each tuple in T1 includes the preservation in the join output of all tuples in T2 which are in its responsibility region.
Abstract: A computer database system utilizes a method for performing a right outer join of database tables without sorting the inner table (T2). The processing of each tuple in the outer table (T1) includes the preservation in the join output of all tuples in T2 which are in its responsibility region. The initialization step of the process preserves in the join output all of the tuples in T2 which have column set values less than the lowest column set value in T1, i.e. the first tuple in T1, since T1 is sorted or accessed using a sorted index. The responsibility region for tuples in T1, other than the last tuple, is defined as those tuples which have column set values less than the column set value for the next tuple in T1 and greater than or equal to the column set value for the current T1 tuple. The last tuple in T1 must preserve all of the tuples in T2 which have not already been preserved, i.e. all tuples greater than or equal to its column set value. If T1 has duplicate values for the column set value, only the last one preserves the associated T2 tuples. Additional methods for parallel execution of the outer join methods and methods for applying the outer join methods to subqueries (i.e., an All (or universal) Right Join (ARJOIN) and an Existential Right Join (ERJOIN)) are described.
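
A compressed sketch of the effect (though not the patent's exact scan order): with T1 sorted, each unsorted T2 tuple either joins with its matching T1 tuples or is preserved once with null-padded T1 columns. Binary search against the sorted outer table stands in for the responsibility-region bookkeeping; the column layout is an assumption.

```python
# Right outer join with a sorted outer table and an unsorted inner table.
import bisect

def right_outer_join(t1_sorted, t2):
    """t1_sorted: list of (key, payload) sorted by key. t2: unsorted."""
    keys = [k for k, _ in t1_sorted]
    out = []
    for k2, p2 in t2:
        i = bisect.bisect_left(keys, k2)
        if i < len(keys) and keys[i] == k2:
            # Matched: emit one row per matching T1 tuple (T1 may have duplicates).
            j = i
            while j < len(keys) and keys[j] == k2:
                out.append((keys[j], t1_sorted[j][1], p2))
                j += 1
        else:
            # Unmatched: preserved, with T1 columns null-padded.
            out.append((k2, None, p2))
    return out

t1 = [(1, "a"), (3, "c")]
t2 = [(3, "x"), (2, "y"), (1, "z")]
print(right_outer_join(t1, t2))
# [(3, 'c', 'x'), (2, None, 'y'), (1, 'a', 'z')] -- every T2 tuple appears
```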

Journal ArticleDOI
TL;DR: A self-adjusting data distribution scheme which balances computer workload at a cell (coarser grain than tuple) level during query processing to minimize redistribution cost and shows that under the authors' assumptions, it is always beneficial to rebalance computer workload before performing a join on skewed data.

Proceedings ArticleDOI
24 May 1994
TL;DR: This work examines a wide range of acyclic graphs with varying density and “locality” of arcs in the graph, measuring a number of different cost metrics, giving a good understanding of the predictive power of these metrics with respect to I/O cost.
Abstract: We present a comprehensive performance evaluation of transitive closure (reachability) algorithms for databases. The study is based upon careful implementations of the algorithms, measures page I/O, and covers algorithms for full transitive closure as well as partial transitive closure (finding all successors of each node in a set of given source nodes). We examine a wide range of acyclic graphs with varying density and “locality” of arcs in the graph. We also consider query parameters such as the selectivity of the query, and system parameters such as the buffer size and the page and successor list replacement policies. We show that significant cost tradeoffs exist between the algorithms in this spectrum and identify the factors that influence the performance of the algorithms. An important aspect of our work is that we measure a number of different cost metrics, giving us a good understanding of the predictive power of these metrics with respect to I/O cost. This is especially significant since metrics such as number of tuples generated or number of successor list operations have been widely used to compare transitive closure algorithms in the literature. Our results strongly suggest that these other metrics cannot be reliably used to predict the I/O cost of transitive closure evaluation.
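
For reference, the logic being measured is plain semi-naive transitive closure: each round joins only the newly derived pairs with the base arcs. The in-memory version below fixes the semantics; the algorithms in the study differ in how they stage this computation against disk pages.

```python
# Semi-naive transitive closure: only the "delta" joins with the base arcs.
def transitive_closure(arcs):
    closure = set(arcs)
    delta = set(arcs)
    while delta:
        # Join new pairs with the base relation: (a,b) + (b,c) => (a,c).
        delta = {(a, c) for (a, b) in delta for (b2, c) in arcs if b == b2} - closure
        closure |= delta
    return closure

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```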

Journal ArticleDOI
TL;DR: A new partitioning strategy, multiattribute grid declustering (MAGIC), is presented that can use two or more attributes of a relation to decluster its tuples across multiple processors and disks, unlike other multiattribute partitioning mechanisms that have been proposed.
Abstract: During the past decade, parallel database systems have gained increased popularity due to their high performance, scalability, and availability characteristics. With the predicted future database sizes and complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. Several studies have repeatedly demonstrated that both the performance and scalability of a parallel database system are contingent on the physical layout of the data across the processors of the system. If the data are not declustered appropriately, the execution of an operation might waste system resources, reducing the overall processing capability of the system. With earlier, single-attribute partitioning mechanisms such as those found in the Tandem, Teradata, Gamma, and Bubba parallel database systems, range selections on any attribute other than the partitioning attribute must be sent to all processors containing tuples of the relation, while range selections on the partitioning attribute can be directed to only a subset of the processors. Although using all the processors for an operation is reasonable for resource intensive operations, directing a query with minimal resource requirements to processors that contain no relevant tuples wastes CPU cycles, communication bandwidth, and I/O bandwidth. As a solution, this paper describes a new partitioning strategy, multiattribute grid declustering (MAGIC), which can use two or more attributes of a relation to decluster its tuples across multiple processors and disks. In addition, MAGIC declustering, unlike other multiattribute partitioning mechanisms that have been proposed, is able to support range selections as well as exact match selections on each of the partitioning attributes. This capability enables a greater variety of selection operations to be directed to a restricted subset of the processors in the system. Finally, MAGIC partitions each relation based on the resource requirements of the queries that constitute the workload for the relation and the processing capacity of the system in order to ensure that the proper number of processors are used to execute queries that reference the relation.
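
The grid idea can be shown in a few lines: cells of a two-attribute grid are mapped to processors, and a range selection on either attribute is routed only to processors owning intersecting cells. Grid boundaries and the cell-to-processor map below are illustrative assumptions, not MAGIC's actual placement policy.

```python
# Two-attribute grid declustering with range-selection routing.
RANGES_A = [(0, 25), (25, 50), (50, 75), (75, 100)]   # attribute 1 (e.g. age)
RANGES_B = [(0, 50), (50, 100)]                        # attribute 2 (e.g. score)

def cell_of(a, b):
    i = next(i for i, (lo, hi) in enumerate(RANGES_A) if lo <= a < hi)
    j = next(j for j, (lo, hi) in enumerate(RANGES_B) if lo <= b < hi)
    return i, j

def processor(cell, n_procs=4):
    i, j = cell
    return (i + j * len(RANGES_A)) % n_procs  # round-robin cells over processors

def processors_for_range(attr, lo, hi, n_procs=4):
    """Route a range selection on one attribute to a subset of processors."""
    ranges = RANGES_A if attr == 0 else RANGES_B
    hit = [k for k, (rlo, rhi) in enumerate(ranges) if rlo < hi and lo < rhi]
    cells = [(i, j) for i in range(len(RANGES_A)) for j in range(len(RANGES_B))
             if (i if attr == 0 else j) in hit]
    return sorted({processor(c, n_procs) for c in cells})

print(processor(cell_of(30, 70)))        # where one tuple lands
print(processors_for_range(0, 0, 25))    # a range query hits only a subset
```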

Journal ArticleDOI
TL;DR: This paper describes I-BOL, an application-specific high-level programming language intended for implementing low-level image processing applications on parallel architectures, designed to be capable of implementation on distributed memory parallel machines such as transputer networks.

Proceedings ArticleDOI
09 Oct 1994
TL;DR: In order to combine the classification and the syntactic analysis steps, the authors propose an augmentation technique for carrying out decision tree classification.
Abstract: A new method is presented for analyzing document images and constructing a database from them. This method is implemented in an electronic library system named CyberMagazine, where document images are sequentially converted into database tuples by block segmentation, rough classification, and syntactic analysis. CyberMagazine's image understanding method combines the use of decision tree classification and syntactic analysis using a newly presented matrix grammar. In order to combine the classification and the syntactic analysis steps, the authors propose an augmentation technique for carrying out decision tree classification. Experimental results subsequently demonstrate that high understanding accuracy is obtained by adequate augmentation.

Proceedings ArticleDOI
16 May 1994
TL;DR: This work identifies common classes of constraints whose enforcement is free of both static and dynamic inference channels, and extends the integrity checking mechanism by proper update semantics to remove dynamic inference channels in the enforcement of more general classes of constraints.
Abstract: A multilevel relational database with tuple-level labeling is a relational database together with a mapping, which associates a set of levels in a security lattice with every tuple in every relation in the database. Integrity constraints represent the invariant properties of data in a multilevel relational database. We study the relationship between integrity and secrecy, and show that they are not necessarily in fundamental conflict. We identify common classes of constraints whose enforcement is free of both static and dynamic inference channels. We also extend the integrity checking mechanism by proper update semantics to remove dynamic inference channels in the enforcement of more general classes of constraints.

Book ChapterDOI
16 Aug 1994
TL;DR: A data structure (UDS) for supporting database retrieval, inference and machine learning that attempts to unify and extend previous work in relational databases, semantic networks, conceptual graphs, RETE, neural networks and case-based reasoning is given.
Abstract: This paper gives a data structure (UDS) for supporting database retrieval, inference and machine learning that attempts to unify and extend previous work in relational databases, semantic networks, conceptual graphs, RETE, neural networks and case-based reasoning. Foundational to this view is that all data can be viewed as a primitive set of objects and mathematical relations (as sets of tuples) over those objects. The data is stored in three partially-ordered hierarchies: a node hierarchy, a relation hierarchy, and a conceptual graphs hierarchy. All three hierarchies can be stored as "levels" in the conceptual graphs hierarchy. These multiple hierarchies support multiple views of the data with advantages over any of the individual methods. In particular, conceptual graphs are stored in a relation-based compact form that facilitates matching. UDS is currently being implemented in the Peirce conceptual graphs workbench and is being used as a domain-independent monitor for state-space search domains at a level that is faster than previous implementations designed specifically for those domains. In addition, it provides a useful environment for pattern-based machine learning.

Proceedings ArticleDOI
29 Nov 1994
TL;DR: The study shows that attribute-oriented induction combined with rough set theory provide an efficient and effective mechanism for discovering decision rules in database systems.
Abstract: We develop an attribute-oriented rough set approach for the discovery of decision rules in relational databases. Our approach combines machine learning techniques and rough set theory. We consider a learning procedure to consist of two phases: data generalization and data reduction. In the data generalization phase, utilizing knowledge about concept hierarchies and relevance of the data, an attribute-oriented induction is performed attribute by attribute. Some undesirable attributes of the discovery task are removed and the primitive data in the databases are generalized to the desirable level; this process greatly decreases the number of tuples which must be examined for the discovery task and substantially reduces the computational complexity of the database learning processes. Subsequently, in the data reduction phase, rough set theory is applied to the generalized relation; the cause-effect relationships among the condition and decision attributes in the databases are analyzed, and attributes that are non-essential or irrelevant to the discovery task are eliminated without losing information of the original database system. This process further reduces the generalized relation. Thus very concise and accurate decision rules for each class of the decision attribute, with little or no redundant information, can be extracted automatically from the reduced relation during the learning process. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for discovering decision rules in database systems.
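
The generalization phase is easy to picture: climb each value one level in a concept hierarchy, then merge identical generalized tuples with a count, shrinking the relation that the rough-set reduction must process. The hierarchy and relation below are toy assumptions.

```python
# Attribute-oriented generalization: replace values by hierarchy parents,
# then merge duplicate generalized tuples with a multiplicity count.
from collections import Counter

HIERARCHY = {
    "biology": "science", "physics": "science",
    "painting": "art", "music": "art",
}

def generalize(relation, attr_index):
    out = Counter()
    for tup in relation:
        g = list(tup)
        g[attr_index] = HIERARCHY.get(g[attr_index], g[attr_index])
        out[tuple(g)] += 1
    return out

students = [("ann", "biology"), ("bob", "physics"), ("cho", "music")]
# Drop the irrelevant name attribute, then generalize "major" one level up.
majors = [(m,) for _, m in students]
print(generalize(majors, 0))   # Counter({('science',): 2, ('art',): 1})
```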

Proceedings ArticleDOI
01 Jul 1994
TL;DR: An algebra of infinitary tuples is presented and its unification problem is solved; the resulting type discipline preserves principal typings and has a terminating type reconstruction algorithm.
Abstract: An ML-style type system with variable-arity procedures is defined that supports both optional arguments and arbitrarily-long argument sequences. A language with variable-arity procedures is encoded in a core-ML variant with infinitary tuples. We present an algebra of infinitary tuples and solve its unification problem. The resulting type discipline preserves principal typings and has a terminating type reconstruction algorithm. The expressive power of infinitary tuples is illustrated.

Book ChapterDOI
16 Oct 1994
TL;DR: It is claimed that, whenever possible, the training/test examples should be represented as ground Horn clauses, rather than as tuples of a relational database or facts of a Prolog database.
Abstract: In the paper, we present some learning tasks that cannot be solved by two well-known systems, FOIL and FOCL. Two kinds of explanations can be provided for these failures. For some tasks, the failures can be ascribed to a wrong definition of the space in which these systems perform the search for logical definitions. By moving from θ-subsumption to a weaker, but more mechanizable and manageable, model of generalization, called θOI-subsumption, a new search space is defined in which such tasks can be solved. Such a solution has been implemented in a new version of FOCL, called FOCL-OI. However, other learning tasks cannot be solved by changing the search space. For these tasks, the conceptual problem detected both in FOIL and in FOCL concerns the generation of meaningless rules, which do not mirror at all the structure of the training instances. We claim that, whenever possible, the training/test examples should be represented as ground Horn clauses, rather than as tuples of a relational database or facts of a Prolog database.

01 Jan 1994
TL;DR: An abstract type is obtained by a tuple type which contains a type and a set of constants and operations over that type and can be implemented by providing a type, a value of that type, and an operation from that type to Int.
Abstract: tuples An abstract type is obtained by a tuple type which contains a type and a set of constants and operations over that type. . Let T = Tuple A::TYPE a:A f(x:A):Int end; ‡ Let T::TYPE = Tuple A::TYPE a:A f(x:A):Int end This abstract type can be implemented by providing a type, a value of that type, and an operation from that type to Int:

01 Jan 1994
TL;DR: A functional database language OR-SML for handling disjunctive information in database queries, and its implementation on top of Standard ML, are described; the core language has the power of the nested relational algebra and is augmented with or-sets, which gives the language a greater flexibility.
Abstract: We describe a functional database language OR-SML for handling disjunctive information in database queries, and its implementation on top of Standard ML. The core language has the power of the nested relational algebra, and it is augmented with or-sets which are used to deal with disjunctive information. Sets, or-sets and tuples can be freely combined to create objects, which gives the language a greater flexibility. We give examples of queries which require disjunctive information (such as querying incomplete or independent databases) and show how to use the language to answer these queries. Since the system runs on top of Standard ML and all database objects are values in the latter, the system benefits from combining a sophisticated query language with the full power of a programming language. OR-SML includes a number of primitives that deal with bags and aggregate functions. It is also configurable by user-defined base types. The language has been implemented as a library of modules in Standard ML. This allows the user to build just the database language as an independent system, or to interface it to other systems built in Standard ML. We give an example of connecting OR-SML with an already existing interactive theorem prover.
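
The or-set construct can be sketched in a few lines: an ordinary collection denotes all of its elements, an or-set denotes exactly one of them, and normalizing a collection of or-sets enumerates the possible worlds. The class below is an illustrative model, not OR-SML syntax.

```python
# Or-sets for disjunctive information: normalizing a collection of or-sets
# yields an or-set of ordinary sets, one per way of choosing.
from itertools import product

class OrSet:
    def __init__(self, *options):
        self.options = list(options)

    def __repr__(self):
        return f"OrSet{tuple(self.options)}"

def normalize(collection):
    """Collection of or-sets -> or-set of sets: enumerate every choice."""
    pools = [x.options if isinstance(x, OrSet) else [x] for x in collection]
    return OrSet(*(frozenset(choice) for choice in product(*pools)))

# "The salary is 40 or 50, and the office is 3 or 7" => four possible worlds.
record = [OrSet(("salary", 40), ("salary", 50)),
          OrSet(("office", 3), ("office", 7))]
print(normalize(record))
```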

Proceedings ArticleDOI
12 Oct 1994
TL;DR: The design of a parallel library for MPI based on the Linda programming paradigm, called Eilean, provides a scalable distribution of the tuple space through a hierarchical (or cluster) partitioning scheme, and tuple type specific access/distribution policies.
Abstract: We introduce the design of a parallel library for MPI based on the Linda programming paradigm, called Eilean. It provides a scalable distribution of the tuple space through a hierarchical (or cluster) partitioning scheme, and tuple type specific access/distribution policies. Portability of the library is achieved using the message passing standard MPI as the underlying communication system. The hierarchical distribution provides a static, yet general partition of the tuple space through the available processors. With such general structure, the run-time system, aided by programmer hints, can map tuples closely to the processes which use them. To accomplish the tuple mapping task, Eilean treats tuples differently according to their use within the application. The required information about tuple access patterns is explicitly given by the programmer and used during run-time.