
Showing papers on "Graph database published in 2001"


Proceedings ArticleDOI
29 Nov 2001
TL;DR: The empirical results show that the algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.
Abstract: As data mining techniques are being increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used, as they cannot model the requirements of these domains. An alternate way of modeling the objects in these data sets is to use graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. We present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and that it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though we have to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism, which are not necessary for traditional frequent itemset discovery.

1,181 citations
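The hard part of this task is deciding when two subgraphs are "the same"; a canonical label makes isomorphic fragments compare equal. As a hedged illustration (this is not the paper's algorithm, which mines subgraphs of arbitrary size), the sketch below counts single-edge subgraphs over toy graph transactions, where canonical labeling reduces to sorting the endpoint labels:

```python
from collections import defaultdict

def canonical_edge(u_label, v_label, e_label):
    """Canonical form of a labeled edge: order the endpoint labels so
    that isomorphic edges (same labels, either orientation) compare equal."""
    a, b = sorted((u_label, v_label))
    return (a, e_label, b)

def frequent_edges(transactions, min_support):
    """Count single-edge subgraphs over a set of graph transactions.
    Each transaction is (node_labels dict, edge list of (u, v, e_label))."""
    counts = defaultdict(int)
    for node_labels, edges in transactions:
        seen = set()
        for u, v, e_label in edges:
            seen.add(canonical_edge(node_labels[u], node_labels[v], e_label))
        for key in seen:  # support = transactions containing the edge,
            counts[key] += 1  # not total occurrences
    return {k: c for k, c in counts.items() if c >= min_support}

# Two toy "chemical compound" transactions (labels are hypothetical data)
t1 = ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (1, 2, "single")])
t2 = ({0: "O", 1: "C"}, [(0, 1, "single")])
print(frequent_edges([t1, t2], min_support=2))  # {('C', 'single', 'O'): 2}
```

Growing these one-edge seeds into larger frequent subgraphs is where the canonical labeling and subgraph isomorphism costs mentioned in the abstract come in.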


Proceedings ArticleDOI
27 Mar 2001
TL;DR: This work describes techniques for compressing the graph structure of the Web, and gives experimental results of a prototype implementation and attempts to exploit a variety of different sources of compressibility of these graphs and of the associated set of URLs in order to obtain good compression performance on a large Web graph.
Abstract: A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure has proven to be extremely useful for improving the performance of search engines and other tools for navigating the Web. However, since the graphs in these scenarios involve hundreds of millions of nodes and even more edges, highly space-efficient data structures are needed to fit the data in memory. A first step in this direction was done by the DEC connectivity server, which stores the graph in compressed form. We describe techniques for compressing the graph structure of the Web, and give experimental results of a prototype implementation. We attempt to exploit a variety of different sources of compressibility of these graphs and of the associated set of URLs in order to obtain good compression performance on a large Web graph.

140 citations
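A sketch of the kind of compressibility such techniques exploit (an illustration of gap-plus-varint coding generally, not the paper's exact scheme): when adjacency lists are sorted, links tend to point to nearby node ids, so the gaps between successive neighbors are small and fit in few bytes under a variable-length encoding:

```python
def encode_varint(n):
    """Variable-length byte encoding: 7 payload bits per byte,
    high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def compress_adjacency(neighbors):
    """Gap-encode a sorted adjacency list, then varint-encode each gap."""
    gaps = [neighbors[0]] + [b - a for a, b in zip(neighbors, neighbors[1:])]
    return b"".join(encode_varint(g) for g in gaps)

adj = [1000000, 1000003, 1000007, 1000050]  # hypothetical node ids
blob = compress_adjacency(adj)
print(len(blob), "bytes vs", 4 * len(adj), "bytes as 32-bit ints")  # 6 vs 16
```

Only the first entry pays the full cost of a large id; the three small gaps take one byte each.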


Proceedings ArticleDOI
31 Mar 2001
TL;DR: QuerySketch, a financial database application in which graphs are used for query input as well as output, allows users to sketch a graph freehand, then view stocks whose price histories match the sketch.
Abstract: Sequential data is easily understood through a simple line graph, yet systems to search such data typically rely on complex interfaces or query languages. This paper presents QuerySketch, a financial database application in which graphs are used for query input as well as output. QuerySketch allows users to sketch a graph freehand, then view stocks whose price histories match the sketch. Using the same graphical format for both input and output results in an interface that is powerful, flexible, yet easy to use.

117 citations
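One plausible way to match a freehand sketch against price histories (the abstract does not give the exact distance measure, so this is an assumption) is to normalize both curves to a common scale and rank stocks by Euclidean distance:

```python
def normalize(series):
    """Scale a series to [0, 1] and subtract the mean, so a sketch drawn
    at an arbitrary scale can be compared with real price histories."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid dividing by zero for flat series
    scaled = [(x - lo) / span for x in series]
    mean = sum(scaled) / len(scaled)
    return [x - mean for x in scaled]

def sketch_distance(sketch, prices):
    # Assumes the sketch has been resampled to the series length.
    a, b = normalize(sketch), normalize(prices)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_matches(sketch, histories, k=3):
    """Rank (name, price history) pairs by similarity to the sketch."""
    ranked = sorted(histories, key=lambda kv: sketch_distance(sketch, kv[1]))
    return [name for name, _ in ranked[:k]]

histories = {"UP": [1, 2, 3, 4], "DOWN": [4, 3, 2, 1], "FLAT": [2, 2, 2, 2]}
print(best_matches([10, 20, 30, 40], list(histories.items()), k=1))
```

The normalization is what makes "same graphical format for input and output" workable: the user's sketch and the stored histories live on different scales.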


Patent
11 Dec 2001
TL;DR: In this article, a database query optimizer constructs a graph comprising nodes, relations, and expressions, and then constructs execution plans for sub-parts of the graph, making up the overall execution plan for the query.
Abstract: A database query optimizer constructs a graph comprising nodes, relations, and expressions. The query optimizer then constructs execution plans for sub-parts of the graph. The combination of execution plans makes up the overall execution plan for the query. The execution plan information is appended to the graph itself, allowing an execution plan in one portion of the graph to be changed without necessarily changing execution plans in other portions of the graph. By representing a query using the graph of the preferred embodiments, which includes execution plan information, the query optimizer is able to evaluate the execution plans of different options quickly and efficiently, thereby enhancing its performance.

103 citations


Journal ArticleDOI
TL;DR: This work proposes a graph-based approach to generate various types of association rules from a large database of customer transactions, and shows that its algorithms outperform other algorithms which need to make multiple passes over the database.
Abstract: Mining association rules is an important task for knowledge discovery. We can analyze past transaction data to discover customer behaviors, so that the quality of business decisions can be improved. Various types of association rules may exist in a large database of customer transactions. The strategy of mining association rules focuses on discovering large itemsets, which are groups of items that appear together in a sufficient number of transactions. We propose a graph-based approach to generate various types of association rules from a large database of customer transactions. This approach scans the database once to construct an association graph and then traverses the graph to generate all large itemsets. Empirical evaluations show that our algorithms outperform other algorithms which need to make multiple passes over the database.

84 citations
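The one-scan construction can be sketched as follows (a simplified illustration, not the authors' full algorithm; in particular, itemsets generated from the graph are candidates whose support would still be verified in the real method):

```python
from collections import defaultdict
from itertools import combinations

def build_association_graph(transactions, min_support):
    """Single scan over the transactions: count every item pair, then keep
    an edge between items whose co-occurrence count meets min_support."""
    pair_counts = defaultdict(int)
    for t in transactions:
        for a, b in combinations(sorted(set(t)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), c in pair_counts.items():
        if c >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def large_itemsets_from_graph(graph, size):
    """Candidate itemsets of a given size: every pair inside the set must
    be an edge, i.e. the set is a clique in the association graph."""
    items = sorted(graph)
    return [s for s in combinations(items, size)
            if all(b in graph[a] for a, b in combinations(s, 2))]

txns = [["milk", "bread", "butter"], ["milk", "bread"], ["bread", "butter"]]
g = build_association_graph(txns, min_support=2)
print(large_itemsets_from_graph(g, 2))
```

The key saving is that after the single counting pass, candidate generation traverses the in-memory graph instead of rescanning the database.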


Proceedings ArticleDOI
25 Jul 2001
TL;DR: This paper characterizes the logical graph using various metrics, identifies the presence of power laws in the number of customers that a provider has, and defines a structural model of the AS graph, which highlights the hierarchical nature of logical relationships and the preferential connection to larger providers.
Abstract: The study of the Internet topology has recently received much attention from the research community. In particular, it has been observed that the network graph has interesting properties, such as power laws, that might be exploited in a myriad of ways. Most of the work in characterizing the Internet graph is based on the physical network graph, i.e., the connectivity graph. In this paper we investigate how logical relationships between nodes of the AS graph can be used to gain insight into its structure. We characterize the logical graph using various metrics and identify the presence of power laws in the number of customers that a provider has. Using these logical relationships we define a structural model of the AS graph. The model highlights the hierarchical nature of logical relationships and the preferential connection to larger providers. We also investigate the consistency of this model over time and observe interesting properties of the hierarchical structure.

73 citations


Journal ArticleDOI
01 Jun 2001
TL;DR: The problem of finding traversal patterns from collections of frequently occurring access sequences is examined, and three algorithms are presented: one which is level-wise with respect to the lengths of the patterns and two which are not.
Abstract: In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models involves the determination of frequently occurring access sequences. In this paper, the problem of finding traversal patterns from such collections is examined. The determination of patterns is based on the graph structure of the model. For this purpose, three algorithms are presented: one which is level-wise with respect to the lengths of the patterns and two which are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take this fact into account. The performance of all algorithms and their sensitivity to several parameters are examined experimentally.

66 citations
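The notion of a pattern interleaved with random accesses is essentially subsequence matching over user sessions. A minimal sketch (ignoring the graph-structure constraint the paper imposes on consecutive accesses, and using hypothetical page names):

```python
def occurs_with_gaps(pattern, sequence):
    """True if the pattern occurs as a subsequence of the access sequence:
    its pages appear in order, possibly interleaved with other,
    purely navigational accesses."""
    it = iter(sequence)
    # any() advances the shared iterator, so matches must be in order
    return all(any(x == p for x in it) for p in pattern)

def pattern_support(pattern, sessions):
    """Number of sessions that contain the traversal pattern."""
    return sum(occurs_with_gaps(pattern, s) for s in sessions)

sessions = [["A", "X", "B", "C"],   # matches: X is a random access
            ["A", "B", "Y", "C"],   # matches
            ["B", "A", "C"]]        # no match: B precedes A
print(pattern_support(["A", "B", "C"], sessions))  # 2
```

A level-wise algorithm would count patterns of length k only after their length-(k-1) prefixes proved frequent; the two non-level-wise algorithms in the paper avoid that restriction.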


01 Dec 2001
TL;DR: This paper demonstrates the application of graph drawing and information visualisation techniques to the visualisation of information which can be modelled as an attributed graph, and proposes the novel Composable Layouts and Visual Sets (CLOVIS) class of views.
Abstract: This paper demonstrates the application of graph drawing and information visualisation techniques to the visualisation of information which can be modelled as an attributed graph. An attributed graph can be used to model a wide range of different types of information, including system descriptions and database content. We propose the novel Composable Layouts and Visual Sets (CLOVIS) class of views, and describe supporting software component infrastructure, including a user interface for creating and interacting with CLOVIS views. A framework for composing graph vertex layouts is presented, including the division of responsibilities between the layout strategies being composed and the mechanism for coordinating their execution. Three broad classes of layout strategy are identified, and opportunities for novel hybrid layouts highlighted. The definition of sets of graph elements and the allocation or overlaying of distinctive visual attributes to members of the set are combined under the notion of a visual set. A visual querying mechanism for the allocation of graph elements to visual sets and for the clustering of graph vertices in preparation for layout composition is described. The versatility of the CLOVIS view family is demonstrated through its application to a variety of problem domains, and future research directions are identified.

26 citations


Journal ArticleDOI
TL;DR: The paper discusses the formulation, evaluation, expressiveness, and optimization of Hyperlog queries and programs and compares and contrast the approach with work in a number of related areas, including visual database languages, graph based data models, database update languages, and production rule systems.
Abstract: Hyperlog is a declarative, graph based language that supports database querying and update. It visualizes schema information, data, and query output as sets of nested graphs, which can be stored, browsed, and queried in a uniform way. Thus, the user need only be familiar with a very small set of syntactic constructs. Hyperlog queries consist of a set of graphs that are matched against the database. Database updates are supported by means of programs consisting of a set of rules. The paper discusses the formulation, evaluation, expressiveness, and optimization of Hyperlog queries and programs. We also describe a prototype implementation of the language and we compare and contrast our approach with work in a number of related areas, including visual database languages, graph based data models, database update languages, and production rule systems.

22 citations


Proceedings ArticleDOI
02 Oct 2001
TL;DR: This work implements a system that builds an understanding of a given conventional database by taking these characteristics as input and producing the corresponding object-oriented database as output, deriving a graph that summarizes the conceptual model.
Abstract: The object-oriented data model is predicted to be the heart of the next generation of database systems. Users want to move from old legacy databases to this new technology, which provides extensibility and flexibility in maintenance. However, a major limitation on the wide acceptance of object-oriented databases is the amount of time and money invested in existing database applications, which are based on conventional legacy systems. Users do not want to lose the huge amounts of data present in conventional databases. This paper presents a novel approach to transform a given conventional database into an object-oriented database. It is assumed that the necessary characteristics of the conventional database to be re-engineered are known and available. The source of these characteristics might be the data dictionary and/or an expert in the given conventional database. We implemented a system that builds an understanding of a given conventional database by taking these characteristics as input and produces the corresponding object-oriented database as output. The system derives a graph that summarizes the conceptual model. Links in the graph are classified into inheritance links and aggregation links. This classification leads to the class hierarchy. Finally, we handle the migration of data from the conventional database to the constructed object-oriented database.

18 citations


01 Dec 2001
TL;DR: This paper introduces a new model for Attributed Information Visualization that can be used to visualize the relational data with many associated attributes, and split the visualization model into two mappings between the abstract data and physical pictures, Geometric Mapping and Graphical Mapping.
Abstract: Traditional graph drawing is concerned only with viewing abstract data and the relations among data items. It uses a graph model to present the data items and the relations, and tries to geometrically convert the abstract graph into a 2D plane for visualization. There are many applications in this area, such as family trees, software design diagrams and web site-maps. The real-world data that we want to visualize, however, is more complex than what can be presented with traditional techniques, because it contains many domain-specific attributes. For example, in a web site-map visualization a simple graphical node can be used to represent a web page. However, this node is unable to represent some domain-specific attributes associated with the web page, such as the access frequency and the connectivity of the page in the web locality. This paper introduces a new model for Attributed Information Visualization that can be used to visualize relational data with many associated attributes. We split the visualization model into two mappings between the abstract data and physical pictures: Geometric Mapping and Graphical Mapping.

Journal ArticleDOI
TL;DR: JGAP can be viewed as a visual graph calculator for helping experiment with and teach graph algorithm design and includes a performance meter to measure the execution time of implemented algorithms.
Abstract: We describe JGAP, a web-based platform for designing and implementing Java-coded graph algorithms. The platform contains a library of common data structures for implementing graph algorithms, features a "plug-and-play" modular design for adding new algorithm modules, and includes a performance meter to measure the execution time of implemented algorithms. JGAP is also equipped with a graph editor to generate and modify graphs to have specific properties. JGAP's graphical user interface further allows users to compose, in a functional way, computation sequences from existing algorithm modules, so that output from one algorithm is used as input for another. Hence, JGAP can be viewed as a visual graph calculator for helping experiment with and teach graph algorithm design. Copyright 2001 John Wiley & Sons, Ltd.

01 Jan 2001
TL;DR: A data representation based on graph theory which captures the highly interconnected structure of genome data is developed which serves as the foundation of a graph database management system.
Abstract: Genome databases have specific requirements which limit the usefulness of some database management systems. By using more appropriate database technology, a database system can be developed for genome data. We have developed a data representation based on graph theory which captures the highly interconnected structure of genome data. Graphs are a language which can be tailored for describing genomic information, and we develop a data model based on graphs which serves as the foundation of a graph database management system. IEEE Engineering in Medicine and Biology special issue on Managing Data for the Human Genome Project.

Journal ArticleDOI
TL;DR: It is found that the proposed graph indexing technique is a promising approach for reducing the cost of spatial queries, and that it can significantly improve the efficiency of constrained queries on spatial data.

Book ChapterDOI
TL;DR: A 3-D graphic data model based on XML that accommodates the semantics of3-D scenes that offers content-based retrievals of scenes containing a particular object or those satisfying certain spatial constraints on them is developed.
Abstract: Supporting the semantics of 3-D objects and their spatial relations in database systems has been little addressed in the literature. Despite its importance, most 3-D graphic systems lack this capability, mainly focusing on the visualization aspects of 3-D images. We have developed a 3-D graphic data model based on XML that accommodates the semantics of 3-D scenes. This model offers content-based retrievals of scenes containing a particular object or those satisfying certain spatial constraints on them. The model represents scenes as compositions of 3-D graphic objects with associated spatial relations. Complex 3-D objects are modeled using a set of primitive 3-D objects rather than the lines and polygons that are found in traditional graphic systems. This paper presents the data model and its implementation called 3DGML, an XML vocabulary that we developed for modeling 3-D graphic data. This paper also describes a Web-based prototype database system that we developed to support the data model.

Book ChapterDOI
16 Apr 2001
TL;DR: This paper presents a novel graph-based algorithm for solving the semi-supervised learning problem that makes use of recent advances in stochastic graph sampling techniques and a modeling of the labeling consistency in semi-supervised learning.
Abstract: This paper presents a novel graph-based algorithm for solving the semi-supervised learning problem. The graph-based algorithm makes use of recent advances in stochastic graph sampling techniques and a modeling of the labeling consistency in semi-supervised learning. The quality of the algorithm is empirically evaluated on a synthetic clustering problem. The semi-supervised clustering is also applied to the problem of symptom classification in a medical image database and shows promising results.

Book ChapterDOI
25 Sep 2001
TL;DR: The abstract database machine is designed to meet two goals: to be expressive enough to implement queries and updates, as considered for schema design, and to be simple enough to allow cost estimations.
Abstract: The process of designing an object-oriented database schema consists of several phases. During the phase of abstract logical formalisation one of many possible abstract object-oriented database schemas must be chosen. This choice can be driven by the costs of the ultimately implemented schema: How much space is needed? How long does it take to compute queries and updates including enforcement of semantic constraints? Because abstract logical formalisation is done independently of an actual database management system, we need an abstract database machine. Queries and updates are formulated as programs for this database machine. Such programs are composed of steps which are connected by channels for typed streams of value lists. In each step, a basic or compound operation is executed, accepting input streams and further parameters, delivering output streams for subsequent steps, and accessing the persistent database state. The abstract database machine is designed to meet two goals: to be expressive enough to implement queries and updates, as considered for schema design, and to be simple enough to allow cost estimations.

01 Jan 2001
TL;DR: A cluster analysis-based approach to semi-automate the IRI process, which is typically very time-consuming and requires extensive human interaction, and initial experimental results indicate that this approach performs better than existing approaches in the accuracy of identified interschema relationships.
Abstract: Interschema Relationship Identification (IRI), i.e., determining the relationships between objects in heterogeneous database schemas, is critical to both the classical schema integration problem and the data cleansing and consolidation phase that precedes data warehouse development. In this paper we propose a cluster analysis-based approach to semi-automate the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and Self-Organizing Map (SOM), to identify similar database objects from heterogeneous databases based on a combination of features such as object names, documentation, schematic information, data contents, and usage patterns. Initial experimental results indicate that our approach performs better than existing approaches in the accuracy of identified interschema relationships. In addition, a prototype system we have developed provides users a visualization tool for the display of clustering results as well as for the incremental evaluation of candidate solutions.
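One of the features mentioned, object names, already supports a tiny end-to-end illustration (a hedged sketch, not the paper's K-means/SOM pipeline; the attribute names are hypothetical): compare names by character-trigram similarity and group them greedily:

```python
def trigrams(name):
    """Character trigrams of a name, padded so short names still overlap."""
    s = f"  {name.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Jaccard similarity over trigrams -- just one of several features
    (documentation, data content, usage patterns) the paper combines."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def cluster(names, threshold):
    """Greedy single-link clustering: a name joins the first cluster
    containing a sufficiently similar member, else starts a new one."""
    clusters = []
    for n in names:
        for c in clusters:
            if any(similarity(n, m) >= threshold for m in c):
                c.append(n)
                break
        else:
            clusters.append([n])
    return clusters

names = ["cust_name", "customer_name", "order_date", "ord_date"]
print(cluster(names, threshold=0.4))
```

Clusters whose members come from different schemas become candidate interschema relationships for a human to confirm, which is what makes the process semi-automated.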

Proceedings ArticleDOI
Simon M. Lucas, A.C. Tams, S.J. Cho, S. Ryu, A.C. Downton
10 Sep 2001
TL;DR: A novel robust approach to enable efficient searching of the type-written text on museum archive cards by sliding a classifier over the entire word or card image, such that a set of recognition hypotheses for each possible window position gives rise to a large character hypothesis graph.
Abstract: We describe a novel robust approach to enable efficient searching of the type-written text on museum archive cards. Depending on such factors as the state of the typewriter and its ribbon, these text images may be faint with parts of the character missing, or be in heavy type with adjacent characters merging together. Both these problems can make this kind of text hard to read with conventional OCR methods that rely on the use of a limited number of segmentation hypotheses prior to recognition. Our method involves sliding a classifier over the entire word or card image, such that we get a set of recognition hypotheses for each possible window position which gives rise to a large character hypothesis graph. We then apply a graph reduction followed by an efficient graph search method to search for words in the reduced graph. Results so far are promising, with our system achieving 45% word recognition accuracy compared to the 25% achieved by a leading commercial package. However, searching the original larger graphs is much slower but yields 85% accuracy; so further work is needed either in improving the graph reduction method, or in improving the efficiency with which we can search the larger graph.
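The hypothesis-graph search can be illustrated with a heavily simplified lattice model (assumptions for illustration: one hypothesis set per window position, each character of the word drawn from a strictly later window than the previous one), scored by dynamic programming:

```python
def word_score(word, lattice):
    """Best total score for spelling `word` along a hypothesis lattice.
    `lattice` is a list of {char: score} dicts, one per window position;
    windows may be skipped (e.g. faint or merged characters).
    DP state: best[i] = best score spelling the first i characters."""
    NEG = float("-inf")
    n = len(word)
    best = [NEG] * (n + 1)
    best[0] = 0.0
    for hyps in lattice:
        for i in range(n, 0, -1):  # descending, so one window feeds
            c = word[i - 1]        # at most one character
            if best[i - 1] > NEG and c in hyps:
                cand = best[i - 1] + hyps[c]
                if cand > best[i]:
                    best[i] = cand
    return best[n]

# Hypothetical classifier outputs for four window positions
lattice = [{"c": 0.9, "o": 0.2}, {"a": 0.8, "o": 0.6},
           {"t": 0.7}, {"t": 0.3}]
print(word_score("cat", lattice))
```

Searching a full word list this way is linear in lattice size per word, which is why reducing the graph first matters so much for the paper's speed/accuracy trade-off.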

Book ChapterDOI
23 Sep 2001
TL;DR: The Graph Drawing Server (GDS) seeks to remove many obstacles by providing a graph drawing and translation service with an easy-to-use web-based interface; a user needs only a commonly-available web browser to access a variety of algorithms.
Abstract: There are many obstacles in the way of someone wishing to make use of existing graph drawing technology: software installation and data conversion can be time-consuming and may be prohibitively difficult for the casual or novice user, and software may be limited to a particular platform or provided interface. The Graph Drawing Server (GDS) [2] seeks to remove many of these obstacles by providing a graph drawing and translation service with an easy-to-use web-based interface. A user needs only a commonly-available web browser to access a variety of algorithms, without having to install any additional software or do any format translations (once the data is in one of many supported formats). GDS has received over 62,000 requests from 43 countries since June 1996.

Proceedings ArticleDOI
02 Apr 2001
TL;DR: The proposed GOOM aims at extracting the positive features of both object and relational data models besides providing a structured graphic representation and maintains a very similar interface to that of widely accepted SQL.
Abstract: In an effort to design and develop a suitable platform for the next generation information systems, a new Graph Object Oriented database Model (GOOM) is proposed. In fact, the proposed model in its present definition is an improvement of our earlier work (S. Choudhury et al., 1998; 2000). The evolution is found to be absolutely essential in the interest of implementing the model and towards building a reliable Graph Query Language (GQL). We have introduced the concept of encapsulating homogeneous entities into semantic group while permitting the declaration of relationships amongst different semantic groups from various user-defined views. The proposed model aims at extracting the positive features of both object and relational data models besides providing a structured graphic representation. The paper emphasizes the mathematical foundation of the proposed graph language. In fact, the proposed GQL maintains a very similar interface to that of widely accepted SQL.

Journal Article
TL;DR: In this paper, the authors extend FARG's single-mode attribute to multiple attributes for real image applications and present a new CBIR system using FMARG (Fuzzy Multiple Attribute Relational Graph), which can handle queries involving multiple attributes: not only object label, but also color, texture and spatial relation.
Abstract: In this paper, we extend FARG's single-mode attribute to multiple attributes for real image applications and present a new CBIR system using FMARG (Fuzzy Multiple Attribute Relational Graph), which can handle queries involving multiple attributes: not only object label, but also color, texture and spatial relation. In experiments using a synthetic image database of 1,024 images and a natural image database of 1,026 images built from the NETRA database and Corel Draw, the proposed approach shows a 6-30% recall increase in the synthetic image database and good performance, in terms of displacements and the number of similar images retrieved, in the natural image database, compared with the single-attribute approach.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the requirements of a specification method for mobile code applications and analyze to what extent graph transformation systems can be used to meet these requirements, and suggest some extensions to the theory of graph transformation which seem to be desirable to cope with this kind of applications.

Book ChapterDOI
23 Oct 2001
TL;DR: This paper presents a web-based information retrieval system for 3-D graphic data that offers a content-based retrieval for3-D scenes that few graphic database systems are capable of.
Abstract: This paper presents a web-based information retrieval system for 3-D graphic data. We describe a 3-D database system and its web-based user interface supporting semantics of 3-D objects. Our system offers a content-based retrieval for 3-D scenes that few graphic database systems are capable of. The user can pose a visual query involving 3-D shapes and spatial relations on the web interface. The data model underlying the retrieval system models 3-D scenes using domain objects and their spatial relations. An XML-based data modeling language called 3DGML has been designed to support the data model. It offers an object-oriented 3-D image modeling mechanism that separates low level implementation details of 3-D objects from their semantic roles in a 3-D scene. We discuss the retrieval system and the data modeling technique in detail. We believe our work is one of the earliest efforts to take advantage of XML for 3-D graphics.

Patent
06 Dec 2001
TL;DR: In this paper, a method for packaging an object graph is presented: receiving a usage variable specification that includes a set of usages, each usage specifying an attribute of an object in the object graph; creating a transient object graph representation containing the attributes specified in the usage variable specification; and packaging the transient object graph representation.
Abstract: A method for packaging an object graph, including: receiving a usage variable specification that includes a set of usages, each usage specifying an attribute of an object in the object graph; creating a transient object graph representation containing the attributes specified in the usage variable specification; and packaging the transient object graph representation.

Journal Article
TL;DR: This paper compares the knowledge graph ontology with other ontologies, such as Aristotle's, Kant's and Peirce's, and studies the classification of logic words in natural language processing, showing that knowledge graph theory is more primitive than the others.
Abstract: Knowledge graph theory is a new method of knowledge representation. In this paper, we compare the knowledge graph ontology with other ontologies, such as Aristotle's, Kant's and Peirce's. As a result, knowledge graph theory is shown to be more primitive than the others. On the basis of this comparison, the classification of logic words in natural language processing is also studied. The logic words are classified into two kinds, according to their different structures in knowledge graphs. For each kind of logic word, we analyze the word graphs in the form of the knowledge graph. In this way, the idea that "structure is meaning" is expressed more clearly.

Proceedings ArticleDOI
17 Dec 2001
TL;DR: Fault tolerance for functional programming based on graph reduction is proposed: the received graph is stored as a message log, and an erroneous task is recovered by using the checkpoint and the stored graph.
Abstract: Recently, parallel computing has been applied to many systems. Functional programming is suitable for parallel programming because of referential transparency, and has been applied to symbol processing systems and parallel database systems. Functional programs can be regarded as graphs and are processed in terms of reduction of the corresponding graph. The paper proposes fault tolerance for functional programming based on graph reduction. The proposed method stores the received graph as a message log, and an erroneous task is recovered by using the checkpoint and the stored graph. Computer simulations reveal that the time overhead of the proposed method is small: if the checkpoint interval is 30 seconds and the number of tasks is 3, for example, the time overhead is less than 10%.

Proceedings Article
01 Jan 2001
TL;DR: In this article, the authors propose an information structure graph (ISG) to represent meta-data that are not managed as a database schema, which is an abstraction of a data creation structure and may be applied to enhance our understanding of data.
Abstract: For an effective management of data, we need various kinds of meta-data. This article proposes a scheme--an information structure graph (ISG)--to represent meta-data that are not managed as a database schema. An ISG is a directed graph, where nodes represent data objects. It is built on a database schema and extends it to include data creation structures. For each data object in a database schema, an ISG shows its input data objects and a data creation type. An ISG is an abstraction of a data creation structure and may be applied to enhance our understanding of data.