
Showing papers on "Graph database" published in 2006


Book
01 Sep 2006
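TL;DR: An edited volume on mining graph data. Part I covers graph foundations (graph matching, visualization, and graph generators), Part II covers mining techniques (frequent substructure discovery, Subdue, graph grammar learning, graph-based induction, formal concept analysis, kernel methods, and entity resolution), and Part III covers applications (chemical graphs, rooted tree mining, dense subgraph extraction, and social network analysis).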
Abstract: Preface. Acknowledgments. Contributors.
1 INTRODUCTION (Lawrence B. Holder and Diane J. Cook). 1.1 Terminology. 1.2 Graph Databases. 1.3 Book Overview. References.
Part I GRAPHS.
2 GRAPH MATCHING-EXACT AND ERROR-TOLERANT METHODS AND THE AUTOMATIC LEARNING OF EDIT COSTS (Horst Bunke and Michel Neuhaus). 2.1 Introduction. 2.2 Definitions and Graph Matching Methods. 2.3 Learning Edit Costs. 2.4 Experimental Evaluation. 2.5 Discussion and Conclusions. References.
3 GRAPH VISUALIZATION AND DATA MINING (Walter Didimo and Giuseppe Liotta). 3.1 Introduction. 3.2 Graph Drawing Techniques. 3.3 Examples of Visualization Systems. 3.4 Conclusions. References.
4 GRAPH PATTERNS AND THE R-MAT GENERATOR (Deepayan Chakrabarti and Christos Faloutsos). 4.1 Introduction. 4.2 Background and Related Work. 4.3 NetMine and R-MAT. 4.4 Experiments. 4.5 Conclusions. References.
Part II MINING TECHNIQUES.
5 DISCOVERY OF FREQUENT SUBSTRUCTURES (Xifeng Yan and Jiawei Han). 5.1 Introduction. 5.2 Preliminary Concepts. 5.3 Apriori-based Approach. 5.4 Pattern Growth Approach. 5.5 Variant Substructure Patterns. 5.6 Experiments and Performance Study. 5.7 Conclusions. References.
6 FINDING TOPOLOGICAL FREQUENT PATTERNS FROM GRAPH DATASETS (Michihiro Kuramochi and George Karypis). 6.1 Introduction. 6.2 Background Definitions and Notation. 6.3 Frequent Pattern Discovery from Graph Datasets-Problem Definitions. 6.4 FSG for the Graph-Transaction Setting. 6.5 SIGRAM for the Single-Graph Setting. 6.6 GREW-Scalable Frequent Subgraph Discovery Algorithm. 6.7 Related Research. 6.8 Conclusions. References.
7 UNSUPERVISED AND SUPERVISED PATTERN LEARNING IN GRAPH DATA (Diane J. Cook, Lawrence B. Holder, and Nikhil Ketkar). 7.1 Introduction. 7.2 Mining Graph Data Using Subdue. 7.3 Comparison to Other Graph-Based Mining Algorithms. 7.4 Comparison to Frequent Substructure Mining Approaches. 7.5 Comparison to ILP Approaches. 7.6 Conclusions. References.
8 GRAPH GRAMMAR LEARNING (Istvan Jonyer). 8.1 Introduction. 8.2 Related Work. 8.3 Graph Grammar Learning. 8.4 Empirical Evaluation. 8.5 Conclusion. References.
9 CONSTRUCTING DECISION TREE BASED ON CHUNKINGLESS GRAPH-BASED INDUCTION (Kouzou Ohara, Phu Chien Nguyen, Akira Mogi, Hiroshi Motoda, and Takashi Washio). 9.1 Introduction. 9.2 Graph-Based Induction Revisited. 9.3 Problem Caused by Chunking in B-GBI. 9.4 Chunkingless Graph-Based Induction (Cl-GBI). 9.5 Decision Tree Chunkingless Graph-Based Induction (DT-ClGBI). 9.6 Conclusions. References.
10 SOME LINKS BETWEEN FORMAL CONCEPT ANALYSIS AND GRAPH MINING (Michel Liquiere). 10.1 Presentation. 10.2 Basic Concepts and Notation. 10.3 Formal Concept Analysis. 10.4 Extension Lattice and Description Lattice Give Concept Lattice. 10.5 Graph Description and Galois Lattice. 10.6 Graph Mining and Formal Propositionalization. 10.7 Conclusion. References.
11 KERNEL METHODS FOR GRAPHS (Thomas Gartner, Tamas Horvath, Quoc V. Le, Alex J. Smola, and Stefan Wrobel). 11.1 Introduction. 11.2 Graph Classification. 11.3 Vertex Classification. 11.4 Conclusions and Future Work. References.
12 KERNELS AS LINK ANALYSIS MEASURES (Masashi Shimbo and Takahiko Ito). 12.1 Introduction. 12.2 Preliminaries. 12.3 Kernel-based Unified Framework for Importance and Relatedness. 12.4 Laplacian Kernels as a Relatedness Measure. 12.5 Practical Issues. 12.6 Related Work. 12.7 Evaluation with Bibliographic Citation Data. 12.8 Summary. References.
13 ENTITY RESOLUTION IN GRAPHS (Indrajit Bhattacharya and Lise Getoor). 13.1 Introduction. 13.2 Related Work. 13.3 Motivating Example for Graph-Based Entity Resolution. 13.4 Graph-Based Entity Resolution: Problem Formulation. 13.5 Similarity Measures for Entity Resolution. 13.6 Graph-Based Clustering for Entity Resolution. 13.7 Experimental Evaluation. 13.8 Conclusion. References.
Part III APPLICATIONS.
14 MINING FROM CHEMICAL GRAPHS (Takashi Okada). 14.1 Introduction and Representation of Molecules. 14.2 Issues for Mining. 14.3 CASE: A Prototype Mining System in Chemistry. 14.4 Quantitative Estimation Using Graph Mining. 14.5 Extension of Linear Fragments to Graphs. 14.6 Combination of Conditions. 14.7 Concluding Remarks. References.
15 UNIFIED APPROACH TO ROOTED TREE MINING: ALGORITHMS AND APPLICATIONS (Mohammed Zaki). 15.1 Introduction. 15.2 Preliminaries. 15.3 Related Work. 15.4 Generating Candidate Subtrees. 15.5 Frequency Computation. 15.6 Counting Distinct Occurrences. 15.7 The SLEUTH Algorithm. 15.8 Experimental Results. 15.9 Tree Mining Applications in Bioinformatics. 15.10 Conclusions. References.
16 DENSE SUBGRAPH EXTRACTION (Andrew Tomkins and Ravi Kumar). 16.1 Introduction. 16.2 Related Work. 16.3 Finding the densest subgraph. 16.4 Trawling. 16.5 Graph Shingling. 16.6 Connection Subgraphs. 16.7 Conclusions. References.
17 SOCIAL NETWORK ANALYSIS (Sherry E. Marcus, Melanie Moy, and Thayne Coffman). 17.1 Introduction. 17.2 Social Network Analysis. 17.3 Group Detection. 17.4 Terrorist Modus Operandi Detection System. 17.5 Computational Experiments. 17.6 Conclusion. References.
Index.

455 citations


Proceedings ArticleDOI
11 Dec 2006
TL;DR: A new type of attack graph, the multiple-prerequisite graph, is created that scales nearly linearly as the size of a typical network increases and a prototype system is built using this graph type.
Abstract: Attack graphs are a valuable tool to network defenders, illustrating paths an attacker can use to gain access to a targeted network. Defenders can then focus their efforts on patching the vulnerabilities and configuration errors that allow the attackers the greatest amount of access. We have created a new type of attack graph, the multiple-prerequisite graph, that scales nearly linearly as the size of a typical network increases. We have built a prototype system using this graph type. The prototype uses readily available source data to automatically compute network reachability, classify vulnerabilities, build the graph, and recommend actions to improve network security. We have tested the prototype on an operational network with over 250 hosts, where it helped to discover a previously unknown configuration error. It has processed complex simulated networks with over 50,000 hosts in under four minutes.

404 citations


Proceedings ArticleDOI
03 Apr 2006
TL;DR: The concept of a graph closure, a generalized graph that represents a number of graphs, is introduced and the indexing technique, called Closure-tree, organizes graphs hierarchically where each node summarizes its descendants by a graph closure.
Abstract: Graphs have become popular for modeling structured data. As a result, graph queries are becoming common and graph indexing has come to play an essential role in query processing. We introduce the concept of a graph closure, a generalized graph that represents a number of graphs. Our indexing technique, called Closure-tree, organizes graphs hierarchically where each node summarizes its descendants by a graph closure. Closure-tree can efficiently support both subgraph queries and similarity queries. Subgraph queries find graphs that contain a specific subgraph, whereas similarity queries find graphs that are similar to a query graph. For subgraph queries, we propose a technique called pseudo subgraph isomorphism which approximates subgraph isomorphism with high accuracy. For similarity queries, we measure graph similarity through edit distance using heuristic graph mapping methods. We implement two kinds of similarity queries: K-NN query and range query. Our experiments on chemical compounds and synthetic graphs show that for subgraph queries, Closure-tree outperforms existing techniques by up to two orders of magnitude in terms of candidate answer set size and index size. For similarity queries, our experiments validate the quality and efficiency of the presented algorithms.

332 citations
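
The graph closure idea in the Closure-tree entry above can be illustrated with a minimal sketch. It assumes the graphs are already aligned, meaning that a shared integer vertex id stands in for the vertex mapping that the paper computes heuristically. The function name `graph_closure`, the `EPS` dummy label, and the tuple-of-dicts graph encoding are illustrative choices, not the authors' implementation.

```python
EPS = "eps"  # hypothetical dummy label for a vertex or edge missing in some graph

def graph_closure(graphs):
    """Summarize aligned graphs as one generalized graph.

    Each graph is (vertex_labels, edge_labels), where vertex_labels maps a
    vertex id to a label and edge_labels maps a (u, v) pair to a label.
    The closure maps every vertex and edge to the SET of labels seen
    across the graphs, adding EPS where a graph lacks that element.
    """
    vertices = set().union(*(g[0].keys() for g in graphs))
    edges = set().union(*(g[1].keys() for g in graphs))
    vertex_closure = {v: {g[0].get(v, EPS) for g in graphs} for v in vertices}
    edge_closure = {e: {g[1].get(e, EPS) for g in graphs} for e in edges}
    return vertex_closure, edge_closure

# Two small labeled graphs sharing vertex ids 0 and 1.
g1 = ({0: "C", 1: "C", 2: "O"}, {(0, 1): "single", (1, 2): "double"})
g2 = ({0: "C", 1: "N"}, {(0, 1): "single"})
vertex_closure, edge_closure = graph_closure([g1, g2])
print(vertex_closure[1])  # label set containing both "C" and "N"
```

A tree node holding such a closure can rule out a query whose labels do not appear in the corresponding label sets, without visiting the graphs underneath it.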


Proceedings ArticleDOI
22 Apr 2006
TL;DR: GUESS is a novel system for graph exploration that combines an interpreted language with a graphical front end that allows researchers to rapidly prototype and deploy new visualizations and contains a novel, interactive interpreter.
Abstract: As graph models are applied to more widely varying fields, researchers struggle with tools for exploring and analyzing these structures. We describe GUESS, a novel system for graph exploration that combines an interpreted language with a graphical front end that allows researchers to rapidly prototype and deploy new visualizations. GUESS also contains a novel, interactive interpreter that connects the language and interface in a way that facilitates exploratory visualization tasks. Our language, Gython, is a domain-specific embedded language which provides all the advantages of Python with new, graph-specific operators, primitives, and shortcuts. We highlight key aspects of the system in the context of a large user survey and specific, real-world case studies ranging from social and knowledge networks to distributed computer network analysis.

229 citations


Proceedings ArticleDOI
06 Aug 2006
TL;DR: This paper presents a new method for graph-based classification, with particular emphasis on hyperlinked text documents but broader applicability, based on iterative relaxation labeling and can be combined with either Bayesian or SVM classifiers on the feature spaces of the given data items.
Abstract: Automatic classification of data items, based on training samples, can be boosted by considering the neighborhood of data items in a graph structure (e.g., neighboring documents in a hyperlink environment or co-authors and their publications for bibliographic data entries). This paper presents a new method for graph-based classification, with particular emphasis on hyperlinked text documents but broader applicability. Our approach is based on iterative relaxation labeling and can be combined with either Bayesian or SVM classifiers on the feature spaces of the given data items. The graph neighborhood is taken into consideration to exploit locality patterns while at the same time avoiding overfitting. In contrast to prior work along these lines, our approach employs a number of novel techniques: dynamically inferring the link/class pattern in the graph in the run of the iterative relaxation labeling, judicious pruning of edges from the neighborhood graph based on node dissimilarities and node degrees, weighting the influence of edges based on a distance metric between the classification labels of interest and weighting edges by content similarity measures. Our techniques considerably improve the robustness and accuracy of the classification outcome, as shown in systematic experimental comparisons with previously published methods on three different real-world datasets.

177 citations


Proceedings ArticleDOI
20 Aug 2006
TL;DR: A general model, the relation summary network, is proposed to find the hidden structures (the local cluster structures and the global community structures) from a k-partite graph to provide a principal framework for unsupervised learning on k-partite graphs of various structures.
Abstract: Various data mining applications involve data objects of multiple types that are related to each other, which can be naturally formulated as a k-partite graph. However, the research on mining the hidden structures from a k-partite graph is still limited and preliminary. In this paper, we propose a general model, the relation summary network, to find the hidden structures (the local cluster structures and the global community structures) from a k-partite graph. The model provides a principal framework for unsupervised learning on k-partite graphs of various structures. Under this model, we derive a novel algorithm to identify the hidden structures of a k-partite graph by constructing a relation summary network to approximate the original k-partite graph under a broad range of distortion measures. Experiments on both synthetic and real datasets demonstrate the promise and effectiveness of the proposed model and algorithm. We also establish the connections between existing clustering approaches and the proposed model to provide a unified view to the clustering approaches.

165 citations



Proceedings ArticleDOI
20 Aug 2006
TL;DR: This paper proposes several novel optimization techniques, which can prune the unpromising and redundant sub-search spaces effectively and develops a coherent closed quasi-clique mining algorithm, Cocain, which is very efficient and scalable for large dense graph databases.
Abstract: Frequent coherent subgraphs can provide valuable knowledge about the underlying internal structure of a graph database, and mining frequently occurring coherent subgraphs from large dense graph databases has witnessed several applications and received considerable attention in the graph mining community recently. In this paper, we study how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task because the downward-closure property no longer holds. By fully exploring some properties of quasi-cliques, we propose several novel optimization techniques, which can prune the unpromising and redundant sub-search spaces effectively. Meanwhile, we devise an efficient closure checking scheme to facilitate the discovery of only closed quasi-cliques. We also develop a coherent closed quasi-clique mining algorithm, Cocain. A thorough performance study shows that Cocain is very efficient and scalable for large dense graph databases.

159 citations
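
To make the remark about the downward-closure property in the Cocain entry above concrete, here is a small sketch. It assumes the common definition of a γ-quasi-clique (every vertex of the set is adjacent to at least ⌈γ(|S| − 1)⌉ other vertices of the set); the abstract itself does not spell out the definition, and the function name and adjacency encoding are illustrative.

```python
import math

def is_quasi_clique(adj, vertices, gamma):
    """Check the assumed gamma-quasi-clique condition on a vertex set.

    adj maps each vertex to the set of its neighbours (undirected graph,
    no self-loops). Every vertex in the set must have at least
    ceil(gamma * (|S| - 1)) neighbours inside the set.
    """
    s = set(vertices)
    if len(s) <= 1:
        return True
    required = math.ceil(gamma * (len(s) - 1))
    return all(len(adj[v] & s) >= required for v in s)

# A 5-cycle 0-1-2-3-4-0.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

print(is_quasi_clique(cycle, [0, 1, 2, 3, 4], 0.5))  # True: each vertex has 2 of 4 possible neighbours
print(is_quasi_clique(cycle, [0, 1, 2, 3], 0.5))     # False: vertex 0 has only 1 neighbour in the subset
```

The whole cycle satisfies the condition while one of its subsets does not, so a miner cannot discard a vertex set simply because it fails the test, as Apriori-style pruning would.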


Book ChapterDOI
18 Sep 2006
TL;DR: This paper studies efficient and provably secure methods for queries on encrypted data stored in an outsourced database that may be susceptible to compromise, and shows that, in this system, even if an intruder breaks into the database, he learns very little about the data stored in the database and the queries performed on the data.
Abstract: Data confidentiality is a major concern in database systems. Encryption is a useful tool for protecting the confidentiality of sensitive data. However, when data is encrypted, performing queries becomes more challenging. In this paper, we study efficient and provably secure methods for queries on encrypted data stored in an outsourced database that may be susceptible to compromise. Specifically, we show that, in our system, even if an intruder breaks into the database and observes some interactions between the database and its users, he only learns very little about the data stored in the database and the queries performed on the data. Our work consists of several components. First, we consider databases in which each attribute has a finite domain and give a basic solution for certain kinds of queries on such databases. Then, we present two enhanced solutions, one with a stronger security guarantee and the other with accelerated queries. In addition to providing proofs of our security guarantees, we provide empirical performance evaluations. Our experiments demonstrate that our solutions are fast on large-sized real data.

149 citations


Patent
29 Dec 2006
TL;DR: An interface determines which portions of a graph structure satisfy the filter criteria and replaces nodes and relations that do not satisfy the criteria with skip nodes or functions.
Abstract: Systems and processes may apply a filter to data in a graph structure using an interface. The filter may be applied upon request from a business application. The interface may determine which portions of the graph structure satisfy the filter criteria. The interface may replace nodes and/or relations that do not satisfy filter criteria with skip nodes or functions. For example, software can be operable to apply a filter to a graph structure that includes nodes and relations between the nodes and to evaluate the graph structure according to the filter. The software then replaces a first of the nodes that does not satisfy the filter with a first skip node.

112 citations


Proceedings ArticleDOI
24 Apr 2006
TL;DR: This work introduces a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network.
Abstract: Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this "memory wall," we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.

Journal ArticleDOI
TL;DR: A new graph grammar formalism which integrates both the spatial and structural specification mechanisms in a single framework is proposed, equipped with a parser that performs in polynomial time with an improved parsing complexity over its nonspatial predecessor, that is, the Reserved Graph Grammar.
Abstract: In a graphical user interface, physical layout and abstract structure are two important aspects of a graph. This article proposes a new graph grammar formalism which integrates both the spatial and structural specification mechanisms in a single framework. This formalism is equipped with a parser that performs in polynomial time with an improved parsing complexity over its nonspatial predecessor, that is, the Reserved Graph Grammar. With the extended expressive power, the formalism is suitable for many user interface applications. The article presents its application in adaptive Web design and presentation.

Journal ArticleDOI
01 Dec 2006
TL;DR: This article investigates the issues of substructure similarity search using indexed features in graph databases and proves that the complexity of optimal feature set selection is Ω(2^m) in the worst case, where m is the number of features for selection.
Abstract: Similarity search of complex structures is an important operation in graph-related applications since exact matching is often too restrictive. In this article, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed feature misses, our structural filtering algorithm can filter graphs without performing pairwise similarity computation. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy that could maximize the filtering capability. We prove that the complexity of optimal feature set selection is Ω(2^m) in the worst case, where m is the number of features for selection. In practice, we identify several criteria to build effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly within a multifilter composition framework. The proposed feature-based filtering concept can be generalized and applied to searching approximate nonconsecutive sequences, trees, and other structured data as well.
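
The feature-miss filtering step described in the abstract above can be sketched as follows. The sketch assumes that feature occurrence counts have already been indexed for every database graph and that the maximum allowed number of feature misses is supplied by the caller; deriving that bound from the edge relaxation ratio is the paper's contribution and is not reproduced here. All names and the sample features are hypothetical.

```python
def feature_misses(query_feats, graph_feats):
    """Count how many query feature occurrences the graph cannot supply."""
    return sum(max(0, needed - graph_feats.get(feature, 0))
               for feature, needed in query_feats.items())

def filter_candidates(query_feats, db_feats, max_misses):
    """Keep only graphs whose feature misses stay within the allowed bound.

    query_feats: feature -> occurrence count in the query graph
    db_feats:    graph id -> (feature -> occurrence count) from the index
    max_misses:  assumed to be given; no pairwise graph comparison is done
    """
    return [gid for gid, feats in db_feats.items()
            if feature_misses(query_feats, feats) <= max_misses]

query = {"C-C": 3, "C-O": 1, "C-N": 1}
database = {
    "g1": {"C-C": 3, "C-O": 1},             # misses one C-N
    "g2": {"C-C": 1},                       # misses 2 + 1 + 1 = 4
    "g3": {"C-C": 4, "C-O": 2, "C-N": 1},   # misses nothing
}
print(filter_candidates(query, database, max_misses=1))  # ['g1', 'g3']
```

Only the surviving candidates would then go through the expensive structural similarity check.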

Journal ArticleDOI
TL;DR: This paper explores the notion of similarity based on connectivity alone, and proposes several algorithms to quantify it, and takes advantage of the local neighborhoods of the nodes in the citation graph to demonstrate the complementarity of link-based and text-based retrieval.
Abstract: Published scientific articles are linked together into a graph, the citation graph, through their citations. This paper explores the notion of similarity based on connectivity alone, and proposes several algorithms to quantify it. Our metrics take advantage of the local neighborhoods of the nodes in the citation graph. Two variants of link-based similarity estimation between two nodes are described, one based on the separate local neighborhoods of the nodes, and another based on the joint local neighborhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on a subgraph of the citation graph of computer science in a retrieval context. The results are compared with text-based similarity, and demonstrate the complementarity of link-based and text-based retrieval.
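
As a rough illustration of the first variant described above (similarity from the separate local neighborhoods of two nodes), the sketch below compares fixed-radius neighborhoods with Jaccard overlap. Jaccard overlap is a stand-in chosen for brevity, since the abstract does not state the paper's actual metrics, and the citation graph is treated as undirected. Function names are illustrative.

```python
from collections import deque

def neighborhood(adj, start, radius):
    """Return all nodes within `radius` hops of `start`, including `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def neighborhood_similarity(adj, a, b, radius=1):
    """Jaccard overlap of the two local neighborhoods (assumed metric)."""
    na = neighborhood(adj, a, radius)
    nb = neighborhood(adj, b, radius)
    return len(na & nb) / len(na | nb)

# Papers A and B both cite Y; A also cites X and B also cites Z.
citations = {
    "A": {"X", "Y"}, "B": {"Y", "Z"},
    "X": {"A"}, "Y": {"A", "B"}, "Z": {"B"},
}
print(neighborhood_similarity(citations, "A", "B"))  # 0.2
```

With radius 1 the two neighborhoods share only Y out of five distinct nodes, which gives 0.2; increasing the radius lets indirectly connected papers contribute as well.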

Proceedings ArticleDOI
Anand Ranganathan, Zhen Liu
06 Nov 2006
TL;DR: This paper extends relational databases with the ability to answer semantic queries that are represented in SPARQL, an emerging Semantic Web query language, with a system that bridges this semantic gap using domain knowledge contained in ontologies.
Abstract: Relational databases are widely used today as a mechanism for providing access to structured data. They, however, are not suitable for typical information finding tasks of end users. There is often a semantic gap between the queries users want to express and the queries that can be answered by the database. In this paper, we propose a system that bridges this semantic gap using domain knowledge contained in ontologies. Our system extends relational databases with the ability to answer semantic queries that are represented in SPARQL, an emerging Semantic Web query language. Users express their queries in SPARQL, based on a semantic model of the data, and they get back semantically relevant results. We define different categories of results that are semantically relevant to the users' query and show how our system retrieves these results. We evaluate the performance of our system on sample relational databases, using a combination of standard and custom ontologies.

Patent
18 Oct 2006
TL;DR: A system and/or method that facilitates handling a change associated with a database, using an interface that receives data associated with the change via an object graph.
Abstract: The claimed subject matter provides a system and/or a method that facilitates handling a change associated with a database. An interface that can receive data associated with a change to data via an object graph. A state transition logic component that can maintain the change related to the object graph utilizing a context and a respective set of rules, the context employs metadata to view the object graph with an abstraction of at least one of an entity and a relationship.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: This paper proposes the first approach to detect events from the click-through data, which is the log data of web search engines, and demonstrates that the proposed approach produces high quality results.
Abstract: Previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data. In this paper, we propose the first approach to detect events from the click-through data, which is the log data of web search engines. The intuition behind event detection from click-through data is that such data is often event-driven and each event can be represented as a set of query-page pairs that are not only semantically similar but also have similar evolution pattern over time. Given the click-through data, in our proposed approach, we first segment it into a sequence of bipartite graphs based on the user-defined time granularity. Next, the sequence of bipartite graphs is represented as a vector-based graph, which records the semantic and evolutionary relationships between queries and pages. After that, the vector-based graph is transformed into its dual graph, where each node is a query-page pair that will be used to represent real world events. Then, the problem of event detection is equivalent to the problem of clustering the dual graph of the vector-based graph. The clustering process is based on a two-phase graph cut algorithm. In the first phase, query-page pairs are clustered based on the semantic-based similarity such that each cluster in the result corresponds to a specific topic. In the second phase, query-page pairs related to the same topic are further clustered based on the evolution pattern-based similarity such that each cluster is expected to represent a specific event under the specific topic. Experiments with real click-through data collected from a commercial web search engine show that the proposed approach produces high quality results.
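
The first step of the pipeline above, segmenting click-through data into a sequence of bipartite graphs by time granularity, might look like the following sketch. The record layout `(timestamp, query, page)` and the function name are assumptions; each bipartite graph is stored simply as a counter of query-page edges, and the later vector-based graph, dual graph, and graph-cut phases are not shown.

```python
from collections import Counter, defaultdict

def segment_clicks(records, granularity):
    """Split click records into time-ordered bipartite graphs.

    records:     iterable of (timestamp, query, page), timestamps as integers
    granularity: width of one time slot, in the same unit as the timestamps
    Returns one Counter per non-empty slot, mapping a (query, page) edge
    to the number of clicks observed in that slot.
    """
    slots = defaultdict(Counter)
    for timestamp, query, page in records:
        slots[timestamp // granularity][(query, page)] += 1
    return [slots[slot] for slot in sorted(slots)]

clicks = [
    (0, "world cup", "fifa.example"),
    (5, "world cup", "fifa.example"),
    (12, "election", "news.example"),
]
for graph in segment_clicks(clicks, granularity=10):
    print(dict(graph))
```

Comparing the edge weights of the same query-page pair across consecutive graphs gives the evolution pattern that the second clustering phase relies on.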

Proceedings ArticleDOI
27 Jun 2006
TL;DR: The Tuple Graph (TUG) synopses are introduced, a new class of data summaries that enable accurate selectivity estimates for complex relational queries and an efficient and scalable construction algorithm for building accurate TUGs within a specific storage budget is described.
Abstract: This paper introduces the Tuple Graph (TUG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summarization framework adopts a "semi-structured" view of the relational database, modeling a relational data set as a graph of tuples and join queries as graph traversals respectively. The key idea is to approximate the structure of the induced data graph in a concise synopsis, and to estimate the selectivity of a query by performing the corresponding traversal over the summarized graph. We detail the TUG synopsis model that is based on this novel approach, and we describe an efficient and scalable construction algorithm for building accurate TUGs within a specific storage budget. We validate the performance of TUGs with an extensive experimental study on real-life and synthetic data sets. Our results verify the effectiveness of TUGs in generating accurate selectivity estimates for complex join queries, and demonstrate their benefits over existing summarization techniques.

Proceedings ArticleDOI
03 Apr 2006
TL;DR: A method to support similarity search on substructures with superimposed distance constraints called PIS (Partition-based Graph Index and Search) is developed, which selects discriminative fragments in a query graph and uses an index to prune the graphs that violate the distance constraints.
Abstract: Efficient indexing techniques have been developed for the exact and approximate substructure search in large scale graph databases. Unfortunately, the retrieval problem of structures with categorical or geometric distance constraints is not solved yet. In this paper, we develop a method called PIS (Partition-based Graph Index and Search) to support similarity search on substructures with superimposed distance constraints. PIS selects discriminative fragments in a query graph and uses an index to prune the graphs that violate the distance constraints. We identify a criterion to distinguish the selectivity of fragments in multiple graphs and develop a partition method to obtain a set of highly selective fragments, which is able to improve the pruning performance. Experimental results show that PIS is effective in processing real graph queries.

Book ChapterDOI
01 Jan 2006
TL;DR: This chapter implements an RDF data format plug-in for GViz (Telea et al., 2002), a general-purpose visual environment for browsing and editing graph data, and advocates the use of a highly customizable, interactive visualization system for the understanding of different RDF data structures.
Abstract: The foundation language for the Semantic Web is the Resource Description Framework (RDF). RDF is intended to describe the Web metadata so that the Web content is not only machine readable but also machine understandable. In this way one can better support the interoperability of Web applications. RDF Schema (RDFS) is used to describe different RDF vocabularies (schemas), that is, the classes and properties associated to a particular application domain. An instantiation of these classes and properties form an RDF instance. It is important to note that both an RDF schema and an RDF instance have RDF graph representations. Realizing the advantages that RDF offers, in the last couple of years, many tools were built in order to support the browsing and editing of RDF data. Among these tools we mention Protégé (Noy et al., 2001), OntoEdit (Sure et al., 2003), and RDF Instance Creator (RIC) (Grove, 2002). Most of the text-based environments are unable to cope with large amounts of data in the sense of presenting them in a way that is easy to understand and navigate (Card et al., 1999). The RDF data we have to deal with describes a large number of Web resources, and can thus easily reach tens of thousands of instances and attributes. We advocate the use of visual tools for browsing RDF data, as visual presentation and navigation enables users to effectively understand the complex structure and interrelationships of such data. Existing visualization tools for RDF data are: IsaViz (Pietriga, 2002), OntoRAMA (Eklund et al., 2002), and the Protégé visualization plug-ins like OntoViz (Sintek, 2004) and Jambalaya (Storey et al., 2001). The most popular textual RDF browser/editor is Protégé (Noy et al., 2001). The generic modeling primitives of Protégé enable the export of the built model in different data formats, among which is also RDF/XML. 
Protégé distinguishes between schema and instance information, allowing for an incremental view of the instances based on the selected schema elements. One of the disadvantages of Protégé is that it displays the information in a hierarchical way, that is, using a tree layout (Sugiyama et al., 1981), which makes it difficult to grasp the inherent graph structure of RDF data. In this chapter, we advocate the use of a highly customizable, interactive visualization system for the understanding of different RDF data structures. We implemented an RDF data format plug-in for GViz (Telea et al., 2002), a general-purpose visual environment for browsing and editing graph data. The largest advantage that GViz provides in comparison with other RDF visualization tools is the fact that it is easily

ReportDOI
30 Mar 2006
TL;DR: A survey of existing work on graph-based pattern matching is presented, describing variations among graph matching problems, general and specific solution approaches, evaluation techniques, and directions for further research.
Abstract: The task of searching for patterns in graph-structured data has applications in such diverse areas as computer vision, biology, electronics, computer aided design, social networks, and intelligence analysis. As such, work on graph-based pattern matching spans a wide range of research communities. Due to variations in graph characteristics and problem requirements, graph-based pattern matching is not a single problem, but a set of related problems. This paper presents a survey of existing work on graph-based pattern matching, describing variations among graph matching problems, general and specific solution approaches, evaluation techniques, and directions for further research. An emphasis is given to techniques that apply to general graphs with semantic characteristics. The survey also discusses techniques for graph mining, an extension of the graph matching problem.

Proceedings Article
01 Sep 2006
TL;DR: GMine partitions a given graph into a hierarchy of communities within communities and stores it in a novel R-tree-like structure called G-Tree, which is then used for multi-resolution graph exploration.
Abstract: Several graph visualization tools exist. However, they are not able to handle large graphs, and/or they do not allow interaction. We are interested in large graphs, with hundreds of thousands of nodes. Such graphs bring two challenges: the first one is that any straightforward interactive manipulation will be prohibitively slow. The second one is sensory overload: even if we could plot and replot the graph quickly, the user would be overwhelmed with the vast volume of information because the screen would be too cluttered as nodes and edges overlap each other. Our GMine system addresses both these issues by using summarization and multi-resolution. GMine offers multi-resolution graph exploration by partitioning a given graph into a hierarchy of communities-within-communities and storing it in a novel R-tree-like structure which we name G-Tree. GMine offers summarization by implementing an innovative subgraph extraction algorithm and then visualizing its output.
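The communities-within-communities idea can be illustrated with a small tree of partitions. The sketch below is not the G-Tree from the paper; the `Community` class and the `summarize` helper are hypothetical names chosen for illustration. It only shows how a hierarchy lets a viewer draw a few summary edges between communities at one level instead of every original edge.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Community:
    """One node of a partition hierarchy (hypothetical, not the paper's G-Tree)."""
    name: str
    members: frozenset = frozenset()               # vertices held directly by a leaf
    children: list = field(default_factory=list)   # nested sub-communities

    def vertices(self):
        """All vertices contained in this community, at any depth."""
        out = set(self.members)
        for child in self.children:
            out |= child.vertices()
        return out


def summarize(parent, edges):
    """Count the edges that cross between the direct children of `parent`."""
    owner = {v: c.name for c in parent.children for v in c.vertices()}
    counts = Counter()
    for u, v in edges:
        a, b = owner.get(u), owner.get(v)
        if a is not None and b is not None and a != b:
            counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


# Two leaf communities grouped under "AB", plus a third leaf "C".
A = Community("A", frozenset({1, 2}))
B = Community("B", frozenset({3, 4}))
C = Community("C", frozenset({5}))
AB = Community("AB", children=[A, B])
ROOT = Community("root", children=[AB, C])
EDGES = [(1, 2), (2, 3), (1, 4), (4, 5)]
```

Calling `summarize(ROOT, EDGES)` gives the coarse view and `summarize(AB, EDGES)` the view one level down, so the amount drawn on screen depends on the chosen level rather than on the size of the whole graph.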

Journal Article
TL;DR: The essence of the approach is to create database views for each rule and to handle pattern matching by inner join operations while handling negative application conditions by left outer join operations to obtain a robust and fast transformation engine.
Abstract: We present a novel approach to implement a graph transformation engine based on standard relational database management systems (RDBMSs). The essence of the approach is to create database views for each rule and to handle pattern matching by inner join operations while handling negative application conditions by left outer join operations. Furthermore, the model manipulation prescribed by the application of a graph transformation rule is also implemented using elementary data manipulation statements (such as insert, delete). As a result, we obtain a robust and fast transformation engine especially suitable for (1) extending modeling tools with an underlying RDBMS repository and (2) embedding model transformations into large distributed applications where models are frequently persisted in a relational database and transaction handling is required to handle large models consistently.
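The join-based matching described in this abstract can be sketched with SQLite. Everything concrete here is an assumption made for illustration: the `node` and `edge` tables, the `rule_match` view, and the example rule (map every Class that has a parent Class to a new Table, unless it is already mapped) are not taken from the paper. The inner joins match the rule's left-hand side, and the left outer join with an `IS NULL` filter enforces the negative application condition.

```python
import sqlite3


def build_demo():
    """Create a tiny model graph and one view that encodes a rule's precondition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node(id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE edge(src INTEGER, dst INTEGER, label TEXT);
        INSERT INTO node VALUES (1,'Class'), (2,'Class'), (3,'Class'), (4,'Table');
        INSERT INTO edge VALUES (1,2,'parent'), (3,2,'parent'), (3,4,'mappedTo');

        -- Inner joins: match a Class with a 'parent' edge to another Class.
        -- Left outer join + IS NULL: reject matches that already have 'mappedTo'.
        CREATE VIEW rule_match AS
        SELECT c.id AS child, p.id AS parent
        FROM node AS c
        JOIN edge AS e ON e.src = c.id AND e.label = 'parent'
        JOIN node AS p ON p.id = e.dst AND p.type = 'Class'
        LEFT OUTER JOIN edge AS nac ON nac.src = c.id AND nac.label = 'mappedTo'
        WHERE c.type = 'Class' AND nac.src IS NULL;
    """)
    return conn


def apply_rule(conn):
    """Apply the rule to every current match using plain INSERT statements."""
    matches = conn.execute(
        "SELECT child, parent FROM rule_match ORDER BY child").fetchall()
    for child, _parent in matches:
        cur = conn.execute("INSERT INTO node(type) VALUES ('Table')")
        conn.execute("INSERT INTO edge VALUES (?, ?, 'mappedTo')",
                     (child, cur.lastrowid))
    conn.commit()
    return matches
```

Applying the rule a second time finds no matches, because the edges inserted by the first application now satisfy the negative condition.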

Patent
07 Dec 2006
TL;DR: In this article, a scene graph (40) is presented which represents data and a set of processes, thus providing an enhanced approach to the previously known scene graph concept, where the scene graph becomes a rendering description of the data rather than a world description.
Abstract: A scene graph (40) is provided which represents data and a set of processes, thus providing an enhanced approach to the previously known scene graph concept. With this approach the scene graph (40) becomes a rendering description of the data rather than a world description. Previously known scene graphs represent a structure of objects and their attributes. The scene graph (40) has a notation of the traversing order, which together with the types of nodes, the nodes' position, node functionality and node state determine the rendering order. Thus, any effects supported by the underlying rendering pipeline (1) can be expressed directly in the scene graph (40) by the user. An API is provided for the scene graph (40), giving control of the actual rendering order and optimization to the user. The scene graph (40) is extensible, allowing the user to experiment and express new rendering algorithms in the scene graph semantic.

Proceedings Article
11 Dec 2006
TL;DR: A formal innovation on the use of graph hierarchies that leads to the GMine system, which promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation for the graph hierarchy and for the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.
Abstract: Given a large social or computer network, how can we visualize it and find patterns, outliers, and communities? Although several graph visualization tools exist, they cannot handle large graphs with hundreds of thousands of nodes and possibly millions of edges. Such graphs bring two challenges: interactive visualization demands prohibitive processing power and, even if we could interactively update the visualization, the user would be overwhelmed by the excessive number of graphical items. To cope with this problem, we propose a formal innovation on the use of graph hierarchies that leads to the GMine system. GMine promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation for the graph hierarchy and for the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.

Patent
03 Feb 2006
TL;DR: In this paper, a dependency graph that represents an image composition is obtained, and metadata for each element of the dependency graph is stored in a database that is accessible across a network to multiple users.
Abstract: A method, apparatus, system, and article of manufacture provide the ability to track the processing of image data in a collaborative environment. A dependency graph that represents an image composition is obtained. Metadata for each element of the dependency graph are stored in a database that is accessible across a network to multiple users. Access to the database is controlled to allow the multiple users to access the dependency graph via the database simultaneously.

Proceedings Article
18 Dec 2006
TL;DR: A graph representation of metabolic pathways that contains all features is presented, and the application of graph-based relational learning algorithms in both supervised and unsupervised scenarios is described.
Abstract: We present a method for finding biologically meaningful patterns on metabolic pathways using the SUBDUE graph-based relational learning system. A huge amount of biological data that has been generated by long-term research encourages us to move our focus to a systems-level understanding of bio-systems. A biological network, containing various biomolecules and their relationships, is a fundamental way to describe bio-systems. Multi-relational data mining finds the relational patterns in both the entity attributes and relations in the data. A graph consisting of vertices and edges between these vertices is a natural data structure to represent biological networks. This paper presents a graph representation of metabolic pathways to contain all features, and describes the application of graph-based relational learning algorithms in both supervised and unsupervised scenarios. Supervised learning finds the unique substructures in a specific type of pathway, which help us understand better how pathways differ. Unsupervised learning shows hierarchical clusters that describe the common substructures in a specific type of pathway, which allow us to better understand the common features in pathways.

Journal Article
B. Eckman, P. G. Brown
TL;DR: The Systems Biology Graph Extender is described, a research prototype that extends the IBM RDBMS (DB2® Universal Database software) with graph objects and operations to support declarative SQL queries over biological networks and other graph structures.
Abstract: As high-throughput biology begins to generate large volumes of systems biology data, the need grows for robust, efficient database systems to support investigations of metabolic and signaling pathways, chemical reaction networks, gene regulatory networks, and protein interaction networks. Network data is frequently represented as graphs, and researchers need to navigate, query and manipulate this data in ways that are not well supported by standard relational database management systems (RDBMSs). Current approaches to managing graphs in an RDBMS rely on either external procedural logic to execute the graph algorithms or clumsy and inefficient algorithms implemented in Structured Query Language (SQL). In this paper we describe the Systems Biology Graph Extender, a research prototype that extends the IBM RDBMS (DB2® Universal Database software) with graph objects and operations to support declarative SQL queries over biological networks and other graph structures. Supported operations include neighborhood queries, shortest path queries, spanning trees, graph transposition, and graph matching. In a federated database environment, graph operations may be applied to data stored in any format, whether remote or local, relational or nonrelational. A single federated query may include both graph-based predicates and predicates over related data sources, such as microarray expression levels, clinical prognosis and outcome, or the function of orthologous proteins (i.e., proteins that are evolutionarily related to those in another species) in mouse disease models.
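To make concrete what a neighborhood query over a relational edge table asks for, here is a minimal sketch in plain SQL, run through SQLite. It does not use the Graph Extender's operations, whose syntax is not given in this abstract; the `interaction` table, the protein names, and the `neighborhood` helper are assumptions for illustration. The recursive query is the kind of hand-written SQL that the prototype aims to replace with built-in graph operations.

```python
import sqlite3


def demo_db():
    """Build a tiny directed interaction network (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interaction(src TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?)",
        [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P3")],
    )
    return conn


def neighborhood(conn, start, max_hops):
    """Return {node: fewest hops from start} for nodes within max_hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, r.hops + 1
            FROM reach AS r JOIN interaction AS e ON e.src = r.node
            WHERE r.hops < ?
        )
        SELECT node, MIN(hops) FROM reach GROUP BY node ORDER BY node
        """,
        (start, max_hops),
    ).fetchall()
    return dict(rows)
```

Because the hop count is bounded, the recursion terminates even on cyclic networks, and taking the minimum hop count per node yields unweighted shortest-path lengths as a by-product.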

Proceedings Article
25 Sep 2006
TL;DR: Experimental evaluations on large real-world semantic graphs show that the MSSG framework scales well, and grDB outperforms widely used open-source out-of-core databases, such as BerkeleyDB and MySQL, in the storage and retrieval of scale-free graphs.
Abstract: This paper presents a middleware framework for storing, accessing and analyzing massive-scale semantic graphs. The framework, MSSG, targets scale-free semantic graphs with O(10^12) (trillion) vertices and edges. Here, we present the overall architectural design of the framework, as well as a prototype implementation for cluster architectures. The sheer size of these massive-scale semantic graphs prohibits storing the entire graph in memory even on medium- to large-scale parallel architectures. We therefore propose a new graph database, grDB, for the efficient storage and retrieval of large scale-free semantic graphs on secondary storage. This new database supports the efficient and scalable execution of parallel out-of-core graph algorithms which are essential for analyzing semantic graphs of massive size. We have also developed a parallel out-of-core breadth-first search algorithm for performance study. To the best of our knowledge, it is the first of such algorithms reported in the literature. Experimental evaluations on large real-world semantic graphs show that the MSSG framework scales well, and grDB outperforms widely used open-source out-of-core databases, such as BerkeleyDB and MySQL, in the storage and retrieval of scale-free graphs.
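The out-of-core idea can be sketched on a single machine: the adjacency data stays on disk and only the current frontier is expanded in memory. This is a simplified, sequential illustration, not the parallel algorithm or the grDB storage layout from the paper. The SQLite file, the `adj` table, and the function names are assumptions, and the sketch still keeps the visited map in memory, which a trillion-edge graph would not allow.

```python
import os
import sqlite3
import tempfile


def store_edges(path, edges):
    """Write directed edges to an on-disk table indexed by source vertex."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS adj(src INTEGER, dst INTEGER)")
    conn.execute("CREATE INDEX IF NOT EXISTS adj_src ON adj(src)")
    conn.executemany("INSERT INTO adj VALUES (?, ?)", edges)
    conn.commit()
    return conn


def bfs_levels(conn, source):
    """Level-synchronous BFS that reads each adjacency list from disk on demand."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for v in frontier:
            for (w,) in conn.execute("SELECT dst FROM adj WHERE src = ?", (v,)):
                if w not in level:
                    level[w] = depth
                    next_frontier.append(w)
        frontier = next_frontier
    return level


def demo():
    """Store a five-vertex graph in a temporary file and search it from vertex 1."""
    with tempfile.TemporaryDirectory() as d:
        conn = store_edges(os.path.join(d, "graph.db"),
                           [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])
        try:
            return bfs_levels(conn, 1)
        finally:
            conn.close()
```

In a parallel setting the frontier would typically be partitioned across processes, each reading the adjacency lists of its own vertices from local storage; that part is outside the scope of this sketch.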

Patent
29 Dec 2006
TL;DR: In this paper, a system and method for generating object graph data and transmitting the object graph over a network is described, which is based on the idea of analyzing relationships between objects within a network of objects to determine an object network structure.
Abstract: A system and method for generating object graph data and transmitting the object graph over a network. For example, a computer-implemented method according to one embodiment comprises: analyzing relationships between objects within a network of objects to determine an object network structure; generating object graph data representing the object network structure; serializing the object graph data and transmitting the object graph data over a network to a requesting computer; and interpreting the object graph data to render a view of the object network structure in a graphical user interface.
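The four steps named in this abstract (analyze relationships, generate graph data, serialize, interpret) can be sketched as follows. The `Obj` class, the JSON layout, and the function names are illustrative assumptions rather than the patent's actual data format, and the final step returns text lines as a stand-in for drawing a graphical view.

```python
import json


class Obj:
    """A hypothetical object that holds references to other objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []


def build_object_graph(root):
    """Walk references from root, assigning stable ids; cycles are visited once."""
    ids, order, stack = {}, [], [root]
    while stack:
        obj = stack.pop()
        if id(obj) in ids:
            continue
        ids[id(obj)] = len(order)
        order.append(obj)
        stack.extend(reversed(obj.refs))
    return {
        "nodes": [{"id": i, "name": o.name} for i, o in enumerate(order)],
        "edges": [[ids[id(o)], ids[id(r)]] for o in order for r in o.refs],
    }


def serialize(graph):
    """Encode the graph data as a JSON string suitable for sending over a network."""
    return json.dumps(graph, sort_keys=True)


def interpret(payload):
    """Decode the payload on the receiving side and list the edges to draw."""
    graph = json.loads(payload)
    names = {n["id"]: n["name"] for n in graph["nodes"]}
    return sorted(f"{names[s]} -> {names[t]}" for s, t in graph["edges"])


# Example: a references b and c, and c refers back to a (a cycle).
a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs = [b, c]
c.refs = [a]
```

Sending integer ids instead of nested objects keeps the payload finite even when objects refer back to each other.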