Showing papers on "Graph database published in 2003"

PDF

Open Access

Proceedings Article•DOI•

Efficient mining of frequent subgraphs in the presence of isomorphism

[...]

Jun Huan¹, Wei Wang¹, Jan F. Prins¹•Institutions (1)

University of North Carolina at Chapel Hill¹

19 Nov 2003

TL;DR: This work proposes a novel frequent subgraph mining algorithm: FFSM, which employs a vertical search scheme within an algebraic graph framework it has developed to reduce the number of redundant candidates proposed.

...read moreread less

Abstract: Frequent subgraph mining is an active research topic in the data mining community. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees, which have been studied extensively. We propose a novel frequent subgraph mining algorithm: FFSM, which employs a vertical search scheme within an algebraic graph framework we have developed to reduce the number of redundant candidates proposed. Our empirical study on synthetic and real datasets demonstrates that FFSM achieves a substantial performance gain over the current start-of-the-art subgraph mining algorithm gSpan.

...read moreread less

699 citations

Journal Article•DOI•

State of the art of graph-based data mining

[...]

Takashi Washio¹, Hiroshi Motoda¹•Institutions (1)

Osaka University¹

01 Jul 2003-Sigkdd Explorations

TL;DR: This article introduces the theoretical basis of graph based data mining and surveys the state of the art of graph-based data mining.

...read moreread less

Abstract: The need for mining structured data has increased in the past few years. One of the best studied data structures in computer science and discrete mathematics are graphs. It can therefore be no surprise that graph based data mining has become quite popular in the last few years.This article introduces the theoretical basis of graph based data mining and surveys the state of the art of graph-based data mining. Brief descriptions of some representative approaches are provided as well.

...read moreread less

480 citations

Proceedings Article•DOI•

Keyword proximity search on XML graphs

[...]

Vagelis Hristidis¹, Yannis Papakonstantinou¹, Andrey Balmin¹•Institutions (1)

University of California, San Diego¹

05 Mar 2003

TL;DR: This work proposes and experimentally evaluates algorithms to minimize the number of queries sent to the database to output the top-K results and provides theoretical and experimental points for the selection of the appropriate set of precomputed path relations.

...read moreread less

Abstract: XKeyword provides efficient keyword proximity queries on large XML graph databases. A query is simply a list of keywords and does not require any schema or query language knowledge for its formulation. XKeyword is built on a relational database and, hence, can accommodate very large graphs. Query evaluation is optimized by using the graph's schema. In particular, XKeyword consists of two stages. In the preprocessing stage a set of keyword indices are built along with indexed path relations that describe particular patterns of paths in the graph. In the query processing stage plans are developed that use a near optimal set of path relations to efficiently locate the keyword query results. The results are presented graphically using the novel idea of interactive result graphs, which are populated on-demand according to the user's navigation and allow efficient information discovery. We provide theoretical and experimental points for the selection of the appropriate set of precomputed path relations. We also propose and experimentally evaluate algorithms to minimize the number of queries sent to the database to output the top-K results.

...read moreread less

286 citations

Journal Article•DOI•

A database system of mechanical components based on geometric and topological similarity. Part I: representation

[...]

Mohamed El-Mehalawi, R. Allen Miller¹•Institutions (1)

Ohio State University¹

01 Jan 2003-Computer-aided Design

TL;DR: The representation scheme of the CAD model in a database of mechanical components using attributed graphs, which describes the topology of the component completely together with some geometric data that are not dependent on any coordinate system such as surface type and curve type are given.

...read moreread less

Abstract: A database of mechanical components is an important issue for some manufacturing activities such as cost estimation, process planning, and design by case-based reasoning. In this paper, we give the representation scheme of the CAD model in such a database. Components are represented using attributed graphs in which the nodes correspond to the surfaces of the component and the links correspond to the edges of the component. The graph is based on the standard for the exchange of product information (STEP) physical file of the component. STEP file should be unique for a single component regardless of the underlying CAD system. The process of creating the graph of a component constitutes two sub-tasks: (i) importing the CAD model from the CAD system in STEP format and (ii) transforming the STEP data into attributed graph-based representation. The graph and its attributes describe the topology of the component completely together with some geometric data that are not dependent on any coordinate system such as surface type and curve type. These geometric data are helpful in the retrieval and matching processes in the database.

...read moreread less

182 citations

Journal Article•DOI•

Extracting the extended entity-relationship model from a legacy relational database

[...]

Reda Alhajj¹•Institutions (1)

University of Calgary¹

01 Sep 2003-Information Systems

TL;DR: Algorithms that investigate characteristics of an existing legacy database in order to identify candidate keys of all relations in the relational schema, to locate foreign keys, and to decide on the appropriate links between the given relations are developed.

...read moreread less

92 citations

Journal Article•DOI•

A large database of graphs and its use for benchmarking graph isomorphism algorithms

[...]

M. De Santo¹, Pasquale Foggia², Carlo Sansone², Mario Vento¹•Institutions (2)

University of Salerno¹, University of Naples Federico II²

01 May 2003-Pattern Recognition Letters

TL;DR: This paper presents an experimental comparative evaluation of the performance of four graph matching algorithms and builds and makes available a large database of graphs, which is described in detail in this article.

...read moreread less

88 citations

Proceedings Article•DOI•

The Java system dependence graph

[...]

Neil Walkinshaw¹, Marc Roper¹, Murray Wood¹•Institutions (1)

University of Strathclyde¹

26 Sep 2003

TL;DR: This work presents a Java system dependence graph which draws on the strengths of a range of earlier works and adapts them, if necessary, to the Java language and provides guidance on the construction of the graph.

...read moreread less

Abstract: The program dependence graph was introduced by Ottenstein and Ottenstein in 1984. It was suggested to be a suitable internal program representation for monolithic programs, for the purpose of carrying out certain software engineering operations such as slicing and the computation of program metrics. Since then, Horwitz et al. have introduced the multiprocedural equivalent system dependence graph. Several authors have proposed object-oriented dependence graph construction approaches. Every approach provides its own benefits, some of which are language specific. We present a Java system dependence graph which draws on the strengths of a range of earlier works and adapts them, if necessary, to the Java language. It also provides guidance on the construction of the graph, identifies potential research topics based on it and shows a completed graph with a slice highlighted for a small, but realistic example.

...read moreread less

78 citations

Patent•

Data referencing within a database graph

[...]

Robert W. Lord¹, Christopher Allen Suver¹•Institutions (1)

Microsoft¹

14 Oct 2003

TL;DR: In this article, a data structure called a spider is proposed to provide a higher degree of association between nodes and links in a graph by creating data structures (spiders) that provide views into graphs that transcend the relatively static association of a conventional graph.

...read moreread less

Abstract: The present invention is directed to providing a higher degree of association between nodes and links in a graph by creating data structures (spiders) that provide views into graphs that transcend the relatively static association of a conventional graph. A spider's variables bind to any number of nodes and links in the graph, enabling all of the bound nodes and links by addressing the spider. By adding constraints on the extent or degree of binding in a spider to a graph, a subset of the graph is identified. The spider can then used to address the subset of the graph as constrained by the spider. A spider can bind to a link in order to identify a parent/child structural subset of the graph. More specifically a spider is a collection of variables that create a template or pattern and bind to the nodes and links in the graph. A spider traverses a graph by binding its variables to various nodes and links in the graph.

...read moreread less

58 citations

Clustering of Web Documents using a Graph Model.

[...]

Adam Schenker, Horst Bunke, Abraham Kandel

01 Jan 2003

55 citations

Journal Article•DOI•

BilVideo: a video database management system

[...]

T. Catarci, Mehmet Emin Dönderler¹, Ediz Şaykol¹, Özgür Ulusoy¹, Uğur Güdükbay¹ - Show less +1 more•Institutions (1)

Bilkent University¹

01 Jan 2003-IEEE MultiMedia

TL;DR: The BilVideo video database management system provides integrated support for spatiotemporal and semantic queries for video and its fact-extractor and video-annotator tools populate the system's fact base and feature database to support both query types.

...read moreread less

Abstract: The BilVideo video database management system provides integrated support for spatiotemporal and semantic queries for video. A knowledge base, consisting of a fact base and a comprehensive rule set implemented in Prolog, handles spatio-temporal queries. These queries contain any combination of conditions related to direction, topology, 3D relationships, object appearance, trajectory projection, and similarity-based object trajectories. The rules in the knowledge base significantly reduce the number of facts representing the spatio-temporal relations that the system needs to store. A feature database stored in an object-relational database management system handles semantic queries. To respond to user queries containing both spatio-temporal and semantic conditions, a query processor interacts with the knowledge base and object-relational database and integrates the results returned from these two system components. Because of space limitations, we only discuss the Web-based visual query interface and its fact-extractor and video-annotator tools. These tools populate the system's fact base and feature database to support both query types.

...read moreread less

47 citations

Book Chapter•DOI•

Typing Graph-Manipulation Operations

[...]

Jan Hidders¹•Institutions (1)

University of Antwerp¹

08 Jan 2003

TL;DR: A graph-based data model called GDM is presented where database instances and database schemas are described by certain types of labeled graphs called instance graphs and schema graphs, and two graph-manipulation operations, an addition and a deletion, are introduced.

...read moreread less

Abstract: We present a graph-based data model called GDM where database instances and database schemas are described by certain types of labeled graphs called instance graphs and schema graphs. For this data model we introduce two graph-manipulation operations, an addition and a deletion, that are based on pattern matching and can be represented in a graphical way. For these operations it is investigated if they can be typed such that it is guaranteed for well-typed operations that the result belongs to a certain database schema graph, and what the complexity of deciding this well-typedness is.

...read moreread less

Journal Article•DOI•

Graph-based relational learning: current and future directions

[...]

Lawrence B. Holder¹, Diane J. Cook¹•Institutions (1)

University of Texas at Arlington¹

01 Jul 2003-Sigkdd Explorations

TL;DR: This paper focuses on identifying novel, not necessarily most frequent, patterns in a graph-theoretic representation of data, which provides both simplifications and challenges over frequency-based approaches to graph-based data mining.

...read moreread less

Abstract: Graph-based relational learning (GBRL) differs from logic-based relational learning, as addressed by inductive logic programming techniques, and differs from frequent subgraph discovery, as addressed by many graph-based data mining techniques. Learning from graphs, rather than logic, presents representational issues both in input data preparation and output pattern language. While a form of graph-based data mining, GBRL focuses on identifying novel, not necessarily most frequent, patterns in a graph-theoretic representation of data. This approach to graph-based data mining provides both simplifications and challenges over frequency-based approaches. In this paper we discuss these issues and future directions of graph-based relational learning.

...read moreread less

Journal Article•DOI•

A user-assisted approach to component clustering

[...]

Kamran Sartipi¹, Kostas Kontogiannis¹•Institutions (1)

University of Waterloo¹

01 Jul 2003-Journal of Software Maintenance and Evolution: Research and Practice

TL;DR: A user-assisted clustering technique for software architecture recovery based on a proximity measure that is called component association, which is computed on the shared properties among groups of highly related system entities.

...read moreread less

Abstract: In this paper, we present a user-assisted clustering technique for software architecture recovery based on a proximity measure that we call component association. The component association measure is computed on the shared properties among groups of highly related system entities. In this approach, the software system is modeled as an attributed relational graph with the software constructs (entities) represented as nodes and data/control dependencies represented as edges. The application of data mining techniques on the system graph allows us to generate a component graph where the edges are labeled by the association strength values among the components. An interactive partitioning technique is used to partition a system into cohesive components. Graph visualization tools and cluster quality evaluation metrics are applied by the user to assess and fine tune the partition result.

...read moreread less

Journal Article•DOI•

A hierarchical program representation for refactoring

[...]

Niels Van Eetvelde¹, Dirk Janssens¹•Institutions (1)

University of Antwerp¹

01 Jun 2003-Electronic Notes in Theoretical Computer Science

TL;DR: A graph representation of object-oriented programs that enables one to describe refactoring operations (behaviour-preserving changes in the structure of a program) in a formal, concise way by graph rewriting productions is presented.

...read moreread less

Patent•

Data organization for database optimization

[...]

Martin Kaiser, Volker Sauermann

11 Jul 2003

TL;DR: In this paper, the authors describe techniques for organizing and searching data in a directed graph, where data objects are stored in a table, and path information represents every path through the directed graph of which the corresponding object is a part.

...read moreread less

Abstract: Techniques for organizing and searching data are described. Specifically, data objects are stored in a table, where the data objects correspond to nodes on a directed graph. The directed graph may represent, for example, a hierarchical structure of a company's organizational model. Additionally, path information is stored in, or accessed through, the table for each object, where the path information represents every path through the directed graph of which the corresponding object is a part. In this way, queries against the table that require the path information may be answered quickly and efficiently.

...read moreread less

Journal Article•DOI•

Visual Ranking of Link Structures

[...]

Ulrik Brandes, Sabine Cornelsen

01 Jan 2003-Journal of Graph Algorithms and Applications

TL;DR: In this article, the authors show that layouts for effective visualization of an underlying link structure can be computed in sync with the iterative computation utilized in all popular Web resource ranking methods.

...read moreread less

Abstract: Methods for ranking World Wide Web resources according to their position in the link structure of the Web are receiving considerable attention, because they provide the first effective means for search engines to cope with the explosive growth and diversification of the Web. We show that layouts for effective visualization of an underlying link structure can be computed in sync with the iterative computation utilized in all popular such rankings. Our visualizations provide valuable insight into the link structure and the ranking mechanism alike. Therefore, they are useful for the analysis of query results, maintenance of search engines, and evaluation of Web graph models.

...read moreread less

Proceedings Article•DOI•

Identification of clusters in the Web graph based on link topology

[...]

Xiaodi Huang¹, Wei Lai¹•Institutions (1)

Swinburne University of Technology¹

16 Jul 2003

TL;DR: A new approach to clustering the Web graph is proposed, which identifies a small subset of the graph as "core" members of clusters, and then incrementally constructs the clusters by a selection criterion.

...read moreread less

Abstract: The Web graph has recently been used to model the link structure of the Web. The studies of such graphs can yield valuable insights into Web algorithms for crawling, searching and discovery of Web communities. This paper proposes a new approach to clustering the Web graph. The proposed algorithm identifies a small subset of the graph as "core" members of clusters, and then incrementally constructs the clusters by a selection criterion. Two qualitative criteria are proposed to measure the quality of graph clustering. We have implemented our algorithm and tested a set of arbitrary graphs with good results. Applications of our approach include graph drawing and Web visualization.

...read moreread less

Journal Article•DOI•

An efficient distributed algorithm for detection of knots and cycles in a distributed graph

[...]

Dakshnamoorthy Manivannan¹, Mukesh Singhal¹•Institutions (1)

University of Kentucky¹

01 Oct 2003-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper presents an efficient distributed algorithm to detect if a node is part of a knot in a distributed graph and finds exactly which nodes are involved in the knot.

...read moreread less

Abstract: Knot detection in a distributed graph is an important problem and finds applications in deadlock detection in several areas such as store-and-forward networks, distributed simulation, and distributed database systems. This paper presents an efficient distributed algorithm to detect if a node is part of a knot in a distributed graph. The algorithm requires 2e messages and a delay of 2(d+1) message hops to detect if a node in a distributed graph is in a knot (here, e is the number of edges in the reachable part of the distributed graph and d is its diameter). A significant advantage of this algorithm is that it not only detects if a node is involved in a knot, but also finds exactly which nodes are involved in the knot. Moreover, if the node is not involved in a knot, but is only involved in a cycle, then it finds the nodes that are in a cycle with that node. We illustrate the working of the algorithm with examples. The paper ends with a discussion on how the information about the nodes involved in the knot can be used for deadlock resolution and also on the performance of the algorithm.

...read moreread less

Proceedings Article•DOI•

Symbol recognition using graphs

[...]

Josep Lladós, Gemma Sánchez

24 Nov 2003

TL;DR: This paper proposes two strategies to recognize symbols depending on the type of their substructures, including a graph isomorphism approach and a syntactic approach based on graph grammars.

...read moreread less

Abstract: Symbol recognition is a well-known challenge in the field of graphics recognition. A symbol can be defined as a structure within a document that has a particular meaning in the context of the application. Due to their representational power, graph structures are usually used to represent line drawings images. Thus, a number of graph comparison approaches are required to answer whether a known symbol appears in a document and under which degree of confidence. In this paper we propose two strategies to recognize symbols depending on the type of their substructures. For those symbols that can be defined by a prototype pattern, we propose a graph isomorphism approach. On the other hand, for those structures consisting of repetitive patterns, we propose a syntactic approach based on graph grammars.

...read moreread less

Book Chapter•DOI•

Different Kinds of Comparisons between Fuzzy Conceptual Graphs

[...]

Rallou Thomopoulos¹, Patrice Buche¹, Ollivier Haemmerlé¹, Ollivier Haemmerlé²•Institutions (2)

Institut national de la recherche agronomique¹, University of Paris-Sud²

20 Jul 2003

TL;DR: The Conceptual Graph Model is extended to allow one to represent imprecise data and queries that include preferences, by using fuzzy sets in concept vertices by extending the projection operation to fuzzy concepts and defining a comparison operation characterised by two matching degrees.

...read moreread less

Abstract: In the context of a microbiological application, our study proposes to extend the Conceptual Graph Model in order to allow one: (i) to represent imprecise data and queries that include preferences, by using fuzzy sets (from fuzzy set theory) in concept vertices, in order to describe either an imprecise concept type or an imprecise referent; (ii) to query a conceptual graph that may include imprecise data (factual graph) using a conceptual graph that may include preferences (query graph). This is performed in two steps: firstly by extending the projection operation to fuzzy concepts, secondly by defining a comparison operation characterised by two matching degrees: the possibility degree of matching and the necessity degree of matching between two graphs, and particularly between a query graph and a factual graph.

...read moreread less

Journal Article•DOI•

The role of declarative querying in bioinformatics.

[...]

Jignesh M. Patel¹•Institutions (1)

University of Michigan¹

01 Jan 2003-Omics A Journal of Integrative Biology

TL;DR: Current DBMSs have largely ignored supporting life sciences applications, and consequently, the life sciences researches have been forced to write tools and scripts to perform these tasks.

...read moreread less

Abstract: THE RECENT PUBLICATION of a draft of the entire human genome (McPherson et al, 2001; Venter et al, 2001) has served to fuel an already explosive area of research in bioinformatics that is involved in deriving meaningful knowledge from proteins and DNA sequences (Alberts et al, 2002) Even with the full human genome sequence now in hand, scientists still face the challenges of determining exact gene locations and functions, observing interactions between proteins in complex molecular machines, and learning the structure and function of proteins, just to name a few The progress of this scientific research is closely connected to the research in the database community in that analyzing large volumes of biological data sets involves being able to maintain and query large databases (Moussouni et al, 1999; Davidson, 2002) Database management systems (DBMSs) could help support life sciences applications, in a number of different ways A partial list of tasks that such applications require is: querying large structured databases (such as sequence and graph databases), querying semi-structured (such as published manuscripts), managing data replication, querying distributed data sources, and managing parallelism in high-throughput bioinformatics Unfortunately, current DBMSs have largely ignored supporting life sciences applications, and consequently, the life sciences researches have been forced to write tools and scripts to perform these tasks An interesting parallel can be drawn between the state of data management tools in life sciences, and the state of data management tools for business applications, such as a banking application, about three decades ago Prior to the advent of the relational data model, business data was managed and queried using customized programs/scripts that were developed for each application Reusing programs, and the algorithms for querying the data, involved rewriting application program and logic, which was very time consuming and expensive In addition, the querying programs were closely tied to the format that was used to represent the data Any change in the format of the data representation often would break the querying programs Furthermore, writing complex queries, such as querying over multiple data sets or posing complex analytical queries, was a daunting task One of the critical contributions of the relational data model (Codd, 1970) was the introduction of a declarative querying paradigm for business data management, instead of the previously used procedural paradigm In a declarative querying paradigm, the user expresses the query in a high-level language, like SQL, and the DBMS determines the best strategy for evaluating the query In addition, the DBMS only presents to the user a logical view of the data against which queries are posed The physical representation of the data, either on disk or in-memory, can be very different from the logical view For example, in a relational database management system (RDBMS), indices may be created, and the user doesn’t have to query against the index The user still queries against logical relations, and the system automatically determines if it is faster to use the indices to answer a query The user is thus insulated from worrying about various details such as physical organization of data on disk, the exact location of the data, tuning the representation for better performance, and choosing the best plan for evaluating a query This declarative querying paradigm has been a huge success for relational DBMSs, and today commercial RDBMSs manage terabytes of data, and allow very complex querying on these databases Database management systems can provide similar benefits to the life sciences community, just as it did three decades ago to the business data management community Many of the data sets that are used in life sciences are growing at an astonishing rate (such as sequence data at NCBI’s GenBank (NCBI, 2002)), and the queries

...read moreread less

Book Chapter•DOI•

Theoretical analysis and experimental comparison of graph matching algorithms for database filtering

[...]

Christophe Irniger¹, Horst Bunke¹•Institutions (1)

University of Bern¹

30 Jun 2003-Lecture Notes in Computer Science

TL;DR: The results of a basic study on the relation between filtering efficiency and graph matching algorithm performance are reported, using different graph matching algorithms for isomorphism and subgraph-isomorphism.

...read moreread less

Abstract: In structural pattern recognition, an unknown pattern is often transformed into a graph that is matched against a database in order to find the most similar prototype in the database. Graph matching is a powerful yet computationally expensive procedure. If the sample graph is matched against a large database of model graphs, the size of the database is introduced as an additional factor into the overall complexity of the matching process. Database filtering procedures are used to reduce the impact of this additional factor. In this paper we report the results of a basic study on the relation between filtering efficiency and graph matching algorithm performance, using different graph matching algorithms for isomorphism and subgraph-isomorphism.

...read moreread less

Journal Article•DOI•

Multi-relational data mining: the current frontiers

[...]

Sašo Džeroski¹, Luc De Raedt•Institutions (1)

Jožef Stefan Institute¹

01 Jul 2003

TL;DR: RDM approaches, which look for patterns that involve multiple tables (relations) from a relational database, are often referred to as multi-relational data mining (MRDM) and will be adopted in the present special issue.

...read moreread less

Abstract: Data mining algorithms look for patterns in data . Most existing data mining approaches are propositional and look for patterns in a single data table . Most real-world databases, however, store information in multiple tables. Relational data mining (RDM) approaches (D2eroski and Lavrac 2001), look for patterns that involve multiple tables (relations) from a relational database. To emphasize this fact, RDM is often referred to as multi-relational data mining (MRDM) (D2eroski et al . 2002) . We will adopt this term in the present special issue. When we are looking for patterns in multi-relational data, it is natural that the patterns involve multiple relations. They are typically stated in a more expressive language than patterns defined on a single data table . The major types of multi-relational patterns extend the types of propositional patterns considered in single table data mining. We can thus have multi-relational classification rules, multi-relational regression trees, and multi-relational association rules, among others . Just as many data mining algorithms come from the field of machine learning, many MRDM algorithms come form the field of inductive logic programming (ILP, Muggleton 1992 ; Lavrac and Dzeroski 1994) . Situated at the intersection of machine learning and logic programming, ILP has been concerned with finding patterns expressed as logic programs . Initially, ILP focussed on automated program synthesis from examples, formulated as a binary classification task . In recent years, however, the scope of ILP has broadened to cover the whole spectrum ofdata mining tasks (classification, regression, clustering, association analysis) . The most common types of patterns have been extended to their multi-relational versions and so have the major data mining algorithms (decision tree induction, distance-based clustering and prediction, etc.) . There is also a growing interest in the development of data mining algorithms for various types of structured data . These include, for example, graph-based data mining . There is also an increasing body of work on mining tree-structured and XML documents. Mining data which consists of complex/structured objects also falls within the scope ofMRDM, as the normalized representation of such objects in a relational database requires multiple tables . The rise of several KDD application areas that are intrinsically relational has provided and continues to provide a strong motivation for the development of MRDM approaches . Luc De Raedt Institut fur Informatik, Albert-Ludwigs-University, Georges-Koehler-Allee, Building 079, D-79110 Freiburg, Germany

...read moreread less

Applications of Graph Probing to Web Document Analysis.

[...]

Daniel P. Lopresti¹, Gordon Wilfong¹•Institutions (1)

Bell Labs¹

01 Jan 2003

TL;DR: The first steps towards adapting the graph probing paradigm to allow pre-computation of a compact, efficient probe set for databases of graphstructured documents in general, and Web pages coded in HTML in particular are described.

...read moreread less

Abstract: Graphs are a fundamental representation in much of computer science, including the analysis of both traditional and Web documents. Algorithms for higher-level document understanding tasks often use graphs to encode logical structure. HTML pages are usually regarded as treestructured, while the WWW itself is an enormous, dynamic multigraph. Much work on attempting to extract information from Web pages makes explicit or implicit use of graph representations [1, 3, 4, 7, 11]. It follows, then, that the ability to compare two graphs is basic functionality, as demonstrated in such applications as query-by-structure, wrapper generation for information extraction, performance evaluation, etc. Because most problems relating to graph comparison have no known efficient, guaranteed-optimal solution, researchers have developed a wide range of heuristics. For the problem of determining isomorphism, for example, many heuristics rely on the existence of certain vertex invariants, which consist of a value f(v) assigned to each vertex v, so that under any isomorphism I, if I(v) = v then f(v) = f(v). One commonly used invariant is the degree of a vertex. In fact nauty, a successful software package for determining graph isomorphism (see [9]), relies on such vertex invariants. This observation can be seen as forming the basis for graph probing, a paradigm we have recently begun exploring for graph comparison [5, 8]. However, we desire more than a simple “yes/no” answer; we are interested in quantifying the similarity between two graphs, not just in whether they may be isomorphic. Conceptually, the idea of probing is to place each of the two graphs under study inside a “black box” capable of evaluating a set of graph-oriented operations (e.g., returning a list of all the leaf vertices, or all vertices labeled in a certain way). We then pose a series of probes and correlate the responses of the two systems. Our past work in the area treats graph probing as an online process; both the query graph and the database graph are available for synthesizing the probe set. While this is an appropriate assumption when one is comparing, say, the output of a recognition algorithm with its associated ground-truth, it is not a workable model for retrieval applications when the database contains anything other than a small number of documents. In this paper, we describe our first steps towards adapting the graph probing paradigm to allow pre-computation of a compact, efficient probe set for databases of graphstructured documents in general, and Web pages coded in HTML in particular. This new model is shown in Figure 1, where the portion of the computation bounded by dashed lines is performed off-line. We consider both comparing two graphs in their entirety, as well as determining whether one graph contains a subgraph that closely matches the other. We present an overview of work in progress, as well as some preliminary experimental results.

...read moreread less

Inferring the Structure of Graph Grammars from Data

[...]

Shailesh P. Doshi, Fang Huang

01 Jan 2003

TL;DR: An algorithm for inferring stochastic graph grammars from data, which uncovers the structure shared by the graphs and represents it in the form of a stoChastic graph grammar.

...read moreread less

Abstract: Graphs can be used to represent such diverse entities as chemical compounds, transportation networks, and the world wide web. Stochastic graph grammars are compact representations of probability distributions over graphs. We present an algorithm for inferring stochastic graph grammars from data. That is, given a set of graphs that, for example, correspond to a set of chemical compounds, all of which have some desirable property, the algorithm uncovers the structure shared by the graphs and represents it in the form of a stochastic graph grammar. The inferred grammar assigns high probability to the graphs from which it was learned and low probability to other graphs. We report results of preliminary experiments in which inferred graph grammars are compared to target grammars used to generated training data.

...read moreread less

Proceedings Article•DOI•

Spatial graph grammars for Web information transformation

[...]

Meikang Qiu, Guang Lei Song, Jun Kong, Kang Zhang

28 Oct 2003

TL;DR: The paper illustrates a detailed example that applies the SGG to transform a XML Web document to a WML structure for the display on mobile devices.

...read moreread less

Abstract: This paper presents an approach to spatial specifications for Web information transformation. Extended from the reserved graph grammar (RGG), a spatial graph grammar (SGG) is proposed. The paper illustrates a detailed example that applies the SGG to transform a XML Web document to a WML structure for the display on mobile devices. The SGG formalism is general enough for a wide range of applications such as multimedia interfaces, electronic publishing and XML document conversion.

...read moreread less

Book Chapter•DOI•

Computing reading trees for constraint diagrams

[...]

Andrew Fish¹, John Howse¹•Institutions (1)

University of Brighton¹

27 Sep 2003-Lecture Notes in Computer Science

TL;DR: A ‘tree-construction’ algorithm, which utilizes graph transformations in order to produce all possible reading trees from a dependence graph, is described, which will aid the production of tools which will allow an advanced user to choose from a range of semantic interpretations of a diagram.

...read moreread less

Abstract: Constraint diagrams are a visual notation designed to complement the Unified Modeling Language in the development of software systems. They generalize Venn diagrams and Euler circles, and include facilities for quantification and navigation of relations. Their design emphasizes scalability and expressiveness while retaining intuitiveness. Due to subtleties concerned with the ordering of symbols in this visual language, the formalization of constraint diagrams is non-trivial; some constraint diagrams have more than one intuitive reading. A ‘reading’ algorithm, which associates a unique semantic interpretation to a constraint diagram, with respect to a reading tree, has been developed. A reading tree provides a partial ordering for syntactic elements of the diagram. Reading trees are obtainable from a partially directed graph, called the dependence graph of the diagram. In this paper we describe a ‘tree-construction’ algorithm, which utilizes graph transformations in order to produce all possible reading trees from a dependence graph. This work will aid the production of tools which will allow an advanced user to choose from a range of semantic interpretations of a diagram.

...read moreread less

Journal Article•DOI•

Using attributed plex grammars for the generation of image and graph databases

[...]

Markus Hagenbuchner¹, Marco Gori², Horst Bunke³, Ah Chung Tsoi¹, Christophe Irniger³ - Show less +1 more•Institutions (3)

University of Wollongong¹, University of Siena², University of Bern³

01 May 2003-Pattern Recognition Letters

TL;DR: It is shown that the generated patterns are particularly suitable for the extraction of graph-based representations, which makes them very appropriate for benchmarks in the area of structural pattern recognition.

...read moreread less

Journal Article•DOI•

On the graph traversal and linear binary-chain programs

[...]

Yangjun Chen¹•Institutions (1)

University of Winnipeg¹

01 Mar 2003-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A new algorithm which requires less time and achieves a linear time complexity for both acyclic and cyclic data by generating most answers directly in terms of the answers already found and the associated "path information" instead of traversing the corresponding paths as usual.

...read moreread less

Abstract: Grahne et al. have presented a graph algorithm for evaluating a subset of recursive queries. This method consists of two phases. In the first phase, the method transforms a linear binary-chain program into a set of equations over expressions containing predicate symbols. In the second phase, a graph is constructed from the equations and the answers are produced by traversing the relevant paths. In this paper, we describe a new algorithm which requires less time than Grahne's. The key idea of the improvement is to reduce the search space that will be traversed when a query is invoked. Furthermore, we speed up the evaluation of cyclic data by generating most answers directly in terms of the answers already found and the associated "path information" instead of traversing the corresponding paths as usual. In this way, our algorithm achieves a linear time complexity for both acyclic and cyclic data.

...read moreread less

Patent•

Search engine having navigation path and orphan file features

[...]

Chang Ching-Chung, Frank Sung, Cheng-Hui Chiu

06 Aug 2003

TL;DR: A collection building utility as mentioned in this paper assembles a batch collection of solely the active objects for retrieval by a search query, which prevents retrieval of an orphan file that would provide a website visitor with incorrect information.

...read moreread less

Abstract: A search engine (100) has a top down transversal algorithm (112) that distinguishes active objects of a website from orphan files depicted in graphs of HTML files of a graph database of the objects and their HTML relations. A collection building utility (120) assembles a batch collection of solely the active objects for retrieval by a search query, which prevents retrieval of an orphan file that would provide a website visitor with incorrect information.

...read moreread less