
Showing papers on "Graph database" published in 2002


Proceedings ArticleDOI
24 Aug 2002
TL;DR: This paper presents an unsupervised method for assembling semantic knowledge from a part-of-speech tagged corpus using graph algorithms and focuses on the symmetric relationship between pairs of nouns which occur together in lists.
Abstract: This paper presents an unsupervised method for assembling semantic knowledge from a part-of-speech tagged corpus using graph algorithms. The graph model is built by linking pairs of words which participate in particular syntactic relationships. We focus on the symmetric relationship between pairs of nouns which occur together in lists. An incremental cluster-building algorithm using this part of the graph achieves 82% accuracy at a lexical acquisition task, evaluated against WordNet classes. The model naturally realises domain and corpus specific ambiguities as distinct components in the graph surrounding an ambiguous word.
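A minimal sketch of the general idea, not the paper's exact algorithm: nouns that co-occur in lists are linked in a graph, and a cluster is grown incrementally from a seed word. The scoring function and the toy word lists below are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact algorithm): build a graph that links
# nouns co-occurring in lists, then grow a cluster greedily from a seed noun.
from collections import defaultdict
from itertools import combinations

def build_graph(noun_lists):
    """Link every pair of nouns that appear together in a list."""
    graph = defaultdict(set)
    for nouns in noun_lists:
        for a, b in combinations(set(nouns), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def grow_cluster(graph, seed, size=5):
    """Incrementally add the neighbour with the most links into the cluster."""
    cluster = {seed}
    while len(cluster) < size:
        candidates = set().union(*(graph[w] for w in cluster)) - cluster
        if not candidates:
            break
        cluster.add(max(candidates, key=lambda w: len(graph[w] & cluster)))
    return cluster

lists = [["apples", "pears", "plums"], ["pears", "cherries", "apples"]]
print(grow_cluster(build_graph(lists), "apples", size=3))
```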

269 citations


Proceedings ArticleDOI
10 Dec 2002
TL;DR: The algorithm uses hash-based fingerprinting to represent the graphs in an abstract form and to filter the database, and has been tested on databases of size up to 16,000 molecules and performs well in this entire range.
Abstract: GraphGrep is an application-independent method for querying graphs, finding all the occurrences of a subgraph in a database of graphs. The interface to GraphGrep is Glide, a regular expression graph query language that combines features from XPath and Smart. Glide incorporates both single node and variable-length wildcards. Our algorithm uses hash-based fingerprinting to represent the graphs in an abstract form and to filter the database. GraphGrep has been tested on databases of size up to 16,000 molecules and performs well in this entire range.
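The filtering step can be illustrated roughly as follows. This is a hypothetical sketch of hash-based path fingerprinting, not GraphGrep's actual code; the path length, bucket count, and function names are assumptions.

```python
# Illustrative sketch of path-fingerprint filtering: a graph's fingerprint
# counts its label paths up to a fixed length; a database graph is kept only
# if it has at least as many occurrences of every path hash as the query.
from collections import Counter

def label_paths(graph, labels, max_len=3):
    """Enumerate label sequences of simple paths up to max_len nodes."""
    paths = []
    def walk(node, visited, seq):
        paths.append(tuple(seq))
        if len(seq) == max_len:
            return
        for nxt in graph[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, seq + [labels[nxt]])
    for start in graph:
        walk(start, {start}, [labels[start]])
    return paths

def fingerprint(graph, labels, buckets=1024):
    return Counter(hash(p) % buckets for p in label_paths(graph, labels))

def may_contain(db_fp, query_fp):
    """Filtering test: every hashed query path must be covered by the graph."""
    return all(db_fp[h] >= c for h, c in query_fp.items())
```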

254 citations


Proceedings ArticleDOI
09 Dec 2002
TL;DR: A new algorithm for mining graph data is proposed, based on a novel definition of support, that can be useful for many applications, including: compact representation of source information and a road-map for browsing and querying information sources.
Abstract: Whereas data mining in structured data focuses on frequent data values, in semistructured and graph data the emphasis is on frequent labels and common topologies. Here, the structure of the data is just as important as its content. We study the problem of discovering typical patterns of graph data. The discovered patterns can be useful for many applications, including: compact representation of source information and a road-map for browsing and querying information sources. Difficulties arise in the discovery task from the complexity of some of the required sub-tasks, such as sub-graph isomorphism. This paper proposes a new algorithm for mining graph data, based on a novel definition of support. Empirical evidence shows practical, as well as theoretical, advantages of our approach.

182 citations


Proceedings ArticleDOI
02 Apr 2002
TL;DR: This paper describes techniques for compressing the Link Database, provides performance numbers for compression ratios and decompression speed, and shows that these techniques reduce space requirements to under 6 bits per link.
Abstract: The Connectivity Server is a special-purpose database whose schema models the Web as a graph: a set of nodes (URLs) connected by directed edges (hyperlinks). The Link Database provides fast access to the hyperlinks. To support easy implementation of a wide range of graph algorithms we have found it important to fit the Link Database into RAM. In the first version of the Link Database, we achieved this fit by using machines with lots of memory (8 GB), and storing each hyperlink in 32 bits. However, this approach was limited to roughly 100 million Web pages. This paper presents techniques to compress the links to accommodate larger graphs. Our techniques combine well-known compression methods with methods that depend on the properties of the Web graph. The first compression technique takes advantage of the fact that most hyperlinks on most Web pages point to other pages on the same host as the page itself. The second technique takes advantage of the fact that many pages on the same host share hyperlinks, that is, they tend to point to a common set of pages. Together, these techniques reduce space requirements to under 6 bits per link. While (de)compression adds latency to the hyperlink access time, we can still compute the strongly connected components of a 6 billion-edge graph in 22 minutes and run applications such as Kleinberg's HITS in real time. This paper describes our techniques for compressing the Link Database, and provides performance numbers for compression ratios and decompression speed.
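As a rough illustration of how adjacency lists shrink when link locality is exploited, the sketch below gap-encodes sorted destination IDs with variable-length bytes. It is only a simplified stand-in: the paper's actual host-locality and shared-list encodings are more elaborate and are what achieve the sub-6-bits-per-link figure.

```python
# Simplified illustration of adjacency-list compression (not the Link
# Database's actual encoding): sort destination IDs, store gaps, and write
# each gap as variable-length bytes with a 7-bit payload per byte.
def encode_outlinks(dest_ids):
    out = bytearray()
    prev = 0
    for d in sorted(dest_ids):
        gap = d - prev
        prev = d
        while True:
            byte = gap & 0x7F
            gap >>= 7
            out.append(byte | (0x80 if gap else 0x00))
            if not gap:
                break
    return bytes(out)

def decode_outlinks(data):
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:          # last byte of this gap
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids

links = [100007, 100003, 100042, 250000]
assert decode_outlinks(encode_outlinks(links)) == sorted(links)
```

If identifiers are assigned in lexicographic URL order, links within a host produce small gaps, which is what makes gap encoding effective.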

141 citations


Book ChapterDOI
28 May 2002
TL;DR: This paper developed a coarse-scale approximate similarity measure which is a good first-order approximation of the similarity metric and is two orders of magnitude more efficient to compute, and developed an exemplar-based indexing scheme which discards a large number of non-matching shapes solely based on distance to exemplars, coarse-scale representatives of each category.
Abstract: This paper examines issues arising in applying a previously developed edit-distance shock graph matching technique to indexing into large shape databases. This approach compares the shock graph topology and attributes to produce a similarity metric, and results in 100% recognition rate in querying a database of approximately 200 shapes. However, indexing into a significantly larger database is faced with both the lack of a suitable database, and more significantly with the expense related to computing the metric. We have thus (i) gathered shapes from a variety of sources to create a database of over 1000 shapes from forty categories as a stage towards developing an approach for indexing into a much larger database; (ii) developed a coarse-scale approximate similarity measure which relies on the shock graph topology and a very coarse sampling of link attributes. We show that this is a good first-order approximation of the similarity metric and is two orders of magnitude more efficient to compute. An interesting outcome of using this efficient but approximate similarity measure is that the approximation naturally demands a notion of categories to give high precision; (iii) developed an exemplar-based indexing scheme which discards a large number of non-matching shapes solely based on distance to exemplars, coarse-scale representatives of each category. The use of a coarse-scale matching measure in conjunction with a coarse-scale sampling of the database leads to a significant reduction in the computational effort without discarding correct matches, thus paving the way for indexing into databases of tens of thousands of shapes.
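A hedged sketch of the exemplar-based pruning idea. The function names and the `keep` parameter are assumptions, and the coarse and fine distance functions stand in for the paper's shock graph measures.

```python
# Hypothetical sketch of exemplar-based pruning (not the paper's code): each
# category is represented by an exemplar; a query is compared against all
# exemplars with the cheap coarse measure, and only shapes from the closest
# categories are passed to the expensive fine-grained matcher.
def candidate_shapes(query, exemplars, database, coarse_dist, keep=3):
    """exemplars: {category: exemplar_shape}; database: {category: [shapes]}."""
    ranked = sorted(exemplars, key=lambda c: coarse_dist(query, exemplars[c]))
    return [s for c in ranked[:keep] for s in database[c]]

def recognize(query, exemplars, database, coarse_dist, fine_dist):
    shortlist = candidate_shapes(query, exemplars, database, coarse_dist)
    return min(shortlist, key=lambda s: fine_dist(query, s))
```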

114 citations


Patent
30 Dec 2002
TL;DR: In this article, a system for incrementally maintaining non-distributive aggregate functions in a relational database includes a data storage device in which a relational database is stored and a processor that includes a database maintenance module.
Abstract: A system for incrementally maintaining non-distributive aggregate functions in a relational database includes a data storage device in which a relational database is stored. A processor communicates with the data storage device and includes a database maintenance module. The database maintenance module includes a program for incrementally maintaining non-distributive aggregate functions in a relational database. The method embodied in the program includes determining whether all functions in a relational database query are distributive. Based on this determination, a basic propagate phase graph is selectively altered to yield a new propagate phase graph. Changes to an automatic summary table are then applied thereto based on the new propagate phase graph.

64 citations


Book ChapterDOI
24 Nov 2002
TL;DR: This work overcomes the inherent computational complexity of the problem of finding frequent structures in semistructured data by using a summary data structure to prune the search space and to provide interactive feedback.
Abstract: We study the problem of finding frequent structures in semistructured data (represented as a directed labeled graph). Frequent structures are graphs that are isomorphic to a large number of subgraphs in the data graph. Frequent structures form building blocks for visual exploration and data mining of semistructured data. We overcome the inherent computational complexity of the problem by using a summary data structure to prune the search space and to provide interactive feedback. We present an experimental study of our methods operating on real datasets. The implementation of our methods is capable of operating on datasets that are two to three orders of magnitude larger than those described in prior work.

57 citations


Proceedings ArticleDOI
11 Aug 2002
TL;DR: This work extends previous work in both shock graph matching and hierarchical structure indexing to propose the first unified framework for view-based 3-D object recognition using shock graphs, with an improved spectral characterization of shock graph structure that drives a powerful indexing mechanism and drives a matching algorithm that can accommodate noise and occlusion.
Abstract: The shock graph is an emerging shape representation for object recognition, in which a 2-D silhouette is decomposed into a set of qualitative parts, captured in a directed acyclic graph. Although a number of approaches have been proposed for shock graph matching, these approaches do not address the equally important indexing problem. We extend our previous work in both shock graph matching and hierarchical structure indexing to propose the first unified framework for view-based 3-D object recognition using shock graphs. The heart of the framework is an improved spectral characterization of shock graph structure that not only drives a powerful indexing mechanism (to retrieve similar candidates from a large database), but also drives a matching algorithm that can accommodate noise and occlusion. We describe the components of our system and evaluate its performance using both unoccluded and occluded queries. The large set of recognition trials (over 25,000) from a large database (over 1400 views) represents one of the most ambitious shock graph-based recognition experiments conducted to date.
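One simplified way to realize a spectral characterization for indexing, sketched here under assumptions rather than as the authors' exact formulation, is to use the sorted eigenvalue magnitudes of a graph's adjacency matrix as a fixed-length, permutation-invariant signature that can be compared cheaply across views.

```python
# Hedged sketch of a spectral graph signature for indexing (a simplification,
# not the paper's exact characterization): sorted eigenvalue magnitudes of
# the adjacency matrix give a permutation-invariant vector per graph.
import numpy as np

def spectral_signature(adjacency, dims=8):
    """Return the largest `dims` eigenvalue magnitudes, zero-padded."""
    eigvals = np.linalg.eigvals(np.asarray(adjacency, dtype=float))
    mags = np.sort(np.abs(eigvals))[::-1]
    sig = np.zeros(dims)
    sig[:min(dims, len(mags))] = mags[:dims]
    return sig

def nearest_views(query_sig, view_sigs, k=5):
    """Rank stored view signatures by Euclidean distance to the query."""
    dists = [(np.linalg.norm(query_sig - s), i) for i, s in enumerate(view_sigs)]
    return [i for _, i in sorted(dists)[:k]]
```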

50 citations


Journal ArticleDOI
01 Dec 2002
TL;DR: This short paper is a gentle introduction to parameterized complexity theory, focusing on the results most relevant for database theory.
Abstract: Parameterized complexity theory provides a framework for a fine-grain complexity analysis of algorithmic problems that are intractable in general. In recent years, ideas from parameterized complexity theory have found their way into various areas of computer science, such as artificial intelligence [15], computational biology [1, 21], and, last but not least, database theory [16, 19]. This short paper is a gentle introduction to the theory, focusing on the results most relevant for database theory. Interested readers are referred to Downey and Fellows’ monograph [6] to learn more about parameterized complexity theory. The paper is organised as follows: In Section 2 we describe two simple fixed-parameter tractable algorithms in an informal way. Section 3 presents the formal framework of parameterized complexity theory. Section 4 is a brief survey of the parameterized complexity of database query evaluation.
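For readers new to the area, the central definition the survey builds on can be stated compactly:

```latex
% Fixed-parameter tractability (standard definition): a problem with input
% size $n$ and parameter $k$ is in FPT if some algorithm solves it in time
\[
  f(k) \cdot n^{O(1)},
\]
% where $f$ is a computable function depending on $k$ only, so the
% combinatorial explosion is confined to the (typically small) parameter.
```

In the database setting surveyed in Section 4, the natural parameter is typically the size of the query, so the data size enters the running time only polynomially.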

44 citations


Journal ArticleDOI
TL;DR: In this article, the need for a similarity measure for comparing two drawings of graphs arises in problems such as interactive graph drawing and the indexing or browsing of large sets of graphs.
Abstract: The need for a similarity measure for comparing two drawings of graphs arises in problems such as interactive graph drawing and the indexing or browsing of large sets of graphs. This paper builds on our previous work [3] by defining some additional similarity measures, refining some existing ones, and presenting the results of a user study designed to evaluate the suitability of the measures.

29 citations


Journal ArticleDOI
TL;DR: The authors present an architectural overview of two toolkits that allow developers to easily integrate graph visualization capabilities into custom software applications and discuss the challenges encountered during implementation and integration of theory and research results into such tools.
Abstract: The authors have created two toolkits that allow developers to easily integrate graph visualization capabilities into custom software applications. The Graph Layout Toolkit (GLT) provides interfaces for modeling, drawing, and automatically laying out graphs. The Graph Editing Toolkit (GET) provides a customizable display and editing layer, which facilitates rapidly developing tools that visualize data in the form of graphs. The authors present an architectural overview of these tools and discuss the challenges encountered during implementation and integration of theory and research results into such tools. In particular, they discuss automatic graph layout and labeling algorithms and complexity management techniques. In addition, they present examples of applications using these tools.

Book ChapterDOI
11 Aug 2002
TL;DR: The aim is both to store the URL list and the graph efficiently enough that all computations can be carried out in main memory, and to make the conversion between URLs and their identifiers as fast as possible.
Abstract: In this paper, we propose a set of simple and efficient methods, based on standard, free and widely available tools, to store and manipulate large sets of URLs and large parts of the Web graph. Our aim is to store the URL list and the graph efficiently enough that all computations can be carried out in main memory. We also want to make the conversion between URLs and their identifiers as fast as possible, and to obtain all the successors of a URL in the Web graph efficiently. The methods we propose achieve a good compromise between these two challenges and make it possible to manipulate large parts of the Web graph.
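A minimal sketch of one way to meet these goals (an assumed data layout, not the paper's exact methods): keep the URLs sorted so that an identifier is just a rank found by binary search, and store successors in one flat array with per-node offsets.

```python
# Minimal in-memory Web-graph sketch: sorted URL list for URL <-> id
# conversion, plus a flattened successor array with per-node offsets.
import bisect

class WebGraph:
    def __init__(self, urls, edges):
        """urls: iterable of URL strings; edges: iterable of (src_url, dst_url)."""
        self.urls = sorted(set(urls))            # id = index in the sorted list
        adj = [[] for _ in self.urls]
        for src, dst in edges:
            adj[self.url_to_id(src)].append(self.url_to_id(dst))
        self.offsets, self.successors = [0], []
        for lst in adj:                          # flatten the adjacency lists
            self.successors.extend(sorted(lst))
            self.offsets.append(len(self.successors))

    def url_to_id(self, url):
        i = bisect.bisect_left(self.urls, url)
        if i == len(self.urls) or self.urls[i] != url:
            raise KeyError(url)
        return i

    def id_to_url(self, node_id):
        return self.urls[node_id]

    def successors_of(self, url):
        i = self.url_to_id(url)
        return [self.id_to_url(j)
                for j in self.successors[self.offsets[i]:self.offsets[i + 1]]]
```

Binary search gives logarithmic URL-to-identifier conversion without any extra index, and the flat successor array keeps the whole graph in a few contiguous arrays.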

Patent
13 Feb 2002
TL;DR: In this paper, a visual discovery tool for graph generation is described, which has a database for storing a data set, rules, and graph types, and a graph generator for selectively applying rules and graph types to the data set to generate graphs.
Abstract: A visual discovery tool for graph generation is described. The visual discovery tool has a database for storing a data set, rules, and graph types and a graph generator for selectively applying rules and graph types to the data set to generate graphs. In one embodiment, triggers and threshold values are stored in the database to determine the execution of the graph generator. In another embodiment, a user interface enables the customization of the rules and graph types.

Patent
Oliver Goldman
25 Mar 2002
TL;DR: In this paper, the authors present a method for accessing text-based linearized graph data, where the node-traversal data identifies, for each of a subset of nodes in the represented data structure, one or more locations in the text-based linearized graph data corresponding to other nodes in the data structure.
Abstract: Methods and apparatus implementing systems and techniques for accessing text-based linearized graph data. In general, in one aspect, a method includes obtaining text-based linearized graph data representing a data structure having nodes, and generating node-traversal data for the text-based linearized graph data, where the node-traversal data identifies for each of a subset of nodes in the represented data structure one or more locations in the text-based linearized graph data corresponding to one or more other nodes in the represented data structure, and associating the node-traversal data with the text-based linearized graph data. For example, linear offsets can be added to a document including text-based linearized graph data, such as an XML document, to enable random access to the represented nodes without having to parse the entire document, and without interfering with the generally understood structure and content of the document.
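The idea of node-traversal offsets can be illustrated as follows. This is a hypothetical sketch only; the regular expression, function names, and offset granularity are assumptions, not the patent's implementation.

```python
# Illustrative sketch: record the offset of each element's start tag so that
# a node can later be read with a direct jump instead of a full parse.
import re

def build_offset_index(xml_text):
    """Map element name -> list of offsets of its start tags."""
    index = {}
    for match in re.finditer(r"<([A-Za-z_][\w.-]*)[\s>]", xml_text):
        index.setdefault(match.group(1), []).append(match.start())
    return index

def read_node(xml_text, offset, length=80):
    """Random access: jump straight to a recorded offset."""
    return xml_text[offset:offset + length]

doc = "<graph><node id='a'><edge to='b'/></node><node id='b'/></graph>"
idx = build_offset_index(doc)
print(idx["node"])                       # offsets of the <node> elements
print(read_node(doc, idx["node"][1]))    # second node, read without reparsing
```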


01 Jan 2002
TL;DR: Zhang et al. propose a concentric-circle model to more accurately define communities, in which the most important objects representing the concept of a whole community lie in the center and are called core objects.
Abstract: Discovering communities from a graph structure such as the Web has recently become an interesting research problem. In this paper, in comparison with state-of-the-art authority detection and graph partitioning methods, we propose a concentric-circle model to more accurately define communities. With this model, a community can be described as a set of concentric circles. The most important objects, representing the concept of the whole community, lie in the center and are called core objects. Affiliated objects, which are related to the core objects, surround the core with different ranks. Based on the concentric-circle model, a novel algorithm is developed to discover communities conforming to this model. We also conducted a case study to automatically discover research interest groups in the computer science domain from the Web. Experiments show that our method is very effective at generating high-quality communities with clearer structure and more tunable granularity.

Journal Article
TL;DR: A novel topology algorithm for network tracking based on node merging is implemented, and with this algorithm applied programs such as dynamic colouring in SCADA and checkup of dispatching operation in DTS are developed.
Abstract: Network topology is the basis of power system analysis software, and establishing network topology by visual methods is also the kernel of a visualized EMS. In this paper, an abstract object-oriented model for power network topology is put forward, and a versatile and easily expandable method for representing power system topology is built up. On the integrated platform of a graphic database, a fast algorithm is realized that can automatically and quickly generate the topology by using coordinate relations in the vectorgram. The core of this method is that the calculation workload can be decreased by compressing the number of graph icons taking part in the topology formation and by using a partitioning method. Adopting the presented object-oriented network topology model, a novel topology algorithm for network tracking based on node merging is implemented, and with this algorithm applied programs such as dynamic colouring in SCADA and checkup of dispatching operation in DTS are developed.

Journal ArticleDOI
TL;DR: This paper presents a graphical query language for XML based on a simple form of graph grammars that permits us to extract data and reorganize information in a new structure, and provides an example-driven comparison of the language w.r.t. other XML query languages.
Abstract: In this paper we present a graphical query language for XML. The language, based on a simple form of graph grammars, permits us to extract data and reorganize information in a new structure. As with most of the current query languages for XML, queries consist of two parts: one extracting a subgraph and one constructing the output graph. The semantics of queries is given in terms of graph grammars. The use of graph grammars makes it possible to define, in a simple way, the structural properties of both the subgraph that has to be extracted and the graph that has to be constructed. We provide an example-driven comparison of our language w.r.t. other XML query languages, and show the effectiveness and simplicity of our approach.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: An algorithm for automatic mining of data dependencies in a relation that can be used to construct a granular database scheme that is able to present a coarse or refined view of the relations in the database.
Abstract: We introduce an algorithm for automatic mining of data dependencies in a relation. The mined data dependencies can be used to construct a granular database scheme. Unlike the traditional approach, the granular database scheme has the following advantages: (1) it is able to present a coarse or refined view of the relations in the database; (2) queries can be answered more efficiently using the granular database scheme than the flat database scheme.

Proceedings ArticleDOI
17 Jul 2002
TL;DR: This paper presents a graphical query language for XML, based on a simple form of graph grammars, that permits us to extract data and reorganize information in a new structure.
Abstract: In this paper we present a graphical query language for XML. The language, based on a simple form of graph grammars, permits us to extract data and reorganize information in a new structure. As with most of the current query languages for XML, queries consist of two parts: one extracting a sub-graph and one constructing the output graph. The semantics of queries is given in terms of graph grammars. The use of graph grammars makes it possible to define, in a simple way, the structural properties of both the subgraph that has to be extracted and the graph that has to be constructed. By means of examples, we show the effectiveness and simplicity of our approach.


Proceedings ArticleDOI
07 Aug 2002
TL;DR: It is demonstrated that a graph database that considers isomorphisms can drastically reduce the number of evaluations in an evolutionary structure optimization process.
Abstract: Concepts from the graph theory and molecular evolution are proposed for analyzing effects of redundancy induced by graph isomorphisms on the structure optimization of neural networks. It is demonstrated that a graph database that considers isomorphisms can drastically reduce the number of evaluations in an evolutionary structure optimization process.
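A hedged sketch of how such an isomorphism-aware database can avoid repeated evaluations. The canonicalization here is a brute-force stand-in suitable only for small graphs, and the class and function names are assumptions, not the paper's system.

```python
# Cache fitness values under a canonical form of each network graph so that
# isomorphic structures produced during evolution are evaluated only once.
from itertools import permutations

def canonical_form(n_nodes, edges):
    """Lexicographically smallest edge set over all node relabelings."""
    best = None
    for perm in permutations(range(n_nodes)):
        relabeled = tuple(sorted((min(perm[a], perm[b]), max(perm[a], perm[b]))
                                 for a, b in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best

class FitnessCache:
    def __init__(self, evaluate):
        self.evaluate = evaluate          # expensive training/evaluation call
        self.cache = {}

    def fitness(self, n_nodes, edges):
        key = (n_nodes, canonical_form(n_nodes, edges))
        if key not in self.cache:         # evaluate each isomorphism class once
            self.cache[key] = self.evaluate(n_nodes, edges)
        return self.cache[key]
```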

01 Jan 2002
TL;DR: This paper addresses graph databases as a domain-independent concept and proposes a concept called “structure vectorization” for indexing and retrieval from structure-dominated graph databases.
Abstract: This paper addresses the problem of retrieval from graph databases. Graph databases store graph structures instead of tables. Typically, graph databases are applicable in domains that require storage and retrieval of structural information. One of the main issues in graph databases is retrieval of member graphs based on structure matching. Structure matching of graphs is a known NP-complete problem. In graph databases, this is compounded by the fact that structure matching has to be performed against a large number of graphs in the database. This paper addresses graph databases as a domain-independent concept. They are shown to be defined by a property of dominance of either structure over attributes or vice versa. Retrieval from structure-dominated graph databases is much more difficult than retrieval from attribute-dominated graph databases. The paper also proposes a concept called “structure vectorization” for indexing and retrieval from structure-dominated graph databases. Keywords: graph databases, architecture, structure vectorization, information retrieval.
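One minimal reading of “structure vectorization”, sketched under assumptions rather than as the paper's scheme: map each graph to a small vector of monotone structural invariants and use componentwise comparison as a cheap necessary-condition filter before exact structure matching.

```python
# Assumed sketch: summarize each graph's structure as a fixed-length vector
# of invariants and filter the database by vector comparison before any
# expensive structure matching.
def structure_vector(graph, max_degree=8):
    """graph: {node: set(neighbours)} ->
    (node count, edge count, #nodes with degree >= d for d = 1..max_degree)."""
    degrees = [len(nbrs) for nbrs in graph.values()]
    cumulative = tuple(sum(1 for deg in degrees if deg >= d)
                       for d in range(1, max_degree + 1))
    n_edges = sum(degrees) // 2
    return (len(graph), n_edges) + cumulative

def filter_candidates(query_vec, database_vecs):
    """Keep graphs whose every invariant is at least the query's: a necessary
    condition for containing the query as a subgraph, since these invariants
    can only grow when nodes and edges are added."""
    return [gid for gid, vec in database_vecs.items()
            if all(v >= q for v, q in zip(vec, query_vec))]
```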

Proceedings ArticleDOI
15 Apr 2002
TL;DR: The traditional linear model of graphics-driven visualization is modified by defining two mappings between the abstract data and physical pictures, the Geometric Mapping and the Graphical Mapping, which are then used to visualize a variety of domain-specific attributes associated with the web objects in a web graph.
Abstract: This paper introduces a new Graph Visualization technique that can be used to visualize web objects with many associated attributes. We modify the traditional linear model of graphics-driven visualization by defining two mappings between the abstract data and physical pictures, the Geometric Mapping and the Graphical Mapping. We then use this technique to visualize a variety of domain-specific attributes associated with the web objects in a web graph, such as the access frequency and the connectivity of the web page in the web locality.

Journal Article
TL;DR: In this article, a multi-agent model for constructing process control intelligent systems is discussed, which is based on three paradigms: pattern recognition, formal (string and graph) automata and rules.
Abstract: The multi-agent model for constructing process control intelligent systems is discussed in the paper. Agents of the model are based on three paradigms: pattern recognition, formal (string and graph) automata and rules. The efficient syntactic pattern recognition schemes are used for analysing string and graph structures that represent a structured knowledge. For string-like structures DPLL(k) quasi-context sensitive languages are applied. Graph structures are analysed with ETPL(k) graph parsers in a polynomial time. Grammatical inference algorithms can be used for both kinds of structures. It allows one to embed self-learning schemes in agents of the model.

Book ChapterDOI
16 Dec 2002
TL;DR: An experimental study on the query processing efficiency of a native-XML database system and an XML-enabled database system on a selected set of queries including operations from text processing, DML and relational algebra.
Abstract: With XML becoming a standard for representing semi-structured documents on the web and a standard for data exchange between different systems, some database companies are adding XML support to their existing database systems, while other companies are coming out with pure, or native, XML database systems. In this paper, we present an experimental study on the query processing efficiency of a native-XML database system and an XML-enabled database system on a selected set of queries including operations from text processing, DML and relational algebra. The experiments are conducted on two well-known commercial database systems using web interfaces based on HTTP. The cost metrics we used are CPU time and the numbers of physical and logical reads. The queries were run on identical machines for 3 different sizes of documents, with and without indexing. A subset of the experimental results is presented and the overall results are discussed. Generally speaking, the XML-enabled system performed better.

Journal ArticleDOI
TL;DR: This paper first investigates the conditions for recognizing some types of the represented graph, and shows how connectivity, 2-connectivity, Eulerian graphs, etc., can be characterized using just one relation of the database.
Abstract: Among other representations, relational databases are also widely used for storing spatial data. The model presented in this paper is a slightly modified version of the PLA database [1]. This spatial relational model serves to represent the topological properties of geographic data. In this paper, we first investigate the conditions for recognizing some types of the represented graph. We show how connectivity, 2-connectivity, Eulerian graphs, etc., can be characterized using just one relation of the database. Second, we point out redundancies in the representation and connections among the four relations of the database. Moreover, we design efficient (linear-time) algorithms for data retrieval/reconstruction of the stored spatial object, both in the planar and spherical cases. They also serve as constraint checks.
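As a generic illustration of deciding connectivity from a single edge relation (not the PLA-specific characterization in the paper), a union-find pass over the relation runs in near-linear time:

```python
# Generic sketch: given one relation of edges, the graph is connected iff a
# union-find pass over the edge tuples leaves a single component.
def is_connected(edge_relation):
    """edge_relation: iterable of (u, v) tuples over arbitrary node names."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edge_relation:
        parent[find(u)] = find(v)

    roots = {find(x) for x in parent}
    return len(roots) <= 1

print(is_connected([("a", "b"), ("b", "c"), ("c", "a")]))   # True
print(is_connected([("a", "b"), ("c", "d")]))               # False
```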

Book ChapterDOI
11 Aug 2002
TL;DR: Web-linkage Viewer draws the Web-links, divided into the global-links and the local-links, by placing the top nodes of sites and the global-links on a spherical surface and the local-links as trees in cones emanating from that surface, so that the graph structure of the Web-links is displayed understandably.
Abstract: Web-linkage Viewer is a system that draws the Web-links, divided into the global-links and the local-links, by placing the top nodes of sites in the Web and the global-links on a spherical surface and the local-links as trees in cones emanating from the spherical surface, so that graph structures in the Web-links are displayed understandably, as in Figure 1. To define the global-links and the local-links precisely, instead of relying on the ambiguous concept in daily life, we define the site as follows.

Posted Content
TL;DR: This paper shows that description is applicable to a wide range of data models that have some notion of object(-identity), and proposes to turn it into a data model primitive much like, say, inheritance, to boost query performance and to reduce the redundancy of data.
Abstract: Graph simulation (using graph schemata or data guides) has been successfully proposed as a technique for adding structure to semistructured data. Design patterns for description (such as meta-classes and homomorphisms between schema layers), which are prominent in the object-oriented programming community, constitute a generalization of this graph simulation approach. In this paper, we show that description is applicable to a wide range of data models that have some notion of object(-identity), and propose to turn it into a data model primitive much like, say, inheritance. We argue that such an extension fills a practical need in contemporary data management. Then, we present algebraic techniques for query optimization (using the notions of described and description queries). Finally, in the semistructured setting, we discuss the pruning of regular path queries (with nested conditions) using description meta-data. In this context, our notion of meta-data extends graph schemata and data guides by meta-level values, allowing us to boost query performance and to reduce the redundancy of data.

Patent
06 Dec 2002
TL;DR: In this paper, the authors propose an additional join condition for the AGM algorithm: the first generator adjacency matrix is joined with the second generator matrix only when the first matrix is in normal form.
Abstract: PROBLEM TO BE SOLVED: To further improve the efficiency of the AGM algorithm. SOLUTION: In the AGM algorithm, which efficiently extracts graph data (frequent graphs) whose support is at least the minimum support from a graph database consisting of graph structure data, a 'relabel' function that orders the vertex labels and edge labels of a graph is executed (step 1). Furthermore, in the function 'Newjoin', which generates the set of candidate adjacency matrices Ck+1 for frequent graphs of size k+1 from the set of adjacency matrices Fk of frequent graphs of size k, a fourth condition is added to the three conditions of the AGM algorithm: the first generator matrix is joined with the second generator matrix only when the first generator matrix is in normal form.