Showing papers on "Graph database" published in 2007

Journal Article•DOI•

Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation

François Fouss¹, Alain Pirotte¹, Jean-Michel Renders², Marco Saerens•Institutions (2)

Université catholique de Louvain¹, Analysis Group²

01 Mar 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and could be applied to machine-learning and pattern-recognition tasks involving a relational database.

Abstract: This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation of the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database.
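
For orientation, the commute-time quantities named in the abstract can be written as below. The notation is assumed here, not taken from this listing: $\mathbf{A}$ is the weighted adjacency matrix, $\mathbf{D}$ the diagonal degree matrix with entries $d_{ii}$, $\mathbf{L}^{+}$ the Moore-Penrose pseudoinverse of the Laplacian with elements $l^{+}_{ij}$, and $\mathbf{e}_i$ the $i$-th standard basis vector.

```latex
\mathbf{L} = \mathbf{D} - \mathbf{A}, \qquad
V_G = \sum_{i} d_{ii}, \qquad
n(i,j) = V_G \left( l^{+}_{ii} + l^{+}_{jj} - 2\, l^{+}_{ij} \right)
       = V_G \, (\mathbf{e}_i - \mathbf{e}_j)^{\mathsf{T}} \mathbf{L}^{+} (\mathbf{e}_i - \mathbf{e}_j)
```

Because $\mathbf{L}^{+}$ is symmetric positive semidefinite, the right-hand side is a squared norm, which is why $\sqrt{n(i,j)}$ behaves as a Euclidean distance and $\mathbf{L}^{+}$ as a kernel matrix.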

1,276 citations

Proceedings Article•DOI•

Extracting semantic relations from query logs

Ricardo Baeza-Yates¹, Alessandro Tiberi²•Institutions (2)

Yahoo!¹, Sapienza University of Rome²

12 Aug 2007

TL;DR: A novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph is proposed, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws.

Abstract: In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed after them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the searching user behavior as well as on the distribution of topics that people want on the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis on the quality of these relations, showing that most of them are relevant. Finally, we sketch an application that detects multitopical URLs.
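
A minimal sketch of the general idea of representing queries as vectors over clicked URLs. The abstract does not give the weighting scheme or the similarity measure, so raw click counts and cosine similarity are assumptions, as are the function names and the toy log.

```python
from collections import defaultdict
from math import sqrt


def query_vectors(click_log):
    """Map each query to a sparse vector {url: click count}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        vectors[query][url] += 1
    return {q: dict(v) for q, v in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(url, 0) for url, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical (query, clicked URL) pairs; not data from the paper.
click_log = [
    ("jaguar car", "cars.example/jaguar"),
    ("jaguar price", "cars.example/jaguar"),
    ("jaguar price", "cars.example/prices"),
    ("jaguar animal", "zoo.example/jaguar"),
]
vectors = query_vectors(click_log)
related = cosine(vectors["jaguar car"], vectors["jaguar price"])
unrelated = cosine(vectors["jaguar car"], vectors["jaguar animal"])
```

Queries that share clicked URLs get a positive similarity, while queries with disjoint click sets get zero, which is the kind of signal a query-to-query graph could be built from.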

386 citations

Proceedings Article•DOI•

Fg-index: towards verification-free query processing on graph databases

James Cheng¹, Yiping Ke¹, Wilfred Ng¹, An Lu¹•Institutions (1)

Hong Kong University of Science and Technology¹

11 Jun 2007

TL;DR: A novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs), which returns the exact set of query answers without performing candidate verification and is orders of magnitude more efficient than using the state-of-the-art graph index.

Abstract: Graphs are prevalently used to model the relationships between objects in various domains. With the increasing usage of graph databases, it has become more and more demanding to efficiently process graph queries. Querying graph databases is costly since it involves subgraph isomorphism testing, which is an NP-complete problem. In recent years, some effective graph indexes have been proposed to first obtain a candidate answer set by filtering part of the false results and then perform verification on each candidate by checking subgraph isomorphism. Query performance is improved since the number of subgraph isomorphism tests is reduced. However, candidate verification is still inevitable, which can be expensive when the size of the candidate answer set is large. In this paper, we propose a novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs). Given a graph query that is an FG in the database, FG-index returns the exact set of query answers without performing candidate verification. When the query is an infrequent graph, FG-index produces a candidate answer set which is close to the exact answer set. Since an infrequent graph means the graph occurs in only a small number of graphs in the database, the number of subgraph isomorphism tests is small. To ensure that the index fits into the main memory, we propose a new notion of δ-Tolerance Closed Frequent Graphs (δ-TCFGs), which allows us to flexibly tune the size of the index in a parameterized way. Our extensive experiments verify that query processing using FG-index is orders of magnitude more efficient than using the state-of-the-art graph index.

289 citations

Proceedings Article•DOI•

Unsupervised Graph-based Word Sense Disambiguation Using Measures of Word Semantic Similarity

R. Sinha¹, Rada Mihalcea¹•Institutions (1)

University of North Texas¹

17 Sep 2007

TL;DR: The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to a performance competing with the state-of-the-art in unsupervised word sense disambiguation, as measured on standard data sets.

Abstract: This paper describes an unsupervised graph-based method for word sense disambiguation, and presents comparative evaluations using several measures of word semantic similarity and several algorithms for graph centrality. The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to a performance competing with the state-of-the-art in unsupervised word sense disambiguation, as measured on standard data sets.

275 citations

Journal Article•DOI•

SAGA: a subgraph matching tool for biological graphs

Yuanyuan Tian¹, Richard C. McEachin¹, Carlos Santos¹, David J. States¹, Jignesh M. Patel¹•Institutions (1)

University of Michigan¹

05 Jan 2007-Bioinformatics

TL;DR: SAGA employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences, and is orders of magnitude faster than existing methods.

Abstract: Motivation: With the rapid increase in the availability of biological graph datasets, there is a growing need for effective and efficient graph querying methods. Due to the noisy and incomplete characteristics of these datasets, exact graph matching methods have limited use and approximate graph matching methods are required. Unfortunately, existing graph matching methods are too restrictive as they only allow exact or near exact graph matching. This paper presents a novel approximate graph matching technique called SAGA. This technique employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences. SAGA employs an indexing technique that allows it to efficiently evaluate queries even against large graph datasets. Results: SAGA has been used to query biological pathways and literature datasets, which has revealed interesting similarities between distinct pathways that cannot be found by existing methods. These matches associate seemingly unrelated biological processes, connect studies in different sub-areas of biomedical research and thus pose hypotheses for new discoveries. SAGA is also orders of magnitude faster than existing methods. Availability: SAGA can be accessed freely via the web at http://www.eecs.umich.edu/saga. Binaries are also freely available at this website. Contact: jignesh@eecs.umich.edu Supplementary material: Supplementary material is available at http://www.eecs.umich.edu/periscope/publ/saga-suppl.pdf.

251 citations

Proceedings Article•

Graph indexing: tree + delta >= graph

Peixiang Zhao¹, Jeffrey Xu Yu¹, Philip S. Yu²•Institutions (2)

The Chinese University of Hong Kong¹, IBM²

23 Sep 2007

TL;DR: This study verifies that (Tree+Δ) is a better choice than graph for indexing purpose, denoted (Tree+Δ ≥ Graph), to address the graph containment query problem and achieves an order of magnitude better performance in index construction.

Abstract: Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of graphs in G and the nature of complexity for subgraph isomorphism testing, it is desirable to make use of high-quality graph indexing mechanisms to reduce the overall query processing cost. In this paper, we propose a new cost-effective graph indexing method based on frequent tree-features of the graph database. We analyze the effectiveness and efficiency of tree as indexing feature from three critical aspects: feature size, feature selection cost, and pruning power. In order to achieve better pruning ability than existing graph-based indexing methods, we select, in addition to frequent tree-features (Tree), a small number of discriminative graphs (Δ) on demand, without a costly graph mining process beforehand. Our study verifies that (Tree+Δ) is a better choice than graph for indexing purpose, denoted (Tree+Δ ≥ Graph), to address the graph containment query problem. It has two implications: (1) the index construction by (Tree+Δ) is efficient, and (2) the graph containment query processing by (Tree+Δ) is efficient. Our experimental studies demonstrate that (Tree+Δ) has a compact index structure, achieves an order of magnitude better performance in index construction, and most importantly, outperforms up-to-date graph-based indexing methods, gIndex and C-Tree, in graph containment query processing.

232 citations

Proceedings Article•DOI•

TreePi: A Novel Graph Indexing Method

Shijie Zhang¹, Meng Hu¹, Jiong Yang¹•Institutions (1)

Case Western Reserve University¹

15 Apr 2007

TL;DR: A new algorithm which utilizes the location information of indexing structures is used to perform subgraph isomorphism tests and this method is applied on a wide range of real and synthetic data to demonstrate the usefulness and effectiveness of this approach.

Abstract: Graphs are widely used to model complex structured data such as XML documents, protein networks, and chemical compounds. One of the fundamental problems in graph databases is efficient search and retrieval of graphs using indexing techniques. In this paper, we study the problem of indexing graph databases using frequent subtrees as indexing structures. Trees can be manipulated efficiently while preserving a lot of structural information of the original graphs. In our proposed method, frequent subtrees of a database are selected as the feature set. To save memory, the set of feature trees is shrunk based on a support threshold function and their discriminative power. A tree-partition based query processing scheme is proposed to perform graph queries. The concept of center distance constraints is introduced to prune the search space. Furthermore, a new algorithm which utilizes the location information of indexing structures is used to perform subgraph isomorphism tests. We apply our method on a wide range of real and synthetic data to demonstrate the usefulness and effectiveness of this approach.

207 citations

Proceedings Article•DOI•

Graph Database Indexing Using Structured Graph Decomposition

D. W. Williams¹, Jun Huan², Wei Wang¹•Institutions (2)

University of North Carolina at Chapel Hill¹, University of Kansas²

15 Apr 2007

TL;DR: This work introduces a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries and demonstrates its effectiveness in answering queries for two practical datasets.

Abstract: We introduce a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries. The index is comprised of two major data structures. The primary structure is a directed acyclic graph which contains a node for each of the unique, induced subgraphs of the database graphs. The secondary structure is a hash table which cross-indexes each subgraph for fast isomorphic lookup. In order to create a hash key independent of isomorphism, we utilize a code-based canonical representation of adjacency matrices, which we have further refined to improve computation speed. We validate the concept by demonstrating its effectiveness in answering queries for two practical datasets. Our experiments show that for subgraph isomorphism queries, our method outperforms existing methods by more than an order of magnitude.
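
The idea of a hash key that is independent of isomorphism can be illustrated with a brute-force canonical code. This is only a sketch for very small unlabeled, undirected graphs: it tries every vertex ordering, whereas the abstract describes a refined code-based representation that is not reproduced here. The function name and the toy index are hypothetical.

```python
from itertools import permutations


def canonical_code(n, edges):
    """Smallest upper-triangle adjacency string over all vertex orderings.

    Isomorphic graphs yield the same string, so it can serve as a hash key.
    Runs in factorial time; intended for illustration only.
    """
    adj = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        code = "".join(
            "1" if frozenset((perm[i], perm[j])) in adj else "0"
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best is None or code < best:
            best = code
    return best


# Two differently numbered 3-vertex paths and one triangle.
path_a = canonical_code(3, [(0, 1), (1, 2)])
path_b = canonical_code(3, [(0, 2), (2, 1)])
triangle = canonical_code(3, [(0, 1), (1, 2), (0, 2)])

# A hash table keyed by canonical code cross-indexes isomorphic subgraphs.
index = {}
index.setdefault(path_a, []).append("g1")
index.setdefault(path_b, []).append("g2")
```

Both paths land in the same bucket, so a lookup by canonical code retrieves every graph containing that structure without pairwise isomorphism tests.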

185 citations

Proceedings Article•

Analysis of the Wikipedia Category Graph for NLP Applications

Torsten Zesch, Iryna Gurevych

18 Mar 2007

TL;DR: A graph-theoretic analysis of the category graph is performed, and it is shown that it is a scale-free, small world graph like other well-known lexical semantic networks.

Abstract: In this paper, we discuss two graphs in Wikipedia: (i) the article graph, and (ii) the category graph. We perform a graph-theoretic analysis of the category graph, and show that it is a scale-free, small world graph like other well-known lexical semantic networks. We substantiate our findings by transferring semantic relatedness algorithms defined on WordNet to the Wikipedia category graph. To assess the usefulness of the category graph as an NLP resource, we analyze its coverage and the performance of the transferred semantic relatedness algorithms.

155 citations

Patent•

Social network aware pattern detection

Timothy Darr

12 Feb 2007

TL;DR: In this paper, a Social Network Aware Pattern Detection (SNAP) system and method utilizes a highly-scalable, computationally efficient integration of social network analysis (SNA) and graph pattern matching.

Abstract: Enabling dynamic, computer-driven, context-based detection of social network patterns within an input graph representing a social network. A Social Network Aware Pattern Detection (SNAP) system and method utilizes a highly-scalable, computationally efficient integration of social network analysis (SNA) and graph pattern matching. Social network interaction data is provided as an input graph having nodes and edges. The graph illustrates the connections and/or interactions between people, objects, events, and activities, and matches the interactions to a context. A sample graph pattern of interest is identified and/or defined by the user of the application. With this sample graph pattern and the input graph, a computational analysis is completed to (1) determine when a match of the sample graph pattern is found, and more importantly, (2) assign a weight (or score) to the particular match, according to predefined criteria or context.

149 citations

Proceedings Article•DOI•

GString: A Novel Approach for Efficient Search in Graph Databases

Haoliang Jiang¹, Haixun Wang², Philip S. Yu², Shuigeng Zhou¹•Institutions (2)

Fudan University¹, IBM²

15 Apr 2007

TL;DR: A novel sequencing method is introduced to capture the semantics of the underlying graph data and it not only reduces the size of resulting sequences, but also enables semantic-based searching.

Abstract: Graphs are widely used for modeling complicated data, including chemical compounds, protein interactions, XML documents, and multimedia. Information retrieval against such data can be formulated as a graph search problem, and finding an efficient solution to the problem is essential for many applications. A popular approach is to represent both graphs and queries on graphs by sequences, thus converting graph search to subsequence matching. State-of-the-art sequencing methods work at the finest granularity - each node (or edge) in the graph will appear as an element in the resulting sequence. Clearly, such methods are not semantic conscious, and the resulting sequences are not only bulky but also prone to complexities arising from graph isomorphism and other problems in searching. In this paper, we introduce a novel sequencing method to capture the semantics of the underlying graph data. We find meaningful components in graph structures and use them as the most basic units in sequencing. It not only reduces the size of resulting sequences, but also enables semantic-based searching. In this paper, we base our approach on chemical compound databases, although it can be applied to searching other complicated graphs, such as protein structures. Experiments demonstrate that our approach outperforms state-of-the-art graph search methods.

SPARQL/Update: A language for updating RDF graphs

Andy Seaborne, Geetha Manjunath

01 Jan 2007

TL;DR: SPARQL/Update (nicknamed "SPARUL"), an update language for RDF graphs, uses a syntax derived from SPARQL to perform update operations on a collection of graphs in a Graph Store.

Abstract: This document describes SPARQL/Update (nicknamed "SPARUL"), an update language for RDF graphs. It uses a syntax derived from SPARQL. Update operations are performed on a collection of graphs in a Graph Store. Operations are provided to change existing RDF graphs as well as create and remove graphs with the Graph Store. This document does not discuss protocol issues. Status of this document: published for discussion.

Proceedings Article•DOI•

Software and Algorithms for Graph Queries on Multithreaded Architectures

Jonathan W. Berry¹, Bruce Hendrickson¹, Simon Kahan², Petr Konecny²•Institutions (2)

Sandia National Laboratories¹, Cray²

26 Mar 2007

TL;DR: A multithreaded algorithm for connected components and a new heuristic for inexact subgraph isomorphism are introduced and explored, and the performance of these and other basic graph algorithms on large scale-free graphs is explored.

Abstract: Search-based graph queries, such as finding short paths and isomorphic subgraphs, are dominated by memory latency. If input graphs can be partitioned appropriately, large cluster-based computing platforms can run these queries. However, the lack of compute-bound processing at each vertex of the input graph and the constant need to retrieve neighbors imply low processor utilization. Furthermore, graph classes such as scale-free social networks lack the locality to make partitioning clearly effective. Massive multithreading is an alternative architectural paradigm, in which a large shared memory is combined with processors that have extra hardware to support many thread contexts. The processor speed is typically slower than normal, and there is no data cache. Rather than mitigating memory latency, multithreaded machines tolerate it. This paradigm is well aligned with the problem of graph search, as the high ratio of memory requests to computation can be tolerated via multithreading. In this paper, we introduce the multithreaded graph library (MTGL), generic graph query software for processing semantic graphs on multithreaded computers. This library currently runs on serial machines and the Cray MTA-2, but Sandia is developing a run-time system that will make it possible to run MTGL-based code on symmetric multiprocessors. We also introduce a multithreaded algorithm for connected components and a new heuristic for inexact subgraph isomorphism. We explore the performance of these and other basic graph algorithms on large scale-free graphs. We conclude with a performance comparison between the Cray MTA-2 and Blue Gene/Light for s-t connectivity.

Proceedings Article•

GRIN: a graph based RDF index

Octavian Udrea¹, Andrea Pugliese², V. S. Subrahmanian¹•Institutions (2)

University of Maryland, College Park¹, University of Calabria²

22 Jul 2007

TL;DR: GRIN outperforms Jena, Sesame and RDFBroker on all three measures for graph based queries (for other types of queries, it may be worth building one of these other indexes and using it), at the expense of using a larger amount of memory when answering queries.

Abstract: RDF ("Resource Description Framework") is now a widely used World Wide Web Consortium standard. However, methods to index large volumes of RDF data are still in their infancy. In this paper, we focus on providing a very lightweight indexing mechanism for certain kinds of RDF queries, namely graph-based queries where there is a need to traverse edges in the graph determined by an RDF database. Our approach uses the idea of drawing circles around selected "center" vertices in the graph where the circle would encompass those vertices in the graph that are within a given distance of the "center" vertex. We come up with methods of finding such "center" vertices and identifying the radius of the circles and then leverage this to build an index called GRIN. We compare GRIN with three existing RDF indexes: Jena, Sesame, and RDFBroker. We compared (i) the time to answer graph based queries, (ii) memory needed to store the index, and (iii) the time to build the index. GRIN outperforms Jena, Sesame and RDFBroker on all three measures for graph based queries (for other types of queries, it may be worth building one of these other indexes and using it), at the expense of using a larger amount of memory when answering queries.
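
The "circle around a center vertex" in the abstract can be pictured as the set of vertices within a given distance of the center. The sketch below assumes distance means hop count on an undirected view of the graph, which the abstract does not state; the function name and the toy graph are hypothetical, and the selection of centers and radii is not covered.

```python
from collections import deque


def ball(adjacency, center, radius):
    """Return the vertices within `radius` hops of `center` using BFS."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand past the circle's boundary
        for w in adjacency.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)


# Toy undirected path graph a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
circle = ball(adjacency, "a", 2)
```

A query that only touches vertices inside one circle can then be answered against that smaller vertex set instead of the whole graph.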

Journal Article•DOI•

Out-of-core coherent closed quasi-clique mining from large dense graph databases

Zhiping Zeng¹, Jianyong Wang¹, Lizhu Zhou¹, George Karypis²•Institutions (2)

Tsinghua University¹, University of Minnesota²

01 Jun 2007-ACM Transactions on Database Systems

TL;DR: This article studies how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task due to the fact that the downward-closure property no longer holds.

Abstract: Due to the ability of graphs to represent more generic and more complicated relationships among different objects, graph mining has played a significant role in data mining, attracting increasing attention in the data mining community. In addition, frequent coherent subgraphs can provide valuable knowledge about the underlying internal structure of a graph database, and mining frequently occurring coherent subgraphs from large dense graph databases has witnessed several applications and received considerable attention in the graph mining community recently. In this article, we study how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task due to the fact that the downward-closure property no longer holds. By fully exploring some properties of quasi-cliques, we propose several novel optimization techniques which can prune the unpromising and redundant subsearch spaces effectively. Meanwhile, we devise an efficient closure checking scheme to facilitate the discovery of closed quasi-cliques only. Since large databases cannot be held in main memory, we also design an out-of-core solution with efficient index structures for mining coherent closed quasi-cliques from large dense graph databases. We call this Cocaina. A thorough performance study shows that Cocaina is very efficient and scalable for large dense graph databases.

Proceedings Article•DOI•

Dex: high-performance exploration on large graphs for information retrieval

Norbert Martínez-Bazan¹, Victor Muntés-Mulero¹, Sergio Gómez-Villamor¹, Jordi Nin², Mario-A. Sánchez-Martínez¹, Josep-L. Larriba-Pey¹•Institutions (2)

Polytechnic University of Catalonia¹, Spanish National Research Council²

06 Nov 2007

TL;DR: DEX is proposed and evaluated, a high performance graph database querying system that allows for the integration of multiple data sources and makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search.

Abstract: Link and graph analysis tools are important devices to boost the richness of information retrieval systems. Internet and the existing social networking portals are just a couple of situations where the use of these tools would be beneficial and enriching for the users and the analysts. However, the need for integrating different data sources and, even more important, the need for high performance generic tools, is at odds with the continuously growing size and number of data repositories. In this paper we propose and evaluate DEX, a high performance graph database querying system that allows for the integration of multiple data sources. DEX makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search. The richness of DEX shows up in the experiments that we carried out on the Internet Movie Database (IMDb). Through a variety of these complex analytical queries, DEX proves to be a generic and efficient tool on large graph databases.

Patent•

Impact propagation in a directed acyclic graph having restricted views

Geert De Peuter¹, David Bonnell¹•Institutions (1)

BMC Software¹

14 Dec 2007

TL;DR: In this article, service impact data is efficiently propagated in a directed acyclic graph with restricted views, which allows a system or business administrator to view and receive real-time notification of the impacted state of all nodes in the graph that are available to their permitted view.

Abstract: Service impact data is efficiently propagated in a directed acyclic graph with restricted views. One or more service components, impact rules and business rules are grouped together into a directed acyclic graph and a related metadata array. Impact propagation uses related metadata array to minimize traversal of the graph. As nodes of the graph are updated to propagate impact data, a determination is made as to when no further impact propagation is required. Subsequently, calculations are terminated without having to traverse the entire graph. This method allows a system or business administrator to view and receive real-time notification of the impacted state of all nodes in the graph that are available to their permitted view. Restricted views ensure that available service impact data is only displayed to end users having the proper authorization to view the underlying impact model data.
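
The early-termination idea in the abstract (stop once no further propagation is required) can be sketched as an upward worklist pass over the graph. Everything specific here is an assumption: the impact rule is taken to be "maximum severity of the node itself and its children", and the metadata array and restricted views from the abstract are not modeled. All names are hypothetical.

```python
def propagate(parents, children, own, impact, changed):
    """Recompute impact upward from `changed`, stopping where nothing changes.

    Returns the number of node evaluations, to show that the whole
    graph need not be traversed.
    """
    worklist = [changed]
    visited = 0
    while worklist:
        node = worklist.pop()
        visited += 1
        new = max([own[node]] + [impact[c] for c in children.get(node, [])])
        if new == impact[node]:
            continue  # state unchanged, so ancestors cannot be affected
        impact[node] = new
        worklist.extend(parents.get(node, []))
    return visited


# Toy service model: service depends on app, app depends on db and cache.
children = {"service": ["app"], "app": ["db", "cache"]}
parents = {"db": ["app"], "cache": ["app"], "app": ["service"]}
own = {"service": 0, "app": 0, "db": 0, "cache": 2}
impact = {"service": 2, "app": 2, "db": 0, "cache": 2}

own["db"] = 1  # minor event: masked by the cache's higher severity
visited_minor = propagate(parents, children, own, impact, "db")
own["db"] = 3  # major event: must reach the top-level service
visited_major = propagate(parents, children, own, impact, "db")
```

In the minor case the pass stops at `app` because its state does not change, so `service` is never evaluated; in the major case the change travels all the way up.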

Patent•

Managing computing resources in graph-based computations

Joseph Skeffington Wholey, Igor Sherb, Ephraim Meriwether Vishniac

15 May 2007

TL;DR: In this article, the authors propose a graph-based computation in which data processing elements are joined by linking elements and divided into sets, at least one of which includes multiple data processing elements, with each set assigned a different computing resource.

Abstract: Executing graph-based computations includes: accepting a specification of a computation graph in which data processing elements are joined by linking elements; dividing the data processing elements into sets, at least one of the sets including multiple of the data processing elements; assigning to each set a different computing resource; and processing data according to the computation graph, including performing computations corresponding to the data processing elements using the assigned computing resources.

Patent•

Distributing services in graph-based computations

Igor Sherb, Joseph Skeffington Wholey, Larry W. Allen

09 Aug 2007

TL;DR: In this article, a service request is processed according to a computation graph associated with the service by receiving inputs for the computation graph from a service client, providing the inputs to the computation graph as records of a data flow, receiving output from the computation graph, and providing the output to the service client.

Abstract: A service request is processed according to a computation graph associated with the service by receiving inputs for the computation graph from a service client, providing the inputs to the computation graph as records of a data flow, receiving output from the computation graph, and providing the output to the service client. Data flows are processed concurrently in a graph-based computation by potentially concurrent execution of different types of requests, potentially concurrent execution of similar request types, and/or potentially concurrent execution of work elements within a request.

Patent•

Processing relational database problems using analog processors

William G. Macready¹, M. Coury¹, Ivan Sham¹•Institutions (1)

D-Wave Systems¹

31 Oct 2007

TL;DR: In this article, an association graph may be formed based on a query graph and a database graph, providing the results to a query or problem and/or an indication of a level of responsiveness of the results.

Abstract: Systems, methods and articles solve queries or database problems through the use of graphs. An association graph may be formed based on a query graph and a database graph. The association graph may be solved for a clique, providing the results to a query or problem and/or an indication of a level of responsiveness of the results. Thus, unlimited relaxation of constraint may be achieved. Analog processors such as quantum processors may be used to solve for the clique.
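
A rough sketch of the association-graph construction the abstract refers to, using the standard textbook formulation: each association-graph node pairs a query vertex with a database vertex, and two pairs are connected when they are mutually compatible. The compatibility rule (adjacency must agree in both graphs) and the brute-force clique search are assumptions standing in for whatever the patent and an analog processor would actually use.

```python
from itertools import combinations


def association_graph(q_nodes, q_edges, d_nodes, d_edges):
    """Build nodes (q, d) and edges between mutually compatible pairs."""
    q_adj = {frozenset(e) for e in q_edges}
    d_adj = {frozenset(e) for e in d_edges}
    nodes = [(q, d) for q in q_nodes for d in d_nodes]
    edges = set()
    for (q1, d1), (q2, d2) in combinations(nodes, 2):
        if q1 != q2 and d1 != d2:
            # Compatible when adjacency agrees in query and database graphs.
            if (frozenset((q1, q2)) in q_adj) == (frozenset((d1, d2)) in d_adj):
                edges.add(frozenset(((q1, d1), (q2, d2))))
    return nodes, edges


def max_clique(nodes, edges):
    """Exhaustive maximum-clique search; only viable for tiny graphs."""
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(frozenset(p) in edges for p in combinations(subset, 2)):
                return list(subset)
    return []


# Query: a single edge x - y. Database: the path a - b - c.
nodes, edges = association_graph(
    ["x", "y"], [("x", "y")], ["a", "b", "c"], [("a", "b"), ("b", "c")]
)
clique = max_clique(nodes, edges)
match = dict(clique)
```

A clique that covers every query vertex is a full match, while a smaller clique is a partial match; the ratio of clique size to query size is one plausible reading of the "level of responsiveness" mentioned in the abstract.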

Patent•

Method for creating a scalable graph database

Jannes Aasman

29 Mar 2007

TL;DR: In this article, the authors describe an approach for storing or processing data in the form of graph tuples comprising n-parts, where each tuple-part is encoded into a unique part identifier (hereinafter called a UPI), each UPI comprises a tag at a fixed position within the UPI.

Abstract: Embodiments of a method for creating a graph database, which is arranged to store or process data in the form of graph tuples comprising n-parts, are described. In an embodiment, each tuple-part is encoded into a unique part identifier (hereinafter called a UPI), and each UPI comprises a tag at a fixed position within the UPI. The tag indicates the datatype of the encoded tuple-part. The content data for the tuple-part is encoded in a code that is configured to reflect the ranking or order of the content data, corresponding to each datatype, relative to other tuples in a set of tuples. For content data that comprises a character-string, the code comprises a hashcode; and for content data that comprises or includes a numeric value, the code comprises an immediate value that directly stores the numeric value without encoding.

Journal Article•DOI•

Nested Containment List (NCList)

Alexander V. Alekseyenko¹, Christopher Lee¹•Institutions (1)

University of California, Los Angeles¹

01 Jun 2007-Bioinformatics

TL;DR: A new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set, appears to provide a useful foundation for highly scalable interval database applications.

Abstract: Motivation: The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction/updating of the interval database. Results: We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5–500-fold faster than other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are ~100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications. Availability: NCList data structure is part of Pygr, a bioinformatics graph database library, available at http://sourceforge.net/projects/pygr Contact: leec@chem.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
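
The O(n + log N) query bound can be illustrated with a simplified in-memory sketch of the nesting idea. This is not the Pygr implementation or the paper's in-place construction algorithm: the node layout, the function names `build_nclist` and `query_overlaps`, and the half-open `[start, end)` interval convention are all assumptions made for illustration. Each interval is nested under the nearest preceding interval that contains it, so within any one list both starts and ends are increasing, and a binary search on the ends finds the first possible overlap.

```python
def build_nclist(intervals):
    """Nest each interval under the nearest earlier interval that contains it.

    Returns a list of [start, end, children] nodes. Intervals are half-open.
    """
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    root = []
    stack = []  # chain of intervals that may still contain later ones
    for start, end in ordered:
        # Starts are non-decreasing, so containment only depends on the end.
        while stack and end > stack[-1][1]:
            stack.pop()
        node = [start, end, []]
        (stack[-1][2] if stack else root).append(node)
        stack.append(node)
    return root


def query_overlaps(nclist, q_start, q_end):
    """Return every stored interval overlapping [q_start, q_end)."""
    hits = []
    # Binary search for the first node whose end lies past q_start.
    lo, hi = 0, len(nclist)
    while lo < hi:
        mid = (lo + hi) // 2
        if nclist[mid][1] > q_start:
            hi = mid
        else:
            lo = mid + 1
    # Scan forward while nodes still start before the query ends.
    i = lo
    while i < len(nclist) and nclist[i][0] < q_end:
        start, end, children = nclist[i]
        hits.append((start, end))
        hits.extend(query_overlaps(children, q_start, q_end))
        i += 1
    return hits
```

Sublists of non-overlapping intervals are never visited, because anything nested inside an interval that misses the query must miss it as well.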

Proceedings Article•DOI•

Correlation search in graph databases

Yiping Ke¹, James Cheng¹, Wilfred Ng¹•Institutions (1)

Hong Kong University of Science and Technology¹

12 Aug 2007

TL;DR: This paper proposes a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS), which adopts Pearson's correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs.

Abstract: Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking, despite the fact that graph data, especially in various scientific domains, has proliferated in recent years. In this paper, we propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions which set bounds on the occurrence probability of a candidate in the database. With this result, we design an efficient algorithm that operates on a much smaller projected database and thus we are able to obtain a significantly smaller set of candidates. To further improve the efficiency, we develop three heuristic rules and apply them on the candidate set to further reduce the search space. Our extensive experiments demonstrate the effectiveness of our method on candidate reduction. The results also justify the efficiency of our algorithm in mining correlations from large real and synthetic datasets.
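
For two binary occurrence indicators, Pearson's correlation coefficient reduces to the phi coefficient, which can be computed from occurrence counts alone. The sketch below assumes that each database graph either contains a pattern or does not; the function name `occurrence_correlation`, its count-based parameters, and the choice to return 0.0 for degenerate cases are illustrative assumptions rather than the paper's notation.

```python
from math import sqrt


def occurrence_correlation(n_total, n_a, n_b, n_both):
    """Pearson correlation of two binary occurrence indicators.

    n_total: number of graphs in the database
    n_a, n_b: number of graphs containing pattern A, pattern B
    n_both: number of graphs containing both patterns
    """
    if n_total <= 0:
        raise ValueError("database must contain at least one graph")
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_both / n_total
    denom = sqrt(p_a * p_b * (1 - p_a) * (1 - p_b))
    if denom == 0:
        # A pattern occurring in none or all of the graphs has zero
        # variance; treating that as "no correlation" is an assumption.
        return 0.0
    return (p_ab - p_a * p_b) / denom
```

Values near 1 mean the two patterns tend to occur in the same database graphs, values near -1 mean they tend to exclude each other, and values near 0 mean their occurrences look independent.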

Proceedings Article•DOI•

Top-k subgraph matching query in a large graph

Lei Zou¹, Lei Chen², Yansheng Lu¹•Institutions (2)

Huazhong University of Science and Technology¹, Hong Kong University of Science and Technology²

09 Nov 2007

TL;DR: This paper addresses the top-k subgraph matching query problem and proposes an efficient query algorithm (the Ranked Matching algorithm) based on G-Tree, which outperforms the alternative method by orders of magnitude.

Abstract: Recently, due to its wide applications, subgraph search has attracted a lot of attention from the database and data mining communities. Subgraph search is defined as follows: given a query graph Q, we report all data graphs containing Q in the database. However, there is little work on subgraph search in a single large graph, which has been used in many applications, such as biological networks and social networks. In this paper, we address the top-k subgraph matching query problem, which is defined as follows: given a query graph Q, we locate the top-k matchings of Q in a large data graph G according to a score function. The score function is defined as the sum of the pairwise similarity between a vertex in Q and its matching vertex in G. Specifically, we first design a balanced tree (the G-Tree) to index the large data graph. Then, based on the G-Tree, we propose an efficient query algorithm (the Ranked Matching algorithm). Our extensive experimental results show that, due to the efficiency of the pruning strategy, given a query with up to 20 vertices, we can locate the top-100 matchings in less than 10 seconds in a large data graph with 100K vertices. Furthermore, our approach outperforms the alternative method by orders of magnitude.

Proceedings Article•

Parallel structured duplicate detection

Rong Zhou¹, Eric A. Hansen²•Institutions (2)

PARC¹, Mississippi State University²

22 Jul 2007

TL;DR: It is shown that structured duplicate detection can also be used to reduce the number of slow synchronization operations needed in parallel graph search, and several techniques for integrating parallel and external-memory graph search in an efficient way are described.

Abstract: We describe a novel approach to parallelizing graph search using structured duplicate detection. Structured duplicate detection was originally developed as an approach to external-memory graph search that reduces the number of expensive disk I/O operations needed to check stored nodes for duplicates, by using an abstraction of the search graph to localize memory references. In this paper, we show that this approach can also be used to reduce the number of slow synchronization operations needed in parallel graph search. In addition, we describe several techniques for integrating parallel and external-memory graph search in an efficient way. We demonstrate the effectiveness of these techniques in a graph-search algorithm for domain-independent STRIPS planning.

Patent•

Automated Video-To-Text System

Hui Cheng¹, Darren Butler¹•Institutions (1)

Sarnoff Corporation¹

03 Apr 2007

TL;DR: In this article, a method for transforming video-to-text is presented that automatically generates text descriptions of the content of a video using a mixture-of-experts blob segmentation algorithm.

Abstract: A method for transforming Video-To-Text is disclosed that automatically generates text descriptions of the content of a video. The present invention first segments an input video sequence according to predefined semantic classes using a Mixture-of-Experts blob segmentation algorithm. The resulting segmentation is coerced into a semantic concept graph based on domain knowledge and a semantic concept hierarchy. Then, the initial semantic concept graph is summarized and pruned. Finally, according to the summarized semantic concept graph and its changes over time, text and/or speech descriptions are automatically generated using one of three description schemes: key-frame, key-object and key-change descriptions.

Book Chapter•DOI•

What-if analysis for data warehouse evolution

George Papastefanatos¹, Panos Vassiliadis², Alkis Simitsis³, Yannis Vassiliou¹•Institutions (3)

National Technical University of Athens¹, University of Ioannina², IBM³

03 Sep 2007

TL;DR: This paper abstracts software modules, queries, reports and views as (sequences of) queries in SQL enriched with functions and uniformly modeled as a graph that is annotated with policies for the management of evolution events.

Abstract: In this paper, we deal with the problem of performing what-if analysis for changes that occur in the schema/structure of the data warehouse sources. We abstract software modules, queries, reports and views as (sequences of) queries in SQL enriched with functions. Queries and relations are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and indicates the way they are tuned to respond to it.

Book Chapter•DOI•

Implementation of SPARQL Query Language Based on Graph Homomorphism

Olivier Corby¹, Catherine Faron-Zucker²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Nice Sophia Antipolis²

22 Jul 2007

TL;DR: This paper is dedicated to the implementation of the SPARQL query language and its pattern matching mechanism, which is reformulated as graph homomorphism checking constrained by filter evaluation.

Abstract: The SPARQL query language is a W3C candidate recommendation for asking and answering queries against RDF data. It offers capabilities for querying by graph patterns, and retrieval of solutions is based on graph pattern matching. This paper is dedicated to the implementation of the SPARQL query language and its pattern matching mechanism, which is reformulated as graph homomorphism checking constrained by filter evaluation.

Proceedings Article•DOI•

Design Pattern Evolution and Verification Using Graph Transformation

Chunying Zhao, Jun Kong¹, Kang Zhang²•Institutions (2)

North Dakota State University¹, University of Texas at Dallas²

03 Jan 2007

TL;DR: This paper focuses on the automated evolution of design patterns using graph transformation, and proposes a graph grammar based syntax parser to check the structural integrity of the evolved design patterns.

Abstract: This paper presents a graph-transformation-based approach to design pattern evolution. An evolution of a design pattern includes modifications of pattern elements, such as classes, attributes, operations and relationships between classes. Compared with other techniques, graphical notation, as a natural and intuitive way of modeling software, is well suited to the transformation stage. In this paper we focus on the automated evolution of design patterns using graph transformation. The rules for the potential design evolutions are defined. After the evolution process, a graph-grammar-based syntax parser is proposed to check the structural integrity of the evolved design patterns.

Proceedings Article•DOI•

Monkey: Approximate Graph Mining Based on Spanning Trees

Shijie Zhang¹, Jiong Yang¹, V. Cheedella¹•Institutions (1)

Case Western Reserve University¹

15 Apr 2007

TL;DR: This paper will study the problem of approximate graph mining and propose an optimized solution which uses frequent trees and a spanning tree based pre-verification check in the mining process.

Abstract: In the recent past, many exact graph mining algorithms have been developed to find frequent patterns in a graph database. However, many networks or graphs generated from biological data and other applications may be incomplete or inaccurate. Hence, it is necessary to design approximate graph mining techniques. In this paper, we will study the problem of approximate graph mining and propose an optimized solution which uses frequent trees and a spanning tree based pre-verification check in the mining process.
