Showing papers on "Graph database" published in 2007


Journal ArticleDOI
TL;DR: The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and could be applied to machine-learning and pattern-recognition tasks involving a relational database.
Abstract: This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation of the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database.
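
The relation between the commute time and the Laplacian pseudoinverse mentioned in this abstract can be written out as follows. The notation is an assumption made here for illustration and is not taken from the listing: A is the symmetric weight matrix, D the diagonal degree matrix, L = D - A the Laplacian, L^+ its Moore-Penrose pseudoinverse with elements l^+_{ij}, e_i the i-th standard basis vector, and V_G the sum of all node degrees.

```latex
n(i,j) \;=\; V_G \left( l^{+}_{ii} + l^{+}_{jj} - 2\, l^{+}_{ij} \right)
       \;=\; V_G \, (\mathbf{e}_i - \mathbf{e}_j)^{\mathsf{T}} \, \mathbf{L}^{+} \, (\mathbf{e}_i - \mathbf{e}_j)
```

Because L^+ is symmetric and positive semidefinite, the right-hand side is a squared norm, which is why the square root of n(i,j) behaves as a Euclidean distance and L^+ can serve as a kernel matrix.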

1,276 citations


Proceedings ArticleDOI
12 Aug 2007
TL;DR: A novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph is proposed, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws.
Abstract: In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed after them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the searching user behavior as well as on the distribution of topics that people want in the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis on the quality of these relations, showing that most of them are relevant. Finally, we sketch an application that detects multitopical URLs.
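
A minimal sketch of the general idea of representing queries as vectors derived from a query-click bipartite graph. The abstract does not give the paper's weighting or similarity measure, so raw click counts and cosine similarity are assumptions here, and the log entries and `*.example` URLs are made up for illustration.

```python
from collections import Counter
from math import sqrt


def query_vectors(click_log):
    """Map each query to a sparse vector of click counts per URL."""
    vectors = {}
    for query, url in click_log:
        vectors.setdefault(query, Counter())[url] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical (query, clicked URL) pairs.
log = [
    ("jaguar car", "a.example"),
    ("jaguar car", "c.example"),
    ("jaguar price", "a.example"),
    ("jaguar cat", "b.example"),
]
vecs = query_vectors(log)
sim_related = cosine(vecs["jaguar car"], vecs["jaguar price"])
sim_unrelated = cosine(vecs["jaguar car"], vecs["jaguar cat"])
```

Queries that share clicked URLs end up with a positive similarity even when they share few words, which is the kind of implicit relation the abstract refers to.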

386 citations


Proceedings ArticleDOI
11 Jun 2007
TL;DR: A novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs), which returns the exact set of query answers without performing candidate verification and is orders of magnitude more efficient than using the state-of-the-art graph index.
Abstract: Graphs are prevalently used to model the relationships between objects in various domains. With the increasing usage of graph databases, it has become more and more demanding to efficiently process graph queries. Querying graph databases is costly since it involves subgraph isomorphism testing, which is an NP-complete problem. In recent years, some effective graph indexes have been proposed to first obtain a candidate answer set by filtering part of the false results and then perform verification on each candidate by checking subgraph isomorphism. Query performance is improved since the number of subgraph isomorphism tests is reduced. However, candidate verification is still inevitable, which can be expensive when the size of the candidate answer set is large. In this paper, we propose a novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs). Given a graph query that is an FG in the database, FG-index returns the exact set of query answers without performing candidate verification. When the query is an infrequent graph, FG-index produces a candidate answer set which is close to the exact answer set. Since an infrequent graph means the graph occurs in only a small number of graphs in the database, the number of subgraph isomorphism tests is small. To ensure that the index fits into the main memory, we propose a new notion of δ-Tolerance Closed Frequent Graphs (δ-TCFGs), which allows us to flexibly tune the size of the index in a parameterized way. Our extensive experiments verify that query processing using FG-index is orders of magnitude more efficient than using the state-of-the-art graph index.

289 citations


Proceedings ArticleDOI
17 Sep 2007
TL;DR: The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to a performance competing with the state-of-the-art in unsupervised word sense disambiguation, as measured on standard data sets.
Abstract: This paper describes an unsupervised graph-based method for word sense disambiguation, and presents comparative evaluations using several measures of word semantic similarity and several algorithms for graph centrality. The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to a performance competing with the state-of-the-art in unsupervised word sense disambiguation, as measured on standard data sets.

275 citations


Journal ArticleDOI
TL;DR: SAGA employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences, and is orders of magnitude faster than existing methods.
Abstract: Motivation: With the rapid increase in the availability of biological graph datasets, there is a growing need for effective and efficient graph querying methods. Due to the noisy and incomplete characteristics of these datasets, exact graph matching methods have limited use and approximate graph matching methods are required. Unfortunately, existing graph matching methods are too restrictive as they only allow exact or near exact graph matching. This paper presents a novel approximate graph matching technique called SAGA. This technique employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences. SAGA employs an indexing technique that allows it to efficiently evaluate queries even against large graph datasets. Results: SAGA has been used to query biological pathways and literature datasets, which has revealed interesting similarities between distinct pathways that cannot be found by existing methods. These matches associate seemingly unrelated biological processes, connect studies in different sub-areas of biomedical research and thus pose hypotheses for new discoveries. SAGA is also orders of magnitude faster than existing methods. Availability: SAGA can be accessed freely via the web at http://www.eecs.umich.edu/saga. Binaries are also freely available at this website. Contact: jignesh@eecs.umich.edu Supplementary material: Supplementary material is available at http://www.eecs.umich.edu/periscope/publ/saga-suppl.pdf.

251 citations


Proceedings Article
23 Sep 2007
TL;DR: This study verifies that (Tree+Δ) is a better choice than graph for indexing purposes, denoted (Tree+Δ ≥ Graph), to address the graph containment query problem, and shows that it achieves an order of magnitude better performance in index construction.
Abstract: Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of graphs in G and the nature of complexity for subgraph isomorphism testing, it is desirable to make use of high-quality graph indexing mechanisms to reduce the overall query processing cost. In this paper, we propose a new cost-effective graph indexing method based on frequent tree-features of the graph database. We analyze the effectiveness and efficiency of tree as indexing feature from three critical aspects: feature size, feature selection cost, and pruning power. In order to achieve better pruning ability than existing graph-based indexing methods, we select, in addition to frequent tree-features (Tree), a small number of discriminative graphs (Δ) on demand, without a costly graph mining process beforehand. Our study verifies that (Tree+Δ) is a better choice than graph for indexing purposes, denoted (Tree+Δ ≥ Graph), to address the graph containment query problem. It has two implications: (1) the index construction by (Tree+Δ) is efficient, and (2) the graph containment query processing by (Tree+Δ) is efficient. Our experimental studies demonstrate that (Tree+Δ) has a compact index structure, achieves an order of magnitude better performance in index construction, and most importantly, outperforms up-to-date graph-based indexing methods, gIndex and C-Tree, in graph containment query processing.

232 citations


Proceedings ArticleDOI
15 Apr 2007
TL;DR: A new algorithm which utilizes the location information of indexing structures is used to perform subgraph isomorphism tests and this method is applied on a wide range of real and synthetic data to demonstrate the usefulness and effectiveness of this approach.
Abstract: Graphs are widely used to model complex structured data such as XML documents, protein networks, and chemical compounds. One of the fundamental problems in graph databases is efficient search and retrieval of graphs using indexing techniques. In this paper, we study the problem of indexing graph databases using frequent subtrees as indexing structures. Trees can be manipulated efficiently while preserving a lot of structural information of the original graphs. In our proposed method, frequent subtrees of a database are selected as the feature set. To save memory, the set of feature trees is shrunk based on a support threshold function and their discriminative power. A tree-partition based query processing scheme is proposed to perform graph queries. The concept of center distance constraints is introduced to prune the search space. Furthermore, a new algorithm which utilizes the location information of indexing structures is used to perform subgraph isomorphism tests. We apply our method on a wide range of real and synthetic data to demonstrate the usefulness and effectiveness of this approach.

207 citations


Proceedings ArticleDOI
15 Apr 2007
TL;DR: This work introduces a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries and demonstrates its effectiveness in answering queries for two practical datasets.
Abstract: We introduce a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries. The index is comprised of two major data structures. The primary structure is a directed acyclic graph which contains a node for each of the unique, induced subgraphs of the database graphs. The secondary structure is a hash table which cross-indexes each subgraph for fast isomorphic lookup. In order to create a hash key independent of isomorphism, we utilize a code-based canonical representation of adjacency matrices, which we have further refined to improve computation speed. We validate the concept by demonstrating its effectiveness in answering queries for two practical datasets. Our experiments show that for subgraph isomorphism queries, our method outperforms existing methods by more than an order of magnitude.
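
To illustrate why a canonical form of the adjacency matrix gives a hash key that is independent of vertex numbering, here is a brute-force sketch. It is not the paper's refined code-based representation: it tries every vertex ordering, so it is only usable for very small graphs, and the names `canonical_code`, `add_graph`, and `index` are hypothetical.

```python
from itertools import permutations


def canonical_code(n, edges):
    """Smallest upper-triangle adjacency bit string over all vertex orderings."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        code = "".join(
            "1" if frozenset((perm[i], perm[j])) in edge_set else "0"
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best is None or code < best:
            best = code
    return best


# Hash table keyed by (vertex count, canonical code): isomorphic graphs collide.
index = {}


def add_graph(name, n, edges):
    index.setdefault((n, canonical_code(n, edges)), []).append(name)


add_graph("path_a", 3, [(0, 1), (1, 2)])
add_graph("path_b", 3, [(0, 2), (2, 1)])  # same path, different numbering
add_graph("triangle", 3, [(0, 1), (1, 2), (0, 2)])
```

The two paths land in the same bucket while the triangle gets its own, so a lookup by key replaces a pairwise isomorphism test for graphs already in the table.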

185 citations


Proceedings Article
18 Mar 2007
TL;DR: A graph-theoretic analysis of the category graph is performed, and it is shown that it is a scale-free, small world graph like other well-known lexical semantic networks.
Abstract: In this paper, we discuss two graphs in Wikipedia: (i) the article graph, and (ii) the category graph. We perform a graph-theoretic analysis of the category graph, and show that it is a scale-free, small world graph like other well-known lexical semantic networks. We substantiate our findings by transferring semantic relatedness algorithms defined on WordNet to the Wikipedia category graph. To assess the usefulness of the category graph as an NLP resource, we analyze its coverage and the performance of the transferred semantic relatedness algorithms.

155 citations


Patent
12 Feb 2007
TL;DR: In this paper, a Social Network Aware Pattern Detection (SNAP) system and method utilizes a highly-scalable, computationally efficient integration of social network analysis (SNA) and graph pattern matching.
Abstract: Enabling dynamic, computer-driven, context-based detection of social network patterns within an input graph representing a social network. A Social Network Aware Pattern Detection (SNAP) system and method utilizes a highly-scalable, computationally efficient integration of social network analysis (SNA) and graph pattern matching. Social network interaction data is provided as an input graph having nodes and edges. The graph illustrates the connections and/or interactions between people, objects, events, and activities, and matches the interactions to a context. A sample graph pattern of interest is identified and/or defined by the user of the application. With this sample graph pattern and the input graph, a computational analysis is completed to (1) determine when a match of the sample graph pattern is found, and more importantly, (2) assign a weight (or score) to the particular match, according to a pre-defined criteria or context.

149 citations


Proceedings ArticleDOI
15 Apr 2007
TL;DR: A novel sequencing method is introduced to capture the semantics of the underlying graph data and it not only reduces the size of resulting sequences, but also enables semantic-based searching.
Abstract: Graphs are widely used for modeling complicated data, including chemical compounds, protein interactions, XML documents, and multimedia. Information retrieval against such data can be formulated as a graph search problem, and finding an efficient solution to the problem is essential for many applications. A popular approach is to represent both graphs and queries on graphs by sequences, thus converting graph search to subsequence matching. State-of-the-art sequencing methods work at the finest granularity - each node (or edge) in the graph will appear as an element in the resulting sequence. Clearly, such methods are not semantic conscious, and the resulting sequences are not only bulky but also prone to complexities arising from graph isomorphism and other problems in searching. In this paper, we introduce a novel sequencing method to capture the semantics of the underlying graph data. We find meaningful components in graph structures and use them as the most basic units in sequencing. It not only reduces the size of resulting sequences, but also enables semantic-based searching. In this paper, we base our approach on chemical compound databases, although it can be applied to searching other complicated graphs, such as protein structures. Experiments demonstrate that our approach outperforms state-of-the-art graph search methods.

01 Jan 2007
TL;DR: SPARQL/Update (nicknamed "SPARUL"), an update language for RDF graphs, uses a syntax derived from SPARQL to perform update operations on a collection of graphs in a Graph Store.
Abstract: This document describes SPARQL/Update (nicknamed "SPARUL"), an update language for RDF graphs. It uses a syntax derived from SPARQL. Update operations are performed on a collection of graphs in a Graph Store. Operations are provided to change existing RDF graphs as well as create and remove graphs with the Graph Store. This document does not discuss protocol issues. Status of this document: published for discussion.

Proceedings ArticleDOI
26 Mar 2007
TL;DR: A multithreaded algorithm for connected components and a new heuristic for inexact subgraph isomorphism are introduced and explored, and the performance of these and other basic graph algorithms on large scale-free graphs is explored.
Abstract: Search-based graph queries, such as finding short paths and isomorphic subgraphs, are dominated by memory latency. If input graphs can be partitioned appropriately, large cluster-based computing platforms can run these queries. However, the lack of compute-bound processing at each vertex of the input graph and the constant need to retrieve neighbors implies low processor utilization. Furthermore, graph classes such as scale-free social networks lack the locality to make partitioning clearly effective. Massive multithreading is an alternative architectural paradigm, in which a large shared memory is combined with processors that have extra hardware to support many thread contexts. The processor speed is typically slower than normal, and there is no data cache. Rather than mitigating memory latency, multithreaded machines tolerate it. This paradigm is well aligned with the problem of graph search, as the high ratio of memory requests to computation can be tolerated via multithreading. In this paper, we introduce the multithreaded graph library (MTGL), generic graph query software for processing semantic graphs on multithreaded computers. This library currently runs on serial machines and the Cray MTA-2, but Sandia is developing a run-time system that will make it possible to run MTGL-based code on symmetric multiprocessors. We also introduce a multithreaded algorithm for connected components and a new heuristic for inexact subgraph isomorphism. We explore the performance of these and other basic graph algorithms on large scale-free graphs. We conclude with a performance comparison between the Cray MTA-2 and Blue Gene/Light for s-t connectivity.

Proceedings Article
22 Jul 2007
TL;DR: GRIN outperforms Jena, Sesame and RDFBroker on all three measures for graph based queries (for other types of queries, it may be worth building one of these other indexes and using it), at the expense of using a larger amount of memory when answering queries.
Abstract: RDF ("Resource Description Framework") is now a widely used World Wide Web Consortium standard. However, methods to index large volumes of RDF data are still in their infancy. In this paper, we focus on providing a very lightweight indexing mechanism for certain kinds of RDF queries, namely graph-based queries where there is a need to traverse edges in the graph determined by an RDF database. Our approach uses the idea of drawing circles around selected "center" vertices in the graph where the circle would encompass those vertices in the graph that are within a given distance of the "center" vertex. We come up with methods of finding such "center" vertices and identifying the radius of the circles and then leverage this to build an index called GRIN. We compare GRIN with three existing RDF indexes: Jena, Sesame, and RDFBroker. We compared (i) the time to answer graph based queries, (ii) memory needed to store the index, and (iii) the time to build the index. GRIN outperforms Jena, Sesame and RDFBroker on all three measures for graph based queries (for other types of queries, it may be worth building one of these other indexes and using it), at the expense of using a larger amount of memory when answering queries.
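
The "circle around a center vertex" idea can be sketched as a bounded breadth-first search. This only shows how the set of vertices within a given distance of one center could be collected. How GRIN picks centers and radii is not described in the abstract and is not modeled here, and hop count over an undirected adjacency map is an assumed distance measure.

```python
from collections import deque


def ball(adjacency, center, radius):
    """Return {vertex: hop distance} for vertices within `radius` of `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the circle's radius
        for w in adjacency.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


# Hypothetical chain a - b - c - d, stored as an undirected adjacency map.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
circle = ball(adjacency, "a", 2)
```

A query that only touches vertices inside one such circle can be answered from that region alone, which is the intuition behind using circles as index units.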

Journal ArticleDOI
TL;DR: This article studies how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task due to the fact that the downward-closure property no longer holds.
Abstract: Due to the ability of graphs to represent more generic and more complicated relationships among different objects, graph mining has played a significant role in data mining, attracting increasing attention in the data mining community. In addition, frequent coherent subgraphs can provide valuable knowledge about the underlying internal structure of a graph database, and mining frequently occurring coherent subgraphs from large dense graph databases has witnessed several applications and received considerable attention in the graph mining community recently. In this article, we study how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task due to the fact that the downward-closure property no longer holds. By fully exploring some properties of quasi-cliques, we propose several novel optimization techniques which can prune the unpromising and redundant subsearch spaces effectively. Meanwhile, we devise an efficient closure checking scheme to facilitate the discovery of closed quasi-cliques only. Since large databases cannot be held in main memory, we also design an out-of-core solution with efficient index structures for mining coherent closed quasi-cliques from large dense graph databases. We call this Cocaina. Thorough performance study shows that Cocaina is very efficient and scalable for large dense graph databases.

Proceedings ArticleDOI
06 Nov 2007
TL;DR: DEX is proposed and evaluated, a high performance graph database querying system that allows for the integration of multiple data sources and makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search.
Abstract: Link and graph analysis tools are important devices to boost the richness of information retrieval systems. The Internet and the existing social networking portals are just a couple of situations where the use of these tools would be beneficial and enriching for the users and the analysts. However, the need for integrating different data sources and, even more important, the need for high performance generic tools, is at odds with the continuously growing size and number of data repositories. In this paper we propose and evaluate DEX, a high performance graph database querying system that allows for the integration of multiple data sources. DEX makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search. The richness of DEX shows up in the experiments that we carried out on the Internet Movie Database (IMDb). Through a variety of these complex analytical queries, DEX proves to be a generic and efficient tool on large graph databases.

Patent
14 Dec 2007
TL;DR: In this article, service impact data is efficiently propagated in a directed acyclic graph with restricted views, which allows a system or business administrator to view and receive real-time notification of the impacted state of all nodes in the graph that are available to their permitted view.
Abstract: Service impact data is efficiently propagated in a directed acyclic graph with restricted views. One or more service components, impact rules and business rules are grouped together into a directed acyclic graph and a related metadata array. Impact propagation uses related metadata array to minimize traversal of the graph. As nodes of the graph are updated to propagate impact data, a determination is made as to when no further impact propagation is required. Subsequently, calculations are terminated without having to traverse the entire graph. This method allows a system or business administrator to view and receive real-time notification of the impacted state of all nodes in the graph that are available to their permitted view. Restricted views ensure that available service impact data is only displayed to end users having the proper authorization to view the underlying impact model data.
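
The early-termination step described in this abstract can be sketched as an upward pass that stops on any branch where a node's recomputed state does not change. The patent's metadata array and restricted views are not modeled, and the impact rule used here (a node's state is the maximum of its own severity and its children's states) is an assumption chosen for illustration, as are all node names.

```python
def propagate(parents, children, state, own, changed):
    """Recompute impact upward from `changed`; return how many nodes were visited."""
    visited = 0
    stack = list(parents.get(changed, ()))
    while stack:
        node = stack.pop()
        visited += 1
        new = max([own.get(node, 0)] + [state[c] for c in children[node]])
        if new == state[node]:
            continue  # nothing changed here, so ancestors need no update
        state[node] = new
        stack.extend(parents.get(node, ()))
    return visited


# Hypothetical service DAG: db and cache feed app; app and mail feed portal.
children = {"app": ["db", "cache"], "portal": ["app", "mail"]}
parents = {"db": ["app"], "cache": ["app"], "app": ["portal"], "mail": ["portal"]}
state = {name: 0 for name in ("db", "cache", "mail", "app", "portal")}
own = {}

state["db"] = 2  # db becomes severely impacted
visited_first = propagate(parents, children, state, own, "db")

state["cache"] = 1  # a milder impact that is masked by db
visited_second = propagate(parents, children, state, own, "cache")
```

The second update visits only `app`: its state is already 2, so the traversal stops without touching `portal`.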

Patent
15 May 2007
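TL;DR: In this article, graph-based computations are executed by accepting a specification of a computation graph in which data processing elements are joined by linking elements, dividing the data processing elements into sets (at least one of which includes multiple elements), assigning a different computing resource to each set, and processing data using the assigned resources.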
Abstract: Executing graph-based computations includes: accepting a specification of a computation graph in which data processing elements are joined by linking elements; dividing the data processing elements into sets, at least one of the sets including multiple of the data processing elements; assigning to each set a different computing resource; and processing data according to the computation graph, including performing computations corresponding to the data processing elements using the assigned computing resources.

Patent
09 Aug 2007
TL;DR: In this article, a service request is processed according to a computation graph associated with the service by receiving inputs for the computation graph from a service client, providing the inputs to the computations as records of a data flow, and providing the output to the service client.
Abstract: A service request is processed according to a computation graph associated with the service by receiving inputs for the computation graph from a service client, providing the inputs to the computation graph as records of a data flow, receiving output from the computation graph, and providing the output to the service client. Data flows are processed concurrently in a graph-based computation by potentially concurrent execution of different types of requests, potentially concurrent execution of similar request types, and/or potentially concurrent execution of work elements within a request.

Patent
31 Oct 2007
TL;DR: In this article, an association graph may be formed based on a query graph and a database graph, providing the results to a query or problem and/or an indication of a level of responsiveness of the results.
Abstract: Systems, methods and articles solve queries or database problems through the use of graphs. An association graph may be formed based on a query graph and a database graph. The association graph may be solved for a clique, providing the results to a query or problem and/or an indication of a level of responsiveness of the results. Thus, unlimited relaxation of constraint may be achieved. Analog processors such as quantum processors may be used to solve for the clique.

Patent
29 Mar 2007
TL;DR: In this article, the authors describe an approach for storing or processing data in the form of graph tuples comprising n-parts, where each tuple-part is encoded into a unique part identifier (hereinafter called a UPI), each UPI comprises a tag at a fixed position within the UPI.
Abstract: Embodiments of a method for creating a graph database which is arranged to store or process data in the form of graph tuples comprising n-parts, are described. In an embodiment, each tuple-part is encoded into a unique part identifier (hereinafter called a UPI), each UPI comprises a tag at a fixed position within the UPI. The tag indicates the datatype of the encoded tuple-part. The content data for the tuple-part is encoded in a code that is configured to reflect the ranking or order of the content data, corresponding to each datatype, relative to other tuples in a set of tuples. For content data that comprises a character-string, the code comprises a hashcode; and for content data that comprises or includes a numeric value, the code comprises an immediate value that directly stores the numeric value without encoding.

Journal ArticleDOI
TL;DR: A new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set, appears to provide a useful foundation for highly scalable interval database applications.
Abstract: Motivation: The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction/updating of the interval database. Results: We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5–500-fold faster than other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are ~100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications. Availability: NCList data structure is part of Pygr, a bioinformatics graph database library, available at http://sourceforge.net/projects/pygr Contact: leec@chem.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
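
A compact in-memory sketch of the nested-containment idea, written from the abstract's description rather than from the Pygr source. Intervals are treated as half-open `(start, end)` pairs, and the class and function names are hypothetical. Each interval that is contained in another is stored in that interval's sublist, so within any one list both starts and ends are sorted and a binary search finds the first overlap.

```python
from bisect import bisect_right


class NCList:
    """One list level: parallel arrays plus one child sublist per interval."""

    def __init__(self):
        self.starts, self.ends, self.children = [], [], []


def build(intervals):
    root = NCList()
    stack = []  # (end, sublist) for each currently open container
    # Sort by start ascending, end descending, so containers precede contents.
    for s, e in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        while stack and stack[-1][0] < e:
            stack.pop()  # this container does not contain (s, e)
        target = stack[-1][1] if stack else root
        child = NCList()
        target.starts.append(s)
        target.ends.append(e)
        target.children.append(child)
        stack.append((e, child))
    return root


def query(ncl, qs, qe, out=None):
    """Collect all intervals overlapping the half-open range [qs, qe)."""
    if out is None:
        out = []
    i = bisect_right(ncl.ends, qs)  # first interval with end > qs
    while i < len(ncl.starts) and ncl.starts[i] < qe:
        out.append((ncl.starts[i], ncl.ends[i]))
        query(ncl.children[i], qs, qe, out)
        i += 1
    return out


data = [(1, 10), (2, 5), (3, 4), (6, 9), (12, 15), (11, 20)]
tree = build(data)
hits = sorted(query(tree, 4, 7))
```

Sublists of intervals that do not overlap the query are never opened, which is where the O(n + log N) behaviour quoted in the abstract comes from in the original design.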

Proceedings ArticleDOI
12 Aug 2007
TL;DR: This paper proposes a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS), which adopts Pearson's correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs.
Abstract: Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, the research of correlation mining from graph databases is still lacking despite the fact that graph data, especially in various scientific domains, proliferate in recent years. In this paper, we propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions which set bounds on the occurrence probability of a candidate in the database. With this result, we design an efficient algorithm that operates on a much smaller projected database and thus we are able to obtain a significantly smaller set of candidates. To further improve the efficiency, we develop three heuristic rules and apply them on the candidate set to further reduce the search space. Our extensive experiments demonstrate the effectiveness of our method on candidate reduction. The results also justify the efficiency of our algorithm in mining correlations from large real and synthetic datasets.
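
For two graphs treated as binary occurrence variables over the database, Pearson's correlation coefficient reduces to the phi coefficient computed from supports. The sketch below assumes each occurrence set holds the IDs of the database graphs that contain the given subgraph. How CGS obtains these sets and prunes candidates is not shown, and returning 0.0 for the degenerate case is an assumption.

```python
from math import sqrt


def pearson(occ_a, occ_b, n_graphs):
    """Pearson (phi) correlation of two binary occurrence patterns."""
    supp_a = len(occ_a) / n_graphs
    supp_b = len(occ_b) / n_graphs
    supp_ab = len(occ_a & occ_b) / n_graphs
    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    if denom == 0:
        return 0.0  # assumption: report undefined cases as uncorrelated
    return (supp_ab - supp_a * supp_b) / denom


# Hypothetical database of 8 graphs, identified by the IDs 0..7.
same = pearson({0, 1, 2, 3}, {0, 1, 2, 3}, 8)
opposite = pearson({0, 1, 2, 3}, {4, 5, 6, 7}, 8)
independent = pearson({0, 1, 2, 3}, {0, 1, 4, 5}, 8)
```

Since the coefficient depends only on the three supports, bounds on a candidate's occurrence probability translate directly into bounds on its achievable correlation.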

Proceedings ArticleDOI
09 Nov 2007
TL;DR: This paper addresses the top-k sub-graph matching query problem and proposes an efficient query algorithm (the Ranked Matching algorithm) based on G-Tree, which outperforms the alternative method by orders of magnitude.
Abstract: Recently, due to its wide applications, sub-graph search has attracted a lot of attention from the database and data mining communities. Sub-graph search is defined as follows: given a query graph Q, we report all data graphs containing Q in the database. However, there is little work on sub-graph search in a single large graph, which has been used in many applications, such as biological networks and social networks. In this paper, we address the top-k sub-graph matching query problem, which is defined as follows: given a query graph Q, we locate the top-k matchings of Q in a large data graph G according to a score function. The score function is defined as the sum of the pairwise similarity between a vertex in Q and its matching vertex in G. Specifically, we first design a balanced tree (the G-Tree) to index the large data graph. Then, based on G-Tree, we propose an efficient query algorithm (the Ranked Matching algorithm). Our extensive experimental results show that, owing to the efficiency of the pruning strategy, given a query with up to 20 vertices, we can locate the top-100 matchings in less than 10 seconds in a large data graph with 100K vertices. Furthermore, our approach outperforms the alternative method by orders of magnitude.
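
To make the problem definition concrete, here is a brute-force Python sketch of top-k matching under the stated score function. It does not use G-Tree or the Ranked Matching algorithm. It enumerates every injective vertex assignment, so it is only practical for very small graphs. The function name `top_k_matchings`, the undirected-edge assumption, and the `sim` callback are hypothetical choices made for this example.

```python
import heapq
from itertools import permutations


def top_k_matchings(q_vertices, q_edges, g_vertices, g_edges, sim, k):
    """Return the k best (score, image) pairs, highest score first.

    image[i] is the data-graph vertex matched to q_vertices[i].
    sim(u, v) is an assumed similarity between query vertex u and
    data vertex v.
    """
    g_edge_set = {frozenset(e) for e in g_edges}  # undirected edges
    scored = []
    for image in permutations(g_vertices, len(q_vertices)):
        mapping = dict(zip(q_vertices, image))
        # Keep only assignments that map every query edge onto a data edge.
        if all(frozenset((mapping[u], mapping[v])) in g_edge_set
               for u, v in q_edges):
            score = sum(sim(u, mapping[u]) for u in q_vertices)
            scored.append((score, image))
    return heapq.nlargest(k, scored)


# Hypothetical similarity table for a 2-vertex query on a 3-vertex path.
table = {("a", 1): 0.9, ("a", 2): 0.5, ("a", 3): 0.1,
         ("b", 1): 0.2, ("b", 2): 0.8, ("b", 3): 0.4}
best = top_k_matchings(["a", "b"], [("a", "b")],
                       [1, 2, 3], [(1, 2), (2, 3)],
                       lambda u, v: table[(u, v)], k=1)
print(best)
```

An index-based method avoids this exhaustive enumeration by pruning regions of the data graph that cannot beat the current k-th best score.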

Proceedings Article
22 Jul 2007
TL;DR: It is shown that structured duplicate detection can also be used to reduce the number of slow synchronization operations needed in parallel graph search, and several techniques for integrating parallel and external-memory graph search in an efficient way are described.
Abstract: We describe a novel approach to parallelizing graph search using structured duplicate detection. Structured duplicate detection was originally developed as an approach to external-memory graph search that reduces the number of expensive disk I/O operations needed to check stored nodes for duplicates, by using an abstraction of the search graph to localize memory references. In this paper, we show that this approach can also be used to reduce the number of slow synchronization operations needed in parallel graph search. In addition, we describe several techniques for integrating parallel and external-memory graph search in an efficient way. We demonstrate the effectiveness of these techniques in a graph-search algorithm for domain-independent STRIPS planning.

Patent
03 Apr 2007
TL;DR: In this article, a method for transforming video-to-text is presented that automatically generates text descriptions of the content of a video using a mixture-of-experts blob segmentation algorithm.
Abstract: A method for transforming Video-To-Text is disclosed that automatically generates text descriptions of the content of a video. The present invention first segments an input video sequence according to predefined semantic classes using a Mixture-of-Experts blob segmentation algorithm. The resulting segmentation is coerced into a semantic concept graph based on domain knowledge and a semantic concept hierarchy. Then, the initial semantic concept graph is summarized and pruned. Finally, according to the summarized semantic concept graph and its changes over time, text and/or speech descriptions are automatically generated using one of the three description schemes: key-frame, key-object and key-change descriptions.

Book ChapterDOI
03 Sep 2007
TL;DR: This paper abstracts software modules, queries, reports and views as (sequences of) queries in SQL enriched with functions and uniformly modeled as a graph that is annotated with policies for the management of evolution events.
Abstract: In this paper, we deal with the problem of performing what-if analysis for changes that occur in the schema/structure of the data warehouse sources. We abstract software modules, queries, reports and views as (sequences of) queries in SQL enriched with functions. Queries and relations are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and indicates the way they are tuned to respond to it.

Book ChapterDOI
22 Jul 2007
TL;DR: This paper is dedicated to the implementation of the SPARQL query language and its pattern matching mechanism, which is reformulated as graph homomorphism checking constrained by filter evaluation.
Abstract: The SPARQL query language is a W3C candidate recommendation for asking and answering queries against RDF data. It offers capabilities for querying by graph patterns, and retrieval of solutions is based on graph pattern matching. This paper is dedicated to the implementation of the SPARQL query language and its pattern matching mechanism, which is reformulated as graph homomorphism checking constrained by filter evaluation.
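
The idea of pattern matching as a filtered homomorphism search can be sketched in a few lines of Python. This is a toy matcher over in-memory triples, not the implementation described in the paper and not a SPARQL engine. The function name `match_pattern`, the `?`-prefixed variable convention, and the `keep` filter callback are assumptions for illustration.

```python
def match_pattern(triples, patterns, keep=lambda binding: True):
    """Find all variable bindings mapping every pattern onto a data triple.

    triples: iterable of (subject, predicate, object) data triples.
    patterns: list of triple patterns; strings starting with '?' are variables.
    keep: assumed filter predicate applied to each complete binding.
    """
    triples = list(triples)

    def is_var(term):
        return isinstance(term, str) and term.startswith("?")

    def extend(binding, pattern, triple):
        new = dict(binding)
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A variable must keep the value it was first bound to.
                if new.setdefault(p, t) != t:
                    return None
            elif p != t:
                return None  # constant term does not match
        return new

    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for triple in triples
                     if (b2 := extend(b, pattern, triple)) is not None]
    return [b for b in solutions if keep(b)]


data = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
print(match_pattern(data, query))
```

Nothing forces distinct variables to bind to distinct terms, which is what makes the mapping a homomorphism rather than an isomorphism; the filter is then applied as an extra constraint on each candidate binding.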

Proceedings ArticleDOI
03 Jan 2007
TL;DR: This paper focuses on the automated evolution of design patterns using graph transformation, and proposes a graph grammar based syntax parser to check the structural integrity of the evolved design patterns.
Abstract: This paper presents a graph transformation based approach to design pattern evolution. An evolution of a design pattern includes modifications of pattern elements, such as classes, attributes, operations and relationships between classes. Compared with other techniques, graphical notation, as a natural and intuitive way of modeling software, is well suited to the transformation stage. In this paper we focus on the automated evolution of design patterns using graph transformation. The rules for the potential design evolutions are defined. After the evolution process, a graph grammar based syntax parser is proposed to check the structural integrity of the evolved design patterns.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper will study the problem of approximate graph mining and propose an optimized solution which uses frequent trees and a spanning tree based pre-verification check in the mining process.
Abstract: In the recent past, many exact graph mining algorithms have been developed to find frequent patterns in a graph database. However, many networks or graphs generated from biological data and other applications may be incomplete or inaccurate. Hence, it is necessary to design approximate graph mining techniques. In this paper, we will study the problem of approximate graph mining and propose an optimized solution which uses frequent trees and a spanning tree based pre-verification check in the mining process.