
Showing papers on "Graph database published in 2008"


Proceedings ArticleDOI
09 Jun 2008
TL;DR: MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
Abstract: Freebase is a practical, scalable tuple database used to structure general human knowledge. The data in Freebase is collaboratively created, structured, and maintained. Freebase currently contains more than 125,000,000 tuples, more than 4000 types, and more than 7000 properties. Public read/write access to Freebase is allowed through an HTTP-based graph-query API using the Metaweb Query Language (MQL) as a data query and manipulation language. MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.

4,813 citations
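The abstract above describes MQL's query-by-example style. As a hedged illustration (the Freebase API has since been retired, and the band name below is just an invented example), an MQL query is a JSON template in which empty values mark the fields the server should fill in:

```python
import json

# Illustrative sketch of an MQL query. MQL queries are JSON objects;
# empty values ([] or None) mark the fields the server should fill in.
# The band name here is a hypothetical example.
query = [{
    "type": "/music/artist",   # match tuples of this type
    "name": "The Beatles",     # constrain the 'name' property
    "album": [],               # ask the server to list all albums
}]

# Queries were wrapped in an envelope and sent over HTTP, roughly:
envelope = {"query": query}
payload = json.dumps(envelope)
print(payload)
```

The server would return the same shape with the blanks filled in; endpoint details are omitted here.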


Journal ArticleDOI
TL;DR: The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
Abstract: Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.

1,669 citations


Book ChapterDOI
04 Dec 2008
TL;DR: A repository of graph data sets and corresponding benchmarks, covering a wide spectrum of different applications is introduced, to make the different approaches in graph based machine learning better comparable.
Abstract: In recent years the use of graph based representation has gained popularity in pattern recognition and machine learning. As a matter of fact, object representation by means of graphs has a number of advantages over feature vectors. Therefore, various algorithms for graph based machine learning have been proposed in the literature. However, in contrast with the emerging interest in graph based representation, a lack of standardized graph data sets for benchmarking can be observed. Common practice is that researchers use their own data sets, and this behavior hampers the objective evaluation of the proposed methods. In order to make the different approaches in graph based machine learning better comparable, the present paper aims at introducing a repository of graph data sets and corresponding benchmarks, covering a wide spectrum of different applications.

484 citations


Proceedings ArticleDOI
09 Jun 2008
TL;DR: A graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs is presented and access methods of the selection operator are investigated.
Abstract: With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.

433 citations
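To make the generalized selection operator concrete, here is a minimal, hypothetical sketch (not the paper's implementation): selection over a collection of attributed graphs keeps those containing a label-preserving match of a small pattern. Brute force is tolerable here only because the pattern is tiny; subgraph isomorphism is NP-complete in general, which is exactly why the paper invests in pruning techniques.

```python
from itertools import permutations

def matches(pattern_nodes, pattern_edges, nodes, edges):
    """Return True if the pattern occurs as a label-preserving subgraph."""
    for assign in permutations(nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, assign))
        # node labels must agree
        if any(nodes[mapping[p]] != lbl for p, lbl in pattern_nodes.items()):
            continue
        # every pattern edge must be present in the data graph
        if all((mapping[u], mapping[v]) in edges for u, v in pattern_edges):
            return True
    return False

# Pattern: an 'A' node pointing to a 'B' node.
p_nodes = {"x": "A", "y": "B"}
p_edges = [("x", "y")]

database = [
    ({1: "A", 2: "B"}, {(1, 2)}),   # matches
    ({1: "A", 2: "B"}, {(2, 1)}),   # wrong edge direction
]
selected = [g for g in database if matches(p_nodes, p_edges, *g)]
print(len(selected))  # 1
```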


Journal ArticleDOI
TL;DR: This annotated bibliography gives an elementary classification of problems and results related to graph searching and provides a source of bibliographical references on this field.

362 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper introduces the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior, and proposes a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users.
Abstract: Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search engine results. Mining the wealth of information available in the query logs has many important applications including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same "search mission". Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.

346 citations
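The construction can be sketched on a toy, pre-sessionized log (the real method also mines time gaps and textual features to decide whether consecutive queries belong to the same mission; the queries below are invented):

```python
from collections import Counter

# Toy query log, already split into sessions.
sessions = [
    ["cheap flights", "cheap flights rome", "rome hotels"],
    ["cheap flights", "cheap flights rome"],
]

edge_counts = Counter()
out_counts = Counter()
for session in sessions:
    # a directed edge for each consecutive query pair
    for qi, qj in zip(session, session[1:]):
        edge_counts[(qi, qj)] += 1
        out_counts[qi] += 1

# Edge weight = empirical probability of moving from qi to qj.
weights = {e: c / out_counts[e[0]] for e, c in edge_counts.items()}
print(weights[("cheap flights", "cheap flights rome")])  # 1.0
```

A path's likelihood is then the product of the weights along it, matching the abstract's reading of paths as search behaviors.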


Proceedings ArticleDOI
09 Jun 2008
TL;DR: The first comprehensive study of a general mining method aiming to find the most significant patterns directly; graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract: With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which can cause all frequency-centric graph mining algorithms to fail. In this paper, we give the first comprehensive study of a general mining method aiming to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

331 citations


Proceedings ArticleDOI
07 Apr 2008
TL;DR: A novel indexing method that incorporates graph structural information in a hybrid index structure that achieves high pruning power and the index size scales linearly with the database size is proposed.
Abstract: Large graph datasets are common in many emerging database applications, and most notably in large-scale scientific applications. To fully exploit the wealth of information encoded in graphs, effective and efficient graph matching tools are critical. Due to the noisy and incomplete nature of real graph datasets, approximate, rather than exact, graph matching is required. Furthermore, many modern applications need to query large graphs, each of which has hundreds to thousands of nodes and edges. This paper presents a novel technique for approximate matching of large graph queries. We propose a novel indexing method that incorporates graph structural information in a hybrid index structure. This indexing technique achieves high pruning power and the index size scales linearly with the database size. In addition, we propose an innovative matching paradigm to query large graphs. This technique distinguishes nodes by their importance in the graph structure. The matching algorithm first matches the important nodes of a query and then progressively extends these matches. Through experiments on several real datasets, this paper demonstrates the effectiveness and efficiency of the proposed method.

307 citations


Proceedings ArticleDOI
11 Feb 2008
TL;DR: This work presents a compression scheme for the web graph specifically designed to accommodate community queries and other random access algorithms on link servers, and uses a frequent pattern mining approach to extract meaningful connectivity formations.
Abstract: A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present a compression scheme for the web graph specifically designed to accommodate community queries and other random access algorithms on link servers. We use a frequent pattern mining approach to extract meaningful connectivity formations. Our Virtual Node Miner achieves graph compression without sacrificing random access by generating virtual nodes from frequent itemsets in vertex adjacency lists. The mining phase guarantees scalability by bounding the pattern mining complexity to O(E log E). We facilitate global mining, relaxing the requirement for the graph to be sorted by URL, enabling discovery for both inter-domain as well as intra-domain patterns. As a consequence, the approach allows incremental graph updates. Further, it not only facilitates but can also expedite graph computations such as PageRank and local random walks by implementing them directly on the compressed graph. We demonstrate the effectiveness of the proposed approach on several publicly available large web graph data sets. Experimental results indicate that the proposed algorithm achieves a 10- to 15-fold compression on most real-world web graph data sets.

258 citations
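The virtual-node idea can be illustrated on a toy adjacency list (the shared pattern here is hand-picked as a stand-in for the frequent-itemset mining step, and all node names are hypothetical): when several vertices share a set of out-links, a virtual node that points to those targets replaces the repeated edges.

```python
# Toy web graph: adjacency sets of out-links. u1 and u2 share {x, y, z}.
adj = {
    "u1": {"x", "y", "z"},
    "u2": {"x", "y", "z"},
    "u3": {"x", "y", "w"},
}

# Hand-picked frequent pattern (a stand-in for real itemset mining).
pattern = {"x", "y", "z"}
sharers = [u for u, targets in adj.items() if pattern <= targets]

if len(sharers) > 1:
    adj["v*"] = set(pattern)  # the virtual node carries the shared links
    for u in sharers:
        # re-route each sharer through the virtual node
        adj[u] = (adj[u] - pattern) | {"v*"}

edges = sum(len(t) for t in adj.values())
print(edges)  # 8, down from the original 9 edges
```

Each extra sharer of the same pattern would save another |pattern| - 1 edges, which is where the 10- to 15-fold compression on real web graphs comes from.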


Proceedings ArticleDOI
20 Jul 2008
TL;DR: Experimental results show that BrowseRank indeed outperforms the baseline methods such as PageRank and TrustRank in several tasks.
Abstract: This paper proposes a new method for computing page importance, referred to as BrowseRank. The conventional approach to compute page importance is to exploit the link graph of the web and to build a model based on that graph. For instance, PageRank is such an algorithm, which employs a discrete-time Markov process as the model. Unfortunately, the link graph might be incomplete and inaccurate with respect to data for determining page importance, because links can be easily added and deleted by web content creators. In this paper, we propose computing page importance by using a 'user browsing graph' created from user behavior data. In this graph, vertices represent pages and directed edges represent transitions between pages in the users' web browsing history. Furthermore, the lengths of staying time spent on the pages by users are also included. The user browsing graph is more reliable than the link graph for inferring page importance. This paper further proposes using the continuous-time Markov process on the user browsing graph as a model and computing the stationary probability distribution of the process as page importance. An efficient algorithm for this computation has also been devised. In this way, we can leverage hundreds of millions of users' implicit voting on page importance. Experimental results show that BrowseRank indeed outperforms the baseline methods such as PageRank and TrustRank in several tasks.

200 citations
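A drastically simplified sketch of the computation (all numbers invented; the actual system estimates these quantities from hundreds of millions of users' behavior data): power-iterate the embedded jump chain to its stationary distribution, then weight each page by its mean staying time, as a continuous-time Markov process prescribes.

```python
# Toy user browsing graph: empirical transition probabilities and
# mean staying times (seconds), both invented for illustration.
pages = ["a", "b", "c"]
trans = {
    "a": {"b": 1.0},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 1.0},
}
stay = {"a": 10.0, "b": 2.0, "c": 4.0}

# Power iteration on the embedded (jump) chain.
pi = {p: 1.0 / len(pages) for p in pages}
for _ in range(200):
    nxt = {p: 0.0 for p in pages}
    for p, prob in pi.items():
        for q, w in trans[p].items():
            nxt[q] += prob * w
    pi = nxt

# Continuous-time stationary distribution: weight by staying time.
score = {p: pi[p] * stay[p] for p in pages}
total = sum(score.values())
rank = {p: s / total for p, s in score.items()}
print(max(rank, key=rank.get))  # 'a'
```

Page 'a' wins despite moderate traffic because users linger on it, which is precisely the signal BrowseRank adds over link-only methods like PageRank.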


Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper introduces a novel graph structure, referred to as path-tree, to help labeling very large graphs, which is a spanning subgraph of G in a tree shape and demonstrates both analytically and empirically the effectiveness of the new approaches.
Abstract: Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention as reachability queries are not only common on graph databases, but they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned certain labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels using existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
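Interval (tree-cover) labeling, one of the baseline schemes the paper builds on, is easy to sketch for a tree: a DFS assigns each vertex an interval, and u reaches v exactly when u's interval contains v's (the tiny tree below is invented for illustration).

```python
# Toy rooted tree as an adjacency map.
tree = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}

labels = {}
counter = 0

def dfs(v):
    """Assign each vertex the half-open interval [start, end) of its subtree."""
    global counter
    start = counter
    counter += 1
    for child in tree[v]:
        dfs(child)
    labels[v] = (start, counter)

dfs("r")

def reaches(u, v):
    """In a tree, u reaches v iff u's interval contains v's."""
    su, eu = labels[u]
    sv, ev = labels[v]
    return su <= sv and ev <= eu

print(reaches("r", "c"), reaches("b", "c"))  # True False
```

Each reachability test is O(1) after one linear-time DFS; the difficulty the paper tackles is extending such compact labels from trees to general graphs with millions of vertices.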

Journal ArticleDOI
TL;DR: GrouseFlocks provides a simple set of operations so that users can create and modify their graph hierarchies based on selections, and provides feedback to the user within seconds, allowing interactive exploration of this graph hierarchy space.
Abstract: Several previous systems allow users to interactively explore a large input graph through cuts of a superimposed hierarchy. This hierarchy is often created using clustering algorithms or topological features present in the graph. However, many graphs have domain-specific attributes associated with the nodes and edges, which could be used to create many possible hierarchies providing unique views of the input graph. GrouseFlocks is a system for the exploration of this graph hierarchy space. By allowing users to see several different possible hierarchies on the same graph, the system helps users investigate graph hierarchy space instead of a single fixed hierarchy. GrouseFlocks provides a simple set of operations so that users can create and modify their graph hierarchies based on selections. These selections can be made manually or based on patterns in the attribute data provided with the graph. It provides feedback to the user within seconds, allowing interactive exploration of this space.

Proceedings ArticleDOI
15 Dec 2008
TL;DR: Novel Metropolis algorithms for sampling a 'representative' small subgraph from the original large graph are presented, with 'representative' describing the requirement that the sample shall preserve crucial graph properties of the original graph.
Abstract: While data mining in chemoinformatics studied graph data with dozens of nodes, systems biology and the Internet are now generating graph data with thousands and millions of nodes. Hence data mining faces the algorithmic challenge of coping with this significant increase in graph size: Classic algorithms for data analysis are often too expensive and too slow on large graphs. While one strategy to overcome this problem is to design novel efficient algorithms, the other is to 'reduce' the size of the large graph by sampling. This is the scope of this paper: We will present novel Metropolis algorithms for sampling a 'representative' small subgraph from the original large graph, with 'representative' describing the requirement that the sample shall preserve crucial graph properties of the original graph. In our experiments, we improve over the pioneering work of Leskovec and Faloutsos (KDD 2006), by producing representative subgraph samples that are both smaller and of higher quality than those produced by other methods from the literature.
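A schematic Metropolis sampler under a deliberately simple objective (matching the full graph's mean degree; the paper's representativeness criteria are richer, and the toy graph below is invented): propose swapping one sampled node for an outside node, always accept improvements, and accept worse moves with Boltzmann probability.

```python
import math
import random

random.seed(0)

# Toy graph: a 12-node ring with two chords from node 0.
graph = {i: set() for i in range(12)}
for i in range(12):
    graph[i].add((i + 1) % 12)
    graph[(i + 1) % 12].add(i)
graph[0] |= {3, 6}
graph[3].add(0)
graph[6].add(0)

def mean_degree(nodes):
    """Mean degree of the subgraph induced by `nodes`."""
    return sum(len(graph[n] & nodes) for n in nodes) / len(nodes)

target = sum(len(v) for v in graph.values()) / len(graph)

k, temp = 5, 0.5
sample = set(random.sample(sorted(graph), k))
cost = abs(mean_degree(sample) - target)
for _ in range(300):
    out = random.choice(sorted(sample))
    inn = random.choice(sorted(set(graph) - sample))
    proposal = (sample - {out}) | {inn}
    new_cost = abs(mean_degree(proposal) - target)
    # Metropolis acceptance: better moves always, worse moves sometimes.
    if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
        sample, cost = proposal, new_cost

print(len(sample))  # 5
```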

Book ChapterDOI
26 Oct 2008
TL;DR: This paper shows that nSPARQL is expressive enough to answer queries considering the semantics of the RDFS vocabulary by directly traversing the input graph, and studies the expressiveness of the combination of nested regular expressions and SPARQL operators.
Abstract: Navigational features have been largely recognized as fundamental for graph database query languages. This fact has motivated several authors to propose RDF query languages with navigational capabilities. In particular, we have argued in a previous paper that nested regular expressions are appropriate to navigate RDF data, and we have proposed the nSPARQL query language for RDF, that uses nested regular expressions as building blocks. In this paper, we study some of the fundamental properties of nSPARQL concerning expressiveness and complexity of evaluation. Regarding expressiveness, we show that nSPARQL is expressive enough to answer queries considering the semantics of the RDFS vocabulary by directly traversing the input graph. We also show that nesting is necessary to obtain this last result, and we study the expressiveness of the combination of nested regular expressions and SPARQL operators. Regarding complexity of evaluation, we prove that the evaluation of a nested regular expression E over an RDF graph G can be computed in time O(|G|·|E|).
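Plain regular-path navigation, the building block that nested regular expressions generalize, can be sketched on a toy triple graph (predicates and nodes invented; nSPARQL additionally allows whole expressions to nest inside predicate positions).

```python
from collections import deque

# Toy RDF-style graph as subject-predicate-object triples.
triples = [
    ("a", "next", "b"),
    ("b", "next", "c"),
    ("c", "other", "d"),
]

def closure(start, predicate):
    """Nodes reachable from `start` via zero or more `predicate` edges,
    i.e. evaluating the regular path expression predicate* by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

print(sorted(closure("a", "next")))  # ['a', 'b', 'c']
```

The BFS touches each triple at most once per frontier node, in line with the O(|G|·|E|) bound the paper proves for full nested expressions.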

Journal ArticleDOI
01 Aug 2008
TL;DR: This paper proposes a graph representation technique that combines a condensed version of the graph (the "supernode graph") which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation.
Abstract: Keyword search on graph structured data has attracted a lot of attention in recent years. Graphs are a natural "lowest common denominator" representation which can combine relational, XML and HTML data. Responses to keyword queries are usually modeled as trees that connect nodes matching the keywords. In this paper we address the problem of keyword search on graphs that may be significantly larger than memory. We propose a graph representation technique that combines a condensed version of the graph (the "supernode graph") which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation. We propose two alternative approaches which extend existing search algorithms to exploit multi-granular graphs; both approaches attempt to minimize IO by directing search towards areas of the graph that are likely to give good results. We compare our algorithms with a virtual memory approach on several real data sets. Our experimental results show significant benefits in terms of reduction in IO due to our algorithms.

Book
01 Jan 2008
TL;DR: A Course on the Web Graph is the first mathematically rigorous textbook discussing both models of the web graph and algorithms for searching the web, and is based on a graduate course taught at the AARMS 2006 Summer School at Dalhousie University.
Abstract: A Course on the Web Graph provides a comprehensive introduction to state-of-the-art research on the applications of graph theory to real-world networks such as the web graph. It is the first mathematically rigorous textbook discussing both models of the web graph and algorithms for searching the web. After introducing key tools required for the study of web graph mathematics, an overview is given of the most widely studied models for the web graph. A discussion of popular web search algorithms, e.g. PageRank, is followed by additional topics, such as applications of infinite graph theory to the web graph, spectral properties of power law graphs, domination in the web graph, and the spread of viruses in networks. The book is based on a graduate course taught at the AARMS 2006 Summer School at Dalhousie University. As such it is self-contained and includes over 100 exercises. The reader of the book will gain a working knowledge of current research in graph theory and its modern applications. In addition, the reader will learn first-hand about models of the web, and the mathematics underlying modern search engines.

Journal ArticleDOI
TL;DR: A novel graph-based approach toward network forensics analysis is developed, built on an evidence graph model that facilitates evidence presentation and automated reasoning, and a hierarchical reasoning framework that consists of two levels is proposed.
Abstract: In this article we develop a novel graph-based approach toward network forensics analysis. Central to our approach is the evidence graph model that facilitates evidence presentation and automated reasoning. Based on the evidence graph, we propose a hierarchical reasoning framework that consists of two levels. Local reasoning aims to infer the functional states of network entities from local observations. Global reasoning aims to identify important entities from the graph structure and extract groups of densely correlated participants in the attack scenario. This article also presents a framework for interactive hypothesis testing, which helps to identify the attacker's nonexplicit attack activities from secondary evidence. We developed a prototype system that implements the techniques discussed. Experimental results on various attack datasets demonstrate that our analysis mechanism achieves good coverage and accuracy in attack group and scenario extraction with less dependence on hard-coded expert knowledge.

Proceedings ArticleDOI
21 Apr 2008
TL;DR: This paper investigates the underlying features of Chinese recipes, and based on workflow-like cooking procedures, it models recipes as graphs, and proposes a novel similarity measurement based on the frequent patterns, and devises an effective filtering algorithm to prune unrelated data so as to support efficient on-line searching.
Abstract: Improving the precision of information retrieval has been a challenging issue on Chinese Web. As exemplified by Chinese recipes on the Web, it is not easy/natural for people to use keywords (e.g. recipe names) to search recipes, since the names can be literally so abstract that they do not bear much, if any, information on the underlying ingredients or cooking methods. In this paper, we investigate the underlying features of Chinese recipes, and based on workflow-like cooking procedures, we model recipes as graphs. We further propose a novel similarity measurement based on the frequent patterns, and devise an effective filtering algorithm to prune unrelated data so as to support efficient on-line searching. Benefiting from the characteristics of graphs, frequent common patterns can be mined from a cooking graph database. So in our prototype system called RecipeView, we extend the subgraph mining algorithm FSG to cooking graphs and combine it with our proposed similarity measurement, resulting in an approach that well caters for specific users' needs. Our initial experimental studies show that the filtering algorithm can efficiently prune unrelated cooking graphs without affecting the retrieval performance and the similarity measurement achieves relatively higher precision/recall than its counterparts.

Patent
Robert J. Breeds1, Philip R. Taunton1
01 Oct 2008
TL;DR: In this article, a method and system for generating views of data on a user interface in a computing environment is presented, which involves: at a server, generating coordinate data for a graph representing multiply connected objects; transmitting the coordinate data to a client as lightweight object data; at the client, based on the lightweight object data, rendering an interactive dynamic graph view of the multiply connected objects on a user interface; and synchronizing the list view and the graph view.
Abstract: A method and system for generating views of data on a user interface in a computing environment, is provided. One implementation involves: at a server, generating coordinate data for a graph representing multiply connected objects; transmitting the coordinate data to a client as lightweight object data; at the client, based on the lightweight object data, rendering an interactive dynamic graph view of the multiply connected objects on a user interface; at the client, based on the lightweight object data, rendering an interactive dynamic list view of the multiply connected objects on a user interface; and synchronizing the list view and the graph view. The order of objects in the list view reflects the order of objects in the graph view per a breadth-first traversal starting at a root object.
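The ordering rule in the claim (list-view order follows a breadth-first traversal starting at a root object) can be sketched as follows, on an invented object graph:

```python
from collections import deque

# Toy object graph: each object lists the objects it connects to.
graph = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}

def bfs_order(root):
    """Breadth-first traversal order, i.e. the claimed list-view order."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

print(bfs_order("root"))  # ['root', 'a', 'b', 'c']
```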

Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper investigates the problem of generating queries that satisfy cardinality constraints on intermediate subexpressions when executed on a given test database, and develops a practical algorithm which utilizes sampling and space pruning techniques to quickly generate test queries that have desired properties.
Abstract: Tools for generating test queries for databases do not explicitly take into account the actual data in the database. As a consequence, such tools cannot guarantee suitable coverage of test cases commonly required for database testing. In this paper, we investigate the problem of generating queries that satisfy cardinality constraints on intermediate subexpressions when executed on a given test database. Such queries are required to test the performance of a database system under different operating conditions. We formally analyze this problem, quantify its difficulty and follow up this analysis with a description of a practical algorithm which utilizes sampling and space pruning techniques to quickly generate test queries that have desired properties. We present the results of an experimental evaluation of our approach as implemented in an open source data manager, demonstrating the utility of our proposal.

Patent
22 Mar 2008
TL;DR: In this paper, the authors present a system, method and computer program product for executing a query on linked data sources and generate an instance graph expressing relationships between objects in the linked data source and receive a query including at least first and second search terms.
Abstract: A system, method and computer program product for executing a query on linked data sources. Embodiments of the invention generate an instance graph expressing relationships between objects in the linked data sources and receive a query including at least first and second search terms. The first search term is then executed on the instance graph and a summary graph is generated using the results of the executing step. A second search term is then executed on the summary graph.

Patent
28 Feb 2008
TL;DR: In this paper, a service dependency analyzer is used to determine dependencies among components of a network, the components including services and hardware components, and the inference graph reflects cross-layer components including the services and the hardware components.
Abstract: Constructing an inference graph relates to the creation of a graph that reflects dependencies within a network. In an example embodiment, a method includes determining dependencies among components of a network and constructing an inference graph for the network responsive to the dependencies. The components of the network include services and hardware components, and the inference graph reflects cross-layer components including the services and the hardware components. In another example embodiment, a system includes a service dependency analyzer and an inference graph constructor. The service dependency analyzer is to determine dependencies among components of a network, the components including services and hardware components. The inference graph constructor is to construct an inference graph for the network responsive to the dependencies, the inference graph reflecting cross-layer components including the services and the hardware components.

Patent
09 Mar 2008
TL;DR: In this article, a general object graph is described for sharing structured data between users and between applications and for social networking between the users, an associated graphical user interface and application to a virtual file system with an associated authorization scheme.
Abstract: A General Object Graph is described arranged for sharing structured data between users and between applications and for social networking between the users, an associated graphical user interface and application to a virtual file system with an associated authorization scheme. A distributed version of the General Object Graph is also presented.

Proceedings ArticleDOI
15 Dec 2008
TL;DR: Graphite is a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches, enabling it to scale well with the graph database size.
Abstract: We present Graphite, a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches. For example, in a social network where a person's occupation is an attribute, the user can draw a 'star' query for "finding a CEO who has interacted with a Secretary, a Manager, and an Accountant, or a structure very similar to this". Graphite uses the G-Ray algorithm to run the query against a user-chosen data graph, gaining all of its benefits, namely its high speed, scalability, and its ability to find both exact and near matches. Therefore, for the example above, Graphite tolerates indirect paths between, say, the CEO and the Accountant, when no direct path exists. Graphite uses fast algorithms to estimate node proximities when finding matches, enabling it to scale well with the graph database size. We demonstrate Graphite's usage and benefits using the DBLP author-publication graph, which consists of 356 K nodes and 1.9 M edges. A demo video of Graphite can be downloaded at http://www.cs.cmu.edu/~dchau/graphite/graphite.mov.

Proceedings ArticleDOI
26 Aug 2008
TL;DR: This paper proposes a novel graph-based learning framework in the setting of semi-supervised learning with multi-label and applies it to video annotation and reports superior performance compared to key existing approaches over the TRECVID 2006 corpus.
Abstract: Conventional graph-based semi-supervised learning methods predominantly focus on single label problem. However, it is more popular in real-world applications that an example is associated with multiple labels simultaneously. In this paper, we propose a novel graph-based learning framework in the setting of semi-supervised learning with multi-label. The proposed approach is characterized by simultaneously exploiting the inherent correlations among multiple labels and the label consistency over the graph. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches over the TRECVID 2006 corpus.
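A generic sketch of graph-based label propagation with two labels (not the paper's exact framework, which additionally exploits correlations among the labels; the toy graph below is invented): unlabeled nodes repeatedly average their neighbors' label scores while labeled nodes stay clamped.

```python
# Toy undirected path graph 0 - 1 - 2 - 3; nodes 0 and 3 carry labels.
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: [1.0, 0.0], 3: [0.0, 1.0]}  # per-node score vector, one entry per label

scores = {n: labels.get(n, [0.5, 0.5])[:] for n in edges}
for _ in range(100):
    for n in edges:
        if n in labels:
            continue  # clamp labeled nodes
        nbrs = edges[n]
        # average the neighbors' scores, separately for each label
        scores[n] = [
            sum(scores[m][k] for m in nbrs) / len(nbrs)
            for k in range(2)
        ]

# Node 1 sits closer to node 0, so label 0 should dominate there.
print(scores[1][0] > scores[1][1])  # True
```

At the fixed point node 1 converges to scores [2/3, 1/3], reflecting its graph distance to each labeled node; multi-label settings simply run this with one score column per label, as above.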

Proceedings ArticleDOI
09 Jun 2008
TL;DR: G-KS, a novel method for selecting the top-K candidates based on their potential to contain results for a given query, is proposed, which outperforms the current state-of-the-art technique on all aspects, including precision, recall, efficiency, space overhead and flexibility of accommodating different semantics.
Abstract: While database management systems offer a comprehensive solution to data storage, they require deep knowledge of the schema, as well as the data manipulation language, in order to perform effective retrieval. Since these requirements pose a problem to lay or occasional users, several methods incorporate keyword search (KS) into relational databases. However, most of the existing techniques focus on querying a single DBMS. On the other hand, the proliferation of distributed databases in several conventional and emerging applications necessitates the support for keyword-based data sharing and querying over multiple DBMSs. In order to avoid the high cost of searching in numerous, potentially irrelevant, databases in such systems, we propose G-KS, a novel method for selecting the top-K candidates based on their potential to contain results for a given query. G-KS summarizes each database by a keyword relationship graph, where nodes represent terms and edges describe relationships between them. Keyword relationship graphs are utilized for computing the similarity between each database and a KS query, so that, during query processing, only the most promising databases are searched. An extensive experimental evaluation demonstrates that G-KS outperforms the current state-of-the-art technique on all aspects, including precision, recall, efficiency, space overhead and flexibility of accommodating different semantics.
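The core idea — summarize a database as a term graph and score queries by the strength of the relationships among their keywords — can be sketched as follows. The co-occurrence-count edge weights and the pairwise scoring rule here are illustrative assumptions, not the G-KS definitions.

```python
# Hedged sketch of database selection via a keyword relationship graph:
# nodes are terms, weighted edges count how often two terms co-occur in a
# tuple. A query scores a database by the total weight of edges between
# its keywords. Not the actual G-KS scoring function.

from itertools import combinations
from collections import Counter

def build_krg(tuples):
    """Each tuple is a set of terms; count pairwise co-occurrences."""
    edges = Counter()
    for t in tuples:
        for a, b in combinations(sorted(t), 2):
            edges[(a, b)] += 1
    return edges

def score(krg, query_terms):
    """Sum the edge weights between every pair of query keywords."""
    return sum(krg.get(pair, 0)
               for pair in combinations(sorted(set(query_terms)), 2))

db1 = build_krg([{"jazz", "vinyl"}, {"jazz", "vinyl", "mono"}])
db2 = build_krg([{"jazz", "cooking"}, {"vinyl", "sports"}])
# For the query "jazz vinyl", db1 relates the terms strongly and db2 not
# at all, so db1 would be searched first.
```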

Patent
20 Nov 2008
TL;DR: In this paper, a method of ascribing scores to web documents and search queries is proposed, which generates a hyperlink-click graph by taking the union of the hyperlink and click graphs, and then takes a random walk on the graph, and associates the transition probabilities resulting from the random walk with scores for each of the documents and queries.
Abstract: A method of ascribing scores to web documents and search queries generates a hyperlink-click graph by taking the union of the hyperlink and click graphs, takes a random walk on the hyperlink-click graph, and associates the transition probabilities resulting from the random walk with scores for each of the documents and search queries.
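The recipe in the claim — union the hyperlink graph (document-to-document edges) with the click graph (query-to-document edges), run a random walk, and read off stationary probabilities as scores — can be sketched directly. The damping factor and the symmetric treatment of click edges are assumptions, not details from the patent.

```python
# Stationary distribution of a random walk on the union of hyperlink and
# click edges; the resulting probabilities score documents and queries
# jointly. Illustrative sketch of the patented recipe, not its text.

def stationary(edges, damping=0.85, iters=100):
    nodes = sorted({n for e in edges for n in e})
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    p = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling node: jump anywhere
            for t in targets:
                nxt[t] += damping * p[n] / len(targets)
        p = nxt
    return p

hyperlink = [("d1", "d2"), ("d2", "d1")]       # doc -> doc links
clicks = [("q1", "d1"), ("d1", "q1")]          # click edges taken both ways
scores = stationary(hyperlink + clicks)        # union of the two graphs
```

Here `d1` collects walk mass from both the link structure and the click structure, so it scores above the query node and the peripheral document.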

Proceedings Article
13 Jul 2008
TL;DR: It is argued that utilizing the connectivity of a query-URL bipartite graph to recommend relevant queries can significantly improve the accuracy and effectiveness of the conventional query-term based query recommendation systems.
Abstract: Query recommendation is considered an effective assistant in enhancing keyword based queries in search engines and Web search software. The conventional approach to query recommendation has focused on query-term based analysis over the user access logs. In this paper, we argue that utilizing the connectivity of a query-URL bipartite graph to recommend relevant queries can significantly improve the accuracy and effectiveness of the conventional query-term based query recommendation systems. We refer to the Query-URL Bipartite based query reCommendation approach as QUBIC. The QUBIC approach has two unique characteristics. First, instead of operating on the original bipartite graph directly using a biclique based approach or graph clustering, we extract an affinity graph of queries from the initial query-URL bipartite graph. The affinity graph consists of only queries as its vertices and its edges are weighted according to a query-URL vector based similarity (distance) measure. By utilizing the query affinity graph, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries. We devise a novel rank mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering. We compare our proposed ranking algorithm with both naive ranking that uses the query-URL similarity measure directly, and the single-linkage based ranking method. In addition, we make it possible for users to interactively participate in the query recommendation process, to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users' information needs, allowing the lists of related queries to be changed from user to user and query to query, thus personalizing the query recommendation on demand. The experimental results from two query collections demonstrate the effectiveness and feasibility of our approach.
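The first step described above — turning the query-URL bipartite graph into a query affinity graph via query-URL vector similarity — can be sketched with click-count vectors and cosine similarity. The threshold and data layout are assumptions; the hierarchical-clustering rank mechanism is omitted.

```python
# Sketch of extracting a query affinity graph from a query-URL bipartite
# graph: each query becomes a vector of clicked-URL counts, and pairs of
# queries get an edge weighted by cosine similarity. An assumed minimal
# reading of QUBIC's first step, not its implementation.

from math import sqrt

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity_graph(clicks, threshold=0.1):
    """clicks: {query: {url: count}} -> weighted edges between queries."""
    qs = sorted(clicks)
    return {(a, b): w
            for i, a in enumerate(qs) for b in qs[i + 1:]
            if (w := cosine(clicks[a], clicks[b])) > threshold}

clicks = {
    "jaguar car":   {"jaguar.com": 5, "cars.example": 3},
    "jaguar speed": {"jaguar.com": 4, "cars.example": 1},
    "jaguar cat":   {"zoo.example": 6},
}
g = affinity_graph(clicks)
```

Queries landing on the same URLs ("jaguar car", "jaguar speed") get a strong edge; "jaguar cat" stays disconnected, which is exactly the implicit topical relatedness the affinity graph is meant to capture.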

Proceedings ArticleDOI
08 Dec 2008
TL;DR: A novel technique called Graph Pattern Matching kernel (GPM) is demonstrated, which leverages existing frequent pattern discovery methods and explores their application to kernel classifiers (e.g. support vector machine) for graph classification.
Abstract: Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge. In this paper, we demonstrate a novel technique called Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and explore their application to kernel classifiers (e.g. support vector machine) for graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the database and use a diffusion process to label nodes in the graphs. Finally the kernel is computed using a set matching algorithm. We performed experiments on 16 chemical structure data sets and have compared our methods to other major graph kernels. The experimental results demonstrate excellent performance of our method.
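The simplest pattern-based graph kernel this idea builds on maps each graph to a vector of frequent-pattern occurrence counts and takes a dot product; GPM itself refines this with node diffusion and set matching. The pattern names below are made up for illustration.

```python
# Frequent-pattern feature-vector kernel: embed each graph as
# {pattern_id: occurrence count} and compare graphs by dot product.
# A baseline sketch of the embedding step only, not the GPM kernel.

from math import sqrt

def pattern_kernel(a, b):
    """a, b: {pattern_id: occurrence count} embeddings of two graphs."""
    return sum(c * b.get(p, 0) for p, c in a.items())

def normalized_kernel(a, b):
    """Cosine-normalized version, as commonly used with SVMs."""
    return pattern_kernel(a, b) / sqrt(pattern_kernel(a, a) *
                                       pattern_kernel(b, b))

# Hypothetical frequent substructures of two chemical graphs.
g1 = {"benzene_ring": 2, "carbonyl": 1}
g2 = {"benzene_ring": 1, "hydroxyl": 3}
```

Since this is a valid positive semi-definite kernel, the resulting Gram matrix can be fed directly to a kernel classifier such as an SVM.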

Patent
06 Nov 2008
TL;DR: A trace file providing details of an execution of a software application experiencing unexpected behavior can be identified as discussed by the authors, which can be converted into a graph structure (e.g., Function Call Graph), which details functions called during the execution, a calling relationship, and errors encountered during execution.
Abstract: A trace file providing details of an execution of a software application experiencing unexpected behavior can be identified. The trace file can be converted into a graph structure (e.g., Function Call Graph), which details functions called during the execution, a calling relationship, and errors encountered during the execution. The converted graph structure can be programmatically matched against a set of stored graph structures to determine matched results. Each stored graph structure can correspond to unique record of a symptom database. Each unique record can be associated with a determined problem. The matched results can be provided as possible problems causing the unexpected behavior of the software application.
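The matching step — compare a converted call graph against stored symptom graphs and return the known problem with the best match — can be sketched with caller/callee edge sets and Jaccard overlap. The trace format, edge representation, and similarity measure are illustrative assumptions; the patent does not specify them here.

```python
# Hedged sketch of call-graph matching against a symptom database:
# represent each trace as a set of caller->callee edges and pick the
# stored symptom graph with the highest edge-set Jaccard overlap.
# Names and the matching rule are illustrative, not the patented method.

def call_edges(trace_lines):
    """trace_lines like 'main->parse'; return the set of call edges."""
    return {tuple(line.split("->")) for line in trace_lines}

def best_match(trace_lines, symptom_db):
    """symptom_db: {problem description: edge set}; return best problem."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    g = call_edges(trace_lines)
    return max(symptom_db, key=lambda name: jaccard(g, symptom_db[name]))

symptom_db = {
    "null-deref in parser": {("main", "parse"), ("parse", "read")},
    "timeout in network":   {("main", "connect"), ("connect", "retry")},
}
trace = ["main->parse", "parse->read", "read->close"]
problem = best_match(trace, symptom_db)
```

The returned record's associated problem description is then surfaced as the likely cause of the unexpected behavior.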