
Showing papers on "Graph database published in 2008"


Proceedings ArticleDOI
09 Jun 2008
TL;DR: MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
Abstract: Freebase is a practical, scalable tuple database used to structure general human knowledge. The data in Freebase is collaboratively created, structured, and maintained. Freebase currently contains more than 125,000,000 tuples, more than 4000 types, and more than 7000 properties. Public read/write access to Freebase is allowed through an HTTP-based graph-query API using the Metaweb Query Language (MQL) as a data query and manipulation language. MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.

4,813 citations
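The abstract above describes MQL's query-by-example style. As a hedged illustration (the Freebase API has since been retired, and the band name below is just an invented example), an MQL query is a JSON template in which empty values mark the fields the server should fill in:

```python
import json

# Illustrative sketch of an MQL query. MQL queries are JSON objects;
# empty values ([] or None) mark the fields the server should fill in.
# The band name here is a hypothetical example.
query = [{
    "type": "/music/artist",   # match tuples of this type
    "name": "The Beatles",     # constrain the 'name' property
    "album": [],               # ask the server to list all albums
}]

# Queries were wrapped in an envelope and sent over HTTP, roughly:
envelope = {"query": query}
payload = json.dumps(envelope)
print(payload)
```

The server would return the same shape with the blanks filled in; endpoint details are omitted here.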


Journal ArticleDOI
TL;DR: The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
Abstract: Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.

1,669 citations


Book ChapterDOI
04 Dec 2008
TL;DR: A repository of graph data sets and corresponding benchmarks, covering a wide spectrum of different applications is introduced, to make the different approaches in graph based machine learning better comparable.
Abstract: In recent years the use of graph based representation has gained popularity in pattern recognition and machine learning. As a matter of fact, object representation by means of graphs has a number of advantages over feature vectors. Therefore, various algorithms for graph based machine learning have been proposed in the literature. However, in contrast with the emerging interest in graph based representation, a lack of standardized graph data sets for benchmarking can be observed. Common practice is that researchers use their own data sets, and this behavior hampers the objective evaluation of the proposed methods. In order to make the different approaches in graph based machine learning better comparable, the present paper aims at introducing a repository of graph data sets and corresponding benchmarks, covering a wide spectrum of different applications.

484 citations


Proceedings ArticleDOI
09 Jun 2008
TL;DR: A graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs is presented and access methods of the selection operator are investigated.
Abstract: With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.

433 citations
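To make the generalized selection operator concrete, here is a minimal, hypothetical sketch (not the paper's implementation): selection over a collection of attributed graphs keeps those containing a label-preserving match of a small pattern. Brute force is tolerable here only because the pattern is tiny; subgraph isomorphism is NP-complete in general, which is exactly why the paper invests in pruning techniques.

```python
from itertools import permutations

def matches(pattern_nodes, pattern_edges, nodes, edges):
    """Return True if the pattern occurs as a label-preserving subgraph."""
    for assign in permutations(nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, assign))
        # node labels must agree
        if any(nodes[mapping[p]] != lbl for p, lbl in pattern_nodes.items()):
            continue
        # every pattern edge must be present in the data graph
        if all((mapping[u], mapping[v]) in edges for u, v in pattern_edges):
            return True
    return False

# Pattern: an 'A' node pointing to a 'B' node.
p_nodes = {"x": "A", "y": "B"}
p_edges = [("x", "y")]

database = [
    ({1: "A", 2: "B"}, {(1, 2)}),   # matches
    ({1: "A", 2: "B"}, {(2, 1)}),   # wrong edge direction
]
selected = [g for g in database if matches(p_nodes, p_edges, *g)]
print(len(selected))  # 1
```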


Journal ArticleDOI
TL;DR: This annotated bibliography gives an elementary classification of problems and results related to graph searching and provides a source of bibliographical references on this field.

362 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper introduces the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior, and proposes a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users.
Abstract: Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search engine results. Mining the wealth of information available in the query logs has many important applications including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same "search mission". Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.

346 citations
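The construction can be sketched on a toy, pre-sessionized log (the real method also mines time gaps and textual features to decide whether consecutive queries belong to the same mission; the queries below are invented):

```python
from collections import Counter

# Toy query log, already split into sessions.
sessions = [
    ["cheap flights", "cheap flights rome", "rome hotels"],
    ["cheap flights", "cheap flights rome"],
]

edge_counts = Counter()
out_counts = Counter()
for session in sessions:
    # a directed edge for each consecutive query pair
    for qi, qj in zip(session, session[1:]):
        edge_counts[(qi, qj)] += 1
        out_counts[qi] += 1

# Edge weight = empirical probability of moving from qi to qj.
weights = {e: c / out_counts[e[0]] for e, c in edge_counts.items()}
print(weights[("cheap flights", "cheap flights rome")])  # 1.0
```

A path's likelihood is then the product of the weights along it, matching the abstract's reading of paths as search behaviors.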


Proceedings ArticleDOI
09 Jun 2008
TL;DR: The first comprehensive study of a general mining method aiming to find the most significant patterns directly; graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract: With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which can cause all frequency-centric graph mining algorithms to fail. In this paper, we give the first comprehensive study of a general mining method aiming to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

331 citations


Proceedings ArticleDOI
07 Apr 2008
TL;DR: A novel indexing method that incorporates graph structural information in a hybrid index structure that achieves high pruning power and the index size scales linearly with the database size is proposed.
Abstract: Large graph datasets are common in many emerging database applications, and most notably in large-scale scientific applications. To fully exploit the wealth of information encoded in graphs, effective and efficient graph matching tools are critical. Due to the noisy and incomplete nature of real graph datasets, approximate, rather than exact, graph matching is required. Furthermore, many modern applications need to query large graphs, each of which has hundreds to thousands of nodes and edges. This paper presents a novel technique for approximate matching of large graph queries. We propose a novel indexing method that incorporates graph structural information in a hybrid index structure. This indexing technique achieves high pruning power and the index size scales linearly with the database size. In addition, we propose an innovative matching paradigm to query large graphs. This technique distinguishes nodes by their importance in the graph structure. The matching algorithm first matches the important nodes of a query and then progressively extends these matches. Through experiments on several real datasets, this paper demonstrates the effectiveness and efficiency of the proposed method.

307 citations


Proceedings ArticleDOI
11 Feb 2008
TL;DR: This work presents a compression scheme for the web graph specifically designed to accommodate community queries and other random access algorithms on link servers, and uses a frequent pattern mining approach to extract meaningful connectivity formations.
Abstract: A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present a compression scheme for the web graph specifically designed to accommodate community queries and other random access algorithms on link servers. We use a frequent pattern mining approach to extract meaningful connectivity formations. Our Virtual Node Miner achieves graph compression without sacrificing random access by generating virtual nodes from frequent itemsets in vertex adjacency lists. The mining phase guarantees scalability by bounding the pattern mining complexity to O(E log E). We facilitate global mining, relaxing the requirement for the graph to be sorted by URL, enabling discovery for both inter-domain as well as intra-domain patterns. As a consequence, the approach allows incremental graph updates. Further, it not only facilitates but can also expedite graph computations such as PageRank and local random walks by implementing them directly on the compressed graph. We demonstrate the effectiveness of the proposed approach on several publicly available large web graph data sets. Experimental results indicate that the proposed algorithm achieves a 10- to 15-fold compression on most real-world web graph data sets.

258 citations
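The virtual-node idea can be illustrated on a toy adjacency list (the shared pattern here is hand-picked as a stand-in for the frequent-itemset mining step, and all node names are hypothetical): when several vertices share a set of out-links, a virtual node that points to those targets replaces the repeated edges.

```python
# Toy web graph: adjacency sets of out-links. u1 and u2 share {x, y, z}.
adj = {
    "u1": {"x", "y", "z"},
    "u2": {"x", "y", "z"},
    "u3": {"x", "y", "w"},
}

# Hand-picked frequent pattern (a stand-in for real itemset mining).
pattern = {"x", "y", "z"}
sharers = [u for u, targets in adj.items() if pattern <= targets]

if len(sharers) > 1:
    adj["v*"] = set(pattern)  # the virtual node carries the shared links
    for u in sharers:
        # re-route each sharer through the virtual node
        adj[u] = (adj[u] - pattern) | {"v*"}

edges = sum(len(t) for t in adj.values())
print(edges)  # 8, down from the original 9 edges
```

Each extra sharer of the same pattern would save another |pattern| - 1 edges, which is where the 10- to 15-fold compression on real web graphs comes from.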


Proceedings ArticleDOI
20 Jul 2008
TL;DR: Experimental results show that BrowseRank indeed outperforms the baseline methods such as PageRank and TrustRank in several tasks.
Abstract: This paper proposes a new method for computing page importance, referred to as BrowseRank. The conventional approach to compute page importance is to exploit the link graph of the web and to build a model based on that graph. For instance, PageRank is such an algorithm, which employs a discrete-time Markov process as the model. Unfortunately, the link graph might be incomplete and inaccurate with respect to data for determining page importance, because links can be easily added and deleted by web content creators. In this paper, we propose computing page importance by using a 'user browsing graph' created from user behavior data. In this graph, vertices represent pages and directed edges represent transitions between pages in the users' web browsing history. Furthermore, the lengths of staying time spent on the pages by users are also included. The user browsing graph is more reliable than the link graph for inferring page importance. This paper further proposes using the continuous-time Markov process on the user browsing graph as a model and computing the stationary probability distribution of the process as page importance. An efficient algorithm for this computation has also been devised. In this way, we can leverage hundreds of millions of users' implicit voting on page importance. Experimental results show that BrowseRank indeed outperforms the baseline methods such as PageRank and TrustRank in several tasks.

200 citations
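A drastically simplified sketch of the computation (all numbers invented; the actual system estimates these quantities from hundreds of millions of users' behavior data): power-iterate the embedded jump chain to its stationary distribution, then weight each page by its mean staying time, as a continuous-time Markov process prescribes.

```python
# Toy user browsing graph: empirical transition probabilities and
# mean staying times (seconds), both invented for illustration.
pages = ["a", "b", "c"]
trans = {
    "a": {"b": 1.0},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 1.0},
}
stay = {"a": 10.0, "b": 2.0, "c": 4.0}

# Power iteration on the embedded (jump) chain.
pi = {p: 1.0 / len(pages) for p in pages}
for _ in range(200):
    nxt = {p: 0.0 for p in pages}
    for p, prob in pi.items():
        for q, w in trans[p].items():
            nxt[q] += prob * w
    pi = nxt

# Continuous-time stationary distribution: weight by staying time.
score = {p: pi[p] * stay[p] for p in pages}
total = sum(score.values())
rank = {p: s / total for p, s in score.items()}
print(max(rank, key=rank.get))  # 'a'
```

Page 'a' wins despite moderate traffic because users linger on it, which is precisely the signal BrowseRank adds over link-only methods like PageRank.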


Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper introduces a novel graph structure, referred to as path-tree, to help labeling very large graphs, which is a spanning subgraph of G in a tree shape and demonstrates both analytically and empirically the effectiveness of the new approaches.
Abstract: Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention as reachability queries are not only common on graph databases, but they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned certain labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels using existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
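Interval (tree-cover) labeling, one of the baseline schemes the paper builds on, is easy to sketch for a tree: a DFS assigns each vertex an interval, and u reaches v exactly when u's interval contains v's (the tiny tree below is invented for illustration).

```python
# Toy rooted tree as an adjacency map.
tree = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}

labels = {}
counter = 0

def dfs(v):
    """Assign each vertex the half-open interval [start, end) of its subtree."""
    global counter
    start = counter
    counter += 1
    for child in tree[v]:
        dfs(child)
    labels[v] = (start, counter)

dfs("r")

def reaches(u, v):
    """In a tree, u reaches v iff u's interval contains v's."""
    su, eu = labels[u]
    sv, ev = labels[v]
    return su <= sv and ev <= eu

print(reaches("r", "c"), reaches("b", "c"))  # True False
```

Each reachability test is O(1) after one linear-time DFS; the difficulty the paper tackles is extending such compact labels from trees to general graphs with millions of vertices.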

Journal ArticleDOI
TL;DR: GrouseFlocks provides a simple set of operations so that users can create and modify their graph hierarchies based on selections, and provides feedback to the user within seconds, allowing interactive exploration of this graph hierarchy space.
Abstract: Several previous systems allow users to interactively explore a large input graph through cuts of a superimposed hierarchy. This hierarchy is often created using clustering algorithms or topological features present in the graph. However, many graphs have domain-specific attributes associated with the nodes and edges, which could be used to create many possible hierarchies providing unique views of the input graph. GrouseFlocks is a system for the exploration of this graph hierarchy space. By allowing users to see several different possible hierarchies on the same graph, the system helps users investigate graph hierarchy space instead of a single fixed hierarchy. GrouseFlocks provides a simple set of operations so that users can create and modify their graph hierarchies based on selections. These selections can be made manually or based on patterns in the attribute data provided with the graph. It provides feedback to the user within seconds, allowing interactive exploration of this space.

Proceedings ArticleDOI
15 Dec 2008
TL;DR: Novel Metropolis algorithms for sampling a 'representative' small subgraph from the original large graph are presented, with 'representative' describing the requirement that the sample shall preserve crucial graph properties of the original graph.
Abstract: While data mining in chemoinformatics studied graph data with dozens of nodes, systems biology and the Internet are now generating graph data with thousands and millions of nodes. Hence data mining faces the algorithmic challenge of coping with this significant increase in graph size: Classic algorithms for data analysis are often too expensive and too slow on large graphs. While one strategy to overcome this problem is to design novel efficient algorithms, the other is to 'reduce' the size of the large graph by sampling. This is the scope of this paper: We will present novel Metropolis algorithms for sampling a 'representative' small subgraph from the original large graph, with 'representative' describing the requirement that the sample shall preserve crucial graph properties of the original graph. In our experiments, we improve over the pioneering work of Leskovec and Faloutsos (KDD 2006), by producing representative subgraph samples that are both smaller and of higher quality than those produced by other methods from the literature.
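A schematic Metropolis sampler under a deliberately simple objective (matching the full graph's mean degree; the paper's representativeness criteria are richer, and the toy graph below is invented): propose swapping one sampled node for an outside node, always accept improvements, and accept worse moves with Boltzmann probability.

```python
import math
import random

random.seed(0)

# Toy graph: a 12-node ring with two chords from node 0.
graph = {i: set() for i in range(12)}
for i in range(12):
    graph[i].add((i + 1) % 12)
    graph[(i + 1) % 12].add(i)
graph[0] |= {3, 6}
graph[3].add(0)
graph[6].add(0)

def mean_degree(nodes):
    """Mean degree of the subgraph induced by `nodes`."""
    return sum(len(graph[n] & nodes) for n in nodes) / len(nodes)

target = sum(len(v) for v in graph.values()) / len(graph)

k, temp = 5, 0.5
sample = set(random.sample(sorted(graph), k))
cost = abs(mean_degree(sample) - target)
for _ in range(300):
    out = random.choice(sorted(sample))
    inn = random.choice(sorted(set(graph) - sample))
    proposal = (sample - {out}) | {inn}
    new_cost = abs(mean_degree(proposal) - target)
    # Metropolis acceptance: better moves always, worse moves sometimes.
    if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
        sample, cost = proposal, new_cost

print(len(sample))  # 5
```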

Book ChapterDOI
26 Oct 2008
TL;DR: This paper shows that nSPARQL is expressive enough to answer queries considering the semantics of the RDFS vocabulary by directly traversing the input graph, and studies the expressiveness of the combination of nested regular expressions and SPARQL operators.
Abstract: Navigational features have been largely recognized as fundamental for graph database query languages. This fact has motivated several authors to propose RDF query languages with navigational capabilities. In particular, we have argued in a previous paper that nested regular expressions are appropriate to navigate RDF data, and we have proposed the nSPARQL query language for RDF, that uses nested regular expressions as building blocks. In this paper, we study some of the fundamental properties of nSPARQL concerning expressiveness and complexity of evaluation. Regarding expressiveness, we show that nSPARQL is expressive enough to answer queries considering the semantics of the RDFS vocabulary by directly traversing the input graph. We also show that nesting is necessary to obtain this last result, and we study the expressiveness of the combination of nested regular expressions and SPARQL operators. Regarding complexity of evaluation, we prove that the evaluation of a nested regular expression E over an RDF graph G can be computed in time O(|G|·|E|).
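Plain regular-path navigation, the building block that nested regular expressions generalize, can be sketched on a toy triple graph (predicates and nodes invented; nSPARQL additionally allows whole expressions to nest inside predicate positions).

```python
from collections import deque

# Toy RDF-style graph as subject-predicate-object triples.
triples = [
    ("a", "next", "b"),
    ("b", "next", "c"),
    ("c", "other", "d"),
]

def closure(start, predicate):
    """Nodes reachable from `start` via zero or more `predicate` edges,
    i.e. evaluating the regular path expression predicate* by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

print(sorted(closure("a", "next")))  # ['a', 'b', 'c']
```

The BFS touches each triple at most once per frontier node, in line with the O(|G|·|E|) bound the paper proves for full nested expressions.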

Journal ArticleDOI
01 Aug 2008
TL;DR: This paper proposes a graph representation technique that combines a condensed version of the graph (the "supernode graph") which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation.
Abstract: Keyword search on graph structured data has attracted a lot of attention in recent years. Graphs are a natural "lowest common denominator" representation which can combine relational, XML and HTML data. Responses to keyword queries are usually modeled as trees that connect nodes matching the keywords. In this paper we address the problem of keyword search on graphs that may be significantly larger than memory. We propose a graph representation technique that combines a condensed version of the graph (the "supernode graph") which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation. We propose two alternative approaches which extend existing search algorithms to exploit multi-granular graphs; both approaches attempt to minimize IO by directing search towards areas of the graph that are likely to give good results. We compare our algorithms with a virtual memory approach on several real data sets. Our experimental results show significant benefits in terms of reduction in IO due to our algorithms.

Book
01 Jan 2008
TL;DR: A Course on the Web Graph is the first mathematically rigorous textbook discussing both models of the web graph and algorithms for searching the web, and is based on a graduate course taught at the AARMS 2006 Summer School at Dalhousie University.
Abstract: A Course on the Web Graph provides a comprehensive introduction to state-of-the-art research on the applications of graph theory to real-world networks such as the web graph. It is the first mathematically rigorous textbook discussing both models of the web graph and algorithms for searching the web. After introducing key tools required for the study of web graph mathematics, an overview is given of the most widely studied models for the web graph. A discussion of popular web search algorithms, e.g. PageRank, is followed by additional topics, such as applications of infinite graph theory to the web graph, spectral properties of power law graphs, domination in the web graph, and the spread of viruses in networks. The book is based on a graduate course taught at the AARMS 2006 Summer School at Dalhousie University. As such it is self-contained and includes over 100 exercises. The reader of the book will gain a working knowledge of current research in graph theory and its modern applications. In addition, the reader will learn first-hand about models of the web, and the mathematics underlying modern search engines.

Journal ArticleDOI
TL;DR: A novel graph-based approach toward network forensics analysis is developed, built on an evidence graph model that facilitates evidence presentation and automated reasoning, and a hierarchical reasoning framework that consists of two levels is proposed.
Abstract: In this article we develop a novel graph-based approach toward network forensics analysis. Central to our approach is the evidence graph model that facilitates evidence presentation and automated reasoning. Based on the evidence graph, we propose a hierarchical reasoning framework that consists of two levels. Local reasoning aims to infer the functional states of network entities from local observations. Global reasoning aims to identify important entities from the graph structure and extract groups of densely correlated participants in the attack scenario. This article also presents a framework for interactive hypothesis testing, which helps to identify the attacker's nonexplicit attack activities from secondary evidence. We developed a prototype system that implements the techniques discussed. Experimental results on various attack datasets demonstrate that our analysis mechanism achieves good coverage and accuracy in attack group and scenario extraction with less dependence on hard-coded expert knowledge.

Proceedings ArticleDOI
21 Apr 2008
TL;DR: This paper investigates the underlying features of Chinese recipes, and based on workflow-like cooking procedures, it models recipes as graphs, and proposes a novel similarity measurement based on the frequent patterns, and devises an effective filtering algorithm to prune unrelated data so as to support efficient on-line searching.
Abstract: Improving the precision of information retrieval has been a challenging issue on Chinese Web. As exemplified by Chinese recipes on the Web, it is not easy/natural for people to use keywords (e.g. recipe names) to search recipes, since the names can be literally so abstract that they do not bear much, if any, information on the underlying ingredients or cooking methods. In this paper, we investigate the underlying features of Chinese recipes, and based on workflow-like cooking procedures, we model recipes as graphs. We further propose a novel similarity measurement based on the frequent patterns, and devise an effective filtering algorithm to prune unrelated data so as to support efficient on-line searching. Benefiting from the characteristics of graphs, frequent common patterns can be mined from a cooking graph database. So in our prototype system called RecipeView, we extend the subgraph mining algorithm FSG to cooking graphs and combine it with our proposed similarity measurement, resulting in an approach that well caters for specific users' needs. Our initial experimental studies show that the filtering algorithm can efficiently prune unrelated cooking graphs without affecting the retrieval performance and the similarity measurement achieves relatively higher precision/recall than its counterparts.

Patent
Robert J. Breeds1, Philip R. Taunton1
01 Oct 2008
TL;DR: In this article, a method and system for generating views of data on a user interface in a computing environment is presented, which involves: at a server, generating coordinate data for a graph representing multiply connected objects; transmitting the coordinate data to a client as lightweight object data; at the client, based on the lightweight object data, rendering an interactive dynamic graph view of the multiply connected objects on a user interface; and synchronizing the list view and the graph view.
Abstract: A method and system for generating views of data on a user interface in a computing environment, is provided. One implementation involves: at a server, generating coordinate data for a graph representing multiply connected objects; transmitting the coordinate data to a client as lightweight object data; at the client, based on the lightweight object data, rendering an interactive dynamic graph view of the multiply connected objects on a user interface; at the client, based on the lightweight object data, rendering an interactive dynamic list view of the multiply connected objects on a user interface; and synchronizing the list view and the graph view. The order of objects in the list view reflects the order of objects in the graph view per a breadth-first traversal starting at a root object.
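The ordering rule in the claim (list-view order follows a breadth-first traversal starting at a root object) can be sketched as follows, on an invented object graph:

```python
from collections import deque

# Toy object graph: each object lists the objects it connects to.
graph = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}

def bfs_order(root):
    """Breadth-first traversal order, i.e. the claimed list-view order."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

print(bfs_order("root"))  # ['root', 'a', 'b', 'c']
```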

Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper investigates the problem of generating queries that satisfy cardinality constraints on intermediate subexpressions when executed on a given test database, and develops a practical algorithm which utilizes sampling and space pruning techniques to quickly generate test queries that have desired properties.
Abstract: Tools for generating test queries for databases do not explicitly take into account the actual data in the database. As a consequence, such tools cannot guarantee suitable coverage of test cases commonly required for database testing. In this paper, we investigate the problem of generating queries that satisfy cardinality constraints on intermediate subexpressions when executed on a given test database. Such queries are required to test the performance of a database system under different operating conditions. We formally analyze this problem, quantify its difficulty and follow up this analysis with a description of a practical algorithm which utilizes sampling and space pruning techniques to quickly generate test queries that have desired properties. We present the results of an experimental evaluation of our approach as implemented in an open source data manager, demonstrating the utility of our proposal.

Patent
22 Mar 2008
TL;DR: In this paper, the authors present a system, method and computer program product for executing a query on linked data sources and generate an instance graph expressing relationships between objects in the linked data source and receive a query including at least first and second search terms.
Abstract: A system, method and computer program product for executing a query on linked data sources. Embodiments of the invention generate an instance graph expressing relationships between objects in the linked data sources and receive a query including at least first and second search terms. The first search term is then executed on the instance graph and a summary graph is generated using the results of the executing step. A second search term is then executed on the summary graph.

Patent
28 Feb 2008
TL;DR: In this paper, a service dependency analyzer is used to determine dependencies among components of a network, the components including services and hardware components, and the inference graph reflects cross-layer components including the services and the hardware components.
Abstract: Constructing an inference graph relates to the creation of a graph that reflects dependencies within a network. In an example embodiment, a method includes determining dependencies among components of a network and constructing an inference graph for the network responsive to the dependencies. The components of the network include services and hardware components, and the inference graph reflects cross-layer components including the services and the hardware components. In another example embodiment, a system includes a service dependency analyzer and an inference graph constructor. The service dependency analyzer is to determine dependencies among components of a network, the components including services and hardware components. The inference graph constructor is to construct an inference graph for the network responsive to the dependencies, the inference graph reflecting cross-layer components including the services and the hardware components.

Patent
09 Mar 2008
TL;DR: In this article, a general object graph is described for sharing structured data between users and between applications and for social networking between the users, an associated graphical user interface and application to a virtual file system with an associated authorization scheme.
Abstract: A General Object Graph is described arranged for sharing structured data between users and between applications and for social networking between the users, an associated graphical user interface and application to a virtual file system with an associated authorization scheme. A distributed version of the General Object Graph is also presented.

Proceedings ArticleDOI
15 Dec 2008
TL;DR: Graphite is a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches, enabling it to scale well with the graph database size.
Abstract: We present Graphite, a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches. For example, in a social network where a person's occupation is an attribute, the user can draw a 'star' query for "finding a CEO who has interacted with a Secretary, a Manager, and an Accountant, or a structure very similar to this". Graphite uses the G-Ray algorithm to run the query against a user-chosen data graph, gaining all of its benefits, namely its high speed, scalability, and its ability to find both exact and near matches. Therefore, for the example above, Graphite tolerates indirect paths between, say, the CEO and the Accountant, when no direct path exists. Graphite uses fast algorithms to estimate node proximities when finding matches, enabling it to scale well with the graph database size. We demonstrate Graphite's usage and benefits using the DBLP author-publication graph, which consists of 356 K nodes and 1.9 M edges. A demo video of Graphite can be downloaded at http://www.cs.cmu.edu/~dchau/graphite/graphite.mov.

Proceedings ArticleDOI
26 Aug 2008
TL;DR: This paper proposes a novel graph-based learning framework in the setting of semi-supervised learning with multi-label and applies it to video annotation and reports superior performance compared to key existing approaches over the TRECVID 2006 corpus.
Abstract: Conventional graph-based semi-supervised learning methods predominantly focus on single label problem. However, it is more popular in real-world applications that an example is associated with multiple labels simultaneously. In this paper, we propose a novel graph-based learning framework in the setting of semi-supervised learning with multi-label. The proposed approach is characterized by simultaneously exploiting the inherent correlations among multiple labels and the label consistency over the graph. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches over the TRECVID 2006 corpus.
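A generic sketch of graph-based label propagation with two labels (not the paper's exact framework, which additionally exploits correlations among the labels; the toy graph below is invented): unlabeled nodes repeatedly average their neighbors' label scores while labeled nodes stay clamped.

```python
# Toy undirected path graph 0 - 1 - 2 - 3; nodes 0 and 3 carry labels.
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: [1.0, 0.0], 3: [0.0, 1.0]}  # per-node score vector, one entry per label

scores = {n: labels.get(n, [0.5, 0.5])[:] for n in edges}
for _ in range(100):
    for n in edges:
        if n in labels:
            continue  # clamp labeled nodes
        nbrs = edges[n]
        # average the neighbors' scores, separately for each label
        scores[n] = [
            sum(scores[m][k] for m in nbrs) / len(nbrs)
            for k in range(2)
        ]

# Node 1 sits closer to node 0, so label 0 should dominate there.
print(scores[1][0] > scores[1][1])  # True
```

At the fixed point node 1 converges to scores [2/3, 1/3], reflecting its graph distance to each labeled node; multi-label settings simply run this with one score column per label, as above.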

Proceedings ArticleDOI
09 Jun 2008
TL;DR: G-KS, a novel method for selecting the top-K candidates based on their potential to contain results for a given query, is proposed, which outperforms the current state-of-the-art technique on all aspects, including precision, recall, efficiency, space overhead and flexibility of accommodating different semantics.
Abstract: While database management systems offer a comprehensive solution to data storage, they require deep knowledge of the schema, as well as the data manipulation language, in order to perform effective retrieval. Since these requirements pose a problem to lay or occasional users, several methods incorporate keyword search (KS) into relational databases. However, most of the existing techniques focus on querying a single DBMS. On the other hand, the proliferation of distributed databases in several conventional and emerging applications necessitates the support for keyword-based data sharing and querying over multiple DBMSs. In order to avoid the high cost of searching in numerous, potentially irrelevant, databases in such systems, we propose G-KS, a novel method for selecting the top-K candidates based on their potential to contain results for a given query. G-KS summarizes each database by a keyword relationship graph, where nodes represent terms and edges describe relationships between them. Keyword relationship graphs are utilized for computing the similarity between each database and a KS query, so that, during query processing, only the most promising databases are searched. An extensive experimental evaluation demonstrates that G-KS outperforms the current state-of-the-art technique on all aspects, including precision, recall, efficiency, space overhead and flexibility of accommodating different semantics.
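The core idea — summarize a database as a term graph and score queries by the strength of the relationships among their keywords — can be sketched as follows. The co-occurrence-count edge weights and the pairwise scoring rule here are illustrative assumptions, not the G-KS definitions.

```python
# Hedged sketch of database selection via a keyword relationship graph:
# nodes are terms, weighted edges count how often two terms co-occur in a
# tuple. A query scores a database by the total weight of edges between
# its keywords. Not the actual G-KS scoring function.

from itertools import combinations
from collections import Counter

def build_krg(tuples):
    """Each tuple is a set of terms; count pairwise co-occurrences."""
    edges = Counter()
    for t in tuples:
        for a, b in combinations(sorted(t), 2):
            edges[(a, b)] += 1
    return edges

def score(krg, query_terms):
    """Sum the edge weights between every pair of query keywords."""
    return sum(krg.get(pair, 0)
               for pair in combinations(sorted(set(query_terms)), 2))

db1 = build_krg([{"jazz", "vinyl"}, {"jazz", "vinyl", "mono"}])
db2 = build_krg([{"jazz", "cooking"}, {"vinyl", "sports"}])
# For the query "jazz vinyl", db1 relates the terms strongly and db2 not
# at all, so db1 would be searched first.
```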

Patent
20 Nov 2008
TL;DR: In this paper, a method of ascribing scores to web documents and search queries is proposed, which generates a hyperlink-click graph by taking the union of the hyperlink and click graphs, and then takes a random walk on the graph, and associates the transition probabilities resulting from the random walk with scores for each of the documents and queries.
Abstract: A method of ascribing scores to web documents and search queries generates a hyperlink-click graph by taking the union of the hyperlink and click graphs, takes a random walk on the hyperlink-click graph, and associates the transition probabilities resulting from the random walk with scores for each of the documents and search queries.
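The recipe in the claim — union the hyperlink graph (document-to-document edges) with the click graph (query-to-document edges), run a random walk, and read off stationary probabilities as scores — can be sketched directly. The damping factor and the symmetric treatment of click edges are assumptions, not details from the patent.

```python
# Stationary distribution of a random walk on the union of hyperlink and
# click edges; the resulting probabilities score documents and queries
# jointly. Illustrative sketch of the patented recipe, not its text.

def stationary(edges, damping=0.85, iters=100):
    nodes = sorted({n for e in edges for n in e})
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    p = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling node: jump anywhere
            for t in targets:
                nxt[t] += damping * p[n] / len(targets)
        p = nxt
    return p

hyperlink = [("d1", "d2"), ("d2", "d1")]       # doc -> doc links
clicks = [("q1", "d1"), ("d1", "q1")]          # click edges taken both ways
scores = stationary(hyperlink + clicks)        # union of the two graphs
```

Here `d1` collects walk mass from both the link structure and the click structure, so it scores above the query node and the peripheral document.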

Proceedings Article
13 Jul 2008
TL;DR: It is argued that utilizing the connectivity of a query-URL bipartite graph to recommend relevant queries can significantly improve the accuracy and effectiveness of the conventional query-term based query recommendation systems.
Abstract: Query recommendation is considered an effective assistant in enhancing keyword based queries in search engines and Web search software. The conventional approach to query recommendation has focused on query-term based analysis over the user access logs. In this paper, we argue that utilizing the connectivity of a query-URL bipartite graph to recommend relevant queries can significantly improve the accuracy and effectiveness of the conventional query-term based query recommendation systems. We refer to the Query-URL Bipartite based query reCommendation approach as QUBIC. The QUBIC approach has two unique characteristics. First, instead of operating on the original bipartite graph directly using a biclique based approach or graph clustering, we extract an affinity graph of queries from the initial query-URL bipartite graph. The affinity graph consists of only queries as its vertices and its edges are weighted according to a query-URL vector based similarity (distance) measure. By utilizing the query affinity graph, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries. We devise a novel rank mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering. We compare our proposed ranking algorithm with both naive ranking that uses the query-URL similarity measure directly, and the single-linkage based ranking method. In addition, we make it possible for users to interactively participate in the query recommendation process, to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users' information needs, allowing the lists of related queries to be changed from user to user and query to query, thus personalizing the query recommendation on demand. The experimental results from two query collections demonstrate the effectiveness and feasibility of our approach.
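The first step described above — turning the query-URL bipartite graph into a query affinity graph via query-URL vector similarity — can be sketched with click-count vectors and cosine similarity. The threshold and data layout are assumptions; the hierarchical-clustering rank mechanism is omitted.

```python
# Sketch of extracting a query affinity graph from a query-URL bipartite
# graph: each query becomes a vector of clicked-URL counts, and pairs of
# queries get an edge weighted by cosine similarity. An assumed minimal
# reading of QUBIC's first step, not its implementation.

from math import sqrt

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity_graph(clicks, threshold=0.1):
    """clicks: {query: {url: count}} -> weighted edges between queries."""
    qs = sorted(clicks)
    return {(a, b): w
            for i, a in enumerate(qs) for b in qs[i + 1:]
            if (w := cosine(clicks[a], clicks[b])) > threshold}

clicks = {
    "jaguar car":   {"jaguar.com": 5, "cars.example": 3},
    "jaguar speed": {"jaguar.com": 4, "cars.example": 1},
    "jaguar cat":   {"zoo.example": 6},
}
g = affinity_graph(clicks)
```

Queries landing on the same URLs ("jaguar car", "jaguar speed") get a strong edge; "jaguar cat" stays disconnected, which is exactly the implicit topical relatedness the affinity graph is meant to capture.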

Proceedings ArticleDOI
08 Dec 2008
TL;DR: A novel technique called Graph Pattern Matching kernel (GPM) is demonstrated, which leverages existing frequent pattern discovery methods and explores their application to kernel classifiers (e.g. support vector machine) for graph classification.
Abstract: Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge. In this paper, we demonstrate a novel technique called Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and explore their application to kernel classifiers (e.g. support vector machine) for graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the database and use a diffusion process to label nodes in the graphs. Finally the kernel is computed using a set matching algorithm. We performed experiments on 16 chemical structure data sets and have compared our methods to other major graph kernels. The experimental results demonstrate excellent performance of our method.
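The simplest pattern-based graph kernel this idea builds on maps each graph to a vector of frequent-pattern occurrence counts and takes a dot product; GPM itself refines this with node diffusion and set matching. The pattern names below are made up for illustration.

```python
# Frequent-pattern feature-vector kernel: embed each graph as
# {pattern_id: occurrence count} and compare graphs by dot product.
# A baseline sketch of the embedding step only, not the GPM kernel.

from math import sqrt

def pattern_kernel(a, b):
    """a, b: {pattern_id: occurrence count} embeddings of two graphs."""
    return sum(c * b.get(p, 0) for p, c in a.items())

def normalized_kernel(a, b):
    """Cosine-normalized version, as commonly used with SVMs."""
    return pattern_kernel(a, b) / sqrt(pattern_kernel(a, a) *
                                       pattern_kernel(b, b))

# Hypothetical frequent substructures of two chemical graphs.
g1 = {"benzene_ring": 2, "carbonyl": 1}
g2 = {"benzene_ring": 1, "hydroxyl": 3}
```

Since this is a valid positive semi-definite kernel, the resulting Gram matrix can be fed directly to a kernel classifier such as an SVM.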

Patent
06 Nov 2008
TL;DR: A trace file providing details of an execution of a software application experiencing unexpected behavior can be identified as discussed by the authors, which can be converted into a graph structure (e.g., Function Call Graph), which details functions called during the execution, a calling relationship, and errors encountered during execution.
Abstract: A trace file providing details of an execution of a software application experiencing unexpected behavior can be identified. The trace file can be converted into a graph structure (e.g., Function Call Graph), which details functions called during the execution, a calling relationship, and errors encountered during the execution. The converted graph structure can be programmatically matched against a set of stored graph structures to determine matched results. Each stored graph structure can correspond to unique record of a symptom database. Each unique record can be associated with a determined problem. The matched results can be provided as possible problems causing the unexpected behavior of the software application.
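The matching step — compare a converted call graph against stored symptom graphs and return the known problem with the best match — can be sketched with caller/callee edge sets and Jaccard overlap. The trace format, edge representation, and similarity measure are illustrative assumptions; the patent does not specify them here.

```python
# Hedged sketch of call-graph matching against a symptom database:
# represent each trace as a set of caller->callee edges and pick the
# stored symptom graph with the highest edge-set Jaccard overlap.
# Names and the matching rule are illustrative, not the patented method.

def call_edges(trace_lines):
    """trace_lines like 'main->parse'; return the set of call edges."""
    return {tuple(line.split("->")) for line in trace_lines}

def best_match(trace_lines, symptom_db):
    """symptom_db: {problem description: edge set}; return best problem."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    g = call_edges(trace_lines)
    return max(symptom_db, key=lambda name: jaccard(g, symptom_db[name]))

symptom_db = {
    "null-deref in parser": {("main", "parse"), ("parse", "read")},
    "timeout in network":   {("main", "connect"), ("connect", "retry")},
}
trace = ["main->parse", "parse->read", "read->close"]
problem = best_match(trace, symptom_db)
```

The returned record's associated problem description is then surfaced as the likely cause of the unexpected behavior.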