SciSpace (formerly Typeset)

Showing papers on "Graph database published in 2011"


Proceedings Article
12 Dec 2011
TL;DR: These systems are compared by their data models, query possibilities, concurrency controls, partitioning and replication opportunities, and the underlying techniques of NoSQL databases considering their applicability for certain requirements.
Abstract: Motivated by requirements of Web 2.0 applications, a plethora of non-relational databases have emerged in recent years. Since it is very difficult to choose a suitable database for a specific use case, this paper evaluates the underlying techniques of NoSQL databases, considering their applicability for certain requirements. These systems are compared by their data models, query possibilities, concurrency controls, partitioning and replication opportunities.

292 citations


Proceedings Article
12 Jun 2011
TL;DR: This paper proposes to rank edges using a simple similarity-based heuristic that is efficiently computed by comparing the minhash signatures of the nodes incident to the edge, so as to preferentially retain the edges that are likely to be part of the same cluster.
Abstract: In this paper we look at how to sparsify a graph, i.e., how to reduce the edge set while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks, using four state-of-the-art graph clustering and community discovery algorithms, reveal that our proposed approach realizes excellent speedups (often in the range of 10x to 50x), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies.

173 citations
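
The edge-ranking step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code, and it makes several assumptions:
- The graph is undirected and stored as a dict of neighbor sets.
- Similarity is computed on closed neighborhoods (a node plus its neighbors).
- Random permutations are simulated with seeded `hashlib.blake2b` digests.
- Each node keeps a fixed, hypothetical number `k` of top-scoring edges rather than the paper's exact retention rule.

The fraction of agreeing minhash values between two signatures is an unbiased estimate of the Jaccard similarity of the underlying sets.

```python
import hashlib


def _hash(seed: int, node) -> int:
    # Seeded 64-bit hash of a node; each seed stands in for one random permutation.
    digest = hashlib.blake2b(f"{seed}:{node}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(nodes, num_hashes: int) -> list:
    # For each hash function, keep the minimum hash value over the set.
    return [min(_hash(seed, v) for v in nodes) for seed in range(num_hashes)]


def sparsify(adj: dict, k: int, num_hashes: int = 64) -> set:
    """For each node, keep its k incident edges whose endpoints have the most
    similar closed neighborhoods, as estimated from minhash signatures."""
    sig = {u: minhash_signature(adj[u] | {u}, num_hashes) for u in adj}

    def score(u, v) -> float:
        # Fraction of agreeing minhash values estimates Jaccard similarity.
        return sum(a == b for a, b in zip(sig[u], sig[v])) / num_hashes

    kept = set()
    for u in adj:
        ranked = sorted(adj[u], key=lambda v: score(u, v), reverse=True)
        for v in ranked[:k]:
            kept.add(frozenset((u, v)))  # store undirected edges once
    return kept
```

Consider two triangles joined by a single bridge. The bridge endpoints share few neighbors, so with `k = 1` the bridge is very unlikely to survive, while intra-triangle edges are kept.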


Patent
30 Nov 2011
TL;DR: A distributed caching system is described for storing and serving information modeled as a graph that includes nodes and edges, where the edges define associations or relationships between the nodes they connect.
Abstract: A distributed caching system for storing and serving information modeled as a graph that includes nodes and edges that define associations or relationships between nodes that the edges connect in the graph.

167 citations


Proceedings Article
11 Apr 2011
TL;DR: The paper proposes a class of reachability queries and a class of graph patterns in which an edge is specified by a regular expression of a certain form, expressing connectivity in a data graph via edges of various types.
Abstract: It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries based on extended graph simulation, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness, efficiency and scalability of these algorithms are experimentally verified using real-life data and synthetic data.

148 citations
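
The cubic-time bound rests on replacing subgraph isomorphism with graph simulation. The Python sketch below shows only the classical, plain notion of graph simulation that the paper revises. It does not model regular-expression edges or the paper's extended simulation. It assumes that node labels and successor lists are given as dicts, and it uses a straightforward fixpoint refinement, which is polynomial though not the most efficient known algorithm.

```python
def graph_simulation(q_label, q_succ, g_label, g_succ):
    """Maximum simulation of a pattern graph by a data graph.

    For each pattern node u, return the data nodes v with the same label such
    that every pattern edge (u, u2) is matched by some data edge (v, w) with w
    simulating u2. Returns None if some pattern node has no match."""
    # Start from all label-compatible candidates.
    sim = {u: {v for v in g_label if g_label[v] == q_label[u]} for u in q_label}
    changed = True
    while changed:
        changed = False
        for u in q_label:
            for u2 in q_succ.get(u, ()):
                # Drop candidates v that have no successor simulating u2.
                keep = {v for v in sim[u]
                        if any(w in sim[u2] for w in g_succ.get(v, ()))}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True
    if any(not candidates for candidates in sim.values()):
        return None
    return sim
```

Unlike isomorphism, simulation may map several pattern nodes to the same data node. For example, a pattern `A -> B, A -> B` is matched by a data graph with a single `A -> B` edge. This relaxation is what makes evaluation polynomial.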


Proceedings Article
21 Aug 2011
TL;DR: This work designed and implemented an instance of GBASE, a scalable and general graph management and mining system that provides a parallel indexing mechanism for graph mining operations that both saves storage space and accelerates queries.
Abstract: Graphs appear in numerous applications including cyber-security, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How can such large graphs be stored efficiently? What are the core operations/queries on those graphs? How can the graph queries be answered quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space and accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.

123 citations
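
The abstract does not spell out GBASE's storage and query layers. The single-machine Python sketch below is a rough illustration of the block-based idea, and it is an assumption rather than GBASE's implementation. It partitions the adjacency matrix into square blocks, stores only the non-empty blocks, and evaluates a matrix-vector product block by block. With an indicator vector as input, that product answers a one-step out-neighborhood query. Block compression and the MapReduce job structure are omitted, and `block_size` is an illustrative parameter.

```python
from collections import defaultdict


def build_blocks(edges, block_size):
    """Group (src, dst) edges into square blocks keyed by (src_block, dst_block).
    Only non-empty blocks are stored; coordinates inside a block are local."""
    blocks = defaultdict(list)
    for src, dst in edges:
        key = (src // block_size, dst // block_size)
        blocks[key].append((src % block_size, dst % block_size))
    return dict(blocks)


def block_matvec(blocks, x, block_size):
    """Compute y = A^T x, i.e. y[dst] = sum of x[src] over edges src -> dst,
    processing one block at a time (each block could be a separate task)."""
    y = [0] * len(x)
    for (bi, bj), local_edges in blocks.items():
        for local_src, local_dst in local_edges:
            y[bj * block_size + local_dst] += x[bi * block_size + local_src]
    return y
```

For example, with edges `(0, 1)` and `(0, 5)` and input vector `[1, 0, 0, 0, 0, 0]`, the result has non-zero entries exactly at the out-neighbors 1 and 5 of node 0.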


Proceedings Article
03 Oct 2011
TL;DR: This work presents a three-pronged approach to the link prediction task, along with several novel variations on established similarity metrics, and discusses the challenges of processing a graph with more than a million nodes.
Abstract: The growing ubiquity of social networks has spurred research in link prediction, which aims to predict new connections based on existing ones in the network. The 2011 IJCNN Social Network Challenge asked participants to separate real edges from fake ones in a set of 8960 edges sampled from an anonymized, directed graph depicting a subset of relationships on Flickr. Our method incorporates 94 distinct graph features, used as input for classification with Random Forests. We present a three-pronged approach to the link prediction task, along with several novel variations on established similarity metrics. We discuss the challenges of processing a graph with more than a million nodes. We found that the best classification results were achieved through the combination of a large number of features that model different aspects of the graph structure. Our method achieved an area under the receiver operating characteristic (ROC) curve of 0.9695, the 2nd best overall score in the competition and the best score which did not de-anonymize the dataset.

112 citations
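
The abstract does not list the 94 features. To make neighborhood-based similarity features concrete, the sketch below computes a few standard ones for a candidate directed edge: common neighbors, the Jaccard coefficient, Adamic-Adar and edge reciprocity. It assumes the directed graph is given as a dict of out-neighbor sets and treats in- and out-neighbors together as one neighborhood. These are textbook metrics, not the authors' novel variations, and the Random Forest step is not shown.

```python
import math


def invert(out_adj):
    # Build in-neighbor sets from out-neighbor sets.
    in_adj = {}
    for u, vs in out_adj.items():
        for v in vs:
            in_adj.setdefault(v, set()).add(u)
    return in_adj


def neighbors(out_adj, in_adj, u):
    # Undirected view of a directed graph: union of out- and in-neighbors.
    return out_adj.get(u, set()) | in_adj.get(u, set())


def edge_features(out_adj, in_adj, u, v):
    """A few classic similarity features for the candidate edge u -> v."""
    nu, nv = neighbors(out_adj, in_adj, u), neighbors(out_adj, in_adj, v)
    common, union = nu & nv, nu | nv
    adamic_adar = 0.0
    for w in common:
        degree = len(neighbors(out_adj, in_adj, w))
        if degree > 1:  # guard against log(1) = 0 in degenerate cases
            adamic_adar += 1.0 / math.log(degree)
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "adamic_adar": adamic_adar,
        "reciprocal": int(u in out_adj.get(v, set())),  # does v -> u exist?
    }
```

Each candidate edge's feature dict would then be turned into a fixed-order vector and passed to a classifier.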


Journal Article
01 Aug 2011
TL;DR: This paper considers the problem of answering threshold-based probabilistic queries over a large uncertain graph database with the possible world semantics and adopts a filtering-and-verification strategy to speed up the search.
Abstract: Retrieving graphs containing a query graph from a large graph database is a key task in many graph-based applications, including chemical compounds discovery, protein complex prediction, and structural pattern recognition. However, graph data handled by these applications is often noisy, incomplete, and inaccurate because of the way the data is produced. In this paper, we study subgraph queries over uncertain graphs. Specifically, we consider the problem of answering threshold-based probabilistic queries over a large uncertain graph database with the possible world semantics. We prove that the problem is #P-complete; therefore, we adopt a filtering-and-verification strategy to speed up the search. In the filtering phase, we use a probabilistic inverted index, PIndex, based on subgraph features obtained by an optimal feature selection process. During the verification phase, we develop exact and bound algorithms to validate the remaining candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.

92 citations
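
Possible-world semantics can be made concrete by brute force. The Python sketch below rests on several choices of its own:
- As a simplification, each edge of an uncertain graph exists independently with its own probability. The paper's uncertainty model may differ.
- It enumerates all 2^|E| possible worlds and sums the probabilities of the worlds that contain the query graph.
- Containment is checked with a naive, unlabeled, non-induced subgraph isomorphism test.
- `threshold_query` is a hypothetical helper.

The exponential enumeration is exactly the cost that the filtering-and-verification strategy aims to avoid.

```python
from itertools import permutations, product


def contains(world_edges, nodes, query_edges, query_nodes):
    """Brute-force (unlabeled, non-induced) subgraph isomorphism test."""
    for image in permutations(nodes, len(query_nodes)):
        m = dict(zip(query_nodes, image))
        if all(frozenset((m[a], m[b])) in world_edges for a, b in query_edges):
            return True
    return False


def containment_probability(uncertain_edges, query_edges):
    """P(query is a subgraph) under possible-world semantics, with each
    undirected edge (a, b) -> p present independently with probability p."""
    nodes = sorted({x for e in uncertain_edges for x in e})
    query_nodes = sorted({x for e in query_edges for x in e})
    edges = list(uncertain_edges.items())
    total = 0.0
    for mask in product((False, True), repeat=len(edges)):
        prob, world = 1.0, set()
        for present, ((a, b), p) in zip(mask, edges):
            prob *= p if present else 1.0 - p
            if present:
                world.add(frozenset((a, b)))
        if contains(world, nodes, query_edges, query_nodes):
            total += prob
    return total


def threshold_query(database, query_edges, threshold):
    # Ids of uncertain graphs whose containment probability reaches the threshold.
    return [gid for gid, g in database.items()
            if containment_probability(g, query_edges) >= threshold]
```

On a triangle whose three edges each have probability 0.5, a single-edge query is contained with probability 1 - 0.5^3 = 0.875.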


Book Chapter
29 May 2011
TL;DR: This paper proposes an approach for optimizing graph pattern matching by reinterpreting certain join tree structures as grouping operations, which enables a greater degree of parallelism in join processing and results in bushier query execution plans with fewer Map-Reduce cycles.
Abstract: Existing MapReduce systems support relational-style join operators that translate multi-join query plans into several Map-Reduce cycles. This leads to high I/O and communication costs due to the multiple data transfer steps between map and reduce phases. SPARQL graph pattern matching is dominated by join operations and is unlikely to be efficiently processed using existing techniques. This cost is prohibitive for RDF graph pattern matching queries, which typically involve several join operations. In this paper, we propose an approach for optimizing graph pattern matching by reinterpreting certain join tree structures as grouping operations. This enables a greater degree of parallelism in join processing, resulting in bushier query execution plans with fewer Map-Reduce cycles. This approach requires that the intermediate results are managed as sets of groups of triples, or TripleGroups. We therefore propose a data model and algebra, the Nested TripleGroup Algebra, for capturing and manipulating TripleGroups. The relationship with the traditional relational-style algebra used in Apache Pig is discussed. A comparative performance evaluation of the traditional Pig approach and RAPID+ (Pig extended with NTGA) for graph pattern matching queries on the BSBM benchmark dataset is presented. Results show up to 60% performance improvement of our approach over traditional Pig for some tasks.

91 citations
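
A star-shaped sub-pattern consists of several triple patterns that share one subject variable. Such a pattern is a multi-way self-join on the subject, which can instead be evaluated as a single grouping. The plain-Python sketch below illustrates that reinterpretation. It is not the NTGA operators or RAPID+ code, it involves no Pig or Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Group (subject, property, object) triples into subject-keyed
    TripleGroups: the analogue of a single map/reduce grouping step."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return groups


def match_star(triples, required_properties):
    """Evaluate a star pattern (one subject, several properties) as one
    grouping plus a filter instead of a chain of pairwise joins. Returns the
    matching TripleGroups, restricted to the required properties."""
    result = {}
    for s, group in group_by_subject(triples).items():
        props = {p for _, p, _ in group}
        if required_properties <= props:
            result[s] = [t for t in group if t[1] in required_properties]
    return result
```

A star pattern with `k` triple patterns would otherwise need `k - 1` joins. Here it needs one grouping over the data, regardless of `k`.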


Proceedings Article
24 Oct 2011
TL;DR: This paper presents two improvements to existing landmark-based shortest path estimation methods that relate to the use of shortest-path trees (SPTs) and a new landmark selection strategy that seeks to maximize the coverage of all shortest paths by the selected landmarks.
Abstract: Computing the shortest path between a pair of vertices in a graph is a fundamental primitive in graph algorithmics. Classical exact methods for this problem do not scale up to contemporary, rapidly evolving social networks with hundreds of millions of users and billions of connections. A number of approximate methods have been proposed, including several landmark-based methods that have been shown to scale up to very large graphs with acceptable accuracy. This paper presents two improvements to existing landmark-based shortest path estimation methods. The first improvement relates to the use of shortest-path trees (SPTs). Together with appropriate short-cutting heuristics, the use of SPTs makes it possible to achieve higher accuracy with acceptable time and memory overhead. Furthermore, SPTs can be maintained incrementally under edge insertions and deletions, which allows for a fully dynamic algorithm. The second improvement is a new landmark selection strategy that seeks to maximize the coverage of all shortest paths by the selected landmarks. The improved method is evaluated on the DBLP, Orkut, Twitter and Skype social networks.

87 citations
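
The two estimates can be compared in a small Python sketch. It assumes an unweighted, undirected graph given as adjacency lists and uses BFS trees as the shortest-path trees.
- The basic landmark bound is d(s, l) + d(l, t).
- Routing the same path through the landmark's shortest-path tree and short-cutting it at the lowest common ancestor of s and t gives depth(s) + depth(t) - 2 depth(lca), which is never larger.

The paper's additional short-cutting heuristics, its landmark selection and its incremental SPT maintenance are not shown.

```python
from collections import deque


def bfs_tree(adj, root):
    """BFS from root; return (parent, depth) dicts describing a shortest-path tree."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent, depth


def tree_distance(parent, depth, s, t):
    """Length of the s-t path inside the tree, shortcut at their lowest common ancestor."""
    a, b = s, t
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return depth[s] + depth[t] - 2 * depth[a]


def estimate(adj, landmarks, s, t):
    """Return (basic landmark upper bound, SPT/LCA upper bound) on d(s, t)."""
    trees = [bfs_tree(adj, l) for l in landmarks]  # precomputed offline in practice
    usable = [(p, d) for p, d in trees if s in d and t in d]
    basic = min(d[s] + d[t] for _, d in usable)
    spt = min(tree_distance(p, d, s, t) for p, d in usable)
    return basic, spt
```

On the path 0-1-2-3-4 with landmark 0, the pair (3, 4) gets a basic bound of 7 but an SPT bound of 1, the true distance. If no landmark's tree contains both s and t, `min` raises `ValueError`.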


Proceedings Article
20 Jun 2011
TL;DR: A novel Bi-relational Graph (BG) model is proposed that comprises both the data graph and the label graph as subgraphs and connects them by an additional bipartite graph induced from label assignments; it is applied to automatic image annotation and semantic image retrieval tasks on four benchmark multi-label image data sets.
Abstract: Image annotation is usually formulated as a multi-label semi-supervised learning problem. Traditional graph-based methods use only the data (image) graph induced from image similarities, while ignoring the label (semantic term) graph induced from label correlations in a multi-label image data set. In this paper, we propose a novel Bi-relational Graph (BG) model that comprises both the data graph and the label graph as subgraphs and connects them by an additional bipartite graph induced from label assignments. By considering each class and its labeled images as a semantic group, we perform a random walk on the BG to produce group-to-vertex relevance, including class-to-image and class-to-class relevances. The former can be used to predict labels for unannotated images, while the latter are new, asymmetric class relationships, called Causal Relationships (CR). CR is learned from input data and has better semantic meaning, which enhances label prediction for unannotated images. We apply the proposed approaches to automatic image annotation and semantic image retrieval tasks on four benchmark multi-label image data sets. The superior performance of our approaches compared to state-of-the-art multi-label classification methods demonstrates their effectiveness.

82 citations


Journal Article
TL;DR: This article introduces a novel graph structure, referred to as a path-tree, to help label very large graphs, and introduces a new compression scheme that groups vertices with similar labels together to further reduce the labeling size.
Abstract: Reachability queries are among the fundamental queries in graph databases. The main idea behind answering reachability queries is to assign labels to vertices such that the reachability between any two vertices can be determined from the labeling information. Although several approaches have been proposed for building these reachability labels, open issues remain regarding how to handle the increasingly large number of vertices in real-world graphs and how to find the best tradeoff among labeling size, query answering time, and construction time. In this article, we introduce a novel graph structure, referred to as a path-tree, to help label very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We show that the path-tree can be generalized to a chain-tree, which theoretically can have a smaller labeling cost. On top of the path-tree and chain-tree indexes, we also introduce a new compression scheme that groups vertices with similar labels together to further reduce the labeling size. In addition, we propose an efficient incremental update algorithm for dynamic index maintenance. Finally, we demonstrate both analytically and empirically the effectiveness and efficiency of our new approaches.
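
Path-tree labeling generalizes the older idea of labeling a spanning tree with DFS intervals. The Python sketch below shows only that building block and assumes the tree is given as a dict of child lists. Each node gets a half-open interval [pre, post), and u reaches v along tree edges exactly when v's interval nests inside u's. The sketch does not construct a path-tree, handle non-tree edges, or implement the compression scheme described in the abstract.

```python
def interval_labels(children, root):
    """Assign [pre, post) intervals with an iterative DFS over a tree, where
    post - pre is the size of the node's subtree."""
    labels, pre, counter = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            labels[node] = (pre[node], counter)
            continue
        pre[node] = counter
        counter += 1
        stack.append((node, True))  # revisit after all descendants
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def tree_reaches(labels, u, v):
    """u reaches v along tree edges iff v's interval nests inside u's."""
    (pu, qu), (pv, qv) = labels[u], labels[v]
    return pu <= pv and qv <= qu
```

Each query is two integer comparisons. The difficulty the path-tree addresses is covering the reachability contributed by edges outside such a tree.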

Proceedings Article
08 Jun 2011
TL;DR: The Clause-Iteration algorithms form the basis of SHARD, a scalable graph store built on the Hadoop implementation of MapReduce, which performs favorably when compared to existing "industrial" graph stores on a standard benchmark graph with 800 million edges.
Abstract: Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable SHARD graph store, built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
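
The clause-by-clause growth of query responses can be illustrated without MapReduce. The sketch below is a single-machine Python analogue, not SHARD's algorithm. Triples are 3-tuples of strings, and variables are strings beginning with `?` (a convention of this sketch). Each iteration joins the partial answers with the matches of one more clause on their shared variables. Flagged keys and the distributed join are omitted.

```python
def match_clause(triples, clause):
    """All variable bindings under which one triple pattern matches a triple."""
    out = []
    for triple in triples:
        binding = {}
        for term, value in zip(clause, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would be bound to two values
            elif term != value:
                break  # constant term does not match
        else:
            out.append(binding)
    return out


def clause_iteration(triples, clauses):
    """Grow partial answers by joining in one more clause per iteration."""
    partial = [{}]
    for clause in clauses:
        matches = match_clause(triples, clause)
        partial = [
            {**p, **m}
            for p in partial
            for m in matches
            if all(p[k] == v for k, v in m.items() if k in p)  # agree on shared vars
        ]
    return partial
```

For the query `?x knows ?y . ?y age ?a`, answers that stop matching at the second clause (for example, a `?y` with no `age` triple) are dropped during the second iteration.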

Journal Article
01 Nov 2011
TL;DR: In this article, a new graph sketch method, gSketch, which combines well studied synopses for traditional data streams with a sketch partitioning technique, is proposed to estimate and optimize the responses to basic queries on graph streams.
Abstract: Many dynamic applications are built upon large network infrastructures, such as social networks, communication networks, biological networks and the Web. Such applications create data that can be naturally modeled as graph streams, in which edges of the underlying graph are received and updated sequentially in the form of a stream. It is often necessary and important to summarize the behavior of graph streams in order to enable effective query processing. However, the sheer size and dynamic nature of graph streams present an enormous challenge to existing graph management techniques. In this paper, we propose a new graph sketch method, gSketch, which combines well-studied synopses for traditional data streams with a sketch partitioning technique, to estimate and optimize the responses to basic queries on graph streams. We consider two different scenarios for query estimation: (1) a graph stream sample is available; (2) both a graph stream sample and a query workload sample are available. Algorithms for the two scenarios are designed by partitioning a global sketch into a group of localized sketches in order to optimize the query estimation accuracy. We perform extensive experimental studies on both real and synthetic data sets and demonstrate the power and robustness of gSketch in comparison with the state-of-the-art global sketch method.
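
gSketch builds on synopses for traditional data streams. The CountMin sketch is a standard synopsis of this kind for frequency queries. The Python sketch below uses it to estimate edge frequencies, routing each edge to one of several localized sketches by its source vertex. This is an illustration under stated assumptions: the width, depth, `blake2b`-based hashing and simple modulo partitioning are placeholders. gSketch instead chooses its partitioning from a data sample and, optionally, a workload sample.

```python
import hashlib


def _stable_hash(key, salt: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted for str).
    digest = hashlib.blake2b(f"{salt}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class CountMin:
    """CountMin sketch: `depth` rows of `width` counters. Each key increments
    one counter per row; the estimate is the row minimum, so it never undercounts."""

    def __init__(self, width: int, depth: int):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][_stable_hash(key, f"row{r}") % self.width] += count

    def estimate(self, key) -> int:
        return min(self.rows[r][_stable_hash(key, f"row{r}") % self.width]
                   for r in range(self.depth))


class PartitionedEdgeSketch:
    """Route each edge (src, dst) to one localized CountMin chosen by src."""

    def __init__(self, partitions: int, width: int, depth: int):
        self.sketches = [CountMin(width, depth) for _ in range(partitions)]

    def _sketch_for(self, src) -> CountMin:
        # Placeholder partitioning by source vertex; gSketch derives its
        # partitioning from data and workload samples instead.
        return self.sketches[_stable_hash(src, "part") % len(self.sketches)]

    def add_edge(self, src, dst, count: int = 1) -> None:
        self._sketch_for(src).add((src, dst), count)

    def edge_frequency(self, src, dst) -> int:
        return self._sketch_for(src).estimate((src, dst))
```

Localizing sketches matters because a CountMin estimate is inflated by collisions with other keys. Confining edges with similar frequencies to the same sketch reduces that error for low-frequency edges.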

Proceedings Article
20 Jun 2011
TL;DR: This work proposes to utilize the tensor product graph (TPG) obtained by the tensor product of the original graph with itself, and achieves a bull's eye retrieval score of 99.99% on the MPEG-7 shape dataset, which is much higher than that of state-of-the-art algorithms.
Abstract: As observed in several recent publications, improved retrieval performance is achieved when pairwise similarities between the query and the database objects are replaced with more global affinities that also consider the relation among the database objects. This is commonly achieved by propagating the similarity information in a weighted graph representing the database and query objects. Instead of propagating the similarity information on the original graph, we propose to utilize the tensor product graph (TPG) obtained by the tensor product of the original graph with itself. By virtue of this construction, not only local but also long-range similarities among graph nodes are explicitly represented as higher-order relations, making it possible to better reveal the intrinsic structure of the data manifold. In addition, we improve the local neighborhood structure of the original graph in a preprocessing stage. We illustrate the benefits of the proposed approach on shape and image ranking and retrieval tasks. We achieve a bull's eye retrieval score of 99.99% on the MPEG-7 shape dataset, which is much higher than that of state-of-the-art algorithms.
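
Diffusion on the tensor product graph can be computed without materializing the n^2 x n^2 matrix S ⊗ S. The Python sketch below makes the following assumptions:
- S is a nonnegative n x n affinity matrix.
- S is scaled so that its row sums are below 1, which keeps the series convergent.

The sketch iterates Q <- S Q S^T + I. Because (S ⊗ S) vec(Q) equals vec(S Q S^T), after t steps Q holds the truncated TPG series sum_{k=0}^{t} (S ⊗ S)^k vec(I), reshaped to n x n. Treat this as an illustration of that identity rather than as the paper's full pipeline, which also includes neighborhood preprocessing.

```python
def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def tpg_diffusion(s, iterations):
    """Iterate Q <- S Q S^T + I starting from Q = I.

    After t iterations Q = sum_{k=0}^{t} S^k (S^T)^k, which equals the TPG
    series sum_k (S (x) S)^k vec(I) reshaped to n x n, computed without ever
    forming the n^2 x n^2 Kronecker product."""
    n = len(s)
    q, st, eye = identity(n), transpose(s), identity(n)
    for _ in range(iterations):
        sqst = matmul(matmul(s, q), st)
        q = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(sqst, eye)]
    return q
```

Each iteration costs O(n^3) with dense matrices, instead of the O(n^4) needed to multiply by the explicit n^2 x n^2 tensor product matrix.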

Book Chapter
23 Oct 2011
TL;DR: A multi-pivot approach to identifying and querying data in graph-based datasets is embodied in a tool called Visor, which helps users connect key points of interest in the graph at the conceptual level while visually occluding the remainder of the graph, thus helping create a road map for navigation.
Abstract: The purpose of data browsers is to help users identify and query data effectively without being overwhelmed by large, complex graphs of data. One proposed solution for identifying and querying data in graph-based datasets is Pivoting (or set-oriented browsing), a many-to-many graph browsing technique that allows users to navigate the graph by starting from a set of instances and then following common links. Relying solely on navigation, however, makes it difficult for users to find paths, or even to see whether the element of interest is in the graph, when the points of interest may be many vertices apart. A further challenge is finding paths that require combinations of forward and backward links to make the necessary connections, which adds to the complexity of pivoting. To mitigate these problems and enhance the strengths of pivoting, we present a multi-pivot approach, which we embodied in a tool called Visor. Visor allows users to explore from multiple points in the graph, helping them connect key points of interest at the conceptual level while visually occluding the remainder of the graph, thus helping create a road map for navigation. We carried out a user study to demonstrate the viability of our approach.

Proceedings Article
13 Jun 2011
TL;DR: This paper introduces a new automata model for query answering with two modes of acceptance: one captures queries returning nodes, and the other queries returning paths. It also introduces additional restrictions for tractability and shows that some intractable cases can be naturally cast as instances of the constraint satisfaction problem.
Abstract: Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. While queries need to be posed against such data, techniques for querying patterns are generally lacking, and properties of such queries are not well understood. Our goal is to study the basics of querying graph patterns. We first identify key features of patterns, such as node and label variables and edges specified by regular expressions, and define a classification of patterns based on them. We then study standard graph queries on graph patterns, and give precise characterizations of both data and combined complexity for each class of patterns. If complexity is high, we further analyze the features that lead to intractability, as well as restrictions that lower complexity. We introduce a new automata model for query answering with two modes of acceptance: one captures queries returning nodes, and the other queries returning paths. We study properties of such automata, and the key computational tasks associated with them. Finally, we provide additional restrictions for tractability, and show that some intractable cases can be naturally cast as instances of the constraint satisfaction problem.

Book
31 Aug 2011
TL;DR: This book discusses graphs for modeling complex structured and schemaless data from the Semantic Web, social networks, protein networks, chemical compounds, and multimedia databases, and offers essential research for academics working in the interdisciplinary domains of databases, data mining, and multimedia technology.
Abstract: Graph Data Management: Techniques and Applications is a central reference source for different data management techniques for graph data structures and their application. This book discusses graphs for modeling complex structured and schemaless data from the Semantic Web, social networks, protein networks, chemical compounds, and multimedia databases and offers essential research for academics working in the interdisciplinary domains of databases, data mining, and multimedia technology.

Proceedings Article
21 Aug 2011
TL;DR: This paper defines a semi-supervised learning framework for ranking of nodes on a very large graph and derives within this framework an efficient algorithm called Semi-Supervised PageRank, which can outperform previous algorithms on several tasks.
Abstract: Graph ranking plays an important role in many applications, such as page ranking on web graphs and entity ranking on social networks. In many applications, besides graph structure, rich information on nodes and edges and explicit or implicit human supervision are often available. In contrast, conventional algorithms (e.g., PageRank and HITS) compute ranking scores using only graph structure information. A natural question therefore arises: how can all of this information be leveraged, effectively and efficiently, to calculate graph ranking scores more accurately than conventional algorithms, assuming that the graph is also very large? Previous work tackled the problem only partially, and the proposed solutions are not satisfactory. This paper addresses the problem and proposes a general framework as well as an efficient algorithm for graph ranking. Specifically, we define a semi-supervised learning framework for ranking of nodes on a very large graph and derive within our proposed framework an efficient algorithm called Semi-Supervised PageRank. In the algorithm, the objective function is defined based upon a Markov random walk on the graph. The transition probability and the reset probability of the Markov model are defined as parametric models based on features on nodes and edges. By minimizing the objective function, subject to a number of constraints derived from supervision information, we simultaneously learn the optimal parameters of the model and the optimal ranking scores of the nodes. Finally, we show that the algorithm can be made efficient enough to handle a billion-node graph by taking advantage of the sparsity of the graph and implementing it in MapReduce. Experiments on real data from a commercial search engine show that the proposed algorithm can outperform previous algorithms on several tasks.
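
The parametric random walk can be sketched in Python as follows. All of the modeling choices here belong to the sketch, not the paper:
- Transition and reset probabilities use log-linear (exponential) parameterizations.
- The parameter vectors `w` and `omega` are taken as given.
- The mass of dangling nodes is routed to the reset distribution.

The learning step (minimizing the objective under supervision constraints) and the MapReduce implementation are not shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def parametric_pagerank(out_edges, edge_feat, node_feat, w, omega,
                        alpha=0.85, iters=100):
    """Power iteration for a random walk whose transition probabilities are
    log-linear in edge features (parameters w) and whose reset probabilities
    are log-linear in node features (parameters omega)."""
    nodes = sorted(node_feat)
    # Reset distribution from node features.
    z = sum(math.exp(dot(omega, node_feat[v])) for v in nodes)
    reset = {v: math.exp(dot(omega, node_feat[v])) / z for v in nodes}
    # Row-normalized transition probabilities from edge features.
    trans = {}
    for u in nodes:
        weights = {v: math.exp(dot(w, edge_feat[(u, v)])) for v in out_edges.get(u, [])}
        total = sum(weights.values())
        trans[u] = {v: x / total for v, x in weights.items()}
    scores = dict(reset)
    for _ in range(iters):
        nxt = {v: (1 - alpha) * reset[v] for v in nodes}
        for u in nodes:
            if trans[u]:
                for v, p in trans[u].items():
                    nxt[v] += alpha * scores[u] * p
            else:  # dangling node: redistribute its mass via the reset vector
                for v in nodes:
                    nxt[v] += alpha * scores[u] * reset[v]
        scores = nxt
    return scores
```

With all parameters set to zero, the model reduces to ordinary PageRank with uniform transitions and a uniform reset. Learning `w` and `omega` is what lets features and supervision reshape the ranking.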

Proceedings Article
21 Mar 2011
TL;DR: This paper proposes a method that uses an index of the uncertain graph database to reduce the number of comparisons required for computing the expected support of each candidate pattern, and relies on the apriori property for enumerating candidate subgraph patterns efficiently.
Abstract: Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there has been growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property for enumerating candidate subgraph patterns efficiently. Then, the index is used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination that further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates the significant cost savings with respect to the state-of-the-art approach.

Journal Article
TL;DR: A novel graph database-mining method called APGM (APproximate Graph Mining) mines useful patterns from noisy graph databases, using a general framework for modeling the noise distribution with a probability matrix and an efficient algorithm to identify approximately matched frequent subgraphs.
Abstract: In this paper, we present a novel graph database-mining method called APGM (APproximate Graph Mining) to mine useful patterns from noisy graph databases. In our method, we designed a general framework for modeling the noise distribution using a probability matrix and devised an efficient algorithm to identify approximately matched frequent subgraphs. We have applied APGM to both synthetic and real-world data sets for protein structure pattern identification and structure classification. Our experimental study demonstrates the efficiency and efficacy of the proposed method.

Patent
30 Dec 2011
TL;DR: The patent describes a system and method that generally provides for creation of a distributed graph database, creation and deployment of nodes in a distributed graph database system, and integration of nodes into a set of distributed graph databases that include data nodes and edges.
Abstract: In a computer environment, a system and method is described that generally provides for creation of a distributed graph database, creation and deployment of nodes in a distributed graph database system, and integration of nodes into a set of distributed graph databases that include data nodes and edges that are: entities built using forms, relations, and relationships; immutable but evolvable through the addition of new data nodes or new edges joining the evolving data node to another data node; shareable and mergeable.

Proceedings Article
11 Apr 2011
TL;DR: By using bitmap structures for graph representation, it is possible to improve the performance of a graph database system, allowing for the efficient manipulation of very large graphs containing billions of nodes and edges.
Abstract: The number of applications calling for efficient large graph management is dramatically increasing. Social network analysis, the Internet, and biocomputation are just three examples of such applications. In these cases, the interest focuses on the structural analysis of the relationships between different entities organized in huge networks or graph-like structures. Being able to efficiently handle such graphs becomes essential, placing graph database management systems in the eye of the storm. Among the different challenges posed by graph databases, finding an efficient way to represent and manipulate huge graphs that do not entirely fit in memory is still an unresolved problem. In this work, we present DEX, a high-performance graph database management system based on bitmaps and other secondary structures. We show that by using bitmap structures for graph representation it is possible to improve the performance of a graph database system, allowing for the efficient manipulation of very large graphs containing billions of nodes and edges.
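
The abstract says only that DEX relies on bitmaps and other secondary structures. The toy Python sketch below illustrates the general principle, not DEX's actual layout or API:
- Object ids index bit positions.
- Sets of objects, such as nodes with a given attribute value or the successors of a node, are stored as bitmaps (here, Python integers).
- Queries combine these bitmaps with bitwise AND and OR.

The class and method names are hypothetical, and attribute updates do not clear old values.

```python
from collections import defaultdict


class BitmapGraph:
    """Toy graph store: object ids index bits; sets of objects are int bitmaps."""

    def __init__(self):
        self.value_index = defaultdict(int)  # (attribute, value) -> bitmap of nodes
        self.out_edges = defaultdict(int)    # node id -> bitmap of successor ids

    def set_attr(self, node: int, attr: str, value) -> None:
        self.value_index[(attr, value)] |= 1 << node

    def add_edge(self, src: int, dst: int) -> None:
        self.out_edges[src] |= 1 << dst

    def select(self, attr: str, value) -> int:
        # Bitmap of all nodes whose attribute equals value.
        return self.value_index.get((attr, value), 0)

    def neighbors(self, nodes: int) -> int:
        """Union (bitwise OR) of the successors of every node in `nodes`."""
        result, i = 0, 0
        while nodes:
            if nodes & 1:
                result |= self.out_edges.get(i, 0)
            nodes >>= 1
            i += 1
        return result


def members(bitmap: int) -> list:
    # Decode a bitmap back into a sorted list of object ids.
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

A query such as "cities reached from any person" then becomes `members(g.neighbors(g.select("type", "person")) & g.select("type", "city"))`, one OR over adjacency bitmaps followed by one AND.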

Patent
19 Sep 2011
TL;DR: A system gathers information on important and influential people, uses an ontology to build a social graph, organizes the information based on this social graph, and provides it to users as a service.
Abstract: A system gathers information on important and influential people and uses an ontology to build a social graph. The information is organized based on this social graph and provided to users as a service. The system uses ontology models to identify connectivity between entities (e.g., people, organizations, events, and things) in the social graph. Through its ontology, the system can determine, interpret, and represent the relationships of people that occur in the real world.

Patent
14 Oct 2011
TL;DR: The disclosure describes methods, systems, and computer program products for providing access to business network data. One method includes identifying a logical graph from business network linked graph data to be transformed into a resource graph, the logical graph including at least two nodes and at least one edge connecting a pair of nodes.
Abstract: The present disclosure describes methods, systems, and computer program products for providing access to business network data. One method includes identifying a logical graph from business network linked graph data to be transformed into a resource graph, the logical graph including at least two nodes and at least one edge connecting a pair of nodes and defining a connection between the nodes. Each node is converted into a resource. A resource graph associated with the logical graph can be generated, where generation comprises, for each identified node, associating at least one attribute associated with the identified node as a resource attribute of the corresponding resource, adding at least one node connected to the identified node via an edge in the logical graph as a resource attribute of the corresponding resource, and dissolving at least one connection between the identified node and at least one other entity in the logical graph.

Book Chapter
01 Jan 2011
TL;DR: A dedicated data model and query language, SNQL, founded on previous research on graph databases and on the experience of SN researchers is introduced, allowing expressiveness for graph querying and node creation as required by SN, while keeping the complexity of query evaluation in NLOGSPACE.
Abstract: Social Network (SN) data has become ubiquitous, demanding advanced and flexible means to represent, transform and query such data. In addition to the intrinsic challenges of querying graph data is the requirement that networks be restructured, and thus that new values be created. To address these, we introduce a dedicated data model and query language, SNQL, founded on previous research on graph databases and on the experience of SN researchers. Technically, it is based on GraphLog and second-order tuple generating dependencies, allowing expressiveness for graph querying and node creation as required by SN, while keeping the complexity of query evaluation in NLOGSPACE.

Patent
28 Apr 2011
TL;DR: This patent describes a system that facilitates the maintenance and execution of a software offering by storing model data of the offering in a graph database and using that database to manage the offering during operation.
Abstract: The disclosed embodiments provide a system that facilitates the maintenance and execution of a software offering. During operation, the system obtains model data associated with a multidimensional model of the software offering. Next, the system stores the model data in a graph database. Finally, the system uses the graph database to facilitate management of the software offering.

Journal ArticleDOI
01 Aug 2011
TL;DR: A graph querying system is proposed that achieves both fast indexing and efficient query processing. Its index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining.
Abstract: This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and perform a filtering-and-verification process, which still requires many subgraph isomorphism tests. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost.
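The generic filtering-and-verification baseline that the abstract contrasts with can be sketched in a few lines. This sketch does not implement the paper's index, direct inclusion, or projected-database filtering. It assumes undirected graphs with labelled nodes and unlabelled edges, and treats containment as a non-induced subgraph. The filter is a cheap necessary condition based on node-label counts and edge counts. The verification step is a plain backtracking subgraph test. All function names are illustrative.

```python
from collections import Counter


def label_filter(g_labels, q_labels):
    """Necessary condition: each node label of g occurs in q at least as often."""
    have = Counter(q_labels.values())
    return all(have[lab] >= c for lab, c in Counter(g_labels.values()).items())


def is_subgraph(g_labels, g_edges, q_labels, q_edges):
    """Backtracking search for a label-preserving injective map from g into q
    that sends every edge of g onto an edge of q (non-induced containment)."""
    g_nodes = list(g_labels)
    mapping, used = {}, set()

    def extend(i):
        if i == len(g_nodes):
            return True
        u = g_nodes[i]
        for v in q_labels:
            if v in used or q_labels[v] != g_labels[u]:
                continue
            # Every already-mapped neighbour of u must be adjacent to v in q.
            if all(frozenset((v, mapping[w])) in q_edges
                   for w in g_nodes[:i] if frozenset((u, w)) in g_edges):
                mapping[u] = v
                used.add(v)
                if extend(i + 1):
                    return True
                del mapping[u]
                used.discard(v)
        return False

    return extend(0)


def supergraph_query(database, q_labels, q_edges):
    """Return ids of database graphs that are contained in the query graph."""
    answers = []
    for gid, (g_labels, g_edges) in database.items():
        # Filtering: discard graphs that cannot possibly fit inside q.
        if len(g_edges) > len(q_edges) or not label_filter(g_labels, q_labels):
            continue
        # Verification: the expensive exact subgraph test.
        if is_subgraph(g_labels, g_edges, q_labels, q_edges):
            answers.append(gid)
    return answers
```

Each graph is a pair `(labels, edges)`. Here `labels` maps node to label and `edges` is a set of two-element `frozenset`s. The verification call is the cost that the paper's techniques aim to reduce.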

Book ChapterDOI
29 May 2011
TL;DR: The authors make the case that more specialized hardware, in particular the Cray XMT, can offer superior scaling and close to an order of magnitude improvement in performance.
Abstract: To date, the application of high-performance computing resources to Semantic Web data has largely focused on commodity hardware and distributed memory platforms. In this paper we make the case that more specialized hardware can offer superior scaling and close to an order of magnitude improvement in performance. In particular we examine the Cray XMT. Its key characteristics, a large, global shared memory, and processors with a memory-latency tolerant design, offer an environment conducive to programming for the Semantic Web and have engendered results that far surpass the current state of the art. We examine three fundamental pieces requisite for a fully functioning semantic database: dictionary encoding, RDFS inference, and query processing. We show scaling up to 512 processors (the largest configuration we had available), and the ability to process 20 billion triples completely in-memory.
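Two of the building blocks named in the abstract are dictionary encoding and RDFS inference. A single-threaded sketch can show what they compute. It has nothing to do with the paper's Cray XMT implementation. The sketch maps every RDF term to a dense integer id and then applies two RDFS rules to a fixpoint over the encoded triples: rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (propagating `rdf:type` up the class hierarchy). The prefixed names stand in for full IRIs, and the naive fixpoint loop is a simplifying assumption.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


class Dictionary:
    """Bidirectional mapping between RDF terms (strings) and dense integer ids."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id):
        return self.id_to_term[term_id]


def rdfs_subclass_closure(triples, d):
    """Naive fixpoint of rdfs11 and rdfs9 over integer-encoded triples."""
    typ, sub = d.encode(RDF_TYPE), d.encode(SUBCLASS)
    facts = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in facts if p == sub]
        new = set()
        for a, b in subclass:
            for c, e in subclass:
                if b == c:
                    new.add((a, sub, e))      # rdfs11: A sub B, B sub E => A sub E
        for s, p, o in facts:
            if p == typ:
                for a, b in subclass:
                    if o == a:
                        new.add((s, typ, b))  # rdfs9: s type A, A sub B => s type B
        if new <= facts:
            return facts
        facts |= new
```

Usage: encode each raw triple with `tuple(d.encode(t) for t in triple)`, run the closure, and decode the results for display. Inference then compares integers rather than strings, which is the usual motivation for encoding terms first.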

Journal ArticleDOI
01 Jun 2011
TL;DR: This paper describes a system that analyzes query workloads and the ER graph, invests in limited offline indexing, and exploits those indices to achieve essentially constant-time query processing, even as the graph size scales.
Abstract: Graph conductance queries, also known as personalized PageRank and related to random walks with restarts, were originally proposed to assign a hyperlink-based prestige score to Web pages. More general forms of such queries are also very useful for ranking in entity-relation (ER) graphs used to represent relational, XML and hypertext data. Evaluation of PageRank usually involves a global eigenvector computation. If the graph is even moderately large, interactive response times may not be possible. Recently, the need for interactive PageRank evaluation has increased. The graph may be fully known only when the query is submitted. Browsing actions of the user may change some inputs to the PageRank computation dynamically. In this paper, we describe a system that analyzes query workloads and the ER graph, invests in limited offline indexing, and exploits those indices to achieve essentially constant-time query processing, even as the graph size scales. Our techniques (data and query statistics collection, index selection and materialization, and query-time index exploitation) have parallels in the extensive relational query optimization literature, but are applied to supporting novel graph data repositories. We report on experiments with five temporal snapshots of the CiteSeer ER graph having 74–702 thousand entity nodes, 0.17–1.16 million word nodes, 0.29–3.26 million edges between entities, and 3.29–32.8 million edges between words and entities. We also used two million actual queries from CiteSeer's logs. Queries run 3–4 orders of magnitude faster than whole-graph PageRank, the gap growing with graph size. Index size is smaller than a text index. Ranking accuracy is 94–98% with reference to whole-graph PageRank.
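For reference, here is the whole-graph computation that the paper's indexing avoids at query time: personalized PageRank (a random walk with restart) computed by power iteration. This is the standard baseline formulation, not the authors' indexed method. The restart probability `alpha = 0.15`, the tolerance, and the choice to return dangling-node mass to the teleport distribution are all assumptions of this sketch.

```python
def personalized_pagerank(out_links, teleport, alpha=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart, computed by power iteration.

    At each step the walker jumps back to the teleport (personalization)
    distribution with probability alpha, and otherwise follows a uniformly
    random out-link.

    out_links: dict node -> list of successor nodes (every node must be a key)
    teleport:  dict node -> restart probability (values sum to 1)
    """
    nodes = list(out_links)
    r = {v: teleport.get(v, 0.0) for v in nodes}
    for _ in range(max_iter):
        nxt = {v: alpha * teleport.get(v, 0.0) for v in nodes}
        dangling = 0.0
        for u in nodes:
            succ = out_links[u]
            if succ:
                share = (1 - alpha) * r[u] / len(succ)
                for v in succ:
                    nxt[v] += share
            else:
                dangling += r[u]
        # Mass stuck at nodes without out-links restarts from the teleport set.
        for v in nodes:
            nxt[v] += (1 - alpha) * dangling * teleport.get(v, 0.0)
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r
```

Each iteration touches every edge. On graphs with millions of edges, repeating this per query is what rules out interactive response times, as the abstract notes.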

Patent
28 Feb 2011
TL;DR: In this article, the authors present an approach for creating, evolving and using a weighted semantic graph to manage and potentially identify certain information assets within an enterprise by monitoring users navigating through search results which provide a set of information assets responsive to a search query.
Abstract: Embodiments of the invention provide an approach for creating, evolving and using a weighted semantic graph to manage and potentially identify certain information assets within an enterprise. The semantic graph may be generated by monitoring users navigating through search results which provide a set of information assets responsive to a search query. By recording the navigation path taken by many users, relationships between information assets may be identified. Further, once generated, the semantic graph may be used to present users with an indication of related information assets as part of the search results. Further still, the semantic graph may also be used to identify information asset “hubs” as well as information assets that may provide low utility to individuals within the enterprise.
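A minimal sketch can make the navigation-driven graph more concrete. It assumes that each recorded session is the ordered list of assets one user opened from a result list. Each consecutive pair of opened assets adds 1 to the weight of an undirected edge between them. The weighting rule, the top-k "related assets" lookup, the weighted-degree threshold for hubs, and the "never co-visited" definition of low utility are all illustrative assumptions. The patent does not specify them.

```python
from collections import defaultdict


def build_semantic_graph(sessions):
    """Each session is the ordered list of assets a user opened from one result
    list; every consecutive pair of distinct assets strengthens their edge by 1."""
    weights = defaultdict(int)
    for path in sessions:
        for a, b in zip(path, path[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1
    return dict(weights)


def related_assets(weights, asset, k=3):
    """Top-k neighbours of an asset by edge weight (ties broken by name)."""
    scored = []
    for edge, w in weights.items():
        if asset in edge:
            (other,) = edge - {asset}
            scored.append((w, other))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [other for _, other in scored[:k]]


def hubs_and_low_utility(weights, all_assets, hub_threshold):
    """Weighted degree per asset: assets at or above the threshold are hub
    candidates; assets never co-visited with another asset are low-utility candidates."""
    degree = {a: 0 for a in all_assets}
    for edge, w in weights.items():
        for a in edge:
            degree[a] = degree.get(a, 0) + w
    hubs = sorted(a for a, d in degree.items() if d >= hub_threshold)
    low = sorted(a for a, d in degree.items() if d == 0)
    return hubs, low
```

Rebuilding the graph from a fresh window of sessions, or adding new sessions to the existing weights, is one simple way the graph could evolve as usage changes.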