Showing papers on "Graph database" published in 2016

Open Access

Posted Content•

Semi-Supervised Classification with Graph Convolutional Networks

Thomas Kipf¹, Max Welling¹•Institutions (1)

09 Sep 2016-arXiv: Learning

TL;DR: A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operates directly on graphs and outperforms related methods by a significant margin.

Abstract: We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
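
The layer-wise propagation that the abstract describes can be sketched as below. This is a minimal illustration, assuming the renormalized rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) from the paper; the toy graph, features, and weights are made up for the example. Dense pure-Python lists are used for readability, so the sketch does not reproduce the linear-in-edges scaling that a sparse implementation would give.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    z = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]

# Toy path graph 0 - 1 - 2 with two input features per node (all values hypothetical).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]
hidden = gcn_layer(adj, features, weights)
```

Each row of `hidden` mixes a node's own features with those of its direct neighbours, which is the sense in which the learned representation encodes both local graph structure and node features.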

15,696 citations

Book Chapter•DOI•

Semantic Object Parsing with Graph LSTM

Xiaodan Liang¹, Xiaohui Shen², Jiashi Feng³, Liang Lin¹, Shuicheng Yan³•Institutions (3)

Sun Yat-sen University¹, Adobe Systems², National University of Singapore³

08 Oct 2016

TL;DR: This paper proposes the Graph Long Short-Term Memory (Graph LSTM) network, a generalization of LSTMs from sequential or multi-dimensional data to general graph-structured data.

Abstract: By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data. Particularly, instead of evenly and fixedly dividing an image into pixels or patches as in existing multi-dimensional LSTM structures (e.g., Row, Grid and Diagonal LSTMs), we take each arbitrary-shaped superpixel as a semantically consistent node, and adaptively construct an undirected graph for each image, where the spatial relations of the superpixels are naturally used as edges. Constructed on such an adaptive graph topology, the Graph LSTM is more naturally aligned with the visual patterns in the image (e.g., object boundaries or appearance similarities) and provides a more economical information propagation route. Furthermore, for each optimization step over Graph LSTM, we propose to use a confidence-driven scheme to update the hidden and memory states of nodes progressively till all nodes are updated. In addition, for each node, the forget gates are adaptively learned to capture different degrees of semantic correlation with neighboring nodes. Comprehensive evaluations on four diverse semantic object parsing datasets demonstrate the significant superiority of our Graph LSTM over other state-of-the-art solutions.

312 citations

Proceedings Article•DOI•

Graphicionado: a high-performance and energy-efficient accelerator for graph analytics

Tae Jun Ham¹, Lisa Wu², Narayanan Sundaram³, Nadathur Satish³, Margaret Martonosi¹•Institutions (3)

Princeton University¹, University of California, Berkeley², Intel³

15 Oct 2016

TL;DR: Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks, for high-performance, energy-efficient processing of graph analytics workloads.

Abstract: Graphs are one of the key data structures for many real-world computing applications and the importance of graph analytics is ever-growing. While existing software graph processing frameworks improve programmability of graph analytics, underlying general purpose processors still limit the performance and energy efficiency of graph analytics. We architect a domain-specific accelerator, Graphicionado, for high-performance, energy-efficient processing of graph analytics workloads. For efficient graph analytics processing, Graphicionado exploits not only data structure-centric datapath specialization, but also memory subsystem specialization, all the while taking advantage of the parallelism inherent in this domain. Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks. This paper describes Graphicionado pipeline design choices in detail and gives insights on how Graphicionado combats application execution inefficiencies on general-purpose CPUs. Our results show that Graphicionado achieves a 1.76x to 6.54x speedup while consuming 50x to 100x less energy compared to a state-of-the-art software graph analytics processing framework executing 32 threads on a 16-core Haswell Xeon processor.

255 citations

Posted Content•

Foundations of Modern Query Languages for Graph Databases

Renzo Angles¹, Marcelo Arenas², Pablo Barceló³, Aidan Hogan³, Juan L. Reutter², Domagoj Vrgoč²•Institutions (3)

University of Talca¹, Pontifical Catholic University of Chile², University of Chile³

20 Oct 2016-arXiv: Databases

TL;DR: The importance of formalisation for graph query languages is discussed, with a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.

Abstract: We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges; and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed, and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.
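
A navigational expression of the kind the abstract mentions can be evaluated with a simple graph search. The sketch below is an illustration under stated assumptions: the graph data, the labels `knows` and `worksAt`, and the function names are all hypothetical, and it uses plain set-based reachability, which is only one of the evaluation semantics the survey discusses. It evaluates the path expression `knows+` (one or more `knows` edges) and then composes it with a single `worksAt` step.

```python
from collections import deque

# Edge-labelled graph as (source, label, target) triples; the data is hypothetical.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("carol", "worksAt", "acme"),
]

def reachable(edges, start, label):
    """Nodes reachable from `start` via one or more `label` edges (the expression label+)."""
    out = {}
    for src, lab, dst in edges:
        if lab == label:
            out.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in out.get(node, []):
            if nxt not in seen:  # visiting each node once keeps cycles from looping forever
                seen.add(nxt)
                queue.append(nxt)
    return seen

def knows_then_works(edges, start):
    """Evaluate knows+/worksAt: follow knows one or more times, then one worksAt edge."""
    people = reachable(edges, start, "knows")
    return {dst for src, lab, dst in edges if lab == "worksAt" and src in people}
```

Because the search records visited nodes rather than enumerating paths, it terminates on cyclic data such as the `alice`, `bob`, `carol` loop above. Semantics that count or return individual paths behave differently and can be much more expensive.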

213 citations

Proceedings Article•

Text-enhanced representation learning for knowledge graph

Zhigang Wang¹, Juanzi Li¹•Institutions (1)

Tsinghua University¹

09 Jul 2016

TL;DR: The rich textual context information in a text corpus is incorporated to expand the semantic structure of the knowledge graph, and each relation is enabled to own different representations for different head and tail entities to better handle 1-to-N, N-to-1 and N-to-N relations.

Abstract: Learning the representations of a knowledge graph has attracted significant research interest in the field of intelligent Web. By regarding each relation as one translation from head entity to tail entity, translation-based methods including TransE, TransH and TransR are simple and effective, and achieve state-of-the-art performance. However, they still suffer from the following issues: (i) low performance when modeling 1-to-N, N-to-1 and N-to-N relations; (ii) limited performance due to the structure sparseness of the knowledge graph. In this paper, we propose a novel knowledge graph representation learning method by taking advantage of the rich context information in a text corpus. The rich textual context information is incorporated to expand the semantic structure of the knowledge graph and each relation is enabled to own different representations for different head and tail entities to better handle 1-to-N, N-to-1 and N-to-N relations. Experiments on multiple benchmark datasets show that our proposed method successfully addresses the above issues and significantly outperforms the state-of-the-art methods.
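
The translation idea behind the baseline methods named in the abstract can be illustrated with the TransE score, which treats a relation as a vector that should move the head embedding onto the tail embedding. The sketch below shows only this baseline scoring, not the text-enhanced method the paper proposes; the entity names and 2-d embeddings are hypothetical and chosen by hand.

```python
import math

def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; smaller means the triple is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical 2-d embeddings, chosen by hand for illustration.
entity = {
    "paris": [1.0, 1.0],
    "france": [2.0, 3.0],
    "berlin": [4.0, 0.0],
}
relation = {"capital_of": [1.0, 2.0]}

# A triple that fits the translation exactly, and one that does not.
good = transe_score(entity["paris"], relation["capital_of"], entity["france"])
bad = transe_score(entity["paris"], relation["capital_of"], entity["berlin"])
```

With a single vector per relation, every valid tail of a 1-to-N relation is pulled toward the same point `h + r`, which is one way to see the first limitation the abstract lists.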

191 citations

Proceedings Article•DOI•

Speedup Graph Processing by Graph Ordering

Hao Wei¹, Jeffrey Xu Yu¹, Can Lu¹, Xuemin Lin²•Institutions (2)

The Chinese University of Hong Kong¹, University of New South Wales²

26 Jun 2016

TL;DR: A general approach that speeds up graph computing by reducing the CPU cache miss ratio across different graph algorithms via graph ordering, together with a new algorithm that lowers the time complexity of the basic ordering algorithm using optimization techniques based on a new data structure.

Abstract: CPU cache performance is one of the key issues for efficiency in database systems. It is reported that cache miss latency accounts for half of the execution time in database systems. To improve CPU cache performance, there are studies that support searching, including cache-oblivious and cache-conscious trees. In this paper, we focus on CPU speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms. The approaches dealing with trees are not applicable to graphs, which are complex in nature. In this paper, we explore a general approach to speed up CPU computing, in order to further enhance the efficiency of the graph algorithms without changing the graph algorithms (implementations) and the data structures used. That is, we aim at designing a general solution that is not for a specific graph algorithm, nor for a specific data structure. The approach studied in this work is graph ordering, which is to find the optimal permutation among all nodes in a given graph by keeping nodes that will be frequently accessed together locally, to minimize the CPU cache miss ratio. We prove the graph ordering problem is NP-hard, and give a basic algorithm with a bounded approximation. To improve the time complexity of the basic algorithm, we further propose a new algorithm to reduce the time complexity and improve the efficiency with new optimization techniques based on a new data structure. We conducted extensive experiments to evaluate our approach in comparison with nine other possible graph orderings (such as the one obtained by METIS) using 8 large real graphs and 9 representative graph algorithms. We confirm that our approach can achieve high performance by reducing the CPU cache miss ratios.

131 citations

Journal Article•DOI•

Querying Graphs with Data

Leonid Libkin¹, Wim Martens², Domagoj Vrgoč•Institutions (2)

University of Edinburgh¹, University of Bayreuth²

20 Mar 2016-Journal of the ACM

TL;DR: A family of languages that combine data and topology querying for graph databases is presented, and it is shown to include efficient and highly expressive formalisms for querying both the structure of the data and the data itself.

Abstract: Graph databases have received much attention as of late due to numerous applications in which data is naturally viewed as a graph; these include social networks, RDF and the Semantic Web, biological databases, and many others. There are many proposals for query languages for graph databases that mainly fall into two categories. One views graphs as a particular kind of relational data and uses traditional relational mechanisms for querying. The other concentrates on querying the topology of the graph. These approaches, however, lack the ability to combine data and topology, which would allow queries asking how data changes along paths and patterns enveloping it. In this article, we present a comprehensive study of languages that enable such combination of data and topology querying. These languages come in two flavors. The first follows the standard approach of path queries, which specify how labels of edges change along a path, but now we extend them with ways of specifying how both labels and data change. From the complexity point of view, the right type of formalisms are subclasses of register automata. These, however, are not well suited for querying. To overcome this, we develop several types of extended regular expressions to specify paths with data and study their querying power and complexity. The second approach adopts the popular XML language XPath and extends it from XML documents to graphs. Depending on the exact set of allowed features, we have a family of languages, and our study shows that it includes efficient and highly expressive formalisms for querying both the structure of the data and the data itself.

101 citations

Journal Article•DOI•

Representing and querying disease networks using graph databases

Artem Lysenko¹, Irina A. Roznovăţ², Mansoor Saqi², Alexander Mazein², Christopher J. Rawlings¹, Charles Auffray²•Institutions (2)

Rothamsted Research¹, Institute for Systems Biology²

25 Jul 2016-Biodata Mining

TL;DR: This study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.

Abstract: Systems biology experiments generate large volumes of data of multiple modalities and this information presents a challenge for integration due to a mix of complexity together with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and envisioning of biological data. We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database for building and querying a prototype network to provide biological context to asthma related genes. Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.

90 citations

Proceedings Article•DOI•

GraphFrames: an integrated API for mixing graph and relational queries

Ankur Dave¹, Alekh Jindal², Li Erran Li³, Reynold Xin, Joseph E. Gonzalez¹, Matei Zaharia⁴•Institutions (4)

University of California, Berkeley¹, Microsoft², Uber³, Massachusetts Institute of Technology⁴

24 Jun 2016

TL;DR: GraphFrames is presented, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them, while enabling optimizations across workflow steps that cannot occur in current systems.

Abstract: Graph data is prevalent in many domains, but it has usually required specialized engines to analyze. This design is onerous for users and precludes optimization across complete workflows. We present GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them. GraphFrames generalize the ideas in previous graph-on-RDBMS systems, such as GraphX and Vertexica, by letting the system materialize multiple views of the graph (not just the specific triplet views in these systems) and executing both iterative algorithms and pattern matching using joins. To make applications easy to write, GraphFrames provide a concise, declarative API based on the "data frame" concept in R that can be used for both interactive queries and standalone programs. Under this API, GraphFrames use a graph-aware join optimization algorithm across the whole computation that can select from the available views. We implement GraphFrames over Spark SQL, enabling parallel execution on Spark and integration with custom code. We find that GraphFrames make it easy to express end-to-end workflows and match or exceed the performance of standalone tools, while enabling optimizations across workflow steps that cannot occur in current systems. In addition, we show that GraphFrames' view abstraction makes it easy to further speed up interactive queries by registering the appropriate view, and that the combination of graph and relational data allows for other optimizations, such as attribute-aware partitioning.

90 citations

Proceedings Article•DOI•

Graph Stream Summarization: From Big Bang to Big Crunch

Nan Tang¹, Qing Chen¹, Prasenjit Mitra¹•Institutions (1)

Qatar Computing Research Institute¹

14 Jun 2016

TL;DR: TCM is presented, a novel generalized graph stream summary that can effectively and efficiently support analytics over graph streams, which demonstrates its potential to start a new line of research and applications in graph stream management.

Abstract: A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash- or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel generalized graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams, which demonstrates its potential to start a new line of research and applications in graph stream management.
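
The idea of a summary that keeps graph structure, rather than treating each stream element independently, can be sketched as a small matrix of hashed node buckets. This is an illustration in the spirit of the abstract, not the authors' implementation: the class name, the CRC-based hash functions, and the choice to combine several hash functions by taking the minimum are assumptions made for the example.

```python
import zlib

class GraphSketch:
    """Graph-shaped sketch: nodes hash to buckets, edges to cells of a bucket matrix."""

    def __init__(self, width, num_hashes):
        self.width = width
        self.seeds = list(range(num_hashes))
        # One width x width weight matrix per hash function.
        self.cells = [[[0] * width for _ in range(width)] for _ in self.seeds]

    def _bucket(self, seed, node):
        # Deterministic seeded hash (assumption: CRC32 is good enough for a demo).
        return zlib.crc32(f"{seed}:{node}".encode()) % self.width

    def add_edge(self, src, dst, weight=1):
        # Constant work per update: one cell per hash function.
        for i, seed in enumerate(self.seeds):
            self.cells[i][self._bucket(seed, src)][self._bucket(seed, dst)] += weight

    def edge_weight(self, src, dst):
        # Collisions only add weight, so the minimum is the tightest overestimate.
        return min(
            self.cells[i][self._bucket(seed, src)][self._bucket(seed, dst)]
            for i, seed in enumerate(self.seeds)
        )

    def out_weight(self, src):
        # Node query: total weight leaving the bucket that `src` hashes to.
        return min(
            sum(self.cells[i][self._bucket(seed, src)])
            for i, seed in enumerate(self.seeds)
        )
```

Space is fixed by `width` and `num_hashes` regardless of stream length, and with non-negative weights every answer is an upper bound on the true value. Because each matrix is itself a small weighted graph, structural queries such as reachability can in principle be run on it directly.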

86 citations

Proceedings Article•DOI•

Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking

Rong Quan, Junwei Han, Dingwen Zhang, Feiping Nie

27 Jun 2016

TL;DR: A novel two-stage co-segmentation framework is proposed: it introduces a weak background prior to establish a globally close-loop graph that represents the common object and union background separately, and then applies a novel graph optimized-flexible manifold ranking algorithm to flexibly optimize the graph connections and node labels and co-segment the common objects.

Abstract: Aiming at automatically discovering the common objects contained in a set of relevant images and segmenting them as foreground simultaneously, object co-segmentation has become an active research topic in recent years. Although a number of approaches have been proposed to address this problem, many of them are designed with misleading assumptions, unscalable priors, or low flexibility and thus still suffer from certain limitations, which reduce their capability in real-world scenarios. To alleviate these limitations, we propose a novel two-stage co-segmentation framework, which introduces the weak background prior to establish a globally close-loop graph to represent the common object and union background separately. Then a novel graph optimized-flexible manifold ranking algorithm is proposed to flexibly optimize the graph connection and node labels to co-segment the common objects. Experiments on three image datasets demonstrate that our method outperforms other state-of-the-art methods.

Proceedings Article•DOI•

Time-evolving graph processing at scale

Anand Padmanabha Iyer¹, Li Erran Li², Tathagata Das, Ion Stoica¹•Institutions (2)

University of California, Berkeley¹, Uber²

24 Jun 2016

TL;DR: GraphTau is introduced, a time-evolving graph processing framework built on top of Apache Spark, a widely used distributed dataflow system that achieves high performance and fault tolerant graph stream processing via a number of optimizations.

Abstract: Time-evolving graph-structured big data arises naturally in many application domains such as social networks and communication networks. However, existing graph processing systems lack support for efficient computations on dynamic graphs. In this paper, we represent most computations on time-evolving graphs into (1) a stream of consistent and resilient graph snapshots, and (2) a small set of operators that manipulate such streams of snapshots. We then introduce GraphTau, a time-evolving graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphTau quickly builds fault-tolerant graph snapshots as each small batch of new data arrives. GraphTau achieves high performance and fault tolerant graph stream processing via a number of optimizations. GraphTau also unifies data streaming and graph streaming processing. Our preliminary evaluations on two representative datasets show promising results. Besides performance benefit, GraphTau API relieves programmers from handling graph snapshot generation, windowing operators and sophisticated differential computation mechanisms.

Journal Article•DOI•

GraphJet: real-time content recommendations at twitter

Aneesh Sharma¹, Jerry Jiang¹, Praveen Bommannavar¹, Brian Larson¹, Jimmy Lin²•Institutions (2)

Twitter¹, University of Waterloo²

01 Sep 2016

TL;DR: This paper presents GraphJet, an in-memory graph processing engine that maintains a real-time bipartite interaction graph between users and tweets and organizes the interaction graph into temporally-partitioned index segments that hold adjacency lists.

Abstract: This paper presents GraphJet, a new graph-based system for generating content recommendations at Twitter. As motivation, we trace the evolution of our formulation and approach to the graph recommendation problem, embodied in successive generations of systems. Two trends can be identified: supplementing batch with real-time processing and a broadening of the scope of recommendations from users to content. Both of these trends come together in GraphJet, an in-memory graph processing engine that maintains a real-time bipartite interaction graph between users and tweets. The storage engine implements a simple API, but one that is sufficiently expressive to support a range of recommendation algorithms based on random walks that we have refined over the years. Similar to Cassovary, a previous graph recommendation engine developed at Twitter, GraphJet assumes that the entire graph can be held in memory on a single server. The system organizes the interaction graph into temporally-partitioned index segments that hold adjacency lists. GraphJet is able to support rapid ingestion of edges while concurrently serving lookup queries through a combination of compact edge encoding and a dynamic memory allocation scheme that exploits power-law characteristics of the graph. Each GraphJet server ingests up to one million graph edges per second, and in steady state, computes up to 500 recommendations per second, which translates into several million edge read operations per second.
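
The organization into temporally-partitioned index segments can be sketched as a bounded queue of small adjacency-list indexes. This is a toy illustration only: the class name, the segment size limits, and the user-to-tweet-only indexing are assumptions for the example, and it omits the compact edge encoding, memory allocation scheme, and concurrency that the abstract describes.

```python
from collections import defaultdict, deque

class SegmentedBipartiteGraph:
    """User-to-tweet interaction graph split into time-ordered segments."""

    def __init__(self, max_edges_per_segment, max_segments):
        self.max_edges = max_edges_per_segment
        # deque(maxlen=...) discards the oldest segment when a new one is appended.
        self.segments = deque(maxlen=max_segments)
        self._new_segment()

    def _new_segment(self):
        self.segments.append({"count": 0, "user_to_tweets": defaultdict(list)})

    def add_edge(self, user, tweet):
        # Only the newest segment is written to; older ones are read-only.
        if self.segments[-1]["count"] >= self.max_edges:
            self._new_segment()
        segment = self.segments[-1]
        segment["user_to_tweets"][user].append(tweet)
        segment["count"] += 1

    def tweets_of(self, user):
        """Adjacency list for `user`, oldest segment first."""
        result = []
        for segment in self.segments:
            result.extend(segment["user_to_tweets"].get(user, []))
        return result
```

Partitioning by time means that expiring old interactions is a matter of dropping a whole segment rather than deleting individual edges, which keeps the in-memory graph bounded while new edges continue to arrive.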

Journal Article•DOI•

A distributed approach for graph mining in massive networks

Nilothpal Talukder¹, Mohammed J. Zaki¹•Institutions (1)

Rensselaer Polytechnic Institute¹

01 Sep 2016-Data Mining and Knowledge Discovery

TL;DR: This work proposes a novel distributed algorithm, DistGraph, which is the first approach demonstrated to scale to graphs with over a billion vertices and edges, and uses a set of optimizations and efficient collective communication operations to minimize information exchange.

Abstract: We propose a novel distributed algorithm for mining frequent subgraphs from a single, very large, labeled network. Our approach is the first distributed method to mine a massive input graph that is too large to fit in the memory of any individual compute node. The input graph thus has to be partitioned among the nodes, which can lead to potential false negatives. Furthermore, for scalable performance it is crucial to minimize the communication among the compute nodes. Our algorithm, DistGraph, ensures that there are no false negatives, and uses a set of optimizations and efficient collective communication operations to minimize information exchange. To our knowledge DistGraph is the first approach demonstrated to scale to graphs with over a billion vertices and edges. Scalability results on up to 2048 IBM Blue Gene/Q compute nodes, with 16 cores each, show very good speedup.

Book Chapter•DOI•

UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases

Gwendal Daniel¹, Gerson Sunyé¹, Jordi Cabot•Institutions (1)

French Institute for Research in Computer Science and Automation¹

14 Nov 2016

TL;DR: This article describes a mapping from UML/OCL conceptual schemas to Blueprints, an abstraction layer on top of a variety of graph databases, and Gremlin, a graph traversal language, via an intermediate Graph metamodel.

Abstract: The need to store and manipulate large volumes of (unstructured) data has led to the development of several NoSQL databases for better scalability. Graph databases are a particular kind of NoSQL databases that have proven their efficiency to store and query highly interconnected data, and have become a promising solution for multiple applications. While the mapping of conceptual schemas to relational databases is a well-studied field of research, there are only few solutions that target conceptual modeling for NoSQL databases and even fewer focusing on graph databases. This is especially true when dealing with the mapping of business rules and constraints in the conceptual schema. In this article we describe a mapping from UML/OCL conceptual schemas to Blueprints, an abstraction layer on top of a variety of graph databases, and Gremlin, a graph traversal language, via an intermediate Graph metamodel. Tool support is fully available.

Journal Article•DOI•

Region-Based Retrieval of Remote Sensing Images Using an Unsupervised Graph-Theoretic Approach

Bindita Chaudhuri¹, Begum Demir², Lorenzo Bruzzone², Subhasis Chaudhuri¹•Institutions (2)

Indian Institute of Technology Bombay¹, University of Trento²

02 Jun 2016-IEEE Geoscience and Remote Sensing Letters

TL;DR: Experiments carried out on an archive of aerial images point out that the proposed approach significantly improves the retrieval performance compared to the state-of-the-art unsupervised RS image retrieval methods.

Abstract: This letter introduces a novel unsupervised graph-theoretic approach in the framework of region-based retrieval of remote sensing (RS) images. The proposed approach is characterized by two main steps: 1) modeling each image by a graph, which provides region-based image representation combining both local information and related spatial organization, and 2) retrieving the images in the archive that are most similar to the query image by evaluating graph-based similarities. In the first step, each image is initially segmented into distinct regions and then modeled by an attributed relational graph, where nodes and edges represent region characteristics and their spatial relationships, respectively. In the second step, a novel inexact graph matching strategy, which jointly exploits a subgraph isomorphism algorithm and a spectral graph embedding technique, is applied to match corresponding graphs and to retrieve images in the order of graph similarity. Experiments carried out on an archive of aerial images point out that the proposed approach significantly improves the retrieval performance compared to the state-of-the-art unsupervised RS image retrieval methods.

Proceedings Article•DOI•

Fast top-k search in knowledge graphs

Shengqi Yang¹, Fangqiu Han¹, Yinghui Wu², Xifeng Yan¹•Institutions (2)

University of California, Santa Barbara¹, Washington State University²

16 May 2016

TL;DR: This work proposes STAR, a top-k knowledge graph search framework that has two components: a fast top-k algorithm for star queries, and an assembling algorithm for general graph queries that uses star query as a building block and iteratively sweeps the star match lists with a dynamically adjusted bound.

Abstract: Given a graph query Q posed on a knowledge graph G, top-k graph querying is to find k matches in G with the highest ranking score according to a ranking function. Fast top-k search in knowledge graphs is challenging as both graph traversal and similarity search are expensive. Conventional top-k graph search is typically based on threshold algorithm (TA), which can no longer fit the demand in the new setting. This work proposes STAR, a top-k knowledge graph search framework. It has two components: (a) a fast top-k algorithm for star queries, and (b) an assembling algorithm for general graph queries. The assembling algorithm uses star query as a building block and iteratively sweeps the star match lists with a dynamically adjusted bound. For top-k star graph query where an edge can be matched to a path with bounded length d, we develop a message passing algorithm, achieving time complexity O(d²|E| + md) and space complexity linear to d|V| (assuming the size of Q and k is bounded by a constant), where m is the maximum node degree in G. STAR can further be leveraged to answer general graph queries by decomposing a query to multiple star queries and joining their results later. Learning-based techniques to optimize query decomposition are also developed. We experimentally verify that STAR is 5–10 times faster than the state-of-the-art TA-style graph search algorithm, and 10–100 times faster than a belief propagation approach.

Journal Article•DOI•

Towards a biodiversity knowledge graph

Roderic D. M. Page

04 Jul 2016-Research Ideas and Outcomes

TL;DR: This article explores the "biodiversity knowledge graph" as a network of connected entities, such as taxa, taxonomic names, publications, people, species, sequences, images, and collections, and sketches a set of services and tools needed in order to construct the graph.

Abstract: One way to think about "core" biodiversity data is as a network of connected entities, such as taxa, taxonomic names, publications, people, species, sequences, images, and collections that form the "biodiversity knowledge graph". Many questions in biodiversity informatics can be framed as paths in this graph. This article explores this further, and sketches a set of services and tools we would need in order to construct the graph.

Proceedings Article•DOI•

Scalable Pattern Matching over Compressed Graphs via Dedensification

Antonio Maccioni¹, Daniel J. Abadi²•Institutions (2)

Roma Tre University¹, Yale University²

13 Aug 2016

TL;DR: This paper presents a dedensification technique that losslessly compresses the neighborhood around high-degree nodes, and introduces a query processing technique that enables direct operation of graph query processing operations over the compressed data, without ever having to decompress the data.

Abstract: One of the most common operations on graph databases is graph pattern matching (e.g., graph isomorphism and more general types of "subgraph pattern matching"). In fact, in some graph query languages every single query is expressed as a graph matching operation. Consequently, there has been a significant amount of research effort in optimizing graph matching operations in graph database systems. As graph databases have scaled in recent years, so too has recent work on scaling graph matching operations. However, the performance of recent proposals for scaling graph pattern matching is limited by the presence of high-degree nodes. These high-degree nodes result in an explosion of intermediate result sizes during query execution, and therefore significant performance bottlenecks. In this paper we present a dedensification technique that losslessly compresses the neighborhood around high-degree nodes. Furthermore, we introduce a query processing technique that enables direct operation of graph query processing operations over the compressed data, without ever having to decompress the data. For pattern matching operations, we show how this technique can be implemented as a layer above existing graph database systems, so that the end-user can benefit from this technique without requiring modifications to the core graph database engine code. Our technique reduces the size of the intermediate result sets during query processing, and thereby improves query performance.
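
The idea of losslessly compressing the neighborhood of high-degree nodes can be illustrated with a small sketch. This is a simplified reading of the abstract, not the paper's algorithm: it assumes a directed graph given as a set of `(src, dst)` pairs with string node ids, treats a node as high-degree when its in-degree reaches a hypothetical threshold `tau`, and routes edges from sources that share the same high-degree targets through one added "compressor" node. The names `dedensify` and `decompress` are illustrative.

```python
from collections import defaultdict

def dedensify(edges, tau):
    """Return (compressed_edges, compressors) for a set of (src, dst) pairs."""
    in_deg = defaultdict(int)
    for _, dst in edges:
        in_deg[dst] += 1
    high = {v for v, d in in_deg.items() if d >= tau}

    # For each source, the set of high-degree nodes it points to.
    targets = defaultdict(set)
    for src, dst in edges:
        if dst in high:
            targets[src].add(dst)

    # Group sources that point to exactly the same high-degree set.
    groups = defaultdict(set)
    for src, tset in targets.items():
        groups[frozenset(tset)].add(src)

    compressed = set(edges)
    compressors = {}
    for i, (tset, sources) in enumerate(groups.items()):
        # Only compress when |S| + |T| edges beat the original |S| * |T|.
        if len(sources) * len(tset) <= len(sources) + len(tset):
            continue
        c = ("compressor", i)  # hypothetical id for the added node
        compressors[c] = (frozenset(sources), tset)
        compressed -= {(s, t) for s in sources for t in tset}
        compressed |= {(s, c) for s in sources}
        compressed |= {(c, t) for t in tset}
    return compressed, compressors

def decompress(compressed, compressors):
    """Rebuild the original edge set, showing the compression is lossless."""
    edges = {(s, d) for s, d in compressed
             if s not in compressors and d not in compressors}
    for sources, tset in compressors.values():
        edges |= {(s, t) for s in sources for t in tset}
    return edges
```

With five sources that all point to the same three hubs, the 15 source-to-hub edges shrink to 8, which is the kind of reduction that keeps intermediate join results small.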

Book Chapter•DOI•

Querying Wikidata: Comparing SPARQL, Relational and Graph Databases

Daniel Hernández¹, Aidan Hogan¹, Cristian Riveros², Carlos del Valle Rojas², Enzo Zerega²•Institutions (2)

University of Chile¹, Pontifical Catholic University of Chile²

17 Oct 2016

TL;DR: This paper experimentally compares the efficiency of various database engines for the purposes of querying the Wikidata knowledge-base, which can be conceptualised as a directed edge-labelled graph where edges can be annotated with meta-information called qualifiers.

Abstract: In this paper, we experimentally compare the efficiency of various database engines for the purposes of querying the Wikidata knowledge-base, which can be conceptualised as a directed edge-labelled graph where edges can be annotated with meta-information called qualifiers. We take two popular SPARQL databases (Virtuoso, Blazegraph), a popular relational database (PostgreSQL), and a popular graph database (Neo4J) for comparison and discuss various options as to how Wikidata can be represented in the models of each engine. We design a set of experiments to test the relative query performance of these representations in the context of their respective engines. We first execute a large set of atomic lookups to establish a baseline performance for each test setting, and subsequently perform experiments on instances of more complex graph patterns based on real-world examples. We conclude with a summary of the strengths and limitations of the engines observed.

Proceedings Article•DOI•

G-store: high-performance graph store for trillion-edge processing

Pradeep Kumar¹, H. Howie Huang¹•Institutions (1)

George Washington University¹

13 Nov 2016

TL;DR: G-Store runs different algorithms on trillion-edge graphs within tens of minutes, setting a new milestone for semi-external graph processing systems, and employs a novel slide-cache-rewind strategy to pipeline graph I/O and computing.

Abstract: High-performance graph processing brings great benefits to a wide range of scientific applications, e.g., biology networks, recommendation systems, and social networks, where such graphs have grown to terabytes of data with billions of vertices and trillions of edges. Subsequently, storage performance plays a critical role in designing a high-performance computer system for graph analytics. In this paper, we present G-Store, a new graph store that incorporates three techniques to accelerate the I/O and computation of graph algorithms. First, G-Store develops a space-efficient tile format for graph data, which takes advantage of the symmetry present in graphs as well as a new smallest-number-of-bits representation. Second, G-Store utilizes tile-based physical grouping on disks so that multi-core CPUs can achieve high cache and memory performance and fully utilize the throughput from an array of solid-state disks. Third, G-Store employs a novel slide-cache-rewind strategy to pipeline graph I/O and computing. With a modest amount of memory, G-Store utilizes a proactive caching strategy in the system so that all fetched graph data are fully utilized before being evicted from memory. We evaluate G-Store on a number of graphs against two state-of-the-art graph engines and show that G-Store achieves 2 to 8× saving in storage and outperforms both by 2 to 32×. G-Store is able to run different algorithms on trillion-edge graphs within tens of minutes, setting a new milestone for semi-external graph processing systems.

Book Chapter•DOI•

Big Data Storage

Martin Strohbach, Jörg Daubert, Herman Ravkin¹, Mario Lischka•Institutions (1)

Tel Aviv University¹

01 Jan 2016

TL;DR: This chapter provides a concise overview of big data storage systems that are capable of dealing with high velocity, high volumes, and high varieties of data and investigates the challenge of storing data in a secure and privacy-preserving way.

Abstract: This chapter provides an overview of big data storage technologies. It is the result of a survey of the current state of the art in data storage technologies in order to create a cross-sectorial technology roadmap. This chapter provides a concise overview of big data storage systems that are capable of dealing with high velocity, high volumes, and high varieties of data. It describes distributed file systems, NoSQL databases, graph databases, and NewSQL databases. The chapter investigates the challenge of storing data in a secure and privacy-preserving way. The social and economic impact of big data storage technologies is described, open research challenges highlighted, and three selected case studies are provided from the health, finance, and energy sector. Some of the key insights on big data storage are (1) in-memory databases and columnar databases typically outperform traditional relational database systems, (2) the major technical barrier to widespread up-take of big data storage solutions are missing standards, and (3) there is a need to address open research challenges related to the scalability and performance of graph databases.

Journal Article•DOI•

Weaver: a high-performance, transactional graph database based on refinable timestamps

Ayush Dubey¹, Greg D. Hill², Robert Escriva¹, Emin Gün Sirer¹•Institutions (2)

Cornell University¹, Stanford University²

01 Jul 2016

TL;DR: A new distributed graph database, called Weaver, is introduced, which enables efficient, transactional graph analyses as well as strictly serializable ACID transactions on dynamic graphs, and a novel request ordering mechanism called refinable timestamps.

Abstract: Graph databases have become a common infrastructure component. Yet existing systems either operate on offline snapshots, provide weak consistency guarantees, or use expensive concurrency control techniques that limit performance. In this paper, we introduce a new distributed graph database, called Weaver, which enables efficient, transactional graph analyses as well as strictly serializable ACID transactions on dynamic graphs. The key insight that allows Weaver to combine strict serializability with horizontal scalability and high performance is a novel request ordering mechanism called refinable timestamps. This technique couples coarse-grained vector timestamps with a fine-grained timeline oracle to pay the overhead of strong consistency only when needed. Experiments show that Weaver enables a Bitcoin blockchain explorer that is 8x faster than Blockchain.info, and achieves 10.9x higher throughput than the Titan graph database on social network workloads and 4x lower latency than GraphLab on offline graph traversal workloads.
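
The two-level ordering described in the abstract (cheap vector timestamps first, an oracle only when they cannot decide) can be sketched as follows. This is an illustration under stated assumptions, not Weaver's implementation or API: timestamps are equal-length tuples of integers, and the hypothetical `TimelineOracle` simply fixes an order the first time it is asked and repeats that decision afterwards. A real oracle would also have to keep its decisions mutually consistent across many transactions, which this toy version does not attempt.

```python
def compare(a, b):
    """Compare two vector timestamps.

    Returns -1 if a happens before b, 1 if after, 0 if equal,
    and None if the two are concurrent (neither dominates).
    """
    le = all(x <= y for x, y in zip(a, b))
    ge = all(x >= y for x, y in zip(a, b))
    if le and ge:
        return 0
    if le:
        return -1
    if ge:
        return 1
    return None

class TimelineOracle:
    """Toy stand-in for a fine-grained ordering service (hypothetical)."""

    def __init__(self):
        self.decided = {}
        self.calls = 0

    def order(self, a, b):
        self.calls += 1
        if (a, b) not in self.decided:
            # Arbitrary but sticky choice: whoever is asked about first goes first.
            self.decided[(a, b)] = -1
            self.decided[(b, a)] = 1
        return self.decided[(a, b)]

def refine(a, b, oracle):
    """Order two requests, consulting the oracle only for concurrent timestamps."""
    result = compare(a, b)
    return result if result is not None else oracle.order(a, b)
```

Counting `oracle.calls` in a workload shows how often the expensive path is taken, which is the cost the abstract says is paid "only when needed".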

Proceedings Article•DOI•

The CloudMdsQL Multistore System

Boyan Kolev¹, Carlyna Bondiombouy¹, Patrick Valduriez¹, Ricardo Jiménez-Peris, Raquel Pau, José Pereira•Institutions (1)

French Institute for Research in Computer Science and Automation¹

26 Jun 2016

TL;DR: This demonstration presents a Cloud Multidatastore Query Language (CloudMdsQL), and its query engine, a functional SQL-like language capable of querying multiple heterogeneous data stores within a single query that may contain embedded invocations to each data store's native query interface.

Abstract: The blooming of different cloud data management infrastructures has turned multistore systems into a major topic in today's cloud landscape. In this demonstration, we present a Cloud Multidatastore Query Language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store's native query interface. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized. Within our demonstration, we focus on two use cases, each involving four diverse data stores (graph, document, relational, and key-value) with their corresponding CloudMdsQL queries. The query execution flows are visualized by an embedded real-time monitoring subsystem. The users can also try out different ad-hoc queries, not necessarily in the context of the use cases.

Journal Article•DOI•

Big Graph Mining: Frameworks and Techniques

Sabeur Aridhi¹, Engelbert Mephu Nguifo²,³•Institutions (3)

Aalto University¹, Blaise Pascal University², Centre national de la recherche scientifique³

09 Feb 2016-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper gives an overview of existing data mining and graph processing frameworks that deal with very big graphs, surveys current research on data mining and pattern mining in big graphs, and discusses the main research issues in this field.

Abstract: Big graph mining is an important research area and it has attracted considerable attention. It makes it possible to process, analyze, and extract meaningful information from large amounts of graph data. Big graph mining has been highly motivated not only by the tremendously increasing size of graphs but also by its huge number of applications. Such applications include bioinformatics, chemoinformatics and social networks. One of the most challenging tasks in big graph mining is pattern mining in big graphs. This task consists of using data mining algorithms to discover interesting, unexpected and useful patterns in large amounts of graph data. It also aims to provide a deeper understanding of graph data. In this context, several graph processing frameworks and scaling data mining/pattern mining techniques have been proposed to deal with very big graphs. This paper gives an overview of existing data mining and graph processing frameworks that deal with very big graphs. Then it presents a survey of current research in the field of data mining / pattern mining in big graphs and discusses the main research issues related to this field. It also gives a categorization of both distributed data mining and machine learning techniques, graph processing frameworks and large scale pattern mining approaches.

Journal Article•DOI•

A general-purpose query-centric framework for querying big graphs

Da Yan¹, James Cheng¹, M. Tamer Özsu², Fan Yang¹, Yi Lu¹, John C. S. Lui¹, Qizhen Zhang¹, Wilfred Ng³•Institutions (3)

The Chinese University of Hong Kong¹, University of Waterloo², Hong Kong University of Science and Technology³

01 Mar 2016

TL;DR: This work develops a new open-source system, called Quegel, for querying big graphs, which treats queries as first-class citizens in its design and provides a convenient interface for constructing graph indexes, which significantly improve query performance but are not supported by existing graph-parallel systems.

Abstract: Pioneered by Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems employ a user-friendly "think like a vertex" programming model, and exhibit good scalability for tasks where the majority of graph vertices participate in computation. However, the design of these systems can seriously under-utilize the resources in a cluster for processing light-workload graph queries, where only a small fraction of vertices need to be accessed. In this work, we develop a new open-source system, called Quegel, for querying big graphs. Quegel treats queries as first-class citizens in its design: users only need to specify the Pregel-like algorithm for a generic query, and Quegel processes light-workload graph queries on demand, using a novel superstep-sharing execution model to effectively utilize the cluster resources. Quegel further provides a convenient interface for constructing graph indexes, which significantly improve query performance but are not supported by existing graph-parallel systems. Our experiments verified that Quegel is highly efficient in answering various types of graph queries and is up to orders of magnitude faster than existing systems.

Journal Article•DOI•

Big graph search: challenges and techniques

Shuai Ma¹, Jia Li¹, Chunming Hu¹, Xuelian Lin¹, Jinpeng Huai¹•Institutions (1)

Beihang University¹

01 Jun 2016-Frontiers of Computer Science

TL;DR: In this article, the authors argue that big graph search is the new search paradigm that social computing applications call for, given that graphs are more expressive than traditional relational and XML models, and they analyze graph search from an evolutionary point of view, supported by evidence from both industry and academia.

Abstract: On one hand, compared with traditional relational and XML models, graphs have more expressive power and are widely used today. On the other hand, various applications of social computing trigger a pressing need for a new search paradigm. In this article, we argue that big graph search is the one filling this gap. We first introduce the application of graph search in various scenarios. We then formalize the graph search problem, and give an analysis of graph search from an evolutionary point of view, followed by evidence from both industry and academia. After that, we analyze the difficulties and challenges of big graph search. Finally, we present three classes of techniques towards big graph search: query techniques, data techniques and distributed computing techniques.

Posted Content•

Foundations of Modern Graph Query Languages.

Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan L. Reutter, Domagoj Vrgoč

20 Oct 2016

Author affiliations: Renzo Angles, Universidad de Talca & Center for Semantic Web Research; Marcelo Arenas, Pontificia Universidad Católica de Chile & Center for Semantic Web Research; Pablo Barceló, DCC, Universidad de Chile & Center for Semantic Web Research; Aidan Hogan, DCC, Universidad de Chile & Center for Semantic Web Research; Juan Reutter, Pontificia Universidad Católica de Chile & Center for Semantic Web Research; Domagoj Vrgoč, Pontificia Universidad Católica de Chile & Center for Semantic Web Research

Journal Article•DOI•

iGraph: an incremental data processing system for dynamic graph

Wuyang Ju¹, Jianxin Li¹, Weiren Yu¹, Richong Zhang¹•Institutions (1)

Beihang University¹

01 Jun 2016-Frontiers of Computer Science

TL;DR: This paper designs iGraph, an incremental graph processing system for dynamic graphs with continuous updates; experimental results show that on real-life datasets iGraph outperforms the original GraphX in graph update and graph computation.

Abstract: With the popularity of social networks, the demand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, so the overhead of maintaining and processing dynamic graphs is significantly high. In this paper, we design iGraph, an incremental graph processing system for dynamic graphs with continuous updates. The contributions of iGraph include: 1) a hash-based graph partition strategy to enable fine-grained graph updates; 2) a vertex-based graph computing model to support incremental data processing; 3) detection and rebalance methods for hotspots to address the workload imbalance problem during incremental processing. Through the general-purpose API, iGraph can be used to implement various graph processing algorithms such as PageRank. We have implemented iGraph on Apache Spark, and experimental results show that for real-life datasets, iGraph outperforms the original GraphX with respect to graph update and graph computation.
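
A hash-based partition strategy of the kind listed as the first contribution can be sketched in a few lines. This is a generic illustration, not iGraph's code and not a Spark or GraphX API: it assumes edges are assigned to a partition by a stable hash of the source vertex id, so a single edge insertion or deletion touches exactly one partition. The names `partition_of` and `PartitionedGraph` are hypothetical.

```python
import zlib

def partition_of(vertex_id, num_partitions):
    """Stable partition id; crc32 is used because Python's str hash varies per process."""
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_partitions

class PartitionedGraph:
    """Adjacency sets split across partitions by hash of the source vertex."""

    def __init__(self, num_partitions):
        self.parts = [dict() for _ in range(num_partitions)]

    def add_edge(self, src, dst):
        p = partition_of(src, len(self.parts))
        self.parts[p].setdefault(src, set()).add(dst)
        return p  # the only partition that had to change

    def remove_edge(self, src, dst):
        p = partition_of(src, len(self.parts))
        self.parts[p].get(src, set()).discard(dst)
        return p

    def neighbors(self, src):
        p = partition_of(src, len(self.parts))
        return set(self.parts[p].get(src, ()))
```

Because the partition is a pure function of the vertex id, an update can be routed without a lookup table; the trade-off is that popular vertices can overload one partition, which is the hotspot problem the third contribution addresses.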

Proceedings Article•DOI•

graphVizdb: A Scalable Platform for Interactive Large Graph Visualization

Nikos Bikakis¹, John Liagouris², Maria Krommyda¹, George Papastefanatos, Timos Sellis³•Institutions (3)

National Technical University of Athens¹, ETH Zurich², Swinburne University of Technology³

20 Feb 2016-arXiv: Human-Computer Interaction

TL;DR: This work presents a novel platform for the interactive visualization of very large graphs that involves an offline preprocessing phase that builds the layout of the graph by assigning coordinates to its nodes with respect to a Euclidean plane and translates user operations into simple and very efficient spatial operations in the backend.

Abstract: We present a novel platform for the interactive visualization of very large graphs. The platform enables the user to interact with the visualized graph in a way that is very similar to the exploration of maps at multiple levels. Our approach involves an offline preprocessing phase that builds the layout of the graph by assigning coordinates to its nodes with respect to a Euclidean plane. The respective points are indexed with a spatial data structure, i.e., an R-tree, and stored in a database. Multiple abstraction layers of the graph based on various criteria are also created offline, and they are indexed similarly so that the user can explore the dataset at different levels of granularity, depending on her particular needs. Then, our system translates user operations into simple and very efficient spatial operations (i.e., window queries) in the backend. This technique allows for a fine-grained access to very large graphs with extremely low latency and memory requirements and without compromising the functionality of the tool. Our web-based prototype supports three main operations: (1) interactive navigation, (2) multi-level exploration, and (3) keyword search on the graph metadata.
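
The translation of a pan or zoom into a window query over precomputed node coordinates can be sketched as below. The abstract indexes points with an R-tree stored in a database; this sketch swaps in a much simpler uniform grid held in memory, purely to show the query shape. The class name `GridIndex`, the cell size, and the method names are assumptions, not part of graphVizdb.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2D points (a simple stand-in for an R-tree)."""

    def __init__(self, cell=100.0):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, x, y):
        # Floor division keeps negative coordinates in the right cell.
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, node, x, y):
        self.cells[self._key(x, y)].append((node, x, y))

    def window(self, xmin, ymin, xmax, ymax):
        """Return nodes whose coordinates fall inside the viewport rectangle."""
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                # .get avoids creating empty cells in the defaultdict.
                for node, x, y in self.cells.get((kx, ky), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(node)
        return hits
```

Only the cells overlapping the viewport are visited, so the cost of a query tracks the size of the window rather than the size of the graph, which is the property the platform relies on for low latency.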
