
Showing papers on "Graph database published in 2019"


Posted Content
TL;DR: This work presents the first survey and taxonomy of graph database systems, identifying and analyzing fundamental categories of these systems, and outlines graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms).
Abstract: Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). 45 graph database systems are presented and compared, including Neo4j, OrientDB, and Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we describe research and engineering challenges to outline the future of graph databases.

70 citations


Journal ArticleDOI
01 Jul 2019
TL;DR: In this article, a new solution paradigm was proposed to find the densest subgraphs through a k-core (a kind of dense subgraph of a graph) with theoretical guarantees.
Abstract: Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D). Because DSD is difficult to solve, we propose a new solution paradigm in this paper. Our main observation is that the densest subgraph can be accurately found through a k-core (a kind of dense subgraph of G), with theoretical guarantees. Based on this intuition, we develop efficient exact and approximation solutions for DSD. Moreover, our solutions are able to find the densest subgraphs for a wide range of graph density definitions, including clique-based and general pattern-based density. We have performed extensive experimental evaluation on both real and synthetic datasets. Our results show that our algorithms are up to four orders of magnitude faster than existing approaches.
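The paper's key observation lends itself to a compact illustration. The sketch below is a stand-in for the paper's actual algorithms, using networkx: it computes the innermost k-core of a small graph and compares its edge density |E|/|V| to that of the whole graph; the maximum k-core is a well-known cheap approximation for this density measure.

```python
# A minimal sketch of the paper's key observation (not its algorithms):
# the innermost k-core is a cheap, provably good proxy for the densest
# subgraph under the |E|/|V| density measure. Requires networkx.
import networkx as nx

def density(g):
    # Edge density as defined in the abstract: number of edges over
    # number of vertices.
    return g.number_of_edges() / g.number_of_nodes()

G = nx.karate_club_graph()

k_max = max(nx.core_number(G).values())   # largest core number in G
core = nx.k_core(G, k=k_max)              # the innermost (maximum) k-core

print(f"whole graph density: {density(G):.3f}")
print(f"{k_max}-core density:      {density(core):.3f}")
```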

65 citations


Posted Content
TL;DR: TigerGraph's high-level query language, GSQL, is designed for compatibility with SQL, while simultaneously allowing NoSQL programmers to continue thinking in Bulk-Synchronous Processing (BSP) terms and reap the benefits of high- level specification.
Abstract: We present TigerGraph, a graph database system built from the ground up to support massively parallel computation of queries and analytics. TigerGraph's high-level query language, GSQL, is designed for compatibility with SQL, while simultaneously allowing NoSQL programmers to continue thinking in Bulk-Synchronous Processing (BSP) terms and reap the benefits of high-level specification. GSQL is sufficiently high-level to allow declarative SQL-style programming, yet sufficiently expressive to concisely specify the sophisticated iterative algorithms required by modern graph analytics and traditionally coded in general-purpose programming languages like C++ and Java. We report very strong scale-up and scale-out performance over a benchmark we published on GitHub for full reproducibility.

39 citations


Journal ArticleDOI
TL;DR: This article proposes a framework, namely Discovery Information using COmmunity detection (DICO), for identifying overlapped communities of authors from Big Scholarly Data by modeling authors’ interactions through a novel graph-based data model combining jointly document metadata with semantic information.
Abstract: The widespread use of Online Social Networks has also reached the scientific field, in which researchers interact with each other by publishing or citing papers. The huge amount of information about scientific research documents has been described through the term Big Scholarly Data. In this paper we propose a framework, namely Discovery Information using COmmunity detection (DICO), for identifying overlapped communities of authors from Big Scholarly Data by modeling authors' interactions through a novel graph-based data model that jointly combines document metadata with semantic information. In particular, DICO presents three distinctive characteristics: (i) the co-authorship network has been built from publication records using a novel approach for estimating the relationship weight between users; (ii) a new community detection algorithm based on Node Location Analysis has been developed to identify overlapped communities; (iii) some built-in queries are provided to browse the generated network, though any graph-traversal query can be implemented through the Cypher query language. An experimental evaluation has been carried out to assess the efficacy of the proposed community detection algorithm on benchmark networks. Finally, DICO has been tested on a real-world Big Scholarly Dataset to show its usefulness, working on the DBLP+AMiner dataset, which contains 1.7M+ distinct authors and 3M+ papers, handling 25M+ citation relationships.

35 citations


Journal ArticleDOI
TL;DR: A GPU-based Bees Swarm Optimization Miner (GBSO-Miner) is presented, in which the GPU is used as a co-processor to compute the CPU-time-intensive steps of the algorithm; the evaluation reveals that GBSO-Miner is up to 800 times faster than an optimized CPU implementation.

33 citations


Journal ArticleDOI
TL;DR: A software tool has been developed for supporting visual network analysis in a user-friendly way; providing several functionalities such as peptide retrieval and filtering, network construction and visualization, interactive exploration, and exporting data options.
Abstract: Motivation: Bioactive peptides have gained great attention in academia and the pharmaceutical industry since they play an important role in human health. However, the increasing number of bioactive peptide databases is causing the problem of data redundancy and duplicated efforts. Even worse is the fact that the available data are non-standardized and often dirty with data entry errors. Therefore, there is a need for a unified view that enables a more comprehensive analysis of the information on this topic residing at different sites. Results: After collecting web pages from a large variety of bioactive peptide databases, we organized the web content into an integrated graph database (starPepDB) that holds a total of 71,310 nodes and 348,505 relationships. In this graph structure, there are 45,120 nodes representing peptides, and the rest of the nodes are connected to peptides for describing metadata. Additionally, to facilitate a better understanding of the integrated data, a software tool (starPep toolbox) has been developed for supporting visual network analysis in a user-friendly way, providing several functionalities such as peptide retrieval and filtering, network construction and visualization, interactive exploration, and data export options. Availability and implementation: Both starPepDB and starPep toolbox are freely available at http://mobiosd-hub.com/starpep/. Supplementary information: Supplementary data are available at Bioinformatics online.
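As an illustration of the kind of peptide retrieval and filtering the toolbox supports, the following sketch queries a starPepDB-like graph through the official Neo4j Python driver. The label, relationship, and property names (Peptide, ANNOTATED_WITH, Function, name) are illustrative assumptions, not the published schema.

```python
# Hypothetical peptide retrieval from a starPepDB-like Neo4j graph.
# Labels/properties (Peptide, ANNOTATED_WITH, Function, name) are
# assumptions for illustration, not the database's documented schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

QUERY = """
MATCH (p:Peptide)-[:ANNOTATED_WITH]->(f:Function {name: $function})
RETURN p.name AS peptide
LIMIT 25
"""

with driver.session() as session:
    for record in session.run(QUERY, function="antimicrobial"):
        print(record["peptide"])

driver.close()
```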

33 citations


Proceedings ArticleDOI
08 Apr 2019
TL;DR: The results show that (1) the slow verification method in existing IFV algorithms can lead us to over-estimate the gain of filtering; and (2) the modified subgraph querying algorithms with efficient subgraph matching are competitive in time performance and can scale to hundreds of thousands of data graphs and graphs of thousands of vertices.
Abstract: A subgraph query finds all data graphs in a graph database each of which contains the given query graph. Existing work takes the indexing-filtering-verification (IFV) approach to first index all data graphs, then filter out some of them based on the index, and finally test subgraph isomorphism on each of the remaining data graphs. This final test of subgraph isomorphism is a sub-problem of subgraph matching, which finds all subgraph isomorphisms from a query graph to a data graph. As such, in this paper, we study whether, and if so, how to utilize efficient subgraph matching to improve subgraph query processing. Specifically, we modify leading subgraph matching algorithms and integrate them with top-performing subgraph querying algorithms. Our results show that (1) the slow verification method in existing IFV algorithms can lead us to over-estimate the gain of filtering; and (2) our modified subgraph querying algorithms with efficient subgraph matching are competitive in time performance and can scale to hundreds of thousands of data graphs and graphs of thousands of vertices.
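The filtering-then-verification flow studied here can be sketched in a few lines. The toy pipeline below is not one of the paper's algorithms: it filters data graphs by a cheap degree-based necessary condition, with networkx's VF2 matcher standing in for the efficient subgraph-matching algorithms the paper plugs in.

```python
# Toy indexing-filtering-verification (IFV) pipeline. The degree-based
# filter is a cheap necessary condition; networkx's VF2 matcher stands in
# for an efficient subgraph-matching algorithm in the verification step.
import networkx as nx
from networkx.algorithms import isomorphism

def subgraph_query(query, database):
    q_max_deg = max(d for _, d in query.degree())
    for gid, data in database.items():
        # Filtering: discard graphs that cannot possibly contain the query.
        if data.number_of_nodes() < query.number_of_nodes():
            continue
        if max(d for _, d in data.degree()) < q_max_deg:
            continue
        # Verification: (non-induced) subgraph test on the survivors.
        gm = isomorphism.GraphMatcher(data, query)
        if gm.subgraph_is_monomorphic():
            yield gid

db = {"g1": nx.cycle_graph(6),     # no triangle
      "g2": nx.path_graph(4),      # no triangle
      "g3": nx.complete_graph(5)}  # contains triangles
print(list(subgraph_query(nx.cycle_graph(3), db)))  # ['g3']
```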

30 citations


Journal ArticleDOI
Yijian Cheng, Pengjie Ding, Tongtong Wang, Wei Lu, Xiaoyong Du
TL;DR: RDBMSs outperform GDBMSs by a substantial margin under workloads that mainly consist of group-by, sort, and aggregation operations and their combinations, while GDBMSs show their superiority under workloads that mainly consist of multi-table join, pattern match, path identification, and their combinations.
Abstract: Over decades, relational database management systems (RDBMSs) have been the first choice to manage data. Recently, due to the variety property of big data, graph database management systems (GDBMSs) have emerged as an important complement to RDBMSs. As pointed out in the existing literature, both RDBMSs and GDBMSs are capable of managing graph data and relational data; however, the boundary between them still remains unclear. For this reason, in this paper, we first extend a unified benchmark for RDBMSs and GDBMSs over the same datasets using the same query workload under the same metrics. We then conduct extensive experiments to evaluate them and make the following findings: (1) RDBMSs outperform GDBMSs by a substantial margin under workloads that mainly consist of group-by, sort, and aggregation operations, and their combinations; (2) GDBMSs show their superiority under workloads that mainly consist of multi-table join, pattern match, path identification, and their combinations.
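The two workload classes in this finding are easy to exemplify. The hypothetical queries below (table, label, and property names are invented for illustration) contrast an aggregation-heavy SQL query of class (1) with a multi-hop Cypher pattern of class (2) that would require repeated self-joins in SQL.

```python
# Hypothetical queries illustrating the benchmark's two workload classes
# (schema names are assumptions, not the benchmark's actual datasets).

# Class (1): group-by / sort / aggregation -- where RDBMSs win.
SQL_AGGREGATION = """
SELECT country, COUNT(*) AS persons
FROM person
GROUP BY country
ORDER BY persons DESC;
"""

# Class (2): multi-hop pattern matching / path finding -- where GDBMSs
# win; the same variable-length traversal needs repeated self-joins in SQL.
CYPHER_PATTERN = """
MATCH (a:Person {name: $name})-[:KNOWS*1..3]->(b:Person)
RETURN DISTINCT b.name;
"""
```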

29 citations


Journal ArticleDOI
TL;DR: A novel tool, QAnalysis, is built, where doctors enter their analytic requirements in natural language and the tool returns charts and tables, providing a convenient way for doctors to get statistical results directly in natural language.
Abstract: While doctors need to analyze large amounts of electronic medical record (EMR) data to conduct clinical research, the analysis process requires information technology (IT) skills, which are difficult for most doctors in China. In this paper, we build a novel tool, QAnalysis, where doctors enter their analytic requirements in natural language and the tool returns charts and tables. For a given question from a user, we first segment the sentence, and then use a grammar parser to analyze its structure. After linking the segments to concepts and predicates in knowledge graphs, we convert the question into a set of triples connected with different kinds of operators. These triples are converted to queries in Cypher, the query language for Neo4j. Finally, the query is executed on Neo4j, and the results, shown in terms of tables and charts, are returned to the user. The tool supports the top 50 questions we gathered from two hospital departments with the Delphi method. We also gathered 161 questions from clinical research papers with statistical requirements on EMR data. Experimental results show that our tool can directly cover 78.20% of these statistical questions, with precision as high as 96.36%. Extension to uncovered questions is easy to achieve with the help of the knowledge-graph technology we have adopted. A recorded demo can be accessed from https://github.com/NLP-BigDataLab/QAnalysis-project . Our tool shows great flexibility in processing different kinds of statistical questions, providing a convenient way for doctors to get statistical results directly in natural language.
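The triple-to-Cypher step can be sketched as follows. The mapping below is a hypothetical illustration; the schema (Patient, DIAGNOSED_WITH, Disease) and the helper triple_to_cypher are invented, since the paper does not publish its exact mapping rules.

```python
# Hypothetical sketch of the final translation step: one extracted triple
# becomes one Cypher pattern. Schema and helper names are invented for
# illustration; the paper's actual mapping rules are not reproduced here.
def triple_to_cypher(subj_label, predicate, obj_label):
    return (
        f"MATCH (s:{subj_label})-[:{predicate}]->(o:{obj_label}) "
        f"WHERE o.name = $value "
        f"RETURN count(DISTINCT s) AS n"
    )

# "How many patients were diagnosed with diabetes?"
query = triple_to_cypher("Patient", "DIAGNOSED_WITH", "Disease")
print(query)  # execute with session.run(query, value="diabetes") on Neo4j
```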

29 citations


Proceedings ArticleDOI
26 May 2019
TL;DR: The Maven Dependency Graph as discussed by the authors is a dataset of 2.8M artifacts from the Maven Central Repository with metadata such as exact version, date of upload and list of dependencies towards other artifacts.
Abstract: The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (compiled pieces of code implemented in a JVM-based language), each of which is characterized by metadata such as exact version, date of upload and list of dependencies towards other artifacts. Today, one who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge data set; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open source data set provides two contributions: a snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; and an open source infrastructure to query this huge dataset.
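A query over such a graph might look like the following sketch, which assumes artifacts are modeled as (:Artifact) nodes connected by [:DEPENDS_ON] relationships; the actual labels in the released dataset may differ.

```python
# Hypothetical Cypher over the Maven Dependency Graph, assuming artifacts
# are (:Artifact) nodes linked by [:DEPENDS_ON] edges; the labels in the
# released dataset may differ.
TRANSITIVE_USERS = """
MATCH (dep:Artifact {groupId: $g, artifactId: $a, version: $v})
MATCH (user:Artifact)-[:DEPENDS_ON*1..3]->(dep)
RETURN DISTINCT user.groupId, user.artifactId
LIMIT 100
"""
# e.g. session.run(TRANSITIVE_USERS, g="junit", a="junit", v="4.12")
```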

29 citations


Proceedings Article
01 Jan 2019
TL;DR: GraphOne, as presented in this paper, is a graph data store that abstracts the data store away from specialized systems to solve the fundamental research problems associated with data store design; it combines two complementary graph storage formats (edge list and adjacency list) and uses dual versioning to decouple graph computations from updates.
Abstract: There is a growing need to perform a diverse set of real-time analytics (batch and stream analytics) on evolving graphs to deliver the value of big data to users. The key requirement from such applications is to have a data store that supports their diverse data access efficiently, while concurrently ingesting fine-grained updates at a high velocity. Unfortunately, current graph systems, whether graph databases or analytics engines, are not designed to achieve high performance for both operations; rather, each excels in one area by keeping a private data store organized in a specialized way that favors its own operations only. To address this challenge, we have designed and developed GraphOne, a graph data store that abstracts the data store away from the specialized systems in order to solve the fundamental research problems associated with data store design. It combines two complementary graph storage formats (edge list and adjacency list) and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, to enable data access at two different granularities of data ingestion (called data visibility) for concurrent execution of diverse classes of real-time graph analytics with only a small amount of data duplication. Experimental results show that GraphOne delivers 11.40× and 5.36× average speedup in ingestion rate against LLAMA and Stinger, the two state-of-the-art dynamic graph systems, respectively. Further, it achieves an average speedup of 8.75× and 4.14× against LLAMA and 12.80× and 3.18× against Stinger for BFS and PageRank analytics (batch version), respectively. GraphOne also gains over 2,000× speedup against Kickstarter, a state-of-the-art stream analytics engine, in ingesting streaming edges and performing streaming BFS on a synthetic graph when treating the first half as a base snapshot and the rest as streaming edges. GraphOne also achieves an ingestion rate two to three orders of magnitude higher than graph databases. Finally, we demonstrate that it is possible to run concurrent stream analytics from the same data store.
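The edge-list/adjacency-list split at the core of this design can be miniaturized as follows; this toy store is an illustration of the idea described in the abstract, not GraphOne's implementation.

```python
# Toy illustration of the dual-format idea: ingest into an edge log,
# periodically archive into an adjacency list, and let reads combine the
# stable snapshot with the fresh tail. Not GraphOne's implementation.
from collections import defaultdict

class TinyGraphStore:
    def __init__(self, archive_threshold=4):
        self.edge_log = []                  # fresh, non-archived updates
        self.adjacency = defaultdict(list)  # archived snapshot
        self.archive_threshold = archive_threshold

    def add_edge(self, u, v):
        self.edge_log.append((u, v))        # O(1) ingestion path
        if len(self.edge_log) >= self.archive_threshold:
            self.archive()

    def archive(self):
        for u, v in self.edge_log:          # compact the log
            self.adjacency[u].append(v)
        self.edge_log.clear()

    def neighbors(self, u):
        # Snapshot plus fresh tail -- roughly the data-visibility knob
        # that GraphOne's GraphView abstraction exposes.
        return self.adjacency[u] + [v for x, v in self.edge_log if x == u]

store = TinyGraphStore()
for e in [(1, 2), (1, 3), (2, 3), (3, 4), (1, 4)]:
    store.add_edge(*e)
print(store.neighbors(1))  # [2, 3, 4]
```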

Journal ArticleDOI
TL;DR: This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge.
Abstract: Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. It is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.
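A consumer of such a Linked Open Dataset would typically issue SPARQL. The sketch below uses the SPARQLWrapper library against a placeholder endpoint; the endpoint URL and the class IRI are assumptions for illustration, not terms documented here from OpenBiodiv-O.

```python
# Hypothetical SPARQL against an OpenBiodiv-style RDF endpoint. The
# endpoint URL and the class IRI are placeholders, not documented terms.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/openbiodiv/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name WHERE {
  ?taxon a <http://example.org/TaxonomicName> ;
         rdfs:label ?name .
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])
```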

Book ChapterDOI
04 Nov 2019
TL;DR: Apart from proposing a concise schema DDL inspired by Cypher syntax, this work shows how schema validation can be enforced through homomorphisms between PG schemas and PG instances, and how schema evolution can be described through graph rewriting operations.
Abstract: Despite the maturity of commercial graph databases, little consensus has been reached so far on the standardization of data definition languages (DDLs) for property graphs (PG). Discussion on the characteristics of PG schemas is ongoing in many standardization and community groups. Although some basic aspects of a schema are already present in most commercial graph databases, full support for constraining property graphs with more or less flexibility is still missing.

Posted Content
TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency, and for different graph updates as well as analytics workloads.
Abstract: Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis, computational sciences, and others. A growing amount of the associated graph processing workloads are dynamic, with millions of edges added or removed per second. Graph streaming frameworks are specifically crafted to enable the processing of such highly dynamic workloads. Recent years have seen the development of many such frameworks. However, they differ in their general architectures (with key details such as the support for the concurrent execution of graph updates and queries, or the incorporated graph data organization), the types of updates and workloads allowed, and many others. To facilitate the understanding of this growing field, we provide the first analysis and taxonomy of dynamic and streaming graph processing. We focus on identifying the fundamental system designs and on understanding their support for concurrency, and for different graph updates as well as analytics workloads. We also crystallize the meaning of different concepts associated with streaming graph processing, such as dynamic, temporal, online, and time-evolving graphs, edge-centric processing, models for the maintenance of updates, and graph databases. Moreover, we provide a bridge with the very rich landscape of graph streaming theory by giving a broad overview of recent theoretical related advances, and by discussing which graph streaming models and settings could be helpful in developing more powerful streaming frameworks and designs. We also outline graph streaming workloads and research challenges.

Posted Content
TL;DR: The main observation is that a densest subgraph can be accurately found through a k-core (a kind of dense subgraph of G), with theoretical guarantees, and efficient exact and approximation solutions for DSD are developed.
Abstract: Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D). Because DSD is difficult to solve, we propose a new solution paradigm in this paper. Our main observation is that a densest subgraph can be accurately found through a k-core (a kind of dense subgraph of G), with theoretical guarantees. Based on this intuition, we develop efficient exact and approximation solutions for DSD. Moreover, our solutions are able to find the densest subgraphs for a wide range of graph density definitions, including clique-based and general pattern-based density. We have performed extensive experimental evaluation on eleven real datasets. Our results show that our algorithms are up to four orders of magnitude faster than existing approaches.

Journal ArticleDOI
TL;DR: A semantic graph model is proposed which can not only represent the scheduling problem with extended constraints but also integrate the entire lifecycle data and inspire a simulation-based ant colony algorithm to acquire a feasible and nearly optimal schedule solution.

Proceedings ArticleDOI
08 Apr 2019
TL;DR: This paper develops two pruning techniques based on geometric properties of the maximal spatial clique to significantly enhance computing efficiency, and shows that maximal spatial clique enumeration can identify groups of spatially close objects in a variety of location-based-service (LBS) applications.
Abstract: Maximal clique enumeration is a fundamental problem in graph databases. In this paper, we investigate this problem in the context of spatial databases. Given a set P of spatial objects in a 2-dimensional space (e.g., geo-locations of users or points of interest) and a distance threshold r, we can build a spatial neighbourhood graph Pr by connecting every pair of objects (vertices) in P within distance r. Given a clique S of Pr, namely a spatial clique, it is immediate that every pairwise distance among objects in S is bounded by r. As the maximal pairwise distance has been widely used to capture the spatial cohesiveness of a group of objects, the maximal spatial clique enumeration technique can identify groups of spatially close objects in a variety of location-based-service (LBS) applications. In addition, we show that maximal spatial clique enumeration can also be used to identify maximal clique pattern instances in co-location pattern mining applications. Given the existing techniques for maximal clique enumeration, which can be immediately applied to the spatial neighbourhood graph Pr, two questions naturally arise for the enumeration of maximal spatial cliques: (1) maximal clique enumeration on general graphs is NP-hard; can we have a polynomial-time solution on the spatial neighbourhood graph? and (2) can we exploit the geometric properties of the spatial clique to speed up the computation? We give a negative answer to the first question by an example where the number of maximal spatial cliques is exponential in the number of objects. The answer to the second question, however, is positive: we develop two pruning techniques based on geometric properties of the maximal spatial clique to significantly enhance computing efficiency. Extensive experiments on real-life geolocation data demonstrate the superior performance of the proposed methods compared with two baseline algorithms.
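Before any pruning, the problem setup itself is simple to reproduce. The baseline sketch below builds the spatial neighbourhood graph Pr and hands it to a general-purpose maximal clique enumerator (networkx), which is exactly the starting point the paper's geometric pruning techniques improve upon.

```python
# Baseline setup without the paper's geometric pruning: build the spatial
# neighbourhood graph P_r, then enumerate maximal cliques generically.
from itertools import combinations
import math
import networkx as nx

def spatial_neighbourhood_graph(points, r):
    g = nx.Graph()
    g.add_nodes_from(range(len(points)))
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= r:   # within distance r
            g.add_edge(i, j)
    return g

points = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6)]
Pr = spatial_neighbourhood_graph(points, r=1.5)
print(list(nx.find_cliques(Pr)))   # maximal spatial cliques, e.g. [0, 1, 2]
```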

Journal ArticleDOI
TL;DR: In this paper, the authors present a literature review on the main applications of OLAP technology in the analysis of information network data, conducting a systematic review that lists the works applying OLAP technologies to graph data.
Abstract: Many real systems produce network data, or highly interconnected data, which can be called information networks. These information networks form a critical component of modern information infrastructure, constituting a large volume of graph data. The analysis of information network data covers several technological areas, among them OLAP technologies. OLAP is a technology that enables multi-dimensional and multi-level analysis of a large volume of data, providing aggregated data visualizations from different perspectives. This article presents a literature review on the main applications of OLAP technology in the analysis of information network data. To achieve this goal, it presents a systematic review listing the works that apply OLAP technologies to graph data. It defines seven comparison criteria (Materialization, Network, Selection, Aggregation, Model, OLAP Operations, Analytics) to qualify the works found based on their functionalities. The works are analyzed according to each criterion and discussed to identify trends and challenges in the application of OLAP to information networks.

Book ChapterDOI
26 Oct 2019
TL;DR: In this article, a procedure for evaluating SPARQL queries is proposed based on an existing worst-case optimal multiway join algorithm called Leapfrog Triejoin.
Abstract: Worst-case optimal multiway join algorithms have recently gained a lot of attention in the database literature. These algorithms not only offer strong theoretical guarantees of efficiency, but have also been empirically demonstrated to significantly improve query runtimes for relational and graph databases. Despite these promising theoretical and practical results, however, the Semantic Web community has yet to adopt such techniques; to the best of our knowledge, no native RDF database currently supports such join algorithms, where in this paper we demonstrate that this should change. We propose a novel procedure for evaluating SPARQL queries based on an existing worst-case join algorithm called Leapfrog Triejoin. We propose an adaptation of this algorithm for evaluating SPARQL queries, and implement it in Apache Jena. We then present experiments over the Berlin and WatDiv SPARQL benchmarks, and a novel benchmark that we propose based on Wikidata that is designed to provide insights into join performance for a more diverse set of basic graph patterns. Our results show that with this new join algorithm, Apache Jena often runs orders of magnitude faster than the base version and two other SPARQL engines: Virtuoso and Blazegraph.
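The heart of Leapfrog Triejoin is a worst-case optimal multi-way intersection of sorted iterators. The single-attribute sketch below shows that leapfrogging step over sorted Python lists (integer keys assumed; the full algorithm applies this per variable over trie levels and advances iterators rather than incrementing keys).

```python
# Single-attribute "leapfrog" intersection of k sorted lists -- the core
# step of Leapfrog Triejoin. Each iterator repeatedly leaps (via binary
# search) to the least upper bound of the current maximum key.
from bisect import bisect_left

def leapfrog_intersect(lists):
    if any(not lst for lst in lists):
        return []
    pos = [0] * len(lists)
    key = max(lst[0] for lst in lists)
    out = []
    while True:
        matched = 0
        for i, lst in enumerate(lists):
            pos[i] = bisect_left(lst, key, pos[i])  # leap to first >= key
            if pos[i] == len(lst):
                return out                          # an iterator ran dry
            if lst[pos[i]] == key:
                matched += 1
            else:
                key = lst[pos[i]]                   # new max; retry round
                break
        if matched == len(lists):
            out.append(key)                         # all iterators agree
            key += 1                                # integer keys assumed

print(leapfrog_intersect([[1, 3, 4, 7, 9], [1, 4, 8, 9], [0, 4, 9, 10]]))
# -> [4, 9]
```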

Journal ArticleDOI
TL;DR: In this paper, the authors present an algorithm for the fast computation of the general $N$-point spatial correlation functions of any discrete point set embedded within a Euclidean space $\mathbb{R}^n$.
Abstract: We present an algorithm for the fast computation of the general $N$-point spatial correlation functions of any discrete point set embedded within a Euclidean space $\mathbb{R}^n$. Utilizing the concepts of kd-trees and graph databases, we describe how to count all possible $N$-tuples in binned configurations within a given length scale, e.g., all pairs of points or all triplets of points with side lengths within that scale.
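For the pair-counting (N = 2) base case, the kd-tree part of the idea can be sketched with scipy; the binning below is illustrative, and the paper's contribution of assembling general N-tuples via a graph database is not reproduced here.

```python
# Pair-counting (N = 2) sketch of the kd-tree part of the method: count
# point pairs per separation bin. Binning choices are illustrative.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))      # random points in the unit cube

tree = cKDTree(points)
bins = np.linspace(0.0, 0.1, 11)      # 10 separation bins up to r = 0.1
cumulative = tree.count_neighbors(tree, bins)   # pairs with d <= r
print(np.diff(cumulative))            # pair counts per bin
```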

Book ChapterDOI
27 Aug 2019
TL;DR: This paper introduces GKC, a resolution prover optimized for search in large knowledge bases built upon a shared memory graph database Whitedb, enabling it to solve multiple different queries without a need to repeatedly parse or load the large parsed knowledge base from the disk.
Abstract: This paper introduces GKC, a resolution prover optimized for search in large knowledge bases. The system is built upon a shared memory graph database Whitedb, enabling it to solve multiple different queries without a need to repeatedly parse or load the large parsed knowledge base from the disk. Due to the relatively shallow and simple structure of most of the literals in the knowledge base, the indexing methods used are mostly hash-based. While GKC performs well on large problems from the TPTP set, the system is built for use as a core system for developing a toolset of commonsense reasoning functionalities.

Proceedings ArticleDOI
20 May 2019
TL;DR: RedisGraph is a Redis module developed by Redis Labs to add graph database functionality to the Redis database and is significantly faster than comparable graph databases.
Abstract: RedisGraph is a Redis module developed by Redis Labs to add graph database functionality to the Redis database. RedisGraph represents connected data as adjacency matrices. By representing the data as sparse matrices and employing the power of GraphBLAS (a highly optimized library for sparse matrix operations), RedisGraph delivers a fast and efficient way to store, manage and process graphs. Initial benchmarks indicate that RedisGraph is significantly faster than comparable graph databases.
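The linear-algebra formulation underlying GraphBLAS can be illustrated without Redis at all: with a sparse adjacency matrix, one BFS level is one matrix-vector product. The sketch below uses scipy.sparse as a stand-in for GraphBLAS.

```python
# BFS as repeated sparse matrix-vector products -- the GraphBLAS-style
# formulation RedisGraph builds on (scipy.sparse stands in here).
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
n = 5
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges), dtype=np.int8), (rows, cols)),
               shape=(n, n))          # adjacency matrix of the digraph

frontier = np.zeros(n, dtype=bool)
frontier[0] = True                    # start BFS at vertex 0
visited = frontier.copy()
level = 0
while frontier.any():
    print(f"level {level}:", np.flatnonzero(frontier).tolist())
    # One BFS level == one mat-vec: vertices reachable from the frontier.
    frontier = (A.T @ frontier.astype(np.int8) > 0) & ~visited
    visited |= frontier
    level += 1
```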

Proceedings ArticleDOI
15 Mar 2019
TL;DR: Physical database tuning of an Oracle relational database is performed and the result is compared with a NoSQL graph database; tuning increases relational database performance, yet the graph database still performs better in the proposed scenarios.
Abstract: Relational databases have been used in many organizations of various natures, such as education, health, and business, for the last three decades. SQL databases are designed to manage structured data and show tremendous performance. The Atomicity, Consistency, Isolation, Durability (ACID) properties of relational databases are used to maintain data integrity and consistency. Physical database techniques are used to increase the performance of relational databases. Tablespaces, also called subfolders, are one such physical technique used by the Oracle SQL database; they store data logically in separate data files. Nowadays, organizations generate huge amounts of data of varied nature (unstructured and semi-structured), e.g., videos, images, and blogs. This large amount of data is not handled efficiently by SQL databases. NoSQL databases are used to process and analyze such large amounts of data efficiently; four different types of NoSQL databases are used in industry according to organizational requirements. In this article, we first perform physical database tuning of an Oracle relational database and then compare it with a NoSQL graph database. Relational database performance increased by up to 50% due to the physical tuning technique (tablespaces). Nevertheless, the NoSQL graph database performed better than the tuned relational database in all our proposed scenarios.
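The tablespace technique evaluated here looks roughly as follows when driven from the (real) python-oracledb driver; the credentials, DSN, file path, and sizes are placeholders, and CREATE TABLESPACE requires administrative privileges.

```python
# Sketch of the tablespace technique using the python-oracledb driver.
# Credentials, DSN, file path and sizes are placeholders; CREATE
# TABLESPACE requires DBA privileges.
import oracledb

conn = oracledb.connect(user="system", password="secret",
                        dsn="localhost/XEPDB1")
cur = conn.cursor()

# Give hot tables their own datafile so their I/O is physically
# isolated -- the "subfolder" idea described in the abstract.
cur.execute("""
    CREATE TABLESPACE hot_data
    DATAFILE '/opt/oracle/oradata/hot_data01.dbf' SIZE 512M
    AUTOEXTEND ON NEXT 128M
""")
cur.execute("""
    CREATE TABLE orders (
        id      NUMBER PRIMARY KEY,
        payload VARCHAR2(4000)
    ) TABLESPACE hot_data
""")
conn.commit()
conn.close()
```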

Proceedings ArticleDOI
02 Jul 2019
TL;DR: GraphSE², as proposed in this paper, is an encrypted graph database for online social network services to address massive data breaches, in which social search queries are conducted on a large-scale social graph while set and computational operations are performed on user-generated contents.
Abstract: In this paper, we propose GraphSE², an encrypted graph database for online social network services to address massive data breaches. GraphSE² preserves the functionality of social search, a key enabler for quality social network services, where social search queries are conducted on a large-scale social graph and meanwhile perform set and computational operations on user-generated contents. To enable efficient privacy-preserving social search, GraphSE² provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE² with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE² is practical for querying a social graph with a million users.

Journal ArticleDOI
TL;DR: In this article, the authors analyze graph-based process discovery algorithms by measuring time complexity and performance metrics and comparing them with a widely used algorithm, i.e., Alpha Miner and its extensions.
Abstract: Algorithms for process discovery help analysts understand business processes and problems in a system by creating a process model based on a log of the system. Among existing process discovery algorithms, some are graph-based; of these, several process a graph database to depict a process model. These algorithms claim lower time complexity because of the graph database's ability to store relationships. This research analyses graph-based algorithms by measuring time complexity and performance metrics and comparing them with a widely used algorithm, i.e., Alpha Miner and its extensions. It also gives an outline of graph-based algorithms and the issues they focus on. Based on the evaluations, the graph-based algorithm has higher performance and lower time complexity than the Alpha Miner algorithm.

Journal ArticleDOI
TL;DR: Three potentially synergistic and combinable techniques are proposed, one for each stage of data collection – biographies for data extraction, graph databases for data storage, and checklists for data reporting.

Proceedings ArticleDOI
20 Nov 2019
TL;DR: A novel query execution model, called Expert Model, is proposed, which supports adaptive parallelism control at the fine-grained query operation level and allows tailored optimizations for different categories of query operators, thus achieving high parallelism and good load balancing.
Abstract: The property graph (PG) model is one of the most general graph data models and has been widely adopted in many graph analytics and processing systems. However, existing systems suffer from poor performance in terms of both latency and throughput for processing online analytical workloads on PGs due to design defects such as expensive interactions with external databases, low parallelism, and high network overheads. In this paper, we propose Grasper, a high-performance distributed system for OLAP on property graphs. Grasper adopts RDMA-aware system designs to reduce the network communication cost. We propose a novel query execution model, called Expert Model, which supports adaptive parallelism control at the fine-grained query operation level and allows tailored optimizations for different categories of query operators, thus achieving high parallelism and good load balancing. Experimental results show that Grasper achieves low latency and high throughput on a broad range of online analytical workloads.

Journal ArticleDOI
TL;DR: This work combines domain-driven modeling concepts with scalable graph-based repository technology and a custom language for model-level queries to solve the challenges of IT Landscape models and meet the requirements that arise from this application domain.
Abstract: IT Landscape models are representing the real-world IT infrastructure of a company. They include hardware assets such as physical servers and storage media, as well as virtual components like clusters, virtual machines and applications. These models are a critical source of information in numerous tasks, including planning, error detection and impact analysis. The responsible stakeholders often struggle to keep such a large and densely connected model up-to-date due to its inherent size and complexity, as well as due to the lack of proper tool support. Even though modeling techniques are very suitable for this domain, existing tools do not offer the required features, scalability or flexibility. In order to solve these challenges and meet the requirements that arise from this application domain, we combine domain-driven modeling concepts with scalable graph-based repository technology and a custom language for model-level queries. We analyze in detail how we synthesized these requirements from the application domain and how they relate to the features of our repository. We discuss the architecture of our solution which comprises the entire data management stack, including transactions, queries, versioned persistence and metamodel evolution. Finally, we evaluate our approach in a case study where our open-source repository implementation is employed in a production environment in an industrial context, as well as in a comparative benchmark with an existing state-of-the-art solution.

Posted Content
TL;DR: Gremlin, as presented in this paper, is a graph traversal language and machine that provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors).
Abstract: Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted in an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flexible graph pattern matching (a.k.a. sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. This rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language and machine, provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors). We present a formalization of graph pattern matching for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra.
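A typical OLTP-style Gremlin traversal, issued here through the gremlinpython client, gives a flavor of the language being formalized; the server address and the person/knows schema are illustrative assumptions.

```python
# A small OLTP-style traversal via the gremlinpython client; the server
# address and the person/knows schema are illustrative assumptions.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Friends-of-friends of "alice", de-duplicated.
names = (g.V().has("person", "name", "alice")
          .out("knows").out("knows").dedup()
          .values("name").toList())
print(names)
conn.close()
```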

Proceedings ArticleDOI
25 Jun 2019
TL;DR: Catapult takes a data-driven approach to automatically select canned patterns, thereby taking a concrete step towards the vision of data-driven construction of visual query interfaces.
Abstract: Visual graph query interfaces (a.k.a. GUIs) widen the reach of graph querying frameworks across different users by enabling non-programmers to use them. Consequently, several commercial and academic frameworks for querying a large collection of small- or medium-sized data graphs (e.g., chemical compounds) provide such visual interfaces. The majority of these interfaces expose a fixed set of canned patterns (i.e., small subgraph patterns) to expedite query formulation by enabling pattern-at-a-time in lieu of edge-at-a-time construction mode. Canned patterns to be displayed on a GUI are typically selected manually based on domain knowledge. However, manual generation of canned patterns is labour intensive. Furthermore, these patterns may not sufficiently cover the underlying data graphs to expedite visual formulation of a wide range of subgraph queries. In this paper, we present a generic and extensible framework called Catapult to address these limitations. Catapult takes a data-driven approach to automatically select canned patterns, thereby taking a concrete step towards the vision of data-driven construction of visual query interfaces. Specifically, it first clusters the underlying data graphs based on their topological similarities and then summarizes each cluster to create a cluster summary graph (CSG). The canned patterns within a user-specified pattern budget are then generated from these CSGs by maximizing coverage and diversity, and minimizing the cognitive load of the patterns. An experimental study with real-world datasets and visual graph interfaces demonstrates the superiority of Catapult compared to traditional techniques.