Showing papers on "Graph database published in 2022"


Proceedings ArticleDOI
10 Jun 2022
TL;DR: GQL as discussed by the authors is a standard property graph query language that complements the SQL/PGQ project, which specifies how to define graph views over a SQL tabular schema, and to run read-only queries against them.
Abstract: As graph databases become widespread, the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) have approved a project to create GQL, a standard property graph query language. This complements the SQL/PGQ project, which specifies how to define graph views over a SQL tabular schema, and to run read-only queries against them.

11 citations


Book ChapterDOI
01 Jan 2022

8 citations


Journal ArticleDOI
TL;DR: The characteristics of the widely used graph algorithms and graph processing frameworks on GPU are explored and several graph-based optimization technologies for IVN data processing are proposed.
Abstract: The intelligent vehicular network (IVN) is the underlying support for connected vehicles and smart cities, but IVN data processing faces several challenges due to the dynamic structure of the vehicular network. Graph processing, an essential machine learning and big data processing paradigm that provides a set of big data processing schemes, is well suited to processing connected data. In this paper, we discuss the research challenges of IVN data processing, which motivate us to address them using graph processing technologies. We explore the characteristics of widely used graph algorithms and graph processing frameworks on GPU. Furthermore, we propose several graph-based optimization technologies for IVN data processing. The experimental results show that graph processing technologies on GPU can achieve excellent performance on IVN data.

7 citations


Proceedings ArticleDOI
18 Jul 2022
TL;DR: This paper presents Grand, an approach for automatically finding logic bugs in GDBs that adopt Gremlin as their query language, and proposes a model-based query generation approach to generate valid Gremlin queries that can potentially return non-empty results, and a data mapping approach to unify the format of query results for different GDBs.
Abstract: Graph database systems (GDBs) allow efficiently storing and retrieving graph data, and have become the critical component in many applications, e.g., knowledge graphs, social networks, and fraud detection. It is important to ensure that GDBs operate correctly. Logic bugs can occur and make GDBs return an incorrect result for a given query. These bugs are critical and can easily go unnoticed by developers when the graph and queries become complicated. Despite the importance of GDBs, logic bugs in GDBs have received less attention than those in relational database systems. In this paper, we present Grand, an approach for automatically finding logic bugs in GDBs that adopt Gremlin as their query language. The core idea of Grand is to construct semantically equivalent databases for multiple GDBs, and then compare the results of a Gremlin query on these databases. If the return results of a query on multiple GDBs are different, the likely cause is a logic bug in these GDBs. To effectively test GDBs, we propose a model-based query generation approach to generate valid Gremlin queries that can potentially return non-empty results, and a data mapping approach to unify the format of query results for different GDBs. We evaluate Grand on six widely-used GDBs, e.g., Neo4j and HugeGraph. In total, we have found 21 previously-unknown logic bugs in these GDBs. Among them, developers have confirmed 18 bugs, and fixed 7 bugs.
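
The core idea described above (run one query on equivalent databases, unify the result format, compare) can be sketched as follows. This is a minimal illustration, not Grand's implementation: the names `unify` and `differential_test`, and the representation of each backend as a plain callable, are assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Hypothetical interface: a backend is any callable that takes a query string
# and returns an iterable of hashable result rows.
QueryRunner = Callable[[str], Iterable[Hashable]]


def unify(rows: Iterable[Hashable]) -> Counter:
    """Map one backend's rows to a common, order-insensitive form (a multiset)."""
    return Counter(rows)


def differential_test(query: str, backends: Dict[str, QueryRunner]) -> List[List[str]]:
    """Group backend names by unified result.

    More than one group means the backends disagree on this query,
    which is a candidate logic bug in at least one of them.
    """
    groups: List[Tuple[Counter, List[str]]] = []
    for name, run in backends.items():
        result = unify(run(query))
        for known, names in groups:
            if known == result:
                names.append(name)
                break
        else:
            groups.append((result, [name]))
    return [names for _, names in groups]


# Stand-in backends that ignore the query and return canned rows.
fake_backends = {
    "db_a": lambda q: [1, 2, 3],
    "db_b": lambda q: [3, 2, 1],  # same rows, different order
    "db_c": lambda q: [1, 2],     # one row missing
}
groups = differential_test("g.V().values('id')", fake_backends)
```

Here `groups` separates `db_c` from the other two backends. A disagreement only signals a likely bug; deciding which backend is wrong still needs manual inspection.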

6 citations


Journal ArticleDOI
TL;DR: This paper provides a comprehensive evaluation of popular semantic data repositories and their computational performance in managing and providing semantic support for spatial queries. The results show that Virtuoso achieves the best overall performance in both non-spatial and spatial-semantic queries.

5 citations


Journal ArticleDOI
20 Nov 2022
TL;DR: PG-Schema as discussed by the authors is a formalism for specifying graph schemas with flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism.
Abstract: Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. It features PG-Schema with flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.

5 citations


Journal ArticleDOI
TL;DR: HyGraph as discussed by the authors is a subgraph isomorphism approach that is efficient both in querying and with memory requirements for index creation in big graph databases, and it is shown that the HyGraph solution performs significantly better (or equally) than competing algorithms for the query operations on these big databases.
Abstract: The big graph database provides strong modeling capabilities and efficient querying for complex applications. Subgraph isomorphism, which finds exact matches of a query graph in the database, is a challenging problem to solve efficiently. Current subgraph isomorphism approaches are mostly based on the pruning strategy proposed by Ullmann. These techniques have two significant drawbacks: first, they are unable to efficiently handle complex queries, and second, their implementations need large indexes that require large memory resources. In this paper, we describe a new subgraph isomorphism approach, the HyGraph algorithm, that is efficient both in querying and in memory requirements for index creation. We compare the HyGraph algorithm with two popular existing approaches, GraphQL and Cypher, using complexity measures and experimentally using three big graph data sets: (1) a country-level population database, (2) a simulated bank database, and (3) a publicly available World Cup big graph database. It is shown that the HyGraph solution performs significantly better than (or equally to) competing algorithms for the query operations on these big databases, making it an excellent candidate for subgraph isomorphism queries in real scenarios.

5 citations


Journal ArticleDOI
TL;DR: In this article, a graph-based approach to developing a modular library of prefabricated parts and assemblies is proposed. Existing approaches to managing such libraries are oriented around single-use projects, and there is a need for a more flexible data structure to support storage, analysis and reuse of design information.

5 citations


Journal ArticleDOI
TL;DR: This study compares the query performance of a graph-based database system (Neo4j) and relational database systems (MySQL and MariaDB); the effects of different efficiency issues are included in the comparison in order to investigate the most efficient solutions for different query types.
Abstract: In developing NoSQL databases, a major motivation is to achieve more efficient query performance than relational databases. The graph database is a NoSQL paradigm where navigation is based on links instead of joining tables. Links can be implemented as pointers, and following a pointer is a constant time operation, whereas joining tables is more complicated and slower, even in the presence of foreign keys. Therefore, link-based navigation has been seen as a more efficient query approach than using join operations on tables. Existing studies strongly support this assumption. However, query complexity has received less attention. For example, in enterprise information systems, queries are usually complex, so data need to be collected from several tables or by traversing paths of graph nodes of different types. In the present study, we compared the query performance of a graph-based database system (Neo4j) and relational database systems (MySQL and MariaDB). The effects of different efficiency issues (e.g., indexing and optimization) were included in the comparison in order to investigate the most efficient solutions for different query types. The outcome is that although Neo4j is more efficient for simple queries, MariaDB is substantially more efficient when the complexity of queries increases. The study also highlighted how dramatically the efficiency of relational databases has grown during the last decade.

5 citations




Proceedings ArticleDOI
01 May 2022
TL;DR: Horae is proposed, a novel graph stream summarization structure for efficient temporal range query, which presents a time prefix embedded multi-layer summarizing structure and an efficient Binary Range Decomposition algorithm, which achieves a logarithmic scale query processing time.
Abstract: Graph stream, referred to as an evolving graph with a timing sequence of updated edges through a continuous stream, is an emerging data format widely used in big data applications. Coping with a graph stream is challenging because: 1) fully storing the continuously produced and extremely large-scale datasets is difficult if not impossible; 2) supporting queries relevant to both graph topology and temporal information is nontrivial. Recently, graph stream summarization techniques have attracted much attention in providing approximate storage and query processing for a graph stream. Existing designs largely utilize hash functions to reduce the graph scale and leverage a compressive matrix to represent the graph stream. However, such designs are unable to store the time dimension information of graph streams, and thus fail to support temporal queries. In this paper, we propose Horae, a novel graph stream summarization structure for efficient temporal range query, which presents a time prefix embedded multi-layer summarization structure. Our design is based on the insight that an arbitrary temporal range of length $L$ can be decomposed to at most $2\log L$ sub-ranges, where all the time points in each sub-range have the same binary code prefix. We further design an efficient Binary Range Decomposition (BRD) algorithm, which achieves a logarithmic scale query processing time. Experimental results show that Horae significantly reduces the latency of various temporal range queries by two to three orders of magnitude compared to the state-of-the-art designs.
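
The decomposition insight in the Horae abstract can be illustrated with a standard dyadic (aligned power-of-two) range split. The sketch below is not the paper's Binary Range Decomposition algorithm; the function name `dyadic_decompose` and the `(prefix, level)` output format are assumptions made for illustration. Each returned block covers exactly the time points that share the binary prefix `prefix` once their lowest `level` bits are dropped.

```python
from typing import List, Tuple


def dyadic_decompose(start: int, end: int) -> List[Tuple[int, int]]:
    """Split the inclusive integer range [start, end] into aligned blocks.

    Returns (prefix, level) pairs. A pair covers the time points
    prefix * 2**level .. (prefix + 1) * 2**level - 1, i.e. all points whose
    binary code equals `prefix` after dropping the lowest `level` bits.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    blocks: List[Tuple[int, int]] = []
    lo = start
    while lo <= end:
        remaining = end - lo + 1
        # Largest power of two that `lo` is aligned to (lo == 0 is aligned to all).
        size = lo & -lo if lo else 1 << remaining.bit_length()
        # Shrink the block until it fits inside what is left of the range.
        while size > remaining:
            size >>= 1
        level = size.bit_length() - 1
        blocks.append((lo >> level, level))
        lo += size
    return blocks


example = dyadic_decompose(3, 10)
```

For the range 3..10 (length 8) this yields four blocks: `[(3, 0), (1, 2), (4, 1), (10, 0)]`, covering {3}, {4..7}, {8, 9} and {10}. A summary structure that keeps one layer per prefix length could then answer the range query with one lookup per block.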

Journal ArticleDOI
TL;DR: In this paper, a graph data model of non-small cell lung cancer clinical and genomic data has been constructed with two aims: (1) provide a suitable model for facilitating graph analytics within the Neo4j framework or through tools which can interact through existing Neo4j APIs; and (2) provide a base model extensible to other cancer types and additional datasets such as those derived from electronic health records and other real world sources.
Abstract: A novel graph data model of non-small cell lung cancer clinical and genomic data has been constructed with two aims: (1) provide a suitable model for facilitating graph analytics within the Neo4j framework or through tools which can interact through existing Neo4j APIs; and (2) provide a base model extensible to other cancer types and additional datasets such as those derived from electronic health records and other real world sources. Clinical and genomic data are integrated into a novel property graph database schema from publicly available datasets and analyses based on The Cancer Genome Atlas lung cancer datasets, augmented with subgraphs of patient-patient social networks derived from similarity and correlation, as well as individual-based biological networks.

Journal ArticleDOI
TL;DR: In this article, the authors provide an industrial perspective on the graph database landscape, so that graph researchers can better understand industry trends and the challenges that the industry is facing, and work on solutions to help address these problems.
Abstract: Rapidly growing social networks and other graph data have created a high demand for graph technologies in the market. A plethora of graph databases, systems, and solutions have emerged as a result. On the other hand, graphs have long been a well-studied area in the database research community. Despite the numerous surveys on various graph research topics, there is a lack of surveys on graph technologies from an industry perspective. The purpose of this paper is to provide the research community with an industrial perspective on the graph database landscape, so that graph researchers can better understand industry trends and the challenges that the industry is facing, and work on solutions to help address these problems.

Journal ArticleDOI
TL;DR: In this paper, an LPG-based graph database is proposed for BIM/GIS data integration, which can facilitate the use of graph technology in BIM and GIS integration.
Abstract: Information exchange between building information modelling (BIM) and geographic information system (GIS) is problematic, especially in terms of semantic information. Graph-based technologies, such as the resource description framework (RDF) and the labelled property graph (LPG), are promising in solving this problem. These two technologies are different but have not been systematically investigated in the context of BIM/GIS integration. This paper presents our systematic investigation into these two technologies, trying to propose the proper one for BIM/GIS data integration. The main findings are as follows. (1) Both LPG-based databases and RDF-based databases can be generally considered graph databases, but an LPG-based database is considered a native graph database, while an RDF-based database is not. (2) RDF suits applications focusing more on linking data and sharing data, and (3) LPG-based graph database suits applications focusing more on data query and analysis. An LPG-based graph database is thus proposed for BIM/GIS data integration. This review can facilitate the use of graph technology in BIM/GIS integration.

Journal ArticleDOI
TL;DR: Genomic Knowledgebase (GenomicKB) as discussed by the authors is a graph database for researchers to explore and investigate human genome, epigenome, transcriptome, and 4D nucleome with simple and efficient queries.
Abstract: Genomic Knowledgebase (GenomicKB) is a graph database for researchers to explore and investigate human genome, epigenome, transcriptome, and 4D nucleome with simple and efficient queries. The database uses a knowledge graph to consolidate genomic datasets and annotations from over 30 consortia and portals, including 347 million genomic entities, 1.36 billion relations, and 3.9 billion entity and relation properties. GenomicKB is equipped with a web-based query system (https://gkb.dcmb.med.umich.edu/) which allows users to query the knowledge graph with customized graph patterns and specific constraints on entities and relations. Compared with traditional tabular-structured data stored in separate data portals, GenomicKB emphasizes the relations among genomic entities, intuitively connects isolated data matrices, and supports efficient queries for scientific discoveries. GenomicKB transforms complicated analysis among multiple genomic entities and relations into coding-free queries, and facilitates data-driven genomic discoveries in the future.

Book ChapterDOI
01 Jan 2022
TL;DR: Wang et al. as mentioned in this paper proposed KGMR (Knowledge Graph for Movie Recommendation) algorithm which uses a large amount of heterogeneous information provided by the multi-modal knowledge graph to design the semantic types and semantic relations in the movie field.
Abstract: Knowledge-graph-aware recommendation often overlooks the importance of unstructured information, such as text information and picture information. To address this problem, we propose the KGMR (Knowledge Graph for Movie Recommendation) algorithm, which uses a large amount of heterogeneous information provided by the multi-modal knowledge graph to design the semantic types and semantic relations in the movie field. We constructed a knowledge graph in the film domain and visualized it with the Neo4j graph database. The text information and picture information are represented by the Doc2Vec and Convolutional Auto-Encoder (CAE) algorithms respectively, combined with Bayesian Personalized Ranking (BPR) to construct a recommendation system. In tests on the MovieLens-latest-small data set, KGMR shows a good improvement in the evaluation value, which indicates that integrating the movie knowledge graph into the recommendation algorithm allows movies to be accurately recommended to target users.

Journal ArticleDOI
TL;DR: In this article, a set of formal rules to convert a multidimensional data model into a graph data model (MDM2G) is proposed, which allows conventional star and snowflake schemas to fit into NoSQL graph databases.
Abstract: Nowadays, the data used for decision-making come from a wide variety of sources which are difficult to manage using relational databases. To address this problem, many researchers have turned to Not only SQL (NoSQL) databases to provide scalability and flexibility for On-Line Analytical Processing (OLAP) systems. In this paper, we propose a set of formal rules to convert a multidimensional data model into a graph data model (MDM2G). These rules allow conventional star and snowflake schemas to fit into NoSQL graph databases. We apply the proposed rules to implement star-like and snowflake-like graph data warehouses. We compare their performance to that of similar relational ones, focusing on the data model, dimensionality, and size. The experimental results show large differences between relational and graph implementations of a data warehouse. A relational implementation performs better for queries on a couple of tables, but conversely, a graph implementation is better when queries involve many tables. Surprisingly, the performance of star-like and snowflake-like graph data warehouses is very close. Hence, a snowflake schema could be used in order to easily consider new sub-dimensions in a graph data warehouse.

Proceedings ArticleDOI
13 May 2022
TL;DR: This article discusses how to use Python to visualize unstructured data; the processed data is then imported into the Neo4j database through Cypher syntax, and knowledge graphs of object, organization, and character relationships are generated for easy inspection.
Abstract: In the era of big data, unstructured data has become vast and hard to navigate. Unstructured data accounts for a high proportion of the data handled in every line of work, and analyzing it takes a lot of human resources. For massive unstructured data, this article discusses how to use Python to visualize unstructured data. The processed data is then imported into the Neo4j database through Cypher syntax. Finally, knowledge graphs of object relationships, organization relationships, and character relationships are generated, which are easy to inspect.
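
The import step mentioned above (processed data loaded into Neo4j through Cypher) might look roughly like the sketch below, which only builds a parameterized Cypher `MERGE` statement and does not connect to a database. The helper name `merge_relationship`, the `name` property, and the example labels are assumptions for illustration; the article's actual schema is not given here.

```python
import re
from typing import Dict, Tuple

# Labels and relationship types cannot be passed as Cypher parameters,
# so this sketch validates them before interpolating them into the query text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def merge_relationship(
    src_label: str, src_name: str, rel_type: str, dst_label: str, dst_name: str
) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher statement that creates two nodes and one relationship
    if they do not exist yet. Returns (query_text, parameters)."""
    for identifier in (src_label, rel_type, dst_label):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    query = (
        f"MERGE (a:{src_label} {{name: $src}}) "
        f"MERGE (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_name, "dst": dst_name}


# Hypothetical extracted fact: a character who belongs to an organization.
query, params = merge_relationship("Character", "Ada", "MEMBER_OF", "Organization", "Acme")
```

With the official Neo4j Python driver, the resulting pair could be executed as `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes and relationships.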

Proceedings ArticleDOI
12 Jun 2022
TL;DR: This paper initiates the study of data-path query languages in the classic setting of embedded finite model theory, wherein each graph is "embedded" into a background infinite structure (with a decidable FO theory or fragments thereof) by proposing an extension of register automata by allowing powerful constraints over the theory and the database as guards.
Abstract: This paper initiates the study of data-path query languages (in particular, regular data path queries (RDPQ) and conjunctive RDPQ (CRDPQ)) in the classic setting of embedded finite model theory, wherein each graph is "embedded" into a background infinite structure (with a decidable FO theory or fragments thereof). Our goal is to address the current lack of support for typed attribute data (e.g. integer arithmetic) in existing data-path query languages, which is crucial in practice. We propose an extension of register automata by allowing powerful constraints over the theory and the database as guards, and having two types of registers: registers that can store values from the active domain, and read-only registers that can store arbitrary values. We prove NL data complexity for (C)RDPQ over the Presburger arithmetic, the real-closed field, the existential theory of automatic structures and word equations with regular constraints. All these results strictly extend the known NL data complexity of RDPQ with only equality comparisons, and provide an answer to a recent open problem posed by Libkin et al. Among others, we introduce one crucial proof technique for obtaining NL data complexity for data path queries over embedded graph databases called "Restricted Register Collapse (RRC)", inspired by the notion of Restricted Quantifier Collapse (RQC) in embedded finite model theory.

Journal ArticleDOI
TL;DR: In this paper, a formal algebra for specifying and generating graph schema for labeled property graph databases is presented, and the use of generated graph schemas to systematically transform and load data sets related to domains of cyber-physical systems, big data analytics and tourism is demonstrated.
Abstract: Contemporary labeled property graph databases are either schema-less or schema-optional to support frequent changes in the structure of data found in domains requiring high flexibility. However, the lack of structure impacts data transformation and loading operations from heterogeneous sources into graph databases. We present a formal algebra for specifying and generating graph schema for labeled property graph databases. We formally define and demonstrate the use of generated graph schemas to systematically transform and load data-sets related to domains of cyber-physical systems, big data analytics and tourism. Findings from three disparate case studies show that -generated schemas assist in enforcing integrity constraints that reduce the chance of data corruption, hence assuring data consistency and integrity.

Book ChapterDOI
TL;DR: In this article, the authors investigate the theoretical feasibility of realising ReBAC systems using off-the-shelf graph database technology and propose a unified framework through which they characterise and compare existing relationship-based access control models.
Abstract: Relationship-Based Access Control (ReBAC) is a paradigm to specify access constraints in terms of interpersonal relationships. To express these graph-like constraints, a variety of ReBAC models with varying features and ad-hoc implementations have been proposed. In this work, we investigate the theoretical feasibility of realising ReBAC systems using off-the-shelf graph database technology and propose a unified framework through which we characterise and compare existing ReBAC models. To this end, we formalise a ReBAC specific query language, ReLOG, an extension to regular graph queries over property graphs. We show that existing ReBAC models are instantiations of queries over property graphs, laying a foundation for the design of ReBAC mechanisms based on graph database technology.

Proceedings ArticleDOI
12 Jun 2022
TL;DR: A translation between a formalism for dynamic programming over hypergraphs and the computation of semiring-based provenance for Datalog programs for specific classes of semirings is established, which is applied to provenance-aware querying of graph databases.
Abstract: We establish a translation between a formalism for dynamic programming over hypergraphs and the computation of semiring-based provenance for Datalog programs. The benefit of this translation is a new method for computing the provenance of Datalog programs for specific classes of semirings, which we apply to provenance-aware querying of graph databases. Theoretical results and practical optimizations lead to an efficient implementation using Soufflé, a state-of-the-art Datalog interpreter. Experimental results on real-world data suggest this approach to be efficient in practical contexts, competing with dedicated solutions for graphs.
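
To make the notion of semiring-based provenance for a Datalog program concrete, the sketch below evaluates the usual two-rule transitive-closure program over an annotated edge relation by naive fixpoint iteration. It is not the paper's hypergraph dynamic-programming method and does not use Soufflé; the names `Semiring` and `path_provenance` and the `max_rounds` guard are assumptions for illustration. Naive iteration only stabilizes for some semirings, such as the Boolean semiring or the tropical (min, +) semiring with non-negative weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")
Edge = Tuple[str, str]


@dataclass(frozen=True)
class Semiring(Generic[T]):
    zero: T                      # annotation of "no derivation"
    one: T                       # neutral element of `times` (listed for completeness)
    plus: Callable[[T, T], T]    # combines alternative derivations
    times: Callable[[T, T], T]   # combines facts used jointly in one derivation


def path_provenance(edges: Dict[Edge, T], sr: Semiring, max_rounds: int = 1000) -> Dict[Edge, T]:
    """Annotate path(x, y) facts for the program
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    """
    path = dict(edges)
    for _ in range(max_rounds):
        new = dict(edges)  # first rule
        for (x, y), p in path.items():  # second rule
            for (y2, z), e in edges.items():
                if y == y2:
                    new[(x, z)] = sr.plus(new.get((x, z), sr.zero), sr.times(p, e))
        if new == path:
            return path
        path = new
    raise RuntimeError("no fixpoint reached; this semiring may not converge on this graph")


# Tropical semiring: the provenance of path(x, y) is the cheapest route cost.
tropical = Semiring(zero=float("inf"), one=0, plus=min, times=lambda a, b: a + b)
weighted = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "a"): 1}
costs = path_provenance(weighted, tropical)
```

In this example `costs[("a", "c")]` is 3, since the route through `b` beats the direct edge of cost 5. Swapping in a Boolean semiring (`or`, `and`) turns the same function into plain reachability.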

Journal ArticleDOI
TL;DR: This article surveys personalized graph queries, whose purpose is to compute personalized results that meet the preferences of different users with respect to three aspects: specified query vertices, structures, and attributes.

Journal ArticleDOI
TL;DR: OneGraph as discussed by the authors is a single unified graph data model that embraces both RDF and LPGs, which aims to achieve interoperability at both data level and query level by enabling queries and updates over the unified data model with a query language of choice.
Abstract: Amazon Neptune is a graph database service that supports two graph models: W3C’s Resource Description Framework (RDF) and Labeled Property Graphs (LPG). Customers choose one or the other model. This choice determines which data modeling features can be used and – perhaps more importantly – which query languages are available. The choice between the two technology stacks is difficult and time consuming. It requires consideration of data modeling aspects, query language features, their adequacy for current and future use cases, as well as developer knowledge. Even in cases where customers evaluate the pros and cons and make a conscious choice that fits their use case, over time we often see requirements from new use cases emerge that could be addressed more easily with a different data model or query language. It is therefore highly desirable that the choice of the query language can be made without consideration of what graph model is chosen and can be easily revised or complemented at a later point. To this end, we advocate and explore the idea of OneGraph (“1G” for short), a single, unified graph data model that embraces both RDF and LPGs. The goal of 1G is to achieve interoperability at both data level, by supporting the co-existence of RDF and LPG in the same database, as well as query level, by enabling queries and updates over the unified data model with a query language of choice. In this paper, we sketch our vision and investigate technical challenges towards a unification of the two graph data models.

Proceedings ArticleDOI
01 May 2022
TL;DR: The design of a temporal graph management system, Clock-G, is discussed and a new space-efficient storage technique, δ-Copy+Log, is introduced. Clock-G is designed by the developers of the Thing'in platform and is currently being deployed into production.
Abstract: IoT applications can be naturally modeled as a graph where the edges represent the interactions between devices, sensors, and their environment. Thing'in (https://www.thinginthefuture.com/) is a platform initiated by Orange, a French multinational telecommunication operator. The platform manages a graph of millions of connected and non-connected objects using a commercial graph database. The graph of Thing'in is dynamic because IoT devices create temporary connections between each other. Analyzing the history of these connections paves the way to new promising applications such as object tracking, anomaly detection, and forecasting the future behavior of devices. However, existing commercial graph databases are not designed with native temporal support, which limits their usability in such use cases. In this paper, we discuss the design of a temporal graph management system, Clock-G, and introduce a new space-efficient storage technique, δ-Copy+Log. Clock-G is designed by the developers of the Thing'in platform and is currently being deployed into production. It differs from existing temporal graph management systems by adopting the δ-Copy+Log technique. This technique targets the mitigation of the apparent trade-off between the conflicting goals of the reduction of space usage and acceleration of query execution time. Our experimental results demonstrate that δ-Copy+Log presents an overall better performance as compared to traditional storage methods in terms of space usage and query evaluation time.
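
The space versus query-time trade-off mentioned above can be seen in a plain Copy+Log store: periodic full copies (snapshots) make past states cheap to rebuild, while the log between copies keeps storage small. The sketch below shows that generic baseline only. It is not the δ-Copy+Log technique, whose details are not given in the abstract, and the class name `CopyLogStore` and its parameters are assumptions for illustration.

```python
import bisect
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


class CopyLogStore:
    """Append-only edge log with a full snapshot every `snapshot_every` updates."""

    def __init__(self, snapshot_every: int = 4) -> None:
        self.snapshot_every = snapshot_every
        self.log: List[Tuple[int, str, Edge]] = []  # (time, op, edge), time-ordered
        self.current: Set[Edge] = set()
        # Parallel lists: snapshot time, log position it covers, frozen edge set.
        self._snap_times: List[float] = [float("-inf")]
        self._snap_pos: List[int] = [0]
        self._snap_edges: List[FrozenSet[Edge]] = [frozenset()]

    def apply(self, time: int, op: str, edge: Edge) -> None:
        if self.log and time < self.log[-1][0]:
            raise ValueError("updates must arrive in time order")
        if op == "add":
            self.current.add(edge)
        elif op == "remove":
            self.current.discard(edge)
        else:
            raise ValueError(f"unknown operation: {op!r}")
        self.log.append((time, op, edge))
        if len(self.log) % self.snapshot_every == 0:
            self._snap_times.append(time)
            self._snap_pos.append(len(self.log))
            self._snap_edges.append(frozenset(self.current))

    def edges_at(self, time: int) -> Set[Edge]:
        """Rebuild the edge set as of `time`: latest snapshot, then log replay."""
        i = bisect.bisect_right(self._snap_times, time) - 1
        state = set(self._snap_edges[i])
        for t, op, edge in self.log[self._snap_pos[i]:]:
            if t > time:
                break
            if op == "add":
                state.add(edge)
            else:
                state.discard(edge)
        return state


store = CopyLogStore(snapshot_every=2)
store.apply(1, "add", ("sensor", "gateway"))
store.apply(2, "add", ("gateway", "cloud"))
store.apply(3, "remove", ("sensor", "gateway"))
store.apply(5, "add", ("cloud", "dashboard"))
```

A smaller `snapshot_every` shortens replay at query time but stores more copies; a larger value does the opposite.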

Journal ArticleDOI
TL;DR: In this article, the authors discuss the future challenges of graph analytics and present several programming models, execution modes, and messaging strategies to improve the utilization of traditional hardware and performance of graph applications.
Abstract: Graph analytics, which mainly includes graph processing, graph mining, and graph learning, has become increasingly important in several domains, including social network analysis, bioinformatics, and machine learning. However, graph analytics applications suffer from poor locality, limited bandwidth, and low parallelism owing to the irregular sparse structure, explosive growth, and dependencies of graph data. To address those challenges, several programming models, execution modes, and messaging strategies are proposed to improve the utilization of traditional hardware and performance. In recent years, novel computing and memory devices have emerged, e.g., HMCs, HBM, and ReRAM, providing massive bandwidth and parallelism resources, making it possible to address bottlenecks in graph applications. To facilitate understanding of the graph analytics domain, our study summarizes and categorizes current software systems implementation and domain-specific architectures. Finally, we discuss the future challenges of graph analytics.

Journal ArticleDOI
TL;DR: Comparisons of graph database maturity, features, performance in standard tasks and the Object-Graph Mappers available to interact with each database in an Object-Oriented way show neomodel is the most mature solution, although it is also the least performing.
Abstract: The Portuguese General Directorate for Book, Archives and Libraries (DGLAB) has selected CIDOC CRM as the basis for its next-generation digital archive management software. Given the ontological foundations of the Conceptual Reference Model (CRM), a graph database or a triplestore was seen as the best candidate to represent a CRM-based data model for the new software. We thus decided to compare several of these databases, based on their maturity, features, performance in standard tasks and, most importantly, the Object-Graph Mappers (OGM) available to interact with each database in an object-oriented way. Our conclusions are drawn not only from a systematic review of related works but from an experimental scenario. For our experiment, we designed a simple CRM-compliant graph designed to test the ability of each OGM/database combination to tackle the so-called “diamond-problem” in Object-Oriented Programming (OOP) to ensure that property instances follow domain and range constraints. Our results show that (1) ontological consistency enforcement in graph databases and triplestores is much harder to achieve than in a relational database, making them more suited to an analytical rather than a transactional role; (2) OGMs are still rather immature solutions; and (3) neomodel, an OGM for the Neo4j graph database, is the most mature solution in the study as it satisfies all requirements, although it is also the least performing.