
Showing papers on "Graph database published in 2018"


Proceedings ArticleDOI
27 May 2018
TL;DR: This work describes Cypher 9, the first version of the language governed by the openCypher Implementers Group; it introduces the language by example and provides a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model.
Abstract: The Cypher property graph query language is an evolving language, originally designed and implemented as part of the Neo4j graph database, and it is currently used by several commercial database products and researchers. We describe Cypher 9, which is the first version of the language governed by the openCypher Implementers Group. We first introduce the language by example, and describe its uses in industry. We then provide a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model, and its ASCII Art graph pattern matching mechanism for expressing subgraphs of interest to an application. We compare the features of Cypher to other property graph query languages, and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning the language into a compositional language which supports graph projections and multiple named graphs.
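The property graph data model that Cypher queries over can be illustrated with a small self-contained sketch. The Python below is illustrative only (the `Graph`, `Node`, and `match` names are invented for this example and are not part of Neo4j or openCypher); it mimics what an ASCII Art pattern like `(p:Person)-[:WORKS_AT]->(c:Company)` matches against.

```python
# Minimal property graph sketch: nodes carry labels and properties,
# relationships are typed, directed edges. Not Neo4j's API.
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set
    props: dict = field(default_factory=dict)

@dataclass
class Rel:
    src: Node
    rtype: str
    dst: Node

class Graph:
    def __init__(self):
        self.nodes, self.rels = [], []

    def add_node(self, *labels, **props):
        n = Node(set(labels), props)
        self.nodes.append(n)
        return n

    def add_rel(self, src, rtype, dst):
        self.rels.append(Rel(src, rtype, dst))

    def match(self, src_label, rtype, dst_label):
        """Analogue of the Cypher pattern (a:Src)-[:TYPE]->(b:Dst)."""
        return [(r.src, r.dst) for r in self.rels
                if r.rtype == rtype
                and src_label in r.src.labels
                and dst_label in r.dst.labels]

g = Graph()
alice = g.add_node("Person", name="Alice")
acme = g.add_node("Company", name="Acme")
g.add_rel(alice, "WORKS_AT", acme)

# Corresponds roughly to: MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p, c
pairs = g.match("Person", "WORKS_AT", "Company")
print([(p.props["name"], c.props["name"]) for p, c in pairs])  # [('Alice', 'Acme')]
```

The point of the sketch is the data model, not the query engine: labels, typed edges, and key-value properties are exactly the ingredients the paper formalizes.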

353 citations


Journal ArticleDOI
TL;DR: The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery; the Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
Abstract: Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object-oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high-performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

324 citations


Journal ArticleDOI
TL;DR: Overall, users find the TensorFlow Graph Visualizer useful for understanding, debugging, and sharing the structures of their models.
Abstract: We present a design study of the TensorFlow Graph Visualizer, part of the TensorFlow machine intelligence platform. This tool helps users understand complex machine learning architectures by visualizing their underlying dataflow graphs. The tool works by applying a series of graph transformations that enable standard layout techniques to produce a legible interactive diagram. To declutter the graph, we decouple non-critical nodes from the layout. To provide an overview, we build a clustered graph using the hierarchical structure annotated in the source code. To support exploration of nested structure on demand, we perform edge bundling to enable stable and responsive cluster expansion. Finally, we detect and highlight repeated structures to emphasize a model's modular composition. To demonstrate the utility of the visualizer, we describe example usage scenarios and report user feedback. Overall, users find the visualizer useful for understanding, debugging, and sharing the structures of their models.

292 citations


Book
22 Jun 2018
TL;DR: This workshop provides a hands-on introduction to the popular open source graph database Neo4j through fixing a series of increasingly sophisticated, but broken, test cases, each of which highlights an important graph modeling or API affordance.
Abstract: In this workshop we provide a hands-on introduction to the popular open source graph database Neo4j [1] through fixing a series of increasingly sophisticated, but broken, test cases, each of which highlights an important graph modeling or API affordance.

266 citations


Journal ArticleDOI
TL;DR: A semantic query graph is proposed to model the query intention of a natural language question in a structural way; based on it, RDF Q/A is reduced to a subgraph matching problem, and the ambiguity of the natural language question is resolved at the time when matches of the query are found.
Abstract: RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a natural language question, the existing work takes a two-stage approach: question understanding and query evaluation. Their focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is joint disambiguation, which has an exponential search space. In this paper, we propose a systematic framework to answer natural language questions over an RDF repository (RDF Q/A) from a graph data-driven perspective. We propose a semantic query graph to model the query intention in the natural language question in a structural way, based on which RDF Q/A is reduced to a subgraph matching problem. More importantly, we resolve the ambiguity of natural language questions at the time when matches of the query are found. The cost of disambiguation is saved if no matches are found. More specifically, we propose two different frameworks to build the semantic query graph, one relation (edge)-first and the other node-first. We compare our method with some state-of-the-art RDF Q/A systems on the benchmark dataset. Extensive experiments confirm that our method not only improves precision but also greatly speeds up query performance.

215 citations


Proceedings ArticleDOI
27 May 2018
TL;DR: G-CORE, the result of a community effort between industry and academia to shape the future of graph query languages, strikes a careful balance between path query expressivity and evaluation complexity.
Abstract: We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning, that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.

101 citations


01 Jan 2018
TL;DR: This paper presents a formal definition of the property graph database model, covering the property graph data structure, basic notions of integrity constraints, and a graph query language.
Abstract: Most of the current graph database systems have been designed to support property graphs. Surprisingly, there is no standard specification of the database model behind such systems. This paper presents a formal definition of the property graph database model. Specifically, we define the property graph data structure, basic notions of integrity constraints (e.g. graph schema), and a graph query language.

94 citations


Journal ArticleDOI
TL;DR: A novel manifold distance computed on a semantic class prototype graph is proposed which takes into account the rich intrinsic semantic structure, i.e., semantic manifold, of the class prototype distribution.
Abstract: Zero-Shot Learning (ZSL) for visual recognition is typically achieved by exploiting a semantic embedding space. In such a space, both seen and unseen class labels as well as image features can be embedded so that the similarity among them can be measured directly. In this work, we consider that the key to effective ZSL is to compute an optimal distance metric in the semantic embedding space. Existing ZSL works employ either Euclidean or cosine distances. However, in a high-dimensional space where the projected class labels (prototypes) are sparse, these distances are suboptimal, resulting in a number of problems including hubness and domain shift. To overcome these problems, a novel manifold distance computed on a semantic class prototype graph is proposed which takes into account the rich intrinsic semantic structure, i.e., semantic manifold, of the class prototype distribution. To further alleviate the domain shift problem, a new regularisation term is introduced into a ranking loss based embedding model. Specifically, the ranking loss objective is regularised by unseen class prototypes to prevent the projected object features from being biased towards the seen prototypes. Extensive experiments on four benchmarks show that our method significantly outperforms the state-of-the-art.

80 citations


Journal ArticleDOI
TL;DR: A machine learning approach to large graph visualization is presented, based on computing the topological similarity of graphs using graph kernels; an important contribution of this work is the development of a new framework to design graph kernels.
Abstract: Using different methods for laying out a graph can lead to very different visual appearances, with which the viewer perceives different information. Selecting a “good” layout method is thus important for visualizing a graph. The selection can be highly subjective and dependent on the given task. A common approach to selecting a good layout is to use aesthetic criteria and visual inspection. However, fully calculating various layouts and their associated aesthetic metrics is computationally expensive. In this paper, we present a machine learning approach to large graph visualization based on computing the topological similarity of graphs using graph kernels. For a given graph, our approach can show what the graph would look like in different layouts and estimate their corresponding aesthetic metrics. An important contribution of our work is the development of a new framework to design graph kernels. Our experimental study shows that our estimation calculation is considerably faster than computing the actual layouts and their aesthetic metrics. Also, our graph kernels outperform the state-of-the-art ones in both time and accuracy. In addition, we conducted a user study to demonstrate that the topological similarity computed with our graph kernel matches perceptual similarity assessed by human users.
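The paper designs its own kernels, but the general shape of a graph kernel can be illustrated with the classic Weisfeiler-Lehman (WL) label-refinement construction. The stdlib-only sketch below is an assumption-laden illustration of that standard technique, not the authors' method; all function names are invented for this example.

```python
# Sketch of one standard graph-kernel ingredient: Weisfeiler-Lehman (WL)
# label refinement. A kernel value is computed by comparing the histograms
# of refined labels of two graphs. Illustrative only.
from collections import Counter

def wl_histogram(adj, iterations=2):
    """adj: dict mapping node -> list of neighbours. Returns a Counter of
    WL labels accumulated over the given number of refinement iterations."""
    labels = {v: str(len(nbrs)) for v, nbrs in adj.items()}  # init: degrees
    hist = Counter(labels.values())
    for _ in range(iterations):
        # each node's new label combines its old label with its
        # neighbours' sorted labels
        labels = {v: labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
                  for v in adj}
        hist.update(labels.values())
    return hist

def wl_kernel(adj1, adj2, iterations=2):
    """Linear kernel on the WL label histograms of two graphs."""
    h1, h2 = wl_histogram(adj1, iterations), wl_histogram(adj2, iterations)
    return sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_kernel(triangle, triangle) > wl_kernel(triangle, path))  # True
```

A graph is more similar to itself than to a structurally different graph, which is the property a layout-estimation pipeline like the one in the paper relies on.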

74 citations


Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper analyzes the most popular graph databases and studies the most important features for a complete and effective application, such as flexible schema, query language, sharding and scalability.
Abstract: Graph databases are a very powerful solution for storing and searching data rich in relationships, such as Facebook and Twitter data. With data multiplication and data type diversity there has been a need to create new storage and analysis platforms that structure irregular data with a flexible schema, maintaining a high level of performance and ensuring data scalability effectively, which is a problem that relational databases cannot handle. In this paper, we analyse the most popular graph databases: AllegroGraph, ArangoDB, InfiniteGraph, Neo4J and OrientDB. We study the most important features for a complete and effective application, such as flexible schema, query language, sharding and scalability.

72 citations


Journal ArticleDOI
TL;DR: This survey reviews and analyzes the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability, and surveys applications that appear in recent distributed graph systems papers.
Abstract: Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.
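Many of the surveyed systems implement the vertex-centric ("think like a vertex") abstraction popularized by Pregel: computation proceeds in supersteps, vertices exchange messages, and execution halts when no messages remain. The single-process sketch below illustrates that pattern with single-source shortest paths; the function and variable names are invented for this example and no distributed machinery is involved.

```python
# Vertex-centric (Pregel-style) single-source shortest paths, illustrative
# single-process version. Each superstep: vertices with incoming messages
# update their value and message their neighbours; a barrier separates steps.
def pregel_sssp(adj, source):
    """adj: dict vertex -> list of (neighbour, weight). Returns distances."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    messages = {source: [0]}          # superstep 0: the source receives 0
    while messages:                   # halt when no vertex receives messages
        next_messages = {}
        for v, incoming in messages.items():
            best = min(incoming)
            if best < dist[v]:        # vertex updates its local value...
                dist[v] = best
                for u, w in adj[v]:   # ...and sends messages along out-edges
                    next_messages.setdefault(u, []).append(best + w)
        messages = next_messages      # synchronization barrier
    return dist

adj = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(pregel_sssp(adj, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

The same superstep/message skeleton underlies PageRank, connected components, and the other iterative refinement algorithms the survey discusses.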

Journal ArticleDOI
TL;DR: The benchmark focuses on the performance of query evaluation, i.e. its execution time and memory consumption, with a particular emphasis on reevaluation, and can be adapted to various technologies and query engines, including modeling tools and relational, graph and semantic databases.
Abstract: In model-driven development of safety-critical systems (like automotive, avionics or railways), well-formedness of models is repeatedly validated in order to detect design flaws as early as possible. In many industrial tools, validation rules are still often implemented by a large amount of imperative model traversal code, which makes those rule implementations complicated and hard to maintain. Additionally, as models are rapidly increasing in size and complexity, efficient execution of validation rules is challenging for the currently available tools. Checking well-formedness constraints can be captured by declarative queries over graph models, while model update operations can be specified as model transformations. This paper presents a benchmark for systematically assessing the scalability of validating and revalidating well-formedness constraints over large graph models. The benchmark defines well-formedness validation scenarios in the railway domain: a metamodel, an instance model generator and a set of well-formedness constraints captured by queries, fault injection and repair operations (imitating the work of systems engineers by model transformations). The benchmark focuses on the performance of query evaluation, i.e. its execution time and memory consumption, with a particular emphasis on reevaluation. We demonstrate that the benchmark can be adapted to various technologies and query engines, including modeling tools and relational, graph and semantic databases. The Train Benchmark is available as an open-source project with continuous builds from https://github.com/FTSRG/trainbenchmark .

Journal ArticleDOI
TL;DR: This work presents VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results, and demonstrates how it helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents.
Abstract: Finding patterns in graphs has become a vital challenge in many domains, from biological systems and network security to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts' sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

Proceedings Article
11 Jul 2018
TL;DR: A query system is proposed, built on top of existing monitoring tools and databases and designed with novel types of optimizations to support timely attack investigation; deployed in NEC Labs America comprising 150 hosts, it was evaluated using 857 GB of real system monitoring data (containing 2.5 billion events).
Abstract: The need to counter Advanced Persistent Threat (APT) attacks has led to solutions that ubiquitously monitor system activities in each host and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semantics-agnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) a domain-specific data model and storage for scaling the storage, (2) a domain-specific query language, the Attack Investigation Query Language (AIQL), that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule query execution. We deployed our system in NEC Labs America, comprising 150 hosts, and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a real-world APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL).

Book ChapterDOI
01 Jan 2018
TL;DR: This chapter provides an overview of the foundations and systems for graph data management: the authors present a historical overview of the area, study graph database models, characterize essential graph-oriented queries, review graph query languages, and explore the features of current graph data management systems.
Abstract: Graph data management concerns the research and development of powerful technologies for storing, processing and analyzing large volumes of graph data. This chapter presents an overview of the foundations and systems for graph data management. Specifically, we present a historical overview of the area, study graph database models, characterize essential graph-oriented queries, review graph query languages, and explore the features of current graph data management systems (i.e. graph databases and graph-processing frameworks).

Journal ArticleDOI
01 Dec 2018
TL;DR: A novel microbenchmarking framework is introduced that provides insights on graph database system performance that go beyond what macro-benchmarks can offer, and includes the largest set of queries and operators so far considered.
Abstract: Despite the increasing interest in graph databases, their requirements and specifications are not yet fully understood by everyone, leading to a great deal of variation in the supported functionalities and the achieved performances. In this work, we provide a comprehensive study of the existing graph database systems. We introduce a novel microbenchmarking framework that provides insights on their performance that go beyond what macro-benchmarks can offer. The framework includes the largest set of queries and operators so far considered. The graph database systems are evaluated on synthetic and real data, from different domains, and at scales much larger than any previous work. The framework is materialized as an open-source suite and is easily extended to new datasets, systems, and queries.

Journal ArticleDOI
TL;DR: This paper describes how the architecture of a collaboration platform, which allows companies to simulate and analyse the economic viability of establishing waste-to-resource exchanges in the By-product Exchange Network (BEN) model, is enhanced with a database engine for waste-to-resource matching.

Journal ArticleDOI
TL;DR: It is shown how to generate "really hard" random instances for subgraph isomorphism problems, which are computationally challenging with a couple of hundred vertices in the target, and only twenty pattern vertices.
Abstract: The subgraph isomorphism problem involves deciding whether a copy of a pattern graph occurs inside a larger target graph. The non-induced version allows extra edges in the target, whilst the induced version does not. Although both variants are NP-complete, algorithms inspired by constraint programming can operate comfortably on many real-world problem instances with thousands of vertices. However, they cannot handle arbitrary instances of this size. We show how to generate "really hard" random instances for subgraph isomorphism problems, which are computationally challenging with a couple of hundred vertices in the target, and only twenty pattern vertices. For the non-induced version of the problem, these instances lie on a satisfiable/unsatisfiable phase transition, whose location we can predict; for the induced variant, much richer behaviour is observed, and constrainedness gives a better measure of difficulty than does proximity to a phase transition. These results have practical consequences: we explain why the widely researched "filter/verify" indexing technique used in graph databases is founded upon a misunderstanding of the empirical hardness of NP-complete problems, and cannot be beneficial when paired with any reasonable subgraph isomorphism algorithm.
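For intuition, the non-induced variant the paper studies can be stated in a few lines of brute-force Python. Real solvers rely on constraint propagation and scale far beyond this exhaustive search, so the sketch below is only the problem statement in code; the function name is invented for this example.

```python
# Brute-force check for non-induced subgraph isomorphism: every pattern edge
# must map to a target edge, and extra target edges are allowed. Exhaustive
# over all injective mappings, so usable only on tiny graphs.
from itertools import permutations

def has_subgraph_iso(pattern_edges, pattern_n, target_edges, target_n):
    target = {frozenset(e) for e in target_edges}
    for mapping in permutations(range(target_n), pattern_n):
        if all(frozenset((mapping[u], mapping[v])) in target
               for u, v in pattern_edges):
            return True
    return False

# A triangle occurs (non-induced) in K4, but not in a 4-cycle.
triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(has_subgraph_iso(triangle, 3, k4, 4))  # True
print(has_subgraph_iso(triangle, 3, c4, 4))  # False
```

The combinatorial explosion in the permutation loop is exactly why instance hardness, rather than instance size alone, is the paper's central concern.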

Journal ArticleDOI
TL;DR: A novel multimodal hashing method, termed as semantic neighbor graph hashing (SNGH), which aims to preserve the fine-grained similarity metric based on the semantic graph that is constructed by jointly pursuing the semantic supervision and the local neighborhood structure is proposed.
Abstract: Hashing methods have been widely used for approximate nearest neighbor search in recent years due to their computational and storage effectiveness. Most existing multimodal hashing methods try to preserve the similarity relationship based on either metric distances or semantic labels in a Procrustean way, while ignoring the intra-class and inter-class variations inherent in the metric space. In this paper, we propose a novel multimodal hashing method, termed semantic neighbor graph hashing (SNGH), which aims to preserve the fine-grained similarity metric based on the semantic graph that is constructed by jointly pursuing the semantic supervision and the local neighborhood structure. Specifically, the semantic graph is constructed to capture the local similarity structure for the image modality and the text modality, respectively. Furthermore, we define a function based on the local similarity in particular to adaptively calculate multi-level similarities by encoding the intra-class and inter-class variations. After obtaining the unified hash codes, logistic regression with the kernel trick is employed to learn view-specific hash functions independently for each modality. Extensive experiments are conducted on four widely used multimodal data sets. The experimental results demonstrate the superiority of the proposed SNGH method compared with the state-of-the-art multimodal hashing methods.

Journal ArticleDOI
01 Feb 2018
TL;DR: This article presents a partition-based approach to tackle threshold-based graph similarity search with edit distance constraints, by dividing data graphs into variable-size non-overlapping partitions, and develops efficient query processing algorithms based on the novel paradigm.
Abstract: Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, and pattern recognition. A fundamental and critical query primitive is to efficiently search for similar structures in a large collection of graphs. This article mainly studies threshold-based graph similarity search with edit distance constraints. Existing solutions to the problem utilize fixed-size overlapping substructures to generate candidates, and thus become susceptible to large vertex degrees and distance thresholds. In this article, we present a partition-based approach to tackle the problem. By dividing data graphs into variable-size non-overlapping partitions, the edit distance constraint is converted to a graph containment constraint for candidate generation. We develop efficient query processing algorithms based on the novel paradigm. Moreover, candidate-pruning techniques and an improved graph edit distance verification algorithm are developed to boost the performance. In addition, a cost-aware graph partitioning method is devised to optimize the index. Extending the partition-based filtering paradigm, we present a solution to the top-k graph similarity search problem, where tailored filtering, look-ahead and computation-sharing strategies are exploited. Using both public real-life and synthetic datasets, extensive experiments demonstrate that our approaches significantly outperform the baseline and its alternatives.
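The filter-and-verify strategy underlying this line of work can be sketched with a deliberately simple lower bound; the paper's partition-based filter is stronger, so the names and the bound below are only an illustrative assumption. Since each edit operation inserts or deletes one vertex or one edge (or relabels an element), the edit distance between two graphs is at least |n1 - n2| + |m1 - m2|, which already prunes candidates before any exponential verification runs.

```python
# Filter-and-verify sketch for edit-distance search: a cheap lower bound on
# graph edit distance prunes candidates before exact verification. Shown is
# a simple size-based bound, not the paper's partition-based filter.
def size_lower_bound(n1, m1, n2, m2):
    """Each edit operation changes the vertex count or the edge count by at
    most one, so edit distance >= |n1 - n2| + |m1 - m2|."""
    return abs(n1 - n2) + abs(m1 - m2)

def candidates(query, graphs, tau):
    """Keep only graphs whose lower bound does not exceed threshold tau;
    exact (exponential) edit-distance verification runs on the survivors."""
    qn, qm = query
    return [i for i, (n, m) in enumerate(graphs)
            if size_lower_bound(qn, qm, n, m) <= tau]

# query graph: 4 vertices, 4 edges; database of (n, m) sizes; threshold 2
db = [(4, 5), (9, 12), (3, 3), (4, 4)]
print(candidates((4, 4), db, 2))  # [0, 2, 3]
```

Any sound lower bound works in this slot; the paper's contribution is a much tighter one derived from non-overlapping partitions, which shrinks the candidate set far more aggressively.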

Journal ArticleDOI
TL;DR: This paper reports the adoption of the proposed Smart City RDF Benchmark on the basis of the Florence Smart City model, data sets and tools accessible as Km4City; the benchmark extends the RDF store benchmarks available in the state of the art.
Abstract: Smart cities are providing advanced services aggregating and exploiting data from different sources. Cities collect static data such as road graphs and service descriptions, as well as dynamic/real-time data like weather forecasts, traffic sensors, bus positions, city sensors, events, emergency data, flows, etc. RDF stores may be used to set up knowledge bases integrating heterogeneous information for web and mobile applications to use the data for new advanced services to citizens and city administrators, thus exploiting inferential capabilities, temporal and spatial reasoning, and text indexing. In this paper, the needs and constraints for RDF stores to be used for smart city services are evaluated, together with the currently available RDF stores. The assessment model allows a full understanding of whether an RDF store is suitable to be used as a basis for Smart City modeling and applications. The RDF assessment model is also supported by a benchmark which extends the RDF store benchmarks available in the state of the art. The comparison has been applied to a number of well-known RDF stores such as Virtuoso, GraphDB (formerly OWLIM), Oracle, StarDog, and many others. The paper also reports the adoption of the proposed Smart City RDF Benchmark on the basis of the Florence Smart City model, data sets and tools accessible as Km4City ( Http://www.Km4City.org ), adopted in the European Commission international smart city projects named RESOLUTE H2020 and REPLICATE H2020, and in the Sii-Mobility National Smart City project in Italy.

Journal ArticleDOI
TL;DR: BioGraph implements state-of-the-art technologies and provides pre-compiled bioinformatics scenarios, as well as the possibility to perform custom queries and obtain an interactive and dynamic visualization of results.
Abstract: Several online databases provide a large amount of biomedical data on different biological entities. These resources are typically stored in systems implementing their own data model, user interface and query language. On the other hand, in many bioinformatics scenarios there is often the need to use more than one resource. The availability of a single bioinformatics platform that integrates many biological resources and services is, for those reasons, a fundamental issue. Here, we present BioGraph, a web application that allows users to query, visualize and analyze biological data belonging to several available online sources. BioGraph is built upon our previously developed graph database called BioGraphDB, which integrates and stores heterogeneous biological resources and makes them available by means of a common structure and a unique query language. BioGraph implements state-of-the-art technologies and provides pre-compiled bioinformatics scenarios, as well as the possibility to perform custom queries and to obtain an interactive and dynamic visualization of results. We present a case study about functional analysis of microRNA in breast cancer in order to demonstrate the functionalities of the system. BioGraph is freely available at http://biograph.pa.icar.cnr.it . Source files are available on GitHub at https://github.com/IcarPA-TBlab/BioGraph

Proceedings ArticleDOI
10 Jun 2018
TL;DR: An early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload, which tests graph data management systems on a graph business analytics workload and was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry.
Abstract: In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, which is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.

Journal ArticleDOI
A. H. Hor1, G. Sohn1, P. Claudio1, M. Jadidi1, A. Afnan1 
TL;DR: This paper presents an architectural design and complete implementation of a BIM-GIS integrated RDF graph database; the workflows that transform IFC and CityGML schemas into an object graph database model are developed and applied to an intelligent urban mobility web application on a game engine platform to validate the integration methodology.
Abstract: Over the recent years, the usage of semantic web technologies and Resource Description Framework (RDF) data models has notably increased in many fields. Multiple systems use RDF data to describe information resources and semantic associations. RDF data plays a very important role in advanced information retrieval, and graphs are efficient ways to visualize and represent real-world data, providing solutions to many real-time scenarios that can be simulated and implemented using graph databases, which can efficiently query graphs with multiple attributes representing different domains of knowledge. Given that graph databases are schema-less with efficient storage for semi-structured data, they can provide fast and deep traversals instead of slow RDBMS SQL-based joins, support Atomicity, Consistency, Isolation and Durability (ACID) transactions with rollback, and, by utilizing the mathematics of graphs, they have enormous potential for fast extraction and storage of information in the form of nodes and relationships. In this paper, we present an architectural design with a complete implementation of a BIM-GIS integrated RDF graph database. The proposed integration approach is composed of four main phases: ontological BIM and GIS model construction, mapping and semantic integration using interoperable data formats, then an import into a graph database with querying and filtering capabilities. The workflows and transformations of IFC and CityGML schemas into an object graph database model are developed and applied to an intelligent urban mobility web application on a game engine platform to validate the integration methodology.
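The import step the abstract describes, loading RDF-style statements into node/relationship form, can be sketched as follows. The identifiers below are illustrative, not actual IFC or CityGML terms.

```python
# RDF-style (subject, predicate, object) statements for a toy BIM-GIS model.
triples = [
    ("Building_1", "hasStorey", "Storey_2"),
    ("Storey_2",   "contains",  "Wall_7"),
    ("Building_1", "locatedIn", "District_A"),
]

# Map triples onto property-graph primitives: subjects/objects become nodes,
# predicates become typed relationships.
graph_nodes, relationships = set(), []
for s, p, o in triples:
    graph_nodes.update([s, o])
    relationships.append({"from": s, "type": p, "to": o})

print(len(graph_nodes), len(relationships))  # 4 3
```

A real pipeline would additionally carry literal attributes (geometry, names, weights) onto the nodes and relationships, which is where the property-graph model pays off over plain triples.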

Book ChapterDOI
01 Jan 2018
TL;DR: This paper presents a long-term research challenge: how to generate domain-specific graph models that are consistent, diverse, scalable and realistic at the same time.
Abstract: Automated model generation can be highly beneficial for various application scenarios, including software tool certification, validation of cyber-physical systems and benchmarking graph databases, as it avoids tedious manual synthesis of models. In this paper, we present a long-term research challenge: how to generate graph models specific to a domain that are consistent, diverse, scalable and realistic at the same time.

Journal ArticleDOI
TL;DR: This work introduces a new logic of attributed graph properties, where the graph part and attribution part are neatly separated, and extends the refutationally complete tableau-based reasoning method as well as the symbolic model generation approach for graph properties to attribute graph properties.
Abstract: Graphs are ubiquitous in computer science. Moreover, in various application fields, graphs are equipped with attributes to express additional information such as names of entities or weights of relationships. Due to the pervasiveness of attributed graphs, it is highly important to have the means to express properties on attributed graphs to strengthen modeling capabilities and to enable analysis. Firstly, we introduce a new logic of attributed graph properties, where the graph part and attribution part are neatly separated. The graph part is equivalent to first-order logic on graphs as introduced by Courcelle. It employs graph morphisms to allow the specification of complex graph patterns. The attribution part is added to this graph part by reverting to the symbolic approach to graph attribution, where attributes are represented symbolically by variables whose possible values are specified by a set of constraints making use of algebraic specifications. Secondly, we extend our refutationally complete tableau-based reasoning method as well as our symbolic model generation approach for graph properties to attributed graph properties. Due to the new logic mentioned above, neatly separating the graph and attribution parts, and the categorical constructions employed only on a more abstract level, we can leave the graph part of the algorithms seemingly unchanged. For the integration of the attribution part into the algorithms, we use an oracle, allowing for flexible adoption of different available SMT solvers in the actual implementation. Finally, our automated reasoning approach for attributed graph properties is implemented in the tool AutoGraph integrating in particular the SMT solver Z3 for the attribute part of the properties. We motivate and illustrate our work with a particular application scenario on graph database query validation.
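The separation the paper describes, a graph-pattern part plus an attribution part, can be illustrated very simply. This checks a property on one concrete graph; the paper's contribution is reasoning symbolically over all models via tableaux and SMT solving, which this sketch does not attempt.

```python
# Edges of a toy attributed graph: (source, target, attribute dict).
edges = [("a", "b", {"weight": 3}), ("b", "c", {"weight": 8})]

def satisfies(graph_edges, constraint):
    """Graph part: does some edge exist?  Attribution part: does its
    attribute valuation satisfy `constraint`?"""
    return any(constraint(attrs) for _, _, attrs in graph_edges)

# Property: "there is an edge whose weight exceeds 5".
print(satisfies(edges, lambda attrs: attrs["weight"] > 5))  # True
```

In the symbolic setting, `weight` would be a variable constrained by a formula handed to an SMT solver (Z3 in AutoGraph) rather than a concrete integer.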

Posted Content
TL;DR: A novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation and surpasses existing systems in both efficiency and conciseness.
Abstract: The need for countering Advanced Persistent Threat (APT) attacks has led to the solutions that ubiquitously monitor system activities in each host, and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semantics-agnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) domain-specific data model and storage for scaling the storage, (2) a domain-specific query language, Attack Investigation Query Language (AIQL) that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule the query execution. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a real-world APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL).
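The query shape AIQL targets, filtering system-monitoring events by subject, operation and time window, can be sketched as below. The event fields are hypothetical, and this linear scan stands in for AIQL's optimized, domain-specific engine.

```python
# Toy system-monitoring event log: process, operation, file, timestamp.
events = [
    {"proc": "curl", "op": "write", "file": "/tmp/x",         "t": 100},
    {"proc": "bash", "op": "read",  "file": "/etc/passwd",    "t": 150},
    {"proc": "curl", "op": "read",  "file": "/etc/passwd",    "t": 200},
]

def query(proc=None, op=None, t_range=None):
    """Return events matching all supplied constraints."""
    out = []
    for e in events:
        if proc is not None and e["proc"] != proc:
            continue
        if op is not None and e["op"] != op:
            continue
        if t_range is not None and not (t_range[0] <= e["t"] <= t_range[1]):
            continue
        out.append(e)
    return out

hits = query(proc="curl", op="read", t_range=(120, 250))
print(len(hits))  # 1
```

AIQL's point is that such constraints (plus multi-event causal dependencies) are first-class language primitives, so the engine can reorder and prune their evaluation instead of relying on a generic SQL or Cypher planner.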

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This work studies RPQ evaluation for simple paths from a parameterized complexity perspective and defines a class of simple transitive expressions that is prominent in practice and for which a dichotomy for the evaluation problem can be proved.
Abstract: Regular path queries (RPQs) are a central component of graph databases. We investigate decision- and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question if a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths.
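The simple-path semantics discussed in the abstract can be made concrete on a toy graph: evaluate the RPQ `a+` (one or more `a`-labelled edges) from a start node, allowing no repeated vertices on a path. This DFS sketch is only meant to illustrate the semantics; the paper's subject is when such evaluation is tractable at all.

```python
# Toy edge-labelled directed graph: (source, target) -> label.
edges = {("u", "v"): "a", ("v", "w"): "a", ("w", "u"): "a", ("v", "x"): "b"}

def simple_a_reach(start, lab="a"):
    """Nodes reachable from `start` via a simple path of `lab`-labelled edges."""
    reached = set()
    def dfs(node, seen):
        for (s, d), l in edges.items():
            if s == node and l == lab and d not in seen:
                reached.add(d)
                dfs(d, seen | {d})
    dfs(start, {start})
    return sorted(reached)

# The a-edge w -> u cannot be used: u is already on the path u -> v -> w.
print(simple_a_reach("u"))  # ['v', 'w']
```

Under arbitrary-path semantics the cycle u → v → w → u would also be admissible, which is precisely why that semantics is easier to evaluate (polynomial delay) than the simple-path one.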

Proceedings ArticleDOI
01 Aug 2018
TL;DR: A graph-based power system model is explored and a graph computing based state estimation is proposed to speed up performance; testing results on the IEEE 14-bus and IEEE 118-bus systems and a provincial system in China verify the accuracy and high performance of the proposed methodology.
Abstract: With the increased complexity of power systems due to the integration of smart grid technologies and renewable energy resources, more frequent changes have been introduced to system status, and the traditional serial mode of state estimation algorithms cannot meet the strict time constraints of the future dynamic power grid, even with advanced computer hardware. To guarantee the grid's reliability and minimize the impacts caused by system status fluctuations, a fast, even SCADA-rate, state estimator is urgently needed. In this paper, a graph-based power system model is first explored and a graph computing based state estimation is proposed to speed up its performance. The power system is represented by a graph, a collection of vertices and edges, and the measurements are attributes of those vertices and edges. Each vertex can independently carry out local computation, such as formulating the node-based H matrix, gain matrix and right-hand-side (RHS) vector, using only the information on its connected edges and neighboring vertices. Then, by taking advantage of the graph database, these node-based data are conveniently collected and stored in the compressed sparse row (CSR) format, avoiding the complexity and heaviness introduced by sparse matrices. With communication and synchronization, the centralized weighted least squares (WLS) state estimation is then solved with hierarchical parallel computing. The proposed strategy is implemented on a graph database platform. The testing results of the IEEE 14-bus and IEEE 118-bus systems and a provincial system in China verify the accuracy and high performance of the proposed methodology.
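The compressed sparse row (CSR) layout the abstract mentions can be sketched as follows. The matrix entries below are arbitrary illustrative values, not an actual H or gain matrix.

```python
def to_csr(n, entries):
    """Build CSR arrays from (row, col, value) entries of an n-row matrix.
    Returns (row_ptr, col_idx, vals): row r's nonzeros sit at positions
    row_ptr[r] .. row_ptr[r + 1] - 1 of col_idx / vals."""
    rows = [[] for _ in range(n)]
    for r, c, v in entries:
        rows[r].append((c, v))
    row_ptr, col_idx, vals = [0], [], []
    for r in range(n):
        for c, v in sorted(rows[r]):
            col_idx.append(c)
            vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals

row_ptr, col_idx, vals = to_csr(3, [(0, 0, 2.0), (0, 2, -1.0), (2, 1, 4.0)])
print(row_ptr)  # [0, 2, 2, 3]  (row 1 is empty)
```

In the paper's setting each vertex contributes its own rows independently, which is what makes this storage step parallelizable per node before the centralized WLS solve.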

Journal ArticleDOI
01 Aug 2018
TL;DR: Demonstrates Gradoop, an open source framework that combines and extends features of graph database systems with the benefits of distributed graph processing, providing a rich graph data model and powerful graph operators.
Abstract: We demonstrate Gradoop, an open source framework that combines and extends features of graph database systems with the benefits of distributed graph processing. Using a rich graph data model and powerful graph operators, users can declaratively express graph analytical programs for distributed execution without needing advanced programming experience or a deeper understanding of the underlying system. Visitors of the demo can declare graph analytical programs using the Gradoop operators and also visually experience two of our advanced operators: graph pattern matching and graph grouping. We provide real world and artificial social network data with up to 10 billion edges and allow running the programs either locally or on a remote research cluster to demonstrate scalability.
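The graph-grouping operator highlighted in the demo can be sketched on a toy social graph: vertices are collapsed by a grouping key into super-vertices with a count aggregate, and edges are rolled up between the groups. The property names are illustrative, not Gradoop's API.

```python
from collections import Counter

# Toy social graph: vertices with a "city" property, plus edges between ids.
vertices = [
    {"id": 1, "city": "Leipzig"},
    {"id": 2, "city": "Leipzig"},
    {"id": 3, "city": "Dresden"},
]
edges = [(1, 2), (1, 3)]

# Group vertices by city; roll edges up between the resulting super-vertices.
city_of = {v["id"]: v["city"] for v in vertices}
super_vertices = Counter(v["city"] for v in vertices)
super_edges = Counter((city_of[s], city_of[t]) for s, t in edges)

print(dict(super_vertices))  # {'Leipzig': 2, 'Dresden': 1}
print(dict(super_edges))     # {('Leipzig', 'Leipzig'): 1, ('Leipzig', 'Dresden'): 1}
```

Gradoop expresses the same operation declaratively as one operator in an analytical program and executes it distributed, which is what makes it feasible on graphs with billions of edges.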