
Showing papers on "Graph database published in 2018"


Proceedings ArticleDOI
27 May 2018
TL;DR: This work describes Cypher 9, the first version of the language governed by the openCypher Implementers Group; it introduces the language by example and provides a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model.
Abstract: The Cypher property graph query language is an evolving language, originally designed and implemented as part of the Neo4j graph database, and it is currently used by several commercial database products and researchers. We describe Cypher 9, which is the first version of the language governed by the openCypher Implementers Group. We first introduce the language by example, and describe its uses in industry. We then provide a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model, and its ASCII Art graph pattern matching mechanism for expressing subgraphs of interest to an application. We compare the features of Cypher to other property graph query languages, and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning the language into a compositional language which supports graph projections and multiple named graphs.
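The property graph data model that Cypher queries over can be illustrated with a small self-contained sketch. The Python below is illustrative only (the `Graph`, `Node`, and `match` names are invented for this example and are not part of Neo4j or openCypher); it mimics what an ASCII Art pattern like `(p:Person)-[:WORKS_AT]->(c:Company)` matches against.

```python
# Minimal property graph sketch: nodes carry labels and properties,
# relationships are typed, directed edges. Not Neo4j's API.
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set
    props: dict = field(default_factory=dict)

@dataclass
class Rel:
    src: Node
    rtype: str
    dst: Node

class Graph:
    def __init__(self):
        self.nodes, self.rels = [], []

    def add_node(self, *labels, **props):
        n = Node(set(labels), props)
        self.nodes.append(n)
        return n

    def add_rel(self, src, rtype, dst):
        self.rels.append(Rel(src, rtype, dst))

    def match(self, src_label, rtype, dst_label):
        """Analogue of the Cypher pattern (a:Src)-[:TYPE]->(b:Dst)."""
        return [(r.src, r.dst) for r in self.rels
                if r.rtype == rtype
                and src_label in r.src.labels
                and dst_label in r.dst.labels]

g = Graph()
alice = g.add_node("Person", name="Alice")
acme = g.add_node("Company", name="Acme")
g.add_rel(alice, "WORKS_AT", acme)

# Corresponds roughly to: MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p, c
pairs = g.match("Person", "WORKS_AT", "Company")
print([(p.props["name"], c.props["name"]) for p, c in pairs])  # [('Alice', 'Acme')]
```

The point of the sketch is the data model, not the query engine: labels, typed edges, and key-value properties are exactly the ingredients the paper formalizes.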

353 citations


Journal ArticleDOI
TL;DR: The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery; the Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
Abstract: Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object-oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high-performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

324 citations


Journal ArticleDOI
TL;DR: Overall, users find the TensorFlow Graph Visualizer useful for understanding, debugging, and sharing the structures of their models.
Abstract: We present a design study of the TensorFlow Graph Visualizer, part of the TensorFlow machine intelligence platform. This tool helps users understand complex machine learning architectures by visualizing their underlying dataflow graphs. The tool works by applying a series of graph transformations that enable standard layout techniques to produce a legible interactive diagram. To declutter the graph, we decouple non-critical nodes from the layout. To provide an overview, we build a clustered graph using the hierarchical structure annotated in the source code. To support exploration of nested structure on demand, we perform edge bundling to enable stable and responsive cluster expansion. Finally, we detect and highlight repeated structures to emphasize a model's modular composition. To demonstrate the utility of the visualizer, we describe example usage scenarios and report user feedback. Overall, users find the visualizer useful for understanding, debugging, and sharing the structures of their models.

292 citations


Book
22 Jun 2018
TL;DR: This workshop provides a hands-on introduction to the popular open source graph database Neo4j through fixing a series of increasingly sophisticated, but broken, test cases, each of which highlights an important graph modeling or API affordance.
Abstract: In this workshop we provide a hands-on introduction to the popular open source graph database Neo4j [1] through fixing a series of increasingly sophisticated, but broken, test cases, each of which highlights an important graph modeling or API affordance.

266 citations


Journal ArticleDOI
TL;DR: A semantic query graph is proposed to model the query intention of a natural language question in a structural way; based on it, RDF Q/A is reduced to a subgraph matching problem, and the ambiguity of the natural language question is resolved at the time when matches of the query are found.
Abstract: RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a natural language question, the existing work takes a two-stage approach: question understanding and query evaluation. Their focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is joint disambiguation, which has an exponential search space. In this paper, we propose a systematic framework to answer natural language questions over an RDF repository (RDF Q/A) from a graph data-driven perspective. We propose a semantic query graph to model the query intention in the natural language question in a structural way, based on which RDF Q/A is reduced to a subgraph matching problem. More importantly, we resolve the ambiguity of natural language questions at the time when matches of the query are found. The cost of disambiguation is saved if no matches are found. More specifically, we propose two different frameworks to build the semantic query graph, one relation (edge)-first and the other node-first. We compare our method with some state-of-the-art RDF Q/A systems on the benchmark dataset. Extensive experiments confirm that our method not only improves precision but also greatly speeds up query performance.

215 citations


Proceedings ArticleDOI
27 May 2018
TL;DR: G-CORE, the result of a community effort between industry and academia to shape the future of graph query languages, strikes a careful balance between path query expressivity and evaluation complexity.
Abstract: We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning, that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.

101 citations


01 Jan 2018
TL;DR: This paper presents a formal definition of the property graph database model, covering the property graph data structure, basic notions of integrity constraints, and a graph query language.
Abstract: Most of the current graph database systems have been designed to support property graphs. Surprisingly, there is no standard specification of the database model behind such systems. This paper presents a formal definition of the property graph database model. Specifically, we define the property graph data structure, basic notions of integrity constraints (e.g. graph schema), and a graph query language.

94 citations


Journal ArticleDOI
TL;DR: A novel manifold distance computed on a semantic class prototype graph is proposed which takes into account the rich intrinsic semantic structure, i.e., semantic manifold, of the class prototype distribution.
Abstract: Zero-Shot Learning (ZSL) for visual recognition is typically achieved by exploiting a semantic embedding space. In such a space, both seen and unseen class labels as well as image features can be embedded so that the similarity among them can be measured directly. In this work, we consider that the key to effective ZSL is to compute an optimal distance metric in the semantic embedding space. Existing ZSL works employ either Euclidean or cosine distances. However, in a high-dimensional space where the projected class labels (prototypes) are sparse, these distances are suboptimal, resulting in a number of problems including hubness and domain shift. To overcome these problems, a novel manifold distance computed on a semantic class prototype graph is proposed which takes into account the rich intrinsic semantic structure, i.e., semantic manifold, of the class prototype distribution. To further alleviate the domain shift problem, a new regularisation term is introduced into a ranking loss based embedding model. Specifically, the ranking loss objective is regularised by unseen class prototypes to prevent the projected object features from being biased towards the seen prototypes. Extensive experiments on four benchmarks show that our method significantly outperforms the state-of-the-art.

80 citations


Journal ArticleDOI
TL;DR: A machine learning approach to large graph visualization is presented, based on computing the topological similarity of graphs using graph kernels; an important contribution of this work is the development of a new framework to design graph kernels.
Abstract: Using different methods for laying out a graph can lead to very different visual appearances, with which the viewer perceives different information. Selecting a “good” layout method is thus important for visualizing a graph. The selection can be highly subjective and dependent on the given task. A common approach to selecting a good layout is to use aesthetic criteria and visual inspection. However, fully calculating various layouts and their associated aesthetic metrics is computationally expensive. In this paper, we present a machine learning approach to large graph visualization based on computing the topological similarity of graphs using graph kernels. For a given graph, our approach can show what the graph would look like in different layouts and estimate their corresponding aesthetic metrics. An important contribution of our work is the development of a new framework to design graph kernels. Our experimental study shows that our estimation calculation is considerably faster than computing the actual layouts and their aesthetic metrics. Also, our graph kernels outperform the state-of-the-art ones in both time and accuracy. In addition, we conducted a user study to demonstrate that the topological similarity computed with our graph kernel matches perceptual similarity assessed by human users.
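The paper designs its own kernels, but the general shape of a graph kernel can be illustrated with the classic Weisfeiler-Lehman (WL) label-refinement construction. The stdlib-only sketch below is an assumption-laden illustration of that standard technique, not the authors' method; all function names are invented for this example.

```python
# Sketch of one standard graph-kernel ingredient: Weisfeiler-Lehman (WL)
# label refinement. A kernel value is computed by comparing the histograms
# of refined labels of two graphs. Illustrative only.
from collections import Counter

def wl_histogram(adj, iterations=2):
    """adj: dict mapping node -> list of neighbours. Returns a Counter of
    WL labels accumulated over the given number of refinement iterations."""
    labels = {v: str(len(nbrs)) for v, nbrs in adj.items()}  # init: degrees
    hist = Counter(labels.values())
    for _ in range(iterations):
        # each node's new label combines its old label with its
        # neighbours' sorted labels
        labels = {v: labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
                  for v in adj}
        hist.update(labels.values())
    return hist

def wl_kernel(adj1, adj2, iterations=2):
    """Linear kernel on the WL label histograms of two graphs."""
    h1, h2 = wl_histogram(adj1, iterations), wl_histogram(adj2, iterations)
    return sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_kernel(triangle, triangle) > wl_kernel(triangle, path))  # True
```

A graph is more similar to itself than to a structurally different graph, which is the property a layout-estimation pipeline like the one in the paper relies on.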

74 citations


Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper analyzes the most popular graph databases and studies the most important features for a complete and effective application, such as flexible schema, query language, sharding and scalability.
Abstract: Graph databases are a very powerful solution for storing and searching data rich in relationships, such as Facebook and Twitter data. With data multiplication and data type diversity there has been a need to create new storage and analysis platforms that structure irregular data with a flexible schema, maintaining a high level of performance and ensuring data scalability effectively, which is a problem that relational databases cannot handle. In this paper, we analyse the most popular graph databases: AllegroGraph, ArangoDB, InfiniteGraph, Neo4J and OrientDB. We study the most important features for a complete and effective application, such as flexible schema, query language, sharding and scalability.

72 citations


Journal ArticleDOI
TL;DR: This survey reviews and analyzes the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability, and surveys applications that appear in recent distributed graph systems papers.
Abstract: Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.
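Many of the surveyed systems implement the vertex-centric ("think like a vertex") abstraction popularized by Pregel: computation proceeds in supersteps, vertices exchange messages, and execution halts when no messages remain. The single-process sketch below illustrates that pattern with single-source shortest paths; the function and variable names are invented for this example and no distributed machinery is involved.

```python
# Vertex-centric (Pregel-style) single-source shortest paths, illustrative
# single-process version. Each superstep: vertices with incoming messages
# update their value and message their neighbours; a barrier separates steps.
def pregel_sssp(adj, source):
    """adj: dict vertex -> list of (neighbour, weight). Returns distances."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    messages = {source: [0]}          # superstep 0: the source receives 0
    while messages:                   # halt when no vertex receives messages
        next_messages = {}
        for v, incoming in messages.items():
            best = min(incoming)
            if best < dist[v]:        # vertex updates its local value...
                dist[v] = best
                for u, w in adj[v]:   # ...and sends messages along out-edges
                    next_messages.setdefault(u, []).append(best + w)
        messages = next_messages      # synchronization barrier
    return dist

adj = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(pregel_sssp(adj, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

The same superstep/message skeleton underlies PageRank, connected components, and the other iterative refinement algorithms the survey discusses.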

Journal ArticleDOI
TL;DR: The benchmark focuses on the performance of query evaluation, i.e. its execution time and memory consumption, with a particular emphasis on reevaluation, and can be adapted to various technologies and query engines, including modeling tools and relational, graph and semantic databases.
Abstract: In model-driven development of safety-critical systems (like automotive, avionics or railways), well-formedness of models is repeatedly validated in order to detect design flaws as early as possible. In many industrial tools, validation rules are still often implemented by a large amount of imperative model traversal code, which makes those rule implementations complicated and hard to maintain. Additionally, as models are rapidly increasing in size and complexity, efficient execution of validation rules is challenging for the currently available tools. Checking well-formedness constraints can be captured by declarative queries over graph models, while model update operations can be specified as model transformations. This paper presents a benchmark for systematically assessing the scalability of validating and revalidating well-formedness constraints over large graph models. The benchmark defines well-formedness validation scenarios in the railway domain: a metamodel, an instance model generator and a set of well-formedness constraints captured by queries, fault injection and repair operations (imitating the work of systems engineers by model transformations). The benchmark focuses on the performance of query evaluation, i.e. its execution time and memory consumption, with a particular emphasis on reevaluation. We demonstrate that the benchmark can be adapted to various technologies and query engines, including modeling tools and relational, graph and semantic databases. The Train Benchmark is available as an open-source project with continuous builds from https://github.com/FTSRG/trainbenchmark .

Journal ArticleDOI
TL;DR: This work presents VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results, and demonstrates how it helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents.
Abstract: Finding patterns in graphs has become a vital challenge in many domains, from biological systems and network security to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts' sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

Proceedings Article
11 Jul 2018
TL;DR: A query system is proposed, built on top of existing monitoring tools and databases and designed with novel types of optimizations to support timely attack investigation; deployed in NEC Labs America comprising 150 hosts, it was evaluated using 857 GB of real system monitoring data (containing 2.5 billion events).
Abstract: The need to counter Advanced Persistent Threat (APT) attacks has led to solutions that ubiquitously monitor system activities in each host and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semantics-agnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) a domain-specific data model and storage for scaling the storage, (2) a domain-specific query language, the Attack Investigation Query Language (AIQL), that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule query execution. We deployed our system in NEC Labs America, comprising 150 hosts, and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a real-world APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL).

Book ChapterDOI
01 Jan 2018
TL;DR: This chapter provides an overview of the foundations and systems for graph data management: the authors present a historical overview of the area, study graph database models, characterize essential graph-oriented queries, review graph query languages, and explore the features of current graph data management systems.
Abstract: Graph data management concerns the research and development of powerful technologies for storing, processing and analyzing large volumes of graph data. This chapter presents an overview of the foundations and systems for graph data management. Specifically, we present a historical overview of the area, study graph database models, characterize essential graph-oriented queries, review graph query languages, and explore the features of current graph data management systems (i.e. graph databases and graph-processing frameworks).

Journal ArticleDOI
01 Dec 2018
TL;DR: A novel microbenchmarking framework is introduced that provides insights on graph database system performance that go beyond what macro-benchmarks can offer, and includes the largest set of queries and operators so far considered.
Abstract: Despite the increasing interest in graph databases, their requirements and specifications are not yet fully understood by everyone, leading to a great deal of variation in the supported functionalities and the achieved performances. In this work, we provide a comprehensive study of the existing graph database systems. We introduce a novel microbenchmarking framework that provides insights on their performance that go beyond what macro-benchmarks can offer. The framework includes the largest set of queries and operators so far considered. The graph database systems are evaluated on synthetic and real data, from different domains, and at scales much larger than any previous work. The framework is materialized as an open-source suite and is easily extended to new datasets, systems, and queries.

Journal ArticleDOI
TL;DR: This paper describes how the architecture of a collaboration platform, which allows companies to simulate and analyse the economic viability of establishing waste-to-resource exchanges in the By-product Exchange Network (BEN) model, is enhanced with a database engine for waste-to-resource matching.

Journal ArticleDOI
TL;DR: It is shown how to generate "really hard" random instances for subgraph isomorphism problems, which are computationally challenging with a couple of hundred vertices in the target, and only twenty pattern vertices.
Abstract: The subgraph isomorphism problem involves deciding whether a copy of a pattern graph occurs inside a larger target graph. The non-induced version allows extra edges in the target, whilst the induced version does not. Although both variants are NP-complete, algorithms inspired by constraint programming can operate comfortably on many real-world problem instances with thousands of vertices. However, they cannot handle arbitrary instances of this size. We show how to generate "really hard" random instances for subgraph isomorphism problems, which are computationally challenging with a couple of hundred vertices in the target, and only twenty pattern vertices. For the non-induced version of the problem, these instances lie on a satisfiable/unsatisfiable phase transition, whose location we can predict; for the induced variant, much richer behaviour is observed, and constrainedness gives a better measure of difficulty than does proximity to a phase transition. These results have practical consequences: we explain why the widely researched "filter/verify" indexing technique used in graph databases is founded upon a misunderstanding of the empirical hardness of NP-complete problems, and cannot be beneficial when paired with any reasonable subgraph isomorphism algorithm.
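For intuition, the non-induced variant the paper studies can be stated in a few lines of brute-force Python. Real solvers rely on constraint propagation and scale far beyond this exhaustive search, so the sketch below is only the problem statement in code; the function name is invented for this example.

```python
# Brute-force check for non-induced subgraph isomorphism: every pattern edge
# must map to a target edge, and extra target edges are allowed. Exhaustive
# over all injective mappings, so usable only on tiny graphs.
from itertools import permutations

def has_subgraph_iso(pattern_edges, pattern_n, target_edges, target_n):
    target = {frozenset(e) for e in target_edges}
    for mapping in permutations(range(target_n), pattern_n):
        if all(frozenset((mapping[u], mapping[v])) in target
               for u, v in pattern_edges):
            return True
    return False

# A triangle occurs (non-induced) in K4, but not in a 4-cycle.
triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(has_subgraph_iso(triangle, 3, k4, 4))  # True
print(has_subgraph_iso(triangle, 3, c4, 4))  # False
```

The combinatorial explosion in the permutation loop is exactly why instance hardness, rather than instance size alone, is the paper's central concern.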

Journal ArticleDOI
TL;DR: A novel multimodal hashing method, termed as semantic neighbor graph hashing (SNGH), which aims to preserve the fine-grained similarity metric based on the semantic graph that is constructed by jointly pursuing the semantic supervision and the local neighborhood structure is proposed.
Abstract: Hashing methods have been widely used for approximate nearest neighbor search in recent years due to their computational and storage effectiveness. Most existing multimodal hashing methods try to preserve the similarity relationship based on either metric distances or semantic labels in a Procrustean way, while ignoring the intra-class and inter-class variations inherent in the metric space. In this paper, we propose a novel multimodal hashing method, termed semantic neighbor graph hashing (SNGH), which aims to preserve the fine-grained similarity metric based on the semantic graph that is constructed by jointly pursuing the semantic supervision and the local neighborhood structure. Specifically, the semantic graph is constructed to capture the local similarity structure for the image modality and the text modality, respectively. Furthermore, we define a function based on the local similarity in particular to adaptively calculate multi-level similarities by encoding the intra-class and inter-class variations. After obtaining the unified hash codes, logistic regression with the kernel trick is employed to learn view-specific hash functions independently for each modality. Extensive experiments are conducted on four widely used multimodal data sets. The experimental results demonstrate the superiority of the proposed SNGH method compared with the state-of-the-art multimodal hashing methods.

Journal ArticleDOI
01 Feb 2018
TL;DR: This article presents a partition-based approach to tackle threshold-based graph similarity search with edit distance constraints, by dividing data graphs into variable-size non-overlapping partitions, and develops efficient query processing algorithms based on the novel paradigm.
Abstract: Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, and pattern recognition. A fundamental and critical query primitive is to efficiently search for similar structures in a large collection of graphs. This article mainly studies threshold-based graph similarity search with edit distance constraints. Existing solutions to the problem utilize fixed-size overlapping substructures to generate candidates, and thus become susceptible to large vertex degrees and distance thresholds. In this article, we present a partition-based approach to tackle the problem. By dividing data graphs into variable-size non-overlapping partitions, the edit distance constraint is converted to a graph containment constraint for candidate generation. We develop efficient query processing algorithms based on the novel paradigm. Moreover, candidate-pruning techniques and an improved graph edit distance verification algorithm are developed to boost the performance. In addition, a cost-aware graph partitioning method is devised to optimize the index. Extending the partition-based filtering paradigm, we present a solution to the top-k graph similarity search problem, where tailored filtering, look-ahead and computation-sharing strategies are exploited. Using both public real-life and synthetic datasets, extensive experiments demonstrate that our approaches significantly outperform the baseline and its alternatives.
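The filter-and-verify strategy underlying this line of work can be sketched with a deliberately simple lower bound; the paper's partition-based filter is stronger, so the names and the bound below are only an illustrative assumption. Since each edit operation inserts or deletes one vertex or one edge (or relabels an element), the edit distance between two graphs is at least |n1 - n2| + |m1 - m2|, which already prunes candidates before any exponential verification runs.

```python
# Filter-and-verify sketch for edit-distance search: a cheap lower bound on
# graph edit distance prunes candidates before exact verification. Shown is
# a simple size-based bound, not the paper's partition-based filter.
def size_lower_bound(n1, m1, n2, m2):
    """Each edit operation changes the vertex count or the edge count by at
    most one, so edit distance >= |n1 - n2| + |m1 - m2|."""
    return abs(n1 - n2) + abs(m1 - m2)

def candidates(query, graphs, tau):
    """Keep only graphs whose lower bound does not exceed threshold tau;
    exact (exponential) edit-distance verification runs on the survivors."""
    qn, qm = query
    return [i for i, (n, m) in enumerate(graphs)
            if size_lower_bound(qn, qm, n, m) <= tau]

# query graph: 4 vertices, 4 edges; database of (n, m) sizes; threshold 2
db = [(4, 5), (9, 12), (3, 3), (4, 4)]
print(candidates((4, 4), db, 2))  # [0, 2, 3]
```

Any sound lower bound works in this slot; the paper's contribution is a much tighter one derived from non-overlapping partitions, which shrinks the candidate set far more aggressively.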

Journal ArticleDOI
TL;DR: This paper reports the adoption of the proposed Smart City RDF Benchmark on the basis of the Florence Smart City model, data sets and tools accessible as Km4City; the benchmark extends the RDF store benchmarks available in the state of the art.
Abstract: Smart cities are providing advanced services aggregating and exploiting data from different sources. Cities collect static data such as road graphs and service descriptions, as well as dynamic/real-time data like weather forecasts, traffic sensors, bus positions, city sensors, events, emergency data, flows, etc. RDF stores may be used to set up knowledge bases integrating heterogeneous information for web and mobile applications to use the data for new advanced services to citizens and city administrators, thus exploiting inferential capabilities, temporal and spatial reasoning, and text indexing. In this paper, the needs and constraints for RDF stores to be used for smart city services are evaluated, together with the currently available RDF stores. The assessment model allows a full understanding of whether an RDF store is suitable to be used as a basis for Smart City modeling and applications. The RDF assessment model is also supported by a benchmark which extends the RDF store benchmarks available in the state of the art. The comparison has been applied to a number of well-known RDF stores such as Virtuoso, GraphDB (formerly OWLIM), Oracle, StarDog, and many others. The paper also reports the adoption of the proposed Smart City RDF Benchmark on the basis of the Florence Smart City model, data sets and tools accessible as Km4City ( Http://www.Km4City.org ), adopted in the European Commission international smart city projects named RESOLUTE H2020 and REPLICATE H2020, and in the Sii-Mobility National Smart City project in Italy.

Journal ArticleDOI
TL;DR: BioGraph implements state-of-the-art technologies and provides pre-compiled bioinformatics scenarios, as well as the possibility to perform custom queries and obtain an interactive and dynamic visualization of results.
Abstract: Several online databases provide a large amount of biomedical data on different biological entities. These resources are typically stored in systems implementing their own data model, user interface and query language. On the other hand, in many bioinformatics scenarios there is often the need to use more than one resource. The availability of a single bioinformatics platform that integrates many biological resources and services is, for those reasons, a fundamental issue. Here, we present BioGraph, a web application that allows users to query, visualize and analyze biological data belonging to several available online sources. BioGraph is built upon our previously developed graph database called BioGraphDB, which integrates and stores heterogeneous biological resources and makes them available by means of a common structure and a unique query language. BioGraph implements state-of-the-art technologies and provides pre-compiled bioinformatics scenarios, as well as the possibility to perform custom queries and to obtain an interactive and dynamic visualization of results. We present a case study about functional analysis of microRNA in breast cancer in order to demonstrate the functionalities of the system. BioGraph is freely available at http://biograph.pa.icar.cnr.it . Source files are available on GitHub at https://github.com/IcarPA-TBlab/BioGraph

Proceedings ArticleDOI
10 Jun 2018
TL;DR: An early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload, which tests graph data management systems on a graph business analytics workload and was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry.
Abstract: In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, which is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.

Journal ArticleDOI
A. H. Hor1, G. Sohn1, P. Claudio1, M. Jadidi1, A. Afnan1 
TL;DR: This paper presents an architectural design and complete implementation of a BIM-GIS integrated RDF graph database; the workflows that transform IFC and CityGML schemas into an object graph database model are developed and applied to an intelligent urban mobility web application on a game engine platform to validate the integration methodology.
Abstract: Over the recent years, the usage of semantic web technologies and Resource Description Framework (RDF) data models has notably increased in many fields. Multiple systems use RDF data to describe information resources and semantic associations. RDF data plays a very important role in advanced information retrieval, and graphs are efficient ways to visualize and represent real-world data, providing solutions to many real-time scenarios that can be simulated and implemented using graph databases, which can efficiently query graphs with multiple attributes representing different domains of knowledge. Given that graph databases are schema-less with efficient storage for semi-structured data, they can provide fast and deep traversals instead of slow RDBMS SQL-based joins, support Atomicity, Consistency, Isolation and Durability (ACID) transactions with rollback, and, by utilizing the mathematics of graphs, they have enormous potential for fast extraction and storage of information in the form of nodes and relationships. In this paper, we present an architectural design with a complete implementation of a BIM-GIS integrated RDF graph database. The proposed integration approach is composed of four main phases: ontological BIM and GIS model construction, mapping and semantic integration using interoperable data formats, then an import into a graph database with querying and filtering capabilities. The workflows and transformations of IFC and CityGML schemas into an object graph database model are developed and applied to an intelligent urban mobility web application on a game engine platform to validate the integration methodology.
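The import step the abstract describes, loading RDF-style statements into node/relationship form, can be sketched as follows. The identifiers below are illustrative, not actual IFC or CityGML terms.

```python
# RDF-style (subject, predicate, object) statements for a toy BIM-GIS model.
triples = [
    ("Building_1", "hasStorey", "Storey_2"),
    ("Storey_2",   "contains",  "Wall_7"),
    ("Building_1", "locatedIn", "District_A"),
]

# Map triples onto property-graph primitives: subjects/objects become nodes,
# predicates become typed relationships.
graph_nodes, relationships = set(), []
for s, p, o in triples:
    graph_nodes.update([s, o])
    relationships.append({"from": s, "type": p, "to": o})

print(len(graph_nodes), len(relationships))  # 4 3
```

A real pipeline would additionally carry literal attributes (geometry, names, weights) onto the nodes and relationships, which is where the property-graph model pays off over plain triples.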

Book ChapterDOI
01 Jan 2018
TL;DR: This paper presents a long-term research challenge: how to generate domain-specific graph models that are consistent, diverse, scalable and realistic at the same time.
Abstract: Automated model generation can be highly beneficial for various application scenarios, including software tool certification, validation of cyber-physical systems and benchmarking graph databases, as it avoids tedious manual synthesis of models. In this paper, we present a long-term research challenge: how to generate graph models specific to a domain that are consistent, diverse, scalable and realistic at the same time.

Journal ArticleDOI
TL;DR: This work introduces a new logic of attributed graph properties, where the graph part and attribution part are neatly separated, and extends the refutationally complete tableau-based reasoning method as well as the symbolic model generation approach for graph properties to attribute graph properties.
Abstract: Graphs are ubiquitous in computer science. Moreover, in various application fields, graphs are equipped with attributes to express additional information such as names of entities or weights of relationships. Due to the pervasiveness of attributed graphs, it is highly important to have the means to express properties on attributed graphs to strengthen modeling capabilities and to enable analysis. Firstly, we introduce a new logic of attributed graph properties, where the graph part and attribution part are neatly separated. The graph part is equivalent to first-order logic on graphs as introduced by Courcelle. It employs graph morphisms to allow the specification of complex graph patterns. The attribution part is added to this graph part by reverting to the symbolic approach to graph attribution, where attributes are represented symbolically by variables whose possible values are specified by a set of constraints making use of algebraic specifications. Secondly, we extend our refutationally complete tableau-based reasoning method as well as our symbolic model generation approach for graph properties to attributed graph properties. Due to the new logic mentioned above, neatly separating the graph and attribution parts, and the categorical constructions employed only on a more abstract level, we can leave the graph part of the algorithms seemingly unchanged. For the integration of the attribution part into the algorithms, we use an oracle, allowing for flexible adoption of different available SMT solvers in the actual implementation. Finally, our automated reasoning approach for attributed graph properties is implemented in the tool AutoGraph integrating in particular the SMT solver Z3 for the attribute part of the properties. We motivate and illustrate our work with a particular application scenario on graph database query validation.
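The separation the paper describes, a graph-pattern part plus an attribution part, can be illustrated very simply. This checks a property on one concrete graph; the paper's contribution is reasoning symbolically over all models via tableaux and SMT solving, which this sketch does not attempt.

```python
# Edges of a toy attributed graph: (source, target, attribute dict).
edges = [("a", "b", {"weight": 3}), ("b", "c", {"weight": 8})]

def satisfies(graph_edges, constraint):
    """Graph part: does some edge exist?  Attribution part: does its
    attribute valuation satisfy `constraint`?"""
    return any(constraint(attrs) for _, _, attrs in graph_edges)

# Property: "there is an edge whose weight exceeds 5".
print(satisfies(edges, lambda attrs: attrs["weight"] > 5))  # True
```

In the symbolic setting, `weight` would be a variable constrained by a formula handed to an SMT solver (Z3 in AutoGraph) rather than a concrete integer.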

Posted Content
TL;DR: A novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation and surpasses existing systems in both efficiency and conciseness.
Abstract: The need for countering Advanced Persistent Threat (APT) attacks has led to the solutions that ubiquitously monitor system activities in each host, and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semantics-agnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) domain-specific data model and storage for scaling the storage, (2) a domain-specific query language, Attack Investigation Query Language (AIQL) that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule the query execution. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a real-world APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL).
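The query shape AIQL targets, filtering system-monitoring events by subject, operation and time window, can be sketched as below. The event fields are hypothetical, and this linear scan stands in for AIQL's optimized, domain-specific engine.

```python
# Toy system-monitoring event log: process, operation, file, timestamp.
events = [
    {"proc": "curl", "op": "write", "file": "/tmp/x",         "t": 100},
    {"proc": "bash", "op": "read",  "file": "/etc/passwd",    "t": 150},
    {"proc": "curl", "op": "read",  "file": "/etc/passwd",    "t": 200},
]

def query(proc=None, op=None, t_range=None):
    """Return events matching all supplied constraints."""
    out = []
    for e in events:
        if proc is not None and e["proc"] != proc:
            continue
        if op is not None and e["op"] != op:
            continue
        if t_range is not None and not (t_range[0] <= e["t"] <= t_range[1]):
            continue
        out.append(e)
    return out

hits = query(proc="curl", op="read", t_range=(120, 250))
print(len(hits))  # 1
```

AIQL's point is that such constraints (plus multi-event causal dependencies) are first-class language primitives, so the engine can reorder and prune their evaluation instead of relying on a generic SQL or Cypher planner.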

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This work studies RPQ evaluation for simple paths from a parameterized complexity perspective and defines a class of simple transitive expressions that is prominent in practice and for which a dichotomy for the evaluation problem can be proved.
Abstract: Regular path queries (RPQs) are a central component of graph databases. We investigate decision- and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question if a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths.
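The simple-path semantics discussed in the abstract can be made concrete on a toy graph: evaluate the RPQ `a+` (one or more `a`-labelled edges) from a start node, allowing no repeated vertices on a path. This DFS sketch is only meant to illustrate the semantics; the paper's subject is when such evaluation is tractable at all.

```python
# Toy edge-labelled directed graph: (source, target) -> label.
edges = {("u", "v"): "a", ("v", "w"): "a", ("w", "u"): "a", ("v", "x"): "b"}

def simple_a_reach(start, lab="a"):
    """Nodes reachable from `start` via a simple path of `lab`-labelled edges."""
    reached = set()
    def dfs(node, seen):
        for (s, d), l in edges.items():
            if s == node and l == lab and d not in seen:
                reached.add(d)
                dfs(d, seen | {d})
    dfs(start, {start})
    return sorted(reached)

# The a-edge w -> u cannot be used: u is already on the path u -> v -> w.
print(simple_a_reach("u"))  # ['v', 'w']
```

Under arbitrary-path semantics the cycle u → v → w → u would also be admissible, which is precisely why that semantics is easier to evaluate (polynomial delay) than the simple-path one.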

Proceedings ArticleDOI
01 Aug 2018
TL;DR: A graph-based power system model is explored and a graph computing based state estimation is proposed to speed up performance; testing results on the IEEE 14-bus and IEEE 118-bus systems and a provincial system in China verify the accuracy and high performance of the proposed methodology.
Abstract: With the increased complexity of power systems due to the integration of smart grid technologies and renewable energy resources, more frequent changes have been introduced to system status, and the traditional serial mode of state estimation algorithms cannot meet the strict time constraints of the future dynamic power grid, even with advanced computer hardware. To guarantee the grid's reliability and minimize the impacts caused by system status fluctuations, a fast, even SCADA-rate, state estimator is urgently needed. In this paper, a graph-based power system model is first explored and a graph computing based state estimation is proposed to speed up its performance. The power system is represented by a graph, a collection of vertices and edges, and the measurements are attributes of those vertices and edges. Each vertex can independently carry out local computation, such as formulating the node-based H matrix, gain matrix and right-hand-side (RHS) vector, using only the information on its connected edges and neighboring vertices. Then, by taking advantage of the graph database, these node-based data are conveniently collected and stored in the compressed sparse row (CSR) format, avoiding the complexity and heaviness introduced by sparse matrices. With communication and synchronization, the centralized weighted least squares (WLS) state estimation is then solved with hierarchical parallel computing. The proposed strategy is implemented on a graph database platform. The testing results of the IEEE 14-bus and IEEE 118-bus systems and a provincial system in China verify the accuracy and high performance of the proposed methodology.
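The compressed sparse row (CSR) layout the abstract mentions can be sketched as follows. The matrix entries below are arbitrary illustrative values, not an actual H or gain matrix.

```python
def to_csr(n, entries):
    """Build CSR arrays from (row, col, value) entries of an n-row matrix.
    Returns (row_ptr, col_idx, vals): row r's nonzeros sit at positions
    row_ptr[r] .. row_ptr[r + 1] - 1 of col_idx / vals."""
    rows = [[] for _ in range(n)]
    for r, c, v in entries:
        rows[r].append((c, v))
    row_ptr, col_idx, vals = [0], [], []
    for r in range(n):
        for c, v in sorted(rows[r]):
            col_idx.append(c)
            vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals

row_ptr, col_idx, vals = to_csr(3, [(0, 0, 2.0), (0, 2, -1.0), (2, 1, 4.0)])
print(row_ptr)  # [0, 2, 2, 3]  (row 1 is empty)
```

In the paper's setting each vertex contributes its own rows independently, which is what makes this storage step parallelizable per node before the centralized WLS solve.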

Journal ArticleDOI
01 Aug 2018
TL;DR: Demonstrates Gradoop, an open source framework that combines and extends features of graph database systems with the benefits of distributed graph processing, providing a rich graph data model and powerful graph operators.
Abstract: We demonstrate Gradoop, an open source framework that combines and extends features of graph database systems with the benefits of distributed graph processing. Using a rich graph data model and powerful graph operators, users can declaratively express graph analytical programs for distributed execution without needing advanced programming experience or a deeper understanding of the underlying system. Visitors of the demo can declare graph analytical programs using the Gradoop operators and also visually experience two of our advanced operators: graph pattern matching and graph grouping. We provide real world and artificial social network data with up to 10 billion edges and allow running the programs either locally or on a remote research cluster to demonstrate scalability.
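The graph-grouping operator highlighted in the demo can be sketched on a toy social graph: vertices are collapsed by a grouping key into super-vertices with a count aggregate, and edges are rolled up between the groups. The property names are illustrative, not Gradoop's API.

```python
from collections import Counter

# Toy social graph: vertices with a "city" property, plus edges between ids.
vertices = [
    {"id": 1, "city": "Leipzig"},
    {"id": 2, "city": "Leipzig"},
    {"id": 3, "city": "Dresden"},
]
edges = [(1, 2), (1, 3)]

# Group vertices by city; roll edges up between the resulting super-vertices.
city_of = {v["id"]: v["city"] for v in vertices}
super_vertices = Counter(v["city"] for v in vertices)
super_edges = Counter((city_of[s], city_of[t]) for s, t in edges)

print(dict(super_vertices))  # {'Leipzig': 2, 'Dresden': 1}
print(dict(super_edges))     # {('Leipzig', 'Leipzig'): 1, ('Leipzig', 'Dresden'): 1}
```

Gradoop expresses the same operation declaratively as one operator in an analytical program and executes it distributed, which is what makes it feasible on graphs with billions of edges.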