
Showing papers on "Graph database published in 2014"


ReportDOI
01 May 2014
TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on the basis of Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which we use to design an online graph database, GraphChi-DB. We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems, we show that with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
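The core of the Parallel Sliding Windows layout can be sketched in a few lines. The following is a hypothetical in-memory simplification (the real GraphChi keeps shards on disk and advances a sequential window over each one to avoid random I/O): edges are partitioned into shards by destination-vertex interval and sorted by source, so each interval's in-edges live in one shard while its out-edges form one contiguous window of every other shard.

```python
# Minimal in-memory sketch of the Parallel Sliding Windows idea
# (hypothetical simplification; GraphChi itself streams shards from disk).
# Shard i holds all edges whose destination falls in interval i, sorted by
# source. Processing interval i needs shard i (in-edges) plus one window
# of every shard (the out-edges whose source lies in interval i).

def make_shards(edges, intervals):
    """intervals: list of inclusive (lo, hi) destination ranges, one per shard."""
    shards = []
    for lo, hi in intervals:
        shard = sorted((s, d) for s, d in edges if lo <= d <= hi)
        shards.append(shard)
    return shards

def sweep(shards, intervals, update):
    """One full pass: gather each interval's in- and out-edges using only
    sequential scans over the shards, then invoke the user update()."""
    for i, (lo, hi) in enumerate(intervals):
        in_edges = shards[i]                            # all edges into interval i
        out_edges = [(s, d) for sh in shards            # window of each shard whose
                     for (s, d) in sh if lo <= s <= hi] # source lies in interval i
        update(i, in_edges, out_edges)
```

Because every shard is sorted by source, the `out_edges` filter in a disk-based version is a single forward-moving window per shard rather than a scan, which is what keeps the I/O sequential.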

907 citations


Proceedings ArticleDOI
02 Apr 2014
TL;DR: The design and implementation of FaRM is described, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state of the art main memory systems that use TCP/IP.
Abstract: We describe the design and implementation of FaRM, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state of the art main memory systems that use TCP/IP. FaRM exposes the memory of machines in the cluster as a shared address space. Applications can use transactions to allocate, read, write, and free objects in the address space with location transparency. We expect this simple programming model to be sufficient for most application code. FaRM provides two mechanisms to improve performance where required: lock-free reads over RDMA, and support for collocating objects and function shipping to enable the use of efficient single machine transactions. FaRM uses RDMA both to directly access data in the shared address space and for fast messaging and is carefully tuned for the best RDMA performance. We used FaRM to build a key-value store and a graph store similar to Facebook's. They both perform well, for example, a 20-machine cluster can perform 167 million key-value lookups per second with a latency of 31µs.

686 citations


Proceedings ArticleDOI
18 May 2014
TL;DR: This paper introduces a novel representation of source code called a code property graph that merges concepts of classic program analysis, namely abstract syntax trees, control flow graphs and program dependence graphs, into a joint data structure that enables it to elegantly model templates for common vulnerabilities with graph traversals that can identify buffer overflows, integer overflows, format string vulnerabilities, or memory disclosures.
Abstract: The vast majority of security breaches encountered today are a direct result of insecure code. Consequently, the protection of computer systems critically depends on the rigorous identification of vulnerabilities in software, a tedious and error-prone process requiring significant expertise. Unfortunately, a single flaw suffices to undermine the security of a system and thus the sheer amount of code to audit plays into the attacker's cards. In this paper, we present a method to effectively mine large amounts of source code for vulnerabilities. To this end, we introduce a novel representation of source code called a code property graph that merges concepts of classic program analysis, namely abstract syntax trees, control flow graphs and program dependence graphs, into a joint data structure. This comprehensive representation enables us to elegantly model templates for common vulnerabilities with graph traversals that, for instance, can identify buffer overflows, integer overflows, format string vulnerabilities, or memory disclosures. We implement our approach using a popular graph database and demonstrate its efficacy by identifying 18 previously unknown vulnerabilities in the source code of the Linux kernel.
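A vulnerability template over a code property graph is, at its core, a graph traversal. The toy sketch below uses hypothetical node names and a plain dict-based graph (the paper's implementation runs Gremlin-style traversals over a graph database); it encodes only the data-dependence (REACHES) part of a CPG and asks whether an attacker-controlled source flows into a dangerous sink:

```python
# Toy code-property-graph query sketch (hypothetical node names; not the
# paper's actual schema). Typed edges merge AST, CFG, and PDG relations
# into one graph; a vulnerability template is then just a traversal, e.g.
# "a value REACHES a dangerous sink from an attacker-controlled source".

nodes = {
    "n1": {"kind": "call", "name": "recv"},    # attacker-controlled source
    "n2": {"kind": "decl", "name": "buf"},
    "n3": {"kind": "call", "name": "strcpy"},  # dangerous sink
}
edges = [
    ("n1", "n2", "REACHES"),  # data-dependence edges (the PDG part of the CPG)
    ("n2", "n3", "REACHES"),
]

def reaches(src, dst):
    """Transitive closure over REACHES edges (iterative depth-first search)."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n in seen:
            continue
        seen.add(n)
        stack.extend(b for a, b, k in edges if a == n and k == "REACHES")
    return False

def tainted_sinks(source_name="recv", sink_name="strcpy"):
    sources = [n for n, p in nodes.items() if p["name"] == source_name]
    sinks = [n for n, p in nodes.items() if p["name"] == sink_name]
    return [(s, t) for s in sources for t in sinks if reaches(s, t)]
```

A real template would additionally constrain the traversal with AST properties (e.g. argument positions) and CFG reachability, which is exactly why merging the three representations into one graph pays off.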

461 citations


Journal ArticleDOI
28 Feb 2014
TL;DR: This paper surveys join algorithms with provable worst-case optimal runtime guarantees, giving a simpler and unified description of these algorithms intended for theory-minded readers, algorithm designers, and systems implementors.
Abstract: Evaluating the relational join is one of the central algorithmic and most well-studied problems in database systems. A staggering number of variants have been considered including Block-Nested loop join, Hash-Join, Grace, Sort-merge (see Graefe [17] for a survey, and [4, 7, 24] for discussions of more modern issues). Commercial database engines use finely tuned join heuristics that take into account a wide variety of factors including the selectivity of various predicates, memory, IO, etc. This study of join queries notwithstanding, the textbook description of join processing is suboptimal. This survey describes recent results on join algorithms that have provable worst-case optimality runtime guarantees. We survey recent work and provide a simpler and unified description of these algorithms that we hope is useful for theory-minded readers, algorithm designers, and systems implementors. Much of this progress can be understood by thinking about a simple join evaluation problem that we illustrate with the so-called triangle query, a query that has become increasingly popular in the last decade with the advent of social networks, biological motifs, and graph databases [36, 37].
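The triangle query Q(a,b,c) = R(a,b) ⋈ S(b,c) ⋈ T(a,c) makes the key idea concrete: instead of joining pairwise (where an intermediate result can be quadratically larger than the output), the worst-case optimal algorithms surveyed bind one variable at a time by intersecting the candidate sets from every relation that mentions it. A minimal sketch in that spirit (not a faithful NPRR or Leapfrog Triejoin implementation, which would use sorted tries):

```python
# Attribute-at-a-time evaluation of the triangle query, in the spirit of
# the worst-case optimal join algorithms (NPRR / Leapfrog Triejoin).
# Each variable is bound by intersecting candidates from all relations
# that mention it, avoiding large pairwise intermediates.

def triangles(R, S, T):
    """R, S, T: sets of pairs for R(a,b), S(b,c), T(a,c)."""
    Ra = {a for a, b in R}
    Ta = {a for a, c in T}
    out = []
    for a in Ra & Ta:                      # bind a: must appear in R and T
        Rb = {b for x, b in R if x == a}
        Sb = {b for b, c in S}
        for b in Rb & Sb:                  # bind b: must appear in R(a,·) and S
            Sc = {c for x, c in S if x == b}
            Tc = {c for x, c in T if x == a}
            for c in Sc & Tc:              # bind c: must close the triangle
                out.append((a, b, c))
    return out
```

With trie-shaped indexes the per-variable intersections become merge scans, which is what yields the AGM-bound worst-case guarantee the survey describes.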

208 citations


Proceedings ArticleDOI
18 Jun 2014
TL;DR: A quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap" is offered, and changes to alleviate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations are recommended.
Abstract: Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.

189 citations


Journal ArticleDOI
01 Aug 2014
TL;DR: This work develops an index, together with effective pruning rules and efficient search algorithms, proposes techniques that use this infrastructure to answer aggregation queries, and contributes an effective maintenance algorithm to handle online updates over RDF repositories.
Abstract: We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.
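The conversion of a SPARQL basic graph pattern into subgraph matching, the core idea behind gStore, can be illustrated with a brute-force sketch. This is a hypothetical simplification (the real system uses a signature-based VS-tree index and pruning rather than scanning all triples), with sample data invented for the example:

```python
# Sketch: answering a tiny SPARQL-style basic graph pattern by subgraph
# matching over an RDF graph (hypothetical simplification of gStore's
# approach; the real system prunes candidates with a VS-tree index).

data = [  # RDF triples: (subject, predicate, object)
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]

def match(pattern):
    """pattern: triples whose terms starting with '?' are variables.
    Returns every variable binding that embeds the query graph in the data."""
    def unify(triple, fact, env):
        env = dict(env)
        for q, v in zip(triple, fact):
            if q.startswith("?"):
                if env.get(q, v) != v:   # variable already bound differently?
                    return None
                env[q] = v
            elif q != v:                 # constant term must match exactly
                return None
        return env

    results = [{}]
    for triple in pattern:               # extend partial matches triple by triple
        results = [e2 for e in results for f in data
                   if (e2 := unify(triple, f, e)) is not None]
    return results
```

Each query triple pattern is an edge of the query graph; a result is an embedding of that graph into the data graph, which is exactly the subgraph matching problem the abstract describes.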

186 citations


Journal ArticleDOI
TL;DR: The k^2-tree is presented, a novel Web graph representation based on a compact tree structure that takes advantage of large empty areas of the adjacency matrix of the graph and offers the least space usage while supporting fast navigation to predecessors and successors and sharply outperforming the others on the extended queries.
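The k^2-tree's space savings come from pruning empty quadrants of the adjacency matrix early. A minimal sketch for k = 2 (using nested tuples rather than the compact bit-sequence encoding the paper actually uses):

```python
# Minimal k^2-tree sketch (k = 2): recursively split the adjacency matrix
# into k^2 submatrices; an internal node stores 1 and recurses if its
# submatrix contains any edge, or 0 otherwise, so a large empty area of
# the matrix costs a single bit. (The real structure flattens these bits
# into rank/select bit sequences; this nested-tuple form is for clarity.)

def build(matrix, r0, c0, size):
    """Return (bit, children) for the size x size submatrix at (r0, c0)."""
    if all(matrix[r][c] == 0
           for r in range(r0, r0 + size)
           for c in range(c0, c0 + size)):
        return (0, None)
    if size == 1:
        return (1, None)
    h = size // 2
    kids = [build(matrix, r0 + dr, c0 + dc, h)
            for dr in (0, h) for dc in (0, h)]
    return (1, kids)

def has_edge(node, r, c, size):
    """Navigate from the root to check a single cell of the matrix."""
    bit, kids = node
    if bit == 0:
        return False
    if size == 1:
        return True
    h = size // 2
    child = kids[(r >= h) * 2 + (c >= h)]  # quadrant order: NW, NE, SW, SE
    return has_edge(child, r % h, c % h, h)
```

The same top-down navigation supports both successor queries (descend by rows) and predecessor queries (descend by columns), which is why the structure answers both directions quickly.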

139 citations


Proceedings Article
01 Dec 2014
TL;DR: A superior model is proposed to leverage the structure of the knowledge graph via pre-calculating the distinct weight for each training triplet according to its relational mapping property, and is compared with the state-of-the-art method TransE and other prior arts.
Abstract: Many knowledge repositories nowadays contain billions of triplets, i.e. (head-entity, relationship, tail-entity), as relation instances. These triplets form a directed graph with entities as nodes and relationships as edges. However, this kind of symbolic and discrete storage structure makes it difficult for us to exploit the knowledge to enhance other intelligence-acquired applications (e.g. the Question-Answering System), as many AI-related algorithms prefer conducting computation on continuous data. Therefore, a series of emerging approaches have been proposed to facilitate knowledge computing via encoding the knowledge graph into a low-dimensional embedding space. TransE is the latest and most promising approach among them, and can achieve a higher performance with fewer parameters by modeling the relationship as a translational vector from the head entity to the tail entity. Unfortunately, it is not flexible enough to deal well with the various mapping properties of triplets, even though its authors note the harm to performance. In this paper, we thus propose a superior model called TransM to leverage the structure of the knowledge graph via pre-calculating the distinct weight for each training triplet according to its relational mapping property. In this way, the optimization deals with each triplet depending on its own weight. We carry out extensive experiments to compare TransM with the state-of-the-art method TransE and other prior arts. The performance of each approach is evaluated within two different application scenarios on several benchmark datasets. Results show that the model we proposed significantly outperforms the former ones while keeping parameter complexity as low as TransE's.
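The translational scoring behind both models fits in a few lines. TransE scores a triplet (h, r, t) by the norm of h + r - t; TransM, as described above, additionally pre-computes a per-triplet weight from the relation's mapping property (1-to-1, 1-to-N, etc.). The weight here is an opaque parameter, since the paper's exact weighting formula is not reproduced in the abstract:

```python
import numpy as np

# Translation-based triplet scoring. TransE models a relation r as a
# translation vector: a true triplet (h, r, t) should satisfy h + r ≈ t,
# so a lower score is better. TransM (sketched per the abstract) scales
# the score by a pre-calculated per-relation weight w_r reflecting the
# relation's mapping property; w_r is treated as a given input here.

def transe_score(h, r, t):
    """||h + r - t||: small for plausible triplets."""
    return np.linalg.norm(h + r - t)

def transm_score(h, r, t, w_r):
    """TransM-style weighted score (w_r from the relational mapping property)."""
    return w_r * transe_score(h, r, t)
```

During training, these scores enter a margin-based ranking loss against corrupted triplets; the per-triplet weight lets 1-to-N relations contribute less harshly than strict 1-to-1 relations.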

128 citations


Journal ArticleDOI
01 Dec 2014
TL;DR: This paper proposes Multi-Source BFS, an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while scaling up as the number of cores increases, and demonstrates how a real graph analytics application---all-vertices closeness centrality---can be efficiently solved with MS-BFS.
Abstract: Graph analytics on social networks, Web data, and communication networks has been widely used in a plethora of applications. Many graph analytics algorithms are based on breadth-first search (BFS) graph traversal, which is not only time-consuming for large datasets but also involves much redundant computation when executed multiple times from different start vertices. In this paper, we propose Multi-Source BFS (MS-BFS), an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while scaling up as the number of cores increases. MS-BFS leverages the properties of small-world networks, which apply to many real-world graphs, and enables efficient graph traversal that: (i) shares common computation across concurrent BFSs; (ii) greatly reduces the number of random memory accesses; and (iii) does not incur synchronization costs. We demonstrate how a real graph analytics application---all-vertices closeness centrality---can be efficiently solved with MS-BFS. Furthermore, we present an extensive experimental evaluation with both synthetic and real datasets, including Twitter and Wikipedia, showing that MS-BFS provides almost linear scalability with respect to the number of cores and excellent scalability for increasing graph sizes, outperforming state-of-the-art BFS algorithms by more than one order of magnitude when running a large number of BFSs.
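The central MS-BFS trick is representable in a short sketch: keep one bitmask per vertex recording which of the concurrent searches have reached it, so a single pass over the frontier advances every BFS at once. Here Python's arbitrary-width integers stand in for the fixed-width SIMD registers the paper uses:

```python
# Sketch of the MS-BFS idea: run k BFSs in one traversal. seen[v] is a
# k-bit mask of which searches have visited v; one scan of the frontier
# advances all searches together, sharing memory accesses (Python ints
# stand in for the paper's fixed-width bit registers).

def ms_bfs(adj, sources):
    n, k = len(adj), len(sources)
    seen = [0] * n                       # seen[v]: bit i set => BFS i reached v
    dist = [[-1] * n for _ in range(k)]
    frontier = [0] * n
    for i, s in enumerate(sources):
        frontier[s] |= 1 << i
        seen[s] |= 1 << i
        dist[i][s] = 0
    level = 0
    while any(frontier):
        level += 1
        nxt = [0] * n
        for v, bits in enumerate(frontier):
            if not bits:
                continue
            for u in adj[v]:
                new = bits & ~seen[u]    # searches reaching u for the first time
                if new:
                    nxt[u] |= new
                    seen[u] |= new
                    for i in range(k):
                        if new >> i & 1:
                            dist[i][u] = level
        frontier = nxt
    return dist
```

On small-world graphs the frontiers of different sources overlap heavily after a couple of levels, so the shared neighbor expansion does the work of many BFSs for the price of roughly one, which is where the order-of-magnitude speedup comes from.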

113 citations


Journal ArticleDOI
TL;DR: The survey shows that graph-based representation is an appropriate way of representing text documents and yields improved analysis results over traditional models for different text applications.
Abstract: A common and standard approach to modeling a text document is bag-of-words. This model is suitable for capturing word frequency; however, structural and semantic information is ignored. Graphs are mathematical constructs that can model relationships and structural information effectively. A text can be appropriately represented as a graph by using feature terms as vertices and significant relations between feature terms as edges. Representing text with a graph model supports computations such as term weighting and ranking, which are helpful in many information retrieval applications. This paper presents a systematic survey of existing work on graph-based representation of text, and also focuses on graph-based analysis of text documents for different operations in information retrieval. In the process, a taxonomy of graph-based representation and analysis of text documents is derived, and the results of different methods of graph-based text representation and analysis are discussed. The survey shows that graph-based representation is an appropriate way of representing text documents and yields improved analysis results over traditional models for different text applications.
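A concrete graph-of-words construction makes the contrast with bag-of-words tangible: terms become vertices and an edge links terms that co-occur within a sliding window, weighted by co-occurrence count. This is one common scheme among the several variants such surveys cover, not the only one:

```python
from collections import defaultdict

# Graph-of-words sketch: vertices are feature terms; an edge connects two
# terms that co-occur within a sliding window, with weight equal to the
# co-occurrence count. (One common variant; surveys also cover directed
# and semantically-typed edges.)

def text_to_graph(tokens, window=2):
    """Return {frozenset({term_a, term_b}): co-occurrence count}."""
    weight = defaultdict(int)
    for i, t in enumerate(tokens):
        for u in tokens[i + 1:i + window + 1]:   # look ahead within the window
            if u != t:                           # no self-loops
                weight[frozenset((t, u))] += 1
    return dict(weight)
```

Once the text is a weighted graph, standard graph measures (degree, PageRank-style centrality) give the term-weighting and ranking operations the abstract mentions, in place of raw frequency counts.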

104 citations


Proceedings ArticleDOI
11 May 2014
TL;DR: GraphGen, a vertex-centric framework that targets FPGAs for hardware acceleration of graph computations, is presented, and design case studies using GraphGen to implement stereo matching and handwriting recognition graph applications on Terasic DE4 and Xilinx ML605 FPGA boards are reported.
Abstract: Vertex-centric graph computations are widely used in many machine learning and data mining applications that operate on graph data structures. This paper presents GraphGen, a vertex-centric framework that targets FPGA for hardware acceleration of graph computations. GraphGen accepts a vertex-centric graph specification and automatically compiles it onto an application-specific synthesized graph processor and memory system for the target FPGA platform. We report design case studies using GraphGen to implement stereo matching and handwriting recognition graph applications on Terasic DE4 and Xilinx ML605 FPGA boards. Results show up to 14.6x and 2.9x speedups over software on Intel Core i7 CPU for the two applications, respectively.

Book
05 Dec 2014
TL;DR: Neo4j in Action is a comprehensive guide to Neo4j, aimed at application developers and software architects, that explores the full power of native Java APIs for graph data manipulation and querying and also covers Cypher, Neo4j's graph query language.
Abstract: Summary: Neo4j in Action is a comprehensive guide to Neo4j, aimed at application developers and software architects. Using hands-on examples, you'll learn to model graph domains naturally with Neo4j graph structures. The book explores the full power of native Java APIs for graph data manipulation and querying. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology: Much of the data today is highly connected, from social networks to supply chains to software dependency management, and more connections are continually being uncovered. Neo4j is an ideal graph database tool for highly connected data. It is mature, production-ready, and unique in enabling developers to simply and efficiently model and query connected data.

About the Book: Neo4j in Action is a comprehensive guide to designing, implementing, and querying graph data using Neo4j. Using hands-on examples, you'll learn to model graph domains naturally with Neo4j graph structures. The book explores the full power of native Java APIs for graph data manipulation and querying. It also covers Cypher, Neo4j's graph query language. Along the way, you'll learn how to integrate Neo4j into your domain-driven app using Spring Data Neo4j, as well as how to use Neo4j in standalone server or embedded modes. Knowledge of Java basics is required. No prior experience with graph data or Neo4j is assumed.

What's Inside:
- Graph database patterns
- How to model data in social networks
- How to use Neo4j in your Java applications
- How to configure and set up Neo4j

About the Authors: Aleksa Vukotic is an architect specializing in graph data models. Nicki Watt, Dominic Fox, Tareq Abedrabbo, and Jonas Partner work at OpenCredo, a Neo Technology partner, and have been involved in many projects using Neo4j.

Table of Contents:
PART 1 INTRODUCTION TO NEO4J
- A case for a Neo4j database
- Data modeling in Neo4j
- Starting development with Neo4j
- The power of traversals
- Indexing the data
PART 2 APPLICATION DEVELOPMENT WITH NEO4J
- Cypher: Neo4j query language
- Transactions
- Traversals in depth
- Spring Data Neo4j
PART 3 NEO4J IN PRODUCTION
- Neo4j: embedded versus server mode

Proceedings ArticleDOI
16 Feb 2014
TL;DR: A qualitative study and a performance comparison of 12 open source graph databases using four fundamental graph algorithms on networks containing up to 256 million edges are conducted.
Abstract: With the proliferation of large, irregular, and sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query languages. Many of these platforms apply graph structures and analysis techniques to enable users to ingest, update, query, and compute on the topological structure of the network represented as sets of edges relating sets of vertices. To store and process Facebook-scale datasets, software and algorithms must be able to support data sources with billions of edges, update rates of millions of updates per second, and complex analysis kernels. These platforms must provide intuitive interfaces that enable graph experts and novice programmers to write implementations of common graph algorithms. In this paper, we conduct a qualitative study and a performance comparison of 12 open source graph databases using four fundamental graph algorithms on networks containing up to 256 million edges.

Patent
27 Jun 2014
TL;DR: In this article, techniques are described for representing services, network resources, and relationships between such services and resources in a graph database with which to validate, provision, and manage the services in near real-time.
Abstract: In general, techniques are described for representing services, network resources, and relationships between such services and resources in a graph database with which to validate, provision, and manage the services in near real-time. In one example, a controller device includes at least one processor; and at least one memory to store a graph database comprising a graph that represents network resources and relationships between network resources. The controller device receives, at an application programming interface, a data-interchange formatted message that indicates a service request to configure a network service; queries at least a portion of the graph to determine whether a set of the plurality of network resources can satisfy the service request to provision the network service within the network; and configures the set of the plurality of network resources to provide the network service.

Proceedings ArticleDOI
18 Jun 2014
TL;DR: This paper introduces a Scalable Graph processing Class SGC by relaxing some constraints in MMC to make it suitable for scalable graph processing, and defines two graph join operators in SGC, namely, EN join and NE join, using which a wide range of graph algorithms can be designed.
Abstract: MapReduce has become one of the most popular parallel computing paradigms in cloud, due to its high scalability, reliability, and fault-tolerance achieved for a large variety of applications in big data processing. In the literature, there are MapReduce Class MRC and Minimal MapReduce Class MMC to define the memory consumption, communication cost, CPU cost, and number of MapReduce rounds for an algorithm to execute in MapReduce. However, neither of them is designed for big graph processing in MapReduce, since the constraints in MMC can be hardly achieved simultaneously on graphs and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a Scalable Graph processing Class SGC by relaxing some constraints in MMC to make it suitable for scalable graph processing. We define two graph join operators in SGC, namely, EN join and NE join, using which a wide range of graph algorithms can be designed, including PageRank, breadth first search, graph keyword search, Connected Component (CC) computation, and Minimum Spanning Forest (MSF) computation. Remarkably, to the best of our knowledge, for the two fundamental graph problems CC and MSF computation, this is the first work that can achieve O(log(n)) MapReduce rounds with O(n+m) total communication cost in each round and constant memory consumption on each machine, where n and m are the number of nodes and edges in the graph respectively. We conducted extensive performance studies using two web-scale graphs Twitter and Friendster with different graph characteristics. The experimental results demonstrate that our algorithms can achieve high scalability in big graph processing.

Journal ArticleDOI
01 Mar 2014
TL;DR: The experimental results show that this new graph querying paradigm is promising: It identifies high-quality matches for both keyword and graph queries over real-life knowledge graphs, and outperforms existing methods significantly in terms of effectiveness and efficiency.
Abstract: Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. We argue that for a user-friendly graph query engine, it must support various kinds of transformations such as synonym, abbreviation, and ontology. Furthermore, the derived query results must be ranked in a principled manner. In this paper, we introduce a novel framework enabling schemaless and structureless graph querying (SLQ), where a user need not describe queries precisely as required by most databases. The query engine is built on a set of transformation functions that automatically map keywords and linkages from a query to their matches in a graph. It automatically learns an effective ranking model, without assuming manually labeled training examples, and can efficiently return top ranked matches using graph sketch and belief propagation. The architecture of SLQ is elastic for "plug-in" new transformation functions and query logs. Our experimental results show that this new graph querying paradigm is promising: It identifies high-quality matches for both keyword and graph queries over real-life knowledge graphs, and outperforms existing methods significantly in terms of effectiveness and efficiency.

Journal ArticleDOI
01 Dec 2014
TL;DR: The semantics and efficient online algorithms for this important and intriguing problem of event pattern matching are studied, and approaches are evaluated with extensive experiments over real world datasets in four different domains.
Abstract: A graph is a fundamental and general data structure underlying all data applications. Many applications today call for the management and query capabilities directly on graphs. Real time graph streams, as seen in road networks, social and communication networks, and web requests, are such applications. Event pattern matching requires the awareness of graph structures, which is different from traditional complex event processing. It also requires a focus on the dynamicity of the graph, time order constraints in patterns, and online query processing, which deviates significantly from previous work on subgraph matching as well. We study the semantics and efficient online algorithms for this important and intriguing problem, and evaluate our approaches with extensive experiments over real world datasets in four different domains.

Proceedings ArticleDOI
27 Jun 2014
TL;DR: A taxonomy and unified perspective on NoSQL systems is provided using multiple facets including system architecture, data model, query language, client API, scalability, and availability to help the reader in choosing an appropriate NoSQL system for a given application.
Abstract: The advent of Big Data created a need for out-of-the-box horizontal scalability for data management systems. This ushered in an array of choices for Big Data management under the umbrella term NoSQL. In this paper, we provide a taxonomy and unified perspective on NoSQL systems. Using this perspective, we compare and contrast various NoSQL systems using multiple facets including system architecture, data model, query language, client API, scalability, and availability. We group current NoSQL systems into seven broad categories: Key-Value, Table-type/Column, Document, Graph, Native XML, Native Object, and Hybrid databases. We also describe application scenarios for each category to help the reader in choosing an appropriate NoSQL system for a given application. We conclude the paper by indicating future research directions.

Book ChapterDOI
06 Sep 2014
TL;DR: This work presents a fast approximate nearest neighbor algorithm for semantic segmentation that builds a graph over superpixels from an annotated set of training images and proposes to learn a distance metric that weights the edges in the graph.
Abstract: We present a fast approximate nearest neighbor algorithm for semantic segmentation. Our algorithm builds a graph over superpixels from an annotated set of training images. Edges in the graph represent approximate nearest neighbors in feature space. At test time we match superpixels from a novel image to the training images by adding the novel image to the graph. A move-making search algorithm allows us to leverage the graph and image structure for finding matches. We then transfer labels from the training images to the image under test. To promote good matches between superpixels we propose to learn a distance metric that weights the edges in our graph. Our approach is evaluated on four standard semantic segmentation datasets and achieves results comparable with the state-of-the-art.

Book ChapterDOI
21 Jul 2014
TL;DR: A scalable persistence layer for the de-facto standard MDE framework EMF that exploits the efficiency of graph databases in storing and accessing graph structures, as EMF models are.
Abstract: Several industrial contexts require software engineering methods and tools able to handle large-size artifacts. The central idea of abstraction makes model-driven engineering (MDE) a promising approach in such contexts, but current tools do not scale to very large models (VLMs): already the task of storing and accessing VLMs from a persisting support is currently inefficient. In this paper we propose a scalable persistence layer for the de-facto standard MDE framework EMF. The layer exploits the efficiency of graph databases in storing and accessing graph structures, as EMF models are. A preliminary experimentation shows that typical queries in reverse-engineering EMF models have good performance on such persistence layer, compared to file-based backends.

Journal ArticleDOI
01 Aug 2014
TL;DR: The authors provide an algorithm for computing a tree decomposition that is more efficient and scalable than any previous algorithm, and are the first to use graph structures explicitly to solve PPR quickly.
Abstract: We propose a new scalable algorithm that can compute Personalized PageRank (PPR) very quickly. The Power method is a state-of-the-art algorithm for computing exact PPR; however, it requires many iterations. Thus reducing the number of iterations is the main challenge. We achieve this by exploiting graph structures of web graphs and social networks. The convergence of our algorithm is very fast. In fact, it requires up to 7.5 times fewer iterations than the Power method and is up to five times faster in actual computation time. To the best of our knowledge, this is the first time graph structures have been used explicitly to solve PPR quickly. Our contributions can be summarized as follows.
1. We provide an algorithm for computing a tree decomposition, which is more efficient and scalable than any previous algorithm.
2. Using the above algorithm, we can obtain a core-tree decomposition of any web graph and social network. This allows us to decompose a web graph and a social network into (1) the core, which behaves like an expander graph, and (2) a small tree-width graph, which behaves like a tree in an algorithmic sense.
3. We apply a direct method to the small tree-width graph to construct an LU decomposition.
4. Building on the LU decomposition and using it as a preconditioner, we apply the GMRES method (a state-of-the-art advanced iterative method) to compute PPR for whole web graphs and social networks.
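The Power-method baseline that the paper accelerates is simple to state: iterate x ← (1 - α)·Pᵀx + α·e, where e is the personalization (seed) vector and P the transition matrix. A minimal sketch over an adjacency-list graph, with the dangling-node handling being one common convention rather than the paper's:

```python
import numpy as np

# Power-method baseline for Personalized PageRank: the iterative scheme
# x <- (1 - alpha) * P^T x + alpha * e, where e is the personalization
# vector. Dangling vertices return their mass to the seed here (one
# common convention, assumed for this sketch).

def ppr_power(adj, source, alpha=0.15, iters=100):
    n = len(adj)
    e = np.zeros(n)
    e[source] = 1.0
    x = e.copy()
    for _ in range(iters):
        nxt = alpha * e                      # teleport back to the seed
        for v, nbrs in enumerate(adj):
            if nbrs:
                share = (1 - alpha) * x[v] / len(nbrs)
                for u in nbrs:
                    nxt[u] += share          # spread mass along out-edges
            else:
                nxt[source] += (1 - alpha) * x[v]  # dangling mass to seed
        x = nxt
    return x
```

Each iteration contracts the error by roughly (1 - α), so many iterations are needed for high precision; replacing this loop with an LU-preconditioned GMRES solve over the core-tree decomposition is exactly the speedup the paper claims.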

Patent
24 Feb 2014
TL;DR: In this paper, a method of processing a query to a graph database using a plurality of processors is proposed, where each thread is associated with a unique thread identifier and each sub-graph is defined by one of the thread identifiers.
Abstract: A method of processing a query to a graph database using a plurality of processors. The method comprises providing a plurality of threads to be executed on a plurality of processors, each thread associated with one of a plurality of unique thread identifiers; providing a graph database having a plurality of graph database nodes and a plurality of graph database edges, where each edge represents a relationship between two of the nodes; receiving a query tree comprising a plurality of query nodes connected by a plurality of query tree edges; and searching at least part of the graph database for a match with the query tree, wherein the search is executed by the plurality of processors, each processor searching one of a plurality of sub-graphs of the graph database, and each sub-graph is defined by one of the thread identifiers.
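A hypothetical sketch of the partitioning idea: each worker thread owns the sub-graph of nodes that map to its thread identifier and searches only that partition. All names are illustrative, and the full query-tree matching of the patent is simplified here to a single-node label match:

```python
from concurrent.futures import ThreadPoolExecutor

def owns(node, thread_id, num_threads):
    # The sub-graph owned by a thread is the set of nodes that map to
    # its thread identifier under a simple modulo partition.
    return node % num_threads == thread_id

def search_partition(thread_id, num_threads, labels, query_label):
    """Search one thread's sub-graph for nodes matching the query label."""
    return [n for n, lbl in labels.items()
            if owns(n, thread_id, num_threads) and lbl == query_label]

def parallel_search(num_threads, labels, query_label):
    # One task per thread identifier; each searches a disjoint sub-graph.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = pool.map(
            lambda tid: search_partition(tid, num_threads, labels, query_label),
            range(num_threads))
    return sorted(n for part in parts for n in part)

labels = {1: "person", 2: "city", 3: "person", 7: "person"}
matches = parallel_search(4, labels, "person")  # → [1, 3, 7]
```

Because the partitions are disjoint, no node is examined twice and the per-thread result lists can simply be concatenated.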

BookDOI
01 Jan 2014
TL;DR: This proceedings volume collects research across the breadth of database systems, from data warehousing and query optimization to graph databases and social media data management.
Abstract: Topics include: data warehousing; database integration; mobile databases; cloud, distributed, and parallel databases; high-dimensional and temporal data; image/video retrieval and databases; database performance and tuning; privacy and security in databases; query processing and optimization; semi-structured data and XML; spatial data processing and management; stream and sensor data management; uncertain and probabilistic databases; web databases; graph databases; web service management; social media data management.

Journal ArticleDOI
01 Aug 2014
TL;DR: Vertexica is presented, a graph analytics tool on top of a relational database that is user friendly yet highly efficient, and that leverages relational features to enable much more sophisticated graph analysis.
Abstract: In this paper, we present Vertexica, a graph analytics tool on top of a relational database, which is user friendly and yet highly efficient. Instead of constraining programmers to SQL, Vertexica offers a popular vertex-centric query interface, which is more natural for analysts to express many graph queries. The programmers simply provide their vertex-compute functions and Vertexica takes care of efficiently executing them in the standard SQL engine. The advantage of using Vertexica is its ability to leverage relational features and enable much more sophisticated graph analysis. These include expressing graph algorithms which are difficult in vertex-centric models but straightforward in SQL, and the ability to compose end-to-end data processing pipelines, including pre- and post-processing of graphs as well as combining multiple algorithms for deeper insights. Vertexica has a graphical user interface and we outline several demonstration scenarios including interactive graph analysis, complex graph analysis, and continuous and time series analysis.
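The idea of running vertex-centric computation inside a relational engine can be sketched with SQLite: edges live in a table and each "superstep" is a SQL aggregation, here for out-degree-normalized PageRank. The schema, graph, and iteration count are illustrative assumptions, not Vertexica's actual interface:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE edge (src INTEGER, dst INTEGER);
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, rank REAL);
""")
db.executemany("INSERT INTO edge VALUES (?, ?)",
               [(0, 1), (1, 2), (2, 0), (0, 2)])
db.executemany("INSERT INTO vertex VALUES (?, ?)",
               [(0, 1 / 3), (1, 1 / 3), (2, 1 / 3)])

for _ in range(50):  # fixed number of supersteps, enough to converge here
    db.executescript("""
        -- One superstep: every vertex gathers rank from its in-neighbors,
        -- each in-neighbor's rank divided by its out-degree.
        CREATE TEMP TABLE new_rank AS
        SELECT v.id AS id,
               0.15 / 3 + 0.85 * COALESCE((
                   SELECT SUM(s.rank / d.deg)
                   FROM edge e
                   JOIN vertex s ON s.id = e.src
                   JOIN (SELECT src, COUNT(*) AS deg
                         FROM edge GROUP BY src) d ON d.src = e.src
                   WHERE e.dst = v.id), 0) AS rank
        FROM vertex v;
        UPDATE vertex SET rank =
            (SELECT rank FROM new_rank WHERE new_rank.id = vertex.id);
        DROP TABLE new_rank;
    """)

ranks = dict(db.execute("SELECT id, rank FROM vertex"))
```

The temp table gives the two-phase (read old state, write new state) semantics that a vertex-centric superstep requires; a real system would also add a convergence check instead of a fixed iteration count.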

Proceedings ArticleDOI
01 Jan 2014
TL;DR: Pagrol introduces a new conceptual Hyper Graph Cube model (which is an attributed-graph analogue of the data cube model for relational DBMS) to aggregate attributed graphs at different granularities and levels and provides an efficient MapReduce-based parallel graph cubing algorithm, MRGraph-Cubing, to compute the graph cube for an attributed graph.
Abstract: Attributed graphs are becoming important tools for modeling information networks, such as the Web and various social networks (e.g. Facebook, LinkedIn, Twitter). However, it is computationally challenging to manage and analyze attributed graphs to support effective decision making. In this paper, we propose, Pagrol, a parallel graph OLAP (Online Analytical Processing) system over attributed graphs. In particular, Pagrol introduces a new conceptual Hyper Graph Cube model (which is an attributed-graph analogue of the data cube model for relational DBMS) to aggregate attributed graphs at different granularities and levels. The proposed model supports different queries as well as a new set of graph OLAP Roll-Up/Drill-Down operations. Furthermore, on the basis of Hyper Graph Cube, Pagrol provides an efficient MapReduce-based parallel graph cubing algorithm, MRGraph-Cubing, to compute the graph cube for an attributed graph. Pagrol employs numerous optimization techniques: (a) a self-contained join strategy to minimize I/O cost; (b) a scheme that groups cuboids into batches so as to minimize redundant computations; (c) a cost-based scheme to allocate the batches into bags (each with a small number of batches); and (d) an efficient scheme to process a bag using a single MapReduce job. Results of extensive experimental studies using both real Facebook and synthetic datasets on a 128-node cluster show that Pagrol is effective, efficient and scalable.
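The cuboid idea behind such a graph cube can be illustrated in a few lines: vertices of an attributed graph are grouped by a chosen subset of attributes, and edges are aggregated between the resulting super-vertices. The attribute names and data below are made up for the example, and the MapReduce parallelism of Pagrol is omitted:

```python
from collections import Counter
from itertools import combinations

vertices = {
    1: {"gender": "F", "city": "NY"},
    2: {"gender": "M", "city": "NY"},
    3: {"gender": "F", "city": "LA"},
}
edges = [(1, 2), (2, 3), (1, 3)]

def cuboid(dims):
    """Aggregate the attributed graph on the given attribute subset."""
    key = lambda v: tuple(vertices[v][d] for d in dims)
    super_vertices = Counter(key(v) for v in vertices)          # group sizes
    super_edges = Counter((key(u), key(v)) for u, v in edges)   # edge counts
    return super_vertices, super_edges

# All cuboids of the cube lattice over {gender, city}, from the apex
# (no grouping attribute) down to the base (all attributes).
dims = ["gender", "city"]
cube = {subset: cuboid(subset)
        for k in range(len(dims) + 1)
        for subset in combinations(dims, k)}

sv, se = cube[("gender",)]
# sv counts vertices per gender; se counts edges between gender groups
```

Rolling up corresponds to moving to a cuboid with fewer attributes; drilling down moves the other way. Pagrol's batching and bag-allocation optimizations are about sharing work among these cuboids.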

Patent
26 Sep 2014
TL;DR: In this article, the authors propose a method for building and managing a user-customizable knowledge base, the method comprising acquiring data related to a plurality of entities from a plethora of heterogeneous data sources based on a customized acquisition configuration, wherein the customized acquisition configuration specifies a distinct data wrapper for each of the data sources, extracting entity-related information from the data to form a number of graph databases, and integrating the graph databases by mapping relationships between the entities to create an entity-centric knowledge base.
Abstract: A method for building and managing a user-customizable knowledge base, the method comprising acquiring data related to a plurality of entities from a plurality of heterogeneous data sources based on a customized acquisition configuration, wherein the customized acquisition configuration specifies a distinct data wrapper for each of the data sources, extracting entity-related information from the data to form a number of graph databases, and integrating the graph databases by mapping relationships between the entities to create an entity-centric knowledge base.

Book ChapterDOI
28 Sep 2014
TL;DR: This paper proposes a novel architecture for distributed and incremental queries, and conducts experiments to demonstrate that IncQuery-D, the prototype system, can scale up from a single workstation to a cluster that can handle very large models and complex incremental queries efficiently.
Abstract: Queries are the foundations of data intensive applications. In model-driven software engineering (MDE), model queries are core technologies of tools and transformations. As software models are rapidly increasing in size and complexity, traditional tools exhibit scalability issues that decrease productivity and increase costs [17]. While scalability is a hot topic in the database community and recent NoSQL efforts have partially addressed many shortcomings, this happened at the cost of sacrificing the ad-hoc query capabilities of SQL. Unfortunately, this is a critical problem for MDE applications due to their inherent workload complexity. In this paper, we aim to address both the scalability and ad-hoc querying challenges by adapting incremental graph search techniques – known from the EMF-IncQuery framework – to a distributed cloud infrastructure. We propose a novel architecture for distributed and incremental queries, and conduct experiments to demonstrate that IncQuery-D, our prototype system, can scale up from a single workstation to a cluster that can handle very large models and complex incremental queries efficiently.
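The incremental idea can be shown with a toy example: instead of re-running a query after every model change, a cached result set is updated from the change itself. The pattern here (a single labeled edge) and all names are illustrative, not EMF-IncQuery's actual API:

```python
class IncrementalPatternIndex:
    """Maintains the match set of a one-edge pattern incrementally."""
    def __init__(self, label):
        self.label = label
        self.matches = set()

    def on_insert(self, src, label, dst):
        # A new edge can only add matches; no full re-evaluation needed.
        if label == self.label:
            self.matches.add((src, dst))

    def on_delete(self, src, label, dst):
        self.matches.discard((src, dst))

idx = IncrementalPatternIndex("knows")
idx.on_insert("alice", "knows", "bob")
idx.on_insert("alice", "worksAt", "acme")   # irrelevant to the pattern
idx.on_insert("bob", "knows", "carol")
idx.on_delete("alice", "knows", "bob")
# idx.matches now contains only ("bob", "carol")
```

Real incremental engines such as EMF-IncQuery maintain networks of such partial-match caches (e.g. Rete networks) for multi-edge patterns; distributing those caches across a cluster is the contribution of IncQuery-D.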

Journal ArticleDOI
TL;DR: This work provides a classification of patterns, studies standard graph queries on graph patterns based on regular expressions, provides additional restrictions for tractability, and shows that some intractable cases can be naturally cast as instances of constraint satisfaction problems.
Abstract: Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, that is, graph patterns. While queries need to be posed against such data, techniques for querying patterns are generally lacking, and properties of such queries are not well understood. Our goal is to study the basics of querying graph patterns. The key features of patterns we consider here are node and label variables and edges specified by regular expressions. We provide a classification of patterns, and study standard graph queries on graph patterns. We give precise characterizations of both data and combined complexity for each class of patterns. If complexity is high, we do further analysis of features that lead to intractability, as well as lower-complexity restrictions. Since our patterns are based on regular expressions, query answering for them can be captured by a new automata model. These automata have two modes of acceptance: one captures queries returning nodes, and the other queries returning paths. We study properties of such automata, and the key computational tasks associated with them. Finally, we provide additional restrictions for tractability, and show that some intractable cases can be naturally cast as instances of constraint satisfaction problems.
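A small sketch of how a regular-expression path query over an edge-labeled graph is classically answered: a BFS over pairs (graph node, automaton state), i.e. the standard product construction. The graph, labels, and the hand-built NFA for the expression a b* are illustrative, and the paper's pattern variables are not modeled:

```python
from collections import deque

# Edge-labeled graph: (source, label, target)
graph = [(0, "a", 1), (1, "b", 2), (2, "b", 3), (1, "c", 4)]

# Hand-built NFA for the regex "a b*": state 0 --a--> 1, state 1 --b--> 1.
nfa = {(0, "a"): {1}, (1, "b"): {1}}
accepting = {1}

def reachable_by_regex(start):
    """Nodes reachable from `start` along a path whose labels match the regex."""
    seen = {(start, 0)}          # pairs (graph node, NFA state)
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for u, label, v in graph:
            if u == node:
                for nxt in nfa.get((state, label), ()):
                    if (v, nxt) not in seen:
                        seen.add((v, nxt))
                        queue.append((v, nxt))
    return answers

nodes = reachable_by_regex(0)  # nodes reachable from 0 via a path matching "a b*"
```

The search space has at most |nodes| x |states| pairs, which is why this node-returning query mode is tractable; the hardness results in the paper arise when variables and incompleteness enter the pattern, not from the regular expressions alone.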

Journal ArticleDOI
01 Feb 2014
TL;DR: This paper proposes two graph embedding algorithms based on the Granular Computing paradigm, which are engineered as key procedures of a general-purpose graph classification system.
Abstract: Research on graph-based pattern recognition and Soft Computing systems has attracted many scientists and engineers in several different contexts. This is motivated by the fact that graphs are general structures able to encode both topological and semantic information in data. While the data modeling properties of graphs are of indisputable power, there are still different concerns about the best way to compute similarity functions in an effective and efficient manner. To this end, suitable transformation procedures are usually conceived to address the well-known Inexact Graph Matching problem in an explicit embedding space. In this paper, we propose two graph embedding algorithms based on the Granular Computing paradigm, which are engineered as key procedures of a general-purpose graph classification system. Tests have been conducted on benchmarking datasets relying on both synthetic and real-world data, achieving competitive results in terms of test set classification accuracy.

Patent
27 Jun 2014
TL;DR: A graph database manipulation device includes a processor and a memory configured to store a graph database manipulation application, which configures the processor to obtain a graph database comprising a set of nodes and a set of edges, determine a source node within the set of nodes, locate related nodes based on the source node and the edges, and recursively update the generated representation of sub-related nodes from the perspective of the source node.
Abstract: Systems and methods for visualizing and manipulating graph databases in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a graph database manipulation device includes a processor and a memory configured to store a graph database manipulation application, wherein the graph database manipulation application configures the processor to obtain a graph database including a set of nodes and a set of edges, determine a source node within the set of nodes, locate a set of related nodes based on the source node and the set of edges, recursively locate a set of sub-related nodes based on the set of related nodes and the set of edges, generate a representation of the set of related nodes from the perspective of the source node, and recursively update the generated representation of the set of sub-related nodes from the perspective of the source node and the related nodes.
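A hypothetical sketch of the recursive expansion the patent describes: starting from a source node, related nodes are located via the edge set, then sub-related nodes are located recursively, yielding a tree-shaped representation from the source node's perspective. Names and the nested-dict representation are illustrative choices:

```python
def expand(source, edges, visited=None):
    """Return a nested dict view of the graph rooted at `source`."""
    if visited is None:
        visited = {source}
    # Related nodes: targets of edges leaving `source`, skipping nodes
    # already placed in the representation (this breaks cycles).
    related = [v for u, v in edges if u == source and v not in visited]
    visited.update(related)
    return {source: [expand(v, edges, visited) for v in related]}

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "a")]  # cycle back to "a"
tree = expand("a", edges)
# tree == {"a": [{"b": [{"d": []}]}, {"c": []}]}
```

The `visited` set is what keeps the recursion terminating on cyclic graph data, which a real graph database must assume.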