
Showing papers on "Graph (abstract data type)" published in 2015


Proceedings Article
25 Jan 2015
TL;DR: The aim of NR is to make it easy to discover key insights into the data extremely fast with little effort while also providing a medium for users to share data, visualizations, and insights.
Abstract: The Network Repository (NR) is the first interactive data repository with a web-based platform for visual interactive analytics. Unlike other data repositories (e.g., the UCI ML Data Repository and SNAP), the network data repository (networkrepository.com) allows users not only to download data but also to interactively analyze and visualize it using our web-based interactive graph analytics platform. Users can in real time analyze, visualize, compare, and explore data along many different dimensions. The aim of NR is to make it easy to discover key insights into the data extremely fast with little effort, while also providing a medium for users to share data, visualizations, and insights. Other key factors that differentiate NR from current data repositories are the number of graph datasets, their size, and their variety. Other data repositories are also static and lack a means for users to collaboratively discuss a particular dataset, corrections, or challenges with using the data for certain applications. In contrast, NR incorporates many social and collaborative aspects that facilitate scientific research; e.g., users can discuss each graph and post observations and visualizations.

1,767 citations


Posted Content
TL;DR: This paper develops an extension of Spectral Networks which incorporates a Graph Estimation procedure, tested on large-scale classification problems, matching or improving over Dropout Networks with far fewer parameters to estimate.
Abstract: Deep Learning's recent successes have mostly relied on Convolutional Networks, which exploit fundamental statistical properties of images, sounds and video data: local stationarity and multi-scale compositional structure, which allow long-range interactions to be expressed in terms of shorter, localized interactions. However, there exist other important examples, such as text documents or bioinformatic data, that may lack some or all of these strong statistical regularities. In this paper we consider the general question of how to construct deep architectures with small learning complexity on general non-Euclidean domains, which are typically unknown and need to be estimated from the data. In particular, we develop an extension of Spectral Networks which incorporates a Graph Estimation procedure, which we test on large-scale classification problems, matching or improving over Dropout Networks with far fewer parameters to estimate.
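For intuition only, here is a minimal sketch of the spectral filtering idea that spectral networks build on: a filter is applied in the eigenbasis of a graph Laplacian. The graph, signal, and filter below are toy placeholders, not the paper's architecture.

```python
# Minimal sketch of spectral filtering on a graph (toy illustration of the spectral-
# network idea; the graph, signal and filter are placeholders, not the paper's model).
import numpy as np

def spectral_filter(W, x, g):
    """Apply a spectral filter g(lambda) to graph signal x via the Laplacian eigenbasis."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian
    eigvals, U = np.linalg.eigh(L)          # graph Fourier basis
    return U @ (g(eigvals) * (U.T @ x))     # transform, filter, inverse transform

# Toy 4-node path graph and a random signal; the low-pass filter is a placeholder.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(0).standard_normal(4)
print(spectral_filter(W, x, lambda lam: np.exp(-2.0 * lam)))
```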

1,418 citations


Proceedings ArticleDOI
10 Aug 2015
TL;DR: A unified framework to learn latent representations of sub-structures for graphs, inspired by the latest advances in language modeling and deep learning, which achieves significant improvements in classification accuracy over state-of-the-art graph kernels.
Abstract: In this paper, we present Deep Graph Kernels, a unified framework to learn latent representations of sub-structures for graphs, inspired by the latest advances in language modeling and deep learning. Our framework leverages the dependency information between sub-structures by learning their latent representations. We demonstrate instances of our framework on three popular graph kernels, namely Graphlet kernels, Weisfeiler-Lehman subtree kernels, and Shortest-Path graph kernels. Our experiments on several benchmark datasets show that Deep Graph Kernels achieve significant improvements in classification accuracy over state-of-the-art graph kernels.
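As a rough illustration (with assumed toy details, not the authors' code), the sketch below computes Weisfeiler-Lehman subtree features for two small graphs and the plain WL kernel from them; in the deep variant described above, the implicit identity matrix between feature vectors would be replaced by a similarity matrix learned over substructure embeddings.

```python
# Sketch: WL subtree features for two graphs and the plain WL kernel (identity
# similarity between substructures). A "deep" kernel would instead use a learned
# substructure-similarity matrix M in K = phi(G) M phi(G')^T.
from collections import Counter
import networkx as nx

def wl_features(G, iterations=2):
    labels = {v: str(G.degree(v)) for v in G}          # initial labels from degrees
    feats = Counter(labels.values())
    for _ in range(iterations):
        labels = {v: labels[v] + "|" + ",".join(sorted(labels[u] for u in G[v]))
                  for v in G}                           # relabel from neighbour multiset
        feats.update(labels.values())
    return feats

def base_kernel(f1, f2):
    # Plain WL kernel: dot product of substructure counts.
    return sum(f1[k] * f2[k] for k in set(f1) & set(f2))

G1, G2 = nx.cycle_graph(5), nx.path_graph(5)
print(base_kernel(wl_features(G1), wl_features(G2)))
```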

1,074 citations


Journal ArticleDOI
TL;DR: It is demonstrated that human brain functional networks exhibit efficient small-world, assortative, hierarchical and modular organizations and possess highly connected hubs and that these findings are robust against different analytical strategies.
Abstract: Recent studies have suggested that the brain’s structural and functional networks (i.e., connectomics) can be constructed by various imaging technologies (e.g., EEG/MEG; structural, diffusion and functional MRI) and further characterized by graph theory. Given the huge complexity of network construction, analysis and statistics, toolboxes incorporating these functions are largely lacking. Here, we developed the GRaph thEoreTical Network Analysis (GRETNA) toolbox for imaging connectomics. GRETNA has several key features: (i) an open-source, Matlab-based, cross-platform (Windows and UNIX OS) package with a graphical user interface; (ii) allowing topological analyses of global and local network properties with parallel computing ability, independent of imaging modality and species; (iii) providing flexible manipulations in several key steps during network construction and analysis, which include network node definition, network connectivity processing, network type selection and choice of thresholding procedure; (iv) allowing statistical comparisons of global, nodal and connectional network metrics and assessments of relationships between these network metrics and clinical or behavioral variables of interest; and (v) including functionality in image preprocessing and network construction based on resting-state functional MRI (R-fMRI) data. After applying GRETNA to a publicly released R-fMRI dataset of 54 healthy young adults, we demonstrated that human brain functional networks exhibit efficient small-world, assortative, hierarchical and modular organizations and possess highly connected hubs and that these findings are robust against different analytical strategies. With these efforts, we anticipate that GRETNA will accelerate imaging connectomics in an easy, quick and flexible manner. GRETNA is freely available on the NITRC website (http://www.nitrc.org/projects/gretna/).
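GRETNA itself is a Matlab toolbox; purely to illustrate the kind of topological metrics involved (clustering, path length, modularity, hubs), here is a small Python sketch on a random stand-in network.

```python
# Illustration of typical graph-theoretical metrics (not part of GRETNA, which is a
# Matlab toolbox); the random graph stands in for a thresholded connectivity matrix.
import networkx as nx
from networkx.algorithms import community

G = nx.erdos_renyi_graph(90, 0.1, seed=0)
G = G.subgraph(max(nx.connected_components(G), key=len)).copy()  # keep largest component
C = nx.average_clustering(G)                       # clustering coefficient
L = nx.average_shortest_path_length(G)             # characteristic path length
parts = community.greedy_modularity_communities(G)
Q = community.modularity(G, parts)                 # modularity of the detected partition
hubs = sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:5]
print(f"C={C:.3f}  L={L:.3f}  Q={Q:.3f}  top-degree hubs={hubs}")
```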

884 citations


Proceedings ArticleDOI
Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, Kuansan Wang
18 May 2015
TL;DR: A knowledge driven, highly interactive dialog that seamlessly combines reactive search and proactive suggestion experience, and a proactive heterogeneous entity recommendation are demonstrated.
Abstract: In this paper we describe a new release of a Web-scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort that broadens the scope of the namesake vertical search engine, which has been publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph comprised of six types of entities that model scholarly activities: field of study, author, institution, paper, venue, and event. In addition to obtaining these entities from publisher feeds as in the previous effort, in this version we include data mining results from the Web index and an in-house knowledge base from Bing, a major commercial search engine. As a result of the Bing integration, the new MAS graph sees a significant increase in size, with fresh information streaming in automatically following its discovery by the search engine. In addition, the rich entity relations included in the knowledge base provide additional signals to disambiguate and enrich the entities within and beyond the academic domain. The number of papers indexed by MAS, for instance, has grown from the low tens of millions to 83 million while maintaining accuracy above 95% based on test data sets derived from academic activities at Microsoft Research. Based on this data set, we demonstrate two scenarios in this work: a knowledge-driven, highly interactive dialog that seamlessly combines a reactive search and proactive suggestion experience, and a proactive heterogeneous entity recommendation.

837 citations


Proceedings ArticleDOI
28 Jul 2015
TL;DR: This work proposes a novel semantic parsing framework for question answering using a knowledge base that leverages the knowledge base in an early stage to prune the search space and thus simplifies the semantic matching problem.
Abstract: We propose a novel semantic parsing framework for question answering using a knowledge base. We define a query graph that resembles subgraphs of the knowledge base and can be directly mapped to a logical form. Semantic parsing is reduced to query graph generation, formulated as a staged search problem. Unlike traditional approaches, our method leverages the knowledge base in an early stage to prune the search space and thus simplifies the semantic matching problem. By applying an advanced entity linking system and a deep convolutional neural network model that matches questions and predicate sequences, our system outperforms previous methods substantially, and achieves an F1 measure of 52.5% on the WEBQUESTIONS dataset.
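A schematic of the staged search idea only, with a toy knowledge base and naive word-overlap scoring standing in for the paper's entity linker and CNN matcher:

```python
# Schematic of staged query-graph generation (toy KB and scoring, not the paper's model).
# Stage 1: link a topic entity; Stage 2: enumerate predicate chains from that entity;
# Stage 3: score chains against the question (naive word overlap instead of a CNN).
KB = {  # hypothetical triples: subject -> [(predicate, object), ...]
    "JustinBieber": [("people.person.sibling", "JaxonBieber"),
                     ("people.person.nationality", "Canada")],
}

def link_entity(question):
    return "JustinBieber" if "justin bieber" in question.lower() else None

def score(question, predicate):
    q_words = set(question.lower().replace("?", "").split())
    p_words = set(predicate.split(".")[-1].split("_"))
    return len(q_words & p_words)

def parse(question):
    entity = link_entity(question)
    candidates = [(score(question, p), entity, p, o) for p, o in KB.get(entity, [])]
    return max(candidates) if candidates else None

print(parse("Who is Justin Bieber's sibling?"))   # best-scoring (entity, predicate, answer)
```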

806 citations


Proceedings ArticleDOI
31 Dec 2015
TL;DR: This system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps.
Abstract: We present a novel approach to real-time dense visual SLAM. Our system is capable of capturing comprehensive dense globally consistent surfel-based maps of room-scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps. This is accomplished by using dense frame-to-model camera tracking and windowed surfel-based fusion coupled with frequent model refinement through non-rigid surface deformations. Our approach applies local model-to-model surface loop closure optimisations as often as possible to stay close to the mode of the map distribution, while utilising global loop closure to recover from arbitrary drift and maintain global consistency.

754 citations


Journal ArticleDOI
TL;DR: This work introduces a method based on quantum theory to reduce the number of layers to a minimum while maximizing the distinguishability between the multilayer network and the corresponding aggregated graph.
Abstract: Many complex systems can be represented as networks consisting of distinct types of interactions, which can be categorized as links belonging to different layers. For example, a good description of the full protein–protein interactome requires, for some organisms, up to seven distinct network layers, accounting for different genetic and physical interactions, each containing thousands of protein–protein relationships. A fundamental open question is then how many layers are indeed necessary to accurately represent the structure of a multilayered complex system. Here we introduce a method based on quantum theory to reduce the number of layers to a minimum while maximizing the distinguishability between the multilayer network and the corresponding aggregated graph. We validate our approach on synthetic benchmarks and we show that the number of informative layers in some real multilayer networks of protein–genetic interactions, social, economic and transportation systems can be reduced by up to 75%. Multilayer networks have been used to capture the structure of complex systems with different types of interactions, but often contain redundant information. Here, De Domenico et al. present a method based on quantum information to identify the minimal configuration of layers to retain.
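A minimal sketch of the Von Neumann graph entropy that underlies such quantum-inspired layer comparisons; the paper's full reducibility procedure (Jensen-Shannon divergence between layers, greedy aggregation) is omitted.

```python
# Von Neumann entropy of a single network layer: treat the rescaled Laplacian as a
# density-matrix analogue and take -sum(lambda * log2 lambda). Only the basic quantity
# behind such layer comparisons is shown; the paper's aggregation procedure is omitted.
import numpy as np

def von_neumann_entropy(adjacency):
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    rho = laplacian / laplacian.trace()           # trace-one "density matrix"
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]            # convention: 0 * log 0 = 0
    return float(-(eigvals * np.log2(eigvals)).sum())

layer = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)     # toy layer adjacency
print(von_neumann_entropy(layer))
```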

557 citations


Journal ArticleDOI
TL;DR: This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
Abstract: What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, showing that Petuum allows ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.

395 citations


Proceedings ArticleDOI
TL;DR: "Gunrock," the high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock implements a novel data-centric abstraction centered on operations on a vertex or edge frontier.
Abstract: For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.
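Gunrock is a CUDA/C++ library; the toy Python sketch below only illustrates the frontier-centric style of computation (advance the frontier along edges, then filter out visited vertices) on the CPU.

```python
# CPU illustration of a frontier-centric BFS (Gunrock itself is a CUDA/C++ library):
# each step advances the current vertex frontier along its edges, then filters out
# already-visited vertices to form the next frontier.
def bfs_frontier(adj, source):
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        expanded = {w for v in frontier for w in adj[v]}    # "advance" along edges
        frontier = [w for w in expanded if w not in depth]  # "filter" visited vertices
        for w in frontier:
            depth[w] = level
    return depth

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_frontier(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```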

355 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper formulates data association as a Generalized Maximum Multi Clique problem (GMMCP) and shows that this is the ideal case of modeling tracking in a real-world scenario where all the pairwise relationships between targets in a batch of frames are taken into account.
Abstract: Data association is the backbone of many multiple object tracking (MOT) methods. In this paper we formulate data association as a Generalized Maximum Multi Clique problem (GMMCP). We show that this is the ideal case of modeling tracking in a real-world scenario where all the pairwise relationships between targets in a batch of frames are taken into account. Previous works assume a simplified version of our tracker in either problem formulation or problem optimization. However, we propose a solution using GMMCP where no simplification is assumed in either step. We show that the NP-hard GMMCP problem can be formulated as a Binary Integer Program, for which the solution can be found efficiently for small- and medium-sized MOT problems. We further propose a speed-up method, employing Aggregated Dummy Nodes for modeling occlusion and miss-detection, which reduces the size of the input graph without using any heuristics. We show that, using the speed-up method, our tracker lends itself to real-time implementation, which is plausible in many applications. We evaluated our tracker on six challenging sequences (Town Center, TUD-Crossing, TUD-Stadtmitte, Parking-lot 1, Parking-lot 2 and Parking-lot pizza) and show favorable improvement over the state of the art.

Journal ArticleDOI
01 Jul 2015
TL;DR: GraphMat is a single-node multicore graph framework written in C++ that achieves better multicore scalability than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms.
Abstract: Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operations in the backend. We thus get the productivity benefits of a vertex programming framework without sacrificing performance. GraphMat is a single-node multicore graph framework written in C++ which has enabled us to write a diverse set of graph algorithms with the same effort compared to other vertex programming frameworks. GraphMat performs 1.1-7X faster than high performance frameworks such as GraphLab, CombBLAS and Galois. GraphMat also matches the performance of MapGraph, a GPU-based graph framework, despite running on a CPU platform with significantly lower compute and bandwidth resources. It achieves better multicore scalability (13-15X on 24 cores) than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms. Since GraphMat performance depends mainly on a few scalable and well-understood sparse matrix operations, GraphMat can naturally benefit from the trend of increasing parallelism in future hardware.
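As a schematic of the vertex-program-to-sparse-matrix mapping described above (generic SciPy, not GraphMat's C++ backend), PageRank can be written as repeated sparse matrix-vector products:

```python
# PageRank as repeated sparse matrix-vector products, a schematic of the
# "vertex program -> sparse matrix operation" mapping (generic SciPy, not GraphMat).
import numpy as np
import scipy.sparse as sp

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]   # toy directed graph
n = 4
rows, cols = zip(*edges)
A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
out_deg = np.asarray(A.sum(axis=1)).ravel()
P = (sp.diags(1.0 / out_deg) @ A).T                # column-stochastic transition matrix
                                                   # (toy graph has no dangling vertices)
r = np.full(n, 1.0 / n)
for _ in range(50):
    r = 0.85 * (P @ r) + 0.15 / n                  # one SpMV per iteration
print(r)
```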

Proceedings ArticleDOI
24 Jan 2015
TL;DR: This work evaluates Gunrock on five graph primitives and shows that Gunrock has at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.
Abstract: For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

Journal ArticleDOI
TL;DR: A novel feature selection algorithm based on Ant Colony Optimization (ACO), called Advanced Binary ACO (ABACO), is presented, and simulation results verify that the algorithm provides a suitable feature subset with good classification accuracy using a smaller feature set than competing feature selection methods.

Proceedings ArticleDOI
27 May 2015
TL;DR: This paper describes the LDBC Social Network Benchmark (SNB), and presents database benchmarking innovation in terms of graph query functionality tested, correlated graph generation techniques, as well as a scalable benchmark driver on a workload with complex graph dependencies.
Abstract: The Linked Data Benchmark Council (LDBC) is now two years underway and has gathered strong industrial participation for its mission to establish benchmarks and benchmarking practices for evaluating graph data management systems. The LDBC introduced a new choke-point driven methodology for developing benchmark workloads, which combines user input with input from expert systems architects; we outline this methodology. This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of the graph query functionality tested, correlated graph generation techniques, as well as a scalable benchmark driver on a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive Workload in detail and illustrate the workload with some early results, as well as the goals for the two other workloads.

Posted Content
TL;DR: The robust Bayesian Committee Machine is introduced, a practical and scalable product-of-experts model for large-scale distributed GP regression and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters.
Abstract: To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources our distributed GP model can handle arbitrarily large data sets.
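A sketch of the product-of-experts combination at a single test point, with toy per-expert predictions; the entropy-based weights follow the rBCM description, but treat the exact form here as an assumption rather than a reference implementation.

```python
# Sketch of a robust-BCM style combination of independent GP experts at one test point.
# mu_k, var_k are per-expert predictive means/variances and var_prior is the GP prior
# variance; the differential-entropy weights beta_k follow the paper's description, but
# the exact form here should be treated as an assumption, not a reference implementation.
import numpy as np

def rbcm_combine(mu, var, var_prior):
    mu, var = np.asarray(mu, float), np.asarray(var, float)
    beta = 0.5 * (np.log(var_prior) - np.log(var))            # expert weights
    precision = np.sum(beta / var) + (1.0 - beta.sum()) / var_prior
    combined_var = 1.0 / precision
    combined_mu = combined_var * np.sum(beta * mu / var)
    return combined_mu, combined_var

# Three hypothetical experts predicting at the same test input
print(rbcm_combine(mu=[0.9, 1.1, 1.0], var=[0.20, 0.30, 0.25], var_prior=1.0))
```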

Proceedings Article
07 Dec 2015
TL;DR: This work formulate and derive a highly efficient, conjugate gradient based alternating minimization scheme that solves optimizations with over 55 million observations up to 2 orders of magnitude faster than state-of-the-art (stochastic) gradient-descent based methods.
Abstract: Low rank matrix completion plays a fundamental role in collaborative filtering applications, the key idea being that the variables lie in a smaller subspace than the ambient space. Often, additional information about the variables is known, and it is reasonable to assume that incorporating this information will lead to better predictions. We tackle the problem of matrix completion when pairwise relationships among variables are known, via a graph. We formulate and derive a highly efficient, conjugate gradient based alternating minimization scheme that solves optimizations with over 55 million observations up to 2 orders of magnitude faster than state-of-the-art (stochastic) gradient-descent based methods. On the theoretical front, we show that such methods generalize weighted nuclear norm formulations, and derive statistical consistency guarantees. We validate our results on both real and synthetic datasets.
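A deliberately simplified sketch of the graph-regularized objective, using plain alternating gradient steps in place of the paper's conjugate-gradient solver; sizes, graphs, and step size are toy placeholders.

```python
# Simplified sketch of graph-regularized matrix completion: alternate gradient steps on
# U and V for  0.5*||P_Omega(M - U V^T)||^2 + 0.5*g*tr(U^T Lr U) + 0.5*g*tr(V^T Lc V).
# Plain gradient descent replaces the paper's conjugate-gradient solver; everything
# below (sizes, Laplacians, step size) is a toy placeholder.
import numpy as np

rng = np.random.default_rng(0)
m, n, rank, gamma, lr = 30, 20, 3, 0.1, 0.002
M = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
Omega = rng.random((m, n)) < 0.4                  # mask of observed entries
Lr = np.eye(m) - np.ones((m, m)) / m              # placeholder row-graph Laplacian
Lc = np.eye(n) - np.ones((n, n)) / n              # placeholder column-graph Laplacian
U = 0.1 * rng.standard_normal((m, rank))
V = 0.1 * rng.standard_normal((n, rank))
for _ in range(5000):
    R = Omega * (M - U @ V.T)                     # residual on observed entries only
    U += lr * (R @ V - gamma * Lr @ U)            # gradient step in U
    V += lr * (R.T @ U - gamma * Lc @ V)          # gradient step in V
R = Omega * (M - U @ V.T)
print("RMSE on observed entries:", np.sqrt((R[Omega] ** 2).mean()))
```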

Book ChapterDOI
23 Mar 2015
TL;DR: In this article, a graph-induced multilinear encoding scheme from lattices is proposed, in which the allowed arithmetic operations are restricted through an explicitly defined directed graph (somewhat similar to the asymmetric variant of previous schemes).
Abstract: Graded multilinear encodings have found extensive applications in cryptography ranging from non-interactive key exchange protocols, to broadcast and attribute-based encryption, and even to software obfuscation. Despite seemingly unlimited applicability, essentially only two candidate constructions are known (GGH and CLT). In this work, we describe a new graph-induced multilinear encoding scheme from lattices. In a graph-induced multilinear encoding scheme, the arithmetic operations that are allowed are restricted through an explicitly defined directed graph (somewhat similar to the “asymmetric variant” of previous schemes). Our construction encodes Learning With Errors (LWE) samples in short square matrices of higher dimensions. Addition and multiplication of the encodings correspond naturally to addition and multiplication of the LWE secrets. Security of the new scheme is not known to follow from LWE hardness (or any other “nice” assumption); at present it requires making new hardness assumptions.
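At a very high level, and only as a schematic assumption about the general shape of such graph-induced encodings (the actual construction, parameters, sampling, and noise handling differ), the encoding relation along a graph edge can be pictured as follows.

```latex
% Schematic only: an assumed high-level shape of a graph-induced encoding along an
% edge u -> v. A_u, A_v are public matrices attached to the graph nodes, S is the
% encoded LWE secret, E is small noise, and D_{u->v} is the published encoding matrix.
\[
  A_u \, D_{u\to v} \;=\; S \, A_v + E \pmod{q}.
\]
% Multiplying encodings along a path composes the corresponding secrets:
\[
  A_u \, \bigl( D_{u\to v}\, D_{v\to w} \bigr) \;\approx\; S_{u\to v}\, S_{v\to w}\, A_w \pmod{q}.
\]
```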

Journal ArticleDOI
02 Nov 2015-Sensors
TL;DR: The main contribution of the proposed methodology, when compared with traditional vehicle routing problem (VRP) solutions, is the fact that the method solves some practical problems only encountered during the execution of the task with actual UAVs.
Abstract: This paper presents a solution for the problem of minimum-time coverage of ground areas using a group of unmanned air vehicles (UAVs) equipped with image sensors. The solution is divided into two parts: (i) the task modeling as a graph whose vertices are geographic coordinates determined in such a way that a single UAV would cover the area in minimum time; and (ii) the solution of a mixed integer linear programming problem, formulated according to the graph variables defined in the first part, to route the team of UAVs over the area. The main contribution of the proposed methodology, when compared with traditional vehicle routing problem (VRP) solutions, is the fact that our method solves some practical problems only encountered during the execution of the task with actual UAVs. Along these lines, one of the main contributions of the paper is that the number of UAVs used to cover the area is automatically selected by solving the optimization problem. The number of UAVs is influenced by the vehicles’ maximum flight time and by the setup time, which is the time needed to prepare and launch a UAV. To illustrate the methodology, the paper presents experimental results obtained with two hand-launched, fixed-wing UAVs.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed multiresolution-GFT scheme outperforms H.264 intra by 6.8 dB on average in peak signal-to-noise ratio at the same bit rate.
Abstract: Piecewise smooth (PWS) images (e.g., depth maps or animation images) contain unique signal characteristics such as sharp object boundaries and slowly varying interior surfaces. Leveraging recent advances in graph signal processing, in this paper we propose to compress PWS images using suitable graph Fourier transforms (GFTs) to minimize the total signal representation cost of each pixel block, considering both the sparsity of the signal’s transform coefficients and the compactness of the transform description. Unlike fixed transforms, such as the discrete cosine transform, we can adapt the GFT to a particular class of pixel blocks. In particular, we select one among a defined search space of GFTs to minimize total representation cost via our proposed algorithms, leveraging graph optimization techniques such as spectral clustering and minimum graph cuts. Furthermore, for practical implementation of the GFT, we introduce two techniques to reduce computation complexity. First, at the encoder, we low-pass filter and downsample a high-resolution (HR) pixel block to obtain a low-resolution (LR) one, so that an LR-GFT can be employed. At the decoder, upsampling and interpolation are performed adaptively along HR boundaries coded using arithmetic edge coding, so that sharp object boundaries can be well preserved. Second, instead of computing the GFT from a graph in real time via eigendecomposition, the most popular LR-GFTs are precomputed and stored in a table for lookup during encoding and decoding. Using depth maps and computer-graphics images as examples of PWS images, experimental results show that our proposed multiresolution-GFT scheme outperforms H.264 intra by 6.8 dB on average in peak signal-to-noise ratio at the same bit rate.
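A toy sketch of a block-level graph Fourier transform (generic illustration, not the paper's codec): build a 4-connected grid graph over the pixels, weaken links that cross a strong discontinuity, and transform with the Laplacian eigenvectors.

```python
# Toy graph Fourier transform of an image block (generic illustration, not the paper's
# codec): a 4-connected grid graph is built over the pixels, links crossing a strong
# intensity discontinuity are weakened, and the block is transformed with the
# eigenvectors of the resulting graph Laplacian.
import numpy as np

def block_gft(block, weak=0.01, thresh=30):
    h, w = block.shape
    n = h * w
    W = np.zeros((n, n))
    idx = lambda r, c: r * w + c
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):                     # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    wgt = weak if abs(int(block[r, c]) - int(block[rr, cc])) > thresh else 1.0
                    W[idx(r, c), idx(rr, cc)] = W[idx(rr, cc), idx(r, c)] = wgt
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)
    return U.T @ block.reshape(-1).astype(float), U             # GFT coefficients, basis

block = np.zeros((4, 4), dtype=int)
block[:, 2:] = 100                                              # sharp vertical edge
coeffs, U = block_gft(block)
print(np.round(coeffs, 2))            # most energy lands in a few low-order coefficients
```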

Proceedings ArticleDOI
26 May 2015
TL;DR: It is shown that the use of rotation estimation to bootstrap iterative pose graph solvers yields a significant boost in convergence speed and robustness.
Abstract: Pose graph optimization is the non-convex optimization problem underlying pose-based Simultaneous Localization and Mapping (SLAM). If robot orientations were known, pose graph optimization would be a linear least-squares problem, whose solution can be computed efficiently and reliably. Since rotations are the actual reason why SLAM is a difficult problem, in this work we survey techniques for 3D rotation estimation. Rotation estimation has a rich history in three scientific communities: robotics, computer vision, and control theory. We review relevant contributions across these communities, assess their practical use in the SLAM domain, and benchmark their performance on representative SLAM problems (Fig. 1). We show that the use of rotation estimation to bootstrap iterative pose graph solvers yields a significant boost in convergence speed and robustness.
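One small ingredient of such rotation-initialization pipelines, shown for illustration only: projecting a relaxed (noisy) 3x3 estimate back onto SO(3) via an SVD. The full relaxation and averaging machinery surveyed in the paper is not reproduced here.

```python
# One small ingredient of rotation-initialization pipelines: after a relaxed (e.g.
# chordal) least-squares step, the unconstrained 3x3 estimate is projected back onto
# SO(3) via an SVD. The surveyed relaxation/averaging machinery itself is not shown.
import numpy as np

def project_to_so3(M):
    U, _, Vt = np.linalg.svd(M)
    S = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])    # guard against reflections
    return U @ S @ Vt

rng = np.random.default_rng(1)
R_true = project_to_so3(rng.standard_normal((3, 3)))   # a random ground-truth rotation
R_noisy = R_true + 0.05 * rng.standard_normal((3, 3))  # relaxed / noisy estimate
R_hat = project_to_so3(R_noisy)
print(np.linalg.norm(R_hat.T @ R_hat - np.eye(3)))     # ~0: a valid rotation again
```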

Journal ArticleDOI
TL;DR: A new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) is developed by introducing quality control of co-expression similarities, parallelizing embedded network construction, and developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs).
Abstract: Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as a predefined number of clusters or numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as a scale-free degree distribution or small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|^3), the presence of false positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and to gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.
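A compact sketch of the planar-filtering idea that PMFG/PFN-style methods rely on (not the MEGENA package itself): rank node pairs by correlation strength and greedily keep edges that preserve planarity.

```python
# Compact sketch of planar filtering in the spirit of PMFG/PFN (not the MEGENA package
# itself): pairs are ranked by |correlation| and an edge is kept only if the graph
# remains planar after adding it.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 12))                   # 50 samples x 12 genes (toy data)
corr = np.corrcoef(X, rowvar=False)

pairs = sorted(((abs(corr[i, j]), i, j) for i in range(12) for j in range(i + 1, 12)),
               reverse=True)
G = nx.Graph()
G.add_nodes_from(range(12))
for weight, i, j in pairs:
    G.add_edge(i, j, weight=weight)
    if not nx.check_planarity(G)[0]:                # reject edges that break planarity
        G.remove_edge(i, j)
print(G.number_of_edges(), "edges kept; planar bound is 3n - 6 =", 3 * 12 - 6)
```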

Journal ArticleDOI
TL;DR: kml and kml3d are R packages providing an implementation of k-means designed to work specifically on trajectories (kml) or on joint trajectories (kml3d), and they offer graphic facilities to “visualize” the trajectories, either in 2D or 3D (joint trajectories).
Abstract: Longitudinal studies are essential tools in medical research. In these studies, variables are not restricted to single measurements but can be seen as variable-trajectories, either single or joint. Thus, an important question concerns the identification of homogeneous patient trajectories. kml and kml3d are R packages providing an implementation of k-means designed to work specifically on trajectories (kml) or on joint trajectories (kml3d). They provide various tools to work on longitudinal data: imputation methods for trajectories (nine classic and one original), methods to define starting conditions in k-means (four classic and three original) and quality criteria to choose the best number of clusters (four classic and one original). In addition, they offer graphic facilities to “visualize” the trajectories, either in 2D (single trajectory) or 3D (joint trajectories). The 3D graph representing the mean joint trajectories of each cluster can be exported through LaTeX as a dynamic rotating 3D PDF graph (Figures 1 and 9).
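kml and kml3d are R packages; the snippet below is only a toy Python analogue of the underlying idea, clustering whole trajectories rather than single time points.

```python
# Toy Python analogue of trajectory clustering (kml/kml3d themselves are R packages):
# each patient's longitudinal trajectory is treated as one vector and k-means groups
# whole trajectories rather than single time points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 10)
rising  = 1.0 + 2.0 * t + 0.2 * rng.standard_normal((40, 10))   # 40 rising trajectories
falling = 3.0 - 2.0 * t + 0.2 * rng.standard_normal((40, 10))   # 40 falling trajectories
trajectories = np.vstack([rising, falling])                      # shape (80, 10)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(trajectories)
print("cluster sizes:", np.bincount(labels))                     # roughly [40, 40]
```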

Posted Content
TL;DR: In this article, a graph-based three-stage pipeline is proposed to detect controversy in social media, which involves building a conversation graph about a topic, partitioning the conversation graph to identify potential sides of the controversy, and measuring the amount of controversy from characteristics of the graph.
Abstract: Which topics spark the most heated debates on social media? Identifying those topics is not only interesting from a societal point of view, but also allows the filtering and aggregation of social media content for disseminating news stories. In this paper, we perform a systematic methodological study of controversy detection by using the content and the network structure of social media. Unlike previous work, rather than study controversy in a single hand-picked topic and use domain specific knowledge, we take a general approach to study topics in any domain. Our approach to quantifying controversy is based on a graph-based three-stage pipeline, which involves (i) building a conversation graph about a topic; (ii) partitioning the conversation graph to identify potential sides of the controversy; and (iii) measuring the amount of controversy from characteristics of the graph. We perform an extensive comparison of controversy measures, different graph-building approaches, and data sources. We use both controversial and non-controversial topics on Twitter, as well as other external datasets. We find that our new random-walk-based measure outperforms existing ones in capturing the intuitive notion of controversy, and show that content features are vastly less helpful in this task.
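A skeletal version of the three-stage pipeline on a toy graph; bisection-based partitioning and a crude cross-side edge fraction stand in for the paper's partitioning choices and random-walk controversy measure.

```python
# Skeleton of the three-stage pipeline on a toy graph: (i) build a conversation graph,
# (ii) partition it into two candidate sides, (iii) score how separated the sides are.
# Kernighan-Lin bisection and a crude cross-side edge fraction are stand-ins for the
# paper's partitioning choices and its random-walk controversy measure.
import networkx as nx
from networkx.algorithms import community

# (i) toy "conversation graph": two dense groups with a few cross links
G = nx.planted_partition_graph(2, 30, 0.3, 0.02, seed=7)

# (ii) partition into two sides
side_a, side_b = community.kernighan_lin_bisection(G, seed=7)

# (iii) crude separation score: fraction of edges crossing the partition
cross = sum(1 for u, v in G.edges if (u in side_a) != (v in side_a))
print("cross-side edge fraction:", cross / G.number_of_edges())
```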

Journal ArticleDOI
TL;DR: A population reference graph is introduced, which combines multiple reference sequences and catalogs of variation into an efficient hidden Markov model, improves the accuracy of genome inference and identifies regions where the current set of reference sequences is substantially incomplete.
Abstract: Although much is known about human genetic variation, such information is typically ignored in assembling new genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogs of variation. The genomes of new samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5-Mb extended MHC region on human chromosome 6, combining 8 assembled haplotypes, the sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate using simulations, SNP genotyping, and short-read and long-read data how the method improves the accuracy of genome inference and identifies regions where the current set of reference sequences is substantially incomplete.

Journal ArticleDOI
TL;DR: A taxonomy of three general classes of techniques for discovering roles is proposed, comprising (i) graph-based roles, (ii) feature-based roles, and (iii) hybrid roles, together with a framework that consists of two fundamental components: (a) role feature construction and (b) role assignment using the learned feature representation.
Abstract: Roles represent node-level connectivity patterns such as star-center, star-edge nodes, near-cliques or nodes that act as bridges to different regions of the graph. Intuitively, two nodes belong to the same role if they are structurally similar. Roles have been mainly of interest to sociologists, but more recently, roles have become increasingly useful in other domains. Traditionally, the notion of roles was defined based on graph equivalences such as structural, regular, and stochastic equivalences. We briefly revisit these early notions and instead propose a more general formulation of roles based on the similarity of a feature representation (in contrast to the graph representation). This leads us to propose a taxonomy of three general classes of techniques for discovering roles that includes (i) graph-based roles, (ii) feature-based roles, and (iii) hybrid roles. We also propose a flexible framework for discovering roles using the notion of similarity on a feature-based representation. The framework consists of two fundamental components: (a) role feature construction and (b) role assignment using the learned feature representation. We discuss the different possibilities for discovering feature-based roles and the tradeoffs of the many techniques for computing them. Finally, we discuss potential applications and future directions and challenges.
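A minimal example of the feature-based route (an NMF-based stand-in in the spirit of RolX-style methods, with toy features; not the authors' framework):

```python
# Minimal feature-based role discovery: compute simple structural features per node and
# factorize the non-negative feature matrix into soft role memberships. This is an
# NMF-based stand-in in the spirit of RolX-style methods, with toy features only.
import numpy as np
import networkx as nx
from sklearn.decomposition import NMF

G = nx.barbell_graph(6, 3)                        # two cliques joined by a short path
avg_nbr_deg = nx.average_neighbor_degree(G)
features = np.array([[G.degree(v), nx.clustering(G, v), avg_nbr_deg[v]] for v in G])

roles = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(features)
for v, membership in zip(G, roles):
    print(v, np.round(membership, 2))             # bridge/path nodes show a distinct role profile
```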

Proceedings ArticleDOI
04 Oct 2015
TL;DR: Arabesque is presented, the first distributed data processing platform for implementing graph mining algorithms that automates the process of exploring a very large number of subgraphs and defines a high-level filter-process computational model that simplifies the development of scalableGraph mining algorithms.
Abstract: Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, the semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.
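A toy, single-machine rendering of a filter-process exploration loop (nothing distributed here), used to enumerate triangles:

```python
# Toy, single-machine version of a filter-process exploration loop (nothing distributed
# here): connected subgraphs are grown one vertex at a time; `keep` decides whether a
# subgraph is worth extending and `process` consumes each surviving subgraph.
import networkx as nx

def explore(G, keep, process, max_size):
    frontier = {frozenset([v]) for v in G}
    seen = set(frontier)
    while frontier:
        next_frontier = set()
        for sub in frontier:
            if not keep(G, sub):
                continue
            process(G, sub)
            if len(sub) < max_size:
                for v in sub:
                    for w in G[v]:
                        ext = sub | {w}             # extend by one neighbouring vertex
                        if ext not in seen:
                            seen.add(ext)
                            next_frontier.add(ext)
        frontier = next_frontier

G = nx.karate_club_graph()
triangles = []

def record_triangle(G, s):
    if len(s) == 3 and all(G.has_edge(u, v) for u in s for v in s if u < v):
        triangles.append(s)

explore(G, keep=lambda G, s: True, process=record_triangle, max_size=3)
print(len(triangles), "triangles found in the karate-club graph")
```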

Journal ArticleDOI
TL;DR: This article addresses the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features, and proposes a Bayesian approach to inference on multiple Gaussian graphical models.
Abstract: In this article, we propose a Bayesian approach to inference on multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field (MRF) prior, which encourages common edges. We learn which sample groups have a shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups, when appropriate, as well as to obtain a measure of relative network similarity across groups. Our modeling framework incorporates relevant prior knowledge through an edge-specific informative prior and can encourage similarity to an established network. Through simulations, we demonstrate the utility of our method in summarizing relative network similarity and compare its performance against related methods. We ...

Journal ArticleDOI
TL;DR: Results of this study show that pattern recognition and graph analysis of brain networks, on the basis of resting-state fMRI data, can efficiently assist in the diagnosis of Alzheimer's disease.

Journal ArticleDOI
TL;DR: Several choices of Lyapunov equations over various graph topologies are presented, which play an important role in controller design and stability analysis of multi-agent systems.