Showing papers on "Graph (abstract data type)" published in 2008

Open Access

Journal Article•DOI•

Fast unfolding of communities in large networks

Vincent D. Blondel¹, Jean-Loup Guillaume¹,², Renaud Lambiotte¹,³, Etienne Lefebvre¹•Institutions (3)

Université catholique de Louvain¹, Pierre-and-Marie-Curie University², Imperial College London³

04 Mar 2008-arXiv: Physics and Society

TL;DR: This work proposes a heuristic method, based on modularity optimization, that is shown to outperform all other known community detection methods in terms of computation time while detecting communities of very good quality, as measured by modularity.

Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.

13,519 citations

Journal Article•DOI•

Fast unfolding of communities in large networks

Vincent D. Blondel¹, Jean-Loup Guillaume¹,², Renaud Lambiotte¹,³, Etienne Lefebvre¹•Institutions (3)

Université catholique de Louvain¹, Pierre-and-Marie-Curie University², Imperial College London³

01 Oct 2008-Journal of Statistical Mechanics: Theory and Experiment

TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.

Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.
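
The quality measure named in the abstract can be made concrete with a small example. The sketch below computes the standard Newman-Girvan modularity of a given partition of an unweighted, undirected graph, Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. It illustrates the measure only and is not the authors' optimization heuristic; the function name `modularity` and the edge-list and dictionary inputs are assumptions made for this example.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition (illustrative sketch).

    edges:     list of (u, v) pairs of an unweighted, undirected graph
    community: dict mapping each node to a community label
    """
    m = len(edges)
    if m == 0:
        return 0.0  # convention chosen here for the empty graph

    intra = {}       # L_c: number of edges with both endpoints in c
    degree_sum = {}  # d_c: sum of node degrees in c

    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each edge contributes one unit of degree to each endpoint.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1

    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the two triangles into separate communities scores higher than placing every node in one community, which is the behaviour a modularity-maximizing method exploits.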

11,078 citations

Journal Article•DOI•

On Consensus Algorithms for Double-Integrator Dynamics

Wei Ren¹•Institutions (1)

Utah State University¹

29 Aug 2008-IEEE Transactions on Automatic Control

TL;DR: This note shows that consensus is reached asymptotically for the first two cases if the undirected interaction graph is connected and for the third case if the directed interaction graph has a directed spanning tree and the gain for velocity matching with the group reference velocity is above a certain bound.

Abstract: This note considers consensus algorithms for double-integrator dynamics. We propose and analyze consensus algorithms for double-integrator dynamics in four cases: 1) with a bounded control input, 2) without relative velocity measurements, 3) with a group reference velocity available to each team member, and 4) with a bounded control input when a group reference state is available to only a subset of the team. We show that consensus is reached asymptotically for the first two cases if the undirected interaction graph is connected. We further show that consensus is reached asymptotically for the third case if the directed interaction graph has a directed spanning tree and the gain for velocity matching with the group reference velocity is above a certain bound. We also show that consensus is reached asymptotically for the fourth case if and only if the group reference state flows directly or indirectly to all of the vehicles in the team.

1,338 citations

Proceedings Article•DOI•

Understanding mobility based on GPS data

Yu Zheng¹, Quannan Li¹, Yukun Chen¹, Xing Xie¹, Wei-Ying Ma¹•Institutions (1)

Microsoft¹

21 Sep 2008

TL;DR: An approach based on supervised learning is proposed to infer people's motion modes from their GPS logs; it identifies a set of sophisticated features that are more robust to traffic conditions than those used in previous work.

Abstract: Both recognizing human behavior and understanding a user's mobility from sensor data are critical issues in ubiquitous computing systems. As a kind of user behavior, the transportation modes, such as walking, driving, etc., that a user takes can enrich the user's mobility with informative knowledge and provide pervasive computing systems with more context information. In this paper, we propose an approach based on supervised learning to infer people's motion modes from their GPS logs. The contribution of this work lies in the following two aspects. On one hand, we identify a set of sophisticated features, which are more robust to traffic conditions than those other researchers ever used. On the other hand, we propose a graph-based post-processing algorithm to further improve the inference performance. This algorithm considers both the commonsense constraints of the real world and typical user behavior based on location in a probabilistic manner. Using the GPS logs collected by 65 people over a period of 10 months, we evaluated our approach via a set of experiments. As a result, based on the change point-based segmentation method and Decision Tree-based inference model, the new features brought an eight percent improvement in inference accuracy over previous results, and the graph-based post-processing achieved a further four percent enhancement.

1,054 citations

Proceedings Article•DOI•

Towards identity anonymization on graphs

Kun Liu¹, Evimaria Terzi¹•Institutions (1)

IBM¹

09 Jun 2008

TL;DR: This work formally defines the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations, and devises simple and efficient algorithms for solving this problem.

Abstract: The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, and in its basic form the degree of the nodes, can be revealing the identities of individuals. To address this issue, we study a specific graph-anonymization problem. We call a graph k-degree anonymous if for every node v, there exist at least k-1 other nodes in the graph with the same degree as v. This definition of anonymity prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes. We formally define the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations. We devise simple and efficient algorithms for solving this problem. Our algorithms are based on principles related to the realizability of degree sequences. We apply our methods to a large spectrum of synthetic and real datasets and demonstrate their efficiency and practical utility.
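
The definition of k-degree anonymity given in the abstract can be checked directly: every degree value that occurs in the graph must be shared by at least k nodes. The sketch below tests that property only; it does not implement the authors' anonymization algorithms. The function name `is_k_degree_anonymous` and the edge-list input format are assumptions made for this example.

```python
from collections import Counter


def is_k_degree_anonymous(nodes, edges, k):
    """Return True if every node shares its degree with at least k-1 others.

    nodes: iterable of node identifiers (isolated nodes included)
    edges: list of (u, v) pairs of an undirected simple graph
    """
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Count how many nodes have each degree value.
    nodes_per_degree = Counter(degree.values())
    return all(count >= k for count in nodes_per_degree.values())


# A path a-b-c has degrees 1, 2, 1: node b is the only node of degree 2.
path_nodes = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]

# A triangle has degrees 2, 2, 2.
triangle_nodes = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
```

In the path, an adversary who knows that a target has degree 2 identifies node b uniquely, so the path is not 2-degree anonymous, whereas the triangle is 3-degree anonymous.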

819 citations

Journal Article•DOI•

Meta-regression in Stata

Roger M. Harbord¹, Julian P T Higgins•Institutions (1)

University of Bristol¹

01 Dec 2008-Stata Journal

TL;DR: A revised version of the metareg command, which performs meta-analysis regression (meta-regression) on study-level summary data, is presented, which involves improvements to the estimation methods and the addition of an option to use a permutation test to estimate p-values.

Abstract: We present a revised version of the metareg command, which performs meta-analysis regression (meta-regression) on study-level summary data. The major revisions involve improvements to the estimation methods and the addition of an option to use a permutation test to estimate p-values, including an adjustment for multiple testing. We have also made additions to the output, added an option to produce a graph, and included support for the predict command. Stata 8.0 or above is required.

794 citations

Journal Article•DOI•

Improved seam carving for video retargeting

Michael Rubinstein¹, Ariel Shamir², Shai Avidan³•Institutions (3)

Mitsubishi Electric Research Laboratories¹, Interdisciplinary Center Herzliya², Adobe Systems³

01 Aug 2008

TL;DR: This work replaces the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes and presents a novel energy criterion that improves the visual quality of the retargeted images and videos.

Abstract: Video, like images, should support content aware resizing. We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes. In the new formulation, a seam is given by a minimal cut in the graph and we show how to construct a graph such that the resulting cut is a valid seam. That is, the cut is monotonic and connected. In addition, we present a novel energy criterion that improves the visual quality of the retargeted images and videos. The original seam carving operator is focused on removing seams with the least amount of energy, ignoring energy that is introduced into the images and video by applying the operator. To counter this, the new criterion is looking forward in time - removing seams that introduce the least amount of energy into the retargeted result. We show how to encode the improved criterion into graph cuts (for images and video) as well as dynamic programming (for images). We apply our technique to images and videos and present results of various applications.

775 citations

Journal Article•DOI•

Label Propagation through Linear Neighborhoods

Fei Wang¹, Changshui Zhang¹•Institutions (1)

Tsinghua University¹

01 Jan 2008-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A novel graph-based semi supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood, and can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness.

Abstract: In many practical data mining applications such as text classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi supervised learning algorithms have aroused considerable interests from the data mining and machine learning fields. In recent years, graph-based semi supervised learning has been becoming one of the most active research areas in the semi supervised learning community. In this paper, a novel graph-based semi supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. Our algorithm, named linear neighborhood propagation (LNP), can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness. A theoretical analysis of the properties of LNP is presented in this paper. Furthermore, we also derive an easy way to extend LNP to out-of-sample data. Promising experimental results are presented for synthetic data, digit, and text classification tasks.

720 citations

Proceedings Article•DOI•

Planetary-scale views on a large instant-messaging network

Jure Leskovec¹, Eric Horvitz²•Institutions (2)

Carnegie Mellon University¹, Microsoft²

21 Apr 2008

TL;DR: It is found that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.

Abstract: We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate on a planetary-scale the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. We find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.

641 citations

Journal Article•DOI•

An efficient error-propagation-based reduction method for large chemical kinetic mechanisms

P. Pepiot-Desjardins¹, Heinz Pitsch¹•Institutions (1)

Stanford University¹

01 Jul 2008-Combustion and Flame

TL;DR: The reduction procedure is automatic, is fast, has moderate CPU and memory requirements, and compares favorably to other existing techniques; an application is presented for autoignition using a large iso-octane mechanism.

622 citations

Journal Article•DOI•

Reaching a Consensus in a Dynamically Changing Environment: A Graphical Approach

Ming Cao, A. Stephen Morse, Brian D. O. Anderson

01 Mar 2008-SIAM Journal on Control and Optimization

TL;DR: Graph-theoretic conditions are obtained which address the convergence question for the leaderless version of the widely studied Vicsek consensus problem.

Abstract: This paper presents new graph-theoretic results appropriate for the analysis of a variety of consensus problems cast in dynamically changing environments. The concepts of rooted, strongly rooted, and neighbor-shared are defined, and conditions are derived for compositions of sequences of directed graphs to be of these types. The graph of a stochastic matrix is defined, and it is shown that under certain conditions the graph of a Sarymsakov matrix and a rooted graph are one and the same. As an illustration of the use of the concepts developed in this paper, graph-theoretic conditions are obtained which address the convergence question for the leaderless version of the widely studied Vicsek consensus problem.

Journal Article•DOI•

The discovery of structural form

Charles Kemp¹, Joshua B. Tenenbaum•Institutions (1)

Carnegie Mellon University¹

05 Aug 2008-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: This work presents a computational model that learns structures of many different forms and that discovers which form is best for a given dataset and brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development.

Abstract: Algorithms for finding structure in data have become increasingly important both as tools for scientific data analysis and as models of human learning, yet they suffer from a critical limitation. Scientists discover qualitatively new forms of structure in observed data: For instance, Linnaeus recognized the hierarchical organization of biological species, and Mendeleev recognized the periodic structure of the chemical elements. Analogous insights play a pivotal role in cognitive development: Children discover that object category labels can be organized into hierarchies, friendship networks are organized into cliques, and comparative relations (e.g., "bigger than" or "better than") respect a transitive order. Standard algorithms, however, can only learn structures of a single form that must be specified in advance: For instance, algorithms for hierarchical clustering create tree structures, whereas algorithms for dimensionality-reduction create low-dimensional spaces. Here, we present a computational model that learns structures of many different forms and that discovers which form is best for a given dataset. The model makes probabilistic inferences over a space of graph grammars representing trees, linear orders, multidimensional spaces, rings, dominance hierarchies, cliques, and other forms and successfully discovers the underlying structure of a variety of physical, biological, and social domains. Our approach brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development.

Journal Article•DOI•

Efficient and accurate construction of genetic linkage maps from the minimum spanning tree of a graph.

Yonghui Wu¹, Prasanna R. Bhat¹, Timothy J. Close¹, Stefano Lonardi¹•Institutions (1)

University of California, Riverside¹

10 Oct 2008-PLOS Genetics

TL;DR: This paper introduces a novel algorithm to order markers on a genetic linkage map, based on a simple yet fundamental mathematical property that is proved under rather general assumptions, and shows that it consistently outperforms the best available methods in the literature.

Abstract: Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap.

Journal Article•DOI•

Ontologizer 2.0—a multifunctional tool for GO term enrichment analysis and data exploration

Sebastian Bauer¹, Steffen Grossmann¹, Martin Vingron¹, Peter N. Robinson¹•Institutions (1)

Charité¹

15 Jul 2008-Bioinformatics

TL;DR: The Ontologizer allows users to visualize data as a graph including all significantly overrepresented GO terms and to explore the data by linking GO terms to all genes/proteins annotated to the term and by linking individual terms to child terms.

Abstract: Summary: The Ontologizer is a Java application that can be used to perform statistical analysis for overrepresentation of Gene Ontology (GO) terms in sets of genes or proteins derived from an experiment. The Ontologizer implements the standard approach to statistical analysis based on the one-sided Fisher's exact test, the novel parent–child method, as well as topology-based algorithms. A number of multiple-testing correction procedures are provided. The Ontologizer allows users to visualize data as a graph including all significantly overrepresented GO terms and to explore the data by linking GO terms to all genes/proteins annotated to the term and by linking individual terms to child terms. Availability: The Ontologizer application is available under the terms of the GNU GPL. It can be started as a WebStart application from the project homepage, where source code is also provided: http://compbio.charite.de/ontologizer Requirements: Ontologizer requires a Java SE 5.0 compliant Java runtime engine and GraphViz for the optional graph visualization tool. Contact: sebastian.bauer@charite.de; peter.robinson@charite.de
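
For term overrepresentation, the one-sided Fisher's exact test mentioned in the abstract amounts to an upper-tail hypergeometric probability: the chance of seeing at least k annotated genes in a study set of size n drawn from a population of N genes, K of which carry the term. The sketch below is a plain illustration of that calculation and is not the Ontologizer's Java implementation; the function name `enrichment_p_value` and its parameter names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population_size, population_annotated,
                       study_size, study_annotated):
    """Upper-tail hypergeometric probability P(X >= study_annotated)."""
    N, K = population_size, population_annotated
    n, k = study_size, study_annotated
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("counts are inconsistent with each other")

    # Number of study sets of size n containing exactly i annotated genes,
    # summed over i = k .. min(K, n), divided by all study sets of size n.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# 10 genes in the population, 3 annotated to the term, study set of 3 genes.
p_all_three = enrichment_p_value(10, 3, 3, 3)  # all study genes annotated
p_at_least_zero = enrichment_p_value(10, 3, 3, 0)  # trivially certain
```

A small p-value indicates that the term appears in the study set more often than random sampling from the population would suggest.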

Journal Article•DOI•

Stabilization of Planar Collective Motion With Limited Communication

Rodolphe Sepulchre, Derek A. Paley¹, Naomi Ehrich Leonard²•Institutions (2)

University of Maryland, College Park¹, Princeton University²

08 Apr 2008-IEEE Transactions on Automatic Control

TL;DR: A design methodology is proposed to stabilize relative equilibria in a model of identical, steered particles moving in the plane at unit speed, showing how previous results assuming all-to-all communication can be extended to a general communication framework.

Abstract: This paper proposes a design methodology to stabilize relative equilibria in a model of identical, steered particles moving in the plane at unit speed. Relative equilibria either correspond to parallel motion of all particles with fixed relative spacing or to circular motion of all particles around the same circle. Particles exchange relative information according to a communication graph that can be undirected or directed and time-invariant or time-varying. The emphasis of this paper is to show how previous results assuming all-to-all communication can be extended to a general communication framework.

Journal Article•DOI•

Laplacian Dynamics and Multiscale Modular Structure in Networks

Renaud Lambiotte, Jean-Charles Delvenne, Mauricio Barahona

09 Dec 2008-arXiv: Physics and Society

TL;DR: In this article, the stability of a network partition is defined in terms of the statistical properties of a dynamical process taking place on the graph, and the connection between community detection and Laplacian dynamics enables the authors to establish dynamically motivated stability measures linked to distinct null models.

Abstract: Most methods proposed to uncover communities in complex networks rely on their structural properties. Here we introduce the stability of a network partition, a measure of its quality defined in terms of the statistical properties of a dynamical process taking place on the graph. The time-scale of the process acts as an intrinsic parameter that uncovers community structures at different resolutions. The stability extends and unifies standard notions for community detection: modularity and spectral partitioning can be seen as limiting cases of our dynamic measure. Similarly, recently proposed multi-resolution methods correspond to linearisations of the stability at short times. The connection between community detection and Laplacian dynamics enables us to establish dynamically motivated stability measures linked to distinct null models. We apply our method to find multi-scale partitions for different networks and show that the stability can be computed

Proceedings Article•DOI•

Graphs-at-a-time: query language and access methods for graph databases

Huahai He¹, Ambuj K. Singh¹•Institutions (1)

University of California, Santa Barbara¹

09 Jun 2008

TL;DR: A graph algebra extended from the relational algebra is presented, in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs; access methods for the selection operator are also investigated.

Abstract: With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.

Proceedings Article•DOI•

Non-negative Matrix Factorization on Manifold

Deng Cai¹, Xiaofei He², Xiaoyun Wu, Jiawei Han¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, Zhejiang University²

15 Dec 2008

TL;DR: This paper constructs an affinity graph to encode the geometrical information, seeks a matrix factorization which respects the graph structure, and demonstrates the success of this novel algorithm by applying it to real-world problems.

Abstract: Recently non-negative matrix factorization (NMF) has received a lot of attentions in information retrieval, computer vision and pattern recognition. NMF aims to find two non-negative matrices whose product can well approximate the original matrix. The sizes of these two matrices are usually smaller than the original matrix. This results in a compressed version of the original data matrix. The solution of NMF yields a natural parts-based representation for the data. When NMF is applied for data representation, a major disadvantage is that it fails to consider the geometric structure in the data. In this paper, we develop a graph based approach for parts-based data representation in order to overcome this limitation. We construct an affinity graph to encode the geometrical information and seek a matrix factorization which respects the graph structure. We demonstrate the success of this novel algorithm by applying it on real world problems.

Proceedings Article•DOI•

Topic modeling with network regularization

Qiaozhu Mei¹, Deng Cai¹, Duo Zhang¹, ChengXiang Zhai¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

21 Apr 2008

TL;DR: A novel solution to the problem of topic modeling with network structure (TMN) is proposed, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data.

Abstract: In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method bridges topic modeling and social network analysis, which leverages the power of both statistical topic models and discrete regularization. The output of this model well summarizes topics in text, maps a topic on the network, and discovers topical communities. With concrete selection of a topic model and a graph-based regularizer, our model can be applied to text mining problems such as author-topic analysis, community discovery, and spatial text mining. Empirical experiments on two different genres of data show that our approach is effective, which improves text-oriented methods as well as network-oriented methods. The proposed model is general; it can be applied to any text collections with a mixture of topics and an associated network structure.

Journal Article•DOI•

A Necessary and Sufficient Condition for Consensus Over Random Networks

Alireza Tahbaz-Salehi¹, Ali Jadbabaie¹•Institutions (1)

University of Pennsylvania¹

08 Apr 2008-IEEE Transactions on Automatic Control

TL;DR: In this paper, a necessary and sufficient condition for almost sure asymptotic consensus using simple ergodicity and probabilistic arguments is presented. This easily verifiable condition uses the spectrum of the average weight matrix.

Abstract: We consider the consensus problem for stochastic discrete-time linear dynamical systems. The underlying graph of such systems at a given time instance is derived from a random graph process, independent of other time instances. For such a framework, we present a necessary and sufficient condition for almost sure asymptotic consensus using simple ergodicity and probabilistic arguments. This easily verifiable condition uses the spectrum of the average weight matrix. Finally, we investigate a special case for which the linear dynamical system converges to a fixed vector with probability 1.

Journal Article•DOI•

Graph states for quantum secret sharing

Damian Markham¹, Barry C. Sanders²•Institutions (2)

University of Tokyo¹, University of Calgary²

10 Oct 2008-Physical Review A

TL;DR: This approach provides a generalization of threshold classical secret sharing via insecure quantum channels beyond the current requirement of 100% collaboration by players to just a simple majority in the case of five players.

Abstract: We consider three broad classes of quantum secret sharing with and without eavesdropping and show how a graph state formalism unifies otherwise disparate quantum secret sharing models. In addition to the elegant unification provided by graph states, our approach provides a generalization of threshold classical secret sharing via insecure quantum channels beyond the current requirement of 100% collaboration by players to just a simple majority in the case of five players. Another innovation here is the introduction of embedded protocols within a larger graph state that serves as a one-way quantum-information processing system.

Journal Article•DOI•

An annotated bibliography on guaranteed graph searching

Fedor V. Fomin¹, Dimitrios M. Thilikos²•Institutions (2)

University of Bergen¹, National and Kapodistrian University of Athens²

10 Jun 2008-Theoretical Computer Science

TL;DR: This annotated bibliography gives an elementary classification of problems and results related to graph searching and provides a source of bibliographical references on this field.

Proceedings Article•DOI•

Graph summarization with bounded error

Saket Navlakha¹, Rajeev Rastogi², Nisheeth Shrivastava³•Institutions (3)

University of Maryland, College Park¹, Yahoo!², Bell Labs³

09 Jun 2008

TL;DR: This is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.

Abstract: We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pairs of nodes in the two sets. The corrections portion, on the other hand, specifies the list of edge corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple real-life graph data sets. To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
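
The two-part representation described in this abstract can be illustrated with a minimal sketch of the lossless case: expand every summary edge into all node pairs between its two sets, then apply the corrections. The function name `reconstruct`, the `('+', u, v)` / `('-', u, v)` correction encoding, and the restriction to undirected simple graphs are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product


def reconstruct(supernodes, superedges, corrections):
    """Rebuild the edge set of G from a summary plus edge corrections.

    supernodes:  dict mapping a supernode id to the set of original nodes it covers
    superedges:  iterable of (a, b) supernode id pairs
    corrections: iterable of (sign, u, v), where sign is '+' (add the edge)
                 or '-' (remove the edge); this encoding is hypothetical
    Edges are treated as undirected and stored as frozensets.
    """
    edges = set()
    # A summary edge stands for every pair of nodes across the two sets.
    for a, b in superedges:
        for u, v in product(supernodes[a], supernodes[b]):
            if u != v:  # skip self-loops when a == b
                edges.add(frozenset((u, v)))
    # Corrections repair the pairs that the summary got wrong.
    for sign, u, v in corrections:
        edge = frozenset((u, v))
        if sign == "+":
            edges.add(edge)
        else:
            edges.discard(edge)
    return edges
```

For example, with supernodes `{"A": {1, 2}, "B": {3, 4}}`, a single summary edge `("A", "B")`, and corrections `[("-", 2, 4), ("+", 1, 2)]`, the summary edge expands to four node pairs, one of which is removed and one extra edge is added.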

Proceedings Article•DOI•

SPARQL basic graph pattern optimization using selectivity estimation

Markus Stocker¹, Andy Seaborne¹, Abraham Bernstein², Christoph Kiefer², Dave Reynolds¹•Institutions (2)

Hewlett-Packard¹, University of Zurich²

21 Apr 2008

TL;DR: This paper formalizes the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data and defines and analyzes the characteristics of heuristics for selectivity-based static BGP optimization.

Abstract: In this paper, we formalize the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data. We define and analyze the characteristics of heuristics for selectivity-based static BGP optimization. The heuristics range from simple triple pattern variable counting to more sophisticated selectivity estimation techniques. Customized summary statistics for RDF data enable the selectivity estimation of joined triple patterns and the development of efficient heuristics. Using the Lehigh University Benchmark (LUBM), we evaluate the performance of the heuristics for the queries provided by the LUBM and discuss some of them in more detail.
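
The simplest heuristic mentioned in this abstract, triple pattern variable counting, might look roughly like the sketch below. It assumes that a pattern with fewer unbound variables is more selective and should run first; the paper's actual heuristic may weight subject, predicate, and object positions differently. The tuple encoding of triple patterns, the function names, and the example terms are hypothetical.

```python
def count_variables(pattern):
    """Number of unbound terms in a (subject, predicate, object) pattern.

    Variables are assumed to be written SPARQL-style with a leading '?'.
    """
    return sum(1 for term in pattern if term.startswith("?"))


def order_patterns(bgp):
    """Order triple patterns so those with the fewest variables come first.

    sorted() is stable, so patterns with equal counts keep their
    original relative order.
    """
    return sorted(bgp, key=count_variables)


# Illustrative basic graph pattern; the terms are placeholders.
bgp = [
    ("?x", "?p", "?y"),
    ("?x", "rdf:type", "ub:Student"),
    ("?x", "ub:takesCourse", "?c"),
]
ordered = order_patterns(bgp)
```

Here the pattern with a bound predicate and object is placed first and the fully unbound pattern last, which is a static ordering computed before any data is touched.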

Proceedings Article•DOI•

The query-flow graph: model and applications

Paolo Boldi¹, Francesco Bonchi², Carlos Castillo², Debora Donato², Aristides Gionis², Sebastiano Vigna¹•Institutions (2)

University of Milan¹, Yahoo!²

26 Oct 2008

TL;DR: This paper introduces the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior, and proposes a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users.

Abstract: Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search engine results. Mining the wealth of information available in the query logs has many important applications including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query q_i to query q_j means that the two queries are likely to be part of the same "search mission". Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.

Proceedings Article•

On the approximability of influence in social networks

Ning Chen¹•Institutions (1)

University of Washington¹

20 Jan 2008

TL;DR: The main result says that the problem of minimizing the size of S, while ensuring that targeting S would influence the whole network into adopting the product, is hard to approximate within a polylogarithmic factor.

Abstract: In this paper, we study the spread of influence through a social network, in a model initially studied by Kempe, Kleinberg and Tardos [14, 15]: We are given a graph modeling a social network, where each node v has a (fixed) threshold t_v, such that the node will adopt a new product if t_v of its neighbors adopt it. Our goal is to find a small set S of nodes such that targeting the product to S would lead to adoption of the product by a large number of nodes in the graph. We show strong inapproximability results for several variants of this problem. Our main result says that the problem of minimizing the size of S, while ensuring that targeting S would influence the whole network into adopting the product, is hard to approximate within a polylogarithmic factor. This implies similar results if only a fixed fraction of the network is ensured to adopt the product. Further, the hardness of approximation result continues to hold when all nodes have majority thresholds, or have constant degree and threshold two. The latter answers a complexity question proposed in [10, 29]. We also give some positive results for more restricted cases, such as when the underlying graph is a tree.
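
The fixed-threshold adoption model in this abstract can be simulated with a short sketch. It reads the adoption rule as "a node adopts once at least its threshold number of neighbors have adopted", which is an interpretation of the abstract's wording; the function name `spread` and the dictionary-based graph encoding are assumptions for illustration. The sketch only evaluates a given target set S and does not attempt the hard problem of choosing a small S.

```python
def spread(neighbors, threshold, seeds):
    """Return the final set of adopters when the product is targeted to `seeds`.

    neighbors: dict mapping each node to an iterable of its neighbors
    threshold: dict mapping each node v to its fixed threshold
    seeds:     the initially targeted set S
    """
    adopted = set(seeds)
    changed = True
    # Adoption is monotone, so repeat until no further node can adopt.
    while changed:
        changed = False
        for v in neighbors:
            if v in adopted:
                continue
            adopting_neighbors = sum(1 for u in neighbors[v] if u in adopted)
            if adopting_neighbors >= threshold[v]:
                adopted.add(v)
                changed = True
    return adopted
```

On a path 1-2-3-4 where every node has threshold one, targeting node 1 alone influences the whole path. Raising the threshold of node 3 to two blocks the cascade there, because node 4 can never adopt before node 3 does.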

Proceedings Article•DOI•

Mining significant graph patterns by leap search

Xifeng Yan¹, Hong Cheng², Jiawei Han², Philip S. Yu³•Institutions (3)

IBM¹, University of Illinois at Urbana–Champaign², University of Illinois at Chicago³

09 Jun 2008

TL;DR: This paper gives the first comprehensive study of a general mining method that aims to find the most significant patterns directly; graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

Abstract: With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method that aims to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

Proceedings Article•DOI•

Preventing Memory Error Exploits with WIT

Periklis Akritidis¹, Cristian Cadar¹, Costin Raiciu¹, Manuel Costa¹, Miguel Castro¹•Institutions (1)

Microsoft¹

18 May 2008

TL;DR: This work presents write integrity testing (WIT), a new technique that provides practical protection from attacks that exploit memory errors: it compiles C and C++ programs without modifications, has high coverage with no false positives, and has low overhead.

Abstract: Attacks often exploit memory errors to gain control over the execution of vulnerable programs. These attacks remain a serious problem despite previous research on techniques to prevent them. We present write integrity testing (WIT), a new technique that provides practical protection from these attacks. WIT uses points-to analysis at compile time to compute the control-flow graph and the set of objects that can be written by each instruction in the program. Then it generates code instrumented to prevent instructions from modifying objects that are not in the set computed by the static analysis, and to ensure that indirect control transfers are allowed by the control-flow graph. To improve coverage where the analysis is not precise enough, WIT inserts small guards between the original program objects. We describe an efficient implementation with optimizations to reduce space and time overhead. This implementation can be used in practice because it compiles C and C++ programs without modifications, it has high coverage with no false positives, and it has low overhead. WIT's average runtime overhead is only 7% across a set of CPU intensive benchmarks and it is negligible when IO is the bottleneck.

Proceedings Article•DOI•

Recognizing human actions using multiple features

Jingen Liu¹, Saad Ali¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

23 Jun 2008

TL;DR: This paper proposes a framework that fuses multiple features for improved action recognition in videos by treating different features as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities.

Abstract: In this paper, we propose a framework that fuses multiple features for improved action recognition in videos. The fusion of multiple features is important for recognizing actions, as often a single-feature-based representation is not enough to capture the imaging variations (viewpoint, illumination, etc.) and attributes of individuals (size, age, gender, etc.). Hence, we use two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) a quantized vocabulary of spin-images, which aims to capture the shape deformation of the actor by considering actions as 3D objects (x, y, t). To optimally combine these features, we treat different features as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities. The graph is then embedded into a k-dimensional space subject to the criterion that similar nodes have Euclidean coordinates which are closer to each other. This is achieved by converting this constraint into a minimization problem whose solution is the eigenvectors of the graph Laplacian matrix. This procedure is known as Fiedler embedding. The performance of the proposed framework is tested on publicly available data sets. The results demonstrate that fusion of multiple features helps in achieving improved performance, and allows retrieval of meaningful features and videos from the embedding space.
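
The minimization problem mentioned in this abstract is commonly written in the following textbook form. The symbols are assumptions introduced here: W is the symmetric matrix of edge weights w_ij, D is the diagonal degree matrix, and X is the n-by-k matrix whose i-th row x_i holds the coordinates of node i. The normalization constraints shown are the usual ones for a connected graph and may differ from those used in the paper.

```latex
\[
  L = D - W, \qquad D_{ii} = \sum_{j} w_{ij}
\]
\[
  \min_{X \in \mathbb{R}^{n \times k}} \;
  \sum_{i<j} w_{ij}\, \lVert x_i - x_j \rVert^2
  \;=\; \operatorname{tr}\!\left( X^{\top} L X \right)
  \quad \text{subject to} \quad
  X^{\top} X = I_k, \;\; X^{\top} \mathbf{1} = \mathbf{0}
\]
\[
  L u_m = \lambda_m u_m, \qquad
  0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n, \qquad
  X = \begin{bmatrix} u_2 & u_3 & \cdots & u_{k+1} \end{bmatrix}
\]
```

Under these assumptions the constant eigenvector u_1 is excluded by the constraint orthogonal to the all-ones vector, and the embedding uses the eigenvectors of the k smallest nonzero eigenvalues, so strongly connected nodes receive nearby coordinates.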

Proceedings Article•DOI•

TALE: A Tool for Approximate Large Graph Matching

Yuanyuan Tian¹, Jignesh M. Patel¹•Institutions (1)

University of Michigan¹

07 Apr 2008

TL;DR: This paper proposes a novel indexing method that incorporates graph structural information in a hybrid index structure; the method achieves high pruning power, and the index size scales linearly with the database size.

Abstract: Large graph datasets are common in many emerging database applications, and most notably in large-scale scientific applications. To fully exploit the wealth of information encoded in graphs, effective and efficient graph matching tools are critical. Due to the noisy and incomplete nature of real graph datasets, approximate, rather than exact, graph matching is required. Furthermore, many modern applications need to query large graphs, each of which has hundreds to thousands of nodes and edges. This paper presents a novel technique for approximate matching of large graph queries. We propose a novel indexing method that incorporates graph structural information in a hybrid index structure. This indexing technique achieves high pruning power and the index size scales linearly with the database size. In addition, we propose an innovative matching paradigm to query large graphs. This technique distinguishes nodes by their importance in the graph structure. The matching algorithm first matches the important nodes of a query and then progressively extends these matches. Through experiments on several real datasets, this paper demonstrates the effectiveness and efficiency of the proposed method.
