
Showing papers on "Graph (abstract data type)" published in 2011


Proceedings ArticleDOI
09 May 2011
TL;DR: g2o, an open-source C++ framework for optimizing graph-based nonlinear error functions, is presented; while being general, g2o is demonstrated to offer performance comparable to implementations of state-of-the-art approaches for the specific problems.
Abstract: Many popular problems in robotics and computer vision including various types of simultaneous localization and mapping (SLAM) or bundle adjustment (BA) can be phrased as least squares optimization of an error function that can be represented by a graph. This paper describes the general structure of such problems and presents g2o, an open-source C++ framework for optimizing graph-based nonlinear error functions. Our system has been designed to be easily extensible to a wide range of problems and a new problem typically can be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA. We provide evaluations on a wide range of real-world and simulated datasets. The results demonstrate that while being general g2o offers a performance comparable to implementations of state-of-the-art approaches for the specific problems.

2,192 citations
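The formulation the paper builds on can be illustrated with a toy example. The sketch below is not g2o or its C++ API; it is a minimal NumPy re-creation of the underlying idea, with nodes as 1-D poses, edges as relative measurements, and a few Gauss-Newton iterations. All values and the gauge-fixing trick are invented for illustration.

```python
# Not g2o and not its API: a NumPy toy with the same structure. Nodes are 1-D
# poses, edges are relative measurements, and Gauss-Newton minimizes the sum of
# squared edge errors. All numbers are invented for illustration.
import numpy as np

poses = np.array([0.0, 0.9, 2.1, 2.9])            # initial guesses for x0..x3
edges = [(0, 1, 1.0, 1.0),                        # (i, j, measured x_j - x_i, information weight)
         (1, 2, 1.0, 1.0),
         (2, 3, 1.0, 1.0),
         (0, 3, 3.05, 0.5)]                       # an extra, slightly inconsistent constraint

for _ in range(10):                               # Gauss-Newton iterations
    H = np.zeros((4, 4))
    b = np.zeros(4)
    for i, j, z, w in edges:
        e = (poses[j] - poses[i]) - z             # error of this edge
        J = np.zeros(4); J[i], J[j] = -1.0, 1.0   # Jacobian of e w.r.t. all poses
        H += w * np.outer(J, J)
        b += w * J * e
    H[0, 0] += 1e6                                # gauge fix: anchor pose 0
    dx = np.linalg.solve(H, -b)
    poses += dx
    if np.linalg.norm(dx) < 1e-9:
        break

print(np.round(poses, 3))
```

In g2o the same structure is generalized to SE(2)/SE(3) vertices, user-defined error functions, and sparse solvers, which is what allows a new problem to be specified in a few lines of code.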


Journal ArticleDOI
TL;DR: In GNMF, an affinity graph is constructed to encode the geometrical information, and a matrix factorization that respects the graph structure is sought; the empirical study shows encouraging results for the proposed algorithm in comparison to state-of-the-art algorithms on real-world problems.
Abstract: Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation, which uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information and we seek a matrix factorization that respects the graph structure. Our empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.

1,870 citations
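As a rough illustration of the graph-regularized factorization described above, the following sketch applies GNMF-style multiplicative updates (a reconstruction term plus a graph Laplacian penalty on the coefficients). The random data, the affinity graph, and the parameter values are placeholders, and no convergence checking is done; it is a sketch of the idea, not the authors' implementation.

```python
# Sketch of GNMF-style multiplicative updates (graph-regularized NMF).
# X: (features x samples) nonnegative data, W: (samples x samples) affinity graph.
# Data, graph and parameters are random placeholders; no convergence checks.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 200))                            # toy nonnegative data
W = (rng.random((200, 200)) < 0.05).astype(float)
W = np.maximum(W, W.T); np.fill_diagonal(W, 0)       # symmetric affinity graph
D = np.diag(W.sum(axis=1))                           # degree matrix (L = D - W)

k, lam, eps = 10, 1.0, 1e-9
U = rng.random((50, k))
V = rng.random((200, k))                             # per-sample coefficients smoothed on the graph

for _ in range(200):
    U *= (X @ V) / (U @ V.T @ V + eps)
    V *= (X.T @ U + lam * W @ V) / (V @ U.T @ U + lam * D @ V + eps)

print("reconstruction error:", np.linalg.norm(X - U @ V.T))
```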


01 Jan 2011
TL;DR: In this article, a new ACO model that overcomes the difficulties found when working with a huge construction graph is presented; existing ACO algorithms are not suitable when the graph size is a challenge for computer memory and the graph cannot be completely generated or stored in it.
Abstract: Ant Colony Optimization (ACO) has been successfully applied to those combinatorial optimization problems which can be translated into a graph exploration. Artificial ants build solutions step by step, adding solution components that are represented by graph nodes. The existing ACO algorithms are suitable when the graph is not very large (thousands of nodes) but are not useful when the graph size is a challenge for the computer memory and the graph cannot be completely generated or stored in it. In this paper we study a new ACO model that overcomes the difficulties found when working with a huge construction graph. In addition to the description of the model, we analyze in the experimental section one technique used for dealing with this huge graph exploration. The results of the analysis can help to understand the meaning of the new parameters introduced and to decide which parameterization is more suitable for a given problem. For the experiments we use one real problem of capital importance in Software Engineering: refutation of safety properties in concurrent systems. In this way, we foster an innovative research line related to the application of ACO to formal methods in Software Engineering.

1,323 citations
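The central implementation point, that the construction graph is never materialized, can be sketched as follows: successors are generated on demand and pheromone is kept in a sparse dictionary holding only visited edges. The toy state space, goal test, and parameters below are invented; the paper's actual application is refutation of safety properties in concurrent systems.

```python
# Sketch only: ACO over an implicit construction graph. Successor states are
# generated on demand and pheromone lives in a sparse dict, so the graph is
# never stored. The toy state space, goal test and parameters are invented.
import random
from collections import defaultdict

N, MAX_STEPS = 10_000, 60
pheromone = defaultdict(lambda: 1.0)              # only edges actually visited get an entry

def successors(s):                                # implicit, never-materialized graph
    return [(3 * s + 1) % N, (7 * s + 2) % N, (s + 1) % N]

def is_goal(s):                                   # arbitrary toy target set
    return s != 0 and s % 500 == 0

def build_path(alpha=1.0):
    path, s = [0], 0
    for _ in range(MAX_STEPS):                    # ants add one component per step
        nxt = successors(s)
        weights = [pheromone[(s, t)] ** alpha for t in nxt]
        s = random.choices(nxt, weights=weights)[0]
        path.append(s)
        if is_goal(s):
            break
    return path

best = None
for _ in range(100):                              # colony iterations
    for _ant in range(20):
        path = build_path()
        if is_goal(path[-1]):
            if best is None or len(path) < len(best):
                best = path
            for a, b in zip(path, path[1:]):      # reinforce successful paths
                pheromone[(a, b)] += 1.0 / len(path)
    for k in list(pheromone):                     # evaporation on edges seen so far
        pheromone[k] *= 0.95

print("shortest successful path found:", None if best is None else len(best))
```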


Posted Content
TL;DR: A strong effect of age on friendship preferences, as well as a globally modular community structure driven by nationality, is observed; it is also shown that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure.
Abstract: We study the structure of the social graph of active Facebook users, the largest social network ever analyzed. We compute numerous features of the graph including the number of users and friendships, the degree distribution, path lengths, clustering, and mixing patterns. Our results center around three main observations. First, we characterize the global structure of the graph, determining that the social network is nearly fully connected, with 99.91% of individuals belonging to a single large connected component, and we confirm the "six degrees of separation" phenomenon on a global scale. Second, by studying the average local clustering coefficient and degeneracy of graph neighborhoods, we show that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure. Third, we characterize the assortativity patterns present in the graph by studying the basic demographic and network properties of users. We observe clear degree assortativity and characterize the extent to which "your friends have more friends than you". Furthermore, we observe a strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality, but we do not find any strong gender homophily. We compare our results with those from smaller social networks and find mostly, but not entirely, agreement on common structural network characteristics.

938 citations
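The kinds of structural measurements reported in the paper (giant component size, degree statistics, local clustering, assortativity) are straightforward to reproduce on any graph; the sketch below uses networkx on a synthetic preferential-attachment graph as a stand-in, since the Facebook data itself is not available.

```python
# Sketch of the structural measurements discussed in the paper (giant
# component, degree distribution, local clustering, assortativity), computed
# with networkx on a synthetic graph standing in for the Facebook graph.
import networkx as nx

G = nx.barabasi_albert_graph(10_000, 5, seed=1)   # toy heavy-tailed graph

giant = max(nx.connected_components(G), key=len)
print("fraction in largest component:", len(giant) / G.number_of_nodes())

degrees = [d for _, d in G.degree()]
print("mean degree:", sum(degrees) / len(degrees))

print("average local clustering:", nx.average_clustering(G))
print("degree assortativity:", nx.degree_assortativity_coefficient(G))
```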


Proceedings Article
27 Jul 2011
TL;DR: A robust method for collective disambiguation is presented, harnessing context from knowledge bases and using a new form of coherence graph; it significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
Abstract: Disambiguating named entities in natural-language text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.

898 citations
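A much simplified sketch of the dense-subgraph idea: candidate entities form a weighted coherence graph, and the node with the smallest weighted degree is greedily peeled off while the densest subgraph seen is kept. The real method additionally keeps at least one candidate per mention and mixes in prior and context-similarity scores; the entity names and weights below are invented.

```python
# Simplified sketch of the dense-subgraph heuristic behind collective
# disambiguation: greedily remove the weakest candidate entity, keep the
# densest subgraph seen. Mention constraints and prior/context scores from the
# paper are omitted; the toy coherence weights are invented.
import networkx as nx

G = nx.Graph()
coherence = [("Paris_France", "Seine", 0.9), ("Paris_France", "Eiffel_Tower", 0.8),
             ("Paris_Hilton", "Seine", 0.05), ("Paris_Hilton", "Eiffel_Tower", 0.02),
             ("Seine", "Eiffel_Tower", 0.7)]
G.add_weighted_edges_from(coherence)

def density(H):
    w = sum(d["weight"] for _, _, d in H.edges(data=True))
    return w / max(len(H), 1)

best = G.copy()
H = G.copy()
while len(H) > 1:
    v = min(H.nodes, key=lambda n: H.degree(n, weight="weight"))  # weakest node
    H.remove_node(v)
    if density(H) > density(best):
        best = H.copy()

print("kept entities:", sorted(best.nodes))
```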


Proceedings Article
27 Jul 2011
TL;DR: It is shown that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base.
Abstract: We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL's earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.

614 citations


Journal ArticleDOI
Miao Zheng, Jiajun Bu, Chun Chen, Can Wang, Lijun Zhang, Guang Qiu, Deng Cai
TL;DR: A graph based algorithm, called graph regularized sparse coding, is proposed, to learn the sparse representations that explicitly take into account the local manifold structure of the data.
Abstract: Sparse coding has received an increasing amount of interest in recent years. It is an unsupervised learning algorithm, which finds a basis set capturing high-level semantics in the data and learns sparse coordinates in terms of the basis set. Originally applied to modeling the human visual cortex, sparse coding has been shown useful for many applications. However, most of the existing approaches to sparse coding fail to consider the geometrical structure of the data space. In many real applications, the data is more likely to reside on a low-dimensional submanifold embedded in the high-dimensional ambient space. It has been shown that the geometrical information of the data is important for discrimination. In this paper, we propose a graph based algorithm, called graph regularized sparse coding, to learn the sparse representations that explicitly take into account the local manifold structure of the data. By using graph Laplacian as a smooth operator, the obtained sparse representations vary smoothly along the geodesics of the data manifold. The extensive experimental results on image classification and clustering have demonstrated the effectiveness of our proposed algorithm.

576 citations


Proceedings ArticleDOI
28 Mar 2011
TL;DR: NN-Descent, a simple yet efficient algorithm for approximate K-NNG construction with arbitrary similarity measures, is presented; it typically converges to above 90% recall with each point compared, on average, to only a few percent of the whole dataset.
Abstract: K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Existing methods for K-NNG construction either do not scale, or are specific to certain similarity measures. We present NN-Descent, a simple yet efficient algorithm for approximate K-NNG construction with arbitrary similarity measures. Our method is based on local search, has minimal space overhead and does not rely on any shared global index. Hence, it is especially suitable for large-scale applications where data structures need to be distributed over the network. We have shown with a variety of datasets and similarity measures that the proposed method typically converges to above 90% recall with each point comparing only to several percent of the whole dataset on average.

554 citations
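A stripped-down sketch of the NN-Descent principle, that a point's true neighbors are likely to be found among its current neighbors' neighbors: each point keeps a K-NN list, and pairs of points sharing a (forward or reverse) neighbor are compared to improve those lists. Sampling and early-termination details from the paper are omitted, and the 2-D Euclidean data is a placeholder.

```python
# Simplified sketch of NN-Descent: K-NN lists are iteratively improved by
# comparing neighbors of neighbors. The 2-D Euclidean data is a placeholder;
# the paper's sampling and termination refinements are omitted.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 2))
K = 10

def dist(i, j):
    return float(np.linalg.norm(X[i] - X[j]))

# start from random neighbor lists: each entry is (distance, neighbor id)
knn = [sorted((dist(i, j), j) for j in rng.choice(len(X), K, replace=False) if j != i)
       for i in range(len(X))]

def try_insert(i, j):
    """Insert j into i's neighbor list if it improves it; return 1 on success."""
    if j == i or any(n == j for _, n in knn[i]):
        return 0
    d = dist(i, j)
    if len(knn[i]) >= K and d >= knn[i][-1][0]:
        return 0
    knn[i] = sorted(knn[i] + [(d, j)])[:K]
    return 1

for _it in range(10):
    updates = 0
    reverse = [[] for _ in range(len(X))]          # who currently lists i as a neighbor
    for i in range(len(X)):
        for _, j in knn[i]:
            reverse[j].append(i)
    for i in range(len(X)):
        candidates = {j for _, j in knn[i]} | set(reverse[i])
        for a in candidates:                       # compare pairs that share neighbor i
            for b in candidates:
                if a < b:
                    updates += try_insert(a, b) + try_insert(b, a)
    if updates == 0:
        break

print("first point's neighbors:", [int(j) for _, j in knn[0]])
```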


Journal ArticleDOI
TL;DR: In this article, a necessary and sufficient condition for consensusability under a common control protocol is given, which explicitly reveals how the intrinsic entropy rate of the agent dynamic and the communication graph jointly affect consensusability.
Abstract: This paper investigates the joint effect of agent dynamic, network topology and communication data rate on consensusability of linear discrete-time multi-agent systems. Neglecting the finite communication data rate constraint and under undirected graphs, a necessary and sufficient condition for consensusability under a common control protocol is given, which explicitly reveals how the intrinsic entropy rate of the agent dynamic and the communication graph jointly affect consensusability. The result is established by solving a discrete-time simultaneous stabilization problem. A lower bound of the optimal convergence rate to consensus, which is shown to be tight for some special cases, is provided as well. Moreover, a necessary and sufficient condition for formationability of multi-agent systems is obtained. As a special case, the discrete-time second-order consensus is discussed where an optimal control gain is designed to achieve the fastest convergence. The effects of undirected graphs on consensability/formationability and optimal convergence rate are exactly quantified by the ratio of the second smallest to the largest eigenvalues of the graph Laplacian matrix. An extension to directed graphs is also made. The consensus problem under a finite communication data rate is finally investigated.

537 citations
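The graph quantity the abstract highlights, the ratio of the second smallest to the largest eigenvalue of the graph Laplacian, is easy to compute directly; the sketch below does so for an arbitrarily chosen 6-agent cycle topology.

```python
# Sketch: the eigenvalue ratio lambda_2 / lambda_N of the graph Laplacian,
# which the paper identifies as quantifying the effect of the undirected
# topology, computed for an arbitrary 6-agent cycle.
import numpy as np
import networkx as nx

G = nx.cycle_graph(6)
L = nx.laplacian_matrix(G).toarray().astype(float)
eig = np.sort(np.linalg.eigvalsh(L))

lambda_2, lambda_max = eig[1], eig[-1]
print("eigenvalue ratio lambda_2 / lambda_N =", lambda_2 / lambda_max)
```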


Journal ArticleDOI
TL;DR: This State‐of‐the‐Art Report surveys available techniques for the visual analysis of large graphs and discusses various graph algorithmic aspects useful for the different stages of the visual graph analysis process.
Abstract: The analysis of large graphs plays a prominent role in various fields of research and is relevant in many important application areas. Effective visual analysis of graphs requires appropriate visual presentations in combination with respective user interaction facilities and algorithmic graph analysis methods. How to design appropriate graph analysis systems depends on many factors, including the type of graph describing the data, the analytical task at hand and the applicability of graph analysis methods. The most recent surveys of graph visualization and navigation techniques cover techniques that had been introduced until 2000 or concentrate only on graph layouts published until 2002. Recently, new techniques have been developed covering a broader range of graph types, such as time-varying graphs. Also, in accordance with ever-growing amounts of graph-structured data becoming available, the inclusion of algorithmic graph analysis and interaction techniques becomes increasingly important. In this State-of-the-Art Report, we survey available techniques for the visual analysis of large graphs. Our review first considers graph visualization techniques according to the type of graphs supported. The visualization techniques form the basis for the presentation of interaction approaches suitable for visual graph exploration. As an important component of visual graph analysis, we discuss various graph algorithmic aspects useful for the different stages of the visual graph analysis process. We also present main open research challenges in this field.

518 citations


Proceedings ArticleDOI
09 May 2011
TL;DR: The algorithm incrementally constructs a graph of trajectories through state space while efficiently searching over candidate paths through the graph at each iteration; this results in a search tree in belief space that provably converges to the optimal path.
Abstract: In this paper we address the problem of motion planning in the presence of state uncertainty, also known as planning in belief space. The work is motivated by planning domains involving nontrivial dynamics, spatially varying measurement properties, and obstacle constraints. To make the problem tractable, we restrict the motion plan to a nominal trajectory stabilized with a linear estimator and controller. This allows us to predict distributions over future states given a candidate nominal trajectory. Using these distributions to ensure a bounded probability of collision, the algorithm incrementally constructs a graph of trajectories through state space, while efficiently searching over candidate paths through the graph at each iteration. This process results in a search tree in belief space that provably converges to the optimal path. We analyze the algorithm theoretically and also provide simulation results demonstrating its utility for balancing information gathering to reduce uncertainty and finding low cost paths.

Journal ArticleDOI
TL;DR: A new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences is considered; the LexRank-with-threshold method outperforms the other degree-based techniques, including continuous LexRank.
Abstract: We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
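A compact sketch of the LexRank computation: bag-of-words cosine similarities between sentences are thresholded into a graph, and sentence importance is obtained by power iteration (eigenvector centrality with a damping term). The four toy sentences, the 0.3 threshold, and the damping factor are arbitrary choices, not the paper's settings.

```python
# Sketch of the LexRank idea: cosine similarities between sentences form a
# graph, and eigenvector centrality (via power iteration) scores importance.
# Plain bag-of-words vectors and an arbitrary threshold keep it self-contained.
import numpy as np
from collections import Counter

sentences = ["the cat sat on the mat",
             "the cat lay on the rug",
             "stocks fell sharply on monday",
             "markets and stocks fell on monday"]

vocab = sorted({w for s in sentences for w in s.split()})
vecs = np.array([[Counter(s.split())[w] for w in vocab] for s in sentences], float)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

sim = vecs @ vecs.T                      # cosine similarity matrix
adj = (sim > 0.3).astype(float)          # threshold into an adjacency matrix
np.fill_diagonal(adj, 0)

P = adj / adj.sum(axis=1, keepdims=True) # row-stochastic transition matrix
r = np.full(len(sentences), 1 / len(sentences))
for _ in range(100):                     # power iteration with damping
    r = 0.15 / len(sentences) + 0.85 * (P.T @ r)

print(sorted(zip(r, sentences), reverse=True)[0][1])   # most central sentence
```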

Journal ArticleDOI
01 Aug 2011
TL;DR: This work introduces, to its knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs, which focuses on the optimization opportunities presented by the large space of configuration parameters for these programs.
Abstract: MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby. We introduce, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. We focus on the optimization opportunities presented by the large space of configuration parameters for these programs. We also introduce a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. All components have been prototyped for the popular Hadoop MapReduce system. The effectiveness of each component is demonstrated through a comprehensive evaluation using representative MapReduce programs from various application domains.

Proceedings ArticleDOI
24 Jul 2011
TL;DR: Experimental results show that the proposed graph-based collective EL method achieves significant performance improvement over traditional EL methods, benefiting from the purely collective nature of the inference algorithm, in which evidence for related EL decisions can be reinforced into high-probability decisions.
Abstract: Entity Linking (EL) is the task of linking name mentions in Web text with their referent entities in a knowledge base. Traditional EL methods usually link name mentions in a document by assuming them to be independent. However, there is often additional interdependence between different EL decisions, i.e., the entities in the same document should be semantically related to each other. In these cases, Collective Entity Linking, in which the name mentions in the same document are linked jointly by exploiting the interdependence between them, can improve the entity linking accuracy. This paper proposes a graph-based collective EL method, which can model and exploit the global interdependence between different EL decisions. Specifically, we first propose a graph-based representation, called Referent Graph, which can model the global interdependence between different EL decisions. Then we propose a collective inference algorithm, which can jointly infer the referent entities of all name mentions by exploiting the interdependence captured in Referent Graph. The key benefit of our method comes from: 1) The global interdependence model of EL decisions; 2) The purely collective nature of the inference algorithm, in which evidence for related EL decisions can be reinforced into high-probability decisions. Experimental results show that our method can achieve significant performance improvement over the traditional EL methods.

Journal ArticleDOI
12 Jul 2011
TL;DR: In this paper, a graph-theoretic definition of connectivity is provided, as well as an equivalent definition based on algebraic graph theory, which employs the adjacency and Laplacian matrices of the graph and their spectral properties.
Abstract: In this paper, we provide a theoretical framework for controlling graph connectivity in mobile robot networks. We discuss proximity-based communication models composed of disk-based or uniformly-fading-signal-strength communication links. A graph-theoretic definition of connectivity is provided, as well as an equivalent definition based on algebraic graph theory, which employs the adjacency and Laplacian matrices of the graph and their spectral properties. Based on these results, we discuss centralized and distributed algorithms to maintain, increase, and control connectivity in mobile robot networks. The various approaches discussed in this paper range from convex optimization and subgradient-descent algorithms, for the maximization of the algebraic connectivity of the network, to potential fields and hybrid systems that maintain communication links or control the network topology in a least restrictive manner. Common to these approaches is the use of mobility to control the topology of the underlying communication network. We discuss applications of connectivity control to multirobot rendezvous, flocking and formation control, where so far, network connectivity has been considered an assumption.

Proceedings Article
12 Dec 2011
TL;DR: A novel algorithm is proposed for solving the resulting optimization problem, a regularized log-determinant program; it is based on Newton's method and employs a quadratic approximation, with some modifications that leverage the structure of the sparse Gaussian MLE problem.
Abstract: The l1 regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton's method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
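The estimation problem addressed by the paper can be demonstrated with an off-the-shelf solver. The sketch below uses scikit-learn's coordinate-descent GraphicalLasso rather than the paper's Newton-type method, simply to show the l1-regularized Gaussian MLE recovering a sparse precision matrix (and hence a graph) from limited samples; the chain-structured ground truth is made up.

```python
# Sketch of the estimation problem the paper targets: recover a sparse inverse
# covariance (a Gaussian graphical model) from limited samples. This uses
# scikit-learn's GraphicalLasso, not the paper's Newton-based solver; the
# chain-structured ground truth below is invented.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p = 5
precision = np.eye(p) + np.diag([0.4] * (p - 1), 1) + np.diag([0.4] * (p - 1), -1)
cov = np.linalg.inv(precision)                     # chain-graph ground truth
X = rng.multivariate_normal(np.zeros(p), cov, size=200)

model = GraphicalLasso(alpha=0.1).fit(X)
est = model.precision_
print("recovered graph edges (nonzero pattern of the precision matrix):")
print((np.abs(est) > 0.05).astype(int))
```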

Proceedings Article
19 Jun 2011
TL;DR: A novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language, using graph-based label propagation for cross-lingual knowledge transfer.
Abstract: We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised model (Berg-Kirkpatrick et al., 2010). Across eight European languages, our approach results in an average absolute improvement of 10.4% over a state-of-the-art baseline, and 16.7% over vanilla hidden Markov models induced with the Expectation Maximization algorithm.
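A generic sketch of the graph-based label propagation mechanism used for the cross-lingual transfer: a few seed nodes carry fixed label distributions, and all other nodes repeatedly average their neighbors' distributions until convergence. The tiny chain graph and two labels are invented; in the paper the graph connects word types and trigram contexts across the two languages.

```python
# Generic sketch of graph-based label propagation: seed nodes are clamped to
# fixed label distributions, other nodes average their neighbors. The tiny
# chain graph and the two labels are invented for illustration.
import numpy as np

n_nodes, n_labels = 6, 2
W = np.zeros((n_nodes, n_nodes))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:       # a simple chain graph
    W[i, j] = W[j, i] = 1.0

seeds = {0: np.array([1.0, 0.0]), 5: np.array([0.0, 1.0])}  # labeled endpoints

Y = np.full((n_nodes, n_labels), 1.0 / n_labels)
for i, dist in seeds.items():
    Y[i] = dist

for _ in range(100):
    Y_new = W @ Y / W.sum(axis=1, keepdims=True)             # average over neighbors
    for i, dist in seeds.items():                            # clamp the seed nodes
        Y_new[i] = dist
    Y = Y_new

print(np.round(Y, 2))                                        # propagated label distributions
```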

Journal ArticleDOI
TL;DR: This work extends a common framework for graph-based image segmentation that includes the graph cuts, random walker, and shortest path optimization algorithms and proposes a new family of segmentation algorithms that fixes p to produce an optimal spanning forest but varies the power q beyond the usual watershed algorithm.
Abstract: In this work, we extend a common framework for graph-based image segmentation that includes the graph cuts, random walker, and shortest path optimization algorithms. Viewing an image as a weighted graph, these algorithms can be expressed by means of a common energy function with differing choices of a parameter q acting as an exponent on the differences between neighboring nodes. Introducing a new parameter p that fixes a power for the edge weights allows us to also include the optimal spanning forest algorithm for watershed in this same framework. We then propose a new family of segmentation algorithms that fixes p to produce an optimal spanning forest but varies the power q beyond the usual watershed algorithm, which we term the power watershed. In particular, when q=2, the power watershed leads to a multilabel, scale and contrast invariant, unique global optimum obtained in practice in quasi-linear time. Placing the watershed algorithm in this energy minimization framework also opens new possibilities for using unary terms in traditional watershed segmentation and using watershed to optimize more general models of use in applications beyond image segmentation.

Journal ArticleDOI
TL;DR: A method to detect co-saliency from an image pair that may have some objects in common is introduced; a normalized single-pair SimRank algorithm is employed to compute the similarity score.
Abstract: In this paper, we introduce a method to detect co-saliency from an image pair that may have some objects in common. The co-saliency is modeled as a linear combination of the single-image saliency map (SISM) and the multi-image saliency map (MISM). The first term is designed to describe the local attention, which is computed by using three saliency detection techniques available in literature. To compute the MISM, a co-multilayer graph is constructed by dividing the image pair into a spatial pyramid representation. Each node in the graph is described by two types of visual descriptors, which are extracted from a representation of some aspects of local appearance, e.g., color and texture properties. In order to evaluate the similarity between two nodes, we employ a normalized single-pair SimRank algorithm to compute the similarity score. Experimental evaluation on a number of image pairs demonstrates the good performance of the proposed method on the co-saliency detection task.

Posted Content
01 Jul 2011
TL;DR: This paper uses an RGBD sensor as the input sensor and presents learning algorithms to infer the activities of a person based on a hierarchical maximum entropy Markov model (MEMM); the two-layered graph structure is inferred using a dynamic programming approach.
Abstract: Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use an RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as based on image and pointcloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, an office, etc., and achieve good performance even when the person was not seen before in the training set.

Book ChapterDOI
Charu C. Aggarwal1
01 Jan 2011
TL;DR: This book provides a data-centric view of online social networks, a topic which has been missing from much of the literature.
Abstract: The advent of online social networks has been one of the most exciting events in this decade. Many popular online social networks such as Twitter, LinkedIn, and Facebook have become increasingly popular. In addition, a number of multimedia networks such as Flickr have also seen an increasing level of popularity in recent years. Many such social networks are extremely rich in content, and they typically contain a tremendous amount of content and linkage data which can be leveraged for analysis. The linkage data is essentially the graph structure of the social network and the communications between entities; whereas the content data contains the text, images and other multimedia data in the network. The richness of this network provides unprecedented opportunities for data analytics in the context of social networks. This book provides a data-centric view of online social networks; a topic which has been missing from much of the literature. This chapter provides an overview of the key topics in this field, and their coverage in this book.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper presents a new, polynomial-time MWIS algorithm, proves that it converges to an optimum, and demonstrates the advantages of simultaneously accounting for soft and hard contextual constraints in multitarget tracking.
Abstract: This paper addresses the problem of simultaneous tracking of multiple targets in a video. We first apply object detectors to every video frame. Pairs of detection responses from every two consecutive frames are then used to build a graph of tracklets. The graph helps transitively link the best matching tracklets that do not violate hard and soft contextual constraints between the resulting tracks. We prove that this data association problem can be formulated as finding the maximum-weight independent set (MWIS) of the graph. We present a new, polynomial-time MWIS algorithm, and prove that it converges to an optimum. Similarity and contextual constraints between object detections, used for data association, are learned online from object appearance and motion properties. Long-term occlusions are addressed by iteratively repeating MWIS to hierarchically merge smaller tracks into longer ones. Our results demonstrate advantages of simultaneously accounting for soft and hard contextual constraints in multitarget tracking. We outperform the state of the art on the benchmark datasets.

Proceedings ArticleDOI
24 Jan 2011
TL;DR: An open-source toolbox for drawing large-scale undirected graphs is documented; it is based on a previously implemented closed-source algorithm known as VxOrd, which is extended by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation.
Abstract: We document an open-source toolbox for drawing large-scale undirected graphs. This toolbox is based on a previously implemented closed-source algorithm known as VxOrd. Our toolbox, which we call OpenOrd, extends the capabilities of VxOrd to large graph layout by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation. At each level, vertices are grouped using force-directed layout and average-link clustering. The clustered vertices are then re-drawn and the process is repeated. When a suitable drawing of the coarsened graph is obtained, the algorithm is reversed to obtain a drawing of the original graph. This approach results in layouts of large graphs which incorporate both local and global structure. A detailed description of the algorithm is provided in this paper. Examples using datasets with over 600K nodes are given. Code is available at www.cs.sandia.gov/~smartin.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a statistical ranking method called HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications.
Abstract: We propose a technique that we call HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications. We are primarily interested in cardinal data based on scores or ratings, though our methods also give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method exploits the graph Helmholtzian, which is the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We shall study the graph Helmholtzian using combinatorial Hodge theory, which provides a way to unravel ranking information from edge flows. In particular, we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the l2-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained; if this is large, then it indicates that the data does not have a good global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provide information on whether inconsistency in the ranking data arises locally or globally. When applied to statistical ranking problems, Hodge decomposition sheds light on whether a given dataset may be globally ranked in a meaningful way or if the data is inherently inconsistent and thus could not have any reasonable global ranking; in the latter case it provides information on the nature of the inconsistencies. An obvious advantage over the NP-hardness of Kemeny optimization is that HodgeRank may be easily computed via a linear least squares regression. We also discuss connections with well-known ordinal ranking techniques such as Kemeny optimization and Borda count from social choice theory.
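The least-squares core mentioned at the end of the abstract can be written in a few lines: pairwise score differences are treated as an edge flow y on the comparison graph, and the l2-optimal global ranking s solves min ||Bs - y||^2 with B the edge-vertex incidence matrix. The comparisons below are invented, and the curl/harmonic decomposition of the residual is omitted.

```python
# Sketch of the least-squares core of HodgeRank: pairwise score differences are
# edge flows y on a comparison graph, and the l2-optimal global ranking s
# solves min ||B s - y||^2 with B the edge-vertex incidence matrix.
# The comparisons below are invented; residual decomposition is omitted.
import numpy as np

items = ["A", "B", "C", "D"]
# (i, j, y_ij): averaged rating difference "j minus i" observed on edge (i, j)
comparisons = [(0, 1, 1.0), (1, 2, 0.8), (0, 2, 2.1), (2, 3, -0.5)]

B = np.zeros((len(comparisons), len(items)))
y = np.zeros(len(comparisons))
for k, (i, j, diff) in enumerate(comparisons):
    B[k, i], B[k, j], y[k] = -1.0, 1.0, diff

s, *_ = np.linalg.lstsq(B, y, rcond=None)     # global scores, up to an additive constant
s -= s.mean()                                 # fix the gauge
print(dict(zip(items, np.round(s, 2))))
print("inconsistency (residual norm):", round(float(np.linalg.norm(B @ s - y)), 3))
```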

Journal ArticleDOI
TL;DR: This paper introduces LEMON, a generic open-source C++ library providing easy-to-use and efficient implementations of graph and network algorithms and related data structures; benchmarks show that it typically outperforms comparable libraries in efficiency.

Journal ArticleDOI
TL;DR: The method reveals that the null behavior of various correlation properties is different from what was previously believed and is highly sensitive to the particular network considered; it also shows that important structural properties are currently based on incorrect expressions, and provides the exact quantities that should replace them.
Abstract: In order to detect patterns in real networks, randomized graph ensembles that preserve only part of the topology of an observed network are systematically used as fundamental null models. However, the generation of them is still problematic. Existing approaches are either computationally demanding and beyond analytic control or analytically accessible but highly approximate. Here, we propose a solution to this long-standing problem by introducing a fast method that allows one to obtain expectation values and standard deviations of any topological property analytically, for any binary, weighted, directed or undirected network. Remarkably, the time required to obtain the expectation value of any property analytically across the entire graph ensemble is as short as that required to compute the same property using the adjacency matrix of the single original network. Our method reveals that the null behavior of various correlation properties is different from what was believed previously, and is highly sensitive to the particular network considered. Moreover, our approach shows that important structural properties (such as the modularity used in community detection problems) are currently based on incorrect expressions, and provides the exact quantities that should replace them.

Proceedings Article
07 Aug 2011
TL;DR: It is found that searching with jump points can speed up A* by an order of magnitude or more, a significant improvement over the current state of the art.
Abstract: Pathfinding in uniform-cost grid environments is a problem commonly found in application areas such as robotics and video games. The state-of-the-art is dominated by hierarchical pathfinding algorithms which are fast and have small memory overheads but usually return suboptimal paths. In this paper we present a novel search strategy, specific to grids, which is fast, optimal and requires no memory overhead. Our algorithm can be described as a macro operator which identifies and selectively expands only certain nodes in a grid map which we call jump points. Intermediate nodes on a path connecting two jump points are never expanded. We prove that this approach always computes optimal solutions and then undertake a thorough empirical analysis, comparing our method with related works from the literature. We find that searching with jump points can speed up A* by an order of magnitude and more and report significant improvement over the current state of the art.
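For context only, the sketch below is plain A* on a uniform-cost 4-connected grid, i.e. the kind of search that jump point search accelerates; it does not implement the jump-point pruning rules themselves. The toy map is invented.

```python
# Baseline for context, not JPS itself: plain A* on a uniform-cost 4-connected
# grid. Jump point search keeps this search optimal while expanding far fewer
# nodes by skipping over symmetric grid paths. The toy map below is invented.
import heapq

grid = ["....#...",
        "..#.#.#.",
        "..#...#.",
        "......#."]
start, goal = (0, 0), (3, 7)

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)

def h(cell):                                  # Manhattan distance heuristic
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

open_heap = [(h(start), 0, start)]
g = {start: 0}
while open_heap:
    f, cost, cell = heapq.heappop(open_heap)
    if cell == goal:
        print("optimal path cost:", cost)
        break
    if cost > g.get(cell, float("inf")):
        continue                              # stale heap entry
    for nxt in neighbors(cell):
        if cost + 1 < g.get(nxt, float("inf")):
            g[nxt] = cost + 1
            heapq.heappush(open_heap, (cost + 1 + h(nxt), cost + 1, nxt))
```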

Proceedings ArticleDOI
21 Aug 2011
TL;DR: ReFeX (Recursive Feature eXtraction), a novel algorithm that recursively combines local features with neighborhood features and outputs regional features capturing "behavioral" information in large graphs, is proposed.
Abstract: Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is anonymized, how can we use information in one to de-anonymize the other? The key step in all such graph mining tasks is to find effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm that recursively combines local (node-based) features with neighborhood (egonet-based) features; and outputs regional features -- capturing "behavioral" information. We demonstrate how these powerful regional features can be used in within-network and across-network classification and de-anonymization tasks -- without relying on homophily, or the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable and (b) it is effective, capturing regional ("behavioral") information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks like network classification and de-anonymization.
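A stripped-down sketch of the recursive feature construction: start from a local feature (degree) and repeatedly append, for every node, the sum and mean of its neighbors' current features. The egonet features and the pruning of near-duplicate columns described in the paper are omitted, and the karate-club graph is just a convenient stand-in.

```python
# Stripped-down sketch of the ReFeX recursion: start from a local feature
# (degree), then repeatedly append the sum and mean of each node's neighbors'
# current features. Egonet features and feature pruning are omitted.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
nodes = list(G.nodes)
feats = np.array([[G.degree(v)] for v in nodes], dtype=float)   # base local feature

for _level in range(2):                                         # two recursive rounds
    new_cols = []
    for v in nodes:
        nbrs = [nodes.index(u) for u in G.neighbors(v)]
        nbr_feats = feats[nbrs]
        new_cols.append(np.concatenate([nbr_feats.sum(axis=0),
                                        nbr_feats.mean(axis=0)]))
    feats = np.hstack([feats, np.array(new_cols)])

print("feature matrix shape:", feats.shape)    # 1 -> 3 -> 9 columns over two rounds
```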

Proceedings ArticleDOI
06 Jun 2011
TL;DR: The fastest known algorithm for computing approximately maximum s-t flows in a capacitated, undirected graph with n vertices and m edges, presented in this paper, runs in Õ(mn^(1/3) ε^(-11/3)) time.
Abstract: We introduce a new approach to computing an approximately maximum s-t flow in a capacitated, undirected graph. This flow is computed by solving a sequence of electrical flow problems. Each electrical flow is given by the solution of a system of linear equations in a Laplacian matrix, and thus may be approximately computed in nearly-linear time. Using this approach, we develop the fastest known algorithm for computing approximately maximum s-t flows. For a graph having n vertices and m edges, our algorithm computes a (1-ε)-approximately maximum s-t flow in time Õ(mn^(1/3) ε^(-11/3)). A dual version of our approach gives the fastest known algorithm for computing a (1+ε)-approximately minimum s-t cut. It takes Õ(m + n^(4/3) ε^(-16/3)) time. Previously, the best dependence on m and n was achieved by the algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute approximately maximum s-t flows in time Õ(m√n ε^(-1)), and approximately minimum s-t cuts in time Õ(m + n^(3/2) ε^(-3)).
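One building block from the abstract in isolation: an s-t electrical flow is obtained by solving a Laplacian linear system Lv = b with one unit of current injected at s and extracted at t, and edge flows are read off the potential differences. The toy graph is arbitrary, and the outer loop that combines many electrical flows into an approximate maximum flow is omitted.

```python
# One building block from the paper, in isolation: an s-t electrical flow is
# obtained by solving a Laplacian system L v = b (unit current in at s, out at
# t) and reading edge flows off the potential differences. The toy graph is
# arbitrary; the outer loop of the flow algorithm is omitted.
import numpy as np
import networkx as nx

G = nx.Graph([(0, 1), (1, 3), (0, 2), (2, 3), (1, 2)])   # small test graph
s, t = 0, 3

L = nx.laplacian_matrix(G, nodelist=sorted(G)).toarray().astype(float)
b = np.zeros(len(G)); b[s], b[t] = 1.0, -1.0              # inject one unit of current

v = np.linalg.pinv(L) @ b                                 # potentials (L is singular)
for u, w in G.edges:
    print(f"flow on ({u},{w}): {v[u] - v[w]:+.3f}")       # unit-resistance edge flows
```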