Showing papers on "Graph (abstract data type)" published in 2008


Journal ArticleDOI
TL;DR: This work proposes a heuristic method, based on modularity optimization, that is shown to outperform all other known community detection methods in computation time while detecting communities of very good quality, as measured by the so-called modularity.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.

13,519 citations
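
The abstract above judges partition quality by modularity. As a minimal sketch, assuming the standard Newman-Girvan definition for an unweighted, undirected graph, the measure can be computed as below. This is only the quality score, not the authors' optimization heuristic; the function name and the toy graph are illustrative assumptions.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both endpoints inside the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    # Q = sum over communities of (intra-edge fraction) - (expected fraction)
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {n: "a" for n in range(6)}

print(modularity(edges, two_triangles))
print(modularity(edges, one_block))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-based method relies on.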


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.

11,078 citations


Journal ArticleDOI
Wei Ren
TL;DR: This note shows that consensus is reached asymptotically for the first two cases if the undirected interaction graph is connected and for the third case if the directed interaction graph has a directed spanning tree and the gain for velocity matching with the group reference velocity is above a certain bound.
Abstract: This note considers consensus algorithms for double-integrator dynamics. We propose and analyze consensus algorithms for double-integrator dynamics in four cases: 1) with a bounded control input, 2) without relative velocity measurements, 3) with a group reference velocity available to each team member, and 4) with a bounded control input when a group reference state is available to only a subset of the team. We show that consensus is reached asymptotically for the first two cases if the undirected interaction graph is connected. We further show that consensus is reached asymptotically for the third case if the directed interaction graph has a directed spanning tree and the gain for velocity matching with the group reference velocity is above a certain bound. We also show that consensus is reached asymptotically for the fourth case if and only if the group reference state flows directly or indirectly to all of the vehicles in the team.

1,338 citations


Proceedings ArticleDOI
Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma
21 Sep 2008
TL;DR: An approach based on supervised learning is proposed to infer people's motion modes from their GPS logs; it identifies a set of sophisticated features that are more robust to traffic conditions than those used by other researchers.
Abstract: Both recognizing human behavior and understanding a user's mobility from sensor data are critical issues in ubiquitous computing systems. As a kind of user behavior, the transportation modes, such as walking, driving, etc., that a user takes can enrich the user's mobility with informative knowledge and provide pervasive computing systems with more context information. In this paper, we propose an approach based on supervised learning to infer people's motion modes from their GPS logs. The contribution of this work lies in the following two aspects. On one hand, we identify a set of sophisticated features, which are more robust to traffic conditions than those other researchers have used. On the other hand, we propose a graph-based post-processing algorithm to further improve the inference performance. This algorithm considers both the commonsense constraints of the real world and typical user behavior based on location in a probabilistic manner. Using the GPS logs collected by 65 people over a period of 10 months, we evaluated our approach via a set of experiments. As a result, based on the change point-based segmentation method and Decision Tree-based inference model, the new features brought an eight percent improvement in inference accuracy over previous results, and the graph-based post-processing achieved a further four percent enhancement.

1,054 citations


Proceedings ArticleDOI
Kun Liu, Evimaria Terzi
09 Jun 2008
TL;DR: This work formally defines the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations, and devises simple and efficient algorithms for solving this problem.
Abstract: The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, and in its basic form the degree of the nodes, can reveal the identities of individuals. To address this issue, we study a specific graph-anonymization problem. We call a graph k-degree anonymous if for every node v, there exist at least k-1 other nodes in the graph with the same degree as v. This definition of anonymity prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes. We formally define the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations. We devise simple and efficient algorithms for solving this problem. Our algorithms are based on principles related to the realizability of degree sequences. We apply our methods to a large spectrum of synthetic and real datasets and demonstrate their efficiency and practical utility.

819 citations
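
The k-degree anonymity definition in the abstract above translates directly into a check. The sketch below only tests whether a given graph already satisfies the definition; it does not implement the authors' anonymization algorithms. The adjacency-dict representation and the function name are assumptions made for illustration.

```python
from collections import Counter

def is_k_degree_anonymous(adjacency, k):
    """Return True if every degree value that occurs is shared by at least k nodes.

    adjacency: dict mapping each node to the set of its neighbors (undirected graph).
    """
    degree_counts = Counter(len(neighbors) for neighbors in adjacency.values())
    return all(count >= k for count in degree_counts.values())

# Hypothetical examples: a 4-node path has degrees 1, 2, 2, 1;
# a 4-node star has degrees 3, 1, 1, 1.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(is_k_degree_anonymous(path, 2))
print(is_k_degree_anonymous(star, 2))
```

In the star, the hub is the only node of degree 3, so an adversary who knows that degree can re-identify it; this is the situation the definition rules out for k greater than 1.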


Journal ArticleDOI
TL;DR: A revised version of the metareg command, which performs meta-analysis regression (meta-regression) on study-level summary data, is presented, which involves improvements to the estimation methods and the addition of an option to use a permutation test to estimate p-values.
Abstract: We present a revised version of the metareg command, which performs meta-analysis regression (meta-regression) on study-level summary data. The major revisions involve improvements to the estimation methods and the addition of an option to use a permutation test to estimate p-values, including an adjustment for multiple testing. We have also made additions to the output, added an option to produce a graph, and included support for the predict command. Stata 8.0 or above is required.

794 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: This work replaces the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes and presents a novel energy criterion that improves the visual quality of the retargeted images and videos.
Abstract: Video, like images, should support content aware resizing. We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes. In the new formulation, a seam is given by a minimal cut in the graph and we show how to construct a graph such that the resulting cut is a valid seam. That is, the cut is monotonic and connected. In addition, we present a novel energy criterion that improves the visual quality of the retargeted images and videos. The original seam carving operator is focused on removing seams with the least amount of energy, ignoring energy that is introduced into the images and video by applying the operator. To counter this, the new criterion is looking forward in time - removing seams that introduce the least amount of energy into the retargeted result. We show how to encode the improved criterion into graph cuts (for images and video) as well as dynamic programming (for images). We apply our technique to images and videos and present results of various applications.

775 citations


Journal ArticleDOI
TL;DR: A novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood, and can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness.
Abstract: In many practical data mining applications such as text classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms have aroused considerable interest from the data mining and machine learning fields. In recent years, graph-based semi-supervised learning has become one of the most active research areas in the semi-supervised learning community. In this paper, a novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. Our algorithm, named linear neighborhood propagation (LNP), can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness. A theoretical analysis of the properties of LNP is presented in this paper. Furthermore, we also derive an easy way to extend LNP to out-of-sample data. Promising experimental results are presented for synthetic data, digit, and text classification tasks.

720 citations


Proceedings ArticleDOI
21 Apr 2008
TL;DR: It is found that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
Abstract: We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate on a planetary scale the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. We find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.

641 citations


Journal ArticleDOI
TL;DR: The reduction procedure is automatic and fast, has moderate CPU and memory requirements, and compares favorably to other existing techniques; an application to autoignition using a large iso-octane mechanism is presented.

622 citations


Journal ArticleDOI
TL;DR: Graph-theoretic conditions are obtained which address the convergence question for the leaderless version of the widely studied Vicsek consensus problem.
Abstract: This paper presents new graph-theoretic results appropriate for the analysis of a variety of consensus problems cast in dynamically changing environments. The concepts of rooted, strongly rooted, and neighbor-shared are defined, and conditions are derived for compositions of sequences of directed graphs to be of these types. The graph of a stochastic matrix is defined, and it is shown that under certain conditions the graph of a Sarymsakov matrix and a rooted graph are one and the same. As an illustration of the use of the concepts developed in this paper, graph-theoretic conditions are obtained which address the convergence question for the leaderless version of the widely studied Vicsek consensus problem.

Journal ArticleDOI
TL;DR: This work presents a computational model that learns structures of many different forms and discovers which form is best for a given dataset; the approach brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development.
Abstract: Algorithms for finding structure in data have become increasingly important both as tools for scientific data analysis and as models of human learning, yet they suffer from a critical limitation. Scientists discover qualitatively new forms of structure in observed data: For instance, Linnaeus recognized the hierarchical organization of biological species, and Mendeleev recognized the periodic structure of the chemical elements. Analogous insights play a pivotal role in cognitive development: Children discover that object category labels can be organized into hierarchies, friendship networks are organized into cliques, and comparative relations (e.g., "bigger than" or "better than") respect a transitive order. Standard algorithms, however, can only learn structures of a single form that must be specified in advance: For instance, algorithms for hierarchical clustering create tree structures, whereas algorithms for dimensionality-reduction create low-dimensional spaces. Here, we present a computational model that learns structures of many different forms and that discovers which form is best for a given dataset. The model makes probabilistic inferences over a space of graph grammars representing trees, linear orders, multidimensional spaces, rings, dominance hierarchies, cliques, and other forms and successfully discovers the underlying structure of a variety of physical, biological, and social domains. Our approach brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development.

Journal ArticleDOI
TL;DR: This paper introduces a novel algorithm to order markers on a genetic linkage map, based on a simple yet fundamental mathematical property proved under rather general assumptions, and shows that it consistently outperforms the best available methods in the literature.
Abstract: Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap.
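
The abstract above says the marker order can be recovered from the minimum spanning tree of an associated graph. The sketch below illustrates only that general idea and is not the MSTmap implementation: it builds a minimum spanning tree over a complete graph with Prim's algorithm and, when the tree turns out to be a simple path, reads the order off the path. The weight function (absolute difference of made-up map positions) and all names are assumptions; the paper derives its weights from genotyping data.

```python
def minimum_spanning_tree(nodes, weight):
    """Prim's algorithm on a complete graph; returns the tree as an adjacency dict."""
    nodes = list(nodes)
    tree = {n: set() for n in nodes}
    in_tree = {nodes[0]}
    while len(in_tree) < len(nodes):
        # Cheapest edge leaving the current tree.
        u, v = min(
            ((a, b) for a in in_tree for b in nodes if b not in in_tree),
            key=lambda e: weight(*e),
        )
        tree[u].add(v)
        tree[v].add(u)
        in_tree.add(v)
    return tree

def order_from_path(tree):
    """Walk the tree end to end; returns None if the tree is not a simple path."""
    ends = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    if len(ends) != 2 or any(len(nbrs) > 2 for nbrs in tree.values()):
        return None
    order, prev, cur = [], None, min(ends)
    while cur is not None:
        order.append(cur)
        nxt = [n for n in tree[cur] if n != prev]
        prev, cur = cur, (nxt[0] if nxt else None)
    return order

# Hypothetical markers with made-up positions, listed in shuffled order.
positions = {"m3": 9.0, "m1": 0.0, "m4": 11.0, "m2": 4.0}
tree = minimum_spanning_tree(positions, lambda a, b: abs(positions[a] - positions[b]))
print(order_from_path(tree))
```

When pairwise distances are consistent with a linear arrangement, as in this toy input, the spanning tree is a path and the walk returns the markers in map order (or its reverse, depending on which end is chosen as the start).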

Journal ArticleDOI
TL;DR: The Ontologizer allows users to visualize data as a graph including all significantly overrepresented GO terms and to explore the data by linking GO terms to all genes/proteins annotated to the term and by linking individual terms to child terms.
Abstract: Summary: The Ontologizer is a Java application that can be used to perform statistical analysis for overrepresentation of Gene Ontology (GO) terms in sets of genes or proteins derived from an experiment. The Ontologizer implements the standard approach to statistical analysis based on the one-sided Fisher's exact test, the novel parent–child method, as well as topology-based algorithms. A number of multiple-testing correction procedures are provided. The Ontologizer allows users to visualize data as a graph including all significantly overrepresented GO terms and to explore the data by linking GO terms to all genes/proteins annotated to the term and by linking individual terms to child terms. Availability: The Ontologizer application is available under the terms of the GNU GPL. It can be started as a WebStart application from the project homepage, where source code is also provided: http://compbio.charite.de/ontologizer Requirements: Ontologizer requires a Java SE 5.0 compliant Java runtime engine and GraphViz for the optional graph visualization tool. Contact: sebastian.bauer@charite.de; peter.robinson@charite.de

Journal ArticleDOI
TL;DR: A design methodology is proposed to stabilize relative equilibria in a model of identical, steered particles moving in the plane at unit speed, showing how previous results assuming all-to-all communication can be extended to a general communication framework.
Abstract: This paper proposes a design methodology to stabilize relative equilibria in a model of identical, steered particles moving in the plane at unit speed. Relative equilibria either correspond to parallel motion of all particles with fixed relative spacing or to circular motion of all particles around the same circle. Particles exchange relative information according to a communication graph that can be undirected or directed and time-invariant or time-varying. The emphasis of this paper is to show how previous results assuming all-to-all communication can be extended to a general communication framework.

Journal ArticleDOI
TL;DR: In this article, the stability of a network partition is defined in terms of the statistical properties of a dynamical process taking place on the graph, and the connection between community detection and Laplacian dynamics enables them to establish dynamically motivated stability measures linked to distinct null models.
Abstract: Most methods proposed to uncover communities in complex networks rely on their structural properties. Here we introduce the stability of a network partition, a measure of its quality defined in terms of the statistical properties of a dynamical process taking place on the graph. The time-scale of the process acts as an intrinsic parameter that uncovers community structures at different resolutions. The stability extends and unifies standard notions for community detection: modularity and spectral partitioning can be seen as limiting cases of our dynamic measure. Similarly, recently proposed multi-resolution methods correspond to linearisations of the stability at short times. The connection between community detection and Laplacian dynamics enables us to establish dynamically motivated stability measures linked to distinct null models. We apply our method to find multi-scale partitions for different networks and show that the stability can be computed

Proceedings ArticleDOI
09 Jun 2008
TL;DR: A graph algebra extended from the relational algebra is presented, in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs; access methods for the selection operator are also investigated.
Abstract: With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.

Proceedings ArticleDOI
15 Dec 2008
TL;DR: This paper constructs an affinity graph to encode the geometrical information and seeks a matrix factorization which respects the graph structure, demonstrating the success of this novel algorithm by applying it to real world problems.
Abstract: Recently non-negative matrix factorization (NMF) has received a lot of attention in information retrieval, computer vision and pattern recognition. NMF aims to find two non-negative matrices whose product can well approximate the original matrix. The sizes of these two matrices are usually smaller than the original matrix. This results in a compressed version of the original data matrix. The solution of NMF yields a natural parts-based representation for the data. When NMF is applied for data representation, a major disadvantage is that it fails to consider the geometric structure in the data. In this paper, we develop a graph based approach for parts-based data representation in order to overcome this limitation. We construct an affinity graph to encode the geometrical information and seek a matrix factorization which respects the graph structure. We demonstrate the success of this novel algorithm by applying it to real world problems.

Proceedings ArticleDOI
21 Apr 2008
TL;DR: A novel solution to the problem of topic modeling with network structure (TMN) is proposed, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data.
Abstract: In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method bridges topic modeling and social network analysis, which leverages the power of both statistical topic models and discrete regularization. The output of this model well summarizes topics in text, maps a topic on the network, and discovers topical communities. With concrete selection of a topic model and a graph-based regularizer, our model can be applied to text mining problems such as author-topic analysis, community discovery, and spatial text mining. Empirical experiments on two different genres of data show that our approach is effective, which improves text-oriented methods as well as network-oriented methods. The proposed model is general; it can be applied to any text collections with a mixture of topics and an associated network structure.

Journal ArticleDOI
TL;DR: In this paper, a necessary and sufficient condition for almost sure asymptotic consensus using simple ergodicity and probabilistic arguments is presented. This easily verifiable condition uses the spectrum of the average weight matrix.
Abstract: We consider the consensus problem for stochastic discrete-time linear dynamical systems. The underlying graph of such systems at a given time instance is derived from a random graph process, independent of other time instances. For such a framework, we present a necessary and sufficient condition for almost sure asymptotic consensus using simple ergodicity and probabilistic arguments. This easily verifiable condition uses the spectrum of the average weight matrix. Finally, we investigate a special case for which the linear dynamical system converges to a fixed vector with probability 1.

Journal ArticleDOI
TL;DR: This approach provides a generalization of threshold classical secret sharing via insecure quantum channels beyond the current requirement of 100% collaboration by players to just a simple majority in the case of five players.
Abstract: We consider three broad classes of quantum secret sharing with and without eavesdropping and show how a graph state formalism unifies otherwise disparate quantum secret sharing models. In addition to the elegant unification provided by graph states, our approach provides a generalization of threshold classical secret sharing via insecure quantum channels beyond the current requirement of 100% collaboration by players to just a simple majority in the case of five players. Another innovation here is the introduction of embedded protocols within a larger graph state that serves as a one-way quantum-information processing system.

Journal ArticleDOI
TL;DR: This annotated bibliography gives an elementary classification of problems and results related to graph searching and provides a source of bibliographical references on this field.

Proceedings ArticleDOI
09 Jun 2008
TL;DR: This is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
Abstract: We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pairs of nodes in the two sets. On the other hand, the corrections portion specifies the list of edge-corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple real-life graph data sets. To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
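
The two-part representation described above (a summary plus edge corrections) can be illustrated by the decoding direction, which the abstract specifies fully: expand each summary edge into all node pairs, then apply the corrections. The sketch below assumes a lossless setting; the data layout (dicts, `"+"`/`"-"` tags) and the function name are illustrative assumptions, and the MDL-based construction of the summary is not shown.

```python
from itertools import combinations, product

def expand(supernodes, superedges, corrections):
    """Rebuild the edge set of G from a summary and a list of corrections.

    supernodes: dict mapping a supernode label to a set of original nodes.
    superedges: iterable of (label_a, label_b); a pair with equal labels
                stands for all edges inside that supernode.
    corrections: list of (sign, (u, v)) with sign "+" (add) or "-" (remove).
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            pairs = product(supernodes[a], supernodes[b])
        edges.update(frozenset(p) for p in pairs)
    for sign, (u, v) in corrections:
        if sign == "+":
            edges.add(frozenset((u, v)))
        else:
            edges.discard(frozenset((u, v)))
    return edges

# Hypothetical example: X = {1, 2} is almost fully connected to Y = {3, 4, 5}.
supernodes = {"X": {1, 2}, "Y": {3, 4, 5}}
superedges = [("X", "Y")]
corrections = [("-", (2, 5)), ("+", (3, 4))]
graph = expand(supernodes, superedges, corrections)
print(sorted(tuple(sorted(e)) for e in graph))
```

Here one summary edge and two corrections stand in for six original edges, which is the source of the compression.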

Proceedings ArticleDOI
21 Apr 2008
TL;DR: This paper formalizes the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data and defines and analyzes the characteristics of heuristics for selectivity-based static BGP optimization.
Abstract: In this paper, we formalize the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data. We define and analyze the characteristics of heuristics for selectivity-based static BGP optimization. The heuristics range from simple triple pattern variable counting to more sophisticated selectivity estimation techniques. Customized summary statistics for RDF data enable the selectivity estimation of joined triple patterns and the development of efficient heuristics. Using the Lehigh University Benchmark (LUBM), we evaluate the performance of the heuristics for the queries provided by the LUBM and discuss some of them in more detail.
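
The simplest heuristic the abstract mentions, triple pattern variable counting, can be sketched as a static reordering step. The version below is a deliberately simplified assumption: it ranks patterns only by how many of their three positions are variables, treating fewer variables as more selective, and it ignores any position-dependent weighting the paper may use. The `ex:` terms and function names are hypothetical.

```python
def is_variable(term):
    """SPARQL-style variables are written with a leading question mark."""
    return term.startswith("?")

def order_by_variable_count(patterns):
    """Sort triple patterns so that those with fewer variables are evaluated first.

    sorted() is stable, so patterns with equal counts keep their original order.
    """
    return sorted(patterns, key=lambda p: sum(is_variable(t) for t in p))

# Hypothetical basic graph pattern with three triple patterns.
bgp = [
    ("?x", "?p", "?y"),
    ("?x", "rdf:type", "ex:Student"),
    ("?x", "ex:advisor", "?y"),
]
for pattern in order_by_variable_count(bgp):
    print(pattern)
```

Evaluating the most constrained pattern first tends to keep intermediate results small, which is the motivation for selectivity-based ordering in general.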

Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper introduces the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior, and proposes a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users.
Abstract: Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search engine results. Mining the wealth of information available in the query logs has many important applications including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query q_i to query q_j means that the two queries are likely to be part of the same "search mission". Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.

Proceedings Article
20 Jan 2008
TL;DR: The main result says that the problem of minimizing the size of S, while ensuring that targeting S would influence the whole network into adopting the product, is hard to approximate within a polylogarithmic factor.
Abstract: In this paper, we study the spread of influence through a social network, in a model initially studied by Kempe, Kleinberg and Tardos [14, 15]: We are given a graph modeling a social network, where each node v has a (fixed) threshold t_v, such that the node will adopt a new product if t_v of its neighbors adopt it. Our goal is to find a small set S of nodes such that targeting the product to S would lead to adoption of the product by a large number of nodes in the graph. We show strong inapproximability results for several variants of this problem. Our main result says that the problem of minimizing the size of S, while ensuring that targeting S would influence the whole network into adopting the product, is hard to approximate within a polylogarithmic factor. This implies similar results if only a fixed fraction of the network is ensured to adopt the product. Further, the hardness of approximation result continues to hold when all nodes have majority thresholds, or have constant degree and threshold two. The latter answers a complexity question proposed in [10, 29]. We also give some positive results for more restricted cases, such as when the underlying graph is a tree.
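
The adoption rule in the abstract above (a node adopts once enough of its neighbors have adopted, with a fixed per-node threshold) can be simulated directly for a given target set S. The sketch below only evaluates the spread from a chosen S; choosing a small S is the hard problem the paper analyzes and is not attempted here. The adjacency-dict layout and names are assumptions.

```python
def spread(adjacency, thresholds, seeds):
    """Return the set of nodes that eventually adopt, starting from the seed set.

    adjacency: dict mapping each node to the set of its neighbors.
    thresholds: dict mapping each node to the number of adopting neighbors it needs.
    seeds: iterable of initially targeted nodes.
    """
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can adopt
        changed = False
        for node, neighbors in adjacency.items():
            if node not in active and len(neighbors & active) >= thresholds[node]:
                active.add(node)
                changed = True
    return active

# Hypothetical 4-node path 0-1-2-3 in which node 1 needs two adopting neighbors.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
thresholds = {0: 1, 1: 2, 2: 1, 3: 1}

print(sorted(spread(path, thresholds, {0})))
print(sorted(spread(path, thresholds, {0, 2})))
```

Because adoption is never undone, the final set does not depend on the order in which nodes are examined; targeting node 0 alone stalls at node 1, while targeting nodes 0 and 2 reaches the whole path.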

Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper gives the first comprehensive study of a general mining method that aims to find the most significant patterns directly; graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract: With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which could fail all of frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study on general mining method aiming to find most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

Proceedings ArticleDOI
Periklis Akritidis, Cristian Cadar, Costin Raiciu, Manuel Costa, Miguel Castro
18 May 2008
TL;DR: This work presents write integrity testing (WIT), a new technique that provides practical protection from attacks that exploit memory errors: it compiles C and C++ programs without modifications, has high coverage with no false positives, and has low overhead.
Abstract: Attacks often exploit memory errors to gain control over the execution of vulnerable programs. These attacks remain a serious problem despite previous research on techniques to prevent them. We present write integrity testing (WIT), a new technique that provides practical protection from these attacks. WIT uses points-to analysis at compile time to compute the control-flow graph and the set of objects that can be written by each instruction in the program. Then it generates code instrumented to prevent instructions from modifying objects that are not in the set computed by the static analysis, and to ensure that indirect control transfers are allowed by the control-flow graph. To improve coverage where the analysis is not precise enough, WIT inserts small guards between the original program objects. We describe an efficient implementation with optimizations to reduce space and time overhead. This implementation can be used in practice because it compiles C and C++ programs without modifications, it has high coverage with no false positives, and it has low overhead. WIT's average runtime overhead is only 7% across a set of CPU-intensive benchmarks and it is negligible when I/O is the bottleneck.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A framework that fuses multiple features for improved action recognition in videos by treating different features as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities.
Abstract: In this paper, we propose a framework that fuses multiple features for improved action recognition in videos. The fusion of multiple features is important for recognizing actions because a single-feature representation is often not enough to capture the imaging variations (viewpoint, illumination, etc.) and attributes of individuals (size, age, gender, etc.). Hence, we use two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) a quantized vocabulary of spin-images, which aims to capture the shape deformation of the actor by considering actions as 3D objects (x, y, t). To optimally combine these features, we treat different features as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities. The graph is then embedded into a k-dimensional space subject to the criterion that similar nodes have Euclidean coordinates that are closer to each other. This is achieved by converting this constraint into a minimization problem whose solution is the eigenvectors of the graph Laplacian matrix. This procedure is known as Fiedler embedding. The performance of the proposed framework is tested on publicly available data sets. The results demonstrate that fusion of multiple features helps in achieving improved performance, and allows retrieval of meaningful features and videos from the embedding space.

Proceedings ArticleDOI
07 Apr 2008
TL;DR: This paper proposes a novel indexing method that incorporates graph structural information in a hybrid index structure; the method achieves high pruning power, and the index size scales linearly with the database size.
Abstract: Large graph datasets are common in many emerging database applications, and most notably in large-scale scientific applications. To fully exploit the wealth of information encoded in graphs, effective and efficient graph matching tools are critical. Due to the noisy and incomplete nature of real graph datasets, approximate, rather than exact, graph matching is required. Furthermore, many modern applications need to query large graphs, each of which has hundreds to thousands of nodes and edges. This paper presents a novel technique for approximate matching of large graph queries. We propose a novel indexing method that incorporates graph structural information in a hybrid index structure. This indexing technique achieves high pruning power and the index size scales linearly with the database size. In addition, we propose an innovative matching paradigm to query large graphs. This technique distinguishes nodes by their importance in the graph structure. The matching algorithm first matches the important nodes of a query and then progressively extends these matches. Through experiments on several real datasets, this paper demonstrates the effectiveness and efficiency of the proposed method.