Showing papers on "Graph database" published in 2005

Proceedings Article•

Bidirectional expansion for keyword search on graph databases

Varun Kacholia¹, Shashank Pandit¹, Soumen Chakrabarti¹, Sundararajarao Sudarshan¹, Rushi Desai¹, Hrishikesh Karambelkar¹•Institutions (1)

Indian Institute of Technology Bombay¹

30 Aug 2005

TL;DR: This paper proposes a new search algorithm, Bidirectional Search, which improves on Backward Expanding search by allowing forward search from potential roots towards leaves, and devises a novel search frontier prioritization technique based on spreading activation.

Abstract: Relational, XML and HTML data can be represented as graphs with entities as nodes and relationships as edges. Text is associated with nodes and possibly edges. Keyword search on such graphs has received much attention lately. A central problem in this scenario is to efficiently extract from the data graph a small number of the "best" answer trees. A Backward Expanding search, starting at nodes matching keywords and working up toward confluent roots, is commonly used for predominantly text-driven queries. But it can perform poorly if some keywords match many nodes, or some node has very large degree. In this paper we propose a new search algorithm, Bidirectional Search, which improves on Backward Expanding search by allowing forward search from potential roots towards leaves. To exploit this flexibility, we devise a novel search frontier prioritization technique based on spreading activation. We present a performance study on real data, establishing that Bidirectional Search significantly outperforms Backward Expanding search.

545 citations
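
The Backward Expanding baseline described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: it assumes unweighted edges, uses plain breadth-first search instead of any prioritized frontier, and omits the forward expansion and spreading activation that Bidirectional Search adds. The function name, the toy graph and the keyword mapping are all hypothetical.

```python
from collections import deque


def backward_expanding_search(edges, keyword_nodes, top_k=3):
    """Return up to top_k (root, total_hops) pairs, best first.

    edges: iterable of directed (source, target) pairs.
    keyword_nodes: dict mapping each keyword to the set of nodes matching it.
    """
    # Reverse adjacency: following it from a keyword node moves "up" toward roots.
    reverse_adj = {}
    for src, dst in edges:
        reverse_adj.setdefault(dst, []).append(src)

    # One multi-source BFS per keyword, recording the hop count from every
    # reachable node down to its nearest node matching that keyword.
    distances = {}
    for keyword, starts in keyword_nodes.items():
        dist = {node: 0 for node in starts}
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            for parent in reverse_adj.get(node, []):
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    queue.append(parent)
        distances[keyword] = dist

    if not distances:
        return []

    # A node reached from every keyword is a confluent root of an answer tree.
    roots = set.intersection(*(set(d) for d in distances.values()))
    scored = [(root, sum(d[root] for d in distances.values())) for root in roots]
    scored.sort(key=lambda pair: (pair[1], str(pair[0])))
    return scored[:top_k]


# Hypothetical toy graph: papers point to their authors.
edges = [("paper1", "author_a"), ("paper1", "author_b"), ("paper2", "author_a")]
keyword_nodes = {"kacholia": {"author_a"}, "sudarshan": {"author_b"}}
print(backward_expanding_search(edges, keyword_nodes))  # [('paper1', 2)]
```

In this toy graph only paper1 is reachable backward from both keyword nodes, so it is the single answer root. The weakness the abstract mentions is visible in the structure of the loop: a keyword matching many nodes, or a node with very large in-degree, makes the corresponding BFS frontier large.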

Journal Article•DOI•

Video summarization and scene detection by graph modeling

Chong-Wah Ngo¹, Yu-Fei Ma², Hong-Jiang Zhang²•Institutions (2)

City University of Hong Kong¹, Microsoft²

01 Feb 2005-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: In this application, video summaries that emphasize both content balance and perceptual quality can be generated directly from a temporal graph that embeds both the structure and attention information.

Abstract: We propose a unified approach for video summarization based on the analysis of video structures and video highlights. Two major components in our approach are scene modeling and highlight detection. Scene modeling is achieved by normalized cut algorithm and temporal graph analysis, while highlight detection is accomplished by motion attention modeling. In our proposed approach, a video is represented as a complete undirected graph and the normalized cut algorithm is carried out to globally and optimally partition the graph into video clusters. The resulting clusters form a directed temporal graph and a shortest path algorithm is proposed to efficiently detect video scenes. The attention values are then computed and attached to the scenes, clusters, shots, and subshots in a temporal graph. As a result, the temporal graph can inherently describe the evolution and perceptual importance of a video. In our application, video summaries that emphasize both content balance and perceptual quality can be generated directly from a temporal graph that embeds both the structure and attention information.

366 citations

Proceedings Article•DOI•

Xifeng Yan¹, Philip S. Yu², Jiawei Han¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, IBM²

14 Jun 2005

TL;DR: This paper investigates the issues of substructure similarity search using indexed features in graph databases, and develops a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features.

Abstract: Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. The most fundamental support needed in these applications is the efficient search of complex structured data. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. In this paper, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed missing features, our structural filtering algorithm, called Grafil, can filter many graphs without performing pairwise similarity computations. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy for filtering. By examining the effect of different feature selection mechanisms, we develop a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features. We identify the criteria to form effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly. Moreover, the concept presented in Grafil can be applied to searching approximate non-consecutive sequences, trees, and other complicated structures as well.

347 citations
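
The filtering idea in the abstract above, pruning graphs by counting missing features rather than computing pairwise similarity, might look roughly like this. It is a sketch under stated assumptions: the bound `max_missing` is taken as a given input (the paper derives it from the edge relaxation ratio, and that derivation is not reproduced here), features are represented as simple labelled counts, and all names and data are hypothetical.

```python
from collections import Counter


def feature_misses(query_features, graph_features):
    """Count query feature occurrences that the candidate graph lacks."""
    return sum(
        max(0, count - graph_features.get(feature, 0))
        for feature, count in query_features.items()
    )


def filter_candidates(query_features, database, max_missing):
    """Keep only graphs whose feature misses do not exceed max_missing.

    Graphs that pass still need a real similarity check; graphs that fail
    are pruned without any pairwise computation.
    """
    return [
        graph_id
        for graph_id, features in database.items()
        if feature_misses(query_features, features) <= max_missing
    ]


# Hypothetical feature counts (for example, occurrences of small substructures).
query = Counter({"C-C": 3, "C-O": 1, "C-N": 1})
database = {
    "g1": Counter({"C-C": 3, "C-O": 1}),            # misses 1 feature occurrence
    "g2": Counter({"C-C": 1}),                      # misses 4
    "g3": Counter({"C-C": 4, "C-O": 2, "C-N": 1}),  # misses 0
}
print(filter_candidates(query, database, max_missing=1))  # ['g1', 'g3']
```

The multi-filter composition the abstract describes would apply several such filters, each over a different subset of features with its own bound, and keep only graphs that pass all of them.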

Proceedings Article•DOI•

Robust Textual Inference via Graph Matching

Aria Haghighi¹, Andrew Y. Ng¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

06 Oct 2005

TL;DR: A learned graph matching approach to approximate entailment using the amount of the sentence's semantic content which is contained in the text, which outperforms Bag-Of-Words and TF-IDF models.

Abstract: We present a system for deciding whether a given sentence can be inferred from text. Each sentence is represented as a directed graph (extracted from a dependency parser) in which the nodes represent words or phrases, and the links represent syntactic and semantic relationships. We develop a learned graph matching approach to approximate entailment using the amount of the sentence's semantic content which is contained in the text. We present results on the Recognizing Textual Entailment dataset (Dagan et al., 2005), and show that our approach outperforms Bag-Of-Words and TF-IDF models. In addition, we explore common sources of errors in our approach and how to remedy them.

174 citations

Book Chapter•DOI•

Querying RDF data from a graph database perspective

Renzo Angles¹, Claudio Gutierrez¹•Institutions (1)

University of Chile¹

29 May 2005

TL;DR: This paper studies the RDF model from a database perspective, focuses on query languages, analyzes current RDF trends, and proposes the incorporation to RDF query languages of primitives which are not present today, based on the experience and techniques of graph database research.

Abstract: This paper studies the RDF model from a database perspective. From this point of view it is compared with other database models, particularly with graph database models, which are very close in motivations and use cases to RDF. We concentrate on query languages, analyze current RDF trends, and propose the incorporation to RDF query languages of primitives which are not present today, based on the experience and techniques of graph database research.

172 citations

Journal Article•DOI•

Exact and approximate graph matching using random walks

Marco Gori¹, Marco Maggini¹, Lorenzo Sarti¹•Institutions (1)

University of Siena¹

01 Jul 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A general framework for graph matching which is suitable for different problems of pattern recognition, and is very well-suited for dealing with partial and approximate graph matching problems, derived for instance from image retrieval tasks.

Abstract: In this paper, we propose a general framework for graph matching which is suitable for different problems of pattern recognition. The pattern representation we assume is at the same time highly structured, like for classic syntactic and structural approaches, and of subsymbolic nature with real-valued features, like for connectionist and statistic approaches. We show that random walk based models, inspired by Google's PageRank, give rise to a spectral theory that nicely enhances the graph topological features at node level. As a straightforward consequence, we derive a polynomial algorithm for the classic graph isomorphism problem, under the restriction of dealing with Markovian spectrally distinguishable graphs (MSD), a class of graphs that does not seem to be easily reducible to others proposed in the literature. The experimental results that we found on different test-beds of the TC-15 graph database show that the defined MSD class "almost always" covers the database, and that the proposed algorithm is significantly more efficient than top scoring VF algorithm on the same data. Most interestingly, the proposed approach is very well-suited for dealing with partial and approximate graph matching problems, derived for instance from image retrieval tasks. We consider the objects of the COIL-100 visual collection and provide a graph-based representation, whose node's labels contain appropriate visual features. We show that the adoption of classic bipartite graph matching algorithms offers a straightforward generalization of the algorithm given for graph isomorphism and, finally, we report very promising experimental results on the COIL-100 visual collection.

166 citations

Book Chapter•DOI•

A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston

Marc Wörlein¹, Thorsten Meinl¹, Ingrid Fischer¹, Michael Philippsen¹•Institutions (1)

University of Erlangen-Nuremberg¹

03 Oct 2005

TL;DR: This paper has re-implemented the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base and with the same level of programming expertise and optimization effort.

Abstract: Several new miners for frequent subgraphs have been published recently. Whereas new approaches are presented in detail, the quantitative evaluations are often of limited value: only the performance on a small set of graph databases is discussed and the new algorithm is often only compared to a single competitor based on an executable. It remains unclear, how the algorithms work on bigger/other graph databases and which of their distinctive features is best suited for which database. We have re-implemented the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base and with the same level of programming expertise and optimization effort. This paper presents the results of a comparative benchmarking that ran the algorithms on a comprehensive set of graph databases.

154 citations

GMO: A Graph Matching for Ontologies.

Wei Hu, Ningsheng Jian, Yuzhong Qu, Yanbing Wang¹•Institutions (1)

Southeast University¹

01 Jan 2005

TL;DR: GMO uses bipartite graphs to represent ontologies, and measures the structural similarity between graphs by a new measurement, and can take a set of matched pairs, which are typically previously found by other approaches, as external input in matching process.

Abstract: Ontology matching is an important task to achieve interoperation between semantic web applications using different ontologies. Structural similarity plays a central role in ontology matching. However, the existing approaches rely heavily on lexical similarity, and they mix up lexical similarity with structural similarity. In this paper, we present a graph matching approach for ontologies, called GMO. It uses bipartite graphs to represent ontologies, and measures the structural similarity between graphs by a new measurement. Furthermore, GMO can take a set of matched pairs, which are typically previously found by other approaches, as external input in matching process. Our implementation and experimental results are given to demonstrate the effectiveness of the graph matching approach.

141 citations

Journal Article•DOI•

Graph indexing based on discriminative frequent structure analysis

Xifeng Yan¹, Philip S. Yu², Jiawei Han¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, IBM²

01 Dec 2005

TL;DR: This article proposes a novel indexing model based on discriminative frequent structures that are identified through a graph mining process and shows that the compact index built under this model can achieve better performance in processing graph queries.

Abstract: Graphs have become increasingly important in modelling complicated structures and schemaless data such as chemical compounds, proteins, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via indices. In this article, we investigate the issues of indexing graphs and propose a novel indexing model based on discriminative frequent structures that are identified through a graph mining process. We show that the compact index built under this model can achieve better performance in processing graph queries. Since discriminative frequent structures capture the intrinsic characteristics of the data, they are relatively stable to database updates, thus facilitating sampling-based feature extraction and incremental index maintenance. Our approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be generalized and applied to indexing sequences, trees, and other complicated structures as well.

107 citations

Journal Article•DOI•

A query language for biological networks

Ulf Leser¹•Institutions (1)

Humboldt University of Berlin¹

15 Jan 2005-Bioinformatics

TL;DR: The pathway query language (PQL) for querying large protein interaction or pathway databases is designed and implemented, based on a simple graph data model with extensions reflecting properties of biological objects.

Abstract: Motivation: Many areas of modern biology are concerned with the management, storage, visualization, comparison and analysis of networks, but no appropriate query language for such complex data structures yet exists. Results: We have designed and implemented the pathway query language (PQL) for querying large protein interaction or pathway databases. PQL is based on a simple graph data model with extensions reflecting properties of biological objects. Queries match subgraphs in the database based on node properties and paths between nodes. The syntax is easy to learn for anybody familiar with SQL. As an important feature, a query may require a certain structure in the database to exist but return a different subgraph. We have tested PQL queries on networks of up to 16 000 nodes and found it to scale very well. Availability: The code is available on request from the author. Contact: leser@informatik.hu-berlin.de

104 citations

Proceedings Article•DOI•

SECONDO: an extensible DBMS platform for research prototyping and teaching

Ralf Hartmut Güting¹, Victor Teixeira de Almeida¹, D. Ansorge¹, Thomas Behr¹, Z. Ding¹, T. Hose¹, F. Hoffmann¹, M. Spiekermann¹, U. Telle¹•Institutions (1)

University of Hagen¹

05 Apr 2005

TL;DR: The goal of SECONDO is to provide a "generic" database system frame that can be filled with implementations of various DBMS data models, and it is believed to be an excellent tool for teaching database architecture and implementation concepts.

Abstract: The goal of SECONDO is to provide a "generic" database system frame that can be filled with implementations of various DBMS data models. SECONDO was intended originally as a platform for implementing and experimenting with new kinds of data models, especially to support spatial, spatio-temporal, and graph database models. We now feel that SECONDO has a clean architecture and strikes a reasonable balance between simplicity and sophistication. Since all the source code is accessible and to a large extent comprehensible for students, we believe it is also an excellent tool for teaching database architecture and implementation concepts. SECONDO runs on Windows, Linux, and Solaris platforms, and consists of three major components: the SECONDO kernel, the optimizer, and the graphical user interface.

Proceedings Article•DOI•

STRG-Index: spatio-temporal region graph indexing for large video databases

Jeongkyu Lee¹, JungHwan Oh¹, Sae Hwang¹•Institutions (1)

University of Texas at Arlington¹

14 Jun 2005

TL;DR: A new graph-based data structure called Spatio-Temporal Region Graph (STRG) is proposed, which provides temporal features, which represent temporal relationships among spatial objects, and a new indexing method STRG-Index, which is faster and more accurate since it uses tree structure and clustering algorithm.

Abstract: In this paper, we propose a new graph-based data structure and indexing method to organize and retrieve video data. Several studies have shown that a graph can be a better candidate for modeling semantically rich and complicated multimedia data. However, there are few methods that consider the temporal feature of video data, which is a distinguishable and representative characteristic when compared with other multimedia (e.g., images). In order to consider the temporal feature effectively and efficiently, we propose a new graph-based data structure called Spatio-Temporal Region Graph (STRG). Unlike existing graph-based data structures which provide only spatial features, the proposed STRG further provides temporal features, which represent temporal relationships among spatial objects. The STRG is decomposed into its subgraphs in which redundant subgraphs are eliminated to reduce the index size and search time, because the computational complexity of graph matching (subgraph isomorphism) is NP-complete. In addition, a new distance measure, called Extended Graph Edit Distance (EGED), is introduced in both non-metric and metric spaces for matching and indexing respectively. Based on STRG and EGED, we propose a new indexing method STRG-Index, which is faster and more accurate since it uses a tree structure and a clustering algorithm. We compare the STRG-Index with the M-tree, which is a popular tree-based indexing method for multimedia data. The STRG-Index outperforms the M-tree for various query loads in terms of cost and speed.

Building a Multi-Scale Database with Scale-Transition Relationships

Thomas Devogele¹, Jenny Trevisan¹, Laurent Raynal¹•Institutions (1)

Institut géographique national¹

01 Jan 2005

TL;DR: This work has chosen to connect geographic data from mono-scale representations to build a multi-scale database with scale-transition relationships, which connect two sets of elements representing the same phenomenon of the real world and carry the sequence of multi-scale operations to navigate from one representation to another.

Abstract: Building multiple representations is one of the key problems in GIS. To tackle this problem, we have chosen to connect geographic data from mono-scale representations to build a multi-scale database with scale-transition relationships. These scale-transition relationships connect two sets of elements (classes, types or objects) representing the same phenomenon of the real world and carry the sequence of multi-scale operations to navigate from one representation to another. From this concept, a process has been defined to build multi-scale databases, in three steps. The first step is dedicated to the declaration of correspondences and conflicts between input schemata by the means of scale-transition relationships. In the second step, conflicts are resolved and schemata are merged. Finally, the third step corresponds to data matching, with the help of geometric, topologic and semantic information. Scale-transition relationships between objects are created during this last step. To validate the process, a multi-scale database has been produced from two existing mono-scale sets of road network data. The first results of this kernel are satisfactory.

Proceedings Article•DOI•

GraphMiner: a structural pattern-mining system for large disk-based graph databases and its applications

Wei Wang¹, Chen Wang¹, Yongtai Zhu¹, Baile Shi¹, Jian Pei², Xifeng Yan³, Jiawei Han³•Institutions (3)

Fudan University¹, Simon Fraser University², University of Illinois at Urbana–Champaign³

14 Jun 2005

TL;DR: A demo of GraphMiner is described which showcases the technical details of the index structure and the mining algorithms including their efficient implementation, the mining performance and the comparison with some state-of-the-art methods.

Abstract: Mining frequent structural patterns from graph databases is an important research problem with broad applications. Recently, we developed an effective index structure, ADI, and efficient algorithms for mining frequent patterns from large, disk-based graph databases [5], as well as constraint-based mining techniques. The techniques have been integrated into a research prototype system, GraphMiner. In this paper, we describe a demo of GraphMiner which showcases the technical details of the index structure and the mining algorithms including their efficient implementation, the mining performance and the comparison with some state-of-the-art methods, the constraint-based graph-pattern mining techniques and the procedure of constrained graph mining, as well as mining real data sets in novel applications.

Patent•

Object process graph system

Steven Allen Gold, David Marvin Baker, Vladimir Gusev, Hongping Liang

27 May 2005

TL;DR: In this paper, a software system is provided including an Object Process Graph for defining applications and a Dynamic Graph Interpreter that interprets Object Process Graphs, making it possible to change any aspect of an application's data entry, processing or information display at any time.

Abstract: A software system is provided including an Object Process Graph for defining applications and a Dynamic Graph Interpreter that interprets Object Process Graphs. An Object Process Graph defines all of an application's manipulations and processing steps and all of the application's data. An Object Process Graph is dynamic, making it possible to change any aspect of an application's data entry, processing or information display at any time. When an Object Process Graph is interpreted, it functions to accept data, process the data and produce information output. Modifications made to an Object Process Graph while it is being interpreted take effect immediately and can be saved. Object Process Graphs and Dynamic Graph Interpreters can be deployed on single user workstation computers or on distributed processing environments where central servers store Object Process Graphs and run Dynamic Graph Interpreters, and workstation computers access the servers via the intranet or local intranets.

Proceedings Article•DOI•

Relational confidence bounds are easy with the bootstrap

Abhijit Pol¹, Chris Jermaine¹•Institutions (1)

University of Florida¹

14 Jun 2005

TL;DR: This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answer to relational database queries over sampled or incomplete data and argues that the algorithms presented should be incorporated into any database system which is intended to support analytic processing.

Abstract: Statistical estimation and approximate query processing have become increasingly prevalent applications for database systems. However, approximation is usually of little use without some sort of guarantee on estimation accuracy, or "confidence bound." Analytically deriving probabilistic guarantees for database queries over sampled data is a daunting task, not suitable for the faint of heart, and certainly beyond the expertise of the typical database system end-user. This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answer to relational database queries over sampled or incomplete data. This statistical tool, called the bootstrap, is simple enough that it can be used by a database programmer with a rudimentary mathematical background, but general enough that it can be applied to almost any statistical inference problem. Given the power and ease-of-use of the bootstrap, we argue that the algorithms presented for supporting the bootstrap should be incorporated into any database system which is intended to support analytic processing.
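
To make the "plug-in" nature of the bootstrap concrete, here is a minimal sketch of a percentile bootstrap interval for an aggregate computed over a sample of rows. It assumes the percentile variant, which is one common form of the bootstrap; the abstract does not say which variant or which in-database algorithms the paper uses. The function name, parameters and sample values are hypothetical.

```python
import random


def bootstrap_interval(sample, estimator, resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(sample).

    sample: non-empty list of sampled values (e.g. rows from a sampled relation).
    estimator: function mapping a list of values to a number (e.g. an average).
    """
    rng = random.Random(seed)
    n = len(sample)

    # Re-run the estimator on many resamples drawn with replacement.
    estimates = sorted(
        estimator([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(resamples)
    )

    # Read the interval endpoints off the empirical distribution of estimates.
    tail = (1.0 - confidence) / 2.0
    lower = estimates[int(tail * (resamples - 1))]
    upper = estimates[int((1.0 - tail) * (resamples - 1))]
    return lower, upper


def mean(rows):
    return sum(rows) / len(rows)


# Hypothetical sampled order amounts from a larger table.
order_amounts = [10, 12, 9, 11, 30, 8, 10, 13]
print(bootstrap_interval(order_amounts, mean))
```

Nothing in the resampling loop depends on the estimator being a mean, so the same routine could wrap a median, a ratio or another query result, which is the generality the abstract points to.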

Proceedings Article•DOI•

Efficient algorithms for pattern matching on directed acyclic graphs

Li Chen, Amarnath Gupta, M.E. Kurul

05 Apr 2005

TL;DR: This paper presents a family of stack-based algorithms to handle path and twig pattern queries for directed acyclic graphs (DAGs) in particular and achieves a quadratic runtime complexity in the average size of the query variable bindings, optimal among the navigation-based graph matching algorithms.

Abstract: Recently graph data models have become increasingly popular in many scientific fields. Efficient query processing over such data is critical. Existing works often rely on index structures that store pre-computed transitive relations to achieve efficient graph matching. In this paper, we present a family of stack-based algorithms to handle path and twig pattern queries for directed acyclic graphs (DAGs) in particular. With the worst-case space cost linearly bounded by the number of edges in the graph, our algorithms achieve a quadratic runtime complexity in the average size of the query variable bindings. This is optimal among the navigation-based graph matching algorithms.

Patent•

Representing software development item relationships via a graph

Gina Venolia¹•Institutions (1)

Microsoft¹

05 Jul 2005

TL;DR: In this paper, the authors present a graph data structure in which software development items are represented and relationships between the represented items can be detected and reflected.

Abstract: Software development items can be represented in a graph data structure. Relationships between the represented items can be detected and reflected in the graph data structure. Queries can be run against the data structure to determine which software development items are related to each other. Implicit query can be implemented in a software development context. A graph browser can present panes showing related items.

Journal Article•

Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalised Data Structures.

Jacob Koehler¹, Christopher J. Rawlings, P. J. Verrier, Rowan A. C. Mitchell, Andre Skusa, Alexander Rüegg, Stephan Philippi•Institutions (1)

Rothamsted Research¹

01 Jan 2005-In Silico Biology

TL;DR: This work describes how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems.

Abstract: The structure of a closely integrated data warehouse is described that is designed to link different types and varying numbers of biological networks, sequence analysis methods and experimental results such as those coming from microarrays. The data schema is inspired by a combination of graph based methods and generalised data structures and makes use of ontologies and meta-data. The core idea is to consider and store biological networks as graphs, and to use generalised data structures (GDS) for the storage of further relevant information. This is possible because many biological networks can be stored as graphs: protein interactions, signal transduction networks, metabolic pathways, gene regulatory networks etc. Nodes in biological graphs represent entities such as promoters, proteins, genes and transcripts whereas the edges of such graphs specify how the nodes are related. The semantics of the nodes and edges are defined using ontologies of node and relation types. Besides generic attributes that most biological entities possess (name, attribute description), further information is stored using generalised data structures. By directly linking to underlying sequences (exons, introns, promoters, amino acid sequences) in a systematic way, close interoperability to sequence analysis methods can be achieved. This approach allows us to store, query and update a wide variety of biological information in a way that is semantically compact without requiring changes at the database schema level when new kinds of biological information are added. We describe how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems. The system is developed under the GPL license and can be downloaded from http://sourceforge.net/projects/ondex/

Proceedings Article•

Learning web page scores by error back-propagation

Michelangelo Diligenti¹, Marco Gori¹, Marco Maggini¹•Institutions (1)

University of Siena¹

30 Jul 2005

TL;DR: A novel algorithm to learn a score distribution over the nodes of a labeled graph (directed or undirected) and the effectiveness of the proposed technique in reorganizing the rank according to the examples provided in the training set is shown.

Abstract: In this paper we present a novel algorithm to learn a score distribution over the nodes of a labeled graph (directed or undirected). Markov Chain theory is used to define the model of a random walker that converges to a score distribution which depends both on the graph connectivity and on the node labels. A supervised learning task is defined on the given graph by assigning a target score for some nodes and a training algorithm based on error backpropagation through the graph is devised to learn the model parameters. The trained model can assign scores to the graph nodes generalizing the criteria provided by the supervisor in the examples. The proposed algorithm has been applied to learn a ranking function for Web pages. The experimental results show the effectiveness of the proposed technique in reorganizing the rank according to the examples provided in the training set.

Proceedings Article•DOI•

Mining tree queries in a graph

Bart Goethals¹, Eveline Hoekx², Jan Van den Bussche²•Institutions (2)

University of Antwerp¹, University of Hasselt²

21 Aug 2005

TL;DR: An algorithm for mining tree-shaped patterns in a large graph that has a number of provable optimality properties, which are based on the theory of conjunctive database queries, is presented.

Abstract: We present an algorithm for mining tree-shaped patterns in a large graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the graph. Our algorithm has a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a database-oriented implementation in SQL, and report upon some initial experimental results obtained with our implementation on graph data about food webs, about protein interactions, and about citation analysis.

Patent•

Collaborative filtering using random walks of Markov chains

Matthew Brand¹•Institutions (1)

Mitsubishi Electric Research Laboratories¹

18 Feb 2005

TL;DR: A collaborative filtering method converts a relational database to a graph of nodes connected by edges, determines the statistics of a Markov chain random walk on the graph, and uses those statistics to make a recommendation in response to a query state.

Abstract: A collaborative filtering method first converts a relational database to a graph of nodes connected by edges. The relational database includes consumer attributes, product attributes, and product ratings. Statistics of a Markov chain random walk on the graph are determined. Then, in response to a query state, states of the Markov chain are determined according to the statistics to make a recommendation.
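
A rough Python sketch of the pipeline in this abstract, under simplifying assumptions: the relational data is reduced to `(consumer, product, rating)` rows, ratings become edge weights of a bipartite graph, and the "statistic" is simply the probability of reaching each product after a short walk from the query consumer. The function names and the three-step walk are choices made for this illustration; the patent's actual statistics are not specified in the abstract.

```python
from collections import defaultdict


def build_graph(ratings):
    """Turn (consumer, product, rating) rows into a weighted bipartite graph."""
    weights = defaultdict(dict)
    for consumer, product, rating in ratings:
        weights[("consumer", consumer)][("product", product)] = rating
        weights[("product", product)][("consumer", consumer)] = rating
    return weights


def walk_step(dist, weights):
    """Advance a probability distribution by one random-walk step."""
    nxt = defaultdict(float)
    for node, prob in dist.items():
        total = sum(weights[node].values())
        for neighbour, w in weights[node].items():
            nxt[neighbour] += prob * w / total
    return nxt


def recommend(ratings, consumer, steps=3):
    """Most probable unrated product after an odd number of steps, or None."""
    weights = build_graph(ratings)
    start = ("consumer", consumer)
    dist = {start: 1.0}
    for _ in range(steps):
        dist = walk_step(dist, weights)
    already_rated = set(weights[start])
    candidates = {node[1]: p for node, p in dist.items()
                  if node[0] == "product" and node not in already_rated}
    return max(candidates, key=candidates.get) if candidates else None
```

An odd `steps` value leaves the walker on product nodes, so the default of 3 follows the path consumer, product, similar consumer, product.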

Patent•

Partial pre-aggregation in relational database queries

Per-Ake Larson¹, Cesar A. Galindo-Legaria¹•Institutions (1)

Microsoft¹

17 Mar 2005

TL;DR: A partial pre-aggregation operation reduces the number of records fed into a subsequent database operation when the query includes a final aggregation; a query optimizer determines when it is economical to partially pre-aggregate data records and when it is not.

Abstract: A partial pre-aggregation database operation improves processing efficiency of database queries by reducing the number of records input into a subsequent database operation, provided the query includes a final aggregation. A query optimizer is provided to determine when it is economical to partially pre-aggregate data records and when it is not. The partial pre-aggregation creates a record store in memory as input records are received. The record store is then used by another database operator, which saves the other database operator from having to re-create the record store.
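
The record-reduction effect can be sketched in a few lines of Python. This is an illustration under assumptions made here, not the patented operator: the aggregate is a sum, the in-memory record store is a bounded dictionary, and the store is flushed downstream whenever a new group arrives while it is full. The names `partial_preaggregate`, `final_aggregate` and `max_groups` are hypothetical.

```python
from collections import defaultdict


def partial_preaggregate(records, max_groups):
    """Yield partially aggregated (key, sum) pairs using a bounded store.

    Records whose key is already in the store are combined in place, so
    fewer rows reach the next operator. Output may still contain several
    rows per key, which is why a final aggregation is required.
    """
    store = {}
    for key, value in records:
        if key in store:
            store[key] += value
        else:
            if len(store) >= max_groups:
                # Store is full: emit what we have and start over.
                yield from store.items()
                store.clear()
            store[key] = value
    yield from store.items()


def final_aggregate(partials):
    """Combine partial sums into one total per key."""
    totals = defaultdict(int)
    for key, value in partials:
        totals[key] += value
    return dict(totals)
```

Whether the extra pass pays off depends on how many duplicates fit in the store, which is the decision the abstract assigns to the query optimizer.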

Proceedings Article•

Knowledge Representation Issues in Semantic Graphs for Relationship Detection

Marc Barthelemy, Edmond Chow, Tina Eliassi-Rad

02 Feb 2005

TL;DR: The concept of transitivity is used to evaluate the relevance of individual links in the semantic graph for detecting relationships, and new statistical measures for semantic graphs are proposed and illustrated on graphs constructed from movies and terrorism data.

Biodefense Knowledge Center, Lawrence Livermore National Laboratory

Abstract: An important task for Homeland Security is the prediction of threat vulnerabilities, such as through the detection of relationships between seemingly disjoint entities. A structure used for this task is a semantic graph, also known as a relational data graph or an attributed relational graph. These graphs encode relationships as typed links between a pair of typed nodes. Indeed, semantic graphs are very similar to semantic networks used in AI. The node and link types are related through an ontology graph (also known as a schema). Furthermore, each node has a set of attributes associated with it (e.g., “age” may be an attribute of a node of type “person”). Unfortunately, the selection of types and attributes for both nodes and links depends on human expertise and is somewhat subjective and even arbitrary. This subjectiveness introduces biases into any algorithm that operates on semantic graphs. Here, we raise some knowledge representation issues for semantic graphs and provide some possible solutions using recently developed ideas in the field of complex networks. In particular, we use the concept of transitivity to evaluate the relevance of individual links in the semantic graph for detecting relationships. We also propose new statistical measures for semantic graphs and illustrate these semantic measures on graphs constructed from movies and terrorism data.
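
For readers unfamiliar with the term, the standard global transitivity of an undirected graph (the fraction of connected triples that close into triangles) can be computed as in the Python sketch below. It ignores node and link types, so it is only the textbook complex-networks measure; how the paper adapts transitivity to score individual typed links is not described in the abstract and is not attempted here.

```python
from itertools import combinations


def transitivity(adj):
    """Global transitivity: closed triples / all connected triples.

    adj maps each node to the set of its neighbours (undirected graph,
    so every edge must appear in both directions).
    """
    closed = 0
    triples = 0
    for centre, neighbours in adj.items():
        # Every pair of neighbours forms a connected triple around `centre`.
        for u, v in combinations(neighbours, 2):
            triples += 1
            if v in adj[u]:
                closed += 1   # the pair is itself linked: a triangle
    return closed / triples if triples else 0.0
```

Each triangle is counted once per corner, which matches the usual definition of three times the number of triangles divided by the number of connected triples.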

Proceedings Article•DOI•

Focused community discovery

K. Hildrum¹, Philip S. Yu¹•Institutions (1)

IBM¹

27 Nov 2005

TL;DR: Focused search allows for a much more scalable algorithm in which the time depends only on the size of the community, and not on the number of nodes in the graph, and so is scalable to arbitrarily large graphs.

Abstract: We present a new approach to community discovery. Community discovery usually partitions the graph into communities or clusters. Focused community discovery allows the searcher to specify start points of interest, and find the community of those points. Focused search allows for a much more scalable algorithm in which the time depends only on the size of the community, and not on the number of nodes in the graph, and so is scalable to arbitrarily large graphs. Furthermore, our algorithm is robust to imperfect data, such as extra or missing edges in the graph. We show the effectiveness of our algorithm on both synthetic graphs and the real-life Livejournal friends graph, a publicly available social network consisting of over two million users and 13 million edges.
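
The locality argument can be illustrated with a simple greedy expansion in Python. This is a generic seed-expansion heuristic assumed for illustration, not the authors' algorithm: starting from the seed nodes, it repeatedly adds the neighbouring node with the largest fraction of its edges pointing into the community, and stops when no neighbour reaches `threshold`. Only the community and its immediate neighbours are ever inspected.

```python
def focused_community(adj, seeds, threshold=0.5, max_size=100):
    """Grow a community around `seeds` by local greedy expansion.

    adj maps each node to the set of its neighbours (undirected graph).
    `threshold` and `max_size` are hypothetical tuning parameters.
    """
    community = set(seeds)

    def inside_fraction(node):
        # Share of the node's edges that lead into the current community.
        return len(adj[node] & community) / len(adj[node])

    while len(community) < max_size:
        frontier = set()
        for member in community:
            frontier |= adj[member] - community
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(frontier), key=inside_fraction)
        if inside_fraction(best) < threshold:
            break
        community.add(best)
    return community
```

Because the loop never touches nodes beyond the frontier, its cost grows with the community and its boundary rather than with the whole graph, which is the scalability property the abstract emphasizes.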

Patent•

Efficient communication in a client-server scene graph system

Deron D. Johnson¹, Hideya Kawahara¹, Paul Byrne¹, Kevin Rushforth¹, Douglas C. Twilleager¹, +1 more•Institutions (1)

Sun Microsystems¹

09 Feb 2005

TL;DR: A system and method for communicating 3D branch graph data, and updates to that data, between clients and a display server in a 3D window system.

Abstract: A system and method for communicating 3D branch graph data and updates to branch graph data between clients and a display server in a 3D window system. A client locally creates a branch graph. When the client is ready to make the branch graph live remote, it sends the branch graph to the display server using at least one batch protocol request. The display server builds a copy of the branch graph and attaches it to a centralized scene graph that it manages. The client may subsequently induce detachment of the branch graph from the scene graph. The client may buffer up changes to the local branch graph when its remote counterpart (in the display server) is not attached to the scene graph. The buffered changes may be sent to the display server using at least one batch protocol request when the client is again ready to make the branch graph live remote.
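
The buffering behaviour described here can be sketched in Python as follows. All class and method names are hypothetical, the "server" is an in-memory stand-in that only records the batches it receives, and sending each live change as its own single-item batch is an assumption of this sketch rather than something the abstract states.

```python
class FakeDisplayServer:
    """Stand-in for the display server: records each batch request."""

    def __init__(self):
        self.batches = []

    def send_batch(self, requests):
        self.batches.append(list(requests))


class BranchGraphClient:
    """Client that buffers branch-graph changes while detached."""

    def __init__(self, server):
        self.server = server
        self.live = False
        self.pending = []      # changes made while not attached

    def change(self, op):
        if self.live:
            # Assumption: live changes go out immediately, one per batch.
            self.server.send_batch([op])
        else:
            self.pending.append(op)

    def make_live(self):
        # Flush everything buffered so far in a single batch request.
        if self.pending:
            self.server.send_batch(self.pending)
            self.pending = []
        self.live = True

    def detach(self):
        self.live = False
```

Batching the buffered changes means a detached client costs the server nothing until it reattaches, at which point one request brings the remote copy up to date.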

Proceedings Article•

RankSQL: supporting ranking queries in relational database management systems

Chengkai Li¹, Mohamed A. Soliman², Kevin Chen-Chuan Chang¹, Ihab F. Ilyas²•Institutions (2)

University of Illinois at Urbana–Champaign¹, University of Waterloo²

30 Aug 2005

TL;DR: The increasing importance of top-k queries warrants an efficient support of ranking in the relational database management system (RDBMS) and has recently gained the attention of the research community.

Abstract: Ranking queries (or top-k queries) are dominant in many emerging applications, e.g., similarity queries in multimedia databases, searching Web databases, middleware, and data mining. The increasing importance of top-k queries warrants an efficient support of ranking in the relational database management system (RDBMS) and has recently gained the attention of the research community. Top-k queries aim at providing only the top k query results, according to a user-specified ranking function, which in many cases is an aggregate of multiple criteria. The following is an example top-k query.
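
The example query that the abstract introduces is not included in this record. Independently of it, the semantics of a top-k query (return only the k best rows under a user-specified aggregate ranking function) can be illustrated with a small Python sketch. The hotel data and the scoring function are invented for illustration, and this in-memory version says nothing about how RankSQL evaluates such queries inside the database engine.

```python
import heapq


def top_k(rows, score, k):
    """Return the k highest-scoring rows, best first.

    heapq.nlargest keeps only k candidates at a time, so the full
    input never has to be sorted.
    """
    return heapq.nlargest(k, rows, key=score)


# Hypothetical data: rank hotels by rating, penalised by price.
hotels = [
    {"name": "A", "price": 100, "rating": 4.0},
    {"name": "B", "price": 60, "rating": 3.5},
    {"name": "C", "price": 200, "rating": 4.8},
    {"name": "D", "price": 80, "rating": 4.5},
]


def value_for_money(hotel):
    # Aggregate of two criteria, as in a user-specified ranking function.
    return hotel["rating"] - hotel["price"] / 100


best_two = top_k(hotels, value_for_money, k=2)
```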

Proceedings Article•DOI•

Identifying critical focuses in research domains

Tsung Teng Chen¹, Liang Qi Xie¹•Institutions (1)

National Taipei University¹

06 Jul 2005

TL;DR: Graph theory and authoritative sources identification techniques are employed, augmented with visualization tools, to discover critical research focuses from the citation graph to locate important literature in various fields.

Abstract: Citation analysis has been applied in various contexts for different purposes such as impact factor estimation, co-citation pattern analysis, community partitioning, and knowledge visualization. We employed graph theory and authoritative sources identification techniques, augmented with visualization tools, to discover critical research focuses from the citation graph. The citation graph was built from data retrieved from the CiteSeer database via a querying robot. Two experiments were carried out to identify important research focuses from the citation graph with promising results. Established research focuses as well as new research focuses were successfully identified by the method we proposed and tried. Researchers new to a field may use this method to locate important literature in various fields, which in turn facilitates their learning and studying.
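
Assuming that "authoritative sources identification" refers to a hub-and-authority iteration in the style of Kleinberg's HITS (the abstract does not name the technique), a minimal Python version on a citation graph could look like this. The function names and the toy graph used to exercise it are illustrative only.

```python
import math


def _normalise(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(cites, iters=50):
    """Return (authority, hub) scores for a citation graph.

    cites maps each paper to the list of papers it cites. A paper is a
    good authority if good hubs cite it, and a good hub if it cites
    good authorities.
    """
    nodes = set(cites) | {t for targets in cites.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Authority update: sum the hub scores of citing papers.
        auth = {n: 0.0 for n in nodes}
        for src, targets in cites.items():
            for t in targets:
                auth[t] += hub[src]
        auth = _normalise(auth)
        # Hub update: sum the authority scores of cited papers.
        hub = _normalise({n: sum(auth[t] for t in cites.get(n, ()))
                          for n in nodes})
    return auth, hub
```

Papers with the highest authority scores are candidates for the established focuses of a field, while high hub scores tend to mark survey-like papers that cite many of them.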

Proceedings Article•DOI•

Segmentation of connected handwritten numerals by graph representation

Misako Suwa¹•Institutions (1)

Fujitsu¹

31 Aug 2005

TL;DR: A new algorithm for separating a touching pair of digits by using the graph-representation of the pattern, which can segment not only simply connected cases but also multiply connected ones.

Abstract: This paper proposes a new algorithm for separating a touching pair of digits by using the graph-representation of the pattern. The segmentation can be regarded as grouping these edges and vertices into two disconnected sub-graphs. This process is executed by applying graph theory methods and certain heuristic rules. Since the boundaries of patterns are determined along the edges, the shapes of the segmented digits can be restored with high quality. The algorithm can segment not only simply connected cases but also multiply connected ones. The results of the performance evaluation using the NIST database are also presented.

Patent•

Graph browser and implicit query for software development

Gina Venolia¹•Institutions (1)

Microsoft¹

05 Jul 2005

TL;DR: Software development items can be represented in a graph data structure, and relationships between the represented items can be detected and reflected in the graph.
