
Showing papers on "Graph database published in 2005"


Proceedings Article
30 Aug 2005
TL;DR: This paper proposes a new search algorithm, Bidirectional Search, which improves on Backward Expanding search by allowing forward search from potential roots towards leaves, and devises a novel search frontier prioritization technique based on spreading activation.
Abstract: Relational, XML and HTML data can be represented as graphs with entities as nodes and relationships as edges. Text is associated with nodes and possibly edges. Keyword search on such graphs has received much attention lately. A central problem in this scenario is to efficiently extract from the data graph a small number of the "best" answer trees. A Backward Expanding search, starting at nodes matching keywords and working up toward confluent roots, is commonly used for predominantly text-driven queries. But it can perform poorly if some keywords match many nodes, or some node has very large degree. In this paper we propose a new search algorithm, Bidirectional Search, which improves on Backward Expanding search by allowing forward search from potential roots towards leaves. To exploit this flexibility, we devise a novel search frontier prioritization technique based on spreading activation. We present a performance study on real data, establishing that Bidirectional Search significantly outperforms Backward Expanding search.

545 citations
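
The Backward Expanding idea described in the abstract above (start at the nodes matching each keyword and follow edges backwards until the expansions meet at a common root) can be sketched as follows. This is only an illustrative simplification, not the paper's algorithm: it assumes unweighted edges, uses plain breadth-first search instead of a prioritized frontier, and scores a root by the total number of hops to its nearest match for each keyword. The function names and the toy graph are hypothetical.

```python
from collections import deque


def backward_distances(reverse_adj, sources):
    """Multi-source BFS over reversed edges.

    Returns, for every node that can reach a source, the number of
    hops from that node down to its nearest source.
    """
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for parent in reverse_adj.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def backward_expanding_roots(edges, keyword_matches, k=3):
    """Return up to k (cost, root) pairs, cheapest first.

    edges: iterable of (src, dst) directed edges.
    keyword_matches: non-empty dict mapping keyword -> set of matching nodes.
    A root qualifies only if it can reach a match for every keyword.
    """
    reverse_adj = {}
    for src, dst in edges:
        reverse_adj.setdefault(dst, []).append(src)

    # One backward expansion per keyword (assumption: hop count as cost).
    per_keyword = [
        backward_distances(reverse_adj, sorted(nodes))
        for nodes in keyword_matches.values()
    ]

    # Confluent roots are the nodes reached by every expansion.
    common = set(per_keyword[0])
    for dist in per_keyword[1:]:
        common &= set(dist)

    scored = sorted((sum(dist[r] for dist in per_keyword), r) for r in common)
    return scored[:k]


# Toy data graph: a venue points to papers, papers point to authors.
edges = [
    ("conf", "paper1"), ("conf", "paper2"),
    ("paper1", "author_a"), ("paper1", "author_b"),
    ("paper2", "author_a"),
]
keyword_matches = {"alice": {"author_a"}, "bob": {"author_b"}}
roots = backward_expanding_roots(edges, keyword_matches)
```

In this toy graph `paper1` is the cheapest root because it links directly to both matching authors, while `conf` also qualifies at a higher cost. The sketch also shows why the abstract notes poor performance when a keyword matches many nodes: every match seeds the expansion, so the frontier grows with the number of matches.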


Journal ArticleDOI
TL;DR: In this application, video summaries that emphasize both content balance and perceptual quality can be generated directly from a temporal graph that embeds both the structure and attention information.
Abstract: We propose a unified approach for video summarization based on the analysis of video structures and video highlights. Two major components in our approach are scene modeling and highlight detection. Scene modeling is achieved by the normalized cut algorithm and temporal graph analysis, while highlight detection is accomplished by motion attention modeling. In our proposed approach, a video is represented as a complete undirected graph and the normalized cut algorithm is carried out to globally and optimally partition the graph into video clusters. The resulting clusters form a directed temporal graph and a shortest path algorithm is proposed to efficiently detect video scenes. The attention values are then computed and attached to the scenes, clusters, shots, and subshots in a temporal graph. As a result, the temporal graph can inherently describe the evolution and perceptual importance of a video. In our application, video summaries that emphasize both content balance and perceptual quality can be generated directly from a temporal graph that embeds both the structure and attention information.

366 citations


Proceedings ArticleDOI
14 Jun 2005
TL;DR: This paper investigates the issues of substructure similarity search using indexed features in graph databases, and develops a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features.
Abstract: Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. The most fundamental support needed in these applications is the efficient search of complex structured data. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. In this paper, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed missing features, our structural filtering algorithm, called Grafil, can filter many graphs without performing pairwise similarity computations. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy for filtering. By examining the effect of different feature selection mechanisms, we develop a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features. We identify the criteria to form effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly. Moreover, the concept presented in Grafil can be applied to searching approximate non-consecutive sequences, trees, and other complicated structures as well.

347 citations
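
The filtering step described in the abstract above (discard a database graph when it misses more query features than the allowed maximum) can be illustrated with a small sketch. It is not Grafil itself: it assumes the bound `max_missing` is already given rather than derived from the edge relaxation ratio, and it uses hypothetical labeled-edge counts as the indexed features.

```python
from collections import Counter


def missing_features(query_features, graph_features):
    """Count query feature occurrences that the graph cannot supply."""
    return sum(
        max(0, needed - graph_features.get(feature, 0))
        for feature, needed in query_features.items()
    )


def filter_candidates(query_features, database, max_missing):
    """Keep graphs whose feature misses stay within the allowed bound.

    Survivors still need a real similarity check; the filter only
    prunes graphs that cannot possibly qualify.
    """
    return [
        graph_id
        for graph_id, features in database.items()
        if missing_features(query_features, features) <= max_missing
    ]


# Hypothetical feature index: occurrences of labeled edges per graph.
query = Counter({"C-C": 3, "C-O": 1, "C-N": 1})
database = {
    "g1": Counter({"C-C": 3, "C-O": 1}),            # misses one C-N
    "g2": Counter({"C-C": 1}),                      # misses four occurrences
    "g3": Counter({"C-C": 4, "C-O": 2, "C-N": 1}),  # misses nothing
}
candidates = filter_candidates(query, database, max_missing=1)
```

Only feature counts are compared, so no pairwise structural comparison is needed to reject `g2`. Running the same function once per feature subset and intersecting the surviving sets would be one simple way to mimic the multi-filter composition mentioned in the abstract.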


Proceedings ArticleDOI
06 Oct 2005
TL;DR: A learned graph matching approach to approximate entailment using the amount of the sentence's semantic content which is contained in the text, which outperforms Bag-Of-Words and TF-IDF models.
Abstract: We present a system for deciding whether a given sentence can be inferred from text. Each sentence is represented as a directed graph (extracted from a dependency parser) in which the nodes represent words or phrases, and the links represent syntactic and semantic relationships. We develop a learned graph matching approach to approximate entailment using the amount of the sentence's semantic content which is contained in the text. We present results on the Recognizing Textual Entailment dataset (Dagan et al., 2005), and show that our approach outperforms Bag-Of-Words and TF-IDF models. In addition, we explore common sources of errors in our approach and how to remedy them.

174 citations


Book ChapterDOI
29 May 2005
TL;DR: This paper studies the RDF model from a database perspective, focuses on query languages, analyzes current RDF trends, and proposes the incorporation to RDF query languages of primitives which are not present today, based on the experience and techniques of graph database research.
Abstract: This paper studies the RDF model from a database perspective. From this point of view it is compared with other database models, particularly with graph database models, which are very close in motivations and use cases to RDF. We concentrate on query languages, analyze current RDF trends, and propose the incorporation to RDF query languages of primitives which are not present today, based on the experience and techniques of graph database research.

172 citations


Journal ArticleDOI
TL;DR: A general framework for graph matching which is suitable for different problems of pattern recognition, and is very well-suited for dealing with partial and approximate graph matching problems, derived for instance from image retrieval tasks.
Abstract: In this paper, we propose a general framework for graph matching which is suitable for different problems of pattern recognition. The pattern representation we assume is at the same time highly structured, like for classic syntactic and structural approaches, and of subsymbolic nature with real-valued features, like for connectionist and statistical approaches. We show that random walk based models, inspired by Google's PageRank, give rise to a spectral theory that nicely enhances the graph topological features at node level. As a straightforward consequence, we derive a polynomial algorithm for the classic graph isomorphism problem, under the restriction of dealing with Markovian spectrally distinguishable graphs (MSD), a class of graphs that does not seem to be easily reducible to others proposed in the literature. The experimental results that we found on different test-beds of the TC-15 graph database show that the defined MSD class "almost always" covers the database, and that the proposed algorithm is significantly more efficient than the top-scoring VF algorithm on the same data. Most interestingly, the proposed approach is very well-suited for dealing with partial and approximate graph matching problems, derived for instance from image retrieval tasks. We consider the objects of the COIL-100 visual collection and provide a graph-based representation, whose node labels contain appropriate visual features. We show that the adoption of classic bipartite graph matching algorithms offers a straightforward generalization of the algorithm given for graph isomorphism and, finally, we report very promising experimental results on the COIL-100 visual collection.

166 citations


Book ChapterDOI
03 Oct 2005
TL;DR: This paper has re-implemented the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base and with the same level of programming expertise and optimization effort.
Abstract: Several new miners for frequent subgraphs have been published recently. Whereas new approaches are presented in detail, the quantitative evaluations are often of limited value: only the performance on a small set of graph databases is discussed and the new algorithm is often only compared to a single competitor based on an executable. It remains unclear how the algorithms work on bigger/other graph databases and which of their distinctive features is best suited for which database. We have re-implemented the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base and with the same level of programming expertise and optimization effort. This paper presents the results of a comparative benchmarking that ran the algorithms on a comprehensive set of graph databases.

154 citations


01 Jan 2005
TL;DR: GMO uses bipartite graphs to represent ontologies, and measures the structural similarity between graphs by a new measurement, and can take a set of matched pairs, which are typically previously found by other approaches, as external input in the matching process.
Abstract: Ontology matching is an important task to achieve interoperation between semantic web applications using different ontologies. Structural similarity plays a central role in ontology matching. However, the existing approaches rely heavily on lexical similarity, and they mix up lexical similarity with structural similarity. In this paper, we present a graph matching approach for ontologies, called GMO. It uses bipartite graphs to represent ontologies, and measures the structural similarity between graphs by a new measurement. Furthermore, GMO can take a set of matched pairs, which are typically previously found by other approaches, as external input in the matching process. Our implementation and experimental results are given to demonstrate the effectiveness of the graph matching approach.

141 citations


Journal ArticleDOI
01 Dec 2005
TL;DR: This article proposes a novel indexing model based on discriminative frequent structures that are identified through a graph mining process and shows that the compact index built under this model can achieve better performance in processing graph queries.
Abstract: Graphs have become increasingly important in modelling complicated structures and schemaless data such as chemical compounds, proteins, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via indices. In this article, we investigate the issues of indexing graphs and propose a novel indexing model based on discriminative frequent structures that are identified through a graph mining process. We show that the compact index built under this model can achieve better performance in processing graph queries. Since discriminative frequent structures capture the intrinsic characteristics of the data, they are relatively stable to database updates, thus facilitating sampling-based feature extraction and incremental index maintenance. Our approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be generalized and applied to indexing sequences, trees, and other complicated structures as well.

107 citations


Journal ArticleDOI
TL;DR: The pathway query language (PQL) for querying large protein interaction or pathway databases is designed and implemented, based on a simple graph data model with extensions reflecting properties of biological objects.
Abstract: Motivation: Many areas of modern biology are concerned with the management, storage, visualization, comparison and analysis of networks, but no appropriate query language for such complex data structures yet exists. Results: We have designed and implemented the pathway query language (PQL) for querying large protein interaction or pathway databases. PQL is based on a simple graph data model with extensions reflecting properties of biological objects. Queries match subgraphs in the database based on node properties and paths between nodes. The syntax is easy to learn for anybody familiar with SQL. As an important feature, a query may require a certain structure in the database to exist but return a different subgraph. We have tested PQL queries on networks of up to 16 000 nodes and found it to scale very well. Availability: The code is available on request from the author. Contact: leser@informatik.hu-berlin.de

104 citations


Proceedings ArticleDOI
05 Apr 2005
TL;DR: The goal of SECONDO is to provide a "generic" database system frame that can be filled with implementations of various DBMS data models, and it is believed to be an excellent tool for teaching database architecture and implementation concepts.
Abstract: The goal of SECONDO is to provide a "generic" database system frame that can be filled with implementations of various DBMS data models. SECONDO was intended originally as a platform for implementing and experimenting with new kinds of data models, especially to support spatial, spatio-temporal, and graph database models. We now feel that SECONDO has a clean architecture, and that it strikes a reasonable balance between simplicity and sophistication. Since all the source code is accessible and to a large extent comprehensible for students, we believe it is also an excellent tool for teaching database architecture and implementation concepts. SECONDO runs on Windows, Linux, and Solaris platforms, and consists of three major components: the SECONDO kernel, the optimizer, and the graphical user interface.

Proceedings ArticleDOI
14 Jun 2005
TL;DR: A new graph-based data structure called Spatio-Temporal Region Graph (STRG) is proposed, which provides temporal features, which represent temporal relationships among spatial objects, and a new indexing method STRG-Index, which is faster and more accurate since it uses a tree structure and a clustering algorithm.
Abstract: In this paper, we propose a new graph-based data structure and indexing method to organize and retrieve video data. Several studies have shown that a graph can be a better candidate for modeling semantically rich and complicated multimedia data. However, there are few methods that consider the temporal feature of video data, which is a distinguishable and representative characteristic when compared with other multimedia (e.g., images). In order to consider the temporal feature effectively and efficiently, we propose a new graph-based data structure called Spatio-Temporal Region Graph (STRG). Unlike existing graph-based data structures which provide only spatial features, the proposed STRG further provides temporal features, which represent temporal relationships among spatial objects. The STRG is decomposed into its subgraphs in which redundant subgraphs are eliminated to reduce the index size and search time, because the computational complexity of graph matching (subgraph isomorphism) is NP-complete. In addition, a new distance measure, called Extended Graph Edit Distance (EGED), is introduced in both non-metric and metric spaces for matching and indexing respectively. Based on STRG and EGED, we propose a new indexing method STRG-Index, which is faster and more accurate since it uses a tree structure and a clustering algorithm. We compare the STRG-Index with the M-tree, which is a popular tree-based indexing method for multimedia data. The STRG-Index outperforms the M-tree for various query loads in terms of cost and speed.

01 Jan 2005
TL;DR: This work has chosen to connect geographic data from mono-scale representations to build a multi-scale database with scale-transition relationships, which connect two sets of elements representing the same phenomenon of the real world and carry the sequence of multi-scale operations to navigate from one representation to another.
Abstract: Building multiple representations is one of the key problems in GIS. To tackle this problem, we have chosen to connect geographic data from mono-scale representations to build a multi-scale database with scale-transition relationships. These scale-transition relationships connect two sets of elements (classes, types or objects) representing the same phenomenon of the real world and carry the sequence of multi-scale operations to navigate from one representation to another. From this concept, a process has been defined to build multi-scale databases, in three steps. The first step is dedicated to the declaration of correspondences and conflicts between input schemata by the means of scale-transition relationships. In the second step, conflicts are resolved and schemata are merged. Finally, the third step corresponds to data matching, with the help of geometric, topologic and semantic information. Scale-transition relationships between objects are created during this last step. To validate the process, a multi-scale database has been produced from two existing mono-scale sets of road network data. The first results of this kernel are satisfactory.

Proceedings ArticleDOI
14 Jun 2005
TL;DR: A demo of GraphMiner is described which showcases the technical details of the index structure and the mining algorithms including their efficient implementation, the mining performance and the comparison with some state-of-the-art methods.
Abstract: Mining frequent structural patterns from graph databases is an important research problem with broad applications. Recently, we developed an effective index structure, ADI, and efficient algorithms for mining frequent patterns from large, disk-based graph databases [5], as well as constraint-based mining techniques. The techniques have been integrated into a research prototype system, GraphMiner. In this paper, we describe a demo of GraphMiner which showcases the technical details of the index structure and the mining algorithms including their efficient implementation, the mining performance and the comparison with some state-of-the-art methods, the constraint-based graph-pattern mining techniques and the procedure of constrained graph mining, as well as mining real data sets in novel applications.

Patent
27 May 2005
TL;DR: In this paper, a software system is provided including an Object Process Graph for defining applications and a Dynamic Graph Interpreter that interprets Object Process Graphs, making it possible to change any aspect of an application's data entry, processing or information display at any time.
Abstract: A software system is provided including an Object Process Graph for defining applications and a Dynamic Graph Interpreter that interprets Object Process Graphs. An Object Process Graph defines all of an application's manipulations and processing steps and all of the application's data. An Object Process Graph is dynamic, making it possible to change any aspect of an application's data entry, processing or information display at any time. When an Object Process Graph is interpreted, it functions to accept data, process the data and produce information output. Modifications made to an Object Process Graph while it is being interpreted take effect immediately and can be saved. Object Process Graphs and Dynamic Graph Interpreters can be deployed on single user workstation computers or on distributed processing environments where central servers store Object Process Graphs and run Dynamic Graph Interpreters, and workstation computers access the servers via the intranet or local intranets.

Proceedings ArticleDOI
14 Jun 2005
TL;DR: This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answer to relational database queries over sampled or incomplete data and argues that the algorithms presented should be incorporated into any database system which is intended to support analytic processing.
Abstract: Statistical estimation and approximate query processing have become increasingly prevalent applications for database systems. However, approximation is usually of little use without some sort of guarantee on estimation accuracy, or "confidence bound." Analytically deriving probabilistic guarantees for database queries over sampled data is a daunting task, not suitable for the faint of heart, and certainly beyond the expertise of the typical database system end-user. This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answer to relational database queries over sampled or incomplete data. This statistical tool, called the bootstrap, is simple enough that it can be used by a database programmer with a rudimentary mathematical background, but general enough that it can be applied to almost any statistical inference problem. Given the power and ease-of-use of the bootstrap, we argue that the algorithms presented for supporting the bootstrap should be incorporated into any database system which is intended to support analytic processing.
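
To make the bootstrap idea in the abstract above concrete, here is a minimal sketch of a percentile bootstrap confidence interval for an aggregate computed over a sample of rows. It is a generic textbook form of the technique, not the paper's in-database algorithms; the function name, the default parameters, and the sample values are all assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, estimator=statistics.mean, resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(sample).

    Repeatedly resamples the data with replacement, re-evaluates the
    estimator on each resample, and reads the interval off the sorted
    estimates. The seed is fixed only to make the sketch repeatable.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        estimator(rng.choices(sample, k=n)) for _ in range(resamples)
    )
    lower = estimates[int((alpha / 2) * resamples)]
    upper = estimates[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper


# Hypothetical sampled column values, e.g. order totals from a row sample.
sampled_values = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
low, high = bootstrap_ci(sampled_values)
```

The same function works for other aggregates by passing a different `estimator`, for example `statistics.median`, which reflects the "plug-in" character the abstract emphasizes: no estimator-specific variance formula has to be derived by hand.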

Proceedings ArticleDOI
05 Apr 2005
TL;DR: This paper presents a family of stack-based algorithms to handle path and twig pattern queries for directed acyclic graphs (DAGs) in particular and achieves a quadratic runtime complexity in the average size of the query variable bindings, optimal among the navigation-based graph matching algorithms.
Abstract: Recently graph data models have become increasingly popular in many scientific fields. Efficient query processing over such data is critical. Existing works often rely on index structures that store pre-computed transitive relations to achieve efficient graph matching. In this paper, we present a family of stack-based algorithms to handle path and twig pattern queries for directed acyclic graphs (DAGs) in particular. With the worst-case space cost linearly bounded by the number of edges in the graph, our algorithms achieve a quadratic runtime complexity in the average size of the query variable bindings. This is optimal among the navigation-based graph matching algorithms.

Patent
Gina Venolia
05 Jul 2005
TL;DR: In this paper, the authors present a graph data structure where software development items can be represented as graph data structures and relationships between the represented items can also be detected and reflected in the graph.
Abstract: Software development items can be represented in a graph data structure. Relationships between the represented items can be detected and reflected in the graph data structure. Queries can be run against the data structure to determine which software development items are related to each other. Implicit query can be implemented in a software development context. A graph browser can present panes showing related items.

Journal Article
TL;DR: This work describes how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems.
Abstract: The structure of a closely integrated data warehouse is described that is designed to link different types and varying numbers of biological networks, sequence analysis methods and experimental results such as those coming from microarrays. The data schema is inspired by a combination of graph based methods and generalised data structures and makes use of ontologies and meta-data. The core idea is to consider and store biological networks as graphs, and to use generalised data structures (GDS) for the storage of further relevant information. This is possible because many biological networks can be stored as graphs: protein interactions, signal transduction networks, metabolic pathways, gene regulatory networks etc. Nodes in biological graphs represent entities such as promoters, proteins, genes and transcripts whereas the edges of such graphs specify how the nodes are related. The semantics of the nodes and edges are defined using ontologies of node and relation types. Besides generic attributes that most biological entities possess (name, attribute description), further information is stored using generalised data structures. By directly linking to underlying sequences (exons, introns, promoters, amino acid sequences) in a systematic way, close interoperability to sequence analysis methods can be achieved. This approach allows us to store, query and update a wide variety of biological information in a way that is semantically compact without requiring changes at the database schema level when new kinds of biological information are added. We describe how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems. The system is developed under the GPL license and can be downloaded from http://sourceforge.net/projects/ondex/

Proceedings Article
30 Jul 2005
TL;DR: A novel algorithm to learn a score distribution over the nodes of a labeled graph (directed or undirected) and the effectiveness of the proposed technique in reorganizing the rank according to the examples provided in the training set is shown.
Abstract: In this paper we present a novel algorithm to learn a score distribution over the nodes of a labeled graph (directed or undirected). Markov Chain theory is used to define the model of a random walker that converges to a score distribution which depends both on the graph connectivity and on the node labels. A supervised learning task is defined on the given graph by assigning a target score for some nodes and a training algorithm based on error backpropagation through the graph is devised to learn the model parameters. The trained model can assign scores to the graph nodes generalizing the criteria provided by the supervisor in the examples. The proposed algorithm has been applied to learn a ranking function for Web pages. The experimental results show the effectiveness of the proposed technique in reorganizing the rank according to the examples provided in the training set.

Proceedings ArticleDOI
21 Aug 2005
TL;DR: An algorithm for mining tree-shaped patterns in a large graph that has a number of provable optimality properties, which are based on the theory of conjunctive database queries, is presented.
Abstract: We present an algorithm for mining tree-shaped patterns in a large graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the graph. Our algorithm has a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a database-oriented implementation in SQL, and report upon some initial experimental results obtained with our implementation on graph data about food webs, about protein interactions, and about citation analysis.

Patent
18 Feb 2005
TL;DR: In this paper, a collaborative filtering method is used to convert a relational database to a graph of nodes connected by edges, and then the statistics of a Markov chain random walk on the graph are determined.
Abstract: A collaborative filtering method first converts a relational database to a graph of nodes connected by edges. The relational database includes consumer attributes, product attributes, and product ratings. Statistics of a Markov chain random walk on the graph are determined. Then, in response to a query state, states of the Markov chain are determined according to the statistics to make a recommendation.

Patent
17 Mar 2005
TL;DR: In this article, a query optimizer is provided to determine when it is economical to partially pre-aggregate data records and when it is not, provided the query includes a final aggregation.
Abstract: A partial pre-aggregation database operation improves processing efficiency of database queries by reducing the number of records input into a subsequent database operation, provided the query includes a final aggregation. A query optimizer is provided to determine when it is economical to partially pre-aggregate data records and when it is not. The partial pre-aggregation creates a record store in memory as input records are received. The record store is then used by another database operator, which saves the other database operator from having to re-create the record store.
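
The partial pre-aggregation described in the abstract above can be pictured as a bounded in-memory record store that combines input records per group before they reach the final aggregation. The sketch below is an illustration under stated assumptions, not the patented method: it assumes a SUM aggregate, a hypothetical `max_groups` memory limit, and a simple "flush everything when full" policy, and it does not model the optimizer's cost decision or the reuse of the record store by a later operator.

```python
def partial_preaggregate(records, max_groups):
    """Yield (key, partial_sum) pairs while holding at most max_groups keys.

    Records for a key already in the store are merged in place; when a
    new key arrives and the store is full, the store is emitted
    downstream and emptied (an assumed flush policy).
    """
    store = {}
    for key, value in records:
        if key not in store and len(store) >= max_groups:
            yield from store.items()
            store = {}
        store[key] = store.get(key, 0) + value
    yield from store.items()


def final_aggregate(partials):
    """Combine partial sums into one total per key."""
    totals = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]
partials = list(partial_preaggregate(records, max_groups=2))
totals = final_aggregate(partials)
```

Here six input records shrink to five partial records before the final aggregation, and the totals are unchanged. The benefit depends on how often keys repeat within the store's capacity, which is why the abstract pairs the operator with an optimizer that decides when pre-aggregation is worthwhile.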

Proceedings Article
02 Feb 2005
TL;DR: The concept of transitivity is used to evaluate the relevance of individual links in the semantic graph for detecting relationships and new statistical measures for semantic graphs are proposed on graphs constructed from movies and terrorism data.
Abstract: Biodefense Knowledge Center, Lawrence Livermore National Laboratory. An important task for Homeland Security is the prediction of threat vulnerabilities, such as through the detection of relationships between seemingly disjoint entities. A structure used for this task is a semantic graph, also known as a relational data graph or an attributed relational graph. These graphs encode relationships as typed links between a pair of typed nodes. Indeed, semantic graphs are very similar to semantic networks used in AI. The node and link types are related through an ontology graph (also known as a schema). Furthermore, each node has a set of attributes associated with it (e.g., “age” may be an attribute of a node of type “person”). Unfortunately, the selection of types and attributes for both nodes and links depends on human expertise and is somewhat subjective and even arbitrary. This subjectiveness introduces biases into any algorithm that operates on semantic graphs. Here, we raise some knowledge representation issues for semantic graphs and provide some possible solutions using recently developed ideas in the field of complex networks. In particular, we use the concept of transitivity to evaluate the relevance of individual links in the semantic graph for detecting relationships. We also propose new statistical measures for semantic graphs and illustrate these semantic measures on graphs constructed from movies and terrorism data.

Proceedings ArticleDOI
K. Hildrum, Philip S. Yu
27 Nov 2005
TL;DR: Focused search allows for a much more scalable algorithm in which the time depends only on the size of the community, and not on the number of nodes in the graph, and so is scalable to arbitrarily large graphs.
Abstract: We present a new approach to community discovery. Community discovery usually partitions the graph into communities or clusters. Focused community discovery allows the searcher to specify start points of interest, and find the community of those points. Focused search allows for a much more scalable algorithm in which the time depends only on the size of the community, and not on the number of nodes in the graph, and so is scalable to arbitrarily large graphs. Furthermore, our algorithm is robust to imperfect data, such as extra or missing edges in the graph. We show the effectiveness of our algorithm on both synthetic graphs and the real-life LiveJournal friends graph, a publicly available social network consisting of over two million users and 13 million edges.

Patent
09 Feb 2005
TL;DR: In this article, a system and method for communicating 3D branch graph data and updates to branch graph data between clients and a display server in a 3D window system is presented.
Abstract: A system and method for communicating 3D branch graph data and updates to branch graph data between clients and a display server in a 3D window system. A client locally creates a branch graph. When the client is ready to make the branch graph live remote, it sends the branch graph to the display server using at least one batch protocol request. The display server builds a copy of the branch graph and attaches it to a centralized scene graph that it manages. The client may subsequently induce detachment of the branch graph from the scene graph. The client may buffer up changes to the local branch graph when its remote counterpart (in the display server) is not attached to the scene graph. The buffered changes may be sent to the display server using at least one batch protocol request when the client is again ready to make the branch graph live remote.

Proceedings Article
30 Aug 2005
TL;DR: The increasing importance of top-k queries warrants an efficient support of ranking in the relational database management system (RDBMS) and has recently gained the attention of the research community.
Abstract: Ranking queries (or top-k queries) are dominant in many emerging applications, e.g., similarity queries in multimedia databases, searching Web databases, middleware, and data mining. The increasing importance of top-k queries warrants an efficient support of ranking in the relational database management system (RDBMS) and has recently gained the attention of the research community. Top-k queries aim at providing only the top k query results, according to a user-specified ranking function, which in many cases is an aggregate of multiple criteria. The following is an example top-k query.
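As an illustration of the idea (not the example from the paper, and not an RDBMS implementation), a top-k query can be sketched in Python as selecting the k rows with the highest value of a user-specified aggregate scoring function. The `top_k` helper, the field names `quality` and `proximity`, and the weights are all hypothetical.

```python
import heapq


def top_k(rows, k, score):
    """Return the k rows with the highest score, best first.

    `score` is the user-specified ranking function; heapq.nlargest
    avoids fully sorting `rows` when k is small.
    """
    return heapq.nlargest(k, rows, key=score)


def weighted_score(row):
    """Hypothetical aggregate of two criteria with fixed weights."""
    return 2 * row["quality"] + row["proximity"]


# Hypothetical data set for illustration only.
items = [
    {"name": "A", "quality": 9, "proximity": 2},
    {"name": "B", "quality": 5, "proximity": 9},
    {"name": "C", "quality": 7, "proximity": 7},
    {"name": "D", "quality": 3, "proximity": 3},
]

best_two = top_k(items, 2, weighted_score)
```

Here the scores are 20, 19, 21 and 9 for A to D, so `best_two` holds rows C and A in that order. A database engine would aim to reach the same answer without scoring every row, which is the efficiency concern the abstract raises.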

Proceedings ArticleDOI
06 Jul 2005
TL;DR: Graph theory and authoritative sources identification techniques are employed, augmented with visualization tools, to discover critical research focuses from the citation graph to locate important literature in various fields.
Abstract: Citation analysis has been applied in various contexts for different purposes such as impact factor estimation, co-citation pattern analysis, community partitioning, and knowledge visualization. We employed graph theory and authoritative sources identification techniques, augmented with visualization tools, to discover critical research focuses from the citation graph. The citation graph was built from data retrieved from the CiteSeer database via a querying robot. Two experiments were carried out to identify important research focuses from the citation graph with promising results. Established research focuses as well as new research focuses were successfully identified by the method we proposed and tried. Researchers new to a field may use this method to locate important literature in various fields, which in turn facilitates their learning and studying.

Proceedings ArticleDOI
Misako Suwa
31 Aug 2005
TL;DR: A new algorithm for separating a touching pair of digits by using the graph-representation of the pattern, which can segment not only simply connected cases but also multiply connected ones.
Abstract: This paper proposes a new algorithm for separating a touching pair of digits by using the graph-representation of the pattern. The segmentation can be regarded as grouping these edges and vertices into two disconnected sub-graphs. This process is executed by applying graph theory methods and certain heuristic rules. Since the boundaries of patterns are determined along the edges, the shapes of the segmented digits can be restored with high quality. The algorithm can segment not only simply connected cases but also multiply connected ones. The results of the performance evaluation using the NIST database are also presented.

Patent
Gina Venolia
05 Jul 2005
TL;DR: Software development items can be represented in a graph data structure, and relationships between the represented items can be detected and reflected in the graph.
Abstract: Software development items can be represented in a graph data structure. Relationships between the represented items can be detected and reflected in the graph data structure. Queries can be run against the data structure to determine which software development items are related to each other. Implicit query can be implemented in a software development context. A graph browser can present panes showing related items.
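A minimal sketch of the general idea, assuming an undirected graph whose nodes are software development items and whose edges carry a relationship label. The class `ItemGraph`, its methods, and the item identifiers are hypothetical and are not taken from the patent.

```python
from collections import deque


class ItemGraph:
    """Undirected graph of software development items (hypothetical API)."""

    def __init__(self):
        self._adj = {}  # item -> {neighbour: relationship label}

    def relate(self, a, b, kind):
        """Record a detected relationship of type `kind` between two items."""
        self._adj.setdefault(a, {})[b] = kind
        self._adj.setdefault(b, {})[a] = kind

    def related(self, item, max_hops=1):
        """Return {related item: hop distance} within `max_hops` of `item`."""
        dist = {item: 0}
        queue = deque([item])
        while queue:
            cur = queue.popleft()
            if dist[cur] == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self._adj.get(cur, {}):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        del dist[item]  # the query item is not related to itself
        return dist


graph = ItemGraph()
graph.relate("bug:1", "file:a.py", "mentions")
graph.relate("file:a.py", "checkin:7", "modified_by")
```

With this data, `graph.related("bug:1")` returns only the directly linked file, while `graph.related("bug:1", max_hops=2)` also reaches the check-in through that file. A breadth-first search is used here only because it is a simple way to answer "which items are related"; the patent abstract does not name a query strategy.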