
A Query Language for Analyzing Networks
Anton Dries Siegfried Nijssen Luc De Raedt
K.U.Leuven, Celestijnenlaan 200A, Leuven, Belgium
{anton.dries,siegfried.nijssen,luc.deraedt}@cs.kuleuven.be
ABSTRACT
With more and more large networks becoming available, mining and querying such networks are increasingly important tasks that are not well supported by existing database models and query languages. This paper aims to alleviate this situation by proposing a data model and a query language for facilitating the analysis of networks. Key features include support for executing external tools on the networks, flexible contexts on the network, each resulting in a different graph, primitives for querying subgraphs (including paths), and primitives for transforming graphs. The data model provides a closure property: the output of every query can be stored in the database and used for further querying.
Categories and Subject Descriptors
H.2.1 [Database Management]: Logical Design—Data
Models
General Terms
Design, Theory
Keywords
Graph databases, Inductive databases, Data mining
1. INTRODUCTION
In many applications it is increasingly common to represent data as a large graph or network: examples include social networks, bibliographic networks and biological networks. Modeling the data as graphs is convenient in these applications as traditional graph-based concepts, such as paths, cliques, node degrees, edge degrees, and so on, are useful in their analysis. An increasing number of tools have been developed that operate on graphs, such as algorithms for finding cliques [21], important nodes [4, 18], important connections [18], or methods for classifying [18] or clustering nodes [12]. The main challenge in these applications is often
to analyze the network in order to discover new knowledge
and use that knowledge to improve the network.
Discovering new knowledge in databases (also known as
KDD) typically involves a process in which multiple oper-
ations are repeatedly performed on the data, and as new
insights are gained, the data is being transformed and ex-
tended. As one example consider a bibliographical network
with authors, papers, and citations such as that gathered by
Citeseer or Google Scholar. Important problems in such bib-
liographical networks include: entity resolution [3], which is
concerned with detecting which nodes (authors or papers)
refer to the same entity, and collective classification [18],
where the task is to categorize papers according to their
subject. This type of analysis typically requires one to call
multiple tools and to perform a lot of transformations on
the data. The key contribution of this paper is a new data
model and accompanying query language that is meant to
support such processes. This is in line with the work on in-
ductive databases, which provide an integration of databases
and data mining. Support for data mining can be realized by
extending database languages such as SQL with primitives
that integrate data mining deeply in the query language, cf.
M-SQL [10], MineRule [16], DMQL1 [7] and SiQL [20], or
by support for executing algorithms more as black boxes,
cf. SINDBAD [20] or Oracle’s Data Mining Extensions. All
these inductive database approaches are extensions of the
relational model (and often also of SQL). To the best of our
knowledge, there are no inductive database languages yet
that specifically target network data. It is also clear that for
networked data directly applying the relational model is not
an option. The reason is that the relational model does not
support any data structures or operations related to graphs
(such as subgraph matching or dealing with paths). In ad-
dition, the existing inductive database extensions typically
only operate on a single table in a relational database and
it is unclear how to apply this to networked data.
To deal with graphs, a number of graph databases, with
corresponding query languages, have already been proposed
[2]. These graph databases, however, lack some of the func-
tionality that is needed in an inductive database. Before
being able to apply a data mining algorithm on some data,
the data must be in the correct format and must contain the
relevant descriptors. Therefore, it is essential that an inductive database supports pre-processing of the data it contains and also accommodates multiple views on the same database, allowing the database to be treated differently in one context than in another. Our database model and language naturally support this requirement by employing a uniform representation of nodes and edges, which allows one to easily define different contexts on the network. Each context
corresponds to a particular subnetwork. Using our uniform
representation it becomes easy to, for instance, swap the
roles of the nodes and the edges or realize other kinds of
transformations. Our model also supports the closure prop-
erty as it is essential that the user be able to execute a tool
on the output of another tool. Therefore, each query starts
from a network and produces an output that can be con-
sidered part of a network as well. The database is also not
focused on one particular type of graph, but is able to work
with directed, undirected, weighted or unweighted graphs,
or hypergraphs and to easily switch between these models.
To deal with these challenges, we contribute a novel data
model in which the database can be considered one network,
that is, a large graph. We propose a query language, called
BiQL, based on the well-known SQL query language that
can be used to perform basic operations on the network to
support the manipulation and transformation of the net-
work. The language that we introduce provides the ability
to call external data mining tools in a way similar to com-
mercial systems (for instance, the Data Mining Extensions of
Oracle) or research systems (for instance, SINDBAD [20]).
It does not incorporate data mining primitives more deeply
as it is still unclear what these primitives should look like in
general, even in the case of a dataset consisting of a single
table in a relational database: for instance, MineRule [16]
and DMQL1 [7] focus on a small number of data mining
operations.
The outline of this paper is as follows. In Section 2 we
provide a set of queries that illustrate common queries in a
data analysis setting. In Section 3 we summarize the type
of functionality that is required to support these queries and
the functionality required for supporting data mining. We
provide a summary of a literature study which shows that
existing graph databases are lacking. In Section 4 we pro-
pose our data model, which includes a structural part, an
integrity part and a basic manipulation part. We study the
operational semantics and relationships of our model to the
relational model in Section 5. Extensions of the query lan-
guage are discussed in Section 6. In Section 7 we show that
we can indeed express the queries we wish to support. In
Section 8 we conclude.
2. EXAMPLE QUERIES
Typical applications in network analysis include biblio-
graphical and biological networks. In bibliographical net-
works one typically has entities such as Authors, Papers and
possibly Venues. Furthermore, there are relationships such
as an author having authored a paper, a paper citing an-
other paper, and a paper being published in a venue. Stan-
dard examples of databases in this domain include Citeseer,
DBLP and Google Scholar. Similarly, in biological networks,
there are entities such as Proteins, Tissues and Genes; re-
lationships of interest include proteins being expressed in
tissues, genes coding for proteins, and proteins interacting
with other proteins. The mining of such networks (and many
others, such as social networks and the Internet) is a chal-
lenging and important task. It also imposes new demands
on the underlying database model, which are well illustrated
by the following queries and operations, set in the context
of bibliographic datasets.
Figure 1: An incomplete citation graph, where the locations of citations within papers are indicated; gray circles indicate clusters of citations we wish to find.
Example 1. Can we define a co-authorship relation and
a citation relation, which states which authors cite each other?
The main feature of this query is that it adds additional
edges in the graph.
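As an illustration only (the paper expresses such queries in BiQL in Section 7, which is outside this excerpt), the derivation asked for here can be sketched in a few lines of Python; the relation name authorship and the toy tuples are assumptions made for the example.

from itertools import combinations

# Hypothetical input: the authored-by relation as (author, paper) pairs.
authorship = [("ann", "p1"), ("bob", "p1"), ("bob", "p2"), ("carl", "p2")]

# Group the authors of each paper.
authors_of = {}
for author, paper in authorship:
    authors_of.setdefault(paper, set()).add(author)

# Derive the co-authorship relation: one edge per pair of authors sharing a paper.
coauthors = {pair for authors in authors_of.values()
             for pair in combinations(sorted(authors), 2)}
print(coauthors)  # {('ann', 'bob'), ('bob', 'carl')}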
Example 2. Can we find occurrences of the following pat-
tern: author X cites author Y, which cites author X, possibly
indirectly?
In this query we are provided with a graph pattern, which
we wish to match with the data.
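One hedged, language-neutral reading of this pattern is that author X can reach author Y and Y can reach X in an author-level citation graph. A minimal Python sketch (the dictionary cites and its contents are assumptions for illustration):

from collections import deque

def reaches(cites, src, dst):
    # Breadth-first search over a dict mapping each author to the authors they cite.
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in cites.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical data: X cites Z, Z cites Y, and Y cites X back directly.
cites = {"X": {"Z"}, "Z": {"Y"}, "Y": {"X"}}
print(reaches(cites, "X", "Y") and reaches(cites, "Y", "X"))  # True: X and Y cite each other, indirectly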
Example 3. Assume given a tool for computing quasi-cliques in a graph (for instance [21]). Can we find quasi-cliques in the co-authorship graph?
This query applies an external tool on a graph extracted
from the database.
Example 4. Can we find the influence graph of a paper, that is, the set of papers that have been influenced by it: the recursive closure of papers referencing the paper?
In this query we are provided with a complicated graph pat-
tern, which we wish to use to group related nodes together.
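The recursive closure mentioned here is a plain transitive closure over the citation relation; a small Python sketch (cited_by and the paper identifiers are assumptions) makes the intended semantics concrete:

def influence_graph(cited_by, paper):
    # Collect every paper reachable from `paper` via the "is cited by" relation.
    influenced, frontier = set(), {paper}
    while frontier:
        frontier = {q for p in frontier for q in cited_by.get(p, ()) if q not in influenced}
        influenced |= frontier
    return influenced

# Hypothetical data: p2 and p3 cite p1, and p4 cites p3.
cited_by = {"p1": {"p2", "p3"}, "p3": {"p4"}}
print(influence_graph(cited_by, "p1"))  # {'p2', 'p3', 'p4'}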
Example 5. Consider the following problem. In each paper, multiple other papers are cited; each citation occurs at a certain position (for instance, one paper is cited in the introduction, while another is cited in the conclusions). It is reasonable to assume that papers which are cited close to each other are related to each other. Can we cluster the citations in every paper?
An example is given in Figure 1. One possible way to answer this query is by using a graph clustering algorithm [12], which clusters nodes in a graph. In this case we would like to cluster the citations of every paper, and hence, we need to treat the citations as nodes. To deal with this query we need a uniform representation of nodes and edges, as this provides the flexibility to specify which objects are considered to be edges, and which are considered to be nodes.
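For concreteness, the paper would hand this task to a graph clustering algorithm [12]; the following much simpler Python sketch only illustrates the task itself, clustering the citations of one paper by proximity of their positions (the positions, the identifiers and the max_gap threshold are assumptions):

def cluster_citations(positions, max_gap=1.0):
    # positions: citation id -> position of the citation within the paper.
    # Citations within `max_gap` of the previous citation share a cluster.
    if not positions:
        return []
    order = sorted(positions, key=positions.get)
    clusters, current = [], [order[0]]
    for cid in order[1:]:
        if positions[cid] - positions[current[-1]] <= max_gap:
            current.append(cid)
        else:
            clusters.append(current)
            current = [cid]
    clusters.append(current)
    return clusters

# Hypothetical positions of five citations in a single paper.
print(cluster_citations({"c1": 1.0, "c2": 1.5, "c3": 2.0, "c4": 7.0, "c5": 7.2}))
# [['c1', 'c2', 'c3'], ['c4', 'c5']]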
3. REQUIREMENTS & RELATED WORK
3.1 Requirements and Design Choices
The main motivation and target application for our data
model and query language is supporting exploratory data
analysis on networked data. This results in the following
requirements and design choices.

Small is beautiful.
In line with the relational database model, we believe that
the data model should have the smallest possible number of
concepts and primitives necessary for the representation and
manipulation of the data. This property has to a large ex-
tent facilitated the development of the theory and the imple-
mentations of relational databases. This is a primary design
principle that we used throughout the development of our
data model.
As a consequence, we do not wish to introduce special
language constructs to deal with complicated types of net-
works (directed, undirected, labeled, hypergraphs, etc.) or
sets of graphs; we do not wish to treat attributes of edges
differently than attributes of nodes. When introducing a
novel type of graph, it should not be necessary to extend
the basic data types.
Uniform representation of nodes and edges.
The most immediate consequence of the former choice is
that we wish edges and nodes to be represented in a uniform
way. We will do this by representing both edges and nodes
as objects that are linked together by links that have no
specific semantics. A uniform representation, however, not only avoids having to introduce special syntax for different types of graphs or attributes, it also allows one to easily generate different views on a network. For instance, in a bibliographic database, we may have objects such as papers, authors and citations. In one context one could analyze the co-author relationship, in which case the authors are viewed as nodes and the papers as edges, while in another context one could be more interested in citation analysis, in which case the papers are the nodes and the citations the edges. Another example in which this interchangeability is important is given in Example 5 of the previous section. The power of the uniform representation is hence also that it enables us to keep the underlying low-level network the same; only the interpretation of this network differs.
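To make the two views in this paragraph concrete, here is a small Python sketch (not BiQL; the object names, the link set and the helper edges are all assumptions): the same low-level links yield a co-author view when papers act as edges, and a citation view when citation objects act as edges.

# One low-level object/link structure, two interpretations.
links = {("p1", "ann"), ("p1", "bob"),   # paper p1 is linked to its authors
         ("p2", "bob"), ("p2", "carl"),
         ("c1", "p1"), ("c1", "p2")}     # citation object c1 links paper p1 to paper p2

def edges(edge_objects, node_objects):
    # Interpret each object in `edge_objects` as an edge between the node objects it is linked to.
    return {e: sorted(n for (a, n) in links if a == e and n in node_objects)
            for e in edge_objects}

print(edges({"p1", "p2"}, {"ann", "bob", "carl"}))  # co-author view: papers act as edges
print(edges({"c1"}, {"p1", "p2"}))                  # citation view: papers act as nodes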
Flexible Contexts.
In order to support the knowledge discovery process, the user must be given the ability to extract different parts of the network and to transform them into the particular data format or graph model that the external data mining algorithms need. This explains why we introduce the notion of contexts and classes. Contexts define the subgraphs of interest that can be used by, for example, graph mining algorithms or visualization tools. Contexts allow the user to extract information and present it as, for example, a traditional edge-labeled graph by selecting which objects should act as edges and which ones as nodes. Class definitions can also be used to create hypergraphs, sets of graphs, directed or undirected graphs, as well as other representations of the network. Using contexts is beneficial because it allows us to combine a general, application-independent underlying data structure with the most natural, possibly specific graph representations required for each application individually.
External calls.
While in, for example, relational databases there exists
a set of core primitives in which all database operations of
interest can be expressed, such a set of primitives is not
yet available for data mining. This is why we have chosen
to incorporate data mining by providing support for calling
external procedures. The inputs to these external tools can
be any type of network that, in principle, can be passed on
as a context to the external tool.
Closure property.
Not only the inputs to data analysis algorithms matter
but also the outputs. Therefore, it is essential that the re-
sult of any operation, query or external call can be used as
the starting point for further queries and operations. This
can be realized by enforcing a closure property which states
that the result of any operation or query generates infor-
mation that can be added to the database network (either
permanently or temporarily). The information created by
the query combined with the original database can there-
fore be queried again.
This closure property can be combined with contexts to
provide integration with data mining tools. We can for ex-
ample transform part of our network into a set of graphs
in which we want to look for frequent patterns [11]. The
results of the frequent pattern miner can then be added to
the database network and used in subsequent queries. By
integrating existing tools for graph mining and data min-
ing the system becomes a powerful inductive database for
information networks.
SQL-based.
To represent queries, a language is needed. There are
many possible languages that could be taken as starting
point, such as SQL, relational algebra or Datalog. Our ap-
proach is similar to that of the relational model: we aimed
for a data model on which multiple equivalent ways to rep-
resent queries can be envisioned. Therefore, we employ an
SQL-like language, but we will also show how to represent
queries in a small extension of Datalog. In the end, this
similarity to the relational model, both in the choice for a
data model with a small number of types, and in the range
of query languages feasible on top of it, should make the
model more convenient for the many users familiar with the
relational model.
Semi-structured data.
Data and information in many networks of interest come from a wide variety of sources and are often heterogeneous in nature. Hence, it is impractical to require that the user formally defines the database schema describing the structure of the database. Our model therefore does not impose this requirement but is based instead on a semi-structured data model that supports working with heterogeneous data in a flexible way.
3.2 Related work
A number of query languages for graph databases have
been proposed, many of which have been described in a
recent survey [2]. However, none of these languages was
designed for supporting knowledge discovery processes and
each language satisfies at most a few of the above mentioned
properties. For instance, GraphDB [5] and GOQL [19] are
based on an object-oriented approach, with provisions for
specific types of objects for use in networks such as nodes,
edges and paths. This corresponds to a more structured
data model that does not uniformly represent nodes and
edges. In addition, these languages target other applica-
tions: GraphDB has a strong focus on representing spatially
embedded networks such as highway systems or power lines,
while GOQL [19], which extends the Object Query Language

(OQL), is meant for querying and traversing paths in small
multimedia presentation graphs. Both languages devote a
lot of attention to querying and manipulating paths: for
example, GraphDB supports regular expressions and path
rewriting operations.
GraphQL [8] provides a query language that is based on formal languages for strings. It provides an easy, yet powerful way of specifying graph patterns based on graph structure and node and edge attributes. In this model graphs are the basic unit, and graph-specific optimizations for graph structure queries are proposed. The main objective of this language is to be general and to work well on both large sets of small graphs and small sets of large graphs. However, extending existing graphs is not possible in this language, and flexible contexts are not supported.
PQL [13] is an SQL-based query language focused on querying biological pathway data. It is mainly aimed at finding paths in these graphs and provides a special path expression syntax to this end. The expressivity of this language is, however, limited and it has no support for complex graph operations.
GOOD [6] was one of the first systems that used graphs
as its underlying representation. Its main focus was on the
development of a database system that could be used in
a graphical interface. To this end it defines a graphical
transformation language, which provides limited support for
graph pattern queries. This system forms the basis of a large
group of other graph-oriented object data models such as
Gram [1] and GDM [9].
Hypernode [14] uses a representation based on hypernodes, which make it possible to embed graphs as nodes in other graphs. This recursive nature makes them very well suited for representing arbitrarily complex objects, for example as the underlying structure of an object database. However, the data model is significantly different from a traditional network structure, which makes it less suitable for modeling information networks as encountered in data mining.
A similar, but slightly less powerful representation based
on hypergraphs is used in GROOVY [15]. This system is
primarily intended as an object-oriented data model using
hypergraphs as its formal model. It has no support for graph
specific queries and operations.
More recently, approaches based on XML and RDF, such as SPARQL [17], are being developed. They use a semi-structured data model to query graph networks in heterogeneous web environments; support for creating new nodes and flexible contexts is not provided.
While most of the systems discussed here use a graph-
based data model and are capable of representing complex
forms of information, none of them uses a uniform repre-
sentation of edges and nodes (and its resulting flexible con-
texts), nor supports integration of KDD tools.
4. DATA MODEL
Our data model consists of several parts: (1) the structural
part of the data model; (2) the manipulation part of the data
model; (3) the integrity part of the data model.
4.1 Data Structures
The main choice we have to make in our data model is based on reconciling two requirements:
• representing a large number of graph and edge types;
• supporting graph theoretic concepts, such as paths or subgraphs.
Addressing these requirements, the data structure that we propose consists of the following components.
The object store, which contains all objects in a database. Objects are uniquely identified by an object identifier. Each object can contain an arbitrary list of attribute-value pairs describing its features.
The link store, which contains directed links between objects. They can be represented as (ordered) pairs of object identifiers, and do not have any attributes.
The domain store, which contains named sets of objects. A domain name allows users to identify a set of objects.
The context store, which contains named sets of domain names. A context name allows users to identify a set of domains in a query. Each domain in a context is given a name within that context. Hence, each context consists of λ1 ↦ λ2 pairs, where λ2 is a name occurring in the domain store, and λ1 is the name given to this domain within the context. Optionally a context is assigned to a class (see Section 4.3).
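As a rough illustration of these four components (not the paper's implementation; all names and layouts here are assumptions), an in-memory sketch in Python could look as follows. The later sketches populate these structures with the data of Tables 1 and 2.

# A minimal in-memory sketch of the four stores.
object_store = {}    # object id -> dict of attribute-value pairs
link_store = set()   # set of (object id, object id) pairs; links carry no attributes
domain_store = {}    # domain name -> set of object ids
context_store = {}   # context name -> dict mapping a local name to a domain name
                     # (the optional class of a context is omitted here)

def linked(a, b):
    # The basic operation mentioned in the text: test whether a link exists between two objects.
    return (a, b) in link_store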
The main design choice in this data structure is not to allow
attributes on links. This ensures that within our data model
links are very light and implicit. Between every pair of ob-
jects a link may or may not exist, but we do not specify how
links are stored. In our query language, a basic operation is
to check if a link exists between two objects.
Domains are used to group nodes of a certain type to-
gether, such as authors or papers in our example. Whereas
nodes do not have names that can be used in the query sys-
tem, domains have names that can be used. Domains are
grouped in contexts; domains can be in an arbitrary number
of contexts.
One may think of the objects as nodes in a graph, and of the links as unlabeled binary edges between these nodes. However, this raises the question of how we represent edge-labeled graphs or hypergraphs. This is clarified in the following example.
Example 6 (Edge Labeled Graph). Assume given the objects and links in Table 1, belonging to domains X and Y, together constituting the context G. Then we can visualize context G as given in Figure 2. In this example, one may think of the nodes A, ..., E as authors, and of the edges as expressing the strength of co-authorships.
Hence, the main choice that we have made is, in a sense, that also edges are represented as objects. An edge object is linked to the nodes it connects. Even though this may not seem intuitive, or could seem a bloated representation, the advantages of this choice outweigh the disadvantages because:
• by treating both edges and nodes as objects, we obtain simplicity and uniformity in dealing with attributes;
• it is straightforward to treat (hyper)edges as nodes (or nodes as (hyper)edges);
• it is straightforward to link two edges, for instance, when one wishes to express a similarity relationship between two edges.

obj-id features
X1 {label=A, color=red}
X2 {label=B, color=yellow}
X3 {label=C, color=blue}
X4 {label=D, color=yellow}
X5 {label=E, color=red}
Y1 {weight=0.2}
Y2 {weight=0.5}
Y3 {weight=0.8}
Y4 {weight=0.2}
Y5 {weight=0.9}
(a) Object store
Y1 X1
Y1 X2
Y2 X1
Y2 X3
Y3 X2
Y3 X3
Y4 X3
Y4 X4
Y5 X3
Y5 X5
(b) Link store
name objects
X { X1, X2, X3, X4, X5 }
Y { Y1, Y2, Y3, Y4, Y5 }
(c) Domain store
name  domains                       type
G     { Nodes ↦ X, Edges ↦ Y }      labeled graph
(d) Context store
Table 1: Example database
Figure 2: A visualization of context G in the example database in Table 1, where we use domain X as nodes and domain Y as edges.
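Continuing the illustrative Python sketch from Section 4.1 (an assumption about one possible in-memory layout, not BiQL itself), the stores can be populated with the data of Table 1 and context G can then be read back as the edge-labeled graph of Figure 2: each Y object is an edge whose endpoints are the X objects it is linked to.

object_store = {
    "X1": {"label": "A", "color": "red"},    "X2": {"label": "B", "color": "yellow"},
    "X3": {"label": "C", "color": "blue"},   "X4": {"label": "D", "color": "yellow"},
    "X5": {"label": "E", "color": "red"},
    "Y1": {"weight": 0.2}, "Y2": {"weight": 0.5}, "Y3": {"weight": 0.8},
    "Y4": {"weight": 0.2}, "Y5": {"weight": 0.9},
}
link_store = {("Y1", "X1"), ("Y1", "X2"), ("Y2", "X1"), ("Y2", "X3"), ("Y3", "X2"),
              ("Y3", "X3"), ("Y4", "X3"), ("Y4", "X4"), ("Y5", "X3"), ("Y5", "X5")}
domain_store = {"X": {"X1", "X2", "X3", "X4", "X5"},
                "Y": {"Y1", "Y2", "Y3", "Y4", "Y5"}}
context_store = {"G": {"Nodes": "X", "Edges": "Y"}}

def as_labeled_graph(context):
    # Interpret the objects of the Edges domain as edges between the objects
    # of the Nodes domain that they are linked to.
    nodes = domain_store[context_store[context]["Nodes"]]
    edges = domain_store[context_store[context]["Edges"]]
    return [(sorted(x for (e2, x) in link_store if e2 == e and x in nodes),
             object_store[e].get("weight"))
            for e in sorted(edges)]

print(as_labeled_graph("G"))
# [(['X1', 'X2'], 0.2), (['X1', 'X3'], 0.5), (['X2', 'X3'], 0.8), (['X3', 'X4'], 0.2), (['X3', 'X5'], 0.9)]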
This flexibility is further illustrated in the following ex-
ample.
Example 7 (Edge Labeled Graph Set). Assume given
in addition to the objects and links in Table 1 the objects and
links in Table 2, which are part of domain Z and are used
to define context S. Then we can visualize this context S as
given in Figure 3. In this example, both Z1 and Z2 can be
thought of as identifying subgraphs of graph G in Figure 2.
4.2 Data Manipulation
Now that we have a basic understanding of how the data is organized in the database, we can focus on querying this information. In this section we introduce the main components of the BiQL query language, which allows for querying using an SQL-like notation; we will show the relationship of BiQL to the traditional relational model, including a more detailed discussion of its operational semantics, in Section 5.
To store the result of a query as a set of domains, we
use the CREATE statement preceding the query. In general a
query looks like this.
CREATE <names of new domains> AS
SELECT <definitions of domains>
FROM <selection from domains>
WHERE <predicate on attributes of objects>
obj-id features
Z1 {name=part1}
Z2 {name=part2}
(a) Object store
name objects
Z { Z1, Z2 }
(b) Domain store
Z1 X1
Z1 X2
Z1 X3
Z1 Y1
Z1 Y2
Z1 Y3
Z2 X2
Z2 X3
Z2 X4
Z2 X5
Z2 Y3
Z2 Y4
Z2 Y5
(c) Link store
name  domains                                   type
S     { Nodes ↦ X, Edges ↦ Y, Parts ↦ Z }       labeled graphset
(d) Context store
Table 2: Example database; only elements additional to Table 1 are listed.
Figure 3: The example database in Table 2 conceived as a graph set.
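As with the earlier sketches, a hedged Python illustration (identifiers and layout are assumptions) shows how the Parts domain Z of Table 2 turns context S into a set of graphs: each Z object simply collects the node and edge objects of one subgraph.

z_links = {("Z1", "X1"), ("Z1", "X2"), ("Z1", "X3"),
           ("Z1", "Y1"), ("Z1", "Y2"), ("Z1", "Y3"),
           ("Z2", "X2"), ("Z2", "X3"), ("Z2", "X4"), ("Z2", "X5"),
           ("Z2", "Y3"), ("Z2", "Y4"), ("Z2", "Y5")}

def subgraph(part):
    # Objects linked to `part`, split into node objects (X*) and edge objects (Y*).
    members = {x for (p, x) in z_links if p == part}
    nodes = sorted(m for m in members if m.startswith("X"))
    edges = sorted(m for m in members if m.startswith("Y"))
    return nodes, edges

for part in ("Z1", "Z2"):
    print(part, subgraph(part))
# Z1 -> nodes X1-X3 with edges Y1-Y3; Z2 -> nodes X2-X5 with edges Y3-Y5 (cf. Figure 3).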
A query can be used to define multiple new domains at once by listing multiple names and definitions in the CREATE and SELECT statements, respectively. However, in this paper we focus on the basic case in which a query creates a single domain.
A simple example of such a query is this:
CREATE Y’ AS
SELECT E
FROM Y E
WHERE E.weight > 0.4
This statement creates a new domain Y’; the objects that
are inserted in this domain are obtained by letting a variable
E range over the objects in domain Y; those objects which
have a weight attribute with a value higher than 0.4 are
inserted.
For Y defined as in Table 1, the resulting domain contains
the following set of identifiers.
Y' = {Y2, Y3, Y5}
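A hedged sketch of how such a statement could be evaluated over the stores (again in Python, not an account of the actual BiQL implementation): let E range over domain Y, keep the objects whose weight exceeds 0.4, and, following the closure property, store the surviving identifiers as the new domain Y' so that later queries can use it.

object_store = {"Y1": {"weight": 0.2}, "Y2": {"weight": 0.5}, "Y3": {"weight": 0.8},
                "Y4": {"weight": 0.2}, "Y5": {"weight": 0.9}}
domain_store = {"Y": {"Y1", "Y2", "Y3", "Y4", "Y5"}}

# CREATE Y' AS SELECT E FROM Y E WHERE E.weight > 0.4
domain_store["Y'"] = {e for e in domain_store["Y"]
                      if object_store[e].get("weight", 0) > 0.4}
print(sorted(domain_store["Y'"]))  # ['Y2', 'Y3', 'Y5']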
We can define a new context based on this domain using the following statement.
CREATE G’ AS INSTANCE labeled_graph
WITH X as Nodes, Y’ as Edges
Figure 4 illustrates this for the graphs from Figures 2 and
3.
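In terms of the sketch above, this CREATE ... AS INSTANCE statement only needs to touch the context store: the existing domains X and Y' are given roles within a new context G' (checking that the result really is a labeled graph is the job of the class and integrity machinery of Section 4.3). Again, this is an illustrative assumption, not the paper's implementation.

context_store = {"G": {"Nodes": "X", "Edges": "Y"}}

# CREATE G' AS INSTANCE labeled_graph WITH X as Nodes, Y' as Edges
context_store["G'"] = {"Nodes": "X", "Edges": "Y'"}
print(context_store["G'"])  # {'Nodes': 'X', 'Edges': "Y'"}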
Below we provide more extensive details for the FROM,
WHERE and SELECT parts of a query.
