Expressive Languages for Path Queries over
Graph-Structured Data
Pablo Barceló
Dept. of Computer Science, Univ. of Chile
pbarcelo@dcc.uchile.cl
Carlos Hurtado
Fac. Ingeniería y Ciencias, Univ. A. Ibañez
carlos.hurtado@uai.cl
Leonid Libkin
Sch. of Informatics, Univ. of Edinburgh
libkin@inf.ed.ac.uk
Peter Wood
Dept. of CS and Inf. Syst., Birkbeck, U. London
ptw@dcs.bbk.ac.uk
ABSTRACT
For many problems arising in the setting of graph querying (such as finding semantic associations in RDF graphs, exact and approximate pattern matching, sequence alignment, etc.), the power of standard languages such as the widely studied conjunctive regular path queries (CRPQs) is insufficient in at least two ways. First, they cannot output paths and second, more crucially, they cannot express relations among paths. We thus propose a class of extended CRPQs, called ECRPQs, which add regular relations on tuples of paths, and allow path variables in the heads of queries. We provide several examples of their usefulness in querying graph structured data, and study their properties. We analyze query evaluation and representation of tuples of paths in the output by means of automata. We present a detailed analysis of data and combined complexity of queries, and consider restrictions that lower the complexity of ECRPQs to that of relational conjunctive queries. We study the containment problem, and look at further extensions with first-order features, and with non-regular relations that express arithmetic properties of paths, based on the lengths and numbers of occurrences of labels.
Categories and Subject Descriptors. H.2.1 [Database
Management]: Logical Design—Data Models; F.1.1
[Computation by abstract devices]: Models of
Computation—Automata
General Terms. Theory, Languages, Algorithms
Keywords. Graph databases, conjunctive queries, regular relations, regular path queries
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PODS’10, June 6–11, 2010, Indianapolis, Indiana, USA.
Copyright 2010 ACM 978-1-4503-0033-9/10/06 ...$5.00.
1. Introduction
For graph-structured data, queries that allow users to specify the types of paths in which they are interested have always played a central role. Most commonly, the specification of such paths has been by means of regular expressions over the alphabet of edge labels [2, 10, 13, 16, 29]. The output of a query is typically a set of tuples of nodes that are connected in some way by the paths specified. The canonical class of queries with this functionality are the conjunctive regular path queries (CRPQs), which have been the subject of much investigation, e.g. [10, 14, 16].
However, the rapid increase in the size and complexity of graph-structured data (e.g. in the Semantic Web, or in biological applications) has raised the need for additional functionality in query languages. Specifically, in many examples, the minimum requirements of sufficiently expressive queries are: (a) the ability to define complex semantic relationships between paths and (b) the ability to include paths in the output of the query. Neither of these is supported by CRPQs.
There are multiple examples of queries that require these new capabilities. For example, [5] introduces a query language for RDF/S in which paths can be compared based on specific semantic associations. In handling biological sequences one often needs to compare paths based on similarity (e.g., edit distance) [20]. Paths can be compared with respect to other parameters, e.g., lengths or numbers of occurrences of labels, which can be useful in route-finding applications [6]. As for the ability to output paths, this has been proposed, for example, as an extension to the SPARQL query language, the standard for retrieving RDF data [24]. However, [24] only proposed a declarative language, and left most basic questions unexplored (e.g., what should an output be if there are infinitely many paths between nodes?). Other applications for this new functionality include determining the provenance of data or artifacts [21], finding associations in linked data [27], biological data [26] or social (or criminal) networks [32], as well as performing semantic searches over web-derived knowledge [36].
While the need for the extended functionality of graph query languages is well-documented (and sometimes is even incorporated into a programming syntax), the basic theoretical properties of such languages are completely unexplored. We do not know whether queries can be meaningfully evaluated, what their complexity is, whether they can be optimized, etc.
Our main goals, therefore, are to formally define extensions of graph queries that can express complex semantic associations between paths and output paths to the user, and to study them, concentrating on query evaluation and its complexity, as well as some static analysis problems.
We work with the class of extended conjunctive regular path queries (or ECRPQs), which generalize CRPQs by allowing them to express the kind of semantic association properties we explained above. That is, we allow (i) $n$-tuples of path labels to be checked for conformity to $n$-ary path languages, and (ii) paths, rather than simply nodes, to be output. Conformity with respect to $n$-ary languages is given, following the idea behind CRPQs, with respect to $n$-ary regular relations.
As an example, consider a graph $G$ with a single edge label, defining the student-advisor relationship. Using CRPQs, one can express many queries, such as finding academic ancestors, or people whose sets of academic parents and grandparents intersect, or checking whether Van Gucht and Tannen have a common academic ancestor (and if so, who that person is). However, with CRPQs we cannot express queries asking for pairs of scientists who have the same-length path to Tarski, for example, nor can one ask for the precise paths by which Van Gucht and Tannen are related to their common academic ancestor. With ECRPQs, we can express such queries.
While leaving the above queries to the reader as an exercise, we now outline a few examples of problems where the power of ECRPQs is required. They will be fully developed in Section 3, after we have presented the syntax and semantics of ECRPQs.
(i) Pattern matching. Given an alphabet $\Sigma$ and a set of variables $\mathcal{V}$, a pattern is a string over $\Sigma \cup \mathcal{V}$. A pattern defines a pattern language by instantiating variables with strings in $\Sigma^*$. Pattern languages need not be context-free: e.g., the language of squared words over $\Sigma$ can be expressed by the pattern $XX$, where $X \in \mathcal{V}$. But finding nodes $x$ and $y$ connected by a path whose label is in the language of squared words can be expressed by the ECRPQ:
$$\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z), (z, \pi_2, y), \pi_1 = \pi_2$$
where $x$, $y$ and $z$ are node variables and $\pi_1$ and $\pi_2$ are path variables. Variables $z$, $\pi_1$, and $\pi_2$ are meant to be existentially quantified. What makes this different from CRPQs is the binary relation $\pi_1 = \pi_2$ on paths: it states that the labels of the paths between $x$ and an intermediate node $z$, and between $z$ and $y$, are the same.
(ii) Semantic web associations. In RDF/S, properties can be declared to be subproperties of other properties. This is used in [5] to define a notion of semantic association based on $\rho$-isomorphic property sequences: two sequences are $\rho$-isomorphic if they are of the same length and the properties at the same position in each sequence are subproperties of one another. Such pairs of sequences can be found by a modification of the previous query with a different binary relation expressing the fact that the paths are $\rho$-isomorphic.
(iii) Approximate matching. Approximate string matching [19, 23] and (biological) sequence alignment [20] are both based on the notion of edit distance. The relation representing pairs of sequences that have edit distance at most $k$ from one another, for some fixed $k$, is regular [18]. So given a graph representing a pair of sequences, an ECRPQ can determine whether they have edit distance at most $k$. We show in Section 3.1 that we can also output the actual gaps and mismatches in the sequences using an ECRPQ.
Outline of the results. After we formally define ECRPQs, we present an algorithm for query evaluation. It turns out that the sets of labels of paths satisfying a query are regular, and thus the evaluation algorithm constructs automata to represent such sets.
We then investigate the complexity of query evaluation. As yardsticks, we consider relational languages as well as CRPQs. For conjunctive queries, combined complexity is NP-complete, while it jumps to PSPACE-complete for relational calculus. Hence we cannot hope to get anything below NP for ECRPQs, and we hope not to exceed the complexity of relational queries in a reasonable class. As for data complexity, it is known to be NLOGSPACE-complete for CRPQs, so this will serve as another benchmark.
It turns out that the data complexity of ECRPQs matches that of CRPQs, but combined complexity goes up from NP to PSPACE, matching relational calculus instead. In this case it is natural to look for restrictions. A standard one for CQs is a restriction to acyclicity. This works for CRPQs (combined complexity becomes tractable) but does not work for ECRPQs, as the combined complexity remains PSPACE-complete. However, if our regular relations can only talk about lengths of paths, then the complexity of ECRPQs drops to NP, matching the complexity of the usual relational CQs.
We then look at extensions of CRPQs and ECRPQs: with negation and universal quantification, and with some non-regular relations. For the former, we get surprisingly reasonable bounds for CRPQs, but the complexity becomes too high when both negation and relations on paths are allowed. For the latter, we look at extensions with linear constraints on path lengths, and prove some good complexity bounds (tractable data complexity and NP combined complexity). We also look at relations that compare numbers of occurrences of labels in paths, and prove some low complexity bounds for queries with such relations.
While query containment is known to be decidable for CRPQs, we show that ECRPQs share more properties with full relational calculus: containment for them becomes undecidable. We recover decidability in one important subcase though.
Organization. In the next section, we present background material on graphs, regular relations and CRPQs. Section 3 introduces ECRPQs and looks at their applications in more detail. In Section 4, we consider the evaluation of ECRPQs. Section 5 deals with the data and combined complexity of ECRPQs. In Section 6 we look at query containment, and in Section 7 we consider extensions with negation, and with non-regular features.
2. Preliminaries
Labeled graphs and paths. Queries in our setting will be evaluated over labeled database graphs (db-graphs), which naturally model semistructured data. Formally, if $\Sigma$ is a finite alphabet, then a $\Sigma$-labeled db-graph $G$ (or simply db-graph if $\Sigma$ is clear from the context) is a pair $(V, E)$, such that $V$ is a finite set of nodes and $E \subseteq V \times \Sigma \times V$ is a set of directed edges labeled in $\Sigma$.
A path $\rho$ between nodes $v_0$ and $v_m$ in $G$ is a sequence $v_0 a_0 v_1 a_1 v_2 \cdots v_{m-1} a_{m-1} v_m$, where $m \geq 0$, so that all the $v_i$'s are in $V$, all the $a_j$'s are letters of $\Sigma$, and $(v_i, a_i, v_{i+1})$ is in $E$ for each $i < m$. The label of such a path $\rho$, denoted by $\lambda(\rho)$, is the string $a_0 \cdots a_{m-1} \in \Sigma^*$. We also define the empty path as $(v, \varepsilon, v)$ for each $v \in V$; the label of such a path is the empty string $\varepsilon$.
Note that a $\Sigma$-labeled db-graph $G$ can be naturally viewed as a nondeterministic finite automaton (NFA) over alphabet $\Sigma$ without initial and final states. Its states are nodes in $V$, and its transitions are edges in $E$. We use this equivalence in several constructions in the paper.
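The definitions of db-graph, path and label can be made concrete with a small Python sketch. The encoding is an assumption made for illustration only: a path is a list that alternates nodes and letters, and the names `is_path` and `label` are hypothetical helpers, not notation from the paper.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, letter, target node)


def is_path(V: Set[str], E: Set[Edge], path: List[str]) -> bool:
    """Check that path = [v0, a0, v1, ..., a_{m-1}, vm] is a path in (V, E)."""
    if len(path) % 2 == 0:
        return False  # a path must start and end with a node
    if any(v not in V for v in path[0::2]):
        return False
    # every consecutive (node, letter, node) triple must be an edge
    return all(
        (path[i], path[i + 1], path[i + 2]) in E
        for i in range(0, len(path) - 2, 2)
    )


def label(path: List[str]) -> str:
    """Return the label of a path: the concatenation of its letters."""
    return "".join(path[1::2])


V = {"u", "v", "w"}
E = {("u", "a", "v"), ("v", "b", "w"), ("w", "a", "u")}
rho = ["u", "a", "v", "b", "w"]
```

Under this encoding the empty path at a node `v` is the one-element list `[v]`, whose label is the empty string.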
Regular relations. As our plan is to extend the notion of recognizability from string languages to $n$-ary string relations, we now give the standard definition of regular relations over $\Sigma$ [15, 18, 8]. Let $\bot$ be a symbol not in $\Sigma$. We denote the extended alphabet $(\Sigma \cup \{\bot\})$ by $\Sigma_\bot$. Let $\bar{s} = (s_1, \ldots, s_n)$ be an $n$-tuple of strings over alphabet $\Sigma$. We construct a string $[\bar{s}]$ over alphabet $(\Sigma_\bot)^n$, whose length is the maximum of the lengths of the $s_j$'s, and whose $i$-th symbol is a tuple $(c_1, \ldots, c_n)$, where each $c_k$ is the $i$-th symbol of $s_k$, if the length of $s_k$ is at least $i$, or $\bot$ otherwise. In other words, we pad shorter strings with the symbol $\bot$, and then view the $n$ strings as one string over the alphabet of $n$-tuples of letters.
An $n$-ary relation $S$ on $\Sigma^*$ is regular, if the set $\{[\bar{s}] \mid \bar{s} \in S\}$ of strings over alphabet $(\Sigma_\bot)^n$ is regular (i.e., accepted by an automaton over $(\Sigma_\bot)^n$, or given by a regular expression over $(\Sigma_\bot)^n$). We shall often use the same letter for both a regular expression over $(\Sigma_\bot)^n$ and the relation over $\Sigma^*$ it denotes, as doing so will not lead to any ambiguity.
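The padding construction that turns a tuple of strings into a single string over tuples of letters can be sketched in a few lines of Python. The function name `convolve` and the constant `PAD` (standing in for the padding symbol) are assumptions made for this illustration.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple

PAD = "\u22a5"  # stand-in for the padding symbol, assumed not to occur in the alphabet


def convolve(strings: Sequence[str]) -> List[Tuple[str, ...]]:
    """Build the padded string for a tuple of strings.

    The i-th element of the result is the tuple of i-th letters of the inputs,
    with PAD in the positions of strings that are already exhausted.
    """
    return list(zip_longest(*strings, fillvalue=PAD))
```

For instance, `convolve(("ab", "abcd"))` has length 4, the maximum of the two input lengths, and its last two symbols carry `PAD` in the first component.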
As an example, consider a binary relation $s \preceq s'$, saying that $s$ is a prefix of $s'$. The automaton recognizing this relation accepts if it reads a sequence of letters of the form $(a, a)$, for $a \in \Sigma$, possibly followed by a sequence of letters of the form $(\bot, b)$, for $b \in \Sigma$. As another example, consider a binary relation $\mathrm{el}(s, s')$ (equal length) saying that $|s| = |s'|$. This relation is recognized by an automaton that accepts if it does not see any letters involving the $\bot$ symbol.
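The two automata just described can be simulated directly on padded pairs of strings. The following Python sketch is illustrative only; the names `pad_pair`, `accepts_prefix` and `accepts_el`, and the constant `PAD`, are assumptions and do not come from the paper.

```python
from itertools import zip_longest

PAD = "\u22a5"  # stand-in for the padding symbol


def pad_pair(s: str, t: str):
    """Padded string over pairs of letters for the pair (s, t)."""
    return list(zip_longest(s, t, fillvalue=PAD))


def accepts_prefix(word) -> bool:
    """Two-state automaton: letters (a, a) first, then only letters (PAD, b)."""
    state = "equal"
    for c1, c2 in word:
        if c1 == PAD and c2 != PAD:
            state = "tail"      # first string is exhausted, second continues
        elif state == "equal" and c1 == c2 and c1 != PAD:
            pass                # still reading the common prefix
        else:
            return False        # mismatch, or a letter (a, a) after the tail began
    return True


def accepts_el(word) -> bool:
    """Accept iff no letter involves the padding symbol."""
    return all(PAD not in pair for pair in word)
```

A call such as `accepts_prefix(pad_pair("ab", "abcd"))` then decides the prefix relation by a single left-to-right scan, which is the sense in which the relation is regular.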
To understand which relations on strings are regular, it is often useful to provide a model-theoretic characterization of this class. In the following we assume familiarity with first-order logic (FO). Consider the FO-structure $M_{\mathrm{univ}} = \langle \Sigma^*, \preceq, \mathrm{el}, (P_a)_{a \in \Sigma} \rangle$ with domain $\Sigma^*$, where $\preceq$ and $\mathrm{el}$ are as above, and $P_a(s)$ is true iff the last letter of $s$ is $a$. This is known as a universal automatic structure due to the following [8, 9]: an $n$-ary relation $S$ on $\Sigma^*$ is regular iff there exists an FO formula $\varphi_S(x_1, \ldots, x_n)$ over $M_{\mathrm{univ}}$ such that $S = \{\bar{s} \in (\Sigma^*)^n \mid M_{\mathrm{univ}} \models \varphi_S(\bar{s})\}$.
In particular, regular relations are closed under all Boolean combinations, product, and projection. Furthermore, using the above result it is quite easy to show that an $n$-ary relation is regular, by exhibiting an FO formula defining it (see [8, 9, 7] for examples). For example, $|s| < |s'|$ is a regular relation definable by $\varphi(x, y) = \exists y' \, (y' \preceq y \wedge y' \neq y \wedge \mathrm{el}(y', x))$. On the other hand, more elaborate techniques have to be used to prove that an $n$-ary relation on $\Sigma^*$ is not regular. Examples of this kind include the binary relation $\preceq_{ss}$, that consists of all pairs $(s_1, s_2)$ such that $s_1$ is a subsequence of $s_2$, and the ternary relation that contains all tuples $(s_1, s_2, s_3)$ such that $s_1 \cdot s_2 = s_3$.
Conjunctive regular path queries. A basic querying mechanism for graph databases is the class of regular path queries [3, 11] that retrieve all pairs of objects in a db-graph connected by a path conforming to some regular expression. However, it has been argued (e.g. [30]) that in order to make regular path queries useful in practice, they should be extended with several features, one of them being the possibility of using conjunctions of atoms. This extension yields the class of conjunctive regular path queries, which we formally define below (see also [13, 29, 16, 10]).
Fix a countable set of node variables (typically denoted by $x, y, z, \ldots$), and a countable set of path variables (denoted by $\pi, \omega, \chi, \ldots$). A conjunctive regular path query (CRPQ) $Q$ over a finite alphabet $\Sigma$ is an expression of the form:
$$\mathit{Ans}(\bar{z}) \leftarrow \bigwedge_{1 \leq i \leq m} (x_i, \pi_i, y_i), \; \bigwedge_{1 \leq j \leq t} L_j(\omega_j), \qquad (1)$$
such that
(i) $m > 0$, $t \geq 0$,
(ii) each $L_j$ is a regular expression over $\Sigma$,
(iii) $\bar{x} = (x_1, \ldots, x_m)$, $\bar{y} = (y_1, \ldots, y_m)$ and $\bar{z}$ are tuples of node variables,
(iv) $\{\pi_1, \ldots, \pi_m\}$ are distinct path variables,
(v) $\{\omega_1, \ldots, \omega_t\}$ are distinct path variables and each $\omega_j$ is among the $\pi_i$'s, and
(vi) $\bar{z}$ is a tuple of node variables among $\bar{x}$ and $\bar{y}$.
The atom $\mathit{Ans}(\bar{z})$ is the head of the query, the expression on the right of the $\leftarrow$ is its body. The query $Q$ is Boolean if its head is of the form $\mathit{Ans}()$, i.e. $\bar{z}$ is the empty tuple.
Intuitively, such a query $Q$ selects tuples $\bar{z}$ for which there exist values of the remaining node variables from $\bar{x}$ and $\bar{y}$ and paths $\pi_i$ between $x_i$ and $y_i$ whose labels satisfy the regular expressions $L_1$ to $L_t$. Formally, to define the semantics of CRPQs $Q$ of the form (1), we first introduce a relation $(G, \sigma, \mu) \models Q$, where $\sigma$ is a mapping from $\bar{x}, \bar{y}$ to the set of nodes of a db-graph $G = (V, E)$, and $\mu$ is a mapping from $\{\pi_1, \ldots, \pi_m\}$ to paths in $G$. This relation holds iff $\mu(\pi_i)$ is a path in $G$ from $\sigma(x_i)$ to $\sigma(y_i)$, for $1 \leq i \leq m$, and the label of each path $\mu(\omega_j)$ is in the language of $L_j$, for $1 \leq j \leq t$. We now define $Q(G)$ to be the set of tuples $\sigma(\bar{z})$ such that $(G, \sigma, \mu) \models Q$. If $Q$ is Boolean, we let $Q(G)$ be true if $(G, \sigma, \mu) \models Q$ for some $\sigma$ and $\mu$ (that is, as usual, the empty tuple models the Boolean constant true, and the empty set models the Boolean constant false).
Remark: Our syntax differs slightly from the usual CRPQ syntax in the literature (see e.g. [16, 10]). The reason is that we make explicit use of path variables in the queries to treat CRPQs and ECRPQs in a uniform manner, while the standard approach is to refer to paths only implicitly.
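The semantics of a single atom $(x, \pi, y), L(\pi)$ quantifies over paths, but it can be decided by searching the product of the db-graph (viewed as an NFA, as noted earlier in this section) with an automaton for $L$. The Python sketch below shows this standard product construction; it is not an algorithm taken from this paper. The function name `eval_rpq`, the dictionary encoding of the automaton and the toy advisor graph are all assumptions made for illustration. The final line joins two atoms on a shared node variable, mimicking the CRPQ $\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z), (y, \pi_2, z), aa^*(\pi_1), aa^*(\pi_2)$ for common academic ancestors.

```python
from collections import deque


def eval_rpq(nodes, edges, delta, start, finals):
    """Return all (x, y) such that some path from x to y has a label accepted by the NFA.

    edges : set of (source, letter, target) triples (the db-graph)
    delta : dict mapping (state, letter) to a set of successor states
    """
    adj = {}
    for u, a, v in edges:
        adj.setdefault(u, []).append((a, v))
    answers = set()
    for x in nodes:
        # breadth-first search over (graph node, automaton state) pairs
        seen = {(x, start)}
        queue = deque([(x, start)])
        while queue:
            v, q = queue.popleft()
            if q in finals:
                answers.add((x, v))
            for a, w in adj.get(v, []):
                for q2 in delta.get((q, a), ()):
                    if (w, q2) not in seen:
                        seen.add((w, q2))
                        queue.append((w, q2))
    return answers


# Toy student-advisor graph with a single label "a" (hypothetical data).
nodes = {"s1", "s2", "p", "q", "r"}
edges = {("s1", "a", "p"), ("p", "a", "r"), ("s2", "a", "q"), ("q", "a", "r")}
# NFA for the expression a a* : state 0 is initial, state 1 is final.
delta = {(0, "a"): {1}, (1, "a"): {1}}
ancestors = eval_rpq(nodes, edges, delta, start=0, finals={1})
# Conjunction of two atoms sharing the node variable z.
common = {(x, y) for (x, z1) in ancestors for (y, z2) in ancestors if z1 == z2}
```

The search visits at most one entry per pair of graph node and automaton state, so it terminates even when the graph is cyclic and the set of paths is infinite.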
3. Extended Conjunctive Regular Path Queries
Our goal is to extend the class of CRPQs in two ways. First, we want to allow free path variables in the heads of queries. Second, we want the bodies of queries to permit checking relations on sets of paths rather than just conformance of individual paths to regular languages. This leads to the definition of a class of extended CRPQs.
An extended conjunctive regular path query (ECRPQ) $Q$ over $\Sigma$ is an expression of the form:
$$\mathit{Ans}(\bar{z}, \bar{\chi}) \leftarrow \bigwedge_{1 \leq i \leq m} (x_i, \pi_i, y_i), \; \bigwedge_{1 \leq j \leq t} R_j(\bar{\omega}_j), \qquad (2)$$
such that
(i) $m > 0$, $t \geq 0$,
(ii) each $R_j$ is a regular expression that defines a regular relation over $\Sigma$,
(iii) $\bar{x} = (x_1, \ldots, x_m)$ and $\bar{y} = (y_1, \ldots, y_m)$ are tuples of node variables,
(iv) $\bar{\pi} = (\pi_1, \ldots, \pi_m)$ is a tuple of distinct path variables,
(v) $\{\bar{\omega}_1, \ldots, \bar{\omega}_t\}$ are distinct tuples of path variables, such that each $\bar{\omega}_j$ is a tuple of variables from $\bar{\pi}$, of the same arity as $R_j$,
(vi) $\bar{z}$ is a tuple of node variables among $\bar{x}$, $\bar{y}$, and
(vii) $\bar{\chi}$ is a tuple of path variables among those in $\bar{\pi}$.
Note that this is similar to the definition of CRPQs; the main differences between (1) and (2) are:
• ECRPQs can check whether a tuple of paths belongs to a regular relation, rather than just checking whether a path belongs to a regular language; and
• outputs of ECRPQs may contain both nodes and paths, while outputs of CRPQs contain only nodes.
The head, the body, and the notion of Boolean ECRPQs are defined in the standard way. The relational part of an ECRPQ $Q$ of the form (2) is $\bigwedge_{1 \leq i \leq m} (x_i, \pi_i, y_i)$.
The semantics of ECRPQs is defined by a natural extension of the semantics of CRPQs. For an ECRPQ $Q$ of the form (2), a db-graph $G$ and mappings $\sigma$ from node variables to nodes and $\mu$ from path variables to paths, we write $(G, \sigma, \mu) \models Q$ if
• $\mu(\pi_i)$ is a path in $G$ from $\sigma(x_i)$ to $\sigma(y_i)$, for $1 \leq i \leq m$, and
• for each $\bar{\omega}_j = (\pi_{j_1}, \ldots, \pi_{j_k})$, the tuple of strings consisting of labels of $\mu(\pi_{j_1}), \ldots, \mu(\pi_{j_k})$ belongs to the relation $R_j$.
The output of $Q$ on $G$ (where the head of $Q$ is $\mathit{Ans}(\bar{z}, \bar{\chi})$) is defined as
$$Q(G) = \{(\sigma(\bar{z}), \mu(\bar{\chi})) \mid (G, \sigma, \mu) \models Q\}.$$
Note that the implicit existential quantification over path variables that appear in the body but not in the head is quantification over a potentially infinite set, as there are infinitely many paths in any cyclic db-graph.
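Even though path variables range over a potentially infinite set, some ECRPQs can be answered by a finite search. As a hedged illustration (the paper's own evaluation algorithm, based on automata, is the subject of Section 4), the Python sketch below evaluates the squared-words query from the introduction, $\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z), (z, \pi_2, y), \pi_1 = \pi_2$, by advancing two paths in lockstep on the same letter. The function name `squared_word_pairs` and the example graph are assumptions made for this sketch.

```python
from collections import deque


def squared_word_pairs(nodes, edges):
    """Return all (x, y) connected by a path whose label is of the form ww."""
    adj = {}
    for u, a, v in edges:
        adj.setdefault(u, {}).setdefault(a, set()).add(v)
    answers = set()
    for x in nodes:
        for z in nodes:
            # A pair (p, q) means: one path leads from x to p, another from z
            # to q, and both have read the same label so far.
            seen = {(x, z)}
            queue = deque([(x, z)])
            while queue:
                p, q = queue.popleft()
                if p == z:
                    # the first path ends in z, so the second one ends in y = q
                    answers.add((x, q))
                for a, succs_p in adj.get(p, {}).items():
                    for p2 in succs_p:
                        for q2 in adj.get(q, {}).get(a, ()):
                            if (p2, q2) not in seen:
                                seen.add((p2, q2))
                                queue.append((p2, q2))
    return answers


# A chain 0 -a-> 1 -b-> 2 -a-> 3 -b-> 4; the label from 0 to 4 is "abab".
chain_nodes = {0, 1, 2, 3, 4}
chain_edges = {(0, "a", 1), (1, "b", 2), (2, "a", 3), (3, "b", 4)}
chain_answers = squared_word_pairs(chain_nodes, chain_edges)
```

For each choice of $x$ and $z$ the search explores at most $|V|^2$ pairs of nodes, which is why it terminates on cyclic graphs too. Every node is paired with itself in the result, since the empty path has the squared label $\varepsilon\varepsilon$.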
From now on, we identify the class of CRPQs with the restriction of the class of ECRPQs to queries that do not use regular relations of arity $\geq 2$. This is more general than the definition of the previous section, since we now allow CRPQs to output paths.
It is easy to prove that the class of ECRPQs is strictly more expressive than the class of CRPQs. Formally,
Proposition 3.1. There is a Boolean ECRPQ $Q$ that is not equivalent to any CRPQ $Q'$.
3.1 Applications of ECRPQs
In this section, we show that ECRPQs can express queries found in a wide variety of application areas, including finding associations in semantic web (or linked) data, pattern matching, approximate string matching, and biological sequence alignment.
Finding semantic web associations. In a query language for RDF/S introduced in [5], paths can be compared based on specific semantic associations. Edges correspond to RDF properties and paths to property sequences. A property $a$ can be declared to be a subproperty of property $b$, which we denote by $a \prec b$. Two property sequences $u$ and $v$ are called $\rho$-isomorphic iff $u = u_1, \ldots, u_n$ and $v = v_1, \ldots, v_n$, for some $n$, and $u_i \prec v_i$ or $v_i \prec u_i$ for every $i \leq n$ [5]. Nodes $x$ and $y$ are called $\rho$-isoAssociated iff $x$ and $y$ are the origins of two $\rho$-isomorphic property sequences.
Finding nodes which are $\rho$-isoAssociated cannot be done in a query language supporting only conventional regular expressions, not least because doing so requires checking that two paths are of equal length. However, pairs of $\rho$-isomorphic sequences can be expressed using the regular relation $R$ given by the following regular expression: $\left(\bigcup_{a, b \in \Sigma \,:\, (a \prec b \,\vee\, b \prec a)} (a, b)\right)^*$. Then an ECRPQ returning pairs of nodes $x$ and $y$ that are $\rho$-isoAssociated can be written as follows:
$$\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z_1), (y, \pi_2, z_2), R(\pi_1, \pi_2)$$
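Membership of a pair of property sequences in the relation $R$ amounts to a position-by-position check, which the following Python sketch spells out. The function name `rho_isomorphic`, the encoding of the subproperty relation as a set of pairs, and the property names in the example are hypothetical. Whether a property counts as a subproperty of itself is left to the caller, who must include such pairs explicitly if they are wanted.

```python
from typing import Sequence, Set, Tuple


def rho_isomorphic(u: Sequence[str], v: Sequence[str],
                   sub: Set[Tuple[str, str]]) -> bool:
    """Check whether property sequences u and v are rho-isomorphic.

    sub contains a pair (a, b) whenever a is declared a subproperty of b.
    """
    if len(u) != len(v):
        return False  # the relation only contains pairs of equal length
    return all((a, b) in sub or (b, a) in sub for a, b in zip(u, v))


# Hypothetical subproperty declarations.
sub = {("worksFor", "affiliatedWith"), ("teaches", "involvedIn")}
u = ["worksFor", "involvedIn"]
v = ["affiliatedWith", "teaches"]
```

The check reads both sequences in lockstep and never needs to look back, which matches the shape of the regular expression for $R$: a star over pairs of letters with no padding symbol.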
Path variables in an ECRPQ can also be used to return the actual paths found by the query, a mechanism found in the query languages proposed in [2, 5, 21, 24]. For example, in [5] a $\rho$-query can take a pair of nodes $u$, $v$ and return the property sequences relating them. This too can be expressed by an ECRPQ:
$$\mathit{Ans}(\pi_1, \pi_2) \leftarrow (u, \pi_1, z_1), (v, \pi_2, z_2), R(\pi_1, \pi_2)$$
where the regular relation $R$ is defined as above.
Pattern matching. Let $\Sigma$ be a finite alphabet and $\mathcal{V}$ be a countable set of variables such that $\Sigma \cap \mathcal{V} = \emptyset$. A pattern $\alpha$ is a string over $\Sigma \cup \mathcal{V}$. It denotes the language $L_\Sigma(\alpha)$ obtained by applying substitutions $\sigma : \mathcal{V} \to \Sigma^*$ to $\alpha$. As we remarked already, such languages need not even be context-free.
However, for each pattern $\alpha = \alpha_1 \cdots \alpha_n$, where every $\alpha_i \in \Sigma \cup \mathcal{V}$, we can define an ECRPQ $Q_\alpha(x, y)$ which finds pairs of nodes connected by a path in $L_\Sigma(\alpha)$ (note that this property is not definable by a CRPQ). Indeed, the relational part of $Q_\alpha$ is $(x_0, \pi_1, x_1), \ldots, (x_{n-1}, \pi_n, x_n)$. If $\alpha_i$ is a letter $a$, then $Q_\alpha$ contains the atom $a(\pi_i)$, and if $\alpha_i$ is a variable, then it contains $\Sigma^*(\pi_i)$. Finally, to ensure equality of variables, for every two $\alpha_i, \alpha_j$ which are the same variable, the query $Q_\alpha$ contains a conjunct $\pi_i = \pi_j$. It is clear that $Q_\alpha$ indeed finds nodes connected by paths from $L_\Sigma(\alpha)$.
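The construction of $Q_\alpha$ from a pattern is mechanical, and a short Python sketch may help to see its shape. Everything about the textual output is an assumption made here: the ASCII rendering (`p1` for $\pi_1$, `Sigma*` for $\Sigma^*$, `<-` for the arrow), the choice of `x0` and `xn` as head variables, and the function name `pattern_to_ecrpq`. To keep the output short, each repeated variable is equated only with its first occurrence, which yields all the equalities of the text by transitivity.

```python
def pattern_to_ecrpq(pattern: str, variables: set) -> str:
    """Render the ECRPQ for a pattern given as a string of one-character symbols."""
    n = len(pattern)
    if n == 0:
        raise ValueError("an ECRPQ needs at least one relational atom")
    # relational part: (x0, p1, x1), ..., (x_{n-1}, pn, xn)
    relational = [f"(x{i}, p{i + 1}, x{i + 1})" for i in range(n)]
    constraints = []
    first_pos = {}  # first position at which each variable occurs
    for i, sym in enumerate(pattern, start=1):
        if sym in variables:
            constraints.append(f"Sigma*(p{i})")
            if sym in first_pos:
                constraints.append(f"p{first_pos[sym]} = p{i}")
            else:
                first_pos[sym] = i
        else:
            constraints.append(f"{sym}(p{i})")  # the single-letter language {sym}
    return f"Ans(x0, x{n}) <- " + ", ".join(relational + constraints)


squares = pattern_to_ecrpq("XX", {"X"})
```

For the pattern `XX` this produces two relational atoms, two `Sigma*` atoms and the single equality `p1 = p2`, which is the squared-words query from the introduction up to renaming and the redundant `Sigma*` atoms.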
In fact, ECRPQs can express queries corresponding to a larger class of languages than the pattern languages. Regular expressions with backreferencing [4], as provided by egrep and Perl, for example, are in some sense a generalization of patterns in that substitutions of variables are restricted by regular expressions: the syntax $(e)\%X$, where $e$ is a regular expression and $X$ is a variable, binds a string $w \in L(e)$ to $X$. Subsequent uses of $X$ in the expression then match $w$. It should be clear that we can easily extend the above construction of an ECRPQ for patterns to one that corresponds to a regular expression with backreferencing.
In fact, ECRPQs can match patterns, such as $a^n b^n c^n$, where $a, b, c \in \Sigma$ and $n \in \mathbb{N}$, that cannot be denoted by regular expressions with backreferencing, with the help of the equal length predicate:
$$\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z_1), (z_1, \pi_2, z_2), (z_2, \pi_3, y), a^*(\pi_1), b^*(\pi_2), c^*(\pi_3), \mathrm{el}(\pi_1, \pi_2), \mathrm{el}(\pi_2, \pi_3),$$
where $\mathrm{el}(\pi, \pi')$ is a shorthand for $\left(\bigcup_{a, b \in \Sigma} (a, b)\right)^*(\pi, \pi')$.
Approximate matching and sequence alignment. We treat approximate string matching and (biological) sequence alignment together because both are based on the notion of edit distance between strings. We consider the three edit operations of insertion, deletion and substitution, defined as follows. Let $s, s' \in \Sigma^*$. Applying an edit operation to $s$ yielding $s'$ can be modeled as a binary relation $\leadsto$ over $\Sigma^*$ such that $x \leadsto y$ holds iff there exist $u, v \in \Sigma^*$, $a, b \in \Sigma$, with $a \neq b$, such that one of the following is satisfied:
$x = uav$, $y = ubv$ (substitution)
$x = uav$, $y = uv$ (deletion)
$x = uv$, $y = ubv$ (insertion)
Let $\leadsto^k$ stand for the composition of $\leadsto$ with itself $k$ times. The edit distance $d_e(x, y)$ between $x$ and $y$ is the minimum number $k$ of edit operations such that $x \leadsto^k y$. We define a relation $D_{\leq k}$ between strings as follows: $(x, y) \in D_{\leq k}$ iff $d_e(x, y) \leq k$. This relation is regular (indeed, it is easy to see that it is accepted by a two-tape transducer, and the difference between the lengths of $x$ and $y$ is bounded by $k$; then it follows from the fact that rational relations of such bounded distance are regular [18]).
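For concreteness, the edit distance defined above can be computed with the textbook dynamic program over prefixes, shown here as a minimal Python sketch. This is not the automaton or transducer construction referred to in the text; it only computes $d_e$ directly, and the names `edit_distance` and `within_distance` are assumptions made for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning x into y."""
    # prev[j] holds the distance between the current prefix of x and y[:j]
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        cur = [i]  # turning x[:i] into the empty string takes i deletions
        for j, b in enumerate(y, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # substitute a by b, free if they agree
            ))
        prev = cur
    return prev[-1]


def within_distance(x: str, y: str, k: int) -> bool:
    """Membership test for the relation of pairs at edit distance at most k."""
    return edit_distance(x, y) <= k
```

The program runs in time proportional to the product of the two lengths. The regularity result cited in the text is a different statement: for each fixed $k$, a finite automaton reading both strings in lockstep can decide the same membership question.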
We now consider the use of edit distance in finding
