
Partitioning into colorful components by minimum edge deletions

TL;DR: This work presents NP-hardness as well as fixed-parameter tractability results for different variants of Colorful Components and develops an efficient and very accurate heuristic algorithm clearly outperforming a previous min-cut-based heuristic on multiple sequence alignment data.
Abstract: The NP-hard Colorful Components problem is, given a vertex-colored graph, to delete a minimum number of edges such that no connected component contains two vertices of the same color. It has applications in multiple sequence alignment and in multiple network alignment where the colors correspond to species. We initiate a systematic complexity-theoretic study of Colorful Components by presenting NP-hardness as well as fixed-parameter tractability results for different variants of Colorful Components. We also perform experiments with our algorithms and additionally develop an efficient and very accurate heuristic algorithm clearly outperforming a previous min-cut-based heuristic on multiple sequence alignment data.

Summary (2 min read)

1 Introduction

  • The authors study a maximum parsimony approach to the discovery of heterogeneous components in vertex-colored graphs, the Colorful Components problem. It is an edge modification problem originating from biological applications in sequence and network alignment.
  • The first application of Colorful Components stems from Multiple Sequence Alignment.
  • Colorful Components is a special case of the well-known NP-hard Multicut problem, which has as input an undirected graph and a set of vertex pairs and asks for a minimum number of edges to delete to disconnect each given vertex pair.
  • First, the authors observe that Colorful Components is NP-hard even in trees.

2 Computational Hardness

  • The authors present hardness results for two restricted variants of Colorful Components.
  • Proposition 1. Colorful Components is NP-hard even in trees with diameter four.
  • For each clause C_j of φ containing the variables x_p, x_q, and x_r, the authors connect the three corresponding variable cycles by a clause gadget.
  • Now, since at least 6m edges are deleted in the variable cycles, this means that for each clause Cj exactly four edges incident with aj are deleted by S. Consequently, for each variable cycle either all even or all odd edges are deleted.
  • Altogether, this shows the correctness of the reduction.

3 Algorithms

  • While Theorem 1 shows that Colorful Components is NP-hard for three colors, for two colors it can be solved in polynomial time via computing a maximum matching in bipartite graphs.
  • First, the authors describe a simple O(c^k · m)-time search tree algorithm.
  • Now, branch into the c cases to destroy this bad path by edge deletion, and for each case recursively solve the resulting instance.
  • In the first case, the authors have visited at most c+1 vertices until a vertex pair with the same color has been found.
  • The authors note that Rule 1 provides a trivial kernelization [9] for Colorful Components with respect to the combined parameter (k, c): obviously, after exhaustive data reduction, the instance has at most 2kc vertices, since an edge deletion can produce at most two colorful components, each of size at most c.
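
The search tree idea in the bullets above can be sketched roughly as follows: find a bad path, branch over deleting each of its edges, and recurse with the budget reduced by one. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The names `find_bad_path` and `solve` and the dict-of-sets graph representation are hypothetical, and the sketch searches for a globally shortest bad path with one BFS per vertex, so it does not attain the running time quoted above. A shortest bad path has at most c edges, because any repeated color in its interior would yield a shorter bad path.

```python
from collections import deque

def find_bad_path(adj, color):
    """Return a shortest bad path as a list of vertices, or None if none exists."""
    best = None
    for s in adj:
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u != s and color[u] == color[s]:
                # BFS dequeues by distance, so u is a nearest same-colored vertex.
                path = []
                while u is not None:
                    path.append(u)
                    u = parent[u]
                if best is None or len(path) < len(best):
                    best = path
                break
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
    return best

def solve(adj, color, k):
    """Return a set of at most k edges whose deletion leaves only colorful
    components, or None if no such set exists. adj is restored on return."""
    path = find_bad_path(adj, color)
    if path is None:
        return set()
    if k == 0:
        return None
    for u, v in zip(path, path[1:]):
        # Branch: delete edge {u, v}, recurse, then undo the deletion.
        adj[u].remove(v)
        adj[v].remove(u)
        rest = solve(adj, color, k - 1)
        adj[u].add(v)
        adj[v].add(u)
        if rest is not None:
            return rest | {frozenset((u, v))}
    return None
```

Calling `solve` with increasing values of `k` until it returns a set yields an optimal solution for small instances.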

4 Formulation as Weighted Multi-Multiway Cut

  • In the Colorful Components formulation, it is not possible to simplify a graph based on the knowledge that two vertices belong to the same connected component; the authors would like to be able to merge two such vertices.
  • For this, the authors first need to allow not just a single color per vertex, but a set; moreover, they need to allow edge weights.
  • Using the merge operation, the authors can do a simple branching on an edge [3]: either delete the edge, or merge its endpoints; in the experimental part this will be referred to as edge branching.
  • Note that merging does not necessarily decrease the parameter; but it is easy to see that if the authors branch on each edge of a forbidden path successively, then the last edge of the path cannot be merged since it connects vertices with an intersecting color set.
  • The factor 3 has been tuned heuristically.
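
To make the merge operation from the bullets above concrete, here is a small Python sketch. It assumes a representation that the summary does not specify: `weight[u][v]` holds the weight of edge {u, v} (stored symmetrically) and `colors[u]` is the color set of vertex u. The function name `merge` is hypothetical. Summing the weights of parallel edges and refusing to merge vertices with intersecting color sets is one plausible reading of the description, not a confirmed detail of the authors' code.

```python
def merge(weight, colors, u, v):
    """Merge vertex v into vertex u in place.

    The color sets are united and edges from u and v to a common
    neighbor are combined into one edge whose weight is the sum.
    """
    if colors[u] & colors[v]:
        raise ValueError("vertices with intersecting color sets cannot be merged")
    colors[u] |= colors.pop(v)
    weight[u].pop(v, None)              # the edge {u, v} itself disappears
    for w, wt in weight.pop(v).items():
        if w == u:
            continue
        weight[u][w] = weight[u].get(w, 0) + wt
        weight[w][u] = weight[u][w]
        del weight[w][v]
```

With this operation, edge branching amounts to two recursive calls per edge: one on the graph with the edge deleted (paying its weight) and one on the graph with its endpoints merged.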

5 Experiments

  • The authors performed experiments with instances from the multiple sequence alignment application.
  • The source code and the test instances are available under the GNU GPL license at http://fpt.akt.tu-berlin.de/colcom/.
  • To efficiently find data reduction opportunities with Rule 2 and Rule 3, the authors try starting with each vertex and successively add more vertices with disjoint colors that minimize the cut to other edges, until they have either found a reduction opportunity or no more vertices can be added.
  • For the heuristics, the authors compare the solution quality for the 112 instances for which they know the optimal solution.
  • Finally, for the instances for which an exact solution was found, the authors compared the solution quality of the alignments obtained by using DIALIGN with and without the partial alignment columns indicated by an exact solution for Colorful Components, by the merge heuristic, and by the min-cut heuristic.

6 Outlook

  • In preliminary experiments with network alignment data, the authors found that allowing only one protein of each species to be matched was, while a natural model, too strict.
  • Generalizing Colorful Components to allow a constant number of occurrences of each color for the connected components could result in improved network alignments.


Partitioning into Colorful Components by Minimum Edge Deletions

Sharon Bruckner (1, *), Falk Hüffner (2, **), Christian Komusiewicz (2), Rolf Niedermeier (2), Sven Thiel (3), and Johannes Uhlmann (2, ***)

(1) Institut für Mathematik, Freie Universität Berlin, sharonb@mi.fu-berlin.de
(2) Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, {falk.hueffner,christian.komusiewicz,rolf.niedermeier}@tu-berlin.de, johannes.uhlmann@campus.tu-berlin.de
(3) Institut für Informatik, Friedrich-Schiller-Universität Jena, sven.thiel@uni-jena.de

Abstract. The NP-hard Colorful Components problem is, given a vertex-colored graph, to delete a minimum number of edges such that no connected component contains two vertices of the same color. It has applications in multiple sequence alignment and in multiple network alignment where the colors correspond to species. We initiate a systematic complexity-theoretic study of Colorful Components by presenting NP-hardness as well as fixed-parameter tractability results for different variants of Colorful Components. We also perform experiments with our algorithms and additionally develop an efficient and very accurate heuristic algorithm clearly outperforming a previous min-cut-based heuristic on multiple sequence alignment data.

1 Introduction
We study a maximum parsimony approach to the discovery of heterogeneous components in vertex-colored graphs:

Colorful Components
Instance: An undirected graph $G = (V, E)$ and a coloring of the vertices $\chi : V \to \{1, \dots, c\}$.
Task: Find a minimum subset of edges $E' \subseteq E$ such that in $G' = (V, E \setminus E')$, all connected components are colorful, that is, they do not contain two vertices of the same color.

Such an edge set $E'$ is called a solution, and we denote its size by $k$. Colorful Components is an edge modification problem originating from biological applications in sequence and network alignment as described next.
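
As an illustration of the problem definition only, the following Python sketch checks whether all connected components of a graph are colorful and finds a minimum solution by exhaustive search. The function names `components_are_colorful` and `brute_force_solution` and the edge-list representation are assumptions made for this example. The exhaustive search takes exponential time and is meant for tiny instances, not as one of the algorithms developed in this paper.

```python
from itertools import combinations

def components_are_colorful(vertices, edges, color):
    """True if no connected component contains two vertices of the same color."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    seen = set()
    for v in vertices:
        key = (find(v), color[v])
        if key in seen:
            return False
        seen.add(key)
    return True

def brute_force_solution(vertices, edges, color):
    """Return a minimum-size set of edges whose deletion makes all components colorful."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for deleted in combinations(edges, k):
            kept = [e for e in edges if e not in deleted]
            if components_are_colorful(vertices, kept, color):
                return set(deleted)
```

For a path a, b, c whose endpoints share a color, one edge deletion suffices. For a triangle in which two vertices share a color, two deletions are needed.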
* Supported by project NANOPOLY (PITN-GA-2009–238700).
** Supported by DFG project PABI (NI 369/7-2).
*** Supported by DFG project PABI (NI 369/7-2).

Proc. CPM 2012, Vol. 7354 of LNCS

The first application of Colorful Components stems from Multiple Sequence Alignment. This is the process of aligning at least three protein, DNA, or RNA sequences such that positions believed to be homologous, that is, resulting from inheritance from a common ancestor, are written in a common column. This serves to illustrate the similarity or dissimilarity between the sequences and makes it possible to investigate their evolutionary relationship. Corel et al. [6] present an algorithm for this problem where a central step is to find connected subgraphs in graphs whose vertices are positions of the sequences, edges indicate that a pair of positions should be aligned, and the colors one-to-one correspond to sequences. These subgraphs correspond to partial alignment columns and thus may contain at most one vertex from each input sequence. This yields the Colorful Components problem. The solution of Colorful Components is then used by the DIALIGN software to compute a multiple alignment. Corel et al. [6] solve Colorful Components using a greedy algorithm, subsequently called “min-cut heuristic”: Find two vertices of the same color in some connected component, find a minimum edge cut between them, and remove it; repeat this until all connected components are colorful.
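
The min-cut heuristic just described can be sketched in Python as follows. This is an illustrative reading of the prose above and not the code of Corel et al. [6]. The function names, the dict-of-sets graph representation, the rule for choosing which same-colored pair to separate first, and the use of BFS augmenting paths for the minimum cut are all assumptions.

```python
from collections import deque

def min_cut(adj, s, t):
    """Minimum s-t edge cut in an undirected unit-capacity graph (augmenting paths)."""
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # parent now holds exactly the vertices reachable from s in the
            # residual graph; edges leaving this set form a minimum cut.
            return {frozenset((u, v)) for u in parent for v in adj[u] if v not in parent}
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

def find_same_colored_pair(adj, color):
    """Return two connected vertices of the same color, or None."""
    for s in adj:
        seen = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u != s and color[u] == color[s]:
                return s, u
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return None

def min_cut_heuristic(adj, color):
    """Greedily cut same-colored pairs apart; returns the set of deleted edges."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}   # work on a copy
    deleted = set()
    while (pair := find_same_colored_pair(adj, color)) is not None:
        for edge in min_cut(adj, *pair):
            u, v = edge
            adj[u].discard(v)
            adj[v].discard(u)
            deleted.add(edge)
    return deleted
```

The result is always a feasible solution, but nothing guarantees that it is a minimum one, since each cut is chosen without regard to the pairs that remain.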
A second biological motivation for Colorful Components arises in Network Alignment for multiple protein–protein interaction (PPI) networks. We propose a method for network alignment that is based on solving Colorful Components. Given networks $G_i = (V_i, E_i)$ and a similarity relation $S$ between the proteins of different species, first create a network whose vertex set is $\bigcup V_i$ and in which vertex $v \in V_i$ receives color $i$. Then, add an edge $\{u, v\}$ if $uSv$. The detected colorful components are then sets of matched proteins. Every protein appears in exactly one component, and every component has at most one protein from each species, which is a very strict model. The results can then be viewed as functional orthologs [14], or they can form the basis for further analysis. Deniélou et al. [7] suggest a three-step framework for network alignment where the first step is to aggregate the proteins from the different species into subsets, and Colorful Components offers a way of performing this task that results in consistent, disjoint aggregated groups.
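
The construction of the colored input graph for network alignment might look as follows in Python. This is a sketch under assumptions: protein identifiers are taken to be unique across species, the similarity relation is given as a collection of pairs, and the name `build_alignment_graph` is hypothetical. Pairs within one species are skipped, in line with the statement above that $S$ relates proteins of different species.

```python
def build_alignment_graph(networks, similar):
    """Build the vertex coloring and edge set of the alignment graph.

    networks: list whose i-th entry is the collection of proteins V_i
    similar:  iterable of pairs (u, v) with u S v
    """
    color = {}
    for i, proteins in enumerate(networks):
        for p in proteins:
            color[p] = i                     # vertex v in V_i receives color i
    edges = {frozenset((u, v)) for u, v in similar if color[u] != color[v]}
    return color, edges
```

Any Colorful Components solver can then be run on `color` and `edges`; each resulting component is a set of matched proteins with at most one protein per species.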
Related combinatorial problems. Colorful Components can be seen as the problem of destroying by edge deletions all bad paths, that is, simple paths between two vertices of the same color. Thus, it is a special case of the well-known NP-hard Multicut problem, which has as input an undirected graph and a set of vertex pairs and asks for a minimum number of edges to delete to disconnect each given vertex pair. Multicut is fixed-parameter tractable with respect to the number $k$ of edge deletions, with a running time of $2^{O(k^3)} \cdot |V|^{O(1)}$ [4, 13]. Colorful Components is also a special case of Multi-Multiway Cut [1]. This problem asks to disconnect by edge deletions all paths between vertices from the same vertex set of some given vertex sets. Thus, Colorful Components is the special case where the vertex sets form a partition. Finally, there is related work on “clustering with diversity” [11] which extends a traditional clustering problem by asking that in each resulting cluster all points of the underlying colored metric space must have different colors.
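
To illustrate why Colorful Components is a special case of Multicut, the short Python sketch below lists the vertex pairs of the corresponding Multicut instance: every pair of vertices sharing a color must be disconnected. The function name `multicut_pairs` is an assumption made for this example.

```python
from collections import defaultdict
from itertools import combinations

def multicut_pairs(color):
    """Return all pairs of equally colored vertices (the Multicut terminal pairs)."""
    by_color = defaultdict(list)
    for v, c in color.items():
        by_color[c].append(v)
    return [pair for group in by_color.values() for pair in combinations(group, 2)]
```

The number of pairs can be quadratic in the number of vertices, which is one reason to treat the color classes directly rather than through a generic Multicut solver.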

Contributions. On the theoretical side, we present a first systematic study on the computational complexity of Colorful Components, exhibiting both tractable and intractable cases. First, we observe that Colorful Components is NP-hard even in trees. Then, we present a complexity dichotomy concerning the number $c$ of colors showing that Colorful Components is polynomial-time solvable for two or fewer colors and NP-hard otherwise. For three or more colors, we also obtain super-polynomial running time lower bounds (based on the Exponential Time Hypothesis) even in the case that the input graph has bounded degree. On the positive side, we present fixed-parameter algorithms with running time $2^c \cdot |V|^{O(1)}$ for Colorful Components in trees and with running time $O((c - 1)^k \cdot |E|)$ in general graphs. In experimental work we demonstrate that, somewhat surprisingly, we can get better results by solving the more general Weighted Multi-Multiway Cut problem, since this allows us to merge vertices. We take advantage of this in data reduction rules, a simplified branching, and a new heuristic. With the branching algorithms, we can solve to optimality more than half of the instances generated from the BAliBASE 3.0 benchmark [15] each time within five minutes on a standard PC, with up to 5 000 vertices and 13 000 edges. Our heuristic has an average error of 0.6 %, a large improvement over the 29.2 % of the previously suggested min-cut heuristic [6]. We also show the strength of the developed data reduction rules.
Preliminaries. We consider only undirected and simple graphs $G = (V, E)$ where $n := |V|$ and $m := |E|$. We assume that $n = O(m)$ since isolated vertices can be removed from the input in linear time. A bad path is a simple (that is, cycle-free) path between two vertices of the same color. The length of a path is the number of its edges. An edge cut is a set of edges whose deletion increases the number of connected components. For a nonnegative number $t$, a graph is $t$-edge connected if it does not have an edge cut of size less than $t$.

The Exponential Time Hypothesis (ETH) states that, for all $x \ge 3$, $x$-SAT, which asks whether a boolean input formula in conjunctive normal form with $n$ variables and $m$ clauses and at most $x$ variables per clause is satisfiable, cannot be solved within a running time of $2^{o(n)}$ or $2^{o(m)}$; see Lokshtanov et al. [12] for a recent survey. A problem is fixed-parameter tractable with respect to a parameter $k$ if it can be solved in $f(k) \cdot n^{O(1)}$ time for an arbitrary (typically exponential) function $f$ in $k$.

2 Computational Hardness
In this section, we present hardness results for two restricted variants of Colorful Components.

First, we consider the special case where the input graph is a tree. For obtaining our hardness result, we exploit the connection between Colorful Components and Multicut. Note that Multicut is NP-hard and MaxSNP-hard even if the input is a star, that is, a tree consisting of a central vertex with attached degree-1 vertices [8]. Multicut in stars can be reduced to Colorful Components as follows: for every pair $\{s, t\}$ to be disconnected, create degree-1 vertices $s'$ and $t'$ attached to $s$ and $t$, respectively, and color $s'$ and $t'$ with the same unique color. Each original vertex gets a further unique color. Since this reduction produces trees whose diameter is four, we arrive at the following.

Proposition 1. Colorful Components is NP-hard even in trees with diameter four.
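
The reduction from Multicut in stars behind Proposition 1 can be written down as a short Python sketch. The function name and the tuple-based naming of new vertices and colors are assumptions chosen for illustration; only the construction itself is taken from the text above.

```python
def multicut_star_to_colorful_components(center, leaves, pairs):
    """Turn a Multicut instance on a star into a Colorful Components instance.

    Returns (edges, color). For the pair with index idx = (s, t), the new
    degree-1 vertices s' and t' are named ("copy", idx, s) and ("copy", idx, t).
    """
    edges = [(center, leaf) for leaf in leaves]
    color = {}
    for v in [center, *leaves]:
        color[v] = ("orig", v)               # a further unique color per original vertex
    for idx, (s, t) in enumerate(pairs):
        s_new, t_new = ("copy", idx, s), ("copy", idx, t)
        edges += [(s, s_new), (t, t_new)]
        color[s_new] = color[t_new] = ("pair", idx)   # same unique color for s' and t'
    return edges, color
```

In the resulting tree, the only bad paths run between the two new vertices of a pair, so a set of edge deletions destroys all bad paths exactly when it separates every pair.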
In stars, however, Colorful Components turns out to be polynomial-time solvable: If the central vertex $v$ has two neighbors with the same color, one can delete the edge between $v$ and one of the two identically colored degree-one vertices. If $v$ has no two neighbors of the same color, then every connected component is colorful.
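
The argument for stars translates into a one-pass procedure, sketched here in Python. The function name `solve_star` and the input format are hypothetical. The sketch also cuts off every leaf that has the same color as the central vertex; this case is not spelled out in the text above and is added here as an assumption that follows from the problem definition, since such a leaf forms a bad path of length one with the center.

```python
def solve_star(center_color, leaf_colors):
    """Return the set of leaves whose edge to the center is deleted.

    leaf_colors maps each degree-one vertex to its color.
    """
    kept_colors = {center_color}
    cut = set()
    for leaf, c in leaf_colors.items():
        if c in kept_colors:
            cut.add(leaf)        # color already present in the center's component
        else:
            kept_colors.add(c)   # first leaf of this color stays attached
    return cut
```

Each deleted leaf becomes a singleton component, which is trivially colorful.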
Second, we study the computational complexity of Colorful Components
if the number of colors is fixed. This is of interest since the number of colors may
be small in practical cases.
Theorem 1. Colorful Components with three colors in graphs with maximum degree six is NP-hard; it cannot be solved in $2^{o(k)} \cdot n^{O(1)}$, $2^{o(n)} \cdot n^{O(1)}$, or $2^{o(m)} \cdot n^{O(1)}$ time unless the ETH is false.
Proof. We present a polynomial-time many-to-one reduction from the NP-hard 3-SAT problem which has as input a Boolean formula $\varphi$ in 3-CNF (footnote 4). For simplicity, we assume that every clause contains exactly three literals.

The basic idea of the reduction is as follows. For each variable $x_i$ of a given 3-CNF formula $\varphi$, we construct a variable cycle of length $4m_i$, where $m_i$ denotes the number of clauses that contain $x_i$. These cycles are colored alternatingly with two colors $c_e$ and $c_o$ such that deleting every second edge yields a minimum-cardinality edge deletion set for obtaining colorful components for this cycle. The corresponding two possibilities are used to represent the two choices for the value of $x_i$. Then, for each clause $C_j$ of $\varphi$ containing the variables $x_p$, $x_q$, and $x_r$, we connect the three corresponding variable cycles by a clause gadget. This gadget has the property that if the solutions for the variable gadgets correspond to an assignment that satisfies $C_j$, then one needs only four edge deletions for the clause gadget. Conversely, if four edge deletions are sufficient, then the assignment corresponding to the deletions in the variable cycle satisfies $C_j$. Let $m$ be the number of clauses in $\varphi$ and observe that, since $\varphi$ is a 3-CNF formula, the overall number of vertices in the variable cycles is $12m$. Our construction guarantees that there is a satisfying assignment for $\varphi$ if and only if the constructed graph can be transformed into one with colorful components by exactly $6m + 4m = 10m$ edge deletions, where $6m$ edge deletions are used for the variable cycles and $4m$ modifications are used for the clause gadgets. The details follow.
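
The counts quoted in this outline can be checked by summing over the variable cycles. The following worked computation is added for illustration; it assumes that the three literals of each clause belong to three distinct variables, so that the occurrence counts $m_i$ add up to $3m$.

```latex
\begin{align*}
\sum_{i} m_i &= 3m
  && \text{(each of the $m$ clauses contains three variables)} \\
\sum_{i} 4 m_i &= 12m
  && \text{(vertices, and likewise edges, in the variable cycles)} \\
\sum_{i} \frac{4 m_i}{2} &= 6m
  && \text{(every second edge of each cycle is deleted)} \\
6m + 4m &= 10m
  && \text{(adding four deletions per clause gadget)}
\end{align*}
```

Deleting fewer than $2m_i$ edges from a cycle of length $4m_i$ leaves fewer than $2m_i$ components, so some component has more than two vertices and therefore repeats one of the two colors; this is why $2m_i$ deletions per cycle are necessary.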
Given a 3-CNF formula $\varphi$ consisting of the clauses $C_0, \dots, C_{m-1}$ over the variables $\{x_0, \dots, x_{n-1}\}$, construct a Colorful Components instance $(G = (V, E), k)$ as follows. For each variable $x_i$, $0 \le i < n$, $G$ contains a variable cycle consisting of the vertices $V^v_i := \{i_0, \dots, i_{4m_i - 1}\}$ and the edges $E^v_i := \{\{i_k, i_{k+1}\} \mid 0 \le k < 4m_i\}$ (for ease of presentation let $i_{4m_i} = i_0$). An edge $\{i_x, i_{x+1}\}$ is even if $x$ is even, and odd otherwise. A vertex $i_x$ receives color $c_e$ if $x$ is even; otherwise, it receives color $c_o$. So far, the constructed graph consists of a disjoint union of cycles and has $12m$ vertices and edges.

Footnote 4: A similar reduction type was previously used to show analogous results for Transitivity Editing [16] and Cluster Editing [10], which, in contrast, are defined on uncolored graphs.

Fig. 1. The clause gadget for clause $C_j = (x_p \lor \bar{x}_q \lor x_r)$. White vertices have color $c_e$, gray vertices have color $c_o$, and black vertices have color $c_g$. The vertex $a_j$ is the reserved vertex for $C_j$, the other vertices lie on the variable cycles for $x_p$, $x_q$, and $x_r$, respectively. [Figure not reproduced; its vertex labels are $a_j$, $p_{4\pi(p,j)}$, $p_{4\pi(p,j)+1}$, $q_{4\pi(q,j)+1}$, $q_{4\pi(q,j)+2}$, $r_{4\pi(r,j)}$, and $r_{4\pi(r,j)+1}$.]

Next, add a clause gadget to $G$ for each clause of $\varphi$. In the construction of the clause gadgets, we need for each clause $C_j$ in the variable cycles of $C_j$'s variables a fixed set of vertices that are “reserved” for $C_j$. To this end, suppose that for each variable $x_i$ an arbitrary but fixed ordering of the clauses containing $x_i$ is given, and let $\pi(i, j) \in \{0, \dots, m_i - 1\}$ denote the position of a clause $C_j$ that contains $x_i$ in this ordering. We now give the details of the construction of the clause gadgets. Let $C_j$ be a clause containing the variables $x_p$, $x_q$, and $x_r$ (either negated or nonnegated). We construct a clause gadget connecting the variable cycles of $x_p$, $x_q$, and $x_r$. First, let $a_j$ be a new vertex that appears only in the clause gadget for clause $C_j$ and color $a_j$ with a third color $c_g$. Let $E^g_j$ denote the edge set of the clause gadget and let $E^g_j$ contain for each $i \in \{p, q, r\}$ the edges $\{a_j, i_{4\pi(i,j)}\}$ and $\{a_j, i_{4\pi(i,j)+1}\}$ if $x_i$ occurs nonnegated in $C_j$ or the edges $\{a_j, i_{4\pi(i,j)+1}\}$ and $\{a_j, i_{4\pi(i,j)+2}\}$, otherwise. See Fig. 1 for an illustration. The construction of $G = (V, E)$ is completed by setting $V := \bigcup_{i=0}^{n-1} V^v_i \cup \bigcup_{j=0}^{m-1} \{a_j\}$ and $E := \bigcup_{i=0}^{n-1} E^v_i \cup \bigcup_{j=0}^{m-1} E^g_j$.
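
The whole construction can be summarized by the following Python sketch. Several details are assumptions made for illustration: literals are encoded as nonzero integers (`+(i+1)` for $x_i$ and `-(i+1)` for its negation), the fixed ordering of the clauses containing a variable is simply the order of appearance, cycle vertices are named `("var", i, x)` for $i_x$, the reserved vertices $a_j$ are named `("clause", j)`, and every clause is assumed to contain three distinct variables.

```python
def build_reduction(clauses, num_vars):
    """Build (color, edges, k) for a 3-CNF formula given as a list of 3-tuples."""
    # occurrences[i] lists the clauses containing x_i in a fixed order,
    # so pi(i, j) is the index of j in occurrences[i].
    occurrences = [[] for _ in range(num_vars)]
    for j, clause in enumerate(clauses):
        for lit in clause:
            occurrences[abs(lit) - 1].append(j)

    color, edges = {}, set()
    for i, occ in enumerate(occurrences):
        length = 4 * len(occ)                  # variable cycle of length 4 * m_i
        for x in range(length):
            color[("var", i, x)] = "c_e" if x % 2 == 0 else "c_o"
            edges.add(frozenset({("var", i, x), ("var", i, (x + 1) % length)}))

    for j, clause in enumerate(clauses):
        a_j = ("clause", j)
        color[a_j] = "c_g"
        for lit in clause:
            i = abs(lit) - 1
            base = 4 * occurrences[i].index(j)  # 4 * pi(i, j)
            offsets = (0, 1) if lit > 0 else (1, 2)
            for off in offsets:
                edges.add(frozenset({a_j, ("var", i, base + off)}))

    return color, edges, 10 * len(clauses)      # k := 10m
```

For the single clause $(x_0 \lor \bar{x}_1 \lor x_2)$, encoded as `(1, -2, 3)`, the sketch produces 13 vertices, 18 edges, and the budget k = 10.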
We show the correctness of the reduction by showing the following claim.

$\varphi$ is satisfiable $\Leftrightarrow$ $G$ can be transformed into a graph with colorful components by deleting at most $k := 10m$ edges.

“$\Rightarrow$”: Given a satisfying assignment $\beta$ for $\varphi$, we can transform $G$ into a graph with colorful connected components as follows. For each variable $x_i$ delete the odd edges of the variable cycle of $x_i$ if $\beta(x_i) = \text{true}$ and the even edges otherwise. After these deletions, there are no bad paths that contain only vertices from the variable cycles. Then, proceed as follows for each clause $C_j$. Assume without loss of generality that $C_j$ contains the variables $x_p$, $x_q$, and $x_r$, and that the literal corresponding to $x_p$ is true. Then, delete the four edges that are incident with $a_j$

References
Book ChapterDOI
14 Jul 1980

4,755 citations

Journal ArticleDOI
01 Oct 2005-Proteins
TL;DR: The latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions is presented, including new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences.
Abstract: Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have lead to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site (http://www-bio3d-igbmc.u-strasbg.fr/balibase) has been completely redesigned to provide a more user-friendly, interactive interface for the visualization of the BAliBASE reference alignments and the associated annotations.

424 citations


"Partitioning into colorful componen..." refers background in this paper

  • ...0 benchmark [14], using the diafragm 1....

    [...]

  • ...0 benchmark [14] each time within five minutes on a standard PC, with up to 5 000 vertices and 13 000 edges....

    [...]

Journal ArticleDOI
TL;DR: A brief survey is presented that presents data reduction and problem kernelization as a promising research field for algorithm and complexity theory.
Abstract: To solve NP-hard problems, polynomial-time preprocessing is a natural and promising approach. Preprocessing is based on data reduction techniques that take a problem's input instance and try to perform a reduction to a smaller, equivalent problem kernel. Problem kernelization is a methodology that is rooted in parameterized computational complexity. In this brief survey, we present data reduction and problem kernelization as a promising research field for algorithm and complexity theory.

406 citations


"Partitioning into colorful componen..." refers background in this paper

  • ...We note that Rule 1 provides a trivial kernelization [8](5) for Colorful Components with respect to the combined parameter (k, c): obviously, after exhaustive data reduction, the instance has at most 2kc vertices, since an edge deletion can produce at most two colorful components, each of size at most c....

    [...]

Book ChapterDOI
TL;DR: This chapter proves lower bounds based on ETH for the time needed to solve various problems, and in many cases these lower bounds match the running time of the best known algorithms for the problem.
Abstract: The Exponential Time Hypothesis (ETH) is a conjecture stating that, roughly speaking, n-variable 3-SAT cannot be solved in time 2^{o(n)}. In this chapter, we prove lower bounds based on ETH for the time needed to solve various problems. In many cases, these lower bounds match (up to small factors) the running time of the best known algorithms for the problem.

396 citations

Journal ArticleDOI
TL;DR: It is shown that both the maximum integral multicommodity flow and the minimum multicut problem are NP-hard and MAX SNP-hard on trees, although the maximum integral flow can be computed in polynomial time if the edges have unit capacity.
Abstract: We study the maximum integral multicommodity flow problem and the minimum multicut problem restricted to trees. This restriction is quite rich and contains as special cases classical optimization problems such as matching and vertex cover for general graphs. It is shown that both the maximum integral multicommodity flow and the minimum multicut problem are NP-hard and MAX SNP-hard on trees, although the maximum integral flow can be computed in polynomial time if the edges have unit capacity. We present an efficient algorithm that computes a multicut and integral flow such that the weight of the multicut is at most twice the value of the flow. This gives a 2-approximation algorithm for minimum multicut and a 1/2-approximation algorithm for maximum integral multicommodity flow in trees.

391 citations


"Partitioning into colorful componen..." refers background in this paper

  • ...Note that Multicut is NP-hard and MaxSNP-hard even if the input is a star, that is, a tree consisting of a central vertex with attached degree-1 vertices [7]....

    [...]
