Author

Amarendran R. Subramanian

Bio: Amarendran R. Subramanian is an academic researcher from the University of Tübingen. The author has contributed to research on the topics of multiple sequence alignment and trapezoid graphs. The author has an h-index of 6 and has co-authored 6 publications receiving 574 citations.

Papers

Journal Article

DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment

Amarendran R. Subramanian¹, Michael Kaufmann¹, Burkhard Morgenstern²

University of Tübingen¹, University of Göttingen²

27 May 2008 - Algorithms for Molecular Biology

TL;DR: DIALIGN-TX is presented, a substantial improvement of DIALIGN-T that combines the previous greedy algorithm with a progressive alignment approach and produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time and memory consumption.

Abstract: DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach. Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time and memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 for assessing the quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences. On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment like MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs, while MAFFT E-INS-i is the only method that comes close to the performance of DIALIGN-TX.
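The abstract mentions a vertex-cover approximation on a conflict graph but does not say which one. As a generic illustration only, the sketch below applies the classical maximal-matching 2-approximation to a small hypothetical conflict graph. The fragment names and the reading of an edge as "these two fragments cannot both be kept" are assumptions made for this example, not details taken from DIALIGN-TX.

```python
def approx_vertex_cover(edges):
    """Classical 2-approximation: for every edge not yet covered,
    add both of its endpoints to the cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


# Hypothetical conflict graph: fragment f1 conflicts with f2, f3 and f4;
# f5 has no conflicts at all.
fragments = {"f1", "f2", "f3", "f4", "f5"}
conflicts = [("f1", "f2"), ("f1", "f3"), ("f1", "f4")]

cover = approx_vertex_cover(conflicts)  # fragments flagged for removal
kept = fragments - cover                # conflict-free remainder

print(sorted(cover))
print(sorted(kept))
```

Removing the cover leaves a set of fragments with no remaining conflicts. The cover found this way is at most twice the size of a minimum vertex cover, which is the usual trade-off accepted for a linear-time pass over the edges.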

261 citations

Journal Article

DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.

Amarendran R. Subramanian¹, Jan Weyer-Menkhoff², Michael Kaufmann¹, Burkhard Morgenstern²

University of Tübingen¹, University of Göttingen²

22 Mar 2005 - BMC Bioinformatics

TL;DR: A complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN and is comparable to the standard global aligner CLUSTAL W, though it is outperformed by some newly developed programs that focus on global alignment.

Abstract: Background: We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally related sequence sets. However, it is often outperformed by these methods on data sets with global but weak similarity at the primary-sequence level.

226 citations

Proceedings Article

Max-tolerance graphs as intersection graphs: cliques, cycles, and recognition

Michael Kaufmann¹, Jan Kratochvíl², Katharina A. Lehmann, Amarendran R. Subramanian

University of Tübingen¹, Charles University in Prague²

22 Jan 2006

TL;DR: The maximal and maximum cliques problem arises naturally in DNA sequence analysis where the maximal cliques might be interpreted as functional domains carrying biologically meaningful information and it is proved that the recognition problem for max-tolerance graphs is NP-hard.

Abstract: Max-tolerance graphs can be regarded as generalized interval graphs, where two intervals I_i and I_j induce an edge in the corresponding graph iff they overlap by at least max{t_i, t_j}, where t_i is an individual tolerance parameter associated with each interval I_i. A new geometric characterization of max-tolerance graphs as intersection graphs of isosceles right triangles, called semi-squares for short, leverages the solution of various graph-theoretic problems in connection with max-tolerance graphs. First, we solve the maximal and maximum cliques problem. It arises naturally in DNA sequence analysis where the maximal cliques might be interpreted as functional domains carrying biologically meaningful information. We prove an upper bound of O(n^3) for the number of maximal cliques in max-tolerance graphs and give an efficient O(n^3) algorithm for their computation. In the same vein, the semi-square representation yields a simple proof for the fact that this bound is asymptotically tight, i.e., a class of max-tolerance graphs is presented where the instances have Ω(n^3) maximal cliques. Additionally, we answer an open question posed in [8] by showing that max-tolerance graphs do not contain complements of cycles C_n for n > 9. By exploiting the new representation more deeply, we can go even further and prove that the recognition problem for max-tolerance graphs is NP-hard.
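The edge rule in the first sentence of this abstract can be made concrete with a short sketch. The code below is only an illustration of the definition, not the paper's semi-square construction or its clique algorithm. It assumes intervals are given as (start, end) pairs and that all tolerances are positive, so that disjoint intervals never become adjacent; the function names and sample values are hypothetical.

```python
from itertools import combinations


def overlap_length(a, b):
    """Length of the intersection of intervals a and b (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def max_tolerance_edges(intervals, tolerances):
    """Return index pairs (i, j), i < j, whose intervals overlap by at
    least max(t_i, t_j). Assumes every tolerance is positive."""
    edges = set()
    for i, j in combinations(range(len(intervals)), 2):
        needed = max(tolerances[i], tolerances[j])
        if overlap_length(intervals[i], intervals[j]) >= needed:
            edges.add((i, j))
    return edges


# Hypothetical example with three intervals.
intervals = [(0, 10), (5, 15), (8, 20)]
tolerances = [3, 5, 4]
edges = max_tolerance_edges(intervals, tolerances)
print(sorted(edges))
```

Here intervals 0 and 2 overlap by 2, which is below max(3, 4), so they are not adjacent even though they intersect. This is what distinguishes a max-tolerance graph from an ordinary interval graph on the same intervals.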

46 citations

Journal Article

Multiple sequence alignment with user-defined constraints at GOBICS

Burkhard Morgenstern¹, Nadine Werner¹, Sonja J. Prohaska², Rasmus Steinkamp¹, Isabelle Schneider¹, Amarendran R. Subramanian³, Peter F. Stadler², Jan Weyer-Menkhoff¹

University of Göttingen¹, Leipzig University², University of Tübingen³

01 Apr 2005 - Bioinformatics

TL;DR: A semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure and produce alignments that reflect the true biological relationships among the input sequences more accurately than fully automated procedures can do.

Abstract: Summary: Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of the sequences that are biologically related to each other; our software program uses these sites as anchor points and creates a multiple alignment respecting these user-defined constraints. By using known functionally, structurally or evolutionarily related positions of the input sequences as anchor points, our method can produce alignments that reflect the true biological relationships among the input sequences more accurately than fully automated procedures can do. Availability: Our software is available online at the GOttingen BIoinformatics Compute Server (GOBICS), http://dialign.gobics.de/anchor/index.php. Contact: burkhard@gobics.de

35 citations

Journal Article

DIALIGN-TX and multiple protein alignment using secondary structure information at GOBICS

Amarendran R. Subramanian¹, Suvrat Hiran², Rasmus Steinkamp², Peter Meinicke², Eduardo Corel², Burkhard Morgenstern²

University of Tübingen¹, Indian Institute of Technology Kharagpur²

01 Jul 2010 - Nucleic Acids Research

TL;DR: This work introduces web interfaces for two recent extensions of the multiple-alignment program DIALIGN and offers a version of DIALIGN that uses predicted protein secondary structures together with primary sequence information to construct multiple protein alignments.

Abstract: We introduce web interfaces for two recent extensions of the multiple-alignment program DIALIGN. DIALIGN-TX combines the greedy heuristic previously used in DIALIGN with a more traditional 'progressive' approach for improved performance on locally and globally related sequence sets. In addition, we offer a version of DIALIGN that uses predicted protein secondary structures together with primary sequence information to construct multiple protein alignments. Both programs are available through the 'Göttingen Bioinformatics Compute Server' (GOBICS).

10 citations

Cited by

Journal Article

Recent developments in the MAFFT multiple sequence alignment program

Kazutaka Katoh¹, Hiroyuki Toh

Kyushu University¹

01 Jul 2008 - Briefings in Bioinformatics

TL;DR: The initial version of the MAFFT program was developed in 2002 and was updated in 2007 with two new techniques: the PartTree algorithm and the Four-way consistency objective function, which improved the scalability of progressive alignment and the accuracy of ncRNA alignment.

Abstract: The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing amounts of sequence data are being generated by large-scale sequencing projects, scalability is now critical in many situations. The requirement of accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs); the secondary structure should be considered for constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems, in 2007, we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment and the latter improved the accuracy of ncRNA alignment. We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.

3,278 citations

Journal Article

iRegulon: from a gene list to a gene regulatory network using large motif and track collections.

Rekin's Janky¹, Annelien Verfaillie¹, Hana Imrichova¹, Bram Van de Sande¹, Laura Standaert¹, Valerie Christiaens¹, Gert Hulselmans¹, Koen Herten¹, Marina Naval Sanchez¹, Delphine Potier¹, Dmitry Svetlichnyy¹, Zeynep Kalender Atak¹, Mark Fiers¹, Jean-Christophe Marine¹, Stein Aerts¹

Katholieke Universiteit Leuven¹

24 Jul 2014 - PLOS Computational Biology

TL;DR: iRegulon reverse-engineers the transcriptional regulatory network underlying a co-expressed gene set. Applied to breast cancer cells with over-activated p53, followed by RNA-seq and ChIP-seq, it identifies an extensive up-regulated network controlled directly by p53 and maps a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY.

Abstract: Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.

680 citations

Journal Article

M-Coffee: Combining multiple sequence alignment methods with T-Coffee

Iain M. Wallace¹, Orla O'Sullivan, Desmond G. Higgins, Cedric Notredame

University College Dublin¹

01 Jan 2006 - Nucleic Acids Research

TL;DR: M-Coffee is a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA that is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs.

Abstract: We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and Balibase. We also show that on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment as any individual method. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from http://www.tcoffee.org/.

566 citations

Journal Article

Multiple sequence alignment.

Robert C. Edgar, Serafim Batzoglou¹

Stanford University¹

01 Jun 2006 - Current Opinion in Structural Biology

TL;DR: Although CLUSTALW is still the most popular alignment tool to date, recent methods offer significantly better alignment quality and, in some cases, reduced computational cost.

530 citations

Journal Article

Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

Shujun Ou¹, Weijia Su¹, Yi Liao², Kapeel Chougule³, Jireh Agda⁴, Adam J. Hellinga⁴, Carlos Santiago Blanco Lugo⁴, Tyler A. Elliott⁴, Doreen Ware⁵, Doreen Ware³, Thomas Peterson¹, Ning Jiang⁶, Candice N. Hirsch⁷, Matthew B. Hufford¹

Iowa State University¹, University of California, Irvine², Cold Spring Harbor Laboratory³, University of Guelph⁴, Cornell University⁵, Michigan State University⁶, University of Minnesota⁷

16 Dec 2019 - Genome Biology

TL;DR: A comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) is created that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements and will greatly facilitate TE annotation in eukaryotic genomes.

Abstract: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
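The six performance metrics named in this abstract have standard definitions in terms of true/false positives and negatives. The sketch below computes them from raw counts. How the benchmark itself derives those counts (for example, per base pair against the curated library) is not described here, so the function name and the sample counts are purely hypothetical, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; assumes no denominator is zero."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate (recall)
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "fdr": fp / (tp + fp),                    # equals 1 - precision
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of precision and recall
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting FDR alongside precision is redundant in the strict sense, since the two sum to one, but it makes the false-annotation rate explicit when comparing tools.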

410 citations
