Author

Jose Castresana

Other affiliations: Spanish National Research Council, University of Alicante, University of the Basque Country

Bio: Jose Castresana is an academic researcher from Pompeu Fabra University. The author has contributed to research on the topics Phylogenetic tree and Extraction (chemistry). The author has an h-index of 37 and has co-authored 89 publications, which have received 15,694 citations. Previous affiliations of Jose Castresana include the Spanish National Research Council and the University of Alicante.

Topics: Phylogenetic tree, Extraction (chemistry), Gene, Genome, Toluene

Papers published on a yearly basis (chart omitted; years listed: 1985–1995, 1997–2002, 2004–2023)

Papers

Journal Article

Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis

Jose Castresana

01 Apr 2000-Molecular Biology and Evolution

TL;DR: A computerized method is presented that reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.

Abstract: The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.

8,757 citations
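
The block selection described in the abstract above (contiguous conserved positions, no gaps) can be illustrated with a much simplified sketch. This is not the Gblocks algorithm itself: the function names, the `min_identity` and `min_block_length` parameters, and the conservation rule are assumptions made for illustration, and the flanking-position requirement from the abstract is left out.

```python
from collections import Counter


def conserved_columns(alignment, min_identity=0.85, gap_char="-"):
    """Flag a column as conserved when it has no gaps and its most
    common residue reaches min_identity (hypothetical rule)."""
    n_seqs = len(alignment)
    flags = []
    for column in zip(*alignment):
        if gap_char in column:
            flags.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / n_seqs >= min_identity)
    return flags


def select_blocks(flags, min_block_length=3):
    """Return half-open (start, end) intervals for runs of conserved
    columns that are at least min_block_length long."""
    blocks = []
    start = None
    # A trailing False acts as a sentinel that closes the last run.
    for i, ok in enumerate(flags + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_block_length:
                blocks.append((start, i))
            start = None
    return blocks


def trim(alignment, blocks):
    """Keep only the columns that fall inside the selected blocks."""
    return ["".join(seq[s:e] for s, e in blocks) for seq in alignment]


if __name__ == "__main__":
    toy = ["ACDEFGHIK", "ACDE-GHIK", "ACDEYGHIR"]
    blocks = select_blocks(conserved_columns(toy))
    print(blocks)
    print(trim(toy, blocks))
```

In the toy alignment, the gapped column and the poorly conserved final column are dropped, and the two remaining runs are kept as blocks.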

Journal Article

Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments

Gerard Talavera, Jose Castresana¹

Spanish National Research Council¹

01 Aug 2007-Systematic Biology

TL;DR: This study examines whether phylogenetic reconstruction improves after alignment cleaning. Cleaned alignments produce better topologies although, paradoxically, with lower bootstrap support, which indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported but in fact more biased topologies.

Abstract: Alignment quality may have as much impact on phylogenetic reconstruction as the phylogenetic methods used. Not only the alignment algorithm, but also the method used to deal with the most problematic alignment regions, may have a critical effect on the final tree. Although some authors remove such problematic regions, either manually or using automatic methods, in order to improve phylogenetic performance, others prefer to keep such regions to avoid losing any information. Our aim in the present work was to examine whether phylogenetic reconstruction improves after alignment cleaning or not. Using simulated protein alignments with gaps, we tested the relative performance in diverse phylogenetic analyses of the whole alignments versus the alignments with problematic regions removed with our previously developed Gblocks program. We also tested the performance of more or less stringent conditions in the selection of blocks. Alignments constructed with different alignment methods (ClustalW, Mafft, and Probcons) were used to estimate phylogenetic trees by maximum likelihood, neighbor joining, and parsimony. We show that, in most alignment conditions, and for alignments that are not too short, removal of blocks leads to better trees. That is, despite losing some information, there is an increase in the actual phylogenetic signal. Overall, the best trees are obtained by maximum-likelihood reconstruction of alignments cleaned by Gblocks. In general, a relaxed selection of blocks is better for short alignment, whereas a stringent selection is more adequate for longer ones. Finally, we show that cleaned alignments produce better topologies although, paradoxically, with lower bootstrap. This indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported although, in fact, more biased topologies.

4,227 citations

Journal Article

Quantitative studies of the structure of proteins in solution by Fourier-transform infrared spectroscopy

José Luis R. Arrondo¹, Arturo Muga¹, Jose Castresana¹, Félix M. Goñi¹

University of the Basque Country¹

01 Jan 1993-Progress in Biophysics & Molecular Biology

804 citations

Journal Article

Phylogenetic and ecological analysis of novel marine stramenopiles.

Ramon Massana¹, Jose Castresana¹, Vanessa Balagué¹, Laure Guillou, Khadidja Romari, Agnès Groisillier, Klaus Valentin, Carlos Pedrós-Alió¹

Spanish National Research Council¹

01 Jun 2004-Applied and Environmental Microbiology

TL;DR: A comparative analysis of novel stramenopiles is carried out, including new sequences from coastal genetic libraries presented here and sequences from recent reports from the open ocean and marine anoxic sites, confirming that they are fundamental members of the marine eukaryotic picoplankton.

Abstract: Culture-independent molecular analyses of open-sea microorganisms have revealed the existence and apparent abundance of novel eukaryotic lineages, opening new avenues for phylogenetic, evolutionary, and ecological research. Novel marine stramenopiles, identified by 18S ribosomal DNA sequences within the basal part of the stramenopile radiation but unrelated to any previously known group, constituted one of the most important novel lineages in these open-sea samples. Here we carry out a comparative analysis of novel stramenopiles, including new sequences from coastal genetic libraries presented here and sequences from recent reports from the open ocean and marine anoxic sites. Novel stramenopiles were found in all major habitats, generally accounting for a significant proportion of clones in genetic libraries. Phylogenetic analyses indicated the existence of 12 independent clusters. Some of these were restricted to anoxic or deep-sea environments, but the majority were typical components of coastal and open-sea waters. We specifically identified four clusters that were well represented in most marine surface waters (together they accounted for 74% of the novel stramenopile clones) and are the obvious targets for future research. Many sequences were retrieved from geographically distant regions, indicating that some organisms were cosmopolitan. Our study expands our knowledge on the phylogenetic diversity and distribution of novel marine stramenopiles and confirms that they are fundamental members of the marine eukaryotic picoplankton.

316 citations

Journal Article

Evolution of cytochrome oxidase, an enzyme older than atmospheric oxygen.

Jose Castresana, Mathias Lübben, Matti Saraste, Desmond G. Higgins

01 Jun 1994-The EMBO Journal

TL;DR: It is proposed that aerobic metabolism in organisms with cytochrome oxidase has a monophyletic and ancient origin, prior to the appearance of eubacterial oxygenic photosynthetic organisms.

Abstract: Cytochrome oxidase is a key enzyme in aerobic metabolism. All the recorded eubacterial (domain Bacteria) and archaebacterial (Archaea) sequences of subunits 1 and 2 of this protein complex have been used for a comprehensive evolutionary analysis. The phylogenetic trees reveal several processes of gene duplication. Some of these are ancient, having occurred in the common ancestor of Bacteria and Archaea, whereas others have occurred in specific lines of Bacteria. We show that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that Proteobacteria (Purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria. The prevalent hypothesis that aerobic metabolism arose several times in evolution after oxygenic photosynthesis, is not sustained by two aspects of the molecular data. First, cytochrome oxidase was present in the common ancestor of Archaea and Bacteria whereas oxygenic photosynthesis appeared in Bacteria. Second, an extant cytochrome oxidase in nitrogen-fixing bacteria shows that aerobic metabolism is possible in an environment with a very low level of oxygen, such as the root nodules of leguminous plants. Therefore, we propose that aerobic metabolism in organisms with cytochrome oxidase has a monophyletic and ancient origin, prior to the appearance of eubacterial oxygenic photosynthetic organisms.

243 citations

Cited by

Journal Article

Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.

Fumio Tajima¹

Kyushu University¹

30 Oct 1989-Genomics

TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal Article

Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis

Jose Castresana

01 Apr 2000-Molecular Biology and Evolution

8,757 citations

Journal Article

trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses

Salvador Capella-Gutierrez, José M. Silla-Martínez, Toni Gabaldón

01 Aug 2009-Bioinformatics

TL;DR: TrimAl is a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses and can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized.

Abstract: Multiple sequence alignments (MSA) are central to many areas of bioinformatics, including phylogenetics, homology modeling, database searches and motif finding. Recently, such MSA-based techniques have been incorporated in high-throughput pipelines such as genome annotation and phylogenomics analyses. In all these applications, the reliability and accuracy of the analyses depend critically on the quality of the underlying alignments. A plethora of computer programs and algorithms for MSA are currently available (Notredame, 2007), which implement different heuristics to find mathematically optimal solutions to the MSA problem. Accuracies of 80–90% have been reported for the best algorithms, but even the best scoring alignment algorithms may fail with certain protein families or at specific regions in the alignment. The situation worsens in large-scale analyses, where faster but less reliable algorithms and large numbers of automatically selected sequences are used. It is therefore generally assumed that trimming the alignment, so that poorly aligned regions are eliminated, increases the accuracy of the resulting MSA-based applications (Talavera and Castresana, 2007). Some programs such as G-blocks (Castresana, 2000) have been developed to assist in the MSA trimming phase by selecting blocks of conserved regions. They have become very popular and are extensively used, with good performance, in small-to-medium scale datasets, where several parameters can be tested manually (Talavera and Castresana, 2007). However, their use over larger datasets is hampered by the need for defining, prior to the analysis, the set of parameters that will be used for all sequence families. Here, we present trimAl, a tool for automated alignment trimming. 
Its speed and the possibility for automatically adjusting the parameters to improve the phylogenetic signal-to-noise ratio, makes trimAl especially suited for large-scale phylogenomic analyses, involving thousands of large alignments. trimAl has been developed in a GNU/Linux environment using C++ programming language and has been tested on various UNIX, Mac and Windows platforms. Moreover, we have developed a web server to run trimAl online (http://phylemon2.bioinfo.cipf.es/), which has been included in the Phylemon suite for phylogenetic and phylogenomic tools (Tarraga et al., 2007). The documentation, source files and additional information for trimAl are available through a wiki page (http://trimal.cgenomics.org). trimAl reads and renders protein or nucleotide alignments in several standard formats. trimAl starts by reading all columns in an alignment and computes a score (Sx) for each of them. This score can be a gap score (Sg), a similarity score (Ss) or a consistency score (Sc). The score for each column can be computed based only on the information from that column or, if a window size of w is specified, it corresponds to the average value of w columns around the position considered. The gap score (Sg) for a column is the fraction of sequences without a gap in that position. The residue similarity score (Ss) consists of mean distance (MD) scores as described in Thompson et al. (2001) and Supplementary Material. This score uses the MD between pairs of residues, as defined by a given scoring matrix. Finally, the consistency score (Sc) can only be computed when more than one alignment for the same set of sequences is provided. Details on how these scores are computed are provided in the Supplementary Material. In brief, Sc measures the level of consistency of all the residue pairs found in a column as compared with the other alignments. 
The alignment with the highest consistency is chosen and then trimmed to remove the columns that are less conserved, according to Sc or other thresholds set by the user. Once all column scores have been computed trimAl can proceed in two ways. If both a score and a minimum conservation threshold are provided, trimAl renders a trimmed alignment in which only the columns with scores above the score threshold are included, as far as the number of selected columns is above a conservation threshold defined by the user. If this number is below the conservation threshold, trimAl will add more columns to the trimmed alignment in a decreasing order of scores until the conservation threshold is reached. The conservation threshold corresponds to the minimum percentage of columns, from the original alignment, which the user wants to include in the trimmed alignment. Alternatively, if the automatic selection of parameters options is selected, trimAl will compute specific score thresholds depending on the inherent characteristics of each alignment. So far, trimAl incorporates three modes for the automated selection of parameters, gappyout, strict and strictplus, which are based on the different use of gap and similarity scores. Moreover, the option automated1 implements a heuristic to decide the most appropriate mode depending on the alignment characteristics. The heuristics to define such parameters have been designed based on the results of a benchmark. Details on the heuristics and the benchmark can be found in the online documentation of the program. In brief, the automatic selection of parameters approximate optimal cutoffs by plotting, internally, the cumulative graphs of gap and similarity scores of the columns in the alignment (see online documentation). 
We expanded, using ROSE simulations (Stoye et al., 1998) a benchmark set that has been used previously to test the improvement in phylogenetic performance after an alignment trimming phase (Talavera and Castresana, 2007). This dataset simulates several evolutionary scenarios varying in the number and length of the sequences, the topology of the underlying tree and the level of sequence divergence considered. We compared the results obtained from MUSCLE alignments before and after trimming with trimAl using automated selection of parameters. The accuracy of the resulting trees was measured by comparing them with the original trees used to generate the sequence sets, and measuring the Robinson Foulds distance (Robinson and Foulds, 1981). We observed an overall improvement of the phylogenetic accuracy after trimming. Using -automated1 option of trimAl, the trimmed alignment always produced Maximum Likelihood trees that were of equal (36%) or significantly better (64%) quality as compared with the tree derived from the complete alignment. For Neighbor Joining reconstruction the -strictplus option of trimAl worked best, improving the phylogenetic accuracy in 89% of the scenarios. In most scenarios (90%), trimAl outperformed Gblocks v0.91b with default parameters. Most importantly, the use of Gblocks default parameters diminished the accuracy of the subsequent tree reconstruction in half of the scenarios considered. In contrast, the use of trimAl automated methods rarely (1.5%) undermined the topological accuracy of the resulting phylogenetic tree (see Supplementary Material for more details). To test the applicability of trimAl on real datasets as well as its suitability for large-scale phylogenetic datasets, we ran trimAl on the complete set of MUSCLE alignments generated for the Human Phylome project (Huerta-Cepas et al., 2007). This includes a total of 31 182 alignments, containing, on average, 67 sequences of 1472 positions of length. 
Trimming these alignments using the -gappyout and -automated1 options took 5 min 45 s and 125 min 2 s, respectively, on a computer with an Intel QuadCore XEON E5410 processor and 8 GB of RAM. trimAl has been used previously in a pipeline to reconstruct complete collections of gene trees. In this case, the parameter set used was a minimum conservation threshold of 60% and a gap threshold of 90% (-cons 60 -gt 0.9). Complete and trimmed alignments used to generate the phylomes included in PhylomeDB (Huerta-Cepas et al., 2008) can be viewed through this database.

6,807 citations
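
The gap score and the conservation-threshold fallback described in the abstract above can be sketched as follows. This is an illustrative reading of the text, not trimAl's implementation: the function names, the `>=` comparison against the score threshold, and the tie-breaking by column index are assumptions.

```python
import math


def gap_scores(alignment, gap_char="-"):
    """Gap score per column: fraction of sequences without a gap there."""
    n_seqs = len(alignment)
    return [sum(ch != gap_char for ch in col) / n_seqs for col in zip(*alignment)]


def select_columns(scores, score_threshold, min_fraction):
    """Keep columns scoring at or above score_threshold. If fewer than
    min_fraction of the original columns survive, add the best remaining
    columns in decreasing score order until that minimum is reached."""
    keep = {i for i, s in enumerate(scores) if s >= score_threshold}
    required = math.ceil(min_fraction * len(scores))
    if len(keep) < required:
        # Highest score first; ties broken by column index (an assumption).
        ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
        for i in ranked:
            if len(keep) >= required:
                break
            keep.add(i)
    return sorted(keep)


if __name__ == "__main__":
    toy = ["AC-GT", "A--GT", "ACTG-", "AC-GT"]
    scores = gap_scores(toy)
    print(scores)
    print(select_columns(scores, score_threshold=0.9, min_fraction=0.5))
```

In the toy run, only two columns pass the 0.9 gap threshold, so a third column is added to satisfy the 50% minimum. The `-cons 60 -gt 0.9` setting quoted in the abstract would correspond roughly to `score_threshold=0.9` and `min_fraction=0.6` in this sketch.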

Journal Article

Initial sequencing and comparative analysis of the mouse genome.

Robert H. Waterston¹, Kerstin Lindblad-Toh², Ewan Birney, Jane Rogers³ +219 more (26 institutions)

05 Dec 2002-Nature

TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.

Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Modern Applied Statistics With S

Christina Gloeckner

01 Jan 2016

5,249 citations
