Author

David M. Hillis

Other affiliations: American Museum of Natural History, University of Miami

Bio: David M. Hillis is an academic researcher at the University of Texas at Austin. The author has contributed to research on topics including phylogenetic trees and monophyly. The author has an h-index of 71 and has co-authored 191 publications that have received 28,377 citations. Previous affiliations of David M. Hillis include the American Museum of Natural History and the University of Miami.

Topics: Phylogenetic tree, Monophyly, Population, Ribosomal DNA, Medicine, ...

Papers published on a yearly basis (chart; years 1981 to 2023, per-year counts not preserved)

Papers

Journal Article

An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis

David M. Hillis, James J. Bull

01 Jun 1993-Systematic Biology

TL;DR: This work uses computer simulations and a laboratory-generated phylogeny to test bootstrapping results of parsimony analyses, and indicates that any given bootstrap proportion provides an unbiased but highly imprecise measure of repeatability, unless the actual probability of replicating the relevant result is nearly one.

Abstract: Bootstrapping is a common method for assessing confidence in phylogenetic analyses. Although bootstrapping was first applied in phylogenetics to assess the repeatability of a given result, bootstrap results are commonly interpreted as a measure of the probability that a phylogenetic estimate represents the true phylogeny. Here we use computer simulations and a laboratory-generated phylogeny to test bootstrapping results of parsimony analyses, both as measures of repeatability (i.e., the probability of repeating a result given a new sample of characters) and accuracy (i.e., the probability that a result represents the true phylogeny). Our results indicate that any given bootstrap proportion provides an unbiased but highly imprecise measure of repeatability, unless the actual probability of replicating the relevant result is nearly one. The imprecision of the estimate is great enough to render the estimate virtually useless as a measure of repeatability. Under conditions thought to be typical of most phylogenetic analyses, however, bootstrap proportions in majority-rule consensus trees provide biased but highly conservative estimates of the probability of correctly inferring the corresponding clades. Specifically, under conditions of equal rates of change, symmetric phylogenies, and internodal change of <20% of the characters, bootstrap proportions of >70% usually correspond to a probability of >95% that the corresponding clade is real. However, under conditions of very high rates of internodal change (approaching randomization of the characters among taxa) or highly unequal rates of change among taxa, bootstrap proportions >50% are overestimates of accuracy. (Bootstrapping; accuracy; repeatability; phylogeny; parsimony; precision; statistical analyses; simulations.)
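To make the idea of a bootstrap proportion concrete, here is a minimal Python sketch of character resampling under parsimony. It is an illustration only, not the simulation design used in the paper: it assumes exactly four taxa (labelled A, B, C, D), where the most-parsimonious unrooted tree is simply the split supported by the most informative characters. The function names `best_quartet_topology` and `bootstrap_proportion` are hypothetical.

```python
import random
from collections import Counter


def best_quartet_topology(columns):
    """Return the most-parsimonious split for taxa A, B, C, D, or None on a tie.

    Each column is a 4-tuple of character states (A, B, C, D). For four taxa,
    only patterns of the form xxyy, xyxy, and xyyx are parsimony-informative,
    and each one favours exactly one of the three unrooted topologies.
    """
    support = Counter()
    for a, b, c, d in columns:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    if not support:
        return None  # no informative characters at all
    ranked = support.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between topologies: treat as unresolved
    return ranked[0][0]


def bootstrap_proportion(columns, split, replicates=1000, seed=0):
    """Fraction of bootstrap replicates whose best topology equals `split`."""
    rng = random.Random(seed)
    n = len(columns)
    hits = 0
    for _ in range(replicates):
        # Resample characters (columns) with replacement, keeping n fixed.
        resampled = [columns[rng.randrange(n)] for _ in range(n)]
        if best_quartet_topology(resampled) == split:
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    data = [("0", "0", "1", "1")] * 6 + [("0", "1", "0", "1")] * 4
    print(bootstrap_proportion(data, "AB|CD"))
```

The returned value is a repeatability-style measure for the chosen split; as the abstract stresses, it should not be read directly as the probability that the split is correct.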

4,057 citations

Journal Article

Ribosomal DNA: molecular evolution and phylogenetic inference.

David M. Hillis¹, Michael T. Dixon¹

University of Texas at Austin¹

01 Dec 1991-The Quarterly Review of Biology

TL;DR: An analysis of aligned sequences of the four nuclear and two mitochondrial rRNA genes identified regions of these genes that are likely to be useful to address phylogenetic problems over a wide range of levels of divergence.

Abstract: Ribosomal DNA (rDNA) sequences have been aligned and compared in a number of living organisms, and this approach has provided a wealth of information about phylogenetic relationships. Studies of rDNA sequences have been used to infer phylogenetic history across a very broad spectrum, from studies among the basal lineages of life to relationships among closely related species and populations. The reasons for the systematic versatility of rDNA include the numerous rates of evolution among different regions of rDNA (both among and within genes), the presence of many copies of most rDNA sequences per genome, and the pattern of concerted evolution that occurs among repeated copies. These features facilitate the analysis of rDNA by direct RNA sequencing, DNA sequencing (either by cloning or amplification), and restriction enzyme methodologies. Constraints imposed by secondary structure of rRNA and concerted evolution need to be considered in phylogenetic analyses, but these constraints do not appear to impede seriously the usefulness of rDNA. An analysis of aligned sequences of the four nuclear and two mitochondrial rRNA genes identified regions of these genes that are likely to be useful to address phylogenetic problems over a wide range of levels of divergence. In general, the small subunit nuclear sequences appear to be best for elucidating Precambrian divergences, the large subunit nuclear sequences for Paleozoic and Mesozoic divergences, and the organellar sequences of both subunits for Cenozoic divergences. Primer sequences were designed for use in amplifying the entire nuclear rDNA array in 15 sections by use of the polymerase chain reaction; these "universal" primers complement previously described primers for the mitochondrial rRNA genes. Pairs of primers can be selected in conjunction with the analysis of divergence of the rRNA genes to address systematic problems throughout the hierarchy of life.

2,439 citations

Nucleic acids II: the polymerase chain reaction

David M. Hillis, Craig Moritz, Barbara K. Mable, Stephen R. Palumbi

01 Jan 1996

2,074 citations

Journal Article

Signal, Noise, and Reliability in Molecular Phylogenetic Analyses

David M. Hillis, John P. Huelsenbeck¹

University of Texas at Austin¹

01 Jun 1992-Journal of Heredity

TL;DR: This work analyzed 8,000 random data matrices consisting of 10-500 binary or four-state characters and 5-25 taxa to study several options for detecting signal in systematic data bases, finding that the skewness of tree-length distributions is closely related to the success of parsimony in finding the true phylogeny.

Abstract: DNA sequences and other molecular data compared among organisms may contain phylogenetic signal, or they may be randomized with respect to phylogenetic history. Some method is needed to distinguish phylogenetic signal from random noise to avoid analysis of data that have been randomized with respect to the historical relationships of the taxa being compared. We analyzed 8,000 random data matrices consisting of 10-500 binary or four-state characters and 5-25 taxa to study several options for detecting signal in systematic data bases. Analysis of random data often yields a single most-parsimonious tree, especially if the number of characters examined is large and the number of taxa examined is small (both often true in molecular studies). The most-parsimonious tree inferred from random data may also be considerably shorter than the second-best alternative. The distribution of tree lengths of all tree topologies (or a random sample thereof) provides a sensitive measure of phylogenetic signal: data matrices with phylogenetic signal produce tree-length distributions that are strongly skewed to the left, whereas those composed of random noise are closer to symmetrical. In simulations of phylogeny with varying rates of mutation (up to levels that produce random variation among taxa), the skewness of tree-length distributions is closely related to the success of parsimony in finding the true phylogeny. Tables of critical values of a skewness test statistic, g1, are provided for binary and four-state characters for 10-500 characters and 5-25 taxa. These tables can be used in a rapid and efficient test for significant structure in data matrices for phylogenetic analysis.
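The skewness statistic mentioned above can be sketched in a few lines of Python. This assumes the standard moment-based definition g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments of the tree lengths; the function name `g1_skewness` is hypothetical, and the critical-value tables from the paper are not reproduced here.

```python
def g1_skewness(tree_lengths):
    """Moment-based sample skewness g1 = m3 / m2**1.5 of a list of tree lengths.

    A strongly negative value indicates a left-skewed distribution (a few
    trees much shorter than the bulk), the pattern the abstract associates
    with phylogenetic signal.
    """
    n = len(tree_lengths)
    if n < 2:
        raise ValueError("need at least two tree lengths")
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n  # third central moment
    if m2 == 0:
        raise ValueError("all tree lengths are identical; g1 is undefined")
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Made-up lengths for illustration: one short tree, many long ones.
    print(g1_skewness([40, 52, 53, 53, 54, 54, 55]))
```

In practice the computed g1 would be compared against the tabulated critical value for the matching number of characters and taxa.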

1,323 citations

Journal Article

Increased taxon sampling greatly reduces phylogenetic error.

Derrick J. Zwickl, David M. Hillis

01 Jul 2002-Systematic Biology

TL;DR: The measurement of phylogenetic error across a wide range of taxon sample sizes is considered, and it is concluded that the expected error based on randomly selecting trees must be considered in evaluating error in studies of the effects of taxon sampling.

Abstract: Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little benefit of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined five aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems defined by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxon-sampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at finding near-optimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although benefits of taxon sampling are apparent for all models, data generated under more complex models of evolution produce higher overall levels of error and show greater positive effects of increased taxon sampling. Fifth, we asked if different phylogenetic optimality criteria show different effects of taxon sampling. Although we found strong differences in effectiveness of different optimality criteria as a function of taxon sample size, increased taxon sampling improved the results from all the common optimality criteria. Nonetheless, the method that showed the lowest overall performance (minimum evolution) also showed the least improvement from increased taxon sampling. Taking each of these results into account re-enforces the conclusion that increased sampling of taxa is one of the most important ways to increase overall phylogenetic accuracy.

901 citations

Cited by

Journal Article

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Stéphane Guindon¹, Olivier Gascuel¹

Centre national de la recherche scientifique¹

01 Oct 2003-Systematic Biology

TL;DR: This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.

Abstract: The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. (Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.)

The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. Moreover, current probabilistic sequence evolution models (Swofford et al., 1996; Page and Holmes, 1998), notably those including rate variation among sites (Uzzell and Corbin, 1971; Jin and Nei, 1990; Yang, 1996), require an increasing number of calculations. Therefore, the speed of phylogeny reconstruction methods is becoming a significant requirement and good compromises between speed and accuracy must be found.

The maximum likelihood (ML) approach is especially accurate for building molecular phylogenies. Felsenstein (1981) brought this framework to nucleotide-based phylogenetic inference, and it was later also applied to amino acid sequences (Kishino et al., 1990). Several variants were proposed, most notably the Bayesian methods (Rannala and Yang 1996; and see below), and the discrete Fourier analysis of Hendy et al. (1994), for example. Numerous computer studies (Huelsenbeck and Hillis, 1993; Kuhner and Felsenstein, 1994; Huelsenbeck, 1995; Rosenberg and Kumar, 2001; Ranwez and Gascuel, 2002) have shown that ML programs can recover the correct tree from simulated data sets more frequently than other methods can. Another important advantage of the ML approach is the ability to compare different trees and evolutionary models within a statistical framework (see Whelan et al., 2001, for a review). However, like all optimality criterion-based phylogenetic reconstruction approaches, ML is hampered by computational difficulties, making it impossible to obtain the optimal tree with certainty from even moderate data sets (Swofford et al., 1996). Therefore, all practical methods rely on heuristics that obtain near-optimal trees in reasonable computing time. Moreover, the computation problem is especially difficult with ML, because the tree likelihood not only depends on the tree topology but also on numerical parameters, including branch lengths. Even computing the optimal values of these parameters on a single tree is not an easy task, particularly because of possible local optima (Chor et al., 2000).

The usual heuristic method, implemented in the popular PHYLIP (Felsenstein, 1993) and PAUP* (Swofford, 1999) packages, is based on hill climbing. It combines stepwise insertion of taxa in a growing tree and topological rearrangement. For each possible insertion position and rearrangement, the branch lengths of the resulting tree are optimized and the tree likelihood is computed. When the rearrangement improves the current tree or when the position insertion is the best among all possible positions, the corresponding tree becomes the new current tree. Simple rearrangements are used during tree growing, namely "nearest neighbor interchanges" (see below), while more intense rearrangements can be used once all taxa have been inserted. The procedure stops when no rearrangement improves the current best tree. Despite significant decreases in computing times, notably in fastDNAml (Olsen et al., 1994), this heuristic becomes impracticable with several hundreds of taxa. This is mainly due to the two-level strategy, which separates branch lengths and tree topology optimization. Indeed, most calculations are done to optimize the branch lengths and evaluate the likelihood of trees that are finally rejected.

New methods have thus been proposed. Strimmer and von Haeseler (1996) and others have assembled four-taxon (quartet) trees inferred by ML, in order to reconstruct a complete tree. However, the results of this approach have not been very satisfactory to date (Ranwez and Gascuel, 2001). Ota and Li (2000, 2001) described

16,261 citations

Journal Article

Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.

Fumio Tajima¹

Kyushu University¹

30 Oct 1989-Genomics

TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal Article

TCS: a computer program to estimate gene genealogies.

Mark J. Clement¹, David Posada¹, Keith A. Crandall¹

Brigham Young University¹

01 Oct 2000-Molecular Ecology

9,118 citations

Mesquite: a modular system for evolutionary analysis. Version 2.6

W. P. Maddison, D. R. Maddison

01 Jan 2009

8,708 citations

Journal Article

An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II

K. Bremer¹, Mark W. Chase¹, J. L. Reveal¹, Douglas E. Soltis², Pamela S. Soltis², Peter F. Stevens³, and 2 more

Royal Botanic Gardens¹, University of Florida², University of Missouri³

01 May 2016-Botanical Journal of the Linnean Society

TL;DR: A revised and updated classification for the families of the flowering plants is provided in this paper, which includes Austrobaileyales, Canellales, Gunnerales, Crossosomatales and Celastrales.

7,299 citations
