Showing papers on "Ancestral reconstruction published in 2021"


Journal Article
TL;DR: A review of recent studies that probe the early stages of the evolution of a new enzyme from two complementary points of view: ancestral reconstruction and laboratory evolution.

15 citations


Journal Article
TL;DR: This paper details the approach of ancestral sequence reconstruction (ASR), providing a primer for reconstructing the sequence of an ancestral gene, and presents alternative genetic approaches for investigating molecular evolution on shorter timescales.
Abstract: Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants, in contrast to work in other organisms, address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations and, as such, acquisition of new functions appears to be neither difficult nor rare; however, only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.

9 citations


Journal Article
TL;DR: SubMARine is a polynomial-time algorithm that produces the subMAR, a partial clone tree that approximates the Maximally-Constrained Ancestral Reconstruction (MAR) and whose defined relationships are guaranteed to be a subset of those present in the MAR.
Abstract: Tumors contain multiple subpopulations of genetically distinct cancer cells. Reconstructing their evolutionary history can improve our understanding of how cancers develop and respond to treatment. Subclonal reconstruction methods cluster mutations into groups that co-occur within the same subpopulations, estimate the frequency of cells belonging to each subpopulation, and infer the ancestral relationships among the subpopulations by constructing a clone tree. However, often multiple clone trees are consistent with the data and current methods do not efficiently capture this uncertainty; nor can these methods scale to clone trees with a large number of subclonal populations. Here, we formalize the notion of a partially-defined clone tree (partial clone tree for short) that defines a subset of the pairwise ancestral relationships in a clone tree, thereby implicitly representing the set of all clone trees that have these defined pairwise relationships. Also, we introduce a special partial clone tree, the Maximally-Constrained Ancestral Reconstruction (MAR), which summarizes all clone trees fitting the input data equally well. Finally, we extend commonly used clone tree validity conditions to apply to partial clone trees and describe SubMARine, a polynomial-time algorithm producing the subMAR, which approximates the MAR and guarantees that its defined relationships are a subset of those present in the MAR. We also extend SubMARine to work with subclonal copy number aberrations and define equivalence constraints for this purpose. Further, we extend SubMARine to permit noise in the estimates of the subclonal frequencies while retaining its validity conditions and guarantees. In contrast to other clone tree reconstruction methods, SubMARine runs in time and space that scale polynomially in the number of subclones. 
We show through extensive noise-free simulation, a large lung cancer dataset and a prostate cancer dataset that the subMAR equals the MAR in all cases where only a single clone tree exists and that it is a perfect match to the MAR in most of the other cases. Notably, SubMARine runs in less than 70 seconds on a single thread with less than one Gb of memory on all datasets presented in this paper, including ones with 50 nodes in a clone tree. On the real-world data, SubMARine almost perfectly recovers the previously reported trees and identifies minor errors made in the expert-driven reconstructions of those trees. The freely-available open-source code implementing SubMARine can be downloaded at https://github.com/morrislab/submarine.
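The idea of a partial clone tree can be illustrated with a small sketch. This is not the SubMARine algorithm itself; it only shows, under assumed data structures, how a set of defined pairwise ancestral relationships implicitly represents several full clone trees, together with one commonly used validity condition (a subclone's frequency must be at least the summed frequency of its children in every sample). The names `is_extension`, `satisfies_sum_rule`, and the encoding of undefined relationships as `None` are hypothetical choices made for this example.

```python
def ancestors(parent, node):
    """Return all ancestors of `node`, given a child -> parent map."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def is_extension(parent, partial):
    """True if a full clone tree agrees with every defined relationship.

    `partial` maps (a, b) to True (a is ancestral to b), False (it is not),
    or None (undefined; assumed encoding, either answer is acceptable).
    """
    for (a, b), value in partial.items():
        if value is None:
            continue
        if (a in ancestors(parent, b)) != value:
            return False
    return True


def satisfies_sum_rule(parent, freq, tol=1e-9):
    """True if, in every sample, each subclone's frequency is at least
    the summed frequency of its direct children."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    n_samples = len(next(iter(freq.values())))
    for par, kids in children.items():
        for s in range(n_samples):
            if sum(freq[k][s] for k in kids) > freq[par][s] + tol:
                return False
    return True


# Illustrative (made-up) frequencies for a germline root 0 and three subclones.
freq = {0: [1.0], 1: [0.8], 2: [0.5], 3: [0.2]}

# Partial clone tree: 1 is ancestral to 2 and 3; the 2-3 relationship is undefined.
partial = {(1, 2): True, (1, 3): True, (2, 3): None}

branching = {1: 0, 2: 1, 3: 1}   # 0 -> 1 -> {2, 3}
linear = {1: 0, 2: 1, 3: 2}      # 0 -> 1 -> 2 -> 3
invalid = {1: 0, 2: 0, 3: 2}     # 1 and 2 are siblings under the root
```

Here both `branching` and `linear` are consistent with the partial clone tree and with the frequencies, so the single partial tree stands in for both; `invalid` contradicts a defined relationship and also violates the frequency condition.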

6 citations


Journal Article
TL;DR: This article used quantitative phage display to measure the interactions of 30,533 random peptides with human S100A5, S100A6, and ancA5/A6.
Abstract: Some have hypothesized that ancestral proteins were, on average, less specific than their descendants. If true, this would provide a universal axis along which to organize protein evolution and suggests that reconstructed ancestral proteins may be uniquely powerful tools for protein engineering. Ancestral sequence reconstruction studies are one line of evidence used to support this hypothesis. Previously, we performed such a study, investigating the evolution of peptide binding specificity for the paralogs S100A5 and S100A6. The modern proteins appeared more specific than their last common ancestor (ancA5/A6), as each paralog bound a subset of the peptides bound by ancA5/A6. In the current study, we revisit this transition, using quantitative phage display to measure the interactions of 30,533 random peptides with human S100A5, S100A6, and ancA5/A6. This unbiased screen reveals a different picture. While S100A5 and S100A6 do indeed bind to a subset of the peptides recognized by ancA5/A6, they also acquired new peptide partners outside of the set recognized by ancA5/A6. Our previous work showed that ancA5/A6 had lower specificity than its descendants when measured against biological targets; our new work shows that ancA5/A6 has similar specificity to the modern proteins when measured against a random set of peptide targets. This demonstrates that altered biological specificity does not necessarily indicate altered intrinsic specificity, and sounds a cautionary note for using ancestral reconstruction studies with biological targets as a means to infer global evolutionary trends in specificity.

6 citations


Journal Article
TL;DR: This article develops an algorithm for reconstructing the genomes of ancestral individuals from genotype or sequence data of contemporary individuals and an extended pedigree of family relationships. The method alternates between identifying a source for each IBD segment and assembling the IBD segments placed within each ancestral individual.
Abstract: In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling IBD segments placed within each ancestral individual. Unlike previous approaches, our method is able to accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family through tracking haplotype transmission over time. Using our algorithm, thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, on average we reconstruct 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. thread was developed for endogamous populations, but can be applied to any extensive pedigree with the recent generations genotyped.
We anticipate that this type of practical ancestral reconstruction will become more common and necessary to understand rare and complex heritable diseases in extended families.

2 citations


Journal Article
Wang Yanmei, Jia Feng, Junping Lv, Qi Liu, Fangru Nan, Xudong Liu, Shulian Xie
TL;DR: This research clarifies the phylogenetic relationships of green euglenophytes and provides a basis for the study of the origin of these plants.
Abstract: Green euglenophytes are a group of eukaryotes with ancient origin. In order to understand the evolution of the group, it is interesting to know which characteristics are more primitive. Here, a phylogenetic tree of green euglenophytes based on the 18S rRNA gene was constructed, and ancestral states were reconstructed based on eight morphological characters. This research clarifies the phylogenetic relationships of green euglenophytes and provides a basis for the study of the origin of these plants. The phylogenetic tree, which was constructed by Bayesian inference, revealed that: Eutreptia and Eutreptiella were sister groups and that Lepocinclis, Phacus, and Discoplastis were close relatives; Euglena, Cryptoglena, Monomorphina, and Colacium were closely related in addition to Trachelomonas and Strombomonas; and Euglena was not monophyletic. An ancestral reconstruction based on morphological characters revealed seven primitive character states: ductile surface, spirally striated, slightly narrowing or sharp elongated cauda, absence of a lorica, chloroplast lamellar, shield or large discoid, pyrenoid with sheath, and with many small paramylon grains. However, the ancestral state of the length of the flagellum could not be inferred. Euglena and Euglenaria, which both possessed all of the ancestral character states, might represent the most ancient lineages of green euglenophytes.

2 citations


Journal Article
TL;DR: In this paper, the authors analyzed seven Doradinae species using combined methods (e.g., cytogenetic tools and Mesquite ancestral reconstruction software) in order to scrutinize the processes that mediated the karyotype diversification in this subfamily.
Abstract: Doradinae (Siluriformes: Doradidae) is the most species-rich subfamily among thorny catfishes, encompassing over 77 valid species, found mainly in the Amazon and Platina hydrographic basins. Here, we analyzed seven Doradinae species using combined methods (e.g., cytogenetic tools and Mesquite ancestral reconstruction software) in order to scrutinize the processes that mediated the karyotype diversification in this subfamily. Our ancestral reconstruction recovered that 2n=58 chromosomes and simple nucleolar organizer regions (NOR) are ancestral features only for Wertheimerinae and most clades of Doradinae. Some exceptions were found in Trachydoras paraguayensis (2n=56), Trachydoras steindachneri (2n=60), Ossancora punctata (2n=66) and Platydoras hancockii, whose karyotypes showed a multiple NOR system. The large thorny catfishes, such as Pterodoras granulosus, Oxydoras niger and Centrodoras brachiatus, share several karyotype features, with subtle variations only regarding their heterochromatin distribution. On the other hand, a remarkable karyotypic variability has been reported in the fimbriate-barbel thorny catfishes. These two contrasting karyoevolution trajectories emerged from a complex interaction between chromosome rearrangements (e.g., inversions and Robertsonian translocations) and mechanisms of heterochromatin dispersion. Moreover, we believe that biological features, such as microhabitat preferences, population size, low vagility and migratory behavior, played a key role during the origin and maintenance of chromosome diversity in the Doradinae subfamily.

1 citation


Journal Article
TL;DR: The ancestral plant HYL1 evolved high affinity for both double-stranded RNA (dsRNA) and its DCL1 partner before the divergence of mosses from seed plants (~500 Ma), and these high-affinity interactions remained largely conserved throughout plant evolutionary history.
Abstract: In plants, miRNA production is orchestrated by a suite of proteins that control transcription of the pri-miRNA gene, post-transcriptional processing and nuclear export of the mature miRNA. Post-transcriptional processing of miRNAs is controlled by a pair of physically interacting proteins, hyponastic leaves 1 (HYL1) and Dicer-like 1 (DCL1). However, the evolutionary history and structural basis of the HYL1-DCL1 interaction are unknown. Here we use ancestral sequence reconstruction and functional characterization of ancestral HYL1 in vitro and in Arabidopsis thaliana to better understand the origin and evolution of the HYL1-DCL1 interaction and its impact on miRNA production and plant development. We found the ancestral plant HYL1 evolved high affinity for both double-stranded RNA (dsRNA) and its DCL1 partner before the divergence of mosses from seed plants (∼500 Ma), and these high-affinity interactions remained largely conserved throughout plant evolutionary history. Structural modeling and molecular binding experiments suggest that the second of two dsRNA-binding motifs (DSRMs) in HYL1 may interact tightly with the first of two C-terminal DCL1 DSRMs to mediate the HYL1-DCL1 physical interaction necessary for efficient miRNA production. Transgenic expression of the nearly 200 Ma-old ancestral flowering-plant HYL1 in A. thaliana was sufficient to rescue many key aspects of plant development disrupted by HYL1 knockout and restored near-native miRNA production, suggesting that the functional partnership of HYL1-DCL1 originated very early in and was strongly conserved throughout the evolutionary history of terrestrial plants. Overall, our results are consistent with a model in which miRNA-based gene regulation evolved as part of a conserved plant "developmental toolkit."

1 citation


Journal Article
TL;DR: It is shown that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data and that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data.
Abstract: How can we best learn the history of a protein's evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modeling based on inferred amino acid sequence and side chain configuration). But what if data in the relevant state-space are not readily available? We show that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data. Encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of models with the desired features to data that lack the resolution that is normally required. This strategy is viable because the evolutionary path taken through the observed space contains information about states that were likely visited in the "unseen" state-space. To illustrate this, we consider two examples with amino acid sequences as input. We show that $\omega$, a parameter describing the relative strength of selection on nonsynonymous and synonymous changes, can be estimated in an unbiased manner using an adapted version of a standard 61-state codon model. Using simulated and empirical data, we find that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data. Where feasible, combining inputs from both ambiguity-coded and fully resolved data improves accuracy.
Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in remarkable ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. These examples show that our methods permit the recovery of evolutionary information from sequences where it has previously been inaccessible. [Ancestral reconstruction; natural selection; protein structure; state-spaces; substitution models.].
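The ambiguity-coding idea in this abstract can be sketched for the amino acid to codon case. The abstract does not give the authors' exact encoding, so the following is only an illustration under stated assumptions: it uses the standard genetic code, treats each observed amino acid as "any of its synonymous sense codons", and builds the 0/1 tip vector over the 61 codon states in the way ambiguous or missing data are conventionally handled in likelihood calculations on a tree. The function names `ambiguity_set` and `tip_partial_likelihood`, and the treatment of gaps as fully ambiguous, are hypothetical choices for this example.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The 61 sense codons form the larger, "unseen" state-space.
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")


def ambiguity_set(residue):
    """Codons compatible with an observed amino acid.

    Anything that is not one of the 20 amino acids (e.g. a gap) is treated
    as fully ambiguous, which is an assumption of this sketch.
    """
    codons = frozenset(c for c in SENSE_CODONS if CODON_TABLE[c] == residue)
    return codons if codons else frozenset(SENSE_CODONS)


def tip_partial_likelihood(residue):
    """0/1 vector over the 61 codon states for one observed residue."""
    allowed = ambiguity_set(residue)
    return [1.0 if c in allowed else 0.0 for c in SENSE_CODONS]
```

For instance, `ambiguity_set("M")` contains only `ATG`, whereas leucine maps to six codons, so an observed leucine constrains the codon state much less than an observed methionine.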