
Showing papers on "Pseudogene published in 2016"


Journal ArticleDOI
Adam M. Session, Yoshinobu Uno, Taejoon Kwon, Jarrod Chapman, Atsushi Toyoda, Shuji Takahashi, Akimasa Fukui, Akira Hikosaka, Atsushi Suzuki, Mariko Kondo, Simon J. van Heeringen, Ian K. Quigley, Sven Heinz, Hajime Ogino, Haruki Ochi, Uffe Hellsten, Jessica B. Lyons, Oleg Simakov, Nicholas H. Putnam, Jonathan C. Stites, Yoko Kuroki, Toshiaki Tanaka, Tatsuo Michiue, Minoru Watanabe, Ozren Bogdanovic, Ryan Lister, Georgios Georgiou, Sarita S. Paranjpe, Ila van Kruijsbergen, Shengquiang Shu, Joseph W. Carlson, Tsutomu Kinoshita, Yuko Ohta, Shuuji Mawaribuchi, Jerry Jenkins, Jane Grimwood, Jeremy Schmutz, Therese Mitros, Sahar V. Mozaffari, Yutaka Suzuki, Yoshikazu Haramoto, Takamasa S. Yamamoto, Chiyo Takagi, Rebecca Heald, Kelly E. Miller, Christian D. Haudenschild, Jacob O. Kitzman, Takuya Nakayama, Yumi Izutsu, Jacques Robert, Joshua D. Fortriede, Kevin A. Burns, Vaneet Lotay, Kamran Karimi, Yuuri Yasuoka, Darwin S. Dichmann, Martin F. Flajnik, Douglas W. Houston, Jay Shendure, Louis DuPasquier, Peter D. Vize, Aaron M. Zorn, Michihiko Ito, Edward M. Marcotte, John B. Wallingford, Yuzuru Ito, Makoto Asashima, Naoto Ueno, Yoichi Matsuda, Gert Jan C. Veenstra, Asao Fujiyama, Richard M. Harland, Masanori Taira, Daniel S. Rokhsar
20 Oct 2016-Nature
TL;DR: The Xenopus laevis genome is sequenced and it is estimated that the two diploid progenitor species diverged around 34 million years ago and combined to form an allotetraploid around 17–18 Ma, where more than 56% of all genes were retained in two homoeologous copies.
Abstract: To explore the origins and consequences of tetraploidy in the African clawed frog, we sequenced the Xenopus laevis genome and compared it to the related diploid X. tropicalis genome. We characterize the allotetraploid origin of X. laevis by partitioning its genome into two homoeologous subgenomes, marked by distinct families of 'fossil' transposable elements. On the basis of the activity of these elements and the age of hundreds of unitary pseudogenes, we estimate that the two diploid progenitor species diverged around 34 million years ago (Ma) and combined to form an allotetraploid around 17-18 Ma. More than 56% of all genes were retained in two homoeologous copies. Protein function, gene expression, and the amount of conserved flanking sequence all correlate with retention rates. The subgenomes have evolved asymmetrically, with one chromosome set more often preserving the ancestral state and the other experiencing more gene loss, deletion, rearrangement, and reduced gene expression.

761 citations


Journal ArticleDOI
TL;DR: As the first report on cavefish genomes among distinct species in Sinocyclocheilus, this work not only provides insights into genetic mechanisms of cave adaptation but also represents a fundamental resource for a better understanding of cavefish biology.
Abstract: An emerging cavefish model, the cyprinid genus Sinocyclocheilus, is endemic to the massive southwestern karst area adjacent to the Qinghai-Tibetan Plateau of China. In order to understand whether orogeny influenced the evolution of these species, and how genomes change under isolation, especially in subterranean habitats, we performed whole-genome sequencing and comparative analyses of three species in this genus, S. grahami, S. rhinocerous and S. anshuiensis. These species are surface-dwelling, semi-cave-dwelling and cave-restricted, respectively. The assembled genome sizes of S. grahami, S. rhinocerous and S. anshuiensis are 1.75 Gb, 1.73 Gb and 1.68 Gb, respectively. Divergence time and population history analyses of these species reveal that their speciation and population dynamics are correlated with the different stages of uplifting of the Qinghai-Tibetan Plateau. We carried out comparative analyses of these genomes and found that many genetic changes, such as gene loss (e.g. opsin genes), pseudogenes (e.g. crystallin genes), mutations (e.g. melanogenesis-related genes), deletions (e.g. scale-related genes) and down-regulation (e.g. circadian rhythm pathway genes), are possibly associated with the regressive features (such as eye degeneration, albinism, rudimentary scales and lack of circadian rhythms), and that some gene expansion (e.g. taste-related transcription factor gene) may point to the constructive features (such as enhanced taste buds) which evolved in these cave fishes. As the first report on cavefish genomes among distinct species in Sinocyclocheilus, our work not only provides insights into genetic mechanisms of cave adaptation but also represents a fundamental resource for a better understanding of cavefish biology.

279 citations


Journal ArticleDOI
28 Jul 2016-Oncogene
TL;DR: This work shows that expression of a member of the forkhead family of transcription factors, Foxo3, is regulated by the Foxo3 pseudogene (Foxo3P) and Foxo3 circular RNA, both of which bind to eight miRNAs.
Abstract: It has recently been shown that the upregulation of a pseudogene specific to a protein-coding gene could function as a sponge to bind multiple potential targeting microRNAs (miRNAs), resulting in increased gene expression. Similarly, it was recently demonstrated that circular RNAs can function as sponges for miRNAs, and could upregulate expression of mRNAs containing an identical sequence. Furthermore, some mRNAs are now known to not only translate protein, but also function to sponge miRNA binding, facilitating gene expression. Collectively, these appear to be effective mechanisms to ensure gene expression and protein activity. Here we show that expression of a member of the forkhead family of transcription factors, Foxo3, is regulated by the Foxo3 pseudogene (Foxo3P), and Foxo3 circular RNA, both of which bind to eight miRNAs. We found that the ectopic expression of the Foxo3P, Foxo3 circular RNA and Foxo3 mRNA could all suppress tumor growth and cancer cell proliferation and survival. Our results showed that at least three mechanisms are used to ensure protein translation of Foxo3, which reflects an essential role of Foxo3 and its corresponding non-coding RNAs.

278 citations


Journal ArticleDOI
03 Nov 2016-Nature
TL;DR: The characterization of a pseudogene in the chemosensory variant ionotropic glutamate receptor repertoire of Drosophila sechellia, an insect endemic to the Seychelles that feeds almost exclusively on the ripe fruit of Morinda citrifolia is reported.
Abstract: Pseudogenes are generally considered to be non-functional DNA sequences that arise through nonsense or frame-shift mutations of protein-coding genes. Although certain pseudogene-derived RNAs have regulatory roles, and some pseudogene fragments are translated, no clear functions for pseudogene-derived proteins are known. Olfactory receptor families contain many pseudogenes, which reflect low selection pressures on loci no longer relevant to the fitness of a species. Here we report the characterization of a pseudogene in the chemosensory variant ionotropic glutamate receptor repertoire of Drosophila sechellia, an insect endemic to the Seychelles that feeds almost exclusively on the ripe fruit of Morinda citrifolia. This locus, D. sechellia Ir75a, bears a premature termination codon (PTC) that appears to be fixed in the population. However, D. sechellia Ir75a encodes a functional receptor, owing to efficient translational read-through of the PTC. Read-through is detected only in neurons and is independent of the type of termination codon, but depends on the sequence downstream of the PTC. Furthermore, although the intact Drosophila melanogaster Ir75a orthologue detects acetic acid (a chemical cue important for locating fermenting food, found only at trace levels in Morinda fruit), D. sechellia Ir75a has evolved distinct odour-tuning properties through amino-acid changes in its ligand-binding domain. We identify functional PTC-containing loci within different olfactory receptor repertoires and species, suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon.

102 citations


Journal ArticleDOI
TL;DR: An overview of expression levels of ORs and auxiliary genes in human olfactory epithelium is provided, giving a transcriptomic view of the entire OR repertoire and revealing a large number of over-expressed uncharacterized human non-receptor genes, which provides a platform for future discovery.
Abstract: Olfaction is a versatile sensory mechanism for detecting thousands of volatile odorants. Although the molecular basis of odorant signaling is relatively well understood, considerable gaps remain in the complete charting of all relevant gene products. To address this challenge, we applied RNAseq to four well-characterized human olfactory epithelial samples and compared the results to novel and published mouse olfactory epithelium as well as 16 human control tissues. We identified 194 non-olfactory receptor (OR) genes that are overexpressed in human olfactory tissues vs. controls. The highest overexpression is seen for lipocalins and bactericidal/permeability-increasing (BPI)-fold proteins, which in other species include secreted odorant carriers. Mouse-human discordance in orthologous lipocalin expression suggests different mammalian evolutionary paths in this family. Of the overexpressed genes, 36 have documented olfactory function, while for 158 there is little or no previous such functional evidence. The latter group includes GPCRs, neuropeptides, solute carriers, transcription factors and biotransformation enzymes. Many of them may be indirectly implicated in sensory function, and ~70 % are also overexpressed in mouse olfactory epithelium, corroborating their olfactory role. Nearly 90 % of the intact OR repertoire and ~60 % of the OR pseudogenes are expressed in the olfactory epithelium, with the latter showing a 3-fold lower expression. OR transcription levels show a 1000-fold inter-paralog variation, as well as significant inter-individual differences. We assembled 160 transcripts representing 100 intact OR genes. These include 1–4 short 5’ non-coding exons with considerable alternative splicing and long last exons that contain the coding region and 3’ untranslated region of highly variable length.
Notably, we identified 10 ORs with an intact open reading frame but with seemingly non-functional transcripts, suggesting a yet unreported OR pseudogenization mechanism. Analysis of the OR upstream regions indicated an enrichment of the homeobox family transcription factor binding sites and a consensus localization of a specific transcription factor binding site subfamily (Olf/EBF). We provide an overview of expression levels of ORs and auxiliary genes in human olfactory epithelium. This forms a transcriptomic view of the entire OR repertoire, and reveals a large number of over-expressed uncharacterized human non-receptor genes, providing a platform for future discovery.

93 citations



Journal ArticleDOI
TL;DR: This study defines a core set of 67 genes with robust periodic expression in multiple cell types and suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes.
Abstract: Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes.

81 citations


Journal ArticleDOI
TL;DR: The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets, but their creation remains challenging.
Abstract: A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.

71 citations


Journal ArticleDOI
TL;DR: The analyses indicate that the first functional gene losses occurred within 10 Myr of the transition to obligate parasitism in Orobanchaceae, and that the physical plastome reduction proceeds by small deletions that accumulate over time.
Abstract: Plastid genomes (plastomes) of nonphotosynthetic plants experience extensive gene losses and an acceleration of molecular evolutionary rates. Here, we inferred the mechanisms and timing of reductive genome evolution under relaxed selection in the broomrape family (Orobanchaceae). We analyzed the plastomes of several parasites with a major focus on the genus Orobanche using genome-descriptive and Bayesian phylogenetic-comparative methods. Besides this, we scanned the parasites' other cellular genomes to trace the fate of all genes that were purged from their plastomes. Our analyses indicate that the first functional gene losses occurred within 10 Myr of the transition to obligate parasitism in Orobanchaceae, and that the physical plastome reduction proceeds by small deletions that accumulate over time. Evolutionary rate shifts coincide with the genomic reduction process in broomrapes, suggesting that the shift of selectional constraints away from photosynthesis to other molecular processes alters the plastid rate equilibrium. Most of the photosynthesis-related genes or fragments of genes lost from the plastomes of broomrapes have survived in their nuclear or mitochondrial genomes as the results of multiple intracellular transfers and subsequent fragmentation. Our findings indicate that nonessential DNA is eliminated much faster in the plastomes of nonphotosynthetic parasites than in their other cellular genomes.

71 citations


Journal ArticleDOI
07 Sep 2016-PLOS ONE
TL;DR: A de novo annotated assembly of the chromosomal genome of an industrially-relevant strain, W29/CLIB89, determined by hybrid next-generation sequencing underscores the utility of an additional independent genome assembly for this economically important organism.
Abstract: Yarrowia lipolytica, an oleaginous yeast, is capable of accumulating significant cellular mass in lipid making it an important source of biosustainable hydrocarbon-based chemicals. In spite of a similar number of protein-coding genes to that in other Hemiascomycetes, the Y. lipolytica genome is almost double that of model yeasts. Despite its economic importance and several distinct strains in common use, an independent genome assembly exists for only one strain. We report here a de novo annotated assembly of the chromosomal genome of an industrially-relevant strain, W29/CLIB89, determined by hybrid next-generation sequencing. For the first time, each Y. lipolytica chromosome is represented by a single contig. The telomeric rDNA repeats were localized by Irys long-range genome mapping and one complete copy of the rDNA sequence is reported. Two large structural variants and retroelement differences with reference strain CLIB122 including a full-length, novel Ty3/Gypsy long terminal repeat (LTR) retrotransposon and multiple LTR-like sequences are described. Strikingly, several of these are adjacent to RNA polymerase III-transcribed genes, which are almost double in number in Y. lipolytica compared to other Hemiascomycetes. In addition to previously-reported dimeric RNA polymerase III-transcribed genes, tRNA pseudogenes were identified. Multiple full-length and truncated LINE elements are also present. Therefore, although identified transposons do not constitute a significant fraction of the Y. lipolytica genome, they could have played an active role in its evolution. Differences between the sequence of this strain and of the existing reference strain underscore the utility of an additional independent genome assembly for this economically important organism.

69 citations


Journal ArticleDOI
TL;DR: It is shown that truncTALE Tal2h of Xoc strain BLS256, and by correlation truncTALEs in other strains, specifically suppress resistance mediated by the Xo1 locus recently described in the heirloom rice variety Carolina Gold.
Abstract: Delivered into plant cells by type III secretion from pathogenic Xanthomonas species, TAL (transcription activator-like) effectors are nuclear-localized, DNA-binding proteins that directly activate specific host genes. Targets include genes important for disease, genes that confer resistance, and genes inconsequential to the host-pathogen interaction. TAL effector specificity is encoded by polymorphic repeats of 33-35 amino acids that interact one-to-one with nucleotides in the recognition site. Activity depends also on N-terminal sequences important for DNA binding and C-terminal nuclear localization signals (NLS) and an acidic activation domain (AD). Coding sequences missing much of the N- and C-terminal regions due to conserved, in-frame deletions are present and annotated as pseudogenes in sequenced strains of X. oryzae pv. oryzicola (Xoc) and pv. oryzae (Xoo), which cause bacterial leaf streak and bacterial blight of rice, respectively. Here we provide evidence that these sequences encode proteins we call ‘truncTALEs,’ for ‘truncated TAL effectors.’ We show that truncTALE Tal2h of Xoc strain BLS256, and by correlation truncTALEs in other strains, specifically suppress resistance mediated by the Xo1 locus recently described in the heirloom rice variety Carolina Gold. Xo1-mediated resistance is triggered by different TAL effectors from diverse X. oryzae strains, irrespective of their DNA binding specificity, and does not require the AD. This implies a direct protein-protein rather than protein-DNA interaction. Similarly, truncTALEs exhibit diverse predicted DNA recognition specificities. And, in vitro, Tal2h did not bind any of several potential recognition sites. Further, a single candidate NLS sequence in Tal2h was dispensable for resistance suppression. Many truncTALEs have one 28 aa repeat, a length not observed previously. 
Tested in an engineered TAL effector, this repeat required a single base pair deletion in the DNA, suggesting that it or a neighbor disengages. The presence of the 28 aa repeat, however, was not required for resistance suppression. TruncTALEs expand the paradigm for TAL effector-mediated effects on plants. We propose that Tal2h and other truncTALEs act as dominant negative ligands for an immune receptor encoded by the Xo1 locus, likely a nucleotide binding, leucine-rich repeat protein. Understanding truncTALE function and distribution will inform strategies for disease control.

Journal ArticleDOI
TL;DR: It is shown that ribosome profiling upon translation inhibition by tetracycline offers a simple, reliable and comprehensive experimental tool for precise annotation of translation start sites of expressed genes in bacteria.
Abstract: Tetracycline-inhibited ribosome profiling (TetRP) provides a powerful new experimental tool for comprehensive genome-wide identification of translation initiation sites in bacteria. We validated TetRP by confirming the translation start sites of protein-coding genes in accordance with the 2006 version of Escherichia coli K-12 annotation record (GenBank U00096.2) and found ∼150 new start sites within 60 nucleotides of the annotated site. This analysis revealed 72 per cent of the genes whose initiation site annotations were changed from the 2006 GenBank record to the newer 2014 annotation record (GenBank U00096.3), indicating a high sensitivity. Also, results from reporter fusion and proteomics of N-terminally enriched peptides showed high specificity of the TetRP results. In addition, we discovered over 300 translation start sites within non-coding, intergenic regions of the genome, using a threshold that retains ∼2,000 known coding genes. While some appear to correspond to pseudogenes, others may encode small peptides or have previously unforeseen roles. In summary, we showed that ribosome profiling upon translation inhibition by tetracycline offers a simple, reliable and comprehensive experimental tool for precise annotation of translation start sites of expressed genes in bacteria.
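
The comparison of profiled start sites against annotated ones, using the 60-nucleotide window mentioned in the abstract, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `classify_peak`, the three labels, the single-strand integer coordinates and the inclusive treatment of the window boundary are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List


def classify_peak(peak: int, annotated_sorted: List[int], window: int = 60) -> str:
    """Classify one candidate start site against sorted annotated start sites.

    Hypothetical labels (not taken from the paper):
      "confirmed"   - the peak coincides with an annotated start
      "alternative" - the peak lies within `window` nt of an annotated start
      "unassigned"  - no annotated start within `window` nt
    """
    # Locate the insertion point, then inspect the neighbours on either side.
    i = bisect_left(annotated_sorted, peak)
    candidates = annotated_sorted[max(i - 1, 0):i + 1]
    if not candidates:
        return "unassigned"
    distance = min(abs(site - peak) for site in candidates)
    if distance == 0:
        return "confirmed"
    if distance <= window:  # boundary treated as inclusive (an assumption)
        return "alternative"
    return "unassigned"


# Toy coordinates, invented for the example.
annotated = [100, 500, 1000]
labels = [classify_peak(p, annotated) for p in (100, 130, 560, 700)]
```

Peaks labelled "unassigned" in such a scheme would be the candidates for intergenic start sites; strand and gene boundaries would need to be handled in any real analysis.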

Journal ArticleDOI
TL;DR: The present work provides a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background useful to clarify HERV-W role in pathologies with poorly understood etiology, representing, to the authors' knowledge, the most complete and exhaustive HERV-W dataset to date.
Abstract: Human endogenous retroviruses (HERVs) are ancient sequences integrated in the germ line cells and vertically transmitted through the offspring constituting about 8 % of our genome. In time, HERVs accumulated mutations that compromised their coding capacity. A prominent exception is HERV-W locus 7q21.2, producing a functional Env protein (Syncytin-1) coopted for placental syncytiotrophoblast formation. While expression of HERV-W sequences has been investigated for their correlation to disease, an exhaustive description of the group composition and characteristics is still not available, and current HERV-W group information derives from studies published a few years ago that used the rough assemblies of the human genome available at that time. This hampers the comparison and correlation with current human genome assemblies. In the present work we identified and described in detail the distribution and genetic composition of 213 HERV-W elements. The bioinformatics analysis led to the characterization of several previously unreported features and provided a phylogenetic classification of two main subgroups with different age and structural characteristics. New facts on HERV-W genomic context of insertion and co-localization with sequences putatively involved in disease development are also reported. The present work is a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background useful to clarify HERV-W role in pathologies with poorly understood etiology, representing, to our knowledge, the most complete and exhaustive HERV-W dataset to date.

Journal ArticleDOI
TL;DR: It is demonstrated that some but not all translated pseudogenes have selected functions at the protein level, and neither ORF disruption nor presence of protein product disproves or proves gene functionality at the protein level.
Abstract: By definition, pseudogenes are relics of former genes that no longer possess biological functions. Operationally, they are identified based on disruptions of open reading frames (ORFs) or presumed losses of promoter activities. Intriguingly, a recent human proteomic study reported peptides encoded by 107 pseudogenes. These peptides may play currently unrecognized physiological roles. Alternatively, they may have resulted from accidental translations of pseudogene transcripts and possess no function. Comparing between human and macaque orthologs, we show that the nonsynonymous to synonymous substitution rate ratio (ω) is significantly smaller for translated pseudogenes than other pseudogenes. In particular, five of 34 translated pseudogenes amenable to evolutionary analysis have ω values significantly lower than 1, indicative of the action of purifying selection. This and other findings demonstrate that some but not all translated pseudogenes have selected functions at the protein level. Hence, neither ORF disruption nor presence of protein product disproves or proves gene functionality at the protein level.
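
The ω ratio used in the abstract above is the nonsynonymous substitution rate (dN) divided by the synonymous rate (dS). As a minimal sketch, assuming dN and dS have already been estimated for an ortholog pair, the ratio and a rough reading of it could be computed as follows. The function names, the tolerance `tol` and the example rates are hypothetical, and the point-estimate comparison shown here is not the significance test the authors applied.

```python
def omega(dn: float, ds: float) -> float:
    """Return the nonsynonymous-to-synonymous substitution rate ratio dN/dS."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w: float, tol: float = 0.05) -> str:
    """Give a rough reading of a point estimate of omega.

    `tol` is an arbitrary illustrative margin around 1, not a statistical test.
    """
    if w < 1 - tol:
        return "purifying"   # amino-acid changes are removed by selection
    if w > 1 + tol:
        return "positive"    # amino-acid changes are favoured
    return "neutral"         # consistent with no constraint, as expected for a true pseudogene


# Invented rates for illustration only.
w = omega(0.02, 0.10)
regime = selection_regime(w)
```

A real analysis would estimate dN and dS with a codon-substitution model and test whether ω differs significantly from 1, for example with a likelihood ratio test.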

Journal ArticleDOI
TL;DR: The first comprehensive genomic study for a whole gene family, the SUT family, in Saccharum is presented, showing that the examined four SsSUT members exhibited conservations of gene structures and amino acid sequences among the allelic haplotypes accompanied by variations of intron sizes.
Abstract: Sugarcane is an economically important crop contributing about 80 % of world sugar production. Increasing efforts in molecular biological studies have been made to improve the sugar yield and other important agronomic traits. However, due to sugarcane’s complicated genomes, it is still challenging to study the genetic basis of traits such as sucrose accumulation. Sucrose transporters (SUTs) are critical for both phloem loading in source tissue and sucrose uptake in sink tissue, and are considered to be the control points for regulating sucrose storage. However, no genomic study of sugarcane sucrose transporter (SsSUT) families has been reported to date. By using comparative genomics and bacterial artificial chromosomes (BACs), six SUT genes were identified and characterized in S. spontaneum. Phylogenetic analyses revealed that the two pairs of SsSUTs (SsSUT1/SsSUT3 and SsSUT5/SsSUT6) could be clustered together into two separate monocot-specific SUT groups, while SsSUT2 and SsSUT4 were separated into the other two groups, with members from both dicot and monocot species. Gene structure comparison demonstrated that the number and position of exons/introns in SUTs were highly conserved among the close orthologs; in contrast, there were variations among the paralogous SUTs in Saccharum. Despite the high polyploidy level, comparative analysis of gene allelic haplotypes showed that the four examined SsSUT members exhibited conservation of gene structures and amino acid sequences among the allelic haplotypes, accompanied by variations of intron sizes. Gene expression analyses were performed for tissues from seedlings under drought stress and mature plants of three Saccharum species (S. officinarum, S. spontaneum and S. robustum). Both SUT1 and SUT4 were expressed abundantly under different conditions. SUT2 had similar expression levels in all of the examined tissues, but SUT3 was undetectable.
Both SUT5 and SUT6 had lower expression levels than the other gene members, were expressed more strongly in source leaves, and are likely to play roles in phloem loading. In seedling leaves under water stress, four genes, SUT1, SUT2, SUT4 and SUT5, were detectable. Among these detectable genes, SUT1 and SUT4 were down-regulated, while SUT2 and SUT5 were up-regulated. In this study, we presented the first comprehensive genomic study for a whole gene family, the SUT family, in Saccharum. We speculated that there were six SUT members in the S. spontaneum genome. Out of the six members, SsSUTs, SsSUT5 and SsSUT6 were recent duplication genes accompanied by rapid evolution, while SsSUT2 and SsSUT4 were the ancient members in the families. Despite the highly polyploid genome, functional redundancy may not exist among the SUT allelic haplotypes, as supported by the evidence of strong purifying selection on the gene alleles. SUT3 could be a member with low activity in the family because it was undetectable in our study, but it might not be a pseudogene because it harbored an intact gene structure. SUT1 and SUT4 were the main sucrose transporter members, while these SUTs showed sub-functional divergence in response to sucrose accumulation and plant development in Saccharum.

Journal ArticleDOI
TL;DR: A computational method, PTESFinder, that systematically excludes potential artefacts emanating from pseudogenes, segmental duplications, and template switching, and outputs both PTES and canonical exon junction counts to facilitate comparative analyses of poorly understood transcripts.
Abstract: Transcripts, which have been subject to Post-transcriptional exon shuffling (PTES), have an exon order inconsistent with the underlying genomic sequence. These have been identified in a wide variety of tissues and cell types from many eukaryotes, and are now known to be mostly circular, cytoplasmic, and non-coding. Although there is no uniformly ascribed function, several have been shown to be involved in gene regulation. Accurate identification of these transcripts can, however, be difficult due to artefacts from a wide variety of sources. Here, we present a computational method, PTESFinder, to identify these transcripts from high throughput RNAseq data. Uniquely, it systematically excludes potential artefacts emanating from pseudogenes, segmental duplications, and template switching, and outputs both PTES and canonical exon junction counts to facilitate comparative analyses. In comparison with four existing methods, PTESFinder achieves highest specificity and comparable sensitivity at a variety of read depths. PTESFinder also identifies between 13 % and 41.6 % more structures, compared to publicly available methods recently used to identify human circular RNAs. With high sensitivity and specificity, user-adjustable filters that target known sources of false positives, and tailored output to facilitate comparison of transcript levels, PTESFinder will facilitate the discovery and analysis of these poorly understood transcripts.
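
The defining property described above, an exon order inconsistent with the genomic sequence, can be illustrated with a minimal sketch. This is not PTESFinder's implementation and applies none of its artefact filters. It assumes each junction is given as a pair of exon ranks (donor, acceptor) numbered in genomic order within one gene; the function names and toy junctions are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def junction_type(donor_exon: int, acceptor_exon: int) -> str:
    """Label a splice junction by the genomic order of the exons it joins.

    A canonical junction joins a donor exon to a downstream acceptor exon.
    A junction whose acceptor is the same exon or an upstream one has an
    exon order inconsistent with the genome and is labelled "PTES" here.
    """
    return "canonical" if acceptor_exon > donor_exon else "PTES"


def count_junctions(junctions: Iterable[Tuple[int, int]]) -> Counter:
    """Tally PTES and canonical junctions so the two can be compared."""
    return Counter(junction_type(d, a) for d, a in junctions)


# Toy junctions: (donor exon rank, acceptor exon rank).
counts = count_junctions([(1, 2), (2, 3), (3, 2), (4, 4), (2, 5)])
```

Reporting both tallies side by side mirrors the abstract's point that canonical junction counts are output alongside PTES counts to support comparison of transcript levels.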

Journal ArticleDOI
02 Mar 2016-PLOS ONE
TL;DR: The plastid genome of Lathraea squamaria, a holoparasitic plant from Orobanchaceae, is reported, and it is found that in this plant the degree of plastome reduction is the least among non-photosynthetic plants.
Abstract: Plants from the family Orobanchaceae are widely used as a model to study different aspects of parasitic lifestyle including host–parasite interactions and physiological and genomic adaptations. Among the latter, the most prominent are those that occurred due to the loss of photosynthesis; they include the reduction of the photosynthesis-related gene set in both nuclear and plastid genomes. In Orobanchaceae, the transition to non-photosynthetic lifestyle occurred several times independently, but only one lineage has been in the focus of evolutionary studies. These studies included analysis of plastid genomes and transcriptomes and allowed the inference of patterns and mechanisms of genome reduction that are thought to be general for parasitic plants. Here we report the plastid genome of Lathraea squamaria, a holoparasitic plant from Orobanchaceae, clade Rhinantheae. We found that in this plant the degree of plastome reduction is the least among non-photosynthetic plants. Like other parasites, Lathraea possesses a plastome with elevated absolute rate of nucleotide substitution. The only gene lost is petL; all other genes typical for the plastid genome are present, but some of them, namely those encoding photosystem components (22 genes), cytochrome b6/f complex proteins (4 genes), plastid-encoded RNA polymerase subunits (2 genes), ribosomal proteins (2 genes), ccsA and cemA, are pseudogenized. Genes for cytochrome b6/f complex and photosystems I and II that do not carry nonsense or frameshift mutations have an increased ratio of non-synonymous to synonymous substitution rates, indicating the relaxation of purifying selection. Our divergence time estimates showed that transition to holoparasitism in Lathraea lineage occurred relatively recently, whereas the holoparasitic lineage Orobancheae is about two times older.

Journal ArticleDOI
TL;DR: Evaluating the enamel specificity of four genes implicated in amelogenesis imperfecta concludes that C4orf26 is tooth-specific, but not enamel-specific, with respect to its essential functions that are maintained by natural selection.

Journal ArticleDOI
TL;DR: The genome structure and gene content of six new isolates from the Fusarium graminearum species complex, including the first available genomes of F. asiaticum and F. meridionale, are compared with four other genomes reported in previous studies.
Abstract: The Fusarium graminearum species complex is composed of many distinct fungal species that cause several diseases in economically important crops, including Fusarium Head Blight of wheat. Despite being closely related, these species and individuals within species have distinct phenotypic differences in toxin production and pathogenicity, with some isolates reported as non-pathogenic on certain hosts. In this report, we compare genomes and gene content of six new isolates from the species complex, including the first available genomes of F. asiaticum and F. meridionale, with four other genomes reported in previous studies. A comparison of genome structure and gene content revealed a 93–99% overlap across all ten genomes. We identified more than 700 kilobase pairs (kb) of single nucleotide polymorphisms (SNPs), insertions, and deletions (indels) within common regions of the genome, which validated the species and genetic populations reported within species. We constructed a non-redundant pan gene list containing 15,297 genes from the ten genomes, and among them 1827 genes or 12% were absent in at least one genome. These genes were co-localized in telomeric regions and select regions within chromosomes with a corresponding increase in SNPs and indels. Many are also predicted to encode proteins involved in secondary metabolism and other functions associated with disease. Genes that were common between isolates contained high levels of nucleotide variation and may be pseudogenes, allelic, or under diversifying selection. The genomic resources we have contributed will be useful for the identification of genes that contribute to the phenotypic variation and niche specialization that have been reported among members of the F. graminearum species complex.
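The pan-gene figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the two counts are taken from the abstract, while the function and variable names are hypothetical and do not come from the paper.

```python
# Counts as quoted in the abstract above.
PAN_GENES = 15_297      # non-redundant pan gene list across ten genomes
VARIABLE_GENES = 1_827  # genes absent in at least one genome


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole (hypothetical helper)."""
    return 100.0 * part / whole


variable_pct = percent(VARIABLE_GENES, PAN_GENES)
core_genes = PAN_GENES - VARIABLE_GENES  # genes present in all ten genomes

print(f"variable fraction: {variable_pct:.1f}%")
print(f"genes present in every genome: {core_genes}")
```

Rounded to a whole number, the variable fraction matches the 12% stated in the abstract.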

Journal ArticleDOI
TL;DR: Overall, this study confirms the utility of the 500 gene set to resolve phylogenetic relationships at a range of evolutionary depths and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design.
Abstract: The qualification of orthology is a significant challenge when developing large, multiloci phylogenetic data sets from assembled transcripts. Transcriptome assemblies have various attributes, such as fragmentation, frameshifts and mis-indexing, which pose problems to automated methods of orthology assessment. Here, we identify a set of orthologous single-copy genes from transcriptome assemblies for the land snails and slugs (Eupulmonata) using a thorough approach to orthology determination involving manual alignment curation, gene tree assessment and sequencing from genomic DNA. We qualified the orthology of 500 nuclear, protein-coding genes from the transcriptome assemblies of 21 eupulmonate species to produce the most complete phylogenetic data matrix for a major molluscan lineage to date, both in terms of taxon and character completeness. Exon capture targeting 490 of the 500 genes (those with at least one exon >120 bp) from 22 species of Australian Camaenidae successfully captured sequences of 2825 exons (representing all targeted genes), with only a 3.7% reduction in the data matrix due to the presence of putative paralogs or pseudogenes. The automated pipeline Agalma retrieved the majority of the manually qualified 500 single-copy gene set and identified a further 375 putative single-copy genes, although it failed to account for fragmented transcripts resulting in lower data matrix completeness when considering the original 500 genes. This could potentially explain the minor inconsistencies we observed in the supported topologies for the 21 eupulmonate species between the manually curated and 'Agalma-equivalent' data set (sharing 458 genes). Overall, our study confirms the utility of the 500 gene set to resolve phylogenetic relationships at a range of evolutionary depths and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design.

Journal ArticleDOI
TL;DR: The current knowledge of pseudogenes is reviewed and the nascent evidence for functional properties and regulatory modalities exerted by pseudogene-transcribed RNAs in human cancers is synthesized, with a view to their potential as molecular signatures in cancer reclassification and tailored therapy.
Abstract: Over the past decade, the importance of non-protein-coding functional elements in the human genome has emerged as a key revelation in post-genomic biology. Since the completion of the ENCODE (Encyclopedia of DNA Elements) and FANTOM (Functional Annotation of Mammals) projects, tens of thousands of pseudogenes as well as numerous long non-coding RNA (lncRNA) genes have been identified. However, while pseudogenes were initially regarded as non-functional relics littering the human genome during evolution, recent studies have revealed that they play critical roles at multiple levels in diverse physiological and pathological processes, especially in cancer, through parental-gene-dependent or parental-gene-independent regulation. Herein, we review the current knowledge of pseudogenes, synthesize the nascent evidence for functional properties and regulatory modalities exerted by pseudogene-transcribed RNAs in human cancers, and consider their potential as molecular signatures in cancer reclassification and tailored therapy.

Journal ArticleDOI
TL;DR: It is hypothesized that PsORC3a is associated with the down-regulation of its functional homolog and with the development of apomictic endosperm which deviates from the canonical 2(maternal):1(paternal) genome ratio.
Abstract: Apomixis in plants consists of asexual reproduction by seeds. Here we characterized at structural and functional levels an apomixis-linked sequence of Paspalum simplex homologous to subunit 3 of the ORIGIN RECOGNITION COMPLEX (ORC3). ORC is a multiprotein complex which controls DNA replication and cell differentiation in eukaryotes. Three PsORC3 copies were identified, each one characterized by a specific expression profile. Of these, PsORC3a, specific for apomictic genotypes, is a pseudogene that was poorly and constitutively expressed in all developmental stages of apomictic flowers, whereas PsORC3b, the putative functional gene in sexual flowers, showed a precise time-related regulation. Sense transcripts of PsORC3 were expressed in the female cell lineage of both apomictic and sexual reproductive phenotypes, and in aposporous initials. Although strong expression was detected in sexual early endosperm, no expression was present in the apomictic endosperm. Antisense PsORC3 transcripts were revealed exclusively in apomictic germ cell lineages. Defective orc3 mutants of rice and Arabidopsis showed normal female gametophytes although the embryo and endosperm were arrested at early phases of development. We hypothesize that PsORC3a is associated with the down-regulation of its functional homolog and with the development of apomictic endosperm which deviates from the canonical 2(maternal):1(paternal) genome ratio.

Journal ArticleDOI
28 Jun 2016-PeerJ
TL;DR: The complete chloroplast genome sequence of Primula sinensis is reported, and the results strongly suggest that plastid accD has been functionally transferred to the nucleus in P. sinensis.
Abstract: The species-rich genus Primula L. is a typical plant group for understanding genetic variation between species at different levels of relationship. Chloroplast genome sequences serve as an information resource for quantifying this variation and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with those of related species. The chloroplast genome shows a typical circular quadripartite structure, 150,859 bp in length with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) are separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven protein-coding genes, seven tRNA genes and four rRNA genes are present in two copies because they lie in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis in comparison with the other available plastome, that of P. poissonii. The four most variable regions, rpl36-rps8, rps16-trnQ, trnH-psbA and ndhC-trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results that were consistent with previous studies. A transcript found in the P. sinensis transcriptome showed high similarity to the functional region of plastid accD, and a putative plastid transit peptide was identified at its N-terminal region. The result strongly suggests that plastid accD has been functionally transferred to the nucleus in P. sinensis.
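The region sizes and gene counts in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the reported 25,535 bp is the length of each of the two inverted repeats, and all names in the code are hypothetical rather than taken from the paper.

```python
# Region lengths in bp, as quoted in the abstract above.
LSC = 82_064     # large single-copy region
SSC = 17_725     # small single-copy region
IR = 25_535      # one inverted repeat (assumed to be the per-copy length)
TOTAL = 150_859  # reported plastome length


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular plastome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Unique gene counts by class, as quoted in the abstract.
gene_counts = {"protein_coding": 78, "tRNA": 30, "rRNA": 4}

computed_total = quadripartite_length(LSC, SSC, IR)
print("regions sum to reported length:", computed_total == TOTAL)
print("unique genes:", sum(gene_counts.values()))
```

Under that assumption the four regions add up to the reported 150,859 bp, and the three gene classes add up to the reported 112 genes.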

Journal ArticleDOI
13 Oct 2016-PLOS ONE
TL;DR: The data provides the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants, and identifies 734 RNA editing sites supported by at least two datasets.
Abstract: Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length with a GC content of 45.5%, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.

Journal ArticleDOI
13 Sep 2016-PLOS ONE
TL;DR: A likely evolutionary rDNA stasis during land colonisation and diversification across 480 myr of bryophyte evolution is suggested and it is hypothesised that strong selection forces may be acting against ribosomal gene locus amplification.
Abstract: Genes encoding ribosomal RNA (rDNA) are universal key constituents of eukaryotic genomes, and the nuclear genome harbours hundreds to several thousand copies in each species. Knowledge about the number of rDNA loci and gene copy number provides information for comparative studies of organismal and molecular evolution at various phylogenetic levels. With the exception of seed plants, the range of 45S rDNA locus (encoding 18S, 5.8S and 26S rRNA) and gene copy number variation within key evolutionary plant groups is largely unknown. This is especially true for the three earliest land plant lineages Marchantiophyta (liverworts), Bryophyta (mosses), and Anthocerotophyta (hornworts). In this work, we report the extent of rDNA variation in early land plants, assessing the number of 45S rDNA loci and gene copy number in 106 species and 25 species, respectively, of mosses, liverworts and hornworts. Unexpectedly, the results show a narrow range of ribosomal locus variation (one or two 45S rDNA loci) and gene copies not present in vascular plant lineages, where a wide spectrum is recorded. Mutation analysis of whole genomic reads showed higher (3-fold) intragenomic heterogeneity of Marchantia polymorpha (Marchantiophyta) rDNA compared to Physcomitrella patens (Bryophyta) and two angiosperms (Arabidopsis thaliana and Nicotiana tomentosiformis), suggesting the presence of rDNA pseudogenes in its genome. No association between phylogenetic position, taxonomic adscription and the number of rDNA loci and gene copy number was found. Our results suggest a likely evolutionary rDNA stasis during land colonisation and diversification across 480 myr of bryophyte evolution. We hypothesise that strong selection forces may be acting against ribosomal gene locus amplification. Despite showing a predominant haploid phase and infrequent meiosis, overall rDNA homogeneity is not severely compromised in bryophytes.

Journal ArticleDOI
TL;DR: This is the largest collection of lineage 5 and 6 whole genome sequences to date, and assembly and alignment data provide valuable insights into what distinguishes these lineages from other MTC lineages.
Abstract: Background: Mycobacterium africanum, made up of lineages 5 and 6 within the Mycobacterium tuberculosis complex (MTC), causes up to half of all tuberculosis cases in West Africa, but is rarely found outside of this region. The reasons for this geographical restriction remain unknown. Possible reasons include a geographically restricted animal reservoir, a unique preference for hosts of West African ethnicity, and an inability to compete with other lineages outside of West Africa. These latter two hypotheses could be caused by loss of fitness or altered interactions with the host immune system. Methodology/Principal Findings: We sequenced 92 MTC clinical isolates from Mali, including two lineage 5 and 24 lineage 6 strains. Our genome sequencing assembly, alignment, phylogeny and average nucleotide identity analyses enabled us to identify features that typify lineages 5 and 6 and made clear that these lineages do not constitute a distinct species within the MTC. We found that in Mali, lineage 6 and lineage 4 strains have similar levels of diversity and evolve drug resistance through similar mechanisms. In the process, we identified a putative novel streptomycin resistance mutation. In addition, we found evidence of person-to-person transmission of lineage 6 isolates and showed that lineage 6 is not enriched for mutations in virulence-associated genes. Conclusions: This is the largest collection of lineage 5 and 6 whole genome sequences to date, and our assembly and alignment data provide valuable insights into what distinguishes these lineages from other MTC lineages. Lineages 5 and 6 do not appear to be geographically restricted due to an inability to transmit between West African hosts or to an elevated number of mutations in virulence-associated genes. However, lineage-specific mutations, such as mutations in cell wall structure, secretion systems and cofactor biosynthesis, provide alternative mechanisms that may lead to host specificity.

Journal ArticleDOI
TL;DR: The extraordinarily high levels of 35S rDNA diversity in C. revoluta, and probably other species of cycads, indicate that the frequency of repeat homogenisation has been much lower in this lineage, compared with all other land plant lineages studied.
Abstract: In all eukaryotes, the highly repeated 35S ribosomal DNA (rDNA) sequences encoding 18S-5.8S-26S ribosomal RNA (rRNA) typically show high levels of intragenomic uniformity due to homogenisation processes, leading to concerted evolution of 35S rDNA repeats. Here, we compared 35S rDNA divergence in several seed plants using next generation sequencing and a range of molecular and cytogenetic approaches. Most species showed similar 35S rDNA homogeneity indicating concerted evolution. However, Cycas revoluta exhibits an extraordinary diversity of rDNA repeats (nucleotide sequence divergence of different copies averaging 12%), influencing both the coding and non-coding rDNA regions nearly equally. In contrast, its rRNA transcriptome was highly homogeneous suggesting that only a minority of genes ( T substitutions located in symmetrical CG and CHG contexts which were also highly methylated. Both functional genes and pseudogenes appear to cluster on chromosomes. The extraordinarily high levels of 35S rDNA diversity in C. revoluta, and probably other species of cycads, indicate that the frequency of repeat homogenisation has been much lower in this lineage, compared with all other land plant lineages studied. This has led to the accumulation of methylation-driven mutations and pseudogenisation. Potentially, the reduced homology between paralogs prevented their elimination by homologous recombination, resulting in long-term retention of rDNA pseudogenes in the genome.

Journal ArticleDOI
TL;DR: Deep sequencing the CD177 locus identified a novel stop codon in CD177null individuals arising from a single base substitution in exon 7 and identifies a method for screening for individuals at risk of CD177 isoimmunisation.
Abstract: Most humans harbor both CD177neg and CD177pos neutrophils but 1–10% of people are CD177null, placing them at risk for formation of anti-neutrophil antibodies that can cause transfusion-related acute lung injury and neonatal alloimmune neutropenia. By deep sequencing the CD177 locus, we catalogued CD177 single nucleotide variants and identified a novel stop codon in CD177null individuals arising from a single base substitution in exon 7. This is not a mutation in CD177 itself, rather the CD177null phenotype arises when exon 7 of CD177 is supplied entirely by the CD177 pseudogene (CD177P1), which appears to have resulted from allelic gene conversion. In CD177 expressing individuals the CD177 locus contains both CD177P1 and CD177 sequences. The proportion of CD177hi neutrophils in the blood is a heritable trait. Abundance of CD177hi neutrophils correlates with homozygosity for CD177 reference allele, while heterozygosity for ectopic CD177P1 gene conversion correlates with increased CD177neg neutrophils, in which both CD177P1 partially incorporated allele and paired intact CD177 allele are transcribed. Human neutrophil heterogeneity for CD177 expression arises by ectopic allelic conversion. Resolution of the genetic basis of CD177null phenotype identifies a method for screening for individuals at risk of CD177 isoimmunisation.

Journal ArticleDOI
01 Jan 2016-Cell
TL;DR: A potential coding-independent function for OCT4 pseudogenes during differentiation or tumorigenesis is suggested, consistent with a newly proposed competitive role of pseudogene microRNA docking sites.