Showing papers on "Pseudogene" published in 2019


Posted Content
30 Apr 2019-bioRxiv
TL;DR: Overall, tRNA detection sensitivity and specificity are improved for all isotypes, particularly those utilizing specialized models for selenocysteine and the three subtypes of tRNA genes encoding a CAU anticodon.
Abstract: tRNAscan-SE has been widely used for whole-genome transfer RNA gene prediction for nearly two decades. With the increased availability of new genomes, a vastly larger training set has enabled creation of nearly one hundred specialized isotype-specific models, greatly improving tRNAscan-SE’s ability to identify and classify both typical and atypical tRNAs. We employ a new multi-model annotation strategy where predicted tRNAs are scored against a full set of isotype-specific covariance models. A post-filtering feature also better identifies tRNA-derived SINEs that are abundant in many eukaryotic genomes, and provides a “high confidence” tRNA gene set which improves upon prior pseudogene prediction. These new enhancements of tRNAscan-SE will provide researchers more accurate detection and more comprehensive annotation for tRNA genes.

297 citations


Journal Article
TL;DR: The misconception that a gene in E. coli whose primary name starts with ‘y’ is unannotated is resolved, and the value of the y-ome is discussed for systematic improvement of E. coli knowledge bases.
Abstract: Experimental studies of Escherichia coli K-12 MG1655 often implicate poorly annotated genes in cellular phenotypes. However, we lack a systematic understanding of these genes. How many are there? What information is available for them? And what features do they share that could explain the gap in our understanding? Efforts to build predictive, whole-cell models of E. coli inevitably face this knowledge gap. We approached these questions systematically by assembling annotations from the knowledge bases EcoCyc, EcoGene, UniProt and RegulonDB. We identified the genes that lack experimental evidence of function (the 'y-ome') which include 1600 of 4623 unique genes (34.6%), of which 111 have absolutely no evidence of function. An additional 220 genes (4.7%) are pseudogenes or phantom genes. y-ome genes tend to have lower expression levels and are enriched in the termination region of the E. coli chromosome. Where evidence is available for y-ome genes, it most often points to them being membrane proteins and transporters. We resolve the misconception that a gene in E. coli whose primary name starts with 'y' is unannotated, and we discuss the value of the y-ome for systematic improvement of E. coli knowledge bases and its extension to other organisms.

102 citations


Journal Article
TL;DR: It is important to monitor the quality of sequencing libraries when investigating variation; many recombinant strains have been transmitted during HCMV evolution, and some have apparently survived for thousands of years without further recombination.
Abstract: The genomic characteristics of human cytomegalovirus (HCMV) strains sequenced directly from clinical pathology samples were investigated, focusing on variation, multiple-strain infection, recombination, and gene loss. A total of 207 datasets generated in this and previous studies using target enrichment and high-throughput sequencing were analyzed, in the process enabling the determination of genome sequences for 91 strains. Key findings were that (i) it is important to monitor the quality of sequencing libraries in investigating variation; (ii) many recombinant strains have been transmitted during HCMV evolution, and some have apparently survived for thousands of years without further recombination; (iii) mutants with nonfunctional genes (pseudogenes) have been circulating and recombining for long periods and can cause congenital infection and resulting clinical sequelae; and (iv) intrahost variation in single-strain infections is much less than that in multiple-strain infections. Future population-based studies are likely to continue illuminating the evolution, epidemiology, and pathogenesis of HCMV.

71 citations


Journal Article
TL;DR: A study identified unprecedented horizontal gene transfer events in Cuscuta campestris and related species, and provides insights into convergent HGTs between Cuscuta and Orobanchaceae parasites and the functional importance of the HGT sequences.
Abstract: Horizontal gene transfer (HGT), the movement and genomic integration of DNA across species boundaries, is commonly associated with bacteria and other microorganisms, but functional HGT (fHGT) is increasingly being recognized in heterotrophic parasitic plants that obtain their nutrients and water from their host plants through direct haustorial feeding. Here, in the holoparasitic stem parasite Cuscuta, we identify 108 transcribed and probably functional HGT events in Cuscuta campestris and related species, plus 42 additional regions with host-derived transposon, pseudogene and non-coding sequences. Surprisingly, 18 Cuscuta fHGTs were acquired from the same gene families by independent HGT events in Orobanchaceae parasites, and the majority are highly expressed in the haustorial feeding structures in both lineages. Convergent retention and expression of HGT sequences suggests an adaptive role for specific additional genes in parasite biology. Between 16 and 20 of the transcribed HGT events are inferred as ancestral in Cuscuta based on transcriptome sequences from species across the phylogenetic range of the genus, implicating fHGT in the successful radiation of Cuscuta parasites. Genome sequencing of C. campestris supports transfer of genomic DNA-rather than retroprocessed RNA-as the mechanism of fHGT. Many of the C. campestris genes horizontally acquired are also frequent sources of 24-nucleotide small RNAs that are typically associated with RNA-directed DNA methylation. One HGT encoding a leucine-rich repeat protein kinase overlaps with a microRNA that has been shown to regulate host gene expression, suggesting that HGT-derived parasite small RNAs may function in the parasite-host interaction. This study enriches our understanding of HGT by describing a parasite-host system with unprecedented gene exchange that points to convergent evolution of HGT events and the functional importance of horizontally transferred coding and non-coding sequences.

63 citations


Journal Article
TL;DR: Although much has been learned about LCNs and MUPs in recent years, more research is necessary to allow better understanding of their physiological functions, as well as their involvement in clinical disorders.
Abstract: Lipocalins (LCNs) are members of a family of evolutionarily conserved genes present in all kingdoms of life. There are 19 LCN-like genes in the human genome, and 45 Lcn-like genes in the mouse genome, which include 22 major urinary protein (Mup) genes. The Mup genes, plus 29 of 30 Mup-ps pseudogenes, are all located together on chromosome (Chr) 4; evidence points to an “evolutionary bloom” that resulted in this Mup cluster in mouse, syntenic to the human Chr 9q32 locus at which a single MUPP pseudogene is located. LCNs play important roles in physiological processes by binding and transporting small hydrophobic molecules —such as steroid hormones, odorants, retinoids, and lipids—in plasma and other body fluids. LCNs are extensively used in clinical practice as biochemical markers. LCN-like proteins (18–40 kDa) have the characteristic eight β-strands creating a barrel structure that houses the binding-site; LCNs are synthesized in the liver as well as various secretory tissues. In rodents, MUPs are involved in communication of information in urine-derived scent marks, serving as signatures of individual identity, or as kairomones (to elicit fear behavior). MUPs also participate in regulation of glucose and lipid metabolism via a mechanism not well understood. Although much has been learned about LCNs and MUPs in recent years, more research is necessary to allow better understanding of their physiological functions, as well as their involvement in clinical disorders.

52 citations


Journal Article
TL;DR: GenTree, an integrated online database that compiles age inferences from three major methods together with functional genomic data for new genes, revealed that the synteny-based pipeline (SBP) is most suited for recently duplicated genes, whereas the protein-family–based methods are useful for ancient genes.
Abstract: The origination of new genes contributes to phenotypic evolution in humans. Two major challenges in the study of new genes are the inference of gene ages and annotation of their protein-coding potential. To tackle these challenges, we created GenTree, an integrated online database that compiles age inferences from three major methods together with functional genomic data for new genes. Genome-wide comparison of the age inference methods revealed that the synteny-based pipeline (SBP) is most suited for recently duplicated genes, whereas the protein-family-based methods are useful for ancient genes. For SBP-dated primate-specific protein-coding genes (PSGs), we performed manual evaluation based on published PSG lists and showed that SBP generated a conservative data set of PSGs by masking less reliable syntenic regions. After assessing the coding potential based on evolutionary constraint and peptide evidence from proteomic data, we curated a list of 254 PSGs with different levels of protein evidence. This list also includes 41 candidate misannotated pseudogenes that encode primate-specific short proteins. Coexpression analysis showed that PSGs are preferentially recruited into organs with rapidly evolving pathways such as spermatogenesis, immune response, mother-fetus interaction, and brain development. For brain development, primate-specific KRAB zinc-finger proteins (KZNFs) are specifically up-regulated in the mid-fetal stage, which may have contributed to the evolution of this critical stage. Altogether, hundreds of PSGs are either recruited to processes under strong selection pressure or to processes supporting an evolving novel organ.

52 citations


Journal Article
TL;DR: The first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito are presented, and a workflow that uses machine-learning to predict novel conserved protein-coding regions and efficiently guide their manual curation is developed.
Abstract: The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.

46 citations


Journal Article
TL;DR: It is demonstrated that the expression of cancer‐associated KRAB‐ZNFs correlates with patient survival, tumor histology, and molecular subtyping, and, by analyzing the clinicopathological data for breast and lung cancers, it is shown that these KRAB‐ZNFs are commonly upregulated across multiple cancer cohorts in comparison to normal samples.

42 citations


Journal Article
TL;DR: A new method uses only epigenomic patterns to classify the expression potential of annotated genes and identifies pseudogenes that are difficult to classify based solely on sequence and highlights the potential of using chromatin information to improve annotations of functional genes.
Abstract: Accurate annotation of plant genomes remains complex due to the presence of many pseudogenes arising from whole-genome duplication-generated redundancy or the capture and movement of gene fragments by transposable elements. Machine learning on genome-wide epigenetic marks, informed by transcriptomic and proteomic training data, could be used to improve annotations through classification of all putative protein-coding genes as either constitutively silent or able to be expressed. Expressed genes were subclassified as able to express both mRNAs and proteins or only RNAs, and CG gene body methylation was associated only with the former subclass. More than 60,000 protein-coding genes have been annotated in the reference genome of maize inbred B73. About two-thirds of these genes are transcribed and are designated the filtered gene set (FGS). Classification of genes by our trained random forest algorithm was accurate and relied only on histone modifications or DNA methylation patterns within the gene body; promoter methylation was unimportant. Other inbred lines are known to transcribe significantly different sets of genes, indicating that the FGS is specific to B73. We accurately classified the sets of transcribed genes in additional inbred lines, arising from inbred-specific DNA methylation patterns. This approach highlights the potential of using chromatin information to improve annotations of functional genes.

40 citations


Journal Article
TL;DR: The authors show that reexpression of RPSAP52 promotes tumorigenicity by facilitating IGF2BP2 binding to its mRNA targets and consequently regulates the balance of LIN28B and let-7 levels.
Abstract: One largely unknown question in cell biology is the discrimination between inconsequential and functional transcriptional events with relevant regulatory functions. Here, we find that the oncofetal HMGA2 gene is aberrantly reexpressed in many tumor types together with its antisense transcribed pseudogene RPSAP52. RPSAP52 is abundantly present in the cytoplasm, where it interacts with the RNA binding protein IGF2BP2/IMP2, facilitating its binding to mRNA targets, promoting their translation by mediating their recruitment on polysomes and enhancing proliferative and self-renewal pathways. Notably, downregulation of RPSAP52 impairs the balance between the oncogene LIN28B and the tumor suppressor let-7 family of miRNAs, inhibits cellular proliferation and migration in vitro and slows down tumor growth in vivo. In addition, high levels of RPSAP52 in patient samples associate with a worse prognosis in sarcomas. Overall, we reveal the roles of a transcribed pseudogene that may display properties of an oncofetal master regulator in human cancers.

38 citations


Journal Article
TL;DR: The biogenesis, regulation, and function of C. elegans endo-siRNAs and piRNAs are described, along with recent insights into how these distinct pathways are integrated to collectively regulate germline gene expression, transgenerational epigenetic inheritance, and ultimately, animal fertility.
Abstract: In animals, small noncoding RNAs that are expressed in the germline and transmitted to progeny control gene expression to promote fertility. Germline-expressed small RNAs, including endogenous small interfering RNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs), drive the repression of deleterious transcripts such as transposons, repetitive elements, and pseudogenes. Recent studies have highlighted an important role for small RNAs in transgenerational epigenetic inheritance via regulation of heritable chromatin marks; therefore, small RNAs are thought to convey an epigenetic memory of genomic self and nonself elements. Small RNA pathways are highly conserved in metazoans and have been best described for the model organism Caenorhabditis elegans. In this review, we describe the biogenesis, regulation, and function of C. elegans endo-siRNAs and piRNAs, along with recent insights into how these distinct pathways are integrated to collectively regulate germline gene expression, transgenerational epigenetic inheritance, and ultimately, animal fertility.

Journal Article
TL;DR: The authors report that the lncRNA REG1CP forms an RNA–DNA triplex at the promoter of the REG3A gene to increase its expression, and suggest that REG1CP may constitute a target for cancer treatment.
Abstract: Protein products of the regenerating islet-derived (REG) gene family are important regulators of many cellular processes. Here we functionally characterise a non-protein coding product of the family, the long noncoding RNA (lncRNA) REG1CP that is transcribed from a DNA fragment at the family locus previously thought to be a pseudogene. REG1CP forms an RNA-DNA triplex with a homopurine stretch at the distal promoter of the REG3A gene, through which the DNA helicase FANCJ is tethered to the core promoter of REG3A where it unwinds double stranded DNA and facilitates a permissive state for glucocorticoid receptor α (GRα)-mediated REG3A transcription. As such, REG1CP promotes cancer cell proliferation and tumorigenicity and its upregulation is associated with poor outcome of patients. REG1CP is also transcriptionally inducible by GRα, indicative of feedforward regulation. These results reveal the function and regulation of REG1CP and suggest that REG1CP may constitute a target for cancer treatment.

Journal Article
TL;DR: There is a tight correlation between amino acid substitution rates in clpP1 and the nuclear-encoded Clp subunits across a broad sampling of angiosperms, suggesting continuing selection on interactions within this complex.
Abstract: Eukaryotic cells represent an intricate collaboration between multiple genomes, even down to the level of multi-subunit complexes in mitochondria and plastids. One such complex in plants is the caseinolytic protease (Clp), which plays an essential role in plastid protein turnover. The proteolytic core of Clp comprises subunits from one plastid-encoded gene (clpP1) and multiple nuclear genes. The clpP1 gene is highly conserved across most green plants, but it is by far the fastest evolving plastid-encoded gene in some angiosperms. To better understand these extreme and mysterious patterns of divergence, we investigated the history of clpP1 molecular evolution across green plants by extracting sequences from 988 published plastid genomes. We find that clpP1 has undergone remarkably frequent bouts of accelerated sequence evolution and architectural changes (e.g. a loss of introns and RNA-editing sites) within seed plants. Although clpP1 is often assumed to be a pseudogene in such cases, multiple lines of evidence suggest that this is rarely true. We applied comparative native gel electrophoresis of chloroplast protein complexes followed by protein mass spectrometry in two species within the angiosperm genus Silene, which has highly elevated and heterogeneous rates of clpP1 evolution. We confirmed that clpP1 is expressed as a stable protein and forms oligomeric complexes with the nuclear-encoded Clp subunits, even in one of the most divergent Silene species. Additionally, there is a tight correlation between amino acid substitution rates in clpP1 and the nuclear-encoded Clp subunits across a broad sampling of angiosperms, suggesting continuing selection on interactions within this complex.

Journal Article
TL;DR: Five novel pseudogenes capable of predicting survival in LGG patients were identified, and the findings provide novel insights into the biological role of pseudogenes in LGG.

Journal Article
TL;DR: It is shown that the combined strategies employed in this study can be used to generate efficient chromosome-level genome assemblies and provide novel insights into Carnivora chromosome evolution, linking chromosome evolution to functional gene evolution.
Abstract: Chromosome evolution is an important driver of speciation and species evolution. Previous studies have detected chromosome rearrangement events among different Carnivora species using chromosome painting strategies. However, few of these studies have focused on chromosome evolution at a nucleotide resolution due to the limited availability of chromosome-level Carnivora genomes. Although the de novo genome assembly of the giant panda is available, current short read-based assemblies are limited to moderately sized scaffolds, making the study of chromosome evolution difficult. Here, we present a chromosome-level giant panda draft genome with a total size of 2.29 Gb. Based on the giant panda genome and published chromosome-level dog and cat genomes, we conduct six large-scale pairwise synteny alignments and identify evolutionary breakpoint regions. Interestingly, gene functional enrichment analysis shows that for all of the three Carnivora genomes, some genes located in evolutionary breakpoint regions are significantly enriched in pathways or terms related to sensory perception of smell. In addition, we find that the sweet receptor gene TAS1R2, which has been proven to be a pseudogene in the cat genome, is located in an evolutionary breakpoint region of the giant panda, suggesting that interchromosomal rearrangement may play a role in the cat TAS1R2 pseudogenization. We show that the combined strategies employed in this study can be used to generate efficient chromosome-level genome assemblies. Moreover, our comparative genomics analyses provide novel insights into Carnivora chromosome evolution, linking chromosome evolution to functional gene evolution.

Journal Article
TL;DR: A scoring system was developed that allowed us to reveal novel promising RGs for each examined cancer type and identify several “universal” pan-cancer RG candidates, including SF3A1, CIAO1, and SFRS4.
Abstract: Quantitative PCR (qPCR) remains the most widely used technique for gene expression evaluation. Obtaining reliable data using this method requires reference genes (RGs) with stable mRNA level under experimental conditions. This issue is especially crucial in cancer studies because each tumor has a unique molecular portrait. The Cancer Genome Atlas (TCGA) project provides RNA-Seq data for thousands of samples corresponding to dozens of cancers and presents the basis for assessment of the suitability of genes as reference ones for qPCR data normalization. Using TCGA RNA-Seq data and previously developed CrossHub tool, we evaluated mRNA level of 32 traditionally used RGs in 12 cancer types, including those of lung, breast, prostate, kidney, and colon. We developed an 11-component scoring system for the assessment of gene expression stability. Among the 32 genes, PUM1 was one of the most stably expressed in the majority of examined cancers, whereas GAPDH, which is widely used as a RG, showed significant mRNA level alterations in more than a half of cases. For each of 12 cancer types, we suggested a pair of genes that are the most suitable for use as reference ones. These genes are characterized by high expression stability and absence of correlation between their mRNA levels. Next, the scoring system was expanded with several features of a gene: mutation rate, number of transcript isoforms and pseudogenes, participation in cancer-related processes on the basis of Gene Ontology, and mentions in PubMed-indexed articles. All the genes covered by RNA-Seq data in TCGA were analyzed using the expanded scoring system that allowed us to reveal novel promising RGs for each examined cancer type and identify several "universal" pan-cancer RG candidates, including SF3A1, CIAO1, and SFRS4. The choice of RGs is the basis for precise gene expression evaluation by qPCR. 
Here, we suggested optimal pairs of traditionally used RGs for 12 cancer types and identified novel promising RGs that demonstrate high expression stability and other features of reliable and convenient RGs (high expression level, low mutation rate, non-involvement in cancer-related processes, single transcript isoform, and absence of pseudogenes).

Journal Article
TL;DR: It is proposed that rapid rewiring of Ψ transcriptional regulatory regions is a major mechanism driving the origin of novel regulatory modules.
Abstract: Pseudogenes (Ψs), nonfunctional relatives of functional genes, form by duplication or retrotransposition, and loss of gene function by disabling mutations. Evolutionary analysis provides clues to Ψ origins and effects on gene regulation. However, few systematic studies of plant Ψs have been conducted, hampering comparative analyses. Here, we examined the origin, evolution, and expression patterns of Ψs and their relationships with noncoding sequences in seven angiosperm plants. We identified ∼250,000 Ψs, most of which are more lineage specific than protein-coding genes. The distribution of Ψs on the chromosome indicates that genome recombination may contribute to Ψ elimination. Most Ψs evolve rapidly in terms of sequence and expression levels, showing tissue- or stage-specific expression patterns. We found that a surprisingly large fraction of nontransposable element regulatory noncoding RNAs (microRNAs and long noncoding RNAs) originate from transcription of Ψ proximal upstream regions. We also found that transcription factor binding sites preferentially occur in putative Ψ proximal upstream regions compared with random intergenic regions, suggesting that Ψs have conditioned genome evolution by providing transcription factor binding sites that serve as promoters and enhancers. We therefore propose that rapid rewiring of Ψ transcriptional regulatory regions is a major mechanism driving the origin of novel regulatory modules.

Journal Article
TL;DR: Experiments revealed that the increased Hb–O2 affinity requires a specific two-site combination of amino acid replacements, suggesting that the molecular underpinnings of Hb adaptation in Tibetan mastiff may be qualitatively distinct from functionally similar changes in protein function that could have evolved via sequential fixation of de novo mutations during the breed’s relatively short duration of residency at high altitude.
Abstract: A key question in evolutionary biology concerns the relative importance of different sources of adaptive genetic variation, such as de novo mutations, standing variation, and introgressive hybridization. A corollary question concerns how allelic variants derived from these different sources may influence the molecular basis of phenotypic adaptation. Here, we use a protein-engineering approach to examine the phenotypic effect of putatively adaptive hemoglobin (Hb) mutations in the high-altitude Tibetan wolf that were selectively introgressed into the Tibetan mastiff, a high-altitude dog breed that is renowned for its hypoxia tolerance. Experiments revealed that the introgressed coding variants confer an increased Hb-O2 affinity in conjunction with an enhanced Bohr effect. We also document that affinity-enhancing mutations in the β-globin gene of Tibetan wolf were originally derived via interparalog gene conversion from a tandemly linked β-globin pseudogene. Thus, affinity-enhancing mutations were introduced into the β-globin gene of Tibetan wolf via one form of intragenomic lateral transfer (ectopic gene conversion) and were subsequently introduced into the Tibetan mastiff genome via a second form of lateral transfer (introgression). Site-directed mutagenesis experiments revealed that the increased Hb-O2 affinity requires a specific two-site combination of amino acid replacements, suggesting that the molecular underpinnings of Hb adaptation in Tibetan mastiff (involving mutations that arose in a nonexpressed gene and which originally fixed in Tibetan wolf) may be qualitatively distinct from functionally similar changes in protein function that could have evolved via sequential fixation of de novo mutations during the breed's relatively short duration of residency at high altitude.

Journal Article
13 Aug 2019
TL;DR: Upregulated DUXAP8 and DUXAP9 promote growth of renal cell carcinoma and serve as two promising prognostic biomarkers.
Abstract: Background: A growing number of studies have reported that pseudogenes play key roles in multiple human cancers. However, the expression and roles of pseudogenes in renal cell carcinoma remain uncharacterized. Results: 31 upregulated and 16 downregulated pseudogenes were screened. Higher expression of DUXAP8 and DUXAP9 indicated poorer prognosis of kidney cancer. 33 and 5 miRNAs were predicted to potentially bind to DUXAP8 and DUXAP9, respectively. miR-29c-3p was identified as the most likely binding miRNA of DUXAP8 and DUXAP9 based on expression, survival and correlation analyses. 254 target genes of miR-29c-3p were predicted. 47 hub genes with node degree >= 10 were identified. Subsequent analysis of the top 10 hub genes demonstrated that COL1A1 and COL1A2 may be two functional targets of DUXAP8 and DUXAP9. Expression of DUXAP8, DUXAP9, COL1A1 and COL1A2 was significantly increased in cancer samples compared to normal controls, while miR-29c-3p expression was decreased. Luciferase reporter assay revealed that miR-29c-3p could directly bind to DUXAP8, DUXAP9, COL1A1 and COL1A2. Functional experiments showed that DUXAP8 and DUXAP9 enhanced but miR-29c-3p weakened growth of renal cell carcinoma. Conclusions: In conclusion, upregulated DUXAP8 and DUXAP9 promote growth of renal cell carcinoma and serve as two promising prognostic biomarkers. Methods: Dysregulated pseudogenes were obtained by dreamBase and GEPIA. The binding miRNAs of pseudogene and targets of miRNA were predicted using starBase and miRNet. Kaplan-Meier plotter was utilized to perform survival analysis, and Enrichr database was introduced to conduct functional enrichment analysis. Hub genes were identified through STRING and Cytoscape. qRT-PCR, luciferase reporter assay, cell counting assay and colony formation assay were performed to validate in silico analytic results.

Journal Article
04 Apr 2019
TL;DR: This study sequenced and analyzed the complete chloroplast genome of Rhodomyrtus tomentosa, providing valuable information for further investigations on species identification and the phylogenetic evolution between R. tomentosa and related species.
Abstract: In the last decade, several studies have relied on a small number of plastid genomes to deduce deep phylogenetic relationships in the species-rich Myrtaceae. Nevertheless, the plastome of Rhodomyrtus tomentosa, an important representative plant of the genus Rhodomyrtus (DC.), has not yet been reported. Here, we sequenced and analyzed the complete chloroplast (CP) genome of R. tomentosa, which is a 156,129-bp-long circular molecule with 37.1% GC content. This CP genome displays a typical quadripartite structure with two inverted repeats (IRa and IRb), of 25,824 bp each, that are separated by a small single copy region (SSC, 18,183 bp) and one large single copy region (LSC, 86,298 bp). The CP genome encodes 129 genes, including 84 protein-coding genes, 37 tRNA genes, eight rRNA genes and three pseudogenes (ycf1, rps19, ndhF). A considerable number of protein-coding genes have a universal ATG start codon, except for psbL and ndhD. Premature termination codons (PTCs) were found in one protein-coding gene, namely atpE, which is rarely reported in the CP genome of plants. Phylogenetic analysis revealed that R. tomentosa has a sister relationship with Eugenia uniflora and Psidium guajava. In conclusion, this study identified unique characteristics of the R. tomentosa CP genome providing valuable information for further investigations on species identification and the phylogenetic evolution between R. tomentosa and related species.
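
The region lengths quoted in the abstract above can be checked against the stated genome size, since a quadripartite plastome is the sum of the LSC, the SSC, and two inverted repeats. The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative assumptions and do not come from the paper.

```python
# Figures as reported in the R. tomentosa abstract (all lengths in bp).
LSC_LENGTH = 86_298        # large single copy region
SSC_LENGTH = 18_183        # small single copy region
IR_LENGTH = 25_824         # each inverted repeat (IRa and IRb)
REPORTED_TOTAL = 156_129   # stated size of the circular molecule


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = plastome_length(LSC_LENGTH, SSC_LENGTH, IR_LENGTH)
print(total, total == REPORTED_TOTAL)

# Gene categories listed in the abstract. The three pseudogenes are not
# added here, because the abstract does not say how they are counted
# within the 129-gene total.
functional_gene_sum = 84 + 37 + 8  # protein-coding + tRNA + rRNA
print(functional_gene_sum)
```

The region lengths sum exactly to the reported 156,129 bp, and the protein-coding, tRNA, and rRNA counts alone already add up to 129.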

Journal ArticleDOI
27 Aug 2019-Mbio
TL;DR: A two-pronged approach to define the basis for O-antigen structural diversity was pursued, revealing that gene duplication, pseudogene formation, gene deletion, and bacteriophage insertion elements occur ubiquitously across serogroups.
Abstract: O-antigens are glycopolymers in lipopolysaccharides expressed on the cell surface of Gram-negative bacteria. Variability in the O-antigen structure constitutes the basis for the establishment of the serotyping schema. We pursued a two-pronged approach to define the basis for O-antigen structural diversity. First, we developed a bottom-up systems biology approach to O-antigen metabolism by building a reconstruction of Salmonella O-antigen biosynthesis and used it to (i) update 410 existing Salmonella strain-specific metabolic models, (ii) predict a strain's serogroup and its O-antigen glycan synthesis capability (yielding 98% agreement with experimental data), and (iii) extend our workflow to more than 1,400 Gram-negative strains. Second, we used a top-down pangenome analysis to elucidate the genetic basis for intraserogroup O-antigen structural variations. We assembled a database of O-antigen gene islands from over 11,000 sequenced Salmonella strains, revealing (i) that gene duplication, pseudogene formation, gene deletion, and bacteriophage insertion elements occur ubiquitously across serogroups; (ii) novel serotypes in the group O:4 B2 variant, as well as an additional genotype variant for group O:4; and (iii) two novel O-antigen gene islands in understudied subspecies. We thus comprehensively defined the genetic basis for O-antigen diversity. IMPORTANCE: Lipopolysaccharides are a major component of the outer membrane in Gram-negative bacteria. They are composed of a conserved lipid structure that is embedded in the outer leaflet of the outer membrane and a polysaccharide known as the O-antigen. O-antigens are highly variable in structure across strains of a species and are crucial to a bacterium's interactions with its environment. They constitute the first line of defense against both the immune system and bacteriophage infections and have been shown to mediate antimicrobial resistance.
The significance of our research is in identifying the metabolic and genetic differences within and across O-antigen groups in Salmonella strains. Our effort constitutes a first step toward characterizing the O-antigen metabolic network across Gram-negative organisms and a comprehensive overview of genetic variations in Salmonella.

Journal ArticleDOI
TL;DR: The hypothesis that small mitochondrial RNAs are primarily transcribed by the mitochondrial genome and that this capacity is conserved across Amniota and, most likely, across most metazoan lineages is supported.
Abstract: Several studies have linked mitochondrial genetic variation to phenotypic modifications; albeit the identity of the mitochondrial polymorphisms involved remains elusive. The search for these polymorphisms led to the discovery of small noncoding RNAs, which appear to be transcribed by the mitochondrial DNA ("small mitochondrial RNAs"). This contention is, however, controversial because the nuclear genome of most animals harbors mitochondrial pseudogenes (NUMTs) of identical sequence to regions of mtDNA, which could alternatively represent the source of these RNAs. To discern the likely contributions of the mitochondrial and nuclear genome to transcribing these small mitochondrial RNAs, we leverage data from six vertebrate species exhibiting markedly different levels of NUMT sequence. We explore whether abundances of small mitochondrial RNAs are associated with levels of NUMT sequence across species, or differences in tissue-specific mtDNA content within species. Evidence for the former would support the hypothesis these RNAs are primarily transcribed by NUMT sequence, whereas evidence for the latter would provide strong evidence for the counter hypothesis that these RNAs are transcribed directly by the mtDNA. No association exists between the abundance of small mitochondrial RNAs and NUMT levels across species. Moreover, a sizable proportion of transcripts map exclusively to the mtDNA sequence, even in species with highest NUMT levels. Conversely, tissue-specific abundances of small mitochondrial RNAs are strongly associated with the mtDNA content. These results support the hypothesis that small mitochondrial RNAs are primarily transcribed by the mitochondrial genome and that this capacity is conserved across Amniota and, most likely, across most metazoan lineages.

Journal ArticleDOI
TL;DR: The complete sequencing and the comparative genome analysis show that M. leprae underwent a reductive genome evolution process as a result of lifestyle change and adaptation to different environments; some of the lost genes are homologous to those of host cells.

Journal ArticleDOI
TL;DR: This work provided a first look into population genomics of the PLY phytoplasma in Taiwan, as well as identified several evolutionary processes that contributed to the genetic diversification of these plant-pathogenic bacteria.
Abstract: The periwinkle leaf yellowing (PLY) disease was first reported in Taiwan in 2005. This disease was caused by an uncultivated bacterium in the genus ‘Candidatus phytoplasma’. In subsequent years, this bacterium was linked to other plant diseases and caused losses in agriculture. For genomic investigation of this bacterium and its relatives, we conducted whole genome sequencing of a PLY phytoplasma from an infected periwinkle collected in Taoyuan. The de novo genome assembly produced eight contigs with a total length of 824,596 bp. The annotation contains 775 protein-coding genes, 63 pseudogenes, 32 tRNA genes, and two sets of rRNA operons. To characterize the genomic diversity across populations, a second strain that infects green onions in Yilan was collected for re-sequencing analysis. Comparison between these two strains identified 337 sequence polymorphisms and 10 structural variations. The metabolic pathway analysis indicated that the PLY phytoplasma genome contains two regions with highly conserved gene composition for carbohydrate metabolism. Intriguingly, each region contains several pseudogenes and the remaining functional genes in these two regions complement each other, suggesting a case of duplication followed by differential gene losses. Comparative analysis with other available phytoplasma genomes indicated that this PLY phytoplasma belongs to the 16SrI-B subgroup in the genus, with ‘Candidatus Phytoplasma asteris’ that causes the onion yellowing (OY) disease in Japan as the closest known relative. For characterized effectors that these bacteria use to manipulate their plant hosts, the PLY phytoplasma has homologs for SAP11, SAP54/PHYL1, and TENGU. For genome structure comparison, we found that potential mobile unit (PMU) insertions may be the main factor that drives genome rearrangements in these bacteria. A total of 10 PMU-like regions were found in the PLY phytoplasma genome. 
Two of these PMUs were found to harbor one SAP11 homolog each, with one more similar to the 16SrI-B type and the other more similar to the 16SrI-A type, suggesting possible horizontal transfer. Taken together, this work provided a first look into population genomics of the PLY phytoplasmas in Taiwan, as well as identified several evolutionary processes that contributed to the genetic diversification of these plant-pathogenic bacteria.

Journal ArticleDOI
TL;DR: Both MM 4 and MM 190 strains are capable of hemolysis and their activity correlates well with the cytotoxicity level on T-24 bladder carcinoma cells, and it is determined that all strains contained the urease gene cluster ureABCEFGD and had urease activity.
Abstract: Morganella morganii is an opportunistic bacterial pathogen shown to cause a wide range of clinical and community-acquired infections. This study was aimed at sequencing and comparing the genomes of three M. morganii strains isolated from the urine samples of patients with community-acquired urinary tract infections. Draft genome sequencing was conducted using the Illumina HiSeq platform. The genomes of the MM 1, MM 4, and MM 190 strains have a size of 3.82-3.97 Mb and a GC content of 50.9-51%. Protein-coding sequences (CDS) represent 96.1% of the genomes, RNAs are encoded by 2.7% of genes and pseudogenes account for 1.2% of the genomes. The pan-genome contains 4,038 CDS, of which 3,279 represent core genes. Six to ten prophages and 21-33 genomic islands were identified in the genomes of MM 1, MM 4, and MM 190. More than 30 genes encode capsular biosynthesis proteins, an average of 60 genes encode motility and chemotaxis proteins, and about 70 genes are associated with fimbrial biogenesis and adhesion. We determined that all strains contained the urease gene cluster ureABCEFGD and had urease activity. Both MM 4 and MM 190 strains are capable of hemolysis and their activity correlates well with the cytotoxicity level on T-24 bladder carcinoma cells. These activities were associated with expression of the RTX toxin gene hlyA, which was introduced into the genomes by a phage similar to Salmonella phage 118970_sal4.

Journal ArticleDOI
01 Jan 2019
TL;DR: In this article, a linkage map was constructed using markers from nonduplicated Region01 and for the duplication (Region01 and Region02), demonstrating the possibility of mapping markers located in duplicated regions with markers in nonduplicated regions.
Abstract: Sugarcane (Saccharum spp.) is highly polyploid and aneuploid. Modern cultivars are derived from hybridization between S. officinarum and S. spontaneum. This combination results in a genome exhibiting variable ploidy among different loci, a huge genome size (approximately 10 Gb) and a high content of repetitive regions. An approach using genomic, transcriptomic, and genetic mapping can improve our knowledge of the genetic behavior of sugarcane. The hypothetical HP600 and Centromere Protein C (CENP-C) genes from sugarcane were used to elucidate the allelic expression and genomic and genetic behaviors of this complex polyploid. The physically linked side-by-side genes HP600 and CENP-C were found in two different homeologous chromosome groups with ploidies of eight and ten. The first region (Region01) was a Sorghum bicolor ortholog region with all haplotypes of HP600 and CENP-C expressed, but HP600 exhibited an unbalanced haplotype expression. The second region (Region02) was a scrambled sugarcane sequence formed from different noncollinear genes containing partial duplications of HP600 and CENP-C (paralogs). This duplication resulted in a non-expressed HP600 pseudogene and a recombined fusion version of CENP-C and the orthologous gene Sobic.003G299500 with at least two chimeric gene haplotypes expressed. It was also determined that the duplication occurred before the formation of the Saccharum genus and after the separation of sorghum and sugarcane. A linkage map was constructed using markers from nonduplicated Region01 and for the duplication (Region01 and Region02). We compare the physical and linkage maps, demonstrating the possibility of mapping markers located in duplicated regions with markers in nonduplicated regions. Our results contribute directly to the improvement of linkage mapping in complex polyploids and improve the integration of physical and genetic data for sugarcane breeding programs.
Thus, we describe the complexity involved in sugarcane genetics and genomics and allelic dynamics, which can be useful for understanding complex polyploid genomes.

Journal ArticleDOI
TL;DR: To the authors' knowledge this is the first apicoplast genome sequenced from any adeleorinid coccidium and the first mitochondrion-associated sequences from this serious pathogen of wild and domestic canids.

Journal ArticleDOI
25 Feb 2019-Genes
TL;DR: A whole-genome analysis of the Chinese perch OR repertoire identified a total of 152 OR genes, including 123 functional genes and 29 pseudogenes, and showed their genomic organization, and the phylogenetic relationships of teleost ORs were illustrated.
Abstract: Olfaction, which is mediated by olfactory receptor (OR) genes, is essential in the daily life of fish, especially in foraging. However, Chinese perch (Siniperca chuatsi) is believed to prey with reliance on vision and lateral sensation, but not on olfaction. Therefore, understanding the evolutionary dynamics of the Chinese perch OR repertoire could provide genetic evidence for adaptation to a decreasing reliance on olfaction. Here, we report a whole-genome analysis of the Chinese perch OR repertoire. Our analysis identified a total of 152 OR genes, including 123 functional genes and 29 pseudogenes, and showed their genomic organization. A phylogenetic tree was constructed, and the phylogenetic relationships of teleost ORs were illustrated. The dN/dS (global ratio of non-synonymous to synonymous substitutions) analysis demonstrated that OR groups all appeared to be under purifying selection. Among the five Percomorpha fishes, Chinese perch had only 22 subfamilies, suggesting a decrease in OR diversity. Subfamilies 56 and 66 were lost specifically in Chinese perch; the genes of subfamily 66 are orthologs of OR51E2, which recognizes the plant odorant β-ionone, suggesting that extremely piscivorous fish may lose receptors that respond to plant-related odors. Finally, the expression profiles of OR genes in the olfactory epithelium at different developmental stages were investigated using RNA-seq data. From the aforementioned results, the evolution of the OR repertoire may be shaped by the adaptation of vision-dependent specializations for foraging in Chinese perch. This first systematic study of OR genes in Chinese perch could provide valuable genomic resources for the further investigation of olfactory function in teleosts.

Journal ArticleDOI
TL;DR: It is proposed that the high frequency of pseudogenization leading to gene loss in P. anserina and P. comata accompanies specialization of these two fungi, likely two different species that rarely interbreed in nature.
Abstract: Mechanisms involved in fine adaptation of fungi to their environment include differential gene regulation associated with single nucleotide polymorphisms and indels (including transposons), horizontal gene transfer, gene copy amplification, as well as pseudogenization and gene loss. The two Podospora genome sequences examined here emphasize the role of pseudogenization and gene loss, which have rarely been documented in fungi. Podospora comata is a species closely related to Podospora anserina, a fungus used as a model in several laboratories. Comparison of the genome of P. comata with that of P. anserina, whose genome has been available for over 10 years, should yield interesting data related to the modalities of genome evolution between these two closely related fungal species that thrive in the same types of biotopes, i.e., herbivore dung. Here, we present the genome sequence of the mat + isolate of the P. comata reference strain T. Comparison with the genome of the mat + isolate of P. anserina strain S confirms that P. anserina and P. comata are likely two different species that rarely interbreed in nature. Despite having 94–99% nucleotide identity in the syntenic regions of their genomes, the two species differ by nearly 10% of their gene contents. Comparison of the species-specific gene sets uncovered genes that could be responsible for the known physiological differences between the two species. Finally, we identified 428 and 811 pseudogenes (3.8 and 7.2% of the genes) in P. anserina and P. comata, respectively. The presence of high numbers of pseudogenes supports the notion that the difference in gene contents is due to gene loss rather than horizontal gene transfers. We propose that the high frequency of pseudogenization leading to gene loss in P. anserina and P. comata accompanies specialization of these two fungi. Gene loss may be more prevalent during the evolution of other fungi than usually thought.
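The pseudogene counts and percentages quoted above imply an approximate total gene count for each species. The sketch below back-calculates that total from the abstract's figures only; `implied_total_genes` is a hypothetical helper, and because the percentages are rounded to one decimal place, the results are rough estimates rather than values reported by the authors.

```python
def implied_total_genes(pseudogenes: int, percent_of_genes: float) -> float:
    """Back-calculate the total gene count from a pseudogene count and its share in percent."""
    return pseudogenes / (percent_of_genes / 100.0)

# Figures quoted in the abstract: (pseudogene count, percent of genes).
reported = {
    "P. anserina": (428, 3.8),
    "P. comata": (811, 7.2),
}

# Estimated totals are approximate because the input percentages are rounded.
estimates = {
    species: implied_total_genes(count, pct)
    for species, (count, pct) in reported.items()
}

for species, total in estimates.items():
    print(f"{species}: roughly {total:.0f} genes implied")
```

Both species come out at a little over 11,000 genes, which is consistent with the two percentages being computed against gene sets of similar size.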

Journal ArticleDOI
TL;DR: This study quantitatively analyzed human genome-wide RNA editing events derived from tumor or normal tissues and identified hyper RNA edited genes (protein-coding genes, lincRNAs, and pseudogenes) that embody a large portion of cancer prognostic predictors.
Abstract: RNA editing is a phenomenon that occurs in both protein-coding and non-coding RNAs. Increasing evidence has shown that adenosine-to-inosine RNA editing can potentially render substantial functional effects throughout the genome. Using RNA editing datasets from two large consortiums, The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project, we quantitatively analyzed human genome-wide RNA editing events derived from tumor or normal tissues. Generally, a common RNA editing site tends to have a higher editing level in tumors as compared to normal samples. Of the 14 tumor-normal-paired cancer types examined, 11 had overall increased RNA editing levels in the tumors. The editomes in cancer or normal tissues were dissected by genomic locations, and a significant RNA editing locational difference was found between cancerous and healthy subjects. Additionally, our results indicated a significant correlation between the RNA editing rate and the gene density across chromosomes, highlighted hyper RNA editing clusters through visualization of running RNA editing rates along chromosomes, and identified hyper RNA edited genes (protein-coding genes, lincRNAs, and pseudogenes) that embody a large portion of cancer prognostic predictors. This study reinforces the potential functional effects of RNA editing in protein-coding genes, and also lays a strong foundation for further exploration of RNA editing's roles in non-coding regions.