
Showing papers on "Pseudogene" published in 2014


Journal ArticleDOI
18 Jul 2014-Science
TL;DR: The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination and high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption.
Abstract: We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits.

522 citations


Journal ArticleDOI
TL;DR: The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest.
Abstract: Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens. We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation. This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close out-group to Drosophila in comparative genomic studies.

239 citations


Journal ArticleDOI
TL;DR: The excitement and subsequent backlash of the ENCODE claims served to illustrate the widespread interest among scientists and nonspecialists in determining how much of the human genome is functionally significant at the organism level.
Abstract: The search for function in the genome. It has been known for several decades that only a small fraction of the human genome is made up of protein-coding sequences and that at least some noncoding DNA has important biological functions. In addition to coding exons, the genome contains sequences that are transcribed into functional RNA molecules (e.g., tRNA, rRNA, and snRNA), regulatory regions that control gene expression (e.g., promoters, silencers, and enhancers), origins of replication, and repeats that play structural roles at the chromosomal level (e.g., telomeres and centromeres). New discoveries regarding potentially important sequences amongst the nonprotein-coding majority of the genome are becoming more prevalent. By far the best-known effort to identify functional regions in the human genome is the recently completed Encyclopaedia of DNA Elements (ENCODE) project [1], whose authors made the remarkable claim that a “biochemical function” could be assigned to 80% of the human genome [2]. Reports that ENCODE had refuted the existence of large amounts of junk DNA in the human genome received considerable media attention [3], [4]. Criticisms that these claims were based on an extremely loose definition of “function” soon followed [5]–[8] (for a discussion of the relevant function concepts, see [9]), and debate continues regarding the most appropriate interpretation of the ENCODE results. Nevertheless, the excitement and subsequent backlash served to illustrate the widespread interest among scientists and nonspecialists in determining how much of the human genome is functionally significant at the organism level.

227 citations


Journal ArticleDOI
TL;DR: Deep RNA sequencing in different subpopulations of normal B-lymphocytes and CLL cells from a cohort of 98 patients characterized the CLL transcriptional landscape with unprecedented resolution, and found genes related to spliceosome, proteasome, and ribosome were among the most down-regulated in CLL.
Abstract: Chronic lymphocytic leukemia (CLL) has heterogeneous clinical and biological behavior. Whole-genome and -exome sequencing has contributed to the characterization of the mutational spectrum of the disease, but the underlying transcriptional profile is still poorly understood. We have performed deep RNA sequencing in different subpopulations of normal B-lymphocytes and CLL cells from a cohort of 98 patients, and characterized the CLL transcriptional landscape with unprecedented resolution. We detected thousands of transcriptional elements differentially expressed between the CLL and normal B cells, including protein-coding genes, noncoding RNAs, and pseudogenes. Transposable elements are globally derepressed in CLL cells. In addition, two thousand genes—most of which are not differentially expressed—exhibit CLL-specific splicing patterns. Genes involved in metabolic pathways showed higher expression in CLL, while genes related to spliceosome, proteasome, and ribosome were among the most down-regulated in CLL. Clustering of the CLL samples according to RNA-seq derived gene expression levels unveiled two robust molecular subgroups, C1 and C2. C1/C2 subgroups and the mutational status of the immunoglobulin heavy variable (IGHV) region were the only independent variables in predicting time to treatment in a multivariate analysis with main clinico-biological features. This subdivision was validated in an independent cohort of patients monitored through DNA microarrays. Further analysis shows that B-cell receptor (BCR) activation in the microenvironment of the lymph node may be at the origin of the C1/C2 differences.

188 citations


Journal ArticleDOI
TL;DR: Structural, functional, and evolutionary analyses indicate that SOPE has undergone extensive adaptation toward an insect-associated lifestyle in a very short time period, and analyses of the bacterial cell envelope and genes encoding secretion systems suggest that these structures and elements have become simplified in the transition to a mutualistic association.
Abstract: Symbiotic associations between animals and microbes are ubiquitous in nature, with an estimated 15% of all insect species harboring intracellular bacterial symbionts. Most bacterial symbionts share many genomic features including small genomes, nucleotide composition bias, high coding density, and a paucity of mobile DNA, consistent with long-term host association. In this study, we focus on the early stages of genome degeneration in a recently derived insect-bacterial mutualistic intracellular association. We present the complete genome sequence and annotation of Sitophilus oryzae primary endosymbiont (SOPE). We also present the finished genome sequence and annotation of strain HS, a close free-living relative of SOPE and other insect symbionts of the Sodalis-allied clade, whose gene inventory is expected to closely resemble the putative ancestor of this group. Structural, functional, and evolutionary analyses indicate that SOPE has undergone extensive adaptation toward an insect-associated lifestyle in a very short time period. The genome of SOPE is large in size when compared with many ancient bacterial symbionts; however, almost half of the protein-coding genes in SOPE are pseudogenes. There is also evidence for relaxed selection on the remaining intact protein-coding genes. Comparative analyses of the whole-genome sequence of strain HS and SOPE highlight numerous genomic rearrangements, duplications, and deletions facilitated by a recent expansion of insertion sequence elements, some of which appear to have catalyzed adaptive changes. Functional metabolic predictions suggest that SOPE has lost the ability to synthesize several essential amino acids and vitamins. Analyses of the bacterial cell envelope and genes encoding secretion systems suggest that these structures and elements have become simplified in the transition to a mutualistic association.

167 citations


Journal ArticleDOI
23 Jan 2014-Blood
TL;DR: This study identifies multiple lncRNAs that are dynamically expressed during erythropoiesis, show epigenetic regulation, and are targeted by key erythroid transcription factors GATA1, TAL1, or KLF1 and focuses on 12 candidates, finding that they are nuclear-localized and exhibit complex developmental expression patterns.

162 citations


Journal ArticleDOI
19 Feb 2014-eLife
TL;DR: Palmieri et al. investigated the evolutionary fate of orphan genes in a small group of related species of fruit fly and found that most orphan genes are very short-lived, even though they showed clear signals of carrying out important functions.
Abstract: New genes are added to most genomes on a steady basis. A new gene can either begin as a copy of an existing gene from elsewhere in the genome, or be created entirely ‘from scratch’ from a DNA sequence that had not previously encoded a protein. New genes that are not found in other related species are called orphan genes, and these genes can account for up to 30% of all the genes in the well-studied genomes. However, for reasons that are not fully understood, the total number of genes in most genomes remains fairly constant despite these regular additions. Now, Palmieri et al. have investigated this paradox by following the evolutionary fate of orphan genes in a small group of related species of fruit fly. Palmieri et al. discovered that most orphan genes are very short-lived, even though they showed clear signals of carrying out important functions. Most orphan genes died out quickly due to mutations that made them unable to be expressed as functional proteins, and a small number were deleted entirely from the genome. Unexpectedly, new orphan genes were more likely to die out than those that had been around for a while. Palmieri et al. also found that the expression levels of orphan genes determined their probability of dying, with the genes expressed at the highest levels being most likely to persist longer. Furthermore, genes that were expressed more in males than in females were also less likely to die. The next challenge will be to identify the mechanisms that determine which orphan genes survive and which do not.

157 citations


Journal ArticleDOI
07 May 2014-Wormbook
TL;DR: What is currently known about these endogenous small interfering RNAs and piwi-interacting RNAs is reviewed, providing an overview of their biogenesis, their associated protein factors, and their effects on mRNA dynamics and chromatin structure.
Abstract: In addition to several hundred microRNAs, C. elegans produces thousands of other small RNAs targeting coding genes, pseudogenes, transposons, and other noncoding RNAs. Here we review what is currently known about these endogenous small interfering RNAs (siRNAs) and piwi-interacting RNAs (piRNAs), providing an overview of their biogenesis, their associated protein factors, and their effects on mRNA dynamics and chromatin structure. Additionally, we describe how the molecular actions of these classes of endogenous small RNAs connect to their physiological roles in the organism.

133 citations


Journal ArticleDOI
TL;DR: It is found that OR and VR genes are neither equally nor randomly expressed, but have reproducible distributions of abundance in both tissues, and evidence that hundreds of novel, putatively protein-coding genes are expressed in these highly specialized olfactory tissues is presented.
Abstract: The olfactory (OR) and vomeronasal receptor (VR) repertoires are collectively encoded by 1700 genes and pseudogenes in the mouse genome. Most OR and VR genes were identified by comparative genomic techniques and therefore, in many of those cases, only their protein coding sequences are defined. Some also lack experimental support, due in part to the similarity between them and their monogenic, cell-specific expression in olfactory tissues. Here we use deep RNA sequencing, expression microarray and quantitative RT-PCR in both the vomeronasal organ and whole olfactory mucosa to quantify their full transcriptomes in multiple male and female mice. We find evidence of expression for all VR, and almost all OR genes that are annotated as functional in the reference genome, and use the data to generate over 1100 new, multi-exonic, significantly extended receptor gene annotations. We find that OR and VR genes are neither equally nor randomly expressed, but have reproducible distributions of abundance in both tissues. The olfactory transcriptomes are only minimally different between males and females, suggesting altered gene expression at the periphery is unlikely to underpin the striking sexual dimorphism in olfactory-mediated behavior. Finally, we present evidence that hundreds of novel, putatively protein-coding genes are expressed in these highly specialized olfactory tissues, and carry out a proof-of-principle validation. Taken together, these data provide a comprehensive, quantitative catalog of the genes that mediate olfactory perception and pheromone-evoked behavior at the periphery.

129 citations


Journal ArticleDOI
TL;DR: Across cancer types, the tumor subtypes revealed by pseudogene expression show extensive and strong concordance with the subtypes defined by other molecular data, and in kidney cancer, the pseudogene-expression subtypes not only significantly correlate with patient survival, but also help stratify patients in combination with clinical variables.
Abstract: Although individual pseudogenes have been implicated in tumour biology, the biomedical significance and clinical relevance of pseudogene expression have not been assessed in a systematic way. Here we generate pseudogene expression profiles in 2,808 patient samples of seven cancer types from The Cancer Genome Atlas RNA-seq data using a newly developed computational pipeline. Supervised analysis reveals a significant number of pseudogenes differentially expressed among established tumour subtypes and pseudogene expression alone can accurately classify the major histological subtypes of endometrial cancer. Across cancer types, the tumour subtypes revealed by pseudogene expression show extensive and strong concordance with the subtypes defined by other molecular data. Strikingly, in kidney cancer, the pseudogene expression subtypes not only significantly correlate with patient survival, but also help stratify patients in combination with clinical variables. Our study highlights the potential of pseudogene expression analysis as a new paradigm for investigating cancer mechanisms and discovering prognostic biomarkers.

129 citations


Journal ArticleDOI
TL;DR: The first seedling or all-stage resistance (R) gene against stripe rust, isolated from Moro wheat using a map-based cloning approach, was identified as Yr10, which encodes a highly evolutionarily conserved and unique CC-NBS-LRR sequence.

Journal ArticleDOI
Mathew W. Wright
TL;DR: A short guide is presented herein to help authors when developing novel gene symbols for lncRNAs with characterised function, and to provide unique and, wherever possible, meaningful gene symbols to all lncRNA genes.
Abstract: The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 38,000 approved gene symbols in our database (http://www.genenames.org), the majority represent protein-coding (pc) genes; however, we also name pseudogenes, phenotypic loci, some genomic features, and to date have named more than 8,500 human non-protein coding RNA (ncRNA) genes and ncRNA pseudogenes. We have already established unique names for most of the small ncRNA genes by working with experts for each class. Small ncRNAs can be defined into their respective classes by their shared homology and common function. In contrast, long non-coding RNA (lncRNA) genes represent a disparate set of loci related only by their size, more than 200 bases in length, share no conserved sequence homology, and have variable functions. As with pc genes, wherever possible, lncRNAs are named based on the known function of their product; a short guide is presented herein to help authors when developing novel gene symbols for lncRNAs with characterised function. Researchers must contact the HGNC with their suggestions prior to publication, to check whether the proposed gene symbol can be approved. Although thousands of lncRNAs have been predicted in the human genome, for the vast majority their function remains unresolved. lncRNA genes with no known function are named based on their genomic context. Working with lncRNA researchers, the HGNC aims to provide unique and, wherever possible, meaningful gene symbols to all lncRNA genes.

Journal ArticleDOI
TL;DR: The E. turgida spheroid body (EtSB) genome was found to possess a gene set for nitrogen fixation, as anticipated, but is reduced in size and gene repertoire compared with the genomes of their closest known free-living relatives.
Abstract: The evolution of mitochondria and plastids from bacterial endosymbionts were key events in the origin and diversification of eukaryotic cells. Although the ancient nature of these organelles makes it difficult to understand the earliest events that led to their establishment, the study of eukaryotic cells with recently evolved obligate endosymbiotic bacteria has the potential to provide important insight into the transformation of endosymbionts into organelles. Diatoms belonging to the family Rhopalodiaceae and their endosymbionts of cyanobacterial origin (i.e., “spheroid bodies”) are emerging as a useful model system in this regard. The spheroid bodies, which appear to enable rhopalodiacean diatoms to use gaseous nitrogen, became established after the divergence of extant diatom families. Here we report what is, to our knowledge, the first complete genome sequence of a spheroid body, that of the rhopalodiacean diatom Epithemia turgida. The E. turgida spheroid body (EtSB) genome was found to possess a gene set for nitrogen fixation, as anticipated, but is reduced in size and gene repertoire compared with the genomes of their closest known free-living relatives. The presence of numerous pseudogenes in the EtSB genome suggests that genome reduction is ongoing. Most strikingly, our genomic data convincingly show that the EtSB has lost photosynthetic ability and is metabolically dependent on its host cell, unprecedented characteristics among cyanobacteria, and cyanobacterial symbionts. The diatom–spheroid body endosymbiosis is thus a unique system for investigating the processes underlying the integration of a bacterial endosymbiont into eukaryotic cells.

Journal ArticleDOI
TL;DR: Understanding the principles of chromatin and epigenome organization will help to identify new disease mechanisms, biomarkers and drug targets, particularly as ongoing epigenome mapping projects generate data for primary human cell types that play important roles in disease.
Abstract: The organization of the genome into functional units, such as enhancers and active or repressed promoters, is associated with distinct patterns of DNA and histone modifications. The Encyclopedia of DNA Elements (ENCODE) project has advanced our understanding of the principles of genome, epigenome and chromatin organization, identifying hundreds of thousands of potential regulatory regions and transcription factor binding sites. Part of the ENCODE consortium, GENCODE, has annotated the human genome with novel transcripts including new noncoding RNAs and pseudogenes, highlighting transcriptional complexity. Many disease variants identified in genome-wide association studies are located within putative enhancer regions defined by the ENCODE project. Understanding the principles of chromatin and epigenome organization will help to identify new disease mechanisms, biomarkers and drug targets, particularly as ongoing epigenome mapping projects generate data for primary human cell types that play important roles in disease.

Journal ArticleDOI
TL;DR: It is found that ectopic expression of TUSC2P and the TUSC2 3′-UTR inhibits cell proliferation, survival, migration, invasion and colony formation, and increases tumour cell death.
Abstract: Various non-coding regions of the genome, once presumed to be 'junk' DNA, have recently been found to be transcriptionally active. In particular, pseudogenes are now known to have important biological roles. Here we report that transcripts of the two tumour suppressor candidate-2 pseudogenes (TUSC2P), found on chromosomes X and Y, are homologous to the 3'-UTR of their corresponding protein coding transcript, TUSC2. TUSC2P and the TUSC2 3'-UTR share many common miRNA-binding sites, including miR-17, miR-93, miR-299-3p, miR-520a, miR-608 and miR-661. We find that ectopic expression of TUSC2P and the TUSC2 3'-UTR inhibits cell proliferation, survival, migration, invasion and colony formation, and increases tumour cell death. By interacting with endogenous miRNAs, TUSC2P and TUSC2 3'-UTR arrest the functions of these miRNAs, resulting in increased translation of TUSC2. The TUSC2P and TUSC2 3'-UTR could thus be used as combinatorial miRNA inhibitors and might have clinical applications.

Journal ArticleDOI
TL;DR: Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications, consensus TTTTAA sites at insertion points, inverted rearrangements and polyA tails, and transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes.
Abstract: Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.

Journal ArticleDOI
TL;DR: The identification and characterization of two HMGA1 non-coding pseudogenes are reported, which show that their overexpression increases the levels of HMGA1 and other cancer-related proteins by inhibiting the suppression of their synthesis mediated by microRNAs.
Abstract: The High Mobility Group A (HMGA) are nuclear proteins that participate in the organization of nucleoprotein complexes involved in chromatin structure, replication and gene transcription. HMGA overexpression is a feature of human cancer and plays a causal role in cell transformation. Since non-coding RNAs and pseudogenes are now recognized to be important in physiology and disease, we investigated HMGA1 pseudogenes in cancer settings using bioinformatics analysis. Here we report the identification and characterization of two HMGA1 non-coding pseudogenes, HMGA1P6 and HMGA1P7. We show that their overexpression increases the levels of HMGA1 and other cancer-related proteins by inhibiting the suppression of their synthesis mediated by microRNAs. Consistently, embryonic fibroblasts from HMGA1P7-overexpressing transgenic mice displayed a higher growth rate and reduced susceptibility to senescence. Moreover, HMGA1P6 and HMGA1P7 were overexpressed in human anaplastic thyroid carcinomas, which are highly aggressive, but not in differentiated papillary carcinomas, which are less aggressive. Lastly, the expression of the HMGA1 pseudogenes was significantly correlated with HMGA1 protein levels thereby implicating HMGA1P overexpression in cancer progression. In conclusion, HMGA1P6 and HMGA1P7 are potential proto-oncogenic competitive endogenous RNAs.

Journal ArticleDOI
TL;DR: This study is the first complete genome analysis of a gastropod endosymbiont and offers an opportunity to study genome evolution in a recently evolved endosymbiont.
Abstract: Deep-sea vents harbor dense populations of various animals that have their specific symbiotic bacteria. Scaly-foot gastropods, which are snails with mineralized scales covering the sides of its foot, have a gammaproteobacterial endosymbiont in their enlarged esophageal glands and diverse epibionts on the surface of their scales. In this study, we report the complete genome sequencing of gammaproteobacterial endosymbiont. The endosymbiont genome displays features consistent with ongoing genome reduction such as large proportions of pseudogenes and insertion elements. The genome encodes functions commonly found in deep-sea vent chemoautotrophs such as sulfur oxidation and carbon fixation. Stable carbon isotope (13C)-labeling experiments confirmed the endosymbiont chemoautotrophy. The genome also includes an intact hydrogenase gene cluster that potentially has been horizontally transferred from phylogenetically distant bacteria. Notable findings include the presence and transcription of genes for flagellar assembly, through which proteins are potentially exported from bacterium to the host. Symbionts of snail individuals exhibited extreme genetic homogeneity, showing only two synonymous changes in 19 different genes (13 810 positions in total) determined for 32 individual gastropods collected from a single colony at one time. The extremely low genetic individuality in endosymbionts probably reflects that the stringent symbiont selection by host prevents the random genetic drift in the small population of horizontally transmitted symbiont. This study is the first complete genome analysis of gastropod endosymbiont and offers an opportunity to study genome evolution in a recently evolved endosymbiont.

Journal ArticleDOI
23 May 2014-RNA
TL;DR: It is shown that for 80%-90% of the RP genes, the molar ratio of mRNAs varies less than threefold, with little tissue specificity, and since the RPs are needed in equimolar amounts, there must be sluggish or regulated translation of the more abundant RP mRNAs and/or substantial turnover of unused RPs.
Abstract: The torrent of RNA-seq data becoming available not only furnishes an overview of the entire transcriptome but also provides tools to focus on specific areas of interest. Our focus on the synthesis of ribosomes asked whether the abundance of mRNAs encoding ribosomal proteins (RPs) matched the equimolar need for the RPs in the assembly of ribosomes. We were at first surprised to find, in the mapping data of ENCODE and other sources, that there were nearly 100-fold differences in the level of the mRNAs encoding the different RPs. However, after correcting for the mapping ambiguities introduced by the presence of more than 2000 pseudogenes derived from RP mRNAs, we show that for 80%-90% of the RP genes, the molar ratio of mRNAs varies less than threefold, with little tissue specificity. Nevertheless, since the RPs are needed in equimolar amounts, there must be sluggish or regulated translation of the more abundant RP mRNAs and/or substantial turnover of unused RPs. In addition, seven of the RPs have subsidiary genes, three of which are pseudogenes that have been "rescued" by the introduction of promoters and/or upstream introns. Several of these are transcribed in a tissue-specific manner, e.g., RPL10L in testis and RPL3L in muscle, leading to potential variation in ribosome structure from one tissue to another. Of the 376 introns in the RP genes, a single one is alternatively spliced in a tissue-specific manner.

Journal ArticleDOI
TL;DR: Overall, a broad spectrum of biochemical activity for pseudogenes is identified, with the majority in each organism exhibiting varying degrees of partial activity, suggesting a uniform degradation process.
Abstract: Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism’s genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

Journal ArticleDOI
03 Apr 2014-PLOS ONE
TL;DR: A comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression, at both epigenetic and post-transcriptional levels.
Abstract: Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ∼3,000 pseudogenes that could produce non-coding RNAs at low abundance but with high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes showed mostly positive correlations, suggesting that pseudogene transcription could have a positive effect on the expression of their parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated with the PTEN pseudogene, PTENP1. Our analysis of the ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs) and some pseudogene-derived sRNAs, especially those from antisense strands, exhibited evidence of interfering with gene expression. Further integrated analysis of transcriptomics and epigenomics data, however, demonstrated that trimethylation of histone 3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in a transcription-level dependent manner in the two cell lines. The H3K9me3 enrichment was more prominent in pseudogenes that produced sRNAs at pseudogene loci and their adjacent regions, an observation further supported by the co-enrichment of SETDB1 (a H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression. Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression, at both epigenetic and post-transcriptional levels.

Journal ArticleDOI
TL;DR: What is known about gene regulation mediated by pseudogene-expressed non-coding RNAs, and their roles in the control of epigenetic states, is reviewed.

Journal ArticleDOI
TL;DR: Induced pluripotent stem cells from non-human primates reveal insights into general primate gene expression evolution and should provide a rich source to identify conserved and species-specific gene expression patterns for cellular phenotypes.

Journal ArticleDOI
TL;DR: It is now clear that significant levels of retrotransposition occur not only in the human germline but also in some somatic cell types, and new investigations under way suggest that this may especially be the case for cancers and neuronal cells.
Abstract: LINE-1s (L1s), the only currently active autonomous mobile DNA in humans, occupy at least 17% of human DNA. Throughout evolution, the L1 has also been responsible for genomic insertion of thousands of processed pseudogenes and over one million nonautonomous retrotransposons called SINEs (mainly Alus and SVAs). The 6-kb human L1 has a 5′-untranslated region (UTR) that functions as an internal promoter, two open reading frames (ORF1, which encodes an RNA-binding protein, and ORF2, which expresses endonuclease and reverse transcriptase activities), and a 3′-UTR that ends in a poly(A) signal and tail. Most L1s are molecular fossils: truncated, rearranged or mutated. However, 80 to 100 remain potentially active in any human individual, and to date 101 de novo disease-causing germline retrotransposon insertions have been characterized. It is now clear that significant levels of retrotransposition occur not only in the human germline but also in some somatic cell types. Recent publications and new investigations under way suggest that this may especially be the case for cancers and neuronal cells. This commentary offers a few points to consider to aid in avoiding misinterpretation of data as these studies move forward.

Journal ArticleDOI
TL;DR: No evidence suggests that adaptive selection drove the reorganization of neobatrachian mitogenomes, and protein-coding genes that function in metabolism showed evidence for purifying selection, and some functional constraints appear to act on the organization of rRNA and tRNA genes.
Abstract: Although mitochondrial (mt) gene order is highly conserved among vertebrates, widespread gene rearrangements occur in anurans, especially in neobatrachians. Protein coding genes in the mitogenome experience adaptive or purifying selection, yet the role that selection plays in genomic reorganization remains unclear. We sequence the mitogenomes of three species of Glandirana and hot spots of gene rearrangements of 20 frog species to investigate the diversity of mitogenomic reorganization in the Neobatrachia. By combining these data with other mitogenomes in GenBank, we evaluate if selective pressures or functional constraints act on mitogenomic reorganization in the Neobatrachia. We also look for correlations between tRNA positions and codon usage. Gene organization in Glandirana was typical of neobatrachian mitogenomes except for the presence of pseudogene trnS (AGY). Surveyed ranids largely exhibited gene arrangements typical of neobatrachian mtDNA although some gene rearrangements occurred. The correlation between codon usage and tRNA positions in neobatrachians was weak, and did not increase after identifying recurrent rearrangements as revealed by basal neobatrachians. Codon usage and tRNA positions were not significantly correlated when considering tRNA gene duplications or losses. Change in number of tRNA gene copies, which was driven by genomic reorganization, did not influence codon usage bias. Nucleotide substitution rates and dN/dS ratios were higher in neobatrachian mitogenomes than in archaeobatrachians, but the rates of mitogenomic reorganization and mt nucleotide diversity were not significantly correlated. No evidence suggests that adaptive selection drove the reorganization of neobatrachian mitogenomes. In contrast, protein-coding genes that function in metabolism showed evidence for purifying selection, and some functional constraints appear to act on the organization of rRNA and tRNA genes. As important nonadaptive forces, genetic drift and mutation pressure may drive the fixation and evolution of mitogenomic reorganizations.

Journal ArticleDOI
TL;DR: The assumption is made that the intergenic noncoding sequences without definite/clear functions can be involved in spatial organization of genetic loci in interphase nuclei.
Abstract: Most of the mammalian genome consists of nucleotide sequences not coding for proteins. Exons of genes make up only 3% of the human genome, while the significance of most other sequences remains unknown. Recent genome studies with high-throughput methods demonstrate that the so-called noncoding part of the genome may perform important functions. This hypothesis is supported by three groups of experimental data: 1) approximately 10% of the sequences, most of which are located in noncoding parts of the genome, are evolutionarily conserved and thus can be of functional importance; 2) up to 99% of the mammalian genome is transcribed, forming short and long noncoding RNAs in addition to common mRNA; and 3) mutations in noncoding parts of the genome can be accompanied by progression of pathological states of the organism. In the light of these data, in this review we consider the functional role of numerous known sequences of noncoding parts of the genome including introns, DNA methylation regions, enhancers and locus control regions, insulators, S/MAR sequences, pseudogenes, and genes of noncoding RNAs, as well as transposons and simple repeats of centromeric and telomeric regions of chromosomes. The assumption is made that the intergenic noncoding sequences without definite/clear functions can be involved in spatial organization of genetic loci in interphase nuclei.

Journal ArticleDOI
TL;DR: It is concluded that Kat1 is a highly regulated transposase-derived endonuclease vital for sexual differentiation, operating at the fossil imprints of an ancient transposon that catalyzes the differentiation of cell type.
Abstract: Transposable elements (TEs) have had a major influence on shaping both prokaryotic and eukaryotic genomes, largely through stochastic events following random or near-random insertions. In the mammalian immune system, the recombination-activating genes 1/2 (Rag1/2) recombinase has evolved from a transposase gene, demonstrating that TEs can be domesticated by the host. In this study, we uncovered a domesticated transposase, Kluyveromyces lactis hobo/Activator/Tam3 (hAT) transposase 1 (Kat1), operating at the fossil imprints of an ancient transposon, that catalyzes the differentiation of cell type. Kat1 induces mating-type switching from mating type a (MATa) to MATα in the yeast K. lactis. Kat1 activates switching by introducing two hairpin-capped DNA double-strand breaks (DSBs) in the MATa1–MATa2 intergenic region, as we demonstrate both in vivo and in vitro. The DSBs stimulate homologous recombination with the cryptic hidden MAT left alpha (HMLα) locus, resulting in a switch of the cell type. The sites where Kat1 acts in the MATa locus most likely are ancient remnants of terminal inverted repeats from a long-lost TE. The KAT1 gene is annotated as a pseudogene because it contains two overlapping ORFs. We demonstrate that translation of full-length Kat1 requires a programmed −1 frameshift. The frameshift limited Kat1 activity, because restoring the zero frame causes switching to the MATα genotype. Kat1 also was transcriptionally activated by nutrient limitation via the transcription factor mating type switch 1 (Mts1). A phylogenetic analysis indicated that KAT1 was domesticated specifically in the Kluyveromyces clade of the budding yeasts. We conclude that Kat1 is a highly regulated transposase-derived endonuclease vital for sexual differentiation.

Journal ArticleDOI
TL;DR: The use of MAKER-P is reported to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes.
Abstract: The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes.

Journal ArticleDOI
TL;DR: It is shown that resistin, TNFα, and PAI-1 (SERPINE1), three genes encoding adipokines inhibiting insulin sensitivity, have been lost in chicken and zebra finch genomes.
Abstract: Gene loss is one of the main drivers in the evolution of genomes and species. The demonstration that a gene has been lost by pseudogenization is truly complete when one finds the pseudogene in the orthologous genomic region with respect to active genes in other species. In some cases, the identification of such orthologous loci is not possible because of chromosomal rearrangements or if the gene of interest has not yet been sequenced. This question is particularly important in the case of birds because the genomes of avian species possess only about 15,000 predicted genes, in comparison with 20,000 in mammals. Yet, gene loss raises the question of which functions are affected by the changes in gene counts. We describe a systematic approach that makes it possible to demonstrate gene loss in the chicken genome even if a pseudogene has not been found. By using phylogenetic and synteny analysis in vertebrates, genome-wide comparisons between the chicken genome and expressed sequence tags, RNAseq data analysis, statistical analysis of the chicken genome, and radiation hybrid mapping, we show that resistin, TNFα, and PAI-1 (SERPINE1), three genes encoding adipokines inhibiting insulin sensitivity, have been lost in chicken and zebra finch genomes. Moreover, omentin, a gene encoding an adipokine that enhances insulin sensitivity, has also been lost in the chicken genome. Overall, only one adipokine inhibiting insulin sensitivity and five adipokines enhancing insulin sensitivity are still present in the chicken genome. These genetic differences between mammals and chicken, given the functions of the genes in mammals, would have dramatic consequences on chicken endocrinology, leading to novel equilibriums especially in the regulation of energy metabolism, insulin sensitivity, as well as appetite and reproduction.

Journal ArticleDOI
TL;DR: The utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing of PCR amplicons for gene family characterization is investigated and suggested.
Abstract: Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing of PCR amplicons for gene family characterization. We target vomeronasal gene receptors, one of the most diverse gene families in mammals, with the goal of better understanding intra-specific V1R diversity of the gray mouse lemur (Microcebus murinus). Our study compares intragenomic variation for two V1R subfamilies found in the mouse lemur. Specifically, we compare gene copy variation within and between two individuals of M. murinus as characterized by different methods for nucleotide sequencing. By including the same individual animal from which the M. murinus draft genome was derived, we are able to cross-validate gene copy estimates from Sanger sequencing versus CCS methods. We generated 34,088 high quality circular consensus sequences of two diverse V1R subfamilies (here referred to as V1RI and V1RIX) from two individuals of Microcebus murinus. Using a minimum threshold of 7× coverage, we recovered approximately 90% of V1RI sequences previously identified in the draft M. murinus genome (59% being identical at all nucleotide positions). When low coverage sequences were considered (i.e. < 7× coverage) 100% of V1RI sequences identified in the draft genome were recovered. At least 13 putatively novel V1R loci were also identified using CCS technology. Recent upgrades to the Pacific Biosciences RS instrument have improved the CCS technology and offer an alternative to traditional sequencing approaches. Our results suggest that the Microcebus murinus V1R repertoire has been underestimated in the draft genome. In addition to providing an improved understanding of V1R diversity in the mouse lemur, this study demonstrates the utility of CCS technology for characterizing complex regions of the genome. We anticipate that long-read sequencing technologies such as PacBio SMRT will allow for the assembly of multigene family clusters and serve to more accurately characterize patterns of gene copy variation in large gene families, thus revealing novel micro-evolutionary patterns within non-model organisms.