scispace - formally typeset
Topic

Pseudogene

About: Pseudogene is a research topic. Over its lifetime, 5,528 publications have been published within this topic, receiving 336,634 citations. The topic is also known as: Ψ & pseudogenes.


Papers
Journal Article
16 Jun 2006-Science
TL;DR: It is shown that Xist evolved, at least partly, from a protein-coding gene and that the loss of protein-coding function of the proto-Xist coincides with the four flanking protein genes becoming pseudogenes, which suggests that mechanisms of dosage compensation have evolved independently in both lineages.
Abstract: The Xist noncoding RNA is the key initiator of the process of X chromosome inactivation in eutherian mammals, but its precise function and origin remain unknown. Although Xist is well conserved among eutherians, until now, no homolog has been identified in other mammals. We show here that Xist evolved, at least partly, from a protein-coding gene and that the loss of protein-coding function of the proto-Xist coincides with the four flanking protein genes becoming pseudogenes. This event occurred after the divergence between eutherians and marsupials, which suggests that mechanisms of dosage compensation have evolved independently in both lineages.

391 citations

Journal Article
TL;DR: The studies support a model for KIR haplotype diversity based on six basic gene compositions, and suggest that the centromeric half of the KIR genomic region is comprised of three major combinations, while the telomeric half can assume a short form with either 2DS4 or KIR1D or a long form with multiple combinations of several stimulatory KIR genes.
Abstract: Killer Ig-like receptor (KIR) genes constitute a multigene family whose genomic diversity is achieved through differences in gene content and allelic polymorphism. KIR haplotypes containing a single activating KIR gene (A-haplotypes), and KIR haplotypes with multiple activating receptor genes (B-haplotypes) have been described. We report the evaluation of KIR gene content in extended families, sibling pairs, and an unrelated Caucasian panel through identification of the presence or absence of 14 KIR genes and 2 pseudogenes. Haplotype definition included subtyping for the expressed and nonexpressed KIR2DL5 variants, for two alleles of pseudogene 3DP1, and for two alleles of 2DS4, including a novel 2DS4 allele, KIR1D. KIR1D appears functionally homologous to the rhesus monkey KIR1D and likely arose as a consequence of a 22 nucleotide deletion in the coding sequence of 2DS4, leading to disruption of Ig-domain 2D and a premature termination codon following the first amino acid in the putative transmembrane domain. Our investigations identified 11 haplotypes within 12 families. From 49 sibling pairs and 17 consanguineous DNA samples, an additional 12 haplotypes were predicted. Our studies support a model for KIR haplotype diversity based on six basic gene compositions. We suggest that the centromeric half of the KIR genomic region is comprised of three major combinations, while the telomeric half can assume a short form with either 2DS4 or KIR1D or a long form with multiple combinations of several stimulatory KIR genes. Additional rare haplotypes can be identified, and may have arisen by gene duplication, intergenic recombination, or deletions.

387 citations

Journal Article
TL;DR: It is shown that a hitherto unknown L1Hs antisense promoter (ASP) drives the transcription of adjacent genes, and this type of transcriptional control may be widespread.
Abstract: In the human genome, retrotranspositionally competent long interspersed nuclear elements (L1Hs) are involved in the generation of processed pseudogenes and mobilization of unrelated sequences into existing genes. Transcription of each L1Hs is initiated from its internal promoter but may also be driven from the promoters of adjacent cellular genes. Here I show that a hitherto unknown L1Hs antisense promoter (ASP) drives the transcription of adjacent genes. The ASP is located in the L1Hs 5′ untranslated region (5′UTR) and works in the opposite direction. Fifteen cDNAs, isolated from a human NTera2D1 cDNA library by a differential screening method, contained L1Hs 5′UTRs spliced to the sequences of known genes or non-protein-coding sequences. Four of these chimeric transcripts, selected for detailed analysis, were detected in total RNA of different cell lines. Their abundance accounted for roughly 1 to 500% of the transcripts of four known genes, suggesting a large variation in the efficiency of L1Hs ASP-driven transcription. ASP-directed transcription was also revealed from expressed sequence tag sequences and confirmed by using an RNA dot blot analysis. Nine of the 15 randomly selected genomic L1Hs 5′UTRs had ASP activities about 7- to 50-fold higher than background in transient transfection assays. ASP was assigned to the L1Hs 5′UTR between nucleotides 400 to 600 by deletion and mutation analysis. These results indicate that many L1Hs contain active ASPs which are capable of interfering with normal gene expression, and this type of transcriptional control may be widespread.

387 citations

Journal Article
23 Jul 2013-eLife
TL;DR: The discovery and characterization of the sets of mouse lncRNAs induced by inflammatory signaling via TNFα suggest that the expression of pseudogene lncRNAs is actively regulated and that these lncRNAs constitute functional regulators of inflammatory signaling.
Abstract: Pseudogenes are thought to be inactive gene sequences, but recent evidence of extensive pseudogene transcription raised the question of potential function. Here we discover and characterize the sets of mouse lncRNAs induced by inflammatory signaling via TNFα. TNFα regulates hundreds of lncRNAs, including 54 pseudogene lncRNAs, several of which show exquisitely selective expression in response to specific cytokines and microbial components in a NF-κB-dependent manner. Lethe, a pseudogene lncRNA, is selectively induced by proinflammatory cytokines via NF-κB or glucocorticoid receptor agonist, and functions in negative feedback signaling to NF-κB. Lethe interacts with NF-κB subunit RelA to inhibit RelA DNA binding and target gene activation. Lethe level decreases with organismal age, a physiological state associated with increased NF-κB activity. These findings suggest that expression of pseudogenes lncRNAs are actively regulated and constitute functional regulators of inflammatory signaling. DOI: http://dx.doi.org/10.7554/eLife.00762.001

384 citations

Journal Article
TL;DR: It is concluded that with very few exceptions, ribosomes are able to distinguish coding from noncoding transcripts and, hence, that ectopic translation and cryptic mRNAs are rare in the human lncRNAome.
Abstract: In addition to over 20,000 protein-coding genes and known small-RNA, including microRNA host genes, the human genome includes at least 9640 loci transcribed solely into long, non-protein-coding RNAs (long noncoding RNAs; lncRNAs), often with multiple transcript isoforms (Derrien et al. 2012). Of these, only a minority (under 100) have been functionally characterized at an individual level by forward and reverse genetic approaches in organismal and cell culture models. The remainder are known purely via high-throughput discovery and expression analysis. Well-known examples of lncRNAs that have been functionally characterized in-depth include the imprinted Myc target H19 (Gabory et al. 2009), the epigenetic homeobox gene regulator HOTAIR, which promotes cancer metastasis (Gupta et al. 2010), and Xist, the lncRNA that is responsible for inactivation of the mammalian X-chromosome (Jeon and Lee 2011). While these few examples already attest to the diversity of lncRNA functions in chromatin remodeling and imprinting, the diversity of heretofore-uncharacterized lncRNAs hints at numerous additional lncRNA-dependent regulatory mechanisms in mammalian systems. Miat is another example of a recently discovered lncRNA that takes part in a direct network feedback loop with the Pou5f1 pluripotency factor in stem cells (Pou5f1 is also known as Oct4); Miat is both a direct target of and a direct regulator of Pou5f1 (Lipovich et al. 2010; Sheik Mohamed et al. 2010). Hence, lncRNAs can be both regulated by and regulators of key transcription factors. LncRNA genes are transcribed in a diverse range of human tissues and cell lines, and show highly specific spatial and temporal expression profiles, which, in conjunction with detailed molecular characterization of the lncRNAs, attest to numerous distinct functions. 
These functions include, but are not limited to, epigenetic and post-transcriptional gene expression regulation, sense-antisense interactions with known protein-coding genes, direct binding and regulation of transcription factor proteins, nuclear pore gatekeeping, and enhancer function by transcriptional initiation of lncRNAs that cause chromatin remodeling (Lipovich et al. 2010). Mammalian lncRNAs have epigenetic signatures comparable to those of protein-coding genes, frequently associate with the polycomb repressor complex PRC2 which renders them capable of regulating numerous target genes through histone modifications suppressing gene expression, and mediate global transcriptional programs of cancer transcription factors (Guttman et al. 2009; Khalil et al. 2009; Huarte et al. 2010; Derrien et al. 2012). A particularly intriguing property of mammalian lncRNAs is their lack of evolutionary conservation, relative to protein-coding genes. Primate-specific lncRNAs in the human genome are increasingly well-documented in the literature (for a review citing multiple pertinent recent reports, see Lipovich et al. 2010). Previously, Tay et al. (2009) screened the human genome for primate-specific single-copy genomic sequences, uncovering 131 primate-specific transcriptional units supported by transcriptome data. The brain-derived neurotrophic factor (BDNF) gene, a key contributor to synaptic plasticity, learning, memory, and multiple neurological diseases, is overlapped by a cis-encoded primate-specific lncRNA (Pruunsild et al. 2007). Most recently, Derrien et al. (2012) found that ∼30% of human lncRNA transcripts in GENCODE, many of which are expressed in the brain, are primate specific. The resulting relevance of lncRNAs to species-specific phenotypes, including primate and human uniqueness, highlights the importance of using empirical methodologies to document whether lncRNAs are actually non-protein-coding. 
The majority of definitively known lncRNAs have been annotated using empirical evidence such as cDNA and EST alignments to genome assemblies (Carninci et al. 2005; Katayama et al. 2005; Affymetrix/Cold Spring Harbor Laboratory ENCODE Transcriptome Project 2009). Yet, despite the attention that they have received, the noncoding status of most lncRNA genes and transcripts has been established mostly through computational means including: examining the size of open reading frames (ORFs), assessing conservation of ORFs that are shorter than known proteins, and looking for conserved translation initiation and termination codons. However, a recent flurry of literature suggests that there may exist a class of bifunctional RNAs encoding both mRNAs and functional noncoding transcripts: Indeed, there is direct evidence for rare members of this transcript class in human, mouse, and fly (Hube et al. 2006; Kondo et al. 2010; Dinger et al. 2011; Ingolia et al. 2011; Ulveling et al. 2011). Hence, identifying the fraction of ostensibly noncoding RNAs that may encode polypeptides is a compelling and open question. In this report, we utilize empirical evidence to estimate, in two ENCODE cell lines, the fraction of annotated lncRNAs that may encode, and therefore possibly function through, polypeptides. As part of the Encyclopedia of DNA Elements (ENCODE) project, matched-sample long polyA+ and polyA− RNA-seq data were produced, along with tandem mass spectrometry (MS/MS) data for cellular proteins, for the Tier-1 “ENCODE-prioritized” human cell lines K562 and GM12878. The RNA-seq data provides measures of relative gene expression in various cellular compartments (Djebali et al. 2012); for both GM12878 and K562, nucleus, cytosol, and whole-cell samples were used to sequence both polyA+ and polyA− RNA populations. 
These data have been used to obtain measures of transcript abundance for all genes in GENCODE v7 annotation (the annotation generated for the ENCODE Consortium), based on ENCODE and other data (Harrow et al. 2012). The mass spec data were produced via a “shotgun” approach, wherein cells were cultured, subcellular fractionation performed, followed by protein separations, tryptic digestion, and MS/MS analysis. The resulting spectra were mapped directly to a 6-frame translation of the entire hg19 assembly to produce a “proteogenomic track” within the UCSC Genome Browser (Kent 2002; Karolchik et al. 2009), and were also mapped against the GENCODE gene annotation set (J Khatun, Y Yu, J Wrobel, BA Risk, HP Gunawardena, A Secrest, WJ Spitzer, L Xie, L Wang, X Chen, et al., in prep.). Integrative analysis of RNA and proteomics data has been explored in the literature and is examined in another ENCODE paper, highlighting translation of novel splice variants and expressed pseudogenes (Tian et al. 2004, Djebali et al. 2012). However, these data have not yet been applied to examine the empirical evidence for or against translation of computationally classified human long noncoding RNAs. A recent joint study of RNA and proteomic data in mouse revealed that protein levels and mRNA levels correlate such that RNA concentration is predictive of at least 40% of the variation in protein levels (Schwanhausser et al. 2011). Since lncRNA genes are expressed, on average, at 4% of the level of protein-coding genes in the ENCODE cell lines (Derrien et al. 2012), we expect a similarly low level of expression for any putative protein(s) translated from lncRNAs. Therefore, to interrogate the translational competence of lncRNAs, we must account for the relative expression levels of these transcripts. It has been shown that the quantity of detectable matches between MS/MS spectra and their corresponding peptides in a transcript correlate to protein abundance levels (Lu et al. 2007). 
This means that the number of detected peptide matches is an approximate surrogate for protein abundance (Liu et al. 2004; Vogel and Marcotte 2008). We used this characteristic to determine a calibration function that links mRNA expression abundance and protein expression abundance for the ENCODE data from K562 and GM12878. In our analysis, 21% of GENCODE v7 protein-coding genes are represented by at least one uniquely mapping peptide in any MS/MS sample, and the majority of those genes detected are expressed above 5 RPKMs in the whole-cell RNA-seq data (Harrow et al. 2012). We used these data, applying state-of-the-art machine-learning models to estimate the translational competence of transcripts as a function of RNA expression levels in various cellular compartments and RNA fractions. Using these models, we “regressed out” the expression-level effects to compare the translation competency of ostensibly noncoding transcripts to that of known mRNAs. We then manually examined each lncRNA for which we obtained empirical evidence of coding capacity. From these data, we determined the proportion of lncRNAs that appear to be truly “noncoding” in ENCODE Tier 1 cell lines, and we examined the exceptional cases where there was strong evidence of protein translation to determine whether these are indeed translated lncRNAs or simply misannotated mRNAs.
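The calibration idea in the passage above (linking RNA abundance to the chance of detecting a peptide, then "regressing out" expression level) can be illustrated with a minimal sketch. It is not the authors' pipeline: the text only mentions "state-of-the-art machine-learning models", so a plain logistic regression is swapped in here, and every number below is made-up toy data. The function names (`fit_detection_model`, `detection_probability`, `expected_detections`) are hypothetical.

```python
import math

def detection_probability(a, b, log_rpkm):
    """P(at least one peptide detected) under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(a + b * log_rpkm)))

def fit_detection_model(log_rpkm, detected, lr=0.1, epochs=5000):
    """Fit intercept a and slope b by batch gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(log_rpkm)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(log_rpkm, detected):
            p = detection_probability(a, b, x)
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def expected_detections(a, b, log_rpkm):
    """Number of detections expected if these transcripts behaved like mRNAs."""
    return sum(detection_probability(a, b, x) for x in log_rpkm)

# Toy calibration set (assumed values): log10 RPKM of protein-coding genes
# and whether a uniquely mapping peptide was seen (1) or not (0).
coding_log_rpkm = [-1.0, -0.5, 0.0, 0.3, 0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
coding_detected = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_detection_model(coding_log_rpkm, coding_detected)

# Toy lncRNA set (assumed values): mostly low expression, one peptide hit.
lnc_log_rpkm = [-1.5, -1.0, -0.8, -0.5, 0.0]
lnc_observed = 1

lnc_expected = expected_detections(a, b, lnc_log_rpkm)
print(f"slope={b:.2f}, expected={lnc_expected:.2f}, observed={lnc_observed}")
```

Comparing the observed count with the count expected at matched expression levels is what separates "rarely translated" from "too weakly expressed to be detected". A real analysis would also need the cellular compartment and the polyA+/polyA− fraction as covariates, as the passage describes.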

383 citations


Network Information
Related Topics (5)
Gene
211.7K papers, 10.3M citations
95% related
Genome
74.2K papers, 3.8M citations
93% related
Regulation of gene expression
85.4K papers, 5.8M citations
91% related
Gene expression
113.3K papers, 5.5M citations
90% related
Transcription factor
82.8K papers, 5.4M citations
89% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    250
2021    123
2020    160
2019    119
2018    127