Topic

Pseudogene

About: Pseudogene is a research topic. Over its lifetime, 5,528 publications have been published within this topic, receiving 336,634 citations. The topic is also known as: Ψ and pseudogenes.


Papers
Journal Article
TL;DR: Evidence is found that PGAM3 likely produces a functional protein, as an example of addressing functionality for human processed pseudogenes; interspecies and intraspecies variation for PGAM3 was not consistent with the neutral model proposed for pseudogenes, suggesting that a new functional primate gene has originated.
Abstract: Processed genes are created by retroposition from messenger RNA of expressed genes. The estimated amount of processed copies of genes in the human genome is 10,000-14,000. Some of these might be pseudogenes with the expected pattern for nonfunctional sequences, but some others might be an important source of new genes. We have studied the evolution of a Phosphoglycerate mutase processed gene (PGAM3) described in humans and believed to be a pseudogene. We sequenced PGAM3 in chimpanzee and macaque and obtained polymorphism data for human coding region. We found evidence that PGAM3 likely produces a functional protein, as an example of addressing functionality for human processed pseudogenes. First, the open reading frame was intact despite many deletions that occurred in the 3' untranslated region. Second, it appears that the gene is expressed. Finally, interspecies and intraspecies variation for PGAM3 was not consistent with the neutral model proposed for pseudogenes, suggesting that a new functional primate gene has originated. Amino acid divergence was significantly higher than synonymous divergence in PGAM3 lineage, supporting positive selection acting in this gene. This role of selection was further supported by the excess of rare alleles in a population genetic analysis. PGAM3 is located in a region of very low recombination; therefore, it is conceivable that the rapid fixation events in this newly arising gene may have contributed to a selective sweep of variation in the region.

86 citations

Journal Article
TL;DR: It is shown that the three Tetrahymena Piwi family proteins (Twis) preferentially expressed in growing cells differ in their genetic essentiality and subcellular localization, and affinity purification of all eight distinct Twi proteins revealed unique properties of their bound sRNAs.
Abstract: PAZ/PIWI domain (PPD) proteins carrying small RNAs (sRNAs) function in gene and genome regulation. The ciliate Tetrahymena thermophila encodes numerous PPD proteins exclusively of the Piwi clade. We show that the three Tetrahymena Piwi family proteins (Twis) preferentially expressed in growing cells differ in their genetic essentiality and subcellular localization. Affinity purification of all eight distinct Twi proteins revealed unique properties of their bound sRNAs. Deep sequencing of Twi-bound and total sRNAs in strains disrupted for various silencing machinery uncovered an unanticipated diversity of 23- to 24-nt sRNA classes in growing cells, each with distinct genetic requirements for accumulation. Altogether, Twis distinguish sRNAs derived from loci of pseudogene families, three types of DNA repeats, structured RNAs, and EST-supported loci with convergent or paralogous transcripts. Most surprisingly, Twi7 binds complementary strands of unequal length, while Twi10 binds a specific permutation of the guanosine-rich telomeric repeat. These studies greatly expand the structural and functional repertoire of endogenous sRNAs and RNPs.

86 citations

Journal Article
TL;DR: The repertoire of immunoglobulin expressed very early in human development was approached by cloning and sequencing 55 rearranged and 11 germ‐line VH transcripts, after amplification by polymerase chain reaction of cDNA libraries derived from two fetal livers at 8 and 13 weeks of gestation.
Abstract: The repertoire of immunoglobulin expressed very early in human development was approached by cloning and sequencing 55 rearranged and 11 germ-line VH transcripts, after amplification by polymerase chain reaction of cDNA libraries derived from two fetal livers at 8 and 13 weeks of gestation. All families, with the exception of VH2, were expressed as early as 8 weeks, with preferential usage of certain germ-line genes. Very few somatic mutations, randomly localized, were identified. By contrast, in a series of clones derived from the same VDJ rearrangement using the VH6 family, extensive mutations had taken place, mostly accumulated in the third complementarity-determining region (CDR3), suggesting that the specialized enzymatic machinery was at hand very early during human development. Some other characteristics of the fetal repertoire also emerged, namely increased usage of JH3 and JH2, as compared to the adult pattern, where JH4 is dominant, and reduced length of the D/CDR3 regions. All D gene families were identified, and their usage frequently involved D-D fusions. N diversity was present very early, and increased with age. Identification of germ-line transcripts pertaining to all six VH families, including pseudogenes, in the E55 library revealed a population very different from that of rearranged gene transcripts. This suggests that a large portion of the VH locus is accessible for transcription, bringing no evidence of correlation between preferential rearrangement of a given VH gene and its localization in the locus.

86 citations

Journal Article
TL;DR: Because stranded RNA-seq retains strand information of a read, it can resolve read ambiguity in overlapping genes transcribed from opposite strands, which provides a more accurate quantification of gene expression levels compared with traditional non-stranded RNA-seq.
Abstract: While RNA-sequencing (RNA-seq) is becoming a powerful technology in transcriptome profiling, one significant shortcoming of the first-generation RNA-seq protocol is that it does not retain the strand specificity of origin for each transcript. Without strand information it is difficult and sometimes impossible to accurately quantify gene expression levels for genes with overlapping genomic loci that are transcribed from opposite strands. It has recently become possible to retain the strand information by modifying the RNA-seq protocol, known as strand-specific or stranded RNA-seq. Here, we evaluated the advantages of stranded RNA-seq in transcriptome profiling of whole blood RNA samples compared with non-stranded RNA-seq, and investigated the influence of gene overlaps on gene expression profiling results based on practical RNA-seq datasets and also from a theoretical perspective. Our results demonstrated a substantial impact of stranded RNA-seq on transcriptome profiling and gene expression measurements. As many as 1751 genes in Gencode Release 19 were identified to be differentially expressed when comparing stranded and non-stranded RNA-seq whole blood samples. Antisense and pseudogenes were significantly enriched in differential expression analyses. Because stranded RNA-seq retains strand information of a read, we can resolve read ambiguity in overlapping genes transcribed from opposite strands, which provides a more accurate quantification of gene expression levels compared with traditional non-stranded RNA-seq. In the human genome, it is not uncommon to find genomic loci where both strands encode distinct genes. Among the over 57,800 annotated genes in Gencode Release 19, there are an estimated 19% (about 11,000) of overlapping genes transcribed from the opposite strands. Based on our whole blood mRNA-seq datasets, the fractions of overlapping nucleotide bases on the same and opposite strands were estimated at 2.94% and 3.1%, respectively. The corresponding theoretical estimations are 3% and 3.6%, well in agreement with our own findings. Stranded RNA-seq provides a more accurate estimate of transcript expression compared with non-stranded RNA-seq, and is therefore the recommended RNA-seq approach for future mRNA-seq studies.

86 citations

Journal Article
TL;DR: It was found that the mean rate of amino acid replacement is not significantly different between genes expressed during and after embryogenesis; however, synonymous substitution rates differed significantly between these two classes.
Abstract: It has been hypothesized that evolutionary changes will be more frequent in later ontogeny than early ontogeny because of developmental constraint. To test this hypothesis, a genomewide examination of molecular evolution through ontogeny was carried out using comparative genomic data in Caenorhabditis elegans and Caenorhabditis briggsae. We found that the mean rate of amino acid replacement is not significantly different between genes expressed during and after embryogenesis. However, synonymous substitution rates differed significantly between these two classes. A genomewide survey of correlation between codon bias and expression level found codon bias to be significantly correlated with mRNA expression (r_s = -0.30 and P < 10^-131) but does not alone explain differences in dS between classes. Surprisingly, it was found that genes expressed after embryogenesis have a significantly greater number of duplicates in both the C. elegans and C. briggsae genomes (P < 10^-20 and P < 10^-13) when compared with early-expressed and nonmodulated genes. A similarity in the distribution of duplicates of nonmodulated and early-expressed genes, as well as a disproportionately higher number of early pseudogenes, lend support to the hypothesis that this difference in duplicate number is caused by selection against gene duplicates of early-expressed genes, reflecting developmental constraint. Developmental constraint at the level of gene duplication may have important implications for macroevolutionary change.

85 citations


Network Information
Related Topics (5)
Gene: 211.7K papers, 10.3M citations, 95% related
Genome: 74.2K papers, 3.8M citations, 93% related
Regulation of gene expression: 85.4K papers, 5.8M citations, 91% related
Gene expression: 113.3K papers, 5.5M citations, 90% related
Transcription factor: 82.8K papers, 5.4M citations, 89% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  120
2022  250
2021  123
2020  160
2019  119
2018  127