Topic

Pseudogene

About: Pseudogene is a research topic. Over its lifetime, 5,528 publications have been published within this topic, receiving 336,634 citations. The topic is also known as: Ψ and pseudogenes.


Papers
Journal Article

[...]

TL;DR: TopHat2 is described, which incorporates many significant enhancements to TopHat, and combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes.
Abstract: TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.

9,972 citations

Journal Article

[...]

06 Dec 2002-Science
TL;DR: The protein kinase complement of the human genome is catalogued using public and proprietary genomic, complementary DNA, and expressed sequence tag sequences to provide a starting point for comprehensive analysis of protein phosphorylation in normal and disease states and a detailed view of the current state of human genome analysis through a focus on one large gene family.
Abstract: We have catalogued the protein kinase complement of the human genome (the "kinome") using public and proprietary genomic, complementary DNA, and expressed sequence tag (EST) sequences. This provides a starting point for comprehensive analysis of protein phosphorylation in normal and disease states, as well as a detailed view of the current state of human genome analysis through a focus on one large gene family. We identify 518 putative protein kinase genes, of which 71 have not previously been reported or described as kinases, and we extend or correct the protein sequences of 56 more kinases. New genes include members of well-studied families as well as previously unidentified families, some of which are conserved in model organisms. Classification and comparison with model organism kinomes identified orthologous groups and highlighted expansions specific to human and other lineages. We also identified 106 protein kinase pseudogenes. Chromosomal mapping revealed several small clusters of kinase genes and revealed that 244 kinases map to disease loci or cancer amplicons.

6,993 citations

Journal Article

[...]

05 Aug 2011-Cell
TL;DR: It is proposed that this "competing endogenous RNA" (ceRNA) activity forms a large-scale regulatory network across the transcriptome, greatly expanding the functional genetic information in the human genome and playing important roles in pathological conditions, such as cancer.
Abstract: Here, we present a unifying hypothesis about how messenger RNAs, transcribed pseudogenes, and long noncoding RNAs "talk" to each other using microRNA response elements (MREs) as letters of a new language. We propose that this "competing endogenous RNA" (ceRNA) activity forms a large-scale regulatory network across the transcriptome, greatly expanding the functional genetic information in the human genome and playing important roles in pathological conditions, such as cancer.

3,999 citations
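The competition idea behind ceRNA activity can be illustrated with a deliberately simplified toy model. This is an assumption for illustration, not the model from the paper: a fixed pool of one microRNA is spread evenly over every available MRE site on every transcript that carries them, ignoring binding affinities, saturation kinetics, and transcript turnover. Under that assumption, adding a transcribed pseudogene that carries the same MREs dilutes the microRNA and lowers the occupancy of the mRNA's sites. The `Transcript` class and `site_occupancy` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """A transcript carrying MREs for a single shared microRNA (hypothetical toy type)."""
    name: str
    copies: float  # transcript abundance, arbitrary units
    mres: int      # number of MREs for the shared microRNA


def site_occupancy(mirna_copies: float, transcripts: list[Transcript]) -> float:
    """Fraction of MRE sites bound by the microRNA.

    Toy assumption: microRNA molecules are spread evenly over every available
    MRE site, so every site (on any transcript) has the same occupancy, capped at 1.
    """
    total_sites = sum(t.copies * t.mres for t in transcripts)
    if total_sites == 0:
        return 0.0
    return min(1.0, mirna_copies / total_sites)


mrna = Transcript("mRNA", copies=100, mres=2)
pseudo = Transcript("pseudogene transcript", copies=100, mres=2)

# Without a competitor, 150 microRNA copies cover 150 of 200 sites.
print(site_occupancy(150, [mrna]))          # 0.75
# A pseudogene transcript with the same MREs doubles the site count to 400.
print(site_occupancy(150, [mrna, pseudo]))  # 0.375
```

Because every site shares the same occupancy in this toy model, the mRNA's occupancy halves when the competitor doubles the number of sites. Real ceRNA behavior would depend on relative abundances and affinities that this sketch ignores.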

Journal Article

[...]

TL;DR: This work examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.
Abstract: The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

3,750 citations

Journal Article

[...]

21 Oct 2004-Nature
TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,722 citations
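As a back-of-envelope check on what the quoted Build 35 accuracy means in absolute terms, the sketch below divides the sequence length by the number of bases per error. It assumes the average rate of about 1 error per 100,000 bases applies uniformly across all 2.85 billion nucleotides, which the abstract does not state; `expected_errors` is a hypothetical helper.

```python
def expected_errors(length_bp: int, bases_per_error: int) -> float:
    """Expected count of residual base errors, assuming errors are spread
    uniformly at an average rate of 1 per `bases_per_error` bases."""
    return length_bp / bases_per_error


# Build 35 figures: 2.85 billion nucleotides, ~1 error per 100,000 bases.
print(expected_errors(2_850_000_000, 100_000))  # 28500.0
```

Under that uniform-rate assumption, the finished sequence would still carry on the order of 28,500 residual errors.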


Network Information
Related Topics (5)
Gene: 211.7K papers, 10.3M citations, 95% related
Genome: 74.2K papers, 3.8M citations, 93% related
Regulation of gene expression: 85.4K papers, 5.8M citations, 91% related
Gene expression: 113.3K papers, 5.5M citations, 90% related
Transcription factor: 82.8K papers, 5.4M citations, 89% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    121
2020    160
2019    119
2018    127
2017    144