Open Access, Journal Article

Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

TLDR
P-clouds predicts >840 Mbp of additional repetitive sequence in the human genome, suggesting that 66%–69% of the human genome is repetitive or repeat-derived and that the human genome consists of substantially more repetitive sequence than previously believed.
Abstract
Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has previously been identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect fragments of different sizes from the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Because short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them we identified ∼100 Mbp of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
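The idea of oligo “clouds” described in the abstract can be illustrated with a toy sketch. This is not the P-clouds implementation: the function names, the two count thresholds (`core_min`, `outer_min`), and the rule of expanding a cloud through single-substitution neighbours are assumptions chosen only to show how high-abundance oligonucleotides that are close in sequence space might be grouped and then used to mark sequence as repetitive.

```python
from collections import Counter


def count_oligos(seq, k):
    """Count every overlapping oligo of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(oligo):
    """Yield every oligo that differs from `oligo` by one substitution."""
    for i, base in enumerate(oligo):
        for alt in "ACGT":
            if alt != base:
                yield oligo[:i] + alt + oligo[i + 1:]


def build_clouds(counts, core_min, outer_min):
    """Group oligos into clouds (illustrative rule, not the published one).

    An oligo seen at least `core_min` times seeds a cloud; the cloud then
    grows through one-substitution neighbours seen at least `outer_min` times.
    Returns the list of clouds and a map from oligo to cloud index.
    """
    assigned = {}
    clouds = []
    for core, n in counts.most_common():
        if n < core_min:
            break  # remaining oligos are too rare to seed a cloud
        if core in assigned:
            continue
        cloud = {core}
        assigned[core] = len(clouds)
        frontier = [core]
        while frontier:
            current = frontier.pop()
            for nb in neighbors(current):
                if nb not in assigned and counts.get(nb, 0) >= outer_min:
                    assigned[nb] = len(clouds)
                    cloud.add(nb)
                    frontier.append(nb)
        clouds.append(cloud)
    return clouds, assigned


def mask(seq, k, assigned):
    """Mark each position of seq that is covered by an oligo in any cloud."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in assigned:
            for j in range(i, i + k):
                covered[j] = True
    return covered
```

In this sketch, calling `count_oligos` on a genome sequence, passing the counts to `build_clouds`, and then calling `mask` with the resulting map would flag positions covered by any clouded oligo. A sketch like this ignores reverse complements and the false-positive modelling that the abstract says is needed for short fragments.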

