Journal Article

Alu repeats and human genomic diversity

01 May 2002, Nature Reviews Genetics (Nature Publishing Group), Vol. 3, Issue 5, pp. 370-379
Abstract: During the past 65 million years, Alu elements have propagated to more than one million copies in primate genomes, which has resulted in the generation of a series of Alu subfamilies of different ages. Alu elements affect the genome in several ways, causing insertion mutations, recombination between elements, gene conversion and alterations in gene expression. Alu-insertion polymorphisms are a boon for the study of human population genetics and primate comparative genomics because they are neutral genetic markers of identical descent with known ancestral states.
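Because the absence of an Alu element at a polymorphic site is the ancestral state, each insertion locus can be scored as a simple count of derived alleles. The following is a minimal Python sketch, assuming genotypes are coded 0, 1 or 2 by the number of Alu-present chromosomes in a diploid individual. The function name and the two-population genotypes are hypothetical illustrations, not data or methods from the review.

```python
def insertion_allele_frequency(genotypes: list[int]) -> float:
    """Frequency of the Alu-present (derived) allele at one locus.

    Each genotype counts Alu-present chromosomes in a diploid
    individual: 0 = homozygous absent (ancestral state),
    1 = heterozygous, 2 = homozygous present.
    """
    if not genotypes:
        raise ValueError("no genotypes given")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    # Two chromosomes per individual.
    return sum(genotypes) / (2 * len(genotypes))


# Hypothetical genotypes at one locus in two populations.
samples = {
    "pop_A": [0, 1, 1, 2, 0, 1, 2, 2],
    "pop_B": [0, 0, 1, 0, 0, 1, 0, 0],
}
for pop, gts in samples.items():
    print(pop, insertion_allele_frequency(gts))  # pop_A 0.5625, pop_B 0.125
```

Since the ancestral state is known, no outgroup is needed to polarize the allele; frequency differences between populations can be read directly as differences in the spread of the insertion.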


Citations
Journal Article
TL;DR: A modified version of the Celera assembler was developed to identify and compare alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy spanned 1.5 Gb of genome sequence in segments >200 kb, adding precision to the diploid nature of the genome.
Abstract: Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome with the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
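The 22% and 74% figures in this abstract can be reproduced from its own variant counts. This short Python check is an illustrative sketch, not part of the study: it assumes the event total is only the enumerated classes (segmental duplications and copy number variation regions are not counted in the abstract), treats each SNP as one variant base, and uses the rounded 12.3 Mb total.

```python
# Variant counts as reported in the abstract.
snps = 3_213_401
non_snp = {
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}
total_variant_bases = 12.3e6  # "encompassing 12.3 Mb" (rounded)

non_snp_events = sum(non_snp.values())
total_events = snps + non_snp_events

# Share of events: non-SNP events over all enumerated events.
print(f"total events: {total_events:,}")  # 4,118,889
print(f"non-SNP share of events: {non_snp_events / total_events:.1%}")  # 22.0%

# Share of bases: each SNP alters one base, so SNPs cover ~3.2 Mb;
# everything else in the 12.3 Mb belongs to non-SNP variants.
print(f"non-SNP share of bases: {1 - snps / total_variant_bases:.1%}")  # 73.9%
```

The enumerated classes sum to just over 4.1 million events, matching the abstract's "more than 4.1 million DNA variants", and the two shares round to the reported 22% and 74%.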

1,843 citations


Cites background from "Alu repeats and human genomic diver..."

  • ...Most of these SINE insertions (88%) belong to the youngest Alu family (AluY), for which insertion polymorphisms are well documented in the human genome [52,53]....


Journal Article
12 Mar 2004, Science
TL;DR: Mobile elements within genomes have driven genome evolution in diverse ways and are becoming useful tools for learning more about genome evolution and gene function.
Abstract: Mobile elements within genomes have driven genome evolution in diverse ways. Particularly in plants and mammals, retrotransposons have accumulated to constitute a large fraction of the genome and have shaped both genes and the entire genome. Although the host can often control their numbers, massive expansions of retrotransposons have been tolerated during evolution. Now mobile elements are becoming useful tools for learning more about genome evolution and gene function.

1,797 citations

Journal Article
TL;DR: The computational problems surrounding repeats are discussed and strategies used by current bioinformatics systems to solve them are described.
Abstract: Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.

1,451 citations


Cites background from "Alu repeats and human genomic diver..."

  • ...Most large genomes are filled with repetitive sequences; for example, nearly half of the human genome is covered by repeats, many of which have been known about for decade...


Journal Article
TL;DR: This Review focuses on non-long terminal repeat (LTR) retrotransposons, and discusses the many ways that they affect the human genome: from generating insertion mutations and genomic instability to altering gene expression and contributing to genetic innovation.
Abstract: Their ability to move within genomes gives transposable elements an intrinsic propensity to affect genome evolution. Non-long terminal repeat (LTR) retrotransposons — including LINE-1, Alu and SVA elements — have proliferated over the past 80 million years of primate evolution and now account for approximately one-third of the human genome. In this Review, we focus on this major class of elements and discuss the many ways that they affect the human genome: from generating insertion mutations and genomic instability to altering gene expression and contributing to genetic innovation. Increasingly detailed analyses of human and other primate genomes are revealing the scale and complexity of the past and current contributions of non-LTR retrotransposons to genomic change in the human lineage.

1,432 citations


Cites background from "Alu repeats and human genomic diver..."

  • ...as a result of their continued mobilization activity over the past ∼65 My...


Journal Article
TL;DR: LAST, the open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition, and guarantees that the number of matches increases linearly, instead of quadratically, with sequence length.
Abstract: The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
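The idea of choosing seeds by rareness rather than by fixed length can be illustrated with a toy Python sketch. This is not LAST's implementation: it assumes a naive sorted list of suffixes as the index, and the helper names and the `max_freq` threshold are hypothetical. For each query position, the seed is lengthened one base at a time until it occurs at most `max_freq` times in the reference, so seeds in repetitive sequence grow longer while seeds in unique sequence stay short.

```python
from bisect import bisect_left
from typing import Optional


def build_suffix_array(text: str) -> list[str]:
    """Naive suffix array as sorted suffix strings; toy inputs only."""
    return sorted(text[i:] for i in range(len(text)))


def count_occurrences(suffixes: list[str], pattern: str) -> int:
    """Number of reference positions where `pattern` starts.

    Suffixes beginning with `pattern` form a contiguous block in sorted
    order; '~' sorts after A, C, G and T, so it bounds that block.
    """
    lo = bisect_left(suffixes, pattern)
    hi = bisect_left(suffixes, pattern + "~")
    return hi - lo


def adaptive_seed(
    suffixes: list[str], query: str, start: int, max_freq: int
) -> Optional[tuple[int, int]]:
    """Grow the seed at `start` until it occurs at most `max_freq` times.

    Returns (seed_length, occurrences), or None if the query ends while
    the seed is still too frequent.
    """
    for length in range(1, len(query) - start + 1):
        n = count_occurrences(suffixes, query[start:start + length])
        if n <= max_freq:
            return length, n
    return None


reference = "ATATATATATATGCCGTA"
sa = build_suffix_array(reference)
query = "ATATGCC"
print(adaptive_seed(sa, query, 0, 1))  # (5, 1): repetitive region, long seed
print(adaptive_seed(sa, query, 4, 1))  # (2, 1): unique region, short seed
```

Capping each seed's match count at `max_freq` bounds the number of seed hits per query position, which is the property the abstract credits for linear rather than quadratic growth in matches.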

1,097 citations

References
Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum and 245 more; 29 institutions
15 Feb 2001, Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: A new basis for constructing a genetic linkage map of the human genome is described: recombinant DNA techniques are used to develop random single-copy DNA probes that detect DNA sequence polymorphisms when hybridized to restriction digests of an individual's DNA.
Abstract: We describe a new basis for the construction of a genetic linkage map of the human genome. The basic principle of the mapping scheme is to develop, by recombinant DNA techniques, random single-copy DNA probes capable of detecting DNA sequence polymorphisms, when hybridized to restriction digests of an individual's DNA. Each of these probes will define a locus. Loci can be expanded or contracted to include more or less polymorphism by further application of recombinant DNA technology. Suitably polymorphic loci can be tested for linkage relationships in human pedigrees by established methods; and loci can be arranged into linkage groups to form a true genetic map of "DNA marker loci." Pedigrees in which inherited traits are known to be segregating can then be analyzed, making possible the mapping of the gene(s) responsible for the trait with respect to the DNA marker loci, without requiring direct access to a specified gene's DNA. For inherited diseases mapped in this way, linked DNA marker loci can be used predictively for genetic counseling.

7,853 citations

Journal Article
08 Mar 1996, Science
TL;DR: A few FRDA patients were found to have point mutations in X25, but the majority were homozygous for an unstable GAA trinucleotide expansion in the first X25 intron.
Abstract: Friedreich's ataxia (FRDA) is an autosomal recessive, degenerative disease that involves the central and peripheral nervous systems and the heart. A gene, X25, was identified in the critical region for the FRDA locus on chromosome 9q13. This gene encodes a 210-amino acid protein, frataxin, that has homologs in distant species such as Caenorhabditis elegans and yeast. A few FRDA patients were found to have point mutations in X25, but the majority were homozygous for an unstable GAA trinucleotide expansion in the first X25 intron.

2,552 citations

Journal Article
TL;DR: Evidence is presented that single-base repeats (the shortest possible motifs) are represented by longer runs in mammalian introns than would be expected on a random basis, supporting the idea that slipped-strand mispairing may be a ubiquitous force in the evolution of the eukaryotic genome.
Abstract: Simple repetitive DNA sequences are a widespread and abundant feature of genomic DNA. Such sequences are characterized by the following features: (1) they typically consist of a variety of repeated motifs of 1-10 bases, but may include much larger repeats as well; (2) larger repeat units often include shorter ones within them; (3) long polypyrimidine and poly-CA tracts are often found; and (4) tandem arrangements of closely related motifs are often found. We propose that slipped-strand mispairing (SSM) events, in concert with unequal crossing-over, can readily account for all of these features. The frequent occurrence of long tandem repeats of particular motifs (polypyrimidine and poly-CA tracts) appears to result from nonrandom patterns of nucleotide substitution. We argue that the intrahelical process of slipped-strand mispairing is much more likely to be the major factor in the initial expansion of short repeated motifs and that, after initial expansion, simple tandem repeats may be predisposed to further expansion by unequal crossing-over or other interhelical events because of their propensity to mispair. Evidence is presented that single-base repeats (the shortest possible motifs) are represented by longer runs in mammalian introns than would be expected on a random basis, supporting the idea that SSM may be a ubiquitous force in the evolution of the eukaryotic genome. Simple repetitive sequences may therefore represent a natural ground state of DNA unselected for coding functions.
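The comparison of single-base run lengths with a random expectation can be made concrete. The following is a minimal Python sketch, assuming an i.i.d. model in which each position independently carries the base with probability p (here estimated from the sequence's own composition); the paper's actual test is not described in this abstract, and the function names and example sequence are hypothetical. Under that model the expectation is exact: a run of length at least k starts at position 0 with probability p^k, and at each of the n - k later positions with probability (1 - p)p^k, because the preceding base must differ.

```python
def count_runs_at_least(seq: str, base: str, k: int) -> int:
    """Number of maximal runs of `base` with length >= k in `seq`."""
    runs, current = 0, 0
    for ch in seq:
        if ch == base:
            current += 1
        else:
            if current >= k:
                runs += 1
            current = 0
    if current >= k:  # a run may end at the last position
        runs += 1
    return runs


def expected_runs_at_least(n: int, p: float, k: int) -> float:
    """Expected number of maximal runs of length >= k of one base in an
    i.i.d. random sequence of length n, where the base has probability p."""
    if k > n:
        return 0.0
    # Start at position 0: p**k. Start at positions 1..n-k: (1 - p) * p**k.
    return p**k * (1 + (n - k) * (1 - p))


# Hypothetical intron-like fragment with two long poly(A) runs.
seq = "GTAAAAAAAAAAAACTGAAGTTCAAAAAAAAAAGCTA"
p_a = seq.count("A") / len(seq)
for k in (4, 8):
    observed = count_runs_at_least(seq, "A", k)
    expected = expected_runs_at_least(len(seq), p_a, k)
    print(f"k={k}: observed {observed}, expected {expected:.3f}")
```

An observed count well above the expectation for long runs is the kind of excess the abstract describes; a real analysis would pool many introns and account for local composition rather than rely on one short fragment.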

2,312 citations

Journal Article
27 Mar 1987, Science
TL;DR: Ten oligomeric sequences derived from the tandem repeat regions of the myoglobin gene, the zeta-globin pseudogene, the insulin gene, and the X-gene region of hepatitis B virus were used to develop a series of single-copy probes that revealed new, highly polymorphic genetic loci whose allele sizes reflected variation in the number of tandem repeats.
Abstract: A large collection of good genetic markers is needed to map the genes that cause human genetic diseases. Although nearly 400 polymorphic DNA markers for human chromosomes have been described, the majority have only two alleles and are thus uninformative for analysis of genetic linkage in many families. A few known marker systems, however, detect loci that respond to restriction enzyme cleavage by producing a fragment that can have many different lengths. This polymorphism is due to variation in the number of tandem repeats of a short DNA sequence. Because most individuals will be heterozygous at such loci, these markers will provide linkage information in almost all families. Ten oligomeric sequences derived from the tandem repeat regions of the myoglobin gene, the zeta-globin pseudogene, the insulin gene, and the X-gene region of hepatitis B virus were used to develop a series of single-copy probes. These probes revealed new, highly polymorphic genetic loci whose allele sizes reflected variation in the number of tandem repeats.
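The statement that most individuals will be heterozygous at tandem-repeat loci follows from expected heterozygosity under Hardy-Weinberg proportions, 1 minus the sum of squared allele frequencies: a two-allele marker can never exceed 0.5, while a locus with many repeat-length alleles approaches 1. The following is a minimal Python sketch; the ten equal allele frequencies are a hypothetical illustration, not data from the paper.

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """Hardy-Weinberg expected heterozygosity, 1 - sum(p_i ** 2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


biallelic = [0.5, 0.5]  # best possible case for a two-allele marker
vntr = [0.1] * 10       # hypothetical: ten equally common repeat-count alleles

print(expected_heterozygosity(biallelic))  # 0.5
print(round(expected_heterozygosity(vntr), 3))
```

Even in its most favorable case, a biallelic marker leaves half of individuals homozygous and therefore uninformative for linkage, which is why multi-allelic tandem-repeat loci are informative in almost all families.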

1,615 citations


"Alu repeats and human genomic diver..." refers background in this paper

  • ...By contrast, other types of genetic polymorphism, such as variable numbers of tandem repeat...
