Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +245 more (29 institutions)
15 Feb 2001, Nature (Nature Publishing Group), Vol. 409, Issue 6822, pp. 860-921
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Citations
Journal Article
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained, and some features of domains of unknown function (DUFs), a rapidly growing class of families within Pfam, are discussed.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations
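The Pfam abstract attributes HMMER3's extra sensitivity to routine use of the forward algorithm. As a minimal sketch of why, assuming a toy two-state HMM with made-up probabilities (not a Pfam profile HMM, which has match, insert and delete states and is scored in log space), the forward algorithm sums probability over every hidden-state path, whereas a Viterbi score keeps only the single best path, so the forward value can never be smaller:

```python
from itertools import product

# Hypothetical two-state model: "D" (domain-like) and "B" (background),
# emitting "h" (hydrophobic) or "p" (polar). All numbers are invented.
STATES = ("D", "B")
START = {"D": 0.5, "B": 0.5}
TRANS = {"D": {"D": 0.9, "B": 0.1}, "B": {"D": 0.2, "B": 0.8}}
EMIT = {"D": {"h": 0.7, "p": 0.3}, "B": {"h": 0.4, "p": 0.6}}

def forward(obs: str) -> float:
    """P(obs): sum over all hidden-state paths (forward algorithm)."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for sym in obs[1:]:
        alpha = {s: EMIT[s][sym] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())

def viterbi(obs: str) -> float:
    """Probability of the single most likely path (Viterbi score)."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for sym in obs[1:]:
        delta = {s: EMIT[s][sym] * max(delta[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return max(delta.values())

def brute_force(obs: str) -> float:
    """Explicit sum over every path; exponential, used only as a check."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for prev, cur, sym in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][sym]
        total += p
    return total
```

For long sequences these raw products underflow, which is one reason real implementations work with logarithms or scaled values.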

Journal Article
J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li +269 more (12 institutions)
16 Feb 2001, Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations
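The coverage figures in the Celera abstract can be checked with simple arithmetic. The sketch below divides the stated 14.8 billion bp of sequence by the 2.91-billion bp consensus length, giving about 5.09-fold (the abstract's 5.11-fold presumably reflects rounding or a slightly different genome-size figure), and computes the mean read length implied by 27,271,853 reads. It then applies the idealized Lander-Waterman (Poisson) approximation, an assumption not stated in the abstract, in which the expected fraction of bases left uncovered at c-fold coverage is e^(-c); real assemblies fall short of this because of repeats and cloning bias.

```python
import math

GENOME_BP = 2.91e9       # consensus euchromatic length stated in the abstract
SEQUENCE_BP = 14.8e9     # total sequence generated
READS = 27_271_853       # high-quality sequence reads

coverage = SEQUENCE_BP / GENOME_BP     # about 5.09-fold
mean_read_len = SEQUENCE_BP / READS    # about 543 bp per read

def uncovered_fraction(c: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases hit by zero reads."""
    return math.exp(-c)

for c in (5.11, 8.0):
    print(f"{c:>5}-fold: {uncovered_fraction(c):.4%} of bases expected uncovered")
```

Under this model, raising effective coverage from 5.11-fold to eightfold shrinks the expected uncovered fraction by a factor of e^(8 - 5.11), roughly 18, which is consistent with (though not a proof of) the abstract's statement that adding the shredded public data reduced the number and size of gaps.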

Journal Article
14 Jan 2005, Cell
TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimated number of false-positive predictions, implicating more than 5300 human genes, representing 30% of the gene set, as miRNA targets.

11,624 citations
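The TL;DR above does not state how miRNA targets were predicted. As an assumption for illustration, the sketch below uses the common seed-match heuristic: an mRNA site perfectly complementary to miRNA nucleotides 2-8 (the "seed"), counted as conserved only if such a site occurs in the 3' UTR of every species compared. It ignores site position, flanking nucleotides and UTR alignment, so it is a simplification rather than the published method; `conserved_target` is a hypothetical rule, and let-7a is used only as a familiar example sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna: str) -> str:
    """Reverse complement of miRNA nucleotides 2-8 (1-based): the 7-nt site sought in mRNA."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_sites(utr: str, mirna: str) -> list[int]:
    """0-based start positions of exact seed-match sites in a 3' UTR (RNA alphabet)."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

def conserved_target(utrs: list[str], mirna: str) -> bool:
    """Hypothetical conservation rule: at least one site in every orthologous UTR."""
    return all(find_sites(utr, mirna) for utr in utrs)

LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
```

For example, `seed_site(LET7A)` is `"CUACCUC"`, and `find_sites("AAACUACCUCAAACUACCUC", LET7A)` returns `[3, 13]`.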

Journal Article
TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.
Abstract: As vertebrate genome sequences near completion and research refocuses on their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MySQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations
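Displaying "any requested portion of the genome at any scale" means fetching only the features that overlap a window from large MySQL tables. The abstract does not say how this is done; as an assumption, the sketch below follows the hierarchical binning scheme found in the UCSC (Kent) source code, in which each feature is stored with the number of the smallest bin that fully contains it (128 kb at the finest level, each level 8 times coarser, up to a single 512 Mb top bin). A window query then only needs to search the handful of bins returned by `overlapping_bins`. Coordinates are 0-based and half-open.

```python
# Level offsets, from the finest (128 kb bins) to the coarsest (one 512 Mb bin).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # 2**17 bp = 128 kb finest bin
NEXT_SHIFT = 3    # each level up is 2**3 = 8 times wider

def bin_from_range(start: int, end: int) -> int:
    """Smallest bin that fully contains the half-open interval [start, end)."""
    start_bin, end_bin = start >> FIRST_SHIFT, (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval extends past the 512 Mb binning range")

def overlapping_bins(start: int, end: int) -> list[int]:
    """Every bin, at every level, that could hold a feature overlapping [start, end)."""
    bins = []
    start_bin, end_bin = start >> FIRST_SHIFT, (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins
```

A table query would restrict to those bins and still test the coordinates themselves (with UCSC-style columns, assumed here, something like `chromStart < end AND chromEnd > start`), because a bin only bounds where a feature can lie.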

Journal Article
TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; on real Solexa data without read pairs, its contigs are in close agreement with simulated results.
Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

9,389 citations
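The Velvet abstract describes de Bruijn graphs built from short words (k-mers). As a minimal textbook sketch, not Velvet's own data structures (which also simplify the graph, remove errors and use read pairs), each k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and an unbranched path through the graph spells out a contig:

```python
from collections import defaultdict

def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)

def walk_contig(graph: dict[str, set[str]]) -> str:
    """Spell the path from the unique source node, stopping at a branch or dead end."""
    targets = {t for succ in graph.values() for t in succ}
    sources = [node for node in graph if node not in targets]
    if len(sources) != 1:
        raise ValueError("sketch expects exactly one node with no incoming edge")
    node = sources[0]
    contig, seen = node, {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # guard against cycles
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig
```

For the overlapping reads `"ACGTC"` and `"GTCAG"` with k = 3, `walk_contig(de_bruijn(["ACGTC", "GTCAG"], 3))` returns `"ACGTCAG"`. Repeats and sequencing errors create branches, and this walk simply stops there; handling them is where a real assembler does most of its work.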

References
Journal Article
TL;DR: This study demonstrates that the physical extent of linkage disequilibrium can differ substantially among populations from different regions of the world, because of both ancient genetic drift in the ancestor common to a large regional group of modern populations and recent genetic drift affecting individual populations.
Abstract: Because defects in the phenylalanine hydroxylase gene (PAH) cause phenylketonuria (PKU), PAH was studied for normal polymorphisms and linkage disequilibrium soon after the gene was cloned. Studies in the 1980s concentrated on European populations in which PKU was common and showed that haplotype-frequency variation exists between some regions of the world. In European populations, linkage disequilibrium generally was found not to exist between RFLPs at opposite ends of the gene but was found to exist among the RFLPs clustered at each end. We have now undertaken the first global survey of normal variation and disequilibrium across the PAH gene. Four well-mapped single-nucleotide polymorphisms (SNPs) spanning ∼75 kb, two near each end of the gene, were selected to allow linkage disequilibrium across most of the gene to be examined. These SNPs were studied as PCR-RFLP markers in samples of, on average, 50 individuals for each of 29 populations, including, for the first time, multiple populations from Africa and from the Americas. All four sites are polymorphic in all 29 populations. Although all but 5 of the 16 possible haplotypes reach frequencies >5% somewhere in the world, no haplotype was seen in all populations. Overall linkage disequilibrium is highly significant in all populations, but disequilibrium between the opposite ends is significant only in Native American populations and in one African population. This study demonstrates that the physical extent of linkage disequilibrium can differ substantially among populations from different regions of the world, because of both ancient genetic drift in the ancestor common to a large regional group of modern populations and recent genetic drift affecting individual populations.

138 citations
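The PAH study summarizes linkage disequilibrium between pairs of SNPs. As a sketch of the standard two-locus measures (the abstract does not say which statistic was used), assuming phased haplotype counts are available for two biallelic sites: D is the excess frequency of one haplotype over what independent alleles would predict, D' rescales D by its maximum possible magnitude given the allele frequencies, and r² is D² divided by the product of the two allele-frequency variances. With four biallelic SNPs there are 2^4 = 16 possible haplotypes, matching the count in the abstract.

```python
from itertools import product

def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float, float]:
    """Return (D, D', r^2) from counts of the four two-site haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at site 2
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0  # carries the sign of D; |D'| is often reported
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# All possible haplotypes of four biallelic sites ("+"/"-" = site present/absent).
HAPLOTYPES = ["".join(h) for h in product("+-", repeat=4)]
```

Complete association, `ld_stats(50, 0, 0, 50)`, gives D = 0.25 with D' = 1 and r² = 1, while equal counts of all four haplotypes give D = 0.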

Journal Article
TL;DR: It is argued that the whole-genome shotgun approach proposed by Weber and Myers meets neither condition for replacing the current clone-by-clone strategy: a probability of success as high as the current approach's, and significant advantages such as decreased cost.
Abstract: The human genome project is entering its decisive final phase, in which the genome sequence will be determined in large-scale efforts in multiple laboratories worldwide. A number of sequencing groups are in the process of scaling up their throughput; over the next few years they will need to attain a collective capacity approaching half a gigabase per year to complete the 3-Gb genome sequence by the target date of 2005. At present, all contributing groups are using a clone-by-clone approach, in which mapped bacterial clones (typically 40–400 kb in size) from known chromosomal locations are sequenced to completion. Among other advantages, this permits a variety of alternative sequencing strategies and methods to be explored independently without redundancy of effort. Although it is not too late to consider implementing a different approach, any such approach must have as high a probability of success as the current one and offer significant advantages (such as decreased cost). I argue here that the whole-genome shotgun proposed by Weber and Myers satisfies neither condition.

137 citations

Journal Article
TL;DR: These data provide the first example of whole-genome random BAC fingerprint analysis of a eucaryote, and have provided a model essential to efforts aimed at generating similar databases of fingerprint contigs to support sequencing of other complex genomes, including that of human.
Abstract: Arabidopsis thaliana has emerged as a model system for studies of plant genetics and development, and its genome has been targeted for sequencing (ref. 1) by an international consortium (the Arabidopsis Genome Initiative; http://genome-www.stanford.edu/Arabidopsis/agi.html). To support the genome-sequencing effort, we fingerprinted more than 20,000 BACs (ref. 2) from two high-quality publicly available libraries (refs 3-5), generating an estimated 17-fold redundant coverage of the genome, and used the fingerprints to nucleate assembly of the data by computer. Subsequent manual revision of the assemblies resulted in the incorporation of 19,661 fingerprinted BACs into 169 ordered sets of overlapping clones ('contigs'), each containing at least 3 clones. These contigs are ideal for parallel selection of BACs for large-scale sequencing and have supported the generation of more than 5.8 Mb of finished genome sequence submitted to GenBank; analysis of the sequence has confirmed the integrity of contigs constructed using this fingerprint data. Placement of contigs onto chromosomes can now be performed, and is being pursued by groups involved in both sequencing and positional cloning studies. To our knowledge, these data provide the first example of whole-genome random BAC fingerprint analysis of a eucaryote, and have provided a model essential to efforts aimed at generating similar databases of fingerprint contigs to support sequencing of other complex genomes, including that of human.

136 citations
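The Arabidopsis abstract describes grouping fingerprinted BACs into contigs of overlapping clones, keeping those with at least 3 clones. The sketch below shows only that grouping logic, under simplifying assumptions: each fingerprint is a set of restriction-fragment sizes, two clones are called overlapping when they share at least `min_shared` fragments (a hypothetical threshold; real fingerprint software scores coincident bands probabilistically with size tolerances, and the authors also revised assemblies manually), and overlapping clones are merged with a union-find structure.

```python
from collections import defaultdict

def overlapping_pairs(fingerprints: dict[str, set[int]], min_shared: int) -> list[tuple[str, str]]:
    """Clone pairs sharing at least min_shared fragment sizes (hypothetical overlap rule)."""
    names = sorted(fingerprints)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if len(fingerprints[a] & fingerprints[b]) >= min_shared]

def build_contigs(clones: list[str], overlaps: list[tuple[str, str]], min_size: int = 3) -> list[list[str]]:
    """Union-find over overlapping clones; keep contigs with at least min_size clones."""
    parent = {c: c for c in clones}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in overlaps:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups: dict[str, list[str]] = defaultdict(list)
    for c in clones:
        groups[find(c)].append(c)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]
```

With toy fingerprints where A overlaps B and B overlaps C but A and C share nothing, the three clones still end up in one contig through B, which is how a chain of pairwise overlaps yields an ordered set of clones.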

Journal Article
TL;DR: Human PEX1 has been identified by computer-based ‘homology probing’ using the ScPex1p sequence to screen databases of expressed sequence tags (dbEST) for human cDNA clones, and encodes a 147-kD member of the AAA protein family (ATPases associated with diverse cellular activities).
Abstract: Human peroxisome biogenesis disorders (PBDs) are a group of genetically heterogeneous autosomal-recessive diseases caused by mutations in PEX genes that encode peroxins, proteins required for peroxisome biogenesis. These lethal diseases include Zellweger syndrome (ZS), neonatal adrenoleukodystrophy (NALD) and infantile Refsum's disease (IRD) (ref. 1), three phenotypes now thought to represent a continuum of clinical features that are most severe in ZS, milder in NALD and least severe in IRD (ref. 2). At least eleven PBD complementation groups have been identified by somatic-cell hybridization analysis (refs 2-6), compared to the eighteen PEX complementation groups that have been found in yeast. We have cloned the human PEX1 gene encoding a 147-kD member of the AAA protein family (ATPases associated with diverse cellular activities) (ref. 7), which is the putative orthologue of Saccharomyces cerevisiae Pex1p (ScPex1p). Human PEX1 has been identified by computer-based ‘homology probing’ using the ScPex1p sequence to screen databases of expressed sequence tags (dbEST) for human cDNA clones. Expression of PEX1 rescued the cells from the biogenesis defect in human fibroblasts of complementation group 1 (CG1), the largest PBD complementation group. We show that PEX1 is mutated in CG1 patients...

136 citations
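The PEX1 abstract describes "homology probing": using a yeast protein sequence to find similar human EST clones. The abstract does not name the search program, and searches of this kind typically use fast heuristic tools; as a stand-in, the sketch below uses a plain Smith-Waterman local-alignment score with a hypothetical match/mismatch/gap scheme (no substitution matrix, no translation of EST DNA) to show the core idea of ranking database entries by their best local similarity to a query. All sequences and names are made up.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between a and b (linear gap penalty, two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local, not global).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def rank_hits(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Database entries sorted by decreasing local-alignment score to the query."""
    scores = [(name, smith_waterman(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

QUERY = "MKLVDEGAAR"  # hypothetical "yeast" peptide
ESTS = {"est_1": "QQMKLVDEGAQRQQ", "est_2": "WWWWPPPPWW", "est_3": "AAKLVDEAA"}
```

Here `rank_hits(QUERY, ESTS)` puts `est_1` first with a score of 17 (nine identities and one mismatch), `est_3` second, and `est_2`, which shares no residues with the query, last with a score of 0.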

Journal Article
TL;DR: Observations suggest that, as with some other transposable elements, horizontal transfer may play an important role in the maintenance of P elements in natural populations.
Abstract: The P element, originally described in Drosophila melanogaster, is one of the best-studied eukaryotic transposable elements. In an attempt to understand the evolutionary dynamics of the P element family, an extensive phylogenetic analysis of 239 partial P element sequences has been completed. These sequences were obtained from 40 species in the Drosophila subgenus Sophophora. The phylogeny of the P element family is examined in the context of a phylogeny of the species in which these elements are found. An interesting feature of many of the species examined is the coexistence in the same genome of P sequences belonging to two or more divergent subfamilies. In general, P elements in Drosophila have been transmitted vertically from generation to generation over evolutionary time. However, four unequivocal cases of horizontal transfer, in which the element was transferred between species, have been identified. In addition, the P element phylogeny is best explained in numerous instances by horizontal transfer at various times in the past. These observations suggest that, as with some other transposable elements, horizontal transfer may play an important role in the maintenance of P elements in natural populations.

134 citations
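The P element abstract infers horizontal transfer where the element phylogeny conflicts with the species phylogeny, but it does not give the method. One common line of reasoning, sketched here as an assumption, compares divergence: if element copies from two species are far more similar than the host species' own genes, vertical inheritance alone is hard to explain. The sketch uses the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites in an alignment, together with a hypothetical `ratio` cutoff; the sequences are toy strings, not data from the study.

```python
import math

def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor corrected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: too divergent for the correction
    return -0.75 * math.log(1 - 4 * p / 3)

def suggests_horizontal_transfer(element_pair: tuple[str, str],
                                 host_pair: tuple[str, str],
                                 ratio: float = 0.5) -> bool:
    """Hypothetical flag: element copies much less diverged than the host genes."""
    return jukes_cantor(*element_pair) < ratio * jukes_cantor(*host_pair)
```

A single real analysis would use many loci, rate heterogeneity and a full phylogeny; the point here is only the comparison of element and host divergence.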