Structural variation in the human genome

doi:10.1038/NRG1767

Journal Article•DOI•

Lars Feuk¹, Andrew R. Carson¹, Stephen W. Scherer¹•Institutions (1)

The Centre for Applied Genomics¹

01 Feb 2006-Nature Reviews Genetics (Nature Publishing Group)-Vol. 7, Iss: 2, pp 85-97

TL;DR: Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.

Abstract: The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants - collectively termed copy-number variants or copy-number polymorphisms - as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.

Citations

Journal Article•DOI•

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

Ewan Birney, John A. Stamatoyannopoulos¹, Anindya Dutta², Roderic Guigó³ +317 more•Institutions (44)

14 Jun 2007-Nature

TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.

Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations

Journal Article•DOI•

Global variation in copy number in the human genome

Richard Redon¹, Shumpei Ishikawa², Karen R. Fitch³, Lars Feuk⁴, George H. Perry⁵, T. Daniel Andrews¹, Heike Fiegler¹, Michael H. Shapero³, Andrew R. Carson⁴, Wenwei Chen³, Eun Kyung Cho⁶, Stephanie Dallaire⁶, Jennifer L. Freeman⁶, Juan R. González⁷, Mònica Gratacòs⁷, Jing Huang³, Dimitrios Kalaitzopoulos¹, Daisuke Komura², Jeffrey R. MacDonald⁴, Christian R. Marshall⁴, Rui Mei³, Lyndal Montgomery¹, Kunihiro Nishimura², Kohji Okamura⁴, Fan Shen³, Martin J. Somerville⁸, Joelle Tchinda⁶, Armand Valsesia¹, Cara Woodwark¹, Fengtang Yang¹, Junjun Zhang⁴, Tatiana Zerjal¹, Jane Zhang³, Lluís Armengol⁷, Donald F. Conrad⁹, Xavier Estivill⁷, Chris Tyler-Smith¹, Nigel P. Carter¹, Hiroyuki Aburatani², Charles Lee⁶, Keith W. Jones³, Stephen W. Scherer⁴, Matthew E. Hurles¹•Institutions (9)

Wellcome Trust Sanger Institute¹, University of Tokyo², Thermo Fisher Scientific³, University of Toronto⁴, Brigham and Women's Hospital⁵, Harvard University⁶, Pompeu Fabra University⁷, University of Alberta⁸, University of Chicago⁹

23 Nov 2006-Nature

TL;DR: A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia, underscoring the importance of CNV in genetic diversity and evolution and the utility of this resource for genetic disease studies.

Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

4,275 citations

Journal Article•DOI•

Strong Association of De Novo Copy Number Mutations with Autism

Jonathan Sebat¹, B. Lakshmi¹, Dheeraj Malhotra¹, Jennifer Troge¹, Christa Lese-Martin², Tom Walsh³, Boris Yamrom¹, Seungtai Yoon¹, Alexander Krasnitz¹, Jude Kendall¹, Anthony Leotta¹, Deepa Pai¹, Ray Zhang¹, Yoon-ha Lee¹, James W. Hicks¹, Sarah J. Spence⁴, Annette Lee⁵, Kaija Puura⁶, Terho Lehtimäki, David H. Ledbetter², Peter K. Gregersen⁵, Joel D. Bregman⁵, James S. Sutcliffe⁷, Vaidehi Jobanputra⁸, Wendy K. Chung⁸, Dorothy Warburton⁸, Mary Claire King³, David Skuse⁹, Daniel H. Geschwind¹⁰, T. Conrad Gilliam¹¹, Kenny Ye¹², Michael Wigler¹•Institutions (12)

Cold Spring Harbor Laboratory¹, Emory University², University of Washington³, National Institutes of Health⁴, North Shore-LIJ Health System⁵, University of Tampere⁶, Vanderbilt University⁷, Columbia University⁸, University College London⁹, University of California, Los Angeles¹⁰, University of Chicago¹¹, Albert Einstein College of Medicine¹²

20 Apr 2007-Science

TL;DR: Findings establish de novo germline mutation as a more significant risk factor for ASD than previously recognized.

Abstract: We tested the hypothesis that de novo copy number variation (CNV) is associated with autism spectrum disorders (ASDs). We performed comparative genomic hybridization (CGH) on the genomic DNA of patients and unaffected subjects to detect copy number variants not present in their respective parents. Candidate genomic regions were validated by higher-resolution CGH, paternity testing, cytogenetics, fluorescence in situ hybridization, and microsatellite genotyping. Confirmed de novo CNVs were significantly associated with autism (P = 0.0005). Such CNVs were identified in 12 out of 118 (10%) of patients with sporadic autism, in 2 out of 77 (3%) of patients with an affected first-degree relative, and in 2 out of 196 (1%) of controls. Most de novo CNVs were smaller than microscopic resolution. Affected genomic regions were highly heterogeneous and included mutations of single genes. These findings establish de novo germline mutation as a more significant risk factor for ASD than previously recognized.
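The rounded percentages quoted in this abstract can be re-derived from the counts it gives. The following is a minimal sketch, not the authors' analysis: the group labels and the `rates` dictionary are names chosen here for illustration, and only the three carrier/total pairs come from the abstract.

```python
# De novo CNV carriers and group sizes, as quoted in the abstract above.
# The dictionary keys are illustrative labels, not terms defined by the paper.
groups = {
    "sporadic autism": (12, 118),
    "affected first-degree relative": (2, 77),
    "controls": (2, 196),
}

# Fraction of each group carrying a confirmed de novo CNV.
rates = {name: carriers / total for name, (carriers, total) in groups.items()}

for name, (carriers, total) in groups.items():
    print(f"{name}: {carriers}/{total} = {rates[name]:.1%}")
```

Rounded to whole percentages, these fractions reproduce the 10%, 3% and 1% stated in the abstract. The sketch does not attempt to reproduce the reported P value, since the abstract does not say which test was used.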

2,770 citations

Journal Article•DOI•

Consensus Statement: Chromosomal Microarray Is a First-Tier Clinical Diagnostic Test for Individuals with Developmental Disabilities or Congenital Anomalies

David T. Miller¹, Margaret P Adam²,³, Swaroop Aradhya⁴, Leslie G. Biesecker⁵, Arthur R. Brothman⁶, Nigel P. Carter⁷, Deanna M. Church, John A. Crolla⁸, Evan E. Eichler², Charles J. Epstein⁹, W. Andrew Faucett³, Lars Feuk¹⁰, Jan M. Friedman¹¹, Ada Hamosh¹², Laird G. Jackson¹³, Erin B. Kaminsky³, Klaas Kok¹⁴, Ian D. Krantz¹⁵, Robert M. Kuhn¹⁶, Charles Lee¹⁷, James Ostell, Carla Rosenberg, Stephen W. Scherer¹⁸, Nancy B. Spinner¹⁵, Dimitri J. Stavropoulos, James Tepperberg¹⁹, Erik C. Thorland²⁰, Joris Vermeesch²¹, Darrel Waggoner²², Michael S. Watson²³, Christa Lese Martin³, David H. Ledbetter³•Institutions (23)

Boston Children's Hospital¹, University of Washington², Emory University³, GeneDx⁴, National Institutes of Health⁵, University of Utah⁶, Wellcome Trust Sanger Institute⁷, Salisbury University⁸, University of California, San Francisco⁹, Uppsala University¹⁰, University of British Columbia¹¹, Johns Hopkins University School of Medicine¹², Drexel University¹³, University of Groningen¹⁴, University of Pennsylvania¹⁵, University of California, Santa Cruz¹⁶, Brigham and Women's Hospital¹⁷, The Centre for Applied Genomics¹⁸, Research Triangle Park¹⁹, Mayo Clinic²⁰, Katholieke Universiteit Leuven²¹, University of Chicago²², American College of Medical Genetics²³

14 May 2010-American Journal of Human Genetics

TL;DR: Chromosomal microarray (CMA) is increasingly utilized for genetic testing of individuals with unexplained developmental delay/intellectual disability (DD/ID), autism spectrum disorders (ASD), or multiple congenital anomalies (MCA).

Abstract: Chromosomal microarray (CMA) is increasingly utilized for genetic testing of individuals with unexplained developmental delay/intellectual disability (DD/ID), autism spectrum disorders (ASD), or multiple congenital anomalies (MCA). Performing CMA and G-banded karyotyping on every patient substantially increases the total cost of genetic testing. The International Standard Cytogenomic Array (ISCA) Consortium held two international workshops and conducted a literature review of 33 studies, including 21,698 patients tested by CMA. We provide an evidence-based summary of clinical cytogenetic testing comparing CMA to G-banded karyotyping with respect to technical advantages and limitations, diagnostic yield for various types of chromosomal aberrations, and issues that affect test interpretation. CMA offers a much higher diagnostic yield (15%–20%) for genetic testing of individuals with unexplained DD/ID, ASD, or MCA than a G-banded karyotype (~3%, excluding Down syndrome and other recognizable chromosomal syndromes), primarily because of its higher sensitivity for submicroscopic deletions and duplications. Truly balanced rearrangements and low-level mosaicism are generally not detectable by arrays, but these are relatively infrequent causes of abnormal phenotypes in this population (<1%). Available evidence strongly supports the use of CMA in place of G-banded karyotyping as the first-tier cytogenetic diagnostic test for patients with DD/ID, ASD, or MCA. G-banded karyotype analysis should be reserved for patients with obvious chromosomal syndromes (e.g., Down syndrome), a family history of chromosomal rearrangement, or a history of multiple miscarriages.

2,294 citations

Journal Article•DOI•

The Diploid Genome Sequence of an Individual Human

Samuel Levy¹, Granger G. Sutton¹, Pauline C. Ng¹, Lars Feuk², Aaron L. Halpern¹, Brian P. Walenz¹, Nelson Axelrod¹, Jiaqi Huang¹, Ewen F. Kirkness¹, Gennady Denisov¹, Yuan Lin¹, Jeffrey R. MacDonald², Andy Wing Chun Pang², Mary Shago², Timothy B. Stockwell¹, Alexia Tsiamouri¹, Vineet Bafna³, Vikas Bansal³, Saul A. Kravitz¹, Dana A. Busam¹, Karen Beeson¹, Tina C McIntosh¹, Karin A. Remington¹, Josep F. Abril⁴, John Gill¹, Jon Borman¹, Yu-Hui Rogers¹, Marvin Frazier¹, Stephen W. Scherer², Robert L. Strausberg¹, J. Craig Venter¹•Institutions (4)

J. Craig Venter Institute¹, University of Toronto², University of California, San Diego³, University of Barcelona⁴

04 Sep 2007-PLOS Biology

TL;DR: A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy is used, able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.

Abstract: Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
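The event counts in this abstract can be tallied to check the "more than 4.1 million" total and the 22% non-SNP share. This is a minimal sketch under one assumption: only the five explicitly enumerated variant classes are counted, because the abstract gives no numbers for the segmental duplications and copy number variation regions. The function name `non_snp_share` is hypothetical.

```python
# Variant counts as enumerated in the abstract above. Segmental duplications
# and CNV regions are left out because the abstract gives no counts for them.
variant_counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

def non_snp_share(counts):
    """Return (total events, non-SNP events, non-SNP fraction of all events)."""
    total = sum(counts.values())
    non_snp = total - counts["SNPs"]
    return total, non_snp, non_snp / total

total, non_snp, share = non_snp_share(variant_counts)
print(f"total events: {total:,}")
print(f"non-SNP events: {non_snp:,} ({share:.1%})")
```

Under that assumption, the tally gives 4,118,889 events, of which 905,488 (about 22%) are non-SNP, consistent with the figures in the abstract. The 74% share of variant bases cannot be checked this way, because the abstract does not give per-class base counts.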

1,843 citations

References

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article•DOI•

The sequence of the human genome.

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article•DOI•

Genome sequencing in microfabricated high-density picolitre reactors

Marcel Margulies, Michael Egholm, William E. Altman, Said Attiya, Joel S. Bader, Lisa A. Bemben, Jan Berka, Michael S. Braverman, Yi-Ju Chen, Zhoutao Chen, Scott Dewell, Lei Du, J. M. Fierro, Xavier V. Gomes, Brian C. Godwin, Wen He, Scott Edward Helgesen, Chun Heen Ho, Gerard P. Irzyk, Szilveszter C. Jando, Maria L. I. Alenquer, Thomas P. Jarvie, Kshama B. Jirage, Jong-Bum Kim, James R. Knight, Janna R. Lanza, John H. Leamon, Steven Lefkowitz, Ming Lei, Jing Li, Kenton Lohman, Hong Lu, Vinod Makhijani, Keith Mcdade, Michael P. McKenna, Eugene W. Myers¹, Elizabeth Nickerson, John Nobile, Ramona Plant, Bernard P. Puc, Michael T. Ronan, George T. Roth, Gary J. Sarkis, Jan Fredrik Simons, John Simpson, Maithreyan Srinivasan, Karrie R. Tartaro, Alexander Tomasz², Kari A. Vogt, Greg A. Volkmer, Shally H. Wang, Yong Wang, Michael P. Weiner³, Pengguang Yu, Richard F. Begley, Jonathan M. Rothberg•Institutions (3)

University of California, Berkeley¹, Rockefeller University², Rothberg Institute For Childhood Diseases³

15 Sep 2005-Nature

TL;DR: A scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments with 96% coverage at 99.96% accuracy in one run of the machine is described.

Abstract: The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.

8,434 citations

Journal Article•DOI•

Real time quantitative PCR.

C A Heid¹, J Stevens¹, Kenneth J. Livak, P M Williams•Institutions (1)

Genentech¹

01 Oct 1996-Genome Research

TL;DR: Unlike other quantitative PCR methods, real-time PCR does not require post-PCR sample handling, preventing potential PCR product carry-over contamination and resulting in much faster and higher throughput assays.

Abstract: We have developed a novel "real time" quantitative PCR method. The method measures PCR product accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan Probe). This method provides very accurate and reproducible quantitation of gene copies. Unlike other quantitative PCR methods, real-time PCR does not require post-PCR sample handling, preventing potential PCR product carry-over contamination and resulting in much faster and higher throughput assays. The real-time PCR method has a very large dynamic range of starting target molecule determination (at least five orders of magnitude). Real-time quantitative PCR is extremely accurate and less labor-intensive than current quantitative PCR methods.

6,367 citations

Journal Article•DOI•

A haplotype map of the human genome

John W. Belmont¹, Andrew Boudreau, Suzanne M. Leal¹, Paul Hardenbol +229 more•Institutions (40)

27 Oct 2005-Nature

TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.

Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

5,479 citations