Papers by Richard K. Wilson published in 2015


Journal Article
Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more (90 institutions)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations


01 Oct 2015
TL;DR: The 1000 Genomes Project, now complete, provided a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, reconstructing the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing and dense microarray genotyping.

3,247 citations


Journal Article
TL;DR: Germline mutations in cancer-predisposing genes were identified in 8.5% of the children and adolescents with cancer, and family history did not predict the presence of an underlying predisposition syndrome in most patients.
Abstract: Background: The prevalence and spectrum of predisposing mutations among children and adolescents with cancer are largely unknown. Knowledge of such mutations may improve the understanding of tumorigenesis, direct patient care, and enable genetic counseling of patients and families. Methods: In 1120 patients younger than 20 years of age, we sequenced the whole genomes (in 595 patients), whole exomes (in 456), or both (in 69). We analyzed the DNA sequences of 565 genes, including 60 that have been associated with autosomal dominant cancer-predisposition syndromes, for the presence of germline mutations. The pathogenicity of the mutations was determined by a panel of medical experts with the use of cancer-specific and locus-specific genetic databases, the medical literature, computational predictions, and second hits identified in the tumor genome. The same approach was used to analyze data from 966 persons who did not have known cancer in the 1000 Genomes Project, and a similar approach was used to analyze data...

886 citations


Journal Article
26 Feb 2015-Nature
TL;DR: The total number of somatic single-nucleotide variants and the percentage of chemotherapy-related transversions are similar in t-AML and de novo AML, indicating that previous chemotherapy does not induce genome-wide DNA damage and suggesting a model in which rare HSPCs carrying age-related TP53 mutations are resistant to chemotherapy and expand preferentially after treatment.
Abstract: Therapy-related acute myeloid leukaemia (t-AML) and therapy-related myelodysplastic syndrome (t-MDS) are well-recognized complications of cytotoxic chemotherapy and/or radiotherapy. There are several features that distinguish t-AML from de novo AML, including a higher incidence of TP53 mutations, abnormalities of chromosomes 5 or 7, complex cytogenetics and a reduced response to chemotherapy. However, it is not clear how prior exposure to cytotoxic therapy influences leukaemogenesis. In particular, the mechanism by which TP53 mutations are selectively enriched in t-AML/t-MDS is unknown. Here, by sequencing the genomes of 22 patients with t-AML, we show that the total number of somatic single-nucleotide variants and the percentage of chemotherapy-related transversions are similar in t-AML and de novo AML, indicating that previous chemotherapy does not induce genome-wide DNA damage. We identified four cases of t-AML/t-MDS in which the exact TP53 mutation found at diagnosis was also present at low frequencies (0.003-0.7%) in mobilized blood leukocytes or bone marrow 3-6 years before the development of t-AML/t-MDS, including two cases in which the relevant TP53 mutation was detected before any chemotherapy. Moreover, functional TP53 mutations were identified in small populations of peripheral blood cells of healthy chemotherapy-naive elderly individuals. Finally, in mouse bone marrow chimaeras containing both wild-type and Tp53(+/-) haematopoietic stem/progenitor cells (HSPCs), the Tp53(+/-) HSPCs preferentially expanded after exposure to chemotherapy. These data suggest that cytotoxic therapy does not directly induce TP53 mutations. Rather, they support a model in which rare HSPCs carrying age-related TP53 mutations are resistant to chemotherapy and expand preferentially after treatment. 
The early acquisition of TP53 mutations in the founding HSPC clone probably contributes to the frequent cytogenetic abnormalities and poor responses to chemotherapy that are typical of patients with t-AML/t-MDS.

604 citations


Journal Article
TL;DR: Infant acute lymphoblastic leukemia (ALL) with MLL rearrangements (MLL-R) represents a distinct leukemia with a poor prognosis and has one of the lowest frequencies of somatic mutations of any sequenced cancer, with the predominant leukemic clone carrying a mean of 1.3 non-silent mutations.
Abstract: Infant acute lymphoblastic leukemia (ALL) with MLL rearrangements (MLL-R) represents a distinct leukemia with a poor prognosis. To define its mutational landscape, we performed whole-genome, exome, RNA and targeted DNA sequencing on 65 infants (47 MLL-R and 18 non-MLL-R cases) and 20 older children (MLL-R cases) with leukemia. Our data show that infant MLL-R ALL has one of the lowest frequencies of somatic mutations of any sequenced cancer, with the predominant leukemic clone carrying a mean of 1.3 non-silent mutations. Despite this paucity of mutations, we detected activating mutations in kinase-PI3K-RAS signaling pathway components in 47% of cases. Surprisingly, these mutations were often subclonal and were frequently lost at relapse. In contrast to infant cases, MLL-R leukemia in older children had more somatic mutations (mean of 6.5 mutations/case versus 1.3 mutations/case, P = 7.15 × 10⁻⁵) and had frequent mutations (45%) in epigenetic regulators, a category of genes that, with the exception of MLL, was rarely mutated in infant MLL-R ALL.

376 citations


Journal Article
TL;DR: Recent technological advances that improve both contiguity and accuracy are summarized and the importance of complete de novo assembly as opposed to read mapping is emphasized as the primary means to understanding the full range of human genetic variation.
Abstract: The discovery of genetic variation and the assembly of genome sequences are both inextricably linked to advances in DNA-sequencing technology. Short-read massively parallel sequencing has revolutionized our ability to discover genetic variation but is insufficient to generate high-quality genome assemblies or resolve most structural variation. Full resolution of variation is only guaranteed by complete de novo assembly of a genome. Here, we review approaches to genome assembly, the nature of gaps or missing sequences, and biases in the assembly process. We describe the challenges of generating a complete de novo genome assembly using current technologies and the impact that being able to perfectly sequence the genome would have on understanding human disease and evolution. Finally, we summarize recent technological advances that improve both contiguity and accuracy and emphasize the importance of complete de novo assembly as opposed to read mapping as the primary means to understanding the full range of human genetic variation.

347 citations


Journal Article
Rafael D. Mesquita, Raquel J. Vionette-Amaral, Carl Lowenberger, Rolando Rivera-Pomar, Fernando A. Monteiro, Patrick Minx, John Spieth, A. Bernardo Carvalho, Francisco Panzera, Daniel Lawson, André Q. Torres, José M. C. Ribeiro, Marcos Henrique Ferreira Sorgine, Robert M. Waterhouse, Michael J. Montague, Fernando Abad-Franch, Michele Alves-Bezerra, Laurence Rodrigues do Amaral, Helena Araujo, Ricardo Nascimento Araújo, L. Aravind, Georgia C. Atella, Patrícia Azambuja, Mateus Berni, Paula R. Bittencourt-Cunha, Glória R.C. Braz, Gustavo M. Calderón-Fernández, Claudia M. A. Carareto, Mikkel B. Christensen, Igor Rodrigues da Costa, Samara G. da Costa, Marilvia Dansa, Carlos R. O. Daumas-Filho, Iron F. De-Paula, Felipe A. Dias, George Dimopoulos, Scott J. Emrich, Natalia Esponda-Behrens, Patrícia Fampa, Rita D. Fernandez-Medina, Rodrigo Nunes da Fonseca, Marcio Fontenele, Catrina Fronick, Lucinda Fulton, Ana Caroline P. Gandara, Eloi S. Garcia, Fernando A. Genta, Gloria I. Giraldo-Calderón, Bruno Gomes, Katia C. Gondim, Adriana Granzotto, Alessandra A. Guarneri, Roderic Guigó, Myriam Harry, Daniel S.T. Hughes, Willy Jablonka, Emmanuelle Jacquin-Joly, M. Patricia Juárez, Leonardo Koerich, Angela B. Lange, Jose Manuel Latorre-Estivalis, Andrés Lavore, Gena G. Lawrence, Cristiano Lazoski, Claudio R. Lazzari, Raphael R.S. Lopes, Marcelo G. Lorenzo, Magda D. Lugon, David Majerowicz, Paula L. Marcet, Marco Mariotti, Hatisaburo Masuda, Karyn Megy, Ana C.A. Melo, Fanis Missirlis, Theo Mota, Fernando G. Noriega, Marcela Nouzova, Rodrigo Dutra Nunes, Raquel L.L. Oliveira, Gilbert Oliveira-Silveira, Sheila Ons, Ian Orchard, Lucia Pagola, Gabriela O. Paiva-Silva, Agustina Pascual, Márcio G. Pavan, Nicolás Pedrini, Alexandre A. Peixoto, Marcos H. Pereira, Andrew Pike, Carla Polycarpo, Francisco Prosdocimi, Rodrigo Ribeiro-Rodrigues, Hugh M. Robertson, Ana Paula Salerno, Didier Salmon, Didac Santesmasses, Renata Schama, Eloy S. Seabra-Junior, Lívia Silva-Cardoso, Mário A.C. Silva-Neto, Matheus Souza-Gomes, Marcos Sterkel, Mabel L. Taracena, Marta Tojo, Zhijian Jake Tu, Jose M. C. Tubio, Raul Ursic-Bedoya, Thiago M. Venancio, Ana Beatriz Walter-Nuno, Derek Wilson, Wesley C. Warren, Richard K. Wilson, Erwin Huebner, Ellen M. Dotson, Pedro L. Oliveira
TL;DR: The first genome sequence of a nondipteran insect vector of an important human parasitic disease is described, which provides critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.
Abstract: Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼ 702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.

293 citations


Journal Article
25 Aug 2015-JAMA
TL;DR: The detection of persistent leukemia-associated mutations in at least 5% of bone marrow cells in day 30 remission samples was associated with a significantly increased risk of relapse, and reduced overall survival.
Abstract: Importance: Tests that predict outcomes for patients with acute myeloid leukemia (AML) are imprecise, especially for those with intermediate risk AML. Objectives: To determine whether genomic approaches can provide novel prognostic information for adult patients with de novo AML. Design, Setting, and Participants: Whole-genome or exome sequencing was performed on samples obtained at disease presentation from 71 patients with AML (mean age, 50.8 years) treated with standard induction chemotherapy at a single site starting in March 2002, with follow-up through January 2015. In addition, deep digital sequencing was performed on paired diagnosis and remission samples from 50 patients (including 32 with intermediate-risk AML), approximately 30 days after successful induction therapy. Twenty-five of the 50 were from the cohort of 71 patients, and 25 were new, additional cases. Exposures: Whole-genome or exome sequencing and targeted deep sequencing. Risk of identification based on genetic data. Main Outcomes and Measures: Mutation patterns (including clearance of leukemia-associated variants after chemotherapy) and their association with event-free survival and overall survival. Results: Analysis of comprehensive genomic data from the 71 patients did not improve outcome assessment over current standard-of-care metrics. In an analysis of 50 patients with both presentation and documented remission samples, 24 (48%) had persistent leukemia-associated mutations in at least 5% of bone marrow cells at remission. The 24 with persistent mutations had significantly reduced event-free survival vs the 26 who cleared all mutations (median [95% CI]: 6.0 months [95% CI, 3.7-9.6] for persistent mutations vs 17.9 months [95% CI, 11.3-40.4] for cleared mutations, log-rank P […] P = .003; HR, 2.86 [95% CI, 1.39-5.88], P = .004).
Among the 32 patients with intermediate cytogenetic risk, the 14 patients with persistent mutations had reduced event-free survival compared with the 18 patients who cleared all mutations (median [95% CI]: 8.8 months [95% CI, 3.7-14.6] for persistent mutations vs 25.6 months [95% CI, 11.4-not estimable] for cleared mutations, log-rank P = .003; HR, 3.32 [95% CI, 1.44-7.67], P = .005) and reduced overall survival (median [95% CI]: 19.3 months [95% CI, 7.5-42.3] for persistent mutations vs 46.8 months [95% CI, 22.6-not estimable] for cleared mutations, log-rank P = .02; HR, 2.88 [95% CI, 1.11-7.45], P = .03). Conclusions and Relevance: The detection of persistent leukemia-associated mutations in at least 5% of bone marrow cells in day 30 remission samples was associated with a significantly increased risk of relapse, and reduced overall survival. These data suggest that this genomic approach may improve risk stratification for patients with AML.

292 citations


Journal Article
TL;DR: A homology-directed repair assay of 68 rare BRCA1 missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance, and the scale of the analysis enables the detection of rare variants that may affect individual susceptibility to tumour development.
Abstract: Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cases from The Cancer Genome Atlas representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer. Burden testing identifies 13 cancer genes with significant enrichment of rare truncations, some associated with specific cancers (for example, RAD51C, PALB2 and MSH6 in AML, stomach and endometrial cancers, respectively). Significant, tumour-specific loss of heterozygosity occurs in nine genes (ATM, BAP1, BRCA1/2, BRIP1, FANCM, PALB2 and RAD51C/D). Moreover, our homology-directed repair assay of 68 BRCA1 rare missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance. The scale of this analysis and the somatic-germline integration enable the detection of rare variants that may affect individual susceptibility to tumour development, a critical step toward precision medicine.

236 citations


Journal Article
TL;DR: This dataset, generated to reassess optimal sequencing strategies and representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159).
Abstract: Tumors are typically sequenced to depths of 75-100× (exome) or 30-50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159).

169 citations


Journal Article
TL;DR: This targeted sequencing study provides strong functional evidence implicating several specific variants as primary contributory risk alleles for nonsyndromic clefting in humans.
Abstract: Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity. We furthermore identified significant clusters of non-coding rare variants near NTN1 and NOG and found several rare coding variants likely to affect protein function, including four nonsense variants in ARHGAP29. We confirmed 48 de novo mutations and, based on best biological evidence available, chose two of these for functional assays. One mutation in PAX7 disrupted the DNA binding of the encoded transcription factor in an in vitro assay. The second, a non-coding mutation, disrupted the activity of a neural crest enhancer downstream of FGFR2 both in vitro and in vivo. This targeted sequencing study provides strong functional evidence implicating several specific variants as primary contributory risk alleles for nonsyndromic clefting in humans.

Journal Article
TL;DR: The nature, timing and potential prognostic significance of key genetic alterations in pediatric ACT are demonstrated and a hypothetical model of pediatric adrenocortical tumorigenesis is outlined.
Abstract: Paediatric adrenocortical carcinoma is a rare malignancy with poor prognosis. Here we analyse 37 adrenocortical tumours (ACTs) by whole-genome, whole-exome and/or transcriptome sequencing. Most cases (91%) show loss of heterozygosity (LOH) of chromosome 11p, with uniform selection against the maternal chromosome. IGF2 on chromosome 11p is overexpressed in 100% of the tumours. TP53 mutations and chromosome 17 LOH with selection against wild-type TP53 are observed in 28 ACTs (76%). Chromosomes 11p and 17 undergo copy-neutral LOH early during tumorigenesis, suggesting tumour-driver events. Additional genetic alterations include recurrent somatic mutations in ATRX and CTNNB1 and integration of human herpesvirus-6 in chromosome 11p. A dismal outcome is predicted by concomitant TP53 and ATRX mutations and associated genomic abnormalities, including massive structural variations and frequent background mutations. Collectively, these findings demonstrate the nature, timing and potential prognostic significance of key genetic alterations in paediatric ACT and outline a hypothetical model of paediatric adrenocortical tumorigenesis. Pediatric adrenocortical carcinoma is a rare malignancy with poor prognosis. Here the authors analyse the genomes, exomes and transcriptomes of 37 such tumours and identify genetic alterations whose nature, timing and potential interactions are key events with prognostic significance in pediatric adrenocortical tumorigenesis.

Journal Article
TL;DR: It is concluded that pediatric CM has a very similar UV-induced mutational spectrum to that found in the adult counterpart, emphasizing the need to promote sun protection practices in early life and to improve young patients' access to therapeutic agents being explored in adults.

Journal Article
TL;DR: In the C. a.
Abstract: We describe a genome reference of the African green monkey or vervet (Chlorocebus aethiops). This member of the Old World monkey (OWM) superfamily is uniquely valuable for genetic investigations of simian immunodeficiency virus (SIV), for which it is the most abundant natural host species, and of a wide range of health-related phenotypes assessed in Caribbean vervets (C. a. sabaeus), whose numbers have expanded dramatically since Europeans introduced small numbers of their ancestors from West Africa during the colonial era. We use the reference to characterize the genomic relationship between vervets and other primates, the intra-generic phylogeny of vervet subspecies, and genome-wide structural variations of a pedigreed C. a. sabaeus population. Through comparative analyses with human and rhesus macaque, we characterize at high resolution the unique chromosomal fission events that differentiate the vervets and their close relatives from most other catarrhine primates, in whom karyotype is highly conserved. We also provide a summary of transposable elements and contrast these with the rhesus macaque and human. Analysis of sequenced genomes representing each of the main vervet subspecies supports previously hypothesized relationships between these populations, which range across most of sub-Saharan Africa, while uncovering high levels of genetic diversity within each. Sequence-based analyses of major histocompatibility complex (MHC) polymorphisms reveal extremely low diversity in Caribbean C. a. sabaeus vervets, compared to vervets from putatively ancestral West African regions. In the C. a. sabaeus research population, we discover the first structural variations that are, in some cases, predicted to have a deleterious effect; future studies will determine the phenotypic impact of these variations.

Journal Article
20 Jan 2015-Leukemia
TL;DR: It is suggested that HOX expression in most AML samples represents a normal stem cell program that is controlled by epigenetic mechanisms at specific regulatory elements.
Abstract: HOX genes are highly expressed in many acute myeloid leukemia (AML) samples, but the patterns of expression and associated regulatory mechanisms are not clearly understood. We analyzed RNA sequencing data from 179 primary AML samples and normal hematopoietic cells to understand the range of expression patterns in normal versus leukemic cells. HOX expression in AML was restricted to specific genes in the HOXA or HOXB loci, and was highly correlated with recurrent cytogenetic abnormalities. However, the majority of samples expressed a canonical set of HOXA and HOXB genes that was nearly identical to the expression signature of normal hematopoietic stem/progenitor cells. Transcriptional profiles at the HOX loci were similar between normal cells and AML samples, and involved bidirectional transcription at the center of each gene cluster. Epigenetic analysis of a subset of AML samples also identified common regions of chromatin accessibility in AML samples and normal CD34(+) cells that displayed differences in methylation depending on HOX expression patterns. These data provide an integrated epigenetic view of the HOX gene loci in primary AML samples, and suggest that HOX expression in most AML samples represents a normal stem cell program that is controlled by epigenetic mechanisms at specific regulatory elements.

Journal Article
TL;DR: The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system.
Abstract: In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.

Journal Article
TL;DR: The first genome-wide high-resolution polymorphism resource for non-human primate association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey, is reported.
Abstract: We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available. We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices. The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.

Journal Article
01 Apr 2015-Leukemia
TL;DR: Genomic analyses of a patient with primary myelofibrosis transformed to secondary acute myeloid leukemia (sAML) illustrate the complex clonal dynamics associated with disease evolution in MPNs and sAML.
Abstract: Clonal architecture in myeloproliferative neoplasms (MPNs) is poorly understood. Here we report genomic analyses of a patient with primary myelofibrosis (PMF) transformed to secondary acute myeloid leukemia (sAML). Whole genome sequencing (WGS) was performed on PMF and sAML diagnosis samples, with skin included as a germline surrogate. Deep sequencing validation was performed on the WGS samples and an additional sample obtained during sAML remission/relapsed PMF. Clustering analysis of 649 validated somatic single-nucleotide variants revealed four distinct clonal groups, each including putative driver mutations. The first group (including JAK2 and U2AF1), representing the founding clone, included mutations with high frequency at all three disease stages. The second clonal group (including MYB) was present only in PMF, suggesting the presence of a clone that was dispensable for transformation. The third group (including ASXL1) contained mutations with low frequency in PMF and high frequency in subsequent samples, indicating evolution of the dominant clone with disease progression. The fourth clonal group (including IDH1 and RUNX1) was acquired at sAML transformation and was predominantly absent at sAML remission/relapsed PMF. Taken together, these findings illustrate the complex clonal dynamics associated with disease evolution in MPNs and sAML.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method to identify novel coding variants with large effect sizes, as well as novel genes (TREM2, PLD3, UNC5C, and AKAP9) associated with Alzheimer's disease.

01 Jan 2015
TL;DR: Clinical exome sequencing is highly unlikely to be a useful diagnostic test in patients with true ROHHAD, and it remains imperative to expand the search for non-exomic genetic risk factors, as well as to investigate other possible mechanisms of disease.

Journal ArticleDOI
03 Dec 2015-Blood
TL;DR: Digital sequencing of serial bone marrow samples revealed that responding patients can have persistent, measurable clonal hematopoiesis for at least one year without disease progression, and that tumor burden can be measured even in patients who achieve a complete remission (CR).

Journal ArticleDOI
TL;DR: The draft genome and developmental transcriptome of O. dentatum are reported as advances in biology and animal biotechnology, intended to support biological research on this and related parasitic nematodes as well as the search for new and improved interventions.

Journal ArticleDOI
23 Mar 2015-PLOS ONE
TL;DR: Exome sequencing of mouse iPSC clones derived from skin fibroblasts revealed a wide range of genetic heterogeneity among clones; the study also tested whether some cells are more genetically “fit” for reprogramming, and the results suggest that most of the changes are random and functionally benign.
Abstract: Induced pluripotent stem cells (iPSCs) have tremendous potential as a tool for disease modeling, drug testing, and other applications. Since the generation of iPSCs “captures” the genetic history of the individual cell that was reprogrammed, iPSC clones (even those derived from the same individual) would be expected to demonstrate genetic heterogeneity. To assess the degree of genetic heterogeneity, and to determine whether some cells are more genetically “fit” for reprogramming, we performed exome sequencing on 24 mouse iPSC clones derived from skin fibroblasts obtained from two different sites of the same 8-week-old C57BL/6J male mouse. While no differences in the coding regions were detected in the two parental fibroblast pools, each clone had a unique genetic signature with a wide range of heterogeneity observed among the individual clones: a total of 383 iPSC variants were validated for the 24 clones (mean 16.0/clone, range 0–45). Since these variants were all present in the vast majority of the cells in each clone (variant allele frequencies of 40–60% for heterozygous variants), they most likely preexisted in the individual cells that were reprogrammed, rather than being acquired during reprogramming or cell passaging. We then tested whether this genetic heterogeneity had functional consequences for hematopoietic development by generating hematopoietic progenitors in vitro and enumerating colony forming units (CFUs). While there was a range of hematopoietic potentials among the 24 clones, only one clone failed to differentiate into hematopoietic cells; however, it was able to form a teratoma, proving its pluripotent nature. Further, no specific association was found between the mutational spectrum and the hematopoietic potential of each iPSC clone. These data clearly highlight the genetic heterogeneity present within individual fibroblasts that is captured by iPSC generation, and suggest that most of the changes are random, and functionally benign.

Journal ArticleDOI
29 Dec 2015-PLOS ONE
TL;DR: It is hypothesized that the developing melanoma actively suppresses the immune system responses of the body in reacting to the invasive malignancy, and that this mal-adaptive response contributes to disease progression, a result that suggests the whole-body transcriptomic approach merits further use.
Abstract: The incidence of malignant melanoma continues to increase each year with poor prognosis for survival in many relapse cases. To reverse this trend, whole body response measures are needed to discover collaborative paths to primary and secondary malignancy. Several species of fish provide excellent melanoma models because fish and human melanocytes both appear in the epidermis, and fish and human pigment cell tumors share conserved gene expression signatures. For the first time, we have examined the whole body transcriptome response to invasive melanoma as a prelude to using transcriptome profiling to screen for drugs in a medaka (Oryzias latipes) model. We generated RNA-seq data from whole body RNA isolates for controls and melanoma fish. After testing for differential expression, 396 genes had significantly different expression (adjusted p-value <0.02) in the whole body transcriptome between melanoma and control fish; 379 of these genes were matched to human orthologs with 233 having annotated human gene symbols and 14 matched genes that contain putative deleterious variants in human melanoma at varying levels of recurrence. A detailed canonical pathway evaluation for significant enrichment showed the top scoring pathway to be antigen presentation but also included the expected melanocyte development and pigmentation signaling pathway. Results revealed a profound down-regulation of genes involved in the immune response, especially the innate immune system. We hypothesize that the developing melanoma actively suppresses the immune system responses of the body in reacting to the invasive malignancy, and that this mal-adaptive response contributes to disease progression, a result that suggests our whole-body transcriptomic approach merits further use. In these findings, we also observed novel genes not yet identified in human melanoma expression studies and uncovered known and new candidate drug targets for further testing in this malignant melanoma medaka model.

Journal ArticleDOI
03 Dec 2015-Blood
TL;DR: A number of genes highly recurrently mutated in follicular lymphoma (FL) were confirmed, including chromatin-modifying genes encoding histone methyltransferases and histone acetyltransferases.

Journal ArticleDOI
03 Dec 2015-Blood
TL;DR: In this article, the authors performed integrated genomic and epigenetic analyses, biochemical studies, and leukemogenesis assays to define the genetic basis of B-progenitor ALL, and identified ERG ALL by unsupervised clustering and predictive analysis of microarrays.

Journal ArticleDOI
03 Dec 2015-Blood
TL;DR: To determine how AML subclonal architecture changes during decitabine treatment, and whether specific mutations might correlate with sensitivity vs. resistance to decitabine, exome sequencing was performed at multiple time points during single-agent decitabine therapy.

Journal ArticleDOI
TL;DR: The presence of the 299Gly allele is clearly associated with neuroprotection, and targeting the TLR4 signaling pathway could be considered for pharmacological intervention to prevent Alzheimer's disease.
Abstract: the Assessment of Neuropsychological Status (RBANS) was also evaluated for the PREVENT-AD cohort. Results: In the QFP, the 299Gly allele was present in 6.3% of LOAD cases and 10.7% of age-matched control subjects (p 0.005). Healthy subjects enrolled in the PREVENT-AD study (with a first-degree family history of AD) were positive for the 299Gly allele in a proportion of 12.5%. In ADNI, the 299Gly allele was detected in 9.5% of AD cases, 9.9% of MCI patients and 11.2% of control subjects (baseline diagnostic, p 0.05). Although not associated with the disease in the latter population, the 299Gly allele significantly correlated with higher CSF APOE levels measured in MCI patients. In unaffected subjects from the PREVENT-AD cohort, the 299Gly allele was significantly associated with enhanced cortical thickness and a higher RBANS visuospatial score. Conclusions: The presence of the 299Gly allele is clearly associated with neuroprotection. Targeting TLR4 signaling pathway could be considered for pharmacological intervention to prevent Alzheimer’s disease.