scispace - formally typeset

Showing papers by Richard K. Wilson published in 2012


Journal Article
04 Oct 2012-Nature
TL;DR: The ability to integrate information across platforms provided key insights into previously defined gene expression subtypes and demonstrated the existence of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular heterogeneity.
Abstract: We analysed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our ability to integrate information across platforms provided key insights into previously defined gene expression subtypes and demonstrated the existence of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at >10% incidence across all breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment of specific mutations in GATA3, PIK3CA and MAP3K1 with the luminal A subtype. We identified two novel protein-expression-defined subgroups, possibly produced by stromal/microenvironmental elements, and integrated analyses identified specific signalling pathways dominant in each molecular subtype including a HER2/phosphorylated HER2/EGFR/phosphorylated EGFR signature within the HER2-enriched expression subtype. Comparison of basal-like breast tumours with high-grade serous ovarian tumours showed many molecular commonalities, indicating a related aetiology and similar therapeutic opportunities. The biological finding of the four main breast cancer subtypes caused by different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable plasticity and heterogeneity occurs within, and not across, these major biological subtypes of breast cancer.

9,355 citations


Journal Article
Curtis Huttenhower, Dirk Gevers, Rob Knight and 250 more (42 institutions)
14 Jun 2012-Nature
TL;DR: The Human Microbiome Project Consortium reports the first results of its analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome.
Abstract: The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome.

8,410 citations


Journal Article
Donna M. Muzny, Matthew N. Bainbridge, Kyle Chang, Huyen Dinh and 317 more (24 institutions)
19 Jul 2012-Nature
TL;DR: Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
Abstract: To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.

6,883 citations


Journal Article
TL;DR: The Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far, finding the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.
Abstract: Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome.

6,350 citations


Journal Article
TL;DR: An analysis tool for the detection of somatic mutations and copy number alterations in exome data from tumor-normal pairs is presented and new light is shed on the landscape of genetic alterations in ovarian cancer.
Abstract: Exome sequencing of tumor samples and matched normal controls has the potential to rapidly identify protein-altering mutations across hundreds of patients, potentially enabling the discovery of recurrent events driving tumor development and growth (International Cancer Genome Consortium 2010; Stratton 2011). Yet the analysis of such data presents significant challenges. Sequencing coverage is nonuniform across targeted regions and from one sample to the next (Ng et al. 2009; Bainbridge et al. 2010; Teer et al. 2010). Many regions achieve high read depth (more than 100×), which can confound variant callers and depth-based filters if not properly addressed (Ku et al. 2011). Repetitive and paralogous sequences can give rise to numerous false positives. The detection of somatic mutations in tumor genomes is even more challenging. The genomes of primary tumors are genetically heterogeneous (Ding et al. 2010), with frequent rearrangements (Campbell et al. 2008) and copy number alterations (CNAs) (Beroukhim et al. 2010). Further, somatic mutations are relatively rare compared with germline variation, often representing <0.1% of variants in a tumor genome (Ley et al. 2008; Mardis et al. 2009). Simply subtracting variants in the matched normal from variants in the tumor (Wei et al. 2011) is poorly suited for the analysis of exome sequence data, because it fails to account for regions that were undersampled in the normal. Accurate mutation detection requires a direct, simultaneous comparison of tumor–normal pairs at every position in the exome, but few algorithms to do so have been described. Numerous algorithms have been developed to assess genome-wide copy number using whole-genome sequencing (WGS) data. Most of these approaches (Campbell et al. 2008; Alkan et al. 2009; Chiang et al. 2009; Yoon et al. 2009; Abyzov et al. 2011) would be confounded by exome data sets, because of the biases introduced by hybridization and the sparse and uneven coverages throughout the genome. However, when both DNA samples in a tumor–normal pair were captured and sequenced under identical hybridization conditions, we reasoned that it might be possible to detect somatic CNAs (SCNAs) as deviations from the log-ratio of sequence coverage depth within a tumor–normal pair, and then quantify the deviations statistically. Such an approach would provide a gene-centric view of copy number in a tumor sample, though it would be limited to the ∼1% of the genome captured by current exome platforms. Previously, we published VarScan (Koboldt et al. 2009), an algorithm for variant detection in next-generation sequencing data. We have since released a new tool, VarScan 2 (http://varscan.sourceforge.net), with several improvements, including the ability to identify somatic mutation, loss of heterozygosity (LOH), and CNA events in tumor–normal pairs. VarScan 2 analyzes sequence data from a tumor sample and its corresponding normal sample simultaneously, applying heuristic methods and a statistical test to detect variants, namely single nucleotide variants (SNVs) and insertions/deletions (indels), and classify them by somatic status. By direct comparison of normalized sequence depth, our method also detects SCNAs in the tumor genome. Here, we utilize VarScan 2 for the analysis of exome sequence data from 151 patients with high-grade serous ovarian adenocarcinoma (HGS-OVCa) that were initially characterized within the Cancer Genome Atlas (TCGA) project (Cancer Genome Atlas Research Network 2011). We present a robust pipeline for the detection of both germline (inherited) and somatic (acquired) mutations by exome sequencing and describe filtering approaches for detecting variants with high sensitivity and specificity. To evaluate the performance of our SCNA detection algorithm, we compare our results to copy number data from high-density SNP array and WGS approaches. Our results demonstrate the accuracy of VarScan 2 for somatic mutation and CNA detection and enable a new survey of the genetic landscape in ovarian carcinoma.
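The abstract contrasts naive subtraction of normal variants from tumor variants with a direct, simultaneous tumor–normal comparison, but does not name its "statistical test". The minimal Python sketch below assumes a two-sided Fisher's exact test on reference/variant read counts; `classify_site`, its thresholds and the read counts are hypothetical, and VarScan 2's real heuristics are not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are tumor and normal; columns are reference and variant read counts.
    The p-value sums the probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the tumor-reference cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance so ties survive rounding
            total += p
    return min(1.0, total)


def naive_subtraction(tumor_var: int, normal_var: int) -> bool:
    """'Somatic' whenever the normal shows no variant reads, however shallow it is."""
    return tumor_var > 0 and normal_var == 0


def classify_site(tumor_ref, tumor_var, normal_ref, normal_var,
                  alpha=0.05, germline_vaf=0.2):
    """Hypothetical three-way call; the thresholds are illustrative only."""
    normal_depth = normal_ref + normal_var
    if normal_depth and normal_var / normal_depth >= germline_vaf:
        return "germline"    # variant clearly present in the normal
    p = fisher_exact_two_sided(tumor_ref, tumor_var, normal_ref, normal_var)
    if tumor_var > 0 and p < alpha:
        return "somatic"     # tumor and normal allele counts differ significantly
    return "unresolved"      # e.g. normal too shallow to rule out a germline variant


# Same tumor evidence (20 reference / 20 variant reads), normals of different depth.
print(naive_subtraction(20, 0), classify_site(20, 20, 2, 0))    # undersampled normal
print(naive_subtraction(20, 0), classify_site(20, 20, 30, 0))   # well-covered normal
```

Subtraction calls both sites somatic; the paired test leaves the site with only two normal reads unresolved (p ≈ 0.49) and calls the well-covered one somatic, which is the undersampling problem the abstract describes.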

4,096 citations
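The SCNA idea in the abstract is to compare normalized coverage depth between a tumor and a normal captured under identical hybridization conditions and look for deviations in the log-ratio. A minimal Python sketch of that log-ratio step follows. The library-size normalization, median centering (which assumes most regions are copy-neutral), pseudocount and ±0.5 threshold are assumptions for illustration, and the statistical quantification step the abstract mentions is omitted.

```python
import math
import statistics


def centered_log2_ratios(tumor_depth, normal_depth, pseudo=0.5):
    """Per-region log2(tumor/normal) depth ratio, library-size normalized and median-centered.

    tumor_depth, normal_depth: read depth per captured region, in the same order.
    pseudo: small pseudocount so a region with zero reads does not give log(0).
    """
    t_total = sum(tumor_depth)
    n_total = sum(normal_depth)
    raw = [math.log2(((t + pseudo) / t_total) / ((n + pseudo) / n_total))
           for t, n in zip(tumor_depth, normal_depth)]
    # Assume most regions are copy-neutral, so the median ratio marks "no change".
    center = statistics.median(raw)
    return [r - center for r in raw]


def call_regions(ratios, threshold=0.5):
    """Label each region by whether its ratio deviates beyond a fixed threshold."""
    return ["gain" if r > threshold else "loss" if r < -threshold else "neutral"
            for r in ratios]


tumor = [100, 100, 200, 50]     # hypothetical per-exon depths
normal = [100, 100, 100, 100]
print(call_regions(centered_log2_ratios(tumor, normal)))
```

Because both samples pass through the same capture, per-region capture biases largely cancel in the ratio, which is why the abstract restricts the approach to tumor–normal pairs sequenced under identical conditions.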


Journal Article
Barbara A. Methé, Karen E. Nelson, Mihai Pop, Heather Huot Creasy and 250 more (42 institutions)
14 Jun 2012-Nature
TL;DR: The Human Microbiome Project (HMP) Consortium has established a population-scale framework that catalyzed significant development of metagenomic protocols, resulting in a broad range of quality-controlled resources and data, including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data, available to the scientific community.
Abstract: A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH funded Human Microbiome Project (HMP) Consortium has established a population-scale framework which catalyzed significant development of metagenomic protocols resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which to date, have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies.

2,172 citations


Journal Article
26 Jan 2012-Nature
TL;DR: Sequencing of primary tumour and relapse genomes from eight AML patients, with deep-sequencing validation of hundreds of somatic mutations, demonstrated that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped, in part, by the chemotherapy that the patients receive to establish and maintain remissions.
Abstract: Most patients with acute myeloid leukaemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level. To determine the mutational spectrum associated with relapse, we sequenced the primary tumour and relapse genomes from eight AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to define clonality and clonal evolution patterns precisely at relapse. In addition to discovering novel, recurrently mutated genes (for example, WAC, SMC3, DIS3, DDX41 and DAXX) in AML, we also found two major clonal evolution patterns during AML relapse: (1) the founding clone in the primary tumour gained mutations and evolved into the relapse clone, or (2) a subclone of the founding clone survived initial therapy, gained additional mutations and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific versus primary tumour mutations in all eight cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped, in part, by the chemotherapy that the patients receive to establish and maintain remissions.

1,925 citations


Journal Article
12 Jan 2012-Nature
TL;DR: The mutational spectrum is similar to that of myeloid tumours, and the global transcriptional profile of ETP ALL was similar to that of normal and myeloid leukaemia haematopoietic stem cells, suggesting that the addition of myeloid-directed therapies might improve the poor outcome of ETP ALL.
Abstract: Early T-cell precursor acute lymphoblastic leukaemia (ETP ALL) is an aggressive malignancy of unknown genetic basis. We performed whole-genome sequencing of 12 ETP ALL cases and assessed the frequency of the identified somatic mutations in 94 T-cell acute lymphoblastic leukaemia cases. ETP ALL was characterized by activating mutations in genes regulating cytokine receptor and RAS signalling (67% of cases; NRAS, KRAS, FLT3, IL7R, JAK3, JAK1, SH2B3 and BRAF), inactivating lesions disrupting haematopoietic development (58%; GATA3, ETV6, RUNX1, IKZF1 and EP300) and histone-modifying genes (48%; EZH2, EED, SUZ12, SETD2 and EP300). We also identified new targets of recurrent mutation including DNM2, ECT2L and RELN. The mutational spectrum is similar to myeloid tumours, and moreover, the global transcriptional profile of ETP ALL was similar to that of normal and myeloid leukaemia haematopoietic stem cells. These findings suggest that addition of myeloid-directed therapies might improve the poor outcome of ETP ALL.

1,425 citations



Journal Article
TL;DR: To identify somatic mutations in pediatric diffuse intrinsic pontine glioma (DIPG), whole-genome sequencing of DNA from seven DIPGs and matched germline tissue and targeted sequencing of an additional 43 DIPG and 36 non-brainstem pediatric glioblastomas (non-BS-PGs) were performed.
Abstract: To identify somatic mutations in pediatric diffuse intrinsic pontine glioma (DIPG), we performed whole-genome sequencing of DNA from seven DIPGs and matched germline tissue and targeted sequencing of an additional 43 DIPGs and 36 non-brainstem pediatric glioblastomas (non-BS-PGs). We found that 78% of DIPGs and 22% of non-BS-PGs contained a mutation in H3F3A, encoding histone H3.3, or in the related HIST1H3B, encoding histone H3.1, that caused a p.Lys27Met amino acid substitution in each protein. An additional 14% of non-BS-PGs had somatic mutations in H3F3A causing a p.Gly34Arg alteration.

1,362 citations


Journal Article
26 Apr 2012-Neuron
TL;DR: Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which suggest FMRP-associated genes are especially dosage-sensitive targets of cognitive disorders.

Journal Article
14 Sep 2012-Cell
TL;DR: Cell-cycle and JAK-STAT pathways are significantly altered in lung cancer, along with perturbations in 54 genes that are potentially targetable with currently available drugs, including ROS1 and ALK, as well as novel metabolic enzymes.

Journal Article
21 Jun 2012-Nature
TL;DR: To correlate the variable clinical features of oestrogen-receptor-positive breast cancer with somatic alterations, pretreatment tumour biopsies accrued from patients in two studies of neoadjuvant aromatase inhibitor therapy are studied by massively parallel sequencing and analysis.
Abstract: Whole-genome analysis of oestrogen-receptor-positive tumours in patients treated with aromatase inhibitors shows that distinct phenotypes are associated with specific patterns of somatic mutations; however, most recurrent mutations are relatively infrequent, so prospective clinical trials will require comprehensive sequencing and large study populations.

Journal Article
02 Aug 2012-Nature
TL;DR: Modelling of mutations in mouse lower rhombic lip progenitors that generate WNT-subgroup tumours identified genes that maintain this cell lineage (DDX3X), as well as mutated genes that initiate (CDH1) or cooperate (PIK3CA) in tumorigenesis.
Abstract: Medulloblastoma is a malignant childhood brain tumour comprising four discrete subgroups. Here, to identify mutations that drive medulloblastoma, we sequenced the entire genomes of 37 tumours and matched normal blood. One-hundred and thirty-six genes harbouring somatic mutations in this discovery set were sequenced in an additional 56 medulloblastomas. Recurrent mutations were detected in 41 genes not yet implicated in medulloblastoma; several target distinct components of the epigenetic machinery in different disease subgroups, such as regulators of H3K27 and H3K4 trimethylation in subgroups 3 and 4 (for example, KDM6A and ZMYM3), and CTNNB1-associated chromatin re-modellers in WNT-subgroup tumours (for example, SMARCA4 and CREBBP). Modelling of mutations in mouse lower rhombic lip progenitors that generate WNT-subgroup tumours identified genes that maintain this cell lineage (DDX3X), as well as mutated genes that initiate (CDH1) or cooperate (PIK3CA) in tumorigenesis. These data provide important new insights into the pathogenesis of medulloblastoma subgroups and highlight targets for therapeutic development.

Journal Article
TL;DR: Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived, a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection.
Abstract: Background: The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood. Methods: We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations. Results: Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast co...
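The Methods infer clonal architecture from the "allele burden" of hundreds of mutations. A common simplification, assumed here, is that a heterozygous mutation in a copy-number-neutral region carried by a fraction f of cells shows a variant allele fraction (VAF) of about f/2, so doubling a cluster's VAF estimates its cellular fraction. The minimal Python sketch below uses a naive gap-based grouping of VAFs as a stand-in for more formal clustering; the `gap` parameter and the VAF values are hypothetical, not data from the study.

```python
import statistics


def cluster_vafs(vafs, gap=0.08):
    """Split sorted VAFs wherever consecutive values differ by more than `gap`."""
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap:
            clusters.append([])   # large jump: start a new clone cluster
        clusters[-1].append(cur)
    return clusters


def clone_cell_fractions(vafs, gap=0.08):
    """Estimated fraction of cells in each inferred clone, largest first.

    Assumes heterozygous mutations in copy-number-neutral regions, so a clone
    present in a fraction f of cells has mutations at a VAF of about f / 2.
    """
    if not vafs:
        return []
    clusters = cluster_vafs(vafs, gap)
    fractions = [min(1.0, 2 * statistics.median(c)) for c in clusters]
    return sorted(fractions, reverse=True)


# Hypothetical VAFs: a founding clone near 0.4 and a subclone near 0.11.
print(clone_cell_fractions([0.40, 0.38, 0.41, 0.39, 0.12, 0.10, 0.11]))
```

The largest fraction approximates the share of cells descended from the founding clone; mutations at lower VAF mark subclones nested within it.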

Journal Article
08 Mar 2012-Nature
TL;DR: A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing.
Abstract: Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.

Journal Article
24 Aug 2012-Science
TL;DR: An analysis of five cancer types across several individuals found that three types of epithelial tumors exhibited high rates of transposable element movement relative to brain and blood cancers; whole-genome sequencing provides evidence for somatic insertions in colorectal, prostate and ovarian cancers.
Abstract: Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing data sets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.

Journal Article
TL;DR: A comprehensive mutational analysis pipeline, MuSiC, is introduced that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and ultimately to separate the commonly abundant passenger mutations from the truly significant events.
Abstract: Massively parallel sequencing technology and the associated rapidly decreasing sequencing costs have enabled systematic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and to ultimately separate the commonly abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. The integration of analytical operations in the MuSiC framework is widely applicable to a broad set of tumor types and offers the benefits of automation as well as standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from the TCGA ovarian cancer project. MuSiC correctly confirms many expected results, and identifies several potentially novel avenues for discovery.
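Separating significant genes from passengers comes down to asking whether a gene carries more mutations than a background mutation rate predicts. The minimal Python sketch below uses a single Poisson upper-tail test as a deliberately simplified stand-in; MuSiC's own statistics (for example, category-specific background rates) are more elaborate and are not reproduced. `gene_mutation_pvalue`, the gene length, the background rate and the mutation count are hypothetical; only the 316-sample cohort size comes from the abstract.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly so small p-values stay accurate.

    Intended for the usual regime of interest (k above lam, moderate lam).
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, x = 0.0, k
    while True:
        total += term
        x += 1
        term *= lam / x                  # P(X = x) from P(X = x - 1)
        if x > lam and term <= total * 1e-16:
            break                        # remaining terms are negligible
    return min(1.0, total)


def gene_mutation_pvalue(observed, covered_bases, n_samples, background_rate):
    """Hypothetical helper: expected count and P(at least `observed` mutations by chance)."""
    expected = covered_bases * n_samples * background_rate
    return expected, poisson_upper_tail(observed, expected)


# 316 samples as in the abstract; gene length, rate and observed count are made up.
expected, p = gene_mutation_pvalue(observed=12, covered_bases=1500,
                                   n_samples=316, background_rate=2e-6)
print(f"expected {expected:.2f} mutations, P(X >= 12) = {p:.2e}")
```

Across thousands of genes such p-values would still need a multiple-testing correction before a gene is called significantly mutated.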

Journal Article
TL;DR: The mathematical basis of the SomaticSniper software for comparing tumor and normal pairs is described, its sensitivity and precision are estimated, and several common sources of error resulting in miscalls are presented.
Abstract: Motivation: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample. Results: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls. Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X. Contact: delarson@wustl.edu; lding@wustl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
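The abstract refers to SomaticSniper's "mathematical basis" without stating it. As a rough illustration only, SomaticSniper's somatic score can be thought of as a Phred-scaled probability that the tumor and normal genotypes are the same; the minimal Python sketch below computes such a score under a much simpler model than the tool's own (a fixed per-read error rate, hypothetical genotype priors, and independent tumor and normal posteriors). The names and parameters are assumptions.

```python
import math

EXPECTED_ALT = {"ref/ref": 0.0, "ref/alt": 0.5, "alt/alt": 1.0}  # alt-read fraction per genotype
PRIOR = {"ref/ref": 0.998, "ref/alt": 0.001, "alt/alt": 0.001}    # hypothetical priors


def genotype_posterior(ref_reads, alt_reads, error=0.01):
    """Posterior over diploid genotypes under a simple per-read error model."""
    log_post = {}
    for g, alt_frac in EXPECTED_ALT.items():
        # Probability that a single read shows the alt allele under genotype g.
        p_alt = alt_frac * (1 - error) + (1 - alt_frac) * error
        log_post[g] = (math.log(PRIOR[g]) + alt_reads * math.log(p_alt)
                       + ref_reads * math.log(1 - p_alt))
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}  # avoid underflow
    z = sum(weights.values())
    return {g: w / z for g, w in weights.items()}


def somatic_score(tumor, normal, error=0.01, cap=255):
    """Phred-scaled probability that tumor and normal share a genotype.

    tumor, normal: (reference reads, alternate reads) at one site.
    """
    pt = genotype_posterior(*tumor, error=error)
    pn = genotype_posterior(*normal, error=error)
    p_same = sum(pt[g] * pn[g] for g in EXPECTED_ALT)
    if p_same <= 0:
        return float(cap)
    return max(0.0, min(cap, -10 * math.log10(p_same)))


for normal in [(30, 0), (2, 0), (15, 15)]:
    print(normal, round(somatic_score((20, 20), normal), 1))
```

With the same tumor reads, a deep reference-only normal yields a much higher score than a two-read normal, and a heterozygous normal scores near zero (germline).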

Journal Article
TL;DR: A missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 subjects with de novo MDS, and suggestive evidence of an increased risk of progression to sAML associated with this mutation is found.
Abstract: Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole-genome sequencing to perform an unbiased comprehensive screen to discover the somatic mutations in a sample from an individual with sAML and genotyped the loci containing these mutations in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 (8.7%) subjects with de novo MDS, and we found suggestive evidence of an increased risk of progression to sAML associated with this mutation. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3' end of introns, and the alterations in U2AF1 are located in highly conserved zinc fingers of this protein. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This previously unidentified, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.

Journal Article
19 Jan 2012-Nature
TL;DR: It is shown that the retinoblastoma genome is stable, but that multiple cancer pathways can be epigenetically deregulated, and that the proto-oncogene SYK is upregulated in retinoblastoma and is required for tumour cell survival.
Abstract: Retinoblastoma is an aggressive childhood cancer of the developing retina that is initiated by the biallelic loss of RB1. Tumours progress very quickly following RB1 inactivation but the underlying mechanism is not known. Here we show that the retinoblastoma genome is stable, but that multiple cancer pathways can be epigenetically deregulated. To identify the mutations that cooperate with RB1 loss, we performed whole-genome sequencing of retinoblastomas. The overall mutational rate was very low; RB1 was the only known cancer gene mutated. We then evaluated the role of RB1 in genome stability and considered non-genetic mechanisms of cancer pathway deregulation. For example, the proto-oncogene SYK is upregulated in retinoblastoma and is required for tumour cell survival. Targeting SYK with a small-molecule inhibitor induced retinoblastoma tumour cell death in vitro and in vivo. Thus, retinoblastomas may develop quickly as a result of the epigenetic deregulation of key cancer pathways as a direct or indirect result of RB1 loss.

Journal Article
14 Mar 2012-JAMA
TL;DR: Somatic recurrent mutations in tumors from patients with neuroblastoma correlated with the age at diagnosis and telomere length, and ATRX mutations were associated with age at diagnosis in children and young adults with stage 4 neuroblastoma.
Abstract: Context: Neuroblastoma is diagnosed over a wide age range from birth through young adulthood, and older age at diagnosis is associated with a decline in survivability. Objective: To identify genetic mutations that are associated with age at diagnosis in patients with metastatic neuroblastoma. Design, Setting, and Patients: Whole genome sequencing was performed on DNA from diagnostic tumors and their matched germlines from 40 patients with metastatic neuroblastoma obtained between 1987 and 2009. Age groups at diagnosis included infants (0-… Main Outcome Measure: Somatic recurrent mutations in tumors from patients with neuroblastoma correlated with the age at diagnosis and telomere length. Results: In the discovery cohort (n = 40), mutations in the ATRX gene were identified in 100% (95% CI, 50%-100%) of tumors from patients in the adolescent and young adult group (5 of 5), in 17% (95% CI, 7%-36%) of tumors from children (5 of 29), and 0% (95% CI, 0%-40%) of tumors from infants (0 of 6). In the validation cohort (n = 64), mutations in the ATRX gene were identified in 33% (95% CI, 17%-54%) of tumors from patients in the adolescent and young adult group (9 of 27), in 16% (95% CI, 6%-35%) of tumors from children (4 of 25), and in 0% (95% CI, 0%-24%) of tumors from infants (0 of 12). In both cohorts (N = 104), mutations in the ATRX gene were identified in 44% (95% CI, 28%-62%) of tumors from patients in the adolescent and young adult group (14 of 32), in 17% (95% CI, 9%-29%) of tumors from children (9 of 54), and in 0% (95% CI, 0%-17%) of tumors from infants (0 of 18). ATRX mutations were associated with an absence of the ATRX protein in the nucleus and with long telomeres. Conclusion: ATRX mutations were associated with age at diagnosis in children and young adults with stage 4 neuroblastoma. Trial Registration: clinicaltrials.gov Identifier: NCT00588068
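Each ATRX figure above is a simple proportion of mutated tumors per age group with a 95% confidence interval, but the abstract does not name the interval method. The minimal Python sketch below assumes a Wilson score interval, a common choice for small counts. Because this may not be the study's method, its bounds can differ somewhat from the published ones; the pooled counts are taken from the abstract.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.959964):
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Pooled counts from both cohorts (N = 104), as reported in the abstract.
pooled = {"adolescents/young adults": (14, 32), "children": (9, 54), "infants": (0, 18)}
for group, (k, n) in pooled.items():
    lo, hi = wilson_interval(k, n)
    print(f"{group}: {k}/{n} = {100 * k / n:.0f}% (95% CI {100 * lo:.0f}%-{100 * hi:.0f}%)")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and gives a non-zero upper bound for groups with zero mutated tumors, such as the infants here.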

Journal Article
11 May 2012-Cell
TL;DR: The data suggest a mechanism in which incomplete duplication created a novel gene function, antagonizing parental SRGAP2 function, immediately "at birth" 2–3 mya, a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.

Journal Article
TL;DR: The frequency of cancer diagnoses and leukemia subtypes in children and adults and the genetic landscape of 15 different types of pediatric cancers determined from whole-genome sequencing of 260 tumors and matching germline samples are presented.
Abstract: Subject terms: Cancer genomics; Paediatric cancer; Sequencing. Figure 1: Frequency of cancer diagnoses and leukemia subtypes in children and adults. (a) The frequency of cancer types in children (left) and adults (right) on the basis of 2012 Surveillance, Epidemiology and End Results (SEER) data. Each chart is organized with cancers listed from the most common to the least common in a clockwise fashion. (b) The frequency of T-cell lineage (blue text) and B-cell lineage (black text) subtypes of acute lymphoblastic leukemia (ALL) in children (left) and adults (right). Each chart is organized with ALL subtypes listed from the most common to the least common in a clockwise fashion. iAMP21, intrachromosomal amplification of chromosome 21. Figure 2: Genetic landscape of 15 different types of pediatric cancers determined from whole-genome sequencing of 260 tumors and matching germline samples. The number of somatic mutations in each sample, including single-nucleotide variations (SNVs), insertion and/or deletion events (indels) and structural variations, is shown as the height in the three-dimensional graph. Only high-quality variations or validated somatic mutations are included in the summary. CDS, protein-coding regions; tier 1, mutations in annotated genes; tier 2, mutations in non-coding conserved or regulatory regions; tier 3, mutations in non-repetitive, non-coding and non-conserved regions; tier 4, mutations in repetitive regions. Tier 2 and tier 3/tier 4 mutations were rescaled to 1/10 and 1/100 of the original counts to maintain a consistent scale with the results for other somatic lesions. INF, infant ALL; CBF, core-binding-factor acute myeloid leukemia; TALL, T-cell ALL; AMLM7, acute megakaryoblastic leukemia; HYPO, hypodiploid ALL; PHALL, Philadelphia chromosome–positive BCR-ABL1 ALL; RB, retinoblastoma; RHB, rhabdomyosarcoma; NBL, neuroblastoma; OS, osteosarcoma; ACT, adrenocortical carcinoma; HGG, high-grade glioblastoma; LGG, low-grade glioma; EPD, ependymoma; MB, medulloblastoma.

Journal ArticleDOI
13 Jun 2012-PLOS ONE
TL;DR: The data production protocols used for this work are those used by the participating centers to produce 16S rDNA sequence for the Human Microbiome Project, so these results can be informative for interpreting the large body of clinical 16S rDNA data produced for this project.
Abstract: The Human Microbiome Project will establish a reference data set for analysis of the microbiome of healthy adults by surveying multiple body sites from 300 people and generating data from over 12,000 samples. To characterize these samples, the participating sequencing centers evaluated and adopted 16S rDNA community profiling protocols for ABI 3730 and 454 FLX Titanium sequencing. In the course of establishing protocols, we examined the performance and error characteristics of each technology, and the relationship of sequence error to the utility of 16S rDNA regions for classification- and OTU-based analysis of community structure. The data production protocols used for this work are those used by the participating centers to produce 16S rDNA sequence for the Human Microbiome Project. Thus, these results can be informative for interpreting the large body of clinical 16S rDNA data produced for this project.

Journal ArticleDOI
01 Mar 2012-Nature
TL;DR: An empirical reconstruction of human MSY evolution is presented, in which each stratum transitioned from rapid, exponential loss of ancestral genes to strict conservation through purifying selection.
Abstract: This evolutionary decay was driven by a series of five ‘stratification’ events. Each event suppressed X–Y crossing over within a chromosome segment or ‘stratum’, incorporated that segment into the MSY and subjected its genes to the erosive forces that attend the absence of crossing over [2,6]. The last of these events occurred 30 million years ago, 5 million years before the human and Old World monkey lineages diverged. Although speculation abounds regarding ongoing decay and looming extinction of the human Y chromosome [7–10], remarkably little is known about how many MSY genes were lost in the human lineage in the 25 million years that have followed its separation from the Old World monkey lineage. To investigate this question, we sequenced the MSY of the rhesus macaque, an Old World monkey, and compared it to the human MSY. We discovered that during the last 25 million years MSY gene loss in the human lineage was limited to the youngest stratum (stratum 5), which comprises three percent of the human MSY. In the older strata, which collectively comprise the bulk of the human MSY, gene loss evidently ceased more than 25 million years ago. Likewise, the rhesus MSY has not lost any older genes (from strata 1–4) during the past 25 million years, despite its major structural differences from the human MSY. The rhesus MSY is simpler, with few amplified gene families or palindromes that might enable intrachromosomal recombination and repair. We present an empirical reconstruction of human MSY evolution in which each stratum transitioned from rapid, exponential loss of ancestral genes to strict conservation through purifying selection.

The human Y chromosome no longer engages in crossing over with its once-identical partner, the X chromosome, except in its pseudoautosomal regions. During evolution, X–Y crossing over was suppressed in five different chromosomal regions at five different times, each probably resulting from an inversion in the Y chromosome [2,3].
Each of these regions of the Y chromosome then began its own individual course of degeneration, experiencing deletions and gene loss. Comparison of the present-day X and Y chromosomes enables identification of these five evolutionary ‘strata’ in the MSY (and X chromosome); their distinctive degrees of X–Y differentiation indicate their evolutionary ages [2,3]. The oldest stratum (stratum 1) dates back over 240 million years (Myr) [2] and is the most highly differentiated, and the youngest stratum (stratum 5) originated only 30 Myr ago and displays the highest X–Y nucleotide sequence similarity within the MSY [3]. The five strata and their respective decay processes, over tens to hundreds of millions of years of mammalian evolution, offer replicate experiments of nature from which to reconstruct the trajectories and kinetics of gene loss in the MSY. Only the human and chimpanzee MSYs had been sequenced before the present study, and they are separated by just 6 Myr of evolution. We decided to examine the MSY of a much more distant relative, the rhesus macaque (Macaca mulatta), to enable us to reconstruct gene loss and conservation in the MSY during the past 25 Myr. We sequenced the rhesus MSY using bacterial artificial chromosome (BAC) clones and the SHIMS (single-haplotype iterative mapping and sequencing) strategy that has previously been used in the human and chimpanzee MSYs [4,11–13] as well as in the chicken Z chromosome [5]. The resulting sequence comprises 11.0 megabases (Mb), is complete aside from three small gaps and has an error rate of about one nucleotide per Mb. We ordered and oriented the finished sequence contigs by fluorescence in situ hybridization and radiation hybrid mapping (Supplementary Figs 1–6, Supplementary Table 1, Supplemen

Journal ArticleDOI
TL;DR: In this article, the authors performed transcriptome sequencing on diagnostic blasts from 14 pediatric patients and validated their findings in a recurrency/validation cohort consisting of 34 pediatric and 28 adult acute megakaryoblastic leukemia (AMKL) samples.


Journal ArticleDOI
20 Jan 2012-Cell
TL;DR: The unprecedented resolution of high-throughput genomics has enabled the recent discovery of a phenomenon by which specific regions of the genome are shattered and then stitched together via a single devastating event, referred to as chromothripsis.

Journal ArticleDOI
TL;DR: A software package, BreakFusion, is presented that combines the strengths of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance in sensitivity, specificity and computational efficiency.
Abstract: Summary: Despite recent progress, computational tools that identify gene fusions from next-generation whole-transcriptome sequencing data are often limited in accuracy and scalability. Here, we present a software package, BreakFusion, that combines the strengths of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance in sensitivity, specificity and computational efficiency.
Availability: http://bioinformatics.mdanderson.org/main/BreakFusion
Contact: kchen3@mdanderson.org; lding@genome.wustl.edu
Supplementary information: Supplementary data are available at Bioinformatics online.