Showing papers by "George M. Weinstock" published in 2008


Journal ArticleDOI
23 Oct 2008-Nature
TL;DR: The interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, demonstrating that TCGA can rapidly expand knowledge of the molecular basis of cancer.
Abstract: Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular characteristics in human cancer and to provide the data rapidly to the research community. Here we report the interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas - the most common type of primary adult brain cancer - and nucleotide sequence aberrations in 91 of the 206 glioblastomas. This analysis provides new insights into the roles of ERBB2, NF1 and TP53, uncovers frequent mutations of the phosphatidylinositol-3-OH kinase regulatory subunit gene PIK3R1, and provides a network view of the pathways altered in the development of glioblastoma. Furthermore, integration of mutation, DNA methylation and clinical treatment data reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, an observation with potential clinical implications. Together, these findings establish the feasibility and power of TCGA, demonstrating that it can rapidly expand knowledge of the molecular basis of cancer.

6,761 citations


Journal ArticleDOI
Li Ding1, Gad Getz2, David A. Wheeler3, Elaine R. Mardis1, Michael D. McLellan1, Kristian Cibulskis2, Carrie Sougnez2, Heidi Greulich2, Heidi Greulich4, Donna M. Muzny3, Margaret Morgan3, Lucinda Fulton1, Robert S. Fulton1, Qunyuan Zhang1, Michael C. Wendl1, Michael S. Lawrence2, David E. Larson1, Ken Chen1, David J. Dooling1, Aniko Sabo3, Alicia Hawes3, Hua Shen3, Shalini N. Jhangiani3, Lora Lewis3, Otis Hall3, Yiming Zhu3, Tittu Mathew3, Yanru Ren3, Jiqiang Yao3, Steven E. Scherer3, Kerstin Clerc3, Ginger A. Metcalf3, Brian Ng3, Aleksandar Milosavljevic3, Manuel L. Gonzalez-Garay3, John R. Osborne1, Rick Meyer1, Xiaoqi Shi1, Yuzhu Tang1, Daniel C. Koboldt1, Ling Lin1, Rachel Abbott1, Tracie L. Miner1, Craig Pohl1, Ginger A. Fewell1, Carrie A. Haipek1, Heather Schmidt1, Brian H. Dunford-Shore1, Aldi T. Kraja1, Seth D. Crosby1, Christopher S. Sawyer1, Tammi L. Vickery1, Sacha N. Sander1, Jody S. Robinson1, Wendy Winckler4, Wendy Winckler2, Jennifer Baldwin2, Lucian R. Chirieac4, Amit Dutt2, Amit Dutt4, Timothy Fennell2, Megan Hanna2, Megan Hanna4, Bruce E. Johnson4, Robert C. Onofrio2, Roman K. Thomas5, Giovanni Tonon4, Barbara A. Weir2, Barbara A. Weir4, Xiaojun Zhao2, Xiaojun Zhao4, Liuda Ziaugra2, Michael C. Zody2, Thomas J. Giordano6, Mark B. Orringer6, Jack A. Roth, Margaret R. Spitz7, Ignacio I. Wistuba, Bradley A. Ozenberger8, Peter J. Good8, Andrew C. Chang6, David G. Beer6, Mark A. Watson1, Marc Ladanyi9, Stephen R. Broderick9, Akihiko Yoshizawa9, William D. Travis9, William Pao9, Michael A. Province1, George M. Weinstock1, Harold E. Varmus9, Stacey Gabriel2, Eric S. Lander2, Richard A. Gibbs3, Matthew Meyerson2, Matthew Meyerson4, Richard K. Wilson1 
23 Oct 2008-Nature
TL;DR: Somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B are found.
Abstract: Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers--including NF1, APC, RB1 and ATM--and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.

2,615 citations


Journal ArticleDOI
17 Apr 2008-Nature
TL;DR: This sequence, the first genome sequenced by next-generation technologies, was completed in two months at approximately one-hundredth of the cost of traditional capillary electrophoresis methods and demonstrated the acquisition of novel human sequence, including novel genes not previously identified by traditional genomic sequencing.
Abstract: Next-generation sequencing technologies are revolutionizing human genomics, promising to yield draft genomes cheaply and quickly. One such technology has now been used to analyse much of the genetic code of a single individual — who happens to be James D. Watson. The procedure, which involves no cloning of the genomic DNA, makes use of the latest 454 parallel sequencing instrument. The sequence cost less than US$1 million (and a mere two months) to produce, compared to the approximately US$100 million reported for sequencing Craig Venter's genome by traditional methods. Still a major undertaking, but another step towards the goal of 'personalized genomes' and 'personalized medicine'. The DNA sequence of a diploid genome of a single individual, James D. Watson, sequenced to 7.4-fold redundancy in two months using massively parallel sequencing in picolitre-size reaction vessels is reported. The association of genetic variation with disease and drug response, and improvements in nucleic acid technologies, have given great optimism for the impact of ‘genomic medicine’. However, the formidable size of the diploid human genome1, approximately 6 gigabases, has prevented the routine application of sequencing methods to deciphering complete individual human genomes. To realize the full potential of genomics for human health, this limitation must be overcome. Here we report the DNA sequence of a diploid genome of a single individual, James D. Watson, sequenced to 7.4-fold redundancy in two months using massively parallel sequencing in picolitre-size reaction vessels. This sequence was completed in two months at approximately one-hundredth of the cost of traditional capillary electrophoresis methods. Comparison of the sequence to the reference genome led to the identification of 3.3 million single nucleotide polymorphisms, of which 10,654 cause amino-acid substitution within the coding sequence. 
In addition, we accurately identified small-scale (2–40,000 base pair (bp)) insertion and deletion polymorphism as well as copy number variation resulting in the large-scale gain and loss of chromosomal segments ranging from 26,000 to 1.5 million base pairs. Overall, these results agree well with recent results of sequencing of a single individual2 by traditional methods. However, in addition to being faster and significantly less expensive, this sequencing technology avoids the arbitrary loss of genomic sequences inherent in random shotgun sequencing by bacterial cloning because it amplifies DNA in a cell-free system. As a result, we further demonstrate the acquisition of novel human sequence, including novel genes not previously identified by traditional genomic sequencing. This is the first genome sequenced by next-generation technologies. Therefore it is a pilot for the future challenges of ‘personalized genome sequencing’.

1,879 citations


Journal ArticleDOI
24 Apr 2008-Nature
TL;DR: Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products.
Abstract: Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.

1,248 citations


Stephen Richards, R. A. Gibbs, George M. Weinstock, Susan J. Brown, R. E. Denell, Richard W. Beeman, Richard A. Gibbs, Gregor Bucher, Markus Friedrich, Cornelis J. P. Grimmelikhuijzen, Martin Klingler, Marcé D. Lorenzen, Siegfried Roth, Reinhard Schröder, Diethard Tautz, Evgeny M. Zdobnov, Donna M. Muzny, Tony Attaway, Stephanie Bell, Christian J. Buhay, Mimi N. Chandrabose, Dean Chavez, KP Clerk-Blankenburg, Andy Cree, Marvin Diep Dao, Clay Davis, Joseph Chacko, Huyen Dinh, Shannon Dugan-Rocha, Gerald R. Fowler, Toni T. Garner, Jeffrey Garnes, Andreas Gnirke, Alicia Hawes, Judith Hernandez, Sandra Hines, M. Holder, Jennifer Hume, Shalini N. Jhangiani, Joshi, Ziad Khan, LaRonda Jackson, Christie Kovar, A Kowis, Sandra L. Lee, Lora Lewis, Jonathan Margolis, Michael J. Morgan, Lynne V. Nazareth, Ngoc Nguyen, Geoffrey Okwuonu, David Parker, San Juana Ruiz, Jireh Santibanez, Joël Savard, Steve Scherer, Brian W. Schneider, Erica Sodergren, S Vattahil, Donna Villasana, Courtney Sherell White, Rita A. Wright, Yoonseong Park, Joanne Lord, Brenda Oppert, Stephen Brown, Liangjiang Wang, G Weinstock, Yue Liu, Kim C. Worley, Christine G. Elsik, Justin T. Reese, Eran Elhaik, Giddy Landan, Dan Graur, Peter Arensburger, Peter W. Atkinson, J Beidler, Jeffery P. Demuth, Douglas W. Drury, YZ Du, Haruhiko Fujiwara, Maselli, Mizuko Osanai, Hugh M. Robertson, Zhijian Tu, Jianjun Wang, Suzhi Wang, Henry Song, Lan Zhang, Doreen Werner, Mario Stanke, Burkhard Morgenstern, Solovyev, Peter Kosarev, Garth Brown, Hsiu Chuan Chen, Olga Ermolaeva, Wratko Hlavina, Yuri Kapustin 
01 Jan 2008
TL;DR: Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products.
Abstract: Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.

1,081 citations


Journal ArticleDOI
TL;DR: The complete genomic sequence of DH10B is reported by using reads accumulated from the bovine sequencing project at Baylor College of Medicine and assembled with DNAStar's SeqMan genome assembler, confirming most of the reported alleles and necessitating reexamination of the assumed basis for the high transformability of DH10B.
Abstract: Escherichia coli DH10B was designed for the propagation of large insert DNA library clones. It is used extensively, taking advantage of properties such as high DNA transformation efficiency and maintenance of large plasmids. The strain was constructed by serial genetic recombination steps, but the underlying sequence changes remained unverified. We report the complete genomic sequence of DH10B by using reads accumulated from the bovine sequencing project at Baylor College of Medicine and assembled with DNAStar's SeqMan genome assembler. The DH10B genome is largely colinear with that of the wild-type K-12 strain MG1655, although it is substantially more complex than previously appreciated, allowing DH10B biology to be further explored. The 226 mutated genes in DH10B relative to MG1655 are mostly attributable to the extensive genetic manipulations the strain has undergone. However, we demonstrate that DH10B has a 13.5-fold higher mutation rate than MG1655, resulting from a dramatic increase in insertion sequence (IS) transposition, especially IS150. IS elements appear to have remodeled genome architecture, providing homologous recombination sites for a 113,260-bp tandem duplication and an inversion. DH10B requires leucine for growth on minimal medium due to the deletion of leuLABCD and harbors both the relA1 and spoT1 alleles causing both sensitivity to nutritional downshifts and slightly lower growth rates relative to the wild type. Finally, while the sequence confirms most of the reported alleles, the sequence of deoR is wild type, necessitating reexamination of the assumed basis for the high transformability of DH10B.

397 citations


Journal ArticleDOI
TL;DR: OG1RF's effects in experimental models suggest that mediators of virulence may be diverse between different E. faecalis strains and that virulence is not dependent on the presence of mobile genetic elements.
Abstract: Background: Enterococcus faecalis has emerged as a major hospital pathogen. To explore its diversity, we sequenced E. faecalis strain OG1RF, which is commonly used for molecular manipulation and virulence studies.

271 citations


Journal ArticleDOI
TL;DR: The draft genome of strain TX0016 was analysed for potential microbial surface components recognizing adhesive matrix molecules (MSCRAMMs), identifying 22 predicted cell-wall-anchored E. faecium surface proteins, of which 15 had characteristics typical of MSCRAMMs, including predicted folding into a modular architecture with multiple immunoglobulin-like domains.
Abstract: Attention has recently been drawn to Enterococcus faecium because of an increasing number of nosocomial infections caused by this species and its resistance to multiple antibacterial agents. However, relatively little is known about the pathogenic determinants of this organism. We have previously identified a cell-wall-anchored collagen adhesin, Acm, produced by some isolates of E. faecium, and a secreted antigen, SagA, exhibiting broad-spectrum binding to extracellular matrix proteins. Here, we analysed the draft genome of strain TX0016 for potential microbial surface components recognizing adhesive matrix molecules (MSCRAMMs). Genome-based bioinformatics identified 22 predicted cell-wall-anchored E. faecium surface proteins (Fms), of which 15 (including Acm) had characteristics typical of MSCRAMMs, including predicted folding into a modular architecture with multiple immunoglobulin-like domains. Functional characterization of one [Fms10; redesignated second collagen adhesin of E. faecium (Scm)] revealed that recombinant Scm65 (A- and B-domains) and Scm36 (A-domain) bound to collagen type V efficiently in a concentration-dependent manner, bound considerably less to collagen type I and fibrinogen, and differed from Acm in their binding specificities to collagen types IV and V. Results from far-UV circular dichroism measurements of recombinant Scm36 and of Acm37 indicated that these proteins were rich in β-sheets, supporting our folding predictions. Whole-cell ELISA and FACS analyses unambiguously demonstrated surface expression of Scm in most E. faecium isolates. Strikingly, 11 of the 15 predicted MSCRAMMs clustered in four loci, each with a class C sortase gene; nine of these showed similarity to Enterococcus faecalis Ebp pilus subunits and also contained motifs essential for pilus assembly. 
Antibodies against one of the predicted major pilus proteins, Fms9 (redesignated EbpCfm), detected a ‘ladder’ pattern of high-molecular-mass protein bands in a Western blot analysis of cell surface extracts from E. faecium, suggesting that EbpCfm is polymerized into a pilus structure. Further analysis of the transcripts of the corresponding gene cluster indicated that fms1 (ebpAfm), fms5 (ebpBfm) and ebpCfm are co-transcribed, a result consistent with those for pilus-encoding gene clusters of other Gram-positive bacteria. All 15 genes occurred frequently in 30 clinically derived diverse E. faecium isolates tested. The common occurrence of MSCRAMM- and pilus-encoding genes and the presence of a second collagen-binding protein may have important implications for our understanding of this emerging pathogen.

99 citations


Journal ArticleDOI
TL;DR: It is demonstrated that TP0136 is expressed on the outer membrane of the treponeme during infection and may be involved in attachment to host extracellular matrix components.
Abstract: The antigenicity, structural location, and function of the predicted lipoprotein TP0136 of Treponema pallidum subsp. pallidum were investigated based on previous screening studies indicating that anti-TP0136 antibodies are present in the sera of syphilis patients and experimentally infected rabbits. Recombinant TP0136 (rTP0136) protein was purified and shown to be strongly antigenic during human and experimental rabbit infection. The TP0136 protein was exposed on the surface of the bacterial outer membrane and bound to the host extracellular matrix glycoproteins fibronectin and laminin. In addition, the TP0136 open reading frame was shown to be highly polymorphic among T. pallidum subspecies and strains at the nucleotide and amino acid levels. Finally, the ability of rTP0136 protein to act as a protective antigen to subsequent challenge with infectious T. pallidum in the rabbit model of infection was assessed. Immunization with rTP0136 delayed ulceration but did not prevent infection or the formation of lesions. These results demonstrate that TP0136 is expressed on the outer membrane of the treponeme during infection and may be involved in attachment to host extracellular matrix components.

97 citations


Journal ArticleDOI
TL;DR: The observed genetic changes do not affect the ability of Treponema pallidum to cause syphilitic infection, since both SS14 and Nichols are virulent in rabbits.
Abstract: The syphilis spirochete Treponema pallidum ssp. pallidum remains an enigmatic pathogen, since no virulence factors have been identified and the pathogenesis of the disease is poorly understood. Increasing rates of new syphilis cases per year have been observed recently. The genome of the SS14 strain was sequenced to high accuracy by an oligonucleotide array strategy requiring hybridization to only three arrays (Comparative Genome Sequencing, CGS). Gaps in the resulting sequence were filled with targeted dideoxy-terminator (DDT) sequencing and the sequence was confirmed by whole genome fingerprinting (WGF). When compared to the Nichols strain, 327 single nucleotide substitutions (224 transitions, 103 transversions), 14 deletions, and 18 insertions were found. On the proteome level, the highest frequency of amino acid-altering substitution polymorphisms was in novel genes, while the lowest was in housekeeping genes, as expected from their evolutionary conservation. Evidence was also found for hypervariable regions and multiple regions showing intrastrain heterogeneity in the T. pallidum chromosome. The observed genetic changes do not affect the ability of Treponema pallidum to cause syphilitic infection, since both SS14 and Nichols are virulent in rabbits. However, this is the first assessment of the degree of variation between the two syphilis pathogens and paves the way for phylogenetic studies of this fascinating organism.

72 citations
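
The SS14 abstract above splits its 327 single nucleotide substitutions into transitions and transversions. As a minimal sketch of how that classification works (not the authors' pipeline; the function names and the toy sequences are assumptions made for illustration), a substitution is a transition when both bases are purines or both are pyrimidines, and a transversion otherwise:

```python
# Illustrative sketch only: names and example sequences are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine)
    # means transition; a class change means transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_substitutions(seq_a: str, seq_b: str) -> dict:
    """Count transitions and transversions between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip matches and any ambiguous symbols such as N.
        if a == b or a not in BASES or b not in BASES:
            continue
        counts[classify_substitution(a, b)] += 1
    return counts


# Toy aligned fragments (made up): A->G is a transition, C->A a transversion.
toy_counts = count_substitutions("ACGTAC", "GCGTAA")
```

With the counts reported in the abstract, the transition-to-transversion ratio is 224/103, or roughly 2.17.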


Journal ArticleDOI
TL;DR: All available expressed sequence tag (EST) data is compiled and analyzed to facilitate accurate annotation and detailed functional analysis of this genome, suggesting that a considerable number of transcribed sequences were missed by the gene prediction programs or were removed by GLEAN.

Journal ArticleDOI
TL;DR: The current status of the rat genome sequence and the plans for its impending 'upgrade' are described, along with the key online resources providing access to the rat genome, including the new SNP views at Ensembl.
Abstract: It has been four years since the original publication of the draft sequence of the rat genome. Five groups are now working together to assemble, annotate and release an updated version of the rat genome. As the prevailing model for physiology, complex disease and pharmacological studies, there is an acute need for the rat's genomic resources to keep pace with the rat's prominence in the laboratory. In this commentary, we describe the current status of the rat genome sequence and the plans for its impending 'upgrade'. We then cover the key online resources providing access to the rat genome, including the new SNP views at Ensembl, the RefSeq and Genes databases at the US National Center for Biotechnology Information, Genome Browser at the University of California Santa Cruz and the disease portals for cardiovascular disease and obesity at the Rat Genome Database.

Journal ArticleDOI
TL;DR: It is shown that the selective pressure acting on CD4 is highly variable between regions in the protein and identified codon sites under strong positive selection, which may reflect forces driven by SIV infection and provide a link between changes in sequence and structure of CD4 during evolution and the interaction with the immunodeficiency virus.
Abstract: CD4, an integral membrane glycoprotein, plays a critical role in the immune response and in the life cycle of simian and human immunodeficiency virus (SIV and HIV). Pairwise comparisons of orthologous human and mouse genes show that CD4 is evolving much faster than the majority of mammalian genes. The acceleration is too great to be attributed to a simple relaxation of the action of purifying selection alone. Here we show that the selective pressure acting on CD4 is highly variable between regions in the protein and identify codon sites under strong positive selection. We reconstruct the coding sequences for ancestral primate CD4s and model tertiary structures of all ancestral and extant sequences. Structural mapping of positively selected sites shows they distribute on the surface of the D1 domain of CD4, where the exogenous SIV gp120 protein binds. Moreover, structural models of the ancestral sequences show substantially larger variation in the interfacial electrostatic charge on CD4 and in the surface complementary between CD4 and gp120 in CD4 lineages from primates with natural SIV infections than those without. Thus, positive selection on CD4 among primates may reflect forces driven by SIV infection and could provide a link between changes in sequence and structure of CD4 during evolution and the interaction with the immunodeficiency virus.

Journal ArticleDOI
TL;DR: The rat genome project and the resources it has generated are transforming the translation of rat biology to human medicine and the progress and future plans for the rat genome are discussed.
Abstract: The rat genome project and the resources that it has generated are transforming the translation of rat biology to human medicine. The rat genome was sequenced to a high quality “draft,” the structu...

Journal ArticleDOI
TL;DR: This study describes a high-throughput percentage-of-binding strategy to measure the binding energy in DNA–protein interactions between the Shewanella oneidensis ArcA two-component transcription factor protein and a systematic set of mutants in an ArcA-P (phosphorylated ArcA) binding site.
Abstract: Quantifying the binding energy in DNA–protein interactions is of critical importance to understand transcriptional regulation. Based on a simple computational model, this study describes a high-throughput percentage-of-binding strategy to measure the binding energy in DNA–protein interactions between the Shewanella oneidensis ArcA two-component transcription factor protein and a systematic set of mutants in an ArcA-P (phosphorylated ArcA) binding site. The binding energies corresponding to each of the 4 nt at each position in the 15-bp binding site were used to construct a position-specific energy matrix (PEM) that allowed a reliable prediction of ArcA-P binding sites not only in Shewanella but also in related bacterial genomes.
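
The position-specific energy matrix described above can be sketched as follows. This is a hedged illustration, not the paper's method or data: it assumes an additive model in which each position contributes independently and lower total energy means stronger predicted binding, and it uses a made-up 3-position matrix (the study's site is 15 bp). The names `site_energy`, `scan_sequence` and `TOY_PEM` are hypothetical.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"

# Hypothetical toy matrix: one dict per site position, mapping each
# nucleotide to an energy contribution in arbitrary units. The values are
# invented for illustration; the real PEM would have 15 columns.
TOY_PEM: List[Dict[str, float]] = [
    {"A": 0.0, "C": 1.0, "G": 1.5, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 1.0, "T": 1.5},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 1.5},
]


def site_energy(pem: List[Dict[str, float]], site: str) -> float:
    """Sum the per-position energies for one candidate site (additive assumption)."""
    if len(site) != len(pem):
        raise ValueError("site length must equal the number of PEM positions")
    return sum(column[base] for column, base in zip(pem, site.upper()))


def scan_sequence(
    pem: List[Dict[str, float]], sequence: str, threshold: float
) -> List[Tuple[int, str, float]]:
    """Slide the PEM along one strand and report windows at or below the threshold."""
    width = len(pem)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols
        energy = site_energy(pem, window)
        if energy <= threshold:
            hits.append((start, window, energy))
    return hits


# Scan a made-up sequence; only the window "ACG" scores under the cutoff.
toy_hits = scan_sequence(TOY_PEM, "TTACGTT", threshold=0.5)
```

A fuller scan would also score the reverse-complement strand; that step is omitted here to keep the sketch short.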

Journal ArticleDOI
20 Aug 2008-PLOS ONE
TL;DR: A comprehensive gene collection for S. oneidensis was constructed using the lambda recombinase (Gateway) cloning system, and the clone set facilitates a wide variety of integrated studies, including protein expression, purification, and functional analyses of proteins both in vivo and in vitro.
Abstract: A comprehensive gene collection for S. oneidensis was constructed using the lambda recombinase (Gateway) cloning system. A total of 3584 individual ORFs (85%) have been successfully cloned into the entry plasmids. To validate the use of the clone set, three sets of ORFs were examined within three different destination vectors constructed in this study. Success rates for heterologous protein expression of S. oneidensis His- or His/GST- tagged proteins in E. coli were approximately 70%. The ArcA and NarP transcription factor proteins were tested in an in vitro binding assay to demonstrate that functional proteins can be successfully produced using the clone set. Further functional validation of the clone set was obtained from phage display experiments in which a phage encoding thioredoxin was successfully isolated from a pool of 80 different clones after three rounds of biopanning using immobilized anti-thioredoxin antibody as a target. This clone set complements existing genomic (e.g., whole-genome microarray) and other proteomic tools (e.g., mass spectrometry-based proteomic analysis), and facilitates a wide variety of integrated studies, including protein expression, purification, and functional analyses of proteins both in vivo and in vitro.