
Papers by Broad Institute published in 2008


Journal Article
TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.
Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.

13,008 citations
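The MACS abstract says it uses a dynamic Poisson distribution to capture local biases. Below is a minimal Python sketch of that idea. It assumes two things: the expected tag count (λ) for a candidate region is the largest of a genome-wide background expectation and the expectations estimated from surrounding windows, and significance is the Poisson upper-tail probability of the observed count. The function names, window sizes and example counts are hypothetical. This is not MACS's own code, and these are not its defaults.

```python
import math
from typing import Sequence


def poisson_sf(k: int, lam: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        cdf += term
    # 1 - CDF loses precision for very small tail probabilities; acceptable in a sketch
    return max(0.0, 1.0 - cdf)


def local_lambda(bg_rate: float, window_counts: Sequence[int],
                 window_sizes: Sequence[int], peak_width: int) -> float:
    """Dynamic lambda for a region of peak_width bp (hypothetical helper).

    Takes the largest expected count over the genome-wide background rate
    (tags per bp) and the tag densities seen in the surrounding windows,
    so a region in a locally tag-rich neighbourhood must clear a higher bar.
    """
    expectations = [bg_rate * peak_width]
    for count, size in zip(window_counts, window_sizes):
        expectations.append(count * peak_width / size)
    return max(expectations)


if __name__ == "__main__":
    # Hypothetical numbers: 0.01 tags/bp genome-wide, a 500 bp candidate region
    # with 20 tags, 30 tags in a 1 kb window and 150 tags in a 10 kb window.
    lam = local_lambda(0.01, [30, 150], [1_000, 10_000], 500)
    print(f"lambda_local = {lam:.1f}")
    print(f"p (local lambda)      = {poisson_sf(20, lam):.3g}")
    print(f"p (background lambda) = {poisson_sf(20, 0.01 * 500):.3g}")
```

With these made-up numbers, the local λ (15) is three times the background λ (5). The same 20 tags therefore give a much less significant p-value, which is how a local estimate guards against regional bias. For real analyses, `scipy.stats.poisson.sf(k - 1, lam)` returns P(X ≥ k) with better precision far into the tail.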


Journal Article
TL;DR: This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.
Abstract: The past year has witnessed substantial advances in understanding the genetic basis of many common phenotypes of biomedical importance. These advances have been the result of systematic, well-powered, genome-wide surveys exploring the relationships between common sequence variation and disease predisposition. This approach has revealed over 50 disease-susceptibility loci and has provided insights into the allelic architecture of multifactorial traits. At the same time, much has been learned about the successful prosecution of association studies on such a scale. This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.

2,908 citations


Journal Article
Li Ding, Gad Getz, David A. Wheeler, Elaine R. Mardis, Michael D. McLellan, Kristian Cibulskis, Carrie Sougnez, Heidi Greulich, Donna M. Muzny, Margaret Morgan, Lucinda Fulton, Robert S. Fulton, Qunyuan Zhang, Michael C. Wendl, Michael S. Lawrence, David E. Larson, Ken Chen, David J. Dooling, Aniko Sabo, Alicia Hawes, Hua Shen, Shalini N. Jhangiani, Lora Lewis, Otis Hall, Yiming Zhu, Tittu Mathew, Yanru Ren, Jiqiang Yao, Steven E. Scherer, Kerstin Clerc, Ginger A. Metcalf, Brian Ng, Aleksandar Milosavljevic, Manuel L. Gonzalez-Garay, John R. Osborne, Rick Meyer, Xiaoqi Shi, Yuzhu Tang, Daniel C. Koboldt, Ling Lin, Rachel Abbott, Tracie L. Miner, Craig Pohl, Ginger A. Fewell, Carrie A. Haipek, Heather Schmidt, Brian H. Dunford-Shore, Aldi T. Kraja, Seth D. Crosby, Christopher S. Sawyer, Tammi L. Vickery, Sacha N. Sander, Jody S. Robinson, Wendy Winckler, Jennifer Baldwin, Lucian R. Chirieac, Amit Dutt, Timothy Fennell, Megan Hanna, Bruce E. Johnson, Robert C. Onofrio, Roman K. Thomas, Giovanni Tonon, Barbara A. Weir, Xiaojun Zhao, Liuda Ziaugra, Michael C. Zody, Thomas J. Giordano, Mark B. Orringer, Jack A. Roth, Margaret R. Spitz, Ignacio I. Wistuba, Bradley A. Ozenberger, Peter J. Good, Andrew C. Chang, David G. Beer, Mark A. Watson, Marc Ladanyi, Stephen R. Broderick, Akihiko Yoshizawa, William D. Travis, William Pao, Michael A. Province, George M. Weinstock, Harold E. Varmus, Stacey Gabriel, Eric S. Lander, Richard A. Gibbs, Matthew Meyerson, Richard K. Wilson
23 Oct 2008-Nature
TL;DR: Somatic mutations in primary lung adenocarcinoma are found in several tumour suppressor genes involved in other cancers, together with sequence changes in PTPRD and in the frequently deleted gene LRP1B.
Abstract: Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers (including NF1, APC, RB1 and ATM) and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.

2,615 citations


Journal Article
13 Mar 2008-Nature
TL;DR: It is demonstrated that M2 expression is necessary for aerobic glycolysis and that this metabolic phenotype provides a selective growth advantage for tumour cells in vivo.
Abstract: Many tumour cells express the M2 form of pyruvate kinase rather than the usual M1 form. PKM2 is now shown to promote tumorigenesis and switch the cellular metabolism to increased lactate production and reduced oxygen consumption, recapitulating key aspects of the Warburg effect.

2,532 citations


Journal Article
07 Aug 2008-Nature
TL;DR: High-throughput reduced representation bisulphite sequencing is established as a powerful technology for epigenetic profiling of cell populations relevant to developmental biology, cancer and regenerative medicine.
Abstract: DNA methylation is essential for normal development and has been implicated in many pathologies including cancer. Our knowledge about the genome-wide distribution of DNA methylation, how it changes during cellular differentiation and how it relates to histone methylation and other chromatin modifications in mammals remains limited. Here we report the generation and analysis of genome-scale DNA methylation profiles at nucleotide resolution in mammalian cells. Using high-throughput reduced representation bisulphite sequencing and single-molecule-based sequencing, we generated DNA methylation maps covering most CpG islands, and a representative sampling of conserved non-coding elements, transposons and other genomic features, for mouse embryonic stem cells, embryonic-stem-cell-derived and primary neural cells, and eight other primary tissues. Several key findings emerge from the data. First, DNA methylation patterns are better correlated with histone methylation patterns than with the underlying genome sequence context. Second, CpG methylation is a dynamic epigenetic mark that undergoes extensive changes during cellular differentiation, particularly in regulatory regions outside of core promoters. Third, analysis of embryonic-stem-cell-derived and primary cells reveals that 'weak' CpG islands associated with a specific set of developmentally regulated genes undergo aberrant hypermethylation during extended proliferation in vitro, in a pattern reminiscent of that reported in some primary tumours. More generally, the results establish reduced representation bisulphite sequencing as a powerful technology for epigenetic profiling of cell populations relevant to developmental biology, cancer and regenerative medicine.

2,482 citations


Journal Article
13 Jun 2008-Science
TL;DR: It is found that the Rag proteins, a family of four related small guanosine triphosphatases (GTPases), interact with mTORC1 in an amino acid-sensitive manner and are necessary for the activation of the mTORC1 pathway by amino acids.
Abstract: The multiprotein mTORC1 protein kinase complex is the central component of a pathway that promotes growth in response to insulin, energy levels, and amino acids and is deregulated in common cancers. We find that the Rag proteins, a family of four related small guanosine triphosphatases (GTPases), interact with mTORC1 in an amino acid-sensitive manner and are necessary for the activation of the mTORC1 pathway by amino acids. A Rag mutant that is constitutively bound to guanosine triphosphate interacted strongly with mTORC1, and its expression within cells made the mTORC1 pathway resistant to amino acid deprivation. Conversely, expression of a guanosine diphosphate-bound Rag mutant prevented stimulation of mTORC1 by amino acids. The Rag proteins do not directly stimulate the kinase activity of mTORC1, but, like amino acids, promote the intracellular localization of mTOR to a compartment that also contains its activator Rheb.

2,451 citations


Journal Article
05 Sep 2008-Cell
TL;DR: The Warburg effect of aerobic glycolysis is re-examined, and a framework for understanding its contribution to the altered metabolism of cancer cells is established.

2,081 citations


Journal Article
TL;DR: EVidenceModeler (EVM), an automated eukaryotic gene structure annotation tool that reports gene structures as a weighted consensus of all available evidence, is presented. Combined with the Program to Assemble Spliced Alignments (PASA), it yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms.
Abstract: EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

1,996 citations


Journal Article
11 Jul 2008-Cell
TL;DR: This work predicts 19 proteins to be important for the function of complex I (CI) of the electron transport chain and validates a subset of these predictions using RNAi, including C8orf38, which is shown to harbour an inherited mutation in a lethal, infantile CI deficiency.

1,836 citations


Journal Article
Hreinn Stefansson, Dan Rujescu, Sven Cichon, Olli Pietiläinen, Andres Ingason, Stacy Steinberg, Ragnheidur Fossdal, Engilbert Sigurdsson, Thordur Sigmundsson, Jacobine E. Buizer-Voskamp, Thomas Hansen, Klaus D. Jakobsen, Pierandrea Muglia, Clyde Francks, Paul M. Matthews, Arnaldur Gylfason, Bjarni V. Halldorsson, Daniel F. Gudbjartsson, Thorgeir E. Thorgeirsson, Asgeir Sigurdsson, Adalbjorg Jonasdottir, Aslaug Jonasdottir, Asgeir Björnsson, Sigurborg Mattiasdottir, Thorarinn Blondal, Magnús Haraldsson, Brynja B. Magnusdottir, Ina Giegling, Hans-Jürgen Möller, Annette M. Hartmann, Kevin V. Shianna, Dongliang Ge, Anna C. Need, Caroline Crombie, Gillian Fraser, Nicholas Walker, Jouko Lönnqvist, Jaana Suvisaari, Annamarie Tuulio-Henriksson, Tiina Paunio, T. Toulopoulou, Elvira Bramon, Marta Di Forti, Robin M. Murray, Mirella Ruggeri, Evangelos Vassos, Sarah Tosato, Muriel Walshe, Tao Li, Catalina Vasilescu, Thomas W. Mühleisen, August G. Wang, Henrik Ullum, Srdjan Djurovic, Ingrid Melle, Jes Olesen, Lambertus A. Kiemeney, Barbara Franke, Chiara Sabatti, Nelson B. Freimer, Jeffrey R. Gulcher, Unnur Thorsteinsdottir, Augustine Kong, Ole A. Andreassen, Roel A. Ophoff, Alexander Georgi, Marcella Rietschel, Thomas Werge, Hannes Petursson, David Goldstein, Markus M. Nöthen, Leena Peltonen, David A. Collier, David St Clair, Kari Stefansson
11 Sep 2008-Nature
TL;DR: In a genome-wide search for CNVs associated with schizophrenia, a population-based sample was used to identify de novo CNVs by analysing 9,878 parent-to-offspring transmissions; three deletions were significantly associated with schizophrenia and related psychoses in the combined sample.
Abstract: Reduced fecundity, associated with severe mental disorders, places negative selection pressure on risk alleles and may explain, in part, why common variants have not been found that confer risk of disorders such as autism, schizophrenia and mental retardation. Thus, rare variants may account for a larger fraction of the overall genetic risk than previously assumed. In contrast to rare single nucleotide mutations, rare copy number variations (CNVs) can be detected using genome-wide single nucleotide polymorphism arrays. This has led to the identification of CNVs associated with mental retardation and autism. In a genome-wide search for CNVs associating with schizophrenia, we used a population-based sample to identify de novo CNVs by analysing 9,878 transmissions from parents to offspring. The 66 de novo CNVs identified were tested for association in a sample of 1,433 schizophrenia cases and 33,250 controls. Three deletions at 1q21.1, 15q11.2 and 15q13.3 showing nominal association with schizophrenia in the first sample (phase I) were followed up in a second sample of 3,285 cases and 7,951 controls (phase II). All three deletions significantly associate with schizophrenia and related psychoses in the combined sample. The identification of these rare, recurrent risk variants, having occurred independently in multiple founders and being subject to negative selection, is important in itself. CNV analysis may also point the way to the identification of additional and more prevalent risk variants in genes and pathways involved in schizophrenia.

1,767 citations


Journal Article
01 Mar 2008-Genetics
TL;DR: A new method, efficient mixed-model association (EMMA), is proposed that corrects for population structure and genetic relatedness in model organism association mapping. It takes advantage of the specific nature of the optimization problem in applying mixed models to association mapping, which substantially increases the computational speed and reliability of the results.
Abstract: Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.
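According to the abstract, EMMA fits a mixed model that accounts for genetic relatedness among strains, and it is fast because it exploits the structure of the variance-component optimization. The Python sketch below covers only the step that follows that optimization: a generalized-least-squares (GLS) test of a single SNP under y = Xβ + u + e, with u ~ N(0, σ_g² K) and e ~ N(0, σ_e² I). It assumes the kinship matrix K and the ratio δ = σ_e²/σ_g² are already known. It does not reproduce EMMA's estimation of δ. The name `gls_snp_test` is hypothetical and is not part of the EMMA R package.

```python
import numpy as np


def gls_snp_test(y: np.ndarray, snp: np.ndarray, K: np.ndarray, delta: float):
    """GLS test of one SNP (hypothetical helper, not EMMA's API).

    Model: y = b0 + b1 * snp + u + e, so Var(y) = sigma_g^2 * H with
    H = K + delta * I. K is a kinship matrix and delta = sigma_e^2 / sigma_g^2
    is assumed to have been estimated beforehand (that step is not shown).
    Returns the coefficients (b0, b1) and the F(1, n - 2) statistic for b1.
    """
    n = y.shape[0]
    X = np.column_stack([np.ones(n), snp])
    H = K + delta * np.eye(n)
    # Whiten with the Cholesky factor H = L L^T so the errors become i.i.d.;
    # ordinary least squares on the whitened data equals GLS on the original.
    L = np.linalg.cholesky(H)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof  # estimate of sigma_g^2
    cov = sigma2 * np.linalg.inv(Xw.T @ Xw)
    t = beta[1] / np.sqrt(cov[1, 1])
    return beta, float(t * t)  # F(1, dof) equals t^2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40
    G = rng.normal(size=(n, 10))
    K = G @ G.T / 10  # hypothetical kinship-like matrix for illustration
    snp = np.tile([0.0, 1.0], n // 2)
    noise = rng.multivariate_normal(np.zeros(n), K + 0.5 * np.eye(n))
    y = 1.0 + 0.8 * snp + noise
    beta, F = gls_snp_test(y, snp, K, delta=0.5)
    print("beta =", beta, "F =", F)
```

Whitening with a Cholesky factor avoids forming H⁻¹ explicitly. When K is the identity, the model has no relatedness structure and the test reduces to ordinary least squares. The speed-ups described in the EMMA paper concern estimating δ efficiently across many SNPs, which this per-SNP sketch does not attempt.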

Journal Article
Jennifer Stone, Michael Conlon O'Donovan, Hugh Gurling, George Kirov, Douglas Blackwood, Aiden Corvin, Nicholas John Craddock, Michael Gill, Christina M. Hultman, Paul Lichtenstein, Andrew McQuillin, Carlos N. Pato, Douglas M. Ruderfer, Michael John Owen, David St Clair, Patrick F. Sullivan, Pamela Sklar, Shaun Purcell, Joshua M. Korn, Stuart MacGregor, Derek W. Morris, Colm O'Dushlaine, Mark J. Daly, Peter M. Visscher, Peter Holmans, Edward M. Scolnick, Nigel Williams, Lucy Georgieva, Ivan Nikolov, Nadine Norton, Hywel Williams, Draga Toncheva, Vihra Milanova, Emma Flordal Thelander, Patrick Sullivan, Elaine Kenny, John L. Waddington, Khalid Choudhury, Susmita Datta, Jonathan Pimm, Srinivasa Thirumalai, Vinay Puri, Robert Krasucki, Jacob Lawrence, Digby Quested, Nicholas Bass, David Curtis, Caroline Crombie, Gillian Fraser, Soh Leh Kwan, Nicholas Walker, Walter J. Muir, Kevin A. McGhee, Ben S. Pickard, P. Malloy, Alan W Maclean, Margaret Van Beck, Michele T. Pato, Helena Medeiros, Frank A. Middleton, Célia Barreto Carvalho, Christopher P. Morley, Ayman H. Fanous, David V. Conti, James A. Knowles, Carlos Ferreira, António Macedo, M. Helena Azevedo, Steve McCarroll, Kimberly Chambert, Casey Gates, Stacey Gabriel, Scott Mahon, Kristen Ardlie
11 Sep 2008-Nature
TL;DR: A genome-wide survey of rare CNVs in 3,391 patients with schizophrenia and 3,181 ancestrally matched controls provides strong support for a model of schizophrenia pathogenesis that includes the effects of multiple rare structural variants, both genome-wide and at specific loci.
Abstract: Schizophrenia is a severe mental disorder marked by hallucinations, delusions, cognitive deficits and apathy, with a heritability estimated at 73-90% (ref. 1). Inheritance patterns are complex, and the number and type of genetic variants involved are not understood. Copy number variants (CNVs) have been identified in individual patients with schizophrenia (refs 2-7) and also in neurodevelopmental disorders (refs 8-11), but large-scale genome-wide surveys have not been performed. Here we report a genome-wide survey of rare CNVs in 3,391 patients with schizophrenia and 3,181 ancestrally matched controls, using high-density microarrays. For CNVs that were observed in less than 1% of the sample and were more than 100 kilobases in length, the total burden is increased 1.15-fold in patients with schizophrenia in comparison with controls. This effect was more pronounced for rarer, single-occurrence CNVs and for those that involved genes as opposed to those that did not. As expected, deletions were found within the region critical for velo-cardio-facial syndrome, which includes psychotic symptoms in 30% of patients (ref. 12). Associations with schizophrenia were also found for large deletions on chromosome 15q13.3 and 1q21.1. These associations have not previously been reported, and they remained significant after genome-wide correction. Our results provide strong support for a model of schizophrenia pathogenesis that includes the effects of multiple rare structural variants, both genome-wide and at specific loci.

Journal Article
07 Nov 2008-Science
TL;DR: The intellectual foundations of genetic mapping of Mendelian and complex traits in humans are discussed, lessons emerging from linkage analysis of Mendelian diseases and genome-wide association studies of common diseases are examined, and questions and challenges that lie ahead are discussed.
Abstract: Genetic mapping provides a powerful approach to identify genes and biological processes underlying any trait influenced by inheritance, including human diseases. We discuss the intellectual foundations of genetic mapping of Mendelian and complex traits in humans, examine lessons emerging from linkage analysis of Mendelian diseases and genome-wide association studies of common diseases, and discuss questions and challenges that lie ahead.

Journal Article
TL;DR: In this paper, the E-cadherin binding partner beta-catenin was found to be necessary, but not sufficient, for the induction of anoikis resistance.
Abstract: Loss of the epithelial adhesion molecule E-cadherin is thought to enable metastasis by disrupting intercellular contacts, an early step in metastatic dissemination. To further investigate the molecular basis of this notion, we use two methods to inhibit E-cadherin function that distinguish between E-cadherin's cell-cell adhesion and intracellular signaling functions. Whereas the disruption of cell-cell contacts alone does not enable metastasis, the loss of E-cadherin protein does, through induction of an epithelial-to-mesenchymal transition, invasiveness, and anoikis resistance. We find the E-cadherin binding partner beta-catenin to be necessary, but not sufficient, for induction of these phenotypes. In addition, gene expression analysis shows that E-cadherin loss results in the induction of multiple transcription factors, at least one of which, Twist, is necessary for E-cadherin loss-induced metastasis. These findings indicate that E-cadherin loss in tumors contributes to metastatic dissemination by inducing wide-ranging transcriptional and functional changes.

Journal Article
03 Jul 2008-Nature
TL;DR: It is demonstrated that RNA inhibition of transcription factors can facilitate reprogramming, and that treatment with DNA methyltransferase inhibitors can improve the overall efficiency of the reprogramming process.
Abstract: Somatic cells can be reprogrammed to a pluripotent state through the ectopic expression of defined transcription factors. Understanding the mechanism and kinetics of this transformation may shed light on the nature of developmental potency and suggest strategies with improved efficiency or safety. Here we report an integrative genomic analysis of reprogramming of mouse fibroblasts and B lymphocytes. Lineage-committed cells show a complex response to the ectopic expression involving induction of genes downstream of individual reprogramming factors. Fully reprogrammed cells show gene expression and epigenetic states that are highly similar to embryonic stem cells. In contrast, stable partially reprogrammed cell lines show reactivation of a distinctive subset of stem-cell-related genes, incomplete repression of lineage-specifying transcription factors, and DNA hypermethylation at pluripotency-related loci. These observations suggest that some cells may become trapped in partially reprogrammed states owing to incomplete repression of transcription factors, and that DNA de-methylation is an inefficient step in the transition to pluripotency. We demonstrate that RNA inhibition of transcription factors can facilitate reprogramming, and that treatment with DNA methyltransferase inhibitors can improve the overall efficiency of the reprogramming process.

Journal Article
07 Aug 2008-Oncogene
TL;DR: It is shown that BIBW2992, an anilino-quinazoline designed to irreversibly bind EGFR and HER2, potently suppresses the kinase activity of wild-type and activated EGFR and HER2 mutants, including erlotinib-resistant isoforms.
Abstract: Genetic alterations in the kinase domain of the epidermal growth factor receptor (EGFR) in non-small cell lung cancer (NSCLC) patients are associated with sensitivity to treatment with small molecule tyrosine kinase inhibitors. Although first-generation reversible, ATP-competitive inhibitors showed encouraging clinical responses in lung adenocarcinoma tumors harboring such EGFR mutations, almost all patients developed resistance to these inhibitors over time. Such resistance to first-generation EGFR inhibitors was frequently linked to an acquired T790M point mutation in the kinase domain of EGFR, or upregulation of signaling pathways downstream of HER3. Overcoming these mechanisms of resistance, as well as primary resistance to reversible EGFR inhibitors driven by a subset of EGFR mutations, will be necessary for development of an effective targeted therapy regimen. Here, we show that BIBW2992, an anilino-quinazoline designed to irreversibly bind EGFR and HER2, potently suppresses the kinase activity of wild-type and activated EGFR and HER2 mutants, including erlotinib-resistant isoforms. Consistent with this activity, BIBW2992 suppresses transformation in isogenic cell-based assays, inhibits survival of cancer cell lines and induces tumor regression in xenograft and transgenic lung cancer models, with superior activity over erlotinib. These findings encourage further testing of BIBW2992 in lung cancer patients harboring EGFR or HER2 oncogenes.

Journal Article
TL;DR: The results show that numerous genes, some with known immune-related functions, predispose to SLE, and evidence of association with replication is found at FCGR2A, PTPN22 and STAT4, regions previously associated with SLE and other autoimmune diseases.
Abstract: Systemic lupus erythematosus (SLE) is a common systemic autoimmune disease with complex etiology but strong clustering in families (lambda(S) = approximately 30). We performed a genome-wide association scan using 317,501 SNPs in 720 women of European ancestry with SLE and in 2,337 controls, and we genotyped consistently associated SNPs in two additional independent sample sets totaling 1,846 affected women and 1,825 controls. Aside from the expected strong association between SLE and the HLA region on chromosome 6p21 and the previously confirmed non-HLA locus IRF5 on chromosome 7q32, we found evidence of association with replication (1.1 x 10(-7) or =9 other loci (P < 2 x 10(-7)). Our results show that numerous genes, some with known immune-related functions, predispose to SLE.

Journal Article
24 Apr 2008-Nature
TL;DR: Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products.
Abstract: Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.

Journal Article
01 May 2008-Nature
TL;DR: This work employs a clone-based method to interrogate intermediate structural variation in eight individuals of diverse geographic ancestry and provides the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
Abstract: Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale, particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation, a standard for genotyping platforms and a prelude to future individual genome sequencing projects.

Journal Article
TL;DR: A region of strong association with bipolar disorder is identified in ANK3, further support is found for the previously reported association at CACNA1C, and the results suggest that ion channelopathies may be involved in the pathogenesis of bipolar disorder.
Abstract: To identify susceptibility loci for bipolar disorder, we tested 1.8 million variants in 4,387 cases and 6,209 controls and identified a region of strong association (rs10994336, P = 9.1 x 10(-9)) in ANK3 (ankyrin G). We also found further support for the previously reported CACNA1C (alpha 1C subunit of the L-type voltage-gated calcium channel; combined P = 7.0 x 10(-8), rs1006737). Our results suggest that ion channelopathies may be involved in the pathogenesis of bipolar disorder.

Journal Article
TL;DR: A large, training-testing, multi-site, blinded validation study characterizes the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas and provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas.
Abstract: Although prognostic gene expression signatures for survival in early-stage lung cancer have been proposed, for clinical application, it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site, blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether microarray measurements of gene expression either alone or combined with basic clinical covariates (stage, age, sex) could be used to predict overall survival in lung cancer subjects. Several models examined produced risk scores that substantially correlated with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early-stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas.

Journal Article
TL;DR: It is proposed that large CpG islands depleted of activating motifs confer epigenetic memory by recruiting the full repertoire of Polycomb complexes in pluripotent cells.
Abstract: In embryonic stem (ES) cells, bivalent chromatin domains with overlapping repressive (H3 lysine 27 tri-methylation) and activating (H3 lysine 4 tri-methylation) histone modifications mark the promoters of more than 2,000 genes. To gain insight into the structure and function of bivalent domains, we mapped key histone modifications and subunits of Polycomb-repressive complexes 1 and 2 (PRC1 and PRC2) genomewide in human and mouse ES cells by chromatin immunoprecipitation followed by ultra-high-throughput sequencing. We find that bivalent domains can be segregated into two classes: the first occupied by both PRC2 and PRC1 (PRC1-positive) and the second specifically bound by PRC2 (PRC2-only). PRC1-positive bivalent domains appear functionally distinct as they more efficiently retain lysine 27 tri-methylation upon differentiation, show stringent conservation of chromatin state, and associate with an overwhelming number of developmental regulator gene promoters. We also used computational genomics to search for sequence determinants of Polycomb binding. This analysis revealed that the genomewide locations of PRC2 and PRC1 can be largely predicted from the locations, sizes, and underlying motif contents of CpG islands. We propose that large CpG islands depleted of activating motifs confer epigenetic memory by recruiting the full repertoire of Polycomb complexes in pluripotent cells.

Journal ArticleDOI
TL;DR: The first known small-molecule inhibitor of BMP signaling, dorsomorphin, is described. It was identified in a screen for compounds that perturb dorsoventral axis formation in zebrafish and was found to selectively inhibit the BMP type I receptors ALK2, ALK3 and ALK6, thereby blocking BMP-mediated SMAD1/5/8 phosphorylation, target gene transcription and osteogenic differentiation.
Abstract: Bone morphogenetic protein (BMP) signals coordinate developmental patterning and have essential physiological roles in mature organisms. Here we describe the first known small-molecule inhibitor of BMP signaling—dorsomorphin, which we identified in a screen for compounds that perturb dorsoventral axis formation in zebrafish. We found that dorsomorphin selectively inhibits the BMP type I receptors ALK2, ALK3 and ALK6 and thus blocks BMP-mediated SMAD1/5/8 phosphorylation, target gene transcription and osteogenic differentiation. Using dorsomorphin, we examined the role of BMP signaling in iron homeostasis. In vitro, dorsomorphin inhibited BMP-, hemojuvelin- and interleukin 6–stimulated expression of the systemic iron regulator hepcidin, which suggests that BMP receptors regulate hepcidin induction by all of these stimuli. In vivo, systemic challenge with iron rapidly induced SMAD1/5/8 phosphorylation and hepcidin expression in the liver, whereas treatment with dorsomorphin blocked SMAD1/5/8 phosphorylation, normalized hepcidin expression and increased serum iron levels. These findings suggest an essential physiological role for hepatic BMP signaling in iron-hepcidin homeostasis.

Journal ArticleDOI
TL;DR: A map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1% is developed, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported.
Abstract: Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.

Journal ArticleDOI
TL;DR: A general method for genome assembly is described that can be applied to all types of DNA sequence data: not only short-read data but also conventional sequence reads.
Abstract: New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun “microreads.” For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80× coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data: not only short-read data but also conventional sequence reads.
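To make the scale of the simulated data concrete, here is a back-of-envelope sketch (not from the paper) of how many 30-base reads 80× coverage implies for the largest, 39 Mb genome. It assumes that "Mb" means 10^6 bases, that 39 Mb is an exact size, and that coverage counts every base of both reads in a pair; the function name `reads_needed` is hypothetical.

```python
# Back-of-envelope read-count arithmetic for a simulated shotgun data set.
# Assumptions (not stated in the abstract): 1 Mb = 10**6 bases, the genome
# size is exactly 39 Mb, and coverage counts both reads of every pair.

def reads_needed(genome_bases: int, coverage: float, read_length: int) -> int:
    """Reads required so that total sequenced bases = coverage * genome size."""
    return round(coverage * genome_bases / read_length)

genome_bases = 39 * 10**6   # largest genome in the study (assumed exact)
coverage = 80               # fold coverage of the simulated reads
read_length = 30            # bases per simulated read

reads = reads_needed(genome_bases, coverage, read_length)
pairs = reads // 2          # paired reads: two reads per fragment
print(reads, pairs)         # 104000000 52000000
```

Under these assumptions the largest data set corresponds to roughly 104 million reads, or about 52 million read pairs.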

Journal ArticleDOI
17 Jan 2008-Nature
TL;DR: It is found that partial loss of function of the ribosomal subunit protein RPS14 phenocopies the disease in normal haematopoietic progenitor cells, and also that forced expression of RPS14 rescues the disease phenotype in patient-derived bone marrow cells.
Abstract: Somatic chromosomal deletions in cancer are thought to indicate the location of tumour suppressor genes, by which a complete loss of gene function occurs through biallelic deletion, point mutation or epigenetic silencing, thus fulfilling Knudson's two-hit hypothesis. In many recurrent deletions, however, such biallelic inactivation has not been found. One prominent example is the 5q- syndrome, a subtype of myelodysplastic syndrome characterized by a defect in erythroid differentiation. Here we describe an RNA-mediated interference (RNAi)-based approach to discovery of the 5q- disease gene. We found that partial loss of function of the ribosomal subunit protein RPS14 phenocopies the disease in normal haematopoietic progenitor cells, and also that forced expression of RPS14 rescues the disease phenotype in patient-derived bone marrow cells. In addition, we identified a block in the processing of pre-ribosomal RNA in RPS14-deficient cells that is functionally equivalent to the defect in Diamond-Blackfan anaemia, linking the molecular pathophysiology of the 5q- syndrome to a congenital syndrome causing bone marrow failure. These results indicate that the 5q- syndrome is caused by a defect in ribosomal protein function and suggest that RNAi screening is an effective strategy for identifying causal haploinsufficiency disease genes.

Journal ArticleDOI
16 Oct 2008-Nature
TL;DR: In this paper, the authors reported the detection of previously unknown mutations in the ALK gene, which encodes a receptor tyrosine kinase, in 8% of primary neuroblastomas.
Abstract: Neuroblastoma, an embryonal tumour of the peripheral sympathetic nervous system, accounts for approximately 15% of all deaths due to childhood cancer. High-risk neuroblastomas are rapidly progressive; even with intensive myeloablative chemotherapy, relapse is common and almost uniformly fatal. Here we report the detection of previously unknown mutations in the ALK gene, which encodes a receptor tyrosine kinase, in 8% of primary neuroblastomas. Five non-synonymous sequence variations were identified in the kinase domain of ALK, of which three were somatic and two were germ line. The most frequent mutation, F1174L, was also identified in three different neuroblastoma cell lines. ALK complementary DNAs encoding the F1174L and R1275Q variants, but not the wild-type ALK cDNA, transformed interleukin-3-dependent murine haematopoietic Ba/F3 cells to cytokine-independent growth. Ba/F3 cells expressing these mutations were sensitive to the small-molecule inhibitor of ALK, TAE684 (ref. 4). Furthermore, two human neuroblastoma cell lines harbouring the F1174L mutation were also sensitive to the inhibitor. Cytotoxicity was associated with increased amounts of apoptosis as measured by TdT-mediated dUTP nick end labelling (TUNEL). Short hairpin RNA (shRNA)-mediated knockdown of ALK expression in neuroblastoma cell lines with the F1174L mutation also resulted in apoptosis and impaired cell proliferation. Thus, activating alleles of the ALK receptor tyrosine kinase are present in primary neuroblastoma tumours and in established neuroblastoma cell lines, and confer sensitivity to ALK inhibition with small molecules, providing a molecular rationale for targeted therapy of this disease.

Journal ArticleDOI
19 Dec 2008-Science
TL;DR: BCL11A emerges as a therapeutic target for reactivation of HbF in β-hemoglobin disorders; it occupies several discrete sites in the β-globin gene cluster, consistent with a direct role of BCL11A in globin gene regulation.
Abstract: Differences in the amount of fetal hemoglobin (HbF) that persists into adulthood affect the severity of sickle cell disease and the beta-thalassemia syndromes. Genetic association studies have identified sequence variants in the gene BCL11A that influence HbF levels. Here, we examine BCL11A as a potential regulator of HbF expression. The high-HbF BCL11A genotype is associated with reduced BCL11A expression. Moreover, abundant expression of full-length forms of BCL11A is developmentally restricted to adult erythroid cells. Down-regulation of BCL11A expression in primary adult erythroid cells leads to robust HbF expression. Consistent with a direct role of BCL11A in globin gene regulation, we find that BCL11A occupies several discrete sites in the beta-globin gene cluster. BCL11A emerges as a therapeutic target for reactivation of HbF in beta-hemoglobin disorders.

Journal ArticleDOI
05 Jun 2008-Nature
TL;DR: It is shown that Drosophila generates a third small RNA class, endogenous small interfering RNAs, in both gonadal and somatic tissues, adding a class that blurs distinctions based on known biogenesis mechanisms and functional roles.
Abstract: Drosophila endogenous small RNAs are categorized according to their mechanisms of biogenesis and the Argonaute protein to which they bind. MicroRNAs are a class of ubiquitously expressed RNAs of approximately 22 nucleotides in length, which arise from structured precursors through the action of Drosha-Pasha and Dicer-1-Loquacious complexes. These join Argonaute-1 to regulate gene expression. A second endogenous small RNA class, the Piwi-interacting RNAs, bind Piwi proteins and suppress transposons. Piwi-interacting RNAs are restricted to the gonad, and at least a subset of these arises by Piwi-catalysed cleavage of single-stranded RNAs. Here we show that Drosophila generates a third small RNA class, endogenous small interfering RNAs, in both gonadal and somatic tissues. Production of these RNAs requires Dicer-2, but a subset depends preferentially on Loquacious rather than the canonical Dicer-2 partner, R2D2 (ref. 14). Endogenous small interfering RNAs arise both from convergent transcription units and from structured genomic loci in a tissue-specific fashion. They predominantly join Argonaute-2 and have the capacity, as a class, to target both protein-coding genes and mobile elements. These observations expand the repertoire of small RNAs in Drosophila, adding a class that blurs distinctions based on known biogenesis mechanisms and functional roles.

Journal ArticleDOI
TL;DR: A comparison of the strongest associations with the genome-wide scan of 1868 patients with BP disorder and 2938 controls who completed the scan as part of the Wellcome Trust Case–Control Consortium indicates concordant signals for SNPs within the voltage-dependent calcium channel, L-type, alpha 1C subunit (CACNA1C) gene.
Abstract: We performed a genome-wide association scan in 1461 patients with bipolar (BP) 1 disorder and 2008 controls drawn from the Systematic Treatment Enhancement Program for Bipolar Disorder and the University College London sample collections, with successful genotyping for 372,193 single nucleotide polymorphisms (SNPs). Our strongest single-SNP results are found in myosin 5B (MYO5B; P = 1.66 × 10⁻⁷) and tetraspanin-8 (TSPAN8; P = 6.11 × 10⁻⁷). Haplotype analysis further supported the single-SNP results, highlighting MYO5B, TSPAN8 and the epidermal growth factor receptor (MYO5B, P = 2.04 × 10⁻⁸; TSPAN8, P = 7.57 × 10⁻⁷; EGFR, P = 8.36 × 10⁻⁸). For replication, we genotyped 304 SNPs in family-based NIMH samples (n = 409 trios) and University of Edinburgh case-control samples (n = 365 cases, 351 controls); these samples did not provide independent replication after correction for multiple testing. A comparison of our strongest associations with the genome-wide scan of 1868 patients with BP disorder and 2938 controls who completed the scan as part of the Wellcome Trust Case-Control Consortium indicates concordant signals for SNPs within the voltage-dependent calcium channel, L-type, alpha 1C subunit (CACNA1C) gene. Given the heritability of BP disorder, the lack of agreement between studies emphasizes that susceptibility alleles are likely to be modest in effect size and require even larger samples for detection.