Showing papers by "Broad Institute published in 2013"


Journal ArticleDOI
TL;DR: TopHat2 is described, which incorporates many significant enhancements to TopHat, and combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes.
Abstract: TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.

11,380 citations


Journal ArticleDOI
Ludmil B. Alexandrov1, Serena Nik-Zainal2,3, David C. Wedge1, Samuel Aparicio4, Sam Behjati5,1, Andrew V. Biankin, Graham R. Bignell1, Niccolo Bolli5,1, Åke Borg3, Anne Lise Børresen-Dale6,7, Sandrine Boyault8, Birgit Burkhardt8, Adam Butler1, Carlos Caldas9, Helen Davies1, Christine Desmedt, Roland Eils5, Jorunn E. Eyfjord10, John A. Foekens11, Mel Greaves12, Fumie Hosoda13, Barbara Hutter5, Tomislav Ilicic1, Sandrine Imbeaud14,15, Marcin Imielinski14, Natalie Jäger5, David T. W. Jones16, David T. Jones1, Stian Knappskog11,17, Marcel Kool11, Sunil R. Lakhani18, Carlos López-Otín18, Sancha Martin1, Nikhil C. Munshi19,20, Hiromi Nakamura13, Paul A. Northcott16, Marina Pajic21, Elli Papaemmanuil1, Angelo Paradiso22, John V. Pearson23, Xose S. Puente18, Keiran Raine1, Manasa Ramakrishna1, Andrea L. Richardson20,22, Julia Richter22, Philip Rosenstiel22, Matthias Schlesner5, Ton N. Schumacher24, Paul N. Span25, Jon W. Teague1, Yasushi Totoki13, Andrew Tutt24, Rafael Valdés-Mas18, Marit M. van Buuren25, Laura van ’t Veer26, Anne Vincent-Salomon27, Nicola Waddell23, Lucy R. Yates1, ICGC PedBrain24, Jessica Zucman-Rossi15,14, P. Andrew Futreal1, Ultan McDermott1, Peter Lichter24, Matthew Meyerson20,14, Sean M. Grimmond23, Reiner Siebert22, Elias Campo28, Tatsuhiro Shibata13, Stefan M. Pfister11,16, Peter J. Campbell29,2,30, Michael R. Stratton2,31
22 Aug 2013-Nature
TL;DR: It is shown that hypermutation localized to small genomic regions, ‘kataegis’, is found in many cancer types, and the results reveal the diversity of mutational processes underlying the development of cancer.
Abstract: All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.

7,904 citations


Journal ArticleDOI
TL;DR: The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution.
Abstract: Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today’s sequencing and array-based profiling methods present major challenges to visualization tools. The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues. To that end, IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems. IGV is freely available for download from http://www.broadinstitute.org/igv, under a GNU LGPL open-source license.

6,930 citations


Journal ArticleDOI
TL;DR: The results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.
Abstract: Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community's functional capabilities. Here we describe PICRUSt (phylogenetic investigation of communities by reconstruction of unobserved states), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.

6,860 citations
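
The PICRUSt abstract above describes predicting which gene families each organism carries and then combining them into a composite metagenome. The combining step can be sketched roughly as a weighted sum, shown below. This is only an illustration, not the PICRUSt implementation: the function name, the dictionary layout and the division by a per-organism marker-gene copy number are assumptions made for the example.

```python
def predict_metagenome(otu_counts, gene_copies, marker_copies):
    """Combine per-organism gene-family predictions into one sample-level estimate.

    otu_counts:    {otu_id: marker-gene read count in the sample}
    gene_copies:   {otu_id: [predicted copies of each gene family]}
    marker_copies: {otu_id: predicted marker-gene copies per genome}
    All three inputs are hypothetical structures chosen for this sketch.
    """
    n_families = len(next(iter(gene_copies.values())))
    totals = [0.0] * n_families
    for otu, count in otu_counts.items():
        # Assumption: marker reads are converted to approximate organism counts
        # by dividing by the marker-gene copy number of that organism.
        organisms = count / marker_copies[otu]
        for i, copies in enumerate(gene_copies[otu]):
            totals[i] += organisms * copies
    return totals
```

For instance, two organisms with marker counts 10 and 4, marker copy numbers 2 and 4, and gene-family vectors `[1, 0]` and `[2, 3]` contribute 5 and 1 organism equivalents respectively, so the sample-level estimate is the sum of their scaled gene-family vectors.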


Journal ArticleDOI
TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

6,369 citations


Journal ArticleDOI
TL;DR: This unit describes how to use BWA and the Genome Analysis Toolkit to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses.
Abstract: This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.

5,150 citations


Journal ArticleDOI
TL;DR: ESTIMATE is a method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples; its prediction accuracy is corroborated using 3,809 transcriptional profiles available elsewhere in the public domain.
Abstract: Infiltrating stromal and immune cells form the major fraction of normal cells in tumour tissue and not only perturb the tumour signal in molecular studies but also have an important role in cancer biology. Here we describe 'Estimation of STromal and Immune cells in MAlignant Tumours using Expression data' (ESTIMATE)--a method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples. ESTIMATE scores correlate with DNA copy number-based tumour purity across samples from 11 different tumour types, profiled on Agilent, Affymetrix platforms or based on RNA sequencing and available through The Cancer Genome Atlas. The prediction accuracy is further corroborated using 3,809 transcriptional profiles available elsewhere in the public domain. The ESTIMATE method allows consideration of tumour-associated normal cells in genomic and transcriptomic studies. An R-library is available on https://sourceforge.net/projects/estimateproject/.

4,651 citations


Journal ArticleDOI
Michael S. Lawrence1, Petar Stojanov1,2, Paz Polak1,2,3, Gregory V. Kryukov1,2,3, Kristian Cibulskis1, Andrey Sivachenko1, Scott L. Carter1, Chip Stewart1, Craig H. Mermel1,2, Steven A. Roberts4, Adam Kiezun1, Peter S. Hammerman1,2, Aaron McKenna5,1, Yotam Drier, Lihua Zou1, Alex H. Ramos1, Trevor J. Pugh2,1, Nicolas Stransky1, Elena Helman6,1, Jaegil Kim1, Carrie Sougnez1, Lauren Ambrogio1, Elizabeth Nickerson1, Erica Shefler1, Maria L. Cortes1, Daniel Auclair1, Gordon Saksena1, Douglas Voet1, Michael S. Noble1, Daniel DiCara1, Pei Lin1, Lee Lichtenstein1, David I. Heiman1, Timothy Fennell1, Marcin Imielinski2,1, Bryan Hernandez1, Eran Hodis2,1, Sylvan C. Baca2,1, Austin M. Dulak2,1, Jens G. Lohr1,2, Dan A. Landau1,2,7, Catherine J. Wu2, Jorge Melendez-Zajgla, Alfredo Hidalgo-Miranda, Amnon Koren1,2, Steven A. McCarroll2,1, Jaume Mora8, Ryan S. Lee2,9, Brian D. Crompton9,2, Robert C. Onofrio1, Melissa Parkin1, Wendy Winckler1, Kristin G. Ardlie1, Stacey Gabriel1, Charles W. M. Roberts2,9, Jaclyn A. Biegel10, Kimberly Stegmaier2,9,1, Adam J. Bass2,1, Levi A. Garraway1,2, Matthew Meyerson2,1, Todd R. Golub, Dmitry A. Gordenin4, Shamil R. Sunyaev3,2,1, Eric S. Lander6,2,1, Gad Getz1,2
11 Jul 2013-Nature
TL;DR: A fundamental problem with cancer genome studies is described: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds and the list includes many implausible genes, suggesting extensive false-positive findings that overshadow true driver events.
Abstract: Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.

4,411 citations


Journal ArticleDOI
TL;DR: In this article, the Streptococcus pyogenes Cas9 (SpCas9) nuclease can be efficiently targeted to genomic loci by means of single-guide RNAs (sgRNAs) to enable genome editing.
Abstract: The Streptococcus pyogenes Cas9 (SpCas9) nuclease can be efficiently targeted to genomic loci by means of single-guide RNAs (sgRNAs) to enable genome editing. Here, we characterize SpCas9 targeting specificity in human cells to inform the selection of target sites and avoid off-target effects. Our study evaluates >700 guide RNA variants and SpCas9-induced indel mutation levels at >100 predicted genomic off-target loci in 293T and 293FT cells. We find that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches. We also show that SpCas9-mediated cleavage is unaffected by DNA methylation and that the dosage of SpCas9 and sgRNA can be titrated to minimize off-target modification. To facilitate mammalian genome engineering applications, we provide a web-based software tool to guide the selection and validation of target sequences as well as off-target analyses.

4,113 citations
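
The SpCas9 abstract above reports that mismatch tolerance depends on the number, position and distribution of mismatches between the guide and the target. A minimal sketch of how those three quantities might be tabulated for a candidate off-target site follows. It is not the authors' web tool or scoring scheme: the function name, the 1-based numbering from the 5' end of the guide and the use of DNA-alphabet strings for both sequences are assumptions for illustration.

```python
def mismatch_profile(guide, target):
    """Summarize mismatches between a guide sequence and an equal-length target site.

    Both sequences are given 5'->3' in the same alphabet (an assumption of this
    sketch). Positions are 1-based, counted from the 5' end of the guide.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    positions = [
        i + 1
        for i, (g, t) in enumerate(zip(guide.upper(), target.upper()))
        if g != t
    ]
    # Spacing between consecutive mismatches is one simple view of their distribution.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "count": len(positions),
        "positions": positions,
        "min_spacing": min(gaps) if gaps else None,
    }
```

A profile like this only describes a candidate site; turning it into a predicted cleavage level would need the empirical data described in the abstract.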


Journal ArticleDOI
TL;DR: The MuTect algorithm for calling somatic point mutations enables subclonal analysis of the whole-genome or whole-exome sequencing data being generated in large-scale cancer genomics projects.
Abstract: The MuTect algorithm for calling somatic point mutations enables subclonal analysis of the whole-genome or whole-exome sequencing data being generated in large-scale cancer genomics projects.

3,773 citations


Journal ArticleDOI
TL;DR: In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10^−8) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer's disease.
Abstract: Eleven susceptibility loci for late-onset Alzheimer's disease (LOAD) were identified by previous studies; however, a large portion of the genetic risk for this disease remains unexplained. We conducted a large, two-stage meta-analysis of genome-wide association studies (GWAS) in individuals of European ancestry. In stage 1, we used genotyped and imputed data (7,055,881 SNPs) to perform meta-analysis on 4 previously published GWAS data sets consisting of 17,008 Alzheimer's disease cases and 37,154 controls. In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer's disease cases and 11,312 controls. In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10^−8) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer's disease.
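
The abstract above combines per-study association results and applies the genome-wide significance threshold of 5 × 10^-8. The abstract does not say which meta-analysis model was used, so the sketch below assumes a standard inverse-variance-weighted fixed-effect combination purely for illustration; the function names and input layout are hypothetical.

```python
import math

GENOME_WIDE_P = 5e-8  # threshold quoted in the abstract


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas and standard_errors hold one effect estimate per study. This model
    is an assumption of the sketch, not a statement of the authors' method.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


def is_genome_wide_significant(p):
    return p < GENOME_WIDE_P
```

Combining studies shrinks the standard error of the pooled estimate, which is why a signal too weak to pass the threshold in any single study can still pass it after pooling.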

Journal ArticleDOI
10 Oct 2013-Cell
TL;DR: Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM.

01 Sep 2013
TL;DR: It is found that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches.

Journal ArticleDOI
09 May 2013-Cell
TL;DR: The CRISPR/Cas system allows the one-step generation of animals carrying mutations in multiple genes, an approach that will greatly accelerate the in vivo study of functionally redundant genes and of epistatic gene interactions.

Journal ArticleDOI
12 Sep 2013-Cell
TL;DR: An approach that combines a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks is described; paired nicking reduces off-target activity in cell lines and facilitates gene knockout in mouse zygotes.

Journal ArticleDOI
TL;DR: It is demonstrated that high-quality read length and abundance are the primary factors differentiating correct from erroneous reads produced by Illumina GAIIx, HiSeq and MiSeq instruments.
Abstract: High-throughput sequencing has revolutionized microbial ecology, but read quality remains a considerable barrier to accurate taxonomy assignment and α-diversity assessment for microbial communities. We demonstrate that high-quality read length and abundance are the primary factors differentiating correct from erroneous reads produced by Illumina GAIIx, HiSeq and MiSeq instruments. We present guidelines for user-defined quality-filtering strategies, enabling efficient extraction of high-quality data and facilitating interpretation of Illumina sequencing results.
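
The abstract above names two factors that separate correct from erroneous reads: the length of the high-quality portion of a read and the abundance of the resulting sequence. A rough sketch of a filter built on those two ideas is shown below. The threshold values, function names and the Phred+33 quality encoding are assumptions for illustration and are not the guideline values from the paper.

```python
from collections import Counter


def phred_scores(quality, offset=33):
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def truncate_read(seq, quality, min_q=20, min_fraction=0.75):
    """Cut a read at its first low-quality base; drop it if too little remains.

    min_q and min_fraction are hypothetical settings, not recommended values.
    """
    keep = len(seq)
    for i, q in enumerate(phred_scores(quality)):
        if q < min_q:
            keep = i
            break
    if keep < min_fraction * len(seq):
        return None
    return seq[:keep]


def filter_by_abundance(reads, min_count=2):
    """Keep only sequences observed at least min_count times."""
    counts = Counter(reads)
    return [read for read in reads if counts[read] >= min_count]
```

In practice the two steps would run in sequence: truncate and discard by quality first, then remove sequences that remain rare across the data set.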

Journal ArticleDOI
TL;DR: It is shown that the CRISPR-Cas system functions in vivo to induce targeted genetic modifications in zebrafish embryos with efficiencies similar to those obtained using zinc finger nucleases and transcription activator-like effector nucleases.
Abstract: In bacteria, foreign nucleic acids are silenced by clustered, regularly interspaced, short palindromic repeats (CRISPR)--CRISPR-associated (Cas) systems. Bacterial type II CRISPR systems have been adapted to create guide RNAs that direct site-specific DNA cleavage by the Cas9 endonuclease in cultured cells. Here we show that the CRISPR-Cas system functions in vivo to induce targeted genetic modifications in zebrafish embryos with efficiencies similar to those obtained using zinc finger nucleases and transcription activator-like effector nucleases.

Journal ArticleDOI
Cristen J. Willer1, Ellen M. Schmidt1, Sebanti Sengupta1, Gina M. Peloso2 +316 more; Institutions (87)
TL;DR: It is found that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index.
Abstract: Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10^−8, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research.

Journal ArticleDOI
TL;DR: Key concepts in the function of DNA methylation in mammals are discussed, stemming from more than two decades of research, including many recent studies that have elucidated when and where DNA methylation has a regulatory role in the genome.
Abstract: DNA methylation is among the best studied epigenetic modifications and is essential to mammalian development. Although the methylation status of most CpG dinucleotides in the genome is stably propagated through mitosis, improvements to methods for measuring methylation have identified numerous regions in which it is dynamically regulated. In this Review, we discuss key concepts in the function of DNA methylation in mammals, stemming from more than two decades of research, including many recent studies that have elucidated when and where DNA methylation has a regulatory role in the genome. We include insights from early development, embryonic stem cells and adult lineages, particularly haematopoiesis, to highlight the general features of this modification as it participates in both global and localized epigenetic regulation.

Journal ArticleDOI
11 Apr 2013-Nature
TL;DR: The authors showed that inhibition of glycolysis with 2-deoxyglucose suppresses lipopolysaccharide-induced interleukin-1β but not tumour-necrosis factor-α in mouse macrophages.
Abstract: Macrophages activated by the Gram-negative bacterial product lipopolysaccharide switch their core metabolism from oxidative phosphorylation to glycolysis. Here we show that inhibition of glycolysis with 2-deoxyglucose suppresses lipopolysaccharide-induced interleukin-1β but not tumour-necrosis factor-α in mouse macrophages. A comprehensive metabolic map of lipopolysaccharide-activated macrophages shows upregulation of glycolytic and downregulation of mitochondrial genes, which correlates directly with the expression profiles of altered metabolites. Lipopolysaccharide strongly increases the levels of the tricarboxylic-acid cycle intermediate succinate. Glutamine-dependent anaplerosis is the principal source of succinate, although the 'GABA (γ-aminobutyric acid) shunt' pathway also has a role. Lipopolysaccharide-induced succinate stabilizes hypoxia-inducible factor-1α, an effect that is inhibited by 2-deoxyglucose, with interleukin-1β as an important target. Lipopolysaccharide also increases succinylation of several proteins. We therefore identify succinate as a metabolite in innate immune signalling, which enhances interleukin-1β production during inflammation.

Journal ArticleDOI
TL;DR: The authors exhaustively analyze dual-RNA:Cas9 target requirements to define the range of targetable sequences and show strategies for editing sites that do not meet these requirements, suggesting the versatility of this technique for bacterial genome engineering.
Abstract: The targeting of nucleases to specific DNA sequences facilitates genome editing. Recent work demonstrated that the CRISPR-associated (Cas) nuclease Cas9 can be targeted to sequences in vitro simply by modifying a short CRISPR RNA (crRNA) guide. Here we use this CRISPR-Cas system to introduce marker-free mutations in Streptococcus pneumoniae and Escherichia coli. The approach involves re-programming Cas9 by using a crRNA complementary to a target chromosomal locus and introducing a template DNA harboring a desired mutation and an altered crRNA recognition site for recombination with the target locus. We exhaustively analyze Cas9 target requirements to define the range of targetable sequences and show strategies for editing sites that do not meet these requirements. Alone or together with recombineering, CRISPR-assisted editing induces recombination at the targeted locus and kills non-edited cells, leading to a recovery of close to 100% of edited cells. Multiple crRNAs can be used to modify several loci simultaneously. Our results show that CRISPR-mediated genome editing only requires programming of the crRNA and template sequences and thus constitutes a useful tool for genetic engineering.

01 Dec 2013
TL;DR: A pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library is described and it is shown that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs.
Abstract: The bacterial CRISPR/Cas9 system for genome editing has greatly expanded the toolbox for mammalian genetics, enabling the rapid generation of isogenic cell lines and mice with modified alleles. Here, we describe a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single guide RNA (sgRNA) library. sgRNA expression cassettes were stably integrated into the genome, which enabled a complex mutant pool to be tracked by massively parallel sequencing. We used a library containing 73,000 sgRNAs to generate knockout collections and performed screens in two human cell lines. A screen for resistance to the nucleotide analog 6-thioguanine identified all expected members of the DNA mismatch repair pathway, while another for the DNA topoisomerase II (TOP2A) poison etoposide identified TOP2A, as expected, and also cyclin-dependent kinase 6, CDK6. A negative selection screen for essential genes identified numerous gene sets corresponding to fundamental processes. Finally, we show that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs. Collectively, these results establish Cas9/sgRNA screens as a powerful tool for systematic genetic analysis in mammalian cells.
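
The abstract above describes tracking a pooled mutant population by sequencing the integrated sgRNA cassettes, with both positive and negative selection. One common way to summarize such data is a per-sgRNA log2 fold change between two time points or conditions. The sketch below assumes that summary for illustration only; the abstract does not specify the analysis used, and the function name, the pseudocount and the count dictionaries are hypothetical.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-sgRNA log2 fold change of library-normalized read fractions.

    before/after: {sgRNA_id: read count} for the two samples being compared.
    The pseudocount guards against zero counts; with pseudocount=0 an sgRNA
    with no reads in either sample would raise an error.
    """
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(sg, 0) for sg in before) + pseudocount * n
    changes = {}
    for sg, count in before.items():
        frac_before = (count + pseudocount) / total_before
        frac_after = (after.get(sg, 0) + pseudocount) / total_after
        changes[sg] = math.log2(frac_after / frac_before)
    return changes
```

Under this summary, sgRNAs enriched after selection (positive values) would be candidates in a resistance screen, and depleted sgRNAs (negative values) would be candidates in a screen for essential genes.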

Journal ArticleDOI
S. Hong Lee1, Stephan Ripke2,3, Benjamin M. Neale2 +402 more; Institutions (124)
TL;DR: Empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
Abstract: Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
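
The genetic correlations quoted in the abstract above (for example 0.68 between schizophrenia and bipolar disorder) follow the usual definition of a correlation applied to the genetic components of two traits. Written out, with symbols that are assumed notation rather than taken from the paper:

```latex
r_g = \frac{\sigma_{g,12}}{\sqrt{\sigma^2_{g,1}\,\sigma^2_{g,2}}}
```

Here \sigma^2_{g,1} and \sigma^2_{g,2} stand for the genetic variance in liability explained by SNPs for disorders 1 and 2 (the quantities estimated by the univariate analyses), and \sigma_{g,12} is the genetic covariance between them (estimated by the bivariate analysis).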

01 Sep 2013
TL;DR: It is demonstrated that using paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency.
Abstract: Targeted genome editing technologies have enabled a broad range of research and medical applications. The Cas9 nuclease from the microbial CRISPR-Cas system is targeted to specific genomic loci by a 20 nt guide sequence, which can tolerate certain mismatches to the DNA target and thereby promote undesired off-target mutagenesis. Here, we describe an approach that combines a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks. Because individual nicks in the genome are repaired with high fidelity, simultaneous nicking via appropriately offset guide RNAs is required for double-stranded breaks and extends the number of specifically recognized bases for target cleavage. We demonstrate that using paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency. This versatile strategy enables a wide variety of genome editing applications that require high specificity.

Journal ArticleDOI
26 Sep 2013-Nature
TL;DR: Sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences, reveal extremely widespread genetic variation affecting the regulation of most genes.
Abstract: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

Journal ArticleDOI
TL;DR: Data suggest that, through recruitment of tumor-infiltrating immune cells, fusobacteria generate a proinflammatory microenvironment that is conducive to colorectal neoplasia progression, and this work finds that F. nucleatum does not exacerbate colitis, enteritis, or inflammation-associated intestinal carcinogenesis.

Journal ArticleDOI
22 Feb 2013-Science
TL;DR: Two independent mutations within the core promoter of telomerase reverse transcriptase (TERT) are described, which collectively occur in 50 of 70 melanomas examined, suggesting somatic mutations in regulatory regions of the genome may represent an important tumorigenic mechanism.
Abstract: Systematic sequencing of human cancer genomes has identified many recurrent mutations in the protein-coding regions of genes but rarely in gene regulatory regions. Here, we describe two independent mutations within the core promoter of telomerase reverse transcriptase (TERT), the gene coding for the catalytic subunit of telomerase, which collectively occur in 50 of 70 (71%) melanomas examined. These mutations generate de novo consensus binding motifs for E-twenty-six (ETS) transcription factors, and in reporter assays, the mutations increased transcriptional activity from the TERT promoter by two- to fourfold. Examination of 150 cancer cell lines derived from diverse tumor types revealed the same mutations in 24 cases (16%), with preliminary evidence of elevated frequency in bladder and hepatocellular cancer cells. Thus, somatic mutations in regulatory regions of the genome may represent an important tumorigenic mechanism.

Journal ArticleDOI
TL;DR: SCNA patterns were characterized in 4,934 cancers from The Cancer Genome Atlas Pan-Cancer data set; whole-genome doubling, observed in 37% of cancers, was associated with higher rates of every other type of SCNA, TP53 mutations, CCNE1 amplifications and alterations of the PPP2R complex.
Abstract: Determining how somatic copy number alterations (SCNAs) promote cancer is an important goal. We characterized SCNA patterns in 4,934 cancers from The Cancer Genome Atlas Pan-Cancer data set. Whole-genome doubling, observed in 37% of cancers, was associated with higher rates of every other type of SCNA, TP53 mutations, CCNE1 amplifications and alterations of the PPP2R complex. SCNAs that were internal to chromosomes tended to be shorter than telomere-bounded SCNAs, suggesting different mechanisms underlying their generation. Significantly recurrent focal SCNAs were observed in 140 regions, including 102 without known oncogene or tumor suppressor gene targets and 50 with significantly mutated genes. Amplified regions without known oncogenes were enriched for genes involved in epigenetic regulation. When levels of genomic disruption were accounted for, 7% of region pairs were anticorrelated, and these regions tended to encompass genes whose proteins physically interact, suggesting related functions. These results provide insights into mechanisms of generation and functional consequences of cancer-related SCNAs.

Journal ArticleDOI
TL;DR: An association analysis in CAD cases and controls identifies 15 loci reaching genome-wide significance, taking the number of susceptibility loci for CAD to 46, and a further 104 independent variants strongly associated with CAD at a 5% false discovery rate (FDR).
Abstract: Coronary artery disease (CAD) is the commonest cause of death. Here, we report an association analysis in 63,746 CAD cases and 130,681 controls identifying 15 loci reaching genome-wide significance, taking the number of susceptibility loci for CAD to 46, and a further 104 independent variants (r² < 0.2) strongly associated with CAD at a 5% false discovery rate (FDR). Together, these variants explain approximately 10.6% of CAD heritability. Of the 46 genome-wide significant lead SNPs, 12 show a significant association with a lipid trait, and 5 show a significant association with blood pressure, but none is significantly associated with diabetes. Network analysis with 233 candidate genes (loci at 10% FDR) generated 5 interaction networks comprising 85% of these putative genes involved in CAD. The four most significant pathways mapping to these networks are linked to lipid metabolism and inflammation, underscoring the causal role of these activities in the genetic etiology of CAD. Our study provides insights into the genetic basis of CAD and identifies key biological pathways.

Journal ArticleDOI
Stephan Ripke1, Stephan Ripke2, Colm O'Dushlaine2, Kimberly Chambert2, Jennifer L. Moran2, Anna K. Kähler3, Anna K. Kähler4, Anna K. Kähler5, Susanne Akterin5, Sarah E. Bergen5, Ann L. Collins4, James J. Crowley4, Menachem Fromer6, Menachem Fromer2, Menachem Fromer1, Yunjung Kim4, Sang Hong Lee7, Patrik K. E. Magnusson5, Nicholas E. Sanchez2, Eli A. Stahl6, Stephanie Williams4, Naomi R. Wray7, Kai Xia4, F Bettella8, Anders D. Børglum9, Anders D. Børglum10, Anders D. Børglum11, Brendan Bulik-Sullivan1, Paul Cormican12, Nicholas John Craddock13, Christiaan de Leeuw14, Christiaan de Leeuw15, Naser Durmishi, Michael Gill12, Vera Golimbet16, Marian L. Hamshere13, Peter Holmans13, David M. Hougaard17, Kenneth S. Kendler18, Kuang Fei Lin19, Derek W. Morris12, Ole Mors9, Ole Mors11, Preben Bo Mortensen11, Preben Bo Mortensen10, Benjamin M. Neale1, Benjamin M. Neale2, Francis A. O'Neill20, Michael John Owen13, Milica Pejovic Milovancevic21, Danielle Posthuma22, Danielle Posthuma15, John Powell19, Alexander Richards13, Brien P. Riley18, Douglas M. Ruderfer6, Dan Rujescu23, Dan Rujescu24, Engilbert Sigurdsson25, Teimuraz Silagadze26, August B. Smit15, Hreinn Stefansson8, Stacy Steinberg8, Jaana Suvisaari27, Sarah Tosato28, Matthijs Verhage15, James T.R. Walters13, Elvira Bramon19, Elvira Bramon29, Aiden Corvin12, Michael Conlon O'Donovan13, Kari Stefansson8, Edward M. Scolnick2, Shaun Purcell, Steve McCarroll1, Steve McCarroll2, Pamela Sklar6, Christina M. Hultman5, Patrick F. Sullivan4, Patrick F. Sullivan5 
TL;DR: The authors conducted a multi-stage genome-wide association study (GWAS) for schizophrenia and found that 8,300 independent, mostly common SNPs (95% credible interval of 6,300-10,200 SNPs) contribute to risk for schizophrenia.
Abstract: Schizophrenia is an idiopathic mental disorder with a heritable component and a substantial public health impact. We conducted a multi-stage genome-wide association study (GWAS) for schizophrenia beginning with a Swedish national sample (5,001 cases and 6,243 controls) followed by meta-analysis with previous schizophrenia GWAS (8,832 cases and 12,067 controls) and finally by replication of SNPs in 168 genomic regions in independent samples (7,413 cases, 19,762 controls and 581 parent-offspring trios). We identified 22 loci associated at genome-wide significance; 13 of these are new, and 1 was previously implicated in bipolar disorder. Examination of candidate genes at these loci suggests the involvement of neuronal calcium signaling. We estimate that 8,300 independent, mostly common SNPs (95% credible interval of 6,300-10,200 SNPs) contribute to risk for schizophrenia and that these collectively account for at least 32% of the variance in liability. Common genetic variation has an important role in the etiology of schizophrenia, and larger studies will allow more detailed understanding of this disorder.