Institution
Broad Institute
Nonprofit • Cambridge, Massachusetts, United States
About: Broad Institute is a nonprofit organization based in Cambridge, Massachusetts, United States. It is known for research contributions in the topics Population and Genome-wide association study. The organization has 6584 authors who have published 11618 publications receiving 1522743 citations. The organization is also known as: Eli and Edythe L. Broad Institute of MIT and Harvard.
Topics: Population, Genome-wide association study, Genome, Gene, Chromatin
Papers
••
University of Duisburg-Essen1, ETH Zurich2, Swiss Institute of Bioinformatics3, European Bioinformatics Institute4, Harvard University5, Broad Institute6, Stanford University7, German Cancer Research Center8, Humboldt University of Berlin9, University of Basel10, Microsoft11, University of Tübingen12
TL;DR: It is shown how the popular workflow management system Snakemake can be used to guarantee reproducibility, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
519 citations
••
University of Washington1, Johns Hopkins University2, University of Texas Health Science Center at Houston3, University of British Columbia4, Washington University in St. Louis5, National Institutes of Health6, University of Michigan7, Vanderbilt University8, University of Southern California9, University of Hawaii at Manoa10, Mayo Clinic11, Northwestern University12, University of Pittsburgh13, University of Iowa14, Statens Serum Institut15, Harvard University16, Fred Hutchinson Cancer Research Center17, Memorial Sloan Kettering Cancer Center18, Broad Institute19
TL;DR: Clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) is detected using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies to identify common deleted regions with genes previously associated with hematological cancers.
Abstract: We detected clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells with the same abnormal karyotype (>5-10%; presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rapidly rises to 2-3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions with genes previously associated with these cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer before DNA sampling, those without a previous diagnosis have an estimated tenfold higher risk of a subsequent hematological cancer (95% confidence interval = 6-18).
519 citations
••
TL;DR: ichorCNA is software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations.
Abstract: Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.
519 citations
••
TL;DR: eCAVIAR is presented, a probabilistic method that has several key advantages over existing methods and can account for more than one causal variant in any given locus, and can leverage summary statistics without accessing the individual genotype data.
Abstract: The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.
519 citations
••
TL;DR: Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
Abstract: Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these 'hotspot' sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
518 citations
Authors
Showing all 7146 results
Name | H-index | Papers | Citations |
---|---|---|---|
Eric S. Lander | 301 | 826 | 525976 |
Albert Hofman | 267 | 2530 | 321405 |
Frank B. Hu | 250 | 1675 | 253464 |
David J. Hunter | 213 | 1836 | 207050 |
Kari Stefansson | 206 | 794 | 174819 |
Mark J. Daly | 204 | 763 | 304452 |
Lewis C. Cantley | 196 | 748 | 169037 |
Matthew Meyerson | 194 | 553 | 243726 |
Gad Getz | 189 | 520 | 247560 |
Stacey Gabriel | 187 | 383 | 294284 |
Stuart H. Orkin | 186 | 715 | 112182 |
Ralph Weissleder | 184 | 1160 | 142508 |
Chris Sander | 178 | 713 | 233287 |
Michael I. Jordan | 176 | 1016 | 216204 |
Richard A. Young | 173 | 520 | 126642 |