Institution

Broad Institute

Nonprofit · Cambridge, Massachusetts, United States
About: Broad Institute is a nonprofit organization based in Cambridge, Massachusetts, United States. It is known for research contributions in the topics Population and Genome-wide association study. The organization has 6584 authors who have published 11618 publications receiving 1522743 citations. The organization is also known as: Eli and Edythe L. Broad Institute of MIT and Harvard.


Papers
Journal Article
TL;DR: It is shown how the popular workflow management system Snakemake can be used to guarantee reproducibility, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.

519 citations

Journal Article
TL;DR: Clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) is detected using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies to identify common deleted regions with genes previously associated with hematological cancers.
Abstract: We detected clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells with the same abnormal karyotype (>5-10%; presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rapidly rises to 2-3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions with genes previously associated with these cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer before DNA sampling, those without a previous diagnosis have an estimated tenfold higher risk of a subsequent hematological cancer (95% confidence interval = 6-18).

519 citations

Journal Article
TL;DR: ichorCNA is presented, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations.
Abstract: Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.

519 citations

Journal Article
TL;DR: eCAVIAR is presented, a probabilistic method that has several key advantages over existing methods and can account for more than one causal variant in any given locus, and can leverage summary statistics without accessing the individual genotype data.
Abstract: The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.

519 citations

Journal Article
TL;DR: Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
Abstract: Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these 'hotspot' sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.

518 citations


Authors

Showing all 7146 results

Name                H-index  Papers  Citations
Eric S. Lander      301      826     525976
Albert Hofman       267      2530    321405
Frank B. Hu         250      1675    253464
David J. Hunter     213      1836    207050
Kari Stefansson     206      794     174819
Mark J. Daly        204      763     304452
Lewis C. Cantley    196      748     169037
Matthew Meyerson    194      553     243726
Gad Getz            189      520     247560
Stacey Gabriel      187      383     294284
Stuart H. Orkin     186      715     112182
Ralph Weissleder    184      1160    142508
Chris Sander        178      713     233287
Michael I. Jordan   176      1016    216204
Richard A. Young    173      520     126642

Network Information
Related Institutions (5)
Howard Hughes Medical Institute: 34.6K papers, 5.2M citations, 96% related
Salk Institute for Biological Studies: 13.1K papers, 1.6M citations, 94% related
Fred Hutchinson Cancer Research Center: 30.9K papers, 2.2M citations, 93% related
Scripps Research Institute: 32.8K papers, 2.9M citations, 93% related
Genentech: 17.1K papers, 1.4M citations, 93% related

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2023  37
2022  627
2021  1,727
2020  1,534
2019  1,364
2018  1,107