scispace - formally typeset
Search or ask a question
Author

Jack A. Kosmicki

Bio: Jack A. Kosmicki is an academic researcher from Broad Institute. The author has contributed to research in topics: Population & Exome sequencing. The author has an hindex of 29, co-authored 59 publications receiving 18977 citations. Previous affiliations of Jack A. Kosmicki include Harvard University & Beth Israel Deaconess Medical Center.

Papers published on a yearly basis

Papers
More filters
Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks2, Timothy Fennell2, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria3, Anne H. O’Donnell-Luria1, James S. Ware, Andrew J. Hill1, Andrew J. Hill2, Andrew J. Hill4, Beryl B. Cummings1, Beryl B. Cummings2, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum2, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada1, Karol Estrada2, Fengmei Zhao2, Fengmei Zhao1, James Zou2, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo2, Ron Do, Jason Flannick1, Jason Flannick2, Menachem Fromer, Laura D. Gauthier2, Jackie Goldstein2, Jackie Goldstein1, Namrata Gupta2, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun2, Mitja I. Kurki2, Mitja I. Kurki1, Ami Levy Moonshine2, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin2, Manuel A. Rivas2, Valentin Ruano-Rubio2, Samuel A. Rose2, Douglas M. Ruderfer8, Khalid Shakir2, Peter D. Stenson6, Christine Stevens2, Brett Thomas1, Brett Thomas2, Grace Tiao2, María Teresa Tusié-Luna, Ben Weisburd2, Hong-Hee Won9, Dongmei Yu, David Altshuler2, David Altshuler10, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly2, Roberto Elosua, Jose C. Florez2, Jose C. Florez1, Stacey Gabriel2, Gad Getz2, Gad Getz1, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll2, Steven A. McCarroll1, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale2, Benjamin M. Neale1, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan21, Patrick F. Sullivan14, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins16, Hugh Watkins17, James G. Wilson24, Mark J. Daly1, Mark J. Daly2, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

4,913 citations

Journal ArticleDOI
Silvia De Rubeis1, Xin-Xin He2, Arthur P. Goldberg1, Christopher S. Poultney1, Kaitlin E. Samocha3, A. Ercument Cicek2, Yan Kou1, Li Liu2, Menachem Fromer3, Menachem Fromer1, R. Susan Walker4, Tarjinder Singh5, Lambertus Klei6, Jack A. Kosmicki3, Shih-Chen Fu1, Branko Aleksic7, Monica Biscaldi8, Patrick Bolton9, Jessica M. Brownfeld1, Jinlu Cai1, Nicholas G. Campbell10, Angel Carracedo11, Angel Carracedo12, Maria H. Chahrour3, Andreas G. Chiocchetti, Hilary Coon13, Emily L. Crawford10, Lucy Crooks5, Sarah Curran9, Geraldine Dawson14, Eftichia Duketis, Bridget A. Fernandez15, Louise Gallagher16, Evan T. Geller17, Stephen J. Guter18, R. Sean Hill19, R. Sean Hill3, Iuliana Ionita-Laza20, Patricia Jiménez González, Helena Kilpinen, Sabine M. Klauck21, Alexander Kolevzon1, Irene Lee22, Jing Lei2, Terho Lehtimäki, Chiao-Feng Lin17, Avi Ma'ayan1, Christian R. Marshall4, Alison L. McInnes23, Benjamin M. Neale24, Michael John Owen25, Norio Ozaki7, Mara Parellada26, Jeremy R. Parr27, Shaun Purcell1, Kaija Puura, Deepthi Rajagopalan4, Karola Rehnström5, Abraham Reichenberg1, Aniko Sabo28, Michael Sachse, Stephen Sanders29, Chad M. Schafer2, Martin Schulte-Rüther30, David Skuse31, David Skuse22, Christine Stevens24, Peter Szatmari32, Kristiina Tammimies4, Otto Valladares17, Annette Voran33, Li-San Wang17, Lauren A. Weiss29, A. Jeremy Willsey29, Timothy W. Yu3, Timothy W. Yu19, Ryan K. C. Yuen4, Edwin H. Cook18, Christine M. Freitag, Michael Gill16, Christina M. Hultman34, Thomas Lehner35, Aarno Palotie36, Aarno Palotie24, Aarno Palotie3, Gerard D. Schellenberg17, Pamela Sklar1, Matthew W. State29, James S. Sutcliffe10, Christopher A. Walsh19, Christopher A. Walsh3, Stephen W. Scherer4, Michael E. Zwick37, Jeffrey C. Barrett5, David J. Cutler37, Kathryn Roeder2, Bernie Devlin6, Mark J. Daly24, Mark J. Daly3, Joseph D. Buxbaum1 
13 Nov 2014-Nature
TL;DR: Using exome sequencing, it is shown that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate of < 0.05, plus a set of 107 genes strongly enriched for those likely to affect risk (FDR < 0.30).
Abstract: The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers-most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.

2,228 citations

Posted ContentDOI
30 Oct 2015-bioRxiv
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.

1,552 citations

Journal ArticleDOI
06 Feb 2020-Cell
TL;DR: The largest exome sequencing study of autism spectrum disorder (ASD) to date, using an enhanced analytical framework to integrate de novo and case-control rare variation, identifies 102 risk genes at a false discovery rate of 0.1 or less, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.

1,169 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel2, Eric Vallabh Minikel1, Kaitlin E. Samocha, Eric Banks2, Timothy Fennell2, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria3, James S. Ware, Andrew J. Hill4, Andrew J. Hill1, Andrew J. Hill2, Beryl B. Cummings2, Beryl B. Cummings1, Taru Tukiainen2, Taru Tukiainen1, Daniel P. Birnbaum2, Jack A. Kosmicki, Laramie E. Duncan1, Laramie E. Duncan2, Karol Estrada2, Karol Estrada1, Fengmei Zhao1, Fengmei Zhao2, James Zou2, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo2, Ron Do, Jason Flannick1, Jason Flannick2, Menachem Fromer, Laura D. Gauthier2, Jackie Goldstein2, Jackie Goldstein1, Namrata Gupta2, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun2, Mitja I. Kurki2, Mitja I. Kurki1, Ami Levy Moonshine2, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso1, Gina M. Peloso2, Ryan Poplin2, Manuel A. Rivas2, Valentin Ruano-Rubio2, Samuel A. Rose2, Douglas M. Ruderfer8, Khalid Shakir2, Peter D. Stenson6, Christine Stevens2, Brett Thomas1, Brett Thomas2, Grace Tiao2, María Teresa Tusié-Luna, Ben Weisburd2, Hong-Hee Won9, Dongmei Yu, David Altshuler10, David Altshuler2, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly2, Roberto Elosua, Jose C. Florez2, Jose C. Florez1, Stacey Gabriel2, Gad Getz1, Gad Getz2, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll2, Steven A. McCarroll1, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale1, Benjamin M. Neale2, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan21, Patrick F. Sullivan14, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins17, Hugh Watkins16, James G. Wilson24, Mark J. Daly1, Mark J. Daly2, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

4,913 citations

Journal ArticleDOI
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank is described, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

01 Jan 2020
TL;DR: Prolonged viral shedding provides the rationale for a strategy of isolation of infected patients and optimal antiviral interventions in the future.
Abstract: Summary Background Since December, 2019, Wuhan, China, has experienced an outbreak of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Epidemiological and clinical characteristics of patients with COVID-19 have been reported but risk factors for mortality and a detailed clinical course of illness, including viral shedding, have not been well described. Methods In this retrospective, multicentre cohort study, we included all adult inpatients (≥18 years old) with laboratory-confirmed COVID-19 from Jinyintan Hospital and Wuhan Pulmonary Hospital (Wuhan, China) who had been discharged or had died by Jan 31, 2020. Demographic, clinical, treatment, and laboratory data, including serial samples for viral RNA detection, were extracted from electronic medical records and compared between survivors and non-survivors. We used univariable and multivariable logistic regression methods to explore the risk factors associated with in-hospital death. Findings 191 patients (135 from Jinyintan Hospital and 56 from Wuhan Pulmonary Hospital) were included in this study, of whom 137 were discharged and 54 died in hospital. 91 (48%) patients had a comorbidity, with hypertension being the most common (58 [30%] patients), followed by diabetes (36 [19%] patients) and coronary heart disease (15 [8%] patients). Multivariable regression showed increasing odds of in-hospital death associated with older age (odds ratio 1·10, 95% CI 1·03–1·17, per year increase; p=0·0043), higher Sequential Organ Failure Assessment (SOFA) score (5·65, 2·61–12·23; p Interpretation The potential risk factors of older age, high SOFA score, and d-dimer greater than 1 μg/mL could help clinicians to identify patients with poor prognosis at an early stage. Prolonged viral shedding provides the rationale for a strategy of isolation of infected patients and optimal antiviral interventions in the future. Funding Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences; National Science Grant for Distinguished Young Scholars; National Key Research and Development Program of China; The Beijing Science and Technology Project; and Major Projects of National Science and Technology on New Drug Creation and Development.

4,408 citations