scispace - formally typeset
Search or ask a question
Author

Benjamin M. Neale

Bio: Benjamin M. Neale is an academic researcher from Harvard University. The author has contributed to research in topics: Genome-wide association study & Population. The author has an hindex of 115, co-authored 429 publications receiving 105365 citations. Previous affiliations of Benjamin M. Neale include University of Melbourne & Virginia Commonwealth University.


Papers
More filters
Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, which focuses on the estimation and use of identity- by-state and identity/descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks2, Timothy Fennell2, Anne H. O’Donnell-Luria3, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria1, James S. Ware, Andrew J. Hill1, Andrew J. Hill2, Andrew J. Hill4, Beryl B. Cummings1, Beryl B. Cummings2, Taru Tukiainen2, Taru Tukiainen1, Daniel P. Birnbaum2, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada1, Karol Estrada2, Fengmei Zhao2, Fengmei Zhao1, James Zou2, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo2, Ron Do, Jason Flannick1, Jason Flannick2, Menachem Fromer, Laura D. Gauthier2, Jackie Goldstein2, Jackie Goldstein1, Namrata Gupta2, Daniel P. Howrigan1, Daniel P. Howrigan2, Adam Kiezun2, Mitja I. Kurki1, Mitja I. Kurki2, Ami Levy Moonshine2, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso1, Gina M. Peloso2, Ryan Poplin2, Manuel A. Rivas2, Valentin Ruano-Rubio2, Samuel A. Rose2, Douglas M. Ruderfer8, Khalid Shakir2, Peter D. Stenson6, Christine Stevens2, Brett Thomas1, Brett Thomas2, Grace Tiao2, María Teresa Tusié-Luna, Ben Weisburd2, Hong-Hee Won9, Dongmei Yu, David Altshuler2, David Altshuler10, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly2, Roberto Elosua, Jose C. Florez1, Jose C. Florez2, Stacey Gabriel2, Gad Getz1, Gad Getz2, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll1, Steven A. McCarroll2, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale2, Benjamin M. Neale1, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan14, Patrick F. Sullivan21, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins16, Hugh Watkins17, James G. Wilson24, Mark J. Daly2, Mark J. Daly1, Daniel G. MacArthur1, Daniel G. MacArthur2 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal ArticleDOI
Stephan Ripke1, Stephan Ripke2, Benjamin M. Neale1, Benjamin M. Neale2  +351 moreInstitutions (102)
24 Jul 2014-Nature
TL;DR: Associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses.
Abstract: Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely new insights into aetiology, but associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that have important roles in immunity, providing support for the speculated link between the immune system and schizophrenia.

6,809 citations

Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

4,913 citations

Journal ArticleDOI
TL;DR: It is found that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size, and the LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control.
Abstract: Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.

3,708 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The main innovations of the new version of the Arlequin program include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans.
Abstract: We present here a new version of the Arlequin program available under three different forms: a Windows graphical version (Winarl35), a console version of Arlequin (arlecore), and a specific console version to compute summary statistics (arlsumstat). The command-line versions run under both Linux and Windows. The main innovations of the new version include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans. Command-line versions are designed to handle large series of files, and arlsumstat can be used to generate summary statistics from simulated data sets within an Approximate Bayesian Computation framework.

13,581 citations

Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: For the next few weeks the course is going to be exploring a field that’s actually older than classical population genetics, although the approach it’ll be taking to it involves the use of population genetic machinery.
Abstract: So far in this course we have dealt entirely with the evolution of characters that are controlled by simple Mendelian inheritance at a single locus. There are notes on the course website about gametic disequilibrium and how allele frequencies change at two loci simultaneously, but we didn’t discuss them. In every example we’ve considered we’ve imagined that we could understand something about evolution by examining the evolution of a single gene. That’s the domain of classical population genetics. For the next few weeks we’re going to be exploring a field that’s actually older than classical population genetics, although the approach we’ll be taking to it involves the use of population genetic machinery. If you know a little about the history of evolutionary biology, you may know that after the rediscovery of Mendel’s work in 1900 there was a heated debate between the “biometricians” (e.g., Galton and Pearson) and the “Mendelians” (e.g., de Vries, Correns, Bateson, and Morgan). Biometricians asserted that the really important variation in evolution didn’t follow Mendelian rules. Height, weight, skin color, and similar traits seemed to

9,847 citations

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks2, Timothy Fennell2, Anne H. O’Donnell-Luria3, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria2, James S. Ware, Andrew J. Hill1, Andrew J. Hill4, Andrew J. Hill2, Beryl B. Cummings2, Beryl B. Cummings1, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum2, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada2, Karol Estrada1, Fengmei Zhao1, Fengmei Zhao2, James Zou2, Emma Pierce-Hoffman2, Emma Pierce-Hoffman1, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo2, Ron Do, Jason Flannick2, Jason Flannick1, Menachem Fromer, Laura D. Gauthier2, Jackie Goldstein2, Jackie Goldstein1, Namrata Gupta2, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun2, Mitja I. Kurki2, Mitja I. Kurki1, Ami Levy Moonshine2, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin2, Manuel A. Rivas2, Valentin Ruano-Rubio2, Samuel A. Rose2, Douglas M. Ruderfer8, Khalid Shakir2, Peter D. Stenson6, Christine Stevens2, Brett Thomas1, Brett Thomas2, Grace Tiao2, María Teresa Tusié-Luna, Ben Weisburd2, Hong-Hee Won9, Dongmei Yu, David Altshuler2, David Altshuler10, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly2, Roberto Elosua, Jose C. Florez1, Jose C. Florez2, Stacey Gabriel2, Gad Getz2, Gad Getz1, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll2, Steven A. McCarroll1, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale1, Benjamin M. Neale2, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan14, Patrick F. Sullivan21, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins17, Hugh Watkins16, James G. Wilson24, Mark J. Daly1, Mark J. Daly2, Daniel G. MacArthur1, Daniel G. MacArthur2 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Book
01 Jan 2009

8,216 citations