Author

David Craig

Bio: David Craig is an academic researcher from the University of Southern California. The author has contributed to research in topics: Genome-wide association study & Medicine. The author has an h-index of 87 and has co-authored 351 publications receiving 49,511 citations. Previous affiliations of David Craig include Hobart and William Smith Colleges and the University of California, Los Angeles.


Papers
Journal ArticleDOI
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal ArticleDOI
TL;DR: In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10−8) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer's disease.
Abstract: Eleven susceptibility loci for late-onset Alzheimer's disease (LOAD) were identified by previous studies; however, a large portion of the genetic risk for this disease remains unexplained. We conducted a large, two-stage meta-analysis of genome-wide association studies (GWAS) in individuals of European ancestry. In stage 1, we used genotyped and imputed data (7,055,881 SNPs) to perform meta-analysis on 4 previously published GWAS data sets consisting of 17,008 Alzheimer's disease cases and 37,154 controls. In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer's disease cases and 11,312 controls. In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10−8) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer's disease.

3,726 citations

01 Oct 2015
TL;DR: The 1000 Genomes Project provided a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports its completion, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

3,247 citations

Journal ArticleDOI
Denise Harold1, Richard Abraham2, Paul Hollingworth2, Rebecca Sims2, Amy Gerrish2, Marian L. Hamshere3, Jaspreet Singh Pahwa2, Valentina Moskvina2, Kimberley Dowzell2, Amy L. Williams2, Nicola L. Jones2, Charlene Thomas2, Alexandra Stretton2, Angharad R. Morgan2, Simon Lovestone4, John Powell5, Petroula Proitsi5, Michelle K. Lupton5, Carol Brayne6, David C. Rubinsztein7, Michael Gill6, Brian A. Lawlor6, Aoibhinn Lynch6, Kevin Morgan8, Kristelle Brown8, Peter Passmore9, David Craig9, Bernadette McGuinness9, Stephen Todd9, Clive Holmes10, David M. A. Mann11, A. David Smith12, Seth Love3, Patrick G. Kehoe3, John Hardy, Simon Mead13, Nick C. Fox13, Martin N. Rossor13, John Collinge13, Wolfgang Maier14, Frank Jessen14, Britta Schürmann14, Hendrik van den Bussche15, Isabella Heuser16, Johannes Kornhuber17, Jens Wiltfang18, Martin Dichgans19, Lutz Frölich20, Harald Hampel21, Harald Hampel19, Michael Hüll22, Dan Rujescu19, Alison Goate23, John S. K. Kauwe24, Carlos Cruchaga23, Petra Nowotny23, John C. Morris23, Kevin Mayo23, Kristel Sleegers25, Karolien Bettens25, Sebastiaan Engelborghs25, Peter Paul De Deyn25, Christine Van Broeckhoven25, Gill Livingston26, Nicholas Bass26, Hugh Gurling26, Andrew McQuillin26, Rhian Gwilliam27, Panagiotis Deloukas27, Ammar Al-Chalabi28, Christopher Shaw28, Magda Tsolaki29, Andrew B. Singleton30, Rita Guerreiro30, Thomas W. Mühleisen14, Markus M. Nöthen14, Susanne Moebus18, Karl-Heinz Jöckel18, Norman Klopp, H-Erich Wichmann19, Minerva M. Carrasquillo31, V. Shane Pankratz31, Steven G. Younkin31, Peter Holmans2, Michael Conlon O'Donovan2, Michael John Owen2, Julie Williams2 
TL;DR: A two-stage genome-wide association study of Alzheimer's disease involving over 16,000 individuals, the most powerful AD GWAS to date, produced compelling evidence for association with Alzheimer's disease in the combined dataset.
Abstract: We undertook a two-stage genome-wide association study (GWAS) of Alzheimer's disease (AD) involving over 16,000 individuals, the most powerful AD GWAS to date. In stage 1 (3,941 cases and 7,848 controls), we replicated the established association with the apolipoprotein E (APOE) locus (most significant SNP, rs2075650, P = 1.8 × 10−157) and observed genome-wide significant association with SNPs at two loci not previously associated with the disease: at the CLU (also known as APOJ) gene (rs11136000, P = 1.4 × 10−9) and 5' to the PICALM gene (rs3851179, P = 1.9 × 10−8). These associations were replicated in stage 2 (2,023 cases and 2,340 controls), producing compelling evidence for association with Alzheimer's disease in the combined dataset (rs11136000, P = 8.5 × 10−10, odds ratio = 0.86; rs3851179, P = 1.3 × 10−9, odds ratio = 0.86).

2,956 citations

Journal ArticleDOI
S. Hong Lee1, Stephan Ripke2, Stephan Ripke3, Benjamin M. Neale2  +402 moreInstitutions (124)
TL;DR: Empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
Abstract: Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.

2,058 citations


Cited by
Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

28 Jul 2005
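TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family in each haploid genome encodes about 60 members, and switching transcription between different var gene variants provides the molecular basis for antigenic variation.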

18,940 citations

Journal ArticleDOI
TL;DR: NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. It scales to hundreds of processors on high-end parallel platforms and tens of processors on low-cost commodity clusters, and also runs on individual desktop and laptop computers.
Abstract: NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of processors on high-end parallel platforms, as well as tens of processors on low-cost commodity clusters, and also runs on individual desktop and laptop computers. NAMD works with AMBER and CHARMM potential functions, parameters, and file formats. This article, directed to novices as well as experts, first introduces concepts and methods used in the NAMD program, describing the classical molecular dynamics force field, equations of motion, and integration methods along with the efficient electrostatics evaluation algorithms employed and temperature and pressure controls used. Features for steering the simulation across barriers and for calculating both alchemical and conformational free energy differences are presented. The motivations for and a roadmap to the internal design of NAMD, implemented in C++ and based on Charm++ parallel objects, are outlined. The factors affecting the serial and parallel performance of a simulation are discussed. Finally, typical NAMD use is illustrated with representative applications to a small, a medium, and a large biomolecular system, highlighting particular features of NAMD, for example, the Tcl scripting language. The article also provides a list of the key features of NAMD and discusses the benefits of combining NAMD with the molecular graphics/sequence analysis software VMD and the grid computing/collaboratory software BioCoRE. NAMD is distributed free of charge with source code at www.ks.uiuc.edu.

14,558 citations

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks2, Timothy Fennell2, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria3, James S. Ware, Andrew J. Hill2, Andrew J. Hill1, Andrew J. Hill4, Beryl B. Cummings2, Beryl B. Cummings1, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum2, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada1, Karol Estrada2, Fengmei Zhao2, Fengmei Zhao1, James Zou2, Emma Pierce-Hoffman2, Emma Pierce-Hoffman1, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo2, Ron Do, Jason Flannick2, Jason Flannick1, Menachem Fromer, Laura D. Gauthier2, Jackie Goldstein1, Jackie Goldstein2, Namrata Gupta2, Daniel P. Howrigan1, Daniel P. Howrigan2, Adam Kiezun2, Mitja I. Kurki2, Mitja I. Kurki1, Ami Levy Moonshine2, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin2, Manuel A. Rivas2, Valentin Ruano-Rubio2, Samuel A. Rose2, Douglas M. Ruderfer8, Khalid Shakir2, Peter D. Stenson6, Christine Stevens2, Brett Thomas1, Brett Thomas2, Grace Tiao2, María Teresa Tusié-Luna, Ben Weisburd2, Hong-Hee Won9, Dongmei Yu, David Altshuler2, David Altshuler10, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly2, Roberto Elosua, Jose C. Florez1, Jose C. Florez2, Stacey Gabriel2, Gad Getz2, Gad Getz1, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll1, Steven A. McCarroll2, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale2, Benjamin M. Neale1, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan21, Patrick F. Sullivan14, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins16, Hugh Watkins17, James G. Wilson24, Mark J. Daly1, Mark J. Daly2, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations