Second-generation PLINK: rising to the challenge of larger and richer datasets

Journal Article•DOI•

Christopher C. Chang, Carson C. Chow¹, Laurent C. A. M. Tellier², Shashaank Vattikuti¹, Shaun Purcell³, James J. Lee⁴

National Institutes of Health¹, University of Copenhagen², Icahn School of Medicine at Mount Sinai³, University of Minnesota⁴

17 Oct 2014-arXiv: Genomics

TL;DR: PLINK is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and population-genetics research; this paper describes its second-generation codebase (PLINK 1.9 and the planned PLINK 2.0), which targets larger and richer datasets.

Abstract: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
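The abstract credits much of the speedup to bit-level parallelism. As a rough illustration of the idea only (not PLINK's actual code), the Python sketch below packs genotypes two bits per sample and counts genotype classes with masks and population counts instead of a per-sample loop. The 2-bit encoding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, 3 = missing) and the names `pack_genotypes` and `count_genotypes` are assumptions made for this example and do not describe PLINK's on-disk format.

```python
def pack_genotypes(genotypes):
    """Pack per-sample codes (0, 1, 2, or 3 = missing) into one int, 2 bits each.

    The encoding is hypothetical and chosen only for this illustration.
    """
    word = 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"invalid genotype code: {g}")
        word |= g << (2 * i)
    return word


def count_genotypes(word, n_samples):
    """Count genotype classes using bitwise masks and popcounts.

    Every sample is handled by the same few bitwise operations,
    rather than by inspecting one sample at a time.
    """
    # Mask with the low bit of every 2-bit field set: ...010101
    low_mask = int("01" * n_samples, 2) if n_samples else 0
    low = word & low_mask            # low bit of each field
    high = (word >> 1) & low_mask    # high bit of each field, moved to the low position
    het = bin(low & ~high).count("1")       # code 01
    hom_alt = bin(high & ~low).count("1")   # code 10
    missing = bin(low & high).count("1")    # code 11
    hom_ref = n_samples - het - hom_alt - missing
    return hom_ref, het, hom_alt, missing


if __name__ == "__main__":
    codes = [0, 1, 2, 3, 1, 0, 2, 2]
    packed = pack_genotypes(codes)
    print(count_genotypes(packed, len(codes)))
```

In a compiled language the same masks would be applied to fixed-width machine words with a hardware popcount instruction; Python's arbitrary-precision integers stand in for that here.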

Citations

Journal Article•DOI•

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

Amit Khera¹, Mark Chaffin², Krishna G. Aragam, Mary E. Haas², Carolina Roselli², Seung Hoan Choi², Pradeep Natarajan¹, Eric S. Lander², Steven A. Lubitz¹,², Patrick T. Ellinor¹,², Sekar Kathiresan

Harvard University¹, Broad Institute²

13 Aug 2018-Nature Genetics

TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.

Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation [1]. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature [2-5], it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk [6]. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

1,962 citations

Journal Article•DOI•

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

James J. Lee¹, Robbee Wedow², Aysu Okbay³, Edward Kong⁴, Omeed Maghzian⁴, Meghan Zacher⁴, Tuan Anh Nguyen-Viet⁵, Peter Bowers⁴, Julia Sidorenko⁶,⁷, Richard Karlsson Linnér³,⁸, Mark Alan Fontana⁵,⁹, Tushar Kundu⁵, Chanwook Lee⁴, Hui Li⁴, Ruoxi Li⁵, Rebecca Royer⁵, Pascal Timshel¹⁰,¹¹, Raymond K. Walters⁴,¹², Emily A. Willoughby¹, Loic Yengo⁷, Maris Alver⁶, Yanchun Bao¹³, David W. Clark¹⁴, Felix R. Day¹⁵, Nicholas A. Furlotte, Peter K. Joshi¹⁴,¹⁶, Kathryn E. Kemper⁷, Aaron Kleinman, Claudia Langenberg¹⁵, Reedik Mägi⁶, Joey W. Trampush⁵, Shefali S. Verma¹⁷, Yang Wu⁷, Max Lam, Jing Hua Zhao¹⁵, Zhili Zheng⁷,¹⁸, Jason D. Boardman², Harry Campbell¹⁴, Jeremy Freese¹⁹, Kathleen Mullan Harris²⁰, Caroline Hayward¹⁴, Pamela Herd¹³,²¹, Meena Kumari¹³, Todd Lencz²²,²³, Jian'an Luan¹⁵, Anil K. Malhotra²²,²³, Andres Metspalu⁶, Lili Milani⁶, Ken K. Ong¹⁵, John R. B. Perry¹⁵, David J. Porteous¹⁴, Marylyn D. Ritchie¹⁷, Melissa C. Smart¹⁴, Blair H. Smith²⁴, Joyce Y. Tung, Nicholas J. Wareham¹⁵, James F. Wilson¹⁴, Jonathan P. Beauchamp²⁵, Dalton Conley²⁶, Tõnu Esko⁶, Steven F. Lehrer²⁷,²⁸,²⁹, Patrik K. E. Magnusson³⁰, Sven Oskarsson³¹, Tune H. Pers¹⁰,¹¹, Matthew R. Robinson⁷,³², Kevin Thom³³, Chelsea Watson⁵, Christopher F. Chabris¹⁷, Michelle N. Meyer¹⁷, David Laibson⁴, Jian Yang⁷, Magnus Johannesson³⁴, Philipp Koellinger³,⁸, Patrick Turley⁴,¹², Peter M. Visscher⁷, Daniel J. Benjamin⁵,²⁷, David Cesarini²⁷,³³

University of Minnesota¹, University of Colorado Boulder², VU University Amsterdam³, Harvard University⁴, University of Southern California⁵, University of Tartu⁶, University of Queensland⁷, Erasmus University Rotterdam⁸, Hospital for Special Surgery⁹, Statens Serum Institut¹⁰, University of Copenhagen¹¹, Broad Institute¹², University of Essex¹³, University of Edinburgh¹⁴, University of Cambridge¹⁵, University Hospital of Lausanne¹⁶, Geisinger Health System¹⁷, Wenzhou Medical College¹⁸, Stanford University¹⁹, University of North Carolina at Chapel Hill²⁰, University of Wisconsin-Madison²¹, The Feinstein Institute for Medical Research²², Hofstra University²³, University of Dundee²⁴, University of Toronto²⁵, Princeton University²⁶, National Bureau of Economic Research²⁷, Queen's University²⁸, New York University Shanghai²⁹, Karolinska Institutet³⁰, Uppsala University³¹, University of Lausanne³², New York University³³, Stockholm School of Economics³⁴

23 Jul 2018-Nature Genetics

TL;DR: A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance, which substantially increases the utility of polygenic scores as tools in research.

Abstract: Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.

1,658 citations

Journal Article•DOI•

Genomewide Association Study of Severe Covid-19 with Respiratory Failure.

David Ellinghaus¹, Frauke Degenhardt¹, Luis Bujanda¹, Maria Buti¹, Agustín Albillos¹, Pietro Invernizzi¹, J. Fernández¹, Daniele Prati¹, Guido Baselli¹, Rosanna Asselta¹, Marit Mæhle Grimsrud¹, Chiara Milani¹, Fatima Aziz¹, Jan Christian Kässens¹, Sandra May¹, Mareike Wendorff¹, Lars Wienbrandt¹, Florian Uellendahl-Werth¹, Tenghao Zheng¹, Xiaoli Yi¹, Raúl de Pablo¹, Adolfo Garrido Chercoles¹, Adriana Palom¹, Alba Estela Garcia-Fernandez¹, Francisco Rodriguez-Frias¹, Alberto Zanella¹, Alessandra Bandera¹, Alessandro Protti¹, Alessio Aghemo¹, Ana Lleo¹, Andrea Biondi¹, Andrea Caballero-Garralda¹, Andrea Gori¹, Anja Tanck¹, Anna Carreras Nolla¹, Anna Latiano¹, Anna Ludovica Fracanzani¹, Anna Peschuck¹, Antonio Julià¹, Antonio Pesenti¹, Antonio Voza¹, David Jiménez¹, Beatriz Mateos¹, Beatriz Nafria Jimenez¹, Carmen Quereda¹, Cinzia Paccapelo¹, Christoph Gassner¹, Claudio Angelini¹, Cristina Cea¹, Aurora Solier¹, David Pestana¹, Eduardo Muñiz-Diaz¹, Elena Sandoval¹, Elvezia Maria Paraboschi¹, Enrique Navas¹, Félix García Sánchez¹, Ferruccio Ceriotti¹, F. Martinelli-Boneschi¹, Flora Peyvandi¹, Francesco Blasi¹, Luis Téllez¹, Albert Blanco-Grau¹, Georg Hemmrich-Stanisak¹, Giacomo Grasselli¹, Giorgio Costantino¹, Giulia Cardamone¹, Giuseppe Foti¹, Serena Aneli¹, Hayato Kurihara¹, Hesham ElAbd¹, Ilaria My¹, Iván Galván-Femenía¹, Javier Martin¹, Jeanette Erdmann¹, José Ferrusquía-Acosta¹, Koldo Garcia-Etxebarria¹, Laura Izquierdo-Sanchez¹, Laura Rachele Bettini¹, Lauro Sumoy¹, Leonardo Terranova¹, Leticia Moreira¹, Luigi Santoro¹, Luigia Scudeller¹, Francisco Mesonero¹, Luisa Roade¹, Malte C. Rühlemann¹, Marco Schaefer¹, Maria Carrabba¹, Mar Riveiro-Barciela¹, Maria Eloina Figuera Basso¹, Maria Grazia Valsecchi¹, María Hernández-Tejero¹, Marialbert Acosta-Herrera¹, Mariella D'Angiò¹, Marina Baldini¹, Marina Cazzaniga¹, Martin Schulzky¹, Maurizio Cecconi¹, Michael Wittig¹, Michele Ciccarelli¹, Miguel Rodríguez-Gandía¹, Monica Bocciolone¹, Monica Miozzo¹, Nicola Montano¹, Nicole Braun¹, Nicoletta Sacchi¹, Nilda Martinez¹, Onur Özer¹, Orazio Palmieri¹, Paola Faverio¹, Paoletta Preatoni¹, Paolo Bonfanti¹, Paolo Omodei¹, Paolo Tentorio¹, Pedro Castro¹, Pedro M. Rodrigues¹, Aaron Blandino Ortiz¹, Rafael de Cid¹, Ricard Ferrer¹, Roberta Gualtierotti¹, Rosa Nieto¹, Siegfried Goerg¹, Salvatore Badalamenti¹, Sara Marsal¹, Giuseppe Matullo¹, Serena Pelusi¹, Simonas Juzenas¹, Stefano Aliberti¹, Valter Monzani¹, Victor Moreno¹, Tanja Wesse¹, Tobias L. Lenz¹, Tomás Pumarola¹, Valeria Rimoldi¹, Silvano Bosari¹, Wolfgang Albrecht¹, Wolfgang Peter¹, Manuel Romero-Gómez¹, Mauro D'Amato¹, Stefano Duga¹, Jesus M. Banales¹, Johannes R. Hov¹, Trine Folseraas¹, Luca Valenti¹, Andre Franke¹, Tom H. Karlsen¹

University of Kiel¹

17 Jun 2020-The New England Journal of Medicine

TL;DR: A 3p21.31 gene cluster is identified as a genetic susceptibility locus in patients with Covid-19 with respiratory failure and a potential involvement of the ABO blood-group system is confirmed.

Abstract: Background: There is considerable variation in disease behavior among patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (Covid-19). Genomewide association analysis may allow for the identification of potential genetic factors involved in the development of Covid-19. Methods: We conducted a genomewide association study involving 1980 patients with Covid-19 and severe disease (defined as respiratory failure) at seven hospitals in the Italian and Spanish epicenters of the SARS-CoV-2 pandemic in Europe. After quality control and the exclusion of population outliers, 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain were included in the final analysis. In total, we analyzed 8,582,968 single-nucleotide polymorphisms and conducted a meta-analysis of the two case-control panels. Results: We detected cross-replicating associations with rs11385942 at locus 3p21.31 and with rs657152 at locus 9q34.2, which were significant at the genomewide level (P [...]). Conclusions: We identified a 3p21.31 gene cluster as a genetic susceptibility locus in patients with Covid-19 with respiratory failure and confirmed a potential involvement of the ABO blood-group system. (Funded by Stein Erik Hagen and others.)

1,529 citations

Journal Article•DOI•

Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk.

Iris E. Jansen¹, Jeanne E. Savage¹, Kyoko Watanabe¹, Julien Bryois², Dylan M. Williams², Stacy Steinberg³, Julia Sealock⁴, Ida K. Karlsson²,⁵, Sara Hägg², Lavinia Athanasiu⁶,⁷, Nicola Voyle⁸, Petroula Proitsi⁸, Aree Witoelar⁷, Sven Stringer¹, Dag Aarsland⁸,⁹, Ina S. Almdahl⁶,⁷,¹⁰, Fred Andersen¹¹, Sverre Bergh¹², Francesco Bettella⁷, Sigurbjorn Bjornsson, Anne Brækhus⁶, Geir Bråthen¹³, Christiaan de Leeuw¹, Rahul S. Desikan¹⁴, Srdjan Djurovic⁶,⁷, Logan Dumitrescu¹⁵, Tormod Fladby⁷,¹⁰, Timothy J. Hohman¹⁵, Palmi V. Jonsson¹⁶, Steven J. Kiddle¹⁷, Arvid Rongve¹⁸, Ingvild Saltvedt¹³, Sigrid Botne Sando¹³, Geir Selbæk⁷, Maryam Shoai, Nathan G. Skene²,¹⁹, Jon Snaedal, Eystein Stordal¹³,²⁰, Ingun Ulstein⁶, Yunpeng Wang⁷, Linda R. White¹³, John Hardy, Jens Hjerling-Leffler², Patrick F. Sullivan²,²¹, Wiesje M. van der Flier¹, Richard Dobson, Lea K. Davis¹⁵, Hreinn Stefansson³, Kari Stefansson³, Nancy L. Pedersen², Stephan Ripke²²,²³,²⁴, Ole A. Andreassen⁷, Danielle Posthuma¹,²⁵

VU University Amsterdam¹, Karolinska Institutet², deCODE genetics³, Vanderbilt University⁴, Jönköping University⁵, Oslo University Hospital⁶, University of Oslo⁷, King's College London⁸, Stavanger University Hospital⁹, Akershus University Hospital¹⁰, University of Tromsø¹¹, Innlandet Hospital Trust¹², Norwegian University of Science and Technology¹³, University of California, San Francisco¹⁴, Vanderbilt University Medical Center¹⁵, University of Iceland¹⁶, University of Cambridge¹⁷, University of Bergen¹⁸, University College London¹⁹, Namsos Hospital²⁰, University of North Carolina at Chapel Hill²¹, Harvard University²², Charité²³, Broad Institute²⁴, VU University Medical Center²⁵

01 Mar 2019-Nature Genetics

TL;DR: A large genome-wide association study of clinically diagnosed AD and AD-by-proxy identifies new loci and functional pathways that contribute to AD risk and adds novel insights into the neurobiology of AD.

Abstract: Alzheimer's disease (AD) is highly heritable and recent studies have identified over 20 disease-associated genomic loci. Yet these only explain a small proportion of the genetic variance, indicating that undiscovered loci remain. Here, we performed a large genome-wide association study of clinically diagnosed AD and AD-by-proxy (71,880 cases, 383,378 controls). AD-by-proxy, based on parental diagnoses, showed strong genetic correlation with AD (rg = 0.81). Meta-analysis identified 29 risk loci, implicating 215 potential causative genes. Associated genes are strongly expressed in immune-related tissues and cell types (spleen, liver, and microglia). Gene-set analyses indicate biological mechanisms involved in lipid-related processes and degradation of amyloid precursor proteins. We show strong genetic correlations with multiple health-related outcomes, and Mendelian randomization results suggest a protective effect of cognitive ability on AD risk. These results are a step forward in identifying the genetic factors that contribute to AD risk and add novel insights into the neurobiology of AD.

1,460 citations

Journal Article•DOI•

Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder

Ditte Demontis¹,², Raymond K. Walters³,⁴, Joanna Martin⁴,⁵,⁶, Manuel Mattheisen, Thomas Damm Als¹,², Esben Agerbo¹,², Gisli Baldursson, Rich Belliveau⁴, Jonas Bybjerg-Grauholm²,⁷, Marie Bækvad-Hansen²,⁷, Felecia Cerrato⁴, Kimberly Chambert⁴, Claire Churchhouse³,⁴, Ashley Dumont⁴, Nicholas Eriksson, Michael J. Gandal, Jacqueline I. Goldstein³,⁴, Katrina L. Grasby⁸, Jakob Grove, Olafur O Gudmundsson⁹,¹⁰, Christine Søholm Hansen²,⁷,¹¹, Mads E. Hauberg¹,², Mads V. Hollegaard²,⁷, Daniel P. Howrigan³,⁴, Hailiang Huang³,⁴, Julian Maller⁴, Alicia R. Martin³,⁴, Nicholas G. Martin⁸, Jennifer L. Moran⁴, Jonatan Pallesen¹,², Duncan S. Palmer³,⁴, Carsten Bøcker Pedersen¹,², Marianne Giørtz Pedersen¹,², Timothy Poterba³,⁴, Jesper Buchhave Poulsen²,⁷, Stephan Ripke³,⁴,¹², Elise B. Robinson³, F. Kyle Satterstrom³,⁴, Hreinn Stefansson⁹, Christine Stevens⁴, Patrick Turley³,⁴, G. Bragi Walters⁹,¹⁰, Hyejung Won¹³,¹⁴, Margaret J. Wright¹⁵, Ole A. Andreassen¹⁶, Philip Asherson¹⁷, Christie L. Burton¹⁸, Dorret I. Boomsma¹⁹, Bru Cormand, Søren Dalsgaard¹, Barbara Franke²⁰, Joel Gelernter²¹,²², Daniel H. Geschwind¹³,¹⁴, Hakon Hakonarson²³, Jan Haavik²⁴,²⁵, Henry R. Kranzler²²,²⁶, Jonna Kuntsi¹⁷, Kate Langley⁶, Klaus-Peter Lesch²⁷,²⁸,²⁹, Christel M. Middeldorp¹⁵,¹⁹, Andreas Reif³⁰, Luis Augusto Rohde³¹, Panos Roussos, Russell Schachar¹⁸, Pamela Sklar³², Edmund J.S. Sonuga-Barke¹⁷, Patrick F. Sullivan⁵,³³, Anita Thapar⁶, Joyce Y. Tung, Irwin D. Waldman³⁴, Sarah E. Medland⁸, Kari Stefansson⁹,¹⁰, Merete Nordentoft²,³⁵, David M. Hougaard²,⁷, Thomas Werge²,¹¹,³⁵, Ole Mors²,³⁶, Preben Bo Mortensen, Mark J. Daly, Stephen V. Faraone³⁷, Anders D. Børglum¹,², Benjamin M. Neale³,⁴

Aarhus University¹, Lundbeck², Harvard University³, Broad Institute⁴, Karolinska Institutet⁵, Cardiff University⁶, Statens Serum Institut⁷, QIMR Berghofer Medical Research Institute⁸, deCODE genetics⁹, University of Iceland¹⁰, Mental Health Services¹¹, Charité¹², Semel Institute for Neuroscience and Human Behavior¹³, University of California, Los Angeles¹⁴, University of Queensland¹⁵, Oslo University Hospital¹⁶, King's College London¹⁷, University of Toronto¹⁸, VU University Amsterdam¹⁹, Radboud University Nijmegen²⁰, Yale University²¹, Veterans Health Administration²², Children's Hospital of Philadelphia²³, University of Bergen²⁴, Haukeland University Hospital²⁵, University of Pennsylvania²⁶, I.M. Sechenov First Moscow State Medical University²⁷, Maastricht University²⁸, University of Würzburg²⁹, Goethe University Frankfurt³⁰, Universidade Federal do Rio Grande do Sul³¹, Icahn School of Medicine at Mount Sinai³², University of North Carolina at Chapel Hill³³, Emory University³⁴, University of Copenhagen³⁵, Aarhus University Hospital³⁶, State University of New York Upstate Medical University³⁷

01 Jan 2019-Nature Genetics

TL;DR: A genome-wide association meta-analysis of 20,183 individuals diagnosed with ADHD and 35,191 controls identifies variants surpassing genome-wide significance in 12 independent loci and implicates neurodevelopmental pathways and conserved regions of the genome as being involved in underlying ADHD biology.

Abstract: Attention deficit/hyperactivity disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 individuals diagnosed with ADHD and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, finding important new information about the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes and around brain-expressed regulatory marks. Analyses of three replication studies: a cohort of individuals diagnosed with ADHD, a self-reported ADHD sample and a meta-analysis of quantitative measures of ADHD symptoms in the population, support these findings while highlighting study-specific differences on genetic overlap with educational attainment. Strong concordance with GWAS of quantitative population measures of ADHD symptoms supports that clinical diagnosis of ADHD is an extreme expression of continuous heritable traits.

1,436 citations

References

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

Shaun Purcell¹,², Benjamin M. Neale²,³, Kathe Todd-Brown¹, Lori Thomas¹, Manuel A. R. Ferreira¹, David Bender¹,², Julian Maller¹,², Pamela Sklar¹,², Paul I.W. de Bakker¹,², Mark J. Daly¹,², Pak C. Sham⁴

Harvard University¹, Massachusetts Institute of Technology², University of London³, University of Hong Kong⁴

01 Sep 2007-American Journal of Human Genetics

TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.

Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article•DOI•

Haploview: analysis and visualization of LD and haplotype maps

Jeffrey C. Barrett¹, Ben Fry¹, Julian Maller¹, Mark J. Daly¹

Massachusetts Institute of Technology¹

15 Jan 2005-Bioinformatics

TL;DR: Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface.

Abstract: Summary: Research over the last few years has revealed significant haplotype structure in the human genome. The characterization of these patterns, particularly in the context of medical genetic association studies, is becoming a routine research activity. Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface. Availability: http://www.broad.mit.edu/mpg/haploview/ Contact: jcbarret@broad.mit.edu
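To make "linkage disequilibrium statistics" concrete, the sketch below computes the standard pairwise measures D, |D'|, and r² for two biallelic loci. It is a simplification and not Haploview's implementation: it assumes phased haplotype counts are already available, whereas the abstract describes working from primary genotype data. The function name `ld_stats` and its argument names are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes.

    Assumes phased haplotype counts (an assumption of this sketch).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total             # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total     # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total     # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")

    # D: deviation of the observed haplotype frequency from independence
    d = p_AB - p_A * p_B

    # D' rescales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between alleles at the two loci
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_stats(50, 0, 0, 50))    # alleles always co-occur
    print(ld_stats(25, 25, 25, 25))  # alleles independent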

13,862 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
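As a small illustration of the format the abstract describes, the sketch below splits one VCF data line into its eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) plus any per-sample fields. It is plain Python and does not use VCFtools or its Perl API; the function name `parse_vcf_line` and the returned dictionary layout are choices made for this example, and header lines, type conversion of INFO values, and validation are deliberately left out.

```python
def parse_vcf_line(line):
    """Parse one VCF data line (not a '#' header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    record = {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
        "samples": [],
    }

    # Optional genotype columns: FORMAT keys followed by one column per sample
    if len(fields) > 9:
        keys = fields[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return record


if __name__ == "__main__":
    example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\tGT:GQ\t0|0:48\t1|0:48"
    print(parse_vcf_line(example))
```

For real files, which are usually compressed and indexed as the abstract notes, an established VCF library is a safer choice than a hand-written parser.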

10,164 citations

Related Papers (5)

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

01 Sep 2007-American Journal of Human Genetics

Shaun Purcell, Benjamin M. Neale, Kathe Todd-Brown, Lori Thomas, Manuel A. R. Ferreira, David Bender, Julian Maller, Pamela Sklar, Paul I.W. de Bakker, Mark J. Daly, Pak C. Sham

GCTA: a tool for genome-wide complex trait analysis.

07 Jan 2011-American Journal of Human Genetics

Jian Yang, S. Hong Lee, Michael E. Goddard, Peter M. Visscher

A global reference for human genetic variation.

01 Oct 2015-Nature

Adam Auton, Gonçalo R. Abecasis, David Altshuler +515 more

A reference panel of 64,976 haplotypes for genotype imputation

22 Aug 2016-Nature Genetics

Shane A. McCarthy, Sayantan Das, Warren W. Kretzschmar, Olivier Delaneau, Andrew R. Wood, Alexander Teumer, Hyun Min Kang, Christian Fuchsberger, Petr Danecek, Kevin Sharp, Yang Luo, C Sidore, Alan Kwong, Nicholas J. Timpson, Seppo Koskinen, Scott I. Vrieze, Laura J. Scott, He Zhang, Anubha Mahajan, Jan H. Veldink, Ulrike Peters, Carlos N. Pato, Cornelia M. van Duijn, Christopher E. Gillies, Ilaria Gandin, Massimo Mezzavilla, Arthur Gilly, Massimiliano Cocca, Michela Traglia, Andrea Angius, Jeffrey C. Barrett, D.I. Boomsma, Kari Branham, Gerome Breen, Chad M. Brummett, Fabio Busonero, Harry Campbell, Andrew T. Chan, Sai Chen, Emily Y. Chew, Francis S. Collins, Laura J Corbin, George Davey Smith, George Dedoussis, Marcus Dörr, Aliki-Eleni Farmaki, Luigi Ferrucci, Lukas Forer, Ross M. Fraser, Stacey Gabriel, Shawn Levy, Leif Groop, Tabitha A. Harrison, Andrew T. Hattersley, Oddgeir L. Holmen, Kristian Hveem, Matthias Kretzler, James Lee, Matt McGue, Thomas Meitinger, David Melzer, Josine L. Min, Karen L. Mohlke, John B. Vincent, Matthias Nauck, Deborah A. Nickerson, Aarno Palotie, Michele T. Pato, Nicola Pirastu, Melvin G. McInnis, J. Brent Richards, Cinzia Sala, Veikko Salomaa, David Schlessinger, Sebastian Schoenherr, P. Eline Slagboom, Kerrin S. Small, Tim D. Spector, Dwight Stambolian, Marcus A. Tuke, Jaakko Tuomilehto, Leonard H. van den Berg, Wouter van Rheenen, Uwe Völker, Cisca Wijmenga, Daniela Toniolo, Eleftheria Zeggini, Paolo Gasparini, Matthew G. Sampson, James F. Wilson, Timothy M. Frayling, Paul I.W. de Bakker, Morris A. Swertz, Steven A. McCarroll, Charles Kooperberg, Annelot M. Dekker, David Altshuler, Cristen J. Willer, William G. Iacono, Samuli Ripatti, Nicole Soranzo, Klaudia Walter, Anand Swaroop, Francesco Cucca, Carl A. Anderson, Richard M. Myers, Michael Boehnke, Mark I. McCarthy, Richard Durbin, Gonçalo R. Abecasis, Jonathan Marchini

LD score regression distinguishes confounding from polygenicity in genome-wide association studies

02 Feb 2015-Nature Genetics

Brendan Bulik-Sullivan, Po-Ru Loh, Hilary K. Finucane, Stephan Ripke, Jian Yang, Nick Patterson, Mark J. Daly, Alkes L. Price, Benjamin M. Neale