SciSpace (formerly Typeset)
Author

Mark J. Daly

Bio: Mark J. Daly is an academic researcher at the University of Helsinki who has co-authored 763 publications receiving 304,452 citations. The author has an h-index of 204. Previous affiliations of Mark J. Daly include Cleveland Clinic Lerner Research Institute and Boston Children's Hospital. The author has done significant research on the topics Genome-wide association study and Population.

Papers

Open access | Journal Article | DOI: 10.1086/519795
Shaun Purcell, Benjamin M. Neale +14 more | Institutions (4)
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
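The identity-by-state (IBS) sharing mentioned in this abstract can be illustrated with a minimal sketch. It is not PLINK's implementation. It assumes unphased genotypes coded as minor-allele counts (0, 1, 2) with `None` for missing calls, and the function name `ibs_proportion` is hypothetical.

```python
from typing import Optional, Sequence


def ibs_proportion(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Mean proportion of alleles shared identical by state between two individuals.

    Assumption: each genotype is a minor-allele count (0, 1 or 2); None marks
    a missing call. Loci where either individual is missing are skipped.
    """
    shared = 0  # total alleles shared IBS across usable loci
    loci = 0    # number of loci where both genotypes are observed
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue
        # For unphased genotypes, alleles shared IBS at a locus = 2 - |ga - gb|.
        shared += 2 - abs(ga - gb)
        loci += 1
    if loci == 0:
        raise ValueError("no loci with both genotypes observed")
    return shared / (2 * loci)


if __name__ == "__main__":
    person_1 = [0, 1, 2, 2]
    person_2 = [0, 1, 0, None]
    print(ibs_proportion(person_1, person_2))
```

A value near 1 means the two individuals carry almost the same alleles at the compared markers. Estimating identity by descent from such counts needs allele frequencies and a population model, which this sketch does not attempt.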

22,115 Citations


Open access | Journal Article | DOI: 10.1101/GR.107524.110
Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko +7 more | Institutions (1)
01 Sep 2010 | Genome Research
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Topics: Variant Call Format (52%), Software framework (50%)

16,404 Citations


Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTH457
15 Jan 2005 | Bioinformatics
Abstract: Summary: Research over the last few years has revealed significant haplotype structure in the human genome. The characterization of these patterns, particularly in the context of medical genetic association studies, is becoming a routine research activity. Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface. Availability: http://www.broad.mit.edu/mpg/haploview/ Contact: jcbarret@broad.mit.edu

Topics: Haploview (76%), Haplotype estimation (56%), International HapMap Project (55%)

13,185 Citations


Open access | Journal Article | DOI: 10.1038/NATURE15393
Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin +514 more | Institutions (90)
01 Oct 2015 | Nature
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Topics: 1000 Genomes Project (62%), Exome sequencing (59%), Genome-wide association study (59%)

9,821 Citations


Open access | Journal Article | DOI: 10.1038/NG.806
Mark A. DePristo, Eric Banks, Ryan Poplin, Kiran V. Garimella +19 more | Institutions (3)
01 May 2011 | Nature Genetics
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

Topics: DNA sequencing theory (61%), Variant Call Format (58%), 1000 Genomes Project (55%)

8,715 Citations


Cited by

Open access | Journal Article | DOI: 10.1038/NMETH.1923
01 Apr 2012 | Nature Methods
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

27,973 Citations


Open access | Journal Article | DOI: 10.1073/PNAS.0506580102
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

26,320 Citations


Open access | Journal Article | DOI: 10.1038/35057062
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +245 more | Institutions (29)
15 Feb 2001 | Nature
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Topics: Cancer genome sequencing (61%), Hybrid genome assembly (59%), Cancer Genome Project (58%)

21,023 Citations


Open access
28 Jul 2005-
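Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected red blood cells, interacts with one or more receptors on infected red blood cells, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and switching transcription between different var gene variants provides the molecular basis for antigenic variation.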

18,940 Citations


Performance
Metrics

Author's H-index: 204
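
For readers unfamiliar with the metric, an h-index of h means that h of the author's papers have each received at least h citations. The sketch below shows that standard definition in Python. The function name `h_index` and the sample counts are illustrative assumptions, not data from this profile.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break     # counts only decrease from here, so h cannot grow further
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for five papers.
    print(h_index([10, 8, 5, 4, 3]))
```

Because the metric depends only on the ranked citation counts, a few very highly cited papers raise it no more than moderately cited ones do.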

Number of papers by the author in previous years
Year    Papers
2021    52
2020    49
2019    67
2018    52
2017    40
2016    54

Top Attributes

Author's top 5 most impactful journals

Nature Genetics

98 papers, 71.5K citations

bioRxiv

91 papers, 3.8K citations

American Journal of Human Genetics

57 papers, 33.1K citations

Nature

44 papers, 80.1K citations

medRxiv

25 papers, 120 citations

Network Information
Related Authors (5)
Steven A. McCarroll

244 papers, 105.5K citations

93% related
Shaun Purcell

326 papers, 132.9K citations

93% related
Claire Churchhouse

30 papers, 5.1K citations

93% related
Timothy Poterba

30 papers, 5.8K citations

92% related
Christine Stevens

60 papers, 21.5K citations

92% related