Author

Adrian Tan

Bio: Adrian Tan is an academic researcher from the University of Michigan. The author has contributed to research on the topics Genome-wide association study and Population. The author has an h-index of 11 and has co-authored 12 publications receiving 12,791 citations. Previous affiliations of Adrian Tan include the Agency for Science, Technology and Research and the University of Minnesota.

Papers
Journal ArticleDOI
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4 +514 more, Institutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

3,247 citations

Journal ArticleDOI
TL;DR: A common genetic variant at the TRAF1-C5 locus on chromosome 9 is associated with an increased risk of anti-CCP-positive rheumatoid arthritis.
Abstract: Background: Rheumatoid arthritis has a complex mode of inheritance. Although HLA-DRB1 and PTPN22 are well-established susceptibility loci, other genes that confer a modest level of risk have been identified recently. We carried out a genomewide association analysis to identify additional genetic loci associated with an increased risk of rheumatoid arthritis. Methods: We genotyped 317,503 single-nucleotide polymorphisms (SNPs) in a combined case-control study of 1522 case subjects with rheumatoid arthritis and 1850 matched control subjects. The patients were seropositive for autoantibodies against cyclic citrullinated peptide (CCP). We obtained samples from two data sets, the North American Rheumatoid Arthritis Consortium (NARAC) and the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA). Results from NARAC and EIRA for 297,086 SNPs that passed quality-control filters were combined with the use of Cochran-Mantel-Haenszel stratified analysis. SNPs showing a significant association with disease (P<1×10^−8) were genotyped in an independent set of case subjects with anti-CCP-positive rheumatoid arthritis (485 from NARAC and 512 from EIRA) and in control subjects (1282 from NARAC and 495 from EIRA). Results: We observed associations between disease and variants in the major-histocompatibility-complex locus, in PTPN22, and in a SNP (rs3761847) on chromosome 9 for all samples tested, the latter with an odds ratio of 1.32 (95% confidence interval, 1.23 to 1.42; P = 4×10^−14). The SNP is in linkage disequilibrium with two genes relevant to chronic inflammation: TRAF1 (encoding tumor necrosis factor receptor-associated factor 1) and C5 (encoding complement component 5). Conclusions: A common genetic variant at the TRAF1-C5 locus on chromosome 9 is associated with an increased risk of anti-CCP-positive rheumatoid arthritis.

820 citations
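
The abstract above combines two cohorts with a Cochran-Mantel-Haenszel stratified analysis. As a rough illustration of what that test computes, here is a minimal Python sketch for a set of 2×2 tables, one per stratum. The function name `cmh_test`, the table layout, and the counts in the usage note are hypothetical and are not taken from the paper; the sketch omits the continuity correction and is not the authors' implementation.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over 2x2 tables, one per stratum.

    Each table is (a, b, c, d):
      a = cases carrying the allele,    b = cases not carrying it,
      c = controls carrying the allele, d = controls not carrying it.
    Returns (chi-square statistic, p-value, pooled odds ratio).
    No continuity correction is applied (an assumption of this sketch).
    """
    diff = 0.0  # sum over strata of observed minus expected count in cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    num = 0.0   # numerator of the Mantel-Haenszel pooled odds ratio
    den = 0.0   # denominator of the Mantel-Haenszel pooled odds ratio
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    chi2 = diff * diff / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, num / den
```

With made-up counts, `cmh_test([(30, 10, 10, 30), (25, 15, 15, 25)])` returns the statistic, p-value, and pooled odds ratio across the two hypothetical strata. Stratifying this way keeps each cohort's cases compared only with that cohort's controls, which is the usual reason for preferring it over pooling the raw counts.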

Journal ArticleDOI
Mahmood Ameen Abdulla1, Ikhlak Ahmed2, Anunchai Assawamakin3, Anunchai Assawamakin4, Jong Bhak5, Samir K. Brahmachari2, Gayvelline C. Calacal6, Amit Kumar Chaurasia2, Chien-Hsiun Chen7, Jieming Chen8, Yuan-Tsong Chen7, Jiayou Chu9, Eva Maria Cutiongco-de la Paz6, Maria Corazon A. De Ungria6, Frederick C. Delfin6, Juli Edo1, Suthat Fuchareon4, Ho Ghang5, Takashi Gojobori10, Junsong Han, Sheng Feng Ho7, Boon Peng Hoh11, Wei Huang12, Hidetoshi Inoko13, Pankaj Jha2, Timothy A. Jinam1, Li Jin14, Jongsun Jung, Daoroong Kangwanpong15, Jatupol Kampuansai15, Giulia C. Kennedy16, Preeti Khurana2, Hyung Lae Kim, Kwangjoong Kim, Sangsoo Kim17, Woo Yeon Kim5, Kuchan Kimm18, Ryosuke Kimura19, Tomohiro Koike, Supasak Kulawonganunchai3, Vikrant Kumar8, Poh San Lai20, Jong-Young Lee, Sunghoon Lee5, Edison T. Liu8, Partha P. Majumder21, Kiran Kumar Mandapati2, Sangkot Marzuki22, Wayne Mitchell8, Wayne Mitchell23, Mitali Mukerji2, Kenji Naritomi24, Chumpol Ngamphiw3, Norio Niikawa25, Nao Nishida19, Bermseok Oh, Sangho Oh5, Jun Ohashi19, Akira Oka13, Rick Twee-Hee Ong8, Carmencita Padilla6, Prasit Palittapongarnpim3, Henry B. Perdigon6, Maude E. Phipps26, Maude E. Phipps1, Eileen Png8, Yoshiyuki Sakaki, Jazelyn M. Salvador6, Yuliana Sandraling22, Vinod Scaria2, Mark Seielstad8, Mohd Ros Sidek11, Amit Sinha2, Metawee Srikummool15, Herawati Sudoyo22, Sumio Sugano19, Helena Suryadi22, Yoshiyuki Suzuki, Kristina A. Tabbada6, Adrian Tan8, Katsushi Tokunaga19, Sissades Tongsima3, Lilian P. Villamor6, Eric Wang16, Ying Wang12, Haifeng Wang12, Jer-Yuarn Wu7, Huasheng Xiao, Shuhua Xu, Jin Ok Yang5, Yin Yao Shugart27, Hyang Sook Yoo5, Wentao Yuan12, Guoping Zhao12, Bin Alwi Zilfalil11 
11 Dec 2009-Science
TL;DR: The results suggest that there may have been a single major migration of people into Asia and a subsequent south-to-north migration across the continent, and that genetic ancestry is strongly correlated with linguistic affiliations as well as geography.
Abstract: Asia harbors substantial cultural and linguistic diversity, but the geographic structure of genetic variation across the continent remains enigmatic. Here we report a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. Our results show that genetic ancestry is strongly correlated with linguistic affiliations as well as geography. Most populations show relatedness within ethnic/linguistic groups, despite prevalent gene flow among populations. More than 90% of East Asian (EA) haplotypes could be found in either Southeast Asian (SEA) or Central-South Asian (CSA) populations and show clinal structure with haplotype diversity decreasing from south to north. Furthermore, 50% of EA haplotypes were found in SEA only and 5% were found in CSA only, indicating that SEA was a major geographic source of EA populations.

545 citations

Journal ArticleDOI
TL;DR: A software tool, vt normalize, is presented that normalizes the representation of genetic variants in the VCF; the paper demonstrates the inconsistent representation of variants across existing sequence analysis tools and shows that the tool facilitates integration of diverse variant types and call sets.
Abstract: Summary: A genetic variant can be represented in the Variant Call Format (VCF) in multiple different ways. Inconsistent representation of variants between variant callers and analyses will magnify discrepancies between them and complicate variant filtering and duplicate removal. We present a software tool vt normalize that normalizes representation of genetic variants in the VCF. We formally define variant normalization as the consistent representation of genetic variants in an unambiguous and concise way and derive a simple general algorithm to enforce it. We demonstrate the inconsistent representation of variants across existing sequence analysis tools and show that our tool facilitates integration of diverse variant types and call sets. Availability and implementation: The source code is available for download at http://github.com/atks/vt. More detailed documentation is available at http://genome.sph.umich.edu/wiki/Variant_Normalization. Contact: hmkang@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.

363 citations
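
The abstract above describes variant normalization as a consistent, unambiguous, and concise representation (commonly described as left-aligned and parsimonious). The following Python sketch shows one way such a trim-and-extend procedure can look. It is an illustration only and is not the vt source code: the function name `normalize_variant`, the plain-string reference, and the handling of a variant at the very start of the sequence are assumptions made for this example.

```python
def normalize_variant(ref_seq: str, pos: int, alleles: list[str]) -> tuple[int, list[str]]:
    """Left-align and trim a variant against a reference sequence.

    ref_seq : reference sequence as a plain string (hypothetical input form)
    pos     : 1-based position of the first base of alleles[0]
    alleles : REF allele first, followed by one or more ALT alleles
    """
    alleles = list(alleles)
    if len(set(alleles)) < 2 or any(len(a) == 0 for a in alleles):
        raise ValueError("need non-empty alleles with at least two distinct values")
    if ref_seq[pos - 1:pos - 1 + len(alleles[0])] != alleles[0]:
        raise ValueError("REF allele does not match the reference at pos")

    changed = True
    while changed:
        changed = False
        # If every allele ends in the same base, drop that base from each.
        if len({a[-1] for a in alleles}) == 1:
            if min(len(a) for a in alleles) == 1 and pos == 1:
                break  # cannot extend left past the sequence start (sketch-only choice)
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if any(len(a) == 0 for a in alleles):
            pos -= 1
            left_base = ref_seq[pos - 1]
            alleles = [left_base + a for a in alleles]
            changed = True

    # Drop shared leading bases while every allele keeps at least one base.
    while min(len(a) for a in alleles) >= 2 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles
```

For example, on the made-up reference `GGGCACACACAGGG`, the deletion written as `CAC` to `C` at position 6 and the same deletion written as `GCACA` to `GCA` at position 3 both come out as position 3 with alleles `GCA` and `G`, which is what lets call sets from different tools be compared and de-duplicated.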


Cited by
Journal ArticleDOI
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

10,056 citations

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks2, Timothy Fennell2, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria3, James S. Ware, Andrew J. Hill4, Andrew J. Hill2, Andrew J. Hill1, Beryl B. Cummings2, Beryl B. Cummings1, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum2, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada2, Karol Estrada1, Fengmei Zhao2, Fengmei Zhao1, James Zou2, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo2, Ron Do, Jason Flannick2, Jason Flannick1, Menachem Fromer, Laura D. Gauthier2, Jackie Goldstein1, Jackie Goldstein2, Namrata Gupta2, Daniel P. Howrigan1, Daniel P. Howrigan2, Adam Kiezun2, Mitja I. Kurki1, Mitja I. Kurki2, Ami Levy Moonshine2, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin2, Manuel A. Rivas2, Valentin Ruano-Rubio2, Samuel A. Rose2, Douglas M. Ruderfer8, Khalid Shakir2, Peter D. Stenson6, Christine Stevens2, Brett Thomas2, Brett Thomas1, Grace Tiao2, María Teresa Tusié-Luna, Ben Weisburd2, Hong-Hee Won9, Dongmei Yu, David Altshuler10, David Altshuler2, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly2, Roberto Elosua, Jose C. Florez2, Jose C. Florez1, Stacey Gabriel2, Gad Getz1, Gad Getz2, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll2, Steven A. McCarroll1, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale1, Benjamin M. Neale2, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan21, Patrick F. Sullivan14, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins17, Hugh Watkins16, James G. Wilson24, Mark J. Daly2, Mark J. Daly1, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal ArticleDOI
TL;DR: The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
Abstract: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

4,658 citations

Journal ArticleDOI
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 UK Biobank participants are described, including population structure and relatedness in the cohort and imputation that increases the number of testable variants to around 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal ArticleDOI
12 Oct 2017-Nature
TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Abstract: Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

3,289 citations