Author

Mingyao Li

Bio: Mingyao Li is an academic researcher at the University of Pennsylvania. The author has contributed to research on genome-wide association studies and single-nucleotide polymorphisms. The author has an h-index of 69 and has co-authored 249 publications receiving 37,005 citations. Previous affiliations of Mingyao Li include Case Western Reserve University and the University of Michigan.


Papers
Journal ArticleDOI
TL;DR: ANNOVAR is a tool developed to annotate single nucleotide variants and insertions/deletions, for example by examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP.
Abstract: High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.

10,461 citations

Journal ArticleDOI
05 Aug 2010-Nature
TL;DR: The results identify several novel loci associated with plasma lipids that are also associated with CAD and provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.
Abstract: Plasma concentrations of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with plasma lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 × 10⁻⁸), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (for example, CYP7A1, NPC1L1 and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and have an impact on lipid traits in three non-European populations (East Asians, South Asians and African Americans). Our results identify several novel loci associated with plasma lipids that are also associated with CAD. Finally, we validated three of the novel genes (GALNT2, PPP1R3B and TTC39B) with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.

3,469 citations

Journal ArticleDOI
Benjamin F. Voight, Gina M. Peloso, Marju Orho-Melander, Ruth Frikke-Schmidt, Maja Barbalić, Majken K. Jensen, George Hindy, Hilma Holm, Eric L. Ding, Toby Johnson, Heribert Schunkert, Nilesh J. Samani, Robert Clarke, Jemma C. Hopewell, John F. Thompson, Mingyao Li, Gudmar Thorleifsson, Christopher Newton-Cheh, Kiran Musunuru, James P. Pirruccello, Danish Saleheen, Li Chen, Alexandre F.R. Stewart, Arne Schillert, Unnur Thorsteinsdottir, Gudmundur Thorgeirsson, Sonia S. Anand, James C. Engert, Thomas M. Morgan, John A. Spertus, Monika Stoll, Klaus Berger, Nicola Martinelli, Domenico Girelli, Pascal P. McKeown, Christopher Patterson, Stephen E. Epstein, Joseph M. Devaney, Mary Susan Burnett, Vincent Mooser, Samuli Ripatti, Ida Surakka, Markku S. Nieminen, Juha Sinisalo, Marja-Liisa Lokki, Markus Perola, Aki S. Havulinna, Ulf de Faire, Bruna Gigante, Erik Ingelsson, Tanja Zeller, Philipp S. Wild, Paul I.W. de Bakker, Olaf H. Klungel, Anke-Hilse Maitland-van der Zee, Bas J M Peters, Anthonius de Boer, Diederick E. Grobbee, Pieter Willem Kamphuisen, Vera H.M. Deneer, Clara C. Elbers, N. Charlotte Onland-Moret, Marten H. Hofker, Cisca Wijmenga, W. M. Monique Verschuren, Jolanda M. A. Boer, Yvonne T. van der Schouw, Asif Rasheed, Philippe M. Frossard, Serkalem Demissie, Cristen J. Willer, Ron Do, Jose M. Ordovas, Gonçalo R. Abecasis, Michael Boehnke, Karen L. Mohlke, Mark J. Daly, Candace Guiducci, Noël P. Burtt, Aarti Surti, Elena Gonzalez, Shaun Purcell, Stacey Gabriel, Jaume Marrugat, John F. Peden, Jeanette Erdmann, Patrick Diemert, Christina Willenborg, Inke R. König, Marcus Fischer, Christian Hengstenberg, Andreas Ziegler, Ian Buysschaert, Diether Lambrechts, Frans Van de Werf, Keith A.A. Fox, Nour Eddine El Mokhtari, Diana Rubin, Jürgen Schrezenmeir, Stefan Schreiber, Arne Schäfer, John Danesh, Stefan Blankenberg, Robert Roberts, Ruth McPherson, Hugh Watkins, Alistair S. Hall, Kim Overvad, Eric B. Rimm, Eric Boerwinkle, Anne Tybjærg-Hansen, L. Adrienne Cupples, Muredach P. Reilly, Olle Melander, Pier Mannuccio Mannucci, Diego Ardissino, David S. Siscovick, Roberto Elosua, Kari Stefansson, Christopher J. O'Donnell, Veikko Salomaa, Daniel J. Rader, Leena Peltonen, Stephen M. Schwartz, David Altshuler, Sekar Kathiresan
11 Aug 2012
TL;DR: In this paper, a Mendelian randomisation analysis was performed to compare the effect of HDL cholesterol, LDL cholesterol, and genetic score on risk of myocardial infarction.
Abstract: Methods: We performed two mendelian randomisation analyses. First, we used as an instrument a single nucleotide polymorphism (SNP) in the endothelial lipase gene (LIPG Asn396Ser) and tested this SNP in 20 studies (20 913 myocardial infarction cases, 95 407 controls). Second, we used as an instrument a genetic score consisting of 14 common SNPs that exclusively associate with HDL cholesterol and tested this score in up to 12 482 cases of myocardial infarction and 41 331 controls. As a positive control, we also tested a genetic score of 13 common SNPs exclusively associated with LDL cholesterol. […] ¹³) but similar levels of other lipid and non-lipid risk factors for myocardial infarction compared with noncarriers. This difference in HDL cholesterol is expected to decrease risk of myocardial infarction by 13% (odds ratio [OR] 0·87, 95% CI 0·84–0·91). However, we noted that the 396Ser allele was not associated with risk of myocardial infarction (OR 0·99, 95% CI 0·88–1·11, p=0·85). From observational epidemiology, an increase of 1 SD in HDL cholesterol was associated with reduced risk of myocardial infarction (OR 0·62, 95% CI 0·58–0·66). However, a 1 SD increase in HDL cholesterol due to genetic score was not associated with risk of myocardial infarction (OR 0·93, 95% CI 0·68–1·26, p=0·63). For LDL cholesterol, the estimate from observational epidemiology (a 1 SD increase in LDL cholesterol associated with OR 1·54, 95% CI 1·45–1·63) was concordant with that from genetic score (OR 2·13, 95% CI 1·69–2·69, p=2×10

1,878 citations

Journal ArticleDOI
TL;DR: PennCNV, a hidden Markov model (HMM) based approach, is presented for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data, demonstrating the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.
Abstract: Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM) based approach, for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. This algorithm incorporates multiple sources of information, including total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and the pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected approximately 27 CNVs for each individual with a median size of approximately 12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.

1,752 citations

Journal ArticleDOI
TL;DR: This paper performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals.
Abstract: We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10⁻⁸ and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.

1,705 citations


Cited by
28 Jul 2005
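TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade host immune responses. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.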

18,940 citations

01 Jan 2014
TL;DR: These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care.
Abstract: Diabetes is a chronic illness that requires continuing medical care and patient self-management education to prevent acute complications and to reduce the risk of long-term complications. Diabetes care is complex and requires that many issues, beyond glycemic control, be addressed. A large body of evidence exists that supports a range of interventions to improve diabetes outcomes. These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care. While individual preferences, comorbidities, and other patient factors may require modification of goals, targets that are desirable for most patients with diabetes are provided. These standards are not intended to preclude more extensive evaluation and management of the patient by other specialists as needed. For more detailed information, refer to Bode (Ed.): Medical Management of Type 1 Diabetes (1), Burant (Ed.): Medical Management of Type 2 Diabetes (2), and Klingensmith (Ed.): Intensive Diabetes Management (3). The recommendations included are diagnostic and therapeutic actions that are known or believed to favorably affect health outcomes of patients with diabetes. A grading system (Table 1), developed by the American Diabetes Association (ADA) and modeled after existing methods, was utilized to clarify and codify the evidence that forms the basis for the recommendations. The level of evidence that supports each recommendation is listed after each recommendation using the letters A, B, C, or E.

9,618 citations

Journal ArticleDOI
01 Apr 2012-Fly
TL;DR: It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus.
Abstract: We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in...

8,017 citations

Journal ArticleDOI
08 Oct 2009-Nature
TL;DR: This paper examined potential sources of missing heritability and proposed research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
Abstract: Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, 'missing' heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.

7,797 citations

Journal ArticleDOI
28 Oct 2010-Nature
TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. This paper presents the results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.
Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations