Simona Volpi

Bio: Simona Volpi is an academic researcher at the National Institutes of Health. The author has contributed to research on the topics of regulation of gene expression and translation (biology), has an h-index of 19, and has co-authored 29 publications receiving 10,987 citations.

Papers

Journal Article

The Genotype-Tissue Expression (GTEx) project

John T. Lonsdale, Jeffrey Thomas, Mike Salvatore, Rebecca Phillips, Edmund Lo, Saboor Shad, Richard Hasz, Gary Walters, Fernando U. Garcia¹, Nancy Young², Barbara A. Foster³, Mike Moser³, Ellen Karasik³, Bryan Gillard³, Kimberley Ramsey³, Susan L. Sullivan, Jason Bridge, Harold Magazine, John Syron, Johnelle Fleming, Laura A. Siminoff⁴, Heather M. Traino⁴, Maghboeba Mosavel⁴, Laura Barker⁴, Scott D. Jewell⁵, Daniel C. Rohrer⁵, Dan Maxim⁵, Dana Filkins⁵, Philip Harbach⁵, Eddie Cortadillo⁵, Bree Berghuis⁵, Lisa Turner⁵, Eric Hudson⁵, Kristin Feenstra⁵, Leslie H. Sobin⁶, James A. Robb⁶, Phillip Branton, Greg E. Korzeniewski⁶, Charles Shive⁶, David Tabor⁶, Liqun Qi⁶, Kevin Groch⁶, Sreenath Nampally⁶, Steve Buia⁶, Angela Zimmerman⁶, Anna M. Smith⁶, Robin Burges⁶, Karna Robinson⁶, Kim Valentino⁶, Deborah Bradbury⁶, Mark Cosentino⁶, Norma Diaz-Mayoral⁶, Mary Kennedy⁶, Theresa Engel⁶, Penelope Williams⁶, Kenyon Erickson, Kristin G. Ardlie⁷, Wendy Winckler⁷, Gad Getz⁷,⁸, David S. DeLuca⁷, Daniel MacArthur⁷,⁸, Manolis Kellis⁷, Alexander Thomson⁷, Taylor Young⁷, Ellen Gelfand⁷, Molly Donovan⁷, Yan Meng⁷, George B. Grant⁷, Deborah C. Mash⁹, Yvonne Marcus⁹, Margaret J. Basile⁹, Jun Liu⁸, Jun Zhu¹⁰, Zhidong Tu¹⁰, Nancy J. Cox¹¹, Dan L. Nicolae¹¹, Eric R. Gamazon¹¹, Hae Kyung Im¹¹, Anuar Konkashbaev¹¹, Jonathan K. Pritchard¹¹,¹², Matthew Stephens¹¹, Timothée Flutre¹¹, Xiaoquan Wen¹¹, Emmanouil T. Dermitzakis¹³, Tuuli Lappalainen¹³, Roderic Guigó, Jean Monlong, Michael Sammeth, Daphne Koller¹⁴, Alexis Battle¹⁴, Sara Mostafavi¹⁴, Mark I. McCarthy¹⁵, Manuel Rivas¹⁵, Julian Maller¹⁵, Ivan Rusyn¹⁶, Andrew B. Nobel¹⁶, Fred A. Wright¹⁶, Andrey A. Shabalin¹⁶, Mike Feolo¹⁷, Nataliya Sharopova¹⁷, Anne Sturcke¹⁷, Justin Paschal¹⁷, James M. Anderson¹⁷, Elizabeth L. Wilder¹⁷, Leslie Derr¹⁷, Eric D. Green¹⁷, Jeffery P. Struewing¹⁷, Gary F. Temple¹⁷, Simona Volpi¹⁷, Joy T. Boyer¹⁷, Elizabeth J. Thomson¹⁷, Mark S. Guyer¹⁷, Cathy Ng¹⁷, Assya Abdallah¹⁷, Deborah Colantuoni¹⁷, Thomas R. Insel¹⁷, Susan E. Koester¹⁷, Roger Little¹⁷, Patrick Bender¹⁷, Thomas Lehner¹⁷, Yin Yao¹⁷, Carolyn C. Compton¹⁷, Jimmie B. Vaught¹⁷, Sherilyn Sawyer¹⁷, Nicole C. Lockhart¹⁷, Joanne P. Demchok¹⁷, Helen F. Moore¹⁷ • Institutions (17)

Drexel University¹, Yeshiva University², Roswell Park Cancer Institute³, Virginia Commonwealth University⁴, Van Andel Institute⁵, Science Applications International Corporation⁶, Massachusetts Institute of Technology⁷, Harvard University⁸, University of Miami⁹, Icahn School of Medicine at Mount Sinai¹⁰, University of Chicago¹¹, Howard Hughes Medical Institute¹², University of Geneva¹³, Stanford University¹⁴, University of Oxford¹⁵, University of North Carolina at Chapel Hill¹⁶, National Institutes of Health¹⁷

29 May 2013-Nature Genetics

TL;DR: The Genotype-Tissue Expression (GTEx) project is described, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.

Abstract: Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.

6,545 citations

Journal Article

The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans

Kristin G. Ardlie, David S. DeLuca, Ayellet V. Segrè, Timothy J. Sullivan, Taylor Young, Ellen Gelfand, Casandra A. Trowbridge, Julian Maller, Taru Tukiainen, Monkol Lek, Lucas D. Ward, Pouya Kheradpour, Benjamin Iriarte, Yan Meng, Cameron D. Palmer, Tõnu Esko, Wendy Winckler, Joel N. Hirschhorn, Manolis Kellis, Daniel G. MacArthur, Gad Getz, Andrey A. Shabalin, Gen Li, Yi-Hui Zhou, Andrew B. Nobel, Ivan Rusyn, Fred A. Wright, Tuuli Lappalainen, Pedro G. Ferreira, Halit Ongen, Manuel A. Rivas, Alexis Battle, Sara Mostafavi, Jean Monlong, Michael Sammeth, Marta Melé, Ferran Reverter, Jakob M. Goldmann, Daphne Koller, Roderic Guigó, Mark I. McCarthy, Emmanouil T. Dermitzakis, Eric R. Gamazon, Hae Kyung Im, Anuar Konkashbaev, Dan L. Nicolae, Nancy J. Cox, Timothée Flutre, Xiaoquan Wen, Matthew Stephens, Jonathan K. Pritchard, Zhidong Tu, Bin Zhang, Tao Huang, Quan Long, Luan Lin, Jialiang Yang, Jun Zhu, Jun Liu, Amanda Brown, Bernadette Mestichelli, Denee Tidwell, Edmund Lo, Mike Salvatore, Saboor Shad, Jeffrey A. Thomas, John T. Lonsdale, Michael T. Moser, Bryan Gillard, Ellen Karasik, Kimberly Ramsey, Christopher Choi, Barbara A. Foster, John Syron, Johnell Fleming, Harold Magazine, Rick Hasz, Gary Walters, Jason Bridge, Mark Miklos, Susan L. Sullivan, Laura Barker, Heather M. Traino, Maghboeba Mosavel, Laura A. Siminoff, Dana R. Valley, Daniel C. Rohrer, Scott D. Jewell, Philip A. Branton, Leslie H. Sobin, Mary Barcus, Liqun Qi, Jeffrey McLean, Pushpa Hariharan, Ki Sung Um, Shenpei Wu, David Tabor, Charles Shive, Anna M. Smith, Stephen A. Buia, Anita H. Undale, Karna Robinson, Nancy Roche, Kimberly M. Valentino, Angela Britton, Robin Burges, Debra Bradbury, Kenneth W. Hambright, John Seleski, Greg E. Korzeniewski, Kenyon Erickson, Yvonne Marcus, Jorge Tejada, Mehran Taherian, Chunrong Lu, Margaret J. Basile, Deborah C. Mash, Simona Volpi, Jeffery P. Struewing, Gary F. Temple, Joy T. Boyer, Deborah Colantuoni, Roger Little, Susan E. Koester, Latarsha J. Carithers, Helen M. Moore, Ping Guan, Carolyn C. Compton, Sherilyn Sawyer, Joanne P. Demchok, Jimmie B. Vaught, Chana A. Rabiner, Nicole C. Lockhart

08 May 2015-Science

TL;DR: The landscape of gene expression across tissues is described, thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants are cataloged, complex network relationships are described, and signals from genome-wide association studies explained by eQTLs are identified.

Abstract: Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysi...

4,418 citations

Journal Article

The GTEx Consortium atlas of genetic regulatory effects across human tissues

François Aguet, Alvaro N. Barbeira, Rodrigo Bonazzola, Andrew A. Brown +164 more•Institutions (1)

01 Jan 2020-Science

1,756 citations

Journal Article

Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics.

Alvaro N. Barbeira¹, Scott P. Dickinson¹, Rodrigo Bonazzola¹, Jiamao Zheng¹ +260 more•Institutions (43)

08 May 2018-Nature Communications

TL;DR: A mathematical expression is derived to compute PrediXcan results using summary data, and the effects of gene expression variation on human phenotypes in 44 GTEx tissues and >100 phenotypes are investigated.

Abstract: Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.

657 citations

Journal Article

Design and anticipated outcomes of the eMERGE-PGx project: a multicenter pilot for preemptive pharmacogenomics in electronic health record systems.

Laura J. Rasmussen-Torvik¹, Sarah C. Stallings², Adam S. Gordon³, Berta Almoguera⁴, Melissa A. Basford², Suzette J. Bielinski⁵, Ariel Brautbar⁶, Murray H. Brilliant⁶, David Carrell⁷, John Connolly⁴, David R. Crosslin³, Kimberly F. Doheny⁸, Carlos J. Gallego³, Omri Gottesman⁹, Daniel Seung Kim³, Kathleen A. Leppig⁷, Rongling Li¹⁰, Simon Lin⁶, Shannon Manzi¹¹, Ana R. Mejia⁹, Jennifer A. Pacheco¹, Vivian Pan¹, Jyotishman Pathak⁵, Cassandra Perry¹¹, Josh F. Peterson¹², Cynthia A. Prows¹³, James D. Ralston⁷, Luke V. Rasmussen¹, Marylyn D. Ritchie¹⁴, Senthilkumar Sadhasivam¹³,¹⁵, Stuart A. Scott⁹, Maureen E. Smith¹, Aida Vega⁹, Alexander A. Vinks¹³,¹⁵, Simona Volpi¹⁰, Wendy A. Wolf¹¹, Erwin P. Bottinger⁹, Rex L. Chisholm¹, Christopher G. Chute⁵, Jonathan L. Haines¹², John B. Harley¹⁵,¹⁶, Brendan J. Keating⁴, Ingrid A. Holm¹,¹¹, Iftikhar J. Kullo⁵, Gail P. Jarvik³, Eric B. Larson⁷, Teri A. Manolio¹⁰, Catherine A. McCarty, Deborah A. Nickerson³, Steven E. Scherer¹⁷, Marc S. Williams¹⁸, Dan M. Roden², Joshua C. Denny² • Institutions (18)

Northwestern University¹, Vanderbilt University², University of Washington³, University of Pennsylvania⁴, Mayo Clinic⁵, Marshfield Clinic⁶, Group Health Cooperative⁷, Johns Hopkins University⁸, Icahn School of Medicine at Mount Sinai⁹, National Institutes of Health¹⁰, Harvard University¹¹, Vanderbilt University Medical Center¹², Cincinnati Children's Hospital Medical Center¹³, Pennsylvania State University¹⁴, University of Cincinnati¹⁵, Veterans Health Administration¹⁶, Baylor College of Medicine¹⁷, Geisinger Medical Center¹⁸

01 Oct 2014-Clinical Pharmacology & Therapeutics

TL;DR: The design and initial implementation of the eMERGE‐PGx project is described, including site‐specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to managing incidental findings, and patient and clinician education methods.

Abstract: We describe here the design and initial implementation of the eMERGE-PGx project. eMERGE-PGx, a partnership of the eMERGE and PGRN consortia, has three objectives: (1) deploy PGRNseq, a next-generation sequencing platform assessing sequence variation in 84 proposed pharmacogenes, in nearly 9,000 patients likely to be prescribed drugs of interest within a 1–3-year timeframe across several clinical sites; (2) integrate well-established, clinically validated pharmacogenetic genotypes into the electronic health record (EHR) with associated clinical decision support, and assess process and clinical outcomes of implementation; and (3) develop a repository of pharmacogenetic variants of unknown significance linked to a repository of EHR-based clinical phenotype data for ongoing pharmacogenomics discovery. We describe site-specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to managing incidental findings, and patient and clinician education methods.

204 citations

Cited by

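Variation in var gene switching rates in the malaria parasite leads to antigenic variation [English-language original] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A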

宁北芳 (Ning Beifang), 朱淮民 (Zhu Huaimin)

28 Jul 2005

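TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.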

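Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.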

18,940 citations

Journal Article

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek, Konrad J. Karczewski¹,², Eric Vallabh Minikel¹,², Kaitlin E. Samocha, Eric Banks², Timothy Fennell², Anne H. O’Donnell-Luria¹,²,³, James S. Ware, Andrew J. Hill¹,²,⁴, Beryl B. Cummings¹,², Taru Tukiainen¹,², Daniel P. Birnbaum², Jack A. Kosmicki, Laramie E. Duncan¹,², Karol Estrada¹,², Fengmei Zhao¹,², James Zou², Emma Pierce-Hoffman¹,², Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo², Ron Do, Jason Flannick¹,², Menachem Fromer, Laura D. Gauthier², Jackie Goldstein¹,², Namrata Gupta², Daniel P. Howrigan¹,², Adam Kiezun², Mitja I. Kurki¹,², Ami Levy Moonshine², Pradeep Natarajan, Lorena Orozco, Gina M. Peloso¹,², Ryan Poplin², Manuel A. Rivas², Valentin Ruano-Rubio², Samuel A. Rose², Douglas M. Ruderfer⁸, Khalid Shakir², Peter D. Stenson⁶, Christine Stevens², Brett Thomas¹,², Grace Tiao², María Teresa Tusié-Luna, Ben Weisburd², Hong-Hee Won⁹, Dongmei Yu, David Altshuler²,¹⁰, Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly², Roberto Elosua, Jose C. Florez¹,², Stacey Gabriel², Gad Getz¹,², Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll¹,², Mark I. McCarthy¹⁶,¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale¹,², Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan¹⁴,²¹, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁶,¹⁷, James G. Wilson²⁴, Mark J. Daly¹,², Daniel G. MacArthur¹,² • Institutions (24)

Harvard University¹, Broad Institute², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, Wellcome Trust Centre for Human Genetics¹⁶, University of Oxford¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal Article

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Maxim V. Kuleshov¹, Matthew R. Jones¹, Andrew D. Rouillard¹, Nicolas F. Fernandez¹, Qiaonan Duan¹, Zichen Wang¹, Simon Koplev¹, Sherry L. Jenkins¹, Kathleen M. Jagodnik², Alexander Lachmann¹, Michael G. McDermott¹, Caroline D. Monteiro¹, Gregory W. Gundersen¹, Avi Ma'ayan¹ • Institutions (2)

Icahn School of Medicine at Mount Sinai¹, Glenn Research Center²

08 Jul 2016-Nucleic Acids Research

TL;DR: A significant update to one of the tools in this domain called Enrichr, a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries is presented.

...read moreread less

Abstract: Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

6,201 citations

Journal Article

GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses.

Zefang Tang¹, Chenwei Li¹, Boxi Kang¹, Ge Gao¹, Cheng Li¹, Zemin Zhang • Institutions (1)

Peking University¹

03 Jul 2017-Nucleic Acids Research

TL;DR: GEPIA (Gene Expression Profiling Interactive Analysis) fills in the gap between cancer genomics big data and the delivery of integrated information to end users, thus helping unleash the value of the current data resources.

...read moreread less

Abstract: A tremendous amount of RNA sequencing data has been produced by large consortium projects such as TCGA and GTEx, creating new opportunities for data mining and deeper understanding of gene functions. While certain existing web servers are valuable and widely used, many expression analysis functions needed by experimental biologists are still not adequately addressed by these tools. We introduce GEPIA (Gene Expression Profiling Interactive Analysis), a web-based tool to deliver fast and customizable functionalities based on TCGA and GTEx data. GEPIA provides key interactive and customizable functions including differential expression analysis, profiling plotting, correlation analysis, patient survival analysis, similar gene detection and dimensionality reduction analysis. The comprehensive expression analyses with simple clicking through GEPIA greatly facilitate data mining in wide research areas, scientific discussion and the therapeutic discovery process. GEPIA fills in the gap between cancer genomics big data and the delivery of integrated information to end users, thus helping unleash the value of the current data resources. GEPIA is available at http://gepia.cancer-pku.cn/.

5,980 citations

Journal Article

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

Daehwan Kim¹, Joseph M. Paggi², Chanhee Park¹, Christopher Bennett¹, Steven L. Salzberg³ • Institutions (3)

University of Texas Southwestern Medical Center¹, Stanford University², Johns Hopkins University³

01 Aug 2019-Nature Biotechnology

TL;DR: This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index, and uses it to represent and search an expanded model of the human reference genome.

...read moreread less

Abstract: The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays. A graph-based genome indexing scheme enables variant-aware alignment of sequences with very low memory requirements.

4,855 citations
