Author

Yakir A. Reshef

Other affiliations: Weizmann Institute of Science, Massachusetts Institute of Technology, Broad Institute

Bio: Yakir A. Reshef is an academic researcher at Harvard University whose research topics include the maximal information coefficient and genome-wide association studies. He has an h-index of 21 and has co-authored 44 publications that have received 6,071 citations. His previous affiliations include the Weizmann Institute of Science and the Massachusetts Institute of Technology.

Papers

Journal Article

Detecting Novel Associations in Large Data Sets

David N. Reshef¹,²,³, Yakir A. Reshef⁴,², Hilary K. Finucane⁵, Sharon R. Grossman⁴,², Gilean McVean¹,⁶, Peter J. Turnbaugh⁴, Eric S. Lander⁴,³,², Michael Mitzenmacher⁴, Pardis C. Sabeti⁴,²

Institutions: University of Oxford¹, Broad Institute², Massachusetts Institute of Technology³, Harvard University⁴, Weizmann Institute of Science⁵, Wellcome Trust Centre for Human Genetics⁶

16 Dec 2011-Science

TL;DR: Introduces the maximal information coefficient (MIC), a measure of dependence for two-variable relationships that captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination of the data relative to the regression function.

Abstract: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

2,414 citations

Journal Article

Partitioning heritability by functional annotation using genome-wide association summary statistics.

Hilary K. Finucane¹,², Brendan Bulik-Sullivan³,², Alexander Gusev², Gosia Trynka, Yakir A. Reshef², Po-Ru Loh², Verneri Anttila³,², Han Xu², Chongzhi Zang², Kyle Kai-How Farh²,³, Stephan Ripke²,³, Felix R. Day⁴, Shaun Purcell⁵,⁶, Eli A. Stahl⁶, Sara Lindström², John R. B. Perry⁴, Yukinori Okada⁷, Soumya Raychaudhuri, Mark J. Daly³,², Nick Patterson³, Benjamin M. Neale²,³, Alkes L. Price²,³

Institutions: Massachusetts Institute of Technology¹, Harvard University², Broad Institute³, Medical Research Council⁴, Brigham and Women's Hospital⁵, Icahn School of Medicine at Mount Sinai⁶, Tokyo Medical and Dental University⁷

01 Nov 2015-Nature Genetics

TL;DR: A new method is introduced, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers, which is computationally tractable at very large sample sizes and leverages genome-wide information.

Abstract: Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.

1,939 citations

Journal Article

Reference-based phasing using the Haplotype Reference Consortium panel.

Po-Ru Loh¹,², Petr Danecek³, Pier Francesco Palamara¹,², Christian Fuchsberger⁴,⁵, Yakir A. Reshef², Hilary K. Finucane²,⁶, Sebastian Schoenherr⁷, Lukas Forer⁷, Shane A. McCarthy³, Gonçalo R. Abecasis⁵, Richard Durbin³, Alkes L. Price¹,²

Institutions: Broad Institute¹, Harvard University², Wellcome Trust Sanger Institute³, European Academy of Bozen⁴, University of Michigan⁵, Massachusetts Institute of Technology⁶, Innsbruck Medical University⁷

01 Nov 2016-Nature Genetics

TL;DR: A new phasing algorithm, Eagle2, is introduced that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform.

Abstract: Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20% improvement in speed compared to a competing method, SHAPEIT2.

1,246 citations

Journal Article

Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types.

Hilary K. Finucane¹,²,³, Yakir A. Reshef¹, Verneri Anttila¹,², Kamil Slowikowski⁴,²,¹, Alexander Gusev¹, Andrea Byrnes²,¹, Steven Gazal¹, Po-Ru Loh¹, Caleb A. Lareau¹,², Noam Shoresh², Giulio Genovese², Arpiar Saunders¹, Evan Z. Macosko¹, Samuela Pollack¹, John R. B. Perry⁵, Jason D. Buenrostro²,¹, Bradley E. Bernstein²,¹, Soumya Raychaudhuri, Steven A. McCarroll²,¹, Benjamin M. Neale²,¹, Alkes L. Price¹,²

Institutions: Harvard University¹, Broad Institute², Massachusetts Institute of Technology³, Brigham and Women's Hospital⁴, Medical Research Council⁵

09 Apr 2018-Nature Genetics

TL;DR: Introduces an approach that identifies disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics, and reports significant tissue-specific enrichments for 34 traits.

Abstract: We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.

707 citations

Journal Article

Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights

Alexander Gusev¹,², Nicholas Mancuso³, Hyejung Won³, Maria Kousi⁴, Hilary K. Finucane²,¹,⁵, Yakir A. Reshef², Lingyun Song⁴, Alexias Safi⁴, Steven A. McCarroll¹,², Benjamin M. Neale¹,², Roel A. Ophoff³,⁶, Michael Conlon O'Donovan⁷, Gregory E. Crawford⁴, Daniel H. Geschwind, Nicholas Katsanis⁴, Patrick F. Sullivan⁸,⁹, Bogdan Pasaniuc³, Alkes L. Price²,¹

Institutions: Broad Institute¹, Harvard University², University of California, Los Angeles³, Duke University⁴, Massachusetts Institute of Technology⁵, Utrecht University⁶, Cardiff University⁷, University of North Carolina at Chapel Hill⁸, Karolinska Institutet⁹

09 Apr 2018-Nature Genetics

TL;DR: A transcriptome-wide association study integrating genome-wide association data with expression data from brain, blood and adipose tissues identifies new candidate susceptibility genes for schizophrenia, providing a step toward understanding the underlying biology.

Abstract: Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.

379 citations

Cited by

Journal Article

I and i

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: A reflection on i, the square root of minus one: an odd beast on first encounter, an intruder hovering on the edge of reality, whose surreal nature only intensified with familiarity.

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal Article

The mutational constraint spectrum quantified from variation in 141,456 humans

Konrad J. Karczewski¹, Laurent C. Francioli¹, Grace Tiao¹, Beryl B. Cummings¹, Jessica Alföldi¹, Qingbo Wang¹, Ryan L. Collins¹, Kristen M. Laricchia¹, Andrea Ganna¹, Daniel P. Birnbaum¹, Laura D. Gauthier¹, Harrison Brand¹, Matthew Solomonson¹, Nicholas A. Watts¹, Daniel R. Rhodes², Moriel Singer-Berk¹, Eleina M. England¹, Eleanor G. Seaby¹, Jack A. Kosmicki¹, Raymond K. Walters¹, Katherine Tashman¹, Yossi Farjoun¹, Eric Banks¹, Timothy Poterba¹, Arcturus Wang¹, Cotton Seed¹, Nicola Whiffin¹, Jessica X. Chong³, Kaitlin E. Samocha⁴, Emma Pierce-Hoffman¹, Zachary Zappala¹, Anne H. O’Donnell-Luria¹, Eric Vallabh Minikel¹, Ben Weisburd¹, Monkol Lek⁵, James S. Ware¹, Christopher Vittal⁶, Irina M. Armean¹, Louis Bergelson¹, Kristian Cibulskis¹, Kristen M. Connolly¹, Miguel Covarrubias¹, Stacey Donnelly¹, Steven Ferriera¹, Stacey Gabriel¹, Jeff Gentry¹, Namrata Gupta¹, Thibault Jeandet¹, Diane Kaplan¹, Christopher Llanwarne¹, Ruchi Munshi¹, Sam Novod¹, Nikelle Petrillo¹, David Roazen¹, Valentin Ruano-Rubio¹, Andrea Saltzman¹, Molly Schleicher¹, Jose Soto¹, Kathleen Tibbetts¹, Charlotte Tolonen¹, Gordon Wade¹, Michael E. Talkowski¹, Benjamin M. Neale¹, Mark J. Daly¹, Daniel G. MacArthur¹

Institutions: Broad Institute¹, Queen Mary University of London², University of Washington³, Wellcome Trust Sanger Institute⁴, Yale University⁵, Harvard University⁶

27 May 2020-Nature

TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Journal Article

The UK Biobank resource with deep phenotyping and genomic data

Clare Bycroft¹, Colin Freeman¹, Desislava Petkova²,¹, Gavin Band¹, Lloyd T. Elliott¹, Kevin Sharp¹, Allan Motyer³, Damjan Vukcevic³, Olivier Delaneau⁴,⁵, Jared O'Connell⁶, Adrian Cortes¹,⁷, Samantha Welsh, Alan Young¹, Mark Effingham, Gil McVean¹, Stephen Leslie³, Naomi E. Allen¹, Peter Donnelly¹, Jonathan Marchini¹

Institutions: University of Oxford¹, Procter & Gamble², University of Melbourne³, Swiss Institute of Bioinformatics⁴, University of Geneva⁵, Illumina⁶, John Radcliffe Hospital⁷

11 Oct 2018-Nature

TL;DR: Describes deep phenotypic and genome-wide genetic data from approximately 500,000 UK Biobank participants, including population structure and relatedness in the cohort, and imputation that increases the number of testable variants to around 96 million.

Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Integrative analysis of 111 reference human epigenomes

Anshul Kundaje, Wouter Meuleman, Jason Ernst, Angela Yen, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Mukul S. Bansal, Soheil Feizi-Khankandi, Ah Ram Kim, Richard C Sallari, Nicholas A Sinnott-Armstrong, Laurie A. Boyer, Elizabeta Gjoneska, Li-Huei Tsai, Manolis Kellis

01 Feb 2015

TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.

Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Book

Applied Predictive Modeling

Max Kuhn, Kjell Johnson

17 May 2013

TL;DR: A book on applied predictive modeling, organized into general strategies, regression models, classification models, and other considerations.

Abstract: General Strategies; Regression Models; Classification Models; Other Considerations; Appendix; References; Indices.

3,672 citations
