Author

Adam J. Riesselman

Bio: Adam J. Riesselman is an academic researcher from Harvard University. The author has contributed to research on topics including sequence space (evolution) and the genome. The author has an h-index of 12 and has co-authored 16 publications receiving 846 citations. Previous affiliations of Adam J. Riesselman include the Donald Danforth Plant Science Center.

Papers
Journal ArticleDOI
TL;DR: DeepSequence is an unsupervised deep latent-variable model that predicts the effects of mutations on the basis of evolutionary sequence information that is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
Abstract: The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence ( https://github.com/debbiemarkslab/DeepSequence ), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.

385 citations
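The scoring rule the DeepSequence abstract describes — predicting a mutation's effect as the log-ratio of model probabilities for the mutant and wild-type sequences — can be sketched with a toy stand-in model. Everything below (the four-residue alignment, the independent-site frequency model) is invented for illustration; DeepSequence itself estimates log p(sequence) with a latent-variable model (a variational lower bound), which is exactly what lets it capture the higher-order dependencies an independent-site model misses.

```python
import math
from collections import Counter

def site_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid frequencies from a toy alignment.
    DeepSequence replaces this independent-site model with a
    latent-variable estimate of log p(sequence); the scoring
    rule in mutation_effect() is the same either way."""
    length = len(alignment[0])
    alphabet = sorted(set("".join(alignment)))
    freqs = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = len(alignment) + pseudocount * len(alphabet)
        freqs.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return freqs

def log_prob(seq, freqs):
    return sum(math.log(freqs[i][a]) for i, a in enumerate(seq))

def mutation_effect(wild_type, mutant, freqs):
    """Predicted effect = log p(mutant) - log p(wild type)."""
    return log_prob(mutant, freqs) - log_prob(wild_type, freqs)

alignment = ["AKLV", "AKLI", "AKMV", "GKLV", "AKLV"]
freqs = site_frequencies(alignment)
# Mutating the conserved K is penalized more than the tolerated V->I swap.
print(mutation_effect("AKLV", "ALLV", freqs) < mutation_effect("AKLV", "AKLI", freqs))  # True
```

The log-ratio form is what makes the approach unsupervised: no fitness labels are needed, only a density model of the family.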

Journal ArticleDOI
TL;DR: The EVcouplings framework is presented, a fully integrated open-source application and Python package for coevolutionary analysis that enables generation of sequence alignments, calculation and evaluation of evolutionary couplings, and de novo prediction of structure and mutation effects.
Abstract: Summary Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy to use, flexible command line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users. Availability and implementation https://github.com/debbiemarkslab/evcouplings.

161 citations
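As a rough illustration of what an "evolutionary coupling" measures, the sketch below scores covariation between two alignment columns with mutual information. This is a deliberately simple local stand-in: EVcouplings infers global couplings (e.g. via pseudolikelihood maximization) that separate direct from transitive correlations, and nothing below is the package's actual API.

```python
import math
from collections import Counter

def column(alignment, i):
    return [seq[i] for seq in alignment]

def mutual_information(alignment, i, j):
    """Mutual information between columns i and j: a simple, local
    coevolution score. Globally inferred couplings (as in EVcouplings)
    additionally disentangle direct from transitive correlations."""
    n = len(alignment)
    pi = Counter(column(alignment, i))
    pj = Counter(column(alignment, j))
    pij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

# Columns 0 and 1 covary perfectly; column 2 varies independently.
aln = ["AKC", "AKG", "GLC", "GLG", "AKC", "GLG"]
print(round(mutual_information(aln, 0, 1), 3))  # 0.693 = ln 2: perfect covariation
print(round(mutual_information(aln, 0, 2), 3))  # much smaller: no real coupling
```

High-scoring column pairs are the raw signal behind de novo contact and structure prediction: residues that covary across the family tend to be close in 3D.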

Journal ArticleDOI
05 May 2016-Cell
TL;DR: In this paper, the authors mine the evolutionary sequence record to derive precise information about the function and structure of RNAs and RNA-protein complexes and predict contacts in 160 non-coding RNA families.

158 citations

Journal ArticleDOI
TL;DR: In this article, a deep generative model adapted from natural language processing is proposed for prediction and design of diverse functional sequences without the need for alignments; it performs state-of-the-art prediction of missense and indel effects and was used to design and test a diverse 10^5-nanobody library.
Abstract: The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regions. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.

116 citations
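The alignment-free property of the autoregressive model above comes from its factorization p(sequence) = prod_t p(x_t | x_<t): any sequence of any length gets a likelihood, so indels need no gapped alignment. The sketch below keeps that factorization but swaps the deep network for a bigram model with one step of memory; the training sequences and the pseudocount smoothing are illustrative assumptions, not the paper's setup.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"

def train_bigram(sequences, pseudocount=0.1):
    """Toy autoregressive model: p(sequence) = prod_t p(x_t | x_{t-1}).
    The paper's model conditions on the full left context with a deep
    network; a bigram keeps the same chain-rule factorization but
    remembers only the previous residue."""
    alphabet = sorted(set("".join(sequences))) + [END]
    counts = defaultdict(Counter)
    for s in sequences:
        chars = [START] + list(s) + [END]
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1

    def log_p(sequence):
        chars = [START] + list(sequence) + [END]
        lp = 0.0
        for prev, cur in zip(chars, chars[1:]):
            total = sum(counts[prev].values()) + pseudocount * len(alphabet)
            lp += math.log((counts[prev][cur] + pseudocount) / total)
        return lp

    return log_p

log_p = train_bigram(["MKKL", "MKKV", "MKRL", "MKKL"])
print(log_p("MKKL") > log_p("MQKL"))  # True: family-like variant beats unseen substitution
print(log_p("MKL"))  # an indel variant is scored directly, no alignment needed
```

Because the likelihood is defined for arbitrary-length strings, the same model scores substitutions, insertions, and deletions on equal footing, which is what alignment-dependent methods cannot do.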


Cited by
Journal ArticleDOI
15 Jul 2021-Nature
TL;DR: AlphaFold, as presented in this paper, predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture, even when no homologous structure is available.
Abstract: Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort1–4, the structures of around 100,000 unique proteins have been determined5, but this represents a small fraction of the billions of known protein sequences6,7. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’8—has been an important open research problem for more than 50 years9. Despite recent progress10–14, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14)15, demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

10,601 citations

Journal ArticleDOI
Eric J. Topol
TL;DR: Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient–doctor relationship or facilitate its erosion remains to be seen.
Abstract: The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors. In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own data to promote health. The current limitations, including bias, privacy and security, and lack of transparency, along with the future directions of these applications will be discussed in this article. Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient-doctor relationship or facilitate its erosion remains to be seen.

2,574 citations

Posted ContentDOI
29 Apr 2019-bioRxiv
TL;DR: This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

748 citations

Journal ArticleDOI
TL;DR: This paper used unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, which contains information about biological properties in its representations.
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model contains information about biological properties in its representations. The representations are learned from sequence data alone. The learned representation space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections. Representation learning produces features that generalize across a range of applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and improving state-of-the-art features for long-range contact prediction.

700 citations
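The abstract's claim that structural information "can be identified by linear projections" means a linear model on frozen representations suffices, with no fine-tuning of the language model. The sketch below fits a logistic probe on synthetic two-dimensional "embeddings"; the data, dimensions, and class labels are all invented for illustration (real per-residue representations are high-dimensional vectors from the trained model).

```python
import math
import random

random.seed(0)

def linear_probe(X, y, lr=0.1, steps=200):
    """Fit a logistic-regression probe on frozen embeddings via
    gradient descent. The point of a *linear* probe is that any
    accuracy it achieves reflects information already present in
    the representation, not in the probe itself."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(steps):
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t  # gradient of the logistic loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Synthetic stand-in for per-residue embeddings: one structural class
# clusters around +1 in the first coordinate, the other around -1.
X = [[+1.0 + random.gauss(0, 0.2), random.gauss(0, 1)] for _ in range(20)] + \
    [[-1.0 + random.gauss(0, 0.2), random.gauss(0, 1)] for _ in range(20)]
y = [1] * 20 + [0] * 20

w, b = linear_probe(X, y)
acc = sum(predict(x, w, b) == t for x, t in zip(X, y)) / len(X)
print(acc)  # the probe separates the two synthetic classes
```

If a representation were not "semantically organized" in this sense, a linear map could not recover the labels; that is the logic behind using probes to audit what a language model has learned.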

Journal ArticleDOI
TL;DR: Deep learning is applied to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded and broadly applicable to unseen regions of sequence space.
Abstract: Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.

560 citations