Ian R. Humphreys

Bio: Ian R. Humphreys is an academic researcher at the University of Washington. The author has contributed to research in the topics Pipeline (computing) and Protein structure prediction. The author has an h-index of 3 and has co-authored 5 publications receiving 20 citations.

Topics: Pipeline (computing), Protein structure prediction, Oligomer, CASP

Papers

Journal Article

Computed structures of core eukaryotic protein complexes.

Ian R. Humphreys¹, Jimin Pei², Minkyung Baek¹, Aditya Krishnakumar¹, Ivan Anishchenko¹, Sergey Ovchinnikov³, Jing Zhang², Travis J. Ness⁴, Sudeep Banjade⁵, Saket R. Bagde⁵, Viktoriya G. Stancheva⁶, Xiao-Han Li⁶, Kaixian Liu⁷, Zhi Zheng⁸, Zhi Zheng⁷, Daniel J. Barrero⁹, Upasana Roy¹⁰, Jochen Kuper¹¹, Israel S. Fernández¹², Barnabas Szakal, Dana Branzei, Josep Rizo², Caroline Kisker¹¹, Eric C. Greene¹⁰, Sue Biggins⁹, Scott Keeney⁸, Scott Keeney⁷, Elizabeth A. Miller⁶, J. Christopher Fromme⁵, Tamara L. Hendrickson⁴, Qian Cong², David Baker¹

University of Washington¹, University of Texas Southwestern Medical Center², Harvard University³, Wayne State University⁴, Cornell University⁵, Laboratory of Molecular Biology⁶, Memorial Sloan Kettering Cancer Center⁷, Kettering University⁸, Fred Hutchinson Cancer Research Center⁹, Columbia University¹⁰, University of Würzburg¹¹, St. Jude Children's Research Hospital¹²

11 Nov 2021-Science

TL;DR: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified.

Abstract: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take ...

215 citations

Journal Article

Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14.

Ivan Anishchenko¹, Minkyung Baek¹, Hahnbeom Park¹, Naozumi Hiranuma¹, David E. Kim¹, Justas Dauparas¹, Sanaa Mansoor¹, Ian R. Humphreys¹, David Baker¹

University of Washington¹

17 Aug 2021-Proteins

TL;DR: The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built; the improved pipeline is faster and requires fewer computing resources, completing the entire modeling process in a median of < 3 h in CASP14.

Abstract: The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires less computing resources, completing the entire modeling process in a median < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest in part due to missing inter-domain or inter-chain contacts.

26 citations

Journal Article

Protein oligomer modeling guided by predicted interchain contacts in CASP14.

Minkyung Baek¹, Ivan Anishchenko¹, Hahnbeom Park¹, Ian R. Humphreys¹, David Baker¹

University of Washington¹

23 Aug 2021-Proteins

TL;DR: Deep learning-based methods were developed for predicting homo- and hetero-oligomeric interchain contacts and used for oligomer modeling; they produced oligomer models with summed Z-scores 5.5 units higher than the next best group, with the fold-and-dock method having the best relative performance.

Abstract: For CASP14, we developed deep learning-based methods for predicting homo-oligomeric and hetero-oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient-based fold-and-dock method with template-based and ab initio docking approaches using deep learning-based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z-scores 5.5 units higher than the next best group, with the fold-and-dock method having the best relative performance. Over the eight targets for which this method was used, the best of the five submitted models had average oligomer TM-score of 0.71 (average oligomer TM-score of the next best group: 0.64), and explicit modeling of inter-subunit interactions improved modeling of six out of 40 individual domains (ΔGDT-TS > 2.0).

13 citations

Posted Content

Structures of core eukaryotic protein complexes

Ian R. Humphreys¹, Jimin Pei², Minkyung Baek¹, Aditya Krishnakumar¹, Ivan Anishchenko¹, Sergey Ovchinnikov³, Jianhua Zhang², Travis J. Ness⁴, Sudeep Banjade⁵, Saket R. Bagde⁵, Viktoriya G. Stancheva⁶, Xiao-Han Li⁶, Kaixian Liu⁷, Zhi Zheng⁷, Zhi Zheng⁸, Daniel J. Barrero⁹, Upasana Roy¹⁰, Israel S. Fernández¹¹, Barnabas Szakal, Dana Branzei, Eric C. Greene¹⁰, Sue Biggins⁹, Scott Keeney⁸, Scott Keeney⁷, Elizabeth A. Miller⁶, J. Christopher Fromme⁵, Tamara L. Hendrickson³, Qian Cong², David Baker¹², David Baker¹

University of Washington¹, University of Texas Southwestern Medical Center², Harvard University³, Wayne State University⁴, Cornell University⁵, Laboratory of Molecular Biology⁶, Memorial Sloan Kettering Cancer Center⁷, Kettering University⁸, Fred Hutchinson Cancer Research Center⁹, Columbia University¹⁰, St. Jude Children's Research Hospital¹¹, Howard Hughes Medical Institute¹²

30 Sep 2021-bioRxiv

TL;DR: A combination of RoseTTAFold and AlphaFold is used to screen paired multiple sequence alignments for 8.3 million pairs of S. cerevisiae proteins and to build models for strongly predicted protein assemblies.

Abstract: Protein-protein interactions play critical roles in biology, but despite decades of effort, the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions that have not yet been identified. Here, we take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes, as represented within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of S. cerevisiae proteins and build models for strongly predicted protein assemblies with two to five components. Comparison to existing interaction and structural data suggests that these predictions are likely to be quite accurate. We provide structure models spanning almost all key processes in Eukaryotic cells for 104 protein assemblies which have not been previously identified, and 608 which have not been structurally characterized.

One-sentence summary: We take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes.

11 citations

Peer Review

Author response for "Protein oligomer modeling guided by predicted interchain contacts in CASP14"

Minkyung Baek, Ivan Anishchenko, Hahnbeom Park, Ian R. Humphreys, David Baker

02 Jul 2021

8 citations

Cited by

Posted Content

Protein complex prediction with AlphaFold-Multimer

Richard Evans, Michael J. O'Neill, Alexander Pritzel, Natasha Antropova, Andrew W. Senior, Tim Green, Augustin Žídek, Russell Bates, Sam Blackwell, Jason Yim, Olaf Ronneberger, Sebastian Bodenstein, Michal Zielinski, Alex Bridgland, Anna Potapenko, Andrew Cowie, Kathryn Tunyasuvunakool, R. D. Jain, Ellen Clancy, Pushmeet Kohli, John M. Jumper, Demis Hassabis

04 Oct 2021-bioRxiv

TL;DR: AlphaFold-Multimer, an AlphaFold model trained specifically for multimeric inputs of known stoichiometry, significantly increases the accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold.

Abstract: While the vast majority of well-structured single protein chains can now be predicted to high accuracy due to the recent AlphaFold [1] model, the prediction of multi-chain protein complexes remains a challenge in many cases. In this work, we demonstrate that an AlphaFold model trained specifically for multimeric inputs of known stoichiometry, which we call AlphaFold-Multimer, significantly increases accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy. On a benchmark dataset of 17 heterodimer proteins without templates (introduced in [2]) we achieve at least medium accuracy (DockQ [3] ≥ 0.49) on 14 targets and high accuracy (DockQ ≥ 0.8) on 6 targets, compared to 9 targets of at least medium accuracy and 4 of high accuracy for the previous state of the art system (an AlphaFold-based system from [2]). We also predict structures for a large dataset of 4,433 recent protein complexes, from which we score all non-redundant interfaces with low template identity. For heteromeric interfaces we successfully predict the interface (DockQ ≥ 0.23) in 67% of cases, and produce high accuracy predictions (DockQ ≥ 0.8) in 23% of cases, an improvement of +25 and +11 percentage points over the flexible linker modification of AlphaFold [4] respectively. For homomeric interfaces we successfully predict the interface in 69% of cases, and produce high accuracy predictions in 34% of cases, an improvement of +5 percentage points in both instances.

1,023 citations
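The AlphaFold-Multimer abstract grades interfaces by DockQ thresholds: ≥ 0.23 counts as a successfully predicted interface, ≥ 0.49 as at least medium accuracy, and ≥ 0.8 as high accuracy. As a minimal illustrative sketch, the Python below bins scores into those bands and computes the two success rates the abstract reports. The function names are hypothetical, and the label "acceptable" for the 0.23 to 0.49 band is an assumption borrowed from common DockQ usage; computing DockQ itself (which combines interface-contact and RMSD terms) is out of scope here.

```python
def dockq_tier(score: float) -> str:
    """Map a DockQ score to the accuracy bands quoted in the abstract.

    Bands (the "acceptable" label is an assumed convention):
    >= 0.80 high, >= 0.49 medium, >= 0.23 acceptable, otherwise incorrect.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ scores lie in [0, 1]")
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


def success_rates(scores):
    """Return (fraction with DockQ >= 0.23, fraction with DockQ >= 0.80)."""
    if not scores:
        raise ValueError("need at least one score")
    tiers = [dockq_tier(s) for s in scores]
    n = len(tiers)
    successful = sum(t != "incorrect" for t in tiers) / n
    high = sum(t == "high" for t in tiers) / n
    return successful, high
```

For example, `success_rates([0.85, 0.5, 0.3, 0.1])` returns `(0.75, 0.25)`: three of the four interfaces clear 0.23 and one clears 0.8.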

Journal Article

Critical assessment of methods of protein structure prediction (CASP)-Round XIV.

Andriy Kryshtafovych¹, Torsten Schwede², Maya Topf³, Krzysztof Fidelis¹, John Moult⁴

University of California, Davis¹, Swiss Institute of Bioinformatics², Leibniz Institute for Neurobiology³, University of Maryland, College Park⁴

01 Dec 2021-Proteins

TL;DR: In the most recent Critical Assessment of Structure Prediction (CASP14), deep learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy.

Abstract: Critical assessment of structure prediction (CASP) is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14), deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein-folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here, we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.

175 citations

Journal Article

The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest

Damian Szklarczyk, Rebecca Kirsch, Mikaela Koutrouli, Katerina C. Nastou, Farrokh Mehryary, Radja Hachilif, Annika L. Gable, Tao Fang, Nadezhda Tsankova Doncheva, Sampo Pyysalo, Peer Bork, Lars Juhl Jensen, Christian von Mering

12 Nov 2022-Nucleic Acids Research

TL;DR: STRING collects and integrates protein-protein interactions, both physical interactions and functional associations, from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources.

Abstract: Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein–protein interactions—both physical interactions as well as functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

127 citations

Journal Article

Scaffolding protein functional sites using deep learning

Jue Wang, Sidney Lisanza, David Juergens, Doug Tischer, Joseph L. Watson, Karla M Castro, Robert J. Ragotte, Amijai Saragovi, Lukas F. Milles, Minkyung Baek, Ivan Anishchenko, Wei Yang, Derrick R. Hicks, Marc Expòsit, Thomas Schlichthaerle, Jung Ho Chun, Justas Dauparas, N. Bennett, Basile I. M. Wicky, Andrew G. Muenks, Frank DiMaio, Bruno E. Correia, Sergey Ovchinnikov, David Baker

21 Jul 2022-Science

TL;DR: Wang et al. describe two deep learning methods for designing proteins that contain prespecified functional sites, enabling the scaffolding of desired functional residues within a well-folded designed protein.

Abstract: The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, “constrained hallucination,” optimizes sequences such that their predicted structures contain the desired functional site. The second approach, “inpainting,” starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.

Description: Designing around function. Protein design has had success in finding sequences that fold into a desired conformation, but designing functional proteins remains challenging. Wang et al. describe two deep-learning methods to design proteins that contain prespecified functional sites. In the first, they found sequences predicted to fold into stable structures that contain the functional site. In the second, they retrained a structure prediction network to recover the sequence and full structure of a protein given only the functional site. The authors demonstrate their methods by designing proteins containing a variety of functional motifs. —VV

Deep-learning methods enable the scaffolding of desired functional residues within a well-folded designed protein.

118 citations

Posted Content

Learning inverse folding from millions of predicted structures

Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, Alexander Rives

06 Sep 2022-bioRxiv

TL;DR: A sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods.

Abstract: We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety of more complex tasks including design of protein complexes, partially masked structures, binding interfaces, and multiple states.

109 citations
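The inverse-folding abstract reports "native sequence recovery," commonly defined as the fraction of positions at which a sequence designed for a backbone reproduces the native residue at that position. The Python below is a minimal sketch of that per-sequence identity, optionally restricted by a boolean mask (for example, a hypothetical buried-residue mask supplied by the caller). It is an illustration under those assumptions: how the paper identifies buried residues and how it aggregates recovery across structures are not stated here.

```python
from typing import Optional, Sequence


def sequence_recovery(native: str, designed: str,
                      mask: Optional[Sequence[bool]] = None) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Both sequences must be the same length and aligned position by position.
    If `mask` is given, only positions with mask[i] == True are scored
    (e.g. a hypothetical buried-residue mask).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned position by position")
    if mask is None:
        mask = [True] * len(native)
    if len(mask) != len(native):
        raise ValueError("mask length must match sequence length")
    positions = [i for i, keep in enumerate(mask) if keep]
    if not positions:
        raise ValueError("mask selects no positions")
    matches = sum(native[i] == designed[i] for i in positions)
    return matches / len(positions)
```

For example, `sequence_recovery("MKTAYIAK", "MKSAYLAK")` returns `0.75`, since 6 of the 8 positions match; restricting the same pair to positions 2, 4 and 5 with a mask gives 1/3.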
