Author

Upasana Roy

Other affiliations: Stony Brook University
Bio: Upasana Roy is an academic researcher from Columbia University. The author has contributed to research in topics: Homologous recombination & RAD51. The author has an h-index of 3 and has co-authored 10 publications receiving 39 citations. Previous affiliations of Upasana Roy include Stony Brook University.

Papers
Journal ArticleDOI
11 Nov 2021-Science
TL;DR: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified.
Abstract: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take ...

215 citations

Journal ArticleDOI
TL;DR: In this article, the authors used single-molecule imaging to reveal that the Rad51 paralog complex Rad55-Rad57 promotes the assembly of Rad51 recombinase filaments through transient interactions, providing evidence that it acts as a classical molecular chaperone.

37 citations

Posted ContentDOI
Upasana Roy
12 Feb 2020-bioRxiv
TL;DR: Single-molecule imaging is used to reveal that the Saccharomyces cerevisiae Rad51 paralog complex Rad55–Rad57 promotes the assembly of Rad51 recombinase filaments through transient interactions, providing evidence that it acts as a classical molecular chaperone.
Abstract: Homologous recombination (HR) is essential for the maintenance of genome integrity. Rad51 paralogs fulfill a conserved but undefined role in HR, and their mutations are associated with increased cancer risk in humans. Here, we use single-molecule imaging to reveal that the Saccharomyces cerevisiae Rad51 paralog complex Rad55–Rad57 promotes the assembly of Rad51 recombinase filaments through transient interactions, providing evidence that it acts as a classical molecular chaperone. Srs2 is an ATP-dependent anti-recombinase that downregulates HR by actively dismantling Rad51 filaments. Contrary to the current model, we find that Rad55–Rad57 does not physically block the movement of Srs2. Instead, Rad55–Rad57 promotes rapid re-assembly of Rad51 filaments after their disruption by Srs2. Our findings support a model in which Rad51 is in flux between free and ssDNA-bound states, the rate of which is dynamically controlled through the opposing actions of Rad55–Rad57 and Srs2.

17 citations

Journal ArticleDOI
TL;DR: No stalling by Rev1–Pol ζ directly past the ICL was observed, suggesting that the proposed function of Pol ζ as an extender DNA polymerase is also required for ICL repair.
Abstract: DNA polymerase ζ (Pol ζ) and Rev1 are essential for the repair of DNA interstrand crosslink (ICL) damage. We have used yeast DNA polymerases η, ζ and Rev1 to study translesion synthesis (TLS) past a nitrogen mustard-based interstrand crosslink (ICL) with an 8-atom linker between the crosslinked bases. The Rev1-Pol ζ complex was most efficient in complete bypass synthesis, by 2-3 fold, compared to Pol ζ alone or Pol η. Rev1 protein, but not its catalytic activity, was required for efficient TLS. A dCMP residue was faithfully inserted across the ICL-G by Pol η, Pol ζ, and Rev1-Pol ζ. Rev1-Pol ζ, and particularly Pol ζ alone showed a tendency to stall before the ICL, whereas Pol η stalled just after insertion across the ICL. The stalling of Pol η directly past the ICL is attributed to its autoinhibitory activity, caused by elongation of the short ICL-unhooked oligonucleotide (a six-mer in our study) by Pol η providing a barrier to further elongation of the correct primer. No stalling by Rev1-Pol ζ directly past the ICL was observed, suggesting that the proposed function of Pol ζ as an extender DNA polymerase is also required for ICL repair.

11 citations

Posted ContentDOI
30 Sep 2021-bioRxiv
TL;DR: In this article, a combination of RoseTTAFold and AlphaFold is used to screen through paired multiple sequence alignments for 8.3 million pairs of S. cerevisiae proteins and build models for strongly predicted protein assemblies.
Abstract: Protein-protein interactions play critical roles in biology, but despite decades of effort, the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions that have not yet been identified. Here, we take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes, as represented within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of S. cerevisiae proteins and build models for strongly predicted protein assemblies with two to five components. Comparison to existing interaction and structural data suggests that these predictions are likely to be quite accurate. We provide structure models spanning almost all key processes in eukaryotic cells for 104 protein assemblies which have not been previously identified, and 608 which have not been structurally characterized. One-sentence summary: We take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes.

11 citations


Cited by
Posted ContentDOI
04 Oct 2021-bioRxiv
TL;DR: In this article, an AlphaFold model trained specifically for multimeric inputs of known stoichiometry is proposed, which significantly increases the accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold.
Abstract: While the vast majority of well-structured single protein chains can now be predicted to high accuracy due to the recent AlphaFold [1] model, the prediction of multi-chain protein complexes remains a challenge in many cases. In this work, we demonstrate that an AlphaFold model trained specifically for multimeric inputs of known stoichiometry, which we call AlphaFold-Multimer, significantly increases accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy. On a benchmark dataset of 17 heterodimer proteins without templates (introduced in [2]) we achieve at least medium accuracy (DockQ [3] ≥ 0.49) on 14 targets and high accuracy (DockQ ≥ 0.8) on 6 targets, compared to 9 targets of at least medium accuracy and 4 of high accuracy for the previous state of the art system (an AlphaFold-based system from [2]). We also predict structures for a large dataset of 4,433 recent protein complexes, from which we score all non-redundant interfaces with low template identity. For heteromeric interfaces we successfully predict the interface (DockQ ≥ 0.23) in 67% of cases, and produce high accuracy predictions (DockQ ≥ 0.8) in 23% of cases, an improvement of +25 and +11 percentage points over the flexible linker modification of AlphaFold [4] respectively. For homomeric interfaces we successfully predict the interface in 69% of cases, and produce high accuracy predictions in 34% of cases, an improvement of +5 percentage points in both instances.

1,023 citations
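
The DockQ cutoffs quoted in the abstract above (0.23, 0.49 and 0.8) can be read as score bands. The following is a minimal Python sketch, assuming DockQ scores lie in [0, 1]. The function names and the band labels "acceptable" and "incorrect" are illustrative choices, not taken from the abstract, and this is not the authors' evaluation code.

```python
from typing import Iterable

# Cutoffs as quoted in the abstract; the constant names are hypothetical.
SUCCESS_CUTOFF = 0.23   # "successfully predict the interface"
MEDIUM_CUTOFF = 0.49    # "at least medium accuracy"
HIGH_CUTOFF = 0.8       # "high accuracy"


def dockq_band(score: float) -> str:
    """Map a DockQ score to a quality band using the quoted cutoffs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"DockQ score out of range: {score}")
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= MEDIUM_CUTOFF:
        return "medium"
    if score >= SUCCESS_CUTOFF:
        return "acceptable"   # label assumed for illustration
    return "incorrect"        # label assumed for illustration


def success_rate(scores: Iterable[float]) -> float:
    """Fraction of interfaces scoring at or above the success cutoff."""
    values = list(scores)
    if not values:
        raise ValueError("no scores given")
    return sum(s >= SUCCESS_CUTOFF for s in values) / len(values)
```

Under these assumptions, a reported figure such as "67% of cases" corresponds to `success_rate` evaluated over the scored heteromeric interfaces.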

Journal ArticleDOI
11 Nov 2021-Science
TL;DR: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified.
Abstract: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take ...

215 citations

Journal ArticleDOI
TL;DR: STRING collects and integrates protein-protein interactions, both physical interactions and functional associations, from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources.
Abstract: Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein–protein interactions, both physical interactions as well as functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

127 citations

Journal ArticleDOI
21 Jul 2022-Science
TL;DR: Wang et al. describe two deep-learning methods to design proteins that contain prespecified functional sites, enabling the scaffolding of desired functional residues within a well-folded designed protein.
Abstract: The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, “constrained hallucination,” optimizes sequences such that their predicted structures contain the desired functional site. The second approach, “inpainting,” starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests. Description: Designing around function. Protein design has had success in finding sequences that fold into a desired conformation, but designing functional proteins remains challenging. Wang et al. describe two deep-learning methods to design proteins that contain prespecified functional sites. In the first, they found sequences predicted to fold into stable structures that contain the functional site. In the second, they retrained a structure prediction network to recover the sequence and full structure of a protein given only the functional site. The authors demonstrate their methods by designing proteins containing a variety of functional motifs. (VV) Deep-learning methods enable the scaffolding of desired functional residues within a well-folded designed protein.

118 citations

Posted ContentDOI
06 Sep 2022-bioRxiv
TL;DR: A sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods.
Abstract: We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety of more complex tasks including design of protein complexes, partially masked structures, binding interfaces, and multiple states.

109 citations