Journal Article

Computed structures of core eukaryotic protein complexes.

TL;DR: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and many interactions have likely not yet been identified.
Abstract: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take ...
Citations
Journal Article
TL;DR: STRING collects and integrates protein-protein interactions, both physical interactions and functional associations, from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources.
Abstract: Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein–protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

127 citations

Journal Article
21 Jul 2022-Science
TL;DR: Wang et al. describe two deep learning methods for designing proteins that contain prespecified functional sites, enabling the scaffolding of desired functional residues within a well-folded designed protein.
Abstract: The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, “constrained hallucination,” optimizes sequences such that their predicted structures contain the desired functional site. The second approach, “inpainting,” starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.

Designing around function: Protein design has had success in finding sequences that fold into a desired conformation, but designing functional proteins remains challenging. Wang et al. describe two deep-learning methods to design proteins that contain prespecified functional sites. In the first, they found sequences predicted to fold into stable structures that contain the functional site. In the second, they retrained a structure prediction network to recover the sequence and full structure of a protein given only the functional site. The authors demonstrate their methods by designing proteins containing a variety of functional motifs. (VV)

Deep-learning methods enable the scaffolding of desired functional residues within a well-folded designed protein.

118 citations

Posted Content
06 Sep 2022-bioRxiv
TL;DR: A sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods.
Abstract: We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety of more complex tasks including design of protein complexes, partially masked structures, binding interfaces, and multiple states.

109 citations
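
The "native sequence recovery" figure quoted in the entry above can be read as the fraction of positions at which a designed sequence reproduces the native residue. The following is a minimal sketch of that metric, assuming equal-length, position-aligned sequences. The function name `sequence_recovery` and the optional `mask` argument (for example, to restrict the count to buried residues) are illustrative and are not taken from the paper.

```python
from typing import Optional, Sequence


def sequence_recovery(native: str, designed: str,
                      mask: Optional[Sequence[bool]] = None) -> float:
    """Fraction of (optionally masked) positions where designed == native."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if mask is None:
        # Hypothetical default: score every position.
        mask = [True] * len(native)
    if len(mask) != len(native):
        raise ValueError("mask must have one entry per residue")

    positions = [i for i, keep in enumerate(mask) if keep]
    if not positions:
        raise ValueError("no positions selected")

    matches = sum(native[i] == designed[i] for i in positions)
    return matches / len(positions)
```

For example, `sequence_recovery("ACDE", "ACDF")` scores three matches out of four positions, and passing a mask that selects only the first two positions scores those two alone.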

Journal Article
10 Jun 2022-Science
TL;DR: This study used artificial intelligence (AI)–based prediction to generate an extensive repertoire of structural models of human NUPs and their subcomplexes, covering domains and interfaces that had so far remained structurally uncharacterized. The resulting architectural model increases the structural coverage of the human NPC scaffold by about twofold.
Abstract:

INTRODUCTION: The eukaryotic nucleus protects the genome and is enclosed by the two membranes of the nuclear envelope. Nuclear pore complexes (NPCs) perforate the nuclear envelope to facilitate nucleocytoplasmic transport. With a molecular weight of ~120 MDa, the human NPC is one of the largest protein complexes. Its ~1000 proteins are taken in multiple copies from a set of about 30 distinct nucleoporins (NUPs). They can be roughly categorized into two classes. Scaffold NUPs contain folded domains and form a cylindrical scaffold architecture around a central channel. Intrinsically disordered NUPs line the scaffold and extend into the central channel, where they interact with cargo complexes. The NPC architecture is highly dynamic. It responds to changes in nuclear envelope tension with conformational breathing that manifests in dilation and constriction movements. Elucidating the scaffold architecture, ultimately at atomic resolution, will be important for gaining a more precise understanding of NPC function and dynamics but imposes a substantial challenge for structural biologists.

RATIONALE: Considerable progress has been made toward this goal by a joint effort in the field. A synergistic combination of complementary approaches has turned out to be critical. In situ structural biology techniques were used to reveal the overall layout of the NPC scaffold that defines the spatial reference for molecular modeling. High-resolution structures of many NUPs were determined in vitro. Proteomic analysis and extensive biochemical work unraveled the interaction network of NUPs. Integrative modeling has been used to combine the different types of data, resulting in a rough outline of the NPC scaffold. Previous structural models of the human NPC, however, were patchy and limited in accuracy owing to several challenges: (i) Many of the high-resolution structures of individual NUPs have been solved from distantly related species and, consequently, do not comprehensively cover their human counterparts. (ii) The scaffold is interconnected by a set of intrinsically disordered linker NUPs that are not straightforwardly accessible to common structural biology techniques. (iii) The NPC scaffold intimately embraces the fused inner and outer nuclear membranes in a distinctive topology and cannot be studied in isolation. (iv) The conformational dynamics of scaffold NUPs limits the resolution achievable in structure determination.

RESULTS: In this study, we used artificial intelligence (AI)–based prediction to generate an extensive repertoire of structural models of human NUPs and their subcomplexes. The resulting models cover various domains and interfaces that so far remained structurally uncharacterized. Benchmarking against previous and unpublished x-ray and cryo–electron microscopy structures revealed unprecedented accuracy. We obtained well-resolved cryo–electron tomographic maps of both the constricted and dilated conformational states of the human NPC. Using integrative modeling, we fitted the structural models of individual NUPs into the cryo–electron microscopy maps. We explicitly included several linker NUPs and traced their trajectory through the NPC scaffold. We elucidated in great detail how membrane-associated and transmembrane NUPs are distributed across the fusion topology of both nuclear membranes. The resulting architectural model increases the structural coverage of the human NPC scaffold by about twofold. We extensively validated our model against both earlier and new experimental data. The completeness of our model has enabled microsecond-long coarse-grained molecular dynamics simulations of the NPC scaffold within an explicit membrane environment and solvent. These simulations reveal that the NPC scaffold prevents the constriction of the otherwise stable double-membrane fusion pore to small diameters in the absence of membrane tension.

CONCLUSION: Our 70-MDa atomically resolved model covers >90% of the human NPC scaffold. It captures conformational changes that occur during dilation and constriction. It also reveals the precise anchoring sites for intrinsically disordered NUPs, the identification of which is a prerequisite for a complete and dynamic model of the NPC. Our study exemplifies how AI-based structure prediction may accelerate the elucidation of subcellular architecture at atomic resolution.

A 70-MDa model of the human nuclear pore complex scaffold architecture. The structural model of the human NPC scaffold is shown for the constricted state as a cut-away view. High-resolution models are color coded according to nucleoporin subcomplex membership. The nuclear envelope is shown as a gray surface.

Nuclear pore complexes (NPCs) mediate nucleocytoplasmic transport. Their intricate 120-megadalton architecture remains incompletely understood. Here, we report a 70-megadalton model of the human NPC scaffold with explicit membrane and in multiple conformational states. We combined artificial intelligence (AI)–based structure prediction with in situ and in cellulo cryo–electron tomography and integrative modeling. We show that linker nucleoporins spatially organize the scaffold within and across subcomplexes to establish the higher-order structure. Microsecond-long molecular dynamics simulations suggest that the scaffold is not required to stabilize the inner and outer nuclear membrane fusion but rather widens the central pore. Our work exemplifies how AI-based modeling can be integrated with in situ structural biology to understand subcellular architecture across spatial organization levels.

75 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
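
The maximal segment pair (MSP) score named in the entry above is the highest score attainable by any pair of equal-length, ungapped segments drawn from two sequences. The sketch below computes it exhaustively by scanning every diagonal of the comparison; this is not BLAST's heuristic, which approximates the same quantity far faster. The function name `msp_score` and the match/mismatch values (+5 / -4) are assumptions chosen for illustration.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Exhaustive MSP score: best ungapped local segment pair between a and b."""
    best = 0
    # A diagonal is identified by the offset d = j - i between positions
    # i in `a` and j in `b`; an ungapped segment pair lies on one diagonal.
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        running = 0
        while i < len(a) and j < len(b):
            step = match if a[i] == b[j] else mismatch
            # Maximum-subarray recurrence: drop the running segment
            # as soon as its score would fall below zero.
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
    return best
```

The scan costs time proportional to the product of the two sequence lengths, which is the cost that a heuristic search is designed to avoid on large databases.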

Journal Article
TL;DR: Two unusual extensions are presented: Multiscale, which adds the ability to visualize large‐scale molecular assemblies such as viral coats, and Collaboratory, which allows researchers to share a Chimera session interactively despite being at separate locales.
Abstract: The design, implementation, and capabilities of an extensible visualization system, UCSF Chimera, are discussed. Chimera is segmented into a core that provides basic services and visualization, and extensions that provide most higher level functionality. This architecture ensures that the extension mechanism satisfies the demands of outside developers who wish to incorporate new features. Two unusual extensions are presented: Multiscale, which adds the ability to visualize large-scale molecular assemblies such as viral coats, and Collaboratory, which allows researchers to share a Chimera session interactively despite being at separate locales. Other extensions include Multalign Viewer, for showing multiple sequence alignments and associated structures; ViewDock, for screening docked ligand orientations; Movie, for replaying molecular dynamics trajectories; and Volume Viewer, for display and analysis of volumetric data. A discussion of the usage of Chimera in real-world situations is given, along with anticipated future directions. Chimera includes full user documentation, is free to academic and nonprofit users, and is available for Microsoft Windows, Linux, Apple Mac OS X, SGI IRIX, and HP Tru64 Unix from http://www.cgl.ucsf.edu/chimera/.

35,698 citations

Journal Article
15 Jul 2021-Nature
TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases, using a novel deep learning architecture, and can regularly reach atomic accuracy even when no similar structure is known.
Abstract: Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

10,601 citations

Journal Article
10 Feb 2000-Nature
TL;DR: Examination of large-scale yeast two-hybrid screens reveals interactions that place functionally unclassified proteins in a biological context, interactions between proteins involved in the same biological function, and interactions that link biological functions together into larger cellular processes.
Abstract: Two large-scale yeast two-hybrid screens were undertaken to identify protein-protein interactions between full-length open reading frames predicted from the Saccharomyces cerevisiae genome sequence. In one approach, we constructed a protein array of about 6,000 yeast transformants, with each transformant expressing one of the open reading frames as a fusion to an activation domain. This array was screened by a simple and automated procedure for 192 yeast proteins, with positive responses identified by their positions in the array. In a second approach, we pooled cells expressing one of about 6,000 activation domain fusions to generate a library. We used a high-throughput screening procedure to screen nearly all of the 6,000 predicted yeast proteins, expressed as Gal4 DNA-binding domain fusion proteins, against the library, and characterized positives by sequence analysis. These approaches resulted in the detection of 957 putative interactions involving 1,004 S. cerevisiae proteins. These data reveal interactions that place functionally unclassified proteins in a biological context, interactions between proteins involved in the same biological function, and interactions that link biological functions together into larger cellular processes. The results of these screens are shown here.

4,877 citations

Journal Article
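TL;DR: ENDscript 2, a Web server for automated extraction and rendering of primary to quaternary protein structure information, has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization. It builds on ESPript 3, a sequence alignment renderer improved to handle large amounts of data with reduced computation time.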
Abstract: ENDscript 2 is a friendly Web server for extracting and rendering a comprehensive analysis of primary to quaternary protein structure information in an automated way. This major upgrade has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization. It takes advantage of the new version 3 of ESPript, our well-known sequence alignment renderer, improved to handle a large number of data with reduced computation time. From a single PDB entry or file, ENDscript produces high quality figures displaying multiple sequence alignment of proteins homologous to the query, colored according to residue conservation. Furthermore, the experimental secondary structure elements and a detailed set of relevant biophysical and structural data are depicted. All this information and more are now mapped on interactive 3D PyMOL representations. Thanks to its adaptive and rigorous algorithm, beginner to expert users can modify settings to fine-tune ENDscript to their needs. ENDscript has also been upgraded as an open platform for the visualization of multiple biochemical and structural data coming from external biotool Web servers, with both 2D and 3D representations. ENDscript 2 and ESPript 3 are freely available at http://endscript.ibcp.fr and http://espript.ibcp.fr, respectively.

4,722 citations