scispace - formally typeset
Author

Borbála Hajdu-Soltész

Bio: Borbála Hajdu-Soltész is an academic researcher from Eötvös Loránd University. The author has contributed to research on the topics of protein degradation and protein domains. The author has an h-index of 5 and has co-authored 7 publications receiving 157 citations.

Papers
Journal Article
TL;DR: Recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website are reported.
Abstract: The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows more information to be captured from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore to illuminating the 'dark' proteome.

159 citations

Journal Article
TL;DR: It is argued that statistical measures alone cannot capture all structural aspects of LCRs, and the combined usage of a variety of predictive tools and measurements is recommended.
Abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.

72 citations
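As a rough illustration of one statistical measure of the kind this review discusses, the minimal Python sketch below scores sliding windows by the Shannon entropy of their residue composition and flags low-entropy windows as candidate LCRs. The function names, the window length (12) and the entropy threshold (2.2 bits) are hypothetical, illustrative choices, not values from the review, and, as the review itself argues, a single compositional statistic like this cannot tell a disordered LCR from a compositionally biased repeat that folds.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    # sum of p * log2(1/p); written this way so a homopolymer gives +0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Yield (start, end, entropy) for every window below the entropy threshold.

    Window length and threshold are illustrative defaults (assumptions),
    not published parameters.
    """
    for start in range(len(seq) - window + 1):
        h = shannon_entropy(seq[start:start + window])
        if h < threshold:
            yield start, start + window, h


if __name__ == "__main__":
    # Toy sequence: a varied N-terminus, a poly-Q stretch, a varied C-terminus.
    seq = "ACDEFGHIKL" + "Q" * 12 + "MNPRSTVWY"
    for start, end, h in low_complexity_windows(seq):
        print(f"{start:3d}-{end:3d}  {seq[start:end]}  {h:.2f} bits")
```

A pure poly-Q window scores 0 bits, while a window of 12 distinct residues scores log2(12), about 3.58 bits; in practice such entropy scores would be combined with other predictors rather than used alone.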

Journal Article
04 Mar 2021
TL;DR: In this article, the authors used an integrative computational approach to explore the direct role of intrinsically disordered protein regions driving cancer and found that around 20% of cancer drivers are primarily targeted through a disordered region.
Abstract: Many proteins contain intrinsically disordered regions (IDRs) which carry out important functions without relying on a single well-defined conformation. IDRs are increasingly recognized as critical elements of regulatory networks and have also been associated with cancer. However, it is unknown whether mutations targeting IDRs represent a distinct class of driver events associated with specific molecular and system-level properties, cancer types and treatment options. Here, we used an integrative computational approach to explore the direct role of intrinsically disordered protein regions driving cancer. We showed that around 20% of cancer drivers are primarily targeted through a disordered region. These IDRs can function in multiple ways which are distinct from the functional mechanisms of ordered drivers. Disordered drivers play a central role in context-dependent interaction networks and are enriched in specific biological processes such as transcription, gene expression regulation and protein degradation. Furthermore, their modulation represents an alternative mechanism for the emergence of all known cancer hallmarks. Importantly, in certain cancer patients, mutations of disordered drivers represent key driving events. However, treatment options for such patients are currently severely limited. The presented study highlights a largely overlooked class of cancer drivers associated with specific cancer types that need novel therapeutic options.

20 citations

Journal Article
TL;DR: A novel bioinformatic filtering protocol is established to efficiently explore the interaction network of a hub protein, the dynein light chain LC8, a ubiquitous eukaryotic hub protein that has been suggested to be involved in motor-related functions as well as in promoting the dimerization of various proteins by recognizing linear motifs in its partners.
Abstract: Protein-protein interactions (PPIs) formed between short linear motifs and globular domains play important roles in many regulatory and signaling processes but are highly underrepresented in current protein-protein interaction databases. These types of interactions are usually characterized by a specific binding motif that captures the key amino acids shared among the interaction partners. However, the computational proteome-level identification of interaction partners based on the known motif is hindered by the huge number of randomly occurring matches from which biologically relevant motif hits need to be extracted. In this work, we established a novel bioinformatic filtering protocol to efficiently explore the interaction network of a hub protein. We introduced a novel measure that enabled the optimization of the elements and parameter settings of the pipeline, which was built from multiple sequence-based prediction methods. In addition, data collected from PPI databases and evolutionary analyses were also incorporated to further increase the biological relevance of the identified motif hits. The approach was applied to the dynein light chain LC8, a ubiquitous eukaryotic hub protein that has been suggested to be involved in motor-related functions as well as in promoting the dimerization of various proteins by recognizing linear motifs in its partners. From the list of putative binding motifs collected by our protocol, several novel peptides were experimentally verified to bind LC8. Altogether 71 potential new motif instances were identified. The expanded list of LC8 binding partners revealed the evolutionary plasticity of binding partners despite the highly conserved binding interface. In addition, it also highlighted a novel, conserved function of LC8 in the upstream regulation of the Hippo signaling pathway.
Beyond the LC8 system, our work also provides general guidelines that can be applied to explore the interaction network of other linear motif binding proteins or protein domains.

18 citations
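The abstract describes a pipeline that first collects all proteome-wide matches to a known motif and then filters them using sequence-based predictions. The minimal Python sketch below shows only the general shape of that idea, under loudly stated assumptions: the regex `[KR].TQT` is a hypothetical, simplified LC8-like pattern chosen for illustration, the function names are hypothetical, and a single per-residue disorder cut-off (0.5) stands in for the filter. The published protocol combined multiple prediction methods, PPI-database evidence and evolutionary analyses, with parameters tuned by the authors' own measure, none of which is reproduced here.

```python
import re

# Hypothetical, simplified LC8-like motif (illustration only, not the paper's
# definition): Lys or Arg, any residue, then Thr-Gln-Thr. The lookahead lets
# overlapping matches be reported.
LC8_LIKE = re.compile(r"(?=([KR].TQT))")


def find_motif_hits(sequences, pattern=LC8_LIKE):
    """Return (name, start, matched_motif) for every match in every sequence."""
    hits = []
    for name, seq in sequences.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start(), m.group(1)))
    return hits


def filter_by_disorder(hits, disorder, threshold=0.5):
    """Keep hits whose residues all score above a disorder threshold.

    `disorder` maps each sequence name to per-residue scores in [0, 1]
    (assumed to come from any disorder predictor); the 0.5 cut-off is an
    illustrative assumption.
    """
    kept = []
    for name, start, motif in hits:
        scores = disorder[name][start:start + len(motif)]
        if len(scores) == len(motif) and min(scores) > threshold:
            kept.append((name, start, motif))
    return kept


if __name__ == "__main__":
    seqs = {"toy1": "MAKSTQTGGDE", "toy2": "MRATQTLLVIA"}
    scores = {"toy1": [0.8] * 11, "toy2": [0.2] * 11}
    raw = find_motif_hits(seqs)
    print(raw)                              # [('toy1', 2, 'KSTQT'), ('toy2', 1, 'RATQT')]
    print(filter_by_disorder(raw, scores))  # [('toy1', 2, 'KSTQT')]
```

The design point the sketch tries to convey is that the raw regex stage is deliberately permissive and most of the specificity comes from the subsequent filters; a real pipeline would stack several such filters rather than rely on one score.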

Posted Content
14 Dec 2020 · bioRxiv
TL;DR: This study used an integrative computational approach to explore the direct role of intrinsically disordered proteins/protein regions (IDPs/IDRs) driving cancer and showed that around 20% of cancer drivers are primarily targeted through a disordered region.
Abstract: Many proteins contain intrinsically disordered regions (IDRs) which carry out important functions without relying on a single well-defined conformation. IDRs are increasingly recognized as critical elements of regulatory networks and have also been associated with cancer. However, it is unknown whether mutations targeting IDRs represent a distinct class of driver events associated with specific molecular and system-level properties, cancer types and treatment options. Here, we used an integrative computational approach to explore the direct role of intrinsically disordered proteins/protein regions (IDPs/IDRs) driving cancer. We showed that around 20% of cancer drivers are primarily targeted through a disordered region. The detailed analysis of these IDRs revealed that they can function in multiple ways that are distinct from the functional mechanisms of ordered drivers. Disordered drivers play a central role in context-dependent interaction networks and are enriched in specific biological processes such as transcription, gene expression regulation and protein degradation. Furthermore, their modulation represents an alternative mechanism for the emergence of all known cancer hallmarks independently of the modulation of globular proteins. Disordered drivers are also highly relevant at the sample level, and their mutations can represent the key driving event in certain individual cancer patients. However, treatment options for such patients are currently severely limited. The presented study highlights a largely overlooked class of cancer drivers associated with specific cancer types that need novel therapeutic options.

7 citations


Cited by
Journal Article
TL;DR: A combined web interface that allows users to generate energy-estimation-based predictions for ordered and disordered residues by IUPred2 and for disordered binding regions by ANCHOR2 is presented; the updated web server retains the robustness of the original programs but offers several new features.
Abstract: The structural states of proteins include ordered globular domains as well as intrinsically disordered protein regions that exist as highly flexible conformational ensembles in isolation. Various computational tools have been developed to discriminate ordered and disordered segments based on the amino acid sequence. However, properties of IDRs can also depend on various conditions, including binding to globular protein partners or environmental factors, such as redox potential. These cases provide further challenges for the computational characterization of disordered segments. In this work we present IUPred2A, a combined web interface that allows users to generate energy-estimation-based predictions for ordered and disordered residues by IUPred2 and for disordered binding regions by ANCHOR2. The updated web server retains the robustness of the original programs but offers several new features. While only minor bug fixes are implemented for IUPred, the next version of ANCHOR is significantly improved through a new architecture and parameters optimized on novel datasets. In addition, redox-sensitive regions can also be highlighted through a novel experimental feature. The web server offers graphical and text outputs, a RESTful interface, access to software download and extensive help, and can be accessed at a new location: http://iupred2a.elte.hu.

997 citations

Journal Article
TL;DR: Detailed instructions on how to use IUPred2A, one of the most widely used tools for the prediction of disordered regions/proteins or conditionally disordered segments, are presented and examples of how the predictions can be interpreted in different contexts are provided.
Abstract: IUPred2A is a combined prediction tool designed to discover intrinsically disordered or conditionally disordered proteins and protein regions. Intrinsically disordered regions exist without a well-defined three-dimensional structure in isolation but carry out important biological functions. Over the years, various prediction methods have been developed to characterize disordered regions. The existence of disordered segments can also be dependent on different factors such as binding partners or environmental traits like pH or redox potential, and recognizing such regions represents additional computational challenges. In this article, we present detailed instructions on how to use IUPred2A, one of the most widely used tools for the prediction of disordered regions/proteins or conditionally disordered segments, and provide examples of how the predictions can be interpreted in different contexts. © 2020 The Authors.
Basic Protocol 1: Analyzing disorder propensity with IUPred2A online
Basic Protocol 2: Analyzing disordered binding regions using ANCHOR2
Support Protocol 1: Interpretation of the results
Basic Protocol 3: Analyzing redox-sensitive disordered regions
Support Protocol 2: Download options
Support Protocol 3: REST API for programmatic purposes
Basic Protocol 4: Using IUPred2A locally

202 citations

Journal Article
TL;DR: A review of the problems associated with tandem repeat sequences that originate from different stages of the sequencing-assembly-annotation-deposition workflow and that may proliferate in public database repositories, affecting all downstream analyses. Its aim is to raise awareness within the community of database users and to alert scientists working in the underlying workflow of database creation.
Abstract: The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples from the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.

178 citations

Journal Article
TL;DR: The authors reviewed similarities and differences among four main proteins, α-synuclein, FUS, tau, and TDP-43, which are found aggregated in different diseases and were independently shown to phase separate.

177 citations

Journal Article
TL;DR: The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.
Abstract: The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4, regarding the database format, with novel types of annotations and an improved update process. The new website includes a re-designed user interface, a more effective search engine and an advanced API for programmatic access. The new database schema gives users more flexibility, as well as simplifying maintenance and updates. In addition, the new entry page provides more visualisation tools, including a customizable feature viewer and graphs of the residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, whether they undergo disorder-to-order transitions or remain disordered in the bound state. In addition, disordered regions undergoing liquid-liquid phase separation or post-translational modifications are defined. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, FASTA or JSON formats. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.

151 citations