Book Chapter

Functions of intrinsically disordered proteins through evolutionary lenses.

TL;DR: This chapter gives an overview of the different types of evolutionary behavior of disordered proteins and their associated functions in normal and disease settings. It contrasts globular proteins, whose well-defined structure is under strong pressure to be preserved, with intrinsically disordered regions and proteins (IDRs/IDPs), which exist as ensembles of fluctuating conformations and may follow different evolutionary rules.
Abstract: Protein sequences are the result of an evolutionary process that involves the balancing act of experimenting with novel mutations and selecting out those that have an undesirable functional outcome. In the case of globular proteins, the function relies on a well-defined conformation; therefore, there is a strong evolutionary pressure to preserve the structure. However, different evolutionary rules might apply for the group of intrinsically disordered regions and proteins (IDRs/IDPs) that exist as an ensemble of fluctuating conformations. The function of IDRs can directly originate from their disordered state or arise through different types of molecular recognition processes. There is an amazing variety of ways IDRs can carry out their functions, and this is also reflected in their evolutionary properties. In this chapter we give an overview of the different types of evolutionary behavior of disordered proteins and associated functions in normal and disease settings.
Citations
Journal Article
TL;DR: This study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.
Abstract: All proteomes contain both proteins and polypeptide segments that do not form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase (“UniProt features”: active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.
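The two quantities this abstract relies on, feature enrichment in IDRs and an observed-to-expected mutation ratio, can be illustrated with a minimal sketch. The abstract does not say which statistical test was used, so the one-sided hypergeometric test below is an assumption, as are the function names and the toy counts.

```python
from math import comb


def enrichment_p_value(k: int, K: int, n: int, N: int) -> float:
    """One-sided hypergeometric tail probability P(X >= k).

    N: total residues considered
    K: residues annotated with the feature
    n: residues located in IDRs
    k: annotated residues that fall inside IDRs
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def observed_to_expected(observed: int, expected: float) -> float:
    """Observed/expected variant count; values below 1 mean fewer variants than expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


if __name__ == "__main__":
    # Toy counts for illustration only; they are not taken from the study.
    p = enrichment_p_value(k=5, K=5, n=5, N=10)
    print(f"enrichment p-value: {p:.4f}")
    print(f"obs/exp ratio: {observed_to_expected(3, 12.0):.2f}")
```

A small p-value would suggest that the feature sits in IDRs more often than chance predicts, while a low observed-to-expected ratio is one possible signal of mutation intolerance; the cut-offs used in the study are not given in the abstract.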

6 citations

Journal Article
TL;DR: The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences.

4 citations

Journal Article
TL;DR: A comparative phylogenetic study shows that CtBP is a bilaterian innovation whose CTD of about 100 residues is present in almost all orthologs, and highlights the rich regulatory potential of this previously unstudied domain of a central transcriptional regulator.
Abstract: Evolution of sequence-specific transcription factors clearly drives lineage-specific innovations, but less is known about how changes in the central transcriptional machinery may contribute to evolutionary transformations. In particular, transcriptional regulators are rich in intrinsically disordered regions that appear to be magnets for evolutionary innovation. The C-terminal Binding Protein (CtBP) is a transcriptional corepressor derived from an ancestral lineage of alpha hydroxyacid dehydrogenases; it is found in mammals and invertebrates, and features a core NAD-binding domain as well as an unstructured C-terminus (CTD) of unknown function. CtBP can act on promoters and enhancers to repress transcription through chromatin-linked mechanisms. Our comparative phylogenetic study shows that CtBP is a bilaterian innovation whose CTD of about 100 residues is present in almost all orthologs. CtBP CTDs contain conserved blocks of residues and retain a predicted disordered property, despite having variations in the primary sequence. Interestingly, the structure of the C-terminus has undergone radical transformation independently in certain lineages including flatworms and nematodes. Also contributing to CTD diversity is the production of myriad alternative RNA splicing products, including the production of “short” tailless forms of CtBP in Drosophila. Additional diversity stems from multiple gene duplications in vertebrates, where up to five CtBP orthologs have been observed. Vertebrate lineages show fewer major modifications in the unstructured CTD, possibly because gene regulatory constraints of the vertebrate body plan place specific constraints on this domain. Our study highlights the rich regulatory potential of this previously unstudied domain of a central transcriptional regulator.

3 citations

Journal Article
01 Dec 2022-Biology
TL;DR: This study reports ten new plastomes of Crassula, the second-largest genus in the family Crassulaceae, and analyzes their structure, codon usage, codon aversion patterns, and evolutionary rates; the phylogenetic analyses strongly support Crassula as sister to all other Crassulaceae species.
Simple Summary: Plastids are semi-autonomous plant organelles which play critical roles in photosynthesis, stress response, and storage. The plastid genomes (plastomes) in angiosperms are relatively conserved in quadripartite structure, but variable in size, gene content, and evolutionary rates of genes. The genus Crassula L. is the second-largest genus in the family Crassulaceae J.St.-Hil, and contributes significantly to the diversity of Crassulaceae. However, few studies have focused on the evolution of plastomes within Crassula. In the present study, we sequenced ten plastomes of Crassula: C. alstonii Marloth, C. columella Marloth & Schönland, C. dejecta Jacq., C. deltoidei Thunb., C. expansa subsp. fragilis (Baker) Toelken, C. mesembrianthemopsis Dinter, C. mesembryanthoides (Haw.) D.Dietr., C. socialis Schönland, C. tecta Thunb., and C. volkensii Engl. Through comparative studies, we found Crassula plastomes have unique codon usage and aversion patterns within Crassulaceae. In addition, genomic features, evolutionary rates, and phylogenetic implications were analyzed using plastome data. Our findings will not only reveal new insights into the plastome evolution of Crassulaceae, but also provide potential molecular markers for DNA barcoding.
Abstract: The genus Crassula is the second-largest genus in the family Crassulaceae, with about 200 species. As an acknowledged super-barcode, plastomes have been extensively utilized for plant evolutionary studies. Here, we first report 10 new plastomes of Crassula. We further focused on the structural characterizations, codon usage, aversion patterns, and evolutionary rates of plastomes. The IR junction patterns (IRb had a 110 bp expansion to rps19) were conserved among Crassula species. Interestingly, we found the codon usage patterns of the matK gene in Crassula species are unique among Crassulaceae species, with elevated ENC values. Furthermore, subgenus Crassula species have specific GC-biases in the matK gene. In addition, the codon aversion motifs from matK, pafI, and rpl22 contained phylogenetic implications within Crassula. The evolutionary rate analyses indicated that all plastid genes of Crassulaceae were under purifying selection. Among plastid genes, ycf1 and ycf2 were the most rapidly evolving genes, whereas psaC was the most conserved gene. Additionally, our phylogenetic analyses strongly supported that Crassula is sister to all other Crassulaceae species. Our findings will be useful for further evolutionary studies within Crassula and Crassulaceae.

2 citations

Journal Article
TL;DR: The authors investigate the correlation between Arabidopsis pentatricopeptide repeat (PPR) proteins and their respective DYW domain functions in RNA editing of accD-C794.
References
Journal Article
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

63,427 citations

Journal Article
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article
TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.
Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.
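The first technique in this abstract, locating homologous regions with the FFT, can be sketched as a cross-correlation of numerically encoded sequences. Everything below is an illustrative assumption rather than MAFFT's own code: the Kyte-Doolittle hydropathy scale stands in for the volume and polarity values the abstract mentions, and all function names are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy, used only as a stand-in numeric encoding
# (MAFFT itself encodes residues by volume and polarity).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Convert an amino acid string into a list of numeric values."""
    return [HYDROPATHY[aa] for aa in seq]


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlation(a, b):
    """Return (c, size) with c[k % size] = sum over n of a[n] * b[n + k]."""
    size = 1
    while size < len(a) + len(b):  # zero-pad so lags do not wrap into each other
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    product = [x.conjugate() * y for x, y in zip(fa, fb)]
    c = fft(product, inverse=True)
    return [v.real / size for v in c], size


def best_lag(seq_a, seq_b):
    """Lag k at which seq_a[n] best matches seq_b[n + k]."""
    a, b = encode(seq_a), encode(seq_b)
    c, size = cross_correlation(a, b)
    lags = range(-(len(a) - 1), len(b))
    return max(lags, key=lambda k: c[k % size])


if __name__ == "__main__":
    print(best_lag("WLVKFDE", "GGGWLVKFDE"))
```

A positive lag means the matching segment starts later in the second sequence. A correlation peak only marks a candidate offset; in this sketch no alignment is built from it.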

12,003 citations

Journal Article
TL;DR: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML), which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses.
Abstract: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (dN and dS) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OS X, and UNIX systems. It is available at http://abacus.gene.ucl.ac.uk/software/paml.html.

10,773 citations

Journal Article
TL;DR: A new Java-based architecture for the widely used protein function prediction software package InterProScan is described, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis.
Abstract: Motivation: Robust, large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterise many millions of sequences. Here we describe a new Java-based architecture for the widely-used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete re-implementation of the software framework, resulting in a flexible and stable system that is able to utilise both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the (open) source code is hosted at Google Code. Availability: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/. Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk

5,434 citations