scispace - formally typeset

Showing papers in "Nucleic Acids Research in 2005"


Journal ArticleDOI
TL;DR: A novel microRNA quantification method has been developed using stem–loop RT followed by TaqMan PCR analysis, which enables fast, accurate and sensitive miRNA expression profiling and can identify and monitor potential biomarkers specific to tissues or diseases.
Abstract: A novel microRNA (miRNA) quantification method has been developed using stem–loop RT followed by TaqMan PCR analysis. Stem–loop RT primers are better than conventional ones in terms of RT efficiency and specificity. TaqMan miRNA assays are specific for mature miRNAs and discriminate among related miRNAs that differ by as little as one nucleotide. Furthermore, they are not affected by genomic DNA contamination. Precise quantification is achieved routinely with as little as 25 pg of total RNA for most miRNAs. In fact, the high sensitivity, specificity and precision of this method allows for direct analysis of a single cell without nucleic acid purification. Like standard TaqMan gene expression assays, TaqMan miRNA assays exhibit a dynamic range of seven orders of magnitude. Quantification of five miRNAs in seven mouse tissues showed variation from less than 10 to more than 30 000 copies per cell. This method enables fast, accurate and sensitive miRNA expression profiling and can identify and monitor potential biomarkers specific to tissues or diseases. Stem–loop RT–PCR can be used for the quantification of other small RNA molecules such as short interfering RNAs (siRNAs). Furthermore, the concept of stem–loop RT primer design could be applied in small RNA cloning and multiplex assays for better specificity and efficiency.

4,599 citations


Journal ArticleDOI
TL;DR: Improvement in accuracy was generally observed for most methods, but remarkably large for the new options of MAFFT proposed here, which showed higher accuracy than currently available methods including TCoffee version 2 and CLUSTAL W in benchmark tests consisting of alignments of >50 sequences.
Abstract: The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options of MAFFT showed higher accuracy than currently available methods including TCoffee version 2 and CLUSTAL W in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options of MAFFT can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of ∼8 sequences with low similarity, the accuracy was improved (2–10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value < 10⁻⁵–10⁻²⁰) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options of MAFFT proposed here. Thus, we made a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.

4,528 citations


Journal ArticleDOI
TL;DR: HHpred is a fast server for remote protein homology detection and structure prediction; it is the first to implement pairwise comparison of profile hidden Markov models (HMMs) and allows searching a wide choice of databases.
Abstract: HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows searching a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, as well as 3D structural models that are calculated by the MODELLER software from these alignments. A detailed help facility is available. As a demonstration, we analyze the sequence of SpoVT, a transcriptional regulator from Bacillus subtilis. HHpred can be accessed at http://protevo.eb.tuebingen.mpg.de/hhpred.

3,347 citations


Journal ArticleDOI
TL;DR: Two freely available web servers for molecular docking are described: PatchDock performs structure prediction of protein–protein and protein–small molecule complexes, and SymmDock predicts the structure of a homomultimer with cyclic symmetry given the structure of the monomeric unit.
Abstract: Here, we describe two freely available web servers for molecular docking. The PatchDock method performs structure prediction of protein-protein and protein-small molecule complexes. The SymmDock method predicts the structure of a homomultimer with cyclic symmetry given the structure of the monomeric unit. The inputs to the servers are either protein PDB codes or uploaded protein structures. The services are available at http://bioinfo3d.cs.tau.ac.il. The methods behind the servers are very efficient, allowing large-scale docking experiments.

2,590 citations


Journal ArticleDOI
TL;DR: There exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB, which could be used to assist in model selection in blind protein structure predictions.
Abstract: We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and Dynamic Programming (DP). The algorithm is approximately 4 times faster than CE and 20 times faster than DALI and SAL. On average, the resulting structure alignments have higher accuracy and coverage than those provided by these most often-used methods. TM-align is applied to an all-against-all structure comparison of 10 515 representative protein chains from the Protein Data Bank (PDB) with a sequence identity cutoff <95%: 1996 distinct folds are found when a TM-score threshold of 0.5 is used. We also use TM-align to match the models predicted by TASSER for solved non-homologous proteins in PDB. For both folded and misfolded models, TM-align can almost always find close structural analogs, with an average root mean square deviation, RMSD, of 3 Å and 87% alignment coverage. Nevertheless, there exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB. This correlation could be used to assist in model selection in blind protein structure predictions. The TM-align program is freely downloadable at http://bioinformatics.buffalo.edu/TM-align.

2,582 citations
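
The abstract above does not spell out the TM-score itself. As a minimal sketch, assuming the commonly cited definition (a length-normalized sum of 1/(1 + (d/d0)²) over aligned residue pairs, with d0 = 1.24·(L − 15)^(1/3) − 1.8 Å), the score for an already superposed alignment could be computed as below. The function names and the 0.5 Å floor on d0 are assumptions of this sketch; the rotation search and dynamic programming that TM-align performs are not reproduced here.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Å) used to normalize distances."""
    # The cube-root formula is not meaningful for chains of 15 residues or fewer;
    # the 0.5 Å floor is an assumption of this sketch, not taken from the abstract.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (Å) between aligned residue pairs after superposition.
    target_length: length of the chain the score is normalized by.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Unaligned residues contribute nothing, so partial coverage lowers the score.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Hypothetical distances: 80 of 100 residues aligned, each 2 Å apart.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because every term lies between 0 and 1 and the sum is divided by the target length, the score stays in (0, 1], which is what makes a fixed threshold such as the 0.5 mentioned in the abstract usable across proteins of different sizes.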


Journal ArticleDOI
TL;DR: InterProScan is a tool that combines different protein signature recognition methods from the InterPro consortium member databases into one resource and can analyse protein as well as DNA sequences.
Abstract: InterProScan [E. M. Zdobnov and R. Apweiler (2001) Bioinformatics, 17, 847-848] is a tool that combines different protein signature recognition methods from the InterPro [N. J. Mulder, R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bradley, P. Bork, P. Bucher, L. Cerutti et al. (2005) Nucleic Acids Res., 33, D201-D205] consortium member databases into one resource. At the time of writing there are 10 distinct publicly available databases in the application. Protein as well as DNA sequences can be analysed. A web-based version is accessible for academic and commercial organizations from the EBI (http://www.ebi.ac.uk/InterProScan/). In addition, a standalone Perl version and a SOAP Web Service [J. Snell, D. Tidwell and P. Kulchenko (2001) Programming Web Services with SOAP, 1st edn. O'Reilly Publishers, Sebastopol, CA, http://www.w3.org/TR/soap/] are also available to the users. Various output formats are supported and include text tables, XML documents, as well as various graphs to help interpret the results.

2,520 citations


Journal ArticleDOI
TL;DR: The core functionality of FoldX, namely the calculation of the free energy of a macromolecule based on its high-resolution 3D structure, is now publicly available through a web server at http://foldx.embl.de/.
Abstract: FoldX is an empirical force field that was developed for the rapid evaluation of the effect of mutations on the stability, folding and dynamics of proteins and nucleic acids. The core functionality of FoldX, namely the calculation of the free energy of a macromolecule based on its high-resolution 3D structure, is now publicly available through a web server at http://foldx.embl.de/. The current release allows the calculation of the stability of a protein, calculation of the positions of the protons and the prediction of water bridges, prediction of metal binding sites and the analysis of the free energy of complex formation. Alanine scanning, the systematic truncation of side chains to alanine, is also included. In addition, some reporting functions have been added, and it is now possible to print both the atomic interaction networks that constitute the protein, print the structural and energetic details of the interactions per atom or per residue, as well as generate a general quality report of the pdb structure. This core functionality will be further extended as more FoldX applications are developed.

2,076 citations


Journal ArticleDOI
TL;DR: Online implementations of tRNAscan-SE, snoscan and snoGPS are described that make these RNA detection tools accessible to a wider range of research biologists.
Abstract: Transfer RNAs (tRNAs) and small nucleolar RNAs (snoRNAs) are two of the largest classes of non-protein-coding RNAs. Conventional gene finders that detect protein-coding genes do not find tRNA and snoRNA genes because they lack the codon structure and statistical signatures of protein-coding genes. Previously, we developed tRNAscan-SE, snoscan and snoGPS for the detection of tRNAs, methylation-guide snoRNAs and pseudouridylation-guide snoRNAs, respectively. tRNAscan-SE is routinely applied to completed genomes, resulting in the identification of thousands of tRNA genes. Snoscan has successfully detected methylation-guide snoRNAs in a variety of eukaryotes and archaea, and snoGPS has identified novel pseudouridylation-guide snoRNAs in yeast and mammals. Although these programs have been quite successful at RNA gene detection, their use has been limited by the need to install and configure the software packages on UNIX workstations. Here, we describe online implementations of these RNA detection tools that make these programs accessible to a wider range of research biologists. The tRNAscan-SE, snoscan and snoGPS servers are available at http://lowelab.ucsc.edu/tRNAscan-SE/, http://lowelab.ucsc.edu/snoscan/ and http://lowelab.ucsc.edu/snoGPS/, respectively.

2,000 citations


Journal ArticleDOI
TL;DR: The subsystem approach is described, the first release of the growing library of populated subsystems is offered, and the SEED is the first annotation environment that supports this model of annotation.
Abstract: The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.

1,896 citations


Journal ArticleDOI
TL;DR: This work reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information, and demonstrates that the original Affymetrix probe set definitions are inaccurate.
Abstract: Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals ∼30–50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.

1,849 citations


Journal ArticleDOI
TL;DR: A web-based integrated data mining system to help biologists in exploring large sets of genes, WebGestalt, has been developed and 48 gene sets with genes over-represented in various human tissue types are generated.
Abstract: High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from ‘single genes’ to ‘gene sets’. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists in exploring large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, as well as performs Boolean operations to generate the unions, intersections or differences between different gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. In order to demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all the 48 gene sets using WebGestalt is available for the public at http://genereg.ornl.gov/webgestalt/wg_enrich.php.

Journal ArticleDOI
TL;DR: There is a significant repression of quadruplexes in the coding strand of exonic regions, which suggests that quadruplex-forming patterns are disfavoured in sequences that will form RNA.
Abstract: Guanine-rich DNA sequences of a particular form have the ability to fold into four-stranded structures called G-quadruplexes. In this paper, we present a working rule to predict which primary sequences can form this structure, and describe a search algorithm to identify such sequences in genomic DNA. We count the number of quadruplexes found in the human genome and compare that with the figure predicted by modelling DNA as a Bernoulli stream or as a Markov chain, using windows of various sizes. We demonstrate that the distribution of loop lengths is significantly different from what would be expected in a random case, providing an indication of the number of potentially relevant quadruplex-forming sequences. In particular, we show that there is a significant repression of quadruplexes in the coding strand of exonic regions, which suggests that quadruplex-forming patterns are disfavoured in sequences that will form RNA.
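
The abstract mentions a working rule and a search algorithm without stating either. As an illustration only, a commonly cited form of such a rule is four runs of at least three guanines separated by loops of one to seven bases; the sketch below assumes that form and uses a plain regular expression, which reports non-overlapping matches and is therefore simpler than a dedicated genome scan. The pattern, function name and example sequences are assumptions of this sketch.

```python
import re

# Assumed rule: G(3+) N(1-7) G(3+) N(1-7) G(3+) N(1-7) G(3+)
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(sequence: str):
    """Return (start, end, matched_text) for each non-overlapping candidate motif.

    Coordinates are 0-based and end-exclusive, following Python slicing.
    Only the given strand is searched; scanning the complementary strand
    would need the equivalent pattern written with C runs.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical test sequence with a single candidate motif.
    for start, end, motif in find_g4_motifs("TTGGGAGGGAGGGAGGGTT"):
        print(start, end, motif)
```

Counting matches of this kind in real sequence and in sequence simulated from a Bernoulli or Markov model is the sort of comparison the abstract describes, although the overlap handling would need more care than a single regular expression provides.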

Journal ArticleDOI
TL;DR: I-Mutant2.0 is introduced as a unique and valuable helper for protein design, even when the protein structure is not yet known with atomic resolution.
Abstract: I-Mutant2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. I-Mutant2.0 predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. This latter task, to the best of our knowledge, is exploited for the first time. The method was trained and tested on a data set derived from ProTherm, which is presently the most comprehensive available database of thermodynamic experimental data of free energy changes of protein stability upon mutation under different conditions. I-Mutant2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. Acting as a classifier, I-Mutant2.0 correctly predicts (with a cross-validation procedure) 80% or 77% of the data set, depending on the usage of structural or sequence information, respectively. When predicting ΔΔG values associated with mutations, the correlation of predicted with expected/experimental values is 0.71 (with a standard error of 1.30 kcal/mol) and 0.62 (with a standard error of 1.45 kcal/mol) when structural or sequence information are respectively adopted. Our web interface allows the selection of a predictive mode that depends on the availability of the protein structure and/or sequence. In this latter case, the web server requires only pasting of a protein sequence in a raw format. We therefore introduce I-Mutant2.0 as a unique and valuable helper for protein design, even when the protein structure is not yet known with atomic resolution. Availability: http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi.

Journal ArticleDOI
TL;DR: It is concluded that miRNA-mediated regulation has a complexity of cellular outcomes and that miRNAs can be mediators of regulation of cell growth and apoptosis pathways.
Abstract: Of the over 200 identified mammalian microRNAs (miRNAs), only a few have known biological activity. To gain a better understanding of the role that miRNAs play in specific cellular pathways, we utilized antisense molecules to inhibit miRNA activity. We used miRNA inhibitors targeting miR-23, 21, 15a, 16 and 19a to test efficacy of antisense molecules in reducing miRNA activity on reporter genes bearing miRNA-binding sites. The miRNA inhibitors de-repressed reporter gene activity when a miRNA-binding site was cloned into its 3′-untranslated region. We employed a library of miRNA inhibitors to screen for miRNA involved in cell growth and apoptosis. In HeLa cells, we found that inhibition of miR-95, 124, 125, 133, 134, 144, 150, 152, 187, 190, 191, 192, 193, 204, 211, 218, 220, 296 and 299 caused a decrease in cell growth and that inhibition of miR-21 and miR-24 had a profound increase in cell growth. On the other hand, inhibition of miR-7, 19a, 23, 24, 134, 140, 150, 192 and 193 downregulated cell growth, and miR-107, 132, 155, 181, 191, 194, 203, 215 and 301 increased cell growth in lung carcinoma cells, A549. We also identified miRNA that when inhibited increased the level of apoptosis (miR-1d, 7, 148, 204, 210, 216 and 296) and one miRNA that decreased apoptosis (miR-214) in HeLa cells. From these screens, we conclude that miRNA-mediated regulation has a complexity of cellular outcomes and that miRNAs can be mediators of regulation of cell growth and apoptosis pathways.

Journal ArticleDOI
TL;DR: PHYML Online is a web interface to PHYML, a software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences.
Abstract: PHYML Online is a web interface to PHYML, a software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. This tool provides the user with a number of options, e.g. nonparametric bootstrap and estimation of various evolutionary parameters, in order to perform comprehensive phylogenetic analyses on large datasets in reasonable computing time. The server and its documentation are available at http://atgc.lirmm.fr/phyml.

Journal ArticleDOI
TL;DR: This new version of ConSurf includes an empirical Bayesian method for scoring conservation, which is more accurate than the maximum-likelihood method that was used in the earlier release and includes a measure of confidence for the inferred amino acid conservation scores.
Abstract: Key amino acid positions that are important for maintaining the 3D structure of a protein and/or its function(s), e.g. catalytic activity, binding to ligand, DNA or other proteins, are often under strong evolutionary constraints. Thus, the biological importance of a residue often correlates with its level of evolutionary conservation within the protein family. ConSurf (http://consurf.tau.ac.il/) is a web-based tool that automatically calculates evolutionary conservation scores and maps them on protein structures via a user-friendly interface. Structurally and functionally important regions in the protein typically appear as patches of evolutionarily conserved residues that are spatially close to each other. We present here version 3.0 of ConSurf. This new version includes an empirical Bayesian method for scoring conservation, which is more accurate than the maximum-likelihood method that was used in the earlier release. Various additional steps in the calculation can now be controlled by a number of advanced options, thus further improving the accuracy of the calculation. Moreover, ConSurf version 3.0 also includes a measure of confidence for the inferred amino acid conservation scores.

Journal ArticleDOI
TL;DR: Three new recombineering strains are described that allow bacterial artificial chromosomes (BACs) to be modified using galK positive/negative selection, and it is shown how galK selection can be used to rapidly introduce point mutations, deletions and loxP sites into BAC DNA and thus facilitate functional studies of SNP and/or disease-causing point mutations.
Abstract: Recombineering allows DNA cloned in Escherichia coli to be modified via lambda (λ) Red-mediated homologous recombination, obviating the need for restriction enzymes and DNA ligases to modify DNA. Here, we describe the construction of three new recombineering strains (SW102, SW105 and SW106) that allow bacterial artificial chromosomes (BACs) to be modified using galK positive/negative selection. This two-step selection procedure allows DNA to be modified without introducing an unwanted selectable marker at the modification site. All three strains contain an otherwise complete galactose operon, except for a precise deletion of the galK gene, and a defective temperature-sensitive λ prophage that makes recombineering possible. SW105 and SW106 cells in addition carry L-arabinose-inducible Cre or Flp genes, respectively. The galK function can be selected both for and against. This feature greatly reduces the background seen in other negative-selection schemes, and galK selection is considerably more efficient than other related selection methods published. We also show how galK selection can be used to rapidly introduce point mutations, deletions and loxP sites into BAC DNA and thus facilitate functional studies of SNP and/or disease-causing point mutations, the identification of long-range regulatory elements and the construction of conditional targeting vectors.

Journal ArticleDOI
TL;DR: The web server provides access to a tool that automates estimates of pKs as well as other related characteristics of biomolecules such as isoelectric points, titration curves and energies of protonation microstates, and is intended for a broad community of biochemists, molecular modelers, structural biologists and drug designers.
Abstract: The structure and function of macromolecules depend critically on the ionization (protonation) states of their acidic and basic groups. A number of existing practical methods predict protonation equilibrium pK constants of macromolecules based upon their atomic resolution Protein Data Bank (PDB) structures; the calculations are often performed within the framework of the continuum electrostatics model. Unfortunately, these methodologies are complex, involve multiple steps and require considerable investment of effort. Our web server http://biophysics.cs.vt.edu/H++ provides access to a tool that automates this process, allowing both experts and novices to quickly obtain estimates of pKs as well as other related characteristics of biomolecules such as isoelectric points, titration curves and energies of protonation microstates. Protons are added to the input structure according to the calculated ionization states of its titratable groups at the user-specified pH; the output is in the PQR (PDB + charges + radii) format. In addition, corresponding coordinate and topology files are generated in the format supported by the molecular modeling package AMBER. The server is intended for a broad community of biochemists, molecular modelers, structural biologists and drug designers; it can also be used as an educational tool in biochemistry courses.

Journal ArticleDOI
TL;DR: A WWW server for AUGUSTUS, a software for gene prediction in eukaryotic genomic sequences that is based on a generalized hidden Markov model, a probabilistic model of a sequence and its gene structure, is presented.
Abstract: We present a WWW server for AUGUSTUS, a software for gene prediction in eukaryotic genomic sequences that is based on a generalized hidden Markov model, a probabilistic model of a sequence and its gene structure. The web server allows the user to impose constraints on the predicted gene structure. A constraint can specify the position of a splice site, a translation initiation site or a stop codon. Furthermore, it is possible to specify the position of known exons and intervals that are known to be exonic or intronic sequence. The number of constraints is arbitrary and constraints can be combined in order to pin down larger parts of the predicted gene structure. The result then is the most likely gene structure that complies with all given user constraints, if such a gene structure exists. The specification of constraints is useful when part of the gene structure is known, e.g. by expressed sequence tag or protein sequence alignments, or if the user wants to change the default prediction. The web interface and the downloadable stand-alone program are available free of charge at http://augustus.gobics.de/submission.

Journal ArticleDOI
TL;DR: Findings indicate random loss rather than specific maintenance of methylation in Dnmt[1(kd),3a-/-,3b-/-] cells, and suggest that random shotgun bisulfite sequencing can be scaled to a genome-wide approach.
Abstract: We describe a large-scale random approach termed reduced representation bisulfite sequencing (RRBS) for analyzing and comparing genomic methylation patterns. BglII restriction fragments were size-selected to 500-600 bp, equipped with adapters, treated with bisulfite, PCR amplified, cloned and sequenced. We constructed RRBS libraries from murine ES cells and from ES cells lacking DNA methyltransferases Dnmt3a and 3b and with knocked-down (kd) levels of Dnmt1 (Dnmt[1(kd),3a-/-,3b-/-]). Sequencing of 960 RRBS clones from Dnmt[1(kd),3a-/-,3b-/-] cells generated 343 kb of non-redundant bisulfite sequence covering 66212 cytosines in the genome. All but 38 cytosines had been converted to uracil indicating a conversion rate of >99.9%. Of the remaining cytosines 35 were found in CpG and 3 in CpT dinucleotides. Non-CpG methylation was >250-fold reduced compared with wild-type ES cells, consistent with a role for Dnmt3a and/or Dnmt3b in CpA and CpT methylation. Closer inspection revealed neither a consensus sequence around the methylated sites nor evidence for clustering of residual methylation in the genome. Our findings indicate random loss rather than specific maintenance of methylation in Dnmt[1(kd),3a-/-,3b-/-] cells. Near-complete bisulfite conversion and largely unbiased representation of RRBS libraries suggest that random shotgun bisulfite sequencing can be scaled to a genome-wide approach.
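
The conversion rate quoted in the abstract follows directly from the two counts it reports. The short sketch below redoes that arithmetic; the function and variable names are illustrative, and the only inputs are the figures given above (66212 cytosines covered, of which 35 in CpG and 3 in CpT context remained unconverted).

```python
def bisulfite_conversion_rate(total_cytosines: int, unconverted: int) -> float:
    """Fraction of covered cytosines that were read as converted."""
    if total_cytosines <= 0 or not 0 <= unconverted <= total_cytosines:
        raise ValueError("counts must satisfy 0 <= unconverted <= total_cytosines")
    return 1.0 - unconverted / total_cytosines


# Counts taken from the abstract.
TOTAL_CYTOSINES = 66212
UNCONVERTED_BY_CONTEXT = {"CpG": 35, "CpT": 3}

unconverted = sum(UNCONVERTED_BY_CONTEXT.values())
rate = bisulfite_conversion_rate(TOTAL_CYTOSINES, unconverted)

if __name__ == "__main__":
    print(f"unconverted cytosines: {unconverted}")
    print(f"conversion rate: {rate:.4%}")
```

Since some of the 38 unconverted cytosines may reflect residual methylation rather than failed conversion, the value computed this way is best read as a lower bound, which is consistent with the abstract reporting it as ">99.9%".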

Journal ArticleDOI
TL;DR: An analysis of the known ARE-binding proteins (ARE-BP) with respect to their mRNA targets and the consequences of their binding to the mRNA is presented and several hypotheses that could unify the published data are presented and suggest avenues for future research.
Abstract: The control of mRNA stability is an important process that allows cells to not only limit, but also rapidly adjust, the expression of regulatory factors whose over expression may be detrimental to the host organism. Sequence elements rich in A and U nucleotides or AU-rich elements (AREs) have been known for many years to target mRNAs for rapid degradation. In this survey, after briefly summarizing the data on the sequence characteristics of AREs, we present an analysis of the known ARE-binding proteins (ARE-BP) with respect to their mRNA targets and the consequences of their binding to the mRNA. In this analysis, both the changes in mRNA stability and the lesser studied effects on translation are considered. This analysis highlights the multitude of mRNAs bound by one ARE-BP and conversely the large number of ARE-BP that associate with any particular ARE-containing mRNA. This situation is discussed with respect to functional redundancies or antagonisms. The potential relationship between mRNA stability and translation is also discussed. Finally, we present several hypotheses that could unify the published data and suggest avenues for future research.

Journal ArticleDOI
TL;DR: A novel method for the adaptation of target gene codon usage to most sequenced prokaryotes and selected eukaryotic gene expression hosts was developed to improve heterologous protein production using JCat (Java Codon Adaptation Tool).
Abstract: A novel method for the adaptation of target gene codon usage to most sequenced prokaryotes and selected eukaryotic gene expression hosts was developed to improve heterologous protein production. In contrast to existing tools, JCat (Java Codon Adaptation Tool) does not require the manual definition of highly expressed genes and is, therefore, a very rapid and easy method. Further options of JCat for codon adaptation include the avoidance of unwanted cleavage sites for restriction enzymes and Rho-independent transcription terminators. JCat presents its output both graphically and as Codon Adaptation Index (CAI) values for the pasted sequence and the newly adapted sequence. Additionally, a list of genes in FASTA-format can be uploaded to calculate CAI values. In one example, all genes of the genome of Caenorhabditis elegans were adapted to Escherichia coli codon usage and further optimized to avoid commonly used restriction sites. In a second example, the Pseudomonas aeruginosa exbD gene codon usage was adapted to E.coli codon usage with parallel avoidance of the same restriction sites. For both, the degree of introduced changes was documented and evaluated. JCat is integrated into the PRODORIC database that hosts all required information on the various organisms to fulfill the requested calculations. JCat is freely accessible at http://www.prodoric.de/JCat.
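
The abstract reports Codon Adaptation Index (CAI) values without defining them. A minimal sketch of the standard definition follows: each codon gets a relative adaptiveness equal to its reference frequency divided by that of the most used synonymous codon, and the CAI of a gene is the geometric mean of those weights. The reference counts below are hypothetical numbers chosen for illustration, the function names are assumptions, and nothing here is taken from JCat's own implementation.

```python
import math

# Hypothetical reference codon counts grouped by amino acid (illustrative only).
REFERENCE_COUNTS = {
    "Leu": {"CTG": 50, "CTC": 10, "TTA": 5},
    "Lys": {"AAA": 30, "AAG": 10},
    "Glu": {"GAA": 40, "GAG": 20},
}


def relative_adaptiveness(reference_counts):
    """Map each codon to count / (count of the most used synonymous codon)."""
    weights = {}
    for codons in reference_counts.values():
        best = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / best
    return weights


def cai(sequence: str, weights) -> float:
    """Geometric mean of codon weights over the in-frame codons of sequence.

    Codons missing from the weight table are skipped; that is a simplification
    of this sketch (it also covers Met, Trp and stop codons in this toy table).
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no codon in the sequence has a reference weight")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


if __name__ == "__main__":
    w = relative_adaptiveness(REFERENCE_COUNTS)
    print(cai("CTGAAAGAA", w))  # every codon is the preferred one
    print(cai("CTCAAGGAG", w))  # every codon is a less used synonym
```

Under this definition, replacing each codon with the most used synonymous codon of the host drives the CAI toward 1, which is the direction a codon adaptation tool moves a sequence in.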

Journal ArticleDOI
TL;DR: A large-scale study provides important insights into the mechanism of polyadenylation in mammalian species and represents a genomic view of the regulation of gene expression by alternative polyadenylation.
Abstract: mRNA polyadenylation is a critical cellular process in eukaryotes. It involves 3' end cleavage of nascent mRNAs and addition of the poly(A) tail, which plays important roles in many aspects of the cellular metabolism of mRNA. The process is controlled by various cis-acting elements surrounding the cleavage site, and their binding factors. In this study, we surveyed genome regions containing cleavage sites [herein called poly(A) sites], for 13,942 human and 11,155 mouse genes. We found that a great proportion of human and mouse genes have alternative polyadenylation (approximately 54 and 32%, respectively). The conservation of alternative polyadenylation type or polyadenylation configuration between human and mouse orthologs is statistically significant, indicating that alternative polyadenylation is widely employed by these two species to produce alternative gene transcripts. Genes belonging to several functional groups, indicated by their Gene Ontology annotations, are biased with respect to polyadenylation configuration. Many poly(A) sites harbor multiple cleavage sites (51.25% human and 46.97% mouse sites), leading to heterogeneous 3' end formation for transcripts. This implies that the cleavage process of polyadenylation is largely imprecise. Different types of poly(A) sites, with regard to their relative locations in a gene, are found to have distinct nucleotide composition in surrounding genomic regions. This large-scale study provides important insights into the mechanism of polyadenylation in mammalian species and represents a genomic view of the regulation of gene expression by alternative polyadenylation.

Journal ArticleDOI
TL;DR: SCRATCH is a server for predicting protein tertiary structure and structural features and includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure.
Abstract: SCRATCH is a server for predicting protein tertiary structure and structural features. The SCRATCH software suite includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure. The user simply provides an amino acid sequence and selects the desired predictions, then submits to the server. Results are emailed to the user. The server is available at http://www.igb.uci.edu/servers/psss.html.

Journal ArticleDOI
TL;DR: The DINAMelt web server simulates the melting of one or two single-stranded nucleic acids in solution to predict not just a melting temperature for a hybridized pair of nucleic acids, but entire equilibrium melting profiles as a function of temperature.
Abstract: The DINAMelt web server simulates the melting of one or two single-stranded nucleic acids in solution. The goal is to predict not just a melting temperature for a hybridized pair of nucleic acids, but entire equilibrium melting profiles as a function of temperature. The two molecules are not required to be complementary, nor must the two strand concentrations be equal. Competition among different molecular species is automatically taken into account. Calculations consider not only the heterodimer, but also the two possible homodimers, as well as the folding of each single-stranded molecule. For each of these five molecular species, free energies are computed by summing Boltzmann factors over every possible hybridized or folded state. For temperatures within a user-specified range, calculations predict species mole fractions together with the free energy, enthalpy, entropy and heat capacity of the ensemble. Ultraviolet (UV) absorbance at 260 nm is simulated using published extinction coefficients and computed base pair probabilities. All results are available as text files and plots are provided for species concentrations, heat capacity and UV absorbance versus temperature. This server is connected to an active research program and should evolve as new theory and software are developed. The server URL is http://www.bioinfo.rpi.edu/applications/hybrid/.
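
The step of summing Boltzmann factors over states can be sketched as follows. This is a minimal illustration of the general statistical-mechanics relation (partition function Z, ensemble free energy -RT ln Z), not the DINAMelt code. The state free energies are hypothetical numbers and are treated as fixed, whereas a real melting calculation would recompute them at each temperature. The function name `ensemble` is likewise an assumption.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def ensemble(free_energies, temp_c):
    """Return (ensemble free energy, state probabilities).

    free_energies: free energy of each state in kcal/mol
    temp_c: temperature in degrees Celsius
    """
    rt = R * (temp_c + 273.15)
    factors = [math.exp(-g / rt) for g in free_energies]  # Boltzmann factors
    z = sum(factors)                                      # partition function
    return -rt * math.log(z), [f / z for f in factors]

# Hypothetical states: unfolded reference (0.0) and two folded structures.
states = [0.0, -1.0, -2.5]
for t in (25.0, 37.0, 60.0):
    g, probs = ensemble(states, t)
    print(t, round(g, 3), [round(p, 3) for p in probs])
```

Because every state contributes to Z, the ensemble free energy is never higher than that of the single most stable state, and the probabilities show how the population spreads over the alternatives at a given temperature.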

Journal ArticleDOI
TL;DR: Observations strongly suggest that TA loci are mobile cassettes that move frequently within and between chromosomes and also lend support to the hypothesis that TA loci function as stress-response elements beneficial to free-living prokaryotes.
Abstract: Prokaryotic chromosomes code for toxin-antitoxin (TA) loci, often in multiple copies. In E.coli, experimental evidence indicates that TA loci are stress-response elements that help cells survive unfavorable growth conditions. The first gene in a TA operon codes for an antitoxin that combines with and neutralizes a regulatory 'toxin', encoded by the second gene. RelE and MazF toxins are regulators of translation that cleave mRNA and function, in interplay with tmRNA, in quality control of gene expression. Here, we present the results from an exhaustive search for TA loci in 126 completely sequenced prokaryotic genomes (16 archaea and 110 bacteria). We identified 671 TA loci belonging to the seven known TA gene families. Surprisingly, obligate intracellular organisms were devoid of TA loci, whereas free-living slowly growing prokaryotes had particularly many (38 in Mycobacterium tuberculosis and 43 in Nitrosomonas europaea). In many cases, TA loci were clustered and closely linked to mobile genetic elements. In the most extreme of these cases, all 13 TA loci of Vibrio cholerae were bona fide integron elements located in the V.cholerae mega-integron. These observations strongly suggest that TA loci are mobile cassettes that move frequently within and between chromosomes and also lend support to the hypothesis that TA loci function as stress-response elements beneficial to free-living prokaryotes.

Journal ArticleDOI
TL;DR: Using arithmetic ratio and probability techniques, frequent and systematic occurrence of certain sequence types are discovered, the most prominent being a potential quadruplex containing CCTGT in the first ‘loop’ position.
Abstract: We report here the results of a systematic search for the existence and prevalence of potential intramolecular G-quadruplex forming sequences in the human genome. We have also examined the tendency for particular sequences of 'loop' regions to occur in particular positions with respect to the G-tracts in a quadruplex. Using arithmetic ratio and probability techniques we have discovered frequent and systematic occurrence of certain sequence types, the most prominent being a potential quadruplex containing CCTGT in the first 'loop' position. Being able to highlight types of potential quadruplex sequences in G-rich regions is an important step in searching for biologically relevant sequences and finding their function.
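
A search of this kind can be sketched with a regular expression. The abstract does not state the exact motif definition, so the pattern below (four runs of at least three guanines separated by three loops of 1 to 7 nt) is an assumption, as are the names `G4_PATTERN` and `find_quadruplex_loops`. The sketch scans a single strand only and does not reproduce the arithmetic ratio or probability analysis.

```python
import re
from collections import Counter

# Assumed motif: G-tract, loop 1, G-tract, loop 2, G-tract, loop 3, G-tract.
G4_PATTERN = re.compile(
    r"G{3,}([ACGT]{1,7}?)G{3,}([ACGT]{1,7}?)G{3,}([ACGT]{1,7}?)G{3,}"
)

def find_quadruplex_loops(seq):
    """Return (start, loop1, loop2, loop3) for each non-overlapping match."""
    seq = seq.upper()
    return [
        (m.start(), m.group(1), m.group(2), m.group(3))
        for m in G4_PATTERN.finditer(seq)
    ]

# Hypothetical test sequence with CCTGT in the first loop position.
hits = find_quadruplex_loops("AAGGGCCTGTGGGTTAGGGTTAGGGAA")
first_loops = Counter(loop1 for _, loop1, _, _ in hits)
print(hits)
print(first_loops.most_common(1))
```

Tallying the first-loop sequences across many hits, as the `Counter` does here for a single hit, is one simple way to see which loop sequences recur in a given position.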

Journal ArticleDOI
TL;DR: The task of gene identification frequently confronting researchers working with both novel and well studied genomes can be conveniently and reliably solved with the help of the GeneMark web software.
Abstract: The task of gene identification frequently confronting researchers working with both novel and well studied genomes can be conveniently and reliably solved with the help of the GeneMark web software (http://opal.biology.gatech.edu/GeneMark/). The website provides interfaces to the GeneMark family of programs designed and tuned for gene prediction in prokaryotic, eukaryotic and viral genomic sequences. Currently, the server allows the analysis of nearly 200 prokaryotic and >10 eukaryotic genomes using species-specific versions of the software and pre-computed gene models. In addition, genes in prokaryotic sequences from novel genomes can be identified using models derived on the spot upon sequence submission, either by a relatively simple heuristic approach or by the full-fledged self-training program GeneMarkS. A database of reannotations of >1000 viral genomes by the GeneMarkS program is also available from the web site. The GeneMark website is frequently updated to provide the latest versions of the software and gene models.

Journal ArticleDOI
TL;DR: This study raises the proportion of clustered human miRNAs that are <3000 nt apart to 42%.
Abstract: MicroRNAs (miRNAs) are ~22 nt-long non-coding RNA molecules, believed to play important roles in gene regulation. We present a comprehensive analysis of the conservation and clustering patterns of known miRNAs in human. We show that human miRNA gene clustering is significantly higher than expected at random. A total of 37% of the known human miRNA genes analyzed in this study appear in clusters of two or more with pairwise chromosomal distances of at most 3000 nt. Comparison of the miRNA sequences with their homologs in four other organisms reveals a typical conservation pattern, persistent throughout the clusters. Furthermore, we show enrichment in the typical conservation patterns and other miRNA-like properties in the vicinity of known miRNA genes, compared with random genomic regions. This may imply that additional, yet unknown, miRNAs reside in these regions, consistent with the current recognition that there are overlooked miRNAs. Indeed, by comparing our predictions with cloning results and with identified miRNA genes in other mammals, we corroborate the predictions of 18 additional human miRNA genes in the vicinity of the previously known ones. Our study raises the proportion of clustered human miRNAs that are <3000 nt apart to 42%. This suggests that the clustering of miRNA genes is higher than currently acknowledged, alluding to its evolutionary and functional implications.
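
The distance-based clustering criterion can be sketched as below. The abstract gives only the 3000 nt threshold, so chaining neighbouring genes along one chromosome is an assumed reading of "pairwise chromosomal distances", and the coordinates and the function names `cluster_by_distance` and `clustered_fraction` are hypothetical.

```python
def cluster_by_distance(positions, max_gap=3000):
    """Group positions so that neighbours within a cluster are at most max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough to the previous gene
        else:
            clusters.append([pos])     # start a new cluster
    return clusters

def clustered_fraction(positions, max_gap=3000):
    """Fraction of genes that fall in clusters of two or more."""
    if not positions:
        return 0.0
    clusters = cluster_by_distance(positions, max_gap)
    in_clusters = sum(len(c) for c in clusters if len(c) >= 2)
    return in_clusters / len(positions)

# Hypothetical miRNA gene start coordinates on a single chromosome.
genes = [100, 2000, 4500, 20000, 50000, 52000]
print(cluster_by_distance(genes))
print(clustered_fraction(genes))
```

Applied per chromosome to real annotation coordinates, the second function would yield a clustered proportion comparable in kind to the percentages reported above, though the exact figure depends on how cluster membership is defined.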

Journal ArticleDOI
TL;DR: A number of state-of-the-art protein structure prediction servers have been developed by researchers working in the Bioinformatics Unit at University College London; the more recent of these include DISOPRED for the prediction of protein dynamic disorder and DomPred for domain boundary prediction.
Abstract: A number of state-of-the-art protein structure prediction servers have been developed by researchers working in the Bioinformatics Unit at University College London. The popular PSIPRED server allows users to perform secondary structure prediction, transmembrane topology prediction and protein fold recognition. More recent servers include DISOPRED for the prediction of protein dynamic disorder and DomPred for domain boundary prediction. These servers are available from our software home page at http://bioinf.cs.ucl.ac.uk/software.html.