Journal Article

Programmatic access to bioinformatics tools from EMBL-EBI update: 2017.

01 Jul 2017-Nucleic Acids Research (Oxford University Press)-Vol. 45
TL;DR: An update is presented describing the latest enhancements to the Job Dispatcher APIs and their governance; programmatic access to these tools is increasingly important as more high-throughput data is generated.
Abstract: Since 2009 the EMBL-EBI has provided free and unrestricted access to several bioinformatics tools via the user's browser as well as programmatically via Web Services APIs. Programmatic access to these tools, which is fundamental to bioinformatics, is increasingly important as more high-throughput data is generated, e.g. from proteomics and metagenomic experiments. Access is available using both the SOAP and RESTful approaches and their usage is reviewed regularly in order to ensure that the best, supported tools are available to all users. We present here an update describing the latest enhancement to the Job Dispatcher APIs as well as the governance under it.
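
The RESTful access pattern mentioned in the abstract can be illustrated with a short Python sketch of a submit, poll and fetch cycle. Everything specific here is an assumption rather than something stated in the abstract: the base URL, the `run`/`status`/`result` path layout, the status strings, and the helper names `endpoint`, `http_fetch` and `run_job` are illustrative only and should be checked against the current service documentation.

```python
import time
from urllib import parse, request

# Assumed base URL of the Job Dispatcher REST services (not given in the abstract).
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest"


def endpoint(tool, action, *parts):
    """Build a URL such as <BASE_URL>/<tool>/status/<job_id>."""
    return "/".join([BASE_URL, tool, action, *parts])


def http_fetch(url, form=None):
    """GET the URL, or POST the form fields when `form` is given; return text."""
    data = parse.urlencode(form).encode() if form is not None else None
    with request.urlopen(request.Request(url, data=data)) as resp:
        return resp.read().decode()


def run_job(tool, form, result_type, fetch=http_fetch, poll_seconds=5, max_polls=120):
    """Submit a job, poll its status, and return one result once it finishes."""
    job_id = fetch(endpoint(tool, "run"), form).strip()
    for _ in range(max_polls):
        status = fetch(endpoint(tool, "status", job_id)).strip()
        if status == "FINISHED":
            return fetch(endpoint(tool, "result", job_id, result_type))
        # Assumed set of "still working" status strings; anything else is an error.
        if status not in ("RUNNING", "QUEUED", "PENDING"):
            raise RuntimeError(f"job {job_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A call might look like `run_job("clustalo", {"email": "you@example.org", "sequence": fasta_text}, "out")`, where the tool name, form fields and result type are again assumed values. Passing `fetch` as a parameter keeps the polling logic testable without network access.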


Citations
Journal Article
TL;DR: The latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability are described.
Abstract: The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine with powerful cross-referencing and data retrieval capabilities. Access to these services is provided via user-friendly web interfaces and via established RESTful and SOAP Web Services APIs (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). Both systems have been developed with the same core principles that allow them to integrate an ever-increasing volume of biological data, making them an integral part of many popular data resources provided at the EMBL-EBI. Here, we describe the latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability.

3,529 citations

Journal Article
TL;DR: The work to update the PSIPRED Protein Analysis Workbench and make it ready for the next 20 years is presented and updates to some of the key predictive algorithms available through the website are surveyed.
Abstract: The PSIPRED Workbench is a web server offering a range of predictive methods to the bioscience community for 20 years. Here, we present the work we have completed to update the PSIPRED Protein Analysis Workbench and make it ready for the next 20 years. The main focus of our recent website upgrade work has been the acceleration of analyses in the face of increasing protein sequence database size. We additionally discuss any new software, the new hardware infrastructure, our webservices and web site. Lastly we survey updates to some of the key predictive algorithms available through our website.

858 citations

Journal Article
TL;DR: The structure of the full-length Nsp13 helicase of SARS-CoV (SARS-Nsp13) is presented and the structural coordination of its five domains is investigated, which provides new insights into the Replication and Transcription Complex (RTC) of CoVs.
Abstract: To date, an effective therapeutic treatment that confers strong attenuation toward coronaviruses (CoVs) remains elusive. Of all the potential drug targets, the helicase of CoVs is considered to be one of the most important. Here, we first present the structure of the full-length Nsp13 helicase of SARS-CoV (SARS-Nsp13) and investigate the structural coordination of its five domains and how these contribute to its translocation and unwinding activity. A translocation model is proposed for the Upf1-like helicase members according to three different structural conditions in solution characterized through H/D exchange assay, including substrate state (SARS-Nsp13-dsDNA bound with AMPPNP), transition state (bound with ADP-AlF4-) and product state (bound with ADP). We observed that the β19-β20 loop on the 1A domain is involved in unwinding process directly. Furthermore, we have shown that the RNA dependent RNA polymerase (RdRp), SARS-Nsp12, can enhance the helicase activity of SARS-Nsp13 through interacting with it directly. The interacting regions were identified and can be considered common across CoVs, which provides new insights into the Replication and Transcription Complex (RTC) of CoVs.

239 citations

Journal Article
TL;DR: Sequencing and genomic diversification of five allopolyploid cotton species provide insights into polyploid genome evolution, and will empower efforts to manipulate genetic recombination and modify epigenetic landscapes and target genes for crop improvement.
Abstract: Polyploidy is an evolutionary innovation for many animals and all flowering plants, but its impact on selection and domestication remains elusive. Here we analyze genome evolution and diversification for all five allopolyploid cotton species, including economically important Upland and Pima cottons. Although these polyploid genomes are conserved in gene content and synteny, they have diversified by subgenomic transposon exchanges that equilibrate genome size, evolutionary rate heterogeneities and positive selection between homoeologs within and among lineages. These differential evolutionary trajectories are accompanied by gene-family diversification and homoeolog expression divergence among polyploid lineages. Selection and domestication drive parallel gene expression similarities in fibers of two cultivated cottons, involving coexpression networks and N6-methyladenosine RNA modifications. Furthermore, polyploidy induces recombination suppression, which correlates with altered epigenetic landscapes and can be overcome by wild introgression. These genomic insights will empower efforts to manipulate genetic recombination and modify epigenetic landscapes and target genes for crop improvement.

195 citations

Book Chapter
TL;DR: Clustal Omega is a version, completely rewritten and revised in 2011, of the widely used Clustal series of programs for multiple sequence alignment that can deal with very large numbers of DNA/RNA or protein sequences due to its use of the mBed algorithm for calculating guide-trees.
Abstract: Clustal Omega is a version, completely rewritten and revised in 2011, of the widely used Clustal series of programs for multiple sequence alignment. It can deal with very large numbers (many tens of thousands) of DNA/RNA or protein sequences due to its use of the mBed algorithm for calculating guide-trees. This algorithm allows very large alignment problems to be tackled very quickly, even on personal computers. The accuracy of the program has been considerably improved over earlier Clustal programs, through the use of the HHalign method for aligning profile hidden Markov models. The program currently is used from the command-line or can be run online.

118 citations

References
Journal Article
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
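
The pair-selection step described in this abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: it uses the Q-criterion form of the selection rule, assumed here as a standard equivalent of minimizing total branch length, and the function name `nj_pair` is hypothetical.

```python
from itertools import combinations


def nj_pair(dist):
    """Return (i, j, q) for the pair of OTUs minimizing the Q criterion.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        # Keep the first pair with the smallest Q seen so far.
        if best is None or q < best[2]:
            best = (i, j, q)
    return best
```

Only the choice of the first pair of neighbors is shown; a full tree reconstruction would then replace the pair with a new node, recompute distances to it, and repeat until the tree is resolved.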

57,055 citations


"Programmatic access to bioinformati..." refers methods in this paper

  • ...Finally, simple phylogeny, which is used for generating phylogenetic trees using Neighbor-Joining (32) and UPGMA (33) methods....

Journal Article
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

27,771 citations


"Programmatic access to bioinformati..." refers background in this paper

  • ...At present, tools include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST (4), FASTA (5) and PSI-Search (6), multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega (7), MAFFT (8) and KAlign (9), and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan5 (10)....

Journal Article
TL;DR: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++, which facilitates further development of the alignment algorithms and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems.
Abstract: Summary: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems. Availability: The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2. The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/ Contact: clustalw@ucd.ie

25,325 citations

Journal Article
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations


"Programmatic access to bioinformati..." refers background in this paper

  • ...At present, tools include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST (4), FASTA (5) and PSI-Search (6), multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega (7), MAFFT (8) and KAlign (9), and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan5 (10)....

  • ...This is followed by the BLAST+ programs, which give access to ∼45 000 libraries of sequences from ENA, UniProt and EnsemblGenomes....

Journal Article
TL;DR: A new program called Clustal Omega is described, which can align virtually any number of protein sequences quickly and that delivers accurate alignments, and which outperforms other packages in terms of execution time and quality.
Abstract: Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

12,489 citations


"Programmatic access to bioinformati..." refers background in this paper

  • ...At present, tools include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST (4), FASTA (5) and PSI-Search (6), multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega (7), MAFFT (8) and KAlign (9), and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan5 (10)....

  • ...Clustal Omega and Muscle are the most popular multiple sequence alignment methods. water and needle from the EMBOSS suite give access to local and global pairwise alignments methods. seqret is very popular for sequence reformatting, pfamscan for searching Pfam HMMs and Phobius (31) for predicting transmembrane regions and signal peptides....