
Showing papers in "EMBnet.journal in 2011"


Journal ArticleDOI
TL;DR: The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.
Abstract: When small RNA is sequenced on current sequencing machines, the resulting reads are usually longer than the RNA and therefore contain parts of the 3' adapter. That adapter must be found and removed error-tolerantly from each read before read mapping. Previous solutions are either hard to use or do not offer required features, in particular support for color space data. As an easy to use alternative, we developed the command-line tool cutadapt, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features. Cutadapt, including its MIT-licensed source code, is available for download at http://code.google.com/p/cutadapt/

20,255 citations
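The abstract above describes finding and removing a 3' adapter error-tolerantly. As a rough illustration of the idea only (not cutadapt's actual algorithm, which is alignment-based and also tolerates insertions and deletions), a mismatch-only trimmer might look like this:

```python
def trim_3prime_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Toy error-tolerant 3' adapter trimming.

    For each start position, compare the overlapping part of the
    adapter with the read and accept the leftmost position whose
    mismatch fraction stays within `max_error_rate`. A minimum
    overlap avoids trimming on spurious 1-2 base matches.
    """
    n = len(read)
    for start in range(n):
        overlap = min(len(adapter), n - start)
        if overlap < min_overlap:
            break
        mismatches = sum(1 for i in range(overlap)
                         if read[start + i] != adapter[i])
        if mismatches <= max_error_rate * overlap:
            return read[:start]          # drop adapter and everything after
    return read                          # no adapter found
```

For example, `trim_3prime_adapter("ACGTACGTAGATCGGAAGAG", "AGATCGGAAGAG")` returns the insert `"ACGTACGT"`, even when the adapter copy carries a sequencing error.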


Journal ArticleDOI
TL;DR: This article shows the usage of two free ChIP-seq analysis packages, HOMER and ChIPseeqer, along with the MACS and MEME programs, and provides a customisable script suitable for the complete analysis of raw ChIP-seq sequencing data, either from a sequence read repository or directly from sequencing.
Abstract: Among the emerging next-generation sequencing technologies, ChIP-seq provides a very important tool for functional genomics studies. From the bioinformatics point of view, ChIP-seq analysis involves more than simply aligning the short reads to the reference genome. It also completes several other downstream steps, such as peak determination, motif finding and gene ontology enrichment calculation. For these, several programs, applications and packages are available, both free and commercial. In this article, I show the usage of two free ChIP-seq analysis packages, HOMER and ChIPseeqer, along with the MACS and MEME programs. I also provide a customisable script suitable for the complete analysis of raw ChIP-seq sequencing data, either from a sequence read repository or directly from sequencing.

36 citations
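The pipeline described above chains alignment, MACS peak calling and MEME motif finding. A minimal sketch of how a driver script might assemble such commands (flag names taken from common MACS2/MEME usage, not from the paper's own script; file names are placeholders):

```python
def macs_callpeak_cmd(treatment_bam, control_bam, name, genome="hs"):
    """Assemble a MACS2 peak-calling command as an argument list.

    Flags reflect common MACS2 `callpeak` usage; check them against
    your installed version before running.
    """
    return ["macs2", "callpeak",
            "-t", treatment_bam,   # ChIP (treatment) alignments
            "-c", control_bam,     # input/control alignments
            "-n", name,            # experiment name prefix
            "-g", genome]          # effective genome size shortcut

def meme_motif_cmd(peaks_fasta, outdir, nmotifs=5):
    """Assemble a MEME motif-discovery command (illustrative flags)."""
    return ["meme", peaks_fasta, "-dna", "-oc", outdir,
            "-nmotifs", str(nmotifs)]

# A driver would execute each step in order, e.g. with
# subprocess.run(macs_callpeak_cmd("chip.bam", "input.bam", "exp")).
```

Returning argument lists keeps the pipeline easy to inspect and test before anything is actually executed.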


Journal ArticleDOI
TL;DR: The 2011 AGM workshop took place at the Instituto Gulbenkian de Ciencia (IGC) in Oeiras, Portugal from 23-25 May and was an opportunity to build on the commitment to take EMBnet forward by embracing new partners and new activities.
Abstract: The 2011 AGM workshop took place at the Instituto Gulbenkian de Ciencia (IGC) in Oeiras, Portugal, from 23-25 May. The goal of the workshop was to build on the demonstrable progress made during the previous year, in particular by helping to deliver on some of the plans outlined during the 2010 AGM. It was also an opportunity to build on our commitment to take EMBnet forward by embracing new partners and new activities. The following pages summarise the workshop content, discussions and conclusions.

30 citations


Journal ArticleDOI
TL;DR: A novel method is proposed that aims to improve de novo assembly in the presence of a closely related reference in order to obtain enhanced results.
Abstract: Next Generation Sequencing has totally changed genomics: we are able to produce huge amounts of data at an incredibly low cost compared to Sanger sequencing. Despite this, some old problems have become even more difficult, de novo assembly being on top of this list. Despite efforts to design tools able to assemble, de novo, an organism sequenced with short reads, the results are still far from those achievable with long reads. In this paper, we propose a novel method that aims to improve de novo assembly in the presence of a closely related reference. The idea is to combine de novo and reference-guided assembly in order to obtain enhanced results.

20 citations
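As a toy illustration of combining de novo contigs with a closely related reference (not the paper's actual method), one can order contigs by their mapped position on the reference and merge exact overlaps:

```python
def order_and_merge_contigs(contigs, positions, min_overlap=4):
    """Toy reference-guided scaffolding sketch.

    `positions` maps contig index -> mapped start coordinate on a
    related reference (assumed to come from an aligner). Contigs are
    ordered by that coordinate and merged where the end of the
    growing sequence exactly overlaps the start of the next contig.
    """
    order = sorted(range(len(contigs)), key=lambda i: positions[i])
    merged = contigs[order[0]]
    for i in order[1:]:
        nxt = contigs[i]
        best = 0
        # longest exact overlap between current end and next start
        for k in range(min(len(merged), len(nxt)), min_overlap - 1, -1):
            if merged.endswith(nxt[:k]):
                best = k
                break
        # without a detected overlap the contigs are simply joined;
        # a real scaffolder would insert a gap of estimated size
        merged = merged + nxt[best:]
    return merged
```

With contigs `["ACGTACGT", "ACGTTTTT"]` mapped at positions 0 and 4, the shared `ACGT` is collapsed once, yielding `"ACGTACGTTTTT"`.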


Journal ArticleDOI
TL;DR: In BioVeL scientists and technologists will work together to meet the needs and demands for in-silico or ‘e-Science’ and to create a production-quality informatics infrastructure to enable pipelining of data and analysis into efficient integrated workflows.
Abstract: The aim of the BioVeL project is to provide a seamlessly connected informatics environment that makes it easier for biodiversity scientists to carry out in-silico analysis of relevant biodiversity data and to pursue in-silico experimentation based on composing and executing sequences of complex digital data manipulations and modelling tasks. In BioVeL scientists and technologists will work together to meet the needs and demands for in-silico or ‘e-Science’ and to create a production-quality informatics infrastructure to enable pipelining of data and analysis into efficient integrated workflows. Workflows represent a way of speeding up scientific advance when that advance is based on the manipulation of digital data.

19 citations


Journal ArticleDOI
TL;DR: This work exemplifies how SEQscoring was used in a recent study as a step subsequent to a Genome Wide Association Study (GWAS) to extract a set of candidate mutations.
Abstract: Next Generation Sequencing (NGS) technologies promise a revolution in genetic research. Generating enormous amounts of data, they bring both new opportunities and new challenges to researchers. SEQscoring was designed to facilitate analysis and enable extraction of the most essential information from data produced in NGS resequencing projects. Its main functionality is to help researchers locate the most likely causative mutations for a specific trait or disease, but it can advantageously be used whenever the goal is to compare and explore haplotype patterns and to locate variations positioned in evolutionarily conserved genomic elements. SEQscoring uses input data containing information about coverage and variations produced by other programs, like MAQ and SAMtools, and puts the emphasis on methods for data visualisation and interpretation. We compare cases and controls in several ways, and also utilise the power of comparative genomics by scoring all variations according to their degree of conservation. The SEQscoring tool is publicly accessible via the Web. It has an intuitive interface and can easily be used by biologists, medical researchers and veterinarians, as well as bioinformaticians. We exemplify how SEQscoring was used in a recent study as a step subsequent to a Genome Wide Association Study (GWAS) to extract a set of candidate mutations. Availability: http://www.seqscoring.org

16 citations
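The combination of case/control comparison and conservation scoring described above can be caricatured in a few lines (an illustrative formula, not SEQscoring's actual scoring):

```python
def score_variant(case_genotypes, control_genotypes, conservation):
    """Toy variant score: high when the variant segregates between
    cases and controls AND sits in a conserved position.

    Genotypes are 0/1 carrier flags; `conservation` is a value in
    [0, 1], e.g. from a PhastCons-like conservation track (both the
    encoding and the product formula are assumptions for this sketch).
    """
    case_freq = sum(case_genotypes) / len(case_genotypes)
    control_freq = sum(control_genotypes) / len(control_genotypes)
    segregation = abs(case_freq - control_freq)   # 1.0 = perfect split
    return segregation * conservation
```

A variant carried by all cases and no controls in a highly conserved element scores near 1, while one equally frequent in both groups scores 0 regardless of conservation.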


Journal ArticleDOI
TL;DR: An unsupervised hierarchical clustering algorithm, Chaotic Map Clustering (CMC), is used in a coupled two-way approach to analyse microarray data, and Grid technology is shown to drastically speed up the process by distributing the clustering of each matrix to a separate worker node, and thus retrieve resampling results within a few hours instead of several days.
Abstract: Microarray data are a rich source of information, containing the collected expression values of thousands of genes for well defined states of a cell or tissue. Vast amounts of data (thousands of arrays) are publicly available and ready for analysis, e.g. to scrutinise correlations between genes at the level of gene expression. The large variety of arrays available makes it possible to combine different independent experiments to extract new knowledge. Starting with a large set of data, relevant information can be isolated for further analysis. To extract the required information from data sets of such size and complexity requires an appropriate and powerful analysis method. In this study, we chose to use an unsupervised hierarchical clustering algorithm, Chaotic Map Clustering (CMC), in a coupled two-way approach to analyse such data. However, the clustering approach is intrinsically difficult, both in terms of the unknown structure of the data and interpretation of the clustering results. It is therefore critical to evaluate the quality of any unsupervised procedure for such a complex set of data and to validate the clustering results, separating those clusters that are due simply to noise or statistical fluctuations. We used a resampling method to perform this validation. The resampling procedure applies the clustering algorithm to a large number of random sub-samples of the original data matrix and, consequently, the whole process becomes computationally intensive and time consuming. Using Grid technology, we show that we can drastically speed up this process by distributing the clustering of each matrix to a separate worker node, and thus retrieve resampling results within a few hours instead of several days. Further, we offer an online service to cluster large microarray data sets and conduct the subsequent validation described in this paper.

7 citations
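The resampling validation described above is embarrassingly parallel: each random subsample can be clustered independently. A toy sketch, with a thread pool standing in for the Grid worker nodes and a deliberately trivial 1-D "clustering" (both are illustrative stand-ins, not CMC or the paper's setup):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def cluster_1d(values, gap=1.0):
    """Toy clustering: sort values and start a new cluster whenever
    the gap to the previous value exceeds `gap`."""
    clusters, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > gap:
            clusters.append(current)
            current = []
        current.append(v)
    if current:
        clusters.append(current)
    return clusters

def resample_and_cluster(values, fraction, seed):
    """Cluster one random subsample; independent given its seed."""
    rng = random.Random(seed)
    k = max(2, int(fraction * len(values)))
    return cluster_1d(rng.sample(values, k))

def stability(values, n_resamples=50, fraction=0.8):
    """Run the resampling rounds concurrently (on a Grid each round
    would be a separate job) and report how often the subsample
    reproduces the cluster count of the full data set."""
    full = len(cluster_1d(values))
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda s: resample_and_cluster(values, fraction, s),
            range(n_resamples))
        agree = sum(1 for c in results if len(c) == full)
    return agree / n_resamples
```

Because every round depends only on its own seed, distributing rounds across workers changes wall-clock time but not the result, which is exactly why the Grid speedup reported above is possible.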


Journal ArticleDOI
TL;DR: A description of the bioinformatics activities carried out by the Institute for Health and Consumer Protection - Specialist EMBnet Node.
Abstract: A description of the bioinformatics activities carried out by the Institute for Health and Consumer Protection - Specialist EMBnet Node.

7 citations


Journal ArticleDOI
TL;DR: A new web tool called Superclusteroid is presented which can analyse PPI data in order to detect protein complexes or characterise the functionality of unknown proteins; the tool is essentially an intuitive PPI data processing pipeline.
Abstract: The study of proteins and the interactions between them, known as Protein-Protein Interactions (PPI), is extremely important in interpreting all biological cellular functions. In this article, a new web tool called Superclusteroid is presented which can analyse PPI data, in order to detect protein complexes or characterise the functionality of unknown proteins. The tool is essentially an intuitive PPI data processing pipeline. It supports various input file formats and provides services such as clustering, PPI network visualisation and protein cluster function prediction. Each Superclusteroid service can be used in a sequential manner or on an individual basis. In order to assess the reliability of our tool to infer PPIs, the results of the tool were compared to already known MIPS database complexes and a case scenario is presented where a known protein complex is predicted and the functionality of some of its proteins is revealed. Availability: Superclusteroid is freely available online at http://superclusteroid.uio.no/ .

5 citations


Journal ArticleDOI
TL;DR: The goal of the Dicode project is to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex settings to meaningfully search, analyze and aggregate data existing in diverse, extremely large, and rapidly evolving sources.
Abstract: The goal of the Dicode project is to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex settings. To do so, it will exploit and build on the most prominent high-performance computing paradigms and large data processing technologies to meaningfully search, analyze and aggregate data existing in diverse, extremely large, and rapidly evolving sources.

5 citations


Journal ArticleDOI
TL;DR: It is shown how recent improvements to these consensus methods, as implemented in the latest release of the TANGO tool, can provide an improved estimate of diversity in simulated data sets.
Abstract: One of the main computational challenges facing metagenomic analysis is the taxonomic identification of short DNA fragments. The combination of sequence alignment methods with taxonomic assignment based on consensus can provide an accurate estimate of the microbial diversity in a sample. In this note, we show how recent improvements to these consensus methods, as implemented in the latest release of the TANGO tool, can provide an improved estimate of diversity in simulated data sets.
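A common consensus rule for taxonomic assignment is to place a fragment on the lowest common ancestor (LCA) of all taxa it aligns to. TANGO's method is more refined than this, but a basic LCA sketch conveys the consensus idea:

```python
def lca_assignment(hits, parent):
    """Assign a fragment to the lowest common ancestor of its hits.

    `hits` is the list of taxa the fragment aligned to; `parent`
    maps each taxonomy node to its parent, with the root mapping to
    itself. (A toy illustration of consensus assignment.)
    """
    def path_to_root(node):
        path = [node]
        while parent[node] != node:
            node = parent[node]
            path.append(node)
        return path

    common = set(path_to_root(hits[0]))
    for h in hits[1:]:
        common &= set(path_to_root(h))
    # the deepest shared ancestor is the one farthest from the root
    return max(common, key=lambda n: len(path_to_root(n)))
```

A fragment hitting two sibling species is pushed up to their genus-level ancestor, while an unambiguous hit stays at the species level.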

Journal ArticleDOI
TL;DR: Jointly, the two parties will form a South African PGENI Centre of Competence for conducting translational research relevant to the local burden of disease and the most appropriate drugs for treating diseases in African populations.
Abstract: Cape Town, South Africa, 17 June 2011 - The Division of Human Genetics at the University of Cape Town (UCT) and the Centre for Proteomic and Genomic Research (CPGR), Cape Town, South Africa, proudly announce that they will be joining the ‘Pharmacogenomics for Every Nation Initiative’ (PGENI). Jointly, the two parties will form a South African PGENI Centre of Competence for conducting translational research relevant to the local burden of disease and the most appropriate drugs for treating diseases in African populations.

Journal ArticleDOI
TL;DR: The COST Action described in this paper was proposed to tackle the bioinformatics challenges inherent in managing and analysing NGS data, and to support researchers who use NGS technologies but do not have direct access to the necessary underpinning bioinformatics resources.
Abstract: COST (European Cooperation in Science and Technology) is one of the longest-running European instruments supporting cooperation, collaboration and orchestration among scientists and researchers across Europe working in the same field. Some of the organisers of the two EMBRACE workshops on ‘Next Generation Sequencing’ (NGS) saw this type of Action as exactly the right kind of mechanism to try to tame the data tsunami being generated by the furiously fast developing NGS technologies. Their aim was to tackle the bioinformatics challenges inherent in managing and analysing these data, and to support researchers who use NGS technologies but do not have direct access to the necessary underpinning bioinformatics resources. The history of the NGS initiative is short, but explosive. It is imperative for the life science community to be prepared for the enormous growth in NGS data, the challenges this presents, and the opportunities it affords. Recognising these issues, and the need for global cooperation, gave birth to the idea for this COST Action proposal; it developed into the concerted action of today.

Journal ArticleDOI
TL;DR: This paper discusses HOPE, a tool that can automatically predict the molecular effects of point mutations by massively collecting highly heterogeneous data related to the protein and the mutated residue, followed by automatic reasoning that mimics, as far as possible, the thinking of a trained bioinformatician.
Abstract: Next generation sequencing is greatly speeding up the discovery of point mutations that are causally related to disease states. Knowledge of the effects of these point mutations on the structure and function of the affected proteins is crucial for the design of follow-up experiments and diagnostic kits, and ultimately for the implementation of a cure. HOPE can automatically predict the molecular effects of point mutations. HOPE does this by massively collecting highly heterogeneous data related to the protein and the mutated residue, followed by automatic reasoning that mimics, as far as possible, the thinking of a trained bioinformatician. We discuss HOPE and review today's possibilities and challenges in this field.
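The "automatic reasoning" step can be imagined as rules over physicochemical properties of the wild-type and mutant residues. A heavily simplified sketch (hypothetical property values and rules for a handful of amino acids, not HOPE's actual knowledge base):

```python
# Coarse, illustrative per-residue properties (one-letter codes);
# real tools use much richer structural and sequence-derived data.
PROPERTIES = {
    "G": {"size": 1, "charge": 0, "hydrophobic": True},
    "A": {"size": 2, "charge": 0, "hydrophobic": True},
    "D": {"size": 3, "charge": -1, "hydrophobic": False},
    "K": {"size": 4, "charge": +1, "hydrophobic": False},
    "W": {"size": 5, "charge": 0, "hydrophobic": True},
}

def mutation_effects(wild, mutant):
    """Rule-based reasoning sketch: compare wild-type and mutant
    residue properties and report possible structural consequences."""
    w, m = PROPERTIES[wild], PROPERTIES[mutant]
    effects = []
    if m["size"] > w["size"]:
        effects.append("mutant residue is larger and may not fit")
    if m["charge"] != w["charge"]:
        effects.append("charge change may disturb interactions")
    if w["hydrophobic"] and not m["hydrophobic"]:
        effects.append("loss of hydrophobicity may destabilise the core")
    return effects
```

A glycine-to-lysine substitution, for instance, trips all three rules, while alanine-to-glycine trips none.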

Journal ArticleDOI
TL;DR: The main mission of the P&PR PC is to nurture and promote EMBnet's image at large: it is responsible for promoting all EMBnet activities, for advertising products and services provided by the EMBnet community, for proposing and developing new strategies to enhance EMBnet's visibility, and for managing public relations with EMBnet communities and related networks/societies.
Abstract: (June 2010 – June 2011) The main mission of the P&PR PC is to nurture and promote EMBnet's image at large. The P&PR PC is responsible for promoting any type of EMBnet activities, for the advertisement of products and services provided by the EMBnet community, as well as for proposing and developing new strategies aiming to enhance EMBnet’s visibility, and to take care of public relationships with EMBnet communities and related networks/societies. In this document, we report proposals, activities and achievements of the committee from June 2010 to May 2011.

Journal ArticleDOI
TL;DR: The International Society for Computational Biology (ISCB) and the African Society for Bioinformatics and Computational Biology (ASBCB) held the ISCB Africa ASBCB Conference on Bioinformatics in Cape Town, South Africa, in March 2011.
Abstract: The International Society for Computational Biology (ISCB) [1] and the African Society for Bioinformatics and Computational Biology (ASBCB) [2] held the ISCB Africa ASBCB Conference on Bioinformatics in Cape Town, South Africa, in March 2011. The meeting constituted the second joint meeting of ISCB and ASBCB, and the third conference of the ASBCB on Bioinformatics of African Pathogens, Hosts and Vectors. The conference was preceded by a two-day workshop at the University of the Western Cape [3].

Journal ArticleDOI
TL;DR: A collaboration to provide education and training in the use of bioinformatics tools to members of the authors' communities and to ensure EMBnet training courses are incorporated into ISCB meetings whenever possible, particularly those in developing regions.
Abstract: SAN DIEGO, USA and UPPSALA, SWEDEN, April 26, 2011 - The International Society for Computational Biology (ISCB) and the European Molecular Biology Network (EMBnet) are pleased to announce a collaboration to provide education and training in the use of bioinformatics tools to members of our communities. Having just sponsored and organised a successful workshop introducing the EMBnet eBioKit at the ISCB Africa ASBCB Conference on Bioinformatics 2011 (www.iscb.org/iscbafrica2011/), that followed a similarly sponsored and organised workshop on sequence analysis using EMBOSS at the 2009 rendition of the same conference (www.iscb.org/iscbafrica2009/), both EMBnet Executive Board member Erik Bongcam-Rudloff and ISCB President Burkhard Rost enthusiastically embraced the idea of a formal collaboration that will ensure EMBnet training courses are incorporated into ISCB meetings whenever possible, particularly those in developing regions.

Journal ArticleDOI
TL;DR: The ITB-Bari (Bioinformatics and Genomics) is the National node of EMBnet in Italy: Domenica D’Elia is the node manager and Andreas Gisel is a regular member.
Abstract: ITB, the Institute of Biomedical Technologies, is an institute of the Italian National Research Council (CNR) (http://www.cnr.it); it comprises four sections, located in Milano (from where the Institute is directed), Bari, Pisa and Padova. ITB-Bari (Bioinformatics and Genomics) is the National node of EMBnet in Italy: Domenica D'Elia is the node manager and Andreas Gisel is a regular member. URL: http://www.ba.itb.cnr.it

Journal ArticleDOI
Jingchu Luo1
TL;DR: It has been 15 years since the node joined EMBnet in 1996 as the national node of China; with support from several funding sources, the node continues its work in bioinformatics service, education and development.
Abstract: It has been 15 years since we joined EMBnet in 1996 as the national node of China. With support from several funding sources, we are working hard in bioinformatics service, education and development.

Journal ArticleDOI
TL;DR: The authors analyse Grid efficiency, discuss which benefits can realistically be expected with current technology, and provide useful advice for future Grid developers.
Abstract: Analysis of population evolutionary dynamics using realistic models is a challenging task requiring access to huge resources. Estimates for simple models of population growth under different mutation and selection conditions yield running times of CPU years. As mutations are stochastic events, experiments can be split into many separate jobs, reducing to a large Monte Carlo-like problem that is embarrassingly parallel and thus maps perfectly onto the Grid. We have been able to run simulations with realistic population sizes (up to 1,000,000 individuals) and growth cycles using the Grid with a ~190x efficiency gain, thus reducing execution time from years to a few days. This speedup allows us to accelerate the simulation cycle and work on data analysis and additional model refinements with minimal delays and effort. We have taken measures at various steps in the process to study the efficiency gains obtained. While our simple approach may arguably be far from achieving optimum efficiency, we were able to achieve significant gains. We conclude by analysing Grid efficiency, discussing which benefits can realistically be expected with the current technology, and providing useful advice for future Grid developers. All the tools described are available under the GPL from http://ahriman.cnb.csic.es/sbg/tiki-download_file.php?fileId=16.
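The job-splitting scheme described above works because each stochastic run depends only on its own random seed, so results computed on separate workers can be pooled afterwards. A minimal sketch with a toy mutation-count model (not the authors' simulator):

```python
import random

def one_run(seed, pop_size=1000, generations=50, mut_rate=1e-3):
    """One stochastic run: count mutation events over the growth
    cycles; fully determined by its seed, hence independent."""
    rng = random.Random(seed)
    mutants = 0
    for _ in range(generations):
        mutants += sum(1 for _ in range(pop_size)
                       if rng.random() < mut_rate)
    return mutants

def split_jobs(n_runs, n_workers):
    """Partition run seeds into per-worker job lists; on a Grid each
    list would be submitted as a separate job."""
    return [list(range(w, n_runs, n_workers)) for w in range(n_workers)]

def run_worker(seeds):
    """What a single worker node executes for its batch of seeds."""
    return [one_run(s) for s in seeds]
```

Pooling all workers' outputs reproduces the single-machine result exactly, which is what makes the Monte Carlo problem embarrassingly parallel.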