
Showing papers in "EMBnet.journal in 2011"


Journal ArticleDOI
TL;DR: The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.
Abstract: When small RNA is sequenced on current sequencing machines, the resulting reads are usually longer than the RNA and therefore contain parts of the 3' adapter. That adapter must be found and removed error-tolerantly from each read before read mapping. Previous solutions are either hard to use or do not offer required features, in particular support for color space data. As an easy to use alternative, we developed the command-line tool cutadapt, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features. Cutadapt, including its MIT-licensed source code, is available for download at http://code.google.com/p/cutadapt/

20,255 citations
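The abstract above describes finding and removing a 3' adapter error-tolerantly. As a rough illustration of the idea only (not cutadapt's actual algorithm, which is alignment-based and also tolerates insertions and deletions), a mismatch-only trimmer might look like this:

```python
def trim_3prime_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Toy error-tolerant 3' adapter trimming.

    For each start position, compare the overlapping part of the
    adapter with the read and accept the leftmost position whose
    mismatch fraction stays within `max_error_rate`. A minimum
    overlap avoids trimming on spurious 1-2 base matches.
    """
    n = len(read)
    for start in range(n):
        overlap = min(len(adapter), n - start)
        if overlap < min_overlap:
            break
        mismatches = sum(1 for i in range(overlap)
                         if read[start + i] != adapter[i])
        if mismatches <= max_error_rate * overlap:
            return read[:start]          # drop adapter and everything after
    return read                          # no adapter found
```

For example, `trim_3prime_adapter("ACGTACGTAGATCGGAAGAG", "AGATCGGAAGAG")` returns the insert `"ACGTACGT"`, even when the adapter copy carries a sequencing error.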


Journal ArticleDOI
TL;DR: This article shows the usage of two free ChIP-seq analysis packages, HOMER and ChIPseeqer, along with the MACS and MEME programs, and provides a customisable script suitable for the complete analysis of raw ChIP-seq sequencing data, either from a sequence read repository or directly from sequencing.
Abstract: Among the emerging next-generation sequencing technologies, ChIP-seq provides a very important tool for functional genomics studies. From the bioinformatics point of view, ChIP-seq analysis involves more than simply aligning the short reads to the reference genome. It also completes several other downstream steps, such as peak determination, motif finding and gene ontology enrichment calculation. For these, several programs, applications and packages are available, both free and commercial. In this article, I show the usage of two free ChIP-seq analysis packages, HOMER and ChIPseeqer, along with the MACS and MEME programs. I also provide a customisable script suitable for the complete analysis of raw ChIP-seq sequencing data, either from a sequence read repository or directly from sequencing.

36 citations
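The pipeline described above chains alignment, MACS peak calling and MEME motif finding. A minimal sketch of how a driver script might assemble such commands (flag names taken from common MACS2/MEME usage, not from the paper's own script; file names are placeholders):

```python
def macs_callpeak_cmd(treatment_bam, control_bam, name, genome="hs"):
    """Assemble a MACS2 peak-calling command as an argument list.

    Flags reflect common MACS2 `callpeak` usage; check them against
    your installed version before running.
    """
    return ["macs2", "callpeak",
            "-t", treatment_bam,   # ChIP (treatment) alignments
            "-c", control_bam,     # input/control alignments
            "-n", name,            # experiment name prefix
            "-g", genome]          # effective genome size shortcut

def meme_motif_cmd(peaks_fasta, outdir, nmotifs=5):
    """Assemble a MEME motif-discovery command (illustrative flags)."""
    return ["meme", peaks_fasta, "-dna", "-oc", outdir,
            "-nmotifs", str(nmotifs)]

# A driver would execute each step in order, e.g. with
# subprocess.run(macs_callpeak_cmd("chip.bam", "input.bam", "exp")).
```

Returning argument lists keeps the pipeline easy to inspect and test before anything is actually executed.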


Journal ArticleDOI
TL;DR: The 2011 AGM workshop took place at the Instituto Gulbenkian de Ciencia (IGC) in Oeiras, Portugal from 23-25 May and was an opportunity to build on the commitment to take EMBnet forward by embracing new partners and new activities.
Abstract: The 2011 AGM workshop took place at the Instituto Gulbenkian de Ciencia (IGC) in Oeiras, Portugal, from 23-25 May. The goal of the workshop was to build on the demonstrable progress made during the previous year, in particular by helping to deliver on some of the plans outlined during the 2010 AGM. It was also an opportunity to build on our commitment to take EMBnet forward by embracing new partners and new activities. The following pages summarise the workshop content, discussions and conclusions.

30 citations


Journal ArticleDOI
TL;DR: A novel method is proposed that aims to improve de novo assembly in the presence of a closely related reference in order to obtain enhanced results.
Abstract: Next Generation Sequencing has totally changed genomics: we are able to produce huge amounts of data at an incredibly low cost compared to Sanger sequencing. Despite this, some old problems have become even more difficult, de novo assembly being on top of this list. Despite efforts to design tools able to assemble, de novo, an organism sequenced with short reads, the results are still far from those achievable with long reads. In this paper, we propose a novel method that aims to improve de novo assembly in the presence of a closely related reference. The idea is to combine de novo and reference-guided assembly in order to obtain enhanced results.

20 citations
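As a toy illustration of combining de novo contigs with a closely related reference (not the paper's actual method), one can order contigs by their mapped position on the reference and merge exact overlaps:

```python
def order_and_merge_contigs(contigs, positions, min_overlap=4):
    """Toy reference-guided scaffolding sketch.

    `positions` maps contig index -> mapped start coordinate on a
    related reference (assumed to come from an aligner). Contigs are
    ordered by that coordinate and merged where the end of the
    growing sequence exactly overlaps the start of the next contig.
    """
    order = sorted(range(len(contigs)), key=lambda i: positions[i])
    merged = contigs[order[0]]
    for i in order[1:]:
        nxt = contigs[i]
        best = 0
        # longest exact overlap between current end and next start
        for k in range(min(len(merged), len(nxt)), min_overlap - 1, -1):
            if merged.endswith(nxt[:k]):
                best = k
                break
        # without a detected overlap the contigs are simply joined;
        # a real scaffolder would insert a gap of estimated size
        merged = merged + nxt[best:]
    return merged
```

With contigs `["ACGTACGT", "ACGTTTTT"]` mapped at positions 0 and 4, the shared `ACGT` is collapsed once, yielding `"ACGTACGTTTTT"`.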


Journal ArticleDOI
TL;DR: In BioVeL scientists and technologists will work together to meet the needs and demands for in-silico or ‘e-Science’ and to create a production-quality informatics infrastructure to enable pipelining of data and analysis into efficient integrated workflows.
Abstract: The aim of the BioVeL project is to provide a seamlessly connected informatics environment that makes it easier for biodiversity scientists to carry out in-silico analysis of relevant biodiversity data and to pursue in-silico experimentation based on composing and executing sequences of complex digital data manipulations and modelling tasks. In BioVeL scientists and technologists will work together to meet the needs and demands for in-silico or ‘e-Science’ and to create a production-quality informatics infrastructure to enable pipelining of data and analysis into efficient integrated workflows. Workflows represent a way of speeding up scientific advance when that advance is based on the manipulation of digital data.

19 citations


Journal ArticleDOI
TL;DR: This work exemplifies how SEQscoring was used in a recent study as a step subsequent to a Genome Wide Association Study (GWAS) to extract a set of candidate mutations.
Abstract: Next Generation Sequencing (NGS) technologies promise a revolution in genetic research. Generating enormous amounts of data, they bring both new opportunities and new challenges to researchers. SEQscoring was designed to facilitate analysis and enable extraction of the most essential information from data produced in NGS resequencing projects. Its main functionality is to help researchers locate the most likely causative mutations for a specific trait or disease, but it can advantageously be used whenever the goal is to compare and explore haplotype patterns and to locate variations positioned in evolutionarily conserved genomic elements. SEQscoring uses input data containing information about coverage and variations produced by other programs, like MAQ and SAMtools, and puts the emphasis on methods for data visualisation and interpretation. We compare cases and controls in several ways, and also utilise the power of comparative genomics by scoring all variations according to their degree of conservation. The SEQscoring tool is publicly accessible via the Web. It has an intuitive interface and can easily be used by biologists, medical researchers and veterinarians, as well as bioinformaticians. We exemplify how SEQscoring was used in a recent study as a step subsequent to a Genome Wide Association Study (GWAS) to extract a set of candidate mutations. Availability: http://www.seqscoring.org

16 citations
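The combination of case/control comparison and conservation scoring described above can be caricatured in a few lines (an illustrative formula, not SEQscoring's actual scoring):

```python
def score_variant(case_genotypes, control_genotypes, conservation):
    """Toy variant score: high when the variant segregates between
    cases and controls AND sits in a conserved position.

    Genotypes are 0/1 carrier flags; `conservation` is a value in
    [0, 1], e.g. from a PhastCons-like conservation track (both the
    encoding and the product formula are assumptions for this sketch).
    """
    case_freq = sum(case_genotypes) / len(case_genotypes)
    control_freq = sum(control_genotypes) / len(control_genotypes)
    segregation = abs(case_freq - control_freq)   # 1.0 = perfect split
    return segregation * conservation
```

A variant carried by all cases and no controls in a highly conserved element scores near 1, while one equally frequent in both groups scores 0 regardless of conservation.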


Journal ArticleDOI
TL;DR: An unsupervised hierarchical clustering algorithm, Chaotic Map Clustering (CMC), is used in a coupled two-way approach to analyse microarray data, and Grid technology is shown to drastically speed up the process by distributing the clustering of each matrix to a separate worker node, and thus retrieve resampling results within a few hours instead of several days.
Abstract: Microarray data are a rich source of information, containing the collected expression values of thousands of genes for well defined states of a cell or tissue. Vast amounts of data (thousands of arrays) are publicly available and ready for analysis, e.g. to scrutinise correlations between genes at the level of gene expression. The large variety of arrays available makes it possible to combine different independent experiments to extract new knowledge. Starting with a large set of data, relevant information can be isolated for further analysis. To extract the required information from data sets of such size and complexity requires an appropriate and powerful analysis method. In this study, we chose to use an unsupervised hierarchical clustering algorithm, Chaotic Map Clustering (CMC), in a coupled two-way approach to analyse such data. However, the clustering approach is intrinsically difficult, both in terms of the unknown structure of the data and interpretation of the clustering results. It is therefore critical to evaluate the quality of any unsupervised procedure for such a complex set of data and to validate the clustering results, separating those clusters that are due simply to noise or statistical fluctuations. We used a resampling method to perform this validation. The resampling procedure applies the clustering algorithm to a large number of random sub-samples of the original data matrix and, consequently, the whole process becomes computationally intensive and time consuming. Using Grid technology, we show that we can drastically speed up this process by distributing the clustering of each matrix to a separate worker node, and thus retrieve resampling results within a few hours instead of several days. Further, we offer an online service to cluster large microarray data sets and conduct the subsequent validation described in this paper.

7 citations
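The resampling validation described above is embarrassingly parallel: each random subsample can be clustered independently. A toy sketch, with a thread pool standing in for the Grid worker nodes and a deliberately trivial 1-D "clustering" (both are illustrative stand-ins, not CMC or the paper's setup):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def cluster_1d(values, gap=1.0):
    """Toy clustering: sort values and start a new cluster whenever
    the gap to the previous value exceeds `gap`."""
    clusters, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > gap:
            clusters.append(current)
            current = []
        current.append(v)
    if current:
        clusters.append(current)
    return clusters

def resample_and_cluster(values, fraction, seed):
    """Cluster one random subsample; independent given its seed."""
    rng = random.Random(seed)
    k = max(2, int(fraction * len(values)))
    return cluster_1d(rng.sample(values, k))

def stability(values, n_resamples=50, fraction=0.8):
    """Run the resampling rounds concurrently (on a Grid each round
    would be a separate job) and report how often the subsample
    reproduces the cluster count of the full data set."""
    full = len(cluster_1d(values))
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda s: resample_and_cluster(values, fraction, s),
            range(n_resamples))
        agree = sum(1 for c in results if len(c) == full)
    return agree / n_resamples
```

Because every round depends only on its own seed, distributing rounds across workers changes wall-clock time but not the result, which is exactly why the Grid speedup reported above is possible.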


Journal ArticleDOI
TL;DR: A description of the bioinformatics activities carried out by the Institute for Health and Consumer Protection - Specialist EMBnet Node.
Abstract: A description of the bioinformatics activities carried out by the Institute for Health and Consumer Protection - Specialist EMBnet Node.

7 citations


Journal ArticleDOI
TL;DR: A new web tool called Superclusteroid is presented which can analyse PPI data in order to detect protein complexes or characterise the functionality of unknown proteins; the tool is essentially an intuitive PPI data processing pipeline.
Abstract: The study of proteins and the interactions between them, known as Protein-Protein Interactions (PPI), is extremely important in interpreting all biological cellular functions. In this article, a new web tool called Superclusteroid is presented which can analyse PPI data, in order to detect protein complexes or characterise the functionality of unknown proteins. The tool is essentially an intuitive PPI data processing pipeline. It supports various input file formats and provides services such as clustering, PPI network visualisation and protein cluster function prediction. Each Superclusteroid service can be used in a sequential manner or on an individual basis. In order to assess the reliability of our tool to infer PPIs, the results of the tool were compared to already known MIPS database complexes and a case scenario is presented where a known protein complex is predicted and the functionality of some of its proteins is revealed. Availability: Superclusteroid is freely available online at http://superclusteroid.uio.no/ .

5 citations


Journal ArticleDOI
TL;DR: The goal of the Dicode project is to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex settings to meaningfully search, analyze and aggregate data existing in diverse, extremely large, and rapidly evolving sources.
Abstract: The goal of the Dicode project is to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex settings. To do so, it will exploit and build on the most prominent high-performance computing paradigms and large data processing technologies to meaningfully search, analyze and aggregate data existing in diverse, extremely large, and rapidly evolving sources.

5 citations


Journal ArticleDOI
TL;DR: It is shown how recent improvements to these consensus methods, as implemented in the latest release of the TANGO tool, can provide an improved estimate of diversity in simulated data sets.
Abstract: One of the main computational challenges facing metagenomic analysis is the taxonomic identification of short DNA fragments. The combination of sequence alignment methods with taxonomic assignment based on consensus can provide an accurate estimate of the microbial diversity in a sample. In this note, we show how recent improvements to these consensus methods, as implemented in the latest release of the TANGO tool, can provide an improved estimate of diversity in simulated data sets.
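A common consensus rule for taxonomic assignment is to place a fragment on the lowest common ancestor (LCA) of all taxa it aligns to. TANGO's method is more refined than this, but a basic LCA sketch conveys the consensus idea:

```python
def lca_assignment(hits, parent):
    """Assign a fragment to the lowest common ancestor of its hits.

    `hits` is the list of taxa the fragment aligned to; `parent`
    maps each taxonomy node to its parent, with the root mapping to
    itself. (A toy illustration of consensus assignment.)
    """
    def path_to_root(node):
        path = [node]
        while parent[node] != node:
            node = parent[node]
            path.append(node)
        return path

    common = set(path_to_root(hits[0]))
    for h in hits[1:]:
        common &= set(path_to_root(h))
    # the deepest shared ancestor is the one farthest from the root
    return max(common, key=lambda n: len(path_to_root(n)))
```

A fragment hitting two sibling species is pushed up to their genus-level ancestor, while an unambiguous hit stays at the species level.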

Journal ArticleDOI
TL;DR: Jointly, the two parties will form a South African PGENI Centre of Competence for conducting translational research relevant to the local burden of disease and the most appropriate drugs for treating diseases in African populations.
Abstract: Cape Town, South Africa, 17 June 2011 - The Division of Human Genetics at the University of Cape Town (UCT) and the Centre for Proteomic and Genomic Research (CPGR), Cape Town, South Africa, proudly announce that they will be joining the ‘Pharmacogenomics for Every Nation Initiative’ (PGENI). Jointly, the two parties will form a South African PGENI Centre of Competence for conducting translational research relevant to the local burden of disease and the most appropriate drugs for treating diseases in African populations.

Journal ArticleDOI
TL;DR: The COST Action described in this paper was proposed to tackle the bioinformatics challenges inherent in managing and analysing NGS data, and to support researchers who use NGS technologies but do not have direct access to the necessary underpinning bioinformatics resources.
Abstract: COST (European Cooperation in Science and Technology) is one of the longest-running European instruments supporting cooperation, collaboration and orchestration among scientists and researchers across Europe working in the same field. Some of the organisers of the two EMBRACE workshops on ‘Next Generation Sequencing’ (NGS) saw this type of Action as exactly the right kind of mechanism to try to tame the data tsunami being generated by the furiously fast developing NGS technologies. Their aim was to tackle the bioinformatics challenges inherent in managing and analysing these data, and to support researchers who use NGS technologies but do not have direct access to the necessary underpinning bioinformatics resources. The history of the NGS initiative is short, but explosive. It is imperative for the life science community to be prepared for the enormous growth in NGS data, the challenges this presents, and the opportunities it affords. Recognising these issues, and the need for global cooperation, gave birth to the idea for this COST Action proposal; it developed into the concerted action of today.

Journal ArticleDOI
TL;DR: This paper discusses HOPE, a tool that can automatically predict the molecular effects of point mutations by massively collecting highly heterogeneous data related to the protein and the mutated residue, followed by automatic reasoning that mimics, as far as possible, the thinking of a trained bioinformatician.
Abstract: Next generation sequencing is greatly speeding up the discovery of point mutations that are causally related to disease states. Knowledge of the effects of these point mutations on the structure and function of the affected proteins is crucial for the design of follow-up experiments and diagnostic kits, and ultimately for the implementation of a cure. HOPE can automatically predict the molecular effects of point mutations. HOPE does this by massively collecting highly heterogeneous data related to the protein and the mutated residue, followed by automatic reasoning that mimics, as far as possible, the thinking of a trained bioinformatician. We discuss HOPE and review today's possibilities and challenges in this field.
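The "automatic reasoning" step can be imagined as rules over physicochemical properties of the wild-type and mutant residues. A heavily simplified sketch (hypothetical property values and rules for a handful of amino acids, not HOPE's actual knowledge base):

```python
# Coarse, illustrative per-residue properties (one-letter codes);
# real tools use much richer structural and sequence-derived data.
PROPERTIES = {
    "G": {"size": 1, "charge": 0, "hydrophobic": True},
    "A": {"size": 2, "charge": 0, "hydrophobic": True},
    "D": {"size": 3, "charge": -1, "hydrophobic": False},
    "K": {"size": 4, "charge": +1, "hydrophobic": False},
    "W": {"size": 5, "charge": 0, "hydrophobic": True},
}

def mutation_effects(wild, mutant):
    """Rule-based reasoning sketch: compare wild-type and mutant
    residue properties and report possible structural consequences."""
    w, m = PROPERTIES[wild], PROPERTIES[mutant]
    effects = []
    if m["size"] > w["size"]:
        effects.append("mutant residue is larger and may not fit")
    if m["charge"] != w["charge"]:
        effects.append("charge change may disturb interactions")
    if w["hydrophobic"] and not m["hydrophobic"]:
        effects.append("loss of hydrophobicity may destabilise the core")
    return effects
```

A glycine-to-lysine substitution, for instance, trips all three rules, while alanine-to-glycine trips none.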

Journal ArticleDOI
TL;DR: The main mission of the P&PR PC is to nurture and promote EMBnet's image at large: it is responsible for promoting all EMBnet activities, for advertising products and services provided by the EMBnet community, for proposing and developing new strategies to enhance EMBnet's visibility, and for managing public relations with EMBnet communities and related networks/societies.
Abstract: (June 2010 – June 2011) The main mission of the P&PR PC is to nurture and promote EMBnet's image at large. The P&PR PC is responsible for promoting any type of EMBnet activities, for the advertisement of products and services provided by the EMBnet community, as well as for proposing and developing new strategies aiming to enhance EMBnet’s visibility, and to take care of public relationships with EMBnet communities and related networks/societies. In this document, we report proposals, activities and achievements of the committee from June 2010 to May 2011.

Journal ArticleDOI
TL;DR: The International Society for Computational Biology (ISCB) and the African Society for Bioinformatics and Computational Biology (ASBCB) held the ISCB Africa ASBCB Conference on Bioinformatics in Cape Town, South Africa, in March 2011.
Abstract: The International Society for Computational Biology (ISCB) [1] and the African Society for Bioinformatics and Computational Biology (ASBCB) [2] held the ISCB Africa ASBCB Conference on Bioinformatics in Cape Town, South Africa, in March 2011. The meeting constituted the second joint meeting of ISCB and ASBCB, and the third conference of the ASBCB on Bioinformatics of African Pathogens, Hosts and Vectors. The conference was preceded by a two-day workshop at the University of the Western Cape [3].

Journal ArticleDOI
TL;DR: A collaboration to provide education and training in the use of bioinformatics tools to members of the authors' communities and to ensure EMBnet training courses are incorporated into ISCB meetings whenever possible, particularly those in developing regions.
Abstract: SAN DIEGO, USA and UPPSALA, SWEDEN, April 26, 2011 - The International Society for Computational Biology (ISCB) and the European Molecular Biology Network (EMBnet) are pleased to announce a collaboration to provide education and training in the use of bioinformatics tools to members of our communities. Having just sponsored and organised a successful workshop introducing the EMBnet eBioKit at the ISCB Africa ASBCB Conference on Bioinformatics 2011 (www.iscb.org/iscbafrica2011/), that followed a similarly sponsored and organised workshop on sequence analysis using EMBOSS at the 2009 rendition of the same conference (www.iscb.org/iscbafrica2009/), both EMBnet Executive Board member Erik Bongcam-Rudloff and ISCB President Burkhard Rost enthusiastically embraced the idea of a formal collaboration that will ensure EMBnet training courses are incorporated into ISCB meetings whenever possible, particularly those in developing regions.

Journal ArticleDOI
TL;DR: The ITB-Bari (Bioinformatics and Genomics) is the National node of EMBnet in Italy: Domenica D’Elia is the node manager and Andreas Gisel is a regular member.
Abstract: ITB, the Institute of Biomedical Technologies, is an institute of the Italian National Research Council (CNR) (http://www.cnr.it); it comprises four sections, located in Milano (from where the Institute is directed), Bari, Pisa and Padova. ITB-Bari (Bioinformatics and Genomics) is the National node of EMBnet in Italy: Domenica D'Elia is the node manager and Andreas Gisel is a regular member. URL: http://www.ba.itb.cnr.it

Journal ArticleDOI
Jingchu Luo1
TL;DR: It has been 15 years since the node joined EMBnet in 1996 as the national node of China; with support from several funding sources, the node continues its work in bioinformatics service, education and development.
Abstract: It has been 15 years since we joined EMBnet in 1996 as the national node of China. With support from several funding sources, we are working hard in bioinformatics service, education and development.

Journal ArticleDOI
TL;DR: The authors analyse Grid efficiency, discuss which benefits can realistically be expected with current technology, and provide useful advice for future Grid developers.
Abstract: Analysis of population evolutionary dynamics using realistic models is a challenging task requiring access to huge resources. Estimates for simple models of population growth under different mutation and selection conditions yield running times of CPU years. As mutations are stochastic events, experiments can be split into many separate jobs, reducing to a large Monte Carlo-like problem that is embarrassingly parallel and thus maps perfectly onto the Grid. We have been able to run simulations with realistic population sizes (up to 1,000,000 individuals) and growth cycles using the Grid with a ~190x efficiency gain, thus reducing execution time from years to a few days. This speedup allows us to accelerate the simulation cycle and work on data analysis and additional model refinements with minimal delays and effort. We have taken measures at various steps in the process to study the efficiency gains obtained. While our simple approach may arguably be far from achieving optimum efficiency, we were able to achieve significant gains. We conclude by analysing Grid efficiency, discussing which benefits can realistically be expected with the current technology, and providing useful advice for future Grid developers. All the tools described are available under the GPL from http://ahriman.cnb.csic.es/sbg/tiki-download_file.php?fileId=16.
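The job-splitting scheme described above works because each stochastic run depends only on its own random seed, so results computed on separate workers can be pooled afterwards. A minimal sketch with a toy mutation-count model (not the authors' simulator):

```python
import random

def one_run(seed, pop_size=1000, generations=50, mut_rate=1e-3):
    """One stochastic run: count mutation events over the growth
    cycles; fully determined by its seed, hence independent."""
    rng = random.Random(seed)
    mutants = 0
    for _ in range(generations):
        mutants += sum(1 for _ in range(pop_size)
                       if rng.random() < mut_rate)
    return mutants

def split_jobs(n_runs, n_workers):
    """Partition run seeds into per-worker job lists; on a Grid each
    list would be submitted as a separate job."""
    return [list(range(w, n_runs, n_workers)) for w in range(n_workers)]

def run_worker(seeds):
    """What a single worker node executes for its batch of seeds."""
    return [one_run(s) for s in seeds]
```

Pooling all workers' outputs reproduces the single-machine result exactly, which is what makes the Monte Carlo problem embarrassingly parallel.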