
Showing papers in "Briefings in Bioinformatics in 2010"


Journal ArticleDOI
Heng Li, Nils Homer
TL;DR: A wide variety of alignment algorithms and software have been developed over the past two years; this review surveys their current development and their practical applications to different types of experimental data.
Abstract: Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. In this article, we will systematically review the current development of these algorithms and introduce their practical applications on different types of experimental data. We come to the conclusion that short-read alignment is no longer the bottleneck of data analyses. We also consider future development of alignment algorithms with respect to emerging long sequence reads and the prospect of cloud computing.

958 citations
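To make the core computation concrete, the toy sketch below scores a local alignment with the classic Smith-Waterman recurrence, the dynamic-programming idea underlying read-to-reference comparison. It is an illustrative, assumption-laden example rather than any reviewed tool's algorithm; production short-read aligners use indexed data structures and heuristics instead of full dynamic programming per read, and the function name and scoring parameters here are arbitrary.

```python
# Toy Smith-Waterman local alignment: a minimal sketch of read-to-reference
# comparison. Real short-read aligners avoid this O(len(read) * len(ref))
# cost per read by using hash tables or FM-index-based seeding.
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best  # best local alignment score

if __name__ == "__main__":
    print(smith_waterman("ACGTAC", "TTACGTACGG"))  # perfect 6-bp match -> score 12
```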


Journal ArticleDOI
TL;DR: Pathway Tools as discussed by the authors is a bioinformatics software environment with a broad set of capabilities, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search.
Abstract: Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms.

471 citations


Journal ArticleDOI
TL;DR: The beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome, are reviewed; this shift from static to dynamic network analysis promises to be a major step forward in the ability to model and reason about cellular function and behavior.
Abstract: Dynamic molecular interactions play a central role in regulating the functioning of cells and organisms. The availability of experimentally determined large-scale cellular networks, along with other high-throughput experimental data sets that provide snapshots of biological systems at different times and conditions, is increasingly helpful in elucidating interaction dynamics. Here we review the beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome. This burgeoning research area, which entails a shift from static to dynamic network analysis, promises to be a major step forward in our ability to model and reason about cellular function and behavior.

260 citations


Journal ArticleDOI
TL;DR: The assembly of the 2.25-Gb genome of the giant panda from Illumina sequence reads with an average length of just 52 nucleotides is discussed, along with practical aspects such as data filtering and submission of assembly data to public repositories.
Abstract: A new generation of sequencing technologies is revolutionizing molecular biology. Illumina's Solexa and Applied Biosystems' SOLiD generate gigabases of nucleotide sequence per week. However, a perceived limitation of these ultra-high-throughput technologies is their short read-lengths. De novo assembly of sequence reads generated by classical Sanger capillary sequencing is a mature field of research. Unfortunately, the existing sequence assembly programs were not effective for short sequence reads generated by Illumina and SOLiD platforms. Early studies suggested that, in principle, sequence reads as short as 20-30 nucleotides could be used to generate useful assemblies of both prokaryotic and eukaryotic genome sequences, albeit containing many gaps. The early feasibility studies and proofs of principle inspired several bioinformatics research groups to implement new algorithms as freely available software tools specifically aimed at assembling reads of 30-50 nucleotides in length. This has led to the generation of several draft genome sequences based exclusively on short Illumina sequence reads, recently culminating in the assembly of the 2.25-Gb genome of the giant panda from Illumina sequence reads with an average length of just 52 nucleotides. As well as reviewing recent developments in the field, we discuss some practical aspects such as data filtering and submission of assembly data to public repositories.

235 citations
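Most of the short-read assemblers the review covers are built around a de Bruijn graph of k-mers. The sketch below is a minimal, hypothetical illustration of that data structure (the function name and the choice of k are assumptions); real assemblers add error correction, graph simplification and scaffolding on top.

```python
from collections import defaultdict

# Minimal de Bruijn graph construction from short reads: nodes are
# (k-1)-mers, and each k-mer contributes an edge from its prefix to its
# suffix. Contigs correspond to unambiguous paths through this graph.
def build_de_bruijn(reads, k=5):
    graph = defaultdict(set)  # (k-1)-mer -> set of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTT", "GTACGTTA"]
    for node, succs in sorted(build_de_bruijn(reads).items()):
        print(node, "->", sorted(succs))
```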


Journal ArticleDOI
TL;DR: In this article, the authors introduce available sources and methods that could be utilized for the network-based study of traditional Chinese medicine pharmacology, propose a workflow for network-based TCM pharmacology study, and present two case studies on applying these sources and methods to understand the mode of action of TCM recipes.
Abstract: To target complex, multi-factorial diseases more effectively, there has been an emerging trend of multi-target drug development based on network biology, as well as an increasing interest in traditional Chinese medicine (TCM) that applies a more holistic treatment to diseases. Thousands of years of clinical practice in TCM have accumulated a considerable number of formulae that exhibit reliable in vivo efficacy and safety. However, the molecular mechanisms responsible for their therapeutic effectiveness are still unclear. The development of network-based systems biology has provided considerable support for the understanding of the holistic, complementary and synergic essence of TCM in the context of molecular networks. This review introduces available sources and methods that could be utilized for the network-based study of TCM pharmacology, proposes a workflow for network-based TCM pharmacology study, and presents two case studies on applying these sources and methods to understand the mode of action of TCM recipes.

181 citations


Journal ArticleDOI
TL;DR: The flexible large-scale agent modelling environment (FLAME) is a template driven framework for agent-based modelling (ABM) on parallel architectures ideally suited to the simulation of cellular systems.
Abstract: Driven by the availability of experimental data and ability to simulate a biological scale which is of immediate interest, the cellular scale is fast emerging as an ideal candidate for middle-out modelling. As with ‘bottom-up’ simulation approaches, cellular level simulations demand a high degree of computational power, which in large-scale simulations can only be achieved through parallel computing. The flexible large-scale agent modelling environment (FLAME) is a template driven framework for agent-based modelling (ABM) on parallel architectures ideally suited to the simulation of cellular systems. It is available for both high performance computing clusters (www.flame.ac.uk) and GPU hardware (www.flamegpu.com) and uses a formal specification technique that acts as a universal modelling format. This not only creates an abstraction from the underlying hardware architectures, but avoids the steep learning curve associated with programming them. In benchmarking tests and simulations of advanced cellular systems, FLAME GPU has reported massive improvement in performance over more traditional ABM frameworks. This allows the time spent in the development and testing stages of modelling to be drastically reduced and creates the possibility of real-time visualisation for simple visual face-validation.

181 citations


Journal ArticleDOI
TL;DR: A broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing is provided.
Abstract: Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called ‘next-generation’ sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.

179 citations


Journal ArticleDOI
TL;DR: The state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans are described.
Abstract: Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35-250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.

157 citations


Journal ArticleDOI
TL;DR: The imputation methods are first reviewed in the context of gene expression microarray data, since most of the methods have been developed for estimating gene expression levels; the review then turns to other large-scale data sets that also suffer from the problems posed by missing values, together with pointers to possible imputation approaches in these settings.
Abstract: High-throughput biotechnologies, such as gene expression microarrays or mass-spectrometry-based proteomic assays, suffer from frequent missing values due to various experimental reasons. Since the missing data points can hinder downstream analyses, there exists a wide variety of ways in which to deal with missing values in large-scale data sets. Nowadays, it has become routine to estimate (or impute) the missing values prior to the actual data analysis. After nearly a decade since the publication of the first missing value imputation methods for gene expression microarray data, new imputation approaches are still being developed at an increasing rate. However, what is lagging behind is a systematic and objective evaluation of the strengths and weaknesses of the different approaches when faced with different types of data sets and experimental questions. In this review, the present strategies for missing value imputation and the measures for evaluating their performance are described. The imputation methods are first reviewed in the context of gene expression microarray data, since most of the methods have been developed for estimating gene expression levels; then, we turn to other large-scale data sets that also suffer from the problems posed by missing values, together with pointers to possible imputation approaches in these settings. Along with a description of the basic principles behind the different imputation approaches, the review tries to provide practical guidance for the users of high-throughput technologies on how to choose the imputation tool for their data and questions, and some additional research directions for the developers of imputation methodologies.

154 citations
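As a concrete illustration of one widely used strategy reviewed here, the sketch below imputes missing entries of a genes-by-samples matrix from the k most similar rows. It is a simplified, assumption-laden version of k-nearest-neighbour imputation, not a specific published implementation; distance weighting and edge-case handling are deliberately omitted.

```python
import numpy as np

# Toy k-nearest-neighbour imputation for an expression matrix (genes x samples).
# Missing values are encoded as NaN and replaced by the mean of the k rows
# closest to the target row over their jointly observed columns.
def knn_impute(X, k=3):
    X = X.astype(float)
    filled = X.copy()
    for i in range(X.shape[0]):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if shared.sum() == 0:
                continue
            dists.append((np.mean((X[i, shared] - X[j, shared]) ** 2), j))
        neighbours = [j for _, j in sorted(dists)[:k]]
        for col in np.where(missing)[0]:
            vals = [X[j, col] for j in neighbours if not np.isnan(X[j, col])]
            if vals:
                filled[i, col] = np.mean(vals)  # neighbour average fills the gap
    return filled
```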


Journal ArticleDOI
TL;DR: A survey of recent advancements in the emerging field of patient-specific modeling (PSM) suggests that with further testing and research, PSM-derived technologies will eventually become valuable, versatile clinical tools.
Abstract: We present a survey of recent advancements in the emerging field of patient-specific modeling (PSM). Researchers in this field are currently simulating a wide variety of tissue and organ dynamics to address challenges in various clinical domains. The majority of this research employs three-dimensional, image-based modeling techniques. Recent PSM publications mostly represent feasibility or preliminary validation studies on modeling technologies, and these systems will require further clinical validation and usability testing before they can become a standard of care. We anticipate that with further testing and research, PSM-derived technologies will eventually become valuable, versatile clinical tools.

148 citations


Journal ArticleDOI
TL;DR: A tour of six questions is proposed to improve users' awareness of the method, the correct use of concepts and the alternative tools provided by the scientific community, and of the consequences their choices produce on the results.
Abstract: DNA barcoding is a recent and widely used molecular-based identification system that aims to identify biological specimens, and to assign them to a given species. However, DNA barcoding is even more than this, and besides many practical uses, it can be considered the core of an integrated taxonomic system, where bioinformatics plays a key role. DNA barcoding data could be interpreted in different ways depending on the examined taxa but the technique relies on standardized approaches, methods and analyses. The existing reference towards a common way to treat DNA barcoding data, analyses and results is the Barcode of Life Data Systems. However, the scientific community has produced in the recent years a number of alternative methods to manage barcoding data. The present work starts from this point, because users should be aware of the consequences their choices produce on the results. Despite the fact that a strict standardization is the essence of DNA barcoding, we propose a tour of six questions to improve the users' awareness about the method, the correct use of concepts and alternative tools provided by scientific community.

Journal ArticleDOI
TL;DR: This paper reviews some recent efforts in exploiting the processing power of GPUs for the simulation of biological systems; general-purpose computing on graphics processing units (GPGPU) is an emerging alternative to expensive high-performance computing, offering the power of a small computer cluster at a cost of approximately $400.
Abstract: The development of detailed, coherent, models of complex biological systems is recognized as a key requirement for integrating the increasing amount of experimental data. In addition, in-silico simulation of bio-chemical models provides an easy way to test different experimental conditions, helping in the discovery of the dynamics that regulate biological systems. However, the computational power required by these simulations often exceeds that available on common desktop computers and thus expensive high performance computing solutions are required. An emerging alternative is represented by general-purpose scientific computing on graphics processing units (GPGPU), which offers the power of a small computer cluster at a cost of approximately $400. Computing with a GPU requires the development of specific algorithms, since the programming paradigm substantially differs from traditional CPU-based computing. In this paper, we review some recent efforts in exploiting the processing power of GPUs for the simulation of biological systems.
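For context, the sketch below runs a single stochastic (Gillespie-type) trajectory of a toy birth-death process, the kind of biochemical simulation whose many independent repetitions GPGPU implementations execute in parallel. It is a plain single-trajectory CPU illustration with assumed rate constants, not one of the GPU algorithms reviewed.

```python
import random

# Minimal Gillespie simulation of a birth-death process (0 -> X at rate k_prod,
# X -> 0 at rate k_deg * X). GPU implementations typically launch thousands of
# such independent trajectories, or parallelise propensity updates, per kernel.
def gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=50.0):
    t, x, trajectory = 0.0, x0, [(0.0, x0)]
    while t < t_end:
        a_prod, a_deg = k_prod, k_deg * x
        a_total = a_prod + a_deg
        if a_total == 0:
            break
        t += random.expovariate(a_total)        # exponential waiting time
        if random.random() < a_prod / a_total:  # pick which reaction fires
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory

if __name__ == "__main__":
    print(gillespie_birth_death()[-1])  # final (time, copy number)
```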

Journal ArticleDOI
TL;DR: This review argues that the heterogeneity of disordered segments needs to be taken into account for a better understanding of protein disorder and presents a small survey of current methods to identify disordered proteins or protein segments.
Abstract: Intrinsically disordered/unstructured proteins exist without a stable three-dimensional (3D) structure as highly flexible conformational ensembles. The available genome sequences revealed that these proteins are surprisingly common and their frequency reaches high proportions in eukaryotes. Due to their vital role in various biological processes including signaling and regulation and their involvement in various diseases, disordered proteins and protein segments are the focus of many biochemical, molecular biological, pathological and pharmaceutical studies. These proteins are difficult to study experimentally because of the lack of unique structure in the isolated form. Their amino acid sequence, however, is available, and can be used for their identification and characterization by bioinformatic tools, analogously to globular proteins. In this review, we first present a small survey of current methods to identify disordered proteins or protein segments, focusing on those that are publicly available as web servers. In more detail we also discuss approaches that predict disordered regions and specific regions involved in protein binding by modeling the physical background of protein disorder. In our review we argue that the heterogeneity of disordered segments needs to be taken into account for a better understanding of protein disorder.

Journal ArticleDOI
Thomas Werner
TL;DR: This review emphasizes the particular contribution NGS-based technologies make to functional genomics research with a special focus on gene regulation by transcription factor binding sites.
Abstract: Genome-wide sequencing has enabled modern biomedical research to relate more and more events in healthy as well as disease-affected cells and tissues to the genomic sequence. Now next generation sequencing (NGS) extends that reach into multiple almost complete genomes of the same species, revealing more and more details about how individual genomes as well as individual aspects of their regulation differ from each other. The inclusion of NGS-based transcriptome sequencing, chromatin-immunoprecipitation (ChIP) of transcription factor binding and epigenetic analyses (usually based on DNA methylation or histone modification ChIP) completes the picture with unprecedented resolution enabling the detection of even subtle differences such as alternative splicing of individual exons. Functional genomics aims at the elucidation of the molecular basis of biological functions and requires analyses that go far beyond the primary analysis of the reads such as mapping to a genome template sequence. The various and complex interactions between the genome, gene products and metabolites define biological function, which necessitates inclusion of results other than sequence tags in quite elaborative approaches. However, the extra efforts pay off in revealing mechanisms as well as providing the foundation for new strategies in systems biology and personalized medicine. This review emphasizes the particular contribution NGS-based technologies make to functional genomics research with a special focus on gene regulation by transcription factor binding sites.

Journal ArticleDOI
TL;DR: This review highlights the latest advances in the field of translational bioinformatics, focusing on the advances of computational techniques to search for and classify disease genes.
Abstract: Over 100 years ago, William Bateson provided, through his observations of the transmission of alkaptonuria in first cousin offspring, evidence of the application of Mendelian genetics to certain human traits and diseases. His work was corroborated by Archibald Garrod (Garrod AE. The incidence of alkaptonuria: a study in chemical individuality. Lancet 1902;ii:1616-20) and William Farabee (Farabee WC. Inheritance of digital malformations in man. In: Papers of the Peabody Museum of American Archaeology and Ethnology. Cambridge, Mass: Harvard University, 1905; 65-78), who recorded the familial tendencies of inheritance of malformations of human hands and feet. These were the pioneers of the hunt for disease genes that would continue through the century and result in the discovery of hundreds of genes that can be associated with different diseases. Despite many ground-breaking discoveries during the last century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases, and we are still searching for the cures for most complex diseases. In the last few years, new genome sequencing and other high-throughput experimental techniques have generated vast amounts of molecular and clinical data that contain crucial information with the potential of leading to the next major biomedical discoveries. The need to mine, visualize and integrate these data has motivated the development of several informatics approaches that can broadly be grouped in the research area of 'translational bioinformatics'. This review highlights the latest advances in the field of translational bioinformatics, focusing on the advances of computational techniques to search for and classify disease genes.

Journal ArticleDOI
TL;DR: This article reviews the currently emerging field of multi-scale modelling in computational biomedicine, proposes a direction that complements the classic dynamical systems approach, and introduces two distinct case studies: transmission of resistance in human immunodeficiency virus spreading and in-stent restenosis in coronary artery disease.
Abstract: The inherent complexity of biomedical systems is well recognized; they are multi-scale, multi-science systems, bridging a wide range of temporal and spatial scales. This article reviews the currently emerging field of multi-scale modelling in computational biomedicine. Many exciting multi-scale models exist or are under development. However, an underpinning multi-scale modelling methodology seems to be missing. We propose a direction that complements the classic dynamical systems approach and introduce two distinct case studies, transmission of resistance in human immunodeficiency virus spreading and in-stent restenosis in coronary artery disease.

Journal ArticleDOI
TL;DR: It is observed that the scalability achieved by efficient methods does not imply biological soundness of the discovered association patterns, and vice versa; ideally, GAA should employ a balanced mining model that takes into account the best practices employed by the methods reviewed in this survey.
Abstract: Establishing an association between variables is always of interest in genomic studies. Generation of DNA microarray gene expression data introduces a variety of data analysis issues not encountered in traditional molecular biology or medicine. Frequent pattern mining (FPM) has been applied successfully in business and scientific data for discovering interesting association patterns, and is becoming a promising strategy in microarray gene expression analysis. We review the most relevant FPM strategies, as well as surrounding main issues when devising efficient and practical methods for gene association analysis (GAA). We observed that, so far, scalability achieved by efficient methods does not imply biological soundness of the discovered association patterns, and vice versa. Ideally, GAA should employ a balanced mining model taking into account best practices employed by methods reviewed in this survey. Integrative approaches, in which biological knowledge plays an important role within the mining process, are becoming more reliable.
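To illustrate the flavour of the FPM strategies compared here, the sketch below mines frequent gene sets Apriori-style from discretised expression "transactions" (each sample contributes the set of genes called over-expressed). The gene names, thresholds and the unpruned candidate generation are illustrative assumptions, not a method from the survey.

```python
from itertools import combinations

# Toy Apriori-style frequent itemset mining over discretised expression data:
# itemsets of genes that are jointly over-expressed in at least min_support
# samples are reported, level by level.
def frequent_itemsets(transactions, min_support=2, max_size=3):
    frequent = {}
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]
    size = 1
    while current and size <= max_size:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(kept)
        # naive candidate generation: join frequent sets into (size+1)-sets
        current = list({a | b for a, b in combinations(kept, 2) if len(a | b) == size + 1})
        size += 1
    return frequent

if __name__ == "__main__":
    samples = [{"TP53", "MYC", "EGFR"}, {"TP53", "MYC"}, {"MYC", "EGFR"}, {"TP53", "EGFR", "MYC"}]
    for itemset, support in frequent_itemsets(samples).items():
        print(sorted(itemset), support)
```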

Journal ArticleDOI
TL;DR: This paper compares the high performance parallel simulator for the GPU to the simulator developed on a single CPU, and shows that the GPU is better suited than the CPU to simulate P systems due to its highly parallel nature.
Abstract: P systems or Membrane Systems provide a high-level computational modelling framework that combines the structure and dynamic aspects of biological systems in a relevant and understandable way. They are inherently parallel and non-deterministic computing devices. In this article, we discuss the motivation, design principles and key implementation details of a simulator for the class of recognizer P systems with active membranes running on a graphics processing unit (GPU). We compare our parallel simulator for GPUs to the simulator developed for a single central processing unit (CPU), showing that GPUs are better suited than CPUs to simulate P systems due to their highly parallel nature.

Journal ArticleDOI
TL;DR: This review discusses several of the plethora of algorithms and tools designed to analyze HTS data, including algorithms for read mapping, as well as methods for identifying single-nucleotide polymorphisms, insertions/deletions, large-scale structural variants and copy-number variants from these mappings.
Abstract: The advent of high-throughput sequencing (HTS) technologies is enabling sequencing of human genomes at a significantly lower cost. The availability of these genomes is hoped to enable novel medical diagnostics and treatment, specific to the individual, thus launching the era of personalized medicine. The data currently generated by HTS machines require extensive computational analysis in order to identify genomic variants present in the sequenced individual. In this paper, we overview HTS technologies and discuss several of the plethora of algorithms and tools designed to analyze HTS data, including algorithms for read mapping, as well as methods for identification of single-nucleotide polymorphisms, insertions/deletions and large-scale structural variants and copy-number variants from these mappings.
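As a minimal illustration of the variant-detection step described above, the sketch below makes naive pileup-based SNP calls from reads that are assumed to be already mapped. All names and thresholds are assumptions; real callers additionally model base and mapping qualities, diploid genotypes and sequencing error.

```python
from collections import Counter

# Naive pileup SNP calling: count the bases covering each reference position
# and call a variant where an alternative allele dominates above a minimum depth.
def call_snps(reference, mapped_reads, min_depth=3, min_alt_frac=0.8):
    pileup = [Counter() for _ in reference]
    for start, seq in mapped_reads:          # (0-based mapping position, read sequence)
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference):
                pileup[pos][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        allele, n = counts.most_common(1)[0]
        if allele != reference[pos] and n / depth >= min_alt_frac:
            snps.append((pos, reference[pos], allele, depth))
    return snps

if __name__ == "__main__":
    ref = "ACGTACGT"
    reads = [(0, "ACGAACGT"), (0, "ACGAACGT"), (2, "GAACGT")]
    print(call_snps(ref, reads))  # expect a T->A call at position 3
```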

Journal ArticleDOI
TL;DR: It is explained how ranking coefficients of a linear classifier such as support vector machine (SVM) can be profitably used to reinforce the search efficiency of Local Search and Evolutionary Search metaheuristic algorithms for gene selection and classification.
Abstract: Gene selection aims at identifying a (small) subset of informative genes from the initial data in order to obtain high predictive accuracy for classification. Gene selection can be considered as a combinatorial search problem and thus be conveniently handled with optimization methods. In this article, we summarize some recent developments of using metaheuristic-based methods within an embedded approach for gene selection. In particular, we put forward the importance and usefulness of integrating problem-specific knowledge into the search operators of such a method. To illustrate the point, we explain how ranking coefficients of a linear classifier such as support vector machine (SVM) can be profitably used to reinforce the search efficiency of Local Search and Evolutionary Search metaheuristic algorithms for gene selection and classification.
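A rough sketch of the idea of using linear-SVM ranking coefficients for gene selection is given below: repeatedly fit a linear classifier and discard the genes with the smallest absolute weights, in the spirit of SVM-RFE. This is an illustrative assumption using scikit-learn's LinearSVC, not the embedded metaheuristic described in the article, where such ranks would instead guide Local Search or Evolutionary Search operators.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Recursive elimination guided by |SVM weight|: keep refitting on the surviving
# genes and drop a fraction of the lowest-ranked ones until n_keep remain.
def svm_rank_select(X, y, n_keep=10, drop_frac=0.5):
    selected = np.arange(X.shape[1])
    while len(selected) > n_keep:
        clf = LinearSVC(C=1.0, max_iter=10000).fit(X[:, selected], y)
        ranks = np.argsort(np.abs(clf.coef_).ravel())   # weakest genes first
        n_drop = max(1, int(len(selected) * drop_frac))
        n_drop = min(n_drop, len(selected) - n_keep)
        selected = np.delete(selected, ranks[:n_drop])
    return selected  # indices of retained genes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 200))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only two truly informative genes
    print(svm_rank_select(X, y, n_keep=5))
```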

Journal ArticleDOI
TL;DR: The key challenges and pitfalls to providing effective training for users of bioinformatics services are reviewed, and successful training strategies shared by a diverse set of bioInformatics trainers are discussed.
Abstract: As bioinformatics becomes increasingly central to research in the molecular life sciences, the need to train non-bioinformaticians to make the most of bioinformatics resources is growing. Here, we review the key challenges and pitfalls to providing effective training for users of bioinformatics services, and discuss successful training strategies shared by a diverse set of bioinformatics trainers. We also identify steps that trainers in bioinformatics could take together to advance the state of the art in current training practices. The ideas presented in this article derive from the first Trainer Networking Session held under the auspices of the EU-funded SLING Integrating Activity, which took place in November 2009.

Journal ArticleDOI
TL;DR: BioModels.net Web Services take researchers one step further towards simulating and understanding a biological system in its entirety, by allowing them to retrieve biological models in their own tools, combine queries in workflows and efficiently analyse models.
Abstract: Exchanging and sharing scientific results are essential for researchers in the field of computational modelling. BioModels.net defines agreed-upon standards for model curation. A fundamental one, MIRIAM (Minimum Information Requested in the Annotation of Models), standardises the annotation and curation process of quantitative models in biology. To support this standard, MIRIAM Resources maintains a set of standard data types for annotating models, and provides services for manipulating these annotations. Furthermore, BioModels.net creates controlled vocabularies, such as SBO (Systems Biology Ontology) which strictly indexes, defines and links terms used in Systems Biology. Finally, BioModels Database provides a free, centralised, publicly accessible database for storing, searching and retrieving curated and annotated computational models. Each resource provides a web interface to submit, search, retrieve and display its data. In addition, the BioModels.net team provides a set of Web Services which allows the community to programmatically access the resources. A user is then able to perform remote queries, such as retrieving a model and resolving all its MIRIAM Annotations, as well as getting the details about the associated SBO terms. These web services use established standards. Communications rely on SOAP (Simple Object Access Protocol) messages and the available queries are described in a WSDL (Web Services Description Language) file. Several libraries are provided in order to simplify the development of client software. BioModels.net Web Services take researchers one step further towards simulating and understanding a biological system in its entirety, by allowing them to retrieve biological models in their own tools, combine queries in workflows and efficiently analyse models.
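As a hedged illustration of programmatic access of the kind described above, the sketch below uses the third-party Python SOAP library zeep against a placeholder WSDL location. Both the URL and the operation name are assumptions for illustration only; the queries actually available are the ones defined in the service's own WSDL file.

```python
# Generic SOAP-client sketch: zeep parses a WSDL and exposes its operations
# under client.service. The WSDL URL and operation name below are placeholders.
from zeep import Client

WSDL_URL = "https://example.org/biomodels/services?wsdl"   # placeholder WSDL location

def fetch_model_sbml(model_id):
    client = Client(WSDL_URL)                 # downloads and parses the WSDL
    # hypothetical operation name; inspect the real ones with: python -m zeep <wsdl-url>
    return client.service.getModelSBMLById(model_id)

if __name__ == "__main__":
    sbml = fetch_model_sbml("BIOMD0000000012")  # example-style model identifier
    print(sbml[:200])
```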

Journal ArticleDOI
TL;DR: Sequenced cancer genomes present a formidable informatics problem: sorting active 'driver' mutations from inactive 'passenger' mutations in order to prioritize them for further experimental characterization.
Abstract: New generations of DNA sequencing technologies are enabling the systematic study of genetic derangement in cancers. Sequencing of cancer exomes or transcriptomes or even entire cancer genomes are now possible, though technical and economic challenges remain. Cancer samples are inherently heterogeneous and are often contaminated with normal DNA, placing additional demands on informatics tools for detecting genetic variation. However, even low coverage sequencing data can provide valuable information on genetic rearrangements, amplifications and losses in tumor genomes. Novel recurrent oncogenic mutations and fusion transcripts have been discovered with these technologies. In some sequenced cancer genomes, tens of thousands of genetic alterations have been discovered. While this enables the detailed dissection of mutation classes, it also presents a formidable informatics problem of sorting active 'driver' mutations from inactive 'passenger' mutations in order to prioritize these for further experimental characterization.

Journal ArticleDOI
TL;DR: The major opportunities for broader incorporation of bioinformatics in education can be placed into three general categories: its general applicability in life science and related curricula; its inherent fit for promoting student learning in most biology programs; and the general experience and associated comfort students have with computers and technology.
Abstract: The major opportunities for broader incorporation of bioinformatics in education can be placed into three general categories: general applicability of bioinformatics in life science and related curricula; inherent fit of bioinformatics for promoting student learning in most biology programs; and the general experience and associated comfort students have with computers and technology. Conversely, the major challenges for broader incorporation of bioinformatics in education can be placed into three general categories: required infrastructure and logistics; instructor knowledge of bioinformatics and continuing education; and the breadth of bioinformatics, and the diversity of students and educational objectives. Broader incorporation of bioinformatics at all education levels requires overcoming the challenges to using transformative computer-requiring learning activities, assisting faculty in collecting assessment data on mastery of student learning outcomes, as well as creating more faculty development opportunities that span diverse skill levels, with an emphasis placed on providing resource materials that are kept up-to-date as the field and tools change.

Journal ArticleDOI
TL;DR: This paper describes how Designer uses universal principles of molecular biology to generate models of any arbitrary synthetic biological system, which are useful as they explain biological phenotypic complexity in mechanistic terms and can assist in designing synthetic biological systems.
Abstract: Modeling tools can play an important role in synthetic biology the same way modeling helps in other engineering disciplines: simulations can quickly probe mechanisms and provide a clear picture of how different components influence the behavior of the whole. We present a brief review of available tools and present SynBioSS Designer. The Synthetic Biology Software Suite (SynBioSS) is used for the generation, storing, retrieval and quantitative simulation of synthetic biological networks. SynBioSS consists of three distinct components: the Desktop Simulator, the Wiki, and the Designer. SynBioSS Designer takes as input molecular parts involved in gene expression and regulation (e.g. promoters, transcription factors, ribosome binding sites, etc.), and automatically generates complete networks of reactions that represent transcription, translation, regulation, induction and degradation of those parts. Effectively, Designer uses DNA sequences as input and generates networks of biomolecular reactions as output. In this paper we describe how Designer uses universal principles of molecular biology to generate models of any arbitrary synthetic biological system. These models are useful as they explain biological phenotypic complexity in mechanistic terms. In turn, such mechanistic explanations can assist in designing synthetic biological systems. We also discuss, giving practical guidance to users, how Designer interfaces with the Registry of Standard Biological Parts, the de facto compendium of parts used in synthetic biology applications.
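The sketch below illustrates, in a highly simplified form, the kind of rule-based model generation described above: an ordered list of parts is turned into mass-action reactions for transcription, translation and degradation. It is not SynBioSS Designer's actual rule set, input format or API; all part names, species names and rate-constant labels are assumptions.

```python
# Toy parts-to-reactions generator: for each coding sequence downstream of a
# promoter, emit transcription, translation and degradation reactions with
# symbolic rate constants.
def parts_to_reactions(parts):
    reactions = []
    promoter = next(p["name"] for p in parts if p["type"] == "promoter")
    for part in parts:
        if part["type"] == "cds":
            mrna, protein = f"m{part['name']}", f"p{part['name']}"
            reactions += [
                (f"{promoter} -> {promoter} + {mrna}", "k_tx"),   # transcription
                (f"{mrna} -> {mrna} + {protein}", "k_tl"),        # translation
                (f"{mrna} -> 0", "k_deg_m"),                      # mRNA degradation
                (f"{protein} -> 0", "k_deg_p"),                   # protein degradation
            ]
    return reactions

if __name__ == "__main__":
    circuit = [{"type": "promoter", "name": "Ptet"},
               {"type": "rbs", "name": "B0034"},
               {"type": "cds", "name": "GFP"}]
    for rxn, rate in parts_to_reactions(circuit):
        print(f"{rxn}    [{rate}]")
```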

Journal ArticleDOI
TL;DR: JAMES II, a plug-in-based open-source modeling and simulation framework, facilitates the exploitation and configuration of the diverse techniques required for dry-lab experimentation, making the aspects of an experiment explicit to support repeatability and reuse.
Abstract: Dry-lab experimentation is being increasingly used to complement wet-lab experimentation. However, conducting dry-lab experiments is a challenging endeavor that requires the combination of diverse techniques. JAMES II, a plug-in-based open source modeling and simulation framework, facilitates the exploitation and configuration of these techniques. The different aspects that form an experiment are made explicit to facilitate repeatability and reuse. Each of those influences the performance and the quality of the simulation experiment. Common experimentation pitfalls and current challenges are discussed along the way.

Journal ArticleDOI
TL;DR: The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems.
Abstract: The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and the information flow among these components requires the augmentation of biological insight with the power of a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe the state of the art of in silico support for synthetic biology, from the specific data exchange formats, to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow, and discuss genetic fidelity in DNA manipulation, development strategies of biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.

Journal ArticleDOI
TL;DR: It is shown that the default parameters for qualitative detection calls yield few absent calls for high spike-in concentrations, and that when genes of interest are expected to be present at very low concentrations, spike-in datasets can be useful for appropriately adjusting the tuning parameters for qualitative detection calls.
Abstract: Extensive methodological research has been conducted to improve gene expression summary methods. However, in addition to quantitative gene expression summaries, most platforms, including all those examined in the MicroArray Quality Control project, provide a qualitative detection call result for each gene on the platform. These detection call algorithms are intended to render an assessment of whether or not each transcript is reliably measured. In this paper, we review uses of these qualitative detection call results in the analysis of microarray data. We also review the detection call algorithms for two widely used gene expression microarray platforms, Affymetrix GeneChips and Illumina BeadArrays, and more clearly formalize the mathematical notation for the Illumina BeadArray detection call algorithm. Both algorithms result in a P-value which is then used for determining the qualitative detection calls. We examined the performance of these detection call algorithms and default parameters by applying the methods to two spike-in datasets. We show that the default parameters for qualitative detection calls yield few absent calls for high spike-in concentrations. When genes of interest are expected to be present at very low concentrations, spike-in datasets can be useful for appropriately adjusting the tuning parameters for qualitative detection calls.
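To make the notion of a detection P-value concrete, the sketch below computes an empirical P-value for each probe as the fraction of negative-control intensities at least as large as the probe's signal, and thresholds it to obtain present/absent calls. This is a simplified illustration in the spirit of the bead-array approach, not the vendor's exact algorithm; the threshold plays the role of the tuning parameter the paper suggests adjusting.

```python
import numpy as np

# Empirical detection P-values from negative controls: a probe is "present"
# when its intensity exceeds nearly all negative-control intensities.
def detection_calls(probe_intensities, negative_controls, p_threshold=0.01):
    neg = np.sort(np.asarray(negative_controls, dtype=float))
    calls = []
    for x in np.asarray(probe_intensities, dtype=float):
        # P-value: proportion of negative controls >= the probe signal
        p = 1.0 - np.searchsorted(neg, x, side="left") / len(neg)
        calls.append(("present" if p < p_threshold else "absent", round(p, 4)))
    return calls

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    controls = rng.normal(100, 10, size=1000)   # simulated background intensities
    print(detection_calls([95, 130, 250], controls))
```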

Journal ArticleDOI
TL;DR: A methodology is developed that successfully identified 16 hybrid mRNAs which might be instances of interchromosomal trans-splicing, indicating that trans-splicing may be more widespread than believed.
Abstract: Trans-splicing is a common phenomenon in nematodes and kinetoplastids, and it has also been reported in other organisms, including humans. Up to now, all in silico strategies to find evidence of trans-splicing in humans have required that the candidate sequences follow the consensus splicing site rules (spliceosome-mediated mechanism). However, this criterion is not supported by the best human experimental evidence, which, except in a single case, does not follow canonical splicing sites. Moreover, recent findings describe a novel alternative tRNA-mediated trans-splicing mechanism, which dispenses with the spliceosome machinery. In order to answer the question, 'Are there hybrid mRNAs in sequence databanks whose characteristics resemble those of the best human experimental evidence?', we have developed a methodology that successfully identified 16 hybrid mRNAs which might be instances of interchromosomal trans-splicing. Each hybrid mRNA is formed by a trans-spliced region (TSR), which was successfully mapped either onto known genes or onto a human endogenous retrovirus (HERV-K) transcript which supports their transcription. The existence of these hybrid mRNAs indicates that trans-splicing may be more widespread than believed. Furthermore, non-canonical splice site patterns suggest that infrequent splicing sites may occur under special conditions, or that an alternative trans-splicing mechanism is involved. Finally, our candidates are supposedly from normal tissue, and a recent study has reported that trans-splicing may occur not only in malignant tissues, but in normal tissues as well. Our methodology can be applied to 5'-UTR, coding sequences and 3'-UTR in order to find new candidates for a posteriori experimental confirmation.

Journal ArticleDOI
TL;DR: The EB-eye can be accessed over the web or programmatically using a SOAP Web Services interface and its search and retrieval capabilities can be exploited in workflows and analytical pipe-lines.
Abstract: The EB-eye is a fast and efficient search engine that provides easy and uniform access to the biological data resources hosted at the EMBL-EBI. Currently, users can access information from more than 62 distinct datasets covering some 400 million entries. The data resources represented in the EB-eye include: nucleotide and protein sequences at both the genomic and proteomic levels, structures ranging from chemicals to macro-molecular complexes, gene-expression experiments, binary level molecular interactions as well as reaction maps and pathway models, functional classifications, biological ontologies, and comprehensive literature libraries covering the biomedical sciences and related intellectual property. The EB-eye can be accessed over the web or programmatically using a SOAP Web Services interface. This allows its search and retrieval capabilities to be exploited in workflows and analytical pipe-lines. The EB-eye is a novel alternative to existing biological search and retrieval engines. In this article we describe in detail how to exploit its powerful capabilities.