
Showing papers in "PLOS Computational Biology in 2010"


Journal ArticleDOI
TL;DR: GERP++ is an efficient and effective tool for producing both nucleotide- and element-level constraint scores within deep multiple sequence alignments; it predicts a higher constrained fraction of the human genome than earlier estimates, largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences.
Abstract: Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
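
The bottom-up idea lends itself to a compact illustration. The sketch below is our own simplified stand-in, not GERP++'s actual dynamic program (which evaluates a richer set of candidate breakpoints and ranks them by statistical significance): it scans per-position constraint scores and reports contiguous runs whose cumulative score clears a threshold.

```python
def constrained_segments(scores, min_total=20.0):
    """Return (start, end, total) for contiguous high-scoring runs."""
    segments = []
    start, total, peak, peak_end = None, 0.0, 0.0, None
    for i, s in enumerate(scores):
        if start is None:
            if s <= 0:
                continue
            start, total, peak, peak_end = i, 0.0, 0.0, i
        total += s
        if total > peak:                 # track the best right endpoint so far
            peak, peak_end = total, i
        if total <= 0:                   # run exhausted: emit if strong enough
            if peak >= min_total:
                segments.append((start, peak_end, peak))
            start = None
    if start is not None and peak >= min_total:
        segments.append((start, peak_end, peak))
    return segments

# Example: two candidate elements separated by unconstrained positions.
print(constrained_segments([3.0] * 10 + [-2.0] * 20 + [2.5] * 12))
```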

1,481 citations


Journal ArticleDOI
TL;DR: A global, network-based method for prioritizing disease genes and inferring protein complex associations, called PRINCE, which is applied to study three multi-factorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease, and type 2 diabetes mellitus.
Abstract: A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, Alzheimer's disease, and type 2 diabetes mellitus. PRINCE's predictions for these diseases highly match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
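
The smoothness-plus-prior formulation is closely related to standard network propagation. Below is a minimal sketch of that propagation step, with our own function and variable names (the full method additionally calibrates prior confidence via a logistic function of phenotypic similarity):

```python
import numpy as np

def prioritize(adj, prior, alpha=0.9, n_iter=100, tol=1e-8):
    """Propagate prior disease scores over a PPI network; one score per gene."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    w = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    f = prior.copy()
    for _ in range(n_iter):
        f_new = alpha * w @ f + (1.0 - alpha) * prior    # smoothness + prior
        if np.abs(f_new - f).max() < tol:
            break
        f = f_new
    return f

adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], float)
prior = np.array([1.0, 0.0, 0.0, 0.0])   # one known disease gene
print(prioritize(adj, prior))
```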

811 citations


Journal ArticleDOI
TL;DR: A polarizable coarse-grained water model is parameterized such that bulk water density and oil/water partitioning data remain at the same level of accuracy as for the standard MARTINI force field, and the dielectric screening of bulk water is reproduced.
Abstract: Coarse-grained (CG) simulations have become an essential tool to study a large variety of biomolecular processes, exploring temporal and spatial scales inaccessible to traditional models of atomistic resolution. One of the major simplifications of CG models is the representation of the solvent, which is either implicit or modeled explicitly as a van der Waals particle. The effect of polarization, and thus a proper screening of interactions depending on the local environment, is absent. Given the important role of water as a ubiquitous solvent in biological systems, its treatment is crucial to the properties derived from simulation studies. Here, we parameterize a polarizable coarse-grained water model to be used in combination with the CG MARTINI force field. Using a three-bead model to represent four water molecules, we show that the orientational polarizability of real water can be effectively accounted for. This has the consequence that the dielectric screening of bulk water is reproduced. At the same time, we parameterized our new water model such that bulk water density and oil/water partitioning data remain at the same level of accuracy as for the standard MARTINI force field. We apply the new model to two cases for which current CG force fields are inadequate. First, we address the transport of ions across a lipid membrane. The computed potential of mean force shows that the ions now naturally feel the change in dielectric medium when moving from the high dielectric aqueous phase toward the low dielectric membrane interior. In the second application we consider the electroporation process of both an oil slab and a lipid bilayer. The electrostatic field drives the formation of water filled pores in both cases, following a similar mechanism as seen with atomistically detailed models.

752 citations


Journal ArticleDOI
TL;DR: The simulation model successfully describes the relative thermodynamic stabilities of proteins measured in E. coli, and shows that effects additional to the commonly cited “crowding” effect must be included in attempts to understand macromolecular behavior in vivo.
Abstract: A longstanding question in molecular biology is the extent to which the behavior of macromolecules observed in vitro accurately reflects their behavior in vivo. A number of sophisticated experimental techniques now allow the behavior of individual types of macromolecule to be studied directly in vivo; none, however, allow a wide range of molecule types to be observed simultaneously. In order to tackle this issue we have adopted a computational perspective, and, having selected the model prokaryote Escherichia coli as a test system, have assembled an atomically detailed model of its cytoplasmic environment that includes 50 of the most abundant types of macromolecules at experimentally measured concentrations. Brownian dynamics (BD) simulations of the cytoplasm model have been calibrated to reproduce the translational diffusion coefficients of Green Fluorescent Protein (GFP) observed in vivo, and “snapshots” of the simulation trajectories have been used to compute the cytoplasm's effects on the thermodynamics of protein folding, association and aggregation events. The simulation model successfully describes the relative thermodynamic stabilities of proteins measured in E. coli, and shows that effects additional to the commonly cited “crowding” effect must be included in attempts to understand macromolecular behavior in vivo.

682 citations


Journal ArticleDOI
TL;DR: A combination of two further approaches is proposed: family-level inference and Bayesian model averaging within families, providing inferences about parameters that are independent of further assumptions about model structure.
Abstract: Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This "best model" approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data.
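
A hedged sketch of the two proposed steps, assuming per-model log evidences are already available and taking the simple fixed-effects case (the paper also develops a random-effects treatment across subjects); all names here are ours:

```python
import numpy as np

def family_inference(log_evidence, families):
    """log_evidence: per-model log evidences; families: lists of model indices."""
    k = len(families)
    prior = np.zeros(len(log_evidence))
    for fam in families:
        prior[fam] = 1.0 / (k * len(fam))   # equal prior mass per family
    log_joint = log_evidence + np.log(prior)
    post = np.exp(log_joint - log_joint.max())
    post /= post.sum()                      # posterior over individual models
    family_post = np.array([post[fam].sum() for fam in families])
    # Within-family weights for Bayesian model averaging of parameters:
    bma_weights = [post[fam] / post[fam].sum() for fam in families]
    return family_post, bma_weights

fp, w = family_inference(np.array([-10.0, -11.0, -9.5, -14.0]),
                         families=[[0, 1], [2, 3]])
```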

680 citations


Journal ArticleDOI
TL;DR: A concise yet comprehensive introduction to the current computational requirements presented by metagenomics, together with a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.
Abstract: Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

657 citations


Journal ArticleDOI
TL;DR: 8 pages, 3 figures, 1 table. This is an open-access article distributed under the terms of the Creative Commons Attribution License.
Abstract: 8 pages, 3 figures, 1 table. This is an open-access article distributed under the terms of the Creative Commons Attribution License.

568 citations


Journal ArticleDOI
TL;DR: It is found that community structure has a major impact on disease dynamics, and it is shown that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals.
Abstract: The dynamics of infectious diseases spread via direct person-to-person transmission (such as influenza, smallpox, HIV/AIDS, etc.) depends on the underlying host contact network. Human contact networks exhibit strong community structure. Understanding how such community structure affects epidemics may provide insights for preventing the spread of disease between communities by changing the structure of the contact network through pharmaceutical or non-pharmaceutical interventions. We use empirical and simulated networks to investigate the spread of disease in networks with community structure. We find that community structure has a major impact on disease dynamics, and we show that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals. Because the structure of relevant contact networks is generally not known, and vaccine supply is often limited, there is great need for efficient vaccination algorithms that do not require full knowledge of the network. We developed an algorithm that acts only on locally available network information and is able to quickly identify targets for successful immunization intervention. The algorithm generally outperforms existing algorithms when vaccine supply is limited, particularly in networks with strong community structure. Understanding the spread of infectious diseases and designing optimal control strategies is a major goal of public health. Social networks show marked patterns of community structure, and our results, based on empirical and simulated data, demonstrate that community structure strongly affects disease dynamics. These results have implications for the design of control strategies.
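
As a rough illustration of the targeting contrast (not the paper's algorithm, which uses only locally available network information), one can compare the highest-degree nodes with the highest-betweenness nodes, betweenness being a global proxy for bridging communities:

```python
import networkx as nx

def targets(g, n_vaccines):
    """Two candidate vaccination sets: hubs vs community bridges."""
    by_degree = sorted(g.nodes, key=g.degree, reverse=True)[:n_vaccines]
    bc = nx.betweenness_centrality(g)
    by_bridge = sorted(g.nodes, key=bc.get, reverse=True)[:n_vaccines]
    return by_degree, by_bridge

# Example: four dense communities joined by sparse bridge edges.
g = nx.planted_partition_graph(4, 25, 0.2, 0.005, seed=1)
hubs, bridges = targets(g, n_vaccines=5)
```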

548 citations


Journal ArticleDOI
TL;DR: VBQTL is presented, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors; it is shown to yield more precise estimates of the contribution of different confounding factors, and thereby additional associations to measured transcript levels, compared to alternative approaches.
Abstract: Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors resulting in additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/.

468 citations


Journal ArticleDOI
TL;DR: The development of NeuroML as a common description language for biophysically detailed neuronal and network models enables interoperability across multiple simulation environments, thereby improving model transparency, accessibility and reuse in computational neuroscience.
Abstract: Biologically detailed single neuron and network models are important for understanding how ion channels, synapses and anatomical connectivity underlie the complex electrical behavior of the brain. While neuronal simulators such as NEURON, GENESIS, MOOSE, NEST, and PSICS facilitate the development of these data-driven neuronal models, the specialized languages they employ are generally not interoperable, limiting model accessibility and preventing reuse of model components and cross-simulator validation. To overcome these problems we have used an Open Source software approach to develop NeuroML, a neuronal model description language based on XML (Extensible Markup Language). This enables these detailed models and their components to be defined in a standalone form, allowing them to be used across multiple simulators and archived in a standardized format. Here we describe the structure of NeuroML and demonstrate its scope by converting models of a number of different voltage- and ligand-gated conductances, models of electrical coupling, synaptic transmission and short-term plasticity, together with morphologically detailed models of individual neurons, into NeuroML. We have also used these NeuroML-based components to develop a highly detailed cortical network model. NeuroML-based model descriptions were validated by demonstrating similar model behavior across five independently developed simulators. Although our results confirm that simulations run on different simulators converge, they reveal limits to model interoperability, by showing that for some models convergence only occurs at high levels of spatial and temporal discretisation, when the computational overhead is high. Our development of NeuroML as a common description language for biophysically detailed neuronal and network models enables interoperability across multiple simulation environments, thereby improving model transparency, accessibility and reuse in computational neuroscience.

437 citations


Journal ArticleDOI
TL;DR: It is shown that human brain structural networks, and the nervous system of the nematode C. elegans, also obey Rent's rule, and exhibit some degree of hierarchical modularity, suggesting that these principles of nervous system design are highly conserved.
Abstract: Nervous systems are information processing networks that evolved by natural selection, whereas very large scale integrated (VLSI) computer circuits have evolved by commercially driven technology development. Here we follow historic intuition that all physical information processing systems will share key organizational properties, such as modularity, that generally confer adaptivity of function. It has long been observed that modular VLSI circuits demonstrate an isometric scaling relationship between the number of processing elements and the number of connections, known as Rent's rule, which is related to the dimensionality of the circuit's interconnect topology and its logical capacity. We show that human brain structural networks, and the nervous system of the nematode C. elegans, also obey Rent's rule, and exhibit some degree of hierarchical modularity. We further show that the estimated Rent exponent of human brain networks, derived from MRI data, can explain the allometric scaling relations between gray and white matter volumes across a wide range of mammalian species, again suggesting that these principles of nervous system design are highly conserved. For each of these fractal modular networks, the dimensionality of the interconnect topology was greater than the 2 or 3 Euclidean dimensions of the space in which it was embedded. This relatively high complexity entailed extra cost in physical wiring: although all networks were economically or cost-efficiently wired they did not strictly minimize wiring costs. Artificial and biological information processing systems both may evolve to optimize a trade-off between physical cost and topological complexity, resulting in the emergence of homologous principles of economical, fractal and modular design across many different kinds of nervous and computational networks.
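
Rent's rule states that the number of external connections E of a module containing N processing elements follows a power law, E = k N^p. A minimal sketch of estimating the Rent exponent p by log-log regression over nested partitions; the partition counts below are placeholders, whereas the paper derives them from MRI-based and C. elegans connectivity data:

```python
import numpy as np

def rent_exponent(n_nodes, n_external_edges):
    """Fit log E = log k + p log N; returns (p, k)."""
    p, log_k = np.polyfit(np.log(n_nodes), np.log(n_external_edges), deg=1)
    return p, np.exp(log_k)

# Toy partition counts roughly following E = 2.5 * N^0.75:
n = np.array([8, 16, 32, 64, 128, 256])
e = 2.5 * n ** 0.75
print(rent_exponent(n, e))   # ~ (0.75, 2.5)
```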

Journal ArticleDOI
TL;DR: The OptForce procedure is introduced that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease or become equal to zero to meet a pre-specified overproduction target and reveals non-intuitive ones that boost succinate production.
Abstract: Computational procedures for predicting metabolic interventions leading to the overproduction of biochemicals in microbial strains are widely in use. However, these methods rely on surrogate biological objectives (e.g., maximize growth rate or minimize metabolic adjustments) and do not make use of flux measurements often available for the wild-type strain. In this work, we introduce the OptForce procedure that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease or become equal to zero to meet a pre-specified overproduction target. We hierarchically apply this classification rule for pairs, triples, quadruples, etc. of reactions. This leads to the identification of a sufficient and non-redundant set of fluxes that must change (i.e., MUST set) to meet a pre-specified overproduction target. Starting with this set we subsequently extract a minimal set of fluxes that must actively be forced through genetic manipulations (i.e., FORCE set) to ensure that all fluxes in the network are consistent with the overproduction objective. We demonstrate our OptForce framework for succinate production in Escherichia coli using the most recent in silico E. coli model, iAF1260. The method not only recapitulates existing engineering strategies but also reveals non-intuitive ones that boost succinate production by performing coordinated changes on pathways distant from the last steps of succinate synthesis.

Journal ArticleDOI
TL;DR: Among the three cortical networks, the greatest clustering coefficient and the longest absolute path length are found in AD, which might indicate that the organization of the cortical network is least optimal in AD.
Abstract: Recently, many researchers have used graph theory to study the aberrant brain structures in Alzheimer's disease (AD) and have made great progress. However, the characteristics of the cortical network in Mild Cognitive Impairment (MCI) are still largely unexplored. In this study, the gray matter volumes obtained from magnetic resonance imaging (MRI) for all brain regions except the cerebellum were parcellated into 90 areas using the automated anatomical labeling (AAL) template to construct cortical networks for 98 normal controls (NCs), 113 MCIs and 91 ADs. The measurements of the network properties were calculated for each of the three groups respectively. We found that all three cortical networks exhibited small-world properties and those strong interhemispheric correlations existed between bilaterally homologous regions. Among the three cortical networks, we found the greatest clustering coefficient and the longest absolute path length in AD, which might indicate that the organization of the cortical network was the least optimal in AD. The small-world measures of the MCI network exhibited intermediate values. This finding is logical given that MCI is considered to be the transitional stage between normal aging and AD. Out of all the between-group differences in the clustering coefficient and absolute path length, only the differences between the AD and normal control groups were statistically significant. Compared with the normal controls, the MCI and AD groups retained their hub regions in the frontal lobe but showed a loss of hub regions in the temporal lobe. In addition, altered interregional correlations were detected in the parahippocampus gyrus, medial temporal lobe, cingulum, fusiform, medial frontal lobe, and orbital frontal gyrus in groups with MCI and AD. Similar to previous studies of functional connectivity, we also revealed increased interregional correlations within the local brain lobes and disrupted long distance interregional correlations in groups with MCI and AD.
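
The two small-world measures compared across the NC, MCI, and AD groups can be computed from a thresholded interregional correlation matrix. A minimal sketch with networkx; the threshold value and the stand-in data are ours, not the study's exact pipeline:

```python
import networkx as nx
import numpy as np

def network_metrics(corr, threshold=0.15):
    """Average clustering coefficient and characteristic path length."""
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    g = nx.from_numpy_array(adj)
    clustering = nx.average_clustering(g)
    # Path length requires connectedness; use the largest component.
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    path_length = nx.average_shortest_path_length(giant)
    return clustering, path_length

rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(90, 200)))  # stand-in for 90 AAL regions
print(network_metrics(corr))
```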

Journal ArticleDOI
TL;DR: A differential game is formulated to identify how individuals would best use social distancing and related self-protective behaviors during an epidemic, and it is shown how the window of opportunity for vaccine development lengthens as the efficiency of social distancing and detection improves.
Abstract: Social distancing practices are changes in behavior that prevent disease transmission by reducing contact rates between susceptible individuals and infected individuals who may transmit the disease. Social distancing practices can reduce the severity of an epidemic, but the benefits of social distancing depend on the extent to which it is used by individuals. Individuals are sometimes reluctant to pay the costs inherent in social distancing, and this can limit its effectiveness as a control measure. This paper formulates a differential-game to identify how individuals would best use social distancing and related self-protective behaviors during an epidemic. The epidemic is described by a simple, well-mixed ordinary differential equation model. We use the differential game to study potential value of social distancing as a mitigation measure by calculating the equilibrium behaviors under a variety of cost-functions. Numerical methods are used to calculate the total costs of an epidemic under equilibrium behaviors as a function of the time to mass vaccination, following epidemic identification. The key parameters in the analysis are the basic reproduction number and the baseline efficiency of social distancing. The results show that social distancing is most beneficial to individuals for basic reproduction numbers around 2. In the absence of vaccination or other intervention measures, optimal social distancing never recovers more than 30% of the cost of infection. We also show how the window of opportunity for vaccine development lengthens as the efficiency of social distancing and detection improve.
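
A toy version of the underlying trade-off conveys the cost structure: scanning constant distancing levels rather than solving the differential game for time-varying equilibrium strategies. All parameter values and the effort-cost form below are arbitrary illustrations, not the paper's cost functions:

```python
import numpy as np
from scipy.integrate import odeint

def total_cost(c, r0=2.0, gamma=0.2, distancing_cost=0.05, t_end=400.0):
    """Infection cost plus distancing cost for a constant distancing level c."""
    beta = r0 * gamma * (1.0 - c)          # distancing scales transmission
    def sir(y, t):
        s, i = y
        return [-beta * s * i, beta * s * i - gamma * i]
    t = np.linspace(0.0, t_end, 2000)
    s, i = odeint(sir, [0.999, 0.001], t).T
    attack_rate = 1.0 - s[-1]              # fraction ever infected
    effort = distancing_cost * c * t_end / 100.0  # crude, arbitrary effort cost
    return attack_rate + effort

best_c = min(np.linspace(0.0, 0.8, 41), key=total_cost)
```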

Journal ArticleDOI
TL;DR: A formalism is proposed, derived from the expression of dendritic arborizations as locally optimized graphs, which can capture the general features of neuronal branching and is inspired by Ramón y Cajal's laws of conservation of cytoplasm and conduction time in neural circuitry.
Abstract: Understanding the principles governing axonal and dendritic branching is essential for unravelling the functionality of single neurons and the way in which they connect. Nevertheless, no formalism has yet been described which can capture the general features of neuronal branching. Here we propose such a formalism, which is derived from the expression of dendritic arborizations as locally optimized graphs. Inspired by Ramón y Cajal's laws of conservation of cytoplasm and conduction time in neural circuitry, we show that this graphical representation can be used to optimize these variables. This approach allows us to generate synthetic branching geometries which replicate morphological features of any tested neuron. The essential structure of a neuronal tree is thereby captured by the density profile of its spanning field and by a single parameter, a balancing factor weighing the costs for material and conduction time. This balancing factor determines a neuron's electrotonic compartmentalization. Additions to this rule, when required in the construction process, can be directly attributed to developmental processes or a neuron's computational role within its neural circuit. The simulations presented here are implemented in an open-source software package, the “TREES toolbox,” which provides a general set of tools for analyzing, manipulating, and generating dendritic structure, including a tool to create synthetic members of any particular cell group and an approach for a model-based supervised automatic morphological reconstruction from fluorescent image stacks. These approaches provide new insights into the constraints governing dendritic architectures. They also provide a novel framework for modelling and analyzing neuronal branching structures and for constructing realistic synthetic neural networks.
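
The core greedy rule — attach each new point so as to minimize wiring cost plus the balancing factor times the conduction (path) distance back to the root — can be sketched compactly. This is a simplified stand-in for the TREES toolbox construction, with our own names and an O(n^3) loop that the real implementation avoids:

```python
import numpy as np

def grow_tree(points, bf=0.5):
    """points: (n, 3) array; points[0] is the root. Returns parent indices."""
    n = len(points)
    parent = np.full(n, -1)
    path_len = np.zeros(n)              # path length from root, per tree node
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    for _ in range(n - 1):
        best = None
        for p in np.flatnonzero(~in_tree):
            for v in np.flatnonzero(in_tree):
                d = np.linalg.norm(points[p] - points[v])
                # Wiring cost + bf * conduction distance to the root.
                cost = d + bf * (path_len[v] + d)
                if best is None or cost < best[0]:
                    best = (cost, p, v, d)
        _, p, v, d = best
        parent[p] = v
        path_len[p] = path_len[v] + d
        in_tree[p] = True
    return parent
```

Low bf yields minimum-wiring (MST-like) trees; high bf yields star-like trees with short conduction paths, mirroring the electrotonic compartmentalization the abstract describes.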

Journal ArticleDOI
TL;DR: The observed properties suggest that the brain has evolved a balance that optimizes information-processing efficiency across different classes of specialized areas as well as mechanisms to modulate coupling in support of dynamically changing processing demands.
Abstract: Information processing in the human brain arises from both interactions between adjacent areas and from distant projections that form distributed brain systems. Here we map interactions across different spatial scales by estimating the degree of intrinsic functional connectivity for the local (≤14 mm) neighborhood directly surrounding brain regions as contrasted with distant (>14 mm) interactions. The balance between local and distant functional interactions measured at rest forms a map that separates sensorimotor cortices from heteromodal association areas and further identifies regions that possess both high local and distant cortical-cortical interactions. Map estimates of network measures demonstrate that high local connectivity is most often associated with a high clustering coefficient, long path length, and low physical cost. Task performance changed the balance between local and distant functional coupling in a subset of regions, particularly, increasing local functional coupling in regions engaged by the task. The observed properties suggest that the brain has evolved a balance that optimizes information-processing efficiency across different classes of specialized areas as well as mechanisms to modulate coupling in support of dynamically changing processing demands. We discuss the implications of these observations and applications of the present method for exploring normal and atypical brain function.

Journal ArticleDOI
TL;DR: It is argued that cooperative and exploitative cell lineages will spontaneously segregate in space under a wide range of conditions and, therefore, that cellular cooperation may evolve more readily than naively expected.
Abstract: On its own, a single cell cannot exert more than a microscopic influence on its immediate surroundings. However, via strength in numbers and the expression of cooperative phenotypes, such cells can enormously impact their environments. Simple cooperative phenotypes appear to abound in the microbial world, but explaining their evolution is challenging because they are often subject to exploitation by rapidly growing, non-cooperative cell lines. Population spatial structure may be critical for this problem because it influences the extent of interaction between cooperative and non-cooperative individuals. It is difficult for cooperative cells to succeed in competition if they become mixed with non-cooperative cells, which can exploit the public good without themselves paying a cost. However, if cooperative cells are segregated in space and preferentially interact with each other, they may prevail. Here we use a multi-agent computational model to study the origin of spatial structure within growing cell groups. Our simulations reveal that the spatial distribution of genetic lineages within these groups is linked to a small number of physical and biological parameters, including cell growth rate, nutrient availability, and nutrient diffusivity. Realistic changes in these parameters qualitatively alter the emergent structure of cell groups, and thereby determine whether cells with cooperative phenotypes can locally and globally outcompete exploitative cells. We argue that cooperative and exploitative cell lineages will spontaneously segregate in space under a wide range of conditions and, therefore, that cellular cooperation may evolve more readily than naively expected.

Journal ArticleDOI
TL;DR: Analysis of β-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure, urging caution in the design and interpretation of analyses using pyrosequencing data.
Abstract: Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of β-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results urge caution in the design and interpretation of analyses using pyrosequencing data.
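
The gap-treatment issue is easy to make concrete: whether gap columns count as differences or are skipped can change a pairwise distance substantially for partial, variable-length reads. A small illustration, not the study's exact distance definitions:

```python
def pairwise_distance(a, b, count_gaps=True):
    """a, b: aligned sequences of equal length, '-' for gaps."""
    diffs = compared = 0
    for x, y in zip(a, b):
        if x == '-' and y == '-':
            continue                    # shared gap column: always skipped
        if '-' in (x, y) and not count_gaps:
            continue                    # gap vs base: optionally skipped
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0

a = "ACGT--GTTACA"
b = "ACGTTTGT--CA"
print(pairwise_distance(a, b, count_gaps=True))   # 0.33: indels penalized
print(pairwise_distance(a, b, count_gaps=False))  # 0.0: indels ignored
```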

Journal ArticleDOI
TL;DR: The findings suggest that the consideration of punishment strategies allows one to understand the establishment and spreading of “moral behavior” by means of game-theoretical concepts, demonstrating that quantitative biological modeling approaches are powerful even in domains that have been addressed with non-mathematical concepts so far.
Abstract: Situations where individuals have to contribute to joint efforts or share scarce resources are ubiquitous. Yet, without proper mechanisms to ensure cooperation, the evolutionary pressure to maximize individual success tends to create a tragedy of the commons (such as over-fishing or the destruction of our environment). This contribution addresses a number of related puzzles of human behavior with an evolutionary game theoretical approach as it has been successfully used to explain the behavior of other biological species many times, from bacteria to vertebrates. Our agent-based model distinguishes individuals applying four different behavioral strategies: non-cooperative individuals (“defectors”), cooperative individuals abstaining from punishment efforts (called “cooperators” or “second-order free-riders”), cooperators who punish non-cooperative behavior (“moralists”), and defectors, who punish other defectors despite being non-cooperative themselves (“immoralists”). By considering spatial interactions with neighboring individuals, our model reveals several interesting effects: First, moralists can fully eliminate cooperators. This spreading of punishing behavior requires a segregation of behavioral strategies and solves the “second-order free-rider problem”. Second, the system behavior changes its character significantly even after very long times (“who laughs last laughs best effect”). Third, the presence of a number of defectors can largely accelerate the victory of moralists over non-punishing cooperators. Fourth, in order to succeed, moralists may profit from immoralists in a way that appears like an “unholy collaboration”. Our findings suggest that the consideration of punishment strategies allows one to understand the establishment and spreading of “moral behavior” by means of game-theoretical concepts. This demonstrates that quantitative biological modeling approaches are powerful even in domains that have been addressed with non-mathematical concepts so far. The complex dynamics of certain social behaviors become understandable as the result of an evolutionary competition between different behavioral strategies.

Journal ArticleDOI
TL;DR: A new stochastic model of the spread of influenza across a large population is developed, in which individuals have realistic social contact networks, and transmission and infection are based on the current state of knowledge of the natural history of influenza.
Abstract: Mathematical and computer models of epidemics have contributed to our understanding of the spread of infectious disease and the measures needed to contain or mitigate them. To help prepare for future influenza seasonal epidemics or pandemics, we developed a new stochastic model of the spread of influenza across a large population. Individuals in this model have realistic social contact networks, and transmission and infections are based on the current state of knowledge of the natural history of influenza. The model has been calibrated so that outcomes are consistent with the 1957/1958 Asian A(H2N2) and 2009 pandemic A(H1N1) influenza viruses. We present examples of how this model can be used to study the dynamics of influenza epidemics in the United States and simulate how to mitigate or delay them using pharmaceutical interventions and social distancing measures. Computer simulation models play an essential role in informing public policy and evaluating pandemic preparedness plans. We have made the source code of this model publicly available to encourage its use and further development.

Journal ArticleDOI
TL;DR: Smoldyn is found to be in many cases more accurate, more computationally efficient, and easier to use than comparable simulators; a Smoldyn model of yeast pheromone signaling showed that secreted Bar1 protease might help a cell identify the fittest mating partner by sharpening the pheromone concentration gradient.
Abstract: Most cellular processes depend on intracellular locations and random collisions of individual protein molecules. To model these processes, we developed algorithms to simulate the diffusion, membrane interactions, and reactions of individual molecules, and implemented these in the Smoldyn program. Compared to the popular MCell and ChemCell simulators, we found that Smoldyn was in many cases more accurate, more computationally efficient, and easier to use. Using Smoldyn, we modeled pheromone response system signaling among yeast cells of opposite mating type. This model showed that secreted Bar1 protease might help a cell identify the fittest mating partner by sharpening the pheromone concentration gradient. This model involved about 200,000 protein molecules, about 7000 cubic microns of volume, and about 75 minutes of simulated time; it took about 10 hours to run. Over the next several years, as faster computers become available, Smoldyn will allow researchers to model and explore systems the size of entire bacterial and smaller eukaryotic cells.
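
The two core operations of a Smoldyn-style simulation — Gaussian diffusive steps and bimolecular reactions triggered within a binding radius — reduce to a few lines. Note that Smoldyn itself calibrates the binding radius against the reaction rate constant and time step, whereas the constant below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffuse(pos, diff_coef, dt):
    """One Brownian step; pos: (n, 3) positions in um, diff_coef in um^2/s."""
    return pos + rng.normal(scale=np.sqrt(2 * diff_coef * dt), size=pos.shape)

def react(a_pos, b_pos, binding_radius=0.01):
    """Indices of (A, B) pairs within the binding radius (A + B -> C fires)."""
    d = np.linalg.norm(a_pos[:, None, :] - b_pos[None, :, :], axis=-1)
    return np.argwhere(d < binding_radius)

a = rng.uniform(size=(100, 3))
b = rng.uniform(size=(100, 3))
a = diffuse(a, diff_coef=10.0, dt=1e-5)
pairs = react(a, b)
```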

Journal ArticleDOI
TL;DR: This study redefines and reclassifies the domains of PfEMP1 from seven genomes, providing a comprehensive categorization intended as a platform for future studies on var/PfEMP1 expression and function.
Abstract: The var gene encoded hyper-variable Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) family mediates cytoadhesion of infected erythrocytes to human endothelium. Antibodies blocking cytoadhesion are important mediators of malaria immunity acquired by endemic populations. The development of a PfEMP1 based vaccine mimicking natural acquired immunity depends on a thorough understanding of the evolved PfEMP1 diversity, balancing antigenic variation against conserved receptor binding affinities. This study redefines and reclassifies the domains of PfEMP1 from seven genomes. Analysis of domains in 399 different PfEMP1 sequences allowed identification of several novel domain classes, and a high degree of PfEMP1 domain compositional order, including conserved domain cassettes not always associated with the established group A–E division of PfEMP1. A novel iterative homology block (HB) detection method was applied, allowing identification of 628 conserved minimal PfEMP1 building blocks, describing on average 83% of a PfEMP1 sequence. Using the HBs, similarities between domain classes were determined, and Duffy binding-like (DBL) domain subclasses were found in many cases to be hybrids of major domain classes. Related to this, a recombination hotspot was uncovered between DBL subdomains S2 and S3. The VarDom server is introduced, from which information on domain classes and homology blocks can be retrieved, and new sequences can be classified. Several conserved sequence elements were found, including: (1) residues conserved in all DBL domains predicted to interact and hold together the three DBL subdomains, (2) potential integrin binding sites in DBLα domains, (3) an acylation motif conserved in group A var genes suggesting N-terminal N-myristoylation, (4) PfEMP1 inter-domain regions proposed to be elastic disordered structures, and (5) several conserved predicted phosphorylation sites. Ideally, this comprehensive categorization of PfEMP1 will provide a platform for future studies on var/PfEMP1 expression and function.

Journal ArticleDOI
TL;DR: This work shows how one can use a dynamic recursive estimator, known as extended Kalman filter, to arrive at estimates of the model parameters, and shows how the same tools can be used to discriminate among alternate models of the same biological process.
Abstract: A central challenge in computational modeling of biological systems is the determination of the model parameters. Typically, only a fraction of the parameters (such as kinetic rate constants) are experimentally measured, while the rest are often fitted. The fitting process is usually based on experimental time course measurements of observables, which are used to assign parameter values that minimize some measure of the error between these measurements and the corresponding model prediction. The measurements, which can come from immunoblotting assays, fluorescent markers, etc., tend to be very noisy and taken at a limited number of time points. In this work we present a new approach to the problem of parameter selection of biological models. We show how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters. The proposed method proceeds as follows. First, we use a variation of the Kalman filter that is particularly well suited to biological applications to obtain a first guess for the unknown parameters. Second, we employ an a posteriori identifiability test to check the reliability of the estimates. Finally, we solve an optimization problem to refine the first guess in case it should not be accurate enough. The final estimates are guaranteed to be statistically consistent with the measurements. Furthermore, we show how the same tools can be used to discriminate among alternate models of the same biological process. We demonstrate these ideas by applying our methods to two examples, namely a model of the heat shock response in E. coli, and a model of a synthetic gene regulation system. The methods presented are quite general and may be applied to a wide class of biological systems where noisy measurements are used for parameter estimation or model selection.
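
A minimal joint state-and-parameter extended Kalman filter for a toy decay model dx/dt = -kx with unknown rate k illustrates the recursive estimation step. The system and all names are our illustrative choices, not the paper's heat shock or gene regulation models:

```python
import numpy as np

def ekf_rate_estimate(y, dt, q=1e-6, r=0.01, x0=1.0, k0=0.5):
    """EKF over the augmented state z = [x, k]; we observe y = x + noise."""
    z = np.array([x0, k0])
    p = np.eye(2)                        # state covariance
    qm = q * np.eye(2)                   # small process noise keeps k adaptable
    h = np.array([[1.0, 0.0]])           # measurement picks out x
    estimates = []
    for yt in y:
        # Predict: Euler-discretized dynamics and their Jacobian.
        x, k = z
        z = np.array([x - k * x * dt, k])
        f = np.array([[1.0 - k * dt, -x * dt], [0.0, 1.0]])
        p = f @ p @ f.T + qm
        # Update: standard Kalman correction with a scalar measurement.
        s = h @ p @ h.T + r
        gain = p @ h.T / s
        z = z + (gain * (yt - z[0])).ravel()
        p = (np.eye(2) - gain @ h) @ p
        estimates.append(z.copy())
    return np.array(estimates)

t = np.arange(0.0, 5.0, 0.01)
y = np.exp(-1.2 * t) + np.random.default_rng(1).normal(scale=0.1, size=t.size)
est = ekf_rate_estimate(y, dt=0.01)      # est[:, 1] drifts toward the true k = 1.2
```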

Journal ArticleDOI
TL;DR: The first quantitative framework to compare and contrast diseases by an integrated analysis of disease-related mRNA expression data and the human protein interaction network is presented, leading to 138 significant similarities between diseases.
Abstract: Current work in elucidating relationships between diseases has largely been based on pre-existing knowledge of disease genes. Consequently, these studies are limited in their discovery of new and unknown disease relationships. We present the first quantitative framework to compare and contrast diseases by an integrated analysis of disease-related mRNA expression data and the human protein interaction network. We identified 4,620 functional modules in the human protein network and provided a quantitative metric to record their responses in 54 diseases leading to 138 significant similarities between diseases. Fourteen of the significant disease correlations also shared common drugs, supporting the hypothesis that similar diseases can be treated by the same drugs, allowing us to make predictions for new uses of existing drugs. Finally, we also identified 59 modules that were dysregulated in at least half of the diseases, representing a common disease-state "signature". These modules were significantly enriched for genes that are known to be drug targets. Interestingly, drugs known to target these genes/proteins are already known to treat significantly more diseases than drugs targeting other genes/proteins, highlighting the importance of these core modules as prime therapeutic opportunities.

Journal ArticleDOI
TL;DR: This work uses genome-scale stoichiometric models of metabolism to identify media that can sustain growth for a pair of species, but fail to do so for one or both individual species, thereby inducing putative symbiotic interactions.
Abstract: Interactions between microbial species are sometimes mediated by the exchange of small molecules, secreted by one species and metabolized by another. Both one-way (commensal) and two-way (mutualistic) interactions may contribute to complex networks of interdependencies. Understanding these interactions constitutes an open challenge in microbial ecology, with applications ranging from the human microbiome to environmental sustainability. In parallel to natural communities, it is possible to explore interactions in artificial microbial ecosystems, e.g. pairs of genetically engineered mutualistic strains. Here we computationally generate artificial microbial ecosystems without re-engineering the microbes themselves, but rather by predicting their growth on appropriately designed media. We use genome-scale stoichiometric models of metabolism to identify media that can sustain growth for a pair of species, but fail to do so for one or both individual species, thereby inducing putative symbiotic interactions. We first tested our approach on two previously studied mutualistic pairs, and on a pair of highly curated model organisms, showing that our algorithms successfully recapitulate known interactions, robustly predict new ones, and provide novel insight on exchanged molecules. We then applied our method to all possible pairs of seven microbial species, and found that it is always possible to identify putative media that induce commensalism or mutualism. Our analysis also suggests that symbiotic interactions may arise more readily through environmental fluctuations than genetic modifications. We envision that our approach will help generate microbe-microbe interaction maps useful for understanding microbial consortia dynamics and evolution, and for exploring the full potential of natural metabolic pathways for metabolic engineering applications.
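
At the core of the approach is flux balance analysis: maximize a biomass flux subject to steady-state mass balance S v = 0 and medium-dependent exchange bounds. A medium induces putative mutualism if neither single-species model grows on it but the joint model does. A toy sketch with scipy; the stoichiometric matrix and bounds are placeholders, whereas the paper uses genome-scale models:

```python
import numpy as np
from scipy.optimize import linprog

def max_growth(s, bounds, biomass_index):
    """s: stoichiometric matrix (metabolites x reactions); bounds per reaction."""
    c = np.zeros(s.shape[1])
    c[biomass_index] = -1.0                  # linprog minimizes, so negate
    res = linprog(c, A_eq=s, b_eq=np.zeros(s.shape[0]), bounds=bounds,
                  method="highs")
    return -res.fun if res.success else 0.0

# Toy model: one metabolite A, an uptake reaction and a biomass reaction.
s = np.array([[1.0, -1.0]])
bounds = [(0.0, 10.0), (0.0, None)]          # the medium caps uptake at 10
print(max_growth(s, bounds, biomass_index=1))  # 10.0, limited by uptake
```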

Journal ArticleDOI
TL;DR: It is concluded that reduced stability of the mRNA secondary structure near the start codon is a universal feature of all cellular life, and that the origin of this reduction is selection for efficient recognition of the start codon by initiator-tRNA.
Abstract: Recent studies have suggested that the thermodynamic stability of mRNA secondary structure near the start codon can regulate translation efficiency in Escherichia coli, and that translation is more efficient the less stable the secondary structure. We survey the complete genomes of 340 species for signals of reduced mRNA secondary structure near the start codon. Our analysis includes bacteria, archaea, fungi, plants, insects, fishes, birds, and mammals. We find that nearly all species show evidence for reduced mRNA stability near the start codon. The reduction in stability generally increases with increasing genomic GC content. In prokaryotes, the reduction also increases with decreasing optimal growth temperature. Within genomes, there is variation in the stability among genes, and this variation correlates with gene GC content, codon bias, and gene expression level. For birds and mammals, however, we do not find a genome-wide trend of reduced mRNA stability near the start codon. Yet the most GC rich genes in these organisms do show such a signal. We conclude that reduced stability of the mRNA secondary structure near the start codon is a universal feature of all cellular life. We suggest that the origin of this reduction is selection for efficient recognition of the start codon by initiator-tRNA.
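
The per-gene statistic behind such surveys is the predicted minimum free energy (MFE) of a folding window around the start codon. A sketch using the ViennaRNA Python bindings, assuming they are installed; the window coordinates are illustrative, not the paper's exact choices:

```python
import RNA   # ViennaRNA Python bindings (assumed installed)

def start_codon_stability(mrna, start_pos, upstream=4, downstream=33):
    """MFE (kcal/mol) of the window around the start codon; values closer
    to zero indicate weaker predicted secondary structure."""
    window = mrna[max(0, start_pos - upstream): start_pos + downstream]
    _structure, mfe = RNA.fold(window)
    return mfe
```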

Journal ArticleDOI
TL;DR: It is demonstrated that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2.
Abstract: Metazoan genomes encode hundreds of RNA-binding proteins (RBPs). These proteins regulate post-transcriptional gene expression and have critical roles in numerous cellular processes including mRNA splicing, export, stability and translation. Despite their ubiquity and importance, the binding preferences for most RBPs are not well characterized. In vitro and in vivo studies, using affinity selection-based approaches, have successfully identified RNA sequence associated with specific RBPs; however, it is difficult to infer RBP sequence and structural preferences without specifically designed motif finding methods. In this study, we introduce a new motif-finding method, RNAcontext, designed to elucidate RBP-specific sequence and structural preferences with greater accuracy than existing approaches. We evaluated RNAcontext on recently published in vitro and in vivo RNA affinity selected data and demonstrate that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2. The predicted preferences for SF2/ASF are consistent with its recently reported in vivo binding sites. RNAcontext is an accurate and efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures.

Journal ArticleDOI
TL;DR: In the single compartment models studied here, AP energy consumption varies greatly among vertebrate and invertebrate neurons, with several mammalian neuron models using close to the capacitive minimum of energy needed.
Abstract: The initiation and propagation of action potentials (APs) places high demands on the energetic resources of neural tissue. Each AP forces ATP-driven ion pumps to work harder to restore the ionic concentration gradients, thus consuming more energy. Here, we ask whether the ionic currents underlying the AP can be predicted theoretically from the principle of minimum energy consumption. A long-held supposition that APs are energetically wasteful, based on theoretical analysis of the squid giant axon AP, has recently been overturned by studies that measured the currents contributing to the AP in several mammalian neurons. In the single compartment models studied here, AP energy consumption varies greatly among vertebrate and invertebrate neurons, with several mammalian neuron models using close to the capacitive minimum of energy needed. Strikingly, energy consumption can increase by more than ten-fold simply by changing the overlap of the Na(+) and K(+) currents during the AP without changing the AP's shape. As a consequence, the height and width of the AP are poor predictors of energy consumption. In the Hodgkin-Huxley model of the squid axon, optimizing the kinetics or number of Na(+) and K(+) channels can whittle down the number of ATP molecules needed for each AP by a factor of four. In contrast to the squid AP, the temporal profiles of the currents underlying the APs of some mammalian neurons are nearly perfectly matched to the optimized properties of ionic conductances so as to minimize the ATP cost.
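
The ATP cost itself follows from the Na(+) charge transfer: the Na+/K+ pump extrudes three Na+ ions per ATP hydrolyzed, so the cost per spike is the integrated Na+ charge divided by three elementary charges. A minimal sketch; the current trace is assumed to come from a simulation or recording:

```python
import numpy as np

ELEMENTARY_CHARGE = 1.602e-19                 # coulombs

def atp_per_spike(i_na, dt):
    """i_na: Na+ current samples in amps; dt: sample interval in seconds."""
    q_na = np.sum(np.abs(i_na)) * dt          # total Na+ charge moved (C)
    # Overlap of Na+ and K+ currents inflates q_na above the capacitive
    # minimum C * dV without changing the spike shape -- the effect the
    # abstract highlights.
    return q_na / (3 * ELEMENTARY_CHARGE)     # 3 Na+ extruded per ATP
```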

Journal ArticleDOI
TL;DR: The hypothesis that G4 DNA has in vivo functions that are under evolutionary constraint is supported, as the nucleotide-level conservation patterns suggest that the motif conservation results from the formation of G4 DNA structures.
Abstract: G-quadruplex DNA is a four-stranded DNA structure formed by non-Watson-Crick base pairing between stacked sets of four guanines. Many possible functions have been proposed for this structure, but its in vivo role in the cell is still largely unresolved. We carried out a genome-wide survey of the evolutionary conservation of regions with the potential to form G-quadruplex DNA structures (G4 DNA motifs) across seven yeast species. We found that G4 DNA motifs were significantly more conserved than expected by chance, and the nucleotide-level conservation patterns suggested that the motif conservation was the result of the formation of G4 DNA structures. We characterized the association of conserved and non-conserved G4 DNA motifs in Saccharomyces cerevisiae with more than 40 known genome features and gene classes. Our comprehensive, integrated evolutionary and functional analysis confirmed the previously observed associations of G4 DNA motifs with promoter regions and the rDNA, and it identified several previously unrecognized associations of G4 DNA motifs with genomic features, such as mitotic and meiotic double-strand break sites (DSBs). Conserved G4 DNA motifs maintained strong associations with promoters and the rDNA, but not with DSBs. We also performed the first analysis of G4 DNA motifs in the mitochondria, and surprisingly found a tenfold higher concentration of the motifs in the AT-rich yeast mitochondrial DNA than in nuclear DNA. The evolutionary conservation of the G4 DNA motif and its association with specific genome features supports the hypothesis that G4 DNA has in vivo functions that are under evolutionary constraint.
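
Genome scans of this kind typically define a G4 DNA motif as four tracts of three or more guanines separated by short loops. A regex sketch with common default parameters, not necessarily the exact parameters used in this study:

```python
import re

# Four G-tracts of >= 3 guanines, separated by loops of 1-7 bases.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")

def find_g4_motifs(seq):
    return [(m.start(), m.end(), m.group())
            for m in G4_MOTIF.finditer(seq.upper())]

print(find_g4_motifs("TTGGGAGGGTGGGAAGGGTT"))  # one canonical G4 motif
```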

Journal ArticleDOI
TL;DR: Through the use of contraction theory, a powerful tool from dynamical systems theory, it is shown that certain systems driven by external periodic signals have the property that all their solutions converge to a fixed limit cycle.
Abstract: This paper addresses the problem of providing mathematical conditions that allow one to ensure that biological networks, such as transcriptional systems, can be globally entrained to external periodic inputs. Despite appearing obvious at first, this is by no means a generic property of nonlinear dynamical systems. Through the use of contraction theory, a powerful tool from dynamical systems theory, it is shown that certain systems driven by external periodic signals have the property that all their solutions converge to a fixed limit cycle. General results are proved, and the properties are verified in the specific cases of models of transcriptional systems as well as constructs of interest in synthetic biology. A self-contained exposition of all needed results is given in the paper.
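
The central condition can be stated compactly: if the matrix measure of the system's Jacobian is uniformly negative, any two trajectories contract exponentially, and a periodic input then entrains every solution to a unique periodic orbit. An informal statement in LaTeX, suppressing technical hypotheses:

```latex
% Contraction condition for \dot{x} = f(x, t): a uniformly negative matrix
% measure of the Jacobian implies exponential convergence of trajectories.
\mu\!\left(\frac{\partial f}{\partial x}(x,t)\right) \;\le\; -c \;<\; 0
\quad \text{for all } x, t
\;\;\Longrightarrow\;\;
\|x_1(t)-x_2(t)\| \;\le\; e^{-ct}\,\|x_1(0)-x_2(0)\|.
% If, in addition, f(x, t) is T-periodic in t, every solution converges to a
% unique, globally attracting T-periodic orbit: global entrainment.
```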