
Showing papers in "PLOS Computational Biology in 2007"


Journal ArticleDOI
TL;DR: Efficiency was reduced disproportionately to cost in older people, and the detrimental effects of age on efficiency were localised to frontal and temporal cortical and subcortical regions.
Abstract: Brain anatomical networks are sparse, complex, and have economical small-world properties. We investigated the efficiency and cost of human brain functional networks measured using functional magnetic resonance imaging (fMRI) in a factorial design: two groups of healthy old (N = 11; mean age = 66.5 years) and healthy young (N = 15; mean age = 24.7 years) volunteers were each scanned twice in a no-task or “resting” state following placebo or a single dose of a dopamine receptor antagonist (sulpiride 400 mg). Functional connectivity between 90 cortical and subcortical regions was estimated by wavelet correlation analysis, in the frequency interval 0.06–0.11 Hz, and thresholded to construct undirected graphs. These brain functional networks were small-world and economical in the sense of providing high global and local efficiency of parallel information processing for low connection cost. Efficiency was reduced disproportionately to cost in older people, and the detrimental effects of age on efficiency were localised to frontal and temporal cortical and subcortical regions. Dopamine antagonism also impaired global and local efficiency of the network, but this effect was differentially localised and did not interact with the effect of age. Brain functional networks have economical small-world properties—supporting efficient parallel information transfer at relatively low cost—which are differently impaired by normal aging and pharmacological blockade of dopamine transmission.
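The efficiency measures used in this study can be illustrated with a short sketch. This is not the study's code; it is a stdlib-only toy that computes global efficiency (mean inverse shortest-path length over node pairs) and local efficiency (mean efficiency of each node's neighbourhood subgraph) on a hypothetical four-node graph.

```python
# Illustrative sketch (not the paper's code): global and local efficiency
# of a small undirected graph, in the sense of Latora & Marchiori.
from collections import deque

def shortest_paths(adj, src):
    """BFS distances from src; unreachable nodes are simply absent."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def global_efficiency(adj):
    """Average inverse shortest-path length over all ordered node pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        dist = shortest_paths(adj, u)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))

def local_efficiency(adj):
    """Mean global efficiency of each node's neighbourhood subgraph."""
    effs = []
    for u in adj:
        nbrs = adj[u]
        sub = {v: adj[v] & nbrs for v in nbrs}
        effs.append(global_efficiency(sub))
    return sum(effs) / len(effs) if effs else 0.0

# Toy 4-node network: a triangle (0, 1, 2) plus a pendant node 3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(round(global_efficiency(adj), 3))  # 0.833
print(round(local_efficiency(adj), 3))   # 0.583
```

Connection cost in this framework is simply the number of edges relative to the maximum possible, so an "economical" network keeps both efficiency scores high at a low edge count.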

2,208 citations


Journal ArticleDOI
TL;DR: A statistical framework for discovering enriched sequence elements in ranked lists is implemented in a software application, termed DRIM (discovery of rank imbalanced motifs), and shown to be highly effective for identifying regulatory sequence elements in applications ranging from ChIP–chip to CpG methylation data.
Abstract: Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
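The ranked-list enrichment idea behind DRIM can be sketched as follows. This toy (not the DRIM implementation) scans every prefix of a ranked 0/1 motif-occurrence vector and keeps the best hypergeometric tail, i.e., the minimal hypergeometric (mHG) score; the exact corrected p-value that DRIM computes requires an additional dynamic-programming step not shown here.

```python
# Sketch of the minimal hypergeometric (mHG) score for a ranked 0/1 list.
from math import comb

def hg_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N, B, n)."""
    return sum(comb(B, k) * comb(N - B, n - k)
               for k in range(b, min(n, B) + 1)) / comb(N, n)

def mhg_score(occ):
    """Minimum hypergeometric tail over all proper prefixes of the list."""
    N, B = len(occ), sum(occ)
    best, b = 1.0, 0
    for n in range(1, N):
        b += occ[n - 1]
        best = min(best, hg_tail(N, B, n, b))
    return best

# Motif occurrences concentrated at the top of the ranking are enriched.
print(mhg_score([1, 1, 1, 0, 1, 0, 0, 0, 0, 0]) < 0.05)  # True
```

Because the data-driven cutoff is chosen as the best prefix, the raw mHG score is optimistically biased, which is exactly why an exact multiple-testing correction (issue ii in the abstract) matters.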

687 citations


Journal ArticleDOI
TL;DR: This tutorial covers the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction based on models derived from existing data, spanning supervised and unsupervised learning, with examples implemented in R, the open-source data analysis and visualization language.
Abstract: The term machine learning refers to a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. Two facets of mechanization should be acknowledged when considering machine learning in broad terms. Firstly, it is intended that the classification and prediction tasks can be accomplished by a suitably programmed computing machine. That is, the product of machine learning is a classifier that can be feasibly used on available hardware. Secondly, it is intended that the creation of the classifier should itself be highly mechanized, and should not involve too much human input. This second facet is inevitably vague, but the basic objective is that the use of automatic algorithm construction methods can minimize the possibility that human biases could affect the selection and performance of the algorithm. Both the creation of the algorithm and its operation to classify objects or predict events are to be based on concrete, observable data. The history of relations between biology and the field of machine learning is long and complex. An early technique [1] for machine learning called the perceptron constituted an attempt to model actual neuronal behavior, and the field of artificial neural network (ANN) design emerged from this attempt. Early work on the analysis of translation initiation sequences [2] employed the perceptron to define criteria for start sites in Escherichia coli. Further artificial neural network architectures such as the adaptive resonance theory (ART) [3] and neocognitron [4] were inspired by the organization of the visual nervous system.
In the intervening years, the flexibility of machine learning techniques has grown along with mathematical frameworks for measuring their reliability, and it is natural to hope that machine learning methods will improve the efficiency of discovery and understanding in the mounting volume and complexity of biological data. This tutorial is structured in four main components. Firstly, a brief section reviews definitions and mathematical prerequisites. Secondly, the field of supervised learning is described. Thirdly, methods of unsupervised learning are reviewed. Finally, a section reviews methods and examples as implemented in the open source data analysis and visualization language R (http://www.r-project.org).
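Although the tutorial's examples are in R, the supervised-learning workflow it describes (fit a model on labelled training data, then predict on new data) can be sketched in a few lines of Python. The nearest-centroid classifier and the toy 2-D data below are illustrative choices, not taken from the tutorial.

```python
# Minimal supervised-learning sketch: a nearest-centroid classifier.
def train(points, labels):
    """Compute the mean (centroid) of each class from the training set."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab])
            for lab, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is nearest (squared Euclidean)."""
    x, y = point
    return min(centroids,
               key=lambda lab: (centroids[lab][0] - x) ** 2
                             + (centroids[lab][1] - y) ** 2)

X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
model = train(X, y)
print(predict(model, (0.5, 0.5)), predict(model, (5.5, 5.5)))  # a b
```

Note how little human input the fitting step needs once the training data exist; that mechanization is the second facet the abstract highlights.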

523 citations


Journal ArticleDOI
TL;DR: Local and global architectural features of the mammalian miR regulatory network are uncovered, revealing hundreds of “target hub” genes, each potentially subject to massive regulation by dozens of miRs, and providing new insights into the architecture of the combined transcriptional–posttranscriptional regulatory network.
Abstract: microRNAs (miRs) are small RNAs that regulate gene expression at the posttranscriptional level. It is anticipated that, in combination with transcription factors (TFs), they span a regulatory network that controls thousands of mammalian genes. Here we set out to uncover local and global architectural features of the mammalian miR regulatory network. Using evolutionarily conserved potential binding sites of miRs in human targets, and conserved binding sites of TFs in promoters, we uncovered two regulation networks. The first depicts combinatorial interactions between pairs of miRs with many shared targets. The network reveals several levels of hierarchy, whereby a few miRs interact with many other lowly connected miR partners. We revealed hundreds of “target hub” genes, each potentially subject to massive regulation by dozens of miRs. Interestingly, many of these target hub genes are transcription regulators and they are often related to various developmental processes. The second network consists of miR–TF pairs that coregulate large sets of common targets. We discovered that the network consists of several recurring motifs. Most notably, in a significant fraction of the miR–TF coregulators the TF appears to regulate the miR, or to be regulated by the miR, forming a diversity of feed-forward loops. Together these findings provide new insights on the architecture of the combined transcriptional–posttranscriptional regulatory network.

505 citations


Journal ArticleDOI
TL;DR: A new mathematical model is developed for the somatic evolution of colorectal cancers that predicts that the observed genetic diversity of cancer genomes can arise under a normal mutation rate if the average selective advantage per mutation is on the order of 1%.
Abstract: Cancer results from genetic alterations that disturb the normal cooperative behavior of cells. Recent high-throughput genomic studies of cancer cells have shown that the mutational landscape of cancer is complex and that individual cancers may evolve through mutations in as many as 20 different cancer-associated genes. We use data published by Sjoblom et al. (2006) to develop a new mathematical model for the somatic evolution of colorectal cancers. We employ the Wright-Fisher process for exploring the basic parameters of this evolutionary process and derive an analytical approximation for the expected waiting time to the cancer phenotype. Our results highlight the relative importance of selection over both the size of the cell population at risk and the mutation rate. The model predicts that the observed genetic diversity of cancer genomes can arise under a normal mutation rate if the average selective advantage per mutation is on the order of 1%. Increased mutation rates due to genetic instability would allow even smaller selective advantages during tumorigenesis. The complexity of cancer progression can be understood as the result of multiple sequential mutations, each of which has a relatively small but positive effect on net cell growth.
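A minimal Wright-Fisher simulation conveys the flavour of the evolutionary process modelled above. The population size, selective advantage, and single-mutation setup below are toy assumptions for illustration, not the paper's parameters.

```python
# Toy Wright-Fisher process with selection (illustrative, not the paper's model):
# track the frequency of a mutant with selective advantage s in a fixed
# population of N cells and count generations until the mutant fixes.
import random

def generations_to_fixation(N, s, seed=1):
    random.seed(seed)
    count, gen = 1, 0  # start from a single mutant cell
    while count < N:
        # Selection-weighted sampling probability for the mutant allele.
        p = count * (1 + s) / (count * (1 + s) + (N - count))
        count = sum(random.random() < p for _ in range(N))  # binomial resampling
        gen += 1
        if count == 0:  # mutant lineage lost to drift: reseed a new mutant
            count = 1
    return gen

print(generations_to_fixation(N=200, s=0.05) > 0)  # True
```

Even with a small advantage on the order of a few percent, most mutant lineages are lost to drift and must arise repeatedly before one sweeps, which is why selection, not mutation rate alone, dominates the waiting time.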

432 citations


Journal ArticleDOI
TL;DR: A model of transcriptional regulation networks, in which millions of different network topologies are explored, shows that robust networks are connected and evolvable, which may be a general organizational principle of biological networks.
Abstract: The topology of cellular circuits (the who-interacts-with-whom) is key to understand their robustness to both mutations and noise. The reason is that many biochemical parameters driving circuit behavior vary extensively and are thus not fine-tuned. Existing work in this area asks to what extent the function of any one given circuit is robust. But is high robustness truly remarkable, or would it be expected for many circuits of similar topology? And how can high robustness come about through gradual Darwinian evolution that changes circuit topology gradually, one interaction at a time? We here ask these questions for a model of transcriptional regulation networks, in which we explore millions of different network topologies. Robustness to mutations and noise are correlated in these networks. They show a skewed distribution, with a very small number of networks being vastly more robust than the rest. All networks that attain a given gene expression state can be organized into a graph whose nodes are networks that differ in their topology. Remarkably, this graph is connected and can be easily traversed by gradual changes of network topologies. Thus, robustness is an evolvable property. This connectedness and evolvability of robust networks may be a general organizational principle of biological networks. In addition, it exists also for RNA and protein structures, and may thus be a general organizational principle of all biological systems.
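The class of models studied here descends from Wagner-style threshold networks of transcriptional regulation. The sketch below, with an arbitrary made-up interaction matrix, shows the basic machinery: iterate s(t+1) = sign(W s(t)) until the expression state reaches a fixed point, then probe robustness by flipping one initial expression state at a time.

```python
# Toy Wagner-style threshold gene network (illustrative parameters only).
def run(W, s):
    """Iterate s(t+1) = sign(W s(t)) until a fixed point (or give up)."""
    for _ in range(50):
        nxt = tuple(1 if sum(w * x for w, x in zip(row, s)) >= 0 else -1
                    for row in W)
        if nxt == s:
            return s
        s = nxt
    return None  # no fixed point reached

W = [[1, -1, 0], [0, 1, -1], [-1, 0, 1]]  # arbitrary 3-gene interaction matrix
s0 = (1, -1, 1)
target = run(W, s0)

# Flip each initial expression state in turn; count how often the same
# final expression state is still reached (a crude robustness score).
robust = sum(run(W, tuple(-x if i == j else x for j, x in enumerate(s0))) == target
             for i in range(3))
print(target, robust)  # (1, -1, 1) 1
```

Exploring "millions of topologies", as the paper does, amounts to repeating this robustness measurement while adding, removing, or rewiring single entries of W.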

406 citations


Journal ArticleDOI
TL;DR: Different approaches to predict protein interaction partners as well as recent achievements in the prediction of specific domains mediating protein-protein interactions are described.
Abstract: Recent advances in high-throughput experimental methods for the identification of protein interactions have resulted in a large amount of diverse data that are somewhat incomplete and contradictory. As valuable as they are, such experimental approaches studying protein interactomes have certain limitations that can be complemented by the computational methods for predicting protein interactions. In this review we describe different approaches to predict protein interaction partners as well as highlight recent achievements in the prediction of specific domains mediating protein-protein interactions. We discuss the applicability of computational methods to different types of prediction problems and point out limitations common to all of them.

379 citations


Journal ArticleDOI
TL;DR: This review describes different experimental techniques of protein interaction identification together with various databases which attempt to classify the large array of experimental data and presents several approaches to verify and validate the diverse experimental data produced by high-throughput techniques.
Abstract: Proteins interact with each other in a highly specific manner, and protein interactions play a key role in many cellular processes; in particular, the distortion of protein interfaces may lead to the development of many diseases. To understand the mechanisms of protein recognition at the molecular level and to unravel the global picture of protein interactions in the cell, different experimental techniques have been developed. Some methods characterize individual protein interactions while others are advanced for screening interactions on a genome-wide scale. In this review we describe different experimental techniques of protein interaction identification together with various databases which attempt to classify the large array of experimental data. We discuss the main promises and pitfalls of different methods and present several approaches to verify and validate the diverse experimental data produced by high-throughput techniques.

333 citations


Journal ArticleDOI
TL;DR: This primer aims to introduce BNs to the computational biologist, focusing on the concepts behind methods for learning the parameters and structure of models, at a time when they are becoming the machine learning method of choice.
Abstract: Bayesian networks (BNs) provide a neat and compact representation for expressing joint probability distributions (JPDs) and for inference. They are becoming increasingly important in the biological sciences for the tasks of inferring cellular networks [1], modelling protein signalling pathways [2], systems biology, data integration [3], classification [4], and genetic data analysis [5]. The representation and use of probability theory makes BNs suitable for combining domain knowledge and data, expressing causal relationships, avoiding overfitting a model to training data, and learning from incomplete datasets. The probabilistic formalism provides a natural treatment for the stochastic nature of biological systems and measurements. This primer aims to introduce BNs to the computational biologist, focusing on the concepts behind methods for learning the parameters and structure of models, at a time when they are becoming the machine learning method of choice. There are many applications in biology where we wish to classify data; for example, gene function prediction. To solve such problems, a set of rules are required that can be used for prediction, but often such knowledge is unavailable, or in practice there turn out to be many exceptions to the rules or so many rules that this approach produces poor results. Machine learning approaches often produce better results, where a large number of examples (the training set) is used to adapt the parameters of a model that can then be used for performing predictions or classifications on data. There are many different types of models that may be required and many different approaches to training the models, each with its pros and cons. An excellent overview of the topic can be found in [6] and [7]. 
Neural networks, for example, are often able to learn a model from training data, but it is often difficult to extract information about the model, which with other methods can provide valuable insights into the data or problem being solved. A common problem in machine learning is overfitting, where the learned model is too complex and generalises poorly to unseen data. Increasing the size of the training dataset may reduce this; however, this assumes more training data is readily available, which is often not the case. In addition, often it is important to determine the uncertainty in the learned model parameters or even in the choice of model. This primer focuses on the use of BNs, which offer a solution to these issues. The use of Bayesian probability theory provides mechanisms for describing uncertainty and for adapting the number of parameters to the size of the data. Using a graphical representation provides a simple way to visualise the structure of a model. Inspection of models can provide valuable insights into the properties of the data and allow new models to be produced.
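A three-node toy network illustrates the factorised representation and enumeration-based inference the primer discusses. The Gene → Transcript → Protein structure and all conditional probabilities below are hypothetical, chosen only to make the arithmetic concrete.

```python
# Hypothetical three-node Bayesian network: Gene -> Transcript -> Protein,
# each binary. The JPD factorises as P(G, T, P) = P(G) P(T|G) P(P|T).
p_g = {1: 0.3, 0: 0.7}
p_t_given_g = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # P(T=t | G=g)
p_p_given_t = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # P(P=p | T=t)

def joint(g, t, p):
    """Joint probability from the factorised representation."""
    return p_g[g] * p_t_given_g[g][t] * p_p_given_t[t][p]

# Inference by enumeration: P(G=1 | P=1), marginalising out T.
num = sum(joint(1, t, 1) for t in (0, 1))
den = sum(joint(g, t, 1) for g in (0, 1) for t in (0, 1))
print(round(num / den, 3))  # 0.566
```

The factorised table holds 1 + 2 + 2 = 5 free parameters instead of the 7 a full joint over three binary variables would need; that compactness is what makes structure and parameter learning tractable on larger biological networks.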

287 citations


Journal ArticleDOI
TL;DR: Analysing genome-wide gene transcription across 61 mouse tissues, the authors describe the unusual topography of the large and highly structured networks produced and demonstrate how these networks can be used to visualise, cluster, and mine large datasets.
Abstract: Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express3D.
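The core construction, connecting transcripts whose expression profiles are similar across conditions, can be sketched with stdlib-only Pearson correlation. The gene names, profiles, and threshold below are invented for illustration; BioLayout Express3D itself does far more (3-D layout, clustering, interactive mining).

```python
# Sketch: build a co-expression graph by thresholding Pearson correlation
# between expression profiles (toy data).
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],  # tracks geneA across conditions
    "geneC": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with geneA
}
threshold = 0.9
genes = sorted(profiles)
edges = [(g, h) for i, g in enumerate(genes) for h in genes[i + 1:]
         if pearson(profiles[g], profiles[h]) >= threshold]
print(edges)  # [('geneA', 'geneB')]
```

The choice of threshold controls the density, and hence the topography, of the resulting graph, which is one reason these networks must be explored rather than read off.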

265 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein–protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development.
Abstract: Protein-protein interactions, a key to almost any biological process, are mediated by molecular mechanisms that are not entirely clear. The study of these mechanisms often focuses on all residues at protein-protein interfaces. However, only a small subset of all interface residues is actually essential for recognition or binding. Commonly referred to as "hotspots," these essential residues are defined as residues that impede protein-protein interactions if mutated. While no in silico tool identifies hotspots in unbound chains, numerous prediction methods were designed to identify all the residues in a protein that are likely to be a part of protein-protein interfaces. These methods typically identify successfully only a small fraction of all interface residues. Here, we analyzed the hypothesis that the two subsets correspond (i.e., that in silico methods may predict few residues because they preferentially predict hotspots). We demonstrate that this is indeed the case and that we can therefore predict directly from the sequence of a single protein which residues are interaction hotspots (without knowledge of the interaction partner). Our results suggested that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein-protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development. The server for prediction is available at http://www.rostlab.org/services/isis.

Journal ArticleDOI
TL;DR: Because none of the existing methods has yet managed to deliver biologically perfect MSAs, this review focuses on the latest developments in multiple sequence alignment, including meta-methods and template-based alignment techniques.
Abstract: An ever-increasing number of biological modeling methods depend on the assembly of an accurate multiple sequence alignment (MSA). These include phylogenetic trees, profiles, and structure prediction. Assembling a suitable MSA is not, however, a trivial task, and none of the existing methods have yet managed to deliver biologically perfect MSAs. Many of the algorithms published these last years have been extensively described [1–3], and this review focuses only on the latest developments, including meta-methods and template-based alignment techniques.

Journal ArticleDOI
TL;DR: A central contribution of this work is to rigorously show that hit/commute times have physical origins directly relevant to the equilibrium fluctuations of residues predicted by EN models.
Abstract: Elastic network (EN) models have been widely used in recent years for describing protein dynamics, based on the premise that the motions naturally accessible to native structures are relevant to biological function. We posit that equilibrium motions also determine communication mechanisms inherent to the network architecture. To this end, we explore the stochastics of a discrete-time, discrete-state Markov process of information transfer across the network of residues. We measure the communication abilities of residue pairs in terms of hit and commute times, i.e., the number of steps it takes on an average to send and receive signals. Functionally active residues are found to possess enhanced communication propensities, evidenced by their short hit times. Furthermore, secondary structural elements emerge as efficient mediators of communication. The present findings provide us with insights on the topological basis of communication in proteins and design principles for efficient signal transduction. While hit/commute times are information-theoretic concepts, a central contribution of this work is to rigorously show that they have physical origins directly relevant to the equilibrium fluctuations of residues predicted by EN models.
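Hit times of the kind used above can be computed for a toy walk with a few lines of fixed-point iteration: h[i] = 1 + mean of h over i's neighbours, with h fixed at 0 on the target. The three-node path graph below is an illustrative stand-in for a residue network, not an actual elastic network model.

```python
# Hit times of a random walk on a small unweighted graph (toy residue network).
def hit_times(adj, target, iters=2000):
    """Fixed-point iteration on h[i] = 1 + mean(h[j] for j in adj[i])."""
    h = {u: 0.0 for u in adj}
    for _ in range(iters):
        for u in adj:
            if u != target:
                h[u] = 1.0 + sum(h[v] for v in adj[u]) / len(adj[u])
    return h

# Path graph 0-1-2: expected number of steps to first reach node 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
h = hit_times(adj, target=2)
print(round(h[0]), round(h[1]))  # 4 3
```

The commute time between two nodes is just the sum of the two directed hit times; short hit times for functionally active residues are what the study reads as enhanced communication propensity.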

Journal ArticleDOI
TL;DR: This work presents a model for the self-organized formation of place cells, head-direction cells, and spatial-view cells in the hippocampal formation based on unsupervised learning on quasi-natural visual stimuli, which comprises a hierarchy of Slow Feature Analysis nodes.
Abstract: We present a model for the self-organized formation of place cells, head-direction cells, and spatial-view cells in the hippocampal formation based on unsupervised learning on quasi-natural visual stimuli. The model comprises a hierarchy of Slow Feature Analysis (SFA) nodes, which were recently shown to reproduce many properties of complex cells in the early visual system [1]. The system extracts a distributed grid-like representation of position and orientation, which is transcoded into a localized place-field, head-direction, or view representation, by sparse coding. The type of cells that develops depends solely on the relevant input statistics, i.e., the movement pattern of the simulated animal. The numerical simulations are complemented by a mathematical analysis that allows us to accurately predict the output of the top SFA layer.

Journal ArticleDOI
TL;DR: It is demonstrated that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy.
Abstract: To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

Journal ArticleDOI
TL;DR: The results show that the process of aggregation into either ordered or amorphous species is largely determined by a competition between the hydrophobicity of the amino acid sequence and the tendency of polypeptide chains to form arrays of hydrogen bonds.
Abstract: Increasing evidence indicates that oligomeric protein assemblies may represent the molecular species responsible for cytotoxicity in a range of neurological disorders including Alzheimer and Parkinson diseases. We use all-atom computer simulations to reveal that the process of oligomerization can be divided into two steps. The first is characterised by a hydrophobic coalescence resulting in the formation of molten oligomers in which hydrophobic residues are sequestered away from the solvent. In the second step, the oligomers undergo a process of reorganisation driven by interchain hydrogen bonding interactions that induce the formation of β sheet rich assemblies in which hydrophobic groups can become exposed. Our results show that the process of aggregation into either ordered or amorphous species is largely determined by a competition between the hydrophobicity of the amino acid sequence and the tendency of polypeptide chains to form arrays of hydrogen bonds. We discuss how the increase in solvent-exposed hydrophobic surface resulting from such a competition offers an explanation for recent observations concerning the cytotoxicity of oligomeric species formed prior to mature amyloid fibrils.

Journal ArticleDOI
TL;DR: The analysis of the functional relationship between the TA and measured protein concentrations suggests that the TA follows Michaelis–Menten kinetics, and a significant correlation to recently published degradation rates supports this approach.
Abstract: Recent analyses indicate that differences in protein concentrations are only 20%–40% attributable to variable mRNA levels, underlining the importance of posttranscriptional regulation. Generally, protein concentrations depend on the translation rate (which is proportional to the translational activity, TA) and the degradation rate. By integrating 12 publicly available large-scale datasets and additional database information of the yeast Saccharomyces cerevisiae, we systematically analyzed five factors contributing to TA: mRNA concentration, ribosome density, ribosome occupancy, the codon adaptation index, and a newly developed “tRNA adaptation index.” Our analysis of the functional relationship between the TA and measured protein concentrations suggests that the TA follows Michaelis–Menten kinetics. The calculated TA, together with measured protein concentrations, allowed us to estimate degradation rates for 4,125 proteins under standard conditions. A significant correlation to recently published degradation rates supports our approach. Moreover, based on a newly developed scoring system, we identified and analyzed genes subjected to the posttranscriptional regulation mechanism, translation on demand. Next we applied these findings to publicly available data of protein and mRNA concentrations under four stress conditions. The integration of these measurements allowed us to compare the condition-specific responses at the posttranscriptional level. Our analysis of all 62 proteins that have been measured under all four conditions revealed proteins with very specific posttranscriptional stress response, in contrast to more generic responders, which were nonspecifically regulated under several conditions. The concept of specific and generic responders is known for transcriptional regulation. Here we show that it also holds true at the posttranscriptional level.
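The steady-state bookkeeping underlying the degradation-rate estimates can be sketched as follows, with made-up numbers: if the synthesis rate follows Michaelis–Menten kinetics in the translational activity TA, then at steady state the degradation rate constant is the synthesis rate divided by the protein concentration.

```python
# Illustrative steady-state estimate (all parameter values are hypothetical).
def synthesis_rate(ta, vmax, km):
    """Michaelis-Menten: rate = Vmax * TA / (Km + TA)."""
    return vmax * ta / (km + ta)

def degradation_rate(ta, protein_conc, vmax=100.0, km=50.0):
    """At steady state, production balances degradation:
    k_deg * [protein] = synthesis_rate, so k_deg = synthesis / [protein]."""
    return synthesis_rate(ta, vmax, km) / protein_conc

print(round(degradation_rate(ta=50.0, protein_conc=500.0), 3))  # 0.1
```

Given measured protein concentrations and a calibrated TA, this balance is what lets degradation rates be estimated for thousands of proteins without measuring them directly.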

Journal ArticleDOI
TL;DR: The total quasi-steady state approximation provides an excellent kinetic formalism for protein interaction networks, because it unveils the modular structure of the enzymatic reactions, it suggests a simple algorithm to formulate correct kinetic equations, and it succeeds in faithfully reproducing the dynamics of the network both qualitatively and quantitatively.
Abstract: In metabolic networks, metabolites are usually present in great excess over the enzymes that catalyze their interconversion, and describing the rates of these reactions by using the Michaelis–Menten rate law is perfectly valid. This rate law assumes that the concentration of enzyme–substrate complex (C) is much less than the free substrate concentration (S0). However, in protein interaction networks, the enzymes and substrates are all proteins in comparable concentrations, and neglecting C with respect to S0 is not valid. Borghans, DeBoer, and Segel developed an alternative description of enzyme kinetics that is valid when C is comparable to S0. We extend this description, which Borghans et al. call the total quasi-steady state approximation, to networks of coupled enzymatic reactions. First, we analyze an isolated Goldbeter–Koshland switch when enzymes and substrates are present in comparable concentrations. Then, on the basis of a real example of the molecular network governing cell cycle progression, we couple two and three Goldbeter–Koshland switches together to study the effects of feedback in networks of protein kinases and phosphatases. Our analysis shows that the total quasi-steady state approximation provides an excellent kinetic formalism for protein interaction networks, because (1) it unveils the modular structure of the enzymatic reactions, (2) it suggests a simple algorithm to formulate correct kinetic equations, and (3) contrary to classical Michaelis–Menten kinetics, it succeeds in faithfully reproducing the dynamics of the network both qualitatively and quantitatively.
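The gap between the classical Michaelis–Menten estimate of the enzyme–substrate complex C and the tQSSA estimate is easy to see numerically. The sketch below uses toy totals with enzyme and substrate in comparable concentrations, exactly the regime where neglecting C relative to the substrate fails.

```python
# Compare Michaelis-Menten and total quasi-steady state (tQSSA) estimates
# of the enzyme-substrate complex C (toy concentrations).
from math import sqrt

def c_mm(e_tot, s_tot, km):
    """Classical MM assumes C << S, so C ~= E_T * S / (Km + S)."""
    return e_tot * s_tot / (km + s_tot)

def c_tqssa(e_tot, s_tot, km):
    """tQSSA: C is the smaller root of
    C^2 - (E_T + S_T + Km) C + E_T S_T = 0."""
    b = e_tot + s_tot + km
    return (b - sqrt(b * b - 4 * e_tot * s_tot)) / 2

# Enzyme and substrate totals comparable, as in protein interaction networks.
e_tot, s_tot, km = 1.0, 1.0, 0.1
print(round(c_mm(e_tot, s_tot, km), 3), round(c_tqssa(e_tot, s_tot, km), 3))
```

The tQSSA root can never exceed either total concentration, whereas the MM expression ignores enzyme sequestration and overestimates C in this regime; coupling several such reactions is what the Goldbeter–Koshland analysis in the paper builds on.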

Journal ArticleDOI
TL;DR: It is found that severe epidemics cannot be prevented unless vaccination programs offer incentives, and incentive-based vaccination programs are necessary to control influenza, but some may be detrimental.
Abstract: Previous modeling studies have identified the vaccination coverage level necessary for preventing influenza epidemics, but have not shown whether this critical coverage can be reached. Here we use computational modeling to determine, for the first time, whether the critical coverage for influenza can be achieved by voluntary vaccination. We construct a novel individual-level model of human cognition and behavior; individuals are characterized by two biological attributes (memory and adaptability) that they use when making vaccination decisions. We couple this model with a population-level model of influenza that includes vaccination dynamics. The coupled models allow individual-level decisions to influence influenza epidemiology and, conversely, influenza epidemiology to influence individual-level decisions. By including the effects of adaptive decision-making within an epidemic model, we can reproduce two essential characteristics of influenza epidemiology: annual variation in epidemic severity and sporadic occurrence of severe epidemics. We suggest that individual-level adaptive decision-making may be an important (previously overlooked) causal factor in driving influenza epidemiology. We find that severe epidemics cannot be prevented unless vaccination programs offer incentives. Frequency of severe epidemics could be reduced if programs provide, as an incentive to be vaccinated, several years of free vaccines to individuals who pay for one year of vaccination. Magnitude of epidemic amelioration will be determined by the number of years of free vaccination, individuals' adaptability in decision-making, and their memory. This type of incentive program could control epidemics if individuals are very adaptable and have long-term memories. However, incentive-based programs that provide free vaccination for families could increase the frequency of severe epidemics.
We conclude that incentive-based vaccination programs are necessary to control influenza, but some may be detrimental. Surprisingly, we find that individuals' memories and flexibility in adaptive decision-making can be extremely important factors in determining the success of influenza vaccination programs. Finally, we discuss the implication of our results for controlling pandemics.
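A toy version of the adaptive decision-making loop can be sketched as follows. The update rule, outcome definitions, and all parameter values are assumptions for illustration, not the paper's model: individuals raise or lower their propensity to vaccinate based on a bounded memory of bad outcomes, and coverage then feeds back on whether an epidemic occurs.

```python
import random

def simulate_seasons(seasons=50, n=1000, critical=0.6,
                     adaptability=0.4, memory=3, seed=1):
    # Toy model of voluntary vaccination (illustrative only).
    random.seed(seed)
    p_vac = [0.5] * n                  # each individual's propensity to vaccinate
    regrets = [[] for _ in range(n)]   # remembered bad outcomes (bounded memory)
    coverages = []
    for _ in range(seasons):
        vaccinated = [random.random() < p for p in p_vac]
        coverage = sum(vaccinated) / n
        coverages.append(coverage)
        epidemic = coverage < critical   # epidemic if coverage is subcritical
        for i in range(n):
            # A "bad" season: unvaccinated during an epidemic, or paying
            # for a vaccine in a season with no epidemic.
            bad = vaccinated[i] != epidemic
            regrets[i] = (regrets[i] + [1 if bad else 0])[-memory:]
            if bad:
                # Shift the propensity in proportion to remembered regret.
                shift = adaptability * sum(regrets[i]) / len(regrets[i])
                if vaccinated[i]:
                    p_vac[i] = max(0.0, p_vac[i] - shift)
                else:
                    p_vac[i] = min(1.0, p_vac[i] + shift)
    return coverages

coverages = simulate_seasons()
print(min(coverages), max(coverages))
```

With these settings, coverage tends to hover around the critical level and fluctuates from season to season, a crude analogue of the annual variation described above.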

Journal ArticleDOI
TL;DR: A top-down approach to the symptoms of schizophrenia is proposed based on a statistical dynamical framework that shows that a reduced depth in the basins of attraction of cortical attractor states destabilizes the activity at the network level due to the constant statistical fluctuations caused by the stochastic spiking of neurons.
Abstract: We propose a top-down approach to the symptoms of schizophrenia based on a statistical dynamical framework. We show that a reduced depth in the basins of attraction of cortical attractor states destabilizes the activity at the network level due to the constant statistical fluctuations caused by the stochastic spiking of neurons. In integrate-and-fire network simulations, a decrease in the NMDA receptor conductances, which reduces the depth of the attractor basins, decreases the stability of short-term memory states and increases distractibility. The cognitive symptoms of schizophrenia such as distractibility, working memory deficits, or poor attention could be caused by this instability of attractor states in prefrontal cortical networks. Lower firing rates are also produced, which in the orbitofrontal and anterior cingulate cortex could account for the negative symptoms, including a reduction of emotions. Decreasing the GABA as well as the NMDA conductances produces not only switches between the attractor states, but also jumps from spontaneous activity into one of the attractors. We relate this to the positive symptoms of schizophrenia, including delusions, paranoia, and hallucinations, which may arise because the basins of attraction are shallow and there is instability in temporal lobe semantic memory networks, leading thoughts to move too freely round the attractor energy landscape.
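The link between basin depth and stability can be caricatured with an overdamped particle in a double-well energy landscape. This is not the paper's spiking network: the well depth loosely plays the role of NMDA conductance, and all values are illustrative.

```python
import math, random

def basin_switches(depth, steps=20000, dt=0.01, noise=1.0, seed=2):
    # Overdamped Langevin dynamics in a double-well "energy landscape"
    # U(x) = depth * (x^2 - 1)^2, a toy stand-in for an attractor network.
    random.seed(seed)
    x, side, switches = 1.0, 1, 0
    for _ in range(steps):
        force = -4.0 * depth * x * (x * x - 1.0)   # -dU/dx
        x += force * dt + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
        if x * side < 0:        # crossed the barrier into the other basin
            switches += 1
            side = -side
    return switches

shallow, deep = basin_switches(1.0), basin_switches(4.0)
print(shallow, deep)
```

The shallow landscape produces far more noise-driven switches between the two attractor states than the deep one, mirroring the claim that reduced basin depth destabilizes network states under stochastic fluctuations.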

Journal ArticleDOI
TL;DR: The results show that axonal variability is a general problem and should be taken into account when considering both neural coding and the reliability of synaptic transmission in densely connected cortical networks, where small synapses are typically innervated by thin axons.
Abstract: It is generally assumed that axons use action potentials (APs) to transmit information fast and reliably to synapses. Yet, the reliability of transmission along fibers below 0.5 μm diameter, such as cortical and cerebellar axons, is unknown. Using detailed models of rodent cortical and squid axons and stochastic simulations, we show how conduction along such thin axons is affected by the probabilistic nature of voltage-gated ion channels (channel noise). We identify four distinct effects that corrupt propagating spike trains in thin axons: spikes were added, deleted, jittered, or split into groups depending upon the temporal pattern of spikes. Additional APs may appear spontaneously; however, APs in general seldom fail (<1%). Spike timing is jittered on the order of milliseconds over distances of millimeters, as conduction velocity fluctuates in two ways. First, variability in the number of Na channels opening in the early rising phase of the AP causes propagation speed to fluctuate gradually. Second, a novel mode of AP propagation (stochastic microsaltatory conduction), where the AP leaps ahead toward spontaneously formed clusters of open Na channels, produces random discrete jumps in spike timing. The combined effect of these two mechanisms depends on the pattern of spikes. Our results show that axonal variability is a general problem and should be taken into account when considering both neural coding and the reliability of synaptic transmission in densely connected cortical networks, where small synapses are typically innervated by thin axons. In contrast, we find that thicker axons above 0.5 μm diameter are reliable.
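A crude caricature of the first mechanism, channel-count fluctuations accumulating into timing jitter, can be sketched in a few lines. This is not the paper's detailed stochastic Hodgkin–Huxley model: the Gaussian approximation to the binomial open-channel count and all parameters are assumptions.

```python
import math, random

def arrival_time_sd(n_channels, segments=200, p_open=0.6, v0=1.0,
                    seg_len=1.0, trials=200, seed=0):
    # Conduction speed in each axon segment scales with the fraction of
    # Na channels that happen to open (binomial, approximated here by a
    # Gaussian); timing jitter accumulates over distance.
    random.seed(seed)
    sigma = math.sqrt(n_channels * p_open * (1.0 - p_open))
    arrivals = []
    for _ in range(trials):
        t = 0.0
        for _ in range(segments):
            k = random.gauss(n_channels * p_open, sigma)
            frac = min(max(k / n_channels, 1e-3), 1.0)
            t += seg_len / (v0 * frac)   # fewer open channels -> slower segment
        arrivals.append(t)
    mean = sum(arrivals) / trials
    return math.sqrt(sum((a - mean) ** 2 for a in arrivals) / trials)

thin, thick = arrival_time_sd(20), arrival_time_sd(2000)
print(thin, thick)
```

Because the relative fluctuation of the open fraction shrinks as 1/sqrt(n), the "thin axon" (few channels per segment) shows roughly an order of magnitude more arrival-time jitter than the "thick" one.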

Journal ArticleDOI
TL;DR: The study provides estimates for parameters that can be directly used in mathematical and computational models to study how NI usage might lead to the emergence and spread of resistance in the population and finds that the initial generation of resistant cases is most likely lower than the fraction reported.
Abstract: Neuraminidase Inhibitors (NI) are currently the most effective drugs against influenza. Recent cases of NI resistance are a cause for concern. To assess the danger of NI resistance, a number of studies have reported the fraction of treated patients from which resistant strains could be isolated. Unfortunately, those results strongly depend on the details of the experimental protocol. Additionally, knowing the fraction of patients harboring resistance is of limited use by itself. Instead, we want to know how likely it is that an infected patient can generate a resistant infection in a secondary host, and how likely it is that the resistant strain subsequently spreads. While estimates for these parameters can often be obtained from epidemiological data, such data are lacking for NI resistance in influenza. Here, we use an approach that does not rely on epidemiological data. Instead, we combine data from influenza infections of human volunteers with a mathematical framework that allows estimation of the parameters that govern the initial generation and subsequent spread of resistance. We show how these parameters are influenced by changes in drug efficacy, timing of treatment, fitness of the resistant strain, and details of virus and immune system dynamics. Our study provides estimates for parameters that can be directly used in mathematical and computational models to study how NI usage might lead to the emergence and spread of resistance in the population. We find that the initial generation of resistant cases is most likely lower than the fraction of resistant cases reported. However, we also show that the results depend strongly on the details of the within-host dynamics of influenza infections, and most importantly, the role the immune system plays. Better knowledge of the quantitative dynamics of the immune response during influenza infections will be crucial to further improve the results.
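The qualitative point, that treatment suppresses the sensitive strain and can thereby release a mutant that pays a fitness cost, can be sketched with a minimal two-strain within-host model. All parameter values below are invented for illustration; they are not the estimates produced by the paper.

```python
def resistant_fraction(efficacy, fitness_cost=0.1, mu=1e-5,
                       r0=5.0, d=2.0, K=1e8, days=7.0, dt=0.001):
    # Minimal two-strain model: the drug lowers the sensitive strain's
    # replication rate; mutation seeds the resistant strain, which pays
    # a fitness cost; both strains share a limited resource.
    Vs, Vr = 1e2, 0.0
    for _ in range(int(days / dt)):
        growth = 1.0 - (Vs + Vr) / K          # shared resource limitation
        rs = r0 * (1.0 - efficacy)            # drug-suppressed replication
        rr = r0 * (1.0 - fitness_cost)
        dVs = (rs * growth * (1.0 - mu) - d) * Vs
        dVr = (rr * growth - d) * Vr + mu * rs * growth * Vs
        Vs = max(Vs + dVs * dt, 0.0)
        Vr = max(Vr + dVr * dt, 0.0)
    total = Vs + Vr
    return Vr / total if total > 0.0 else 0.0

no_drug, treated = resistant_fraction(0.0), resistant_fraction(0.9)
print(no_drug, treated)
```

Without drug the resistant mutant remains a tiny minority; with a highly effective drug the sensitive strain is driven below replacement and the resistant strain dominates by the end of the infection.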

Journal ArticleDOI
TL;DR: Comparing protein abundance levels across poor and rich media, the authors find a general trend for homeostatic regulation where transcription and translation change in a reciprocal manner and show that in parallel to the adaptation occurring at the tRNA level via the codon bias, proteins do undergo a complementary adaptation at the amino acid level to further increase their abundance.
Abstract: The translation efficiency of most Saccharomyces cerevisiae genes remains fairly constant across poor and rich growth media. This observation has led us to revisit the available data and to examine the potential utility of a protein abundance predictor in reinterpreting existing mRNA expression data. Our predictor is based on large-scale data of mRNA levels, the tRNA adaptation index, and the evolutionary rate. It attains a correlation of 0.76 with experimentally determined protein abundance levels on unseen data and successfully cross-predicts protein abundance levels in another yeast species (Schizosaccharomyces pombe). The predicted abundance levels of proteins in known S. cerevisiae complexes, and of interacting proteins, are significantly more coherent than their corresponding mRNA expression levels. Analysis of gene expression measurement experiments using the predicted protein abundance levels yields new insights that are not readily discernable when clustering the corresponding mRNA expression levels. Comparing protein abundance levels across poor and rich media, we find a general trend for homeostatic regulation where transcription and translation change in a reciprocal manner. This phenomenon is more prominent near origins of replication. Our analysis shows that in parallel to the adaptation occurring at the tRNA level via the codon bias, proteins do undergo a complementary adaptation at the amino acid level to further increase their abundance.
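The logic of building such a predictor and checking its correlation can be sketched on synthetic data. The feature model and coefficients below are assumptions for illustration, not the paper's fitted predictor, which combines mRNA levels, the tRNA adaptation index, and evolutionary rate.

```python
import math, random

def pearson(x, y):
    # Pearson correlation between two equal-length lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(7)
genes = 500
# Synthetic stand-ins for the predictor's inputs (assumed, for illustration):
log_mrna = [random.gauss(5.0, 1.0) for _ in range(genes)]
tai = [random.uniform(0.2, 0.8) for _ in range(genes)]
# "Measured" protein abundance: translation output scales with mRNA level
# and tAI, plus measurement noise.
log_protein = [m + 2.0 * t + random.gauss(0.0, 0.5)
               for m, t in zip(log_mrna, tai)]
predicted = [m + 2.0 * t for m, t in zip(log_mrna, tai)]
r = pearson(predicted, log_protein)
print(round(r, 3))
```

On this synthetic data the combined predictor correlates far better with "protein" levels than either feature would alone, which is the kind of evaluation the 0.76 figure above summarizes on real data.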

Journal ArticleDOI
TL;DR: This study shows that the tight coupling of physical with neuronal control, guided by sensory feedback from the walking pattern itself, combined with synaptic learning may be a way forward to better understand and solve coordination problems in other complex motor tasks.
Abstract: Human walking is a dynamic, partly self-stabilizing process relying on the interaction of the biomechanical design with its neuronal control. The coordination of this process is a very difficult problem, and it has been suggested that it involves a hierarchy of levels, where the lower ones, e.g., interactions between muscles and the spinal cord, are largely autonomous, and where higher level control (e.g., cortical) arises only pointwise, as needed. This requires an architecture of several nested, sensorimotor loops where the walking process provides feedback signals to the walker's sensory systems, which can be used to coordinate its movements. To complicate the situation, at a maximal walking speed of more than four leg-lengths per second, the cycle period available to coordinate all these loops is rather short. In this study we present a planar biped robot, which uses the design principle of nested loops to combine the self-stabilizing properties of its biomechanical design with several levels of neuronal control. Specifically, we show how to adapt control by including online learning mechanisms based on simulated synaptic plasticity. This robot can walk with a high speed (>3.0 leg lengths/s), self-adapting to minor disturbances, and reacting in a robust way to abruptly induced gait changes. At the same time, it can learn walking on different terrains, requiring only a few learning experiences. This study shows that the tight coupling of physical with neuronal control, guided by sensory feedback from the walking pattern itself, combined with synaptic learning may be a way forward to better understand and solve coordination problems in other complex motor tasks.
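The flavour of online adaptation by simulated synaptic plasticity can be sketched with a single feedback gain adapted by a correlation-based rule. The one-dimensional "plant", the learning rule, and all parameters below are invented for illustration; they are only loosely inspired by the robot's learning mechanisms.

```python
import random

def learn_feedback_gain(steps=5000, a=0.9, eta=0.05, noise=0.1, seed=4):
    # Toy correlation-based adaptation: a state x is perturbed by noise,
    # a feedback gain g counteracts it, and g grows while the state stays
    # temporally correlated (i.e., while feedback is too weak).
    random.seed(seed)
    x, g = 0.0, 0.0
    errors_early, errors_late = [], []
    for t in range(steps):
        x_next = a * x - g * x + random.gauss(0.0, noise)  # plant + feedback
        g += eta * x * x_next          # correlated error -> strengthen feedback
        (errors_early if t < steps // 5 else errors_late).append(abs(x_next))
        x = x_next
    return (g, sum(errors_early) / len(errors_early),
            sum(errors_late) / len(errors_late))

g, early, late = learn_feedback_gain()
print(g, early, late)
```

The gain settles near the value that cancels the plant's internal dynamics (g ≈ a), and the residual error shrinks as learning proceeds, the same "few experiences suffice" behaviour described for the robot, in miniature.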

Journal ArticleDOI
TL;DR: A computationally efficient pipeline for phylogenomic classification of proteins using the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction.
Abstract: Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. 
Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.
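The family-versus-subfamily scoring scheme can be illustrated with toy position-specific profiles; all probabilities below are invented. A query is scored against the family model and each subfamily model by summed log-odds against a background, and classified to the best-scoring subfamily.

```python
import math

# Toy 3-column profiles (assumed, for illustration): one family model
# and two subfamily models.
family   = [{"A": 0.5, "G": 0.5}, {"L": 0.9, "V": 0.1}, {"D": 0.5, "E": 0.5}]
subfam_1 = [{"A": 0.9, "G": 0.1}, {"L": 0.9, "V": 0.1}, {"D": 0.9, "E": 0.1}]
subfam_2 = [{"A": 0.1, "G": 0.9}, {"L": 0.9, "V": 0.1}, {"D": 0.1, "E": 0.9}]
BACKGROUND = 0.05   # uniform background frequency for 20 amino acids

def log_odds(seq, profile):
    # Sum of per-column log-odds scores; the small floor probability is a
    # crude stand-in for the information-sharing (pseudocount) protocol.
    score = 0.0
    for aa, column in zip(seq, profile):
        p = column.get(aa, 0.01)
        score += math.log(p / BACKGROUND)
    return score

seq = "ALD"
scores = {"family": log_odds(seq, family),
          "subfam_1": log_odds(seq, subfam_1),
          "subfam_2": log_odds(seq, subfam_2)}
best = max(("subfam_1", "subfam_2"), key=lambda k: scores[k])
print(best, scores)
```

The query scores higher against its own subfamily than against the family model, which averages over subtype-specific positions; this sharpened discrimination is what the subfamily HMMs provide at scale.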

Journal ArticleDOI
TL;DR: A chaperone interaction network for the malarial parasite, Plasmodium falciparum, is constructed by combining experimental interactome data with in silico analysis and allows us to make predictions regarding the functions of hypothetical proteins based on their interactions.
Abstract: Molecular chaperones participate in the maintenance of cellular protein homeostasis, cell growth and differentiation, signal transduction, and development. Although a vast body of information is available regarding individual chaperones, few studies have attempted a systems level analysis of chaperone function. In this paper, we have constructed a chaperone interaction network for the malarial parasite, Plasmodium falciparum. P. falciparum is responsible for several million deaths every year, and understanding the biology of the parasite is a top priority. The parasite regularly experiences heat shock as part of its life cycle, and chaperones have often been implicated in parasite survival and growth. To better understand the participation of chaperones in cellular processes, we created a parasite chaperone network by combining experimental interactome data with in silico analysis. We used interolog mapping to predict protein–protein interactions for parasite chaperones based on the interactions of corresponding human chaperones. These data were then combined with information derived from existing high-throughput yeast two-hybrid assays. Analysis of the network reveals the broad range of functions regulated by chaperones. The network predicts involvement of chaperones in chromatin remodeling, protein trafficking, and cytoadherence. Importantly, it allows us to make predictions regarding the functions of hypothetical proteins based on their interactions. It also makes specific predictions about Hsp70–Hsp40 interactions in the parasite and assigns functions to members of the Hsp90 and Hsp100 families. Analysis of the network provides a rational basis for the anti-malarial activity of geldanamycin, a well-known Hsp90 inhibitor. Finally, analysis of the network provides a theoretical basis for further experiments designed toward understanding the involvement of this important class of molecules in parasite biology.
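Interolog mapping itself is a simple transfer rule: if two proteins interact in one species and each has an ortholog in another, predict the interaction between the orthologs. The sketch below uses simplified interaction and ortholog tables; the specific pairings are placeholders for illustration, not the paper's data.

```python
def interolog_map(interactions, orthologs):
    # For each interaction (a, b) in species 1, predict an interaction
    # between every ortholog pair (a', b') in species 2.
    predicted = set()
    for a, b in interactions:
        for a2 in orthologs.get(a, []):
            for b2 in orthologs.get(b, []):
                predicted.add(tuple(sorted((a2, b2))))
    return predicted

# Simplified human chaperone interactions and human -> P. falciparum
# ortholog assignments (illustrative):
human_ppi = [("HSPA1A", "DNAJB1"), ("HSP90AA1", "STIP1")]
orthos = {"HSPA1A": ["PfHsp70-1"], "DNAJB1": ["PfJ1", "PfJ2"],
          "HSP90AA1": ["PfHsp90"]}
predicted_ppi = interolog_map(human_ppi, orthos)
print(predicted_ppi)
```

Note that the second human interaction transfers nothing here because one partner has no ortholog in the table; real pipelines add confidence filters (sequence identity, joint ortholog scores) on top of this core rule.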

Journal ArticleDOI
TL;DR: A dynamic computational model incorporating the current mechanistic understanding of gene interactions during Caenorhabditis elegans vulval development is developed, validating two key predictions provided by the modeling work and substantiate the usefulness of executing and analyzing mechanistic models to investigate complex biological behaviors.
Abstract: Caenorhabditis elegans vulval development provides an important paradigm for studying the process of cell fate determination and pattern formation during animal development. Although many genes controlling vulval cell fate specification have been identified, how they orchestrate themselves to generate a robust and invariant pattern of cell fates is not yet completely understood. Here, we have developed a dynamic computational model incorporating the current mechanistic understanding of gene interactions during this patterning process. A key feature of our model is the inclusion of multiple modes of crosstalk between the epidermal growth factor receptor (EGFR) and LIN-12/Notch signaling pathways, which together determine the fates of the six vulval precursor cells (VPCs). Computational analysis, using the model-checking technique, provides new biological insights into the regulatory network governing VPC fate specification and predicts novel negative feedback loops. In addition, our analysis shows that most mutations affecting vulval development lead to stable fate patterns in spite of variations in synchronicity between VPCs. Computational searches for the basis of this robustness show that a sequential activation of the EGFR-mediated inductive signaling and LIN-12/Notch-mediated lateral signaling pathways is key to achieve a stable cell fate pattern. We demonstrate experimentally a time-delay between the activation of the inductive and lateral signaling pathways in wild-type animals and the loss of sequential signaling in mutants showing unstable fate patterns, thus validating two key predictions of our modeling work. The insights gained by our modeling study further substantiate the usefulness of executing and analyzing mechanistic models to investigate complex biological behaviors.
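The model-checking flavour of this analysis can be illustrated with a three-cell toy; the fate rule below is a cartoon, not the paper's model. Enumerating all orders of induction events shows the pattern is order-dependent when timing is unconstrained, while forcing sequential signaling (strongest inductive signal commits first) yields a unique, stable pattern.

```python
from itertools import permutations

EGF = [0.6, 1.0, 0.6]   # toy inductive signal per VPC (middle cell strongest)

def pattern(order, egf=EGF, threshold=0.5):
    # Process induction events in the given order; a committed 1-degree
    # cell laterally inhibits its neighbours (cartoon rule).
    fate = [None, None, None]
    inhibited = [False, False, False]
    for i in order:
        if inhibited[i]:
            fate[i] = "2"
        elif egf[i] >= threshold:
            fate[i] = "1"
            for j in (i - 1, i + 1):
                if 0 <= j < 3:
                    inhibited[j] = True
        else:
            fate[i] = "3"
    return tuple(fate)

# Unconstrained timing: exhaustively check all event orders.
patterns = {pattern(order) for order in permutations(range(3))}
# Sequential signaling: strongest inductive signal always commits first.
sequential = pattern(sorted(range(3), key=lambda i: -EGF[i]))
print(patterns, sequential)
```

With free interleaving, two different fate patterns are reachable (an "unstable" outcome in the model-checking sense); the sequential schedule always produces the 2°-1°-2° pattern.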

Journal ArticleDOI
TL;DR: This work model the dynamics of nucleocytoplasmic transport as diffusion in an effective potential resulting from the interaction of the transport factors with the flexible FG nups, using a minimal number of assumptions consistent with the most well-established structural and functional properties of NPC transport.
Abstract: All materials enter or exit the cell nucleus through nuclear pore complexes (NPCs), efficient transport devices that combine high selectivity and throughput. NPC-associated proteins containing phenylalanine–glycine repeats (FG nups) have large, flexible, unstructured proteinaceous regions, and line the NPC. A central feature of NPC-mediated transport is the binding of cargo-carrying soluble transport factors to the unstructured regions of FG nups. Here, we model the dynamics of nucleocytoplasmic transport as diffusion in an effective potential resulting from the interaction of the transport factors with the flexible FG nups, using a minimal number of assumptions consistent with the most well-established structural and functional properties of NPC transport. We discuss how specific binding of transport factors to the FG nups facilitates transport, and how this binding and competition between transport factors and other macromolecules for binding sites and space inside the NPC accounts for the high selectivity of transport. We also account for why transport is relatively insensitive to changes in the number and distribution of FG nups in the NPC, providing an explanation for recent experiments where up to half the total mass of the FG nups has been deleted without abolishing transport. Our results suggest strategies for the creation of artificial nanomolecular sorting devices.
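The facilitation-by-binding idea can be sketched as a discrete hopping model: the pore interior is an attractive square well of depth E (in units of kT) for a transport factor, so exiting at either end costs a Metropolis factor exp(-E). This is a toy version of diffusion in an effective potential, not the paper's exact model.

```python
import math, random

def transit_probability(E, n_sites=10, walkers=5000, seed=3):
    # Unbiased random walk on sites 1..n_sites inside the pore; stepping
    # off either end requires climbing out of the well (prob. exp(-E)).
    # Returns the fraction of entering particles that translocate.
    random.seed(seed)
    p_exit = math.exp(-E)
    crossed = 0
    for _ in range(walkers):
        pos = 1                               # just inside the entrance
        while True:
            step = 1 if random.random() < 0.5 else -1
            new = pos + step
            if new == 0 or new == n_sites + 1:
                if random.random() < p_exit:  # attempt to climb out
                    crossed += (new == n_sites + 1)
                    break
                # rejected: the particle stays bound inside the pore
            else:
                pos = new
    return crossed / walkers

inert, binding = transit_probability(0.0), transit_probability(3.0)
print(inert, binding)
```

An inert particle (E = 0) crosses with the bare random-walk probability of about 1/(N+1); a binding particle equilibrates inside the well before escaping and crosses with probability approaching 1/2, a simple way to see how specific binding to FG nups can facilitate transport while inert macromolecules are rejected.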

Journal ArticleDOI
TL;DR: CATHEDRAL builds on the features of a fast secondary-structure–based method to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries.
Abstract: We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. 
Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.
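The iterative excise-and-repeat protocol can be caricatured on strings. Real CATHEDRAL matches 3-D structures using graph theory and double dynamic programming; here the fold "signatures" are invented secondary-structure strings, purely to show the control flow.

```python
FOLD_LIBRARY = {  # toy fold "signatures" (invented, for illustration)
    "ig_like": "EEEEEE",
    "rossmann": "EHEHEH",
}

def assign_domains(query):
    # CATHEDRAL-style loop (toy version): find the best-matching known
    # fold in the query, excise it, and repeat until nothing recognisable
    # remains. "Best" here is simply the longest matching signature.
    assignments = []
    remaining = query
    while True:
        best = None
        for name, sig in FOLD_LIBRARY.items():
            pos = remaining.find(sig)
            if pos != -1 and (best is None
                              or len(sig) > len(FOLD_LIBRARY[best[0]])):
                best = (name, pos)
        if best is None:
            return assignments, remaining
        name, pos = best
        assignments.append(name)
        remaining = remaining[:pos] + remaining[pos + len(FOLD_LIBRARY[name]):]

doms, rest = assign_domains("CCEHEHEHCCEEEEEECC")
print(doms, rest)
```

Both toy domains are located and excised in turn, leaving only unassignable linker residue; in the real pipeline the per-iteration "find" step is the SVM-scored structure comparison and the excision defines the domain boundaries.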

Journal ArticleDOI
TL;DR: This work explores the possibility that small RNAs participate in sharpening a gene expression profile that was crudely established by a morphogen, and points out the functional significance of some mechanistic properties, such as mobility of smallRNAs and the irreversibility of their interactions.
Abstract: The precise establishment of gene expression patterns is a crucial step in development. Formation of a sharp boundary between high and low spatial expression domains requires a genetic mechanism that exhibits sensitivity, yet is robust to fluctuations, a demand that may not be easily achieved by morphogens alone. Recently, it has been demonstrated that small RNAs (and, in particular, microRNAs) play many roles in embryonic development. Whereas some RNAs are essential for embryogenesis, others are limited to fine-tuning a predetermined gene expression pattern. Here, we explore the possibility that small RNAs participate in sharpening a gene expression profile that was crudely established by a morphogen. To this end, we study a model in which small RNAs interact with a target gene and diffusively move from cell to cell. Though diffusion generally smooths spatial expression patterns, we find that intercellular mobility of small RNAs is actually critical in sharpening the interface between target expression domains in a robust manner. This sharpening occurs as small RNAs diffuse into regions of low mRNA expression and eliminate target molecules therein, but cannot affect regions of high mRNA levels. We discuss the applicability of our results, as examples, to the case of leaf polarity establishment in maize and Hox patterning in the early Drosophila embryo. Our findings point out the functional significance of some mechanistic properties, such as mobility of small RNAs and the irreversibility of their interactions. These properties are yet to be established directly for most classes of small RNAs. An indirect yet simple experimental test of the proposed mechanism is suggested in some detail.
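The proposed mechanism can be sketched as a one-dimensional reaction–diffusion system; all parameters below are illustrative. Target mRNA follows a graded morphogen input, while small RNA produced on one side diffuses between cells and irreversibly degrades the target it meets, clearing the low-expression domain and leaving fewer cells at intermediate levels.

```python
def steady_profile(with_srna, n_cells=30, D=1.0, k=5.0, delta=1.0,
                   steps=2000, dt=0.01):
    # Explicit Euler integration of a 1-D cell row: mRNA (m) is produced
    # along a graded morphogen profile; small RNA (s) is produced in the
    # left half, diffuses, and mutually annihilates with m (rate k).
    m = [0.0] * n_cells
    s = [0.0] * n_cells
    for _ in range(steps):
        new_s = list(s)
        for i in range(n_cells):
            prod_m = i / (n_cells - 1)                  # graded morphogen input
            prod_s = 1.0 if (with_srna and i < n_cells // 2) else 0.0
            left = s[i - 1] if i > 0 else s[i]          # no-flux boundaries
            right = s[i + 1] if i < n_cells - 1 else s[i]
            lap = left + right - 2.0 * s[i]
            new_s[i] = s[i] + dt * (prod_s + D * lap
                                    - delta * s[i] - k * m[i] * s[i])
            m[i] += dt * (prod_m - delta * m[i] - k * m[i] * s[i])
        s = new_s
    return m

def interface_width(profile, lo=0.1, hi=0.9):
    # Number of cells at intermediate expression (a crude sharpness metric).
    top = max(profile)
    return sum(1 for v in profile if lo * top < v < hi * top)

sharp = steady_profile(True)
graded = steady_profile(False)
print(interface_width(sharp), interface_width(graded))
```

With the small RNA present, the low-expression domain is cleared almost completely and far fewer cells sit at intermediate levels than under the graded morphogen alone, the sharpening effect described above.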