Showing papers in "PLOS Computational Biology in 2015"


Journal Article
TL;DR: The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate.
Abstract: By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well.

2,147 citations


Journal Article
TL;DR: SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets, is presented; it outperforms state-of-the-art methods in recovering edges and network properties on synthetic data under a variety of scenarios.
Abstract: 16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. 
SPIEC-EASI outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut project.
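
As a rough illustration of the compositionality issue this abstract raises, the sketch below applies the centered log-ratio (clr) transform, a standard transformation from compositional data analysis, to one sample of OTU counts. The function name `clr`, the pseudocount of 1, and the example counts are illustrative assumptions and are not taken from SPIEC-EASI's own interface.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample of OTU counts.

    The pseudocount is an assumption made here so that zero counts
    have a finite logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

sample = [120, 30, 0, 850]  # hypothetical OTU counts for one sample
transformed = clr(sample)
print(transformed)
```

The transformed values sum to zero and, without a pseudocount, do not change when every count in the sample is multiplied by the same factor, which is why sequencing depth drops out.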

1,013 citations


Journal Article
TL;DR: This work uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction and finds evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310kb chromosomal replacement in Staphylococcus aureus ST582.
Abstract: Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/.

684 citations


Journal Article
TL;DR: VDJtools is reported, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoires post-analysis tasks, provides a detailed tabular output and publication-ready graphics, and is built on top of a flexible API.
Abstract: Despite the growing number of immune repertoire sequencing studies, the field still lacks software for analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoires post-analysis tasks, provides a detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets for a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at https://github.com/mikessh/vdjtools.

428 citations


Journal Article
TL;DR: The results provide a simple formula for estimating the time course of the LFP from LIF network simulations in cases where a single pyramidal population dominates the LFP generation, and thereby facilitate quantitative comparison between computational models and experimental LFP recordings in vivo.
Abstract: Leaky integrate-and-fire (LIF) network models are commonly used to study how the spiking dynamics of neural networks changes with stimuli, tasks or dynamic network states. However, neurophysiological studies in vivo often rather measure the mass activity of neuronal microcircuits with the local field potential (LFP). Given that LFPs are generated by spatially separated currents across the neuronal membrane, they cannot be computed directly from quantities defined in models of point-like LIF neurons. Here, we explore the best approximation for predicting the LFP based on standard output from point-neuron LIF networks. To search for this best “LFP proxy”, we compared LFP predictions from candidate proxies based on LIF network output (e.g., firing rates, membrane potentials, synaptic currents) with “ground-truth” LFP obtained when the LIF network synaptic input currents were injected into an analogous three-dimensional (3D) network model of multi-compartmental neurons with realistic morphology, spatial distributions of somata and synapses. We found that a specific fixed linear combination of the LIF synaptic currents provided an accurate LFP proxy, accounting for most of the variance of the LFP time course observed in the 3D network for all recording locations. This proxy performed well over a broad set of conditions, including substantial variations of the neuronal morphologies. Our results provide a simple formula for estimating the time course of the LFP from LIF network simulations in cases where a single pyramidal population dominates the LFP generation, and thereby facilitate quantitative comparison between computational models and experimental LFP recordings in vivo.

374 citations


Journal Article
TL;DR: The results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd sourced data, into influenza predictions in all time horizons.
Abstract: We present a machine learning-based methodology capable of providing real-time (“nowcast”) and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illnesses (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI utilizing machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC’s ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013–2014 (retrospective) and 2014–2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method’s predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of GFT’s real-time estimates with comparable accuracy, and (3) our two and three week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd sourced data, into influenza predictions in all time horizons.

365 citations


Journal Article
TL;DR: The third version of PathVisio is presented with the newest additions and improvements of the application; it introduces a powerful new extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application.
Abstract: PathVisio is a commonly used pathway editor, visualization and analysis software. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. Those powerful, visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper was cited more than 170 times and PathVisio was used in many different biological studies. As an online editor PathVisio is also integrated in the community curated pathway database WikiPathways. Here we present the third version of PathVisio with the newest additions and improvements of the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a powerful new extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application. PathVisio can be downloaded from http://www.pathvisio.org and in 2014 PathVisio 3 was downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.

343 citations


Journal Article
TL;DR: This study presents a generally applicable analytic pipeline (SINCERA: a computational pipeline for SINgle CEll RNA-seq profiling Analysis) for processing scRNA-seq data from a whole organ or sorted cells and distinguished major cell types of fetal mouse lung.
Abstract: A major challenge in developmental biology is to understand the genetic and cellular processes/programs driving organ formation and differentiation of the diverse cell types that comprise the embryo. While recent studies using single cell transcriptome analysis illustrate the power to measure and understand cellular heterogeneity in complex biological systems, processing large amounts of RNA-seq data from heterogeneous cell populations creates the need for readily accessible tools for the analysis of single-cell RNA-seq (scRNA-seq) profiles. The present study presents a generally applicable analytic pipeline (SINCERA: a computational pipeline for SINgle CEll RNA-seq profiling Analysis) for processing scRNA-seq data from a whole organ or sorted cells. The pipeline supports the analysis for: 1) the distinction and identification of major cell types; 2) the identification of cell type specific gene signatures; and 3) the determination of driving forces of given cell types. We applied this pipeline to the RNA-seq analysis of single cells isolated from embryonic mouse lung at E16.5. Through the pipeline analysis, we distinguished major cell types of fetal mouse lung, including epithelial, endothelial, smooth muscle, pericyte, and fibroblast-like cell types, and identified cell type specific gene signatures, bioprocesses, and key regulators. SINCERA is implemented in R, licensed under the GNU General Public License v3, and freely available from CCHMC PBGE website, https://research.cchmc.org/pbge/sincera.html.

310 citations


Journal Article
TL;DR: This work finds that disease associated proteins do not reside within locally dense communities and instead identifies connectivity significance as the most predictive quantity, which inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm.
Abstract: The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbation within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite of a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
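
The abstract does not define connectivity significance, so the following is only a sketch under an assumption: that a candidate protein's significance is the hypergeometric tail probability of having at least the observed number of links to known disease (seed) proteins, given its degree. The function name, argument names, and the toy candidates are hypothetical.

```python
from math import comb

def connectivity_pvalue(n_nodes, n_seeds, degree, links_to_seeds):
    """P(X >= links_to_seeds) when `degree` neighbours are drawn at random,
    without replacement, from n_nodes proteins of which n_seeds are seeds.

    Assumed hypergeometric form; not quoted from the abstract.
    """
    total = comb(n_nodes, degree)
    upper = min(degree, n_seeds)
    tail = sum(
        comb(n_seeds, k) * comb(n_nodes - n_seeds, degree - k)
        for k in range(links_to_seeds, upper + 1)
    )
    return tail / total

# Hypothetical candidates: name -> (degree, links to seed proteins)
candidates = {"A": (5, 3), "B": (50, 3), "C": (4, 0)}
ranked = sorted(
    candidates,
    key=lambda name: connectivity_pvalue(1000, 20, *candidates[name]),
)
print(ranked)
```

Under this assumption, a low-degree protein with three seed links ranks ahead of a hub with the same three seed links, because the hub's links are less surprising by chance. That is the sense in which such a measure differs from local density.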

308 citations


Journal Article
TL;DR: This study investigated whether particular PTM-types are associated with proteins with specific and possibly “strategic” placements in the network of all protein interactions by determining informative network-theoretic properties.
Abstract: Among other effects, post-translational modifications (PTMs) have been shown to exert their function via the modulation of protein-protein interactions. For twelve different main PTM-types and associated subtypes and across 9 diverse species, we investigated whether particular PTM-types are associated with proteins with specific and possibly “strategic” placements in the network of all protein interactions by determining informative network-theoretic properties. Proteins undergoing a PTM were observed to engage in more interactions and positioned in more central locations than non-PTM proteins. Among the twelve considered PTM-types, phosphorylated proteins were identified most consistently as being situated in central network locations and with the broadest interaction spectrum to proteins carrying other PTM-types, while glycosylated proteins are preferentially located at the network periphery. For the human interactome, proteins undergoing sumoylation or proteolytic cleavage were found with the most characteristic network properties. PTM-type-specific protein interaction network (PIN) properties can be rationalized with regard to the function of the respective PTM-carrying proteins. For example, glycosylation sites were found enriched in proteins with plasma membrane localizations and transporter or receptor activity, which generally have fewer interacting partners. The involvement in disease processes of human proteins undergoing PTMs was also found associated with characteristic PIN properties. By integrating global protein interaction networks and specific PTMs, our study offers a novel approach to unraveling the role of PTMs in cellular processes.

303 citations


Journal Article
TL;DR: This paper explains how the development approach used for Escher can guide the development of future visualization tools, and provides examples of each of Escher's three key features.
Abstract: Escher is a web application for visualizing data on biological pathways. Three key features make Escher a uniquely effective tool for pathway visualization. First, users can rapidly design new pathway maps. Escher provides pathway suggestions based on user data and genome-scale models, so users can draw pathways in a semi-automated way. Second, users can visualize data related to genes or proteins on the associated reactions and pathways, using rules that define which enzymes catalyze each reaction. Thus, users can identify trends in common genomic data types (e.g. RNA-Seq, proteomics, ChIP) in conjunction with metabolite- and reaction-oriented data types (e.g. metabolomics, fluxomics). Third, Escher harnesses the strengths of web technologies (SVG, D3, developer tools) so that visualizations can be rapidly adapted, extended, shared, and embedded. This paper provides examples of each of these features and explains how the development approach used for Escher can be used to guide the development of future visualization tools.

Journal Article
TL;DR: BASiCS (Bayesian Analysis of Single-Cell Sequencing data) provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study, formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions.
Abstract: Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.

Journal Article
TL;DR: This work presents AutoDockFR (AutoDock for Flexible Receptors, ADFR), a new docking engine based on the AutoDock4 scoring function, which addresses the challenges of docking into flexible receptors with a new Genetic Algorithm and customized scoring function.
Abstract: Automated docking of drug-like molecules into receptors is an essential tool in structure-based drug design. While modeling receptor flexibility is important for correctly predicting ligand binding, it still remains challenging. This work focuses on an approach in which receptor flexibility is modeled by explicitly specifying a set of receptor side-chains a priori. The challenges of this approach include: 1) the exponential growth of the search space, demanding more efficient search methods; and 2) an increased number of false positives, calling for scoring functions tailored for flexible receptor docking. We present AutoDockFR (AutoDock for Flexible Receptors, ADFR), a new docking engine based on the AutoDock4 scoring function, which addresses the aforementioned challenges with a new Genetic Algorithm (GA) and customized scoring function. We validate ADFR using the Astex Diverse Set, demonstrating an increase in efficiency and reliability of its GA over the one implemented in AutoDock4. We demonstrate greatly increased success rates when cross-docking ligands into apo receptors that require side-chain conformational changes for ligand binding. These cross-docking experiments are based on two datasets: 1) SEQ17, a receptor diversity set containing 17 pairs of apo-holo structures; and 2) CDK2, a ligand diversity set composed of one CDK2 apo structure and 52 known bound inhibitors. We show that, when cross-docking ligands into the apo conformation of the receptors with up to 14 flexible side-chains, ADFR reports more correctly cross-docked ligands than AutoDock Vina on both datasets, with solutions found for 70.6% vs. 35.3% of systems on SEQ17, and 76.9% vs. 61.5% on CDK2. ADFR also outperforms AutoDock Vina in number of top ranking solutions on both datasets. Furthermore, we show that correctly docked CDK2 complexes re-create on average 79.8% of all pairwise atomic interactions between the ligand and moving receptor atoms in the holo complexes.
Finally, we show that down-weighting the receptor internal energy improves the ranking of correctly docked poses and that runtime for AutoDockFR scales linearly when side-chain flexibility is added.

Journal Article
TL;DR: This work has identified epithelial cells that naturally exist in an intermediate state with bidirectional differentiation potential, and found the balance between EMT-promoting and -inhibiting factors to be critical in achieving and selecting between intermediate states.
Abstract: Reversible epithelial-to-mesenchymal transition (EMT) is central to tissue development, epithelial stemness, and cancer metastasis. While many regulatory elements have been identified to induce EMT, the complex process underlying such cellular plasticity remains poorly understood. Utilizing a systems biology approach integrating modeling and experiments, we found multiple intermediate states contributing to EMT and that the robustness of the transitions is modulated by transcriptional factor Ovol2. In particular, we obtained evidence for a mutual inhibition relationship between Ovol2 and EMT inducer Zeb1, and observed that adding this regulation generates a novel four-state system consisting of two distinct intermediate phenotypes that differ in differentiation propensities and are favored in different environmental conditions. We identified epithelial cells that naturally exist in an intermediate state with bidirectional differentiation potential, and found the balance between EMT-promoting and -inhibiting factors to be critical in achieving and selecting between intermediate states. Our analysis suggests a new design principle in controlling cellular plasticity through multiple intermediate cell fates and underscores the critical involvement of Ovol2 and its associated molecular regulations.

Journal Article
TL;DR: The model consistently approximates the temporal and spatial synchronization patterns of the empirical data, and reveals that multiple clusters that transiently synchronize and desynchronize emerge from the complex topology of anatomical connections, provided that oscillators are heterogeneous.
Abstract: Spatial patterns of coherent activity across different brain areas have been identified during the resting-state fluctuations of the brain. However, recent studies indicate that resting-state activity is not stationary, but shows complex temporal dynamics. We were interested in the spatiotemporal dynamics of the phase interactions among resting-state fMRI BOLD signals from human subjects. We found that the global phase synchrony of the BOLD signals evolves on a characteristic ultra-slow (<0.01Hz) time scale, and that its temporal variations reflect the transient formation and dissolution of multiple communities of synchronized brain regions. Synchronized communities reoccurred intermittently in time and across scanning sessions. We found that the synchronization communities relate to previously defined functional networks known to be engaged in sensory-motor or cognitive function, called resting-state networks (RSNs), including the default mode network, the somato-motor network, the visual network, the auditory network, the cognitive control networks, the self-referential network, and combinations of these and other RSNs. We studied the mechanism originating the observed spatiotemporal synchronization dynamics by using a network model of phase oscillators connected through the brain’s anatomical connectivity estimated using diffusion imaging human data. The model consistently approximates the temporal and spatial synchronization patterns of the empirical data, and reveals that multiple clusters that transiently synchronize and desynchronize emerge from the complex topology of anatomical connections, provided that oscillators are heterogeneous.
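
To make the idea of global phase synchrony and a network of coupled phase oscillators concrete, here is a minimal sketch. It assumes a Kuramoto-type model and uses the Kuramoto order parameter as the synchrony measure; the abstract does not spell out either choice. The three-region coupling matrix, frequencies, coupling strength `k`, and step size `dt` are hypothetical values chosen only for illustration.

```python
import cmath
import math

def phase_synchrony(phases):
    """Kuramoto order parameter R in [0, 1]; R = 1 when all phases are equal."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))

def kuramoto_step(phases, freqs, coupling, k, dt):
    """One Euler step of d(theta_i)/dt = w_i + k * sum_j C_ij * sin(theta_j - theta_i)."""
    n = len(phases)
    new_phases = []
    for i in range(n):
        drive = sum(
            coupling[i][j] * math.sin(phases[j] - phases[i]) for j in range(n)
        )
        new_phases.append(phases[i] + dt * (freqs[i] + k * drive))
    return new_phases

# Hypothetical 3-region chain with heterogeneous natural frequencies (rad/s).
coupling = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
freqs = [0.9, 1.0, 1.1]
phases = [0.0, 2.0, 4.0]
sync = []
for _ in range(2000):
    phases = kuramoto_step(phases, freqs, coupling, k=2.0, dt=0.01)
    sync.append(phase_synchrony(phases))
print(round(sync[0], 3), round(sync[-1], 3))
```

In a setting like the one the abstract describes, `coupling` would be replaced by an anatomical connectivity matrix, and the time series `sync` would be inspected for slow fluctuations rather than a single steady value.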

Journal Article
TL;DR: HGT is a major source of phenotypic innovation and a mechanism of niche adaptation; on real data, different inference methods tend to infer different HGT events, so it can be difficult to ascertain all but simple and clear-cut HGT events.
Abstract: Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages [1]. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.

Journal Article
TL;DR: In this paper, the strength of proportionality between two variables can be meaningfully and interpretably described by a new statistic ϕ which can be used instead of correlation as the basis of familiar analyses and visualisation methods, including co-expression networks and clustered heatmaps.
Abstract: In the life sciences, many measurement methods yield only the relative abundances of different components in a sample. With such relative—or compositional—data, differential expression needs careful interpretation, and correlation—a statistical workhorse for analyzing pairwise relationships—is an inappropriate measure of association. Using yeast gene expression data we show how correlation can be misleading and present proportionality as a valid alternative for relative data. We show how the strength of proportionality between two variables can be meaningfully and interpretably described by a new statistic ϕ which can be used instead of correlation as the basis of familiar analyses and visualisation methods, including co-expression networks and clustered heatmaps. While the main aim of this study is to present proportionality as a means to analyse relative data, it also raises intriguing questions about the molecular mechanisms underlying the proportional regulation of a range of yeast genes.
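
The abstract names the statistic ϕ without giving its formula. The sketch below assumes the definition ϕ(x, y) = var(log(x/y)) / var(log x), so that ϕ is zero for exactly proportional variables and grows as proportionality weakens. The function name and the example vectors are hypothetical.

```python
import math
from statistics import variance

def phi(x, y):
    """Proportionality statistic under the assumed definition
    var(log(x / y)) / var(log(x)); 0 means x and y are exactly proportional."""
    log_x = [math.log(v) for v in x]
    log_ratio = [math.log(a / b) for a, b in zip(x, y)]
    return variance(log_ratio) / variance(log_x)

x = [1.0, 2.0, 4.0, 8.0]          # hypothetical relative abundances
y_prop = [3.0 * v for v in x]     # exactly proportional to x
y_other = [5.0, 1.0, 7.0, 2.0]    # no proportional relationship to x

print(phi(x, y_prop), phi(x, y_other))
```

Note that this form is not symmetric in `x` and `y`, because the denominator uses only the variance of `log x`.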

Journal Article
TL;DR: ShortBRED (Short, Better Representative Extract Dataset) is applied to profile antibiotic resistance protein families in the gut microbiomes of individuals from the United States, China, Malawi, and Venezuela; the results support antibiotic resistance as a core function in the human gut microbiome.
Abstract: Profiling microbial community function from metagenomic sequencing data remains a computationally challenging problem. Mapping millions of DNA reads from such samples to reference protein databases requires long run-times, and short read lengths can result in spurious hits to unrelated proteins (loss of specificity). We developed ShortBRED (Short, Better Representative Extract Dataset) to address these challenges, facilitating fast, accurate functional profiling of metagenomic samples. ShortBRED consists of two components: (i) a method that reduces reference proteins of interest to short, highly representative amino acid sequences (“markers”) and (ii) a search step that maps reads to these markers to quantify the relative abundance of their associated proteins. After evaluating ShortBRED on synthetic data, we applied it to profile antibiotic resistance protein families in the gut microbiomes of individuals from the United States, China, Malawi, and Venezuela. Our results support antibiotic resistance as a core function in the human gut microbiome, with tetracycline-resistant ribosomal protection proteins and Class A beta-lactamases being the most widely distributed resistance mechanisms worldwide. ShortBRED markers are applicable to other homology-based search tasks, which we demonstrate here by identifying phylogenetic signatures of antibiotic resistance across more than 3,000 microbial isolate genomes. ShortBRED can be applied to profile a wide variety of protein families of interest; the software, source code, and documentation are available for download at http://huttenhower.sph.harvard.edu/shortbred

Journal ArticleDOI
TL;DR: The model provides a precise summary of the prototypical patterns for each emotion category, and demonstrates that a sufficient characterization of emotion categories relies on differential patterns of involvement in neocortical systems that differ between humans and other species.
Abstract: Understanding emotion is critical for a science of healthy and disordered brain function, but the neurophysiological basis of emotional experience is still poorly understood. We analyzed human brain activity patterns from 148 studies of emotion categories (2159 total participants) using a novel hierarchical Bayesian model. The model allowed us to classify which of five categories—fear, anger, disgust, sadness, or happiness—is engaged by a study with 66% accuracy (43-86% across categories). Analyses of the activity patterns encoded in the model revealed that each emotion category is associated with unique, prototypical patterns of activity across multiple brain systems including the cortex, thalamus, amygdala, and other structures. The results indicate that emotion categories are not contained within any one region or system, but are represented as configurations across multiple brain networks. The model provides a precise summary of the prototypical patterns for each emotion category, and demonstrates that a sufficient characterization of emotion categories relies on (a) differential patterns of involvement in neocortical systems that differ between humans and other species, and (b) distinctive patterns of cortical-subcortical interactions. Thus, these findings are incompatible with several contemporary theories of emotion, including those that emphasize emotion-dedicated brain systems and those that propose emotion is localized primarily in subcortical activity. They are consistent with componential and constructionist views, which propose that emotions are differentiated by a combination of perceptual, mnemonic, prospective, and motivational elements. Such brain-based models of emotion provide a foundation for new translational and clinical approaches.

Journal ArticleDOI
TL;DR: This research presents a novel probabilistic approach to estimating the response of the immune system to laser-spot assisted, 3D image analysis of central nervous system injury.
Abstract: Note: Editorial Reference EPFL-ARTICLE-214482, doi:10.1371/journal.pcbi.1003904. Record created on 2015-12-10, modified on 2017-05-12.

Journal ArticleDOI
TL;DR: A new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) is developed by introducing quality control of co-expression similarities, parallelizing embedded network construction, and developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs).
Abstract: Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as a predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution or small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.

Journal ArticleDOI
TL;DR: A mapping between general stochastic models of gene expression and systems studied in queueing theory is invoked to derive exact analytical expressions for the moments associated with mRNA/protein steady-state distributions, and approaches for accurate estimation of burst parameters are developed.
Abstract: Gene expression in individual cells is highly variable and sporadic, often resulting in the synthesis of mRNAs and proteins in bursts. Such bursting has important consequences for cell-fate decisions in diverse processes ranging from HIV-1 viral infections to stem-cell differentiation. It is generally assumed that bursts are geometrically distributed and that they arrive according to a Poisson process. On the other hand, recent single-cell experiments provide evidence for complex burst arrival processes, highlighting the need for analysis of more general stochastic models. To address this issue, we invoke a mapping between general stochastic models of gene expression and systems studied in queueing theory to derive exact analytical expressions for the moments associated with mRNA/protein steady-state distributions. These results are then used to derive noise signatures, i.e. explicit conditions based entirely on experimentally measurable quantities, that determine if the burst distributions deviate from the geometric distribution or if burst arrival deviates from a Poisson process. For non-Poisson arrivals, we develop approaches for accurate estimation of burst parameters. The proposed approaches can lead to new insights into transcriptional bursting based on measurements of steady-state mRNA/protein distributions.
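To make the idea of a noise signature concrete, the sketch below works through only the textbook baseline case. It assumes Poisson burst arrivals and first-order degradation, for which the moment equations give a steady-state Fano factor of (1 + E[B²]/E[B]) / 2, where B is the burst size. This is not the paper's general queueing-theory result, and the function names are hypothetical.

```python
def fano_poisson_bursts(mean_burst, second_moment_burst):
    """Steady-state Fano factor (variance / mean) of molecule counts when
    bursts arrive as a Poisson process and each molecule decays at a
    constant rate: Fano = (1 + E[B^2] / E[B]) / 2.
    """
    return 0.5 * (1.0 + second_moment_burst / mean_burst)


def geometric_moments(b):
    """E[B] and E[B^2] for a geometric burst size on {0, 1, 2, ...} with mean b."""
    return b, b + 2.0 * b * b


b = 5.0  # hypothetical mean burst size

# Geometric bursts: the Fano factor reduces to 1 + b.
print(fano_poisson_bursts(*geometric_moments(b)))

# Fixed-size bursts with the same mean: the Fano factor is (1 + b) / 2.
print(fano_poisson_bursts(b, b * b))
```

Under these assumptions, two burst-size distributions with the same mean give different Fano factors. This is the kind of measurable discrepancy a noise signature could exploit.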

Journal ArticleDOI
TL;DR: It is proposed that cryptic 3’SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point, and this model provides both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.
Abstract: Mutations in the splicing factor SF3B1 are found in several cancer types and have been associated with various splicing defects. Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3’ splice sites (3’SSs) are used in cancers with SF3B1 mutations. We define the necessary sequence context for the observed cryptic 3’SSs and propose that cryptic 3’SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point. While most cryptic 3’SSs are present at low frequency (<10%) relative to nearby canonical 3’SSs, we identified ten genes that preferred out-of-frame cryptic 3’SSs. We show that cancers with mutations in the SF3B1 HEAT 5-9 repeats use cryptic 3’SSs downstream of the branch point and provide both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.

Journal ArticleDOI
TL;DR: The dual-layer integrated cell line-drug network model correctly predicted that BRAF mutant cell lines would be more sensitive than BRAF wild-type cell lines to three MEK1/2 inhibitors tested, which is significantly better than the previous results using the elastic net model.
Abstract: The ability to predict the response of a cancer patient to a therapeutic agent is a major goal in modern oncology that should ultimately lead to personalized treatment. Existing approaches to predicting drug sensitivity rely primarily on profiling of cancer cell line panels that have been treated with different drugs and selecting genomic or functional genomic features to regress or classify the drug response. Here, we propose a dual-layer integrated cell line-drug network model, which uses both cell line similarity network (CSN) data and drug similarity network (DSN) data to predict the drug response of a given cell line using a weighted model. Using the Cancer Cell Line Encyclopedia (CCLE) and Cancer Genome Project (CGP) studies as benchmark datasets, our single-layer model with CSN or DSN and only a single parameter achieved a prediction performance comparable to the previously generated elastic net model. When using the dual-layer model integrating both CSN and DSN, our predicted response reached a 0.6 Pearson correlation coefficient with observed responses for most drugs, which is significantly better than the previous results using the elastic net model. We have also applied the dual-layer cell line-drug integrated network model to fill in the missing drug response values in the CGP dataset. Even though the dual-layer integrated cell line-drug network model does not specifically model mutation information, it correctly predicted that BRAF mutant cell lines would be more sensitive than BRAF wild-type cell lines to three MEK1/2 inhibitors tested.
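The general idea of a similarity-weighted prediction can be sketched as follows. The Gaussian-style weighting, the `decay` and `mix` parameters, the function names, and the toy values are all assumptions made for illustration; the abstract does not specify the weighting scheme, so this is not the authors' model.

```python
import math


def weighted_prediction(similarity_to_target, known_responses, decay=0.3):
    """Similarity-weighted average of known responses.

    similarity_to_target: dict of neighbour name -> similarity in [0, 1]
    known_responses:      dict of neighbour name -> observed drug response
    decay:                hypothetical bandwidth; smaller values favour the
                          most similar neighbours more strongly
    """
    weights = {
        name: math.exp(-((1.0 - sim) ** 2) / (2.0 * decay ** 2))
        for name, sim in similarity_to_target.items()
        if name in known_responses
    }
    total = sum(weights.values())
    return sum(w * known_responses[name] for name, w in weights.items()) / total


def dual_layer_prediction(csn_estimate, dsn_estimate, mix=0.5):
    """Blend the cell-line-layer and drug-layer estimates (mix is hypothetical)."""
    return mix * csn_estimate + (1.0 - mix) * dsn_estimate


# Toy example: response of one cell line to one drug.
# Cell-line layer: the same drug measured on two other cell lines.
csn = weighted_prediction({"line_A": 0.9, "line_B": 0.2},
                          {"line_A": 1.0, "line_B": 3.0})
# Drug layer: the same cell line measured with two other drugs.
dsn = weighted_prediction({"drug_X": 0.5, "drug_Y": 0.5},
                          {"drug_X": 2.0, "drug_Y": 4.0})

print(csn)                              # close to line_A's response
print(dsn)                              # plain mean when similarities are equal
print(dual_layer_prediction(csn, dsn))  # blend of the two layers
```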

Journal ArticleDOI
TL;DR: This work combines modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza, and adjusts the initialization and parametrization of a disease model to determine systematic model bias.
Abstract: Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects 5% to 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013-2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method produced 50% and 95% credible intervals for the 2013-2014 ILI observations that contained the actual observations for most weeks in the forecast. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed.

Journal ArticleDOI
TL;DR: This work infers tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues, and shows that modules conserved across tissues are especially likely to have functions common to all tissues.
Abstract: To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This data provides an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, only a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool, available at mostafavilab.stat.ubc.ca/GNAT, which allows exploration of gene function and regulation in a tissue-specific manner.

Journal ArticleDOI
TL;DR: A novel technique is developed to objectively track seizure states from dynamic functional networks constructed from intracranial recordings that implicate distributed cortical structures in seizure generation, propagation and termination, and may have practical significance in determining which circuits to modulate with implantable devices.
Abstract: The epileptic network is characterized by pathologic, seizure-generating ‘foci’ embedded in a web of structural and functional connections. Clinically, seizure foci are considered optimal targets for surgery. However, poor surgical outcome suggests a complex relationship between foci and the surrounding network that drives seizure dynamics. We developed a novel technique to objectively track seizure states from dynamic functional networks constructed from intracranial recordings. Each dynamical state captures unique patterns of network connections that indicate synchronized and desynchronized hubs of neural populations. Our approach suggests that seizures are generated when synchronous relationships near foci work in tandem with rapidly changing desynchronous relationships from the surrounding epileptic network. As seizures progress, topographical and geometrical changes in network connectivity strengthen and tighten synchronous connectivity near foci—a mechanism that may aid seizure termination. Collectively, our observations implicate distributed cortical structures in seizure generation, propagation and termination, and may have practical significance in determining which circuits to modulate with implantable devices.

Journal ArticleDOI
TL;DR: A new strategy in the war against antibiotic-resistant organisms is suggested: drug sequencing to shepherd evolution through genotype space to states from which resistance cannot emerge and by which to maximize the chance of successful therapy.
Abstract: The increasing rate of antibiotic resistance and slowing discovery of novel antibiotic treatments present a growing threat to public health. Here, we consider a simple model of evolution in asexually reproducing populations which considers adaptation as a biased random walk on a fitness landscape. This model associates the global properties of the fitness landscape with the algebraic properties of a Markov chain transition matrix and allows us to derive general results on the non-commutativity and irreversibility of natural selection as well as antibiotic cycling strategies. Using this formalism, we analyze 15 empirical fitness landscapes of E. coli under selection by different β-lactam antibiotics and demonstrate that the emergence of resistance to a given antibiotic can be either hindered or promoted by different sequences of drug application. Specifically, we demonstrate that the majority, approximately 70%, of sequential drug treatments with 2–4 drugs promote resistance to the final antibiotic. Further, we derive optimal drug application sequences with which we can probabilistically ‘steer’ the population through genotype space to avoid the emergence of resistance. This suggests a new strategy in the war against antibiotic-resistant organisms: drug sequencing to shepherd evolution through genotype space to states from which resistance cannot emerge and by which to maximize the chance of successful therapy.
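The non-commutativity of drug ordering can be illustrated with a minimal sketch of a biased random walk on a two-locus landscape. The transition rule (move to a fitter one-mutation neighbour with probability proportional to the fitness gain), the two landscapes `drug_a` and `drug_b`, and all function names are hypothetical choices for illustration, not the empirical landscapes or the exact model analysed in the paper.

```python
GENOTYPES = ["00", "01", "10", "11"]


def neighbours(g):
    """Genotypes that differ from g at exactly one locus."""
    return [g[:i] + ("1" if g[i] == "0" else "0") + g[i + 1:]
            for i in range(len(g))]


def transition_matrix(fitness):
    """Row-stochastic matrix: move to a fitter neighbour with probability
    proportional to the fitness gain; stay put at a local optimum."""
    n = len(GENOTYPES)
    P = [[0.0] * n for _ in range(n)]
    for i, g in enumerate(GENOTYPES):
        gains = {h: fitness[h] - fitness[g]
                 for h in neighbours(g) if fitness[h] > fitness[g]}
        if not gains:
            P[i][i] = 1.0  # local optimum is absorbing
            continue
        total = sum(gains.values())
        for h, gain in gains.items():
            P[i][GENOTYPES.index(h)] = gain / total
    return P


def apply_drug(dist, fitness, steps=10):
    """Evolve a genotype distribution under one drug for a fixed number of steps."""
    P = transition_matrix(fitness)
    n = len(dist)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical landscapes: drug A has two peaks (01 and 10), drug B one peak (11).
drug_a = {"00": 1.0, "01": 2.0, "10": 3.0, "11": 0.5}
drug_b = {"00": 1.0, "01": 0.5, "10": 2.0, "11": 3.0}

start = [1.0, 0.0, 0.0, 0.0]  # whole population begins at genotype 00
a_then_b = apply_drug(apply_drug(start, drug_a), drug_b)
b_then_a = apply_drug(apply_drug(start, drug_b), drug_a)

print(a_then_b)  # population concentrated on genotype 11
print(b_then_a)  # population split between genotypes 01 and 10
```

In this toy case the order of the two drugs decides which genotypes the population ends on, which is the property a sequencing strategy would try to exploit.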

Journal ArticleDOI
TL;DR: The utility of this software is demonstrated by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol and it is shown that TRANSIT can be used to discover genes which have been previously implicated for growth on these carbon sources.
Abstract: TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes which have been previously implicated for growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: https://github.com/mad-lab/transit.

Journal ArticleDOI
TL;DR: Novel methods in network science are developed and applied to quantify how patterns of functional connectivity between brain regions reconfigure as human subjects perform 64 different tasks, providing a new conceptual framework for understanding the dynamic integration and recruitment of cognitive systems in enabling behavioral adaptability across both task and rest conditions.
Abstract: One of the most remarkable features of the human brain is its ability to adapt rapidly and efficiently to external task demands. Novel and non-routine tasks, for example, are implemented faster than structural connections can be formed. The neural underpinnings of these dynamics are far from understood. Here we develop and apply novel methods in network science to quantify how patterns of functional connectivity between brain regions reconfigure as human subjects perform 64 different tasks. By applying dynamic community detection algorithms, we identify groups of brain regions that form putative functional communities, and we uncover changes in these groups across the 64-task battery. We summarize these reconfiguration patterns by quantifying the probability that two brain regions engage in the same network community (or putative functional module) across tasks. These tools enable us to demonstrate that classically defined cognitive systems—including visual, sensorimotor, auditory, default mode, fronto-parietal, cingulo-opercular and salience systems—engage dynamically in cohesive network communities across tasks. We define the network role that a cognitive system plays in these dynamics along the following two dimensions: (i) stability vs. flexibility and (ii) connected vs. isolated. The role of each system is therefore summarized by how stably that system is recruited over the 64 tasks, and how consistently that system interacts with other systems. Using this cartography, classically defined cognitive systems can be categorized as ephemeral integrators, stable loners, and anything in between. Our results provide a new conceptual framework for understanding the dynamic integration and recruitment of cognitive systems in enabling behavioral adaptability across both task and rest conditions. 
This work has important implications for understanding cognitive network reconfiguration during different task sets and its relationship to cognitive effort, individual variation in cognitive performance, and fatigue.
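The summary statistic described above, the probability that two brain regions fall in the same community across tasks, can be sketched as follows. It assumes each task has already been assigned one community label per region by some detection algorithm. The function name `allegiance` and the toy partitions are hypothetical.

```python
from itertools import combinations


def allegiance(partitions):
    """Fraction of tasks in which each pair of regions shares a community.

    partitions: list of label lists, one list per task, one label per region.
    Returns a dict mapping (i, j) with i < j to a value in [0, 1].
    """
    n_regions = len(partitions[0])
    counts = {pair: 0 for pair in combinations(range(n_regions), 2)}
    for labels in partitions:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(partitions) for pair, c in counts.items()}


# Toy data: 4 regions, 4 tasks; labels are only meaningful within a task.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

result = allegiance(partitions)
for pair, value in sorted(result.items()):
    print(pair, value)
```

Because only within-task label equality is compared, the sketch does not require community labels to be matched across tasks.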