
Showing papers in "PLOS Computational Biology in 2017"


Journal ArticleDOI
TL;DR: Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low.
Abstract: The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
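
As a rough illustration of the hybrid-assembly workflow described above, the sketch below wraps a typical Unicycler invocation in Python. The flag names (-1/-2 for paired Illumina reads, -l for long reads, -o for the output directory) follow the tool's commonly documented usage, but the file names are placeholders and the options should be checked against the installed version.

    import subprocess

    # Hybrid assembly: accurate short reads plus structure-resolving long reads.
    # File names are placeholders; verify flags against your Unicycler version.
    subprocess.run(
        [
            "unicycler",
            "-1", "illumina_R1.fastq.gz",   # short paired-end reads
            "-2", "illumina_R2.fastq.gz",
            "-l", "long_reads.fastq.gz",    # ONT or PacBio reads
            "-o", "hybrid_assembly",        # output directory
        ],
        check=True,
    )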

2,245 citations


Journal ArticleDOI
TL;DR: mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation, is introduced; it extends Projection to Latent Structure models for discriminant analysis.
Abstract: The advent of high throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.
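
mixOmics itself is an R package; as a language-neutral sketch of the core idea (a PLS-based discriminant analysis of a high-dimensional 'omics matrix), the following Python snippet uses scikit-learn's PLS implementation on simulated data. The data, component number, and feature ranking below are illustrative assumptions, not the package's API.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    # Simulated 'omics matrix: 60 samples x 500 features, two biological conditions.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 500))
    y = np.repeat([0, 1], 30)
    X[y == 1, :10] += 1.5                      # 10 discriminative features

    # PLS-DA: regress a one-hot encoding of the classes on the feature matrix.
    Y = np.eye(2)[y]
    pls = PLSRegression(n_components=2).fit(X, Y)
    scores = pls.transform(X)                  # low-dimensional sample projection
    loadings = np.abs(pls.x_loadings_[:, 0])   # feature weights on component 1
    print(np.argsort(loadings)[-10:])          # candidate 'molecular signature'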

1,862 citations


Journal ArticleDOI
TL;DR: OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility, which makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community.
Abstract: OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility. It allows users to easily add new features, including forces with novel functional forms, new integration algorithms, and new simulation protocols. Those features automatically work on all supported hardware types (including both CPUs and GPUs) and perform well on all of them. In many cases they require minimal coding, just a mathematical description of the desired function. They also require no modification to OpenMM itself and can be distributed independently of OpenMM. This makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community.
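
The extensibility highlighted here is visible in OpenMM's Python API, where a new pairwise interaction can be supplied as a plain algebraic expression. The two-particle toy system below is a minimal sketch (recent versions import as openmm, older ones as simtk.openmm); the Lennard-Jones-style expression and parameter values are illustrative only.

    import openmm as mm
    import openmm.unit as unit

    # A custom pairwise force defined purely by a mathematical expression.
    force = mm.CustomNonbondedForce(
        "4*eps*((sig/r)^12 - (sig/r)^6); eps=sqrt(eps1*eps2); sig=0.5*(sig1+sig2)")
    force.addPerParticleParameter("sig")
    force.addPerParticleParameter("eps")

    system = mm.System()
    for _ in range(2):
        system.addParticle(39.9 * unit.amu)     # argon-like particles
        force.addParticle([0.34, 0.996])        # sigma (nm), epsilon (kJ/mol)
    system.addForce(force)

    integrator = mm.LangevinIntegrator(
        300 * unit.kelvin, 1.0 / unit.picosecond, 2.0 * unit.femtoseconds)
    context = mm.Context(system, integrator)
    context.setPositions([mm.Vec3(0, 0, 0), mm.Vec3(0.4, 0, 0)] * unit.nanometer)
    print(context.getState(getEnergy=True).getPotentialEnergy())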

1,364 citations


Journal ArticleDOI
TL;DR: A new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks that greatly outperforms existing methods and leads to much more accurate contact-assisted folding.
Abstract: Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/
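
To make the architecture description concrete, here is a schematic 2D residual block operating on pairwise features, written in PyTorch on random tensors; it illustrates the residual-network idea rather than RaptorX's exact layer counts, normalization, or feature construction.

    import torch
    import torch.nn as nn

    class ResBlock2D(nn.Module):
        """One 2D residual block: two 3x3 convolutions plus an identity shortcut."""
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.norm1 = nn.BatchNorm2d(channels)
            self.norm2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = torch.relu(self.norm1(self.conv1(x)))
            out = self.norm2(self.conv2(out))
            return torch.relu(out + x)

    # Pairwise features (coevolution scores, pairwise potentials, ...) for a
    # length-L protein enter as a (batch, channels, L, L) tensor.
    L = 120
    pairwise = torch.randn(1, 64, L, L)
    head = nn.Conv2d(64, 1, kernel_size=1)      # per-residue-pair contact score
    logits = head(ResBlock2D(64)(pairwise))
    print(logits.shape)                         # torch.Size([1, 1, 120, 120])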

779 citations


Journal ArticleDOI
TL;DR: This study provides estimates of mixing patterns for societies for which contact data such as POLYMOD are not yet available, finding that contact patterns are highly assortative with age across all countries considered, although pronounced regional differences in the age-specific contacts at home were noticeable.
Abstract: Heterogeneities in contact networks have a major effect in determining whether a pathogen can become epidemic or persist at endemic levels. Epidemic models that determine which interventions can successfully prevent an outbreak need to account for social structure and mixing patterns. Contact patterns vary across age and locations (e.g. home, work, and school), and including them as predictors in transmission dynamic models of pathogens that spread socially will improve the models' realism. Data from population-based contact diaries in eight European countries from the POLYMOD study were projected to 144 other countries using a Bayesian hierarchical model that estimated the proclivity of age-and-location-specific contact patterns for the countries, using Markov chain Monte Carlo simulation. Household level data from the Demographic and Health Surveys for nine lower-income countries and socio-demographic factors from several on-line databases for 152 countries were used to quantify similarity of countries to estimate contact patterns in the home, work, school and other locations for countries for which no contact data are available, accounting for demographic structure, household structure where known, and a variety of metrics including workforce participation and school enrolment. Contacts are highly assortative with age across all countries considered, but pronounced regional differences in the age-specific contacts at home were noticeable, with more inter-generational contacts in Asian countries than in other settings. Moreover, there were variations in contact patterns by location, with work-place contacts being least assortative. These variations led to differences in the effect of social distancing measures in an age structured epidemic model. Contacts have an important role in transmission dynamic models that use contact rates to characterize the spread of contact-transmissible diseases. This study provides estimates of mixing patterns for societies for which contact data such as POLYMOD are not yet available.
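
To show where such contact matrices enter a transmission model, the equations below give a generic age-structured SIR formulation in which the estimated matrix C couples age groups; this is the standard construction rather than the paper's specific model.

    \lambda_i(t) = \beta \sum_{j} C_{ij}\,\frac{I_j(t)}{N_j}, \qquad
    \frac{dS_i}{dt} = -\lambda_i S_i, \quad
    \frac{dI_i}{dt} = \lambda_i S_i - \gamma I_i, \quad
    \frac{dR_i}{dt} = \gamma I_i

Here C_{ij} is the mean number of contacts an individual in age group i has with group j, so changing C (for example by school closure) directly changes the force of infection \lambda_i.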

734 citations


Journal ArticleDOI
TL;DR: The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline that has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease.
Abstract: Transcriptomics technologies are the techniques used to study an organism’s transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst noncoding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline. Transcriptomics has been defined by repeated technological innovations that transform the field. There are two key contemporary techniques in the field: microarrays, which quantify a set of predetermined sequences, and RNA sequencing (RNA-Seq), which uses high-throughput sequencing to capture all sequences. Measuring the expression of an organism’s genes in different tissues, conditions, or time points gives information on how genes are regulated and reveals details of an organism’s biology. It can also help to infer the functions of previously unannotated genes. Transcriptomic analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease. An analysis of gene expression in its entirety allows detection of broad coordinated trends which cannot be discerned by more targeted assays.

525 citations


Journal ArticleDOI
TL;DR: Metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data, designed for data from metabarcoding research, can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data.
Abstract: Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual.

409 citations


Journal ArticleDOI
TL;DR: Mindboggle’s algorithms are evaluated using the largest set of manually labeled, publicly available brain images in the world and compared against state-of-the-art algorithms where they exist; all data, code, and results are publicly available.
Abstract: Mindboggle (http://mindboggle.info) is an open source brain morphometry platform that takes in preprocessed T1-weighted MRI data and outputs volume, surface, and tabular data containing label, feature, and shape information for further analysis. In this article, we document the software and demonstrate its use in studies of shape variation in healthy and diseased humans. The number of different shape measures and the size of the populations make this the largest and most detailed shape analysis of human brains ever conducted. Brain image morphometry shows great potential for providing much-needed biological markers for diagnosing, tracking, and predicting progression of mental health disorders. Very few software algorithms provide more than measures of volume and cortical thickness, while more subtle shape measures may provide more sensitive and specific biomarkers. Mindboggle computes a variety of (primarily surface-based) shapes: area, volume, thickness, curvature, depth, Laplace-Beltrami spectra, Zernike moments, etc. We evaluate Mindboggle’s algorithms using the largest set of manually labeled, publicly available brain images in the world and compare them against state-of-the-art algorithms where they exist. All data, code, and results of these evaluations are publicly available.

403 citations


Journal ArticleDOI
TL;DR: The algorithm is a generalization of the pool adjacent violators algorithm (PAVA) for isotonic regression and inherits its linear-time computational complexity, yielding remarkable increases in processing speed: more than one order of magnitude compared to currently employed state-of-the-art convex solvers relying on interior point methods.
Abstract: Fluorescent calcium indicators are a popular means for observing the spiking activity of large neuronal populations, but extracting the activity of each neuron from raw fluorescence calcium imaging data is a nontrivial problem. We present a fast online active set method to solve this sparse non-negative deconvolution problem. Importantly, the algorithm progresses through each time series sequentially from beginning to end, thus enabling real-time online estimation of neural activity during the imaging session. Our algorithm is a generalization of the pool adjacent violators algorithm (PAVA) for isotonic regression and inherits its linear-time computational complexity. We gain remarkable increases in processing speed: more than one order of magnitude compared to currently employed state of the art convex solvers relying on interior point methods. Unlike these approaches, our method can exploit warm starts; therefore optimizing model hyperparameters only requires a handful of passes through the data. A minor modification can further improve the quality of activity inference by imposing a constraint on the minimum spike size. The algorithm enables real-time simultaneous deconvolution of O(10^5) traces of whole-brain larval zebrafish imaging data on a laptop.
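
As a reference point for the optimization problem being solved (though not the paper's fast online PAVA-style algorithm), the sketch below poses non-negative deconvolution of an AR(1) calcium trace as a non-negative least-squares problem on simulated data; the decay constant and noise level are arbitrary assumptions.

    import numpy as np
    from scipy.optimize import nnls

    # Simulate a noisy AR(1) calcium trace: c_t = gamma*c_{t-1} + s_t
    rng = np.random.default_rng(0)
    T, gamma = 200, 0.95
    s_true = (rng.random(T) < 0.05) * rng.uniform(0.5, 1.5, T)   # sparse spikes
    c_true = np.zeros(T)
    for t in range(T):
        c_true[t] = (gamma * c_true[t - 1] if t else 0.0) + s_true[t]
    y = c_true + 0.15 * rng.standard_normal(T)

    # y ~ K s with K[t, k] = gamma**(t - k) for k <= t; solve min ||K s - y||, s >= 0
    K = np.tril(gamma ** np.subtract.outer(np.arange(T), np.arange(T)))
    s_hat, _ = nnls(K, y)
    print(np.round(s_hat[:10], 2))   # inferred spike amplitudes, first 10 frames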

390 citations


Journal ArticleDOI
TL;DR: The reliable performance of the Path-Based MiRNA-Disease Association (PBMDA) model is demonstrated, showing that PBMDA could serve as a powerful computational tool to accelerate the identification of disease-miRNA associations.
Abstract: In the recent few years, an increasing number of studies have shown that microRNAs (miRNAs) play critical roles in many fundamental and important biological processes. As one of pathogenetic factors, the molecular mechanisms underlying human complex diseases still have not been completely understood from the perspective of miRNA. Predicting potential miRNA-disease associations makes important contributions to understanding the pathogenesis of diseases, developing new drugs, and formulating individualized diagnosis and treatment for diverse human complex diseases. Instead of only depending on expensive and time-consuming biological experiments, computational prediction models are effective by predicting potential miRNA-disease associations, prioritizing candidate miRNAs for the investigated diseases, and selecting those miRNAs with higher association probabilities for further experimental validation. In this study, Path-Based MiRNA-Disease Association (PBMDA) prediction model was proposed by integrating known human miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity for miRNAs and diseases. This model constructed a heterogeneous graph consisting of three interlinked sub-graphs and further adopted depth-first search algorithm to infer potential miRNA-disease associations. As a result, PBMDA achieved reliable performance in the frameworks of both local and global LOOCV (AUCs of 0.8341 and 0.9169, respectively) and 5-fold cross validation (average AUC of 0.9172). In the cases studies of three important human diseases, 88% (Esophageal Neoplasms), 88% (Kidney Neoplasms) and 90% (Colon Neoplasms) of top-50 predicted miRNAs have been manually confirmed by previous experimental reports from literatures. Through the comparison performance between PBMDA and other previous models in case studies, the reliable performance also demonstrates that PBMDA could serve as a powerful computational tool to accelerate the identification of disease-miRNA associations.

323 citations


Journal ArticleDOI
TL;DR: An improved and easy-to-use circRNA read simulator that can produce mimicking backsplicing reads supporting circRNAs deposited in CircBase is provided and the performance of 11 circRNA detection tools on both simulated and real datasets is compared.
Abstract: Circular RNA (circRNA) is mainly generated by the splice donor of a downstream exon joining to an upstream splice acceptor, a phenomenon known as backsplicing. It has been reported that circRNA can function as microRNA (miRNA) sponges, transcriptional regulators, or potential biomarkers. The availability of massive non-polyadenylated transcriptomes data has facilitated the genome-wide identification of thousands of circRNAs. Several circRNA detection tools or pipelines have recently been developed, and it is essential to provide useful guidelines on these pipelines for users, including a comprehensive and unbiased comparison. Here, we provide an improved and easy-to-use circRNA read simulator that can produce mimicking backsplicing reads supporting circRNAs deposited in CircBase. Moreover, we compared the performance of 11 circRNA detection tools on both simulated and real datasets. We assessed their performance regarding metrics such as precision, sensitivity, F1 score, and Area under Curve. It is concluded that no single method dominated on all of these metrics. Among all of the state-of-the-art tools, CIRI, CIRCexplorer, and KNIFE, which achieved better balanced performance between their precision and sensitivity, compared favorably to the other methods.

Journal ArticleDOI
TL;DR: In this article, the authors present a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.
Abstract: Author summary Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don't know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources from our daily lives and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010.
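
One directory layout consistent with the organizational practices summarized above (read-only raw data, scripted analyses, regenerable results) might look like the sketch below; the specific folder names are illustrative, not prescribed by the paper.

    project/
        README.md      notes on what the project is and how to reproduce it
        data/          raw data, treated as read-only
        src/           scripts and reusable functions, under version control
        results/       generated outputs, safe to delete and regenerate
        doc/           manuscript drafts and documentation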

Journal ArticleDOI
TL;DR: A common mathematical framework is developed for understanding the relationship of these three methods, which share one core commonality: all three evaluate the second moment of the distribution of activity profiles, which determines the representational geometry, and thus how well any feature can be decoded from population activity.
Abstract: Representational models specify how activity patterns in populations of neurons (or, more generally, in multivariate brain-activity measurements) relate to sensory stimuli, motor responses, or cognitive processes. In an experimental context, representational models can be defined as hypotheses about the distribution of activity profiles across experimental conditions. Currently, three different methods are being used to test such hypotheses: encoding analysis, pattern component modeling (PCM), and representational similarity analysis (RSA). Here we develop a common mathematical framework for understanding the relationship of these three methods, which share one core commonality: all three evaluate the second moment of the distribution of activity profiles, which determines the representational geometry, and thus how well any feature can be decoded from population activity. Using simulated data for three different experimental designs, we compare the power of the methods to adjudicate between competing representational models. PCM implements a likelihood-ratio test and therefore provides the most powerful test if its assumptions hold. However, the other two approaches-when conducted appropriately-can perform similarly. In encoding analysis, the linear model needs to be appropriately regularized, which effectively imposes a prior on the activity profiles. With such a prior, an encoding model specifies a well-defined distribution of activity profiles. In RSA, the unequal variances and statistical dependencies of the dissimilarity estimates need to be taken into account to reach near-optimal power in inference. The three methods render different aspects of the information explicit (e.g. single-response tuning in encoding analysis and population-response representational dissimilarity in RSA) and have specific advantages in terms of computational demands, ease of use, and extensibility. The three methods are properly construed as complementary components of a single data-analytical toolkit for understanding neural representations on the basis of multivariate brain-activity data.
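
The shared quantity can be written compactly: with U the conditions-by-channels matrix of activity profiles (P channels), the second moment and its relation to the squared Euclidean dissimilarities used in RSA are, in the usual notation,

    G = \frac{1}{P}\, U U^{\top}, \qquad
    d_{ij} = G_{ii} + G_{jj} - 2\,G_{ij}

so encoding analysis (through the prior it places on activity profiles), PCM (through the modeled covariance), and RSA (through the d_{ij}) all constrain the same matrix G.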

Journal ArticleDOI
TL;DR: The tools of big data research are increasingly woven into the authors' daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.
Abstract: The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets has grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more. The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data creates a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult. Nevertheless, the need for direction in responsible big data research is evident, and this article provides a set of “ten simple rules” for addressing the complex ethical issues that will inevitably arise. Modeled on PLOS Computational Biology’s ongoing collection of rules, the recommendations we outline involve more nuance than the words “simple” and “rules” suggest. This nuance is inevitably tied to our paper’s starting premise: all big data research on social, medical, psychological, and economic phenomena engages with human subjects, and researchers have the ethical responsibility to minimize potential harm. The variety in data sources, research topics, and methodological approaches in big data belies a one-size-fits-all checklist; as a result, these rules are less specific than some might hope. Rather, we exhort researchers to recognize the human participants and complex systems contained within their data and make grappling with ethical questions part of their standard workflow. Towards this end, we structure the first five rules around how to reduce the chance of harm resulting from big data research practices; the second five rules focus on ways researchers can contribute to building best practices that fit their disciplinary and methodological approaches. At the core of these rules, we challenge big data researchers who consider their data disentangled from the ability to harm to reexamine their assumptions. The examples in this paper show how often even seemingly innocuous and anonymized data have produced unanticipated ethical questions and detrimental impacts. This paper is a result of a two-year National Science Foundation (NSF)-funded project that established the Council for Big Data, Ethics, and Society, a group of 20 scholars from a wide range of social, natural, and computational sciences (http://bdes.datasociety.net/). 
The Council was charged with providing guidance to the NSF on how to best encourage ethical practices in scientific and engineering research, utilizing big data research methods and infrastructures [1].

Journal ArticleDOI
TL;DR: This work introduces a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS).
Abstract: The rate of progress in human neurosciences is limited by the inability to easily apply a wide range of analysis methods to the plethora of different datasets acquired in labs around the world. In this work, we introduce a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS). The portability of these applications (BIDS Apps) is achieved by using container technologies that encapsulate all binary and other dependencies in one convenient package. BIDS Apps run on all three major operating systems with no need for complex setup and configuration and thanks to the comprehensiveness of the BIDS standard they require little manual user input. Previous containerized data processing solutions were limited to single user environments and not compatible with most multi-tenant High Performance Computing systems. BIDS Apps overcome this limitation by taking advantage of the Singularity container technology. As a proof of concept, this work is accompanied by 22 ready to use BIDS Apps, packaging a diverse set of commonly used neuroimaging algorithms.

Journal ArticleDOI
TL;DR: TADbit provides three-dimensional models built from 3C-based experiments, which are ready for visualization and for characterizing their relation to gene expression and epigenetic states, and TADbit is an open-source Python library available for download.
Abstract: The sequence of a genome is insufficient to understand all genomic processes carried out in the cell nucleus. To achieve this, the knowledge of its three-dimensional architecture is necessary. Advances in genomic technologies and the development of new analytical methods, such as Chromosome Conformation Capture (3C) and its derivatives, provide unprecedented insights in the spatial organization of genomes. Here we present TADbit, a computational framework to analyze and model the chromatin fiber in three dimensions. Our package takes as input the sequencing reads of 3C-based experiments and performs the following main tasks: (i) pre-process the reads, (ii) map the reads to a reference genome, (iii) filter and normalize the interaction data, (iv) analyze the resulting interaction matrices, (v) build 3D models of selected genomic domains, and (vi) analyze the resulting models to characterize their structural properties. To illustrate the use of TADbit, we automatically modeled 50 genomic domains from the fly genome revealing differential structural features of the previously defined chromatin colors, establishing a link between the conformation of the genome and the local chromatin composition. TADbit provides three-dimensional models built from 3C-based experiments, which are ready for visualization and for characterizing their relation to gene expression and epigenetic states. TADbit is an open-source Python library available for download from https://github.com/3DGenomes/tadbit.

Journal ArticleDOI
TL;DR: A multi-task multichannel topological convolutional neural network (MM-TCNN) is proposed that outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation-induced globular protein folding free energy changes, and mutation-induced membrane protein folding free energy changes.
Abstract: Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the geometric and biological complexity. To address this problem we introduce the element-specific persistent homology (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains important biological information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomolecules. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topological convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes. Availability: weilab.math.msu.edu/TDL/
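
A toy version of the "multichannel image-like representation" can be sketched as follows: each element-specific barcode (a list of birth-death intervals) is binned over a filtration radius to give one 1D channel. The barcode values, bin settings, and channel names below are made up for illustration and are not the paper's featurization.

    import numpy as np

    # Hypothetical element-specific barcodes: (birth, death) intervals in Angstroms,
    # one channel per element pair; all values here are invented.
    barcodes = {
        "C-C": [(0.0, 1.8), (0.0, 2.9), (1.2, 4.5)],
        "C-N": [(0.0, 1.5), (0.4, 3.1)],
    }

    def barcode_to_channel(intervals, r_max=6.0, n_bins=60):
        """Count how many bars persist in each radius bin (a 1D image-like feature)."""
        edges = np.linspace(0.0, r_max, n_bins + 1)
        centers = 0.5 * (edges[:-1] + edges[1:])
        channel = np.zeros(n_bins)
        for birth, death in intervals:
            channel += (centers >= birth) & (centers < death)
        return channel

    image = np.stack([barcode_to_channel(b) for b in barcodes.values()])
    print(image.shape)   # (n_channels, n_bins): input for a convolutional network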

Journal ArticleDOI
TL;DR: This work lays out a family of approaches by which model-based computation may be built upon a core of TD learning, and suggests that this framework represents a neurally plausible family of mechanisms for model-based evaluation.
Abstract: Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
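
A minimal sketch of the framework's foundation, TD learning of the successor representation on a small random-walk MDP (numpy only; the environment, policy, and learning rates are illustrative, not the paper's extended algorithms):

    import numpy as np

    # TD learning of the successor representation M on an 8-state ring world.
    rng = np.random.default_rng(1)
    n_states, gamma, alpha = 8, 0.9, 0.1
    M = np.zeros((n_states, n_states))          # expected discounted future occupancy
    R = np.zeros(n_states); R[5] = 1.0          # reward only at state 5

    s = 0
    for _ in range(20000):
        s_next = (s + rng.choice([1, -1])) % n_states      # random-walk policy
        target = np.eye(n_states)[s] + gamma * M[s_next]   # TD target for row M[s]
        M[s] += alpha * (target - M[s])
        s = s_next

    V = M @ R        # values recombine cached occupancy predictions with rewards
    print(np.round(V, 2))

Because V = M R, the cached long-run occupancy predictions can be recombined with new reward estimates without dynamic-programming-style replanning, which is the kind of behavior the paper compares against fully model-based evaluation.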

Journal ArticleDOI
TL;DR: It is shown that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the microprocessor, suggesting current analytic approaches in neuroscience may fall short of producing meaningful understanding of neural systems, regardless of the amount of data.
Abstract: There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. Microprocessors are among those artificial information processing systems that are both complex and that we understand at all levels, from the overall logical flow, via logical gates, to the dynamics of transistors. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the microprocessor. This suggests current analytic approaches in neuroscience may fall short of producing meaningful understanding of neural systems, regardless of the amount of data. Additionally, we argue for scientists using complex non-linear dynamical systems with known ground truth, such as the microprocessor as a validation platform for time-series and structure discovery methods.

Journal ArticleDOI
TL;DR: This paper compares model implementations using four case studies, chosen to reflect the key cellular processes of proliferation, adhesion, and short- and long-range signalling, and demonstrates the applicability of each model and provides a guide for model usage.
Abstract: The coordinated behaviour of populations of cells plays a central role in tissue growth and renewal. Cells react to their microenvironment by modulating processes such as movement, growth and proliferation, and signalling. Alongside experimental studies, computational models offer a useful means by which to investigate these processes. To this end a variety of cell-based modelling approaches have been developed, ranging from lattice-based cellular automata to lattice-free models that treat cells as point-like particles or extended shapes. However, it remains unclear how these approaches compare when applied to the same biological problem, and what differences in behaviour are due to different model assumptions and abstractions. Here, we exploit the availability of an implementation of five popular cell-based modelling approaches within a consistent computational framework, Chaste (http://www.cs.ox.ac.uk/chaste). This framework allows one to easily change constitutive assumptions within these models. In each case we provide full details of all technical aspects of our model implementations. We compare model implementations using four case studies, chosen to reflect the key cellular processes of proliferation, adhesion, and short- and long-range signalling. These case studies demonstrate the applicability of each model and provide a guide for model usage.

Journal ArticleDOI
TL;DR: A computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA) is presented, which projects miRNAs/diseases’ statistical and graph theoretical feature profiles to a common subspace and is shown to be a valuable computational tool for miRNA-disease association prediction.
Abstract: Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs' potential roles of diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs/diseases' statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and a L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model's superior prediction accuracy; and the average AUC of 0.9181+/-0.0004 in 5-fold cross validation justified its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA. Respectively, 98%, 88%, 96%, 98% and 98% out of the top 50 predictions were validated by experimental evidences. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction.
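
The abstract's ingredients (a common subspace projection, an L1 sparsity constraint, and Laplacian regularization) correspond to a generic objective of the following form; this is a generic template of such methods, not necessarily the paper's exact formulation.

    \min_{W}\; \lVert X W - Y \rVert_F^2
      \;+\; \lambda_1 \lVert W \rVert_1
      \;+\; \lambda_2\,\mathrm{tr}\!\left(W^{\top} L W\right)

Here X stacks the miRNA/disease feature profiles, Y encodes known associations, and L is a graph Laplacian built from the similarity networks, so the trace term penalizes projections that vary sharply between similar miRNAs or diseases.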

Journal ArticleDOI
TL;DR: The approach recapitulates and refines known motifs for 43 of the most frequent alleles, uncovers new motifs and provides a scalable framework to incorporate additional HLA peptidomics studies in the future and improves neo-antigen and cancer testis antigen predictions.
Abstract: The precise identification of Human Leukocyte Antigen class I (HLA-I) binding motifs plays a central role in our ability to understand and predict (neo-)antigen presentation in infectious diseases and cancer. Here, by exploiting co-occurrence of HLA-I alleles across ten newly generated as well as forty public HLA peptidomics datasets comprising more than 115,000 unique peptides, we show that we can rapidly and accurately identify many HLA-I binding motifs and map them to their corresponding alleles without any a priori knowledge of HLA-I binding specificity. Our approach recapitulates and refines known motifs for 43 of the most frequent alleles, uncovers new motifs for 9 alleles that up to now had less than five known ligands and provides a scalable framework to incorporate additional HLA peptidomics studies in the future. The refined motifs improve neo-antigen and cancer testis antigen predictions, indicating that unbiased HLA peptidomics data are ideal for in silico predictions of neo-antigens from tumor exome sequencing data. The new motifs further reveal distant modulation of the binding specificity at P2 for some HLA-I alleles by residues in the HLA-I binding site but outside of the B-pocket and we unravel the underlying mechanisms by protein structure analysis, mutagenesis and in vitro binding assays.

Journal ArticleDOI
TL;DR: The Active Vertex Model (AVM) is proposed for cell-resolution studies of the mechanics of confluent epithelial tissues consisting of tens of thousands of cells, with a level of detail inaccessible to similar methods.
Abstract: We introduce an Active Vertex Model (AVM) for cell-resolution studies of the mechanics of confluent epithelial tissues consisting of tens of thousands of cells, with a level of detail inaccessible to similar methods. The AVM combines the Vertex Model for confluent epithelial tissues with active matter dynamics. This introduces a natural description of the cell motion and accounts for motion patterns observed on multiple scales. Furthermore, cell contacts are generated dynamically from positions of cell centres. This not only enables efficient numerical implementation, but provides a natural description of the T1 transition events responsible for local tissue rearrangements. The AVM also includes cell alignment, cell-specific mechanical properties, cell growth, division and apoptosis. In addition, the AVM introduces a flexible, dynamically changing boundary of the epithelial sheet allowing for studies of phenomena such as the fingering instability or wound healing. We illustrate these capabilities with a number of case studies.
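
For orientation, models in this family combine the standard vertex-model energy with overdamped active dynamics of the cells, roughly as below; conventions and terms differ between implementations, so this is a generic sketch rather than the AVM's exact equations.

    E = \sum_{c}\left[\frac{K_c}{2}\,\bigl(A_c - A_c^{0}\bigr)^{2}
          + \frac{\Gamma_c}{2}\,P_c^{2}\right]
      + \sum_{\langle i,j\rangle}\Lambda_{ij}\, l_{ij},
    \qquad
    \dot{\mathbf{r}}_i = -\frac{1}{\zeta}\,\nabla_{\mathbf{r}_i} E + v_0\,\mathbf{n}_i

Here A_c and P_c are the area and perimeter of cell c, l_{ij} are junction lengths, and v_0 n_i is a self-propulsion term along a slowly rotating polarity direction, which supplies the "active matter" part of the dynamics.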

Journal ArticleDOI
TL;DR: This study combines constraint-based and individual-based modeling techniques into the R package BacArena to generate novel biological insights into Pseudomonas aeruginosa biofilm formation as well as a seven species model community of the human gut.
Abstract: Recent advances focusing on the metabolic interactions within and between cellular populations have emphasized the importance of microbial communities for human health. Constraint-based modeling, with flux balance analysis in particular, has been established as a key approach for studying microbial metabolism, whereas individual-based modeling has been commonly used to study complex dynamics between interacting organisms. In this study, we combine both techniques into the R package BacArena (https://cran.r-project.org/package=BacArena) to generate novel biological insights into Pseudomonas aeruginosa biofilm formation as well as a seven species model community of the human gut. For our P. aeruginosa model, we found that cross-feeding of fermentation products causes a spatial differentiation of emerging metabolic phenotypes in the biofilm over time. In the human gut model community, we found that spatial gradients of mucus glycans are important for niche formations which shape the overall community structure. Additionally, we could provide novel hypotheses concerning the metabolic interactions between the microbes. These results demonstrate the importance of spatial and temporal multi-scale modeling approaches such as BacArena.
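
The constraint-based core referred to here is flux balance analysis, the linear program

    \max_{v}\; c^{\top} v
    \quad \text{subject to} \quad
    S\,v = 0, \qquad v_{\min} \le v \le v_{\max}

where S is the stoichiometric matrix, v the reaction fluxes, and c an objective such as biomass production; the individual-based layer then places many such models on a spatial grid and iterates growth, movement, and metabolite exchange between them.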

Journal ArticleDOI
TL;DR: Gravity model estimates indicate a sharp decay in influenza transmission with the distance between infectious and susceptible cities, consistent with spread dominated by work commutes rather than air traffic.
Abstract: Seasonal influenza epidemics offer unique opportunities to study the invasion and re-invasion waves of a pathogen in a partially immune population. Detailed patterns of spread remain elusive, however, due to lack of granular disease data. Here we model high-volume city-level medical claims data and human mobility proxies to explore the drivers of influenza spread in the US during 2002-2010. Although the speed and pathways of spread varied across seasons, seven of eight epidemics likely originated in the Southern US. Each epidemic was associated with 1-5 early long-range transmission events, half of which sparked onward transmission. Gravity model estimates indicate a sharp decay in influenza transmission with the distance between infectious and susceptible cities, consistent with spread dominated by work commutes rather than air traffic. Two early-onset seasons associated with antigenic novelty had particularly localized modes of spread, suggesting that novel strains may spread in a more localized fashion than previously anticipated.
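
The gravity model referred to takes the generic form

    F_{ij} \;\propto\; \frac{N_i^{\alpha}\, N_j^{\beta}}{d_{ij}^{\,\gamma}}

coupling transmission between cities i and j through their population sizes N and the distance d_{ij}; a large distance exponent \gamma produces the sharp distance decay reported. The paper's exact parameterization and covariates may differ from this generic form.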

Journal ArticleDOI
TL;DR: In this article, a hierarchical RBC extraction method is proposed to detect the RBC region of interest (ROI) from the background and then separate touching RBCs in the ROI images by applying an improved random walk method based on automatic seed generation.
Abstract: Sickle cell disease (SCD) is a hematological disorder leading to blood vessel occlusion accompanied by painful episodes and even death. Red blood cells (RBCs) of SCD patients have diverse shapes that reveal important biomechanical and bio-rheological characteristics, e.g. their density, fragility, adhesive properties, etc. Hence, having an objective and effective way of RBC shape quantification and classification will lead to better insights and eventual better prognosis of the disease. To this end, we have developed an automated, high-throughput, ex-vivo RBC shape classification framework that consists of three stages. First, we present an automatic hierarchical RBC extraction method to detect the RBC region (ROI) from the background, and then separate touching RBCs in the ROI images by applying an improved random walk method based on automatic seed generation. Second, we apply a mask-based RBC patch-size normalization method to normalize the variant size of segmented single RBC patches into uniform size. Third, we employ deep convolutional neural networks (CNNs) to realize RBC classification; the alternating convolution and pooling operations can deal with non-linear and complex patterns. Furthermore, we investigate the specific shape factor quantification for the classified RBC image data in order to develop a general multiscale shape analysis. We perform several experiments on raw microscopy image datasets from 8 SCD patients (over 7,000 single RBC images) through a 5-fold cross validation method both for oxygenated and deoxygenated RBCs. We demonstrate that the proposed framework can successfully classify sickle shape RBCs in an automated manner with high accuracy, and we also provide the corresponding shape factor analysis, which can be used synergistically with the CNN analysis for more robust predictions. Moreover, the trained deep CNN exhibits good performance even for a deoxygenated dataset and distinguishes the subtle differences in texture alteration inside the oxygenated and deoxygenated RBCs.

Journal ArticleDOI
TL;DR: Estimating the contribution of transcript levels to two orthogonal sources of variability shows that scaled mRNA levels can account for most of the mean-level-variability but not necessarily for across-tissues variability, suggesting extensive post-transcriptional regulation.
Abstract: Transcriptional and post-transcriptional regulation shape tissue-type-specific proteomes, but their relative contributions remain contested. Estimates of the factors determining protein levels in human tissues do not distinguish between (i) the factors determining the variability between the abundances of different proteins, i.e., mean-level-variability and, (ii) the factors determining the physiological variability of the same protein across different tissue types, i.e., across-tissues variability. We sought to estimate the contribution of transcript levels to these two orthogonal sources of variability, and found that scaled mRNA levels can account for most of the mean-level-variability but not necessarily for across-tissues variability. The reliable quantification of the latter estimate is limited by substantial measurement noise. However, protein-to-mRNA ratios exhibit substantial across-tissues variability that is functionally concerted and reproducible across different datasets, suggesting extensive post-transcriptional regulation. These results caution against estimating protein fold-changes from mRNA fold-changes between different cell-types, and highlight the contribution of post-transcriptional regulation to shaping tissue-type-specific proteomes.

Journal ArticleDOI
TL;DR: The results suggest that the dynamics of a gene circuit are mainly determined by its topology, not by detailed circuit parameters, providing a theoretical foundation for circuit-based systems biology modeling.
Abstract: One of the most important roles of cells is performing their cellular tasks properly for survival. Cells usually achieve robust functionality, for example, cell-fate decision-making and signal transduction, through multiple layers of regulation involving many genes. Despite the combinatorial complexity of gene regulation, its quantitative behavior has been typically studied on the basis of experimentally verified core gene regulatory circuitry, composed of a small set of important elements. It is still unclear how such a core circuit operates in the presence of many other regulatory molecules and in a crowded and noisy cellular environment. Here we report a new computational method, named random circuit perturbation (RACIPE), for interrogating the robust dynamical behavior of a gene regulatory circuit even without accurate measurements of circuit kinetic parameters. RACIPE generates an ensemble of random kinetic models corresponding to a fixed circuit topology, and utilizes statistical tools to identify generic properties of the circuit. By applying RACIPE to simple toggle-switch-like motifs, we observed that the stable states of all models converge to experimentally observed gene state clusters even when the parameters are strongly perturbed. RACIPE was further applied to a proposed 22-gene network of the Epithelial-to-Mesenchymal Transition (EMT), from which we identified four experimentally observed gene states, including the states that are associated with two different types of hybrid Epithelial/Mesenchymal phenotypes. Our results suggest that dynamics of a gene circuit is mainly determined by its topology, not by detailed circuit parameters. Our work provides a theoretical foundation for circuit-based systems biology modeling. We anticipate RACIPE to be a powerful tool to predict and decode circuit design principles in an unbiased manner, and to quantitatively evaluate the robustness and heterogeneity of gene expression.
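
A minimal RACIPE-style sketch for a two-gene toggle switch: sample random kinetic parameters, integrate the ODEs from random initial conditions, and collect the stable states for downstream clustering. The equations and sampling ranges below are illustrative assumptions, not the paper's exact settings.

    import numpy as np
    from scipy.integrate import odeint

    rng = np.random.default_rng(0)

    def toggle(x, t, g, k, x0, n):
        """Mutually inhibitory two-gene circuit with Hill-function repression."""
        a, b = x
        da = g[0] / (1.0 + (b / x0[0]) ** n[0]) - k[0] * a
        db = g[1] / (1.0 + (a / x0[1]) ** n[1]) - k[1] * b
        return [da, db]

    states = []
    for _ in range(200):                     # ensemble of random kinetic models
        g  = rng.uniform(1, 100, 2)          # production rates
        k  = rng.uniform(0.1, 1, 2)          # degradation rates
        x0 = rng.uniform(1, 100, 2)          # repression thresholds
        n  = rng.integers(1, 7, 2)           # Hill coefficients
        for _ in range(5):                   # several random initial conditions
            xi = rng.uniform(0, 100, 2)
            traj = odeint(toggle, xi, np.linspace(0, 500, 200),
                          args=(g, k, x0, n))
            states.append(np.log10(traj[-1] + 1e-9))
    states = np.array(states)
    print(states.shape)   # cluster these endpoints to find the circuit's generic states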

Journal ArticleDOI
TL;DR: This study used an iterative modified-sure independence screening (ISIS) approach to reduce the number of SNPs to a moderate size and identified most previously reported genes, suggesting that the new method is a good alternative for multi-locus GWAS.
Abstract: Genome-wide association study (GWAS) entails examining a large number of single nucleotide polymorphisms (SNPs) in a limited sample with hundreds of individuals, implying a variable selection problem in the high dimensional dataset. Although many single-locus GWAS approaches under polygenic background and population structure controls have been widely used, some significant loci fail to be detected. In this study, we used an iterative modified-sure independence screening (ISIS) approach in reducing the number of SNPs to a moderate size. Expectation-Maximization (EM)-Bayesian least absolute shrinkage and selection operator (BLASSO) was used to estimate all the selected SNP effects for true quantitative trait nucleotide (QTN) detection. This method is referred to as ISIS EM-BLASSO algorithm. Monte Carlo simulation studies validated the new method, which has the highest empirical power in QTN detection and the highest accuracy in QTN effect estimation, and it is the fastest, as compared with efficient mixed-model association (EMMA), smoothly clipped absolute deviation (SCAD), fixed and random model circulating probability unification (FarmCPU), and multi-locus random-SNP-effect mixed linear model (mrMLM). To further demonstrate the new method, six flowering time traits in Arabidopsis thaliana were re-analyzed by four methods (New method, EMMA, FarmCPU, and mrMLM). As a result, the new method identified most previously reported genes. Therefore, the new method is a good alternative for multi-locus GWAS.
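
The screen-then-shrink logic can be illustrated with ordinary tools: marginal-correlation screening followed by an L1-penalized fit using scikit-learn's Lasso. This stands in for, but is not, the paper's EM-Bayesian LASSO step, and the simulated genotypes and tuning values are purely illustrative.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 200, 10000                                     # individuals, SNPs (simulated)
    X = rng.integers(0, 3, size=(n, p)).astype(float)     # genotypes coded 0/1/2
    beta = np.zeros(p); beta[[11, 505, 7070]] = [0.8, -0.6, 0.5]
    y = X @ beta + rng.standard_normal(n)

    # Screening step: rank SNPs by marginal correlation, keep a moderate subset.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.argsort(corr)[-300:]

    # Shrinkage step (stand-in for EM-BLASSO): sparse multi-locus fit on the subset.
    fit = Lasso(alpha=0.05).fit(X[:, keep], y)
    print(sorted(keep[np.flatnonzero(fit.coef_)]))        # candidate QTN indices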

Journal ArticleDOI
TL;DR: The theory establishes a general framework for modeling finite-size neural population dynamics based on single cell and synapse parameters and offers an efficient approach to analyzing cortical circuits and computations.
Abstract: Neural population equations such as neural mass or field models are widely used to study brain activity on a large scale. However, the relation of these models to the properties of single neurons is unclear. Here we derive an equation for several interacting populations at the mesoscopic scale starting from a microscopic model of randomly connected generalized integrate-and-fire neuron models. Each population consists of 50-2000 neurons of the same type but different populations account for different neuron types. The stochastic population equations that we find reveal how spike-history effects in single-neuron dynamics such as refractoriness and adaptation interact with finite-size fluctuations on the population level. Efficient integration of the stochastic mesoscopic equations reproduces the statistical behavior of the population activities obtained from microscopic simulations of a full spiking neural network model. The theory describes nonlinear emergent dynamics such as finite-size-induced stochastic transitions in multistable networks and synchronization in balanced networks of excitatory and inhibitory neurons. The mesoscopic equations are employed to rapidly integrate a model of a cortical microcircuit consisting of eight neuron types, which allows us to predict spontaneous population activities as well as evoked responses to thalamic input. Our theory establishes a general framework for modeling finite-size neural population dynamics based on single cell and synapse parameters and offers an efficient approach to analyzing cortical circuits and computations.