
Showing papers in "bioRxiv in 2013"


Posted ContentDOI
02 Dec 2013-bioRxiv
TL;DR: This work develops efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework and proposes useful heuristic scores to identify the number of populations represented in a dataset and a new hierarchical prior to detect weak population structure in the data.
Abstract: Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a dataset and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data, and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias towards detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

99 citations


Posted ContentDOI
01 Dec 2013-bioRxiv
TL;DR: Broad data dissemination is essential for advancements in genetics but raises concerns about privacy; this work maps threats to genetic privacy and discusses potential mitigation strategies for privacy-preserving dissemination of genetic data.
Abstract: We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data.

79 citations


Posted ContentDOI
12 Nov 2013-bioRxiv
TL;DR: This work uses restriction associated DNA sequencing to obtain several thousand genome wide sequence markers and uses association mapping to identify previously unmapped colour pattern loci, in particular the Ro locus, which confirms that the colour pattern controlling loci account for the majority of divergent regions across the genome.
Abstract: Hybrid zones can be valuable tools for studying evolution and identifying genomic regions responsible for adaptive divergence and underlying phenotypic variation. Hybrid zones between subspecies of Heliconius butterflies can be very narrow and are maintained by strong selection acting on colour pattern. The co-mimetic species H. erato and H. melpomene have parallel hybrid zones where both species undergo a change from one colour pattern form to another. We use restriction associated DNA sequencing to obtain several thousand genome wide sequence markers and use these to analyse patterns of population divergence across two pairs of parallel hybrid zones in Peru and Ecuador. We compare two approaches for analysis of this type of data, alignment to a reference genome and de novo assembly, and find that alignment gives the best results for species both closely (H. melpomene) and distantly (H. erato, ~15% divergent) related to the reference sequence. Our results confirm that the colour pattern controlling loci account for the majority of divergent regions across the genome, but we also detect other divergent regions apparently unlinked to colour pattern differences. We also use association mapping to identify previously unmapped colour pattern loci, in particular the Ro locus. Finally, we identify within our sample a new cryptic population of H. timareta in Ecuador, which occurs at relatively low altitude and is mimetic with H. melpomene malleti.

72 citations


Posted ContentDOI
03 Dec 2013-bioRxiv
TL;DR: It is shown through re-analysis of an empirical RADseq data set that indels are a common feature of such data, even at shallow phylogenetic scales; PyRAD also offers parallel processing and an optional hierarchical clustering method, which allow it to rapidly assemble phylogenetic data sets with hundreds of sampled individuals.
Abstract: Restriction-site associated genomic markers are a powerful tool for investigating evolutionary questions at the population level, but are limited in their utility at deeper phylogenetic scales where fewer orthologous loci are typically recovered across disparate taxa. While this limitation stems in part from mutations to restriction recognition sites that disrupt data generation, an alternative source of data loss comes from the failure to identify homology during bioinformatic analyses. Clustering methods that allow for lower similarity thresholds and the inclusion of indel variation will perform better at assembling RADseq loci at the phylogenetic scale. PyRAD is a pipeline to assemble de novo RADseq loci with the aim of optimizing coverage across phylogenetic data sets. It utilizes a wrapper around an alignment-clustering algorithm which allows for indel variation within and between samples, as well as for incomplete overlap among reads (e.g., paired-end). Here I compare PyRAD with the program Stacks in their performance analyzing a simulated RADseq data set that includes indel variation. Indels disrupt clustering of homologous loci in Stacks but not in PyRAD, such that the latter recovers more shared loci across disparate taxa. I show through re-analysis of an empirical RADseq data set that indels are a common feature of such data, even at shallow phylogenetic scales. PyRAD utilizes parallel processing as well as an optional hierarchical clustering method which allow it to rapidly assemble phylogenetic data sets with hundreds of sampled individuals.

70 citations


Posted ContentDOI
07 Nov 2013-bioRxiv
TL;DR: A powerful method for identifying genetic loci that influence protein expression in very large populations of the yeast Saccharomyces cerevisiae is developed and detected many more loci per gene than previous studies.
Abstract: Many DNA sequence variants influence phenotypes by altering gene expression. Our understanding of these variants is limited by sample sizes of current studies and by measurements of mRNA rather than protein abundance. We developed a powerful method for identifying genetic loci that influence protein expression in very large populations of the yeast Saccharomyces cerevisiae. The method measures single-cell protein abundance through the use of green-fluorescent-protein tags. We applied this method to 160 genes and detected many more loci per gene than previous studies. We also observed closer correspondence between loci that influence protein abundance and loci that influence mRNA abundance of a given gene. Most loci cluster at hotspot locations that influence multiple proteins (in some cases, more than half of those examined). The variants that underlie these hotspots have profound effects on the gene regulatory network and provide insights into genetic variation in cell physiology between yeast strains.

65 citations


Posted ContentDOI
19 Nov 2013-bioRxiv
TL;DR: A statistical model is described that uses association statistics computed across the genome to identify classes of genomic element that are enriched or depleted for loci that influence a trait; re-weighting each GWAS with functional genomic information increases the number of loci with high-confidence associations by around 5%.
Abstract: Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWAS). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. We describe a statistical model that uses association statistics computed across the genome to identify classes of genomic element that are enriched or depleted for loci that influence a trait. The model naturally incorporates multiple types of annotations. We applied the model to GWAS of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, BMI, and Crohn’s disease. For each trait, we evaluated the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over a hundred tissues and cell lines. We show that the fraction of phenotype-associated SNPs that influence protein sequence ranges from around 2% (for platelet volume) up to around 20% (for LDL cholesterol); that repressed chromatin is significantly depleted for SNPs associated with several traits; and that cell type-specific DNase-I hypersensitive sites are enriched for SNPs associated with several traits (for example, fibroblasts in Crohn’s disease and muscle tissue in bone density). Finally, by re-weighting each GWAS using information from functional genomics, we increase the number of loci with high-confidence associations by around 5%.

64 citations


Posted ContentDOI
22 Dec 2013-bioRxiv
TL;DR: A generative model is developed that allows for the systematic study of the presence of community structure and its impact on network function and dynamics in empirical biological networks.
Abstract: A modular pattern, also called community structure, is ubiquitous in biological networks. There has been an increased interest in unraveling the community structure of biological systems as it may provide important insights into a system's functional components and the impact of local structures on dynamics at a global scale. Choosing an appropriate community detection algorithm to identify the community structure in an empirical network can be difficult, however, as the many algorithms available are based on a variety of cost functions and are difficult to validate. Even when community structure is identified in an empirical system, disentangling the effect of community structure from other network properties such as clustering coefficient and assortativity can be a challenge. Here, we develop a generative model to produce undirected, simple, connected graphs with specified degrees and pattern of communities, while maintaining a graph structure that is as random as possible. Additionally, we demonstrate two important applications of our model: (a) to generate networks that can be used to benchmark existing and new algorithms for detecting communities in biological networks; and (b) to generate null models to serve as random controls when investigating the impact of complex network features beyond the byproduct of degree and modularity in empirical biological networks. Our model allows for the systematic study of the presence of community structure and its impact on network function and dynamics. This process is a crucial step in unraveling the functional consequences of the structural properties of biological systems and uncovering the mechanisms that drive these systems.

37 citations


Posted ContentDOI
17 Dec 2013-bioRxiv
TL;DR: The newly identified regions of selection signatures in worldwide sheep populations reveal the extensive genome response to selection on morphology, color and adaptation to new environments.
Abstract: The diversity of populations in domestic species offers great opportunities to study genome response to selection. The recently published Sheep HapMap dataset is a great example of characterization of the world wide genetic diversity in sheep. In this study, we re-analyzed the Sheep HapMap dataset to identify selection signatures in worldwide sheep populations. Compared to previous analyses, we made use of statistical methods that (i) take account of the hierarchical structure of sheep populations, (ii) make use of linkage disequilibrium information and (iii) focus specifically on either recent or older selection signatures. We show that this allows pinpointing several new selection signatures in the sheep genome and distinguishing those related to modern breeding objectives and to earlier post-domestication constraints. The newly identified regions, together with the ones previously identified, reveal the extensive genome response to selection on morphology, color and adaptation to new environments.

31 citations


Posted ContentDOI
15 Nov 2013-bioRxiv
TL;DR: This paper presents a curated repository of multielectrode array recordings of spontaneous activity in developing mouse and ferret retina, and describes the structure of the data, along with examples of reproducible research using these data files.
Abstract: Background: During early development, neural circuits fire spontaneously, generating activity episodes with complex spatiotemporal patterns. Recordings of spontaneous activity have been made in many parts of the nervous system over the last 20 years, reporting developmental changes in activity patterns and the effects of various genetic perturbations. Results: We present a curated repository of multielectrode array recordings of spontaneous activity in developing mouse and ferret retina. The data have been annotated with minimal metadata and converted into the HDF5 format. This paper describes the structure of the data, along with examples of reproducible research using these data files. We also demonstrate how these data can be analysed in the CARMEN workflow system. This article is written as a literate programming document; all programs and data described here are freely available. Conclusions: 1. We hope this repository will lead to novel analysis of spontaneous activity recorded in different laboratories. 2. We encourage published data to be added to the repository. 3. This repository serves as an example of how multielectrode array recordings can be stored for long-term reuse.

29 citations


Posted ContentDOI
16 Dec 2013-bioRxiv
TL;DR: Evidence is provided that the evolution of a highly folded neocortex, as observed in humans, requires the traversal of a threshold of ∼10^9 neurons, and that species above and below the threshold exhibit a bimodal distribution of physiological and life-history traits, establishing two phenotypic groups.
Abstract: Expansion of the neocortex is a hallmark of human evolution. However, it remains an open question what adaptive mechanisms facilitated its expansion. Here we show, using gyrencephaly index (GI) and other physiological and life-history data for 102 mammalian species, that gyrencephaly is an ancestral mammalian trait. We provide evidence that the evolution of a highly folded neocortex, as observed in humans, requires the traversal of a threshold of ∼10^9 neurons, and that species above and below the threshold exhibit a bimodal distribution of physiological and life-history traits, establishing two phenotypic groups. We identify, using discrete mathematical models, proliferative divisions of progenitors in the basal compartment of the developing neocortex as evolutionarily necessary and sufficient for generating a fourteen-fold increase in daily prenatal neuron production and thus traversal of the neuronal threshold. Finally, using RNA-seq data from fetal human neocortical germinal zones, we show a genomic correlate to the neuron threshold in the differential conservation of long intergenic non-coding RNA.

22 citations


Posted ContentDOI
16 Dec 2013-bioRxiv
TL;DR: In this article, the scaling and cost performance characteristics of current and projected connectomics approaches, with reference to the potential implications of recent advances in diverse contributing fields, are analyzed, and three generalized strategies for dense connectivity mapping at the scale of whole mammalian brains are considered: electron microscopic axon tracing, optical imaging of combinatorial molecular markers at synapses, and bulk DNA sequencing of transsynaptically exchanged nucleic acid barcode pairs.
Abstract: We analyze the scaling and cost-performance characteristics of current and projected connectomics approaches, with reference to the potential implications of recent advances in diverse contributing fields. Three generalized strategies for dense connectivity mapping at the scale of whole mammalian brains are considered: electron microscopic axon tracing, optical imaging of combinatorial molecular markers at synapses, and bulk DNA sequencing of trans-synaptically exchanged nucleic acid barcode pairs. Due to advances in parallel-beam instrumentation, whole mouse brain electron microscopic image acquisition could cost less than $100 million, with total costs presently limited by image analysis to trace axons through large image stacks. Optical microscopy at 50 to 100 nm isotropic resolution could potentially read combinatorially multiplexed molecular information from individual synapses, which could indicate the identities of the pre-synaptic and post-synaptic cells without relying on axon tracing. An optical approach to whole mouse brain connectomics may be achievable for less than $10 million and could be enabled by emerging technologies to sequence nucleic acids in-situ in fixed tissue via fluorescent microscopy. Novel strategies relying on bulk DNA sequencing, which would extract the connectome without direct imaging of the tissue, could produce a whole mouse brain connectome for $100k to $1 million or a mouse cortical connectome for $10k to $100k. Anticipated further reductions in the cost of DNA sequencing could lead to a $1000 mouse cortical connectome.

Posted ContentDOI
07 Nov 2013-bioRxiv
TL;DR: A hybrid, discrete/continuous computational cellular automaton model of a generalised stem-cell driven tissue with a simple microenvironment is presented, finding good agreement between the parameters of this model and other theoretical models in terms of the intrinsic cellular parameters, which are difficult to study biologically.
Abstract: Since the discovery of tumour initiating cells (TICs) in solid tumours, studies focussing on their role in cancer initiation and progression have abounded. The biological interrogation of these cells continues to yield volumes of information on their pro-tumourigenic behaviour, but actionable generalised conclusions have been scarce. Further, new information suggesting a dependence of tumour composition and growth on the microenvironment has yet to be studied theoretically. To address this point, we created a hybrid, discrete/continuous computational cellular automaton model of a generalised stem-cell driven tissue with a simple microenvironment. Using the model we explored the phenotypic traits inherent to the tumour initiating cells and the effect of the microenvironment on tissue growth. We identify the regions in phenotype parameter space where TICs are able to cause a disruption in homeostasis, leading to tissue overgrowth and tumour maintenance. As our parameters and model are non-specific, they could apply to any tissue TIC and do not assume specific genetic mutations. Targeting these phenotypic traits could represent a generalizable therapeutic strategy across cancer types. Further, we find that the microenvironmental variable does not strongly affect the outcomes, suggesting a need for direct feedback from the microenvironment onto stem-cell behaviour in future modelling endeavours.

Posted ContentDOI
11 Dec 2013-bioRxiv
TL;DR: The work demonstrates a direct interaction between an oomycete CRN and a host target required for suppression of immunity and hints at a virulence strategy that is conserved within the oomycetes and may allow engineering of resistance to a wide range of crop pathogens.
Abstract: Phytophthora spp. secrete vast arrays of effector molecules upon infection. A main class of intracellular effectors are the CRNs. They are translocated into the host cell and specifically localise to the nucleus where they are thought to perturb many different cellular processes. Although CRN proteins have been implicated as effectors, direct evidence of CRN mediated perturbation of host processes has been lacking. Here we show that a conserved CRN effector from P. capsici directly binds to tomato transcription factor SlTCP14-2. Previous studies in Arabidopsis thaliana have revealed that transcription factor TCP14 may be a key immune signalling protein, targeted by effectors from divergent species. We extend our understanding of TCP targeting by pathogen effectors by showing that the P. capsici effector CRN12_997 binds to SlTCP14-2 in plants. SlTCP14-2 over-expression enhances immunity to P. capsici, a phenotypic outcome that can be abolished by co-expression of CRN12_997. We show that in the presence of CRN12_997, SlTCP14-2 association with nuclear chromatin is diminished, resulting in altered SlTCP14 subnuclear localisation. These results suggest that CRN12_997 prevents SlTCP14 from positively regulating defence against P. capsici. Our work demonstrates a direct interaction between an oomycete CRN and a host target required for suppression of immunity. Collectively, our results hint at a virulence strategy that is conserved within the oomycetes and may allow engineering of resistance to a wide range of crop pathogens.

Posted ContentDOI
07 Nov 2013-bioRxiv
TL;DR: This analysis uncovers a number of putative signals of local adaptation, and a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion is laid out.
Abstract: Adaptation in response to selection on polygenic phenotypes occurs via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that have undergone local adaptation. Using GWAS data, we estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We model the expected differentiation of GWAS loci among populations under neutrality to develop simple tests of selection across an arbitrary number of populations with arbitrary population structure. To find support for the role of specific environmental variables in local adaptation we test for correlations with the estimated genetic values. We also develop a general test of local adaptation to identify overdispersion of the estimated genetic values among populations. This test is a natural generalization of QST/FST comparisons based on GWAS predictions. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles. We apply our tests to the human genome diversity panel dataset using GWAS data for six different traits. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
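Illustrative sketch (not from the paper): the abstract describes the estimated genetic value of a population as a weighted sum of allele frequencies, with GWAS effect sizes as the weights. The short Python example below shows one way such a sum could be computed. The function name `genetic_values`, the input layout, and the factor of 2 (two allele copies per diploid individual) are assumptions made for illustration and may differ from the authors' exact definition.

```python
from typing import Dict, List


def genetic_values(
    effect_sizes: List[float],
    allele_freqs: Dict[str, List[float]],
) -> Dict[str, float]:
    """Return an estimated mean additive genetic value per population.

    effect_sizes[l] is the GWAS effect size of the trait-associated allele
    at locus l; allele_freqs[pop][l] is that allele's frequency in `pop`.
    The factor 2.0 assumes diploid individuals (hypothetical convention).
    """
    values = {}
    for pop, freqs in allele_freqs.items():
        if len(freqs) != len(effect_sizes):
            raise ValueError(f"population {pop!r} has the wrong number of loci")
        # Weighted sum of allele frequencies, weights = effect sizes.
        values[pop] = 2.0 * sum(a * p for a, p in zip(effect_sizes, freqs))
    return values


if __name__ == "__main__":
    # Made-up numbers, purely to show the call shape.
    effects = [0.5, -0.25]
    freqs = {"pop_A": [0.5, 0.5], "pop_B": [1.0, 0.0]}
    print(genetic_values(effects, freqs))
```

Differences among the resulting per-population values are what the tests described in the abstract would then compare against a neutral expectation; that step is not sketched here.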

Posted ContentDOI
10 Dec 2013-bioRxiv
TL;DR: Analysis of scaling and cost-performance characteristics of current and projected connectomics approaches suggests potential cost-effective strategies for dense connectivity mapping at the scale of whole mammalian brains.
Abstract: We analyze the scaling and cost-performance characteristics of current and projected connectomics approaches, with reference to the potential implications of recent advances in diverse contributing fields. Three generalized strategies for dense connectivity mapping at the scale of whole mammalian brains are considered: electron microscopic axon tracing, optical imaging of combinatorial molecular markers at synapses, and bulk DNA sequencing of trans-synaptically exchanged nucleic acid barcode pairs. Due to advances in parallel-beam instrumentation, whole mouse brain electron microscopic image acquisition could cost less than $100 million, with total costs presently limited by image analysis to trace axons through large image stacks. Optical microscopy at 50 to 100 nm isotropic resolution could potentially read combinatorially multiplexed molecular information from individual synapses, which could indicate the identities of the pre-synaptic and post-synaptic cells without relying on axon tracing. An optical approach to whole mouse brain connectomics may be achievable for less than $10 million and could be enabled by emerging technologies to sequence nucleic acids in-situ in fixed tissue via fluorescent microscopy. Novel strategies relying on bulk DNA sequencing, which would extract the connectome without direct imaging of the tissue, could produce a whole mouse brain connectome for $100k to $1 million or a mouse cortical connectome for $10k to $100k. Anticipated further reductions in the cost of DNA sequencing could lead to a $1000 mouse cortical connectome.

Posted ContentDOI
01 Dec 2013-bioRxiv
TL;DR: In this paper, an instrument is presented to simultaneously manipulate neural activity via Channelrhodopsin, monitor neural response via GCaMP3, and observe behavior in freely moving C. elegans.
Abstract: A fundamental goal of systems neuroscience is to probe the dynamics of neural activity that drive behavior. Here we present an instrument to simultaneously manipulate neural activity via Channelrhodopsin, monitor neural response via GCaMP3, and observe behavior in freely moving C. elegans. We use the instrument to directly observe the relation between sensory stimuli, interneuron activity and locomotion in the mechanosensory circuit. Now published as: Front Neural Circuits 8:28, doi:10.3389/fncir.2014.00028

Posted ContentDOI
22 Nov 2013-bioRxiv
TL;DR: Some serious issues with the TopHat2 algorithms are highlighted, such as poor recall of alignments with a moderate (>3) number of mismatches, low sensitivity and high false discovery rate for splice junction detection, loss of precision for the realignment algorithm, and large number of false chimeric alignments.
Abstract: In the recent paper by Kim et al. (Genome biology, 2013. 14(4): p. R36) the accuracy of TopHat2 was compared to other RNA-seq aligners. In this comment we re-examine the most important analyses from this paper and identify several deficiencies that significantly diminished performance of some of the aligners, including incorrect choice of mapping parameters, unfair comparison metrics, and unrealistic simulated data. Using STAR (Dobin et al., Bioinformatics, 2013. 29(1): p. 15-21) as an exemplar, we demonstrate that correcting these deficiencies makes its accuracy equal to or better than that of TopHat2. Furthermore, this exercise highlighted some serious issues with the TopHat2 algorithms, such as poor recall of alignments with a moderate (>3) number of mismatches, low sensitivity and high false discovery rate for splice junction detection, loss of precision for the realignment algorithm, and large number of false chimeric alignments.
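Illustrative sketch (not from the comment itself): sensitivity and false discovery rate for splice junction detection are usually computed by comparing the set of junctions an aligner reports against a set of known true junctions, for example from simulated reads. The Python example below uses the standard definitions of the two metrics. The `(chromosome, start, end)` junction representation and the function name `junction_metrics` are assumptions for illustration, not the format or code used by the authors, TopHat2, or STAR.

```python
from typing import Set, Tuple

# Hypothetical junction key: (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def junction_metrics(
    predicted: Set[Junction], truth: Set[Junction]
) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) for predicted junctions.

    sensitivity = true positives / all true junctions
    FDR         = false positives / all predicted junctions
    Empty inputs yield 0.0 for the affected metric to avoid division by zero.
    """
    true_pos = len(predicted & truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    fdr = (len(predicted) - true_pos) / len(predicted) if predicted else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    # Made-up junctions, purely to show the call shape.
    truth = {("chr1", 100, 200), ("chr1", 300, 400),
             ("chr2", 50, 150), ("chr2", 500, 900)}
    predicted = {("chr1", 100, 200), ("chr1", 300, 400),
                 ("chr2", 50, 150), ("chr3", 10, 20)}
    print(junction_metrics(predicted, truth))
```

Matching junctions by exact coordinates is the simplest choice; a comparison that tolerates small coordinate offsets would give different numbers.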

Posted ContentDOI
11 Nov 2013-bioRxiv
TL;DR: It is shown that the functional data alone are predictive of a SNP’s presence in the GC, and it is demonstrated that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations.
Abstract: We describe the development and application of a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs housed in the GWAS Catalog (GC). The set of functional predictors we examined includes measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants included in the Database of Genomic Variants (DGV) and known regulatory elements included in the Open Regulatory Annotation database (ORegAnno), PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotation variables would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations.

Posted ContentDOI
01 Jan 2013-bioRxiv
TL;DR: A method for mapping the behavioral space of organisms, relying only upon the underlying structure of postural movement data to organize and classify behaviors, finds that six different drosophilid species each perform a mix of non-stereotyped actions and over one hundred hierarchically-organized, stereotyped behaviors.
Abstract: Most animals possess the ability to actuate a vast diversity of movements, ostensibly constrained only by morphology and physics. In practice, however, a frequent assumption in behavioral science is that most of an animal’s activities can be described in terms of a small set of stereotyped motifs. Here we introduce a method for mapping the behavioral space of organisms, relying only upon the underlying structure of postural movement data to organize and classify behaviors. We find that six different drosophilid species each perform a mix of non-stereotyped actions and over one hundred hierarchically-organized, stereotyped behaviors. Moreover, we use this approach to compare these species’ behavioral spaces, systematically identifying subtle behavioral differences between closely-related species.

Posted ContentDOI
14 Nov 2013-bioRxiv
TL;DR: It is shown with analysis and experiments that negative autoregulation matches the production and demand of the outputs: the magnitude of the regulatory signal is proportional to the “error” between the circuit output concentration and its actual demand.
Abstract: We propose a negative feedback architecture that regulates activity of artificial genes, or "genelets", to meet their output downstream demand, achieving robustness with respect to uncertain open-loop output production rates. In particular, we consider the case where the outputs of two genelets interact to form a single assembled product. We show with analysis and experiments that negative autoregulation matches the production and demand of the outputs: the magnitude of the regulatory signal is proportional to the error between the circuit output concentration and its actual demand. This two-device system is experimentally implemented using in vitro transcriptional networks, where reactions are systematically designed by optimizing nucleic acid sequences with publicly available software packages. We build a predictive ordinary differential equation (ODE) model that captures the dynamics of the system, and can be used to numerically assess the scalability of this architecture to larger sets of interconnected genes. Finally, with numerical simulations we contrast our negative autoregulation scheme with a cross-activation architecture, which is less scalable and results in slower response times.

Posted ContentDOI
12 Nov 2013-bioRxiv
TL;DR: The finding that the timing of events within development all scaled uniformly across species and temperatures astonished us, suggesting the existence of a previously unrecognized timer controlling the progress of embryogenesis that has been tuned by natural selection in response to the thermal environment in which each species lives.
Abstract: Temperature affects both the timing and outcome of animal development, but the detailed effects of temperature on the progress of early development have been poorly characterized. To determine the impact of temperature on the order and timing of events during Drosophila melanogaster embryogenesis, we used time-lapse imaging to track the progress of embryos from shortly after egg laying through hatching at seven precisely maintained temperatures between 17.5°C and 32.5°C. We employed a combination of automated and manual annotation to determine when 36 milestones occurred in each embryo. D. melanogaster embryogenesis takes 33 hours at 17.5°C, and accelerates with increasing temperature to a low of 16 hours at 27.5°C, above which embryogenesis slows slightly. Remarkably, while the total time of embryogenesis varies over two fold, the relative timing of events from cellularization through hatching is constant across temperatures. To further explore the relationship between temperature and embryogenesis, we expanded our analysis to cover ten additional Drosophila species of varying climatic origins. Six of these species, like D. melanogaster, are of tropical origin, and embryogenesis time at different temperatures was similar for them all. D. mojavensis, a sub-tropical fly, develops slower than the tropical species at lower temperatures, while D. virilis, a temperate fly, exhibits slower development at all temperatures. The alpine sister species D. persimilis and D. pseudoobscura develop as rapidly as tropical flies at cooler temperatures, but exhibit diminished acceleration above 22.5°C and have drastically slowed development by 30°C. Despite ranging from 13 hours for D. erecta at 30°C to 46 hours for D. virilis at 17.5°C, the relative timing of events from cellularization through hatching is constant across all of the species and temperatures examined here, suggesting the existence of a previously unrecognized timer controlling the progress of embryogenesis that has been tuned by natural selection in response to the thermal environment in which each species lives.

Posted ContentDOI
20 Nov 2013-bioRxiv
TL;DR: A mathematical model describing the dynamics of plasma virus and the transcriptional subclasses of HIV-1-infected cells is developed and predicts that the pool of latently infected cells becomes rapidly established during the first months of acute infection and continues to increase slowly during the first years of chronic infection.
Abstract: Background: HIV-1-infected cells in peripheral blood can be grouped into different transcriptional subclasses. Quantifying the turnover of these cellular subclasses can provide important insights into the viral life cycle and the generation and maintenance of latently infected cells. Results: We used previously published data from five patients chronically infected with HIV-1 that initiated combination antiretroviral therapy (cART). Patient-matched PCR for unspliced and multiply spliced viral RNAs combined with limiting dilution analysis provided measurements of transcriptional profiles at the single cell level. Furthermore, measurement of intracellular transcripts and extracellular virion-enclosed HIV-1 RNA allowed us to distinguish productive from non-productive cells. We developed a mathematical model describing the dynamics of plasma virus and the transcriptional subclasses of HIV-1-infected cells. Fitting the model to the data allowed us to better understand the phenotype of different transcriptional subclasses and their contribution to the overall turnover of HIV-1 before and during cART. The average number of virus-producing cells in peripheral blood is small during chronic infection (25.7 cells per ml). We find that 14.0%, 0.3% and 21.2% of infected cells become defectively, latently and persistently infected cells, respectively. Assuming that the infection is homogenous throughout the body, we estimate an average in vivo viral burst size of 2.1 x 10^4 virions per cell. Conclusions: Our study provides novel quantitative insights into the turnover and development of different subclasses of HIV-1-infected cells. The model predicts that the pool of latently infected cells becomes rapidly established during the first months of acute infection and continues to increase slowly during the first years of chronic infection. Having a detailed understanding of this process will be useful for the evaluation of viral eradication strategies that aim to deplete the latent reservoir of HIV-1.

Posted ContentDOI
23 Aug 2013-bioRxiv
TL;DR: In this paper, the authors introduce a distinction between easy landscapes where local fitness peaks can be found in a moderate number of steps and hard landscapes where finding evolutionary equilibria requires an infeasible amount of time.
Abstract: Experiments show that fitness landscapes can have a rich combinatorial structure due to epistasis and yet theory assumes that local peaks can be reached quickly. I introduce a distinction between easy landscapes where local fitness peaks can be found in a moderate number of steps and hard landscapes where finding evolutionary equilibria requires an infeasible amount of time. Hard examples exist even among landscapes with no reciprocal sign epistasis; on these, strong selection weak mutation dynamics cannot find the unique peak in polynomial time. On hard rugged fitness landscapes, no evolutionary dynamics -- even ones that do not follow adaptive paths -- can find a local fitness peak quickly; and the fitness advantage of nearby mutants cannot drop off exponentially fast but has to follow a power-law that long term evolution experiments have associated with unbounded growth in fitness. I present candidates for hard landscapes at scales from single genes, to microbes, to complex organisms with costly learning (Baldwin effect). Even though hard landscapes are static and finite, local evolutionary equilibrium cannot be assumed.
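The notion of reaching a local fitness peak by an adaptive walk can be made concrete with a small sketch. The code below assumes a hypothetical additive landscape on bit-string genotypes (fitness is the number of 1s), which is an example of an easy landscape in the abstract's sense; it does not reproduce the paper's hard constructions, and the function names are illustrative only.

```python
import random


def neighbors(genotype):
    """All genotypes that differ from `genotype` by a single point mutation."""
    return [
        genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
        for i in range(len(genotype))
    ]


def adaptive_walk(fitness, genotype, rng):
    """Move to a randomly chosen fitter one-mutant neighbor until none exists.

    Returns the local peak reached and the number of steps taken.
    """
    steps = 0
    while True:
        current = fitness(genotype)
        fitter = [n for n in neighbors(genotype) if fitness(n) > current]
        if not fitter:
            return genotype, steps  # no fitter neighbor: local peak
        genotype = rng.choice(fitter)
        steps += 1


def additive_fitness(genotype):
    """Hypothetical smooth landscape: each 1 allele adds one unit of fitness."""
    return sum(genotype)


start = (0,) * 8
peak, steps = adaptive_walk(additive_fitness, start, random.Random(0))
print(peak, steps)
```

On this additive landscape every step fixes one more beneficial allele, so the walk from the all-zero genotype reaches the single peak in exactly as many steps as there are loci. The abstract's claim is that on hard landscapes no such short path to a peak is guaranteed.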

Posted ContentDOI
04 Dec 2013-bioRxiv
TL;DR: The ability of the Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset is investigated and it is found that combining functional data with association data improves the ability to prioritise novel hits.
Abstract: The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals (“hits”) to train a regularized logistic model in order to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL) and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype-specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype specific and the autoimmune GWAS signals gave the most reliable results. We found SNPs with higher predicted values showed an enrichment of more significant p-values compared to all GWAS SNPs in three large GWAS studies of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in psoriasis GWAS data and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.

Posted ContentDOI
07 Nov 2013-bioRxiv
TL;DR: A scalable iterative algorithm is proposed for the systematic design of sparse, small gain feedback strategies that stabilize the evolutionary dynamics of a generic disease model by augmenting the optimization problems with l1 and l2 regularization terms.
Abstract: It has been shown that optimal controller synthesis for positive systems can be formulated as a linear program. Leveraging these results, we propose a scalable iterative algorithm for the systematic design of sparse, small gain feedback strategies that stabilize the evolutionary dynamics of a generic disease model. We achieve the desired feedback structure by augmenting the optimization problems with l1 and l2 regularization terms, and illustrate our method on an example inspired by an experimental study aimed at finding appropriate HIV neutralizing antibody therapy combinations in the presence of escape mutants.

Posted ContentDOI
10 Dec 2013-bioRxiv
TL;DR: It is found that broad-scale spatial patterns of pond canopy cover and pond morphology strongly influenced metacommunity structure, and species composition was spatially autocorrelated at short distances.
Abstract: Spatial and environmental processes influence species composition at distinct scales. Previous studies suggested that the landscape-scale distribution of larval anurans is influenced by environmental gradients related to adult breeding site selection, such as pond canopy cover, but not water chemistry. However, the combined effects of spatial, pond morphology, and water chemistry variables on metacommunity structure of larval anurans have not been analyzed. We used a partial redundancy analysis with variation partitioning to analyze the relative influence of pond morphology (e.g., depth, area, and aquatic vegetation), water chemistry, and spatial variables on a tadpole metacommunity from southeastern Brazil. We predicted that the metacommunity would be spatially structured at broad spatial scales, while environmental variables, mainly related to adult habitat selection, would play a larger role at fine spatial scales. We found that broad-scale spatial patterns of pond canopy cover and pond morphology strongly influenced metacommunity structure. Additionally, species composition was spatially autocorrelated at short distances. We suggest that the reproductive behavior of adult anurans is driving tadpole metacommunity dynamics, since pond morphology, but not water chemistry, affects breeding site selection by adults. Our results contribute to the understanding of amphibian species diversity in tropical environments.

Posted ContentDOI
22 Nov 2013-bioRxiv
TL;DR: An approach for generating high-resolution a priori maximum parsimony Y-chromosome phylogenies based on SNP and small INDEL variant data from massively-parallel short-read sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty.
Abstract: An approach for generating high-resolution a priori maximum parsimony Y-chromosome (“chrY”) phylogenies based on SNP and small INDEL variant data from massively-parallel short-read (“next-generation”) sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty in cases for which "no-calls" (through lack of mapped reads or otherwise) at a particular site preclude a precise placement of the mutation. The approach leverages careful variant site filtering and a novel iterative reweighting procedure to generate high-accuracy trees while considering variants in regions of chrY that had previously been excluded from analyses based on short-read sequencing data. It is argued that the proposed approach is also superior to previous region-based filtering approaches in that it adapts to the quality of the underlying data and will automatically allow the scope of sites considered to expand as the underlying data quality (e.g. through longer read lengths) improves. Key related issues, including calling of genotypes for the hemizygous chrY, reliability of variant results, read mismappings and "heterozygous" genotype calls, and the mutational stability of different variants are discussed and taken into account. The methodology is demonstrated through application to a dataset consisting of 1292 male samples from diverse populations and haplogroups, with the majority coming from low-coverage sequencing by the 1000 Genomes Project. Application of the tree-generation approach to these data produces a tree involving over 120,000 chrY variant sites (about 45,000 sites if “singletons” are excluded). The utility of this approach in refining the Y-chromosome phylogenetic tree is demonstrated by examining results for several haplogroups. The results indicate a number of new branches on the Y-chromosome phylogenetic tree, many of them subdividing known branches, but also including some that inform the presence of additional levels along the “trunk” of the tree. Finally, opportunities for extensions of this phylogenetic analysis approach to other types of genetic data are examined.

Posted ContentDOI
10 Dec 2013-bioRxiv
TL;DR: Results establish the two isoforms of GSK-3 as essential integrators of multiple developmental signals that act to maintain normal mammary gland function and suppress tumorigenesis.
Abstract: Many components of the Wnt/β-catenin signaling pathway have critical functions in mammary gland development and tumor formation, yet the contribution of glycogen synthase kinase-3 (GSK-3α and GSK-3β) to mammopoiesis and oncogenesis is unclear. Here, we report that WAP-Cre-mediated deletion of GSK-3 in the mammary epithelium results in activation of Wnt/β-catenin signaling and induces mammary intraepithelial neoplasia that progresses to squamous transdifferentiation and development of adenosquamous carcinomas at 6 months. To uncover possible β-catenin-independent activities of GSK-3, we generated mammary-specific knock-outs of GSK-3 and β-catenin. Squamous transdifferentiation of the mammary epithelium was largely attenuated; however, mammary epithelial cells lost the ability to form mammospheres, suggesting perturbation of stem cell properties unrelated to loss of β-catenin alone. At 10 months, adenocarcinomas that developed in glands lacking GSK-3 and β-catenin displayed elevated levels of γ-catenin/plakoglobin as well as activation of the Hedgehog and Notch pathways. Collectively, these results establish the two isoforms of GSK-3 as essential integrators of multiple developmental signals that act to maintain normal mammary gland function and suppress tumorigenesis.

Posted ContentDOI
18 Nov 2013-bioRxiv
TL;DR: It is proposed that Aβ, tau, α-synuclein, huntingtin, TDP-43, PrP and AA are members of the innate immune system and the immune reactions and activities associated with the function of these proteins in innate immunity lead to AD, PD, HD, ALS, CJD, RSA and other related diseases, which are innate immunity disorders.
Abstract: Despite decades of research, thousands of studies and numerous advances, the etiologies of Alzheimer’s Disease (AD), Parkinson’s Disease (PD), Huntington’s Disease (HD), Amyotrophic Lateral Sclerosis (ALS), Frontotemporal Lobar Degeneration (FTLD-U), Creutzfeldt-Jakob Disease (CJD), Reactive Systemic Amyloidosis (RSA) and many other neurodegenerative and systemic amyloid diseases have not been defined, nor have the pathogenic mechanisms leading to cellular death and disease. Moreover, the biological functions of APP/amyloid-β (Aβ), tau, α-synuclein, huntingtin, TAR DNA-binding protein 43 (TDP-43), prion protein (PrP), amyloid A (AA) and some of the other primary proteins implicated in amyloid diseases are not known, and there are no successful preventive or therapeutic approaches. Based on a comprehensive analysis and new interpretation of the existing data in the context of an evolutionary framework, it is proposed that: (i) Aβ, tau, α-synuclein, huntingtin, TDP-43, PrP and AA are members of the innate immune system, (ii) the isomeric conformational changes of these proteins and their assembly into various oligomers, plaques, and tangles are not protein misfolding events as defined for decades, nor are they prion-replication activities, but part of their normal, evolutionarily selected innate immune repertoire, and (iii) the immune reactions and activities associated with the function of these proteins in innate immunity lead to AD, PD, HD, ALS, CJD, RSA and other related diseases, which are innate immunity disorders.

Posted ContentDOI
10 Dec 2013-bioRxiv
TL;DR: The general patterns of stability and control effectiveness suggested from the manipulations of forelimb, hindlimb and tail morphology here may help understand the evolution of flight control aerodynamics in vertebrates.
Abstract: We report the effects of posture and morphology on the static aerodynamic stability and control effectiveness of physical models based on the feathered dinosaur, Microraptor gui, from the Cretaceous of China. Postures had similar lift and drag coefficients and were broadly similar when simplified metrics of gliding were considered, but they exhibited different stability characteristics depending on the position of the legs and the presence of feathers on the legs and the tail. Both stability and the function of appendages in generating maneuvering forces and torques changed as the glide angle or angle of attack were changed. These are significant because they represent an aerial environment that may have shifted during the evolution of directed aerial descent and other aerial behaviors. Certain movements were particularly effective (symmetric movements of the wings and tail in pitch, asymmetric wing movements, some tail movements). Other appendages altered their function from creating yaws at high angle of attack to rolls at low angle of attack, or reversed their function entirely. While M. gui lived after Archaeopteryx and likely represents a side experiment with feathered morphology, the general patterns of stability and control effectiveness suggested from the manipulations of forelimb, hindlimb and tail morphology here may help understand the evolution of flight control aerodynamics in vertebrates. Though these results rest on a single specimen, as further fossils with different morphologies are tested, the findings here could be applied in a phylogenetic context to reveal biomechanical constraints on extinct flyers arising from the need to maneuver.