
Showing papers in "arXiv: Quantitative Methods in 2014"


Journal Article
TL;DR: In this article, the authors consider mitigation measures that involve the targeted closure of school classes or grades based on readily available information such as the number of symptomatic infectious children in a class.
Abstract: School environments are thought to play an important role in the community spread of airborne infections (e.g., influenza) because of the high mixing rates of school children. The closure of schools has therefore been proposed as an efficient mitigation strategy, albeit with high social and economic costs: alternative, less disruptive interventions are highly desirable. The recent availability of high-resolution contact networks in school environments provides an opportunity to design micro-interventions and compare the outcomes of alternative mitigation measures. We consider mitigation measures that involve the targeted closure of school classes or grades based on readily available information such as the number of symptomatic infectious children in a class. We focus on the case of a primary school for which we have high-resolution data on the close-range interactions of children and teachers. We simulate the spread of an influenza-like illness in this population by using an SEIR model with asymptomatics and compare the outcomes of different mitigation strategies. We find that targeted class closure affords strong mitigation effects: closing a class for a fixed period of time (equal to the sum of the average infectious and latent durations) whenever two infectious individuals are detected in that class decreases the attack rate by almost 70% and strongly decreases the probability of a severe outbreak. The closure of all classes of the same grade mitigates the spread almost as much as closing the whole school. Targeted class closure strategies based on readily available information on symptomatic subjects and on limited information on mixing patterns, such as the grade structure of the school, can be almost as effective as whole-school closure, at a much lower cost. This may inform public health policies for the management and mitigation of influenza-like outbreaks in the community.

170 citations
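The class-closure trigger described in this abstract can be sketched as a simple rule over daily detection counts. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `closure_schedule`, the choice to start a closure on the day after the triggering detection, the resetting of the counter after each closure, and the example durations are all hypothetical.

```python
from typing import Dict, List, Tuple


def closure_schedule(
    daily_detections: Dict[str, List[int]],
    mean_latent_days: float,
    mean_infectious_days: float,
    threshold: int = 2,
) -> Dict[str, List[Tuple[int, int]]]:
    """Return closure intervals (start_day, end_day_exclusive) for each class.

    Hypothetical rule: once `threshold` symptomatic cases have been detected
    in an open class, close it for a fixed period equal to the sum of the mean
    latent and mean infectious durations, starting the next day.
    """
    duration = round(mean_latent_days + mean_infectious_days)
    schedule: Dict[str, List[Tuple[int, int]]] = {}
    for class_id, counts in daily_detections.items():
        intervals: List[Tuple[int, int]] = []
        detected = 0    # detections accumulated since the class last (re)opened
        reopen_day = 0  # first day on which the class is open again
        for day, new_cases in enumerate(counts):
            if day < reopen_day:
                continue  # class is closed; no in-class detections are counted
            detected += new_cases
            if detected >= threshold:
                start = day + 1
                intervals.append((start, start + duration))
                reopen_day = start + duration
                detected = 0
        schedule[class_id] = intervals
    return schedule


# Illustrative input only: made-up detection counts and durations.
example = closure_schedule(
    {"1A": [0, 1, 0, 1, 0, 0, 0, 0, 0, 0], "2B": [1, 0, 0, 0]},
    mean_latent_days=2.0,
    mean_infectious_days=4.0,
)
print(example)  # {'1A': [(4, 10)], '2B': []}
```

Class 1A reaches two detections on day 3 and is closed for six days; class 2B never reaches the threshold and stays open.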


Posted Content
TL;DR: This work presents the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, applies it to protein structure prediction, and introduces a convolutional architecture that allows efficient learning across multiple layers of hierarchical representations.
Abstract: Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) to globally train deep generative models. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, such as protein sequences with hundreds of amino acids, we introduce a convolutional architecture that allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed with both low- and high-level representations learned by the model. In our application this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.

110 citations
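Q8 accuracy, the metric quoted in this abstract, is the fraction of residues assigned the correct one of eight secondary-structure states. A minimal sketch of computing it over a test set, assuming residue-level pooling across proteins and DSSP's eight classes with coil written as `L` (label conventions differ between datasets); the function name is hypothetical.

```python
from typing import Iterable, Tuple

# The eight DSSP secondary-structure classes; coil is written as "L" here
# (an assumption: some datasets use "C" or a blank for coil instead).
Q8_STATES = frozenset("HGIEBTSL")


def pooled_q8(pairs: Iterable[Tuple[str, str]]) -> float:
    """Q8 accuracy: fraction of residues, pooled over proteins, whose predicted
    8-state label equals the reference label."""
    correct = total = 0
    for reference, predicted in pairs:
        if len(reference) != len(predicted):
            raise ValueError("reference and prediction lengths differ")
        if not (set(reference) | set(predicted)) <= Q8_STATES:
            raise ValueError("unexpected secondary-structure label")
        correct += sum(r == p for r, p in zip(reference, predicted))
        total += len(reference)
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total


score = pooled_q8([("HHHEELLT", "HHHEELLS"), ("EEEE", "EEEE")])
print(f"{score:.4f}")  # 0.9167 (11 of 12 residues correct)
```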


Posted Content
TL;DR: This work uses a bidirectional recurrent neural network with long short-term memory cells for prediction of secondary structure from the amino acid sequence and reports better performance than the state of the art on the 8-class secondary structure problem.
Abstract: Prediction of protein secondary structure from the amino acid sequence is a classical bioinformatics problem. Common methods use feed-forward neural networks or SVMs combined with a sliding window, as these models do not naturally handle sequential data. Recurrent neural networks are a generalization of the feed-forward neural network that naturally handles sequential data. We use a bidirectional recurrent neural network with long short-term memory cells for prediction of secondary structure and evaluate it on the CB513 dataset. On the 8-class secondary structure problem we report better performance (0.674) than the state of the art (0.664). Our model includes feed-forward networks between the long short-term memory cells, a path that can be further explored.

108 citations
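The sliding-window input that this abstract contrasts with recurrent networks can be sketched as follows: each residue is represented by a fixed-length, padded window of its neighbours, one-hot encoded for a feed-forward network or SVM. The window half-width, the padding symbol and the function names are illustrative assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sliding_windows(sequence: str, half_width: int, pad: str = "-") -> List[str]:
    """One fixed-length window centred on each residue, padded at both ends."""
    padded = pad * half_width + sequence + pad * half_width
    width = 2 * half_width + 1
    return [padded[i : i + width] for i in range(len(sequence))]


def one_hot(window: str) -> List[int]:
    """Concatenated one-hot encoding; padding and unknown residues map to all zeros."""
    features: List[int] = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


windows = sliding_windows("ACDE", half_width=1)
print(windows)                   # ['-AC', 'ACD', 'CDE', 'DE-']
print(len(one_hot(windows[0])))  # 60 = 3 positions x 20 residues
```

A classifier fed these vectors only sees context within the window, whereas a bidirectional recurrent network can, in principle, draw on the whole sequence.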


Posted Content
TL;DR: In this paper, a Lagrangian approach to the transport of pheromones by turbulent flows is presented and exploited to predict the statistics of odor detection during olfactory searches.
Abstract: The olfactory system of male moths is exquisitely sensitive to pheromones emitted by females and transported in the environment by atmospheric turbulence. Moths respond to minute amounts of pheromones, and their behavior is sensitive to the fine-scale structure of turbulent plumes where pheromone concentration is detectable. The signal of pheromone whiffs is qualitatively known to be intermittent, yet a quantitative characterization of its statistical properties is lacking. This challenging fluid dynamics problem is also relevant for entomology, neurobiology and the technological design of olfactory stimulators aimed at reproducing physiological odor signals in well-controlled laboratory conditions. Here, we develop a Lagrangian approach to the transport of pheromones by turbulent flows and exploit it to predict the statistics of odor detection during olfactory searches. The theory yields explicit probability distributions for the intensity and the duration of pheromone detections, as well as their spacing in time. Predictions are tested favorably against numerical simulations, laboratory experiments and field data for the atmospheric surface layer. The resulting signal of odor detections lends itself to implementation with state-of-the-art technologies and quantifies the amount and the type of information that male moths can exploit during olfactory searches.

107 citations
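The three quantities this theory predicts (detection intensity, detection duration and the time between detections) can be extracted from a sampled concentration record by thresholding. A minimal sketch, assuming a fixed detection threshold and defining intensity as the mean concentration during a detection; these definitions and the function name are illustrative, not the paper's.

```python
from typing import List, Tuple


def detection_statistics(
    concentration: List[float], threshold: float, dt: float = 1.0
) -> Tuple[List[float], List[float], List[float]]:
    """Split a sampled concentration signal into above-threshold detections.

    Returns (durations, mean intensities, blank times between detections).
    A detection still in progress at the end of the record is kept as-is.
    """
    events: List[Tuple[int, int]] = []  # (start index, end index exclusive)
    start = None
    for i, c in enumerate(concentration):
        if c > threshold and start is None:
            start = i                      # a whiff begins
        elif c <= threshold and start is not None:
            events.append((start, i))      # the whiff ends
            start = None
    if start is not None:
        events.append((start, len(concentration)))

    durations = [(end - begin) * dt for begin, end in events]
    intensities = [sum(concentration[begin:end]) / (end - begin) for begin, end in events]
    blanks = [(nxt[0] - prev[1]) * dt for prev, nxt in zip(events, events[1:])]
    return durations, intensities, blanks


print(detection_statistics([0, 2, 4, 0, 0, 0, 3, 0], threshold=1.0))
# ([2.0, 1.0], [3.0, 3.0], [3.0])
```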


Posted Content
TL;DR: In this article, the authors focus on four case studies of the sensorimotor dynamics of animals, each of which involves the application of principles from control theory to probe stability and feedback in an organism's response to perturbations.
Abstract: Control theory arose from a need to control synthetic systems. From regulating steam engines to tuning radios to devices capable of autonomous movement, it provided a formal mathematical basis for understanding the role of feedback in the stability (or change) of dynamical systems. It provides a framework for understanding any system with feedback regulation, including biological ones such as regulatory gene networks, cellular metabolic systems, sensorimotor dynamics of moving animals, and even ecological or evolutionary dynamics of organisms and populations. Here we focus on four case studies of the sensorimotor dynamics of animals, each of which involves the application of principles from control theory to probe stability and feedback in an organism's response to perturbations. We use examples from aquatic (electric fish station keeping and jamming avoidance), terrestrial (cockroach wall following) and aerial environments (flight control in moths) to highlight how one can use control theory to understand how feedback mechanisms interact with the physical dynamics of animals to determine their stability and response to sensory inputs and perturbations. Each case study is cast as a control problem with sensory input, neural processing, and motor dynamics, the output of which feeds back to the sensory inputs. Collectively, the interaction of these systems in a closed loop determines the behavior of the entire system.

101 citations
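The closed loop described in this abstract (sensory input, neural processing, motor dynamics, feedback) can be illustrated with the simplest possible toy: a proportional controller steering a position toward a reference, loosely in the spirit of station keeping. The gain, time step and purely kinematic "motor" are assumptions, not models from the paper.

```python
from typing import List


def simulate_station_keeping(
    gain: float, steps: int = 50, dt: float = 0.1, target: float = 1.0, x0: float = 0.0
) -> List[float]:
    """Toy closed loop: sensed error -> proportional 'neural' command -> kinematic motor."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        error = target - x      # sensory input: offset from the reference position
        command = gain * error  # neural processing: proportional controller (assumed)
        x = x + dt * command    # motor dynamics: commanded velocity integrated over dt
        trajectory.append(x)    # the new position is what the next sensing step sees
    return trajectory


stable = simulate_station_keeping(gain=5.0)
unstable = simulate_station_keeping(gain=25.0)
print(f"{stable[-1]:.6f}")      # 1.000000 (converges onto the target)
print(abs(unstable[-1]) > 1e3)  # True (oscillates with growing amplitude)
```

Because the error obeys e(t+1) = (1 - gain·dt)·e(t), this toy loop is stable only for 0 < gain·dt < 2; the same feedback structure, with richer plant and sensor models, is what makes stability a question about the whole loop rather than any one component.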


Posted Content
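TL;DR: In this article, a general data structure for the de Bruijn graph (DBGFM) is proposed that achieves a space usage of 1.5 GB on a human whole-genome dataset, a 46% improvement over previous approaches; frequency-based minimizers further allow all maximal simple paths of the graph to be enumerated using only 43 MB of memory.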
Abstract: The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitation of these types of approaches. We further design and implement a general data structure (DBGFM) and demonstrate its use on a human whole-genome dataset, achieving space usage of 1.5 GB and a 46% improvement over previous approaches. As part of DBGFM, we develop the notion of frequency-based minimizers and show how it can be used to enumerate all maximal simple paths of the de Bruijn graph using only 43 MB of memory. Finally, we demonstrate that our approach can be integrated into an existing assembler by modifying the ABySS software to use DBGFM.

77 citations
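A frequency-based minimizer can be sketched under one assumption about its definition: l-mers are ordered by ascending count in the dataset, with lexicographic order breaking ties, and each k-mer is assigned its smallest l-mer under that ordering. Function names and the toy k-mers are hypothetical; DBGFM's actual implementation is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable


def lmer_counts(kmers: Iterable[str], ell: int) -> Counter:
    """Count every l-mer (substring of length `ell`) occurring inside the given k-mers."""
    counts: Counter = Counter()
    for kmer in kmers:
        for i in range(len(kmer) - ell + 1):
            counts[kmer[i : i + ell]] += 1
    return counts


def frequency_minimizer(kmer: str, ell: int, counts: Dict[str, int]) -> str:
    """Least frequent l-mer of `kmer`; ties broken lexicographically (assumed ordering)."""
    candidates = [kmer[i : i + ell] for i in range(len(kmer) - ell + 1)]
    return min(candidates, key=lambda s: (counts.get(s, 0), s))


kmers = ["AAAC", "AAAG", "TAAA"]  # toy k-mers (k = 4)
counts = lmer_counts(kmers, ell=2)
print(counts["AA"])                            # 6
print(frequency_minimizer("AAAC", 2, counts))  # AC, whereas the lexicographic minimizer is AA
print(frequency_minimizer("TAAA", 2, counts))  # TA
```

Preferring rare l-mers avoids assigning many k-mers to the same very common minimizer (here `AA`), which is the usual motivation for frequency-aware orderings.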


Journal Article
TL;DR: In this article, the effect of synergy and redundancy in the inference of the information flow between subsystems of a complex network was analyzed by means of Granger causality, and two different strategies (based either on informational content for the candidate driver or on selecting the variables with highest pairwise influences) were proposed.
Abstract: We analyze by means of Granger causality the effect of synergy and redundancy in the inference (from time series data) of the information flow between subsystems of a complex network. Whilst we show that fully conditioned Granger causality is not affected by synergy, the pairwise analysis fails to reveal synergetic effects. In cases when the number of samples is low, thus making the fully conditioned approach infeasible, we show that partially conditioned Granger causality is an effective approach if the set of conditioning variables is properly chosen. We consider here two different strategies (based either on informational content for the candidate driver or on selecting the variables with highest pairwise influences) for partially conditioned Granger causality and show that, depending on the data structure, either one or the other might be valid. On the other hand, we observe that fully conditioned approaches do not work well in the presence of redundancy, thus suggesting the strategy of separating the pairwise links into two subsets: those corresponding to indirect connections of the fully conditioned Granger causality (which should thus be excluded) and links that can be ascribed to redundancy effects and, together with the results from the fully connected approach, provide a better description of the causality pattern in the presence of redundancy. We finally apply these methods to two different real datasets. First, analyzing electrophysiological data from an epileptic brain, we show that synergetic effects are dominant just before seizure occurrences. Second, our analysis applied to gene expression time series from HeLa culture shows that the underlying regulatory networks are characterized by both redundancy and synergy.

75 citations
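Pairwise Granger causality, the baseline this abstract contrasts with conditioned variants, compares how well a target's past predicts it with and without the driver's past. The sketch below is a minimal lag-1 version with an intercept, using only the standard library; the fully and partially conditioned variants discussed above would add further conditioning series to both regressions. Function names, the lag order and the synthetic data are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence


def residual_variance(y: Sequence[float], regressors: List[Sequence[float]]) -> float:
    """Mean squared residual of an ordinary least-squares fit of y on the
    regressors plus an intercept (normal equations, Gaussian elimination)."""
    n = len(y)
    cols = [[1.0] * n] + [list(r) for r in regressors]
    p = len(cols)
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    b = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    for k in range(p):  # forward elimination with partial pivoting
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        for r in range(k + 1, p):
            factor = a[r][k] / a[k][k]
            for c in range(k, p):
                a[r][c] -= factor * a[k][c]
            b[r] -= factor * b[k]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        beta[k] = (b[k] - sum(a[k][c] * beta[c] for c in range(k + 1, p))) / a[k][k]
    residuals = [y[t] - sum(beta[j] * cols[j][t] for j in range(p)) for t in range(n)]
    return sum(e * e for e in residuals) / n


def granger_causality(driver: Sequence[float], target: Sequence[float]) -> float:
    """Pairwise lag-1 Granger causality driver -> target, as a log variance ratio."""
    y = target[1:]
    restricted = residual_variance(y, [target[:-1]])         # target's own past only
    full = residual_variance(y, [target[:-1], driver[:-1]])  # plus the driver's past
    return math.log(restricted / full)


# Synthetic example: x is driven by the previous value of y, not vice versa.
rng = random.Random(0)
n = 2000
y_series = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_series = [0.0]
for t in range(1, n):
    x_series.append(0.5 * x_series[-1] + 0.8 * y_series[t - 1] + rng.gauss(0.0, 1.0))

print(granger_causality(y_series, x_series))  # roughly 0.5 (y helps predict x)
print(granger_causality(x_series, y_series))  # close to zero
```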


Journal Article
TL;DR: Analysis of fMRI data from a large dataset of individuals, using resting state BOLD signals, demonstrated that a functional entropy associated with brain activity increases with age, and the entropy of males at birth was lower than that of females.
Abstract: We use entropy to characterize intrinsic ageing properties of the human brain. Analysis of fMRI data from a large dataset of individuals, using resting state BOLD signals, demonstrated that a functional entropy associated with brain activity increases with age. During an average lifespan, the entropy, which was calculated from a population of individuals, increased by approximately 0.1 bits, due to correlations in BOLD activity becoming more widely distributed. We attribute this to the number of excitatory neurons and the excitatory conductance decreasing with age. Incorporating these properties into a computational model leads to quantitatively similar results to the fMRI data. Our dataset involved males and females and we found significant differences between them. The entropy of males at birth was lower than that of females. However, the entropies of the two sexes increase at different rates, and intersect at approximately 50 years; after this age, males have a larger entropy.

63 citations


Journal Article
TL;DR: This paper applied a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 8,000 diseases and demonstrated that their method generates phenotypes that correctly identify known disease-associated genes in mice and humans with high accuracy.
Abstract: Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 8,000 diseases. We demonstrate that our method generates phenotypes that correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that share signs and symptoms cluster together, and we use this network to identify phenotypic disease modules.

56 citations
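The disease-network step in this abstract links diseases by phenotypic similarity. As a deliberately simpler stand-in for an ontology-based semantic similarity, the sketch below uses Jaccard similarity between phenotype sets and connects diseases whose similarity reaches a cutoff. Disease names, phenotype terms and the cutoff are made-up placeholders.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared phenotypes divided by all phenotypes of the two diseases."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_network(
    phenotypes: Dict[str, Set[str]], min_similarity: float
) -> List[Tuple[str, str, float]]:
    """Edges between diseases whose phenotype similarity reaches the cutoff."""
    edges = []
    for d1, d2 in combinations(sorted(phenotypes), 2):
        similarity = jaccard(phenotypes[d1], phenotypes[d2])
        if similarity >= min_similarity:
            edges.append((d1, d2, similarity))
    return edges


# Placeholder diseases and phenotype terms, not real annotations.
toy = {
    "disease_A": {"fever", "cough", "headache"},
    "disease_B": {"fever", "cough"},
    "disease_C": {"rash"},
}
print(disease_network(toy, min_similarity=0.5))
# one edge, between disease_A and disease_B, with similarity 2/3
```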


Posted Content
TL;DR: It is confirmed that the use of random effects is most beneficial for diseases that are known to be highly polygenic: hypertension (HT) and bipolar disorder (BD).
Abstract: To date, efforts to produce high-quality polygenic risk scores from genome-wide studies of common disease have focused on estimating and aggregating the effects of multiple SNPs. Here we propose a novel statistical approach for genetic risk prediction, based on random and mixed effects models. Our approach (termed GeRSI) circumvents the need to estimate the effect sizes of numerous SNPs by treating these effects as random, producing predictions that are consistently superior to the current state of the art, as we demonstrate in extensive simulations. When applying GeRSI to seven phenotypes from the WTCCC study, we confirm that the use of random effects is most beneficial for diseases that are known to be highly polygenic: hypertension (HT) and bipolar disorder (BD). For HT, there are no significant associations in the WTCCC data. The best existing model yields an AUC of 54%, while GeRSI improves it to 59%. For BD, using GeRSI improves the AUC from 55% to 62%. For individuals ranked in the top 10% of BD risk predictions, using GeRSI substantially increases the BD relative risk from 1.4 to 2.5.

50 citations
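AUC values such as those quoted in this abstract can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counting one half, so 50% corresponds to chance. A minimal sketch of that computation, with a hypothetical function name and toy scores:

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

This pairwise loop is quadratic in the sample size; a rank-based formulation gives the same value in O(n log n), which matters at genome-wide cohort sizes.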


Journal Article
TL;DR: In this article, an integral fluctuation theorem is derived for the entropy production and a measure of the information accumulated in the memory device, and it is shown that the amount of information is bounded by the average thermodynamic entropy produced by the process.
Abstract: In view of the relation between information and thermodynamics, we investigate how much information about an external protocol can be stored in the memory of a stochastic measurement device given an energy budget. We consider a layered device with a memory component storing information about the external environment by monitoring the history of a sensory part coupled to the environment. We derive an integral fluctuation theorem for the entropy production and a measure of the information accumulated in the memory device. Its most immediate consequence is that the amount of information is bounded by the average thermodynamic entropy produced by the process. At equilibrium no entropy is produced, and therefore the memory device does not add any information about the environment to the sensory component. Consequently, if the system operates at equilibrium, the addition of a memory component is superfluous. Such a device can be used to model the sensing process of a cell measuring the external concentration of a chemical compound and encoding the measurement in the amount of phosphorylated cytoplasmic proteins.
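The bound stated in this abstract follows from an integral fluctuation theorem by Jensen's inequality. The sketch below assumes the theorem has the generic form shown on the left, with σ the entropy production and I the accumulated information (both in units of k_B); the paper's precise definitions of these quantities may differ.

```latex
\left\langle e^{-(\sigma - I)} \right\rangle = 1,
\qquad
\left\langle e^{X} \right\rangle \ge e^{\langle X \rangle}
\;\Longrightarrow\;
e^{-\langle \sigma \rangle + \langle I \rangle} \le 1
\;\Longrightarrow\;
\langle I \rangle \le \langle \sigma \rangle .
```

At equilibrium ⟨σ⟩ = 0, so ⟨I⟩ ≤ 0; if ⟨I⟩ is a mutual-information-type average, which cannot be negative, it must vanish, consistent with the statement that a memory component adds no information at equilibrium.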

Journal Article
TL;DR: It is demonstrated that when nucleobases pass through a pore, even after sampling over many orientations, changes in the electrical properties of the ribbon can be used to discriminate between bases.
Abstract: We propose a DNA sequencing scheme based on silicene nanopores. Using first principles theory, we compute the electrical properties of such pores in the absence and presence of nucleobases. Within a two-terminal geometry, we analyze the current-voltage relation in the presence of nucleobases with various orientations. We demonstrate that when nucleobases pass through a pore, even after sampling over many orientations, changes in the electrical properties of the ribbon can be used to discriminate between bases.

Posted Content
TL;DR: An algorithm is described that allows for individual <1 MDa particle images to be aligned without frame averaging or linear trajectories, and can be used to improve 3-D maps from single particle cryo-EM.
Abstract: Direct detector device (DDD) cameras have revolutionized single particle electron cryomicroscopy (cryo-EM). In addition to an improved camera detective quantum efficiency, acquisition of DDD movies allows for correction of movement of the specimen, due both to instabilities in the microscope specimen stage and electron beam-induced movement. Unlike specimen stage drift, beam-induced movement is not always homogeneous within an image. Local correlation in the trajectories of nearby particles suggests that beam-induced motion is due to deformation of the ice layer. Algorithms have already been described that can correct movement for large regions of frames and for > 1 MDa protein particles. Another algorithm allows individual < 1 MDa protein particle trajectories to be estimated, but requires rolling averages to be calculated from frames and fits linear trajectories for particles. Here we describe an algorithm that allows for individual < 1 MDa particle images to be aligned without frame averaging or linear trajectories. The algorithm maximizes the overall correlation of the shifted frames with the sum of the shifted frames. The optimum in this single objective function is found efficiently by making use of analytically calculated derivatives of the function. To smooth estimates of particle trajectories, rapid changes in particle positions between frames are penalized in the objective function and weighted averaging of nearby trajectories ensures local correlation in trajectories. This individual particle motion correction, in combination with weighting of Fourier components to account for increasing radiation damage in later frames, can be used to improve 3-D maps from single particle cryo-EM.

Posted Content
TL;DR: In this paper, a Poisson-multivariate normal hierarchical model was developed to learn direct interactions from the count-based output of standard metagenomics sequencing experiments, and the model provided a structured, accurate, and distributionally reasonable way of modeling correlated count based random variables and capturing direct interactions among them.
Abstract: Many microbes associate with higher eukaryotes and impact their vitality. In order to engineer microbiomes for host benefit, we must understand the rules of community assembly and maintenance, which, in large part, demands an understanding of the direct interactions between community members. Toward this end, we have developed a Poisson-multivariate normal hierarchical model to learn direct interactions from the count-based output of standard metagenomics sequencing experiments. Our model controls for confounding predictors at the Poisson layer and captures direct taxon-taxon interactions at the multivariate normal layer using an ℓ1-penalized precision matrix. We show in a synthetic experiment that our method handily outperforms state-of-the-art methods such as SparCC and the graphical lasso (glasso). In a real, in planta perturbation experiment of a nine-member bacterial community, we show that our model, but not SparCC or glasso, correctly resolves a direct interaction structure among three community members that associate with Arabidopsis thaliana roots. We conclude that our method provides a structured, accurate, and distributionally reasonable way of modeling correlated count-based random variables and capturing direct interactions among them.

Posted Content
TL;DR: A mathematical model describing the basic facts of glioma progression and response to radiotherapy is constructed and used to propose radiation fractionation schemes that might be therapeutically useful by helping to evaluate tumour malignancy while at the same time reducing the toxicity associated with the treatment.
Abstract: Low-grade gliomas (LGGs) are a group of primary brain tumors usually encountered in young patient populations. These tumors represent a difficult challenge because many patients survive a decade or more and may be at a higher risk for treatment-related complications. Specifically, radiation therapy is known to have a relevant effect on survival, but in many cases it can be deferred to avoid side effects while maintaining its beneficial effect. However, a subset of low-grade gliomas manifests more aggressive clinical behavior and requires earlier intervention. Moreover, the effectiveness of radiotherapy depends on the tumor characteristics. Recently, Pallud et al. [Neuro-oncology, 14(4):1-10, 2012] studied patients with LGGs treated with radiation therapy as a first-line therapy and found the counterintuitive result that tumors with a fast response to the therapy had a worse prognosis than those responding late. In this paper we construct a mathematical model describing the basic facts of glioma progression and response to radiotherapy. The model also provides an explanation for the observations of Pallud et al. Using the model, we propose radiation fractionation schemes that might be therapeutically useful by helping to evaluate tumor malignancy while at the same time reducing the toxicity associated with the treatment.

Posted Content
TL;DR: In this paper, a multidimensional persistent homology approach is proposed to visualize and discriminate the topological change of integrated brain networks by varying not only the threshold but also the mixing ratio between two different imaging modalities.
Abstract: Finding the underlying relationships among multiple imaging modalities in a coherent fashion is one of the challenging problems in multimodal analysis. In this study, we propose a novel multimodal network approach based on multidimensional persistent homology. In this extension of the previous threshold-free method of persistent homology, we visualize and discriminate the topological change of integrated brain networks by varying not only the threshold but also the mixing ratio between two different imaging modalities. We also propose an integration method for multimodal networks, called one-dimensional projection, with a specific mixing ratio between modalities. We applied the proposed methods to PET and MRI data from 21 children with autism spectrum disorder (ASD) and 10 pediatric control subjects. We found that the brain networks of ASD children and controls differ significantly, with ASD showing asymmetrical changes of connected structures between PET and MRI. The integrated MRI and PET networks showed that ASD children had weaker connections than controls within the visual cortex, between dorsal and ventral parts of the temporal pole, between frontal and parietal regions, and between the left perisylvian and other brain regions. These results provide a multidimensional homological understanding of disease-related PET and MRI networks that discloses the network association with ASD.
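The idea of sweeping both a threshold and a mixing ratio between two modalities can be sketched with the zeroth Betti number (the count of connected components) of a thresholded, mixed network. The linear mixing rule, the restriction to Betti-0 and the toy weights are assumptions for illustration; this is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def mix_networks(w_pet: Dict[Edge, float], w_mri: Dict[Edge, float], alpha: float) -> Dict[Edge, float]:
    """Hypothetical linear mixing: alpha * PET weight + (1 - alpha) * MRI weight."""
    edges = set(w_pet) | set(w_mri)
    return {e: alpha * w_pet.get(e, 0.0) + (1.0 - alpha) * w_mri.get(e, 0.0) for e in edges}


def betti0(n_nodes: int, weights: Dict[Edge, float], threshold: float) -> int:
    """Number of connected components when only edges with weight >= threshold are kept."""
    parent = list(range(n_nodes))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    components = n_nodes
    for (i, j), w in weights.items():
        if w >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merging two components
                components -= 1
    return components


def betti0_curve(n_nodes: int, weights: Dict[Edge, float], thresholds: List[float]) -> List[int]:
    """Betti-0 over a sweep of thresholds, i.e. a graph filtration."""
    return [betti0(n_nodes, weights, t) for t in thresholds]


# Toy 4-region networks with made-up weights.
w_pet = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.1}
w_mri = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7}
thresholds = [0.95, 0.5, 0.05]
for alpha in (1.0, 0.0):
    print(alpha, betti0_curve(4, mix_networks(w_pet, w_mri, alpha), thresholds))
# 1.0 [4, 3, 1]
# 0.0 [4, 1, 1]
```

Repeating the sweep over a grid of mixing ratios yields a two-parameter picture of how components merge, which is the kind of object the multidimensional extension visualizes.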

Posted Content
TL;DR: A type of DCN is implemented using a modified Locally Competitive Algorithm to investigate the relationship between the number of kernels, the stride, the receptive field size, and the quality of reconstruction, and it is found that for a given stride and number of kernel, the patch size does not significantly affect reconstruction quality.
Abstract: In sparse coding it is common to tile an image into nonoverlapping patches, and then use a dictionary to create a sparse representation of each tile independently. In this situation, the overcompleteness of the dictionary is the number of dictionary elements divided by the patch size. In deconvolutional neural networks (DCNs), dictionaries learned on nonoverlapping tiles are replaced by a family of convolution kernels. Hence adjacent points in the feature maps (V1 layers) have receptive fields in the image that are translations of each other. The translational distance is determined by the dimensions of V1 in comparison to the dimensions of the image space. We refer to this translational distance as the stride. We implement a type of DCN using a modified Locally Competitive Algorithm (LCA) to investigate the relationship between the number of kernels, the stride, the receptive field size, and the quality of reconstruction. We find, for example, that for 16x16-pixel receptive fields, using eight kernels and a stride of 2 leads to sparse reconstructions of quality comparable to that obtained with 512 kernels and a stride of 16 (the nonoverlapping case). We also find that for a given stride and number of kernels, the patch size does not significantly affect reconstruction quality. Instead, the learned convolution kernels have a natural support radius independent of the patch size.
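The two configurations compared in this abstract can be checked with a short calculation. As an assumption (a reading of the setup, not a quotation of the paper), overcompleteness in the convolutional case is taken as the number of feature-map coefficients per image pixel, i.e. the number of kernels divided by the squared stride; under that reading both configurations are twofold overcomplete.

```python
def overcompleteness_tiled(n_elements: int, patch_side: int) -> float:
    """Dictionary elements per pixel when nonoverlapping square patches are coded independently."""
    return n_elements / (patch_side * patch_side)


def overcompleteness_convolutional(n_kernels: int, stride: int) -> float:
    """Coefficients per pixel for a convolutional dictionary: each kernel contributes
    one feature-map value per stride x stride block of the image (assumed 2-D stride)."""
    return n_kernels / (stride * stride)


print(overcompleteness_convolutional(8, 2))     # 2.0
print(overcompleteness_convolutional(512, 16))  # 2.0
print(overcompleteness_tiled(512, 16))          # 2.0, the nonoverlapping special case
```

With a stride equal to the patch side, the convolutional formula reduces to the tiled one, which is why the 512-kernel, stride-16 setting is described as the nonoverlapping case.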

Journal Article
TL;DR: This work proposes a new experimentally corroborated paradigm in which the truth tables of the brain's logic-gates are time dependent, i.e., dynamic logic-gates (DLGs), and demonstrates that the underlying biological mechanism is the unavoidable increase of neuronal response latencies to ongoing stimulations, which imposes a non-uniform gradual stretching of network delays.
Abstract: In 1943 McCulloch and Pitts suggested that the brain is composed of reliable logic-gates similar to the logic at the core of today's computers. This framework had a limited impact on neuroscience, since neurons exhibit far richer dynamics. Here we propose a new experimentally corroborated paradigm in which the truth tables of the brain's logic-gates are time dependent, i.e. dynamic logic-gates (DLGs). The truth tables of the DLGs depend on the history of their activity and the stimulation frequencies of their input neurons. Our experimental results are based on a procedure where conditioned stimulations were enforced on circuits of neurons embedded within a large-scale network of cortical cells in vitro. We demonstrate that the underlying biological mechanism is the unavoidable increase of neuronal response latencies to ongoing stimulations, which imposes a nonuniform gradual stretching of network delays. The limited experimental results are confirmed and extended by simulations and theoretical arguments based on identical neurons with a fixed increase of the neuronal response latency per evoked spike. We anticipate that our results will lead to a better understanding of the suitability of this computational paradigm to account for the brain's functionalities, and that they will require the development of new systematic mathematical methods beyond those developed for traditional Boolean algebra.

Journal Article
TL;DR: A data processing procedure for the quantitative analysis of amplified cDNA fragments separated by electrophoresis is developed that provides an open-end alternative to DNA microarray analysis of the transcriptome and is expected to work equally well with DDRT-PCR and cDNA-AFLP data.
Abstract: Background: Gene expression studies on non-model organisms require open-end strategies for transcription profiling. Gel-based analysis of cDNA fragments allows the detection of alterations in gene expression for genes that have neither been sequenced yet nor are available in cDNA libraries. Commonly used protocols are cDNA Differential Display (DDRT-PCR) and cDNA-AFLP. Both methods have so far been used merely as qualitative gene discovery tools. Results: We developed procedures for the conversion of DDRT-PCR data into quantitative transcription profiles. Amplified cDNA fragments are separated on a DNA sequencer. Data processing consists of four steps: (i) cDNA bands in lanes corresponding to samples treated with the same primer combination are matched in order to identify fragments originating from the same transcript, (ii) the intensity of bands is determined by densitometry, (iii) densitometric values are normalized, and (iv) the intensity ratio is calculated for each pair of corresponding bands. Transcription profiles are represented by sets of intensity ratios (control vs. treatment) for cDNA fragments defined by primer combination and DNA mobility. We demonstrated the procedure by analyzing DDRT-PCR data on the effect of secondary metabolites of oilseed rape (Brassica napus) on the transcriptome of the pathogenic fungus Leptosphaeria maculans. Conclusion: We developed a data processing procedure for quantitative analysis of amplified cDNA fragments. The system utilizes common software and provides an open-end alternative to microarray analysis. The processing is expected to work equally well with DDRT-PCR and cDNA-AFLP data and to be useful in research on organisms for which microarray analysis is not available or economical.
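Steps (iii) and (iv) of the processing described in this abstract can be sketched as follows. Three assumptions are made here and are not taken from the paper: matched bands (step i) already share an identical (primer combination, mobility) key, normalization divides each band by its lane total, and the ratio is oriented as treatment over control.

```python
from typing import Dict, Tuple

# A matched fragment is identified by (primer combination, DNA mobility); step (i)
# is assumed to have been done already, so matched bands share an identical key.
BandKey = Tuple[str, float]


def normalize_lane(lane: Dict[BandKey, float]) -> Dict[BandKey, float]:
    """Step (iii), assumed scheme: scale densitometric values so the lane sums to 1."""
    total = sum(lane.values())
    return {key: value / total for key, value in lane.items()}


def transcription_profile(
    control: Dict[BandKey, float], treatment: Dict[BandKey, float]
) -> Dict[BandKey, float]:
    """Step (iv): treatment/control intensity ratio for every band present in both lanes."""
    c = normalize_lane(control)
    t = normalize_lane(treatment)
    return {key: t[key] / c[key] for key in c.keys() & t.keys()}


# Made-up densitometry values (step ii) for two matched bands.
control = {("P1", 101.0): 200.0, ("P1", 150.0): 200.0}
treatment = {("P1", 101.0): 300.0, ("P1", 150.0): 100.0}
print(sorted(transcription_profile(control, treatment).items()))
# [(('P1', 101.0), 1.5), (('P1', 150.0), 0.5)]
```

In real gel data, mobilities of the same fragment differ slightly between lanes, so step (i) would need a tolerance-based matching rather than exact keys.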

Journal Article
TL;DR: In this article, the Lévy walk and the composite correlated random walk (with its associated area-restricted search behavior) are compared using likelihood functions and associated statistical measures that assess the relative support for and absolute fit of each model.
Abstract: 1. Understanding how to find targets with very limited information is a topic of interest in many disciplines. In ecology, such research has often focused on the development of two movement models: (i) the Lévy walk and (ii) the composite correlated random walk and its associated area-restricted search behaviour. Although the processes underlying these models differ, they can produce similar movement patterns. Because of this similarity and their disparate formulations, current methods cannot reliably differentiate between these two models. 2. Here, we present a method that differentiates between the two models. It consists of likelihood functions, including one for a hidden Markov model, and associated statistical measures that assess the relative support for and absolute fit of each model. 3. Using a simulation study, we show that our method can differentiate between the two search models over a range of parameter values. Using the movement data of two polar bears (Ursus maritimus), we show that the method can be applied to complex, real-world movement paths. 4. By providing the means to differentiate between the two most prominent search models in the literature, and a framework that could be extended to include other models, we facilitate further research into the strategies animals use to find resources.
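Likelihood-based model selection of the kind this abstract describes can be illustrated with a deliberately simplified stand-in: a pure power law (Lévy-like) versus a single exponential for step lengths above a fixed minimum, compared by AIC. This is not the paper's method, which uses a composite correlated random walk with a hidden Markov model; the function names, x_min and the simulated data are assumptions.

```python
import math
import random
from typing import Sequence


def power_law_loglik(steps: Sequence[float], x_min: float) -> float:
    """Maximized log-likelihood of p(x) = (mu - 1) / x_min * (x / x_min) ** -mu, x >= x_min."""
    n = len(steps)
    s = sum(math.log(x / x_min) for x in steps)
    mu = 1.0 + n / s  # maximum-likelihood exponent
    return n * math.log(mu - 1.0) - n * math.log(x_min) - mu * s


def exponential_loglik(steps: Sequence[float], x_min: float) -> float:
    """Maximized log-likelihood of p(x) = lam * exp(-lam * (x - x_min)), x >= x_min."""
    n = len(steps)
    excess = sum(x - x_min for x in steps)
    lam = n / excess  # maximum-likelihood rate
    return n * math.log(lam) - lam * excess


def aic(loglik: float, n_params: int) -> float:
    return 2 * n_params - 2 * loglik


def preferred_model(steps: Sequence[float], x_min: float) -> str:
    """Lower AIC wins; both models here have one free parameter."""
    levy = aic(power_law_loglik(steps, x_min), 1)
    expo = aic(exponential_loglik(steps, x_min), 1)
    return "Levy-like power law" if levy < expo else "exponential"


rng = random.Random(1)
heavy_tailed = [rng.paretovariate(1.5) for _ in range(2000)]      # x >= 1
light_tailed = [1.0 + rng.expovariate(1.0) for _ in range(2000)]  # x >= 1
print(preferred_model(heavy_tailed, x_min=1.0))  # Levy-like power law
print(preferred_model(light_tailed, x_min=1.0))  # exponential
```

Relative support (AIC) says only which candidate fits better; the abstract's additional measures of absolute fit address whether either model describes the data adequately.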

Posted Content
TL;DR: This manuscript presents the first fully automated images-to-graphs pipeline (i.e., a pipeline that begins with an imaged volume of neural tissue and produces a brain graph without any human interaction) and develops a metric to assess the quality of the output graphs.
Abstract: Reconstructing a map of neuronal connectivity is a critical challenge in contemporary neuroscience. Recent advances in high-throughput serial section electron microscopy (EM) have produced massive 3D image volumes of nanoscale brain tissue for the first time. The resolution of EM allows for individual neurons and their synaptic connections to be directly observed. Recovering neuronal networks by manually tracing each neuronal process at this scale is unmanageable, and therefore researchers are developing automated image processing modules. Thus far, state-of-the-art algorithms focus only on the solution to a particular task (e.g., neuron segmentation or synapse identification). In this manuscript we present the first fully automated images-to-graphs pipeline (i.e., a pipeline that begins with an imaged volume of neural tissue and produces a brain graph without any human interaction). To evaluate overall performance and select the best parameters and methods, we also develop a metric to assess the quality of the output graphs. We evaluate a set of algorithms and parameters, searching possible operating points to identify the best available brain graph for our assessment metric. Finally, we deploy a reference end-to-end version of the pipeline on a large, publicly available data set. This provides a baseline result and framework for community analysis and future algorithm development and testing. All code and data derivatives have been made publicly available toward eventually unlocking new biofidelic computational primitives and understanding of neuropathologies.

Posted Content
TL;DR: This work presents a method for inferring the location of recombination hotspots from patterns of linkage disequilibrium within samples of population genetic data, and shows that it has a hotspot detection power of approximately 50-60%, depending on the magnitude of the hotspot.
Abstract: Motivation: Recombination rates vary considerably at the fine scale within mammalian genomes, with the majority of recombination occurring within hotspots ~2 kb wide. We present a method for inferring the location of recombination hotspots from patterns of linkage disequilibrium within samples of population genetic data. Results: Using simulations, we show that our method has a hotspot detection power of approximately 50-60%, depending on the magnitude of the hotspot. The false positive rate is between 0.24 and 0.56 false positives per Mb for data typical of humans. Availability: this http URL

Posted Content
TL;DR: In this article, the authors used mathematical models describing the growth of grade II gliomas in response to radiotherapy and found that substantially enlarging the time interval between RT fractions may lead to better tumor control.
Abstract: Grade II gliomas are slowly growing primary brain tumors that mostly affect young patients and become fatal after a few years. Current clinical management includes surgery as first-line treatment. Cytotoxic therapies (radiotherapy, RT, or chemotherapy, QT) are used initially only for patients with a poor prognosis. Therapies are administered following the 'maximum dose in minimum time' principle, the same schedule used for high-grade brain tumors. Using mathematical models describing the growth of these tumors in response to radiotherapy, we find that an extreme protraction therapeutic strategy, i.e. substantially enlarging the time interval between RT fractions, may lead to better tumor control. We derive explicit formulas for the optimal spacing between doses, which agree very well with simulations of the full three-dimensional mathematical model of the tumor's spatio-temporal dynamics. Although this idea breaks with the well-established paradigm, it is biologically meaningful: in these slowly growing tumors, it may be more favourable to treat the tumor as its different subpopulations move into more sensitive phases of the cell cycle.

Posted Content
TL;DR: This work introduces a large-scale, high-throughput, and semi-automated methodology to efficiently identify synapses and successfully applied this methodology to the Drosophila medulla optic lobe, annotating many more synapses than previous connectome efforts.
Abstract: Reconstructing neuronal circuits at the level of synapses is a central problem in neuroscience and is becoming a focus of the emerging field of connectomics. To date, electron microscopy (EM) is the most proven technique for identifying and quantifying synaptic connections. As advances in EM make it possible to acquire larger datasets, the subsequent manual synapse identification (i.e., proofreading) needed to decipher a connectome becomes a major time bottleneck. Here we introduce a large-scale, high-throughput, semi-automated methodology to efficiently identify synapses. We successfully applied our methodology to the Drosophila medulla optic lobe, annotating many more synapses than previous connectome efforts. Our approaches are extensible and will make the often complicated process of synapse identification accessible to a wider community of potential proofreaders.

Journal ArticleDOI
TL;DR: This study indicates that no single method is uniformly better than all others, and helps identify the pros and cons of the compared methods as a function of biologically informative parameters, such as the fraction of tumor cells in the sample and the proportion of heterozygous markers.
Abstract: A number of bioinformatic or biostatistical methods are available for analyzing DNA copy number profiles measured with microarray or sequencing technologies. In the absence of sufficiently rich gold-standard data sets, the performance of these methods is generally assessed using unrealistic simulation studies or small real-data analyses. We have designed and implemented a framework to generate realistic DNA copy number profiles of cancer samples with known truth. These profiles are generated by resampling real SNP microarray data from genomic regions with known copy-number state. The original real data were extracted from dilution series of tumor cell lines with matched blood samples at several concentrations. Therefore, the signal-to-noise ratio of the generated profiles can be controlled through the (known) percentage of tumor cells in the sample. In this paper, we describe this framework and illustrate some of the benefits of the proposed data generation approach on a practical use case: a comparison study between methods for segmenting DNA copy number profiles from SNP microarrays. This study indicates that no single method is uniformly better than all others. It also helps identify the pros and cons of the compared methods as a function of biologically informative parameters, such as the fraction of tumor cells in the sample and the proportion of heterozygous markers. Availability: R package jointSeg: this http URL_id=1562

Journal ArticleDOI
TL;DR: In this article, the influence of inhaled concentrations on exhaled breath concentrations for VOCs with higher Henry constants was investigated and an additional compartment was added to account for upper airway influence.
Abstract: In a recent paper we presented a simple two-compartment model that describes the influence of inhaled concentrations on exhaled breath concentrations for volatile organic compounds (VOCs) with small Henry constants. In this paper we extend this investigation to VOCs with higher Henry constants. To this end, we add a compartment to our model that accounts for the influence of the upper airways on exhaled breath VOC concentrations.

Journal ArticleDOI
TL;DR: The multi-level method of Anderson and Higham estimates system statistics using collections of paired sample paths, where one path of each pair is generated at higher accuracy (and is therefore more expensive) than the other; this paper extends the method with adaptive time-stepping.
Abstract: Discrete-state, continuous-time Markov models are widely used to model biochemical reaction networks. Their complexity often precludes analytic solution, so we rely on stochastic simulation algorithms to estimate system statistics. The Gillespie algorithm is exact but computationally costly, as it simulates every single reaction. As such, approximate stochastic simulation algorithms such as the tau-leap algorithm are often used. Although these are potentially more efficient computationally, the system statistics they generate suffer from significant bias unless τ is relatively small, in which case the computational time can be comparable to that of the Gillespie algorithm. The multi-level method (Anderson and Higham, Multiscale Model. Simul. 2012) tackles this problem. A base estimator is computed using many (cheap) sample paths at low accuracy. The bias inherent in this estimator is then reduced using a number of corrections. Each correction term is estimated using a collection of paired sample paths, where one path of each pair is generated at higher accuracy (and is therefore more expensive) than the other. By sharing random variables between these paired paths, the variance of each correction estimator can be reduced. This makes the multi-level method very efficient, as only a relatively small number of paired paths is required to calculate each correction term. In the original multi-level method, each sample path is simulated using the tau-leap algorithm with a fixed value of τ. This approach can perform poorly when the reaction activity of a system changes substantially over the timescale of interest. By introducing a novel adaptive time-stepping approach, in which τ is chosen according to the stochastic behaviour of each sample path, we extend the applicability of the multi-level method to such cases. We demonstrate the efficiency of our method using a number of examples.

Posted Content
TL;DR: In this article, a convex method for sparse supervised canonical correlation analysis (sparse sCCA) is proposed; sparse sCCA is the special case of sparse mCCA in which one of the data sets is a vector.
Abstract: We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for this type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All current sparse mCCA techniques are biconvex and thus have no guarantee of reaching a global optimum. We propose a method for sparse supervised canonical correlation analysis (sparse sCCA), the special case of sparse mCCA in which one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem and illustrate their use on simulated and real data.

Posted Content
TL;DR: This work shows that, in certain models, parametric inference can be performed using statistics defined on the computed invariants of persistent homology, and develops this idea with a model from population genetics, the coalescent with recombination.
Abstract: Persistent homology computes topological invariants from point cloud data. Recent work has focused on developing statistical methods for data analysis in this framework. We show that, in certain models, parametric inference can be performed using statistics defined on the computed invariants. We develop this idea with a model from population genetics, the coalescent with recombination. We apply our model to an influenza dataset, identifying two scales of topological structure which have a distinct biological interpretation.
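
As a concrete, deliberately simplified picture of "topological invariants computed from point cloud data", the sketch below computes the 0-dimensional persistence barcode (connected components) of a Vietoris-Rips filtration using union-find over edges sorted by length. It also computes one summary statistic on the barcode. The abstract's application concerns higher-dimensional structure arising from recombination, which this H0-only sketch does not capture. The function names and the threshold statistic are hypothetical.

```python
import itertools
import math

def h0_barcode(points):
    """0-dim persistence bars (birth, death) for a Rips filtration of a point cloud."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # two components merge: one bar dies at scale d
            parent[ri] = rj
            deaths.append(d)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]  # one component never dies

def count_persistent(bars, threshold):
    """A simple statistic on the invariants: number of bars longer than threshold."""
    return sum(1 for b, d in bars if d - b > threshold)
```

For three points on a line at 0, 1 and 10, the finite bars die at 1 and 9. A statistic such as `count_persistent` can then be compared between observed and simulated data, which is the general spirit of parametric inference on computed invariants.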

Posted Content
TL;DR: In this article, the authors explore examples where it has been possible to measure, directly, the flow of information in biological networks, or more generally where information theoretic ideas have been used to guide the analysis of experiments.
Abstract: Life depends as much on the flow of information as on the flow of energy. Here we review the many efforts to make this intuition precise. Starting with the building blocks of information theory, we explore examples where it has been possible to measure, directly, the flow of information in biological networks, or more generally where information theoretic ideas have been used to guide the analysis of experiments. Systems of interest range from single molecules (the sequence diversity in families of proteins) to groups of organisms (the distribution of velocities in flocks of birds), and all scales in between. Many of these analyses are motivated by the idea that biological systems may have evolved to optimize the gathering and representation of information, and we review the experimental evidence for this optimization, again across a wide range of scales.
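
The basic quantity behind "measuring the flow of information" is mutual information. A minimal plug-in estimate from paired discrete observations (in bits) can be sketched as follows. The function name is illustrative. The naive plug-in estimator is biased upward for small samples, so real analyses typically need bias corrections.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # I = sum p(x,y) log2[ p(x,y) / (p(x) p(y)) ], with p(x,y)/(p(x)p(y)) = c*n/(n_x*n_y)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())
```

A perfectly reliable binary channel, for example `[(0, 0), (1, 1)] * 50`, gives 1 bit, and independent variables give 0 bits.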