Showing papers in "arXiv: Quantitative Methods in 2008"


Posted Content
TL;DR: This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth.
Abstract: Several of the next generation sequencers are limited in their sample preparation process by the need to make an absolute measurement of the number of template molecules in the library to be sequenced. As currently practiced, the practical effects of this requirement compromise sequencing performance, both by requiring large amounts of sample DNA and by requiring extra sequencing runs to be performed. We used digital PCR to quantitate sequencing libraries, and demonstrated its sensitivity and robustness by preparing and sequencing libraries from subnanogram amounts of bacterial and human DNA on the 454 and Solexa sequencing platforms. This assay allows absolute quantitation and eliminates uncertainties associated with the construction and application of standard curves. The digital PCR platform consumes subfemtogram amounts of the sequencing library and gives highly accurate results, allowing the optimal DNA concentration to be used in setting up sequencing runs without costly and time-consuming titration techniques. This approach also reduces the input sample requirement more than 1000-fold: from micrograms of DNA to less than a nanogram.

176 citations


Posted Content
TL;DR: Examining the use of 2D and 3D descriptors for small molecules, and incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of the chemogenomics model.
Abstract: The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules is therefore a crucial step in the drug discovery process, which remains a daunting task due to the difficulty of characterizing the 3D structure of most GPCRs, and to the limited number of known ligands for some members of the superfamily. Chemogenomics, which attempts to characterize interactions between all members of a target class and all small molecules simultaneously, has recently been proposed as an interesting alternative to traditional docking or ligand-based virtual screening strategies. We propose new methods for in silico chemogenomics and validate them on the virtual screening of GPCRs. The methods represent an extension of a recently proposed machine learning strategy, based on support vector machines (SVM), which provides a flexible framework to incorporate various information sources on the biological space of targets and on the chemical space of small molecules. We investigate the use of 2D and 3D descriptors for small molecules, and test a variety of descriptors for GPCRs. We show, for instance, that incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of our model. In particular, we are able to predict ligands of orphan GPCRs with an estimated accuracy of 78.1%.

101 citations


Posted Content
TL;DR: A combination of recent coordinate descent algorithms with an adaptation of the histogram Monte Carlo method is used, and the resulting algorithm learns the parameters of an Ising model describing a network of forty neurons within a few minutes.
Abstract: (Princeton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544. Dated: February 4, 2008.) Recent work has shown that probabilistic models based on pairwise interactions (in the simplest case, the Ising model) provide surprisingly accurate descriptions of experiments on real biological networks ranging from neurons to genes. Finding these models requires us to solve an inverse problem: given experimentally measured expectation values, what are the parameters of the underlying Hamiltonian? This problem sits at the intersection of statistical physics and machine learning, and we suggest that more efficient solutions are possible by merging ideas from the two fields. We use a combination of recent coordinate descent algorithms with an adaptation of the histogram Monte Carlo method, and implement these techniques to take advantage of the sparseness found in data on real neurons. The resulting algorithm learns the parameters of an Ising model describing a network of forty neurons within a few minutes. This opens the possibility of analyzing much larger data sets now emerging, and thus testing hypotheses about the collective behaviors of these networks.

94 citations


Posted Content
TL;DR: In this article, the use of high-level query templates that capture recurring biological questions and that can be automatically translated into temporal logic has been investigated by the analysis of an extended model of the network of global regulators controlling the carbon starvation response in Escherichia coli.
Abstract: Models of the dynamics of cellular interaction networks have become increasingly larger in recent years. Formal verification based on model checking provides a powerful technology to keep up with this increase in scale and complexity. The application of model-checking approaches is hampered, however, by the difficulty for non-expert users to formulate appropriate questions in temporal logic. In order to deal with this problem, we propose the use of patterns, that is, high-level query templates that capture recurring biological questions and that can be automatically translated into temporal logic. The applicability of the developed set of patterns has been investigated by the analysis of an extended model of the network of global regulators controlling the carbon starvation response in Escherichia coli.

91 citations


Posted Content
TL;DR: An algorithm based on Gram-Schmidt orthogonalization (called GS-PCA) is presented, which eliminates the loss of orthogonality that limits NIPALS-PCA, and the numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster than the CPU optimized versions based on CBLAS (GNU Scientific Library).
Abstract: Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA) are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).

91 citations
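
The Gram-Schmidt idea in the GS-PCA entry above can be sketched in a few lines. What follows is a minimal CPU-only illustration in plain Python, not the authors' CUBLAS implementation: the function name `gs_pca`, its parameters, the starting vector, and the convergence test are all assumptions made for this sketch. It runs a NIPALS-style power iteration and explicitly re-orthogonalizes each new loading vector against the loadings already found.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gs_pca(rows, n_components, n_iter=1000, tol=1e-12):
    """Leading principal directions of `rows` (a list of samples), found by a
    NIPALS-style iteration with Gram-Schmidt re-orthogonalization.
    Assumes n_components does not exceed the rank of the centered data."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    R = [[r[j] - means[j] for j in range(m)] for r in rows]  # centered data
    loadings = []
    for _component in range(n_components):
        # start from the residual column with the largest sum of squares
        j0 = max(range(m), key=lambda j: sum(r[j] ** 2 for r in R))
        t = [r[j0] for r in R]
        p = None
        for _step in range(n_iter):
            # loading update: p = R^T t
            p_new = [sum(R[i][j] * t[i] for i in range(n)) for j in range(m)]
            # Gram-Schmidt step: remove components along earlier loadings
            for q in loadings:
                c = dot(q, p_new)
                p_new = [a - c * b for a, b in zip(p_new, q)]
            norm = math.sqrt(dot(p_new, p_new))
            p_new = [a / norm for a in p_new]
            t = [dot(r, p_new) for r in R]  # score update: t = R p
            done = p is not None and max(
                abs(a - b) for a, b in zip(p, p_new)) < tol
            p = p_new
            if done:
                break
        loadings.append(p)
        # deflation: subtract the rank-one component t p^T
        R = [[R[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
    return loadings
```

The explicit projection inside the inner loop is the step that keeps later loadings orthogonal to earlier ones even when rounding errors accumulate during deflation. A GPU version would replace the list comprehensions with matrix-vector products.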


Posted Content
TL;DR: Yeon et al. propose a segmentation method for Array Comparative Genomic Hybridization (aCGH) data based on wavelet decomposition and thresholding.
Abstract: Motivation: Array Comparative Genomic Hybridization (aCGH) is used to scan the entire genome for variations in DNA copy number. A central task in the analysis of aCGH data is the segmentation into groups of probes sharing the same DNA copy number. Some well known segmentation methods suffer from very long running times, preventing interactive data analysis. Results: We suggest a new segmentation method based on wavelet decomposition and thresholding, which detects significant breakpoints in the data. Our algorithm is over 1,000 times faster than leading approaches, with similar performance. Another key advantage of the proposed method is its simplicity and flexibility. Due to its intuitive structure it can be easily generalized to incorporate several types of side information. Here we consider two extensions which include side information indicating the reliability of each measurement, and compensating for a changing variability in the measurement noise. The resulting algorithm outperforms existing methods, both in terms of speed and performance, when applied on real high density CGH data. Availability: Implementation is available under software tab at: this http URL Contact: yonina@ee.technion.ac.il

79 citations


Journal ArticleDOI
TL;DR: The existence of a dynamical potential with both local and global meanings in general nonequilibrium processes is shown rigorously for a generic class of situations, and no detailed balance condition is required in the demonstration.
Abstract: From a logical point of view this is the third paper in a series addressing the problem of the absence of detailed balance, and it will be denoted SDS III. The existence of a dynamical potential with both local and global meanings in general nonequilibrium processes has been controversial. Following an earlier explicit construction by one of us (Ao, J. Phys. A37, L25 (2004), arXiv:0803.4356, referred to as SDS II), in the present paper we rigorously show its existence for a generic class of situations in the physical and biological sciences. The local dynamical meaning of this potential function is demonstrated via a special stochastic differential equation, and its global steady-state meaning via a novel and explicit form of the Fokker-Planck equation, the zero mass limit. We also give a procedure to obtain the special stochastic differential equation for any given Fokker-Planck equation. No detailed balance condition is required in our demonstration. For the first time we obtain here a formula describing the noise-induced shift in the drift force relative to the steady state distribution, a phenomenon extensively observed in numerical studies. The ground is prepared for a comparison with two well-known stochastic integration methods, Ito and Stratonovich; such a comparison was made elsewhere (Ao, Phys. Life Rev. 2 (2005) 117, q-bio/0605020).

77 citations


Posted Content
TL;DR: A global convergence theorem is proved in a general framework, which includes examples from the literature as particular cases; notably, feedforward circuits do not adapt to pulse signals, because they display a memory phenomenon.
Abstract: This note studies feedforward circuits as models for perfect adaptation to step signals in biological systems. A global convergence theorem is proved in a general framework, which includes examples from the literature as particular cases. A notable aspect of these circuits is that they do not adapt to pulse signals, because they display a memory phenomenon. Estimates are given of the magnitude of this effect.

57 citations


Posted Content
TL;DR: In this article, a fast GPU implementation of the Matching Pursuit (MP) algorithm is discussed, based on the recently released NVIDIA CUDA API and CUBLAS library, and the results show that the GPU version is substantially faster than the highly optimized CPU version based on CBLAS (GNU Scientific Library).
Abstract: We consider the problem of sparse signal recovery from a small number of random projections (measurements). This is a well-known NP-hard combinatorial optimization problem. A frequently used approach is based on greedy iterative procedures, such as the Matching Pursuit (MP) algorithm. Here, we discuss a fast GPU implementation of the MP algorithm, based on the recently released NVIDIA CUDA API and CUBLAS library. The results show that the GPU version is substantially faster (up to 31 times) than the highly optimized CPU version based on CBLAS (GNU Scientific Library).

53 citations
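
The greedy procedure named in the Matching Pursuit entry above can be sketched as follows. This is a plain-Python illustration of the textbook MP iteration, not the GPU implementation the abstract describes: the function name `matching_pursuit`, its parameters, and the tiny dictionary in the usage lines are assumptions made for this example, and the atoms are assumed to have unit norm.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(atoms, y, n_iter=10, tol=1e-12):
    """Greedy sparse approximation of y over unit-norm `atoms`.
    Returns the coefficient list and the final residual."""
    residual = list(y)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # correlate the current residual with every atom
        corr = [dot(a, residual) for a in atoms]
        # pick the atom with the largest absolute correlation
        k = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        if abs(corr[k]) <= tol:
            break  # nothing left to explain
        coeffs[k] += corr[k]
        # subtract the selected atom's contribution from the residual
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

# Usage: an overcomplete dictionary in the plane (hypothetical data).
s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
coeffs, residual = matching_pursuit(atoms, [3.0, 3.0])
```

Each iteration is dominated by the correlations between the residual and all atoms, which amounts to one matrix-vector product; that is the step a BLAS or CUBLAS call would naturally take over.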


Posted Content
TL;DR: A conceptually very simple algorithm for hierarchical clustering called MIC is reviewed and applied to the construction of phylogenetic trees from mitochondrial DNA sequences and to the reconstruction of the fetal ECG from the output of independent components analysis applied to the ECG of a pregnant woman.
Abstract: Clustering is a concept used in a huge variety of applications. We review a conceptually very simple algorithm for hierarchical clustering, referred to in the following as the mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: the MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use MIC both in the Shannon (probabilistic) version of information theory, where the "objects" are probability distributions represented by random samples, and in the Kolmogorov (algorithmic) version, where the "objects" are symbol sequences. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and we reconstruct the fetal ECG from the output of independent components analysis (ICA) applied to the ECG of a pregnant woman.

49 citations
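
The grouping property stated in the MIC abstract above can be checked in one line of algebra. The derivation below is a sketch in the Shannon setting and rests on an assumption the abstract does not spell out: that the MI between three objects is the multi-information (total correlation), written here with entropies H.

```latex
\begin{align}
I(X,Y,Z) &= H(X) + H(Y) + H(Z) - H(X,Y,Z), \\
I(X,Y) &= H(X) + H(Y) - H(X,Y), \\
I\bigl((X,Y),Z\bigr) &= H(X,Y) + H(Z) - H(X,Y,Z), \\
I(X,Y,Z) &= I(X,Y) + I\bigl((X,Y),Z\bigr).
\end{align}
```

Adding the second and third lines cancels the joint entropy H(X,Y) and reproduces the first line, which is the grouping property used to merge clusters hierarchically.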


Book ChapterDOI
TL;DR: A recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data is reviewed.
Abstract: We review a recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data. We present several strategies that have been proposed and that lead to different pattern recognition problems and algorithms. The strength of these approaches is illustrated on the reconstruction of metabolic, protein-protein and regulatory networks of model organisms. In all cases, state-of-the-art performance is reported.

Journal ArticleDOI
TL;DR: This work presents a Brownian dynamics algorithm for simulating reaction-diffusion systems that rigorously obeys detailed balance for equilibrium reactions and applies it to a "push-pull" network in which two antagonistic enzymes covalently modify a substrate.
Abstract: Brownian Dynamics algorithms are widely used for simulating soft-matter and biochemical systems. In recent times, their application has been extended to the simulation of coarse-grained models of cellular networks in simple organisms. In these models, components move by diffusion, and can react with one another upon contact. However, when reactions are incorporated into a Brownian Dynamics algorithm, care must be taken not to violate the detailed-balance rule, since such violations introduce systematic errors in the simulation. We present a Brownian Dynamics algorithm for reaction-diffusion systems that rigorously obeys detailed balance for equilibrium reactions. By comparing the simulation results to exact analytical results for a bimolecular reaction, we show that the algorithm correctly reproduces both equilibrium and dynamical quantities. We apply our scheme to a "push-pull" network in which two antagonistic enzymes covalently modify a substrate. Our results highlight that the diffusive behaviour of the reacting species can reduce the gain of the response curve of this network.

Posted Content
TL;DR: A detailed calibration of TPM magnitude as a function of DNA length and particle size is carried out and a systematic comparison between measured particle excursions and theoretical expectations is presented, which helps clarify both the experiments and models of DNA conformation.
Abstract: The Tethered Particle Motion (TPM) method has been used to observe and characterize a variety of protein-DNA interactions including DNA looping and transcription. TPM experiments exploit the Brownian motion of a DNA-tethered bead to probe biologically relevant conformational changes of the tether. In these experiments, a change in the extent of the bead's random motion is used as a reporter of the underlying macromolecular dynamics and is often deemed sufficient for TPM analysis. However, a complete understanding of how the motion depends on the physical properties of the tethered particle complex would permit more quantitative and accurate evaluation of TPM data. For instance, such understanding can help extract details about a looped complex geometry (or multiple coexisting geometries) from TPM data. To better characterize the measurement capabilities of TPM experiments involving DNA tethers, we have carried out a detailed calibration of TPM magnitude as a function of DNA length and particle size. We also explore how experimental parameters such as acquisition time and exposure time affect the apparent motion of the tethered particle. We vary the DNA length from 200 bp to 2.6 kbp and consider particle diameters of 200, 490 and 970 nm. We also present a systematic comparison between measured particle excursions and theoretical expectations, which helps clarify both the experiments and models of DNA conformation.

Posted Content
TL;DR: This work measured the individual three-dimensional positions in compact flocks of up to 2700 birds and investigated the main features of the flock as a whole - shape, movement, density and structure - and discusses these as emergent attributes of the grouping phenomenon.
Abstract: Bird flocking is a striking example of collective animal behaviour. A vivid illustration of this phenomenon is provided by the aerial display of vast flocks of starlings gathering at dusk over the roost and swirling with extraordinary spatial coherence. Both the evolutionary justification and the mechanistic laws of flocking are poorly understood, arguably because of a lack of data on large flocks. Here, we report a quantitative study of aerial display. We measured the individual three-dimensional positions in compact flocks of up to 2700 birds. We investigated the main features of the flock as a whole - shape, movement, density and structure - and discuss these as emergent attributes of the grouping phenomenon. We find that flocks are relatively thin, with variable sizes, but constant proportions. They tend to slide parallel to the ground and, during turns, their orientation changes with respect to the direction of motion. Individual birds keep a minimum distance from each other that is comparable to their wingspan. The density within the aggregations is non-homogeneous, as birds are packed more tightly at the border compared to the centre of the flock. These results constitute the first set of large-scale data on three-dimensional animal aggregations. Current models and theories of collective animal behaviour can now be tested against these results.

Posted Content
TL;DR: The main technical problems in 3D data collection of large animal groups are reviewed and how to solve the stereoscopic correspondence - or matching - problem is explained, which was the major bottleneck of all 3D studies in the past.
Abstract: The most startling examples of collective animal behaviour are provided by very large and cohesive groups moving in three dimensions. Paradigmatic examples are bird flocks, fish schools and insect swarms. However, because of the sheer technical difficulty of obtaining 3D data, empirical studies conducted to date have only considered loose groups of a few tens of animals. Moreover, these studies were very seldom conducted in the field. Recently the STARFLAG project achieved the 3D reconstruction of thousands of birds under field conditions, thus opening the way to a new generation of quantitative studies of collective animal behaviour. Here, we review the main technical problems in 3D data collection of large animal groups and we outline some of the methodological solutions adopted by the STARFLAG project. In particular, we explain how to solve the stereoscopic correspondence - or matching - problem, which was the major bottleneck of all 3D studies in the past.

Journal ArticleDOI
TL;DR: In this article, a perturbation theory analogous to that used in quantum mechanics is proposed to determine the first and second cumulants of the distribution of created product molecules as a function of the substrate concentration and the kinetic rates of the intermediate processes.
Abstract: Enzyme-mediated reactions may proceed through multiple intermediate conformational states before creating a final product molecule, and one often wishes to identify such intermediate structures from observations of the product creation. In this paper, we address this problem by solving the chemical master equations for various enzymatic reactions. We devise a perturbation theory analogous to that used in quantum mechanics that allows us to determine the first (mean, ⟨n⟩) and the second (variance) cumulants of the distribution of created product molecules as a function of the substrate concentration and the kinetic rates of the intermediate processes. The mean product flux V = d⟨n⟩/dt (or "dose-response" curve) and the Fano factor F = variance/⟨n⟩ are both realistically measurable quantities, and while the mean flux can often appear the same for different reaction types, the Fano factor can be quite different. This suggests both qualitative and quantitative ways to discriminate between different reaction schemes, and we explore this possibility in the context of four sample multistep enzymatic reactions. We argue that measuring both the mean flux and the Fano factor can not only discriminate between reaction types, but can also provide some detailed information about the internal, unobserved kinetic rates, and this can be done without measuring single-molecule transition events.
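
The two measurable quantities named in this abstract can be written out explicitly. The notation is an assumption made for illustration: n denotes the number of product molecules created up to time t, and angle brackets denote an average over realizations.

```latex
\begin{align}
V &= \frac{d\langle n \rangle}{dt}, \\
F &= \frac{\langle n^2 \rangle - \langle n \rangle^2}{\langle n \rangle}.
\end{align}
```

As a reference point, a Poisson-distributed n has variance equal to its mean, so F = 1. A departure of F from 1 is therefore information that the mean flux alone does not carry.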

Journal ArticleDOI
TL;DR: A theoretical model based on a map of the protein tertiary structure into a resistor network is implemented to account for a sequential tunneling mechanism of charge transfer through neighbouring amino acids and is validated by comparison with current-voltage experiments.
Abstract: When moving from native to light activated bacteriorhodopsin, modification of charge transport consisting of an increase of conductance is correlated to the protein conformational change. A theoretical model based on a map of the protein tertiary structure into a resistor network is implemented to account for a sequential tunneling mechanism of charge transfer through neighbouring amino acids. The model is validated by comparison with current-voltage experiments. The predictability of the model is further tested on bovine rhodopsin, a G-protein coupled receptor (GPCR) also sensitive to light. In this case, results show an opposite behaviour with a decrease of conductance in the presence of light.

Posted Content
TL;DR: It is demonstrated that neglecting border effects gives rise to artefacts when studying the 3D structure of a group, and that mathematical rigour is essential to distinguish important biological properties from trivial geometric features of animal groups.
Abstract: The study of collective animal behaviour must progress through a comparison between the theoretical predictions of numerical models and data coming from empirical observations. To this aim it is important to develop methods of three-dimensional (3D) analysis that are at the same time informative about the structure of the group and suitable to empirical data. In fact, empirical data are considerably noisier than numerical data, and they are subject to several constraints. We review here the tools of analysis used by the STARFLAG project to characterise the 3D structure of large flocks of starlings in the field. We show how to avoid the most common pitfalls in the quantitative analysis of 3D animal groups, with particular attention to the problem of the bias introduced by the border of the group. By means of practical examples, we demonstrate that neglecting border effects gives rise to artefacts when studying the 3D structure of a group. Moreover, we show that mathematical rigour is essential to distinguish important biological properties from trivial geometric features of animal groups.

Posted Content
TL;DR: The power of phylogenetic profiles generated using the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST) to derive structural domains, functional annotation, and evolutionary relationships for a host of ion-channels and human proteins of unknown function is illustrated.
Abstract: The sequence of amino acids in a protein is believed to determine its native state structure, which in turn is related to the functionality of the protein. In addition, information pertaining to evolutionary relationships is contained in homologous sequences. One powerful method for inferring these sequence attributes is through comparison of a query sequence with reference sequences that contain significant homology and whose structure, function, and/or evolutionary relationships are already known. In spite of decades of concerted work, there is no simple framework for deducing structure, function, and evolutionary (SF&E) relationships directly from sequence information alone, especially when the pair-wise identity is less than a threshold figure ~25% [1,2]. However, recent research has shown that sequence identity as low as 8% is sufficient to yield common structure/function relationships and sequence identities as large as 88% may yet result in distinct structure and function [3,4]. Starting with a basic premise that protein sequence encodes information about SF&E, one might ask how one could tease out these measures in an unbiased manner. Here we present a unified framework for inferring SF&E from sequence information using a knowledge-based approach which generates phylogenetic profiles in an unbiased manner. We illustrate the power of phylogenetic profiles generated using the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST) to derive structural domains, functional annotation, and evolutionary relationships for a host of ion-channels and human proteins of unknown function. These data are in excellent accord with published data and new experiments. Our results suggest that there is a wealth of previously unexplored information in protein sequence.

Posted Content
TL;DR: This work proposes to infer the parameters of the ordinary differential equations using techniques from functional data analysis (FDA), regarding the observed time course expression data as continuous-time curves and taking advantage of the sparsity of the networks.
Abstract: Statistical inference of genetic regulatory networks is essential for understanding temporal interactions of regulatory elements inside the cells. For inferences of large networks, identification of network structure is typically achieved under the assumption of sparsity of the networks. When the number of time points in the expression experiment is not too small, we propose to infer the parameters in the ordinary differential equations using the techniques from functional data analysis (FDA) by regarding the observed time course expression data as continuous-time curves. For networks with a large number of genes, we take advantage of the sparsity of the networks by penalizing the linear coefficients with an L_1 norm. The ability of the algorithm to infer network structure is demonstrated using the cell-cycle time course data for Saccharomyces cerevisiae.

Journal ArticleDOI
TL;DR: A new model is introduced, the activity model, showing analytically and numerically that it also displays a power-law scaling of the depth with tree size at a critical parameter value.
Abstract: Many processes and models (in biological, physical, social, and other contexts) produce trees whose depth scales logarithmically with the number of leaves. Phylogenetic trees, describing the evolutionary relationships between biological species, are examples of trees for which such scaling is not observed. With this motivation, we analyze numerically two branching models leading to non-logarithmic scaling of the depth with the number of leaves. For Ford's alpha model, although a power-law scaling of the depth with tree size was established analytically, our numerical results illustrate that the asymptotic regime is approached only at very large tree sizes. We introduce here a new model, the activity model, showing analytically and numerically that it also displays a power-law scaling of the depth with tree size at a critical parameter value.

Book ChapterDOI
TL;DR: The Offdiagonal Complexity (OdC), a new, and computationally cheap, measure of complexity is defined, based on the node-node link cross-distribution, whose nondiagonal elements characterize the graph structure beyond link distribution, cluster coefficient, and average path length.
Abstract: Many complex biological, social, and economic networks show topologies drastically differing from random graphs. But what is a complex network, i.e., how can one quantify the complexity of a graph? Here the Offdiagonal Complexity (OdC), a new, and computationally cheap, measure of complexity is defined, based on the node-node link cross-distribution, whose nondiagonal elements characterize the graph structure beyond link distribution, cluster coefficient, and average path length. The OdC approach is applied to the Helicobacter pylori protein interaction network and randomly rewired surrogates thereof. In addition, OdC is used to characterize the spatial complexity of cell aggregates. We investigate the earliest embryo development states of Caenorhabditis elegans. The development states of the premorphogenetic phase are represented by symmetric binary-valued cell connection matrices with dimension growing from 4 to 385. These matrices can be interpreted as adjacency matrices of an undirected graph, or network. The OdC approach allows us to describe quantitatively the complexity of the cell aggregate geometry.

Journal ArticleDOI
TL;DR: The results indicate large differences between the injured patients and the healthy subjects; in particular, the networks of spinal cord injured patients exhibited a higher density of efficient clusters.
Abstract: We study the topological properties of functional connectivity patterns among cortical areas in the frequency domain. The cortical networks were estimated from high-resolution EEG recordings in a group of spinal cord injured patients and in a group of healthy subjects, during the preparation of a limb movement. We first evaluate global and local efficiency, as indicators of the structural connectivity at a global and a local scale, respectively. Then, we use the Markov Clustering method to analyse the division of the network into community structures. The results indicate large differences between the injured patients and the healthy subjects. In particular, the networks of spinal cord injured patients exhibited a higher density of efficient clusters. In the Alpha (7-12 Hz) frequency band, the two largest observed communities were mainly composed of the cingulate motor areas with the supplementary motor areas, and of the pre-motor areas with the right primary motor area of the foot. This functional separation strengthens the hypothesis of a compensative mechanism due to the partial alteration in the primary motor areas because of the effects of the spinal cord injury.
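
The two efficiency indices mentioned in this abstract have a compact standard definition: global efficiency is the average of inverse shortest-path lengths over all ordered node pairs, and local efficiency averages the global efficiency of each node's neighbourhood subgraph. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary; the cortical networks in the study may be weighted or directed, so the function names and the graph representation here are illustrative assumptions only.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def global_efficiency(adj):
    """Mean of 1/d(u, v) over ordered pairs; unreachable pairs contribute 0."""
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for u in nodes:
        dist = shortest_path_lengths(adj, u)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))

def local_efficiency(adj):
    """Mean global efficiency of the subgraph induced by each node's neighbours."""
    values = []
    for u in adj:
        nbrs = set(adj[u])
        sub = {v: [w for w in adj[v] if w in nbrs] for v in nbrs}
        values.append(global_efficiency(sub))
    return sum(values) / len(values)
```

For a three-node path graph such as `{0: [1], 1: [0, 2], 2: [1]}`, the neighbours of every node are mutually disconnected, so the local efficiency is zero even though the graph as a whole is connected.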

Posted Content
TL;DR: Investigations of the genetic code on the basis of matrix approaches ("matrix genetics") are described; the results relate to the problem of the algebraization of bioinformatics and draw attention to the question: what is life from the viewpoint of algebra?
Abstract: Algebraic properties of the genetic code are analyzed. The investigations of the genetic code on the basis of matrix approaches ("matrix genetics") are described. The degeneracy of the vertebrate mitochondrial genetic code is reflected in the black-and-white mosaic of the (8*8)-matrix of 64 triplets, 20 amino acids and stop-signals. This mosaic genetic matrix is connected with the matrix form of presentation of the special 8-dimensional Yin-Yang-algebra and of its particular 4-dimensional case. A special algorithm, which is based on features of genetic molecules, exists to transform the mosaic genomatrix into the matrices of these algebras. Two new numeric systems are defined by these 8-dimensional and 4-dimensional algebras: genetic Yin-Yang-octaves and genetic tetrions. Their comparison with Hamilton's quaternions is presented. Elements of a new "genovector calculation" and ideas of "genetic mechanics" are discussed. These algebras are considered as models of the genetic code and as its possible pre-code basis. They are related to binary oppositions of the Yin-Yang type and they give new opportunities to investigate the evolution of the genetic code. The revealed relation between the genetic code and these genetic algebras is discussed in connection with the idea of Pythagoras: "All things are numbers". At the same time, these genetic algebras can be utilized as the algebras of genetic operators in biological organisms. The described results are related to the problem of the algebraization of bioinformatics. They draw attention to the question: what is life from the viewpoint of algebra?

Journal ArticleDOI
TL;DR: Two methods for accelerating sampling in SSA models are developed: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound, based on the analysis of the eigenvalues of continuous time Markov models that define the behavior of the SSA.
Abstract: Models of reaction chemistry based on the stochastic simulation algorithm (SSA) have become a crucial tool for simulating complicated biological reaction networks due to their ability to handle extremely complicated reaction networks and to represent noise in small-scale chemistry. These methods can, however, become highly inefficient for stiff reaction systems, those in which different reaction channels operate on widely varying time scales. In this paper, we develop two methods for accelerating sampling in SSA models: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound. Both methods depend on analysis of the eigenvalues of continuous time Markov model graphs that define the behavior of the SSA. We demonstrate these methods for the specific application of sampling breakage times for multiply-connected bond networks, a class of stiff system important to models of self-assembly processes. We show theoretically and empirically that our eigenvalue methods provide substantially reduced sampling times for a wide range of network breakage models. These techniques are also likely to have broad use in accelerating SSA models so as to apply them to systems and parameter ranges that are currently computationally intractable.
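
For orientation, the baseline that such acceleration methods improve on can be sketched as follows. This is the standard direct-method SSA (Gillespie), not the eigenvalue-based methods of the paper. The function name `gillespie_direct`, the reaction encoding, and the three-reaction toy system with its rate constants are all assumptions made for illustration. The toy system is deliberately stiff: a fast A <-> B exchange fires roughly a thousand times per unit time while the slow B -> C conversion fires rarely, which is the inefficiency the abstract describes.

```python
import math
import random


def gillespie_direct(state, reactions, t_end, rng):
    """Standard direct-method SSA.

    state:     dict species -> integer count (copied, not mutated)
    reactions: list of (propensity_fn, change), where change maps species -> delta
    Returns a list of (time, state) snapshots: the initial state plus one per event.
    """
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        props = [fn(state) for fn, _ in reactions]
        total = 0.0
        for p in props:
            total += p
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for prop, (_, change) in zip(props, reactions):
            cumulative += prop
            if threshold < cumulative:
                break
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical stiff toy system: fast A <-> B exchange, slow B -> C conversion.
reactions = [
    (lambda s: 100.0 * s["A"], {"A": -1, "B": +1}),
    (lambda s: 100.0 * s["B"], {"A": +1, "B": -1}),
    (lambda s: 0.1 * s["B"], {"B": -1, "C": +1}),
]
traj = gillespie_direct({"A": 10, "B": 0, "C": 0}, reactions,
                        t_end=1.0, rng=random.Random(0))
print(len(traj) - 1, "events simulated; final state:", traj[-1][1])
```

Almost all simulated events belong to the fast exchange, so most of the computational effort goes into steps that do not advance the slow process of interest.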

Posted Content
TL;DR: Calculations of the chemotactic drift velocity vd of an E. coli cell performing chemotaxis in a uniform, steady shear flow, with a weak chemoattractant gradient at right angles to the flow, find that a more elongated body shape is advantageous in performing chemotaxis in a strong shear flow.
Abstract: Escherichia coli is a motile bacterium that moves up a chemoattractant gradient by performing a biased random walk composed of alternating runs and tumbles. This paper presents calculations of the chemotactic drift velocity vd (the mean velocity up the chemoattractant gradient) of an E. coli cell performing chemotaxis in a uniform, steady shear flow, with a weak chemoattractant gradient at right angles to the flow. Extending earlier models, a combined analytic and numerical approach is used to assess the effect of several complications, namely (i) a cell cannot detect a chemoattractant gradient directly but rather makes temporal comparisons of chemoattractant concentration, (ii) the tumbles exhibit persistence of direction, meaning that the swimming directions before and after a tumble are correlated, (iii) the cell suffers random re-orientations due to rotational Brownian motion, and (iv) the non-spherical shape of the cell affects the way that it is rotated by the shear flow. These complications influence the dependence of vd on the shear rate gamma. When they are all included, it is found that (a) shear disrupts chemotaxis and shear rates beyond gamma = 2/second cause the cell to swim down the chemoattractant gradient rather than up it, (b) in terms of maximising drift velocity, persistence of direction is advantageous in a quiescent fluid but disadvantageous in a shear flow, and (c) a more elongated body shape is advantageous in performing chemotaxis in a strong shear flow.
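
The notion of a drift velocity arising from a biased random walk of runs and tumbles can be illustrated with a deliberately simplified one-dimensional sketch. It omits every complication the abstract lists (temporal sensing, directional persistence, rotational Brownian motion, cell shape and shear), so it is a toy and not the model of the paper. The function `drift_velocity`, the dimensionless parameters and the form of the bias (tumble rate lowered by a factor `1 - bias` when swimming up the gradient and raised by `1 + bias` when swimming down) are assumptions.

```python
import random


def drift_velocity(n_runs, speed, base_tumble_rate, bias, rng):
    """Estimate the mean velocity up the gradient for a 1D run-and-tumble walker.

    Runs pointing up the gradient (+1) end at rate base * (1 - bias);
    runs pointing down (-1) end at rate base * (1 + bias).
    Each tumble picks a fresh direction uniformly (no persistence, no shear).
    Requires 0 <= bias < 1.
    """
    position = 0.0
    elapsed = 0.0
    for _ in range(n_runs):
        direction = rng.choice((-1, 1))
        rate = base_tumble_rate * (1.0 - bias * direction)
        duration = rng.expovariate(rate)  # exponentially distributed run time
        position += direction * speed * duration
        elapsed += duration
    return position / elapsed


estimate = drift_velocity(n_runs=20000, speed=1.0, base_tumble_rate=1.0,
                          bias=0.3, rng=random.Random(1))
print("estimated drift velocity:", estimate)
```

For this toy the mean run times are 1/(λ(1 - b)) up the gradient and 1/(λ(1 + b)) down it, so the expected drift velocity is speed × bias. The estimate should therefore approach 0.3 for the parameters shown.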

Posted Content
TL;DR: The assumptions made by McDougal et al (2006), both explicit and implicit, in their estimation of the proportion of "true recent infections" using the BED CEIA are analyzed to derive an identity which shows the relationship between these parameters, allowing the elimination of sensitivity and short term specificity.
Abstract: In this short note, we analyze the assumptions made by McDougal et al (2006), both explicit and implicit, in their estimation of the proportion of "true recent infections" using the BED CEIA. This enables us to write down expressions for the sensitivity, short term specificity and long term specificity of a test for recent infection defined by a BED ODn below a threshold. We then derive an identity which shows the relationship between these parameters, allowing the elimination of sensitivity and short term specificity from an expression relating the proportion of "true recent infections" to the proportion of seropositive individuals testing below threshold. This has two important consequences. Firstly, the simplified formula is substantially more amenable to calibration. Secondly, naively treating the parameters as independent would lead to an incorrect estimate of uncertainty due to imperfect calibration.

BookDOI
TL;DR: While the formalism of subset seed is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, the seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.
Abstract: We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform an analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.
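
A rough sketch of how a subset seed might test a pair of protein k-mers is given below. It assumes that each seed letter corresponds to a grouping of amino acids and that a position is accepted when both residues fall in the same group. The three seed letters (`#`, `@`, `_`) and the groupings are hypothetical, chosen only for illustration. They are not the alphabets designed in the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical seed alphabet: each seed letter maps to a grouping of amino acids.
GROUPINGS = {
    "#": [set(a) for a in AMINO_ACIDS],  # exact match only
    "@": [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("NQ"),
          set("ST"), set("AG"), set("C"), set("P")],  # coarse similarity classes
    "_": [set(AMINO_ACIDS)],  # don't care: any pair is accepted
}


def letter_accepts(seed_letter, a, b):
    """True if residues a and b fall in the same group of the seed letter."""
    return any(a in group and b in group for group in GROUPINGS[seed_letter])


def seed_hits(seed, kmer1, kmer2):
    """True if every position of the k-mer pair is accepted by the seed."""
    if not (len(seed) == len(kmer1) == len(kmer2)):
        raise ValueError("seed and k-mers must have equal length")
    return all(letter_accepts(c, a, b) for c, a, b in zip(seed, kmer1, kmer2))


def seed_key(seed, kmer):
    """Index key: the group number of each residue under the seed's letters.

    Two k-mers hit under the seed exactly when their keys are equal,
    which is what allows hash-based lookup.
    """
    key = []
    for c, a in zip(seed, kmer):
        key.append(next(i for i, g in enumerate(GROUPINGS[c]) if a in g))
    return tuple(key)


print(seed_hits("#@_", "AIK", "AVW"))  # exact A/A, I and V share a group, any pair
print(seed_hits("#@#", "AIK", "AFK"))  # I and F are in different groups
```

Because each grouping here is a partition, a hit reduces to equality of keys, so a hit test costs one table lookup per position. Blastp instead accumulates a score across the word, which is the cost difference the abstract refers to.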

Posted Content
TL;DR: The rejection-free method reported here should be useful for simulating a variety of systems in which multisite molecular interactions yield large molecular aggregates, and it is applied to simulate simple models for ligand-receptor interactions.
Abstract: The system-level dynamics of multivalent biomolecular interactions can be simulated using a rule-based kinetic Monte Carlo method in which a rejection sampling strategy is used to generate reaction events. This method becomes inefficient when simulating aggregation processes with large biomolecular complexes. Here, we present a rejection-free method for determining the kinetics of multivalent biomolecular interactions, and we apply the method to simulate simple models for ligand-receptor interactions. Simulation results show that performance of the rejection-free method is equal to or better than that of the rejection method over wide parameter ranges, and the rejection-free method is more efficient for simulating systems in which aggregation is extensive. The rejection-free method reported here should be useful for simulating a variety of systems in which multisite molecular interactions yield large molecular aggregates.

Posted Content
TL;DR: This paper presents a method to quantify the topological distance between two networks of different sizes, and finds that network architectures are more similar within the same class than across classes.
Abstract: Evolutionary mechanisms in a self-organized system cause functional changes that force the interaction pattern between the components of that system to adopt a new conformation. By measuring the structural differences, one can retrace the evolutionary relation between two systems. We present a method to quantify the topological distance between two networks of different sizes, finding that the architectures of the networks are more similar within the same class than outside their class. With 43 cellular networks of different species, we show that the evolutionary relationship can be elucidated from the structural distances.