Showing papers in "arXiv: Quantitative Methods in 2008"

Posted Content•

Digital PCR provides sensitive and absolute calibration for high throughput sequencing

Richard A. White¹, Paul C. Blainey¹, H. Christina Fan¹, Stephen R. Quake¹•Institutions (1)

16 Aug 2008-arXiv: Quantitative Methods

TL;DR: This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth.

Abstract: Several of the next generation sequencers are limited in their sample preparation process by the need to make an absolute measurement of the number of template molecules in the library to be sequenced. As currently practiced, the practical effects of this requirement compromise sequencing performance, both by requiring large amounts of sample DNA and by requiring extra sequencing runs to be performed. We used digital PCR to quantitate sequencing libraries, and demonstrated its sensitivity and robustness by preparing and sequencing libraries from subnanogram amounts of bacterial and human DNA on the 454 and Solexa sequencing platforms. This assay allows absolute quantitation and eliminates uncertainties associated with the construction and application of standard curves. The digital PCR platform consumes subfemtogram amounts of the sequencing library and gives highly accurate results, allowing the optimal DNA concentration to be used in setting up sequencing runs without costly and time-consuming titration techniques. This approach also reduces the input sample requirement more than 1000-fold: from micrograms of DNA to less than a nanogram.
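
To illustrate why digital PCR can give an absolute count without a standard curve, here is a minimal sketch of the usual Poisson correction: if templates land in partitions at random, the fraction of negative partitions is exp(-lambda), so lambda can be read off from the positive fraction. The function names and the panel size and counts are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
import math

def copies_per_partition(positive: int, total: int) -> float:
    """Mean template copies per partition, corrected for multiple occupancy.

    Assumes templates are distributed over partitions by Poisson statistics,
    so the fraction of empty partitions equals exp(-lambda).
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (a saturated panel cannot be quantified)")
    return -math.log(1.0 - positive / total)

def total_copies(positive: int, total: int) -> float:
    """Estimated number of template molecules loaded across all partitions."""
    return total * copies_per_partition(positive, total)

# Hypothetical panel: 765 partitions, 300 of them positive.
lam = copies_per_partition(300, 765)
print(f"{lam:.3f} copies per partition, {total_copies(300, 765):.0f} molecules in total")
```

The estimate exceeds the raw count of positive partitions because some partitions hold more than one template.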

176 citations

Posted Content•

Virtual screening of GPCRs: an in silico chemogenomics approach

Laurent Jacob¹,²,³, Brice Hoffmann¹,²,³, Véronique Stoven¹,²,³, Jean-Philippe Vert¹,²,³•Institutions (3)

French Institute of Health and Medical Research¹, Mines ParisTech², Curie Institute³

28 Jan 2008-arXiv: Quantitative Methods

TL;DR: Examining the use of 2D and 3D descriptors for small molecules, and incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of the chemogenomics model.

Abstract: The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules is therefore a crucial step in the drug discovery process, which remains a daunting task due to the difficulty of characterizing the 3D structure of most GPCRs, and to the limited number of known ligands for some members of the superfamily. Chemogenomics, which attempts to characterize interactions between all members of a target class and all small molecules simultaneously, has recently been proposed as an interesting alternative to traditional docking or ligand-based virtual screening strategies. We propose new methods for in silico chemogenomics and validate them on the virtual screening of GPCRs. The methods represent an extension of a recently proposed machine learning strategy, based on support vector machines (SVM), which provides a flexible framework to incorporate various information sources on the biological space of targets and on the chemical space of small molecules. We investigate the use of 2D and 3D descriptors for small molecules, and test a variety of descriptors for GPCRs. We show for instance that incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of our model. In particular we are able to predict ligands of orphan GPCRs with an estimated accuracy of 78.1%.

101 citations

Posted Content•

Faster solutions of the inverse pairwise Ising problem

Tamara Broderick, Miroslav Dudík, Robert E. Schapire, William Bialek

01 Jan 2008-arXiv: Quantitative Methods

TL;DR: A combination of recent coordinate descent algorithms with an adaptation of the histogram Monte Carlo method is used, and the resulting algorithm learns the parameters of an Ising model describing a network of forty neurons within a few minutes.

Abstract: Princeton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544 (Dated: February 4, 2008). Recent work has shown that probabilistic models based on pairwise interactions (in the simplest case, the Ising model) provide surprisingly accurate descriptions of experiments on real biological networks ranging from neurons to genes. Finding these models requires us to solve an inverse problem: given experimentally measured expectation values, what are the parameters of the underlying Hamiltonian? This problem sits at the intersection of statistical physics and machine learning, and we suggest that more efficient solutions are possible by merging ideas from the two fields. We use a combination of recent coordinate descent algorithms with an adaptation of the histogram Monte Carlo method, and implement these techniques to take advantage of the sparseness found in data on real neurons. The resulting algorithm learns the parameters of an Ising model describing a network of forty neurons within a few minutes. This opens the possibility of analyzing much larger data sets now emerging, and thus testing hypotheses about the collective behaviors of these networks.
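
The inverse problem described here can be made concrete on a toy system. The sketch below is not the coordinate descent and histogram Monte Carlo method of the paper: it uses plain gradient ascent with exact enumeration of all states, which is only feasible for a handful of spins. The sign convention P(s) proportional to exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j), the learning rate, the step count, and the "true" parameters are all assumptions made for illustration.

```python
import math
from itertools import combinations, product

def expectations(h, J, n):
    """Exact <s_i> and <s_i s_j> for an Ising model with spins s_i = +/-1."""
    pairs = list(combinations(range(n), 2))
    states = list(product((-1, 1), repeat=n))
    weights = [math.exp(sum(h[i] * s[i] for i in range(n)) +
                        sum(J[p] * s[p[0]] * s[p[1]] for p in pairs))
               for s in states]
    Z = sum(weights)  # partition function
    mean = [sum(w * s[i] for w, s in zip(weights, states)) / Z for i in range(n)]
    corr = {p: sum(w * s[p[0]] * s[p[1]] for w, s in zip(weights, states)) / Z
            for p in pairs}
    return mean, corr

def fit_ising(target_mean, target_corr, n, rate=0.1, steps=2000):
    """Gradient ascent on the log-likelihood: adjust h and J until moments match."""
    h = [0.0] * n
    J = {p: 0.0 for p in target_corr}
    for _ in range(steps):
        mean, corr = expectations(h, J, n)
        h = [h[i] + rate * (target_mean[i] - mean[i]) for i in range(n)]
        J = {p: J[p] + rate * (target_corr[p] - corr[p]) for p in J}
    return h, J

# Hypothetical 3-spin model: generate exact moments, then recover the parameters.
true_h = [0.2, -0.1, 0.3]
true_J = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.2}
m, c = expectations(true_h, true_J, 3)
fit_h, fit_J = fit_ising(m, c, 3)
```

Exact enumeration costs 2^n per step, which is why a forty-neuron network calls for sampling-based estimates of the expectations instead.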

94 citations

Posted Content•

Temporal Logic Patterns for Querying Dynamic Models of Cellular Interaction Networks

Pedro T. Monteiro¹, Delphine Ropers¹, Radu Mateescu¹, Ana T. Freitas, Hidde de Jong¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

06 Mar 2008-arXiv: Quantitative Methods

TL;DR: This article proposes patterns, that is, high-level query templates that capture recurring biological questions and can be automatically translated into temporal logic, and investigates their applicability by analyzing an extended model of the network of global regulators controlling the carbon starvation response in Escherichia coli.

Abstract: Models of the dynamics of cellular interaction networks have become increasingly large in recent years. Formal verification based on model checking provides a powerful technology to keep up with this increase in scale and complexity. The application of model-checking approaches is hampered, however, by the difficulty non-expert users have in formulating appropriate questions in temporal logic. In order to deal with this problem, we propose the use of patterns, that is, high-level query templates that capture recurring biological questions and that can be automatically translated into temporal logic. The applicability of the developed set of patterns has been investigated by the analysis of an extended model of the network of global regulators controlling the carbon starvation response in Escherichia coli.

91 citations

Posted Content•

Parallel GPU Implementation of Iterative PCA Algorithms

Mircea Andrecut¹•Institutions (1)

University of Calgary¹

07 Nov 2008-arXiv: Quantitative Methods

TL;DR: An algorithm based on Gram-Schmidt orthogonalization (called GS-PCA) is presented, which eliminates the loss of orthogonality suffered by NIPALS-PCA, and the numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster than the CPU optimized versions based on CBLAS (GNU Scientific Library).

Abstract: Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA) are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).
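
As a rough CPU-only illustration of the idea, the sketch below runs a NIPALS-style power iteration and re-orthogonalizes each new loading vector against the earlier ones with a Gram-Schmidt step. It is a simplified pure-Python sketch, not the GS-PCA algorithm or the CUBLAS implementation from the paper; the function name `gs_pca`, the fixed iteration count, and the toy data are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gs_pca(X, n_components, n_iter=200):
    """Iterative PCA on column-centered data X (list of rows).

    Returns loading vectors that are kept orthonormal by Gram-Schmidt.
    """
    R = [row[:] for row in X]            # residual matrix, deflated per component
    n_rows, n_cols = len(R), len(R[0])
    loadings = []
    for _component in range(n_components):
        # Start from the residual column with the largest norm.
        j = max(range(n_cols), key=lambda c: sum(row[c] ** 2 for row in R))
        t = [row[j] for row in R]
        for _step in range(n_iter):
            # p = R^T t (unnormalized loading estimate)
            p = [sum(R[i][c] * t[i] for i in range(n_rows)) for c in range(n_cols)]
            # Gram-Schmidt: remove any component along earlier loadings.
            for q in loadings:
                proj = dot(p, q)
                p = [a - proj * b for a, b in zip(p, q)]
            norm = math.sqrt(dot(p, p))
            p = [a / norm for a in p]
            t = [dot(row, p) for row in R]   # scores for this component
        loadings.append(p)
        # Deflate: subtract the rank-one contribution t p^T.
        R = [[a - ti * b for a, b in zip(row, p)] for row, ti in zip(R, t)]
    return loadings

# Toy centered data whose variance is largest along the first axis.
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
loadings = gs_pca(X, 2)
```

The sketch does not guard against a zero norm, so it assumes the data have at least `n_components` directions of nonzero variance.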

91 citations

Posted Content•

A Fast and Flexible Method for the Segmentation of aCGH Data

Erez Ben-Yaacov¹, Yonina C. Eldar¹•Institutions (1)

Technion – Israel Institute of Technology¹

28 Apr 2008-arXiv: Quantitative Methods

TL;DR: The authors propose a segmentation method for Array Comparative Genomic Hybridization (aCGH) data based on wavelet decomposition and thresholding.

Abstract: Motivation: Array Comparative Genomic Hybridization (aCGH) is used to scan the entire genome for variations in DNA copy number. A central task in the analysis of aCGH data is the segmentation into groups of probes sharing the same DNA copy number. Some well known segmentation methods suffer from very long running times, preventing interactive data analysis. Results: We suggest a new segmentation method based on wavelet decomposition and thresholding, which detects significant breakpoints in the data. Our algorithm is over 1,000 times faster than leading approaches, with similar performance. Another key advantage of the proposed method is its simplicity and flexibility. Due to its intuitive structure it can be easily generalized to incorporate several types of side information. Here we consider two extensions which include side information indicating the reliability of each measurement, and compensating for a changing variability in the measurement noise. The resulting algorithm outperforms existing methods, both in terms of speed and performance, when applied on real high density CGH data. Availability: Implementation is available under software tab at: this http URL Contact: yonina@ee.technion.ac.il
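
The general idea of detecting breakpoints by thresholding wavelet-like detail coefficients can be sketched as follows. This is a simplified single-scale, Haar-style stand-in, not the authors' algorithm: the window width, the threshold, and the synthetic noise-free signal are all assumptions made for illustration.

```python
def haar_detail(x, w):
    """Difference between right and left window means at each candidate position i."""
    return {i: sum(x[i:i + w]) / w - sum(x[i - w:i]) / w
            for i in range(w, len(x) - w + 1)}

def breakpoints(x, w=4, threshold=0.5):
    """Positions where the detail coefficient is a local maximum above the threshold."""
    d = haar_detail(x, w)
    found = []
    for i, v in d.items():
        neighbours = [abs(d[j]) for j in (i - 1, i + 1) if j in d]
        if abs(v) > threshold and all(abs(v) >= n for n in neighbours):
            found.append(i)
    return found

# Synthetic log-ratio profile: a gained segment between probes 10 and 19.
x = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
result = breakpoints(x)
print(result)
```

Each reported index is the first probe of a new segment. Real aCGH data are noisy, so the threshold would have to be set relative to the noise level.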

79 citations

Journal Article•DOI•

Stochastic Dynamical Structure (SDS) of Nonequilibrium Processes in the Absence of Detailed Balance. III: potential function in local stochastic dynamics and in steady state of Boltzmann-Gibbs type distribution function

P. Ao

02 Apr 2008-arXiv: Quantitative Methods

TL;DR: This paper rigorously shows the existence of a dynamical potential with both local and global meanings for a generic class of nonequilibrium processes; no detailed balance condition is required in the demonstration.

Abstract: From a logical point of view this is the third in the series to solve the problem of absence of detailed balance. This paper will be denoted as SDS III. The existence of a dynamical potential with both local and global meanings in general nonequilibrium processes has been controversial. Following an earlier explicit construction by one of us (Ao, J. Phys. A37, L25 '04, arXiv:0803.4356, referred to as SDS II), in the present paper we show rigorously its existence for a generic class of situations in physical and biological sciences. The local dynamical meaning of this potential function is demonstrated via a special stochastic differential equation and its global steady-state meaning via a novel and explicit form of Fokker-Planck equation, the zero mass limit. We also give a procedure to obtain the special stochastic differential equation for any given Fokker-Planck equation. No detailed balance condition is required in our demonstration. For the first time we obtain here a formula to describe the noise-induced shift in drift force compared to the steady state distribution, a phenomenon extensively observed in numerical studies. The comparison with the two well-known stochastic integration methods, Ito and Stratonovich, is made ready. Such a comparison was made elsewhere (Ao, Phys. Life Rev. 2 (2005) 117, q-bio/0605020).

77 citations

Posted Content•

Remarks on Feedforward Circuits, Adaptation, and Pulse Memory

Eduardo D. Sontag

03 Dec 2008-arXiv: Quantitative Methods

TL;DR: A global convergence theorem is proved in a general framework that includes examples from the literature as particular cases; a notable aspect of these feedforward circuits is that they do not adapt to pulse signals, because they display a memory phenomenon.

Abstract: This note studies feedforward circuits as models for perfect adaptation to step signals in biological systems. A global convergence theorem is proved in a general framework, which includes examples from the literature as particular cases. A notable aspect of these circuits is that they do not adapt to pulse signals, because they display a memory phenomenon. Estimates are given of the magnitude of this effect.

57 citations

Posted Content•

Fast GPU Implementation of Sparse Signal Recovery from Random Projections

Mircea Andrecut¹•Institutions (1)

University of Calgary¹

10 Sep 2008-arXiv: Quantitative Methods

TL;DR: In this article, a fast GPU implementation of the Matching Pursuit (MP) algorithm is discussed, based on the recently released NVIDIA CUDA API and CUBLAS library, and the results show that the GPU version is substantially faster than the highly optimized CPU version based on CBLAS (GNU Scientific Library).

Abstract: We consider the problem of sparse signal recovery from a small number of random projections (measurements). This is a well-known NP-hard combinatorial optimization problem. A frequently used approach is based on greedy iterative procedures, such as the Matching Pursuit (MP) algorithm. Here, we discuss a fast GPU implementation of the MP algorithm, based on the recently released NVIDIA CUDA API and CUBLAS library. The results show that the GPU version is substantially faster (up to 31 times) than the highly optimized CPU version based on CBLAS (GNU Scientific Library).
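
For readers unfamiliar with Matching Pursuit, a minimal CPU sketch of the greedy iteration follows: at each step the atom most correlated with the residual is selected, its coefficient is updated, and its contribution is subtracted from the residual. This is plain Python for illustration, not the CUDA/CUBLAS implementation discussed in the paper; the function name, the stopping tolerance, and the toy dictionary are assumptions. In the compressed-sensing setting the atoms would be the (normalized) columns of the random projection matrix.

```python
import math

def matching_pursuit(atoms, y, n_iter=10, tol=1e-12):
    """Greedy sparse approximation of y over unit-norm atoms.

    atoms: list of equal-length vectors, assumed normalized to unit L2 norm
    y:     measurement vector
    Returns (coefficients, residual).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    coef = [0.0] * len(atoms)
    residual = list(y)
    for _ in range(n_iter):
        # Select the atom most correlated with the current residual.
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < tol:
            break
        coef[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coef, residual

# Toy orthonormal dictionary; y is built from the first two atoms only.
s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0, 0.0], [0.0, s, s], [0.0, s, -s]]
y = [3.0, 2.0 * s, 2.0 * s]
coef, residual = matching_pursuit(atoms, y)
```

With an orthonormal dictionary the iteration recovers the coefficients in one pass per atom; with a redundant dictionary the same atom may be selected repeatedly, which is why coefficients are accumulated.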

53 citations

Posted Content•

MIC: Mutual Information based hierarchical Clustering

Alexander Kraskov¹, Peter Grassberger²•Institutions (2)

UCL Institute of Neurology¹, University of Calgary²

09 Sep 2008-arXiv: Quantitative Methods

TL;DR: A conceptually very simple algorithm for hierarchical clustering called MIC is reviewed and applied to the construction of phylogenetic trees from mitochondrial DNA sequences and to the reconstruction of the fetal ECG from the output of independent components analysis applied to the ECG of a pregnant woman.

Abstract: Clustering is a concept used in a huge variety of applications. We review a conceptually very simple algorithm for hierarchical clustering, hereafter called the mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use MIC both in the Shannon (probabilistic) version of information theory, where the "objects" are probability distributions represented by random samples, and in the Kolmogorov (algorithmic) version, where the "objects" are symbol sequences. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and we reconstruct the fetal ECG from the output of independent components analysis (ICA) applied to the ECG of a pregnant woman.
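
The grouping property stated in the abstract can be checked numerically. The sketch below assumes that the MI between three objects is the multi-information, I(X,Y,Z) = H(X) + H(Y) + H(Z) - H(X,Y,Z); the abstract does not spell out this definition, so treat it as an assumption. The joint distribution is arbitrary and made up for illustration.

```python
import math
from itertools import product

def entropy(p):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginal(joint, axes):
    """Marginalize a joint distribution {(x, y, z): prob} onto the given axes."""
    out = {}
    for outcome, q in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + q
    return out

# Arbitrary joint distribution over three binary variables X, Y, Z.
weights = [5, 1, 2, 4, 3, 3, 1, 6]
joint = {xyz: w / sum(weights)
         for xyz, w in zip(product((0, 1), repeat=3), weights)}

def H(*axes):
    return entropy(marginal(joint, axes))

mi_xy = H(0) + H(1) - H(0, 1)               # I(X, Y)
mi_xy_z = H(0, 1) + H(2) - H(0, 1, 2)       # I((XY), Z)
mi_xyz = H(0) + H(1) + H(2) - H(0, 1, 2)    # I(X, Y, Z)
print(mi_xyz, mi_xy + mi_xy_z)
```

The two printed values agree up to floating-point rounding, which is what lets a hierarchical clustering merge two objects and then treat the pair as a single object.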

49 citations

Book Chapter•DOI•

Reconstruction of biological networks by supervised machine learning approaches

Jean-Philippe Vert¹•Institutions (1)

Mines ParisTech¹

22 Sep 2008-arXiv: Quantitative Methods

TL;DR: A recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data is reviewed.

Abstract: We review a recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data. We present several strategies that have been proposed and that lead to different pattern recognition problems and algorithms. The strength of these approaches is illustrated on the reconstruction of metabolic, protein-protein and regulatory networks of model organisms. In all cases, state-of-the-art performance is reported.

Journal Article•DOI•

Reaction Brownian Dynamics and the effect of spatial fluctuations on the gain of a push-pull network

Marco J. Morelli¹, Pieter Rein ten Wolde•Institutions (1)

Fundamental Research on Matter Institute for Atomic and Molecular Physics¹

25 Apr 2008-arXiv: Quantitative Methods

TL;DR: This work presents a Brownian dynamics algorithm for simulating reaction-diffusion systems that rigorously obeys detailed balance for equilibrium reactions and applies it to a "push-pull" network in which two antagonistic enzymes covalently modify a substrate.

Abstract: Brownian Dynamics algorithms are widely used for simulating soft-matter and biochemical systems. In recent times, their application has been extended to the simulation of coarse-grained models of cellular networks in simple organisms. In these models, components move by diffusion, and can react with one another upon contact. However, when reactions are incorporated into a Brownian Dynamics algorithm, care must be taken to avoid violating the detailed-balance rule, which would introduce systematic errors in the simulation. We present a Brownian Dynamics algorithm for reaction-diffusion systems that rigorously obeys detailed balance for equilibrium reactions. By comparing the simulation results to exact analytical results for a bimolecular reaction, we show that the algorithm correctly reproduces both equilibrium and dynamical quantities. We apply our scheme to a "push-pull" network in which two antagonistic enzymes covalently modify a substrate. Our results highlight that the diffusive behaviour of the reacting species can reduce the gain of the response curve of this network.

Posted Content•

Calibration of Tethered Particle Motion Experiments

Lin Han¹, Bertrand H. Lui¹,², Seth Blumberg¹,³, John F. Beausang⁴, Philip C Nelson⁴, Rob Phillips¹•Institutions (4)

California Institute of Technology¹, Stanford University², University of Michigan³, University of Pennsylvania⁴

13 Oct 2008-arXiv: Quantitative Methods

TL;DR: A detailed calibration of TPM magnitude as a function of DNA length and particle size is carried out and a systematic comparison between measured particle excursions and theoretical expectations is presented, which helps clarify both the experiments and models of DNA conformation.

Abstract: The Tethered Particle Motion (TPM) method has been used to observe and characterize a variety of protein-DNA interactions including DNA looping and transcription. TPM experiments exploit the Brownian motion of a DNA-tethered bead to probe biologically relevant conformational changes of the tether. In these experiments, a change in the extent of the bead's random motion is used as a reporter of the underlying macromolecular dynamics and is often deemed sufficient for TPM analysis. However, a complete understanding of how the motion depends on the physical properties of the tethered particle complex would permit more quantitative and accurate evaluation of TPM data. For instance, such understanding can help extract details about a looped complex geometry (or multiple coexisting geometries) from TPM data. To better characterize the measurement capabilities of TPM experiments involving DNA tethers, we have carried out a detailed calibration of TPM magnitude as a function of DNA length and particle size. We also explore how experimental parameters such as acquisition time and exposure time affect the apparent motion of the tethered particle. We vary the DNA length from 200 bp to 2.6 kbp and consider particle diameters of 200, 490 and 970 nm. We also present a systematic comparison between measured particle excursions and theoretical expectations, which helps clarify both the experiments and models of DNA conformation.

Posted Content•

An empirical study of large, naturally occurring starling flocks: a benchmark in collective animal behaviour

Michele Ballerini, Nicola Cabibbo, Raphaël Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina¹, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale, Vladimir Zdravkovic•Institutions (1)

Istituto Nazionale di Fisica Nucleare¹

12 Feb 2008-arXiv: Quantitative Methods

TL;DR: This work measured the individual three-dimensional positions in compact flocks of up to 2700 birds, investigated the main features of the flock as a whole - shape, movement, density and structure - and discussed these as emergent attributes of the grouping phenomenon.

Abstract: Bird flocking is a striking example of collective animal behaviour. A vivid illustration of this phenomenon is provided by the aerial display of vast flocks of starlings gathering at dusk over the roost and swirling with extraordinary spatial coherence. Both the evolutionary justification and the mechanistic laws of flocking are poorly understood, arguably because of a lack of data on large flocks. Here, we report a quantitative study of aerial display. We measured the individual three-dimensional positions in compact flocks of up to 2700 birds. We investigated the main features of the flock as a whole - shape, movement, density and structure - and discuss these as emergent attributes of the grouping phenomenon. We find that flocks are relatively thin, with variable sizes, but constant proportions. They tend to slide parallel to the ground and, during turns, their orientation changes with respect to the direction of motion. Individual birds keep a minimum distance from each other that is comparable to their wingspan. The density within the aggregations is non-homogeneous, as birds are packed more tightly at the border compared to the centre of the flock. These results constitute the first set of large-scale data on three-dimensional animal aggregations. Current models and theories of collective animal behaviour can now be tested against these results.

Posted Content•

The STARFLAG handbook on collective animal behaviour: Part I, empirical methods

Andrea Cavagna, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale, Vladimir Zdravkovic

12 Feb 2008-arXiv: Quantitative Methods

TL;DR: The main technical problems in 3D data collection of large animal groups are reviewed and how to solve the stereoscopic correspondence - or matching - problem is explained, which was the major bottleneck of all 3D studies in the past.

Abstract: The most startling examples of collective animal behaviour are provided by very large and cohesive groups moving in three dimensions. Paradigmatic examples are bird flocks, fish schools and insect swarms. However, because of the sheer technical difficulty of obtaining 3D data, empirical studies conducted to date have only considered loose groups of a few tens of animals. Moreover, these studies were very seldom conducted in the field. Recently the STARFLAG project achieved the 3D reconstruction of thousands of birds under field conditions, thus opening the way to a new generation of quantitative studies of collective animal behaviour. Here, we review the main technical problems in 3D data collection of large animal groups and we outline some of the methodological solutions adopted by the STARFLAG project. In particular, we explain how to solve the stereoscopic correspondence - or matching - problem, which was the major bottleneck of all 3D studies in the past.

Journal Article•DOI•

Statistical properties of multistep enzyme-mediated reactions

Wiet de Ronde, Bryan C. Daniels, Andrew Mugler, Nikolai A. Sinitsyn, Ilya Nemenman

20 Nov 2008-arXiv: Quantitative Methods

TL;DR: In this article, a perturbation theory analogous to that used in quantum mechanics is proposed to determine the first and second cumulants of the distribution of created product molecules as a function of the substrate concentration and the kinetic rates of the intermediate processes.

Abstract: Enzyme-mediated reactions may proceed through multiple intermediate conformational states before creating a final product molecule, and one often wishes to identify such intermediate structures from observations of the product creation. In this paper, we address this problem by solving the chemical master equations for various enzymatic reactions. We devise a perturbation theory analogous to that used in quantum mechanics that allows us to determine the first (⟨n⟩) and the second (variance) cumulants of the distribution of created product molecules as a function of the substrate concentration and the kinetic rates of the intermediate processes. The mean product flux V=d⟨n⟩/dt (or "dose-response" curve) and the Fano factor F=variance/⟨n⟩ are both realistically measurable quantities, and while the mean flux can often appear the same for different reaction types, the Fano factor can be quite different. This suggests both qualitative and quantitative ways to discriminate between different reaction schemes, and we explore this possibility in the context of four sample multistep enzymatic reactions. We argue that measuring both the mean flux and the Fano factor can not only discriminate between reaction types, but can also provide some detailed information about the internal, unobserved kinetic rates, and this can be done without measuring single-molecule transition events.
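
As a small illustration of the two measurable quantities named in the abstract, the sketch below estimates the mean flux and the Fano factor from repeated counts of product molecules. The function names and the example counts are hypothetical, and the flux estimate assumes steady state, so that the mean count grows linearly over the observation window; neither is taken from the paper.

```python
from statistics import mean, pvariance

def fano_factor(counts):
    """Fano factor F = variance / mean of the product counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined for zero mean")
    return pvariance(counts) / m

def mean_flux(counts, duration):
    """Mean product flux, assuming steady state so that V is roughly mean / duration."""
    return mean(counts) / duration

# Hypothetical product counts from eight repeats of a 10-second observation window.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_flux(counts, 10.0), fano_factor(counts))
```

A Poisson process gives F = 1, so a value different from 1 carries information beyond the mean flux about the underlying reaction scheme.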

Journal Article•DOI•

Charge transport in bacteriorhodopsin monolayers: The contribution of conformational change to current-voltage characteristics

Eleonora Alfinito, Lino Reggiani

10 Apr 2008-arXiv: Quantitative Methods

TL;DR: A theoretical model based on a map of the protein tertiary structure into a resistor network is implemented to account for a sequential tunneling mechanism of charge transfer through neighbouring amino acids and is validated by comparison with current-voltage experiments.

Abstract: When moving from native to light activated bacteriorhodopsin, modification of charge transport consisting of an increase of conductance is correlated to the protein conformational change. A theoretical model based on a map of the protein tertiary structure into a resistor network is implemented to account for a sequential tunneling mechanism of charge transfer through neighbouring amino acids. The model is validated by comparison with current-voltage experiments. The predictability of the model is further tested on bovine rhodopsin, a G-protein coupled receptor (GPCR) also sensitive to light. In this case, results show an opposite behaviour with a decrease of conductance in the presence of light.

Posted Content•

The STARFLAG handbook on collective animal behaviour: Part II, three-dimensional analysis

Andrea Cavagna, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini

12 Feb 2008-arXiv: Quantitative Methods

TL;DR: It is demonstrated that neglecting border effects gives rise to artefacts when studying the 3D structure of a group, and that mathematical rigour is essential to distinguish important biological properties from trivial geometric features of animal groups.

Abstract: The study of collective animal behaviour must progress through a comparison between the theoretical predictions of numerical models and data coming from empirical observations. To this aim it is important to develop methods of three-dimensional (3D) analysis that are at the same time informative about the structure of the group and suitable to empirical data. In fact, empirical data are considerably noisier than numerical data, and they are subject to several constraints. We review here the tools of analysis used by the STARFLAG project to characterise the 3D structure of large flocks of starlings in the field. We show how to avoid the most common pitfalls in the quantitative analysis of 3D animal groups, with particular attention to the problem of the bias introduced by the border of the group. By means of practical examples, we demonstrate that neglecting border effects gives rise to artefacts when studying the 3D structure of a group. Moreover, we show that mathematical rigour is essential to distinguish important biological properties from trivial geometric features of animal groups.

Posted Content•

Phylogenetic Profiles as a Unified Framework for Measuring Protein Structure, Function and Evolution

Kyung Dae Ko¹, Yoojin Hong¹, Gue Su Chang¹, Gaurav Bhardwaj¹, Damian B. van Rossum¹, Randen L. Patterson¹•Institutions (1)

Pennsylvania State University¹

15 Jun 2008-arXiv: Quantitative Methods

TL;DR: The power of phylogenetic profiles generated using the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST) to derive structural domains, functional annotation, and evolutionary relationships for a host of ion-channels and human proteins of unknown function is illustrated.

Abstract: The sequence of amino acids in a protein is believed to determine its native state structure, which in turn is related to the functionality of the protein. In addition, information pertaining to evolutionary relationships is contained in homologous sequences. One powerful method for inferring these sequence attributes is through comparison of a query sequence with reference sequences that contain significant homology and whose structure, function, and/or evolutionary relationships are already known. In spite of decades of concerted work, there is no simple framework for deducing structure, function, and evolutionary (SF&E) relationships directly from sequence information alone, especially when the pair-wise identity is less than a threshold figure ~25% [1,2]. However, recent research has shown that sequence identity as low as 8% is sufficient to yield common structure/function relationships and sequence identities as large as 88% may yet result in distinct structure and function [3,4]. Starting with a basic premise that protein sequence encodes information about SF&E, one might ask how one could tease out these measures in an unbiased manner. Here we present a unified framework for inferring SF&E from sequence information using a knowledge-based approach which generates phylogenetic profiles in an unbiased manner. We illustrate the power of phylogenetic profiles generated using the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST) to derive structural domains, functional annotation, and evolutionary relationships for a host of ion-channels and human proteins of unknown function. These data are in excellent accord with published data and new experiments. Our results suggest that there is a wealth of previously unexplored information in protein sequence.

Posted Content•

Inference of genetic networks from time course expression data using functional regression with lasso penalty

Heng Lian

04 Apr 2008-arXiv: Quantitative Methods

TL;DR: This work proposes to infer the parameters of the ordinary differential equations using techniques from functional data analysis (FDA), regarding the observed time course expression data as continuous-time curves and exploiting the sparsity of the networks through an L_1 penalty on the linear coefficients.

Abstract: Statistical inference of genetic regulatory networks is essential for understanding temporal interactions of regulatory elements inside the cells. For inference of large networks, identification of network structure is typically achieved under the assumption of sparsity of the networks. When the number of time points in the expression experiment is not too small, we propose to infer the parameters in the ordinary differential equations using the techniques from functional data analysis (FDA) by regarding the observed time course expression data as continuous-time curves. For networks with a large number of genes, we take advantage of the sparsity of the networks by penalizing the linear coefficients with an L_1 norm. The ability of the algorithm to infer network structure is demonstrated using the cell-cycle time course data for Saccharomyces cerevisiae.
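
The L_1-penalised regression step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes the smoothed expression curves and their time derivatives have already been sampled on a common time grid, so that each gene's equation reduces to an ordinary lasso problem. The function names `soft_threshold` and `lasso_coordinate_descent`, and the choice of a coordinate-descent solver, are assumptions made here for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of rows (one row per time point, one column per candidate
    regulator); y is the list of responses (hypothetically, the estimated
    derivative of one target gene at each time point).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_value = soft_threshold(rho, lam) / z
            delta = new_value - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_value
    return beta
```

In this reading, coefficients shrunk exactly to zero correspond to absent edges in the inferred network, which is how the penalty encodes the sparsity assumption.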

Journal Article•DOI•

Simple models for scaling in phylogenetic trees

Emilio Hernández-García¹, Murat Tugrul¹, E. Alejandro Herrada¹, Víctor M. Eguíluz¹, Konstantin Klemm¹•Institutions (1)

Spanish National Research Council¹

21 Oct 2008-arXiv: Quantitative Methods

TL;DR: A new model is introduced, the activity model, showing analytically and numerically that it also displays a power-law scaling of the depth with tree size at a critical parameter value.

Abstract: Many processes and models --in biological, physical, social, and other contexts-- produce trees whose depth scales logarithmically with the number of leaves. Phylogenetic trees, describing the evolutionary relationships between biological species, are examples of trees for which such scaling is not observed. With this motivation, we analyze numerically two branching models leading to non-logarithmic scaling of the depth with the number of leaves. For Ford's alpha model, although a power-law scaling of the depth with tree size was established analytically, our numerical results illustrate that the asymptotic regime is approached only at very large tree sizes. We introduce here a new model, the activity model, showing analytically and numerically that it also displays a power-law scaling of the depth with tree size at a critical parameter value.

Book Chapter•DOI•

Offdiagonal Complexity: A Computationally Quick Network Complexity Measure—Application to Protein Networks and Cell Division

Jens Christian Claussen¹•Institutions (1)

University of Kiel¹

01 Jan 2008-arXiv: Quantitative Methods

TL;DR: The Offdiagonal Complexity (OdC), a new, and computationally cheap, measure of complexity is defined, based on the node-node link cross-distribution, whose nondiagonal elements characterize the graph structure beyond link distribution, cluster coefficient, and average path length.

Abstract: Many complex biological, social, and economical networks show topologies drastically differing from random graphs. But what is a complex network, i.e., how can one quantify the complexity of a graph? Here the Offdiagonal Complexity (OdC), a new, and computationally cheap, measure of complexity is defined, based on the node-node link cross-distribution, whose nondiagonal elements characterize the graph structure beyond link distribution, cluster coefficient, and average path length. The OdC approach is applied to the Helicobacter pylori protein interaction network and randomly rewired surrogates thereof. In addition, OdC is used to characterize the spatial complexity of cell aggregates. We investigate the earliest embryo development states of Caenorhabditis elegans. The development states of the premorphogenetic phase are represented by symmetric binary-valued cell connection matrices with dimension growing from 4 to 385. These matrices can be interpreted as adjacency matrices of an undirected graph, or network. The OdC approach allows us to describe quantitatively the complexity of the cell aggregate geometry.

Journal Article•DOI•

Cluster structure of functional networks estimated from high-resolution EEG data

Roberta Sinatra, Fabrizio De Vico Fallani, Laura Astolfi, Fabio Babiloni, Febo Cincotti, Vito Latora, Donatella Mattia

17 Jun 2008-arXiv: Quantitative Methods

TL;DR: The results indicate large differences between the injured patients and the healthy subjects; in particular, the networks of spinal cord injured patients exhibited a higher density of efficient clusters.

Abstract: We study the topological properties of functional connectivity patterns among cortical areas in the frequency domain. The cortical networks were estimated from high-resolution EEG recordings in a group of spinal cord injured patients and in a group of healthy subjects, during the preparation of a limb movement. We first evaluate global and local efficiency, as indicators of the structural connectivity at a global and a local scale, respectively. Then, we use the Markov Clustering method to analyse the division of the network into community structures. The results indicate large differences between the injured patients and the healthy subjects. In particular, the networks of spinal cord injured patients exhibited a higher density of efficient clusters. In the Alpha (7-12 Hz) frequency band, the two largest observed communities were mainly composed of the cingulate motor areas with the supplementary motor areas, and of the pre-motor areas with the right primary motor area of the foot. This functional separation strengthens the hypothesis of a compensatory mechanism due to the partial alteration in the primary motor areas caused by the spinal cord injury.
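
The global and local efficiency measures named in this abstract can be sketched as below. This is an illustrative simplification, not the authors' pipeline: it assumes an unweighted, undirected graph stored as a dictionary of adjacency lists, whereas networks estimated from EEG may be weighted or directed. The function names are chosen here for illustration.

```python
from collections import deque


def global_efficiency(adjacency):
    """Mean of 1/d(i, j) over all ordered pairs of distinct nodes."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search yields hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable nodes are absent from dist and so contribute zero.
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def local_efficiency(adjacency):
    """Mean global efficiency of the subgraph induced by each node's neighbours."""
    values = []
    for node, neighbours in adjacency.items():
        nbrs = set(neighbours)
        sub = {u: [v for v in adjacency[u] if v in nbrs] for u in nbrs}
        values.append(global_efficiency(sub))
    return sum(values) / len(values) if values else 0.0
```

A fully connected graph scores 1.0 on both measures, while a simple chain scores lower globally and zero locally, because no node's neighbours are linked to one another.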

Posted Content•

Matrix genetics, part 2: the degeneracy of the genetic code and the octave algebra with two quasi-real units (the genetic octave Yin-Yang-algebra)

Sergey V. Petoukhov¹•Institutions (1)

Russian Academy of Sciences¹

23 Mar 2008-arXiv: Quantitative Methods

TL;DR: Investigations of the genetic code based on matrix approaches ("matrix genetics") are described; the results relate to the problem of the algebraization of bioinformatics and draw attention to the question: what is life from the viewpoint of algebra?

Abstract: Algebraic properties of the genetic code are analyzed. The investigations of the genetic code on the basis of matrix approaches ("matrix genetics") are described. The degeneracy of the vertebrate mitochondrial genetic code is reflected in the black-and-white mosaic of the (8*8)-matrix of 64 triplets, 20 amino acids and stop-signals. This mosaic genetic matrix is connected with the matrix form of presentation of the special 8-dimensional Yin-Yang-algebra and of its particular 4-dimensional case. A special algorithm, based on features of genetic molecules, transforms the mosaic genomatrix into the matrices of these algebras. Two new numeric systems are defined by these 8-dimensional and 4-dimensional algebras: genetic Yin-Yang-octaves and genetic tetrions. Their comparison with Hamilton's quaternions is presented. Elements of a new "genovector calculation" and ideas of "genetic mechanics" are discussed. These algebras are considered as models of the genetic code and as its possible pre-code basis. They are related to binary oppositions of the Yin-Yang type and they give new opportunities to investigate the evolution of the genetic code. The revealed relation between the genetic code and these genetic algebras is discussed in connection with the idea of Pythagoras: "All things are numbers". These genetic algebras can also be utilized as the algebras of genetic operators in biological organisms. The described results are related to the problem of the algebraization of bioinformatics. They draw attention to the question: what is life from the viewpoint of algebra?

Journal Article•DOI•

Efficient stochastic sampling of first-passage times with applications to self-assembly simulations

Navodit Misra¹, Russell Schwartz¹•Institutions (1)

Carnegie Mellon University¹

02 Apr 2008-arXiv: Quantitative Methods

TL;DR: Two methods for accelerating sampling in SSA models are developed: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound, based on the analysis of the eigenvalues of continuous time Markov models that define the behavior of the SSA.

Abstract: Models of reaction chemistry based on the stochastic simulation algorithm (SSA) have become a crucial tool for simulating complicated biological reaction networks due to their ability to handle extremely complicated reaction networks and to represent noise in small-scale chemistry. These methods can, however, become highly inefficient for stiff reaction systems, those in which different reaction channels operate on widely varying time scales. In this paper, we develop two methods for accelerating sampling in SSA models: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound. Both methods depend on analysis of the eigenvalues of continuous time Markov model graphs that define the behavior of the SSA. We demonstrate these methods for the specific application of sampling breakage times for multiply-connected bond networks, a class of stiff system important to models of self-assembly processes. We show theoretically and empirically that our eigenvalue methods provide substantially reduced sampling times for a wide range of network breakage models. These techniques are also likely to have broad use in accelerating SSA models so as to apply them to systems and parameter ranges that are currently computationally intractable.
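
For orientation, the baseline stochastic simulation algorithm that this abstract sets out to accelerate can be sketched as below. This is the standard direct method, not either of the paper's eigenvalue-based methods. The function name `gillespie_direct` and the representation of reactions as (propensity function, state change) pairs are assumptions made here for illustration.

```python
import math
import random


def gillespie_direct(state, reactions, t_end, rng):
    """Simulate one trajectory with the direct-method SSA.

    state: dict mapping species name to molecule count.
    reactions: list of (propensity_fn, change) pairs, where propensity_fn
        maps the current state to a rate and change maps species to deltas.
    Returns a list of (time, state snapshot) tuples.
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate(state) for rate, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for a, (_, change) in zip(propensities, reactions):
            cumulative += a
            if threshold < cumulative:
                break
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory
```

In a stiff system, such as a fast reversible exchange coupled to a slow irreversible step, almost every simulated event belongs to the fast channel, which illustrates why sampling the slow first-passage time this way becomes inefficient.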

Posted Content•

Run and tumble chemotaxis in a shear flow: the effect of temporal comparisons and other complications

J. T. Locsei, T. J. Pedley

15 Apr 2008-arXiv: Quantitative Methods

TL;DR: Calculations of the chemotactic drift velocity vd of an E. coli cell performing chemotaxis in a uniform, steady shear flow, with a weak chemoattractant gradient at right angles to the flow, show that a more elongated body shape is advantageous for chemotaxis in a strong shear flow.

Abstract: Escherichia coli is a motile bacterium that moves up a chemoattractant gradient by performing a biased random walk composed of alternating runs and tumbles. This paper presents calculations of the chemotactic drift velocity vd (the mean velocity up the chemoattractant gradient) of an E. coli cell performing chemotaxis in a uniform, steady shear flow, with a weak chemoattractant gradient at right angles to the flow. Extending earlier models, a combined analytic and numerical approach is used to assess the effect of several complications, namely (i) a cell cannot detect a chemoattractant gradient directly but rather makes temporal comparisons of chemoattractant concentration, (ii) the tumbles exhibit persistence of direction, meaning that the swimming directions before and after a tumble are correlated, (iii) the cell suffers random re-orientations due to rotational Brownian motion, and (iv) the non-spherical shape of the cell affects the way that it is rotated by the shear flow. These complications influence the dependence of vd on the shear rate gamma. When they are all included, it is found that (a) shear disrupts chemotaxis and shear rates beyond gamma = 2/second cause the cell to swim down the chemoattractant gradient rather than up it, (b) in terms of maximising drift velocity, persistence of direction is advantageous in a quiescent fluid but disadvantageous in a shear flow, and (c) a more elongated body shape is advantageous in performing chemotaxis in a strong shear flow.

Posted Content•

On the Estimation of the Proportion of True Recent Infections Using the BED Capture Enzyme Immunoassay

Thomas A. McWalter, Alex Welte

07 Jun 2008-arXiv: Quantitative Methods

TL;DR: The assumptions made by McDougal et al (2006), both explicit and implicit, in their estimation of the proportion of "true recent infections" using the BED CEIA are analyzed to derive an identity relating the sensitivity, short term specificity and long term specificity of the test, allowing the elimination of sensitivity and short term specificity.

Abstract: In this short note, we analyze the assumptions made by McDougal et al (2006), both explicit and implicit, in their estimation of the proportion of "true recent infections" using the BED CEIA. This enables us to write down expressions for the sensitivity, short term specificity and long term specificity of a test for recent infection defined by a BED ODn below a threshold. We then derive an identity which shows the relationship between these parameters, allowing the elimination of sensitivity and short term specificity from an expression relating the proportion of "true recent infections" to the proportion of seropositive individuals testing below threshold. This has two important consequences. Firstly, the simplified formula is substantially more amenable to calibration. Secondly, naively treating the parameters as independent would lead to an incorrect estimate of uncertainty due to imperfect calibration.

Book•DOI•

Efficient seeding techniques for protein similarity search

Mikhail A. Roytberg, Anna Gambin¹, Laurent Noé², Sławomir Lasota¹, Eugenia Furletova, Ewa Szczurek³, Gregory Kucherov²•Institutions (3)

University of Warsaw¹, French Institute for Research in Computer Science and Automation², Max Planck Society³

30 Oct 2008-arXiv: Quantitative Methods

TL;DR: While the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, the seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.

Abstract: We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform an analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.

Posted Content•

Rejection-free kinetic Monte Carlo simulation of multivalent biomolecular interactions

Jin Yang¹, William S. Hlavacek•Institutions (1)

CAS-MPG Partner Institute for Computational Biology¹

25 Dec 2008-arXiv: Quantitative Methods

TL;DR: The rejection-free method reported here should be useful for simulating a variety of systems in which multisite molecular interactions yield large molecular aggregates, and it is applied to simulate simple models for ligand-receptor interactions.

Abstract: The system-level dynamics of multivalent biomolecular interactions can be simulated using a rule-based kinetic Monte Carlo method in which a rejection sampling strategy is used to generate reaction events. This method becomes inefficient when simulating aggregation processes with large biomolecular complexes. Here, we present a rejection-free method for determining the kinetics of multivalent biomolecular interactions, and we apply the method to simulate simple models for ligand-receptor interactions. Simulation results show that performance of the rejection-free method is equal to or better than that of the rejection method over wide parameter ranges, and the rejection-free method is more efficient for simulating systems in which aggregation is extensive. The rejection-free method reported here should be useful for simulating a variety of systems in which multisite molecular interactions yield large molecular aggregates.

Posted Content•

Structural distance and evolutionary relationship of networks

Anirban Banerjee¹•Institutions (1)

Max Planck Society¹

20 Jul 2008-arXiv: Quantitative Methods

TL;DR: A method is presented to quantify the topological distance between two networks of different sizes; it finds that network architectures are more similar within the same class than across classes.

Abstract: Evolutionary mechanisms in a self-organized system cause functional changes that force the interaction pattern between the components of that system to adopt a new conformation. By measuring the structural differences, one can retrace the evolutionary relation between two systems. We present a method to quantify the topological distance between two networks of different sizes, finding that the architectures of the networks are more similar within the same class than outside their class. With 43 cellular networks of different species, we show that the evolutionary relationship can be elucidated from the structural distances.
