Journal, ISSN: 1066-5277

Journal of Computational Biology 

Mary Ann Liebert, Inc.
About: Journal of Computational Biology is an academic journal published by Mary Ann Liebert, Inc. It publishes mainly in the areas of computer science and medicine. Its ISSN is 1066-5277. Over its lifetime, the journal has published 2,751 papers, which have received 110,315 citations. The journal is also known as Computational Biology.
Topics: Computer science, Medicine, Gene, Biology, Genome


Papers
Journal Article
TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

16,859 citations

Journal Article
TL;DR: A new greedy alignment algorithm is introduced with particularly good performance and it is shown that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data.
Abstract: For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

4,628 citations

Journal Article
TL;DR: A new framework for discovering interactions between genes based on multiple expression measurements is proposed and a method for recovering gene interactions from microarray data is described using tools for learning Bayesian networks.
Abstract: DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).

3,507 citations
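
The abstract above describes a Bayesian network as a graph-based model of a joint distribution that captures conditional independence between variables. As a minimal sketch of that idea only (not the paper's learning method), the toy example below assumes a hypothetical three-gene chain A -> B -> C with made-up probabilities. It shows how the joint distribution factorizes into one local conditional per node, and how the graph implies that C is independent of A once B is known.

```python
from itertools import product

# Hypothetical toy network over three binary "genes": A -> B -> C.
# All probabilities are made-up illustrative numbers, not data from the paper.
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # indexed [a][b]
p_c_given_b = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.1, 0: 0.9}}  # indexed [b][c]


def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of each node's local conditional."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob_c_given_ab(c, a, b):
    """P(C=c | A=a, B=b), obtained from the joint by normalizing over C."""
    return joint(a, b, c) / sum(joint(a, b, x) for x in (0, 1))


# The factorized joint is a proper distribution: it sums to 1 over all states.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Conditional independence implied by the chain: given B, the value of A
# does not change the distribution of C.
same_given_b = abs(prob_c_given_ab(1, 0, 1) - prob_c_given_ab(1, 1, 1)) < 1e-12
```

In this sketch the structure is fixed by hand. Recovering the structure from expression measurements, which is the subject of the paper, is a separate and much harder learning problem.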

Journal Article
TL;DR: This paper reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and so on.
Abstract: The spatiotemporal expression of genes in an organism is determined by regulatory systems that involve a large number of genes connected through a complex network of interactions. As an intuitive understanding of the behavior of these systems is hard to obtain, computer tools for the modeling and simulation of genetic regulatory networks will be indispensable. This report reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, ordinary and partial differential equations, stochastic equations, Boolean networks and their generalizations, qualitative differential equations, and rule-based formalisms. In addition, the report discusses how these formalisms have been used in the modeling and simulation of regulatory systems.

2,739 citations

Journal Article
TL;DR: The best models out-perform previous probabilistic models in the discrimination of human 5' and 3' splice sites from decoys and mechanistically motivated ways of comparing models are discussed.
Abstract: We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be utilized to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models out-perform previous probabilistic models in the discrimination of human 5' (donor) and 3' (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.

1,667 citations

Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2023    45
2022    124
2021    102
2020    189
2019    231
2018    200