Showing papers in "Journal of Bioinformatics and Computational Biology in 2008"

Open Access

Journal Article•DOI•

Using indirect protein–protein interactions for protein complex prediction

Hon Nian Chua¹, Kang Ning¹, Wing-Kin Sung¹, Hon Wai Leong¹, Limsoon Wong¹•Institutions (1)

01 Jun 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A method in which all direct and indirect interactions are first weighted using topological weight (FS-Weight), which estimates the strength of functional association and can be used to improve the precision of clusters predicted by various existing clustering algorithms.

Abstract: Protein complexes are fundamental for understanding principles of cellular organizations. As the sizes of protein–protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network, and merges cliques to form clusters using a “partial clique merging” method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein–protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.
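
The network-modification step described in this abstract can be illustrated with a minimal sketch. It assumes a placeholder weight (Jaccard overlap of closed neighborhoods) standing in for FS-Weight, whose formula is not given here; the function names and the two thresholds `low` and `high` are hypothetical.

```python
from itertools import combinations


def shared_neighbor_weight(adj, u, v):
    """Placeholder weight, NOT FS-Weight: Jaccard overlap of closed neighborhoods."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def modify_network(edges, low, high):
    """Drop direct edges weighted below `low`; add level-2 pairs weighted at least `high`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    direct = {frozenset(e) for e in edges}
    kept = set()
    for u, v in combinations(sorted(adj), 2):
        pair = frozenset((u, v))
        w = shared_neighbor_weight(adj, u, v)
        if pair in direct:
            if w >= low:          # keep only well-supported direct interactions
                kept.add(pair)
        elif adj[u] & adj[v] and w >= high:
            kept.add(pair)        # level-2 neighbors: no edge, but shared partners
    return kept
```

On a toy network with edges A-B, A-C, B-C, A-D, B-D, and D-E, thresholds of 0.55 and 0.35 remove the weakly supported D-E edge and add the level-2 pair C-D, leaving a four-node clique that a clustering or clique-finding step could then pick up.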

156 citations

Journal Article•DOI•

Protein structure-structure alignment with discrete Fréchet distance.

Minghui Jiang¹, Ying Xu², Binhai Zhu³•Institutions (3)

Utah State University¹, University of Georgia², Montana State University³

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: New algorithms for matching two polygonal chains in two dimensions to minimize their discrete Fréchet distance under translation and rotation, and an effective heuristic for matching two polygonal chains in three dimensions, are presented.

Abstract: Matching two geometric objects in two-dimensional (2D) and three-dimensional (3D) spaces is a central problem in computer vision, pattern recognition, and protein structure prediction. In particular, the problem of aligning two polygonal chains under translation and rotation to minimize their distance has been studied using various distance measures. It is well known that the Hausdorff distance is useful for matching two point sets, and that the Fréchet distance is a superior measure for matching two polygonal chains. The discrete Fréchet distance closely approximates the (continuous) Fréchet distance, and is a natural measure for the geometric similarity of the folded 3D structures of biomolecules such as proteins. In this paper, we present new algorithms for matching two polygonal chains in two dimensions to minimize their discrete Fréchet distance under translation and rotation, and an effective heuristic for matching two polygonal chains in three dimensions. We also describe our empirical results on the application of the discrete Fréchet distance to protein structure-structure alignment.
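
For readers unfamiliar with the measure, the discrete Fréchet distance between two fixed chains can be computed with a standard dynamic program. The sketch below shows only that distance for chains in a fixed placement; it does not attempt the minimization under translation and rotation that the paper addresses, and the function name is an illustrative choice.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences."""
    n, m = len(p), len(q)
    # ca[i][j] = distance for the prefixes p[:i+1] and q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # advance on p, on q, or on both; keep the cheapest predecessor
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

The table has one cell per pair of vertices, so the cost is proportional to the product of the two chain lengths. The points may be 2D or 3D tuples, as long as both chains use the same dimension.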

90 citations

Journal Article•DOI•

Gene regulatory network reconstruction by Bayesian integration of prior knowledge and/or different experimental conditions

Adriano Velasque Werhli¹, Dirk Husmeier•Institutions (1)

Pontifícia Universidade Católica do Rio Grande do Sul¹

01 Jun 2008-Journal of Bioinformatics and Computational Biology

TL;DR: The proposed coupling scheme is a compromise between learning networks from the different subsets separately, whereby no information is shared between the different experiments, and learning networks from a monolithic fusion of the individual data sets, which does not provide any mechanism for uncovering differences between the network structures associated with the different experimental conditions.

Abstract: There have been various attempts to improve the reconstruction of gene regulatory networks from microarray data by the systematic integration of biological prior knowledge. Our approach is based on pioneering work by Imoto et al. (ref. 11), where the prior knowledge is expressed in terms of energy functions, from which a prior distribution over network structures is obtained in the form of a Gibbs distribution. The hyperparameters of this distribution represent the weights associated with the prior knowledge relative to the data. We have derived and tested a Markov chain Monte Carlo (MCMC) scheme for sampling networks and hyperparameters simultaneously from the posterior distribution, thereby automatically learning how to trade off information from the prior knowledge and the data. We have extended this approach to a Bayesian coupling scheme for learning gene regulatory networks from a combination of related data sets, which were obtained under different experimental conditions and are therefore potentially associated with different active subpathways. The proposed coupling scheme is a compromise between (1) learning networks from the different subsets separately, whereby no information between the different experiments is shared; and (2) learning networks from a monolithic fusion of the individual data sets, which does not provide any mechanism for uncovering differences between the network structures associated with the different experimental conditions. We have assessed the viability of all proposed methods on data related to the Raf signaling pathway, generated both synthetically and in cytometry experiments.
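
The Gibbs-distribution prior mentioned in this abstract has a standard general form, sketched below. The symbols are notational assumptions rather than the paper's own: G is a network structure, E(G) its energy relative to the prior knowledge, β the hyperparameter weighting the prior, Z(β) the normalizing constant, and D the data. The second line is the joint posterior over networks and hyperparameters that an MCMC scheme of this kind would sample from.

```latex
P(G \mid \beta) = \frac{\exp\{-\beta\, E(G)\}}{Z(\beta)}, \qquad
Z(\beta) = \sum_{G'} \exp\{-\beta\, E(G')\}

P(G, \beta \mid D) \propto P(D \mid G)\, P(G \mid \beta)\, P(\beta)
```

With β = 0 the prior is flat over network structures, so the prior knowledge is ignored; larger β concentrates the prior on low-energy networks, that is, on networks that agree more closely with the prior knowledge.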

67 citations

Journal Article•DOI•

CLePAPS: fast pair alignment of protein structures based on conformational letters.

Sheng Wang¹, Wei-Mou Zheng¹•Institutions (1)

Academia Sinica¹

01 Apr 2008-Journal of Bioinformatics and Computational Biology

TL;DR: CLePAPS distinguishes itself from other existing algorithms by its use of conformational letters, which are discretized states of 3D segmental structure; the transformation that best superimposes a highly similar aligned fragment pair can be used to superimpose the structure pairs under comparison.

Abstract: Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D segmental structural states. A letter corresponds to a cluster of combinations of the three angles formed by Cα pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A highly scored AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
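
The AFP search by simple string comparison can be sketched as a scan over ungapped window pairs. This is an illustrative sketch only: `toy_score` is a hypothetical stand-in for CLESUM (whose values are not given here), and the window length and threshold are arbitrary parameters, not values from the paper.

```python
def toy_score(x, y):
    """Hypothetical stand-in for CLESUM: +2 for identical letters, -1 otherwise."""
    return 2 if x == y else -1


def find_afps(a, b, score, length, threshold):
    """Return (i, j, total) for ungapped windows a[i:i+length] and b[j:j+length]
    whose summed pairwise score is at least `threshold`, best first."""
    hits = []
    for i in range(len(a) - length + 1):
        for j in range(len(b) - length + 1):
            total = sum(score(x, y) for x, y in zip(a[i:i + length], b[j:j + length]))
            if total >= threshold:
                hits.append((i, j, total))
    return sorted(hits, key=lambda h: -h[2])
```

Each hit identifies a candidate fragment pair by its start positions in the two conformational-letter strings. In the method described above, a highly scored pair that is consistent with several others would then seed the initial superposition.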

37 citations

Journal Article•DOI•

Using directed information to build biologically relevant influence networks.

Arvind Rao, Alfred O. Hero¹, David J. States¹, James Douglas Engel¹•Institutions (1)

University of Michigan¹

01 Jun 2008-Journal of Bioinformatics and Computational Biology

TL;DR: In this article, the authors propose a network inference methodology based on the directed information (DTI) criterion that incorporates the biology of transcription within the framework so as to enable experimentally verifiable inference.

Abstract: The systematic inference of biologically relevant influence networks remains a challenging problem in computational biology. Even though the availability of high-throughput data has enabled the use of probabilistic models to infer the plausible structure of such networks, their true interpretation of the biology of the process is questionable. In this work, we propose a network inference methodology, based on the directed information (DTI) criterion, that incorporates the biology of transcription within the framework so as to enable experimentally verifiable inference. We use publicly available embryonic kidney and T-cell microarray datasets to demonstrate our results. We present two variants of network inference via DTI — supervised and unsupervised — and the inferred networks relevant to mammalian nephrogenesis and T-cell activation. Conformity of the obtained interactions with the literature as well as comparison with the coefficient of determination (CoD) method are demonstrated. Apart from network inference, the proposed framework enables the exploration of specific interactions, not just those revealed by data. To illustrate the latter point, a DTI-based framework to resolve interactions between transcription factor modules and target coregulated genes is proposed. Additionally, we show that DTI can be used in conjunction with mutual information to infer higher-order influence networks involving cooperative gene interactions.

37 citations

Journal Article•DOI•

Prediction of cell wall sorting signals in Gram-positive bacteria with a hidden Markov model: application to complete genomes

Zoi I. Litou¹, Pantelis G. Bagos¹,², Konstantinos D. Tsirigos¹, Theodore D. Liakopoulos¹, Stavros J. Hamodrakas¹•Institutions (2)

National and Kapodistrian University of Athens¹, University of Central Greece²

01 Apr 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A hidden Markov model (HMM) approach for predicting the LPXTG-anchored cell wall proteins of Gram-positive bacteria was developed and compared against existing methods, finding a number that is significantly higher than those obtained by other available methods.

Abstract: Surface proteins in Gram-positive bacteria are frequently implicated in virulence. We have focused on a group of extracellular cell wall-attached proteins (CWPs), containing an LPXTG motif for clea...

36 citations

Journal Article•DOI•

Modeling nonlinear gene regulatory networks from time series gene expression data

André Fujita¹, João Ricardo Sato², Humberto Miguel Garay-Malpartida², Mari Cleide Sogayar², Carlos Eduardo Ferreira², Satoru Miyano¹•Institutions (2)

University of Tokyo¹, University of São Paulo²

01 Oct 2008-Journal of Bioinformatics and Computational Biology

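TL;DR: The nonlinear vector autoregressive (NVAR) model is applied to estimate nonlinear gene regulatory networks based entirely on gene expression profiles obtained from DNA microarray experiments, and its results are shown through several simulations and the construction of three actual gene regulatory networks (p53, NF-kappaB, and c-Myc) for HeLa cells.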

Abstract: In cells, molecular networks such as gene regulatory networks are the basis of biological complexity. Therefore, gene regulatory networks have become the core of research in systems biology. Understanding the processes underlying the several extracellular regulators, signal transduction, protein-protein interactions, and differential gene expression processes requires detailed molecular description of the protein and gene networks involved. To better understand these complex molecular networks and to infer new regulatory associations, we propose a statistical method based on vector autoregressive models and Granger causality to estimate nonlinear gene regulatory networks from time series microarray data. Most of the models available in the literature assume linearity in the inference of gene connections; moreover, these models do not infer directionality in these connections. Thus, a priori biological knowledge is required. However, in pathological cases, no a priori biological information is available. To overcome these problems, we present the nonlinear vector autoregressive (NVAR) model. We have applied the NVAR model to estimate nonlinear gene regulatory networks based entirely on gene expression profiles obtained from DNA microarray experiments. We show the results obtained by NVAR through several simulations and by the construction of three actual gene regulatory networks (p53, NF-kappaB, and c-Myc) for HeLa cells.

30 citations

Journal Article•DOI•

TALI: local alignment of protein structures using backbone torsion angles

Xijiang Miao¹, Peter J. Waddell¹, Homayoun Valafar¹•Institutions (1)

University of South Carolina¹

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: The inference of the evolutionary tree of class II aminoacyl-tRNA synthetase shows the potential for TALI in estimating protein structural evolution and in identifying structural divergence among homologous structures.

Abstract: Torsion angle alignment (TALI) is a novel approach to local structural motif alignment, based on backbone torsion angles (ϕ, ψ) rather than the more traditional atomic distance matrices. Representa...

29 citations

Journal Article•DOI•

Complexities and algorithms for glycan sequencing using tandem mass spectrometry.

Baozhen Shan¹, Bin Ma¹, Kaizhong Zhang¹, Gilles A. Lajoie¹•Institutions (1)

University of Western Ontario¹

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: It is shown in this paper that glycan de novo sequencing is NP-hard, and a heuristic algorithm is provided and a software program is developed to solve the problem in practical cases.

Abstract: Determining glycan structures is vital to comprehend cell-matrix, cell-cell, and even intracellular biological events. Glycan sequencing, which determines the primary structure of a glycan using tandem mass spectrometry (MS/MS), remains one of the most important tasks in proteomics. Analogous to peptide de novo sequencing, glycan de novo sequencing determines the structure without the aid of a known glycan database. We show in this paper that glycan de novo sequencing is NP-hard. We then provide a heuristic algorithm and develop a software program to solve the problem in practical cases. Experiments on real MS/MS data of glycopeptides demonstrate that our heuristic algorithm gives satisfactory results on practical data.

28 citations

Journal Article•DOI•

Design and analysis of quantitative differential proteomics investigations using LC-MS technology

Yury V. Bukhman¹, Moyez Dharsee, Rob M. Ewing, Peter Chu, Thodoros Topaloglou², Thierry Le Bihan, Theo Goh¹, Henry S. Duewel³, Ian I. Stewart², Jacek R. Wisniewski⁴, Nancy F. L. Ng¹•Institutions (4)

University Health Network¹, University of Toronto², Sigma-Aldrich³, Max Planck Society⁴

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A novel statistical method for inferring the relative abundance of related members of protein families from tryptic peptide intensities is implemented, and this pipeline has been used to analyze quantitative LC-MS data from multiple biomarker discovery projects.

Abstract: Liquid chromatography-mass spectrometry (LC-MS)-based proteomics is becoming an increasingly important tool in characterizing the abundance of proteins in biological samples of various types and across conditions. Effects of disease or drug treatments on protein abundance are of particular interest for the characterization of biological processes and the identification of biomarkers. Although state-of-the-art instrumentation is available to make high-quality measurements and commercially available software is available to process the data, the complexity of the technology and data presents challenges for bioinformaticians and statisticians. Here, we describe a pipeline for the analysis of quantitative LC-MS data. Key components of this pipeline include experimental design (sample pooling, blocking, and randomization) as well as deconvolution and alignment of mass chromatograms to generate a matrix of molecular abundance profiles. An important challenge in LC-MS-based quantitation is to be able to accurately identify and assign abundance measurements to members of protein families. To address this issue, we implement a novel statistical method for inferring the relative abundance of related members of protein families from tryptic peptide intensities. This pipeline has been used to analyze quantitative LC-MS data from multiple biomarker discovery projects. We illustrate our pipeline here with examples from two of these studies, and show that the pipeline constitutes a complete workable framework for LC-MS-based differential quantitation. Supplementary material is available at http://iec01.mie.utoronto.ca/~thodoros/Bukhman/.

25 citations

Journal Article•DOI•

Compactness determines protein folding type.

Oxana V. Galzitskaya¹, Natalya S. Bogatyreva¹, Dmitry N. Ivankov¹•Institutions (1)

Russian Academy of Sciences¹

01 Aug 2008-Journal of Bioinformatics and Computational Biology

TL;DR: It is demonstrated here that protein compactness, which is defined as the ratio of the accessible surface area of a protein to that of the ideal sphere of the same volume, is one of the factors determining the mechanism of protein folding.

Abstract: We have demonstrated here that protein compactness, which we define as the ratio of the accessible surface area of a protein to that of the ideal sphere of the same volume, is one of the factors determining the mechanism of protein folding. Proteins with multi-state kinetics, on average, are more compact (compactness is 1.49 ± 0.02 for proteins within the size range of 101–151 amino acid residues) than proteins with two-state kinetics (compactness is 1.59 ± 0.03 for proteins within the same size range of 101–151 amino acid residues). We have shown that compactness for homologous proteins can explain both the difference in folding rates and the difference in folding mechanisms.
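
The compactness ratio defined in this abstract can be written out directly, since a sphere of volume V has surface area (36πV²)^(1/3). The sketch below assumes the accessible surface area and the volume have already been measured by some external tool and are given in consistent units; the function names are illustrative.

```python
import math


def sphere_area_for_volume(volume):
    """Surface area of the ideal sphere that has the given volume."""
    return (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0)


def compactness(accessible_surface_area, volume):
    """Ratio of the accessible surface area to that of the equal-volume sphere.

    A perfect sphere gives 1.0; larger values mean a less compact shape.
    """
    return accessible_surface_area / sphere_area_for_volume(volume)
```

Because the ratio grows as the shape departs from a sphere, the lower value reported for multi-state proteins corresponds to the more compact structures, consistent with the abstract's wording.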

Journal Article•DOI•

Using formal concept analysis for microarray data comparison

V. Choi¹, Yi-Wen Huang², Vy Lam³, D. Potter³, Reinhard Laubenbacher³, Karen Duca³•Institutions (3)

Virginia Tech¹, Rutgers University², Virginia Bioinformatics Institute³

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: The feasibility of using formal concept analysis (FCA) as a tool for microarray data analysis is investigated, and the preliminary results show the promise of the method.

Abstract: Microarray technologies, which can measure tens of thousands of gene expression values simultaneously in a single experiment, have become a common research method for biomedical researchers. Computational tools to analyze microarray data for biological discovery are needed. In this paper, we investigate the feasibility of using formal concept analysis (FCA) as a tool for microarray data analysis. The method of FCA builds a (concept) lattice from the experimental data together with additional biological information. For microarray data, each vertex of the lattice corresponds to a subset of genes that are grouped together according to their expression values and some biological information related to gene function. The lattice structure of these gene sets might reflect biological relationships in the dataset. Similarities and differences between experiments can then be investigated by comparing their corresponding lattices according to various graph measures. We apply our method to microarray data derived from influenza-infected mouse lung tissue and healthy controls. Our preliminary results show the promise of our method as a tool for microarray data analysis.
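
The lattice construction can be made concrete with a brute-force enumeration of formal concepts for a small context. This is a sketch for illustration only: the genes and attributes in the usage note are made up, the function name is hypothetical, and the exhaustive enumeration is exponential in the number of objects, so it does not represent how a real FCA tool would handle microarray-scale data.

```python
from itertools import combinations


def formal_concepts(context):
    """context: dict mapping each object (gene) to its set of attributes.

    Returns the set of (extent, intent) pairs, each a pair of frozensets, where
    the extent is exactly the objects sharing the intent and vice versa.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values()) if context else set()
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # intent: attributes common to every object in the subset
            intent = set(all_attrs)
            for g in subset:
                intent &= context[g]
            # extent: every object that has all attributes of the intent
            extent = {g for g in objects if intent <= context[g]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

For a toy context such as `{"g1": {"up", "immune"}, "g2": {"up"}, "g3": {"down", "immune"}}`, the function returns six concepts, including the gene set {g1, g3} grouped by the shared attribute "immune". Ordering these concepts by inclusion of their extents gives the lattice whose vertices the abstract describes.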

Journal Article•DOI•

Deriving topology and sequence alignment for the helix skeleton in low-resolution protein density maps.

Yonggang Lu¹, Jing He¹, Charlie E. M. Strauss²•Institutions (2)

New Mexico State University¹, Los Alamos National Laboratory²

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: This work developed a method to predict the topology and sequence alignment for the skeleton helices of protein complexes using the Rosetta ab initio structure prediction method, and analyzed the use of the skeletons as a clustering tool for the decoy structures generated by Rosetta.

Abstract: Cryoelectron microscopy (cryoEM) is an experimental technique to determine the three-dimensional (3D) structure of large protein complexes. Currently, this technique is able to generate protein den...

Journal Article•DOI•

A survey on haplotyping algorithms for tightly linked markers

Jing Li¹, Tao Jiang²•Institutions (2)

Case Western Reserve University¹, University of California, Riverside²

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples for tightly linked markers such as SNPs.

Abstract: Two grand challenges in the postgenomic era are to develop a detailed understanding of heritable variation in the human genome, and to develop robust strategies for identifying the genetic contribution to diseases and drug responses. Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.

Journal Article•DOI•

Fitting protein chains to cubic lattice is NP-complete.

Ján Manuch¹, Daya Ram Gaur²•Institutions (2)

Simon Fraser University¹, University of Lethbridge²

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: It is shown that finding the closest lattice approximation of a given three-dimensional fold of a protein chain is NP-complete for the cubic lattice with side close to 3.8 Å and coordinate root mean square deviation.

Abstract: It is known that folding a protein chain into a cubic lattice is an NP-complete problem. We consider a seemingly easier problem: given a three-dimensional (3D) fold of a protein chain (coordinates of its Cα atoms), we want to find the closest lattice approximation of this fold. This problem has been studied under names such as "lattice approximation of a protein chain", "the protein chain fitting problem", and "building of protein lattice models". We show that this problem is NP-complete for the cubic lattice with side close to 3.8 Å and coordinate root mean square deviation.
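
To make the objective concrete, the sketch below scores one candidate lattice chain against a fold and checks that the candidate is a legal chain on the cubic lattice. It does not search for the best fit, which is the hard part. It assumes the fold is already expressed in the lattice's coordinate frame (no superposition step), and the function names and integer-index representation are illustrative choices.

```python
import math


def is_valid_lattice_chain(indices):
    """True if consecutive points are lattice neighbors and no point repeats."""
    if len(set(indices)) != len(indices):
        return False  # the chain must be self-avoiding
    return all(
        sum(abs(a - b) for a, b in zip(u, v)) == 1
        for u, v in zip(indices, indices[1:])
    )


def crms(fold, indices, side=3.8):
    """Coordinate root mean square deviation between a fold (list of xyz tuples)
    and a lattice chain given as integer index triples scaled by `side`."""
    sq = 0.0
    for point, idx in zip(fold, indices):
        sq += sum((c - side * k) ** 2 for c, k in zip(point, idx))
    return math.sqrt(sq / len(fold))
```

A fitting procedure would minimize `crms` over all chains accepted by `is_valid_lattice_chain`; the NP-completeness result concerns that minimization, not the evaluation of a single candidate.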

Journal Article•DOI•

An integrative domain-based approach to predicting protein-protein interactions.

Thanh Phuong Nguyen¹, Tu Bao Ho•Institutions (1)

The Microsoft Research - University of Trento Centre for Computational and Systems Biology¹

01 Dec 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A novel integrative domain-based method for predicting PPIs using inductive logic programming (ILP), which predicts PPIs better than other computational methods in terms of typical performance measures and can be applied to predict DDIs with high sensitivity and specificity.

Abstract: Protein-protein interactions (PPIs) are intrinsic to almost all cellular processes. Different computational methods offer new chances to study PPIs. To predict PPIs, while the integrative methods use multiple data sources instead of a single source, the domain-based methods often use only protein domain features. Integration of both protein domain features and genomic/proteomic features from multiple databases can more effectively predict PPIs. Moreover, it allows discovering the reciprocal relationships between PPIs and biological features of their interacting partners. We developed a novel integrative domain-based method for predicting PPIs using inductive logic programming (ILP). Two principal domain features used were domain fusions and domain-domain interactions (DDIs). Various relevant features of proteins were exploited from five popular genomic and proteomic databases. By integrating these features, we constructed biologically significant ILP background knowledge of more than 278,000 ground facts. The experimental results through multiple 10-fold cross-validations demonstrated that our method predicts PPIs better than other computational methods in terms of typical performance measures. The proposed ILP framework can be applied to predict DDIs with high sensitivity and specificity. The induced ILP rules gave us many interesting, biologically reciprocal relationships among PPIs, protein domains, and PPI-related genomic/proteomic features. Supplementary material is available at (http://www.jaist.ac.jp/~s0560205/PPIandDDI/).

Journal Article•DOI•

Kinetic model of phosphofructokinase-1 from Escherichia coli.

Kirill Peskov, Igor Goryanin¹, Oleg Demin²,³•Institutions (3)

University of Edinburgh¹, Moscow State University², Institute for Systems Biology³

01 Aug 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A complete catalytic cycle has been reconstructed based on available information on the oligomeric structure of the enzyme and kinetic mechanism of its monomer and the model developed can be used in the kinetic modeling of biochemical pathways containing phosphofructokinase-1.

Abstract: This paper presents a kinetic model of phosphofructokinase-1 from Escherichia coli. A complete catalytic cycle has been reconstructed based on available information on the oligomeric structure of the enzyme and kinetic mechanism of its monomer. Applying the generalization of the Monod–Wyman–Changeux approach proposed by Popova and Sel'kov (refs. 35–37) to the reconstructed catalytic cycle, a rate equation has been derived. Dependence of the reaction rate on pH, magnesium, and effectors has been taken into account. Kinetic parameters have been estimated via fitting the rate equation against experimentally measured dependencies of initial rate on substrates, products, effectors, and pH available from the literature. The model of phosphofructokinase-1 predicts (1) cooperativity of binding both fructose-6-phosphate and ATPMg²⁻, (2) significant inhibition of the enzyme resulting from an increase in total concentration of ATP under the condition of fixed concentration of Mg²⁺ ions, and (3) dual effect of ADP consisting of ...

Journal Article•DOI•

Clustering of main orthologs for multiple genomes.

Zheng Fu¹, Tao Jiang¹•Institutions (1)

University of California, Riverside¹

01 Jun 2008-Journal of Bioinformatics and Computational Biology

TL;DR: This paper extends MSOAR to multiple (closely related) genomes and proposes an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes, which gives more detailed and accurate orthology information.

Abstract: The identification of orthologous genes shared by multiple genomes is critical for both functional and evolutionary studies in comparative genomics. While it is usually done by sequence similarity search and reconciled tree construction in practice, recently a new combinatorial approach and high-throughput system MSOAR for ortholog identification between closely related genomes based on genome rearrangement and gene duplication has been proposed in Fu et al. (ref. 1). MSOAR assumes that orthologous genes correspond to each other in the most parsimonious evolutionary scenario, minimizing the number of genome rearrangement and (postspeciation) gene duplication events. However, the parsimony approach used by MSOAR limits it to pairwise genome comparisons. In this paper, we extend MSOAR to multiple (closely related) genomes and propose an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes. As a preliminary experiment, we apply MultiMSOAR to rat, mouse, and human genomes, and val...

Journal Article•DOI•

Duplicated RNA genes in teleost fish genomes.

Dominic Rose¹, Julian Jöris¹, Jörg Hackermüller¹,², Kristin Reiche¹,², Qiang Li¹,³, Peter F. Stadler•Institutions (3)

Leipzig University¹, Fraunhofer Society², Fudan University³

01 Dec 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A computational survey of structured non-coding RNAs in teleost genomes focuses on the fate of fish-specific duplicates, finding evidence of a large number of structured RNAs, most of which are clade-specific or evolve so fast that their tetrapod homologs cannot be detected.

Abstract: Teleost fishes share a duplication of their entire genomes. We report here on a computational survey of structured non-coding RNAs (ncRNAs) in teleost genomes, focusing on the fate of fish-specific duplicates. As in other metazoan groups, we find evidence of a large number (11,543) of structured RNAs, most of which (~86%) are clade-specific or evolve so fast that their tetrapod homologs cannot be detected. In surprising contrast to protein-coding genes, the fish-specific genome duplication did not lead to a large number of paralogous ncRNAs: only 188 candidates, mostly microRNAs, appear in a larger copy number in teleosts than in tetrapods, suggesting that large-scale gene duplications do not play a major role in the expansion of the vertebrate ncRNA inventory.

Journal Article•DOI•

Modeling of Glycerol-3-Phosphate Transporter Suggests a Potential ‘Tilt’ Mechanism involved in its Function

Igor F. Tsigelny¹, Jerry P. Greenberg¹, Valentina L. Kouznetsova¹, Sanjay K. Nigam¹•Institutions (1)

University of California, San Diego¹

01 Oct 2008-Journal of Bioinformatics and Computational Biology

TL;DR: The results suggest that transport mechanisms in this transporter family should probably not be assumed to be conserved simply based on standard structural homology considerations, and raise the possibility that, while the "rocker switch" may apply to certain MFS transporters, intermediate "tilted" states may exist under certain circumstances or as transitional structures.

Abstract: Many major facilitator superfamily (MFS) transporters have similar 12-transmembrane alpha-helical topologies with two six-helix halves connected by a long loop. In humans, these transporters participate in key physiological processes and are also, as in the case of members of the organic anion transporter (OAT) family, of pharmaceutical interest. Recently, crystal structures of two bacterial representatives of the MFS family, the glycerol-3-phosphate transporter (GlpT) and lac-permease (LacY), have been solved and, because of assumptions regarding the high structural conservation of this family, there is hope that the results can be applied to mammalian transporters as well. Based on crystallography, it has been suggested that a major conformational "switching" mechanism accounts for ligand transport by MFS proteins. This conformational switch would then allow periodic changes in the overall transporter configuration, resulting in its cyclic opening to the periplasm or cytoplasm. Following this lead, we have modeled a possible "switch" mechanism in GlpT, using the concept of rotation of protein domains as in the DynDom program [17] and membranephilic constraints predicted by the MAPAS program [23]. We found that the minima of energies of intersubunit interactions support two alternate positions consistent with their transport properties. Thus, for GlpT, a "tilt" of 9° to 10° rotation had the most favorable energetics of electrostatic interaction between the two halves of the transporter; moreover, this conformation was sufficient to suggest transport of the ligand across the membrane. We conducted steered molecular dynamics simulations of the GlpT-ligand system to explore how glycerol-3-phosphate would be handled by the "tilted" structure, and obtained results generally consistent with experimental mutagenesis data.
While biochemical data remain most consistent with a single-site alternating access model, our results raise the possibility that, while the "rocker switch" may apply to certain MFS transporters, intermediate "tilted" states may exist under certain circumstances or as transitional structures. Although wet lab experimental confirmation is required, our results suggest that transport mechanisms in this transporter family should probably not be assumed to be conserved simply based on standard structural homology considerations. Furthermore, steered molecular dynamics elucidating energetic interactions of ligands with amino acid residues in an appropriately modeled transporter may have predictive value in understanding the impact of mutations and/or polymorphisms on transporter function.

Journal Article•DOI•

Feature selection in validating mass spectrometry database search results

Jianwen Fang¹, Yinghua Dong¹, Todd D. Williams¹, Gerald H. Lushington¹•Institutions (1)

University of Kansas¹

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: It is demonstrated that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8.

Abstract: Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.

Journal Article•DOI•

PPiClust: efficient clustering of 3D protein-protein interaction interfaces.

Zeyar Aung¹, Soon-Heng Tan¹, See-Kiong Ng¹, Kian-Lee Tan²•Institutions (2)

Institute for Infocomm Research Singapore¹, National University of Singapore²

01 Jun 2008-Journal of Bioinformatics and Computational Biology

TL;DR: PPiClust is presented, a method to systematically encode, cluster, and analyze similar 3D interface patterns in protein complexes; it is effective in discovering visually consistent and statistically significant clusters of interfaces, and is time-efficient enough to be performed on a single computer.

Abstract: The biological mechanisms through which proteins interact with one another are best revealed by studying the structural interfaces between interacting proteins. Protein–protein interfaces can be extracted from three-dimensional (3D) structural data of protein complexes and then clustered to derive biological insights. However, conventional protein interface clustering methods lack computational scalability and statistical support. In this work, we present a new method named "PPiClust" to systematically encode, cluster, and analyze similar 3D interface patterns in protein complexes efficiently. Experimental results showed that our method is effective in discovering visually consistent and statistically significant clusters of interfaces, and at the same time sufficiently time-efficient to be performed on a single computer. The interface clusters are also useful for uncovering the structural basis of protein interactions. Analysis of the resulting interface clusters revealed groups of structurally diverse proteins having similar interface patterns. We also found, in some of the interface clusters, the presence of well-known linear binding motifs which were noncontiguous in the primary sequences. These results suggest that PPiClust can discover not only statistically significant, but also biologically significant, protein interface clusters from protein complex structural data.

Journal Article•DOI•

Ubiquitous reassortments in influenza A viruses

Xiu-Feng Wan¹, Mufit Ozden¹, Guohui Lin²•Institutions (2)

Miami University¹, University of Alberta²

01 Oct 2008-Journal of Bioinformatics and Computational Biology

TL;DR: This paper presents a reassortment identification method based on distance measurement using complete composition vector (CCV) and segment clustering using a minimum spanning tree (MST) algorithm that identified 34 potential reassortment clusters among 2,641 PB2 segments of influenza A viruses.

Abstract: The influenza A virus is a negative-stranded RNA virus composed of eight segmented RNA molecules, including polymerases (PB2, PB1, PA), hemagglutinin (HA), nucleoprotein (NP), neuraminidase (NA), matrix protein (MP), and nonstructural gene (NS). The influenza A viruses are notorious for rapid mutations, frequent reassortments, and possible recombinations. Among these evolutionary events, reassortments refer to exchanges of discrete RNA segments between co-infected influenza viruses, and they have facilitated the generation of pandemic and epidemic strains. Thus, identification of reassortments will be critical for pandemic and epidemic prevention and control. This paper presents a reassortment identification method based on distance measurement using complete composition vector (CCV) and segment clustering using a minimum spanning tree (MST) algorithm. By applying this method, we identified 34 potential reassortment clusters among 2,641 PB2 segments of influenza A viruses. Among the 83 serotypes tested, at least 56 (67.46%) exchanged their fragments with another serotype of influenza A viruses. These identified reassortments involve the 1957 H2N2 and 1968 H3N2 influenza pandemic strains as well as H5N1 avian influenza virus isolates, which have generated the potential for a future pandemic threat. More frequent reassortments were found to occur in wild birds, especially migratory birds. This MST clustering program is written in Java and will be available upon request.
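The distance-then-MST idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Java program: a plain k-mer frequency vector stands in for the complete composition vector (CCV), the distance is Euclidean, and the `cutoff` parameter and all function names are hypothetical. Segments are linked along the minimum spanning tree, and MST edges longer than the cutoff are dropped to leave clusters.

```python
from collections import Counter
from itertools import product
from math import sqrt

def composition_vector(seq, k=2):
    """Relative frequency of every k-mer (a simplified stand-in for CCV)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[m] / total for m in kmers]

def distance(u, v):
    """Euclidean distance between two composition vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mst_edges(vectors):
    """Prim's algorithm on the complete distance graph; yields (dist, i, j)."""
    if not vectors:
        return []
    # best[j] = (distance to the tree, tree node realising that distance)
    best = {j: (distance(vectors[0], vectors[j]), 0)
            for j in range(1, len(vectors))}
    edges = []
    while best:
        j = min(best, key=lambda x: best[x][0])
        d, i = best.pop(j)
        edges.append((d, i, j))
        for m in best:
            dm = distance(vectors[j], vectors[m])
            if dm < best[m][0]:
                best[m] = (dm, j)
    return edges

def mst_clusters(seqs, cutoff, k=2):
    """Cluster sequences by cutting MST edges longer than `cutoff`."""
    vectors = [composition_vector(s, k) for s in seqs]
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d, i, j in mst_edges(vectors):
        if d <= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for idx in range(len(seqs)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

In a reassortment screen, one would presumably cluster each of the eight segments separately and look for strains whose segments fall into inconsistent clusters; that comparison step is not shown here.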

Journal Article•DOI•

Development of an affinity evaluation and prediction system by using the shape complementarity characteristic between proteins.

Koki Tsukamoto¹, Tatsuya Yoshikawa¹,², Yuichiro Hourai¹, Kazuhiko Fukui¹, Yutaka Akiyama¹•Institutions (2)

National Institute of Advanced Industrial Science and Technology¹, Osaka University²

01 Dec 2008-Journal of Bioinformatics and Computational Biology

TL;DR: The ultimate goal is to construct an affinity database that will provide crucial information obtained using the affinity evaluation and prediction system to cell biologists and drug designers.

Abstract: A system was developed to evaluate and predict the interaction between protein pairs by using the widely used shape complementarity search method as the algorithm for docking simulations between the proteins. This system, which we call the affinity evaluation and prediction (AEP) system, was used to evaluate the interaction between 20 protein pairs. The system first executes a "round robin" shape complementarity search of the target protein group, and evaluates the interaction of the complex structures obtained by shape complementarity search. These complex structures are selected by using a statistical procedure that we developed called "grouping". At a low prevalence of 5.0%, our AEP system predicted protein–protein interaction with 65.0% recall, 15.1% precision, 80.0% accuracy, and had an area under the curve (AUC) of 0.74. By optimizing the grouping process, our AEP system successfully predicted 13 protein pairs (among 20 pairs) that were biologically significant combinations. Our ultimate goal is to construct an affinity database that will provide crucial information obtained using our AEP system to cell biologists and drug designers.
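The reported figures can be checked against one another with a short calculation. The confusion-matrix counts below are not stated in the abstract; they are inferred under the assumption that the round-robin search over 20 pairs yields 20 × 20 = 400 candidate combinations, of which 20 are true pairs (5.0% prevalence) and 13 are recovered.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard rates derived from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / total,
    }

# Inferred (assumed) counts: 13 true pairs found, 7 missed,
# 73 false alarms, 307 correctly rejected combinations.
metrics = confusion_metrics(tp=13, fp=73, fn=7, tn=307)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the four rates agree with the abstract's 5.0% prevalence, 65.0% recall, 15.1% precision, and 80.0% accuracy, which shows how a high accuracy can coexist with low precision when true pairs are rare.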

Journal Article•DOI•

ClusFCM: an algorithm for predicting protein functions using homologies and protein interactions

Cao D. Nguyen¹, Michael V. Mannino², Katheleen Gardiner², Krzysztof J. Cios¹,³,⁴•Institutions (4)

Virginia Commonwealth University¹, University of Colorado Denver², University of Colorado Boulder³, Polish Academy of Sciences⁴

01 Feb 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A new algorithm, called ClusFCM, is introduced, which combines techniques of clustering and fuzzy cognitive maps (FCM) for prediction of protein functions, and predicts protein functions with high recall while not lowering precision.

Abstract: We introduce a new algorithm, called ClusFCM, which combines techniques of clustering and fuzzy cognitive maps (FCM) for prediction of protein functions. ClusFCM takes advantage of protein homologies and protein interaction network topology to improve low recall predictions associated with existing prediction methods. ClusFCM exploits the fact that proteins of known function tend to cluster together and deduces functions not only through their direct interaction with other proteins, but also from other proteins in the network. We use ClusFCM to annotate protein functions for Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), and Drosophila melanogaster (fly) using protein–protein interaction data from the General Repository for Interaction Datasets (GRID) database and functional labels from Gene Ontology (GO) terms. The algorithm's performance is compared with four state-of-the-art methods for function prediction (Majority, χ² statistics, Markov random field (MRF), and FunctionalFlow) using measures of Matthews correlation coefficient, harmonic mean, and area under the receiver operating characteristic (ROC) curves. The results indicate that ClusFCM predicts protein functions with high recall while not lowering precision. Supplementary information is available at .
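For orientation, the simplest of the comparison methods can be sketched in a few lines. This is not ClusFCM; it is a minimal neighbor-counting baseline, which is what "Majority" is assumed to mean here: a protein receives the function labels that occur most often among its direct interaction partners. The function name, the `top_k` parameter, and the toy labels are hypothetical.

```python
from collections import Counter

def majority_predict(interactions, annotations, protein, top_k=1):
    """Rank function labels by how many direct neighbors carry them."""
    neighbors = set()
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        if a == protein:
            neighbors.add(b)
        elif b == protein:
            neighbors.add(a)
    counts = Counter(label
                     for n in neighbors
                     for label in annotations.get(n, ()))
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_k]]

# Toy network: P1 is unannotated and has three annotated partners.
interactions = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P2", "P3")]
annotations = {"P2": {"F1"}, "P3": {"F1", "F2"}, "P4": {"F1"}}
print(majority_predict(interactions, annotations, "P1"))
```

A baseline of this kind uses only direct partners, whereas the abstract describes ClusFCM as also drawing on other proteins in the network.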

Journal Article•DOI•

Testing differential expression in nonoverlapping gene pairs: a new perspective for the empirical Bayes method.

Lev B. Klebanov¹, Xing Qiu², Andrei Yakovlev²•Institutions (2)

Charles University in Prague¹, University of Rochester²

01 Apr 2008-Journal of Bioinformatics and Computational Biology

TL;DR: The proposed modification of the empirical Bayes method leads to significant improvements in its performance and the new paradigm arising from the existence of the delta-sequence in biological data offers considerable scope for future developments.

Abstract: The currently practiced methods of significance testing in microarray gene expression profiling are highly unstable and tend to be very low in power. These undesirable properties are due to the nature of multiple testing procedures, as well as extremely strong and long-ranged correlations between gene expression levels. In an earlier publication, we identified a special structure in gene expression data that produces a sequence of weakly dependent random variables. This structure, termed the δ-sequence, lies at the heart of a new methodology for selecting differentially expressed genes in nonoverlapping gene pairs. The proposed method has two distinct advantages: (1) it leads to dramatic gains in terms of the mean numbers of true and false discoveries, and in the stability of the results of testing; and (2) its outcomes are entirely free from the log-additive array-specific technical noise. We demonstrate the usefulness of this approach in conjunction with the nonparametric empirical Bayes method. The proposed modification of the empirical Bayes method leads to significant improvements in its performance. The new paradigm arising from the existence of the δ-sequence in biological data offers considerable scope for future developments in this area of methodological research.

Journal Article•DOI•

On preprocessing and antisymmetry in de novo peptide sequencing: improving efficiency and accuracy.

Kang Ning¹, Nan Ye¹, Hon Wai Leong¹•Institutions (1)

National University of Singapore¹

01 Jun 2008-Journal of Bioinformatics and Computational Biology

TL;DR: In this article, a general preprocessing scheme for peptide sequencing is proposed, which performs binning, pseudo-peak introduction, and noise removal, and theoretical and experimental analyses of each of the components are presented.

Abstract: Peptide sequencing plays a fundamental role in proteomics. Tandem mass spectrometry, being sensitive and efficient, is one of the most commonly used techniques in peptide sequencing. Many computational models and algorithms have been developed for peptide sequencing using tandem mass spectrometry. In this paper, we investigate general issues in de novo sequencing, and present results that can be used to improve current de novo sequencing algorithms. We propose a general preprocessing scheme that performs binning, pseudo-peak introduction, and noise removal, and present theoretical and experimental analyses on each of the components. Then, we study the antisymmetry problem and current assumptions related to it, and propose a more realistic way to handle the antisymmetry problem based on analysis of some datasets. We integrate our findings on preprocessing and the antisymmetry problem with some current models for peptide sequencing. Experimental results show that our findings help to improve accuracies for de novo sequencing.

Journal Article•DOI•

Orthofocus: program for identification of orthologs in multiple genomes in family-focused studies

Alexander E. Ivliev¹, Marina G. Sergeeva¹•Institutions (1)

Moscow State University¹

01 Aug 2008-Journal of Bioinformatics and Computational Biology

TL;DR: A program, OrthoFocus, is developed, which employs an extended reciprocal best hit approach to quickly search for orthologs in a pair of genomes and generates a multiple alignment of orthologs so that it can further be used in phylogenetic analysis.

Abstract: The identification of orthologs to a set of known genes is often the starting point for evolutionary studies focused on gene families of interest. To date, the existing orthology detection tools (C...
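The reciprocal best hit idea that OrthoFocus builds on can be sketched as follows. This shows only the plain criterion, not the program's extended variant, whose details are not given in this listing; the score dictionaries and gene names are hypothetical toy data standing in for pairwise similarity search results.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): similarity_score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Toy scores for genome A searched against genome B, and the reverse.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 198, ("b1", "a3"): 120,
          ("b2", "a1"): 150, ("b2", "a2"): 95}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, `a2` picks `b2` as its best hit, but `b2` prefers `a1`, so that pair is rejected; only pairs that choose each other are reported as ortholog candidates.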

Journal Article•DOI•

Prediction of loop regions in protein sequence.

Nikita V. Dovidchenko¹, Natalya S. Bogatyreva¹, Oxana V. Galzitskaya¹•Institutions (1)

Russian Academy of Sciences¹

01 Oct 2008-Journal of Bioinformatics and Computational Biology

TL;DR: An algorithm is suggested that inputs a protein sequence and outputs a decomposition of the protein chain into a regular part including secondary structures and a nonregular part corresponding to loop regions that can be used to find patterns of rigid and flexible loops as possible candidates to play a structure/function role as well as a role of antigenic determinants.

Abstract: We suggest an algorithm that inputs a protein sequence and outputs a decomposition of the protein chain into a regular part including secondary structures and a nonregular part corresponding to loo...

Journal Article•DOI•

Wave packet motions coupled to electron transfer in reaction centers of Chloroflexus aurantiacus.

A. G. Yakovlev¹, Tatiana A. Shkuropatova², L. G. Vasilieva³, Anatoli Ya. Shkuropatov³, Vladimir A. Shuvalov¹,³•Institutions (3)

Moscow State University¹, Leiden University², Russian Academy of Sciences³

01 Aug 2008-Journal of Bioinformatics and Computational Biology

TL;DR: It was found that the nuclear wave packet motion induced on the potential energy surface of the excited state of the primary electron donor P* by approximately 20 fs excitation leads to a coherent formation of the states P⁺Φ_B⁻ and P⁺B_A⁻ (B_A is a bacteriochlorophyll monomer in the A-branch of cofactors).

Abstract: Transient absorption difference spectroscopy with ~20 femtosecond (fs) resolution was applied to study the time and spectral evolution of low-temperature (90 K) absorbance changes in isolated react...
