Journal • ISSN: 0219-7200
Journal of Bioinformatics and Computational Biology
About: Journal of Bioinformatics and Computational Biology is an academic journal that publishes mainly in the areas of cluster analysis and gene regulatory networks. Its ISSN is 0219-7200. Over its lifetime, the journal has published 1115 papers, which have received 17435 citations.
Papers
TL;DR: Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes.
Abstract: Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their ...
1,590 citations
TL;DR: This work focuses on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E.
Abstract: Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 vs. 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6 +/- 1.4%, 85.3 +/- 1.4%, and 85.2 +/- 1.5%. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7 +/- 1.4%. This is a significant improvement over our previous PONDRs VLXT (71.6 +/- 1.3%) and VL2 (80.9 +/- 1.4%). The new disorder predictors with the corresponding datasets are freely accessible through the web server at http://www.ist.temple.edu/disprot.
374 citations
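The accuracy measure described in the abstract above (the average of the accuracies on disordered and ordered regions) can be illustrated with a minimal sketch. The per-residue label strings, the `'D'`/`'O'` encoding, and the function name `balanced_accuracy` are assumptions made for illustration; they are not taken from the PONDR software.

```python
def balanced_accuracy(true_labels, predicted_labels):
    """Average of the accuracy on disordered ('D') and ordered ('O') residues.

    The 'D'/'O' encoding is a hypothetical convention for this sketch.
    """
    correct = {"D": 0, "O": 0}
    total = {"D": 0, "O": 0}
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    if total["D"] == 0 or total["O"] == 0:
        raise ValueError("both classes must be present")
    # Each class contributes equally, regardless of how many residues it has.
    return 0.5 * (correct["D"] / total["D"] + correct["O"] / total["O"])


truth = "DDOOOOOOOO"      # 2 disordered residues, 8 ordered residues
predicted = "DOOOOOOOOO"  # one disordered residue is missed
print(balanced_accuracy(truth, predicted))
```

In this toy example, 9 of 10 residues are predicted correctly, yet the averaged measure is lower because half of the disordered residues are missed. Averaging per class keeps the larger class from dominating the score.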
TL;DR: Extending the single optimized spaced seed of PatternHunter to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman for homology search.
Abstract: Extending the single optimized spaced seed of PatternHunter(20) to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman, for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search methodology research back to a full circle.
297 citations
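The idea of searching with several spaced seeds, as mentioned in the abstract above, can be sketched roughly as follows. A spaced seed is written here as a string of `1` (position must match) and `0` (position is ignored). The seeds, sequences, and function names are hypothetical and chosen only for illustration; this is not PatternHunter II's implementation or its optimized seed set.

```python
from collections import defaultdict


def spaced_key(seq, start, seed):
    """Characters of `seq` at the '1' positions of `seed`, anchored at `start`."""
    return "".join(seq[start + i] for i, bit in enumerate(seed) if bit == "1")


def seed_hits(query, target, seeds):
    """Return sorted (query_pos, target_pos) pairs hit by at least one seed."""
    hits = set()
    for seed in seeds:
        # Index every window of the target by its characters at the '1' positions.
        index = defaultdict(list)
        for t in range(len(target) - len(seed) + 1):
            index[spaced_key(target, t, seed)].append(t)
        # Look up every window of the query in that index.
        for q in range(len(query) - len(seed) + 1):
            for t in index.get(spaced_key(query, q, seed), []):
                hits.add((q, t))
    return sorted(hits)


query = "ACGTAC"
target = "ACTTAC"  # differs from the query at position 2
print(seed_hits(query, target, ["111"]))          # contiguous seed only
print(seed_hits(query, target, ["111", "1101"]))  # contiguous plus one spaced seed
```

In this toy case the spaced seed `1101` tolerates the mismatch at position 2 and reports a hit that the contiguous seed `111` misses, while the two seeds together report the union of their hits.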
TL;DR: To overcome the limitations of hard clustering, this work applied soft clustering, which offers several advantages for researchers: it is more robust to noise, and a priori pre-filtering of genes can be avoided.
Abstract: Clustering is an important tool in microarray data analysis. This unsupervised learning technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied so far produce hard partitions of the data, i.e. each gene is assigned exactly to one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for microarray time-course data, where gene clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise. To overcome the limitations of hard clustering, we applied soft clustering which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well corresponding clusters represent genes. This can be used for the more targeted search for regulatory elements. Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more noise robust and a priori pre-filtering of genes can be avoided. This prevents the exclusion of biologically relevant genes from the data analysis. Soft clustering was implemented here using the fuzzy c-means algorithm. Procedures to find optimal clustering parameters were developed. A software package for soft clustering has been developed based on the open-source statistical language R. The package called Mfuzz is freely available.
280 citations
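The fuzzy c-means algorithm named in the abstract above can be sketched in a few lines. This is a minimal, standard-library illustration of the textbook update rules, assuming Euclidean distance, a fuzzifier `m` of 2, and user-supplied initial centers; it is not the Mfuzz code, and the function name and parameters are hypothetical.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Soft-cluster `points` starting from the given initial `centers`.

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i (each row sums to 1).
    """
    centers = [list(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers receive larger membership values.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: share membership among those.
                flags = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([f / sum(flags) for f in flags])
            else:
                inv = [d ** -exponent for d in dists]
                memberships.append([v / sum(inv) for v in inv])
        # Center update: mean of all points weighted by membership ** m.
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            if total > 0.0:
                centers[i] = [
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(len(points[0]))
                ]
    return centers, memberships


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(points, [(1.0, 1.0), (4.0, 4.0)])
for point, row in zip(points, memberships):
    print(point, [round(u, 3) for u in row])
```

Unlike a hard partition, each point receives a membership value for every cluster, so a point lying between two clusters would show intermediate memberships rather than being forced into one of them.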
TL;DR: Dizzy, a software tool for stochastically and deterministically modeling the spatially homogeneous kinetics of integrated large-scale genetic, metabolic, and signaling networks, is described.
Abstract: We describe Dizzy, a software tool for stochastically and deterministically modeling the spatially homogeneous kinetics of integrated large-scale genetic, metabolic, and signaling networks. Notable features include a modular simulation framework, reusable modeling elements, complex kinetic rate laws, multi-step reaction processes, steady-state noise estimation, and spatial compartmentalization.
265 citations