Showing papers in "Journal of Bioinformatics and Computational Biology in 2019"

Journal Article•DOI•

Convolutional neural network approach to lung cancer classification integrating protein interaction network and gene expression profiles

Teppei Matsubara¹, Tomoshiro Ochiai², Morihiro Hayashida, Tatsuya Akutsu³, Jose C. Nacher¹

Toho University¹, Otsuma Women's University², Kyoto University³

01 Jun 2019-Journal of Bioinformatics and Computational Biology

TL;DR: The developed spectral-convolutional neural network based method achieves success in integrating protein interaction network data and gene expression profiles to classify lung cancer.

Abstract: Deep learning technologies are permeating every field from image and speech recognition to computational and systems biology. However, the application of convolutional neural networks (CNNs) to "omics" data poses some difficulties, such as the processing of complex network structures and their integration with transcriptome data. Here, we propose a CNN approach that combines spectral clustering information processing to classify lung cancer. The developed spectral-convolutional neural network based method achieves success in integrating protein interaction network data and gene expression profiles to classify lung cancer. The performed computational experiments suggest that in terms of accuracy the predictive performance of our proposed method was better than those of other machine learning methods such as SVM or Random Forest. Moreover, the computational results also indicate that the underlying protein network structure helps to enhance the predictions. Data and CNN code can be downloaded from the link: https://sites.google.com/site/nacherlab/analysis.

19 citations

Journal Article•DOI•

Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method.

Chun Fang¹, Yoshitaka Moriwaki², Aikui Tian¹, Caihong Li¹, Kentaro Shimizu²

Shandong University of Technology¹, University of Tokyo²

14 Mar 2019-Journal of Bioinformatics and Computational Biology

TL;DR: The cutting-edge machine learning method of artificial intelligence is adopted to develop a powerful model for improving MoRFs prediction, and the accuracy of the novel proposed method was comparable with that of state-of-the-art methods.

Abstract: Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), which play important roles in the molecular interaction network of cells and are implicated in many serious human diseases. Identifying MoRFs is essential for both functional studies of IDPs and drug design. This study adopts the cutting-edge machine learning method of artificial intelligence to develop a powerful model for improving MoRFs prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor). It combines the outcomes of two independent deep convolutional neural network (DCNN) classifiers that take advantage of different features. The first, DCNNMoRF1, employs position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors to describe protein sequences. The second, DCNNMoRF2, employs PSSM and 13 types of amino acid indexes to describe protein sequences. For both single classifiers, DCNN with a novel two-dimensional attention mechanism was adopted, and an average strategy was added to further process the output probabilities of each DCNN model. Finally, en_DCNNMoRF combined the two models by averaging their final scores. When compared with other well-known tools applied to the same datasets, the accuracy of the novel proposed method was comparable with that of state-of-the-art methods. The related web server can be accessed freely via http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php.
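The final combination step in this abstract, averaging the scores of two classifiers, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `average_scores`, the 0.5 cutoff, and the example scores are all assumptions introduced here.

```python
from typing import List


def average_scores(scores_a: List[float], scores_b: List[float]) -> List[float]:
    """Average two per-residue score lists element by element."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers must score the same residues")
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


# Hypothetical per-residue MoRF propensities from two classifiers.
model1 = [0.25, 0.75, 1.0, 0.5]
model2 = [0.75, 0.25, 0.5, 0.0]

ensemble = average_scores(model1, model2)

# Residues whose averaged score reaches an assumed cutoff are labeled as MoRF.
labels = [score >= 0.5 for score in ensemble]
```

Plain averaging gives both classifiers equal weight; a weighted mean would be a natural variation, but the abstract does not mention one.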

14 citations

Journal Article•DOI•

Fast and memory efficient approach for mapping NGS reads to a reference genome.

Sanjeev Kumar, Suneeta Agarwal, Ranvijay

01 Apr 2019-Journal of Bioinformatics and Computational Biology

TL;DR: Experimental work shows that the proposed approach, WIT, performs best for protein sequence indexing; the other alignment parameters, accuracy and confidentiality, are also shown experimentally to be better than those of Minimap2.

Abstract: New generation sequencing machines, such as Illumina and Solexa, can generate millions of short reads from a given genome sequence in a single run. Alignment of these reads to a reference genome is a core s...

14 citations

Journal Article•DOI•

Protein function prediction from dynamic protein interaction network using gene expression data.

Sovan Saha¹, Abhimanyu Prasad¹, Piyali Chatterjee², Subhadip Basu³, Mita Nasipuri²

Dr. Sudhir Chandra Sur Degree Engineering College¹, Netaji Subhash Engineering College², Jadavpur University³

16 Oct 2019-Journal of Bioinformatics and Computational Biology

TL;DR: This work uses a dynamic protein-protein interaction network (PPIN), time course gene expression data and protein sequence information to predict the functional annotation of proteins, and achieves an average precision, recall and F-Score of 0.59, 0.76 and 0.61, respectively, significantly higher than the reported state-of-the-art methods.

Abstract: Computational prediction of functional annotation of proteins is an uphill task. There is an ever increasing gap between functional characterization of protein sequences and the deluge of protein sequences generated by large-scale sequencing projects. The dynamic nature of protein interactions is frequently observed and is mostly influenced by any new change of state or change in stimuli. Functional characterization of proteins can be inferred from their interactions with each other, which are dynamic in nature. In this work, we have used a dynamic protein-protein interaction network (PPIN), time course gene expression data and protein sequence information for prediction of functional annotation of proteins. During progression of a particular function, it has also been observed that not all the proteins are active at all time points. For unannotated active proteins, our proposed methodology explores the dynamic PPIN consisting of level-1 and level-2 neighboring proteins at different time points, filtered by Damerau-Levenshtein edit distance to estimate the similarity between two protein sequences and coefficient of variation methods to assess the strength of an edge in a network. Finally, from the filtered dynamic PPIN, at each time point, functional annotations of the level-2 proteins are assigned to the unknown and unannotated active proteins through the level-1 neighbor, following a bottom-up strategy. Our proposed methodology achieves an average precision, recall and F-Score of 0.59, 0.76 and 0.61, respectively, which is significantly higher than the reported state-of-the-art methods.
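The sequence-similarity filter mentioned in this abstract relies on Damerau-Levenshtein edit distance. The sketch below shows the restricted variant (optimal string alignment), which counts insertions, deletions, substitutions and swaps of adjacent characters. It is an illustration only: the abstract does not say which variant the authors use, and the function names and the length-normalized similarity score are assumptions made here.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical sequences, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest
```

For example, `osa_distance("MKV", "MVK")` is 1 because the two sequences differ by a single adjacent swap, whereas plain Levenshtein distance would count two substitutions.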

13 citations

Journal Article•DOI•

Metatox - Web application for generation of metabolic pathways and toxicity estimation.

A. V. Rudik, V. M. Bezhentsev, Alexander V. Dmitriev, Alexey Lagunin¹, Dmitry Filimonov, Vladimir Poroikov

Russian National Research Medical University¹

14 Mar 2019-Journal of Bioinformatics and Computational Biology

TL;DR: MetaTox is a web application for generating metabolic pathways of xenobiotics in the human organism; metabolite generation is based on fragment datasets that describe transformations of substrate structures into metabolite structures.

Abstract: Xenobiotic biotransformation in humans is a process of chemical modification that may lead to the formation of toxic metabolites. The prediction of such metabolites is very important for drug development and ecotoxicology studies. We created the web application MetaTox (http://way2drug.com/mg) for the generation of xenobiotic metabolic pathways in the human organism. For each generated metabolite, estimates of acute toxicity (based on GUSAR software prediction), organ-specific carcinogenicity and adverse effects (based on PASS software prediction) are computed. Generation of metabolites by MetaTox is based on fragment datasets, which describe transformations of substrate structures into metabolite structures. We added three new classes of biotransformation reactions (Dehydrogenation, Glutathionation, and Hydrolysis), and metabolite generation is now available for the 15 most frequent classes of xenobiotic biotransformation reactions. MetaTox calculates the probability of formation of each generated metabolite: an integrated assessment of the probabilities of the biotransformation reactions and their sites, using the algorithm of PASS (http://way2drug.com/passonline). The prediction accuracy estimated by the leave-one-out cross-validation (LOO-CV) procedure, calculated separately for the probabilities of biotransformation reactions and their sites, is about 0.9 on average for all reactions.

13 citations

Journal Article•DOI•

Bioinformatic identification of differentially expressed genes associated with prognosis of locally advanced lymph node-positive prostate cancer.

Anna V. Kudryavtseva¹, Elena N. Lukyanova¹, Sergey L. Kharitonov¹, Kirill M. Nyushko, Alexey A. Krasheninnikov, Elena A. Pudova¹, Zulfiya G Guvatova¹, Boris Alekseev, Marina V. Kiseleva, Andrey Kaprin, Alexey A. Dmitriev¹, Anastasiya V. Snezhkina¹, George S. Krasnov¹

Engelhardt Institute of Molecular Biology¹

01 Feb 2019-Journal of Bioinformatics and Computational Biology

TL;DR: Bioinformatic analysis of the RNA-seq data deposited in The Cancer Genome Atlas consortium database revealed at least six genes that could serve as a prognostic marker of locally advanced lymph node-positive PCa.

Abstract: Prostate cancer (PCa) is one of the primary causes of cancer-related mortality in men worldwide. Patients with locally advanced PCa with metastases in regional lymph nodes are usually marked as a h...

12 citations

Journal Article•DOI•

Using two-dimensional convolutional neural networks for identifying GTP binding sites in Rab proteins.

Nguyen Quoc Khanh Le¹,², Quang-Thai Ho¹, Yu-Yen Ou¹

Yuan Ze University¹, Nanyang Technological University²

01 Feb 2019-Journal of Bioinformatics and Computational Biology

TL;DR: An effective model for predicting GTP binding sites in Rab proteins is provided, along with a basis for further research applying deep learning in bioinformatics, especially in nucleotide binding site prediction.

Abstract: Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requiremen...

12 citations

Journal Article•DOI•

Refined template selection and combination algorithm significantly improves template-based modeling accuracy

Ashish Runthala¹, Shibasish Chowdhury¹

Birla Institute of Technology and Science¹

01 Apr 2019-Journal of Bioinformatics and Computational Biology

TL;DR: It has been shown that the inclusion of structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to maximally as well as reliably span the target sequence and construct its near-native model.

Abstract: In contrast to ab-initio protein modeling methodologies, comparative modeling is considered the most popular and reliable approach to modeling protein structure. However, the selection of the best set of templates is still a major challenge. An effective template-ranking algorithm is developed to efficiently select only the reliable hits for predicting the protein structures. The algorithm employs the pairwise as well as multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures key sequence and structural information of template hits and converts it into scores to effectively rank them. This selected set of templates is used to model a target. Modeling accuracy of the algorithm is tested and evaluated on TBM-HA domain containing CASP8, CASP9 and CASP10 targets. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM_Score by 3.531, 4.814 and 0.022, respectively. Further, it has been shown that the inclusion of structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to maximally as well as reliably span the target sequence and construct its near-native model. The optimal model sampling also holds the key to predicting the best possible target structure.

12 citations

Journal Article•DOI•

Measuring consistency among gene set analysis methods: A systematic study.

Farhad Maleki¹, Katie Ovens¹, Daniel J. Hogan¹, Elham Rezaei², Alan M. Rosenberg², Anthony Kusalik¹

University of Saskatchewan¹, Royal University Hospital²

20 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: It is observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, that some methods were biased toward reporting large gene sets, and that the methods disagreed substantially on both the 20 and the 100 most statistically significant gene sets.

Abstract: Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.

11 citations

Journal Article•DOI•

Deep learning with evolutionary and genomic profiles for identifying cancer subtypes.

Chun-Yu Lin¹, Peiying Ruan², Ruiming Li¹, Jinn-Moon Yang³, Simon See², Jiangning Song⁴, Tatsuya Akutsu¹

Kyoto University¹, Nvidia², National Chiao Tung University³, Monash University⁴

01 Jun 2019-Journal of Bioinformatics and Computational Biology

TL;DR: Comprehensive analysis of eight cancer types demonstrates that the evolutionary conservation-based models represent a valid and helpful approach for identifying cancer subtypes and that the core gene set offers distinguishable clues of cancer subtypes.

Abstract: Cancer subtype identification is an unmet need in precision diagnosis. Recently, evolutionary conservation has been indicated to contain informative signatures for functional significance in cancers. However, the importance of evolutionary conservation in distinguishing cancer subtypes remains largely unclear. Here, we identified the evolutionarily conserved genes (i.e. core genes) and observed that they are primarily involved in cellular pathways relevant to cell growth and metabolisms. By using these core genes, we developed two novel strategies, namely a feature-based strategy (FES) and an image-based strategy (IMS) by integrating their evolutionary and genomic profiles with the deep learning algorithm. In comparison with the FES using the random set and the strategy using the PAM50 classifier, the core gene set-based FES achieved a higher accuracy for identifying breast cancer subtypes. The IMS and FES using the core gene set yielded better performances than the other strategies, in terms of classifying both breast cancer subtypes and multiple cancer types. Moreover, the IMS is reproducible even using different gene expression data (i.e. RNA-seq and microarray). Comprehensive analysis of eight cancer types demonstrates that our evolutionary conservation-based models represent a valid and helpful approach for identifying cancer subtypes and the core gene set offers distinguishable clues of cancer subtypes.

10 citations

Journal Article•DOI•

Bioinformatics research at BGRS \ SB-2018.

Yuriy L. Orlov, Ralf Hofestädt¹, Tatiana V. Tatarinova²

Bielefeld University¹, University of La Verne²

01 Feb 2019-Journal of Bioinformatics and Computational Biology

Journal Article•DOI•

PyPredT6: A python-based prediction tool for identification of Type VI effector proteins.

Rishika Sen¹, Losiana Nayak¹, Rajat K. De¹

Indian Statistical Institute¹

09 Jul 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A Python-based standalone tool, called PyPredT6, is designed, and in silico prediction of T6 effector proteins in Vibrio cholerae and Yersinia pestis is performed to establish the applicability of PyPredT6.

Abstract: Prediction of effector proteins is of paramount importance due to their crucial role as first-line invaders while establishing a pathogen-host interaction, often leading to infection of the host. Prediction of T6 effector proteins is a new challenge since the discovery of T6 Secretion System and the unique nature of the particular secretion system. In this paper, we have first designed a Python-based standalone tool, called PyPredT6, to predict T6 effector proteins. A total of 873 unique features have been extracted from the peptide and nucleotide sequences of the experimentally verified effector proteins. Based on these features and using machine learning algorithms, we have performed in silico prediction of T6 effector proteins in Vibrio cholerae and Yersinia pestis to establish the applicability of PyPredT6. PyPredT6 is available at http://projectphd.droppages.com/PyPredT6.html.

Journal Article•DOI•

MoRFPred_en: Sequence-based prediction of MoRFs using an ensemble learning strategy.

Chun Fang¹, Yoshitaka Moriwaki², Caihong Li¹, Kentaro Shimizu²

Shandong University of Technology¹, University of Tokyo²

01 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: This study proposes an ensemble learning strategy, named MoRFPred_en, to predict MoRFs from protein sequences, which combines four submodels that utilize different sequence-derived features for the prediction, including a multichannel one-dimensional convolutional neural network (CNN_1D multichannel) based model.

Abstract: Molecular recognition features (MoRFs) usually act as "hub" sites in the interaction networks of intrinsically disordered proteins (IDPs). Because an increasing number of serious diseases have been found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we propose an ensemble learning strategy, named MoRFPred_en, to predict MoRFs from protein sequences. This approach combines four submodels that utilize different sequence-derived features for the prediction, including a multichannel one-dimensional convolutional neural network (CNN_1D multichannel) based model, two deep two-dimensional convolutional neural network (DCNN_2D) based models, and a support vector machine (SVM) based model. When compared with other methods on the same datasets, the MoRFPred_en approach produced better results than existing state-of-the-art MoRF prediction methods, achieving an AUC of 0.762 on the VALIDATION419 dataset, 0.795 on the TEST45 dataset, and 0.776 on the TEST49 dataset. Availability: http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/MoRFPred_en.php.

Journal Article•DOI•

Integrating network topology, gene expression data and GO annotation information for protein complex prediction.

Wei Zhang¹, Jia Xu¹, Yuanyuan Li², Xiufen Zou³

East China Jiaotong University¹, Wuhan Institute of Technology², Wuhan University³

14 Mar 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A new method is proposed, Identification of Protein Complex based on Refined Protein Interaction Network (IPC-RPIN), which integrates the topology, gene expression profiles and GO functional annotation information to predict protein complexes from the reconstructed networks.

Abstract: The prediction of protein complexes based on the protein interaction network is a fundamental task for the understanding of cellular life as well as the mechanisms underlying complex disease. A great number of methods have been developed to predict protein complexes based on protein-protein interaction (PPI) networks in recent years. However, because the high throughput data obtained from experimental biotechnology are incomplete, and usually contain a large number of spurious interactions, most of the network-based protein complex identification methods are sensitive to the reliability of the PPI network. In this paper, we propose a new method, Identification of Protein Complex based on Refined Protein Interaction Network (IPC-RPIN), which integrates the topology, gene expression profiles and GO functional annotation information to predict protein complexes from the reconstructed networks. To demonstrate the performance of the IPC-RPIN method, we evaluated the IPC-RPIN on three PPI networks of Saccharomyces cerevisiae and compared it with four state-of-the-art methods. The simulation results show that the IPC-RPIN achieved a better result than the other methods on most of the measurements and is able to discover small protein complexes which have traditionally been neglected.

Journal Article•DOI•

Numerical algorithm for morphogen synthesis region identification with indirect image-type measurement data.

Alexey Penenko¹, Ulyana Zubairova, Zhadyra Mukatova¹, S. A. Nikolaev

Novosibirsk State University¹

01 Feb 2019-Journal of Bioinformatics and Computational Biology

TL;DR: The sensitivity operator, composed of the independent adjoint problem solutions ensemble, allows transforming the inverse problem to the family of nonlinear ill-posed operator equations, and is applied to the morphogen synthesis region identification problem for the model of regulation of the renewing zone size in biological tissue.

Abstract: Diffusion-reaction models are used to describe development processes in the framework of morphogen theory. The images of the concentration fields for the subset of the interacting morphogens are available. In order to interpret this data in terms of the model parameters, the inverse source problem is stated. The sensitivity operator, composed of the independent adjoint problem solutions ensemble, allows transforming the inverse problem to the family of nonlinear ill-posed operator equations. The equations are solved with the Newton-Kantorovich-type algorithm. The approach is applied to the morphogen synthesis region identification problem for the model of regulation of the renewing zone size in biological tissue.

Journal Article•DOI•

Toxicity prediction of small drug molecules of androgen receptor using multilevel ensemble model.

Vishan Kumar Gupta¹, Prashant Singh Rana¹

Thapar University¹

20 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: In this study, efforts are made to develop a quantitative structure-activity relationship (QSAR)-based model, which is used for the prediction of toxicities to reduce testing in animals, time, and money in the early stages of drug development.

Abstract: In this study, efforts are made to develop a quantitative structure–activity relationship (QSAR)-based model, which is used for the prediction of toxicities to reduce testing in animals, time, ...

Journal Article•DOI•

Predicting drug synergy for precision medicine using network biology and machine learning.

Ali Cuvitoglu¹, Joseph X. Zhou², Sui Huang², Zerrin Isik¹

Dokuz Eylül University¹, Institute for Systems Biology²

01 Apr 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A new classification model is presented to identify more effective anti-cancer drug pairs using an in silico network biology approach, based on the hypothesis that drug synergy comes from collective effects on the biological network.

Abstract: Identification of effective drug combinations for patients is an expensive and time-consuming procedure, especially for in vitro experiments. To accelerate the synergistic drug discovery process, we present a new classification model to identify more effective anti-cancer drug pairs using an in silico network biology approach. Based on the hypothesis that drug synergy comes from collective effects on the biological network, we developed six network biology features, including overlap and distance of drug perturbation networks, that were derived by using individual drug-perturbed transcriptome profiles and the relevant biological network analysis. Using publicly available drug synergy databases and three machine-learning (ML) methods, the model was trained to discriminate the positive (synergistic) and negative (nonsynergistic) drug combinations. The proposed models were evaluated on the test cases to predict the most promising network biology feature, which is the network degree activity, i.e. the synergistic effect between drug pairs is mainly accounted for by the complementary signaling pathways or molecular networks from two drugs.

Journal Article•DOI•

GEREDB: Gene expression regulation database curated by mining abstracts from literature.

Tinghua Huang¹, Xiali Huang¹, Bomei Shi¹, Min Yao¹

Yangtze University¹

16 Oct 2019-Journal of Bioinformatics and Computational Biology

TL;DR: GEREDB is a publicly available, manually curated biological database that stores the data regarding relationships between expression and regulation of human genes and has the ability to analyze user-supplied gene expression data in a causal analysis oriented manner using the GEREA bioinformatics tool.

Abstract: Understanding how genes are expressed and regulated in different biological processes is a fundamental and challenging issue. Considerable progress has been made in studying the relationship betwee...

Journal Article•DOI•

Inference of genetic networks using random forests: Assigning different weights for gene expression data.

Shuhei Kimura¹, Masato Tokuhisa¹, Mariko Okada²

Tottori University¹, Osaka University²

16 Oct 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A new inference method is developed by modifying the existing random-forest-based inference method to take advantage of its ability to analyze both time-series and static gene expression data, which can be similarly applied to many of the other existing inference methods.

Abstract: In using gene expression levels for genetic network inference, we believe that two measurements that are similar to each other are less informative than two measurements that differ from each other. Given, for example, that gene expression levels measured at two adjacent time points in a time-series experiment are often similar to each other, we assume that each measurement in the time-series experiment will be less informative than each measurement in a steady-state experiment. Based on this idea, we propose a new inference method that relies heavily on informative gene expression data. Through numerical experiments, we prove that the quality of an inferred genetic network is slightly improved by heavily weighting informative gene expression data. In this study, we develop a new method by modifying the existing random-forest-based inference method to take advantage of its ability to analyze both time-series and static gene expression data. The idea we propose can be similarly applied to many of the other existing inference methods, as well.

Journal Article•DOI•

A new LSTM-based gene expression prediction model: L-GEPM.

Huiqing Wang¹, Chun Li¹, Jianhui Zhang¹, Jingjing Wang¹, Yue Ma¹, Yuanyuan Lian¹

Taiyuan University of Technology¹

16 Oct 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A gene expression prediction model, L-GEPM, based on long short-term memory (LSTM) neural networks, which captures the nonlinear features affecting gene expression and uses learned features to predict the target genes is proposed.

Abstract: Molecular biology combined with in silico machine learning and deep learning has facilitated the broad application of gene expression profiles for gene function prediction, optimal crop breeding, disease-related gene discovery, and drug screening. Although the acquisition cost of genome-wide expression profiles has been steadily declining, the cost of generating a compendium of expression profiles from thousands of samples remains high. The Library of Integrated Network-Based Cellular Signatures (LINCS) program used approximately 1000 landmark genes to predict the expression of the remaining target genes by linear regression; however, this approach ignored the nonlinear features influencing gene expression relationships, limiting the accuracy of the experimental results. We herein propose a gene expression prediction model, L-GEPM, based on long short-term memory (LSTM) neural networks, which captures the nonlinear features affecting gene expression and uses learned features to predict the target genes. A comparison of the experimental errors and fitting effects of different prediction models shows that the LSTM neural network-based model, L-GEPM, achieves low error and a superior fitting effect.

Journal Article•DOI•

PVsiRNAPred: Prediction of plant exclusive virus-derived small interfering RNAs by deep convolutional neural network

Bifang He¹,², Jian Huang², Heng Chen¹

Guizhou University¹, University of Electronic Science and Technology of China²

01 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: PVsiRNAPred is the first bioinformatics algorithm for predicting plant vsiRNAs based on vsiRNA sequence composition and has favorable generalization capabilities, which are hoped to allow efficient discovery of new vsiRNAs.

Abstract: Plant exclusive virus-derived small interfering RNAs (vsiRNAs) regulate various biological processes and are especially important in antiviral immunity. The identification of plant vsiRNAs is important fo...

Journal Article•DOI•

Glycine-induced formation and druggability score prediction of protein surface pockets.

Pietro Bongini¹, Neri Niccolai², Monica Bianchini²•Institutions (2)

University of Florence¹, University of Siena²

11 Jun 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A new idea is proposed for designing mutated proteins whose surfaces form more spacious transient pockets and are therefore more suitable for hosting drugs.

Abstract: It is now well established that most human diseases not related to pathogen infections originate from DNA disorders. Thus, DNA mutations, waiting for the availability...

Journal Article•DOI•

TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network.

Jamshid Pirgazi¹, Alireza Khanteymoori¹, Maryam Jalilkhani¹•Institutions (1)

University of Zanjan¹

01 Jun 2019-Journal of Bioinformatics and Computational Biology

TL;DR: The evaluation indicates that the proposed method recognizes regulatory relations well in the Bayesian modeling process, owing to its use of biological knowledge hidden in the data collection, and infers gene regulatory networks on par with important methods in this field.

Abstract: In this study, in order to deal with the noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, that have the ability to use prior knowledge, were used to i...

Journal Article•DOI•

From molecular energy landscapes to equilibrium dynamics via landscape analysis and Markov state models.

Kazi Lutful Kabir¹, Nasrin Akhter¹, Amarda Shehu¹•Institutions (1)

George Mason University¹

01 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: The hypothesis that basins, directly tied to stable and semi-stable states, lead to better models of dynamics is evaluated; the analysis indicates that basins lead to MSMs of better quality and thus can be useful to further advance this widely used technology for summarization of molecular equilibrium dynamics.

Abstract: Molecular dynamics (MD) simulation software allows probing the equilibrium structural dynamics of a molecule of interest, revealing how a molecule navigates its structure space one structure at a time. To obtain a broader view of dynamics, typically one needs to launch many such simulations, obtaining many trajectories. A summarization of the equilibrium dynamics requires integrating the information in the various trajectories, and Markov State Models (MSMs) are increasingly being used for this task. At its core, the task involves organizing the structures accessed in simulation into structural states, and then constructing a transition probability matrix revealing the transitions between states. Although MSMs are now considered a mature technology and are widely used to summarize equilibrium dynamics, the underlying computational process in the construction of an MSM ignores energetics, even though the transition of a molecule between two nearby structures in an MD trajectory is governed by the corresponding energies. In this paper, we connect theory with simulation and analysis of equilibrium dynamics. A molecule navigates the energy landscape underlying the structure space. The structural states that are identified via off-the-shelf clustering algorithms need to be connected to thermodynamically stable and semi-stable (macro)states among which transitions can then be quantified. Leveraging recent developments in the analysis of energy landscapes that identify basins in the landscape, we evaluate the hypothesis that basins, directly tied to stable and semi-stable states, lead to better models of dynamics. Our analysis indicates that basins lead to MSMs of better quality and thus can be useful to further advance this widely used technology for summarization of molecular equilibrium dynamics.
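
The core MSM construction step described in the abstract, counting transitions between states and turning the counts into a transition probability matrix, can be sketched as follows. This is an illustrative sketch only: the trajectories are assumed to be already assigned to integer state labels (whether by clustering or by basins is outside the sketch), and `transition_matrix` and its `lag` parameter are hypothetical names, not taken from the paper.

```python
import numpy as np

def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts between discrete states at a given lag."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later in the same trajectory.
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that are never left keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Two toy trajectories over three states (assumption: labels 0, 1, 2).
trajs = [[0, 0, 1, 1, 2], [2, 1, 0, 0]]
T = transition_matrix(trajs, n_states=3)
```

Transitions are counted within each trajectory only, so the end of one simulation is never linked to the start of another.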

Journal Article•DOI•

IntaRNAhelix-composing RNA-RNA interactions from stable inter-molecular helices boosts bacterial sRNA target prediction.

Rick Gelhausen¹, Sebastian Will², Ivo L. Hofacker², Rolf Backofen¹, Martin Raden¹•Institutions (2)

University of Freiburg¹, University of Vienna²

20 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: IntaRNAhelix, a dynamic programming algorithm that length-restricts the runs of consecutive inter-molecular base pairs (perfect canonical stackings), a restriction hypothesized to implicitly model steric and kinetic effects, is implemented and compared with the current state-of-the-art approach.

Abstract: Efficient computational tools for the identification of putative target RNAs regulated by prokaryotic sRNAs rely on thermodynamic models of RNA secondary structures. While they typically predict RN...

Journal Article•DOI•

Prediction of oxidoreductase subfamily classes based on RFE-SND-CC-PSSM and machine learning methods.

Fang Yuan¹, Gan Liu², Xiwen Yang², Shunfang Wang², Xueren Wang²•Institutions (2)

Kunming Medical University¹, Yunnan University²

16 Oct 2019-Journal of Bioinformatics and Computational Biology

TL;DR: Applying the method to predict the 6 major types of enzymes improves prediction accuracy to 94.54%, indicating that the method has general applicability to other protein problems.

Abstract: Oxidoreductases are enzymes that exist widely in organisms. They play an important role in cellular energy metabolism and biotransformation processes. Oxidoreductases have many subclasses with different functions, creating an important classification task in bioinformatics. In this paper, a dataset of 2640 oxidoreductase sequences was used to perform an analysis and comparison. The idea of dipeptides was introduced to process the Position Specific Score Matrix (PSSM), since each dipeptide consists of two amino acids and each column of the PSSM corresponds to the information of one amino acid. Two kinds of dipeptide scores were proposed, the Standardization Normal Distribution PSSM (SND-PSSM) and the Correlation Coefficient PSSM (CC-PSSM). Recursive Feature Elimination (RFE) was used to extract features from the SND-PSSM and CC-PSSM, and the two sets of extracted features were combined to form a new feature matrix, the RFE-SND-CC-PSSM. The results show that, with the proposed method and a kernel-based nonlinear SVM classifier, the accuracy can reach 95.56% by the Jackknife test. Our method greatly improves the accuracy of oxidoreductase subclass prediction. Using this method to predict the categories of the 6 major types of enzymes improves prediction accuracy to 94.54%, indicating that the method has general applicability to other protein problems. The results show that our method is effective and universally applicable, and might be complementary to the existing methods.
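
The Jackknife test mentioned in the abstract is leave-one-out validation: each sample is held out in turn, the classifier is trained on the rest, and the held-out sample is predicted. A minimal sketch follows. It is not the paper's pipeline: a nearest-centroid classifier stands in for the kernel SVM, the feature vectors are toy values rather than RFE-SND-CC-PSSM features, and the function names are hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (assumption): pick the class whose mean vector is closest."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

def jackknife_accuracy(X, y):
    """Hold out each sample once, train on the remainder, and score the prediction."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        if nearest_centroid_predict(X[mask], y[mask], X[i]) == y[i]:
            hits += 1
    return hits / len(y)

# Toy feature vectors for two well-separated classes (assumption).
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
acc = jackknife_accuracy(X, y)
```

Any classifier with a train-then-predict interface could replace `nearest_centroid_predict` without changing the validation loop.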

Journal Article•DOI•

Deeper investigation into the utility of functional class scoring in missing protein prediction from proteomics data.

Yaxing Zhao¹, Andrew C.-H. Sue¹, Wilson Wen Bin Goh²•Institutions (2)

Tianjin University¹, Nanyang Technological University²

05 May 2019-Journal of Bioinformatics and Computational Biology

TL;DR: While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised; FCS is also found to work best with big complexes.

Abstract: Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We update its performance evaluation using data derived from n...

Journal Article•DOI•

AUSPP: A universal short-read pre-processing package.

Lei Gao¹, Cong Wu¹, Lin Liu¹•Institutions (1)

Shenzhen University¹

01 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: AUSPP is a pipeline encompassing quality control, adaptor trimming, collapsing of reads, structural RNA removal, length selection, read mapping, and normalized wiggle file creation, making it a powerful tool for the steps before meta-analysis.

Abstract: There are many short-read aligners that can map short reads to a reference genome/sequence, and most of them can directly accept a FASTQ file as the input query file. However, the raw data usually need to be pre-processed. Few software programs specialize in pre-processing raw data generated by a variety of next-generation sequencing (NGS) technologies. Here, we present AUSPP, a Perl script-based pipeline for pre-processing and automatic mapping of NGS short reads. This pipeline encompasses quality control, adaptor trimming, collapsing of reads, structural RNA removal, length selection, read mapping, and normalized wiggle file creation. It facilitates the processing from raw data to genome mapping and is therefore a powerful tool for the steps before meta-analysis. Most importantly, since AUSPP has default processing pipeline settings for many types of NGS data, most of the time users simply need to provide the raw data and genome. AUSPP is portable and easy to install, and the source code is freely available at https://github.com/highlei/AUSPP.
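
Three of the pre-processing steps listed in the abstract (adaptor trimming, length selection, and collapsing of reads) can be illustrated with a short sketch. It does not reproduce AUSPP, which is a Perl pipeline: the sketch is in Python, uses exact-match adaptor trimming as a simplification, and the adaptor sequence, length bounds, and function names are all hypothetical.

```python
from collections import Counter

def trim_adaptor(read, adaptor):
    """Cut the read at the first exact occurrence of the adaptor, if present."""
    pos = read.find(adaptor)
    return read if pos == -1 else read[:pos]

def preprocess(reads, adaptor, min_len=18, max_len=30):
    """Trim adaptors, keep reads within the length window, collapse duplicates."""
    trimmed = (trim_adaptor(r, adaptor) for r in reads)
    selected = (r for r in trimmed if min_len <= len(r) <= max_len)
    # Collapsing: identical sequences are stored once with their copy number.
    return Counter(selected)

# Toy reads with a hypothetical 3' adaptor "TGGAATTC".
reads = [
    "ACGTACGTACGTACGTACGTTGGAATTC",
    "ACGTACGTACGTACGTACGTTGGAATTC",
    "GGGTTGGAATTC",
    "TTTTCCCCAAAAGGGGTTTTCC",
]
collapsed = preprocess(reads, adaptor="TGGAATTC")
```

Here the two identical reads collapse into one entry with count 2, the third read is discarded because only 4 bases remain after trimming, and the fourth read passes through unchanged.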

Journal Article•DOI•

Graph kernels combined with the neural network on protein classification.

Jiang Qiang-rong¹, Qiu Guang¹•Institutions (1)

Beijing University of Technology¹

20 Dec 2019-Journal of Bioinformatics and Computational Biology

TL;DR: A novel graph kernel, the vertex-edge similarity kernel (VES kernel), based on a mixed matrix is proposed; its innovation is to take the adjacency matrix of the graph as the sample vector of each vertex and to calculate kernel values by finding the most similar vertex pair of two graphs.

Abstract: At present, most research on protein classification is based on graph kernels. The essence of graph kernels is to extract substructures and use the similarity of substructures as the kernel values. In this paper, we propose a novel graph kernel named the vertex-edge similarity kernel (VES kernel), based on a mixed matrix, whose innovation is to take the adjacency matrix of the graph as the sample vector of each vertex and to calculate kernel values by finding the most similar vertex pair of two graphs. In addition, we combine the novel kernel with a neural network, and the experimental results show that the combination performs better than existing advanced methods.

Journal Article•DOI•

A new algorithm for DNA motif discovery using multiple sample sequence sets.

Qiang Yu¹, Xiang Zhao¹, Hongwei Huo¹•Institutions (1)

Xidian University¹

16 Oct 2019-Journal of Bioinformatics and Computational Biology

TL;DR: This paper proposes a new DNA motif discovery algorithm that has better time performance for large datasets and better accuracy in identifying infrequent motifs than the compared algorithms, and designs a new initial motif generation method that uses the entire dataset.

Abstract: DNA motif discovery plays an important role in understanding the mechanisms of gene regulation. Most existing motif discovery algorithms can identify motifs in an efficient and effective manner when dealing with small datasets. However, large datasets generated by high-throughput sequencing technologies pose a huge challenge: it is too time-consuming to process the entire dataset, but if only a small sample sequence set is processed, it is difficult to identify infrequent motifs. In this paper, we propose a new DNA motif discovery algorithm that first divides the input dataset into multiple sample sequence sets, then refines the initial motifs of each sample sequence set with the expectation maximization method, and finally combines the results from all sample sequence sets. In addition, we design a new initial motif generation method that uses the entire dataset, which helps to identify infrequent motifs. The experimental results on simulated data show that the proposed algorithm has better time performance for large datasets and better accuracy in identifying infrequent motifs than the compared algorithms. We have also verified the validity of the proposed algorithm on real data.
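
The three-stage outline in the abstract (divide into sample sequence sets, refine an initial motif per set with expectation maximization, combine the results) can be sketched as below. This is a simplified illustration, not the paper's algorithm: it assumes exactly one motif occurrence per sequence and a uniform background, takes the initial motif as a given seed string rather than generating it from the whole dataset, and combines per-set results by averaging position weight matrices, which is an assumption. All names (`em_refine`, `discover`, `n_sets`) are hypothetical.

```python
import numpy as np

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}

def em_refine(seqs, seed, iters=10, pseudo=0.1):
    """Refine a seed motif into a position weight matrix (PWM) by EM."""
    w = len(seed)
    cols = np.arange(w)
    pwm = np.full((w, 4), 0.1)
    pwm[cols, [INDEX[b] for b in seed]] = 0.7  # seed base gets most of the mass
    encoded = [np.array([INDEX[b] for b in s]) for s in seqs]
    for _ in range(iters):
        counts = np.full((w, 4), pseudo)
        for enc in encoded:
            starts = range(len(enc) - w + 1)
            # E-step: posterior over the motif start position in this sequence.
            like = np.array([pwm[cols, enc[p:p + w]].prod() for p in starts])
            z = like / like.sum()
            # M-step accumulation: expected base counts per motif column.
            for p in starts:
                counts[cols, enc[p:p + w]] += z[p]
        pwm = counts / counts.sum(axis=1, keepdims=True)
    return pwm

def discover(seqs, seed, n_sets=2):
    """Split into sample sets, refine on each, combine by averaging PWMs (assumption)."""
    sample_sets = [seqs[i::n_sets] for i in range(n_sets)]
    combined = np.mean([em_refine(s, seed) for s in sample_sets], axis=0)
    return "".join(BASES[i] for i in combined.argmax(axis=1))

# Toy data: the motif "ACGTAC" planted at varying offsets; the seed has one mismatch.
seqs = ["T" * k + "ACGTAC" + "G" * (8 - k) for k in range(8)]
motif = discover(seqs, seed="ACGTAA")
```

Because each sample set is refined independently, the per-set EM runs could be executed in parallel, which is one way the divide-and-combine design can reduce running time on large datasets.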
