
Showing papers in "IEEE/ACM Transactions on Computational Biology and Bioinformatics" in 2022


Journal Article
TL;DR: MatchMaker predicts drug synergy scores from drug chemical structure information and cell-line gene expression profiles in a deep learning framework, and it uses DrugComb, the largest known drug combination dataset to date.
Abstract: Drug combination therapies have been a viable strategy for the treatment of complex diseases such as cancer due to increased efficacy and reduced side effects. However, experimentally validating all possible combinations for synergistic interaction, even with high-throughput screens, is intractable due to the vast combinatorial search space. Computational techniques can reduce the number of combinations to be evaluated experimentally by prioritizing promising candidates. We present MatchMaker, which predicts drug synergy scores using drug chemical structure information and gene expression profiles of cell lines in a deep learning framework. For the first time, our model utilizes the largest known drug combination dataset to date, DrugComb. We compare the performance of MatchMaker with state-of-the-art models and observe improvements of up to ∼15% in correlation and ∼33% in mean squared error (MSE) over the next best method. We investigate the cell types and drug pairs that are relatively harder to predict and present novel candidate pairs. MatchMaker is available at https://github.com/tastanlab/matchmaker.

29 citations


Journal Article
TL;DR: This paper proposes a Damping Multi-Verse Optimizer (DMVO) algorithm to construct a DNA coding set that is used as the non-payload.
Abstract: Huge amounts of data are now being produced every second, a situation that will gradually overwhelm current storage technology. DNA is a storage medium with high storage density and long-term stability and is now considered a feasible storage solution. However, errors are easily introduced during DNA synthesis and sequencing. To reduce the error rate, novel uncorrelated address constraints are introduced, and a Damping Multi-Verse Optimizer (DMVO) algorithm is proposed to construct a DNA coding set that is used as the non-payload. The DMVO algorithm exchanges objects through black/white holes to reach a stable state and adds damping factors as disturbances. Compared with previous work, the coding set obtained by the DMVO algorithm is larger and of higher quality: the size of the DNA storage coding set increased by 4-16 percent, and the variance of the melting temperature decreased by 3-18 percent.

26 citations
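
The abstract does not spell out the full constraint set, so the following is only a hedged illustration of what "a coding set that satisfies constraints" means. It checks three constraints that are common in the DNA-storage coding literature (balanced GC content, a bounded homopolymer run, and a minimum Hamming distance between words) and fills the set greedily by brute force. The greedy search is a stand-in and does not implement DMVO, the paper's uncorrelated address constraint is not modeled, and `greedy_code` and its parameters are hypothetical names.

```python
from itertools import product


def gc_count(seq: str) -> int:
    """Number of G or C bases in a DNA word."""
    return sum(base in "GC" for base in seq)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer run (e.g. 'AAAC' -> 3)."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def greedy_code(length: int, min_dist: int, max_run: int = 2) -> list[str]:
    """First-fit construction of a coding set under three illustrative
    constraints: exactly 50% GC content, no homopolymer run longer than
    max_run, and Hamming distance >= min_dist to every accepted word."""
    code: list[str] = []
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        if 2 * gc_count(word) != length:
            continue  # GC-content constraint
        if longest_run(word) > max_run:
            continue  # homopolymer (run-length) constraint
        if all(hamming(word, other) >= min_dist for other in code):
            code.append(word)  # Hamming-distance constraint
    return code


if __name__ == "__main__":
    words = greedy_code(length=6, min_dist=3)
    print(len(words), words[:5])
```

The length of the returned list is the kind of quantity (coding set size under constraints) that the abstract reports improving.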


Journal Article
TL;DR: GraphDRP represents drugs as molecular graphs that directly capture the bonds among atoms and cell lines as binary vectors of genomic aberrations; the response value of each drug-cell line pair is then predicted by a fully-connected neural network.
Abstract: Background: Drug response prediction is an important problem in computational personalized medicine. Many machine-learning-based methods, especially deep learning-based ones, have been proposed for this task. However, these methods often represent drugs as strings, which are not a natural way to depict molecules. Also, interpretation (e.g., which mutations or copy number aberrations contribute to the drug response) has not been considered thoroughly. Methods: In this study, we propose a novel method, GraphDRP, based on graph convolutional networks. In GraphDRP, drugs are represented as molecular graphs that directly capture the bonds among atoms, while cell lines are depicted as binary vectors of genomic aberrations. Representative features of drugs and cell lines are learned by convolution layers and then combined to represent each drug-cell line pair. Finally, the response value of each drug-cell line pair is predicted by a fully-connected neural network. Four variants of graph convolutional networks were used to learn the drug features. Results: GraphDRP outperforms tCNNS in all performance measures across all experiments. Also, through saliency maps of the resulting GraphDRP models, we discovered the contribution of the genomic aberrations to the responses. Conclusion: Representing drugs as graphs can improve the performance of drug response prediction. Availability of data and materials: Data and source code can be downloaded at https://github.com/hauldhut/GraphDRP.

25 citations
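
GraphDRP uses four GCN variants whose details the abstract omits. As a minimal sketch of the idea of learning drug features directly from a molecular graph, the code below applies one plain GCN propagation step (the symmetric-normalized rule of Kipf and Welling) to a toy molecule and then pools the atom vectors into one drug vector. The global max pooling readout, the toy ethanol heavy-atom graph, the one-hot atom features, and all function names are assumptions for illustration.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each atom keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def readout(node_feats: Matrix) -> list[float]:
    """Global max pooling over atoms -> one fixed-size drug vector."""
    return [max(col) for col in zip(*node_feats)]


# Heavy-atom graph of ethanol (C-C-O) with one-hot [is_C, is_O] atom features.
ADJ = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
FEATS = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    hidden = gcn_layer(ADJ, FEATS, IDENTITY)
    print(readout(hidden))
```

In a trained model `weight` would be learned and several such layers stacked; the pooled drug vector would then be combined with the cell-line vector before the fully-connected predictor.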


Journal Article
TL;DR: Wang et al. propose TANMF, a novel algorithm that detects dynamic modules in cancer temporal attributed networks by integrating temporal networks with gene attributes.
Abstract: Tracking dynamic modules (modules that change over time) during cancer progression is essential for studying cancer pathogenesis, diagnosis, and therapy. However, current algorithms only detect dynamic modules from temporal cancer networks without integrating heterogeneous genomic data, which results in undesirable performance. To address this issue, we propose a novel algorithm (TANMF) to detect dynamic modules in cancer temporal attributed networks, which integrates the temporal networks and gene attributes. To obtain the dynamic modules, temporality and gene attributes are incorporated into an overall objective function, which transforms dynamic module detection into an optimization problem. TANMF jointly decomposes the snapshots at two subsequent time steps to obtain the latent features of dynamic modules, where the attributes are fused via regularization. Furthermore, an L1 constraint is imposed to improve robustness. Experimental results demonstrate that TANMF outperforms state-of-the-art methods in terms of accuracy. When TANMF is applied to breast cancer data, the obtained dynamic modules are more enriched in known pathways and are associated with patients' survival time. The proposed model and algorithm provide an effective way to perform integrative analysis of heterogeneous omics data.

24 citations


Journal Article
TL;DR: Wang et al. develop LPI-DLDN, a deep learning framework with a dual-net neural architecture for finding potential lncRNA-protein interactions (LPIs); it integrates various biological features and selects the optimal LPI feature subset based on feature importance ranking.
Abstract: The identification of lncRNA-protein interactions (LPIs) is important for understanding the biological functions and molecular mechanisms of lncRNAs. However, most computational models are evaluated on a single dataset, which results in prediction bias. Furthermore, previous models have not uncovered potential proteins (or lncRNAs) that interact with a new lncRNA (or protein). Finally, the performance of these models leaves room for improvement. In this study, we develop a Deep Learning framework with Dual-net Neural architecture to find potential LPIs (LPI-DLDN). First, five LPI datasets are collected. Second, the features of lncRNAs and proteins are extracted by Pyfeat and BioTriangle, respectively. Third, these features are concatenated into a vector after dimension reduction. Finally, a deep learning model with a dual-net neural architecture is designed to classify lncRNA-protein pairs. LPI-DLDN is compared with six state-of-the-art LPI prediction methods (LPI-XGBoost, LPI-HeteSim, LPI-NRLMF, PLIPCOM, LPI-CNNCP, and Capsule-LPI) under four cross-validation settings. The results demonstrate the strong LPI classification performance of LPI-DLDN. Case studies suggest that there may be interactions between RP11-439E19.10 and Q15717, and between RP11-196G18.22 and Q9NUL5. The novelty of LPI-DLDN lies in integrating various biological features, designing a novel deep learning-based LPI identification framework, and selecting the optimal LPI feature subset based on feature importance ranking.

23 citations


Journal Article
TL;DR: This study combines multi-kernel learning and transfer learning in a feature-level multi-modality fusion model for settings with insufficient training samples; the model outperforms the baselines in most scenarios.
Abstract: With the development of sensors, more and more multimodal data are being accumulated, especially in the biomedical and bioinformatics fields, which makes multimodal data analysis increasingly important and urgent. In this study, we combine multi-kernel learning and transfer learning and propose a feature-level multi-modality fusion model for settings with insufficient training samples. Specifically, we first extend kernel ridge regression to its multi-kernel version under an lp-norm constraint to explore complementary patterns contained in multimodal data. We then use marginal probability distribution adaptation to minimize the distribution differences between the source domain and the target domain, addressing the problem of insufficient training samples. Based on epilepsy EEG data provided by the University of Bonn, we construct 12 multi-modality and transfer scenarios to evaluate our model. Experimental results show that our model outperforms the baselines in most scenarios.

19 citations
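
As a rough sketch of the first step only, multi-kernel ridge regression can be written as ordinary kernel ridge regression on a weighted sum of base kernels, alpha = (K + lambda*I)^-1 y. The abstract does not say how the kernel weights are obtained, so here they are fixed by hand and merely rescaled to unit lp-norm; the Gaussian (RBF) base kernels, the toy data, and the function names are assumptions, and the transfer-learning step (marginal distribution adaptation) is omitted entirely.

```python
import math

Matrix = list[list[float]]


def rbf_kernel(xs: Matrix, gamma: float) -> Matrix:
    """Gaussian kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z))) for z in xs] for x in xs]


def solve(a: Matrix, b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def combine_kernels(kernels: list[Matrix], weights: list[float], p: float) -> tuple[Matrix, list[float]]:
    """Rescale nonnegative kernel weights to unit lp-norm and sum the kernels."""
    scale = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    mu = [w / scale for w in weights]
    n = len(kernels[0])
    combined = [[sum(m * k[i][j] for m, k in zip(mu, kernels)) for j in range(n)] for i in range(n)]
    return combined, mu


def fit_mkrr(kernels: list[Matrix], weights: list[float], y: list[float],
             lam: float, p: float = 2.0) -> tuple[list[float], list[float], list[float]]:
    """Kernel ridge regression on the combined kernel; returns alpha, mu, fitted values."""
    k, mu = combine_kernels(kernels, weights, p)
    n = len(y)
    reg = [[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(reg, y)
    fitted = [sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    return alpha, mu, fitted
```

In a multimodal setting, each base kernel would typically be computed from a different modality (or feature view); two bandwidths on one toy feature stand in for that here.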


Journal Article
TL;DR: This paper proposes a graph-in-graph neural network with an attention mechanism to account for changes in target representation caused by binding effects; a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex.
Abstract: Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied to the drug-target affinity (DTA) problem. However, previous deep learning-based methods do not model the direct interactions between the drug and protein residues. This can lead to inaccurate learning of the target representation, which may change due to drug binding effects. In addition, previous DTA methods learn protein representations solely from the small number of protein sequences in DTA datasets, neglecting proteins outside of these datasets. We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with an attention mechanism, to address changes in target representation caused by binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex. The resulting model is an expressive deep nested graph neural network. We also use pre-trained protein representations drawn from recent efforts in learning contextualized protein representations. Experiments are conducted under different settings to evaluate scenarios such as novel drugs or targets. The results demonstrate the effectiveness of the pre-trained protein embeddings and the advantages of GEFA in modeling the nested graph for drug-target interaction.

18 citations


Journal Article
TL;DR: This article presents a comprehensive review of state-of-the-art computational methods for detecting drug-drug interactions, grouped into three categories: literature-based extraction methods, machine learning-based prediction methods, and pharmacovigilance-based data mining methods.
Abstract: The detection of drug-drug interactions (DDIs) is a crucial task for drug safety surveillance, supporting effective and safe co-prescription of multiple drugs. Since laboratory studies are often complicated, costly and time-consuming, there is an urgent need for computational approaches to detect DDIs. In this paper, we conduct a comprehensive review of state-of-the-art computational methods falling into three categories: literature-based extraction methods, machine learning-based prediction methods and pharmacovigilance-based data mining methods. Literature-based extraction methods detect DDIs from published literature using natural language processing techniques; machine learning-based prediction methods build prediction models from the known DDIs in databases and predict novel ones; pharmacovigilance-based data mining methods usually apply statistical techniques to various electronic data to detect DDI signals. We first present a taxonomy of DDI detection methods and outline the three categories. We then introduce the research background and data sources of each category and illustrate their representative approaches and evaluation metrics. Finally, we discuss the current challenges of existing methods and highlight potential opportunities for future research.

18 citations


Journal Article
TL;DR: Wang et al. use an autoencoder to denoise multiple lncRNA features and multiple disease features separately, and then employ a matrix decomposition algorithm to predict potential lncRNA-disease associations.
Abstract: It has been shown that long noncoding RNA (lncRNA) plays critical roles in many human diseases. Therefore, inferring associations between lncRNAs and diseases can contribute to disease diagnosis, prognosis and treatment. To overcome the limitations of traditional experimental methods, which are expensive and time-consuming, several computational methods have been proposed to predict lncRNA-disease associations by fusing different biological data. However, the performance of lncRNA-disease association prediction still needs improvement. In this study, we propose a computational model (named LDICDL) to identify lncRNA-disease associations based on collaborative deep learning. It uses an autoencoder to denoise multiple lncRNA features and multiple disease features separately. Then, a matrix decomposition algorithm is employed to predict potential lncRNA-disease associations. In addition, to overcome the limitations of matrix decomposition, a hybrid model is developed to predict associations between new lncRNAs (or diseases) and diseases (or lncRNAs). Ten-fold cross-validation and de novo tests are applied to evaluate the method. The experimental results show that LDICDL outperforms other state-of-the-art methods in prediction performance.

18 citations


Journal Article
TL;DR: Wang et al. propose DrNMF, an integrative framework for dynamic module detection based on regularized nonnegative matrix factorization that integrates gene expression data and the protein interaction network.
Abstract: Cancer progression is dynamic, and tracking dynamic modules is promising for cancer diagnosis and therapy. Accumulated genomic data provide an opportunity to investigate the underlying mechanisms of cancers. However, to our knowledge, no algorithm has been designed to detect dynamic modules by integrating heterogeneous omics data. To address this issue, we propose an integrative framework for dynamic module detection based on regularized nonnegative matrix factorization (DrNMF) that integrates gene expression and the protein interaction network. To remove the heterogeneity of genomic data, we divide the samples of expression profiles into groups to construct gene co-expression networks. To characterize the dynamics of modules, a temporal smoothness framework is adopted, in which the gene co-expression network at the previous stage and the protein interaction network are incorporated into the objective function of DrNMF via regularization. The experimental results demonstrate that DrNMF is superior to state-of-the-art methods in terms of accuracy. For breast cancer data, the obtained dynamic modules are more enriched in known pathways and can be used to predict cancer stages and patients' survival time. The proposed model and algorithm provide an effective integrative analysis of heterogeneous genomic data for cancer progression.
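
DrNMF's objective adds regularization terms (temporal smoothness and the protein interaction network) whose exact form the abstract does not give. The sketch below therefore shows only the unregularized core: plain nonnegative matrix factorization X ~ W H with the Lee-Seung multiplicative updates, applied to a toy co-expression network that contains two obvious modules. Reading each gene's module off the largest entry in its row of W is a common convention and is used here as an assumption; `nmf` and `modules` are hypothetical names.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(x: Matrix, rank: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Unregularized NMF with multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialization (entries in [0.1, 1.1)).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def modules(w: Matrix) -> list[int]:
    """Assign each gene (row of W) to the module (column) with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


if __name__ == "__main__":
    # Toy co-expression network: genes 0-2 and genes 3-5 form two cliques.
    net = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
    w, _ = nmf(net, rank=2)
    print(modules(w))
```

A temporal variant in the spirit of the abstract would add penalty terms tying each snapshot's factors to the previous stage's network, which changes the update rules.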

Journal Article
TL;DR: Wang et al. propose GANDTI, a method based on a graph convolutional autoencoder and a generative adversarial network (GAN), to predict novel drug-target interactions.
Abstract: The computational prediction of novel drug-target interactions (DTIs) may effectively speed up drug repositioning and reduce its costs. Most previous methods integrated multiple kinds of connections between drugs and targets by constructing shallow prediction models. These methods failed to deeply learn low-dimensional feature vectors for drugs and targets and ignored the distribution of these feature vectors. We propose GANDTI, a method based on a graph convolutional autoencoder and a generative adversarial network (GAN), to predict DTIs. We constructed a drug-target heterogeneous network to integrate various connections related to drugs and targets, i.e., the similarities and interactions among drugs, the similarities and interactions among targets, and the interactions between drugs and targets. A graph convolutional autoencoder was established to learn the network embeddings of the drug and target nodes in a low-dimensional feature space, and the autoencoder deeply integrated different kinds of connections within the network. A GAN was introduced to regularize the feature vectors of nodes into a Gaussian distribution. Because severe class imbalance exists between known and unknown DTIs, we constructed a classifier based on the ensemble learning model LightGBM to estimate the interaction propensities of drugs and targets. This classifier exploited all unknown DTIs and counteracted the negative effect of class imbalance. The experimental results indicated that GANDTI outperforms several state-of-the-art methods for DTI prediction. Additionally, case studies of five drugs demonstrated the ability of GANDTI to discover potential targets for drugs.

Journal Article
TL;DR: This article proposes QRSS-MPA, a new version of the Marine Predator algorithm, to increase the lower bound of the coding set while satisfying a specific combination of constraints.
Abstract: In the era of massive data, growth in storage demand has far exceeded current storage capacity. DNA molecules provide a reliable solution for big data storage by virtue of their large capacity, high density, and long-term stability. To reduce errors during storage, constructing a sufficiently large coding set that satisfies the encoding constraints is critical for achieving DNA storage. This paper proposes a new version of the Marine Predator algorithm (QRSS-MPA) to increase the lower bound of the coding set while satisfying a specific combination of constraints. To demonstrate the effectiveness of the improvement, the classical CEC-05 test functions are used to compare the mean, variance, scalability, and significance. For storage, the lower bounds of the constructed coding sets are compared with previous work and are found to be significantly improved. To prevent the formation of secondary structures that lead to sequencing failure, we give a more stringent lower bound for the constraint coding set, which is of great significance for reducing the error rate of DNA storage amid its rapid development.

Journal Article
TL;DR: Wang et al. propose MHSADTI, an end-to-end deep learning method that predicts drug-target interactions (DTIs) using a graph attention network and a multi-head self-attention mechanism.
Abstract: Identifying drug-target interactions (DTIs) is an important step in new drug discovery and drug repositioning. Accurate DTI predictions can improve the efficiency of drug discovery and development. Although rapid advances in deep learning have produced various computational methods, how to design efficient networks for predicting DTIs still merits further investigation. In this study, we propose an end-to-end deep learning method (MHSADTI) to predict DTIs based on a graph attention network and a multi-head self-attention mechanism. First, the characteristics of drugs and proteins are extracted by the graph attention network and the multi-head self-attention mechanism, respectively. Then, attention scores are used to determine which amino acid subsequences in a protein are most important for predicting its interaction with the drug. Finally, we predict DTIs with a fully connected layer after obtaining the feature vectors of drugs and proteins. MHSADTI uses the self-attention mechanism to capture long-range contextual dependencies in amino acid sequences and to make DTI predictions interpretable. More effective molecular characteristics are also obtained by the attention mechanism in graph attention networks. Multiple cross-validation experiments are used to assess the performance of MHSADTI. Experiments on four datasets (human, C. elegans, DUD-E and DrugBank) show that our method outperforms state-of-the-art methods in terms of AUC, Precision, Recall, AUPR and F1-score. In addition, case studies further demonstrate that our method can provide effective visualizations for interpreting the prediction results from a biological perspective.
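
The abstract names multi-head self-attention over protein sequences without giving dimensions or projections. A minimal plain-Python sketch of the standard scaled dot-product formulation, softmax(Q K^T / sqrt(d)) V, is shown below. Each row of `x` stands for one embedded amino-acid subsequence (a hypothetical encoding); the projection matrices, the number of heads, and the omission of a final output projection are all assumptions. The returned attention weights are the kind of per-position scores that can be visualized for interpretation.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention; returns (outputs, attention weights)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


def multi_head(x: Matrix, heads: list[tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Run each head independently and concatenate their outputs per token."""
    outputs = [self_attention(x, *head)[0] for head in heads]
    return [[value for out in outputs for value in out[i]] for i in range(len(x))]
```

Concatenating heads lets each head attend to a different pattern of positions; real implementations usually follow the concatenation with a learned linear projection.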

Journal Article
TL;DR: Wang et al. propose DDI-IS-SL, which predicts drug-drug interactions based on integrated similarity and semi-supervised learning; it integrates drug chemical, biological and phenotype data and computes drug feature similarity with the cosine similarity method.
Abstract: A drug-drug interaction (DDI) is an association between two drugs in which the pharmacological effects of one drug are influenced by the other. Positive DDIs can usually improve patients' therapeutic outcomes, but negative DDIs are a major cause of adverse drug reactions and can even lead to drug withdrawal from the market and patient death. Therefore, identifying DDIs has become a key component of drug development and disease treatment. In this study, we propose a novel method to predict DDIs based on integrated similarity and semi-supervised learning (DDI-IS-SL). DDI-IS-SL integrates drug chemical, biological and phenotype data to calculate the feature similarity of drugs with the cosine similarity method. The Gaussian Interaction Profile kernel similarity of drugs is also calculated from known DDIs. A semi-supervised learning method (the Regularized Least Squares classifier) is used to calculate the interaction possibility scores of drug-drug pairs. Under 5-fold cross-validation, 10-fold cross-validation and de novo drug validation, DDI-IS-SL achieves better prediction performance than the compared methods. In addition, the average computation time of DDI-IS-SL is shorter than that of the compared methods. Finally, case studies further demonstrate the performance of DDI-IS-SL in practical applications.
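
The two drug-similarity measures named in the abstract can be sketched directly. Cosine similarity is standard; for the Gaussian Interaction Profile (GIP) kernel, the sketch uses the usual definition from the interaction-prediction literature, in which the bandwidth gamma is a base value gamma_prime divided by the average squared norm of the interaction profiles (rows of the known-DDI matrix). How DDI-IS-SL integrates these similarities and its Regularized Least Squares step are not reproduced; the default gamma_prime = 1 and the function names are assumptions.

```python
import math

Matrix = list[list[float]]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def gip_kernel(profiles: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP kernel K[i][j] = exp(-gamma * ||IP_i - IP_j||^2) over interaction profiles,
    with gamma = gamma_prime / (mean squared norm of the profiles)."""
    mean_sq = sum(sum(v * v for v in row) for row in profiles) / len(profiles)
    gamma = gamma_prime / mean_sq if mean_sq else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in profiles]
            for p in profiles]


if __name__ == "__main__":
    # Toy symmetric DDI matrix for three drugs (1 = known interaction).
    ddi = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(gip_kernel(ddi))
```

For this toy matrix every profile has squared norm 2, so gamma = 0.5 and any two distinct drugs get similarity exp(-1).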

Journal Article
TL;DR: DeepAtom is a data-driven framework for accurately predicting protein-ligand binding affinity; it automatically extracts binding-related atomic interaction patterns from the voxelized complex structure.
Abstract: Computational drug design relies on calculating the binding strength between two biological counterparts, especially a chemical compound (i.e., a ligand) and a protein. Predicting protein-ligand binding affinity with reasonable accuracy is crucial for drug discovery and enables the optimization of compounds to achieve better interaction with their target protein. In this paper, we propose a data-driven framework named DeepAtom to accurately predict protein-ligand binding affinity. With a 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom can automatically extract binding-related atomic interaction patterns from the voxelized complex structure. Compared with other CNN-based approaches, our lightweight model design effectively improves representational capacity, even with limited training data. We carried out validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set. We demonstrate that DeepAtom, which depends less on feature engineering, consistently outperforms the other baseline scoring methods. We also compile and propose a new benchmark dataset to further improve model performance. With the new dataset as training input, DeepAtom achieves Pearson's R = 0.83 and RMSE = 1.23 pK units on the PDBbind v.2016 core set. These promising results suggest that DeepAtom models could be adopted in computational drug development protocols such as molecular docking and virtual screening.
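
The two figures quoted for DeepAtom, Pearson's R and RMSE in pK units, are standard regression metrics. A plain-Python sketch of both follows for reference; it is not taken from the DeepAtom code, and the function names are illustrative.

```python
import math


def pearson_r(pred: list[float], true: list[float]) -> float:
    """Pearson correlation between predicted and measured affinities."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)


def rmse(pred: list[float], true: list[float]) -> float:
    """Root-mean-square error, in the same units as the labels (e.g. pK)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))
```

The two metrics are complementary: Pearson's R is insensitive to a constant offset or scaling of the predictions, while RMSE penalizes such errors directly.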

Journal Article
TL;DR: The authors propose an unsupervised LGE-CMR segmentation algorithm that uses multiple style transfer networks for data augmentation, starting from balanced steady-state free precession (bSSFP) CMR images.
Abstract: Accurate segmentation of the ventricle and myocardium from late gadolinium enhancement (LGE) cardiac magnetic resonance (CMR) images is an important tool for myocardial infarction (MI) analysis. However, the complex enhancement pattern of LGE-CMR and the lack of labeled samples make automatic segmentation difficult. In this paper, we propose an unsupervised LGE-CMR segmentation algorithm that uses multiple style transfer networks for data augmentation. It adopts two different style transfer networks to transfer the style of easily available annotated balanced steady-state free precession (bSSFP) CMR images. Multiple sets of synthetic LGE-CMR images are then generated by the style transfer networks and used as training data for an improved U-Net. The entire algorithm does not require labeled LGE-CMR images. Validation experiments demonstrate the effectiveness and advantages of the proposed algorithm.

Journal Article
TL;DR: FCAN-PCI calculates the semantic similarity between the attribute information of proteins and integrates it, together with the network topology, into a well-established fuzzy clustering model; a momentum method is adopted to accelerate the clustering procedure.
Abstract: Protein complexes provide valuable insights into the mechanisms of biological processes. A variety of computational algorithms have thus been proposed to identify protein complexes in a protein-protein interaction network. However, few of them take into account both network topology and protein attribute information in a unified fuzzy-based clustering framework. Since proteins in the same complex have similar attribute information, and fuzzy clustering also makes it possible to identify overlapping complexes, we propose such a novel fuzzy-based clustering framework, namely FCAN-PCI, to improve identification accuracy. To do so, the semantic similarity between the attribute information of proteins is calculated and then integrated into a well-established fuzzy clustering model together with the network topology. A momentum method is then adopted to accelerate the clustering procedure. Finally, FCAN-PCI applies a heuristic search strategy to identify overlapping protein complexes. A series of extensive experiments comparing FCAN-PCI with state-of-the-art identification algorithms demonstrates its promising performance.
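
The abstract says FCAN-PCI builds on "a well-established fuzzy clustering model" without naming it. Fuzzy c-means is used below purely as a stand-in to show how soft memberships let one protein belong partly to several complexes, which is what makes overlapping complexes possible. Euclidean distances over generic feature vectors, the fuzzifier m = 2, and the function names are assumptions; neither the attribute-similarity term nor the momentum acceleration is included.

```python
import math

Matrix = list[list[float]]


def fcm_memberships(points: Matrix, centers: Matrix, m: float = 2.0) -> Matrix:
    """Fuzzy c-means memberships u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        zeros = [j for j, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point sits on one or more centers: share membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(centers))]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points: Matrix, u: Matrix, m: float = 2.0) -> Matrix:
    """Update each center as the u^m-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append([sum(w * p[t] for w, p in zip(weights, points)) / total for t in range(dim)])
    return centers
```

Alternating these two updates until the memberships stop changing gives the basic algorithm; thresholding the memberships (rather than taking an argmax) is one way to let a protein join more than one complex.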

Journal Article
TL;DR: This article proposes SODA, a Semi-supervised Open set Domain Adversarial network that aligns data distributions across domains both in the general domain space and in the common subspace of source and target data.
Abstract: Due to the shortage of COVID-19 viral testing kits, radiology imaging is used to complement the screening process. Deep learning based methods are promising for automatically detecting COVID-19 in chest x-ray images. Most of these works first train a Convolutional Neural Network (CNN) on an existing large-scale chest x-ray image dataset and then fine-tune the model on a newly collected COVID-19 chest x-ray dataset, often of a much smaller scale. However, simple fine-tuning may lead to poor CNN performance due to two issues: first, the large domain shift present in chest x-ray datasets, and second, the relatively small scale of the COVID-19 chest x-ray dataset. To address these two issues, we formulate COVID-19 chest x-ray image classification as a semi-supervised open set domain adaptation problem and propose a novel domain adaptation method, the Semi-supervised Open set Domain Adversarial network (SODA). SODA is designed to align the data distributions across different domains in the general domain space and also in the common subspace of source and target data. In our experiments, SODA achieves leading classification performance compared with recent state-of-the-art models in separating COVID-19 from common pneumonia. We also present initial results showing that SODA can produce better pathology localizations in chest x-rays.

Journal ArticleDOI
TL;DR: In this article, a new computational predictor called IDRBP-PPCT was proposed by combining PPCT and the two-layer framework based on the random forest algorithm to identify DNA-binding proteins, RBPs and DRBPs.
Abstract: DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two important classes of nucleic acid-binding proteins (NABPs), which play important roles in biological processes such as the replication, translation and transcription of genetic material. Some proteins (DRBPs), which bind to both DNA and RNA, also play a key role in gene expression. Identification of DBPs, RBPs and DRBPs is important for studying protein-nucleic acid interactions. Computational methods are increasingly being proposed to automatically identify DNA- or RNA-binding proteins based only on protein sequences. One challenge is to design an effective protein representation method that converts protein sequences into fixed-dimension feature vectors. In this study, we propose a novel protein representation method called Position-Specific Scoring Matrix (PSSM) and Position-Specific Frequency Matrix (PSFM) Cross Transformation (PPCT) to represent protein sequences. This method captures the evolutionary information in PSSM and PSFM, as well as their correlations. A new computational predictor called IDRBP-PPCT was built by combining PPCT with a two-layer framework based on the random forest algorithm to identify DBPs, RBPs and DRBPs. Experimental results on the independent dataset and the tomato genome demonstrate the effectiveness of the proposed method. A user-friendly web server for IDRBP-PPCT has been constructed and is freely available at http://bliulab.net/IDRBP-PPCT.
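
The abstract does not state the PPCT formula. As an illustrative sketch only, assuming PSSM and PSFM are both L x 20 matrices (one row per residue), the snippet below shows how a cross term between the two matrices produces a fixed 20 x 20 = 400-dimensional vector whatever the sequence length L. The averaged product used here is a hypothetical stand-in, not the published transformation.

```python
def cross_features(pssm, psfm):
    """Hypothetical cross transformation: C[a][b] = mean over residues i of
    pssm[i][a] * psfm[i][b], flattened row-major. Both inputs are L x n
    matrices (n = 20 for amino acids); the output length is n * n for any L."""
    if len(pssm) != len(psfm) or not pssm:
        raise ValueError("PSSM and PSFM must have the same, non-zero number of rows")
    length, n = len(pssm), len(pssm[0])
    return [
        sum(pssm[i][a] * psfm[i][b] for i in range(length)) / length
        for a in range(n)
        for b in range(n)
    ]


if __name__ == "__main__":
    # Two toy "proteins" of different lengths map to vectors of equal size.
    short = [[1.0] * 20 for _ in range(3)]
    long_ = [[0.5] * 20 for _ in range(50)]
    print(len(cross_features(short, short)), len(cross_features(long_, long_)))
```

Averaging over residues is what removes the dependence on L, which is the property a fixed-dimension representation needs before a random forest can consume it.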

Journal ArticleDOI
TL;DR: DSG-DTI as mentioned in this paper uses a heterogeneous graph autoencoder and heterogeneous attention network-based matrix completion to predict drug-target interactions and can generalize to newly registered drugs and targets with slight performance degradation.
Abstract: Drug-target interaction prediction is a crucial stage in drug discovery. However, brute-force search over a compound database is financially infeasible. The number of measured drug-target interaction records has increased in recent years, and the rich drug/protein-related information allows the use of graph machine learning. Despite the advances in deep learning-enabled drug-target interaction prediction, open challenges remain: (1) the rich and complex relationships between drugs and proteins have yet to be fully explored; (2) the intermediate nodes are not calibrated in the heterogeneous graph. To tackle the above issues, this paper proposes a framework named DSG-DTI. Specifically, DSG-DTI combines a heterogeneous graph autoencoder with heterogeneous attention network-based matrix completion. Our pretraining ensures that the known types of nodes (e.g., drugs, targets, side effects, diseases) are precisely embedded into a high-dimensional space. In addition, the attention-based heterogeneous graph matrix completion achieves highly competitive results by effectively extracting long-range dependencies. We verify our model on two publicly available benchmarks. The results show that the proposed scheme effectively predicts drug-target interactions, generalizes to newly registered drugs and targets with only slight performance degradation, and achieves better accuracy than the other baselines.

Journal ArticleDOI
TL;DR: This work formalizes the flow decomposition with subpath constraints problem, gives the first algorithms for it, and studies its usefulness for recovering ground truth decompositions.
Abstract: Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically.
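
To make the underlying problem concrete, here is a minimal sketch of the classic greedy-width heuristic for flow decomposition on a DAG (repeatedly peel off the s-t path with the largest bottleneck), followed by a check of whether the resulting paths cover a given subpath constraint. This is a generic baseline under the assumption that flow is conserved at internal nodes; it is not the heuristic or the FPT algorithm proposed in the paper, and it does not enforce subpath constraints during decomposition.

```python
from collections import defaultdict


def topological_order(edges):
    """Kahn's algorithm over the edge set of a DAG."""
    indegree, successors, nodes = defaultdict(int), defaultdict(list), set()
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    order, ready = [], [n for n in nodes if indegree[n] == 0]
    while ready:
        node = ready.pop()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order


def widest_path(flow, order, source, sink):
    """Return the s-t path with maximum bottleneck flow, and that bottleneck."""
    best, parent = {source: float("inf")}, {}
    for u in order:  # all predecessors of u are finalized before u is expanded
        if u not in best:
            continue
        for (a, b), f in flow.items():
            if a == u and f > 0 and min(best[u], f) > best.get(b, 0):
                best[b], parent[b] = min(best[u], f), u
    if sink not in best:
        return None, 0
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[sink]


def greedy_width_decomposition(flow, source, sink):
    """Peel off widest paths until no positive s-t path remains."""
    flow, order, paths = dict(flow), topological_order(flow), []
    while True:
        path, width = widest_path(flow, order, source, sink)
        if path is None:
            return paths
        for edge in zip(path, path[1:]):
            flow[edge] -= width  # at least one edge drops to zero each round
        paths.append((path, width))


def covers(path, subpath):
    """True if `subpath` appears as a contiguous run of nodes in `path`."""
    k = len(subpath)
    return any(path[i:i + k] == subpath for i in range(len(path) - k + 1))


if __name__ == "__main__":
    edges = {("s", "a"): 5, ("a", "b"): 2, ("a", "t"): 3, ("b", "t"): 2}
    decomposition = greedy_width_decomposition(edges, "s", "t")
    print(decomposition)
    print(any(covers(p, ["a", "b"]) for p, _ in decomposition))
```

Each round zeroes at least one edge, so the loop terminates after at most as many rounds as there are edges; a constraint-aware method would additionally steer path choices so that every subpath is covered.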

Journal ArticleDOI
TL;DR: PANDA as mentioned in this paper uses a graph auto-encoder to learn representations of the nodes' features and edges, and then applies a neural network to predict potentially interesting novel edges.
Abstract: LncRNAs are intermediate molecules that participate in a wide variety of biological processes in humans, such as gene expression control and X-chromosome inactivation. Numerous studies have associated lncRNAs with a wide range of diseases, such as breast cancer, leukemia, and many other conditions. In this work, we propose a graph-based method named PANDA, which treats the prediction of new associations between lncRNAs and diseases as a link prediction problem in a graph. We start by building a heterogeneous graph that contains the known associations between lncRNAs and diseases together with additional information such as gene expression levels and disease symptoms. We then use a graph auto-encoder to learn representations of the nodes' features and edges, and finally apply a neural network to predict potentially interesting novel edges. The experimental results indicate that PANDA achieved an AUC-ROC of 0.976, surpassing state-of-the-art methods for the same problem, which suggests that PANDA is a promising approach for generating embeddings to predict novel lncRNA-disease associations.
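
PANDA's architecture details are not given in the abstract. As a minimal sketch of the graph auto-encoder idea it builds on, the snippet below runs one graph-convolution layer with the symmetric normalization D^-1/2 (A + I) D^-1/2 and scores a candidate edge with an inner-product decoder, sigmoid(z_i . z_j). The toy graph and weights are hypothetical fixed values rather than trained ones, and PANDA's final neural-network predictor is not shown.

```python
import math


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def normalized_adjacency(adj):
    """Symmetric GCN normalization of A + I (self-loops added)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_encode(adj, features, weight):
    """One GCN layer: Z = ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in h]


def edge_score(z, i, j):
    """Inner-product decoder: a probability-like score for the link i-j."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z[i], z[j]))))


if __name__ == "__main__":
    # Toy graph: nodes 0 and 1 stand for lncRNAs, nodes 2 and 3 for diseases.
    adj = [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
    features = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.1, 1.0]]
    weight = [[0.8, 0.1], [0.2, 0.9]]
    z = gcn_encode(adj, features, weight)
    print(round(edge_score(z, 0, 3), 3))  # score for the unseen link 0-3
```

In a trained auto-encoder the weights would be fitted so that known edges score high and sampled non-edges score low; the decoder itself stays this simple.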

Journal ArticleDOI
TL;DR: In this article, the authors proposed an ensemble deep RVFL+ with LUPI framework (edRVFL+), which optimizes a single network and generates an ensemble via optimization at different levels of random projections of the data.
Abstract: Learning with privileged information is commonly seen in human learning; however, the standard random vector functional link (RVFL) model and its deep variants are unable to use privileged information. To fill this gap, we incorporate learning using privileged information (LUPI) into the deep RVFL model and propose the deep RVFL with LUPI framework (dRVFL+), enabling deep RVFL and its ensembles to incorporate privileged information, which is available only while training the models. To make the model more robust, we propose the ensemble deep RVFL+ with LUPI framework (edRVFL+). Unlike the traditional ensemble approach, wherein multiple base learners are trained, the proposed edRVFL+ optimises a single network and generates an ensemble via optimization at different levels of random projections of the data. Both dRVFL+ and edRVFL+ efficiently utilise the privileged information, which results in better generalization performance. In the LUPI framework, half of the available features are typically used as normal features and the rest as privileged features. In contrast, we propose a novel approach for generating the privileged information; to the best of our knowledge, this is the first time that separate privileged information has been generated. The proposed models are employed for the diagnosis of Alzheimer's disease, and experimental results show the promising performance of both models.
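
For readers unfamiliar with RVFL, here is a minimal sketch of a plain, shallow random vector functional link network: hidden weights are drawn at random and never trained, the raw input reaches the output through direct links alongside the hidden features, and only the output weights are fitted, in closed form, by ridge regression. The deep stacking, the ensemble, and the LUPI term of dRVFL+/edRVFL+ are not reproduced; the hidden size, `tanh` activation, and ridge value are hypothetical choices.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


class RVFL:
    def __init__(self, n_hidden=20, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _design_row(self, x):
        # Direct links (raw inputs) + random tanh features + bias term.
        hidden = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
                  for ws, b in zip(self.weights, self.biases)]
        return list(x) + hidden + [1.0]

    def fit(self, xs, ys):
        rng = random.Random(self.seed)
        d = len(xs[0])
        # Random hidden parameters are fixed after sampling (never trained).
        self.weights = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.biases = [rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        rows = [self._design_row(x) for x in xs]
        p = len(rows[0])
        # Ridge normal equations: (D^T D + lambda I) beta = D^T y.
        gram = [[sum(r[i] * r[j] for r in rows) + (self.ridge if i == j else 0.0)
                 for j in range(p)] for i in range(p)]
        rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
        self.beta = solve(gram, rhs)
        return self

    def predict(self, x):
        return sum(b * v for b, v in zip(self.beta, self._design_row(x)))


if __name__ == "__main__":
    grid = [i / 4 for i in range(5)]
    xs = [[u, v] for u in grid for v in grid]
    ys = [2 * u - v for u, v in xs]  # a toy linear target
    model = RVFL().fit(xs, ys)
    print(round(model.predict([0.5, 0.5]), 3))
```

Because training reduces to one linear solve, an ensemble can be built cheaply from the same random projections, which is the property edRVFL+ exploits at different depths.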

Journal ArticleDOI
TL;DR: Netpro2vec as discussed by the authors proposes a neural embedding framework based on probability distribution representations of graphs, which can look at basic node descriptions other than the degree, such as those induced by the Transition Matrix and Node Distance Distribution.
Abstract: The ever-increasing importance of structured data in different applications, especially in the biomedical field, has driven the need to reduce its complexity through projections into a more manageable space. The latest methods for learning features on graphs focus mainly on the neighborhood of nodes and edges. Graph kernels are methods capable of providing a representation that looks beyond the single-node neighborhood; however, they produce handcrafted features that are not adapted to a generalized model. To reduce this gap, in this work we propose a neural embedding framework based on probability distribution representations of graphs, named Netpro2vec. The goal is to look at basic node descriptions other than the degree, such as those induced by the Transition Matrix and the Node Distance Distribution. Netpro2vec provides embeddings that are completely independent of the task and the nature of the data. The framework is evaluated on synthetic and various real biomedical network datasets through a comprehensive experimental classification phase and is compared with well-known competitors.
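
The two node descriptions named in the abstract can be computed directly from an adjacency matrix. As a minimal sketch, assuming an unweighted, undirected graph, the snippet below builds the row-normalized transition matrix and, for each node, the node distance distribution (the fraction of the other nodes found at each shortest-path distance, via breadth-first search). How Netpro2vec turns these distributions into inputs for its neural embedding is not shown here.

```python
from collections import deque


def transition_matrix(adj):
    """Row-normalize the adjacency matrix: P[i][j] = A[i][j] / deg(i)."""
    result = []
    for row in adj:
        degree = sum(row)
        result.append([a / degree if degree else 0.0 for a in row])
    return result


def node_distance_distribution(adj):
    """For each node s, entry d is the fraction of the other nodes at
    shortest-path distance d from s (unreachable nodes are not counted)."""
    n = len(adj)
    distributions = []
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        counts = [0] * n  # reachable distances range over 1 .. n-1
        for d in dist:
            if d > 0:
                counts[d] += 1
        distributions.append([c / max(n - 1, 1) for c in counts])
    return distributions


if __name__ == "__main__":
    path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # 0 - 1 - 2
    print(transition_matrix(path_graph)[1])
    print(node_distance_distribution(path_graph)[0])
```

Unlike the degree, the distance distribution of a node reflects the whole graph, which is the sense in which these descriptions look beyond the immediate neighborhood.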

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a real-time medical data processing method based on federated learning, which divides the process into a model stage and an exemplar stage and fuses the old and new models to mitigate the catastrophic forgetting problem of the new model.
Abstract: Computer-aided diagnosis (CAD) has always been an important research topic for applying artificial intelligence in smart healthcare. Sufficient medical data are one of the most critical factors in CAD research. However, medical data are usually obtained in chronological order and cannot be collected all at once, which poses difficulties for the application of deep learning technology in the medical field. The traditional batch learning method consumes considerable time and space resources for real-time medical data, and the incremental learning method often leads to catastrophic forgetting. To solve these problems, we propose a real-time medical data processing method based on federated learning. We divide the process into the model stage and the exemplar stage. In the model stage, we use the federated learning method to fuse the old and new models to mitigate the catastrophic forgetting problem of the new model. In the exemplar stage, we use the most representative exemplars selected from the old data to help the new model review the old knowledge, which further mitigates the catastrophic forgetting problem of the new model. We use this method to conduct experiments on a simulated medical real-time data stream. The experimental results show that our method can learn a disease diagnosis model from a continuous medical real-time data stream. As the amount of data increases, the performance of the disease diagnosis model continues to improve, and the catastrophic forgetting problem has been effectively mitigated. Compared with the traditional batch learning method, our method can significantly save time and space resources.
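
The abstract does not specify the fusion rule or how exemplars are chosen. As a minimal sketch under stated assumptions, the snippet below fuses the old and new models with a FedAvg-style average weighted by the number of samples each model was trained on, and picks exemplars as the feature vectors closest to the mean (a simplified stand-in for herding-style selection). Representing a model as a dictionary of flat parameter lists, and the parameter name `layer.weight`, are hypothetical.

```python
def fuse_models(old_params, new_params, n_old, n_new):
    """FedAvg-style fusion: average each parameter, weighted by sample counts."""
    total = n_old + n_new
    return {
        name: [(n_old * o + n_new * w) / total
               for o, w in zip(old_params[name], new_params[name])]
        for name in old_params
    }


def select_exemplars(features, k):
    """Return indices of the k feature vectors nearest to the mean vector."""
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    sq_dist = [sum((f[d] - mean[d]) ** 2 for d in range(dim)) for f in features]
    return sorted(range(len(features)), key=lambda i: sq_dist[i])[:k]


if __name__ == "__main__":
    old = {"layer.weight": [1.0, 3.0]}
    new = {"layer.weight": [3.0, 5.0]}
    print(fuse_models(old, new, n_old=3, n_new=1))
    print(select_exemplars([[0, 0], [10, 10], [1, 1], [2, 2]], k=2))
```

Weighting by sample count keeps the fused model close to whichever model has seen more data, which is one simple way to stop a small new batch from overwriting old knowledge.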

Journal ArticleDOI
TL;DR: Liu et al. as discussed by the authors proposed Inductive Matrix Completion with Heterogeneous Graph Attention Network approach (IMCHGAN) for predicting drug-target interactions (DTIs), which adopts a two-level neural attention mechanism approach to learn drug and target latent feature representations from the DTI heterogeneous network respectively.
Abstract: Identification of targets among known drugs plays an important role in drug repurposing and discovery. Computational approaches for predicting drug-target interactions (DTIs) are highly desirable compared with traditional biological experiments because they are fast and inexpensive. Moreover, recent advances in systems biology have generated large-scale heterogeneous biological information network data, which offer opportunities for machine learning-based identification of DTIs. We present a novel Inductive Matrix Completion with Heterogeneous Graph Attention Network approach (IMCHGAN) for predicting DTIs. IMCHGAN first adopts a two-level neural attention mechanism to learn drug and target latent feature representations, respectively, from the DTI heterogeneous network. Then, the learned latent features are fed into the Inductive Matrix Completion (IMC) prediction score model, which computes the best projection from drug space onto target space and outputs a DTI score via the inner product of the projected drug and target feature representations. IMCHGAN is an end-to-end neural network learning framework in which the parameters of both the prediction score model and the feature representation learning model are simultaneously optimized via backpropagation under the supervision of observed drug-target interaction data. We compare IMCHGAN with other state-of-the-art baselines on two real DTI experimental datasets. The results show that our method is superior to existing methods in terms of AUC and AUPR. Moreover, IMCHGAN also shows strong predictive power for novel (unknown) DTIs. All datasets and code can be obtained from https://github.com/ljatynu/IMCHGAN/.
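
The IMC scoring step described above reduces to a bilinear form: project the drug features x with a matrix W, project the target features y with a matrix H, and take the inner product, score = (xW) . (yH) = x^T W H^T y. The minimal sketch below computes that score for given matrices; in IMCHGAN the features come from the attention network and the projections are learned jointly by backpropagation, which is not reproduced here. The toy dimensions and values are hypothetical.

```python
def project(vec, matrix):
    """Row vector times matrix: returns vec @ matrix."""
    return [sum(v * matrix[i][j] for i, v in enumerate(vec))
            for j in range(len(matrix[0]))]


def imc_score(drug_features, target_features, w, h):
    """Inductive matrix completion score x^T W H^T y."""
    p = project(drug_features, w)    # drug embedding in the shared k-dim space
    q = project(target_features, h)  # target embedding in the same space
    return sum(a * b for a, b in zip(p, q))


if __name__ == "__main__":
    x = [1.0, 0.0, 2.0]                        # hypothetical 3-dim drug features
    y = [0.5, 1.0]                             # hypothetical 2-dim target features
    w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 x k projection, k = 2
    h = [[2.0, 0.0], [0.0, 1.0]]               # 2 x k projection
    print(imc_score(x, y, w, h))
```

Because the score depends on features rather than on a row or column index, it can be evaluated for drugs or targets that never appeared in the training interactions, which is what makes the completion "inductive".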

Journal ArticleDOI
TL;DR: A network-based structural learning nonnegative matrix factorization algorithm (aka SLNMF) is proposed for the identification of cell types in scRNA-seq, which is transformed into a constrained optimization problem and significantly improves performance of algorithms.
Abstract: Single-cell RNA sequencing (scRNA-seq) measures expression profiles at the single-cell level, which sheds light on the heterogeneity and functional diversity among cell populations. The vast majority of current algorithms identify cell types by directly clustering transcriptional profiles, which ignores indirect relations among cells and results in undesirable performance on cell type discovery and trajectory inference. Therefore, there is a critical need to infer cell types and trajectories by exploiting the interactions among cells. In this study, we propose a network-based structural learning nonnegative matrix factorization algorithm (aka SLNMF) for the identification of cell types in scRNA-seq data, which is formulated as a constrained optimization problem. SLNMF first constructs a similarity network of cells and then extracts latent features of the cells by exploiting the topological structure of the cell-cell network. To improve clustering performance, a structural constraint is imposed on the model so that the learned latent features preserve the structural information of the network, which significantly improves the performance of the algorithm. Finally, we track the trajectory of cells by exploring the relationships among cell types. Fourteen scRNA-seq datasets, with the number of single cells varying from 49 to 26,484, are adopted to validate the performance of the algorithms. The experimental results demonstrate that SLNMF significantly outperforms fifteen state-of-the-art methods, with a 15.32% improvement in terms of accuracy, and that it accurately identifies the trajectories of cells. The proposed model and methods provide an effective strategy for analyzing scRNA-seq data. (The software is implemented in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/SLNMF.)
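
SLNMF's exact objective is not given in the abstract. As a stand-in that shows the general idea of preserving cell-cell network structure during factorization, the sketch below implements the well-known graph-regularized NMF (GNMF) multiplicative updates, which penalize latent features that differ between connected cells through tr(V^T L V) with L = D - W. This is a different, simpler model than SLNMF; the rank `k`, weight `lam`, toy data, and argmax cluster labels are hypothetical choices.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gnmf(x, w, k, lam=1.0, iters=100, seed=0, eps=1e-12):
    """Graph-regularized NMF, X ~= U V^T with a cell-cell graph penalty.
    x: m x n nonnegative matrix (genes x cells); w: n x n symmetric,
    nonnegative cell similarity matrix. Returns U (m x k) and V (n x k)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    u = [[rng.random() for _ in range(k)] for _ in range(m)]
    v = [[rng.random() for _ in range(k)] for _ in range(n)]
    d = [[sum(w[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]  # degree matrix
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), elementwise
        xv, uvtv = matmul(x, v), matmul(u, matmul(transpose(v), v))
        u = [[u[i][c] * xv[i][c] / (uvtv[i][c] + eps) for c in range(k)] for i in range(m)]
        # V <- V * (X^T U + lam W V) / (V U^T U + lam D V), elementwise
        xtu, wv = matmul(transpose(x), u), matmul(w, v)
        vutu, dv = matmul(v, matmul(transpose(u), u)), matmul(d, v)
        v = [[v[j][c] * (xtu[j][c] + lam * wv[j][c]) / (vutu[j][c] + lam * dv[j][c] + eps)
              for c in range(k)] for j in range(n)]
    return u, v


def objective(x, w, u, v, lam=1.0):
    """||X - U V^T||_F^2 + lam * tr(V^T L V), using the identity
    tr(V^T L V) = 0.5 * sum_ij W_ij * ||v_i - v_j||^2 for symmetric W."""
    r = matmul(u, transpose(v))
    recon = sum((x[i][j] - r[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))
    n = len(w)
    smooth = 0.5 * sum(w[i][j] * sum((a - b) ** 2 for a, b in zip(v[i], v[j]))
                       for i in range(n) for j in range(n))
    return recon + lam * smooth


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[rng.random() for _ in range(5)] for _ in range(6)]  # 6 genes x 5 cells
    w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]  # chain graph
    u0, v0 = gnmf(x, w, k=2, iters=0)    # the random initialization
    u, v = gnmf(x, w, k=2, iters=200)
    print(objective(x, w, u0, v0), ">", objective(x, w, u, v))
    print([max(range(2), key=lambda c: row[c]) for row in v])  # cluster label per cell
```

The graph term pulls the latent vectors of connected cells together, so cells that are similar in the network tend to land in the same cluster even when their raw profiles differ.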

Journal ArticleDOI
TL;DR: In this article, a novel hybrid approach combining a deep neural network (DNN) and an extreme gradient boosting classifier (XGB) is employed for predicting protein-protein interactions (PPI).
Abstract: To understand the behavioral processes of life and disease-causing mechanisms, knowledge of protein-protein interactions (PPI) is essential. In this paper, a novel hybrid approach combining a deep neural network (DNN) and an extreme gradient boosting classifier (XGB) is employed for predicting PPI. The hybrid classifier (DNN-XGB) takes as input a fusion of three sequence-based features: amino acid composition (AAC), conjoint triad composition (CT), and local descriptor (LD). The DNN extracts hidden information from the raw features through layer-wise abstraction, and its output is passed to the XGB classifier. The 5-fold cross-validation accuracies on the intraspecies interaction datasets of Saccharomyces cerevisiae (core subset), Helicobacter pylori, Saccharomyces cerevisiae, and Human are 98.35, 96.19, 97.37, and 99.74 percent, respectively. Similarly, accuracies of 98.50 and 97.25 percent are achieved on the interspecies interaction datasets of Human-Bacillus anthracis and Human-Yersinia pestis, respectively. The improved prediction accuracies obtained on the independent test sets and network datasets indicate that DNN-XGB can be used to predict cross-species interactions. It can also provide new insights into signaling pathway analysis, drug target prediction, and the understanding of disease pathogenesis. The improved performance of the proposed method suggests that the hybrid classifier can be a useful tool for PPI prediction. The datasets and source code are available at: https://github.com/SatyajitECE/DNN-XGB-for-PPI-Prediction.
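
Two of the three input encodings named above are simple to compute. As a minimal sketch, the snippet below derives amino acid composition (20 frequencies) and conjoint triad frequencies (7^3 = 343 features), assuming the commonly used seven-class grouping of amino acids for the conjoint triad; the paper's exact grouping and normalization may differ, and the local descriptor (LD) features are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Commonly used seven-class grouping for conjoint triads (assumed here).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def conjoint_triad(seq):
    """Frequencies of the 7**3 = 343 triads of consecutive residue classes."""
    classes = [CLASS_OF[aa] for aa in seq if aa in CLASS_OF]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1  # base-7 index of the class triad
    n_triads = max(len(classes) - 2, 1)
    return [x / n_triads for x in counts]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # toy sequence
    features = amino_acid_composition(seq) + conjoint_triad(seq)
    print(len(features))  # 20 + 343 = 363
```

Both encodings map a protein of any length to a fixed-length vector, and for a pair of proteins the two vectors would typically be concatenated before being passed to the DNN.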