
Showing papers in "IEEE/ACM Transactions on Computational Biology and Bioinformatics in 2019"


Journal ArticleDOI
TL;DR: This study proposes a Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer and shows that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other existing approaches.
Abstract: Breast cancer is a highly aggressive type of cancer with very low median survival. Accurate prognosis prediction of breast cancer can spare a significant number of patients from receiving unnecessary adjuvant systemic treatment and its related expensive medical costs. Previous work relies mostly on selected gene expression data to create a predictive model. The emergence of deep learning methods and multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of breast cancer and therefore can improve diagnosis, treatment, and prevention. In this study, we propose a Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer. The novelty of the method lies in the design of its architecture and the fusion of multi-dimensional data. The comprehensive performance evaluation results show that the proposed method achieves better performance than prediction methods with single-dimensional data and other existing approaches. The source code, implemented with the TensorFlow 1.0 deep learning library, can be downloaded from GitHub: https://github.com/USTC-HIlab/MDNNMD.

174 citations


Journal ArticleDOI
Leyi Wei, Pengwei Xing, Gaotao Shi, Zhiliang Ji, Quan Zou
TL;DR: A novel random-forest-based predictor called MePred-RF is proposed, integrating several discriminative sequence-based feature descriptors and improving feature representation capability using a powerful feature selection technique, which remarkably outperforms other state-of-the-art predictors.
Abstract: Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. The accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in predictive performance improvement. However, the predictive performance of existing methods is not satisfactory in terms of overall accuracy. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, integrating several discriminative sequence-based feature descriptors and improving feature representation capability using a powerful feature selection technique. Importantly, unlike other methods based on multiple, complex information inputs, our proposed MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that our proposed MePred-RF method remarkably outperforms other state-of-the-art predictors, leading by 4.5 percent on average in terms of overall accuracy. A user-friendly webserver that implements the proposed method has been established for researchers' convenience, and is now freely available for public use through http://server.malab.cn/MePred-RF. We anticipate our research tool to be useful for the large-scale prediction and analysis of protein methylation sites.

160 citations


Journal ArticleDOI
TL;DR: Deep learning is used with brain network and clinically relevant text information to make an early diagnosis of Alzheimer's Disease; the method reveals discriminative brain network features effectively and provides a reliable classifier for AD detection.
Abstract: Computerized healthcare has undergone rapid development thanks to the advances in medical imaging and machine learning technologies. In particular, recent progress in deep learning opens a new era for multimedia-based clinical decision support. In this paper, we use deep learning with brain network and clinically relevant text information to make an early diagnosis of Alzheimer's Disease (AD). The clinically relevant text information includes the age, gender, and ApoE gene of the subject. The brain network is constructed by computing the functional connectivity of brain regions using resting-state functional magnetic resonance imaging (R-fMRI) data. A targeted autoencoder network is built to distinguish normal aging from mild cognitive impairment, an early stage of AD. The proposed method reveals discriminative brain network features effectively and provides a reliable classifier for AD detection. Compared to traditional classifiers based on R-fMRI time series data, the proposed deep learning method improves prediction accuracy by about 31.21 percent, and the standard deviation is reduced by 51.23 percent in the best case, which means that our prediction model is more stable and reliable than the traditional methods. Our work exploits deep learning's advantages in classifying high-dimensional multimedia data in medical services, and could help predict and prevent AD at an early stage.
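The brain-network construction step described above can be illustrated with a minimal sketch. The abstract does not say which connectivity measure is used; the example below assumes Pearson correlation between regional time series, which is a common choice. The function names `pearson` and `connectivity_matrix` and the toy time series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def connectivity_matrix(region_series):
    """Build a symmetric region-by-region correlation matrix.

    region_series: one time series (list of floats) per brain region.
    Assumption: Pearson correlation stands in for "functional connectivity".
    """
    n = len(region_series)
    return [
        [pearson(region_series[i], region_series[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy data: region 1 rises with region 0, region 2 falls as region 0 rises.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
fc = connectivity_matrix(series)
```

In a pipeline of this kind, the upper triangle of such a matrix would typically be flattened into the feature vector passed to the classifier.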

151 citations


Journal ArticleDOI
TL;DR: A sequence-based predictor named “iPro70-PseZNC” was designed for identifying sigma70 promoters in prokaryotes, and studies showed that the performance of PseZNC is better than that of multi-window Z-curve composition.
Abstract: Promoters are DNA regulatory elements located directly upstream or at the 5’ end of the transcription initiation site (TSS), which are in charge of gene transcription initiation. With the completion of a large number of microorganism genomes, there is an urgent need to predict promoters accurately in bacteria using computational methods. In this work, a sequence-based predictor named “iPro70-PseZNC” was designed for identifying sigma70 promoters in prokaryotes. In the predictor, the samples of DNA sequences are formulated by a novel pseudo nucleotide composition, called PseZNC, into which the multi-window Z-curve composition and six local DNA structural properties are incorporated. In the 5-fold cross-validation, an area under the receiver operating characteristic curve of 0.909 was obtained on our benchmark dataset, indicating that the proposed predictor is promising and will provide an important guide in this area. Further studies showed that the performance of PseZNC is better than that of multi-window Z-curve composition. For the convenience of researchers, a user-friendly online service was established and is freely accessible at http://lin.uestc.edu.cn/server/iPro70-PseZNC. The PseZNC approach can also be extended to other DNA-related problems.
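To make the Z-curve composition concrete, the sketch below computes the three standard Z-curve components from base frequencies (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). It is not the PseZNC descriptor itself: the windowing scheme (non-overlapping windows) and the function names `z_curve_components` and `windowed_z_curve` are assumptions made for illustration, and the six structural properties are not modeled.

```python
def z_curve_components(seq):
    """Return the (x, y, z) Z-curve components from base frequencies.

    x: purine (A, G) vs. pyrimidine (C, T)
    y: amino (A, C) vs. keto (G, T)
    z: weak (A, T) vs. strong (G, C) hydrogen bonding
    """
    seq = seq.upper()
    n = len(seq)
    freq = {base: seq.count(base) / n for base in "ACGT"}
    x = (freq["A"] + freq["G"]) - (freq["C"] + freq["T"])
    y = (freq["A"] + freq["C"]) - (freq["G"] + freq["T"])
    z = (freq["A"] + freq["T"]) - (freq["G"] + freq["C"])
    return x, y, z


def windowed_z_curve(seq, window):
    """Concatenate Z-curve components over non-overlapping windows.

    Assumption: the windowing used by the original predictor may differ.
    """
    features = []
    for start in range(0, len(seq) - window + 1, window):
        features.extend(z_curve_components(seq[start:start + window]))
    return features


features = windowed_z_curve("AAAACCCC", window=4)
```

Each window contributes three numbers in the range [-1, 1], so a sequence split into k windows yields a 3k-dimensional feature vector.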

148 citations


Journal ArticleDOI
TL;DR: This work reconstructs a miRNA functional similarity network using the following biological information: the miRNA family information, miRNA cluster information, experimentally validated miRNA-target associations and disease-miRNA information, and reconstructs a disease similarity network using disease functional information and disease semantic information.
Abstract: MicroRNAs (miRNAs) play critical roles in regulating gene expression at post-transcriptional levels. Numerous experimental studies indicate that alterations and dysregulations in miRNAs are associated with important complex diseases, especially cancers. Predicting potential miRNA-disease associations is beneficial not only for exploring the pathogenesis of diseases, but also for understanding biological processes. In this work, we propose two methods that can effectively predict potential miRNA-disease associations using our reconstructed miRNA and disease similarity networks, which are based on the latest experimental data. We reconstruct a miRNA functional similarity network using the following biological information: the miRNA family information, miRNA cluster information, experimentally validated miRNA-target associations and disease-miRNA information. We also reconstruct a disease similarity network using disease functional information and disease semantic information. We present Katz with specific weights and Katz with machine learning on the comprehensive heterogeneous network. These methods, which achieve AUC values of 0.897 and 0.919, respectively, exhibit performance superior to the existing methods. Comprehensive data networks and reasonable considerations guarantee the high performance of our methods. Unlike several existing methods, which cannot work in such situations, the proposed methods also predict associations for diseases without any known related miRNAs. A web service for the download and prediction of relationships between diseases and miRNAs is available at http://lab.malab.cn/soft/MDPredict/ .
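The Katz measure mentioned above scores a pair of nodes by counting the walks between them, with longer walks damped by a factor `beta` per step. The following is a minimal sketch of the truncated form, S = sum over l of beta^l * A^l; the paper's specific weights and network construction are not given in the abstract, so `beta`, `max_len`, and the toy adjacency matrix are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def katz_scores(adj, beta=0.1, max_len=3):
    """Truncated Katz index: sum of beta**l * A**l for l = 1..max_len.

    adj: square adjacency matrix of the heterogeneous network.
    beta and max_len are illustrative values, not the paper's weights.
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # holds A**l, starting at l = 1
    for length in range(1, max_len + 1):
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)
    return scores


# Toy path network: node 0 - node 1 - node 2 (e.g. miRNA, miRNA, disease).
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_scores(adj)
```

Nodes 0 and 2 share no direct edge, yet they receive a nonzero score through the length-2 walk via node 1, which is how unobserved associations get ranked.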

114 citations


Journal ArticleDOI
TL;DR: A transfer learning procedure for cancer classification, which uses feature selection and normalization techniques in conjunction with sparse auto-encoders on gene expression data and statistically outperforms several generally used cancer classification approaches.
Abstract: The emergence of deep learning has impacted numerous machine learning based applications and research. The reason for its success lies in two main advantages: 1) it provides the ability to learn very complex non-linear relationships between features and 2) it allows one to leverage information from unlabeled data that does not belong to the problem being handled. This paper presents a transfer learning procedure for cancer classification, which uses feature selection and normalization techniques in conjunction with sparse auto-encoders on gene expression data. While classifying any two tumor types, data of other tumor types were used in an unsupervised manner to improve the feature representation. The performance of our algorithm was tested on 36 two-class benchmark datasets from the GEMLeR repository. Statistical tests show that our algorithm significantly outperforms several generally used cancer classification approaches. Deep learning based molecular disease classification can be used to guide decisions made on the diagnosis and treatment of diseases, and therefore may have important applications in precision medicine.

107 citations


Journal ArticleDOI
TL;DR: A new global network-based framework, LncRDNetFlow, is developed to prioritize disease-related lncRNAs; it performs significantly better than the existing state-of-the-art approaches in cross-validation and is used to identify the related lncRNAs for ovarian cancer, glioma, and cervical cancer.
Abstract: Accumulating experimental evidence has indicated that long non-coding RNAs (lncRNAs) are critical for the regulation of cellular biological processes implicated in many human diseases. However, only relatively few experimentally supported lncRNA-disease associations have been reported. Developing effective computational methods to infer lncRNA-disease associations is becoming increasingly important. Current network-based algorithms typically use a network representation to identify novel associations between lncRNAs and diseases. However, these methods concentrate on specific entities of interest (lncRNAs and diseases) and cannot handle networks with more than two types of entities. Considering the limitations of previous computational methods, we develop a new global network-based framework, LncRDNetFlow, to prioritize disease-related lncRNAs. LncRDNetFlow utilizes a flow propagation algorithm to integrate multiple networks based on a variety of biological information including lncRNA similarity, protein-protein interactions, disease similarity, and the associations between them to infer lncRNA-disease associations. We show that LncRDNetFlow performs significantly better than the existing state-of-the-art approaches in cross-validation. To further validate the reproducibility of the performance, we use the proposed method to identify the related lncRNAs for ovarian cancer, glioma, and cervical cancer. The results are encouraging: many predicted lncRNAs in the top list have been verified by biological studies.

105 citations


Journal ArticleDOI
TL;DR: Parallel-PC is developed, a fast and memory efficient PC algorithm that is suitable for personal computers and does not require end users’ parallel computing knowledge beyond their competency in using the PC algorithm, and integrated into a causal inference method for inferring miRNA-mRNA regulatory relationships.
Abstract: Discovering causal relationships from observational data is a crucial problem and it has applications in many research areas. The PC algorithm is the state-of-the-art constraint-based method for causal discovery. However, the runtime of the PC algorithm, in the worst case, is exponential in the number of nodes (variables), and thus it is inefficient when applied to high-dimensional data, e.g., gene expression datasets. On another note, the advancement of computer hardware in the last decade has resulted in the widespread availability of multi-core personal computers. There is a significant motivation for designing a parallelized PC algorithm that is suitable for personal computers and does not require end users’ parallel computing knowledge beyond their competency in using the PC algorithm. In this paper, we develop parallel-PC, a fast and memory-efficient PC algorithm using the parallel computing technique. We apply our method to a range of synthetic and real-world high-dimensional datasets. Experimental results on a dataset from the DREAM 5 challenge show that the original PC algorithm could not produce any results after running more than 24 hours; meanwhile, our parallel-PC algorithm managed to finish within around 12 hours with a 4-core CPU computer, and less than six hours with an 8-core CPU computer. Furthermore, we integrate parallel-PC into a causal inference method for inferring miRNA-mRNA regulatory relationships. The experimental results show that parallel-PC helps improve both the efficiency and accuracy of the causal inference algorithm.
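The reason the PC algorithm parallelizes well is that, within one level of the skeleton search, the conditional independence tests for different variable pairs do not depend on one another. The sketch below illustrates this for the first level only (empty conditioning set), using a Fisher z-test on the Pearson correlation. It is not the parallel-PC implementation: the work-distribution scheme, the names `independent` and `level0_skeleton`, and the parameters `alpha` and `workers` are assumptions. A thread pool keeps the example simple; a process pool would be needed for real speed-up on CPU-bound tests in CPython.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def independent(x, y, alpha=0.05):
    """Fisher z-test; True when zero correlation is not rejected."""
    n = len(x)
    r = max(min(pearson(x, y), 0.999999), -0.999999)  # keep log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def level0_skeleton(data, alpha=0.05, workers=4):
    """Run all order-0 independence tests concurrently.

    data: dict mapping variable name to its list of samples.
    Returns the set of edges (name pairs) that survive the tests.
    """
    pairs = list(combinations(sorted(data), 2))

    def run_test(pair):
        a, b = pair
        return pair, independent(data[a], data[b], alpha)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_test, pairs))
    return {pair for pair, is_indep in results if not is_indep}


# Toy data: y is a linear function of x, w is uncorrelated with both.
x = [float(i) for i in range(20)]
data = {
    "x": x,
    "y": [2.0 * v + 1.0 for v in x],
    "w": [1.0, -1.0, -1.0, 1.0] * 5,
}
edges = level0_skeleton(data)
```

Higher levels repeat the same pattern with partial correlations given growing conditioning sets, and the edge removals from one level are applied before the next level starts.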

104 citations


Journal ArticleDOI
TL;DR: A deep learning framework called DeepLabeler is presented to automatically assign ICD-9 codes and it is found that the convolutional neural network is the most effective component in the network and the ‘Document to Vector’ technique is also necessary for enhancing classification performance since it extracts well-recognized global features.
Abstract: ICD-9 (the Ninth Revision of International Classification of Diseases) is widely used to describe a patient's diagnosis. Accurate automated ICD-9 coding is important because manual coding is expensive, time-consuming, and inefficient. Inspired by the recent successes of deep learning, in this study, we present a deep learning framework called DeepLabeler to automatically assign ICD-9 codes. DeepLabeler combines the convolutional neural network with the ‘Document to Vector’ technique to extract and encode local and global features. Our proposed DeepLabeler demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 0.335 micro F-measure on the MIMIC-II dataset and 0.408 micro F-measure on the MIMIC-III dataset. It outperforms classical hierarchy-based SVM and flat-SVM on both datasets by at least 14 percent. Furthermore, we analyze the deep neural network structure to discover the vital elements in the success of DeepLabeler. We find that the convolutional neural network is the most effective component in our network and the ‘Document to Vector’ technique is also necessary for enhancing classification performance since it extracts well-recognized global features. Extensive experimental results demonstrate the great promise of deep learning techniques in the field of text multi-label classification and automated medical coding.
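The micro F-measure reported above pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall, so frequent codes dominate the score. A minimal sketch follows; the function name `micro_f_measure` and the example label sets are illustrative and are not taken from MIMIC.

```python
def micro_f_measure(true_sets, pred_sets):
    """Micro-averaged F-measure for multi-label predictions.

    true_sets, pred_sets: parallel lists of label sets, one per document.
    Counts are pooled across all documents before computing P and R.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # codes correctly assigned
        fp += len(pred - truth)   # codes assigned but not in the truth
        fn += len(truth - pred)   # codes missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative label sets for two discharge summaries.
truth = [{"401.9", "250.00"}, {"428.0"}]
predicted = [{"401.9"}, {"428.0", "427.31"}]
score = micro_f_measure(truth, predicted)
```

Here the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3.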

101 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a review of recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping, which includes diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI), and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies.
Abstract: This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI), and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives.

100 citations


Journal ArticleDOI
TL;DR: Results show that the drug-induced hepatotoxicity can be predicted with high accuracy and efficiency using the proposed predictive model and a new feature selection method, named MEMO, to deal with the high-dimensional toxicogenomics data.
Abstract: Drug-induced hepatotoxicity may cause acute and chronic liver disease, leading to great concern for patient safety. It is also one of the main reasons for drug withdrawal from the market. Toxicogenomics data has been widely used in hepatotoxicity prediction. In our study, we proposed a multi-dose computational model to predict drug-induced hepatotoxicity based on gene expression and toxicity data. The dose/concentration information after drug treatment is fully utilized based on the dose-response curve, so that a more informative representation of the dose-response relationship is considered. We also proposed a new feature selection method, named MEMO, which is an important component of our multi-dose model, to deal with the high-dimensional toxicogenomics data. We validated the proposed model using TG-GATEs, a large database recording toxicogenomics data from multiple views. The experimental results show that drug-induced hepatotoxicity can be predicted with high accuracy and efficiency using the proposed predictive model.

Journal ArticleDOI
TL;DR: A novel method is proposed for accurate recognition and classification of cardiac arrhythmia appearing with the presence of abnormal heart electrical activity and has a better performance by combining proposed features than by using the ECG morphology or ECG segment features separately.
Abstract: In this work, arrhythmia appearing with the presence of abnormal heart electrical activity is efficiently recognized and classified. A novel method is proposed for accurate recognition and classification of cardiac arrhythmias. Firstly, P-QRS-T waves are segmented from the ECG waveform; secondly, morphological features are extracted from the P-QRS-T waves, and ECG segment features are extracted from the selected ECG segment by using PCA and dynamic time warping (DTW); finally, SVM is applied to the features and automatic diagnosis results are presented. The ECG data set used is derived from MIT-BIH, in which ECG signals are divided into four classes: normal beats (N), supraventricular ectopic beats (SVEBs), ventricular ectopic beats (VEBs) and fusion of ventricular and normal (F). Our proposed method can distinguish N, SVEBs, VEBs and F with an accuracy of 97.80 percent. The sensitivities for the classes N, SVEBs, VEBs and F are 99.27, 87.47, 94.71, and 73.88 percent and the positive predictivities are 98.48, 95.25, 95.22 and 86.09 percent, respectively. The detection sensitivity of SVEBs and VEBs is better when the proposed features are combined than when the ECG morphology or ECG segment features are used separately. The proposed method is compared with four selected peer algorithms and delivers solid results.
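Dynamic time warping, one of the feature-extraction tools named above, measures the distance between two sequences while allowing them to be locally stretched or compressed in time, which suits heartbeats of varying duration. The sketch below is the textbook dynamic-programming formulation with an absolute-difference local cost; how the paper combines DTW with PCA is not described in the abstract, and `dtw_distance` and the toy beats are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each cell
    extends the best of the three neighbouring alignments.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again (b stretched)
                cost[i][j - 1],      # b[j-1] matched again (a stretched)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A toy "beat" and a time-stretched copy of it align at zero cost.
beat = [0, 1, 2, 1, 0]
stretched = [0, 1, 1, 2, 1, 0]
distance = dtw_distance(beat, stretched)
```

A plain point-by-point distance could not even compare these two sequences because their lengths differ, whereas DTW aligns them exactly.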

Journal ArticleDOI
TL;DR: A bipartite network based on known lncRNA-disease associations is constructed and a novel model for inferring potential lncRNA-disease associations is proposed, which significantly outperformed previous state-of-the-art models.
Abstract: An increasing number of studies have indicated that long non-coding RNAs (lncRNAs) play critical roles in many important biological processes. Predicting potential lncRNA-disease associations can improve our understanding of the molecular mechanisms of human diseases and aid in finding biomarkers for disease diagnosis, treatment, and prevention. In this paper, we constructed a bipartite network based on known lncRNA-disease associations; based on this network, we proposed a novel model for inferring potential lncRNA-disease associations. Specifically, we analyzed the properties of the bipartite network and found that it closely followed a power-law distribution. Moreover, to evaluate the performance of our model, a leave-one-out cross-validation (LOOCV) framework was implemented, and the simulation results showed that our computational model significantly outperformed previous state-of-the-art models, with AUCs of 0.8825, 0.9004, and 0.9292 for known lncRNA-disease associations obtained from the LncRNADisease database, Lnc2Cancer database, and MNDR database, respectively. Thus, our approach may be an excellent addition to the biomedical research field in the future.

Journal ArticleDOI
TL;DR: A high-order convolutional neural network architecture (HOCNN) is proposed, which employs a high-order encoding method to build high-order dependencies among nucleotides, and a multi-scale convolutional layer to capture motif features of different lengths.
Abstract: Although deep learning algorithms have outperformed conventional methods in predicting the sequence specificities of DNA-protein binding, they fail to consider the dependencies among nucleotides and the diverse binding lengths for different transcription factors (TFs). To address these two limitations simultaneously, in this paper, we propose a high-order convolutional neural network architecture (HOCNN), which employs a high-order encoding method to build high-order dependencies among nucleotides, and a multi-scale convolutional layer to capture motif features of different lengths. The experimental results on real ChIP-seq datasets show that the proposed method outperforms the state-of-the-art deep learning method (DeepBind) in the motif discovery task. In addition, we provide further insights about the importance of introducing additional convolutional kernels and the degeneration problem of importing high-order encoding in the motif discovery task.
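One way to picture a high-order encoding is as a one-hot code over overlapping k-mers rather than single nucleotides, so that each input column already carries the dependency between adjacent bases. The sketch below follows that reading; the exact scheme used by HOCNN (padding, handling of ambiguous bases) is not given in the abstract, so `high_order_one_hot` and its `order` parameter are assumptions.

```python
from itertools import product


def high_order_one_hot(seq, order=2):
    """One-hot encode overlapping k-mers of a DNA sequence.

    Returns a list with len(seq) - order + 1 rows, each a vector of
    length 4**order with a single 1 at the index of that k-mer.
    order=1 reduces to the usual per-nucleotide one-hot encoding.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    rows = []
    for start in range(len(seq) - order + 1):
        vec = [0] * len(kmers)
        vec[index[seq[start:start + order]]] = 1
        rows.append(vec)
    return rows


encoded = high_order_one_hot("ACGT", order=2)
```

The vector length grows as 4 to the power of the order, which is one plausible reason very high orders become impractical for a fixed amount of training data.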

Journal ArticleDOI
TL;DR: Experimental results on five large protein interaction networks demonstrated that compared to state-of-the-art protein complex detection algorithms, the proposed algorithm outperformed them in terms of both effectiveness and efficiency.
Abstract: Protein complexes are crucial in improving our understanding of the mechanisms employed by proteins. Various computational algorithms have thus been proposed to detect protein complexes from protein interaction networks. However, given massive protein interactome data obtained by high-throughput technologies, existing algorithms, especially those with additional consideration of biological information of proteins, either have low efficiency in performing their tasks or suffer from limited effectiveness. To address this issue, this work proposes to detect protein complexes from a protein interaction network with high efficiency and effectiveness. To do so, the original detection task is first formulated as an optimization problem according to the intuitive properties of protein complexes. After that, the framework of the alternating direction method of multipliers is applied to decompose this optimization problem into several subtasks, which can subsequently be solved separately and in parallel. An algorithm for implementing this solution is then developed. Experimental results on five large protein interaction networks demonstrated that, compared to state-of-the-art protein complex detection algorithms, our algorithm outperformed them in terms of both effectiveness and efficiency. Moreover, as the number of parallel processes increases, one can expect an even higher computational efficiency for the proposed algorithm with no compromise on effectiveness.

Journal ArticleDOI
TL;DR: This work introduces MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns.
Abstract: The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo's practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to that of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee.

Journal ArticleDOI
TL;DR: SAFETY is a hybrid framework that can securely perform genome-wide association studies on federated genomic datasets using homomorphic encryption and the recently introduced secure hardware component Intel Software Guard Extensions to ensure high efficiency and privacy at the same time.
Abstract: Recent studies demonstrate that effective healthcare can benefit from using human genomic information. Consequently, many institutions are using statistical analyses of genomic data, which are mostly based on genome-wide association studies (GWAS). GWAS analyze genome sequence variations in order to identify genetic risk factors for diseases. These studies often require pooling data from different sources in order to unravel statistical patterns and relationships between genetic variants and diseases. Here, the primary challenge is to fulfill one major objective: accessing multiple genomic data repositories for collaborative research in a privacy-preserving manner. Due to the privacy concerns regarding genomic data, multi-jurisdictional laws and policies on cross-border genomic data sharing are enforced among different countries. In this article, we present SAFETY, a hybrid framework that can securely perform GWAS on federated genomic datasets using homomorphic encryption and the recently introduced secure hardware component Intel Software Guard Extensions (SGX) to ensure high efficiency and privacy at the same time. Different experimental settings show the efficacy and applicability of such a hybrid framework in securely conducting GWAS. To the best of our knowledge, this hybrid use of homomorphic encryption along with Intel SGX has not been proposed to date. SAFETY is up to 4.82 times faster than the best existing secure computation technique.

Journal ArticleDOI
Zuping Zhang, Jingpu Zhang, Chao Fan, Yongjun Tang, Lei Deng
TL;DR: A global network-based method, KATZLGO, is proposed to predict the functions of human lncRNAs at large scale; it significantly outperforms the state-of-the-art computational method in both maximum F-measure and coverage.
Abstract: Accumulating evidence has shown that long non-coding RNAs (lncRNAs) generally play key roles in cellular biological processes such as epigenetic regulation, gene expression regulation at transcriptional and post-transcriptional levels, cell differentiation, and others. However, most lncRNAs have not been functionally characterized. There is an urgent need to develop computational approaches for function annotation of the increasing number of available lncRNAs. In this article, we propose a global network-based method, KATZLGO, to predict the functions of human lncRNAs at large scale. A global network is constructed by integrating three heterogeneous networks: lncRNA-lncRNA similarity network, lncRNA-protein association network, and protein-protein interaction network. The KATZ measure is then employed to calculate similarities between lncRNAs and proteins in the global network. We annotate lncRNAs with Gene Ontology (GO) terms of their neighboring protein-coding genes based on the KATZ similarity scores. The performance of KATZLGO is evaluated on a manually annotated lncRNA benchmark and a protein-coding gene benchmark with known function annotations. KATZLGO significantly outperforms the state-of-the-art computational method in both maximum F-measure and coverage. Furthermore, we apply KATZLGO to predict functions of human lncRNAs and successfully map 12,318 human lncRNA genes to GO terms.

Journal ArticleDOI
TL;DR: A method to predict miRNA-disease associations based on dynamic neighborhood regularized logistic matrix factorization, which outperforms the state-of-art method PBMDA and can predict potential diseases for new miRNAs and case studies illustrate that DNRLMF-MDA is an effective method.
Abstract: MicroRNAs (miRNAs) are a class of non-coding RNAs about 22 nucleotides (nt) in length. Studies have shown that miRNAs play key roles in many complex human diseases. Therefore, discovering miRNA-disease associations is beneficial to understanding disease mechanisms, developing drugs, and treating complex diseases. Discovering miRNA-disease associations via biological experiments is well known to be a time-consuming and expensive process. Alternatively, computational models can provide a low-cost and high-efficiency way of predicting miRNA-disease associations. In this study, we propose a method (called DNRLMF-MDA) to predict miRNA-disease associations based on dynamic neighborhood regularized logistic matrix factorization. DNRLMF-MDA integrates known miRNA-disease associations, functional similarity and Gaussian Interaction Profile (GIP) kernel similarity of miRNAs, and functional similarity and GIP kernel similarity of diseases. In particular, positive observations (known miRNA-disease associations) are assigned higher importance levels than negative observations (unknown miRNA-disease associations). DNRLMF-MDA computes the probability that a miRNA would interact with a disease by a logistic matrix factorization method, where latent vectors of miRNAs and diseases represent the properties of miRNAs and diseases, respectively, and further improves prediction performance via dynamic neighborhood regularization. Five-fold cross validation is adopted to assess the performance of DNRLMF-MDA, as well as other competing methods for comparison. The computational experiments show that DNRLMF-MDA outperforms the state-of-the-art method PBMDA. The AUC values of DNRLMF-MDA on three datasets are 0.9357, 0.9411, and 0.9416, respectively, which are superior to PBMDA's results of 0.9218, 0.9187, and 0.9262. The average computation times per 5-fold cross validation of DNRLMF-MDA on three datasets are 38, 46, and 50 seconds, which are shorter than PBMDA's average computation times of 10869, 916, and 8448 seconds, respectively. DNRLMF-MDA can also predict potential diseases for new miRNAs. Furthermore, case studies illustrate that DNRLMF-MDA is an effective method to predict miRNA-disease associations.
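The abstract names two building blocks without giving formulas: GIP kernel similarity and a logistic link on latent vectors. The sketch below uses the commonly cited GIP definition (a Gaussian kernel on rows of the binary association matrix, with bandwidth normalized by the mean squared row norm) and a plain sigmoid of a dot product. The function names, the `gamma_prime` parameter, and the toy matrix are assumptions for illustration, not the authors' implementation, and the dynamic neighborhood regularization step is not shown.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 association matrix.

    profiles[i] is the interaction profile of entity i (e.g. one miRNA across
    all diseases). gamma_prime is an assumed bandwidth parameter.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("at least one known association is required")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def association_probability(u, v):
    """Logistic link: probability obtained from the dot product of two latent vectors."""
    score = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-score))


# Toy association matrix: 3 miRNAs (rows) x 3 diseases (columns).
toy = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
similarity = gip_kernel(toy)
```

Identical profiles get similarity 1, and similarity decays as profiles diverge; in a full model this matrix would be one of several similarity inputs.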

Journal ArticleDOI
TL;DR: In this paper, a 3D laser scanner mounted as the robot's payload captures the surface point cloud data of the plant from multiple views, and an efficient 3D reconstruction algorithm is used, by which multiple scans are aligned together to obtain a 3D mesh of the plant, followed by surface area and volume computations.
Abstract: Machine vision for plant phenotyping is an emerging research area for producing high throughput in agriculture and crop science applications. Since 2D based approaches have their inherent limitations, 3D plant analysis is becoming state of the art for current phenotyping technologies. We present an automated system for analyzing plant growth in indoor conditions. A gantry robot system is used to perform scanning tasks in an automated manner throughout the lifetime of the plant. A 3D laser scanner mounted as the robot's payload captures the surface point cloud data of the plant from multiple views. The plant is monitored from the vegetative to reproductive stages in light/dark cycles inside a controllable growth chamber. An efficient 3D reconstruction algorithm is used, by which multiple scans are aligned together to obtain a 3D mesh of the plant, followed by surface area and volume computations. The whole system, including the programmable growth chamber, robot, scanner, data transfer, and analysis is fully automated in such a way that a naive user can, in theory, start the system with a mouse click and get back the growth analysis results at the end of the lifetime of the plant with no intermediate intervention. As evidence of its functionality, we show and analyze quantitative results of the rhythmic growth patterns of the dicot Arabidopsis thaliana (L.), and the monocot barley (Hordeum vulgare L.) plants under their diurnal light/dark cycles.
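The abstract does not say how surface area and volume are computed from the reconstructed mesh. One standard approach for a closed, consistently oriented triangle mesh is sketched below: area is the sum of triangle areas, and volume is the absolute sum of signed tetrahedron volumes formed with the origin. The function name and the tetrahedron example are illustrative assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def mesh_area_and_volume(vertices, triangles):
    """Return (surface area, enclosed volume) of a triangle mesh.

    The volume is only meaningful if the mesh is closed and all triangles
    share a consistent winding order (assumption).
    """
    area = 0.0
    signed_volume = 0.0
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        normal = _cross(_sub(b, a), _sub(c, a))
        area += 0.5 * math.sqrt(_dot(normal, normal))
        # Signed volume of the tetrahedron (origin, a, b, c).
        signed_volume += _dot(a, _cross(b, c)) / 6.0
    return area, abs(signed_volume)


# Unit right tetrahedron with outward-facing triangles.
tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_triangles = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
tetra_area, tetra_volume = mesh_area_and_volume(tetra_vertices, tetra_triangles)
```

Meshes built from real scans often contain holes, so a hole-filling or watertightness check would normally come before the volume step.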

Journal ArticleDOI
TL;DR: Both gene set enrichment analysis and predicted results demonstrate that dgSeq can effectively predict new disease genes, indicating its superiority to three other competing methods.
Abstract: Disease gene prediction is a challenging task that has a variety of applications such as early diagnosis and drug development. The existing machine learning methods suffer from the imbalanced sample issue because the number of known disease genes (positive samples) is much smaller than that of unknown genes, which are typically considered to be negative samples. In addition, most methods have not utilized clinical data from patients with a specific disease to predict disease genes. In this study, we propose a disease gene prediction algorithm (called dgSeq) by combining protein-protein interaction (PPI) network, clinical RNA-Seq data, and Online Mendelian Inheritance in Man (OMIM) data. Our dgSeq constructs differential networks based on rewiring information calculated from clinical RNA-Seq data. To select balanced sets of non-disease genes (negative samples), a disease-gene network is also constructed from OMIM data. After features are extracted from the PPI networks and differential networks, the logistic regression classifiers are trained. Our dgSeq obtains AUC values of 0.88, 0.83, and 0.80 for identifying breast cancer genes, thyroid cancer genes, and Alzheimer's disease genes, respectively, which indicates its superiority to three other competing methods. Both gene set enrichment analysis and predicted results demonstrate that dgSeq can effectively predict new disease genes.

Journal ArticleDOI
TL;DR: A Bayesian Inverse Reinforcement Learning (BIRL) approach is developed to address the realistic case in which the only available knowledge regarding the immediate cost function is provided by the sequence of measurements and interventions recorded in an experimental setting by an expert.
Abstract: Control of gene regulatory networks (GRNs) to shift gene expression from undesirable states to desirable ones has received much attention in recent years. Most of the existing methods assume that the cost of intervention at each state and time point, referred to as the immediate cost function, is fully known. In this paper, we employ the Partially-Observed Boolean Dynamical System (POBDS) signal model for a time sequence of noisy expression measurement from a Boolean GRN and develop a Bayesian Inverse Reinforcement Learning (BIRL) approach to address the realistic case in which the only available knowledge regarding the immediate cost function is provided by the sequence of measurements and interventions recorded in an experimental setting by an expert. The Boolean Kalman Smoother (BKS) algorithm is used for optimally mapping the available gene-expression data into a sequence of Boolean states, and then the BIRL method is efficiently combined with the Q-learning algorithm for quantification of the immediate cost function. The performance of the proposed methodology is investigated by applying a state-feedback controller to two GRN models: a melanoma WNT5A Boolean network and a p53-MDM2 negative feedback loop Boolean network, when the cost of the undesirable states, and thus the identity of the undesirable genes, is learned using the proposed methodology.

Journal ArticleDOI
TL;DR: A machine learning model is developed that can quickly and accurately flag compounds which effectively disrupt vascular networks from images taken before and after drug application in vitro.
Abstract: Likely drug candidates which are identified in traditional pre-clinical drug screens often fail in patient trials, increasing the societal burden of drug discovery. A major contributing factor to this phenomenon is the failure of traditional in vitro models of drug response to accurately mimic many of the more complex properties of human biology. We have recently introduced a new microphysiological system for growing vascularized, perfused microtissues that more accurately models human physiology and is suitable for large drug screens. In this work, we develop a machine learning model that can quickly and accurately flag compounds which effectively disrupt vascular networks from images taken before and after drug application in vitro. The system is based on a convolutional neural network and achieves near perfect accuracy while committing potentially no expensive false negatives.

Journal ArticleDOI
TL;DR: This paper proposes a new method for constructing a refined PIN by using gene expression profiles and subcellular location information and shows that all of the 10 network-based methods achieve better results when applied to TS-PIN than when applied to S-PIN and NF-APIN.
Abstract: Identification of essential proteins based on protein interaction network (PIN) is a very important and hot topic in the post-genome era. Up to now, a number of network-based essential protein discovery methods have been proposed. Generally, a static protein interaction network is constructed by using the protein-protein interactions obtained from different experiments or databases. Unfortunately, most of the network-based essential protein discovery methods are sensitive to the reliability of the constructed PIN. In this paper, we propose a new method for constructing a refined PIN by using gene expression profiles and subcellular location information. The basic idea behind refining the PIN is that two proteins are more likely to physically interact with each other if they appear together at the same subcellular location and are active together at one or more time points in the cell cycle. The original static PIN is denoted by S-PIN while the final PIN refined by our method is denoted by TS-PIN. To evaluate whether the constructed TS-PIN is more suitable for the identification of essential proteins, 10 network-based essential protein discovery methods (DC, EC, SC, BC, CC, IC, LAC, NC, BN, and DMNC) are applied to it to identify essential proteins. TS-PIN is compared with two other networks, S-PIN and NF-APIN (a noise-filtered active PIN constructed by using gene expression data and S-PIN), on the prediction of essential proteins by using these ten network-based methods. The comparison results show that all of the 10 network-based methods achieve better results when applied to TS-PIN than when applied to S-PIN and NF-APIN.
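The refinement rule stated in the abstract (keep an interaction only if the two proteins share a subcellular location and are active together at some time point) can be sketched as a simple edge filter. How a protein is judged "active" from its expression profile is not given in the abstract, so the sketch takes precomputed sets of active time points as input. All names and the toy data are hypothetical. Degree centrality (DC) is included as the simplest of the ten listed measures.

```python
def refine_pin(edges, locations, active_times):
    """Filter a static PIN down to edges supported by co-location and co-activity.

    edges: iterable of (protein, protein) pairs from the static network.
    locations: dict protein -> set of subcellular locations.
    active_times: dict protein -> set of time points at which it is active
                  (assumed to be derived from expression data beforehand).
    """
    refined = []
    for p, q in edges:
        share_location = bool(locations.get(p, set()) & locations.get(q, set()))
        co_active = bool(active_times.get(p, set()) & active_times.get(q, set()))
        if share_location and co_active:
            refined.append((p, q))
    return refined


def degree_centrality(edges):
    """Degree centrality (DC): number of interactions per protein."""
    degree = {}
    for p, q in edges:
        degree[p] = degree.get(p, 0) + 1
        degree[q] = degree.get(q, 0) + 1
    return degree


# Toy static network with hypothetical protein names.
static_edges = [("A", "B"), ("A", "C"), ("B", "C")]
toy_locations = {"A": {"nucleus"}, "B": {"nucleus", "cytosol"}, "C": {"cytosol"}}
toy_active = {"A": {1, 2}, "B": {2, 3}, "C": {5}}
ts_edges = refine_pin(static_edges, toy_locations, toy_active)
ranking = degree_centrality(ts_edges)
```

In the toy data only the A-B edge survives: A and C share no location, and B and C are never active at the same time point.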

Journal ArticleDOI
TL;DR: A deep learning framework to detect prostate cancer in the sequential CEUS images uniformly extracts features from both the spatial and the temporal dimensions by performing three-dimensional convolution operations, which captures the dynamic information of the perfusion process encoded in multiple adjacent frames for prostate cancer detection.
Abstract: The important role of angiogenesis in cancer development has driven many researchers to investigate the prospects of noninvasive cancer diagnosis based on the technology of contrast-enhanced ultrasound (CEUS) imaging. This paper presents a deep learning framework to detect prostate cancer in the sequential CEUS images. The proposed method uniformly extracts features from both the spatial and the temporal dimensions by performing three-dimensional convolution operations, which captures the dynamic information of the perfusion process encoded in multiple adjacent frames for prostate cancer detection. The deep learning models were trained and validated against expert delineations over the CEUS images recorded using two types of contrast agents, i.e., the anti-PSMA based agent targeted to prostate cancer cells and the non-targeted blank agent. Experiments showed that the deep learning method achieved over 91 percent specificity and 90 percent average accuracy over the targeted CEUS images for prostate cancer detection, which was superior ($p < 0.05$) to previously reported approaches and implementations.

Journal ArticleDOI
TL;DR: This paper proposes an approach that uses Biweight Midcorrelation to measure the correlation between factors and Nonconvex Penalty based sparse regression for Gene Regulatory Network inference (BMNPGRN), which incorporates multi-omics data and their interactions in the gene regulatory network model.
Abstract: Underlying a cancer phenotype is a specific gene regulatory network that represents the complex regulatory relationships between genes. It remains, however, a challenge to find cancer-related gene regulatory networks because of insufficient sample sizes and complex regulatory mechanisms in which a gene is influenced not only by other genes but also by other biological factors. The development of high-throughput technologies and the unprecedented wealth of multi-omics data give us a new opportunity to design machine learning methods to investigate the underlying gene regulatory network. In this paper, we propose an approach that uses Biweight Midcorrelation to measure the correlation between factors and Nonconvex Penalty based sparse regression for Gene Regulatory Network inference (BMNPGRN). BMNPGRN incorporates multi-omics data (including DNA methylation and copy number variation) and their interactions in the gene regulatory network model. The experimental results on synthetic datasets show that BMNPGRN outperforms popular and state-of-the-art methods (including DCGRN, ARACNE, and CLR) under false positive control. Furthermore, we applied BMNPGRN to breast cancer (BRCA) data from The Cancer Genome Atlas database and provided the inferred gene regulatory network.
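Biweight midcorrelation is a median-based, outlier-resistant alternative to Pearson correlation. The sketch below follows the standard textbook definition, with deviations scaled by nine times the median absolute deviation and values beyond that range given zero weight. It is not taken from the BMNPGRN code, the function names are illustrative, and the sparse-regression step of the method is not shown.

```python
import math
from statistics import median


def _biweight_terms(x):
    """Return the normalized, biweight-weighted deviations of x from its median."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this vector")
    terms = []
    for v in x:
        u = (v - med) / (9.0 * mad)
        # Tukey biweight: observations with |u| >= 1 get zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))


# Two expression-like vectors that agree except for one opposite-sign outlier.
x = [1, 2, 3, 4, 5, 100]
y = [1, 2, 3, 4, 5, -100]
robust_corr = bicor(x, y)
```

In this toy pair the outlier receives zero weight in both vectors, so the result stays positive where Pearson correlation would be driven strongly negative by the single extreme point.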

Journal ArticleDOI
TL;DR: A new deep residual inception network architecture, called DeepRIN, is proposed for the prediction of Psi-Phi angles, which enables effective encoding of local and global interactions between amino acids in a protein sequence to achieve accurate prediction.
Abstract: Prediction of protein backbone torsion angles (Psi and Phi) can provide important information for protein structure prediction and sequence alignment. Existing methods for Psi-Phi angle prediction have significant room for improvement. In this paper, a new deep residual inception network architecture, called DeepRIN, is proposed for the prediction of Psi-Phi angles. The input to DeepRIN is a feature matrix representing a composition of physico-chemical properties of amino acids, a 20-dimensional position-specific substitution matrix (PSSM) generated by PSI-BLAST, a 30-dimensional hidden Markov Model sequence profile generated by HHBlits, and predicted eight-state secondary structure features. DeepRIN is designed based on inception networks and residual networks that have performed well on image classification and text recognition. The architecture of DeepRIN enables effective encoding of local and global interactions between amino acids in a protein sequence to achieve accurate prediction. Extensive experimental results show that DeepRIN outperformed the best existing tools significantly. Compared to the recently released state-of-the-art tool, SPIDER3, DeepRIN reduced the Psi angle prediction error by more than 5 degrees and the Phi angle prediction error by more than 2 degrees on average. The executable tool of DeepRIN is available for download at http://dslsrv8.cs.missouri.edu/~cf797/MUFoldAngle/.
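The abstract reports prediction error in degrees but does not define the metric. Torsion angles are periodic, so errors are usually taken as the shorter way around the circle: a prediction of 179 degrees against a true value of -179 degrees is off by 2 degrees, not 358. The sketch below shows a mean absolute error under that assumption; the function names are illustrative.

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angle_error(pred, true):
    """Mean of the periodic absolute errors over paired angle lists."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(angular_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Hypothetical predicted and reference Psi angles for three residues.
psi_pred = [179.0, -60.0, 120.0]
psi_true = [-179.0, -50.0, 120.0]
psi_mae = mean_absolute_angle_error(psi_pred, psi_true)
```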

Journal ArticleDOI
TL;DR: This study proposes an efficient approach, Random Walk on a Heterogeneous Network for Drug Repositioning (RWHNDR), to prioritize candidate drugs for diseases based on multi-source data, which can achieve better performance compared with other state-of-the-art approaches.
Abstract: Drug repositioning is an efficient and promising strategy to identify new indications for existing drugs, which can improve the productivity of traditional drug discovery and development. Rapid advances in high-throughput technologies have generated various types of biomedical data over the past decades, which lay the foundations for furthering the development of computational drug repositioning approaches. Although many studies have tried to improve the repositioning accuracy by integrating information from multiple sources and different levels, it is still appealing to further investigate how to efficiently exploit valuable data for drug repositioning. In this study, we propose an efficient approach, Random Walk on a Heterogeneous Network for Drug Repositioning (RWHNDR), to prioritize candidate drugs for diseases. First, an integrated heterogeneous network is constructed by combining multiple sources including drugs, drug targets, diseases and disease genes data. Then, a random walk model is developed to capture the global information of the heterogeneous network. RWHNDR takes advantage of drug targets and disease genes data more comprehensively for drug repositioning. The experimental results show that our approach can achieve better performance compared with other state-of-the-art approaches that prioritize candidate drugs based on multi-source data.
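The abstract does not specify the random walk model used by RWHNDR. As a generic illustration only, the sketch below runs a plain random walk with restart on a small weighted graph: at each step the walker follows an edge with probability `1 - restart` or jumps back to a seed node with probability `restart`. The function name, parameters, and toy network are assumptions, and the cross-layer transition weights a heterogeneous-network method would need are not modeled.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart.

    adjacency: dict node -> dict of neighbor -> non-negative edge weight;
               every neighbor must also appear as a key.
    seeds: non-empty set of nodes the walker restarts from (e.g. a disease node).
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            neighbors = adjacency[v]
            total = sum(neighbors.values())
            if total == 0:
                # Node without outgoing weight: the walker stays in place.
                nxt[v] += (1.0 - restart) * p[v]
                continue
            for u, w in neighbors.items():
                nxt[u] += (1.0 - restart) * p[v] * w / total
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy network: drug1 - target1 - disease1, plus an unconnected drug2 - target2 pair.
toy_network = {
    "drug1": {"target1": 1.0},
    "target1": {"drug1": 1.0, "disease1": 1.0},
    "disease1": {"target1": 1.0},
    "drug2": {"target2": 1.0},
    "target2": {"drug2": 1.0},
}
scores = random_walk_with_restart(toy_network, seeds={"disease1"})
```

Candidate drugs would then be ranked by their score; in the toy network `drug1` receives a positive score and `drug2`, which is unreachable from the seed, receives zero.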

Journal ArticleDOI
TL;DR: The GenP system proposed is an ensemble that combines multiple texture features (both handcrafted and learned descriptors) for superior and generalizable discriminative power and obtains a boosting of performance by combining local features, dense sampling features, and deep learning features.
Abstract: Bioimage classification is increasingly becoming more important in many biological studies including those that require accurate cell phenotype recognition, subcellular localization, and histopathological classification. In this paper, we present a new General Purpose (GenP) bioimage classification method that can be applied to a large range of classification problems. The GenP system we propose is an ensemble that combines multiple texture features (both handcrafted and learned descriptors) for superior and generalizable discriminative power. Our ensemble obtains a boosting of performance by combining local features, dense sampling features, and deep learning features. Each descriptor is used to train a different Support Vector Machine that is then combined by sum rule. We evaluate our method on a diverse set of bioimage classification tasks each represented by a benchmark database, including some of those available in the IICBU 2008 database. Each bioimage classification task represents a typical subcellular, cellular, and tissue level classification problem. Our evaluation on these datasets demonstrates that the proposed GenP bioimage ensemble obtains state-of-the-art performance without any ad-hoc dataset tuning of the parameters (thereby avoiding any risk of overfitting/overtraining). To reproduce the experiments reported in this paper, the MATLAB code of all the descriptors is available at https://github.com/LorisNanni and https://www.dropbox.com/s/bguw035yrqz0pwp/ElencoCode.docx?dl=0.
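The sum rule mentioned in the abstract fuses classifiers by adding their per-class scores and picking the class with the largest total. The minimal sketch below assumes each Support Vector Machine has already produced one score per class on a comparable scale. Score normalization is not described in the abstract and is left out, and the function name and numbers are illustrative.

```python
def sum_rule(score_lists):
    """Fuse per-classifier class scores by summation.

    score_lists: list with one entry per classifier, each a list of scores
                 (one per class, same class order everywhere).
    Returns (index of the winning class, list of fused scores).
    """
    if not score_lists:
        raise ValueError("at least one classifier is required")
    n_classes = len(score_lists[0])
    fused = [0.0] * n_classes
    for scores in score_lists:
        if len(scores) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for c, s in enumerate(scores):
            fused[c] += s
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Three hypothetical descriptors, each with its own classifier, scoring two classes.
label, fused = sum_rule([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
```

Here the second classifier disagrees with the other two, but the summed evidence still selects class 1.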

Journal ArticleDOI
TL;DR: This article proposes Boolean Control Networks, a framework in which disease-induced molecular perturbations and drug actions are seen as topological actions on molecular networks leading to cell phenotype reprogramming, and presents a new method using abductive reasoning principles to infer the minimal causal topological actions leading to an expected behavior at stable state.
Abstract: Complex diseases such as cancer or Alzheimer's are caused by multiple molecular perturbations leading to pathological cellular behavior. However, the identification of disease-induced molecular perturbations and subsequent development of efficient therapies are challenged by the complexity of the genotype-phenotype relationship. Accordingly, a key issue is to develop frameworks relating molecular perturbations and drug effects to their consequences on cellular phenotypes. Such a framework would aim at identifying the sets of causal molecular factors leading to phenotypic reprogramming. In this article, we propose a theoretical framework, called Boolean Control Networks, where disease-induced molecular perturbations and drug actions are seen as topological perturbations/actions on molecular networks leading to cell phenotype reprogramming. We present a new method using abductive reasoning principles to infer the minimal causal topological actions leading to an expected behavior at stable state. Then, we compare different implementations of the algorithm and finally show a proof-of-concept of the approach on a model of the network regulating the proliferation/apoptosis switch in breast cancer by automatically discovering driver genes and their synthetic lethal drug target partners.