Journal Article

Drug-Target Interaction Prediction with Graph Regularized Matrix Factorization

TL;DR: Two matrix factorization methods that use graph regularization in order to learn low-dimensional non-linear manifolds are proposed and developed, which achieved better results than three other state-of-the-art methods in most cases.
Abstract: Experimental determination of drug-target interactions is expensive and time-consuming. Therefore, there is a continuous demand for more accurate predictions of interactions using computational techniques. Algorithms have been devised to infer novel interactions on a global scale where the input to these algorithms is a drug-target network (i.e., a bipartite graph where edges connect pairs of drugs and targets that are known to interact). However, these algorithms had difficulty predicting interactions involving new drugs or targets for which there are no known interactions (i.e., “orphan” nodes in the network). Since data usually lie on or near to low-dimensional non-linear manifolds, we propose two matrix factorization methods that use graph regularization in order to learn such manifolds. In addition, considering that many of the non-occurring edges in the network are actually unknown or missing cases, we developed a preprocessing step to enhance predictions in the “new drug” and “new target” cases by adding edges with intermediate interaction likelihood scores. In our cross validation experiments, our methods achieved better results than three other state-of-the-art methods in most cases. Finally, we simulated some “new drug” and “new target” cases and found that GRMF predicted the left-out interactions reasonably well.
Citations
Journal Article
TL;DR: DeepConv-DTI is a deep learning-based DTI prediction model that captures local residue patterns of proteins participating in DTIs and can detect protein binding sites for DTIs.
Abstract: Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to not be sufficiently informative to predict accurate DTIs. Thus, in this study, we propose a deep learning based DTI prediction model capturing local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on various lengths of amino acids subsequences to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Also, our model performs better than the recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.

253 citations

Journal Article
TL;DR: A method called SimBoost is presented that predicts continuous (non-binary) values of binding affinities of compounds and proteins, and thus incorporates the whole interaction spectrum from true negative to true positive interactions; it outperforms the previously reported models across the studied datasets.
Abstract: Computational prediction of the interaction between drugs and targets is a standing challenge in the field of drug discovery. A number of rather accurate predictions were reported for various binary drug–target benchmark datasets. However, a notable drawback of a binary representation of interaction data is that missing endpoints for non-interacting drug–target pairs are not differentiated from inactive cases, and that predicted levels of activity depend on pre-defined binarization thresholds. In this paper, we present a method called SimBoost that predicts continuous (non-binary) values of binding affinities of compounds and proteins and thus incorporates the whole interaction spectrum from true negative to true positive interactions. Additionally, we propose a version of the method called SimBoostQuant which computes a prediction interval in order to assess the confidence of the predicted affinity, thus defining the Applicability Domain metrics explicitly. We evaluate SimBoost and SimBoostQuant on two established drug–target interaction benchmark datasets and one new dataset that we propose to use as a benchmark for read-across cheminformatics applications. We demonstrate that our methods outperform the previously reported models across the studied datasets.

228 citations

Journal Article
TL;DR: This article describes the data used in computational DTI prediction efforts, categorizes and elaborates on the state-of-the-art methods for predicting DTIs, and discusses the advantages and disadvantages of each method.
Abstract: Computational prediction of drug-target interactions (DTIs) has become an essential task in the drug discovery process. It narrows down the search space for interactions by suggesting potential interaction candidates for validation via wet-lab experiments that are well known to be expensive and time-consuming. In this article, we aim to provide a comprehensive overview and empirical evaluation on the computational DTI prediction techniques, to act as a guide and reference for our fellow researchers. Specifically, we first describe the data used in such computational DTI prediction efforts. We then categorize and elaborate the state-of-the-art methods for predicting DTIs. Next, an empirical comparison is performed to demonstrate the prediction performance of some representative methods under different scenarios. We also present interesting findings from our evaluation study, discussing the advantages and disadvantages of each method. Finally, we highlight potential avenues for further enhancement of DTI prediction performance as well as related research directions.

201 citations


Cites methods from "Drug-Target Interaction Prediction ..."

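  • ...WGRMF’s objective is given as: $\min_{A,B} \|W \odot (Y - AB^\top)\|_F^2 + \lambda_l(\|A\|_F^2 + \|B\|_F^2) + \lambda_d \mathrm{Tr}(A^\top \tilde{\mathcal{L}}_d A) + \lambda_t \mathrm{Tr}(B^\top \tilde{\mathcal{L}}_t B)$ (30), where $\mathrm{Tr}(\cdot)$ is the trace of a matrix, and $\tilde{\mathcal{L}}_d$ and $\tilde{\mathcal{L}}_t$ are the normalized graph Laplacians that are obtained from $S_d$ and $S_t$, respectively....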

    [...]

  • ...Weighted Graph Regularized Matrix Factorization (WGRMF) [58] is similar to CMF with the exception that WGRMF alternatively uses graph regularization terms to learn a manifold for label propagation....

    [...]

  • ...Categories | Methods | Category description
    Neighborhood | Nearest Profile and Weighted Profile [22], SRP [45] | Neighborhood methods use relatively simple similarity functions to perform predictions
    BLMs | Bleakley et al. [46], LapRLS [47], RLS-avg and RLS-kron [48], BLM-NII [49] | BLMs perform two sets of predictions, one from the drug side and one from the target side, and then aggregates these predictions to give the final prediction scores
    Network diffusion | NBI [50], Wang et al. [51], NRWRH [52], PSL [53], DASPfind [54] | Network diffusion methods investigate graph-based techniques to predict new interactions
    Matrix factorization | KBMF2K [55], PMF [56], CMF [57], WGRMF [58], NRLMF [59], DNILMF [60] | Matrix factorization finds two latent feature matrices that, when multiplied together, reconstruct the interaction matrix
    Feature-based classification | He et al. [61], Yu et al. [62], Fuzzy KNN [63], Ezzat et al. [64], EnsemDT [65], SITAR [66], RFDT [78], PDTPS [81], ER-Tree [83], SCCA [84], MH-L1SVM [86] | Feature-based classification methods are those that need the drug–target pairs to be explicitly represented as fixed-length feature vectors
    Specifically, assuming a bipartite DTI network, the algorithm tries to predict whether the edge e_ij exists between drug d_i and target t_j....

    [...]
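
The WGRMF objective quoted in the excerpts above can be evaluated directly, which may help when checking an implementation. The following is a minimal NumPy sketch, not the authors' code. It assumes Y is the drug-by-target interaction matrix, W is a weight matrix of the same shape applied elementwise, A and B hold one latent row per drug and per target, and lam_l, lam_d, lam_t stand for the three regularization coefficients. The function names are hypothetical, and any sparsification of the similarity matrices before building the Laplacians is left out.

```python
import numpy as np

def normalized_laplacian(S):
    """Symmetric normalized Laplacian D^{-1/2} (D - S) D^{-1/2} of a similarity matrix S."""
    S = (S + S.T) / 2.0                      # enforce symmetry
    deg = S.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    L = np.diag(deg) - S
    return inv_sqrt[:, None] * L * inv_sqrt[None, :]

def wgrmf_objective(Y, W, A, B, L_d, L_t, lam_l, lam_d, lam_t):
    """Value of the weighted, graph-regularized factorization objective."""
    residual = W * (Y - A @ B.T)             # elementwise weighting of the fit error
    fit = np.sum(residual ** 2)              # squared Frobenius norm
    ridge = lam_l * (np.sum(A ** 2) + np.sum(B ** 2))
    graph = lam_d * np.trace(A.T @ L_d @ A) + lam_t * np.trace(B.T @ L_t @ B)
    return fit + ridge + graph

# Toy example: two drugs, two targets, rank-2 factors that reproduce Y exactly.
Y = np.eye(2)
W = np.ones((2, 2))
A = np.eye(2)
B = np.eye(2)
S = np.array([[1.0, 0.5], [0.5, 1.0]])       # made-up similarity matrix
L_d = normalized_laplacian(S)
L_t = normalized_laplacian(S)
value = wgrmf_objective(Y, W, A, B, L_d, L_t, lam_l=0.5, lam_d=3.0, lam_t=3.0)
```

In the toy example the fit term vanishes, so the value comes entirely from the ridge and graph penalties. The trace terms grow when similar drugs (or targets) are assigned distant latent vectors.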

Journal Article
TL;DR: The data required for the task of DTI prediction is described followed by a comprehensive catalog consisting of machine learning methods and databases, which have been proposed and utilized to predict DTIs.
Abstract: The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid costly and laborious yet not-always-deterministic experiments to determine drug-target interactions (DTIs) by experiments alone. These approaches should be capable of identifying the potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction followed by a comprehensive catalog consisting of machine learning methods and databases, which have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in prediction of DTI using machine learning approaches are highlighted and we conclude by shedding some lights on important future research directions.

192 citations

Journal Article
Ruolan Chen, Xiangrong Liu, Shuting Jin, Jiawei Lin, Juan Liu
TL;DR: A hierarchical classification scheme is adopted and several representative methods of each category of drug-target interaction prediction are introduced, especially the recent state-of-the-art methods.
Abstract: Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.

162 citations


Additional excerpts

  • ...[59], employed two matrix factorization methods (i....

    [...]

References
Journal Article
22 Dec 2000-Science
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
Abstract: Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.

15,106 citations


"Drug-Target Interaction Prediction ..." refers background in this paper

  • ..., [19], [20], [21]) that data usually lies on (or near to) a manifold, learning...

    [...]

Journal Article
22 Dec 2000-Science
TL;DR: An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.
Abstract: Scientists working with large volumes of high-dimensional data, such as global climate patterns, stellar spectra, or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. The human brain confronts the same problem in everyday perception, extracting from its high-dimensional sensory inputs (30,000 auditory nerve fibers or 10^6 optic nerve fibers) a manageably small number of perceptually relevant features. Here we describe an approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set. Unlike classical techniques such as principal component analysis (PCA) and multidimensional scaling (MDS), our approach is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure.

13,652 citations


"Drug-Target Interaction Prediction ..." refers background in this paper

  • ..., [19], [20], [21]) that data usually lies on (or near to) a manifold, learning...

    [...]

Journal Article
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

10,262 citations
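
The segment-pair search summarized above is commonly implemented as a dynamic program over the two sequences. The sketch below is a simplified illustration rather than the letter's exact formulation: it uses a linear gap penalty, and the scoring values (match, mismatch, gap) and the function name are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)    # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is what
            # makes the result local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

score_same = local_alignment_score("ACGT", "ACGT")
score_embedded = local_alignment_score("XXACGTXX", "YYACGTYY")
score_none = local_alignment_score("AAAA", "TTTT")
```

With these illustrative scores, the shared segment "ACGT" gives the same result whether or not it is surrounded by unrelated flanking characters.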

Proceedings Article
25 Jun 2006
TL;DR: It is shown that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space.
Abstract: Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.

5,063 citations
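
Because the abstract above notes that linear interpolation between points is incorrect in PR space, a step-wise summary such as average precision is one common way to approximate the area under the PR curve. The following minimal sketch assumes binary labels and real-valued scores. The function name is hypothetical, ties in the scores are broken by input order rather than handled specially, and this is not necessarily the exact AUPR computation used in the paper.

```python
import numpy as np

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (no linear interpolation)."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # rank pairs by decreasing score
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the ranks where a true interaction occurs.
    return float(np.sum(precision_at_k * hits) / n_pos)

ap_perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
ap_fp_on_top = average_precision([0, 1, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

The second call places a false positive at the top of the ranking, which lowers the precision at every later true positive and so reduces the score noticeably.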


"Drug-Target Interaction Prediction ..." refers background or methods in this paper

  • ...By observing the results of our proposed methods in Tables 2 and 3, modeling the manifold structures of the drug and target spaces (via the drug and target graph regularization terms, respectively) was shown to improve prediction performance in terms of AUPR, indicating the effectiveness of the proposed graph regularization....

    [...]

  • ...In previous studies (e.g., [11], [14], [15]), the Area Under the Precision-Recall curve (AUPR) [29] was employed as the main metric for performance evaluation....

    [...]

  • ...In addition, it seems that the same drug-to-target ratio caused the opposite to take place under CVt (where target information is more important)—all methods achieved higher AUPR for the IC and E datasets than for the NR and GPCR datasets....

    [...]

  • ...For example, in both the NR and GPCR dataset, the drug-to-target ratio is higher than in the IC and E datasets (see Table 1); we believe this is why all methods achieved higher AUPR for the NR and GPCR datasets than for the IC and E datasets under CVd (where drug information is more important)....

    [...]

  • ..., the AUPR metric heavily punishes highly ranked false positives [29])....

    [...]

Proceedings Article
03 Jan 2001
TL;DR: The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering.
Abstract: Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered.

4,557 citations
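
The embedding described in the abstract above can be illustrated with a small NumPy sketch. This is a simplification under stated assumptions: it uses a fully connected graph with heat-kernel weights instead of a nearest-neighbor graph, the kernel width t and the function name are hypothetical, and it solves the generalized eigenproblem through the symmetric normalized Laplacian.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, t=1.0):
    """Embed the rows of X using the low eigenvectors of a graph Laplacian."""
    X = np.asarray(X, dtype=float)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / t)                # heat-kernel edge weights
    np.fill_diagonal(W, 0.0)                 # no self-loops
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym) # eigenvalues in ascending order
    # Map back to solutions of L y = lambda D y and skip the trivial first one.
    Y = inv_sqrt[:, None] * eigvecs
    return Y[:, 1:n_components + 1]

# Two small groups of points; the first embedding coordinate separates them.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [2.0, 0.0], [2.1, 0.0], [2.0, 0.1]]
embedding = laplacian_eigenmaps(points, n_components=1, t=1.0)
```

The sign of the single embedding coordinate splits the two groups, which reflects the connection to clustering mentioned in the abstract.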


"Drug-Target Interaction Prediction ..." refers background in this paper

  • ..., [19], [20], [21]) that data usually lies on (or near to) a manifold, learning...

    [...]