Journal ArticleDOI

Inferring MicroRNA-Disease Associations by Random Walk on a Heterogeneous Network with Multiple Data Sources

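TL;DR: The paper highlights three limitations commonly associated with previous computational methods, addresses them with a random walk with restart on a heterogeneous miRNA-disease network built from multiple data sources, and uses case studies to demonstrate that the method can discover potential miRNA-disease associations.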
Abstract: Since the discovery of the regulatory function of microRNA (miRNA), increasing attention has focused on identifying the relationship between miRNAs and disease. Computational methods have been suggested as an efficient way to identify potential disease-related miRNAs for further confirmation by biological experiments. In this paper, we first highlight three limitations commonly associated with previous computational methods. To resolve these limitations, we establish a disease similarity subnetwork and a miRNA similarity subnetwork by integrating multiple data sources, where the disease similarity is composed of disease semantic similarity and disease functional similarity, and the miRNA similarity is calculated using the miRNA-target gene and miRNA-lncRNA (long non-coding RNA) associations. A heterogeneous network is then constructed by connecting the disease similarity subnetwork and the miRNA similarity subnetwork using the known miRNA-disease associations. We extend random walk with restart to predict miRNA-disease associations in the heterogeneous network. Leave-one-out cross-validation achieved an average area under the curve (AUC) of 0.8049 across 341 diseases and 476 miRNAs. In five-fold cross-validation, our method achieved AUCs from 0.7970 to 0.9249 for 15 human diseases. Case studies further demonstrated the feasibility of our method for discovering potential miRNA-disease associations. An online service for prediction is freely available at http://ifmda.aliapp.com.
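The abstract names the technique but not its details, so the following is only a minimal sketch of a generic random walk with restart on a small combined miRNA-disease network. It is not the authors' extended formulation: the plain column normalization of the combined matrix, the restart probability of 0.7, the function names, and every weight in the toy network are assumptions made for illustration.

```python
def column_normalize(matrix):
    """Scale each column to sum to 1 so it can be read as transition probabilities."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol."""
    n = len(adjacency)
    transition = column_normalize(adjacency)
    # The walker starts (and restarts) uniformly on the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy heterogeneous network with invented weights (assumption, not data from the paper).
# Nodes 0-2 are miRNAs, nodes 3-4 are diseases. The top-left 3x3 block holds miRNA
# similarities, the bottom-right 2x2 block holds disease similarities, and the
# off-diagonal blocks hold known miRNA-disease associations.
adjacency = [
    [0.0, 0.8, 0.0, 1.0, 0.0],
    [0.8, 0.0, 0.3, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5, 0.0],
]

# Seed the walk on disease node 3 and rank the miRNA nodes by their steady-state score.
scores = random_walk_with_restart(adjacency, seeds=[3])
mirna_ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
```

In this sketch the miRNA directly associated with the seed disease ranks first, and the remaining miRNAs receive score only through similarity edges, which is the intuition behind ranking candidate miRNAs for a disease of interest.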
Citations
Journal ArticleDOI
TL;DR: A novel model of Inductive Matrix Completion for MiRNA‐Disease Association prediction (IMCMDA) to complete the missing miRNA‐disease association based on the known associations and the integrated miRNA similarity and disease similarity.
Abstract: Motivation: It has been shown that microRNAs (miRNAs) play key roles in a variety of biological processes associated with human diseases. Given the cost and complexity of biological experiments, computational methods for predicting potential associations between miRNAs and diseases would be an effective complement. Results: This paper presents a novel model of Inductive Matrix Completion for MiRNA-Disease Association prediction (IMCMDA). The integrated miRNA similarity and disease similarity are calculated based on miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity. The main idea is to complete the missing miRNA-disease associations based on the known associations and the integrated miRNA similarity and disease similarity. IMCMDA achieves an AUC of 0.8034 under leave-one-out cross-validation and improves on previous models. In addition, IMCMDA was applied to five common human diseases in three types of case studies. In the first type, 42, 44 and 45 of the top 50 predicted miRNAs for Colon Neoplasms, Kidney Neoplasms and Lymphoma, respectively, were confirmed by experimental reports. In the second type of case study, for new diseases without any known miRNAs, we chose Breast Neoplasms as the test example by hiding the association information between the miRNAs and Breast Neoplasms. As a result, 50 of the top 50 predicted Breast Neoplasms-related miRNAs were verified. In the third type of case study, IMCMDA was tested on HMDD V1.0 to assess its robustness; 49 of the top 50 predicted Esophageal Neoplasms-related miRNAs were verified. Availability and implementation: The code and dataset of IMCMDA are freely available at https://github.com/IMCMDAsourcecode/IMCMDA. Supplementary information: Supplementary data are available at Bioinformatics online.

362 citations
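The IMCMDA abstract mentions Gaussian interaction profile kernel similarity without giving a formula. As a hedged illustration, the sketch below uses the commonly cited form K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the interaction profiles. The function name, the `gamma_prime` default of 1.0, and the toy profiles are assumptions, not details taken from the IMCMDA paper.

```python
import math


def gaussian_interaction_profile_kernel(profiles, gamma_prime=1.0):
    """Pairwise Gaussian kernel over binary interaction profiles (rows of `profiles`)."""
    n = len(profiles)
    # Normalize the bandwidth by the average squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Invented toy data: each row is one disease, each column one miRNA (1 = known association).
disease_profiles = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
disease_kernel = gaussian_interaction_profile_kernel(disease_profiles)
```

Diseases with identical association profiles receive a similarity of 1, and the similarity decays as the profiles diverge; the same function applies to miRNA profiles by passing the transposed association matrix.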

Journal ArticleDOI
TL;DR: It is shown that Bayesian models are able to use prior information and model measurements with various distributions, and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.
Abstract: Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in strong need of integrative machine learning models that make better use of vast volumes of heterogeneous information for the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review of omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have the potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanisms of biological systems.

333 citations


Cites methods from "Inferring MicroRNA-Disease Associat..."

  • ...Random walk methods have been applied on either two-relational heterogeneous networks (such as gene–phenotype associations [88], drug–target interactions [89] and miRNA–disease associations [90, 91]) or multi-relational heterogeneous networks (for example, drug–disease associations [92]) to infer novel candidate relations....


Journal ArticleDOI
TL;DR: In this paper, the authors describe the principles of data integration and discuss current methods and available implementations, as well as current challenges in biomedical integrative methods and their perspective on the future development of the field.

212 citations

Journal ArticleDOI
TL;DR: The principles of data integration are described, current methods and available implementations are discussed, and examples of successful data integration in biology and medicine are provided.
Abstract: New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

149 citations

Journal ArticleDOI
TL;DR: A novel machine learning-based predictor called M6APred-EL is developed for identifying m6A sites and compared with other state-of-the-art methods on benchmark datasets; it is expected to be a practical and effective tool for investigating m6A functional mechanisms.
Abstract: N6-methyladenosine (m6A) modification is the most abundant RNA methylation modification and is involved in various biological processes, such as RNA splicing and degradation. Recent studies have demonstrated the feasibility of identifying m6A peaks using high-throughput sequencing techniques. However, such techniques cannot accurately identify specific methylated sites, which is important for a better understanding of m6A functions. In this study, we develop a novel machine learning-based predictor called M6APred-EL for the identification of m6A sites. To predict m6A sites accurately within genomic sequences, we trained an ensemble of three support vector machine classifiers that exploit position-specific and physical-chemical information derived from position-specific k-mer nucleotide propensity, physical-chemical properties, and ring-function-hydrogen-chemical properties. We examined and compared the performance of our predictor with other state-of-the-art methods on benchmark datasets. Comparative results showed that the proposed M6APred-EL identified m6A sites more accurately. Moreover, a user-friendly web server that implements the proposed M6APred-EL has been established and is currently available at http://server.malab.cn/M6APred-EL/. It is expected to be a practical and effective tool for the investigation of m6A functional mechanisms.

144 citations

References
Journal ArticleDOI
03 Dec 1993-Cell
TL;DR: Two small lin-4 transcripts of approximately 22 and 61 nt were identified in C. elegans and found to contain sequences complementary to a repeated sequence element in the 3' untranslated region (UTR) of lin-14 mRNA, suggesting that lin-4 regulates lin-14 translation via an antisense RNA-RNA interaction.

11,932 citations


"Inferring MicroRNA-Disease Associat..." refers background in this paper

  • ...in 1993 [1], many miRNAs have been discovered [2], [3], [4], [5], [6] and miRBase [7] has now accumulated more than 28,000 of them....


Journal ArticleDOI
09 Jun 2005-Nature
TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.
Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations


"Inferring MicroRNA-Disease Associat..." refers background in this paper

  • ...have indicated that many miRNAs are associated with the development of various cancers [17], [18], [19], [20], [21]....


Journal ArticleDOI
TL;DR: MiRNA-expression profiling of human tumours has identified signatures associated with diagnosis, staging, progression, prognosis and response to treatment and has been exploited to identify miRNA genes that might represent downstream targets of activated oncogenic pathways, or that target protein-coding genes involved in cancer.
Abstract: MicroRNA (miRNA) alterations are involved in the initiation and progression of human cancer. The causes of the widespread differential expression of miRNA genes in malignant compared with normal cells can be explained by the location of these genes in cancer-associated genomic regions, by epigenetic mechanisms and by alterations in the miRNA processing machinery. MiRNA-expression profiling of human tumours has identified signatures associated with diagnosis, staging, progression, prognosis and response to treatment. In addition, profiling has been exploited to identify miRNA genes that might represent downstream targets of activated oncogenic pathways, or that target protein-coding genes involved in cancer.

6,345 citations

Journal ArticleDOI
24 Feb 2000-Nature
TL;DR: It is shown that let-7 is a heterochronic switch gene that encodes a temporally regulated 21-nucleotide RNA that is complementary to elements in the 3′ untranslated regions of the heterochronic genes lin-14, lin-28, lin-41, lin-42 and daf-12, indicating that expression of these genes may be directly controlled by let-7.
Abstract: The C. elegans heterochronic gene pathway consists of a cascade of regulatory genes that are temporally controlled to specify the timing of developmental events [1]. Mutations in heterochronic genes cause temporal transformations in cell fates in which stage-specific events are omitted or reiterated [2]. Here we show that let-7 is a heterochronic switch gene. Loss of let-7 gene activity causes reiteration of larval cell fates during the adult stage, whereas increased let-7 gene dosage causes precocious expression of adult fates during larval stages. let-7 encodes a temporally regulated 21-nucleotide RNA that is complementary to elements in the 3′ untranslated regions of the heterochronic genes lin-14, lin-28, lin-41, lin-42 and daf-12, indicating that expression of these genes may be directly controlled by let-7. A reporter gene bearing the lin-41 3′ untranslated region is temporally regulated in a let-7-dependent manner. A second regulatory RNA, lin-4, negatively regulates lin-14 and lin-28 through RNA–RNA interactions with their 3′ untranslated regions [3, 4]. We propose that the sequential stage-specific expression of the lin-4 and let-7 regulatory RNAs triggers transitions in the complement of heterochronic regulatory proteins to coordinate developmental timing.

4,821 citations

Journal ArticleDOI
TL;DR: An update of the miRBase database is described, including the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries, and a high confidence subset of miRBase entries is provided, based on the pattern of mapped reads.
Abstract: We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information.

4,705 citations


"Inferring MicroRNA-Disease Associat..." refers background or methods in this paper

  • ...in 1993 [1], many miRNAs have been discovered [2], [3], [4], [5], [6] and miRBase [7] has now accumulated more than 28,000 of them....


  • ...The correct disease name and miRNA name are obtained from the National Library of Medicine and the miRBase database [7], respectively....
