Prediction and Validation of Disease Genes Using HeteSim Scores

doi:10.1109/TCBB.2016.2520947

Journal Article•DOI•

Xiangxiang Zeng, Yuanlu Liao, Yuansheng Liu¹, Quan Zou²•Institutions (2)

Dalian University of Technology¹, Tianjin University²

01 May 2017-IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE)-Vol. 14, Iss: 3, pp 687-695

TL;DR: A novel relevance measure, called HeteSim, is used to prioritize candidate disease genes, and it is found that HSSVM avoids a disadvantage of the existing machine learning based methods, which always predict similar genes for different diseases.

Abstract: Deciphering gene-disease associations is an important goal in biomedical research. In this paper, we use a novel relevance measure, called HeteSim, to prioritize candidate disease genes. Two methods are presented, both based on heterogeneous networks constructed from protein-protein interactions, gene-phenotype associations, and phenotype-phenotype similarity. In HeteSim_MultiPath (HSMP), HeteSim scores of different paths are combined with a constant that dampens the contributions of longer paths. In HeteSim_SVM (HSSVM), HeteSim scores are combined with a machine learning method. The 3-fold experiments show that our non-machine learning method HSMP performs better than the existing non-machine learning methods, while our machine learning method HSSVM achieves accuracy similar to that of the best existing machine learning method, CATAPULT. From the analysis of the top 10 predicted genes for different diseases, we found that HSSVM avoids a disadvantage of the existing machine learning based methods, which always predict similar genes for different diseases. The data sets and Matlab code for the two methods are freely available for download at http://lab.malab.cn/data/HeteSim/index.jsp .
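
The multi-path combination described for HSMP can be illustrated with a minimal sketch. The abstract does not give the exact weighting, so the form used below (each path's score multiplied by `beta ** path_length`) is an assumption, as are the function name `combine_path_scores`, the default `beta`, and the example scores. The HeteSim computation itself is not reproduced here.

```python
def combine_path_scores(path_scores, beta=0.5):
    """Combine per-path relevance scores, damping longer paths.

    path_scores: list of (path_length, score) pairs for one gene-phenotype pair.
    beta: damping constant in (0, 1). The weighting beta ** path_length is an
          assumed form, not taken from the paper.
    """
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    return sum((beta ** length) * score for length, score in path_scores)


# Hypothetical scores for one candidate gene along paths of length 1, 2 and 3.
example = [(1, 0.8), (2, 0.4), (3, 0.2)]
combined = combine_path_scores(example, beta=0.5)
```

With these hypothetical inputs the length-1 path contributes 0.4, the length-2 path 0.1 and the length-3 path 0.025, so longer paths still count but with rapidly shrinking weight.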

Citations

Journal Article•DOI•

MicroRNAs and complex diseases: from experimental results to computational models.

Xing Chen¹, Di Xie², Qi Zhao², Zhu-Hong You³•Institutions (3)

China University of Mining and Technology¹, Liaoning University², Chinese Academy of Sciences³

20 May 2021-Briefings in Bioinformatics

TL;DR: Twenty state-of-the-art computational models of predicting miRNA-disease associations from different perspectives are reviewed, including five feasible and important research schemas, and future directions for further development of computational models are summarized.

Abstract: Circular RNAs (circRNAs) are a class of single-stranded, covalently closed RNA molecules with a variety of biological functions. Studies have shown that circRNAs are involved in a variety of biological processes and play an important role in the development of various complex diseases, so the identification of circRNA-disease associations would contribute to the diagnosis and treatment of diseases. In this review, we summarize the discovery, classifications and functions of circRNAs and introduce four important diseases associated with circRNAs. Then, we list some significant and publicly accessible databases containing comprehensive annotation resources of circRNAs and experimentally validated circRNA-disease associations. Next, we introduce some state-of-the-art computational models for predicting novel circRNA-disease associations and divide them into two categories, namely network algorithm-based and machine learning-based models. Subsequently, several evaluation methods of prediction performance of these computational models are summarized. Finally, we analyze the advantages and disadvantages of different types of computational models and provide some suggestions to promote the development of circRNA-disease association identification from the perspective of the construction of new computational models and the accumulation of circRNA-related data.

473 citations

Journal Article•DOI•

Machine Learning for Drug-Target Interaction Prediction

Ruolan Chen¹, Xiangrong Liu¹, Shuting Jin¹, Jiawei Lin¹, Juan Liu¹•Institutions (1)

Xiamen University¹

31 Aug 2018-Molecules

TL;DR: A hierarchical classification scheme is adopted and several representative methods of each category of drug-target interaction prediction are introduced, especially the recent state-of-the-art methods.

Abstract: Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.

162 citations

Cites methods from "Prediction and Validation of Diseas..."

...Computational methods have achieved favorable performance in many related bioinformatics fields, such as disease-related miRNA prediction [7–9], disease genes prediction [10], protein-protein interaction prediction [11] and protein subcellular location prediction [12]....

Journal Article•DOI•

Gene Expression Value Prediction Based on XGBoost Algorithm.

Wei Li¹, Yanbin Yin², Xiongwen Quan¹, Han Zhang¹•Institutions (2)

Nankai University¹, University of Nebraska–Lincoln²

12 Nov 2019-Frontiers in Genetics

TL;DR: An algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and has stronger interpretability, is proposed; it outperforms existing models and will be a significant contribution to the toolbox for gene expression value prediction.

Abstract: Gene expression profiling has been widely used to characterize cell status to reflect the health of the body, to diagnose genetic diseases, etc. In recent years, although the cost of genome-wide expression profiling is gradually decreasing, the cost of collecting expression profiles for thousands of genes is still very high. Considering gene expressions are usually highly correlated in humans, the expression values of the remaining target genes can be predicted by analyzing the values of 943 landmark genes. Hence, we designed an algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and has stronger interpretability. We tested the performance of XGBoost model on the GEO dataset and RNA-seq dataset and compared the result with other existing models. Experiments showed that the XGBoost model achieved a significantly lower overall error than the existing D-GEX algorithm, linear regression, and KNN methods. In conclusion, the XGBoost algorithm outperforms existing models and will be a significant contribution to the toolbox for gene expression value prediction.

127 citations

Cites background from "Prediction and Validation of Diseas..."

...Gene expression profiling is a vital biological tool commonly used to capture the response of cells to disease or drug treatments (Celis et al., 2000; Mclachlan et al., 2005; Wang et al., 2006; Mallick et al., 2009; Zeng et al., 2016)....

Journal Article•DOI•

A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization

Wuritu Yang¹, Xiao-Juan Zhu¹, Jian Huang¹, Hui Ding¹, Hao Lin¹•Institutions (1)

University of Electronic Science and Technology of China¹

07 Mar 2019-Current Bioinformatics

TL;DR: The benchmark dataset, feature extraction, machine learning method and published results were summarized and the perspective of machine learning methods in protein sub-Golgi apparatus localization prediction was pointed out.

Abstract: The location of proteins in a cell can provide important clues to their functions in various biological processes. Thus, the application of machine learning methods in the prediction of protein subcellular localization has become a hotspot in bioinformatics. As one of the key organelles, the Golgi apparatus is in charge of protein storage, packaging, and distribution. The identification of protein location in the Golgi apparatus will provide in-depth insights into their functions. Thus, machine learning-based methods of predicting protein location in the Golgi apparatus have been extensively explored. The development of protein sub-Golgi apparatus localization prediction should be reviewed to provide a complete background for the field. The benchmark dataset, feature extraction, machine learning method and published results were summarized. We briefly introduced the recent progress in protein sub-Golgi apparatus localization prediction using machine learning methods and discussed their advantages and disadvantages. We pointed out the perspective of machine learning methods in protein sub-Golgi localization prediction.

113 citations

Journal Article•DOI•

PredT4SE-Stack: Prediction of Bacterial Type IV Secreted Effectors From Protein Sequences Using a Stacked Ensemble Method.

Yi Xiong¹, Qiankun Wang¹, Junchen Yang¹, Xiaolei Zhu², Dong-Qing Wei¹•Institutions (2)

Shanghai Jiao Tong University¹, Anhui Agricultural University²

26 Oct 2018-Frontiers in Microbiology

TL;DR: A stacked ensemble model PredT4SE-Stack was developed to predict T4SEs, which utilized an ensemble of base-classifiers implemented by various machine learning algorithms, such as support vector machine, gradient boosting machine, and extremely randomized trees, to generate outputs for the meta-classifier in the classification system.

Abstract: Gram-negative bacteria use various secretion systems to deliver their secreted effectors. Among them, type IV secretion system exists widely in a variety of bacterial species, and secretes type IV secreted effectors (T4SEs), which play vital roles in host-pathogen interactions. However, experimental approaches to identify T4SEs are time- and resource-consuming. In the present study, we aim to develop an in silico stacked ensemble method to predict whether a protein is an effector of type IV secretion system or not based on its sequence information. The protein sequences were encoded by the feature of position specific scoring matrix (PSSM)-composition by summing rows that correspond to the same amino acid residues in PSSM profiles. Based on the PSSM-composition features, we develop a stacked ensemble model PredT4SE-Stack to predict T4SEs, which utilized an ensemble of base-classifiers implemented by various machine learning algorithms, such as support vector machine, gradient boosting machine, and extremely randomized trees, to generate outputs for the meta-classifier in the classification system. Our results demonstrated that the framework of PredT4SE-Stack was a feasible and effective way to accurately identify T4SEs based on protein sequence information. The datasets and source code of PredT4SE-Stack are freely available at http://xbioinfo.sjtu.edu.cn/PredT4SE_Stack/index.php.
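
The PSSM-composition encoding mentioned above (summing the PSSM rows that belong to the same amino acid residue) might look like the following sketch. The amino acid column order, the function name `pssm_composition`, the handling of non-standard residues, and the absence of any normalization are assumptions for illustration; the abstract does not specify them.

```python
# Assumed ordering of the 20 standard amino acids (PSSM column order).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_composition(sequence, pssm):
    """Return a 400-dimensional feature vector from a PSSM profile.

    sequence: protein sequence, one letter per residue.
    pssm: list of rows, one per residue, each row holding 20 scores.
    Rows whose residue is the same amino acid are summed, giving a
    20 x 20 matrix that is flattened row by row.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    totals = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must hold 20 scores")
        if residue not in index:
            continue  # assumption: non-standard residues are skipped
        target = totals[index[residue]]
        for j, value in enumerate(row):
            target[j] += value
    return [value for row in totals for value in row]


# Toy profile: two alanines and one arginine with constant rows.
features = pssm_composition("AAR", [[1] * 20, [2] * 20, [5] * 20])
```

In this toy case the first 20 features hold the summed alanine rows and the next 20 hold the single arginine row; all remaining features stay at zero. A fixed-length vector of this kind is what allows sequences of different lengths to be passed to the same classifiers.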

97 citations

References

Journal Issue•DOI•

The link-prediction problem for social networks

David Liben-Nowell¹, Jon Kleinberg²•Institutions (2)

Carleton College¹, Cornell University²

01 May 2007-Journal of the Association for Information Science and Technology

TL;DR: Experiments on large coauthorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.

Abstract: Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the “proximity” of nodes in a network. Experiments on large coauthorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures. © 2007 Wiley Periodicals, Inc.

4,181 citations

"Prediction and Validation of Diseas..." refers methods in this paper

...[14] introduce the Katz method, which has been successfully applied for link prediction in social networks [15], into the disease genes prediction problem....

Journal Article•DOI•

Human Protein Reference Database—2009 update

T. S. Keshava Prasad, Renu Goel, Kumaran Kandasamy¹, Kumaran Kandasamy², Shivakumar Keerthikumar², Sameer Kumar², Suresh Mathivanan², Deepthi Telikicherla², Rajesh Raju², Beema Shafreen, Abhilash K. Venugopal², Lavanya Balakrishnan, Arivusudar Marimuthu¹, Sutopa Banerjee, Devi S. Somanathan, Aimy Sebastian, Sandhya G. Rani, Somak Ray, C. J. Harrys Kishore, Sashi Kanth, Mukhtar Ahmed, Manoj Kumar Kashyap², Manoj Kumar Kashyap¹, Riaz Mohmood², Y. L. Ramachandra², Venkatarangaiah Krishna², B. Abdul Rahiman², Subburaman Mohan, Prathibha Ranganathan, Subhashri Ramabadran, Raghothama Chaerkady¹, Akhilesh Pandey¹•Institutions (2)

Johns Hopkins University¹, Kuvempu University²

01 Jan 2009-Nucleic Acids Research

TL;DR: A number of new features in HPRD are added, including PhosphoMotif Finder, which allows users to find the presence of over 320 experimentally verified phosphorylation motifs in proteins of interest, and a protein distributed annotation system—Human Proteinpedia.

Abstract: Human Protein Reference Database (HPRD--http://www.hprd.org/), initially described in 2003, is a database of curated proteomic information pertaining to human proteins. We have recently added a number of new features in HPRD. These include PhosphoMotif Finder, which allows users to find the presence of over 320 experimentally verified phosphorylation motifs in proteins of interest. Another new feature is a protein distributed annotation system--Human Proteinpedia (http://www.humanproteinpedia.org/)--through which laboratories can submit their data, which is mapped onto protein entries in HPRD. Over 75 laboratories involved in proteomics research have already participated in this effort by submitting data for over 15,000 human proteins. The submitted data includes mass spectrometry and protein microarray-derived data, among other data types. Finally, HPRD is also linked to a compendium of human signaling pathways developed by our group, NetPath (http://www.netpath.org/), which currently contains annotations for several cancer and immune signaling pathways. Since the last update, more than 5500 new protein sequences have been added, making HPRD a comprehensive resource for studying the human proteome.

3,081 citations

"Prediction and Validation of Diseas..." refers methods in this paper

...Two different networks HumanNet [18] and HPRD network [19] are used....

Journal Article•DOI•

Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders

Ada Hamosh, Alan F. Scott¹, Joanna S. Amberger¹, Carol A. Bocchini¹, Victor A. McKusick¹•Institutions (1)

Johns Hopkins University School of Medicine¹

01 Jan 2002-Nucleic Acids Research

TL;DR: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support research and education in human genomics and the practice of clinical genetics.

Abstract: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support human genetics research and education and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/) is now distributed electronically by the National Center for Biotechnology Information, where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.

2,715 citations

Journal Article•DOI•

Risks of cancer in BRCA1-mutation carriers

D Ford, Douglas F. Easton, D. T. Bishop, Steven A. Narod¹, David E. Goldgar²•Institutions (2)

Montreal General Hospital¹, University of Utah²

19 Mar 1994-The Lancet

TL;DR: The authors estimated the risks of breast and ovarian cancer from the occurrence of second cancers in individuals with breast cancer, and examined the risk of other cancers in BRCA1 carriers.

1,826 citations

Journal Article•DOI•

Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease.

David Botstein¹, Neil Risch², Neil Risch¹•Institutions (2)

Stanford University¹, Kaiser Permanente²

01 Mar 2003-Nature Genetics

TL;DR: The distribution of types of mutation in mendelian disease genes argues for serious consideration of the early application of a genomic-scale sequence-based approach to association studies and against complete reliance on a positional cloning approach based on a map of anonymous single nucleotide polymorphism haplotypes.

Abstract: The past two decades have witnessed an explosion in the identification, largely by positional cloning, of genes associated with mendelian diseases. The roughly 1,200 genes that have been characterized have clarified our understanding of the molecular basis of human genetic disease. The principles derived from these successes should be applied now to strategies aimed at finding the considerably more elusive genes that underlie complex disease phenotypes. The distribution of types of mutation in mendelian disease genes argues for serious consideration of the early application of a genomic-scale sequence-based approach to association studies and against complete reliance on a positional cloning approach based on a map of anonymous single nucleotide polymorphism haplotypes.

1,489 citations