Journal ArticleDOI

Gene expression and protein-protein interaction data for identification of colon cancer related genes using f-information measures

01 Sep 2016-Natural Computing (Springer Netherlands)-Vol. 15, Iss: 3, pp 449-463
TL;DR: Results indicate that the integrated method presented is quite promising and may become a useful tool for identifying disease genes.
Abstract: One of the most important and challenging problems in functional genomics is how to select disease genes. In this regard, the paper presents a new computational method to identify disease genes. It judiciously integrates the information of gene expression profiles and shortest path analysis of protein-protein interaction networks. While the f-information based maximum relevance-maximum significance framework is used to select differentially expressed genes as disease genes using gene expression profiles, the functional protein association network is used to study the mechanism of diseases. An important finding is that some f-information measures are shown to be effective for selecting relevant and significant genes from microarray data. An extensive experimental study on colorectal cancer establishes that the genes identified by the integrated method include more colorectal cancer genes than the genes identified from the gene expression profiles alone, irrespective of the gene selection algorithm. Also, these genes have greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. Enrichment analysis reveals that the obtained genes are associated with some of the important KEGG pathways. All these results indicate that the integrated method is quite promising and may become a useful tool for identifying disease genes.
Citations
Journal ArticleDOI
TL;DR: A new gene selection algorithm is presented, termed as RelSim, to identify disease genes, that integrates judiciously the information of gene expression profiles and protein-protein interaction networks to compute the functional similarity among genes.

26 citations


Cites background or methods from "Gene expression and protein-prote..."

  • ...Contents lists available at ScienceDirect. Information Sciences, journal homepage: www.elsevier.com/locate/ins. RelSim: An integrated method to identify disease genes using gene expression profiles and PPIN based similarity measure. Pradipta Maji (a,*), Ekta Shah (a), Sushmita Paul (b). (a) Biomedical Imaging and Bioinformatics Lab, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India; (b) Department of Biology, Indian Institute of Technology, Jodhpur, India. Article info. Article history: Received 31 October 2015; Revised 16 June 2016; Accepted 23 June 2016; Available online 23 June 2016. Keywords: Disease gene identification; Microarray data analysis; Feature selection; Protein-protein interaction networks. Abstract: One of the important problems in functional genomics is how to select the disease genes....

    [...]

  • ...In this regard, data integration methods have become popular to identify pleiotropic genes involved in the physiological cellular processes of many diseases [15,22,24,35,45,50]....

    [...]

  • ...The algorithms compared are MR + PPIN [28], mRMR + PPIN [22], MRMS + PPIN [35], CLAIM [5], and GenePEN [44]....

    [...]

  • ...Finally, the performance of the proposed RelSim algorithm is compared with that of MR + PPIN [28], mRMR + PPIN [22], MRMS + PPIN [35], CLAIM [5], and GenePEN [44]....

    [...]

  • ...Out of total 25 cases, the proposed method achieves best performance in 12 cases, while MRMS + PPIN [35], CLAIM [5], and GenePEN [44] attain it in 2, 5, and 6 cases, respectively....

    [...]

Journal ArticleDOI
TL;DR: Experimental results on three real-life gene expression datasets show that the addition of new objective capturing protein-protein interaction information aids in clustering the genes as compared to the existing techniques.

21 citations


Cites methods from "Gene expression and protein-prote..."

  • ...A new computational method that judiciously integrates the information of gene expression profiles and shortest path analysis of protein interaction networks to identify disease genes is described in (Paul and Maji [46])....

    [...]

  • ...Recently Maji et al. [51] computed the functional similarity between genes using the integrated information of expression profiles and PPI network....

    [...]

Journal ArticleDOI
TL;DR: A novel supervised regularized canonical correlation analysis, termed as CuRSaR, to extract relevant and significant features from multimodal high dimensional omics datasets by maximizing the relevance of extracted features with respect to sample categories and significance among them.
Abstract: Objective: This paper presents a novel supervised regularized canonical correlation analysis, termed as CuRSaR, to extract relevant and significant features from multimodal high dimensional omics datasets. Methods: The proposed method extracts a new set of features from two multidimensional datasets by maximizing the relevance of extracted features with respect to sample categories and significance among them. It integrates judiciously the merits of regularized canonical correlation analysis (RCCA) and rough hypercuboid approach. An analytical formulation, based on spectral decomposition, is introduced to establish the relation between canonical correlation analysis (CCA) and RCCA. The concept of hypercuboid equivalence partition matrix of rough hypercuboid is used to compute both relevance and significance of a feature. Significance: The analytical formulation makes the computational complexity of the proposed algorithm significantly lower than existing methods. The equivalence partition matrix offers an efficient way to find optimum regularization parameters employed in CCA. Results: The superiority of the proposed algorithm over other existing methods, in terms of computational complexity and classification accuracy, is established extensively on real life data.

9 citations

Journal ArticleDOI
TL;DR: An enhanced associative classification algorithm that integrates microarray data with biological information from gene ontology, KEGG pathways, and protein-protein interactions to generate informative class associative rules is introduced.
Abstract: The discovery of reliable cancer biomarkers is crucial for accurate early detection and clinical diagnosis. One of the strategies is by identifying expression-based cancer biomarkers through integrative microarray data analysis. Microarray is a powerful high-throughput technology, which allows a genome-wide analysis of human genes with various biological information. Nevertheless, more studies are needed on improving the predictability of the discovered gene biomarkers, as well as their reproducibility and interpretability, to qualify them for clinical use. This paper proposes an informative top-k class associative rule (iTCAR) method in an integrative framework for identifying candidate genes of specific cancers. iTCAR introduces an enhanced associative classification algorithm that integrates microarray data with biological information from gene ontology, KEGG pathways, and protein-protein interactions to generate informative class associative rules. A new interestingness measurement is used to rank and select class associative rules for building accurate classifiers. The experimental results show that iTCAR has excellent predictability by achieving the average classification accuracy above 90% and the average area under the curve above 0.8. Besides, iTCAR has significant reproducibility and interpretability through functional enrichment analysis and retrieval of meaningful cancer terms. These promising results suggest the proposed method has great potential in identifying candidate genes, which can be further investigated as biomarkers for cancer diseases.

9 citations

Book ChapterDOI
TL;DR: An existing robust mutual information-based Maximum-Relevance Maximum-Significance algorithm has been used for identification of miRNA-mRNA regulatory modules in gynecologic cancer and the effectiveness of the proposed approach is compared with the existing methods.
Abstract: Dysregulation of miRNA-mRNA regulatory networks is a very common phenomenon in diseases, including cancer. Altered expression of biomarkers leads to these gynecologic cancers. Therefore, understanding the underlying biological mechanisms may help in developing a robust diagnostic as well as a prognostic tool. It has been demonstrated in various studies that the pathways associated with gynecologic cancer have dysregulated miRNA as well as mRNA expression. Identification of miRNA-mRNA regulatory modules may help in understanding the mechanism of altered gynecologic cancer pathways. In this regard, an existing robust mutual information-based Maximum-Relevance Maximum-Significance algorithm has been used for identification of miRNA-mRNA regulatory modules in gynecologic cancer. A set of miRNA-mRNA modules is identified first, then their association with gynecologic cancer is studied exhaustively. The effectiveness of the proposed approach is compared with the existing methods. The proposed approach is found to generate more robust integrated networks of miRNA-mRNA in gynecologic cancer.

8 citations

References
Journal ArticleDOI
TL;DR: By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Abstract: DAVID bioinformatics resources consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.

31,015 citations


"Gene expression and protein-prote..." refers methods in this paper

  • ...4.6 KEGG enrichment analysis. The gene set S_GE+PPI, consisting of twenty candidate genes and seventy-seven shortest path genes, is further analyzed using the functional annotation tool of DAVID (Huang et al. 2009a)....

    [...]

  • ...These data sets have been widely used in different studies to understand the function of disease genes (Cai et al. 2010; Chen et al. 2010; Huang et al. 2010a, 2011, 2009b, 2010b)....

    [...]

  • ...Moreover, the gene set is found to be highly associated with colorectal cancer disease according to the OMIM disease database as analyzed by the functional annotation tool of DAVID....

    [...]

Journal ArticleDOI
TL;DR: A tree is a graph with one and only one path between every two nodes, where at least one path exists between any two nodes and the length of each branch is given.
Abstract: We consider n points (nodes), some or all pairs of which are connected by a branch; the length of each branch is given. We restrict ourselves to the case where at least one path exists between any two nodes. We now consider two problems. Problem 1. Construct the tree of minimum total length between the n nodes. (A tree is a graph with one and only one path between every two nodes.) In the course of the construction that we present here, the branches are subdivided into three sets: I. the branches definitely assigned to the tree under construction (they will form a subtree); II. the branches from which the next branch to be added to set I, will be selected; III. the remaining branches (rejected or not yet considered). The nodes are subdivided into two sets: A. the nodes connected by the branches of set I, B. the remaining nodes (one and only one branch of set II will lead to each of these nodes). We start the construction by choosing an arbitrary node as the only member of set A, and by placing all branches that end in this node in set II. To start with, set I is empty. From then onwards we perform the following two steps repeatedly. Step 1. The shortest branch of set II is removed from this set and added to

22,704 citations


"Gene expression and protein-prote..." refers methods in this paper

  • ...Dijkstra's algorithm (Dijkstra 1959) is used to construct the shortest paths between a pair of genes selected by the f-MRMS method....

    [...]

  • ...In order to identify the shortest path from each of the selected differentially expressed genes of SGE to the remaining genes of the set SGE in the graph, Dijkstra's algorithm (Dijkstra 1959) is used....

    [...]
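The excerpts above describe running Dijkstra's algorithm between pairs of selected genes in a protein-protein interaction network and collecting the genes that lie on those shortest paths. A minimal Python sketch of that idea follows. The gene names (`g1` to `g4`), the edge weights, and the helper names `shortest_path` and `shortest_path_genes` are hypothetical and chosen only for illustration; how the paper derives edge weights from the interaction network is not stated in this listing.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (distance, path); (inf, []) if the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def shortest_path_genes(graph, selected):
    """Collect intermediate genes on shortest paths between all selected pairs."""
    extra = set()
    for a, b in combinations(selected, 2):
        _, path = shortest_path(graph, a, b)
        extra.update(path[1:-1])  # drop the two endpoints
    return extra - set(selected)


# Hypothetical undirected toy network (each edge is listed in both directions).
toy_ppi = {
    "g1": {"g2": 1.0, "g3": 3.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g1": 3.0, "g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}

distance, path = shortest_path(toy_ppi, "g1", "g4")
added = shortest_path_genes(toy_ppi, ["g1", "g4"])
```

In this toy network the direct `g1`-`g3` edge is costlier than the detour through `g2`, so both `g2` and `g3` end up as shortest path genes for the pair (`g1`, `g4`).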

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations

Journal ArticleDOI
TL;DR: In this paper, it was shown that a simple FDR controlling procedure for independent test statistics can also control the false discovery rate when test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses.
Abstract: Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate $t$. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which a procedure with proven FDR control can be offered is greatly increased.

9,335 citations


"Gene expression and protein-prote..." refers methods in this paper

  • ...05) with the Benjamini multiple testing correction method (Benjamini and Yekutieli 2001)....

    [...]

  • ...The enrichment p value was corrected to control the family-wide false discovery rate under a certain rate (e.g., <0.05) with the Benjamini multiple testing correction method (Benjamini and Yekutieli 2001)....

    [...]
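The abstract above mentions a conservative modification of the step-up procedure that controls the false discovery rate under arbitrary dependency. A minimal Python sketch of that modification is given below, assuming the usual formulation in which the threshold for the p value of rank k is k·alpha / (m·c(m)) with c(m) = 1 + 1/2 + ... + 1/m. The function name `benjamini_yekutieli` and the example p values are hypothetical; this listing does not say which variant of the correction the citing paper's tool actually applied.

```python
def benjamini_yekutieli(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up procedure with the harmonic-sum penalty c(m), which keeps the
    false discovery rate at or below alpha under arbitrary dependency.
    """
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic number H_m
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k whose p value is under its threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * c_m):
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical enrichment p values for four pathways.
example = [0.001, 0.2, 0.011, 0.04]
decisions = benjamini_yekutieli(example, alpha=0.05)
```

Setting `c_m` to 1 in this sketch would give the original procedure for independent or positively dependent test statistics; the harmonic-sum factor is what makes the modified version more conservative.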