Proceedings Article

Type 2 Diabetes Gene Identification Using an Integrated Approach from Single-Cell RNA Sequencing Data

01 Dec 2018, pp. 2152-2158
TL;DR: The integrated approach incorporates protein-protein interaction network data and gene expression data to select a set of genes that are both highly related to diabetes and functionally related among themselves; its effectiveness is demonstrated against other existing methods.
Abstract: The increase in the number of people diagnosed with diabetes makes this disease a new health threat in the 21st century. Understanding the etiology of diabetes and finding a way to prevent it, especially type 2 diabetes mellitus, is an urgent challenge for the health care community and our society. Pancreatic islet cells are responsible for maintaining a normal blood glucose level, and any disturbance in them can lead to the onset of diabetes. Human pancreatic islet cells comprise $\alpha$, $\beta$, $\delta$, and PP cells. Understanding the contribution of each cell type to type 2 diabetes mellitus through gene expression is very important for the development of diagnostic tools. Therefore, gene expression data of $\alpha$, $\beta$, $\delta$, and PP cells can be used. Single-cell RNA sequencing technology has been found useful for generating expression data for individual cells. Gene expression data are usually used to find genes that are related to a clinical outcome. However, a biological process involves a set of genes that share functional similarity, so analysing only a single type of data may not yield significant type 2 diabetes mellitus genes. In this regard, an integrated approach has been used to analyse single-cell RNA sequencing data of human pancreatic islet cells. The integrated approach incorporates protein-protein interaction network data and gene expression data to select a set of genes that are both highly related to diabetes and functionally related among themselves. The effectiveness of the approach is demonstrated against other existing methods.
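The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of combining expression-based relevance with protein-protein interaction (PPI) connectivity. It is not the authors' RelSim method. The scoring rule, the `weight` parameter, the greedy strategy, and the gene names `G1` to `G4` with their values are all hypothetical and chosen for illustration.

```python
from statistics import mean


def relevance(expr, labels):
    """Absolute difference between mean expression in the two groups (assumed score)."""
    case = [x for x, y in zip(expr, labels) if y == 1]
    ctrl = [x for x, y in zip(expr, labels) if y == 0]
    return abs(mean(case) - mean(ctrl))


def select_genes(expression, labels, ppi, k, weight=0.5):
    """Greedily pick k genes, trading off relevance against PPI links.

    expression: gene -> list of expression values, one per cell
    labels:     1 for disease cells, 0 for control cells
    ppi:        frozenset({gene_a, gene_b}) -> interaction confidence in [0, 1]
    weight:     share of the score given to relevance (hypothetical knob)
    """
    rel = {g: relevance(v, labels) for g, v in expression.items()}
    max_rel = max(rel.values()) or 1.0  # avoid dividing by zero
    rel = {g: r / max_rel for g, r in rel.items()}

    # Start from the single most relevant gene.
    selected = [max(rel, key=rel.get)]
    while len(selected) < min(k, len(rel)):
        def score(g):
            # Mean PPI confidence between the candidate and the genes chosen so far.
            link = mean(ppi.get(frozenset((g, s)), 0.0) for s in selected)
            return weight * rel[g] + (1 - weight) * link

        candidates = [g for g in rel if g not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: two disease cells followed by two control cells.
labels = [1, 1, 0, 0]
expression = {
    "G1": [5.0, 6.0, 1.0, 2.0],
    "G2": [3.0, 3.0, 1.0, 1.0],
    "G3": [4.0, 4.0, 1.5, 1.5],
    "G4": [1.0, 1.0, 1.0, 1.0],
}
ppi = {frozenset(("G1", "G2")): 0.9, frozenset(("G1", "G3")): 0.1}

print(select_genes(expression, labels, ppi, k=2))              # ['G1', 'G2']
print(select_genes(expression, labels, ppi, k=2, weight=1.0))  # ['G1', 'G3']
```

With expression data alone (`weight=1.0`) the second pick is `G3`, the next most differentially expressed gene. Once the interaction network is taken into account, `G2` is preferred because it is strongly linked to `G1`, which illustrates why a single data type may miss functionally related genes.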
References
Journal Article
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal Article
TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

32,980 citations


"Type 2 Diabetes Gene Identification..." refers to methods in this paper

  • ...For pathway enrichment ClueGO app [25] of Cytoscape [26] has been used....


Journal Article
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations


"Type 2 Diabetes Gene Identification..." refers to methods in this paper

  • ...Table I represents the number of differentially expressed genes in T2DM islet cells selected by the LIMMA R package [23], DESeq2 [18], and RelSim....


Posted Content
17 Nov 2014-bioRxiv
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations


"Type 2 Diabetes Gene Identification..." refers to methods in this paper

  • ...The DESeq2 algorithm could generate lesser number of genes compared to RelSim except for the δ cell type....


  • ...On the other hand, DESeq2 and LIMMA are not associated with any diabetes related diseases....


  • ...Both the RelSim and DESeq2 could identify signature genes of pancreatic islet cells, that is, α (GCG), β (INS), δ (SST), and PP(PPY)....


  • ...B. Islet Cell Genes Affected by T2D Table I represents the number of differentially expressed genes in T2DM islet cells selected by the LIMMA R package [23], DESeq2 [18], and RelSim....


  • ...The DESeq2 is able to generate gene set that are associated with few diabetes related diseases....


Journal Article
TL;DR: Hierarchical and self-consistent orthology annotations are introduced for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution in the STRING database.
Abstract: The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

8,224 citations


"Type 2 Diabetes Gene Identification..." refers to methods in this paper

  • ...The STRING database [22] has been used as PPIN data....
