Open Access Journal Article

Risk prediction of diabetes and pre-diabetes based on physical examination data.

Abstract
Diabetes is a metabolic disorder caused by insufficient or disordered insulin secretion. The progression from health to diabetes generally passes through three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis is the most effective way to prevent and control diabetes and its complications. In this work, we collected physical examination data from the Beijing Physical Examination Center from January 2006 to December 2017 and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L ≤ FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG ≥ 7.0 mmol/L). We obtained 1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Using eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression (LR) and a fully connected neural network (FCN) as classifiers, we constructed four models to distinguish NFG, IFG and T2DM. The comparison shows that XGBoost performs best, with a macro-averaged AUC of 0.7874 and a micro-averaged AUC of 0.8633. In addition, three binary classification models based on the XGBoost classifier were established to discriminate NFG from IFG, NFG from T2DM, and IFG from T2DM; on the independent dataset, their AUCs were 0.7808, 0.8687 and 0.7067, respectively. Finally, we analyzed feature importance and identified the risk factors associated with diabetes.
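
The grouping rule and the two averaged AUC figures in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names (fpg_group, binary_auc, macro_micro_auc) are hypothetical, and the sketch assumes that an FPG of exactly 7.0 mmol/L is assigned to T2DM, following the WHO (1999) criterion of FPG ≥ 7.0 mmol/L. It also assumes that the macro AUC is the unweighted mean of the one-vs-rest AUCs and that the micro AUC pools all (class indicator, predicted probability) pairs before computing a single AUC. The abstract does not state which definitions were used.

```python
import math
from typing import List, Sequence, Tuple

CLASSES = ("NFG", "IFG", "T2DM")


def fpg_group(fpg: float) -> str:
    """Map a fasting plasma glucose value (mmol/L) to a group label."""
    if math.isnan(fpg) or fpg < 0:
        raise ValueError("FPG must be a non-negative number")
    if fpg < 6.1:
        return "NFG"
    if fpg < 7.0:
        return "IFG"
    return "T2DM"  # assumption: 7.0 mmol/L itself counts as T2DM


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative.

    Ties count as half a win. This is O(n^2), fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(
    y_true: Sequence[str], probs: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Return (macro AUC, micro AUC) for one-vs-rest class probabilities.

    probs[i][k] is the predicted probability of CLASSES[k] for sample i.
    """
    per_class: List[float] = []
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for k, name in enumerate(CLASSES):
        labels = [1 if y == name else 0 for y in y_true]
        scores = [row[k] for row in probs]
        per_class.append(binary_auc(labels, scores))
        pooled_labels.extend(labels)
        pooled_scores.extend(scores)
    macro = sum(per_class) / len(per_class)
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro


if __name__ == "__main__":
    # Toy values, not data from the study.
    fpg_values = [5.2, 6.4, 7.8, 5.9]
    y = [fpg_group(v) for v in fpg_values]
    p = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]
    print(y, macro_micro_auc(y, p))
```

Under these definitions every class contributes equally to the macro AUC, while the pooled micro AUC is weighted toward the most frequent class. With NFG making up most of the samples, that weighting is one plausible reason the reported micro AUC exceeds the macro AUC.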

