
Showing papers on "Brier score published in 2022"


Journal Article
TL;DR: In this paper, the authors discuss the development of prognostic machine learning (ML) models for COVID-19 progression, focusing on the task of predicting ICU admission within (any of) the next 5 days.
Abstract: In this article, we discuss the development of prognostic machine learning (ML) models for COVID-19 progression, by focusing on the task of predicting ICU admission within (any of) the next 5 days. On the basis of 6,625 complete blood count (CBC) tests from 1,004 patients, of which 18% were admitted to intensive care unit (ICU), we created four ML models, by adopting a robust development procedure which was designed to minimize risks of bias and over-fitting, according to reference guidelines. The best model, a support vector machine, had an AUC of .85, a Brier score of .14, and a standardized net benefit of .69: these scores indicate that the model performed well over a variety of prediction criteria. We also conducted an interpretability study to back up our findings, showing that the data on which the developed model is based is consistent with the current medical literature. This also demonstrates that CBC data and ML methods can be used to predict COVID-19 patients' ICU admission at a relatively low cost: in particular, since CBC data can be quickly obtained by means of routine blood exams, our models could be used in resource-constrained settings and provide health practitioners with rapid and reliable indications.

15 citations
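The Brier score reported above is the mean squared difference between the predicted probability and the observed binary outcome. Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25. The following is a minimal sketch in plain Python. It assumes outcomes are coded 0/1 and that predictions are probabilities for the positive class. The function name `brier_score` is illustrative and does not come from the paper.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example: three patients, one of whom was admitted to the ICU (y = 1).
print(brier_score([1, 0, 0], [0.8, 0.1, 0.3]))
```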


Journal Article
TL;DR: In this article, the authors show that the Parimutuel Gambling score, which has been proposed and in some cases applied as a metric for comparing probabilistic seismicity forecasts, is in general "improper", and that even in the special case where it is proper, it can still be used improperly.
Abstract: SUMMARY Operational earthquake forecasting for risk management and communication during seismic sequences depends on our ability to select an optimal forecasting model. To do this, we need to compare the performance of competing models in prospective experiments, and to rank their performance according to the outcome using a fair, reproducible and reliable method, usually in a low-probability environment. The Collaboratory for the Study of Earthquake Predictability conducts prospective earthquake forecasting experiments around the globe. In this framework, it is crucial that the metrics used to rank the competing forecasts are ‘proper’, meaning that, on average, they prefer the data-generating model. We prove that the Parimutuel Gambling score, proposed, and in some cases applied, as a metric for comparing probabilistic seismicity forecasts, is in general ‘improper’. In the special case where it is proper, we show it can still be used improperly. We demonstrate these conclusions both analytically and graphically, providing a set of simulation-based techniques that can be used to assess whether a score is proper. These techniques only require a data-generating model and at least two forecasts to be compared. We compare the Parimutuel Gambling score’s performance with two commonly used proper scores (the Brier and logarithmic scores), using confidence intervals to account for the uncertainty around the observed score difference. We suggest that using confidence intervals enables a rigorous approach to distinguishing between the predictive skills of candidate forecasts, in addition to their rankings. Our analysis shows that the Parimutuel Gambling score is biased, and that the direction of the bias depends on the forecasts taking part in the experiment. Our findings suggest that the Parimutuel Gambling score should not be used to distinguish between multiple competing forecasts, and that care should be taken when only two are being compared.

15 citations
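A scoring rule is proper if its expected value is optimized by forecasting the true event probability. For a single binary event with true probability q, the expected Brier score of a forecast p is q(1 − p)² + (1 − q)p² = (p − q)² + q(1 − q), which is minimized at p = q. The expected logarithmic score, in its negative log-likelihood form, is also minimized at p = q. The sketch below checks this numerically over a grid of forecasts. It illustrates the concept under simplifying assumptions: a single binary event and exact expectations rather than simulated earthquake catalogues. It is not the simulation procedure used in the paper, and all names are hypothetical.

```python
import math


def expected_brier(p, q):
    # With probability q the event occurs (penalty (1 - p)^2); otherwise the penalty is p^2.
    return q * (1 - p) ** 2 + (1 - q) * p ** 2


def expected_log_loss(p, q):
    # Negative log-likelihood form of the logarithmic score (lower is better).
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))


def best_forecast(expected_score, q, steps=999):
    """Return the grid forecast p in (0, 1) with the lowest expected score."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda p: expected_score(p, q))


for q in (0.01, 0.2, 0.5):
    # For a proper score, the best forecast should coincide with the true probability q.
    print(q, best_forecast(expected_brier, q), best_forecast(expected_log_loss, q))
```

An improper score would fail this check, because its best forecast would land somewhere other than q.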


Journal Article
TL;DR: In this paper, the SORG machine learning model has been developed and successfully tested using 5,413 patients from the United States (US) to predict the risk of prolonged opioid prescription after surgery for lumbar disc herniation.

12 citations


Journal Article
TL;DR: In this article, the authors compare different machine learning algorithms using nested cross-validation, evaluate their benefit in naturalistic settings, and identify the best model as well as the most important variables.
Abstract: Background About 30% of patients drop out of cognitive–behavioural therapy (CBT), which has implications for psychiatric and psychological treatment. Findings concerning dropout remain heterogeneous. Aims This paper aims to compare different machine-learning algorithms using nested cross-validation, evaluate their benefit in naturalistic settings, and identify the best model as well as the most important variables. Method The data-set consisted of 2543 out-patients treated with CBT. Assessment took place before session one. Twenty-one algorithms and ensembles were compared. Two parameters (Brier score, area under the curve (AUC)) were used for evaluation. Results The best model was an ensemble that used Random Forest and nearest-neighbour modelling. During the training process, it was significantly better than generalised linear modelling (GLM) (Brier score: d = −2.93, 95% CI (−3.95, −1.90); AUC: d = 0.59, 95% CI (0.11, 1.06)). In the holdout sample, the ensemble correctly identified 63.4% of cases, whereas the GLM correctly identified only 46.2%. The most important predictors were lower education, lower scores on the Personality Style and Disorder Inventory (PSSI) compulsive scale, younger age, higher scores on the PSSI negativistic and PSSI antisocial scales, and higher scores on the Brief Symptom Inventory (BSI) additional scale (mean of the four additional items) and BSI overall scale. Conclusions Machine learning improves drop-out predictions. However, not all algorithms are suited to naturalistic data-sets and binary events. Tree-based and boosted algorithms that include a variable selection process seem well-suited, whereas more advanced algorithms such as neural networks do not.

11 citations
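Nested cross-validation separates hyperparameter tuning (the inner loop) from performance estimation (the outer loop), so that the reported score is not optimistically biased by the tuning. The following is a minimal, stdlib-only sketch. It assumes a toy one-feature k-nearest-neighbour probability model whose only hyperparameter is k, with the Brier score as the selection criterion. The data, model, fold counts, and function names are all illustrative. This is not the authors' pipeline, which compared 21 algorithms and ensembles.

```python
import random


def knn_prob(train, x, k):
    """Fraction of positives among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def brier(rows, train, k):
    """Brier score of the k-NN model fitted on `train`, evaluated on `rows`."""
    return sum((knn_prob(train, x, k) - y) ** 2 for x, y in rows) / len(rows)


def folds(rows, n_folds):
    """Yield (test, train) splits; assumes rows are already in random order."""
    for i in range(n_folds):
        test = rows[i::n_folds]
        train = [r for j, r in enumerate(rows) if j % n_folds != i]
        yield test, train


def nested_cv(rows, candidate_ks, n_outer=5, n_inner=3):
    outer_scores = []
    for outer_test, outer_train in folds(rows, n_outer):
        # Inner loop: choose k using only the outer training data.
        def inner_score(k):
            return sum(brier(t, tr, k) for t, tr in folds(outer_train, n_inner)) / n_inner

        best_k = min(candidate_ks, key=inner_score)
        # Outer loop: score the tuned model on data never used for tuning.
        outer_scores.append(brier(outer_test, outer_train, best_k))
    return outer_scores


rng = random.Random(0)
# Synthetic data: the event probability equals the feature value x.
data = [(x, int(rng.random() < x)) for x in (rng.random() for _ in range(200))]
scores = nested_cv(data, candidate_ks=[1, 5, 15, 45])
print(len(scores), sum(scores) / len(scores))
```

The mean of the outer-fold scores serves as the performance estimate. The per-fold values can also be compared between algorithms, for example as an effect size.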


Journal Article
TL;DR: In this article, the authors compared the model performance of different machine learning-based algorithms that incorporate time-to-event data, including DeepSurv, DeepHit, neural net-extended time-dependent Cox model (Cox-Time), and random survival forest (RSF).

11 citations


Journal Article
TL;DR: In this paper, the authors compared the model performance of different machine learning-based algorithms that incorporate time-to-event data, including DeepSurv, DeepHit, neural net-extended time-dependent Cox model (Cox-Time), and random survival forest (RSF).

10 citations


Journal Article
01 Mar 2022
TL;DR: In this article, the authors developed a machine learning algorithm and clinician-friendly tool to predict the likelihood of prolonged opioid use following hip arthroscopy, using data from the Military Data Repository.
Abstract: To develop a machine-learning algorithm and clinician-friendly tool predicting the likelihood of prolonged opioid use (>90 days) following hip arthroscopy. The Military Data Repository was queried for all adult patients undergoing arthroscopic hip surgery between 2012 and 2017. Demographic, health history, and prescription records were extracted for all included patients. Opioid use was divided into preoperative use (30-365 days before surgery), perioperative use (30 days before surgery through 14 days after surgery), postoperative use (14-90 days after surgery), and prolonged postoperative use (90-365 days after surgery). Six machine-learning algorithms (Naïve Bayes, Gradient Boosting Machine, Extreme Gradient Boosting, Random Forest, Elastic Net Regularization, and artificial neural network) were developed. The area under the receiver operating characteristic curve and the Brier score were calculated for each model. Decision curve analysis was applied to assess clinical utility. Local Interpretable Model-Agnostic Explanations were used to demonstrate factor weights within the selected model. A total of 6,760 patients were included, of whom 2,762 (40.9%) filled at least 1 opioid prescription >90 days after surgery. The artificial neural network model showed superior discrimination and calibration, with an area under the receiver operating characteristic curve of 0.71 (95% confidence interval 0.68-0.74) and a Brier score of 0.21 (95% confidence interval 0.20-0.22). Postsurgical opioid use, age, and preoperative opioid use had the most influence on model outcome. Lesser factors included the presence of a psychological comorbidity and a strong history of a substance use disorder. The artificial neural network model shows sufficient validity and discrimination for use in clinical practice.
The 5 identified factors (age, preoperative opioid use, postoperative opioid use, presence of a mental health comorbidity, and presence of a preoperative substance use disorder) accurately predict the likelihood of prolonged opioid use following hip arthroscopy. Level of evidence: III, retrospective comparative prognostic trial.

10 citations
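Decision curve analysis, which this study used to assess clinical utility, plots a model's net benefit across threshold probabilities t and compares it with the strategies of treating everyone or treating no one. Net benefit is computed as NB(t) = TP/N − (FP/N) · t/(1 − t), where the threshold odds weight false positives against true positives. The following is a minimal sketch in plain Python. The function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening in patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # harm-to-benefit weight implied by the threshold
    return tp / n - fp / n * odds


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of intervening in every patient, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y = [1, 1, 0, 0, 0]
p = [0.9, 0.4, 0.35, 0.1, 0.05]
for t in (0.1, 0.3, 0.5):
    # A useful model should beat both "treat all" and "treat none" (net benefit 0).
    print(t, net_benefit(y, p, t), treat_all_net_benefit(y, t))
```

Dividing the net benefit by the outcome prevalence gives the standardized net benefit that some studies report.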


Journal Article
TL;DR: This study aimed to develop a machine learning algorithm model to predict lung metastasis of thyroid cancer and thereby provide relevant information for clinical decision-making.
Abstract: Lung metastasis (LM) is one of the most frequent distant metastases of thyroid cancer (TC). This study aimed to develop a machine learning algorithm model to predict lung metastasis of thyroid cancer and thereby provide relevant information for clinical decision-making.

10 citations


Journal Article
TL;DR: The QCOVID algorithm developed in England can be used for public health risk management for the adult Welsh population, and it fitted the Welsh data and population well.
Abstract: Introduction COVID-19 risk prediction algorithms can be used to identify individuals at risk of short-term serious adverse COVID-19 outcomes such as hospitalisation and death. It is important to validate these algorithms in different and diverse populations to help guide risk management decisions and target vaccination and treatment programs to the most vulnerable individuals in society. Objectives To externally validate the QCOVID risk prediction algorithm, which predicts mortality outcomes from COVID-19, in the adult population of Wales, UK. Methods We conducted a retrospective cohort study using routinely collected individual-level data held in the Secure Anonymised Information Linkage (SAIL) Databank. The cohort included individuals aged between 19 and 100 years, living in Wales on 24th January 2020, registered with a SAIL-providing general practice, and followed up until death or study end (28th July 2020). Demographic, primary and secondary healthcare, and dispensing data were used to derive all the predictor variables used to develop the published QCOVID algorithm. Mortality data were used to define time to confirmed or suspected COVID-19 death. Performance metrics, including R² values (explained variation), Brier scores, and measures of discrimination and calibration, were calculated for two periods (24th January–30th April 2020 and 1st May–28th July 2020) to assess algorithm performance. Results 1,956,760 individuals were included. 1,192 (0.06%) and 610 (0.03%) COVID-19 deaths occurred in the first and second time periods, respectively. The algorithms fitted the Welsh data and population well. For males in the first period, they explained 68.8% (95% CI: 66.9-70.4) of the variation in time to death, with a Harrell’s C statistic of 0.929 (95% CI: 0.921-0.937) and a D statistic of 3.036 (95% CI: 2.913-3.159). Similar results were found for females and in the second time period for both sexes.
Conclusions The QCOVID algorithm developed in England can be used for public health risk management for the adult Welsh population.

9 citations
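Harrell's C statistic, reported above for time to COVID-19 death, is the proportion of usable patient pairs in which the patient who died earlier had the higher predicted risk. The following is a minimal O(n²) sketch. It assumes right-censored data in which a higher score means higher risk, counts tied risk scores as one half, and skips pairs with tied observed times for simplicity. It is not the QCOVID validation code, and the names and data are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if subject i had the event before j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Subject 3 is censored (event = 0), so it only ever serves as the later member of a pair.
print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.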


Journal Article
TL;DR: In this article, the authors proposed three models to identify subjects most at risk of an imminent fracture, according to fracture site (any fracture, major osteoporotic fracture [MOF], or central fracture).
Abstract: Patients who sustain a fracture are at greatest risk of recurrent fracture during the next 2 years. We propose three models to identify subjects most at risk of an imminent fracture, according to fracture site (any fracture, major osteoporotic fracture [MOF], or central fracture). They were constructed using data from the prospective Frisbee cohort, which includes 3560 postmenopausal women aged 60 to 85 years who were followed for at least 5 years. A total of 881 subjects had a first incident validated fragility fracture before December 2018. Among these, we validated 130 imminent fractures occurring within the next 2 years; 79 were MOFs, and 88 were central fractures. Clinical risk factors were re-evaluated at the time of the index fracture. Fine and Gray proportional hazard models were derived separately for each group of fractures. The following risk factors were significantly associated with the risk of any imminent fracture: total hip bone mineral density (BMD) (p < 0.001), a fall history (p < 0.001), and comorbidities (p = 0.03). Age (p = 0.05 and p = 0.03, respectively) and a central fracture as the index fracture (p = 0.04 and p = 0.005, respectively) were additional predictors of MOFs and central fractures. The three prediction models are presented as nomograms. The calibration curves and the Brier scores based on bootstrap resampling showed calibration scores of 0.089 for MOF, 0.094 for central fractures, and 0.132 for any fractures. The predictive accuracy of the models, expressed as the area under the receiver operating characteristic curve (AUC), was 0.74 for central fractures, 0.72 for MOFs, and 0.66 for all fractures. These AUCs compare well with those of FRAX and Garvan to predict the 5- or 10-year fracture probability.
In summary, five predictors (BMD, age, comorbidities, falls, and central fracture as the incident fracture) allow the imminent risk of fracture at different sites (MOF, central fracture, and any fracture) after a recent sentinel fracture to be calculated with reasonable accuracy. © 2021 American Society for Bone and Mineral Research (ASBMR).

9 citations
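The AUC values above can be read as a probability: the chance that a randomly chosen patient who had the outcome received a higher predicted risk than a randomly chosen patient who did not. This is the Mann-Whitney formulation of the AUC. The following is a minimal sketch that counts ties as one half. It ignores the competing-risk, time-to-event structure of the Fine and Gray models and is meant only to illustrate the metric. The function name and data are hypothetical.

```python
def auc(y_true, scores):
    """Probability that a random positive case outscores a random negative case."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Compare every positive with every negative: a win counts 1, a tie counts 0.5.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


print(auc([1, 0, 1, 0, 0], [0.8, 0.3, 0.4, 0.5, 0.1]))
```

This pairwise version costs O(n_pos · n_neg) operations. A rank-based computation is preferable for large cohorts.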


Journal Article
TL;DR: This is the first dynamic, preoperative and postoperative predictive model constructed for AIS patients who underwent MT, which is more accurate than the previous prediction model.
Abstract: The unfavorable outcome of acute ischemic stroke (AIS) with large vessel occlusion (LVO) is related to clinical factors at multiple time points. However, models that dynamically predict unfavorable outcomes from clinically relevant preoperative and postoperative variables have not been developed. Our goal was to develop a machine learning (ML) model for the dynamic prediction of unfavorable outcomes. We retrospectively reviewed consecutive patients with AIS who underwent mechanical thrombectomy (MT) at three centers in China between January 2014 and December 2018. Using the eXtreme gradient boosting (XGBoost) algorithm, we built one model from clinical characteristics on admission (“Admission” Model) and further models that added variables regarding intraoperative management and the postoperative National Institutes of Health Stroke Scale (NIHSS) score (“24-Hour” Model, “3-Day” Model, and “Discharge” Model). The outcome was an unfavorable outcome at three months, defined as a modified Rankin Scale (mRS) score of 3–6. The area under the receiver operating characteristic curve and the Brier score were the main evaluation metrics. An unfavorable outcome at three months was observed in 156 (62.0%) of 238 patients. On the testing set, the four models had high accuracy, ranging from 75.0% to 87.5%, and good discrimination, with AUCs ranging from 0.824 to 0.945. Their Brier scores ranged from 0.083 to 0.122, indicating good predictive ability on the testing set. This is the first dynamic, preoperative and postoperative predictive model constructed for AIS patients who underwent MT, and it is more accurate than the previous prediction model. The preoperative model could be used to predict the clinical outcome before MT and to support the decision to perform MT. The postoperative models would further improve the accuracy of predicting the clinical outcome after MT and allow therapeutic strategies to be adjusted in a timely manner.

Journal Article
TL;DR: The presented model requires only variables routinely acquired in hospitals, which allows immediate and widespread use as a decision support for earlier discharge of low-risk patients to reduce the burden on the health care system.
Abstract: Objective To develop and validate a prognostic model for in-hospital mortality after four days based on age, fever at admission and five haematological parameters routinely measured in hospitalized Covid-19 patients during the first four days after admission. Methods Haematological parameters measured during the first 4 days after admission were subjected to a linear mixed model to obtain patient-specific intercepts and slopes for each parameter. A prediction model was built using logistic regression with variable selection and shrinkage factor estimation supported by bootstrapping. Model development was based on 481 survivors and 97 non-survivors, hospitalized before the occurrence of mutations. Internal validation was done by 10-fold cross-validation. The model was temporally-externally validated in 299 survivors and 42 non-survivors hospitalized when the Alpha variant (B.1.1.7) was prevalent. Results The final model included age, fever on admission as well as the slope or intercept of lactate dehydrogenase, platelet count, C-reactive protein, and creatinine. Tenfold cross-validation resulted in a mean area under the receiver operating characteristic curve (AUROC) of 0.92, a mean calibration slope of 1.0023 and a Brier score of 0.076. At temporal-external validation, application of the previously developed model showed an AUROC of 0.88, a calibration slope of 0.95 and a Brier score of 0.073. Regarding the relative importance of the variables, the (apparent) variation in mortality explained by the six variables deduced from the haematological parameters measured during the first four days is higher (explained variation 0.295) than that of age (0.210). Conclusions The presented model requires only variables routinely acquired in hospitals, which allows immediate and widespread use as a decision support for earlier discharge of low-risk patients to reduce the burden on the health care system.
Clinical Trial Registration Austrian Coronavirus Adaptive Clinical Trial (ACOVACT); ClinicalTrials.gov, identifier NCT04351724.

Journal Article
TL;DR: Wang et al. developed a predictive model for hepatocellular carcinoma (HCC) diagnosis using personalized biological pathways combined with a machine learning algorithm based on regularized regression, and carried out relevant examinations.
Abstract: Background At present, the ability to diagnose hepatocellular carcinoma (HCC) based on serum alpha-fetoprotein level is limited. Finding markers that can effectively distinguish cancerous from non-cancerous tissues is important for improving the diagnostic efficiency of HCC. Results In this study, we developed a predictive model for HCC diagnosis using personalized biological pathways combined with a machine learning algorithm based on regularized regression, and carried out relevant examinations. In two training sets, the overall cross-study-validated area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and Brier score of the diagnostic model were 0.987 [95% confidence interval (CI): 0.979–0.996], 0.981, and 0.091, respectively. The model also showed good transferability in the external validation set: in the TCGA-LIHC cohort, the AUROC, AUPRC, and Brier score were 0.992 (95% CI: 0.985–0.998), 0.967, and 0.112, respectively. The diagnostic model achieved very strong performance in distinguishing HCC from non-cancerous liver tissues. Moreover, we further analyzed the extracted biological pathways to explore molecular features and prognostic factors. The risk score generated from a 12-gene signature extracted from the characteristic pathways was correlated with several immune-related pathways and served as an independent prognostic factor for HCC. Conclusion We used personalized biological pathway analysis and a machine learning algorithm to construct a highly accurate HCC diagnostic model. Its strong, interpretable performance and good transferability give the model great potential for personalized medicine, where it can assist clinicians in diagnosing HCC.
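The area under the precision-recall curve complements the AUROC, especially when one class is rare. A common estimator is average precision. To compute it, sort cases by predicted score and average the precision at the rank of each positive case. The following is a minimal sketch. It assumes binary labels, at least one positive case, and no tied scores (tied scores would need to be grouped). It is not necessarily the estimator used in this study, and the function name is hypothetical.

```python
def average_precision(y_true, scores):
    """Area under the precision-recall curve, estimated as average precision."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the point where recall increases
    return total / n_pos


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Unlike the AUROC, the baseline for this metric is the outcome prevalence rather than 0.5.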

Journal Article
TL;DR: In this article, three machine learning algorithms, namely k-nearest neighbor, support vector machine, and random forest, were used to predict the default probability of online loan borrowers, and their predictive performance was compared with that of a logistic model.

Journal Article
TL;DR: In this paper, a decision rule was proposed for initiating advanced imaging in patients with negative radiographs; it would yield 100% sensitivity and 38% specificity, and would have reduced the number of patients undergoing advanced imaging by 36% without missing a fracture.

Journal Article
TL;DR: Wang et al. developed and internally validated a clinical prediction model using machine learning (ML) algorithms for 90-day and 2-year mortality in femoral neck fracture patients aged 65 years or above.
Abstract: Purpose Preoperative prediction of mortality in femoral neck fracture patients aged 65 years or above may be valuable in treatment decision-making. A preoperative clinical prediction model can aid surgeons and patients in the shared decision-making process and optimize care for elderly femoral neck fracture patients. This study aimed to develop and internally validate a clinical prediction model using machine learning (ML) algorithms for 90-day and 2-year mortality in femoral neck fracture patients aged 65 years or above. Methods A retrospective cohort study was conducted at two level I trauma centers and three (non-level I) community hospitals to identify patients undergoing surgical fixation for a femoral neck fracture. Five different ML algorithms were developed, internally validated, and assessed by discrimination, calibration, Brier score, and decision curve analysis. Results In total, 2478 patients were included, with 90-day and 2-year mortality rates of 9.1% (n = 225) and 23.5% (n = 582), respectively. The models included patient characteristics, comorbidities, and laboratory values. The stochastic gradient boosting algorithm had the best performance for 90-day mortality prediction, with good discrimination (c-statistic = 0.74), calibration (intercept = −0.05, slope = 1.11), and Brier score (0.078). The elastic-net penalized logistic regression algorithm had the best performance for 2-year mortality prediction, with good discrimination (c-statistic = 0.70), calibration (intercept = −0.03, slope = 0.89), and Brier score (0.16). The models were incorporated into a freely available web-based application that includes individual patient explanations showing how the model arrived at a given prediction: https://sorg-apps.shinyapps.io/hipfracturemortality/ Conclusions The clinical prediction models show promise in estimating mortality in elderly femoral neck fracture patients.
External and prospective validation of the models may improve surgeons' ability to make treatment decisions. Level of evidence Prognostic Level II.
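The calibration intercept and slope quoted above are typically obtained by logistic recalibration, which regresses the observed outcome on the logit of the predicted risk. The slope is b in logit P(y = 1) = a + b · logit(p). The intercept (calibration-in-the-large) is usually estimated with b fixed at 1. A slope below 1 suggests predictions that are too extreme, and a slope above 1 suggests predictions that are too moderate. The following is a minimal, stdlib-only sketch that fits both by Newton-Raphson. It assumes 0 < p < 1, at least two distinct predicted values, and no perfect separation. The function names and data are hypothetical, and this is not the SORG code.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def calibration_slope(y, p, iters=50):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; return (a, b)."""
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [expit(a + b * xi) for xi in x]
        w = [m * (1 - m) for m in mu]
        # Gradient of the log-likelihood with respect to (a, b).
        ga = sum(yi - m for yi, m in zip(y, mu))
        gb = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        # Fisher information matrix [[haa, hab], [hab, hbb]].
        haa = sum(w)
        hab = sum(wi * xi for wi, xi in zip(w, x))
        hbb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = haa * hbb - hab * hab
        # Newton step: add the information matrix inverse times the gradient.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


def calibration_intercept(y, p, iters=50):
    """Calibration-in-the-large: fit logit P(y=1) = a + logit(p) with the slope fixed at 1."""
    x = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [expit(a + xi) for xi in x]
        a += sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
    return a


# Overconfident predictions: observed rates are 0.2 and 0.8, but the model said 0.1 and 0.9.
y = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
p = [0.1] * 5 + [0.9] * 5
print(calibration_intercept(y, p), calibration_slope(y, p))
```

In this toy example, the intercept is about 0 because the average risk is right. The slope is log(4)/log(9), about 0.63, because the individual predictions are too extreme.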

Journal Article
TL;DR: In this paper, the authors externally validated the modified 4C Deterioration Model and 4C Mortality Score for predicting deterioration and mortality risk in COVID-19 patients, and evaluated whether the inclusion of the neutrophil-to-lymphocyte ratio (NLR) improves the predictive performance of the models.
Abstract: Prognostic models to predict the deterioration and mortality risk in COVID-19 patients are urgently needed to assist in informed decision making. Most of these models, however, are at high risk of bias, model overfitting, and unclear reporting. Here, we aimed to externally validate the modified (urea was omitted) 4C Deterioration Model and 4C Mortality Score in a cohort of Swiss COVID-19 patients and, second, to evaluate whether the inclusion of the neutrophil-to-lymphocyte ratio (NLR) improves the predictive performance of the models. We conducted a retrospective single-centre study with adult patients hospitalized with COVID-19. Both prediction models were updated by including the NLR. Model performance was assessed via the models’ discriminatory performance (area under the curve, AUC), calibration (intercept and slope), and overall performance (Brier score). For the validation of the 4C Deterioration Model and Mortality Score, 546 and 527 patients were included, respectively. In total, 133 (24.4%) patients met the definition of in-hospital deterioration. Discrimination of the 4C Deterioration Model was AUC = 0.78 (95% CI 0.73–0.82). A total of 55 (10.44%) patients died in hospital. Discrimination of the 4C Mortality Score was AUC = 0.85 (95% CI 0.79–0.89). There was no evidence for an incremental value of the NLR. Our data confirm the role of the modified 4C Deterioration Model and Mortality Score as reliable prediction tools for the risk of deterioration and mortality. There was no evidence that the inclusion of NLR improved model performance.

Journal Article
TL;DR: Novel machine learning algorithms were developed that leveraged preoperative demographic, clinical, and imaging-based features to reliably predict clinically meaningful improvement after hip arthroscopy for FAIS.
Abstract: Background: The International Hip Outcome Tool 12-Item Questionnaire (IHOT-12) has been proposed as a more appropriate outcome assessment for hip arthroscopy populations. The extent to which preoperative patient factors predict achieving clinically meaningful outcomes among patients undergoing hip arthroscopy for femoroacetabular impingement syndrome (FAIS) remains poorly understood. Purpose: To determine the predictive relationship of preoperative imaging, patient-reported outcome measures, and patient demographics with achievement of the minimal clinically important difference (MCID), Patient Acceptable Symptom State (PASS), and substantial clinical benefit (SCB) for the IHOT-12 at a minimum of 2 years postoperatively. Study Design: Case-control study; Level of evidence, 3. Methods: Data were analyzed for consecutive patients who underwent hip arthroscopy for FAIS between 2012 and 2018 and completed the IHOT-12 preoperatively and at a minimum of 2 years postoperatively. Fifteen novel machine learning algorithms were developed using 47 potential demographic, clinical, and radiographic predictors. Model performance was evaluated with discrimination, calibration, decision-curve analysis, and the Brier score. Results: A total of 859 patients were identified, with 685 (79.7%) achieving the MCID, 535 (62.3%) achieving the PASS, and 498 (58.0%) achieving the SCB. For predicting the MCID, discrimination for the best-performing models ranged from fair to excellent (area under the curve [AUC], 0.69-0.89), and calibration was excellent (calibration intercept and slopes: –0.06 to 0.02 and 0.24 to 0.85, respectively). For predicting the PASS, discrimination for the best-performing models ranged from fair to excellent (AUC, 0.63-0.81), with excellent calibration (calibration intercept and slopes: 0.03-0.18 and 0.52-0.90, respectively).
For predicting the SCB, discrimination for the best-performing models ranged from fair to good (AUC, 0.61-0.77), with excellent calibration (calibration intercept and slopes: –0.08 to 0.00 and 0.56 to 1.02, respectively). Thematic predictors for failing to achieve the MCID, PASS, and SCB were presence of back pain, anxiety/depression, chronic symptom duration, preoperative hip injections, and increasing body mass index (BMI). Specifically, thresholds associated with lower likelihood to achieve a clinically meaningful outcome were preoperative Hip Outcome Score–Activities of Daily Living <55, preoperative Hip Outcome Score–Sports Subscale >55.6, preoperative IHOT-12 score ≥48.5, preoperative modified Harris Hip Score ≤51.7, age >41 years, BMI ≥27, and preoperative α angle >76.6°. Conclusion: We developed novel machine learning algorithms that leveraged preoperative demographic, clinical, and imaging-based features to reliably predict clinically meaningful improvement after hip arthroscopy for FAIS. Despite consistent improvements after hip arthroscopy, meaningful improvements are negatively influenced by greater BMI, back pain, chronic symptom duration, preoperative mental health, and use of hip corticosteroid injections.

Journal Article
TL;DR: The language-based suicide risk model performed with good discrimination when identifying the language of suicidal patients from a different part of the US and at a later time period than when the model was originally developed and trained.
Abstract: Background Emergency departments (ED) are an important intercept point for identifying suicide risk and connecting patients to care; however, more innovative, person-centered screening tools are needed. Natural language processing (NLP)-based machine learning (ML) techniques have shown promise in assessing suicide risk, although it is unknown whether NLP models perform well in differing geographic regions, at different time periods, or after large-scale events such as the COVID-19 pandemic. Objective To evaluate the performance of an NLP/ML suicide risk prediction model on newly collected language from the Southeastern United States using models previously tested on language collected in the Midwestern US. Method Thirty-seven suicidal and 33 non-suicidal patients from two EDs were interviewed to test a previously developed suicide risk prediction NLP/ML model. Model performance was evaluated with the area under the receiver operating characteristic curve (AUC) and Brier scores. Results The NLP/ML models performed with an AUC of 0.81 (95% CI: 0.71–0.91) and a Brier score of 0.23. Conclusion The language-based suicide risk model performed with good discrimination when identifying the language of suicidal patients from a different part of the US and at a later time period than when the model was originally developed and trained.
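A Brier score is easier to interpret next to a reference value. A non-informative model that always predicts the observed prevalence π has a Brier score of π(1 − π). This reference approaches its maximum of 0.25 when the two groups are nearly balanced, as they are in a sample of 37 suicidal and 33 non-suicidal patients. The scaled Brier score, 1 − BS/[π(1 − π)], rescales performance so that 0 means no better than that constant forecast and 1 means perfect. The following is a minimal sketch; the function name is illustrative.

```python
def scaled_brier(y_true, y_prob):
    """1 - Brier / Brier of a model that always predicts the observed prevalence."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    prevalence = sum(y_true) / n
    reference = prevalence * (1 - prevalence)  # Brier score of the constant forecast
    return 1 - brier / reference


print(scaled_brier([1, 1, 0, 0], [0.5] * 4))                # constant forecast
print(scaled_brier([1, 1, 0, 0], [1.0, 1.0, 0.0, 0.0]))     # perfect forecast
```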

Journal ArticleDOI
TL;DR: Wang et al. compared the Freiburg index of post-TIPS survival (FIPS) score with traditional prognostic scores for predicting survival after TIPS placement in a cohort of Chinese patients with cirrhosis.
Abstract: BACKGROUND. Various prognostic scores for patients with chronic liver disease have been applied to predict survival after TIPS placement. In 2021, the Freiburg index of post-TIPS survival (FIPS) score was developed specifically for predicting survival after TIPS placement. The score has shown variable performance in initial investigations of German and U.S. cohorts. OBJECTIVE. The purpose of this study was to compare the utility of the FIPS score and traditional scoring systems for predicting post-TIPS survival in a cohort of Chinese patients with cirrhosis. METHODS. This retrospective validation study compared four prognostic scores (model for end-stage liver disease [MELD], sodium MELD [MELD-Na], Chronic Liver Failure Consortium acute decompensation [CLIF-C AD], and FIPS) in 383 patients (mean age, 54.9 ± 11.7 years; 249 men, 134 women) with cirrhosis who underwent TIPS placement (341 for variceal bleeding, 42 for refractory ascites) at Wuhan Union Hospital between January 2016 and August 2021. Model performance was assessed in terms of discrimination (using the concordance index) and calibration (using the Brier score and observed-to-predicted ratios) for 6-, 12-, and 24-month post-TIPS survival. Discrimination was further stratified by TIPS indication. Risk stratification was performed using previously proposed cutoffs for each score. RESULTS. During postprocedural follow-up, 72 (18.8%) patients died. Discriminative performance for 6-month survival was highest for the FIPS score (concordance index, 0.784), followed by CLIF-C AD (0.743), MELD-Na (0.699), and MELD (0.694).
The FIPS score also showed the best calibration, with lower Brier scores and observed-to-predicted ratios closer to 1, and showed the strongest prognostic performance for 12- and 24-month survival and in subgroups of patients who underwent TIPS placement for either variceal bleeding or refractory ascites (except for similar performance of FIPS and CLIF-C AD in the refractory ascites subgroup). When the previously proposed cutoffs were applied, additionally applying the FIPS score was significantly associated with survival among patients classified as low risk by the other scores. CONCLUSION. The FIPS score outperformed traditional risk scores in predicting post-TIPS survival in patients with cirrhosis. CLINICAL IMPACT. The findings support the utility of the FIPS score in differentiating patients who are optimal candidates for TIPS placement from those at high risk who may instead warrant close monitoring and early liver transplant.
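An observed-to-predicted (O/P) ratio compares the number of deaths that actually occurred by a horizon with the number the score predicted. A minimal sketch, assuming each patient has a predicted probability of death by the horizon (for example 6 months) and complete follow-up to that horizon; the abstract does not say how censoring was handled, and this simple ratio ignores it. The helper names and the `cutoff` parameter are hypothetical.

```python
def observed_to_predicted(died, risk):
    """Observed deaths divided by expected deaths (the sum of predicted risks)."""
    expected = sum(risk)
    return sum(died) / expected


def o_to_p_by_group(died, risk, cutoff):
    """O/P ratio for patients below and at/above a risk cutoff.

    The cutoff is a hypothetical parameter for illustration, not one of the
    published FIPS, MELD, MELD-Na or CLIF-C AD cutoffs.
    """
    ratios = {}
    for label, keep in (("low", lambda r: r < cutoff), ("high", lambda r: r >= cutoff)):
        pairs = [(d, r) for d, r in zip(died, risk) if keep(r)]
        if pairs:
            ratios[label] = observed_to_predicted(
                [d for d, _ in pairs], [r for _, r in pairs]
            )
    return ratios
```

A ratio above 1 means the score underestimated mortality in that group, and a ratio below 1 means it overestimated it.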

Journal ArticleDOI
TL;DR: In this paper, the authors developed ML algorithms to predict 7-day and 30-day mortality in patients with acute decompensated heart failure and compared these with an existing logistic regression model at the same timepoints.

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of Cox regression models versus Random Survival Forest (RSF) to predict suicidal behavior in more than 300 high-risk suicidal patients from a multicenter prospective cohort study.

Journal ArticleDOI
TL;DR: Time-to-event prediction models based on deep learning algorithms are successful in predicting chondrosarcoma prognosis, with DeepSurv producing the best discriminative performance and calibration.
Abstract: Background Accurate prediction of prognosis is critical for therapeutic decisions in chondrosarcoma patients. Several prognostic models have been created using multivariate Cox regression or binary classification-based machine learning approaches to predict the 3- and 5-year survival of patients with chondrosarcoma, but few studies have investigated the results of combining deep learning with time-to-event prediction. Compared with reducing the prediction to a binary classification problem, modeling the probability of an event as a function of time with deep learning can provide better accuracy and flexibility. Materials and methods Patients diagnosed with chondrosarcoma between 2000 and 2018 were extracted from the Surveillance, Epidemiology, and End Results (SEER) registry. Three algorithms, two based on neural networks (DeepSurv, neural multi-task logistic regression [NMTLR]) and one on ensemble learning (random survival forest [RSF]), were selected for training. A multivariate Cox proportional hazards (CoxPH) model was also constructed for comparison. The dataset was randomly divided into training and testing datasets at a ratio of 7:3. Hyperparameter tuning was conducted through a random search with 1000 iterations and 5-fold cross-validation on the training dataset. Model performance was assessed using the concordance index (C-index), Brier score, and Integrated Brier Score (IBS). The accuracy of predicting 1-, 3-, 5-, and 10-year survival was evaluated using receiver operating characteristic (ROC) curves, calibration curves, and the area under the ROC curve (AUC). Results A total of 3145 patients were enrolled in our study. The mean age at diagnosis was 52 ± 18 years, 1662 of the 3145 patients were male (53%), and the mean survival time was 83 ± 67 months.
The two deep learning models outperformed the RSF and classical CoxPH models, with C-indices on the test dataset of 0.832 (DeepSurv) and 0.821 (NMTLR). The DeepSurv model produced more accurate and better-calibrated survival estimates for 1-, 3-, 5-, and 10-year survival (AUC: 0.895-0.937). We deployed the DeepSurv model as a web application for use in clinical practice; it can be accessed through https://share.streamlit.io/whuh-ml/chondrosarcoma/Predict/app.py. Conclusions Time-to-event prediction models based on deep learning algorithms are successful in predicting chondrosarcoma prognosis, with DeepSurv producing the best discriminative performance and calibration.
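The C-index values above (0.832 for DeepSurv, 0.821 for NMTLR) measure how often a model orders patients' risks consistently with their observed survival times. A minimal sketch of Harrell's estimator in plain Python, assuming `time` is follow-up time, `event` is 1 for death and 0 for censoring, and `risk` is a per-patient score where higher means worse prognosis. The abstract does not state which implementation the authors used; this version is quadratic in sample size and simply skips pairs with tied times.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an event and
    time[i] < time[j]. It is concordant when risk[i] > risk[j];
    tied risk scores count as 1/2.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the comparable pairs.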

Journal ArticleDOI
TL;DR: In this paper, the authors applied predictive machine learning algorithms to anonymized, patient-level HIV programmatic data from two districts in South Africa, 2016-2018, and developed patient risk scores for two outcomes: (1) visit attendance ≤ 28 days of the next scheduled clinic visit and (2) suppression of HIV viral load (VL).
Abstract: HIV treatment programs face challenges in identifying patients at risk for loss-to-follow-up and uncontrolled viremia. We applied predictive machine learning algorithms to anonymised, patient-level HIV programmatic data from two districts in South Africa, 2016-2018. We developed patient risk scores for two outcomes: (1) visit attendance ≤ 28 days of the next scheduled clinic visit and (2) suppression of the next HIV viral load (VL). Demographic, clinical, behavioral and laboratory data were investigated in multiple models as predictor variables of attending the next scheduled visit and VL results at the next test. Three classification algorithms (logistic regression, random forest and AdaBoost) were evaluated for building predictive models. Data were randomly split 70/30 into training and test sets. The training set included a balanced set of positive and negative examples from which the classification algorithm could learn. The predictor variable data from the unseen test set were given to the model, and each predicted outcome was scored against known outcomes. Finally, we estimated performance metrics for each model in terms of sensitivity, specificity, positive and negative predictive value and area under the curve (AUC). In total, 445,636 patients were included in the retention model and 363,977 in the VL model. The predictive metric (AUC) ranged from 0.69 for attendance at the next scheduled visit to 0.76 for VL suppression, suggesting that the model correctly classified whether a scheduled visit would be attended in 2 of 3 patients and whether the VL result at the next test would be suppressed in approximately 3 of 4 patients. Variables that were important predictors of both outcomes included prior late visits, number of prior VL tests, time since their last visit, number of visits on their current regimen, age, and treatment duration.
For retention, the number of visits at the current facility and the details of the next appointment date were also predictors, while for VL suppression, other predictors included the range of the previous VL value. Machine learning can identify HIV patients at risk for disengagement and unsuppressed VL. Predictive modeling can improve the targeting of interventions through differentiated models of care before patients disengage from treatment programmes, increasing cost-effectiveness and improving patient outcomes.
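The four threshold-based metrics listed in this abstract all come from the confusion matrix at a chosen probability threshold, whereas AUC is threshold-free: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch, assuming `y` is 1 for the outcome of interest and `p` holds predicted probabilities; the 0.5 default threshold and the function names are assumptions for illustration, not the study's choices.

```python
def _ratio(num, den):
    # Return NaN instead of raising when a denominator is empty.
    return num / den if den else float("nan")


def classification_metrics(y, p, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV at one probability threshold."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, p):
        flagged = pi >= threshold  # model predicts the outcome of interest
        if flagged and yi == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif yi == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

PPV and NPV also depend on the outcome's prevalence in the evaluated population, so they can shift between a balanced training set and an unbalanced test set even when sensitivity and specificity do not.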

Journal ArticleDOI
TL;DR: In this article, the performance of a machine learning (ML) framework for predicting recurrence after renal cell carcinoma (RCC) surgery was investigated and compared with currently validated models.

Journal ArticleDOI
TL;DR: In this article, the Ada model performed best among eight ML models in predicting 4-year mortality after cardiac surgery and might have significant application in the development of early warning systems for patients following operations.
Abstract: Objective: This study aims to construct and validate several machine learning (ML) algorithms to predict long-term mortality and identify risk factors in unselected patients after cardiac surgery. Methods The Medical Information Mart for Intensive Care (MIMIC-III) database was used to perform a retrospective administrative database study. Candidate predictors comprised demographics, comorbidities, vital signs, laboratory test results, scoring systems, and treatment information from the first day of ICU admission. Four-year mortality was set as the study outcome. We used the ML methods of logistic regression (LR), artificial neural network (NNET), naïve Bayes (NB), gradient boosting machine (GBM), adaptive boosting (Ada), random forest (RF), bagged trees (BT), and eXtreme Gradient Boosting (XGB). The prognostic capacity and clinical utility of these ML models were compared using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). Results Of the 7,368 MIMIC-III patients included in the final cohort, 1,337 (18.15%) died during the 4-year follow-up. Of the 65 variables extracted from the database, 25 predictors were selected using recursive feature elimination and included in the subsequent analysis. The Ada model performed best of the eight models in both discrimination (highest AUC, 0.801) and goodness of fit (visualized by the calibration curve). Moreover, DCA showed that the net benefit of the RF, Ada, and BT models surpassed that of the other ML models at almost all threshold probabilities. Additionally, using the Ada technique, we determined that red blood cell distribution width (RDW), blood urea nitrogen (BUN), SAPS II, anion gap (AG), age, urine output, chloride, creatinine, congestive heart failure, and SOFA were the top 10 predictors in the feature importance ranking.
Conclusions The Ada model performed best among the eight ML models in predicting 4-year mortality after cardiac surgery and might have significant application in the development of early warning systems for patients following operations.
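Decision curve analysis, used above to compare the RF, Ada and BT models, plots net benefit against the threshold probability pt. A minimal sketch of the standard net-benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), assuming `y` holds observed 4-year deaths (0/1) and `p` one model's predicted risks; the function names are illustrative. The "treat all" curve is the usual reference, and "treat none" has a net benefit of 0 at every threshold.

```python
def net_benefit(y, p, threshold):
    """Net benefit of intervening for patients whose predicted risk >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y, threshold):
    """Reference strategy in decision curves: intervene for every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating both functions over a grid of thresholds (for example 0.01 to 0.99) and plotting them gives the decision curve; a model is useful at a threshold where its net benefit exceeds both reference strategies.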

Journal ArticleDOI
TL;DR: In this paper, a multivariable logistic regression model was developed to predict severe rebound pain after foot and ankle surgery involving single-shot popliteal sciatic nerve block, defined as transition from well-controlled pain in the PACU (numerical rating scale [NRS] 3 or less) to severe pain (NRS ≥ 7).
Abstract: Rebound pain occurs after up to 50% of ambulatory surgeries involving regional anaesthesia. To assist with risk stratification, we developed a model to predict severe rebound pain after foot and ankle surgery involving single-shot popliteal sciatic nerve block. After ethics approval, we performed a single-centre retrospective cohort study. Patients undergoing lower limb surgery with popliteal sciatic nerve block from January 2016 to November 2019 were included. Exclusion criteria were uncontrolled pain in the PACU, use of a perineural catheter, or loss to follow-up. We developed and internally validated a multivariable logistic regression model for severe rebound pain, defined as transition from well-controlled pain in the PACU (numerical rating scale [NRS] 3 or less) to severe pain (NRS ≥7) within 48 h. A priori predictors were age, sex, surgery type, planned admission, local anaesthetic type, dexamethasone use, and intraoperative anaesthesia type. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), Nagelkerke's R2, scaled Brier score, and calibration slope. The cohort included 1365 patients (mean [standard deviation] age: 50 [16] yr). The primary outcome was abstracted in 1311 (96%) patients, with severe rebound pain in 652 (50%). Internal validation revealed poor model performance, with AUROC 0.632 (95% confidence interval [CI]: 0.602-0.661; bootstrap optimism 0.021), Nagelkerke's R2 0.063, and scaled Brier score 0.047. Calibration slope was 0.832 (95% CI: 0.623-1.041). We show that a multivariable risk prediction model developed using routinely collected clinical data had poor predictive performance for severe rebound pain after foot and ankle surgery. Prospective studies involving other patient-related predictors are needed. Clinical trial registration: NCT05018104.
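The scaled Brier score and Nagelkerke's R2 both express how much better a model is than a null model that predicts the overall event rate for everyone. A minimal sketch, assuming `y` holds 0/1 outcomes and `p` holds predicted probabilities strictly between 0 and 1; the function names are illustrative, and the study's software and exact computation are not given in the abstract.

```python
import math


def scaled_brier(y, p):
    """1 - BS / BS_max, where BS_max = prevalence * (1 - prevalence)."""
    n = len(y)
    prevalence = sum(y) / n
    bs = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / n
    return 1.0 - bs / (prevalence * (1.0 - prevalence))


def nagelkerke_r2(y, p):
    """Nagelkerke's R^2: Cox-Snell R^2 divided by its maximum attainable value."""
    n = len(y)
    prevalence = sum(y) / n
    ll_model = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    ll_null = sum(yi * math.log(prevalence) + (1 - yi) * math.log(1 - prevalence) for yi in y)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell
```

With an event rate near 50% (652 of 1311 patients), BS_max is about 0.25, so the reported scaled Brier score of 0.047 corresponds to an unscaled Brier score of roughly 0.238, barely better than predicting the event rate for every patient.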

Journal ArticleDOI
08 Feb 2022-PLOS ONE
TL;DR: PIs showed overconfidence in favorable outcomes and exhibited limited skill in predicting scientific or operational outcomes for their own trials, but showed modest ability to discriminate between positive and non-positive trial outcomes.
Abstract: Objective To assess the accuracy of principal investigators’ (PIs) predictions about three events for their own clinical trials: positivity on trial primary outcomes, successful recruitment and timely trial completion. Study design and setting A short electronic survey was used to elicit subjective probabilities within seven months of trial registration. When trial results became available, prediction skill was calculated using Brier scores (BS) and compared against uninformative prediction (i.e., predicting 50% all of the time). Results 740 PIs returned surveys (16.7% response rate). Predictions on all three events tended to exceed observed event frequency. Averaged PI skill did not surpass uninformative predictions (i.e., BS = 0.25) for primary outcomes (BS = 0.25, 95% CI 0.20, 0.30) and was significantly worse for recruitment and timeline predictions (BS = 0.38, 95% CI 0.33, 0.42; BS = 0.52, 95% CI 0.50, 0.55, respectively). PIs showed poor calibration for primary outcome, recruitment, and timelines (calibration index = 0.064, 0.150 and 0.406, respectively), modest discrimination in primary outcome predictions (AUC = 0.76, 95% CI 0.65, 0.85) but minimal discrimination in the other two outcomes (AUC = 0.64, 95% CI 0.57, 0.70; and 0.55, 95% CI 0.47, 0.62, respectively). Conclusion PIs showed overconfidence in favorable outcomes and exhibited limited skill in predicting scientific or operational outcomes for their own trials. They nevertheless showed modest ability to discriminate between positive and non-positive trial outcomes. Low survey response rates may limit generalizability.
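The uninformative benchmark has a Brier score of exactly 0.25 whatever happens, because every squared error is (0.5 - 0)^2 or (0.5 - 1)^2. A calibration index is commonly defined as the reliability term of Murphy's decomposition of the Brier score, BS = reliability - resolution + uncertainty. A minimal sketch, assuming forecasts `f` take a small set of distinct values (as survey probabilities usually do) and `y` is 1 when the predicted event occurred; the abstract does not give the authors' exact computation, and the function name is illustrative.

```python
from collections import defaultdict


def brier_decomposition(y, f):
    """Murphy decomposition for forecasts grouped by their distinct values.

    Returns (brier, reliability, resolution, uncertainty), where
    brier == reliability - resolution + uncertainty.
    """
    n = len(y)
    groups = defaultdict(list)
    for yi, fi in zip(y, f):
        groups[fi].append(yi)
    base_rate = sum(y) / n
    reliability = resolution = 0.0
    for fk, outcomes in groups.items():
        nk = len(outcomes)
        ok = sum(outcomes) / nk  # observed frequency among forecasts equal to fk
        reliability += nk * (fk - ok) ** 2  # miscalibration of this forecast value
        resolution += nk * (ok - base_rate) ** 2  # how far this group departs from the base rate
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((fi - yi) ** 2 for yi, fi in zip(y, f)) / n
    return brier, reliability, resolution, uncertainty
```

Under this definition, lower reliability values mean better calibration, so the reported indices of 0.064, 0.150 and 0.406 rank the three prediction types from best to worst calibrated.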