Journal Article

A Two-Stage Model to Predict Surgical Patients’ Lengths of Stay From an Electronic Patient Database

Ashwani Kumar, Hamideh Anjomshoa
01 Mar 2019, IEEE Journal of Biomedical and Health Informatics (Institute of Electrical and Electronics Engineers, IEEE), Vol. 23, Iss. 2, pp. 848-856
TL;DR: A two-stage classification model is developed that uses electronic patient records to classify patients into lower-variability resource user groups, and CART analysis is found to be useful for determining the patient attributes that explain the variability in resource requirements.
Abstract: Soaring healthcare costs and the growing demand for services require us to use healthcare resources more efficiently. Randomness in resource requirements makes the care delivery process less efficient. Our aim is to reduce the uncertainty in patients’ resource requirements, and we achieve that objective by classifying patients into similar resource user groups. In this article, we develop a two-stage classification model to classify patients into lower variability resource user groups by using electronic patient records. There are various statistical tools for classifying patients into lower variability resource user groups. However, classification and regression tree (CART) analysis is a more suitable method for analyzing healthcare data because it has some distinct features. For example, it can handle the interaction between predictor variables naturally, it is nonparametric in nature, and it is relatively insensitive to the curse of dimensionality. We found that the CART analysis is also useful for determining the patient attributes that can explain the variability in resource requirements. Furthermore, we observed that some of the covariates, such as the principal prescribed procedure code, the admission point, and the operating surgeon, were able to explain up to 53.43% of the variability in patients’ lengths of stay (LoS). Reducing the uncertainty in patients’ LoS predictions helps us manage patient flow efficiently and subsequently obtain a better throughput.
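The abstract reports that covariates explain up to 53.43% of the variability in LoS. One common way to measure this for a grouping of patients (for example, the terminal nodes of a regression tree) is the share of the total sum of squares that the grouping removes, R^2 = 1 - SS_within / SS_total. The paper's exact measure may differ; the sketch below assumes this definition, and the function name and toy LoS values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(groups, los):
    """Share of total LoS variability removed by grouping patients.

    groups: group label per patient (e.g. a terminal node of a tree)
    los: length of stay per patient, in the same order as groups
    Returns 1 - SS_within / SS_total (the usual R^2 of a grouping).
    """
    if len(groups) != len(los):
        raise ValueError("groups and los must have the same length")
    grand_mean = fmean(los)
    ss_total = sum((y - grand_mean) ** 2 for y in los)
    if ss_total == 0:
        return 0.0  # no variability to explain

    # Collect the LoS values that fall into each group.
    by_group = defaultdict(list)
    for g, y in zip(groups, los):
        by_group[g].append(y)

    # Within-group sum of squares around each group's own mean.
    ss_within = sum(
        sum((y - fmean(ys)) ** 2 for y in ys) for ys in by_group.values()
    )
    return 1.0 - ss_within / ss_total


# Hypothetical toy data: procedure code as the grouping variable.
procedure = ["A", "A", "A", "B", "B", "B"]
los_days = [2, 3, 4, 8, 9, 10]
print(round(variance_explained(procedure, los_days), 4))  # → 0.931
```

A grouping that leaves little spread inside each group scores close to 1, which is the sense in which the paper's groups are "lower variability".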
Citations
Journal Article
TL;DR: A stochastic mixed integer programming model is developed to optimise the tactical master surgery schedule (MSS) to achieve a better patient flow under downstream capacity constraints, and it is used to build a robust MSS that maximises the utilisation level while keeping the number of cancellations within acceptable limits.

30 citations

Journal Article
TL;DR: In this paper, the authors proposed a structured view on why, when and how to apply biclustering to mine discriminative patterns of post-surgical risk with guarantees of usability.
Abstract: Understanding the individualized risks of undergoing surgical procedures is essential to personalize preparatory, intervention and post-care protocols that minimize post-surgical complications. This knowledge is key in oncology given the nature of interventions, the fragile profile of patients with comorbidities and cytotoxic drug exposure, and the possibility of cancer recurrence. Despite its relevance, the discovery of discriminative patterns of post-surgical risk is hampered by major challenges: i) the unique physiological and demographic profile of individuals, as well as their differentiated post-surgical care; ii) the high dimensionality and heterogeneous nature of available biomedical data, combining non-identically distributed risk factors, clinical and molecular variables; iii) the need to generalize even though tumors have significant histopathological differences and individuals undergo unique surgical procedures; iv) the need to focus on non-trivial patterns of post-surgical risk while guaranteeing their statistical significance and discriminative power; and v) the lack of interpretability and actionability of current approaches. Biclustering, the discovery of groups of individuals correlated on subsets of variables, has unique properties of interest and is positioned to address these challenges. In this context, this work proposes a structured view on why, when and how to apply biclustering to mine discriminative patterns of post-surgical risk with guarantees of usability, a subject that remains unexplored to date. These patterns offer a comprehensive view of how the patient profile, cancer histopathology and the surgical procedures involved determine: i) post-surgical complications, ii) survival, and iii) hospitalization needs. The results confirm the role of biclustering in comprehensively finding interpretable, actionable and statistically significant patterns of post-surgical risk.
The discovered patterns are already assisting healthcare professionals at IPO-Porto to establish specialized pre-habilitation protocols and bedside care.

10 citations

Journal Article
TL;DR: In this paper, a Neural Process (NP) model is proposed to estimate missing values in clinical time-series data using a neural latent variable model, which employs a conditional prior distribution in the latent space to learn global uncertainty in the data by modelling variations at a local level.
Abstract: Clinical time-series data retrieved from electronic medical records are widely used to build predictive models of adverse events to support resource management. Such data are often sparse and irregularly sampled, which makes it challenging to use many common machine learning methods. Missing values may be interpolated by carrying the last value forward, or through linear regression. Gaussian process (GP) regression is also used for performing imputation, and often for re-sampling time series at regular intervals. The use of GPs can require extensive, and likely ad hoc, investigation to determine model structure, such as an appropriate covariance function. This can be challenging for multivariate real-world clinical data, in which time-series variables exhibit different dynamics from one another. In this work, we construct generative models to estimate missing values in clinical time-series data using a neural latent variable model, known as a Neural Process (NP). The NP model employs a conditional prior distribution in the latent space to learn global uncertainty in the data by modelling variations at a local level. In contrast to conventional generative modelling, this prior is not fixed and is itself learned during the training process. Thus, the NP model provides the flexibility to adapt to the dynamics of the available clinical data. We propose a variant of the NP framework for efficient modelling of the mutual information between the latent and input spaces, ensuring meaningful learned priors. Experiments using the MIMIC III dataset demonstrate the effectiveness of the proposed approach compared with conventional methods.
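The abstract names two simple baselines for filling gaps in irregularly sampled clinical series: carrying the last value forward, and linear interpolation between neighbouring observations (used here as a simple stand-in for the linear-regression option). A minimal sketch of both, assuming a series stored as parallel lists of strictly increasing timestamps and values with None for missing readings; the function names and readings are hypothetical, and this is not the Neural Process model the paper proposes.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def linear_interpolate(times: List[float],
                       values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation in time.

    Irregular sampling is handled by weighting with the actual timestamps
    (assumed strictly increasing). Gaps before the first or after the last
    observation stay None.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    # Walk over each pair of consecutive observed points.
    for a, b in zip(observed, observed[1:]):
        t0, t1 = times[a], times[b]
        v0, v1 = values[a], values[b]
        for i in range(a + 1, b):
            frac = (times[i] - t0) / (t1 - t0)
            out[i] = v0 + frac * (v1 - v0)
    return out


# Hypothetical heart-rate readings at irregular times (hours).
t = [0.0, 1.0, 4.0, 5.0, 8.0]
hr = [80.0, None, None, 95.0, None]
print(locf(hr))
print(linear_interpolate(t, hr))
```

Both baselines ignore the dynamics of other variables, which is part of the motivation the abstract gives for a learned generative model.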

5 citations

Journal Article
TL;DR: Wang et al. developed an optimal data-driven model for predicting the pLOS risk of peritoneal dialysis patients using basic admission data, in which a stacking model was constructed with support vector machine, random forest (RF), and K-nearest neighbor algorithms as its base models and traditional logistic regression (LR) as its meta-model.
Abstract: Background: The increasing number of patients treated with peritoneal dialysis (PD) and their consistently high rate of hospital admissions have placed a large burden on the health care system. Early clinical interventions and optimal management of patients at a high risk of prolonged length of stay (pLOS) may help improve the medical efficiency and prognosis of PD-treated patients. If timely clinical interventions are not provided, patients at a high risk of pLOS may face a poor prognosis and high medical expenses, which will also be a burden on hospitals. Therefore, physicians need an effective pLOS prediction model for PD-treated patients. Objective: This study aimed to develop an optimal data-driven model for predicting the pLOS risk of PD-treated patients using basic admission data. Methods: Patient data collected using the Hospital Quality Monitoring System (HQMS) in China were used to develop pLOS prediction models. A stacking model was constructed with support vector machine, random forest (RF), and K-nearest neighbor algorithms as its base models and traditional logistic regression (LR) as its meta-model. The meta-model used the outputs of all 3 base models as input and generated the output of the stacking model. Another LR-based pLOS prediction model was built as the benchmark model. The prediction performance of the stacking model was compared with that of its base models and the benchmark model. Five-fold cross-validation was employed to develop and validate the models. Performance measures included the Brier score, area under the receiver operating characteristic curve (AUROC), estimated calibration index (ECI), accuracy, sensitivity, specificity, and geometric mean (Gm). In addition, a calibration plot was employed to visually demonstrate the calibration power of each model. 
Results: The final cohort extracted from the HQMS database consisted of 23,992 eligible PD-treated patients, among whom 30.3% had a pLOS (ie, longer than the average LOS, which was 16 days in our study). Among the models, the stacking model achieved the best calibration (ECI 8.691), balanced accuracy (Gm 0.690), accuracy (0.695), and specificity (0.701). Meanwhile, the stacking and RF models had the best overall performance (Brier score 0.174 for both) and discrimination (AUROC 0.757 for the stacking model and 0.756 for the RF model). Compared with the benchmark LR model, the stacking model was superior in all performance measures except sensitivity, but there was no significant difference in sensitivity between the 2 models. The 2-sided t tests revealed significant performance differences between the stacking and LR models in overall performance, discrimination, calibration, balanced accuracy, and accuracy. Conclusions: This study is the first to develop data-driven pLOS prediction models for PD-treated patients using basic admission data from a national database. The results indicate the feasibility of utilizing a stacking-based pLOS prediction model for PD-treated patients. The pLOS prediction tools developed in this study have the potential to assist clinicians in identifying patients at a high risk of pLOS and to allocate resources optimally for PD-treated patients.
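Two of the performance measures listed above are simple to compute from predicted probabilities and binary outcomes: the Brier score (mean squared difference between the predicted probability and the observed 0/1 outcome) and Gm, the geometric mean of sensitivity and specificity. A minimal sketch, assuming a 0.5 classification threshold for Gm; the threshold, function names and toy predictions are illustrative, not values or code from the study.

```python
from math import sqrt
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def g_mean(probs: Sequence[float], outcomes: Sequence[int],
           threshold: float = 0.5) -> float:
    """Geometric mean of sensitivity and specificity at a fixed threshold.

    Assumes both classes (0 and 1) are present in outcomes.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probs, outcomes):
        pred = 1 if p >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            tn += 1 - pred
            fp += pred
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


# Hypothetical predictions of prolonged LoS (1 = pLOS).
p = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1]
y = [1, 1, 1, 0, 0, 0]
print(round(brier_score(p, y), 4), round(g_mean(p, y), 4))
```

Gm rewards a model only when it does well on both classes, which is why it serves as a balanced-accuracy measure when pLOS cases are the minority (30.3% in this cohort).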

4 citations

References
Book Chapter

[...]

01 Jan 2012

139,059 citations


"A Two-Stage Model to Predict Surgic..." refers background in this paper

  • ...Tang, Luo, and Gardiner [8] fitted Coxian distributions to acute myocardial infarction patients’ LoS data and computed the partial effect of various covariates on the mean LoS....

    [...]

  • ...Available: http://link.springer.com/10.1007/978-3-642-00179-6 [8] X. Tang, Z. Luo, and J. C. Gardiner, “Modeling hospital length of stay by Coxian phase-type regression with heterogeneity,” Stat....

    [...]
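The quoted context mentions fitting Coxian (phase-type) distributions to LoS data. In one common parameterisation, a patient passes through phases 1, ..., n, spends an exponentially distributed time with rate λ_i in phase i, and then either moves on to phase i+1 with probability p_i or is discharged. The mean LoS is then the sum over i of (p_1 ⋯ p_{i-1}) / λ_i. A minimal sketch of that formula plus a Monte Carlo check, assuming this parameterisation; the parameter values are hypothetical, not estimates from Tang, Luo, and Gardiner [8], and their covariate-dependent regression is not shown.

```python
import random
from typing import Sequence


def coxian_mean(rates: Sequence[float], progress: Sequence[float]) -> float:
    """Mean of a Coxian distribution.

    rates[i]: exit rate of phase i (time in phase i ~ Exponential(rates[i]))
    progress[i]: probability of moving to phase i+1 after leaving phase i
                 (len(progress) == len(rates) - 1; the last phase always absorbs)
    """
    if len(progress) != len(rates) - 1:
        raise ValueError("need one progression probability per non-final phase")
    mean, reach = 0.0, 1.0  # reach = probability the patient enters phase i
    for i, rate in enumerate(rates):
        mean += reach / rate
        if i < len(progress):
            reach *= progress[i]
    return mean


def sample_coxian(rates, progress, rng: random.Random) -> float:
    """Draw one LoS from the Coxian distribution by walking through phases."""
    total = 0.0
    for i, rate in enumerate(rates):
        total += rng.expovariate(rate)
        if i == len(progress) or rng.random() >= progress[i]:
            break  # absorbed (discharged) after phase i
    return total


# Hypothetical two-phase model: short-stay phase, then long-stay phase.
rates = [1.0, 0.5]   # mean 1 day in phase 1, 2 days in phase 2
progress = [0.5]     # half of patients progress to the long-stay phase
print(coxian_mean(rates, progress))  # 1/1 + 0.5/0.5 = 2.0
rng = random.Random(0)
draws = [sample_coxian(rates, progress, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 2.0
```

The phase structure lets a single distribution capture a mix of short and long stays, which is why it is a natural fit for skewed LoS data.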

Journal Article
TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research; the reviewer notes that readers may want a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.
Abstract: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research. Chapter 12 concludes the book with some commentary about the scientific contributions of MTS. The Taguchi method for design of experiment has generated considerable controversy in the statistical community over the past few decades. The MTS/MTGS method seems to lead to another source of discussion on the methodology it advocates (Montgomery 2003). As pointed out by Woodall et al. (2003), the MTS/MTGS methods are considered ad hoc in the sense that they have not been developed using any underlying statistical theory. Because the “normal” and “abnormal” groups form the basis of the theory, some sampling restrictions are fundamental to the applications. First, it is essential that the “normal” sample be uniform, unbiased, and/or complete so that a reliable measurement scale is obtained. Second, the selection of “abnormal” samples is crucial to the success of dimensionality reduction when OAs are used. For example, if each abnormal item is really unique in the medical example, then it is unclear how the statistical distance MD can be guaranteed to give a consistent diagnosis measure of severity on a continuous scale when the larger-the-better type S/N ratio is used. Multivariate diagnosis is not new to Technometrics readers and is now becoming increasingly popular in statistical analysis and data mining for knowledge discovery. Although it is presented as a promising alternative that assumes no underlying data model, the Mahalanobis–Taguchi Strategy does not provide sufficient evidence of gains achieved by using the proposed method over existing tools. Readers may be very interested in a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods. Overall, although the idea of MTS/MTGS is intriguing, this book would be more valuable had it been written in a rigorous fashion as a technical reference.
There is some lack of precision even in several mathematical notations. Perhaps a follow-up with additional theoretical justification and careful case studies would answer some of the lingering questions.

11,507 citations


"A Two-Stage Model to Predict Surgic..." refers background in this paper

  • ...Each partition is a binary split based on single independent variable [18]....

    [...]
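The quoted sentence says each CART partition is a binary split on a single independent variable. For a regression tree with squared-error loss and a categorical predictor such as the operating surgeon, a standard result is that the optimal binary split can be found by ordering the categories by their mean response and testing only the k - 1 cut points along that order, rather than all 2^(k-1) - 1 subsets. A minimal sketch of that search; the function names and toy data are hypothetical, and this is not claimed to be the paper's implementation.

```python
from collections import defaultdict
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_categorical_split(categories, los):
    """Best binary split of one categorical predictor for a regression tree.

    Categories are ordered by mean LoS; for squared-error loss the optimal
    two-way partition is contiguous in that order, so only k - 1 cut points
    need to be checked. Returns (left_categories, right_categories, sse).
    """
    by_cat = defaultdict(list)
    for c, y in zip(categories, los):
        by_cat[c].append(y)
    if len(by_cat) < 2:
        raise ValueError("need at least two categories to split")

    # Order categories by their mean response.
    ordered = sorted(by_cat, key=lambda c: fmean(by_cat[c]))

    best = None
    for cut in range(1, len(ordered)):
        left, right = ordered[:cut], ordered[cut:]
        left_y = [y for c in left for y in by_cat[c]]
        right_y = [y for c in right for y in by_cat[c]]
        cost = sse(left_y) + sse(right_y)  # within-node squared error
        if best is None or cost < best[2]:
            best = (set(left), set(right), cost)
    return best


# Hypothetical toy data: operating surgeon as the predictor.
surgeon = ["S1", "S1", "S2", "S2", "S3", "S3"]
los_days = [2, 4, 3, 5, 10, 12]
left, right, cost = best_categorical_split(surgeon, los_days)
print(sorted(left), sorted(right), cost)  # → ['S1', 'S2'] ['S3'] 7.0
```

Growing a full tree repeats this search over every predictor at each node and keeps the split with the lowest within-node error.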

Journal Article
TL;DR: A generic framework for modelling of hospital resources in the light of perceived user-needs and real-life hospital processes is proposed and incorporates the need for patient classification techniques to be adopted, which forms a key differentiator between this approach and other attempts to produce practical capacity planning and management tools.
Abstract: The provision of hospital resources, such as beds, operating theatres and nurses, is a matter of considerable public and political concern and has been the subject of widespread debate [1, 2, 3]. The political element of healthcare emphasises the need for objective methods and tools to inform the debate and provide a better foundation for decision-making. There is considerable scope for operational models to be widely used for this purpose. An appreciation of the dynamics governing a hospital system, and the flow of patients through it, points towards the need for sophisticated capacity models reflecting the complexity, uncertainty, variability and limited resources. Working alongside managers and clinicians from participating hospitals, this paper proposes a generic framework for modelling of hospital resources in the light of perceived user-needs and real-life hospital processes. The proposed framework incorporates the need for patient classification techniques to be adopted, which forms a key differentiator between this approach and other attempts to produce practical capacity planning and management tools. Statistically and clinically meaningful patient groupings may then be fed into developed simulation models and individual patients from each group passed through the particular hospital system of concern. The effectiveness of the framework is demonstrated through the development and use of an integrated hospital capacity tool.

275 citations


"A Two-Stage Model to Predict Surgic..." refers background or methods in this paper

  • ...The random arrival time and the uncertainty in resource requirements of each individual are the sources of variability in demand for services [2]....

    [...]

  • ...Happer [2] developed Apollo, a statistical analysis program, in which he incorporated CART analysis to classify patients into similar resource user groups....

    [...]

Journal Article
TL;DR: Models can be created that help improve resource planning and from which a simple decision support system can be produced to help manage patients' expectations of their length of stay.
Abstract: To investigate whether factors can be identified that significantly affect hospital length of stay from those available in an electronic patient record system, using primary total knee replacements as an example. To investigate whether a model can be produced to predict the length of stay based on these factors to support resource planning and manage patients' expectations of their length of stay. Data were extracted from the electronic patient record system for discharges from primary total knee operations from January 2007 to December 2011 (n = 2,130) at one UK hospital and analysed for their effect on length of stay using Mann-Whitney and Kruskal-Wallis tests for discrete data and Spearman’s correlation coefficient for continuous data. Models for predicting length of stay for primary total knee replacements were tested using the Poisson regression and the negative binomial modelling techniques. Factors found to have a significant effect on length of stay were age, gender, consultant, discharge destination, deprivation and ethnicity. Applying a negative binomial model to these variables was successful. The model predicted the length of stay of those patients who stayed 4–6 days (~50% of admissions) with 75% accuracy within 2 days (model data). Overall, the model predicted the total days stayed over 5 years to be only 88 days more than actual, a 6.9% uplift (test data). Valuable information about length of stay can be found by analysing variables easily extracted from an electronic patient record system. Models can be created that help improve resource planning and from which a simple decision support system can be produced to help manage patients' expectations of their length of stay.
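Carter and Potts compare Poisson and negative binomial models for LoS counts. A quick diagnostic for choosing between them is overdispersion: a Poisson model forces the variance to equal the mean, whereas a negative binomial model with dispersion parameter α has variance μ + αμ². The sketch below computes a method-of-moments estimate of α and the share of predictions within ±2 days of the observed LoS, the tolerance quoted in the abstract. The function names and data are hypothetical, and this is a marginal check, not the covariate-adjusted regression the authors fitted.

```python
from statistics import fmean, variance
from typing import Sequence


def nb_dispersion_mom(los: Sequence[int]) -> float:
    """Method-of-moments estimate of alpha in Var = mu + alpha * mu**2.

    A value near 0 suggests a Poisson model is adequate; a clearly positive
    value indicates overdispersion, which favours a negative binomial model.
    Negative estimates (underdispersion) are clipped to 0.
    """
    mu = fmean(los)
    s2 = variance(los)  # sample variance (n - 1 denominator)
    return max(0.0, (s2 - mu) / mu ** 2)


def within_tolerance(predicted: Sequence[float], observed: Sequence[int],
                     days: float = 2.0) -> float:
    """Fraction of patients whose predicted LoS is within +/- days of actual."""
    hits = sum(abs(p - o) <= days for p, o in zip(predicted, observed))
    return hits / len(observed)


# Hypothetical LoS values (days) for a small cohort.
los = [3, 4, 4, 5, 5, 5, 6, 7, 12, 20]
print(round(nb_dispersion_mom(los), 3))
print(within_tolerance([5.0] * len(los), los))
```

A long right tail of stays, like the 12- and 20-day values here, inflates the variance well above the mean, which is the situation where a negative binomial model tends to fit better than a Poisson model.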

131 citations


"A Two-Stage Model to Predict Surgic..." refers methods in this paper

  • ...Carter and Potts [10] used a Poisson regression model and a negative binomial model to predict the LoS of knee replacement surgery patients from patient attributes such as age, gender, ethnicity, deprivation, and consultant....

    [...]

  • ...[10] E. M. Carter and H. W. Potts, “Predicting length of stay from an electronic patient record system: A primary total knee replacement example,” BMC Med....

    [...]