A Clustering-Based Patient Grouper for Burn Care

14 Nov 2019, pp 123-131

TL;DR: The authors argue that a data-driven approach minimises bias in feature selection for patient groups, and demonstrate a reduction in within-cluster cost variation in the identified groups compared with the original casemix.

Abstract: Patient casemix is a system of defining groups of patients. For reimbursement purposes, these groups should be clinically meaningful and share similar resource usage during their hospital stay. In the UK National Health Service (NHS) these groups are known as health resource groups (HRGs), and are predominantly derived based on expert advice and checked for homogeneity afterwards, typically using length of stay (LOS) to assess similarity in resource consumption. LOS does not fully capture the actual resource usage of patients, and assurances on the accuracy of HRG as a basis of payment rate derivation are therefore difficult to give. Also, with complex patient groups such as those encountered in burn care, expert advice will often reflect average patients only, therefore not capturing the complexity and severity of many patients’ injury profile. The data-driven development of a grouper may support the identification of features and segments that more accurately account for patient complexity and resource use. In this paper, we describe the development of such a grouper using established techniques for dimensionality reduction and cluster analysis. We argue that a data-driven approach minimises bias in feature selection. Using a registry of patients from 23 burn services in England and Wales, we demonstrate a reduction of within cluster cost-variation in the identified groups, when compared to the original casemix.

Summary

Introduction

  • For reimbursement purposes, these groups should be clinically meaningful and share similar resource usage during their hospital stay.
  • In the UK National Health Service (NHS) these groups are known as health resource groups (HRGs), and are predominantly derived based on expert advice and checked for homogeneity afterwards, typically using length of stay (LOS) to assess similarity in resource consumption.
  • Also, with complex patient groups such as those encountered in burn care, expert advice will often reflect average patients only, therefore not capturing the complexity and severity of many patients’ injury profile.
  • The data-driven development of a grouper may support the identification of features and segments that more accurately account for patient complexity and resource use.
  • The authors argue that a data-driven approach minimises bias in feature selection.

1 Motivation

  • The NHS serves a wide population with varied demographic and medical histories, with the aim of providing health interventions to the population who need them.
  • In contrast, prospective payment systems (PPSs) determine the provider's payment rates ex ante without any link to the real costs of the individual provider [2].
  • HRGs are generated using nationally mandated patient-level data, which primarily includes age, complications and comorbidities, diagnosis and procedures.
  • The authors' core hypothesis is that in-depth analysis of the available data should be used in conjunction with expert input to develop an evidence-based model that comprehensively captures the complexity of care provided by such services, and accurately classifies patients into homogeneous groups with respect to costs and patient characteristics.
  • Burn services are to be open regardless of the number of patients admitted, with a minimum number of staff, and they rely on the use of highly specialist equipment and interventions.

2.1 Data

  • This study uses comprehensive anonymized patient-level data that is nationally mandated for all burn units in England and Wales.
  • This includes features such as demographic characteristics (age, gender), burn characteristics (depth, total burn surface area, burn site, locality, type, source, category and injury group), pre-existing conditions (self-harm, alcohol usage, asthma, clotting disorder etc.), time from injury to admission, patient-level cost, LOS and index of multiple deprivation (IMD).
  • To highlight current variation in HRGs and as a benchmark for model performance, the authors use the 2017/18 average patient-level cost by HRG open data released by NHS Improvement.
  • This is limited to one year, as PLICS adoption was only introduced in the 2017/18 data collection cycle.

2.2 Analysis Pipeline

  • Step 1 selects the relevant features and cases.
  • Linear discriminant analysis (LDA), a supervised approach to dimensionality reduction, is adopted.
  • The target feature is then generated using the k-means clustering algorithm (k = 38, the same as the number of HRGs) to partition the two-dimensional target space defined by adjusted LOS and patient-level cost.
  • The current grouper splits the data into young patients (<16 years old) and older patients (>=16 years old).
  • This reflects the burn care pathway, designed to treat pediatrics separately from adults as young age is identified as a significant complicator.

3 Results and Analysis

  • The authors explore the patient-level cost by HRG, as generated by the National Casemix office.
  • The wider the boxplot, the more variable are the costs within that group.
  • Clusters Adult3 and Adult12 have very similar average ages, but Adult3 has more severe burns (higher TBSA) and higher LOS and cost, hence the need for separate groups.
  • Child5 and Child10 have similar adjusted LOS, but Child5 has a higher TBSA and a higher score for the severity of existing disorders, and thus a higher average patient-level cost.
  • These results highlight the effectiveness of the data-driven HAC grouper in generating groups with homogeneous patient characteristics.
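
To make the idea of within-group cost variability concrete, the following is a minimal sketch that compares one broad group with two narrower groups. The coefficient of variation (population standard deviation divided by the mean) is an assumption chosen for illustration, since this summary does not state the exact measure the authors used. All group names and cost values are hypothetical.

```python
from statistics import mean, pstdev


def cost_cv_by_group(costs_by_group):
    """Return {group: population std / mean} of the costs in each group."""
    return {
        group: pstdev(costs) / mean(costs)
        for group, costs in costs_by_group.items()
    }


# Hypothetical costs: one broad group versus the same patients split in two.
original = {"HRG_X": [1000.0, 3000.0, 5000.0, 7000.0]}
regrouped = {
    "Cluster_1": [1000.0, 3000.0],
    "Cluster_2": [5000.0, 7000.0],
}

cv_original = cost_cv_by_group(original)
cv_regrouped = cost_cv_by_group(regrouped)
```

In this toy case each of the narrower groups has a lower coefficient of variation than the broad group, which is the kind of reduction the summary describes.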

The University of Manchester Research
A Clustering-Based Patient Grouper for Burn Care
DOI: 10.1007/978-3-030-33617-2_14
Document Version: Accepted author manuscript
Citation for published version (APA):
Onah, C., Allmendinger, R., Handl, J., Yiapanis, P., & Dunn, K. W. (2019). A Clustering-Based Patient Grouper for Burn Care. In Intelligent Data Engineering and Automated Learning - IDEAL 2019. https://doi.org/10.1007/978-3-030-33617-2_14
Published in: Intelligent Data Engineering and Automated Learning - IDEAL 2019

A Clustering-Based Patient Grouper for Burn Care
Chimdimma Noelyn Onah (1), Richard Allmendinger (1), Julia Handl (1), Paraskevas Yiapanis (2), Ken W. Dunn (3)
(1) University of Manchester
(2) Medical Data Solutions and Services
(3) University Hospital South Manchester
Abstract. Patient casemix is a system of defining groups of patients. For reimbursement purposes, these groups should be clinically meaningful and share similar resource usage during their hospital stay. In the UK National Health Service (NHS) these groups are known as health resource groups (HRGs), and are predominantly derived based on expert advice and checked for homogeneity afterwards, typically using length of stay (LOS) to assess similarity in resource consumption. LOS does not fully capture the actual resource usage of patients, and assurances on the accuracy of HRG as a basis of payment rate derivation are therefore difficult to give. Also, with complex patient groups such as those encountered in burn care, expert advice will often reflect average patients only, therefore not capturing the complexity and severity of many patients’ injury profile. The data-driven development of a grouper may support the identification of features and segments that more accurately account for patient complexity and resource use. In this paper, we describe the development of such a grouper using established techniques for dimensionality reduction and cluster analysis. We argue that a data-driven approach minimises bias in feature selection. Using a registry of patients from 23 burn services in England and Wales, we demonstrate a reduction of within cluster cost-variation in the identified groups, when compared to the original casemix.
Keywords: Patient Casemix, Clustering, Data Driven.
1 Motivation
The NHS serves a wide population with varied demographic and medical histories, with the aim of providing health interventions to the population who need them. The provision and maintenance of these interventions is constrained by scarce resources and cost containment [1]. The pressure from binding budget constraints, and thus the need to control costs, has induced a shift in favor of prospective payments over retrospective payment systems.
The use of a patient-level payment system transfers the entire cost burden to the payer, since the reimbursement is based on the real costs. In the context of such a system, even profit-maximizing providers may be insufficiently motivated to decrease costs. In contrast, prospective payment systems (PPSs) determine the provider's payment rates ex ante without any link to the real costs of the individual provider [2]. This payment system is increasingly being adopted over retrospective systems, as it encourages cost containment and a shared burden with the providers. There is wide adoption of PPS globally, with approximately 70% of all OECD countries and more than 25 low- and middle-income countries having adopted some sort of casemix system for reimbursement purposes [3, 4].
Here, a casemix is a system of defining cohorts of related patients, which comprise cases that are homogenous by resource consumption pattern and, at the same time, clinically similar. In the NHS, the National Casemix Office (NCO) is commissioned to develop and maintain a set of casemix groupings, called health resource groups (HRGs). This is a type of PPS where the payment rate is determined as the average patient cost in each HRG. HRGs are generated using nationally mandated patient-level data, which primarily includes age, complications and comorbidities, diagnosis and procedures. Adopted in acute care, the groups are generated by transcribing expert advice into if-else rules, with the aim of capturing differing patient severity and length of stay (LOS).
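The payment rule described above (the rate for a group is the average patient cost in that group) can be illustrated with a minimal sketch. The group codes and cost values below are hypothetical and are not taken from NHS data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group code, patient-level cost) records, for illustration only.
records = [
    ("HRG_A", 1200.0),
    ("HRG_A", 1800.0),
    ("HRG_B", 5000.0),
    ("HRG_B", 7000.0),
    ("HRG_B", 9000.0),
]


def payment_rates(records):
    """Return {group: mean patient-level cost} as the per-group payment rate."""
    costs_by_group = defaultdict(list)
    for group, cost in records:
        costs_by_group[group].append(cost)
    return {group: mean(costs) for group, costs in costs_by_group.items()}


rates = payment_rates(records)
```

Because every patient in a group attracts the same rate, the wider the spread of true costs within a group, the further the rate can sit from an individual patient's actual cost.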
Any reimbursement methodology based on generalizations across patient groups (i.e. determining payment rate as an average of cost in each HRG) will have weaknesses regarding its ability to work fairly across a variety of settings, and HRGs are no exception to this. The use of LOS as an (imperfect) indicator of resource use contributes further to this weakness, as it is known to be unreliable, particularly for surgical patients [5]. Finally, the identification of relevant factors based on expert advice alone carries the risk of ignoring other unknown (or less well established) factors that may account for the case complexity of certain patient sub-groups.
Our core hypothesis is that in-depth analysis of the available data should be used in conjunction with expert input to develop an evidence-based model that comprehensively captures the complexity of care provided by such services, and accurately classifies patients into homogeneous groups with respect to costs and patient characteristics. This dual approach was previously not possible due to a lack of availability of extensive patient-level cost data, and the resulting primary dependence on expert advice.
Our research aims to provide evidence for this hypothesis. First, we explore the accuracy of current HRGs in terms of actual resource usage. Second, we describe an analytical approach to the development of an alternative, data-driven grouper. Throughout our analysis, we use burn care as a base case. Burn services are selected as an example of a specialized service, which deals with rare and complex conditions and by necessity operates at high expenditure. Burn services are to be open regardless of the number of patients admitted, with a minimum number of staff, and they rely on the use of highly specialist equipment and interventions. We expect that the complex characteristics of this setting make them particularly sensitive to the impact of weaknesses in the current HRG classification.
The remainder of this paper is structured as follows. The next section introduces the data sets used to explore HRGs and generate the data-driven groups. We then introduce the analysis pipeline adopted, which includes data pre-processing, dimensionality reduction and the deployment of clustering approaches in two separate steps. In Section 3, we discuss the results, using visualizations and within cluster variation of costs to identify improvements. The final section includes a conclusion and discussion of future work.
2 Methodology
2.1 Data
This study uses comprehensive anonymized patient-level data that is nationally mandated for all burn units in England and Wales. The data covers a time period from 2003 to 2019 and captures 164 features for just over 100,000 patients. This includes features such as demographic characteristics (age, gender), burn characteristics (depth, total burn surface area, burn site, locality, type, source, category and injury group), pre-existing conditions (self-harm, alcohol usage, asthma, clotting disorder etc.), time from injury to admission, patient-level cost, LOS and index of multiple deprivation (IMD).
To highlight current variation in HRGs and as a benchmark for model performance, we use the 2017/18 average patient-level cost by HRG open data released by NHS Improvement. This is limited to one year, as PLICS adoption was only introduced in the 2017/18 data collection cycle. This data is at the burn service level and so represents the average patient-level cost in each service.
2.2 Analysis Pipeline
Step 1: Selecting relevant features and cases. To ensure the use of quality features that reflect the clinical and cost differences of patients, the features selected for clustering were those identified as statistically significant in predicting patient-level cost and patient outcome. Cost prediction accuracy was improved with the removal of non-survivors, who have lower LOS and cost than survivors with similar burn characteristics. Thus, in line with the current grouper, the following analysis focuses on survival cases only. All cases with missing data were deleted, leaving just over 80,000 cases and 24 features after feature selection. Table 1 lists these features.
Table 1. Selected Features
Feature type (count) | Feature
Demographic (3) | Gender; Age; Index of Multiple Deprivation (IMD)
Burn characteristics (17) | Total burn surface area (TBSA); Presence of inhalation; Site of burn (leg; upper limb (UL); torso and thorax; face, hands, feet and perineum (FHPP); head and hand (HH); face, hands and feet (FHF)); Type of injury (contact, cold, flame, electrical, scald, chemical, friction, flash, radiation)
Comorbidity (2) | Number of existing disorders; Significance of existing disorder
Cost features (2) | Adjusted LOS; Patient-level cost
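The case selection in Step 1 (survival cases only, with any case containing missing data deleted) can be sketched as a simple filter. The record layout and field names below are hypothetical and are not the registry's schema. The statistical feature selection is not shown.

```python
# Hypothetical case records; field names and values are illustrative only.
cases = [
    {"age": 34, "tbsa": 12.0, "cost": 8200.0, "survived": True},
    {"age": 5, "tbsa": None, "cost": 3100.0, "survived": True},
    {"age": 71, "tbsa": 45.0, "cost": 2500.0, "survived": False},
    {"age": 9, "tbsa": 3.5, "cost": 1900.0, "survived": True},
]


def select_cases(cases):
    """Keep survival cases whose record has no missing (None) values."""
    return [
        case
        for case in cases
        if case["survived"] and all(value is not None for value in case.values())
    ]


selected = select_cases(cases)
```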

We implement further dimensionality reduction to reduce noise, data complexity and redundancy. Dimensionality reduction also helps reduce processing time and mitigates the curse of dimensionality [6]. Linear discriminant analysis (LDA), a supervised approach to dimensionality reduction, is adopted. Here, this method is preferred over unsupervised dimension reduction models such as principal component analysis (PCA), as we wish to identify components that maximise cost separation rather than percent of variance alone.
Step 2: Deriving target feature for LDA. We derive a set of target classes for the LDA using a cluster analysis on multiple cost features, to reduce sensitivity to a single cost measure. This is achieved by using two cost features, patient-level cost and adjusted LOS, as the target space. The target feature is then generated using the k-means clustering algorithm (k = 38, the same as the number of HRGs) to partition the two-dimensional target space defined by adjusted LOS and patient-level cost.
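As a minimal sketch of Step 2, the snippet below partitions a two-dimensional cost space with scikit-learn's KMeans using k = 38. The data are synthetic stand-ins rather than registry data. Standardising the two features before clustering is an assumption of this sketch, as the text does not say whether the features were scaled, and the `n_init` and `random_state` values are arbitrary choices made for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two cost features (not registry data).
adjusted_los = rng.gamma(shape=2.0, scale=5.0, size=500)
patient_cost = 800.0 * adjusted_los + rng.normal(0.0, 500.0, size=500)
cost_space = np.column_stack([adjusted_los, patient_cost])

# Assumption: put both features on a comparable scale before clustering.
scaled = StandardScaler().fit_transform(cost_space)

# k = 38 mirrors the number of HRGs stated in the text.
kmeans = KMeans(n_clusters=38, n_init=10, random_state=0)
target_class = kmeans.fit_predict(scaled)  # one class label per patient
```

The resulting `target_class` array plays the role of the target feature that the supervised dimensionality reduction later separates on.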
Step 3: Segmentation by age. The current grouper splits the data into young patients (<16 years old) and older patients (>=16 years old). This reflects the burn care pathway, designed to treat pediatrics separately from adults as young age is identified as a significant complicator. The 2001 National Burn Care Review Report [8] highlights the unpredictable complication of seemingly simple burn injuries, especially for pediatric patients. It argues and mandates the need for separate burn units for children and adults, due to the peculiar needs of children such as play specialists, teachers, family counselors and intensive psychosocial support. In line with the current grouper, we therefore further split the data by age group.
Step 4: Dimensionality reduction using LDA. The comorbidity details, demographic and burn characteristics listed in Table 1 are used as the input features for the LDA. We retain the first two LDA components. Therefore, the output of this analysis is a projection from the original feature space into a two-dimensional manifold spanned by orthogonal components that maximise separation by the target feature constructed in Step 2. This is done on each segment derived in Step 3.
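A minimal sketch of Step 4 for a single segment, using scikit-learn's LinearDiscriminantAnalysis and keeping two components. The feature matrix and target classes are synthetic, and the sizes (300 patients, 6 features, 4 classes) are arbitrary assumptions chosen to keep the example small. The paper uses the features in Table 1 and the classes from Step 2.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_patients, n_features, n_classes = 300, 6, 4

# Synthetic target classes standing in for the Step 2 cluster labels.
target_class = np.arange(n_patients) % n_classes

# Synthetic patient features, shifted per class so the classes are separable.
class_means = rng.normal(0.0, 3.0, size=(n_classes, n_features))
features = class_means[target_class] + rng.normal(
    0.0, 1.0, size=(n_patients, n_features)
)

# Retain the first two discriminant components, as in the text.
lda = LinearDiscriminantAnalysis(n_components=2)
components = lda.fit_transform(features, target_class)
```

Note that `n_components` cannot exceed the smaller of the number of features and the number of classes minus one, so two components require at least three target classes in the segment.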
Step 5: Segmentation into homogeneous patient groups. With these pre-processing and dimensionality reduction steps completed, an unsupervised clustering method is deployed to derive homogenous patient groups. This paper uses an unsupervised clustering method, as we assume that the true classes of patients are unknown. The use of a supervised method, for example using cost labels, may create groups that are homogenous in terms of cost only, and would therefore not meet the clinical relevance criterion. In particular, we deploy an agglomerative hierarchical clustering (HAC) algorithm using the LDA components generated on each age segment (<16 years old and >=16 years old) as input data to generate 13 and 25 patient groups respectively. The group numbers reflect the number of segments generated by the current grouper, to facilitate comparison.
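A minimal sketch of Step 5, using scikit-learn's AgglomerativeClustering with average linkage and the group counts given above (13 for the younger segment, 25 for the older one). The two-dimensional inputs are synthetic stand-ins for the LDA components, and the `Child`/`Adult` label prefixes and their numbering are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two LDA components of each age segment.
child_components = rng.normal(size=(200, 2))
adult_components = rng.normal(size=(400, 2))


def group_segment(components, n_groups, prefix):
    """Cluster one age segment and return a readable group label per patient."""
    model = AgglomerativeClustering(n_clusters=n_groups, linkage="average")
    labels = model.fit_predict(components)
    return [f"{prefix}{label + 1}" for label in labels]


child_groups = group_segment(child_components, 13, "Child")
adult_groups = group_segment(adult_components, 25, "Adult")
```

Clustering each segment separately keeps the pediatric and adult groups disjoint, matching the age split applied before the LDA.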

Citations

Journal ArticleDOI
Abstract: With a reduction in the mortality rate of burn patients, length of stay (LOS) has been increasingly adopted as an outcome measure. Some studies have attempted to identify factors that explain a burn patient’s LOS. However, few have investigated the association between LOS and a patient’s mental and socioeconomic status. There is anecdotal evidence for links between these factors; uncovering these will aid in better addressing the specific physical and emotional needs of burn patients and facilitate the planning of scarce hospital resources. Here, we employ machine learning (clustering) and statistical models (regression) to investigate whether segmentation by socioeconomic/mental status can improve the performance and interpretability of an upstream predictive model, relative to a unitary model. Although we found no significant difference in the unitary model’s performance and the segment-specific models, the interpretation of the segment-specific models reveals a reduced impact of burn severity in LOS prediction with increasing adverse socioeconomic and mental status. Furthermore, the socioeconomic segments’ models highlight an increased influence of living circumstances and source of injury on LOS. These findings suggest that in addition to ensuring that patients’ physical needs are met, management of their mental status is crucial for delivering an effective care plan.

1 citations


Posted Content
Abstract: The adoption of the Prospective Payment System (PPS) in the UK National Health Service (NHS) has led to the creation of patient groups called Health Resource Groups (HRG). HRGs aim to identify groups of clinically similar patients that share similar resource usage for reimbursement purposes. These groups are predominantly identified based on expert advice, with homogeneity checked using the length of stay (LOS). However, for complex patients such as those encountered in burn care, LOS is not a perfect proxy of resource usage, leading to incomplete homogeneity checks. To improve homogeneity in resource usage and severity, we propose a data-driven model and the inclusion of patient-level costing. We investigate whether a data-driven approach that considers additional measures of resource usage can lead to a more comprehensive model. In particular, a cost-sensitive decision tree model is adopted to identify features of importance and rules that allow for a focused segmentation on resource usage (LOS and patient-level cost) and clinical similarity (severity of burn). The proposed approach identified groups with increased homogeneity compared to the current HRG groups, allowing for a more equitable reimbursement of hospital care costs if adopted.

Proceedings ArticleDOI
16 Nov 2020
TL;DR: An ad-hoc genetic algorithm is proposed which combines filter feature selection and clustering strategies to determine if there is a set of features related to the case-mix that allow to reach the same categorisation proposed by the MINSAL.
Abstract: The healthcare services must provide quality health safeguarding the efficient use of the resources. To evaluate technical efficiency performing fairly comparisons it is necessary to group the hospitals according to the type of patient treated: case-mix. Generally, this evaluation is performed by using the Related Groups for Diagnosis (DRG) system. Since only a few hospitals have implemented this system in Chile, the analysis of technical efficiency results limited. The Ministry of Health of Chile (MINSAL) has proposed an administrative categorisation for the public hospitals: high, medium and low complexity. However, it has not been studied if this definition is associated to the case-mix and if it can be used to study technical efficiency. In this work, we propose an ad-hoc genetic algorithm which combines filter feature selection and clustering strategies to determine if there is a set of features related to the case-mix that allow to reach the same categorisation proposed by the MINSAL. The results show that, although a small set of features is able to reach this categorisation by year, there is not enough evidence to establish a relationship with the case-mix. It is recommended that future technical efficiency analyses use new categorisations based on case-mix instead of the MINSAL categorisation.

References

Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

33,540 citations


"A Clustering-Based Patient Grouper ..." refers methods in this paper

  • ...Average link HAC clustering and all preprocessing steps were implemented using the relevant scikit-learn module on python, with other parameters left as default [9]....

    [...]


Journal ArticleDOI
TL;DR: A typology to classify provider payment systems from an incentive point of view is developed and provides a useful framework for future research of health care payment systems.
Abstract: A typology to classify provider payment systems from an incentive point of view is developed. We analyse the way, how these systems can influence provider behaviour and, a fortiori, contribute to attain the general objectives of health care, i.e. quality of care, efficiency and accessibility. The first dimension of the typology indicates whether there is a link between the provider's income and his activity. In variable systems, the provider has an ability to influence his earnings, contrary to fixed systems. The second dimension indicates whether the provider's payments are related to his actual costs or not. In retrospective systems, the provider's own costs are the basis for reimbursement ex post whereas in prospective systems payments are determined ex ante without any link to the real costs of the individual provider. These different characteristics are likely to influence provider behaviour in different ways. Furthermore the most frequently used criteria to determine the provider's income are discussed: per service, per diem, per case, per patient and per period. Also a distinction is made between incentives at the level of the individual provider (micro-level) and the sponsor (macro-level). Finally, the potential interactions when several payment systems are used simultaneously are discussed. This typology is useful to classify and compare different types of payment systems as prevailing in different countries, and provides a useful framework for future research of health care payment systems.

160 citations


Journal ArticleDOI
TL;DR: The findings suggest that the greater portion of health-care financing should be public rather than private and countries that import an existing variant of a DRG-based system should be mindful of the need for adaptation.
Abstract: Objective: This paper provides a comprehensive overview of hospital payment systems based on diagnosis-related groups (DRGs) in low- and middle-income countries. It also explores design and implementation issues and the related challenges countries face. Methods: A literature research for papers on DRG-based payment systems in low- and middle-income countries was conducted in English, French and Spanish through Pubmed, the Pan American Health Organization's Regional Library of Medicine and Google. Findings: Twelve low- and middle-income countries have DRG-based payment systems and another 17 are in the piloting or exploratory stage. Countries have chosen from a wide range of imported and self-developed DRG models and most have adapted such models to their specific contexts. All countries have set expenditure ceilings. In general, systems were piloted before being implemented. The need to meet certain requirements in terms of coding standardization, data availability and information technology made implementation difficult. Private sector providers have not been fully integrated, but most countries have managed to delink hospital financing from public finance budgeting. Conclusion: Although more evidence on the impact of DRG-based payment systems is needed, our findings suggest that (i) the greater portion of health-care financing should be public rather than private; (ii) it is advisable to pilot systems first and to establish expenditure ceilings; (iii) countries that import an existing variant of a DRG-based system should be mindful of the need for adaptation; and (iv) countries should promote the cooperation of providers for appropriate data generation and claims management.

124 citations


Journal ArticleDOI
TL;DR: An analytical strategy is set out to examine variations in resource use, whether cost or length of stay, of patients hospitalised with different conditions and to assess relative hospital performance in managing resources and the characteristics of hospitals that explain this performance.
Abstract: We set out an analytical strategy to examine variations in resource use, whether cost or length of stay, of patients hospitalised with different conditions. The methods are designed to evaluate (i) how well diagnosis-related groups (DRGs) capture variation in resource use relative to other patient characteristics and (ii) what influence the hospital has on their resource use. In a first step, we examine the influence of variables that describe each individual patient, including the DRG to which the patients are assigned and a range of personal and treatment-related characteristics. In a second step, we explore the influence that hospitals have on the average cost or length of stay of their patients, purged of the influence of the variables accounted for in the first stage. We provide a rationale for the variables used in both stages of the analysis and detail how each is defined. The analytical strategy allows us (i) to identify those factors that explain variation in resource use across patients, (ii) to assess the explanatory power of DRGs relative to other patient and treatment characteristics and (iii) to assess relative hospital performance in managing resources and the characteristics of hospitals that explain this performance.

61 citations


Frequently Asked Questions (2)
Q1. What have the authors stated for future work in "A clustering-based patient grouper for burn care"?

The collection of patient-level cost, at a national scale, has created the possibility of generating improved data-driven groups. Future work will be aimed at exploring changes to their analytical model, including the consideration of different approaches to dimensionality reduction and cluster analysis, as well as the inclusion of expert opinion in feature selection and group validation. The authors have been able to highlight that improvements can be made in identifying patient case mix suitable for payment rate derivation. There could be further reduction in within-cluster variance with the use of state-of-the-art clustering algorithms that simultaneously consider Steps 2, 3 and 4 of their analysis.

In this paper, the authors describe the development of such a grouper using established techniques for dimensionality reduction and cluster analysis. Using a registry of patients from 23 burn services in England and Wales, the authors demonstrate a reduction of within cluster cost-variation in the identified groups, when compared to the original casemix.