Book Chapter

A Clustering-Based Patient Grouper for Burn Care

14 Nov 2019-pp 123-131
TL;DR: The authors argue that a data-driven approach minimises bias in feature selection for patient groups, and demonstrate a reduction in within-cluster cost variation in the identified groups compared with the original casemix.
Abstract: Patient casemix is a system of defining groups of patients. For reimbursement purposes, these groups should be clinically meaningful and share similar resource usage during their hospital stay. In the UK National Health Service (NHS) these groups are known as health resource groups (HRGs), and are predominantly derived based on expert advice and checked for homogeneity afterwards, typically using length of stay (LOS) to assess similarity in resource consumption. LOS does not fully capture the actual resource usage of patients, and assurances on the accuracy of HRG as a basis of payment rate derivation are therefore difficult to give. Also, with complex patient groups such as those encountered in burn care, expert advice will often reflect average patients only, therefore not capturing the complexity and severity of many patients’ injury profile. The data-driven development of a grouper may support the identification of features and segments that more accurately account for patient complexity and resource use. In this paper, we describe the development of such a grouper using established techniques for dimensionality reduction and cluster analysis. We argue that a data-driven approach minimises bias in feature selection. Using a registry of patients from 23 burn services in England and Wales, we demonstrate a reduction of within cluster cost-variation in the identified groups, when compared to the original casemix.

Summary

Introduction

  • Patient casemix is a system of defining groups of patients; for reimbursement purposes, these groups should be clinically meaningful and share similar resource usage during their hospital stay.
  • In the UK National Health Service (NHS) these groups are known as health resource groups (HRGs), and are predominantly derived based on expert advice and checked for homogeneity afterwards, typically using length of stay (LOS) to assess similarity in resource consumption.
  • Also, with complex patient groups such as those encountered in burn care, expert advice will often reflect average patients only, therefore not capturing the complexity and severity of many patients’ injury profile.
  • The data-driven development of a grouper may support the identification of features and segments that more accurately account for patient complexity and resource use.
  • The authors argue that a data-driven approach minimises bias in feature selection.

1 Motivation

  • The NHS serves a wide population with varied demographic and medical histories, with the aim of providing health interventions to the population who need them.
  • In contrast, prospective payment systems (PPSs) determine the provider's payment rates ex ante without any link to the real costs of the individual provider [2].
  • HRGs are generated using nationally mandated patient-level data, which primarily includes age, complications and comorbidities, diagnosis and procedures.
  • The authors' core hypothesis is that in-depth analysis of the available data should be used in conjunction with expert input to develop an evidence-based model that comprehensively captures the complexity of care provided by such services, and accurately classifies patients into homogeneous groups with respect to costs and patient characteristics.
  • Burn services must remain open regardless of the number of patients admitted, with a minimum number of staff, and they rely on highly specialist equipment and interventions.

2.1 Data

  • This study uses comprehensive anonymized patient-level data that is nationally mandated for all burn units in England and Wales.
  • This includes features such as demographic characteristics (age, gender), burn characteristics (depth, total burn surface area, burn site, locality, type, source, category and injury group), pre-existing conditions (self-harm, alcohol usage, asthma, clotting disorder etc.), time from injury to admission, patient-level cost, LOS and index of multiple deprivation (IMD).
  • To highlight current variation in HRGs and as a benchmark for model performance, the authors use the 2017/18 average patient-level cost by HRG open data released by NHS Improvement.
  • This is limited to one year, as PLICS (patient-level information and costing systems) adoption was introduced only in the 2017/18 data collection cycle.

2.2 Analysis Pipeline

  • Step 1: selecting relevant features and cases.
  • Linear discriminant analysis (LDA), a supervised approach to dimensionality reduction, is adopted.
  • The target feature is then generated using k-means clustering algorithm (k = 38, same as number of HRGs) to partition the two-dimensional target space defined by adjusted LOS and patient-level cost.
  • The current grouper splits the data into young patients (<16 years old) and older patients (>=16 years old).
  • This reflects the burn care pathway, designed to treat pediatrics separately from adults as young age is identified as a significant complicator.

3 Results and Analysis

  • The authors explore the patient-level cost by HRG, as generated by the National Casemix Office.
  • The wider the boxplot, the more variable are the costs within that group.
  • Clusters Adult3 and Adult12 have very similar average ages, but Adult3 has more severe burns (higher TBSA), a higher LOS and a higher cost, which justifies keeping them as separate groups.
  • Although Child5 and Child10 have similar adjusted LOS, Child5 has a higher TBSA and a higher score for the severity of existing disorders, and thus a higher average patient-level cost.
  • These results highlight the effectiveness of the data-driven HAC grouper in generating groups with homogeneous patient characteristics.
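The reported improvement is a reduction in within-cluster cost variation. The summary does not give the exact statistic, so the sketch below assumes one plausible choice: the size-weighted within-group variance of patient-level cost (the within-group sum of squared deviations divided by the number of patients). The function name and the toy costs are hypothetical.

```python
import numpy as np


def within_cluster_cost_variance(costs, labels):
    """Size-weighted within-group variance of cost (an assumed metric, not the paper's).

    For each group, sum the squared deviations of its costs from the group mean,
    then divide the grand total by the number of patients.
    """
    costs = np.asarray(costs, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for group in np.unique(labels):
        group_costs = costs[labels == group]
        total += float(((group_costs - group_costs.mean()) ** 2).sum())
    return total / costs.size


# Toy example: four patients, two cheap and two expensive.
costs = [100, 110, 900, 1000]
cost_aligned = ["G1", "G1", "G2", "G2"]  # groups that track resource use
cost_mixed = ["H1", "H2", "H1", "H2"]    # groups that mix cheap and expensive cases

print(within_cluster_cost_variance(costs, cost_aligned))  # 1262.5
print(within_cluster_cost_variance(costs, cost_mixed))    # 179012.5
```

A lower value means patients in the same group cost more alike, which is what a reimbursement group built on average cost needs.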



DOI: 10.1007/978-3-030-33617-2_14
Document version: Accepted author manuscript
Citation for published version (APA):
Onah, C., Allmendinger, R., Handl, J., Yiapanis, P., & Dunn, K. W. (2019). A Clustering-Based Patient Grouper for Burn Care. In Intelligent Data Engineering and Automated Learning - IDEAL 2019. https://doi.org/10.1007/978-3-030-33617-2_14
Published in: Intelligent Data Engineering and Automated Learning - IDEAL 2019

Chimdimma Noelyn Onah¹, Richard Allmendinger¹, Julia Handl¹, Paraskevas Yiapanis², Ken W. Dunn³
¹ University of Manchester
² Medical Data Solutions and Services
³ University Hospital South Manchester
Keywords: Patient Casemix, Clustering, Data Driven.
1 Motivation
The NHS serves a wide population with varied demographic and medical histories, with the aim of providing health interventions to those who need them. The provision and maintenance of these interventions is constrained by scarce resources and cost containment [1]. The pressure from binding budget constraints, and thus the need to control costs, has induced a shift in favor of prospective payment systems over retrospective ones.
Under a retrospective, patient-level payment system, the entire cost burden falls on the payer, since reimbursement is based on the real costs incurred. In the context of such a system, even profit-maximizing providers may be insufficiently motivated to decrease costs. In contrast, prospective payment systems (PPSs) determine the provider's payment rates ex ante, without any link to the real costs of the individual provider [2]. This payment system is increasingly being adopted over retrospective systems, as it encourages cost containment and shares the cost burden with providers. PPSs are widely adopted globally: approximately 70% of all OECD countries and more than 25 low- and middle-income countries have adopted some form of casemix system for reimbursement purposes [3, 4].
Here, a casemix is a system of defining cohorts of related patients, comprising cases that are homogeneous in resource consumption pattern and, at the same time, clinically similar. In the NHS, the National Casemix Office (NCO) is commissioned to develop and maintain a set of casemix groupings called HRGs (health resource groups). This is a type of PPS in which the payment rate is determined as the average patient cost in each HRG. HRGs are generated using nationally mandated patient-level data, which primarily includes age, complications and comorbidities, diagnoses and procedures. Adopted in acute care, the groups are generated by transcribing expert advice into if-else rules, with the aim of capturing differing patient severity and length of stay (LOS).
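To make the two mechanisms concrete, the sketch below encodes a toy expert grouper as if-else rules and sets each group's payment rate to the average patient cost in that group. Only the under-16 age split comes from this paper; the TBSA thresholds, the inhalation rule and the group names are hypothetical and are not real HRG definitions.

```python
from collections import defaultdict
from statistics import mean


def assign_group(patient):
    """Toy expert-style grouper written as if-else rules (hypothetical thresholds)."""
    if patient["age"] < 16:
        return "CHILD_MAJOR" if patient["tbsa"] >= 10 else "CHILD_MINOR"
    if patient["tbsa"] >= 20 or patient["inhalation"]:
        return "ADULT_MAJOR"
    return "ADULT_MINOR"


def payment_rates(patients):
    """Prospective rate for each group = mean patient-level cost of its members."""
    costs_by_group = defaultdict(list)
    for patient in patients:
        costs_by_group[assign_group(patient)].append(patient["cost"])
    return {group: mean(costs) for group, costs in costs_by_group.items()}


patients = [
    {"age": 8, "tbsa": 2, "inhalation": False, "cost": 1200},
    {"age": 10, "tbsa": 15, "inhalation": False, "cost": 9000},
    {"age": 45, "tbsa": 5, "inhalation": False, "cost": 2000},
    {"age": 60, "tbsa": 5, "inhalation": True, "cost": 15000},
    {"age": 30, "tbsa": 30, "inhalation": False, "cost": 25000},
]
print(payment_rates(patients))
```

Here both ADULT_MAJOR patients would be paid 20,000, although one cost 15,000 and the other 25,000; this within-group spread is the weakness the paper sets out to reduce.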
Any reimbursement methodology based on generalizations across patient groups (i.e. determining the payment rate as the average cost in each HRG) will have weaknesses in its ability to work fairly across a variety of settings, and HRGs are no exception. The use of LOS as an (imperfect) indicator of resource use contributes further to this weakness, as LOS is known to be unreliable, particularly for surgical patients [5]. Finally, identifying relevant factors based on expert advice alone carries the risk of ignoring other unknown (or less well established) factors that may account for the case complexity of certain patient sub-groups.
Our core hypothesis is that in-depth analysis of the available data should be used in conjunction with expert input to develop an evidence-based model that comprehensively captures the complexity of care provided by such services, and accurately classifies patients into homogeneous groups with respect to costs and patient characteristics. This dual approach was previously not possible due to the lack of extensive patient-level cost data, and the resulting primary dependence on expert advice.
Our research aims to provide evidence for this hypothesis. First, we explore the accuracy of current HRGs in terms of actual resource usage. Second, we describe an analytical approach to the development of an alternative, data-driven grouper.
Throughout our analysis, we use burn care as a base case. Burn services were selected as an example of a specialized service that deals with rare and complex conditions and, by necessity, operates at high expenditure. Burn services must remain open regardless of the number of patients admitted, with a minimum number of staff, and they rely on highly specialist equipment and interventions. We expect the complex characteristics of this setting to make it particularly sensitive to weaknesses in the current HRG classification.
The remainder of this paper is structured as follows. The next section introduces the data sets used to explore HRGs and generate the data-driven groups. We then introduce the analysis pipeline adopted, which includes data pre-processing, dimensionality reduction and the deployment of clustering approaches in two separate steps. In Section 3, we discuss the results, using visualizations and the within-cluster variation of costs to identify improvements. The final section includes a conclusion and a discussion of future work.
2 Methodology
2.1 Data
This study uses comprehensive anonymized patient-level data that is nationally mandated for all burn units in England and Wales. The data covers a time period from 2003 to 2019 and captures 164 features for just over 100,000 patients. This includes features such as demographic characteristics (age, gender), burn characteristics (depth, total burn surface area, burn site, locality, type, source, category and injury group), pre-existing conditions (self-harm, alcohol usage, asthma, clotting disorder, etc.), time from injury to admission, patient-level cost, LOS and index of multiple deprivation (IMD).
To highlight current variation in HRGs, and as a benchmark for model performance, we use the 2017/18 average patient-level cost by HRG open data released by NHS Improvement. This is limited to one year, as PLICS adoption was introduced only in the 2017/18 data collection cycle. These data are at the burn-service level and so represent the average patient-level cost in each service.
2.2 Analysis Pipeline
Step 1: Selecting relevant features and cases. To ensure the use of quality features that reflect the clinical and cost differences between patients, the features selected for clustering were those identified as statistically significant in predicting patient-level cost and patient outcome. Cost prediction accuracy improved when non-survivors were removed, since non-survivors have a shorter LOS and lower cost than survivors with similar burn characteristics. Thus, in line with the current grouper, the following analysis focuses on survivors only. All cases with missing data were deleted, leaving just over 80,000 cases and 24 features after feature selection. Table 1 lists these features.
Table 1. Selected Features

| Feature type (count) | Features |
|---|---|
| Demographic (3) | Gender; age; Index of Multiple Deprivation (IMD) |
| Burn characteristics (17) | Total burn surface area (TBSA); presence of inhalation; site of burn (leg; upper limb (UL); torso and thorax; face, hands, feet and perineum (FHPP); head and hand (HH); face, hands and feet (FHF)); type of injury (contact, cold, flame, electrical, scald, chemical, friction, flash, radiation) |
| Comorbidity (2) | Number of existing disorders; significance of existing disorders |
| Cost features (2) | Adjusted LOS; patient-level cost |
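The case selection in Step 1 (survivors only, complete cases) can be sketched with pandas as below. The column names, including the `survived` flag, are hypothetical stand-ins for the registry's fields, and the statistical-significance screen used to choose the 24 features is not reproduced; the sketch assumes the feature list is already fixed.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the registry's real field names are not given in the paper.
SELECTED_FEATURES = ["age", "gender", "imd", "tbsa", "adjusted_los", "patient_cost"]


def select_cases(registry: pd.DataFrame, features, outcome_col="survived"):
    """Keep survivors only, then drop any case with a missing selected feature."""
    survivors = registry[registry[outcome_col].astype(bool)]
    complete = survivors.dropna(subset=features)
    return complete[features].reset_index(drop=True)


registry = pd.DataFrame({
    "age": [5, 40, 70, 33],
    "gender": ["F", "M", "M", "F"],
    "imd": [3, np.nan, 7, 2],
    "tbsa": [4.0, 12.0, 30.0, 1.5],
    "adjusted_los": [3, 10, 2, 1],
    "patient_cost": [2500.0, 9800.0, 4000.0, 900.0],
    "survived": [True, True, False, True],
})
cohort = select_cases(registry, SELECTED_FEATURES)
print(cohort)
```

In the toy data the 70-year-old non-survivor (severe burn, yet short LOS and modest cost) is excluded, and the survivor with a missing IMD value is dropped as an incomplete case.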

We implement further dimensionality reduction to minimise noise and data complexity and to reduce redundancy. Dimensionality reduction also helps reduce processing time and mitigates the curse of dimensionality [6]. Linear discriminant analysis (LDA), a supervised approach to dimensionality reduction, is adopted. Here, this method is preferred over unsupervised dimensionality reduction methods such as principal component analysis (PCA), as we wish to identify components that maximise cost separation rather than the percentage of variance explained alone.
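The reason for preferring LDA over PCA can be illustrated on synthetic data (not the burn registry): one feature has large variance but no relation to the cost class, while another has small variance but shifts with the class. PCA's first component follows the high-variance feature, whereas LDA's follows the class-separating one. The `separation` score (distance between class means divided by the pooled within-class standard deviation) is an illustrative choice, not a measure used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 500
y = np.repeat([0, 1], n)  # two synthetic cost classes
X = np.column_stack([
    rng.normal(0.0, 10.0, 2 * n),                    # high variance, no class signal
    rng.normal(np.where(y == 0, -1.0, 1.0), 0.5),    # low variance, shifts with class
])


def separation(z, labels):
    """Distance between the two class means over the pooled within-class std."""
    a, b = z[labels == 0], z[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


z_pca = PCA(n_components=1).fit_transform(X).ravel()
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()
print(f"PCA separation: {separation(z_pca, y):.2f}")
print(f"LDA separation: {separation(z_lda, y):.2f}")
```

On this data the PCA projection should give a separation close to zero and the LDA projection a separation of roughly 4, since the informative feature's class means are 2 units apart with a within-class standard deviation of 0.5.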
Step 2: Deriving target feature for LDA. We derive a set of target classes for the LDA using a cluster analysis on multiple cost features, to reduce sensitivity to a single cost measure. Specifically, the two cost features, patient-level cost and adjusted LOS, define the target space. The target feature is then generated by partitioning this two-dimensional target space with the k-means clustering algorithm (k = 38, the same as the number of HRGs).
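Step 2 can be sketched with scikit-learn's `KMeans`: the two cost features form a two-dimensional target space, and each patient's k-means cluster index becomes the class label that supervises the LDA. The standardisation step, the `n_init` and `random_state` settings and the synthetic LOS/cost data are assumptions; the paper states only that k = 38.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def derive_cost_target(adjusted_los, patient_cost, k=38, seed=0):
    """Label each patient by its k-means cluster in (adjusted LOS, cost) space."""
    target_space = np.column_stack([adjusted_los, patient_cost])
    # Assumption: put both axes on a common scale so that cost (in pounds)
    # does not dominate LOS (in days) in the Euclidean distance k-means uses.
    scaled = StandardScaler().fit_transform(target_space)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return kmeans.fit_predict(scaled)


# Synthetic stand-in data: cost roughly proportional to LOS, with noise.
rng = np.random.default_rng(1)
los = rng.gamma(shape=2.0, scale=5.0, size=1000)
cost = los * rng.lognormal(mean=6.5, sigma=0.3, size=1000)
target = derive_cost_target(los, cost)
print(np.bincount(target))  # patients per target class
```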
Step 3: Segmentation by age. The current grouper splits the data into young patients (<16 years old) and older patients (>=16 years old). This reflects the burn care pathway, which is designed to treat pediatric patients separately from adults, as young age is identified as a significant complicator. The 2001 National Burn Care Review Report [8] highlights the unpredictable complications of seemingly simple burn injuries, especially for pediatric patients. It argues for, and mandates, separate burn units for children and adults, owing to the particular needs of children, such as play specialists, teachers, family counselors and intensive psychosocial support. In line with the current grouper, we therefore further split the data by age group.
Step 4: Dimensionality reduction using LDA. The comorbidity details, demographic and burn characteristics listed in Table 1 are used as the input features for the LDA. We retain the first two LDA components. The output of this analysis is therefore a projection from the original feature space onto a two-dimensional linear subspace spanned by the discriminant components that maximise separation between the target classes constructed in Step 2. This is done separately for each segment derived in Step 3.
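Steps 3 and 4 together amount to splitting at age 16 and fitting one LDA per segment, keeping two components. In the sketch below the input matrix and the class labels are synthetic stand-ins (in the paper the labels come from the Step 2 k-means target and the inputs are the non-cost features of Table 1), and the function name is hypothetical. Keeping two components requires at least three target classes within each segment.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def lda_by_age_segment(X, age, target, n_components=2):
    """Fit a separate LDA on each age segment and return its 2-D projection."""
    segments = {"child": age < 16, "adult": age >= 16}
    projections = {}
    for name, mask in segments.items():
        lda = LinearDiscriminantAnalysis(n_components=n_components)
        projections[name] = lda.fit_transform(X[mask], target[mask])
    return projections


rng = np.random.default_rng(3)
n_patients = 600
age = rng.integers(0, 90, size=n_patients)
target = rng.integers(0, 6, size=n_patients)  # stand-in for Step 2 labels
X = rng.normal(size=(n_patients, 6))
X[:, 0] += target                              # give two features some class signal
X[:, 1] -= 0.5 * target
projections = lda_by_age_segment(X, age, target)
print({name: p.shape for name, p in projections.items()})
```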
Step 5: Segmentation into homogeneous patient groups. With these pre-processing and dimensionality reduction steps completed, an unsupervised clustering method is deployed to derive homogeneous patient groups. We use an unsupervised method because we assume that the true classes of patients are unknown. A supervised method, for example one trained on cost labels, may create groups that are homogeneous in terms of cost only, and would therefore not meet the clinical relevance criterion. In particular, we deploy an agglomerative hierarchical clustering (HAC) algorithm, using the LDA components generated for each age segment (<16 years old and >=16 years old) as input data, to generate 13 and 25 patient groups respectively. These group numbers reflect the number of segments generated by the current grouper, to facilitate comparison.
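Step 5 can be sketched with scikit-learn's `AgglomerativeClustering`, producing 13 groups for the under-16 segment and 25 for the adult segment. The paper does not state the linkage criterion here, so Ward linkage is an assumption, and random points stand in for the LDA projections from Step 4. The `Child`/`Adult` label format mirrors the group names used in the results (e.g. Adult3, Child5).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def hac_groups(components, n_groups, linkage="ward"):
    """Bottom-up (agglomerative) clustering of 2-D LDA components into n_groups."""
    model = AgglomerativeClustering(n_clusters=n_groups, linkage=linkage)
    return model.fit_predict(components)


rng = np.random.default_rng(2)
child_components = rng.normal(size=(300, 2))  # stand-in for the <16 LDA projection
adult_components = rng.normal(size=(800, 2))  # stand-in for the >=16 LDA projection

child_labels = [f"Child{g + 1}" for g in hac_groups(child_components, 13)]
adult_labels = [f"Adult{g + 1}" for g in hac_groups(adult_components, 25)]
print(len(set(child_labels)), len(set(adult_labels)))  # 13 25
```

Ward linkage merges, at each step, the pair of clusters whose union least increases the total within-cluster variance, which fits the goal of homogeneous groups; other linkages would be equally valid drop-in choices for the `linkage` argument.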

References
TL;DR: Examination of the value of adding functioning information into casemix systems with respect to the prediction of resource use as measured by costs and length of stay suggests that, in particular, DRG casemix systems can be improved in predicting resource use and capturing outcomes for frail elderly or severely functioning-impaired patients.
Abstract: Contemporary casemix systems for health services need to ensure that payment rates adequately account for actual resource consumption based on patients’ needs for services. It has been argued that functioning information, as one important determinant of health service provision and resource use, should be taken into account when developing casemix systems. However, there has to date been little systematic collation of the evidence on the extent to which the addition of functioning information into existing casemix systems adds value to those systems with regard to the predictive power and resource variation explained by the groupings of these systems. Thus, the objective of this research was to examine the value of adding functioning information into casemix systems with respect to the prediction of resource use as measured by costs and length of stay. A systematic literature review was performed. Peer-reviewed studies, published before May 2014 were retrieved from CINAHL, EconLit, Embase, JSTOR, PubMed and Sociological Abstracts using keywords related to functioning (‘Functioning’, ‘Functional status’, ‘Function*, ‘ICF’, ‘International Classification of Functioning, Disability and Health’, ‘Activities of Daily Living’ or ‘ADL’) and casemix systems (‘Casemix’, ‘case mix’, ‘Diagnosis Related Groups’, ‘Function Related Groups’, ‘Resource Utilization Groups’ or ‘AN-SNAP’). In addition, a hand search of reference lists of included articles was conducted. Information about study aims, design, country, setting, methods, outcome variables, study results, and information regarding the authors’ discussion of results, study limitations and implications was extracted. Ten included studies provided evidence demonstrating that adding functioning information into casemix systems improves predictive ability and fosters homogeneity in casemix groups with regard to costs and length of stay. Collection and integration of functioning information varied across studies. 
Results suggest that, in particular, DRG casemix systems can be improved in predicting resource use and capturing outcomes for frail elderly or severely functioning-impaired patients. Further exploration of the value of adding functioning information into casemix systems is one promising approach to improve casemix systems ability to adequately capture the differences in patient’s needs for services and to better predict resource use.


TL;DR: This paper proposes a model for extracting multidimensional data clustering of health database and implemented four dimension reduction techniques such as Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Self Organizing Map (SOM) and FastICA.
Abstract: The current data tends to be more complex than conventional data and need dimension reduction. Dimension reduction is important in cluster analysis and creates a smaller data in volume and has the same analytical results as the original representation. A clustering process needs data reduction to obtain an efficient processing time while clustering and mitigate curse of dimensionality. This paper proposes a model for extracting multidimensional data clustering of health database. We implemented four dimension reduction techniques such as Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Self Organizing Map (SOM) and FastICA. The results show that dimension reductions significantly reduce dimension and shorten processing time and also increased performance of cluster in several health datasets.



Frequently Asked Questions (2)
Q1. What have the authors stated for future works in "A clustering-based patient grouper for burn care" ?

The collection of patient-level cost data at a national scale has created the possibility of generating improved data-driven groups. Future work will explore changes to the analytical model, including different approaches to dimensionality reduction and cluster analysis, as well as the inclusion of expert opinion in feature selection and group validation. The authors highlight that improvements can be made in identifying a patient casemix suitable for payment rate derivation. Within-cluster variance could be reduced further by using state-of-the-art clustering algorithms that consider Steps 2, 3 and 4 of the analysis simultaneously.

In this paper, the authors describe the development of such a grouper using established techniques for dimensionality reduction and cluster analysis. Using a registry of patients from 23 burn services in England and Wales, the authors demonstrate a reduction of within cluster cost-variation in the identified groups, when compared to the original casemix.