Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis

Lorin Crawford (1-3), Anthea Monod (4), Andrew X. Chen (5), Sayan Mukherjee (6-9), and Raúl Rabadán (5)
1 Department of Biostatistics, Brown University, Providence, RI, USA
2 Center for Statistical Sciences, Brown University, Providence, RI, USA
3 Center for Computational Molecular Biology, Brown University, Providence, RI, USA
4 Department of Applied Mathematics, Tel Aviv University, Tel Aviv, Israel
5 Department of Systems Biology, Columbia University, New York, NY, USA
6 Department of Statistical Science, Duke University, Durham, NC, USA
7 Department of Computer Science, Duke University, Durham, NC, USA
8 Department of Mathematics, Duke University, Durham, NC, USA
9 Department of Bioinformatics & Biostatistics, Duke University, Durham, NC, USA
Corresponding e-mails: lorin_crawford@brown.edu; antheam@tauex.tau.ac.il
Abstract
Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study
in the field of cancer biology. Its rapid progression and the relative time cost of obtaining molecular
data make other readily available forms of data, such as images, an important resource for actionable
measures in patients. Our goal is to utilize information given by medical images taken from GBM patients
in statistical settings. To do this, we design a novel statistic—the smooth Euler characteristic transform
(SECT)—that quantifies magnetic resonance images (MRIs) of tumors. Due to its well-defined inner
product structure, the SECT can be used in a wider range of functional and nonparametric modeling
approaches than other previously proposed topological summary statistics. When applied to a cohort of
GBM patients, we find that the SECT is a better predictor of clinical outcomes than both existing tumor
shape quantifications and common molecular assays. Specifically, we demonstrate that SECT features
alone explain more of the variance in GBM patient survival than gene expression, volumetric features,
and morphometric features. The main takeaways from our findings are thus twofold. First, they suggest
that images contain valuable information that can play an important role in clinical prognosis and other
medical decisions. Second, they show that the SECT is a viable tool for the broader study of medical
imaging informatics.
1 Introduction
The field of radiomics is focused on the extraction of quantitative features from medical magnetic
resonance images (MRIs), typically constructed by tomography and digitally stored as shapes or surfaces.
Quantifying geometric features from shapes in a way that is amenable to computational analyses has
been a long-standing and fundamental challenge in both statistics and radiomics. Overcoming such a
challenge would provide significant breakthroughs in broader scientific disciplines with the potential for
real, practical impact. One particularly important application, where a viable quantification of shapes is
needed, is the study of glioblastoma multiforme (GBM), a glioma that materializes into aggressive, cancerous
tumor growths within the human brain. GBM is a disease that is currently under active research in
oncology; it is marked by characteristics that are not common in other cancers, such as spatial diffusivity
and molecular heterogeneity. In human patients, it is a rapidly progressing disease with a post-diagnosis
survival period of 12-15 months and, currently, there are only limited therapies available [1]. Obtaining
molecular information of GBM tumors entails an invasive medical procedure on the patient that is costly
in terms of both time and resources. In comparison, magnetic resonance images (MRIs) of these tumors
are easily accessible and often readily available. Being able to effectively utilize MRIs of GBM tumors
in computational settings increases the potential for well-developed statistical methodology to have a
significant impact in cancer research and future treatment strategies.
There are two key aims of our work in this paper: first, to quantify GBM tumor images to integrate
medical imaging information into statistical models; and second, to explore the utility of medical imaging
information in clinical studies of GBM. To achieve the first aim, we develop a novel statistic, the smooth
Euler characteristic transform (SECT), that summarizes shape information of GBM MRIs as a collection
of smooth curves. This allows the direct implementation of existing statistical models from functional data
analysis (FDA); in particular, it allows tumor shape information to be used as a covariate in regression
frameworks. To achieve the second aim, we study a cohort of individuals with publicly available MRIs
from The Cancer Imaging Archive (TCIA) [2,3], as well as matched genomic and clinical data collected
by The Cancer Genome Atlas (TCGA) [4]. Through our extensive predictive analysis, we demonstrate a
clinically relevant connection between the shape of brain malignancies and the variation of survival-based
outcomes that are driven by molecular heterogeneity.
The remainder of this paper is organized as follows. In Section 2, we outline the theoretical concepts
used to quantify shape information of tumors and highlight their statistical utility; we also detail the
construction of our statistic that summarizes tumor shape information, the SECT. In Section 3, we detail
how regression methodologies for functional covariates are naturally suited to model the curves that
capture tumor shape information. This connection with functional data allows us to specify a general
regression model that takes tumor shape information as input and proves particularly powerful for
predictive inference. For our case study, we focus on Gaussian process (GP) regression with
Markov chain Monte Carlo (MCMC) inference. In Section 4, we use the GP modeling framework to
predict the clinical outcomes of GBM patients using gene expression data, existing morphometric and
volumetric tumor image quantifications, and our proposed tumor shape summaries. Here, we compare the
covariate types across GP regressions built with various covariance functions. Finally, in Section 5,
we close with a discussion of possible future research.
2 Quantifying Tumor Images Using Topology
In this section, we develop a summary statistic that captures shape information from MRIs of GBM
tumors; these summaries will then be used as covariates in a regression model. The key strategy is to
construct the statistic as a function that maps shapes into a Hilbert space. This function has two important
properties: (i) it is injective, and (ii) it admits a well-defined inner product structure. Notably, the inner
product structure allows us to adapt ideas from functional data analysis to specify general regression
models that use shape summary statistics as predictor variables.
2.1 Background on Summary Statistics for Shape Data
Classical approaches represent shapes as a collection of landmark points [5–7]. This data representation
was implemented partly due to the limited image processing technology of the time. Current imaging
technologies have since greatly improved and now allow three-dimensional shapes to be represented as
meshes, which are collections of vertices, edges, and faces. Figure S1 depicts an example of a mesh
representation for a brain tumor and ventricles. Recently, methods have been developed to generate
automated geometric morphometrics for mesh representations [8–11]. However, despite these advancements,
both user-specified and automated landmark-based methods are known to suffer from structural
errors when comparing shapes that are highly dissimilar. Some examples of structural errors include:
inaccurate pairwise correspondences between landmarks, alignment problems between dissimilar shapes,
and global inconsistency of pairwise mappings. These structural errors tend to accumulate as the number
of landmarks imposed on each shape increases, and a high number of these points is often required to
accurately capture shape information (especially when analyzing diverse shapes) [12]. Such complications
generally make landmark-based approaches less attractive.
Most recently, an approach known as the persistent homology transform (PHT) was developed to
comprehensively address issues induced by landmark-based methods, and to maintain robust quantification
performance for highly dissimilar and non-isomorphic shapes [13]. While the PHT allows for the
comparison of shapes without requiring landmarks, it does so by producing a collection of persistence
diagrams, which are multiscale topological summaries used extensively in topological data analysis (TDA).
This is restrictive because the geometry of the resulting summary statistics does not allow for an inner
product structure that is amenable to (generalized) functional data models [14]. We propose the smooth
Euler characteristic transform (SECT), which builds upon the theory of the PHT: it also produces a
topological summary statistic, but it is constructed so that shape information can be integrated into
regression-based methods. This proves to be particularly useful in our case study on predicting clinical
outcomes in GBM.
2.2 Homology and Persistence
We begin by developing an intuition for persistent homology [15,16], which is a foundational concept in
TDA. Briefly, persistent homology can be viewed as the data-analytic counterpart to homology, a theoretical
concept from algebraic topology, where the goal is to study the shape of abstract mathematical
objects, such as sets and spaces, by counting occurrences of geometric patterns. In homology, the geometric
patterns of interest are holes: homology groups provide a mathematical language for describing
and keeping track of holes of an abstract mathematical object. The motivation behind classical algebraic
topology is to then use these holes to distinguish between or suggest similarities among different abstract
mathematical objects. For a more detailed review and theoretical discussion of these concepts, see the
Supplementary Material.
Homology. Homology is particularly relevant to our application and case study in GBM. Intuitively,
not only does it describe contrasting physical tumor characteristics, but it also implicitly captures some
information about the stage of disease progression. For example, necrosis is a form of cell injury which
results in the premature death of cells. Multifocality is a radiological observation where individual tumor
cells separate from the main mass and disperse elsewhere within the brain. From an imaging perspective,
necrotic regions show up as dark regions (or holes) within a tumor, while multifocal tumors appear as
segregated masses. Examples of both necrosis and multifocality captured by MRI images are shown
in Figure S2. It has been suggested that the more necrosis or multifocality there is in a GBM tumor,
the more aggressive the disease [17, 18]. Applying homology to radiomic studies not only identifies such
phenomena, but also tracks the number of times they occur and thereby provides a notional measure of
disease severity.
Homology is indexed by integers: the 0th-degree homology captures the number of connected components
in the shape, the 1st-degree homology captures the number of loops, and the 2nd-degree homology
captures the number of voids. In the context of our GBM application, degree 0 homology corresponds to
tumor masses and lesions, while degree 1 and degree 2 homology correspond to necrosis, depending on whether
we are analyzing 2-dimensional image slices from an MRI or the 3-dimensional tumor as a whole.
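To make this correspondence concrete for a 2-dimensional slice, the sketch below counts tumor masses ($\beta_0$) and enclosed holes such as necrotic cores ($\beta_1$) in a binary segmentation mask using breadth-first flood fill. It is an illustration only, not the authors' pipeline: the mask format (a list of rows of 0/1 values), the function names, and the choice of 4-connectivity for the tumor paired with 8-connectivity for the background (a standard pairing that keeps the two counts consistent) are all assumptions.

```python
from collections import deque

# Neighbor offsets (assumed connectivity choices; see lead-in).
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, neighbors):
    """Number of connected regions of cells equal to `value` under `neighbors`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1  # found an unvisited region; flood-fill all of it
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for di, dj in neighbors:
                    a, b = i + di, j + dj
                    if (0 <= a < rows and 0 <= b < cols
                            and not seen[a][b] and grid[a][b] == value):
                        seen[a][b] = True
                        queue.append((a, b))
    return count

def betti_2d(mask):
    """(beta_0, beta_1) of the foreground (value 1) of a 2D binary mask (hypothetical helper)."""
    # Pad with a ring of background so everything outside the tumor is one component.
    cols = len(mask[0]) + 2
    padded = [[0] * cols] + [[0] + list(row) + [0] for row in mask] + [[0] * cols]
    b0 = count_components(padded, 1, N4)      # separate tumor masses
    b1 = count_components(padded, 0, N8) - 1  # background regions enclosed by tumor
    return b0, b1

mask = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
print(betti_2d(mask))  # (2, 1): two masses, one enclosed hole
```

Padding the mask is what lets the outer background be counted once and subtracted, so every remaining background region is a hole fully surrounded by tumor.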
Despite its intuitive description, computing homology can be challenging. To this end, it is often
convenient to represent the shape as a discrete union of simple building blocks, “glued” together in a
combinatorial fashion. An important example of such a building block is the simplex: simplices are
skeletal elements that take the form of vertices, edges, triangles (faces), tetrahedra, and other higher
dimensional structures. A simplicial complex K is a collection of simplices and represents the discretization
of a shape or tumor. Meshes that represent three-dimensional shapes are particular examples of
finite simplicial complexes (again see Figure S1). There are two key reasons for discretizing shapes into
simplicial complexes. First, there exist efficient algorithms to compute homology for such discretizations;
and second, discretization is essential for applying these abstract concepts to real data, where any given
dataset will necessarily be finite.
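The requirement that simplices be "glued" together combinatorially can be stated as a closure condition: every face of a simplex in $K$ must also belong to $K$. A minimal sketch of that check follows, assuming simplices are given as tuples of integer vertex labels; the function name is hypothetical.

```python
from itertools import combinations

def is_simplicial_complex(simplices):
    """Return True if every nonempty proper face of every simplex is also present.

    Simplices are tuples of vertex labels, e.g. (0,) for a vertex,
    (0, 1) for an edge, and (0, 1, 2) for a triangle.
    """
    canon = {tuple(sorted(s)) for s in simplices}
    for s in canon:
        for k in range(1, len(s)):          # proper faces with k vertices
            for face in combinations(s, k):  # s is sorted, so faces are sorted too
                if face not in canon:
                    return False
    return True

triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
print(is_simplicial_complex(triangle))                    # True
print(is_simplicial_complex(triangle[:5] + [(0, 1, 2)]))  # False: edge (1, 2) is missing
```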
In this paper, we use the notation $H_k(K)$ to denote the $k$-th homology group of the simplicial
complex $K$. This group is computed from the $k$- and $(k+1)$-dimensional simplices of $K$ and records its
$k$-dimensional holes. For example, $H_0(K)$ is computed from the vertices and edges of the simplicial
complex, and its rank counts the connected components of the shape (e.g. the masses and lesions of a tumor).
Persistent Homology. Persistent homology applies homology to data by continuously tracking the
evolution of homology in the data at different scales (or resolutions). It can thus be seen as a way to
extract and summarize geometric information. In persistent homology, the index $s$ of a filtration tracks
the homological evolution. A filtration is a collection of simplicial complexes $\{K_s\}$ in which the index $s$
totally orders the complexes by inclusion, $K_i \subseteq K_j$ for $i < j$. As $s$ increases, the simplicial
complexes in the sequence $\{K_s\}$ grow. In this way, the index $s$ of the filtration $\{K_s\}$ tracks the scale
at which the “shape” of the data changes and grows. The shape information at each scale $s$ is encoded
by the homology groups $H_k(K_s)$ of the simplicial complex $K_s$. More specifically, $H_0(K_s)$ records the
connected components, $H_1(K_s)$ the loops, and $H_2(K_s)$ the voids of the simplicial complex, or discretized
shape, at scale $s$. An example of a filtration is depicted in Figure 1. Here, the index $s$ corresponds to the
value of the height function (which depends on a variable $x$ and is discussed in detail further below) in the
vertical direction $\nu$. We see vertices, edges, and a face appear sequentially with height; higher-order
structures are revealed as $s$ increases.
Computing persistent homology produces a collection of intervals for each degree of homology, where
each interval represents a k-dimensional topological feature (e.g. a connected component, loop, or void
for a general, three-dimensional shape) that is “born” at the parameter value given by the left endpoint
of the interval, and “dies” at the value at the right endpoint. The length of the interval corresponds
to how long the topological feature “lives,” or persists. In this paper, we represent these intervals
by a persistence diagram. Persistence diagrams treat the start and end points of each interval
as an ordered pair and display them as points in the plane, where the x-axis corresponds to birth
time and the y-axis is the death time. Thus, one can consider a persistence diagram as a collection of
points on and above the diagonal, with the set of points on the diagonal having infinite multiplicity (and
included for regularity conditions; see the Appendix for further detail).
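For degree 0, the intervals can be computed with a union-find structure. The sketch below assumes a graph (vertices and edges only) with a height value on each vertex, lets each edge enter at the larger height of its endpoints, and applies the elder rule: when two components merge, the one born later dies. The function name, the input format, and the choice to drop zero-length intervals are assumptions made for illustration; higher-degree persistence needs a more general algorithm (boundary-matrix reduction), which is not shown here.

```python
import math

def zero_dim_persistence(heights, edges):
    """Degree-0 persistence intervals of a height function on a graph.

    heights: list of vertex heights, indexed by vertex label 0, 1, 2, ...
    edges:   list of (u, v) pairs; an edge enters at max(heights[u], heights[v])
    Returns a sorted list of (birth, death) pairs; death is math.inf for
    components that never die.
    """
    parent = list(range(len(heights)))
    birth = list(heights)  # birth[r] = lowest height in the component rooted at r

    def find(i):
        # Root of i's component, with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    intervals = []
    for u, v in sorted(edges, key=lambda e: max(heights[e[0]], heights[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a loop: a degree-1 event, not tracked here
        t = max(heights[u], heights[v])
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # elder rule: keep the older component as ru
        if birth[rv] < t:
            intervals.append((birth[rv], t))  # younger component dies at height t
        parent[rv] = ru
    roots = {find(i) for i in range(len(heights))}
    intervals.extend((birth[r], math.inf) for r in roots)
    return sorted(intervals)

# Path 0 - 1 - 2 with local minima at vertices 0 and 2 that merge at height 2.
print(zero_dim_persistence([0.0, 2.0, 1.0], [(0, 1), (1, 2)]))  # [(0.0, inf), (1.0, 2.0)]
```

Each returned pair is one point of the degree-0 persistence diagram, with birth on the x-axis and death on the y-axis.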
Persistent Homology Transform. The PHT captures shape information by collecting persistence
diagrams of all degrees of homology, for all possible orientations of the shape. More formally, for a
$d$-dimensional shape, the PHT results in $d$ persistence diagrams (one per degree of homology) for each of the
infinitely many direction vectors on the surface of the sphere, each arising from a height function filtration. The space of persistence
diagrams is a complicated, but theoretically well-defined probability space [19]. In particular, it is a
metric space, meaning that distances between persistence diagrams may be defined. This is important
because distances between PHT summary statistics provide a way of comparing shapes. The injectivity
of the PHT for two- and three-dimensional shapes [13], or the one-to-one relation between the shape
itself and its infinite collection of persistence diagrams, guarantees that the PHT effectively summarizes
all relevant information about the shape.
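In practice, the infinitely many directions must be replaced by a finite sample. One common way to spread directions roughly evenly over $S^2$ is a Fibonacci (golden-angle) lattice: spacing the heights evenly in $z$ gives bands of equal area, and rotating by the golden angle at each step avoids clustering. The sketch below is an assumption for illustration and is not necessarily the scheme used in this paper; the function names are hypothetical, and the circle case ($d = 2$) simply uses equally spaced angles.

```python
import math

def fibonacci_directions(n):
    """n roughly evenly spread unit vectors on the sphere S^2 (the d = 3 case)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n     # heights evenly spaced in (-1, 1)
        radius = math.sqrt(1.0 - z * z)   # radius of the circle at height z
        theta = golden_angle * i          # rotate by the golden angle each step
        directions.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return directions

def circle_directions(n):
    """n evenly spaced unit vectors on the circle S^1 (the d = 2 case)."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

print(len(fibonacci_directions(100)))  # 100
```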
Considering all possible directions on the surface of the sphere to summarize shape information is
particularly well-suited to our radiomics application. MRI scans of the brain are known to be subject to
noise: the positioning of patients’ heads can vary both between patients and between individual scans, causing
image registration issues. Considering all directions on the surface of the sphere bypasses this problem,
and incorporates perturbations directly into the statistic. This is an important feature of the PHT
that we retain in the development of the SECT. We expand upon the PHT to produce a collection of
continuous, piecewise linear functions that live in the Hilbert space $L^2$. The corresponding inner product
structure inherent to Hilbert spaces allows us to apply the SECT to a much broader set of statistical
methodologies. It is worth noting that for select covariance functions, the PHT can be adapted to
nonparametric statistical models [20–22], but this class is considerably limited.
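Because the curves are piecewise linear, their $L^2$ inner product can be computed exactly, segment by segment. The sketch below assumes both curves are given by their values at a shared, increasing list of knots. On a segment of length $h$ with endpoint values $(f_0, f_1)$ and $(g_0, g_1)$, linear interpolation gives $\int f g = h\,(f_0 g_0/3 + (f_0 g_1 + f_1 g_0)/6 + f_1 g_1/3)$. The function names are hypothetical.

```python
import math

def l2_inner(knots, f, g):
    """Exact L2 inner product of two piecewise linear functions on shared knots.

    knots: increasing list of x-values; f, g: function values at those knots.
    """
    if not (len(knots) == len(f) == len(g)):
        raise ValueError("knots, f, and g must have the same length")
    total = 0.0
    for i in range(len(knots) - 1):
        h = knots[i + 1] - knots[i]
        f0, f1, g0, g1 = f[i], f[i + 1], g[i], g[i + 1]
        # Integral of the product of two linear pieces over a segment of length h.
        total += h * (f0 * g0 / 3.0 + (f0 * g1 + f1 * g0) / 6.0 + f1 * g1 / 3.0)
    return total

def l2_distance(knots, f, g):
    """L2 distance ||f - g|| induced by l2_inner."""
    diff = [a - b for a, b in zip(f, g)]
    return math.sqrt(l2_inner(knots, diff, diff))
```

Once shape summaries come with an inner product like this, kernels and distances used in functional regression and GP models can be evaluated directly on them.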
2.3 Smooth Euler Characteristic Transform
While the SECT uses the same underlying mathematical principles as the PHT, it produces a collection of
continuous, piecewise linear functions rather than persistence diagrams. The SECT implements persistent
homology via the Euler characteristic (EC), which is a topological invariant that appears in many branches
of mathematics. In terms of homology, the EC combines the ranks of the homology groups (i.e. the Betti
numbers $\beta_k$ of the $k$-th homology groups $H_k$) in an alternating sum, and thus reduces the mathematical
description of holes in a topological space from an algebraic group structure to an integer.
Definition 1. Let $X$ be an arbitrary topological space, $H_k(X)$ be the $k$-th homology group of $X$, and
$\beta_k$ be the rank of $H_k(X)$. The Euler characteristic (EC) $\chi(X)$ of $X$ is the alternating sum
$$\chi(X) = \beta_0 - \beta_1 + \beta_2 - \beta_3 + \cdots = \sum_{k=0}^{\infty} (-1)^k \beta_k.$$
For a discretized shape or surface in three dimensions represented as a simplicial complex $K$, the EC may
be computed analogously from the numbers of simplices in $K$:
$$\chi(K) = V - E + F,$$
where $V$, $E$, and $F$ are the numbers of vertices (0-simplices), edges (1-simplices), and faces (2-simplices),
respectively.
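Both forms of the EC can be checked on small examples. The sketch below assumes a surface mesh given as a list of triangles over integer vertex labels; it adds every edge and vertex of each triangle, computes $V - E + F$, and compares the result with the alternating sum of Betti numbers. The boundary of a tetrahedron is a triangulated sphere, whose Betti numbers are $(1, 0, 1)$, so both give 2. The function names are hypothetical.

```python
from itertools import combinations

def euler_from_betti(betti):
    """Alternating sum beta_0 - beta_1 + beta_2 - ... (Definition 1)."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

def euler_from_triangles(triangles):
    """V - E + F for the simplicial complex generated by a list of triangles."""
    simplices = set()
    for tri in triangles:
        tri = tuple(sorted(tri))
        for k in (1, 2, 3):  # add every vertex, edge, and the triangle itself
            simplices.update(combinations(tri, k))
    V = sum(1 for s in simplices if len(s) == 1)
    E = sum(1 for s in simplices if len(s) == 2)
    F = sum(1 for s in simplices if len(s) == 3)
    return V - E + F

# Boundary of a tetrahedron: a triangulated 2-sphere with Betti numbers (1, 0, 1).
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(euler_from_triangles(tetrahedron), euler_from_betti([1, 0, 1]))  # 2 2
```

Counting simplices is far cheaper than computing homology, which is one reason the EC is attractive as a building block.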
Just as homology may be augmented to persistent homology by considering a filtration, ECs may
also be calculated with respect to a filtration. The result is an EC curve, which tracks the progression
of the EC as a function of the filtration parameter. Let the dimension $d \in \{2, 3\}$, and fix a direction
$\nu$ on the surface of the unit circle or sphere $S^{d-1}$ (where $\nu \in S^{d-1}$). Let $\mathcal{M}_{d-1}$ be the set of all
closed, compact subsets (shapes) embedded in $\mathbb{R}^d$ that can be represented in a finite, discrete manner
as simplicial complexes [23]. Next, denote the simplicial complex representation of $M \in \mathcal{M}_{d-1}$ by $K$,
and let $K_\nu$ indicate the $\nu$-orientation of $K$. The sublevel set filtration of $K_\nu$ parameterized by a height
function $r(\cdot, \cdot)$ is the set $\{x \in K : x \cdot \nu \le r\}$. The $\nu$-directional parameter height function $r_\nu(\cdot, \cdot)$ is
$$r \colon K \times S^{d-1} \to \mathbb{R}, \qquad (x, \nu) \mapsto x \cdot \nu. \tag{1}$$
Denote the extremal heights from this filtration by
$$a_\nu := \min\{r_\nu(x) : x \in K\}, \qquad b_\nu := \max\{r_\nu(x) : x \in K\}.$$
We use the subscript notation $K_\nu$ to denote the simplicial complex representation $K$ of a shape $M$ in the
direction $\nu$, for $d \in \{2, 3\}$. Similarly, we use the superscript notation $K^x_\nu$ to denote the varying
simplicial complex of $K_\nu$, generated by a sublevel set filtration with respect to Equation (1) and defined
by varying $x \in K_\nu$.
Definition 2. The EC curve of $K$ (which discretizes $M$) in the direction $\nu$ is defined by
$$\chi^K_\nu \colon [a_\nu, b_\nu] \to \mathbb{Z} \subset \mathbb{R}, \qquad x \mapsto \chi\big(K^x_\nu\big). \tag{2}$$
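Definition 2 translates directly into a computation on a mesh: project the vertices onto $\nu$, note that a simplex lies in the sublevel set at height $x$ exactly when its highest vertex does (the height function is linear, so its maximum over a simplex is attained at a vertex), and evaluate $V - E + F$ on a grid of heights in $[a_\nu, b_\nu]$. The sketch below assumes a triangle mesh with vertex coordinates and an evenly spaced grid of `num_steps` heights; the function name and grid choice are hypothetical. It stops at the EC curve of Definition 2 and does not construct the smoothed curves of the SECT.

```python
from itertools import combinations

def ec_curve(vertices, triangles, direction, num_steps=50):
    """EC curve of a triangle mesh in one direction, sampled on an even grid.

    vertices:  list of coordinate tuples, one per vertex label 0, 1, 2, ...
    triangles: list of 3-tuples of vertex labels
    direction: unit vector nu, same dimension as the coordinates
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    # Height function r_nu(x) = x . nu evaluated at each vertex.
    heights = [sum(c * n for c, n in zip(v, direction)) for v in vertices]
    # Close the mesh under faces: every triangle contributes its edges and vertices.
    simplices = set()
    for tri in triangles:
        tri = tuple(sorted(tri))
        for k in (1, 2, 3):
            simplices.update(combinations(tri, k))
    # A simplex enters the sublevel set once its highest vertex has entered.
    entries = [(max(heights[i] for i in s), len(s) - 1) for s in simplices]
    a, b = min(heights), max(heights)  # extremal heights a_nu, b_nu
    grid = [a + (b - a) * j / (num_steps - 1) for j in range(num_steps)]
    # chi = V - E + F over the simplices present at each height.
    curve = [sum((-1) ** dim for h, dim in entries if h <= x) for x in grid]
    return grid, curve

tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
grid, curve = ec_curve(tetra_vertices, tetra_faces, (0.0, 0.0, 1.0), num_steps=3)
print(curve)  # [1, 1, 2]
```

In the example, the bottom triangle (a disk, EC 1) is present from the lowest height, and the full sphere (EC 2) appears only once the apex enters at $b_\nu$.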