(Open Access) Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis (2016) | Lorin Crawford

Q: What are the contributions mentioned in the paper "Predicting clinical outcomes in glioblastoma: an application of topological and functional data analysis" ?

Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. The authors demonstrate that SECT features alone explain more of the variance in GBM patient survival than gene expression, volumetric features, and morphometric features. Their findings suggest, first, that images contain valuable information that can play an important role in clinical prognosis and other medical decisions and, second, that the SECT is a viable tool for the broader study of medical imaging informatics.

Q: What have the authors stated for future works in "Predicting clinical outcomes in glioblastoma: an application of topological and functional data analysis" ?

Several future directions and open questions remain. In particular, the authors note that it would be useful to see how their topological summary statistics may be integrated within deep learning frameworks.

Q: Why does the smooth Euler characteristic transform (SECT) exist?

The authors propose the smooth Euler characteristic transform (SECT) because it builds upon the theory of the PHT, in that it also produces a topological summary statistic, but it is constructed to be able to integrate shape information in regression-based methods.

Q: What are the two key aims of this paper?

There are two key aims of their work in this paper: first, to quantify GBM tumor images to integrate medical imaging information into statistical models; and second, to explore the utility of medical imaging information in clinical studies of GBM.

Q: What is the main purpose of this article?

In Section 4, the authors use the GP modeling framework to predict the clinical outcomes of GBM patients using gene expression data, existing morphometric and volumetric tumor image quantifications, and their proposed tumor shape summaries.

Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis

Lorin Crawford (1-3,†), Anthea Monod (4,†), Andrew X. Chen, Sayan Mukherjee (6-9), and Raúl Rabadán

1 Department of Biostatistics, Brown University, Providence, RI, USA

2 Center for Statistical Sciences, Brown University, Providence, RI, USA

3 Center for Computational Molecular Biology, Brown University, Providence, RI, USA

4 Department of Applied Mathematics, Tel Aviv University, Tel Aviv, Israel

5 Department of Systems Biology, Columbia University, New York, NY, USA

6 Department of Statistical Science, Duke University, Durham, NC, USA

7 Department of Computer Science, Duke University, Durham, NC, USA

8 Department of Mathematics, Duke University, Durham, NC, USA

9 Department of Bioinformatics & Biostatistics, Duke University, Durham, NC, USA

† Corresponding E-mail: lorin_crawford@brown.edu; antheam@tauex.tau.ac.il

Abstract

Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. Its rapid progression and the relative time cost of obtaining molecular data make other readily-available forms of data, such as images, an important resource for actionable measures in patients. Our goal is to utilize information given by medical images taken from GBM patients in statistical settings. To do this, we design a novel statistic, the smooth Euler characteristic transform (SECT), that quantifies magnetic resonance images (MRIs) of tumors. Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than other previously proposed topological summary statistics. When applied to a cohort of GBM patients, we find that the SECT is a better predictor of clinical outcomes than both existing tumor shape quantifications and common molecular assays. Specifically, we demonstrate that SECT features alone explain more of the variance in GBM patient survival than gene expression, volumetric features, and morphometric features. The main takeaways from our findings are thus twofold. First, they suggest that images contain valuable information that can play an important role in clinical prognosis and other medical decisions. Second, they show that the SECT is a viable tool for the broader study of medical imaging informatics.

1 Introduction

The field of radiomics is focused on the extraction of quantitative features from medical magnetic resonance images (MRIs), typically constructed by tomography and digitally stored as shapes or surfaces. Quantifying geometric features from shapes in a way that is amenable to computational analyses has been a long-standing and fundamental challenge in both statistics and radiomics. Overcoming such a challenge would provide significant breakthroughs in broader scientific disciplines with the potential for real, practical impact. One particularly important application, where a viable quantification of shapes is needed, is the study of glioblastoma multiforme (GBM), a glioma that materializes into aggressive, cancerous tumor growths within the human brain. GBM is a disease that is currently under active research in oncology; it is marked by characteristics that are not common in other cancers, such as spatial diffusivity and molecular heterogeneity. In human patients, it is a rapidly-progressing disease with a post-diagnosis survival period of 12-15 months and, currently, there are only limited therapies available [1]. Obtaining molecular information of GBM tumors entails an invasive medical procedure on the patient that is costly in terms of both time and resources. In comparison, magnetic resonance images (MRIs) of these tumors are easily accessible and often readily available. Being able to effectively utilize MRIs of GBM tumors in computational settings increases the potential for well-developed statistical methodology to have a significant impact in cancer research and future treatment strategies.

There are two key aims of our work in this paper: first, to quantify GBM tumor images to integrate medical imaging information into statistical models; and second, to explore the utility of medical imaging information in clinical studies of GBM. To achieve the first aim, we develop a novel statistic, the smooth Euler characteristic transform (SECT), that summarizes shape information of GBM MRIs as a collection of smooth curves. This allows the direct implementation of existing statistical models from functional data analysis (FDA); in particular, it allows tumor shape information to be used as a covariate in regression frameworks. To achieve the second aim, we study a cohort of individuals with publicly available MRIs from The Cancer Imaging Archive (TCIA) [2,3], as well as matched genomic and clinical data collected by The Cancer Genome Atlas (TCGA) [4]. Through our extensive predictive analysis, we demonstrate a clinically-relevant connection between the shape of brain malignancies and the variation of survival-based outcomes that are driven by molecular heterogeneity.

The remainder of this paper is organized as follows. In Section 2, we outline the theoretical concepts used to quantify shape information of tumors and highlight their statistical utility; we also detail the construction of our statistic that summarizes tumor shape information, the SECT. In Section 3, we detail how regression methodologies for functional covariates are naturally suited to model the curves that capture tumor shape information. This connection with functional data allows us to specify a general regression model that intakes tumor shape information and turns out to be particularly powerful when conducting predictive inference. For our case study, we focus on Gaussian process (GP) regression with Markov chain Monte Carlo (MCMC) inference. In Section 4, we use the GP modeling framework to predict the clinical outcomes of GBM patients using gene expression data, existing morphometric and volumetric tumor image quantifications, and our proposed tumor shape summaries. Here, we perform a comparative study between each covariate type across different regressions generated by various covariance functions. Finally, in Section 5, we close with a discussion on possible future research.

2 Quantifying Tumor Images Using Topology

In this section, we develop a summary statistic that captures shape information from MRI images of GBM tumors, which will then be used as covariates in a regression model. The key strategy is to construct these statistics as a function that maps shapes into a Hilbert space. This function has two important properties: (i) it is injective, and (ii) it admits a well-defined inner product structure. Notably, the inner product structure allows us to adapt ideas from functional data analysis to specify general regression models that use shape summary statistics as predictor variables.

2.1 Background on Summary Statistics for Shape Data

Classical approaches represent shapes as a collection of landmark points [5–7]. This data representation was implemented partly due to the limited image processing technology of the time. Current imaging technologies have since greatly improved and now allow three-dimensional shapes to be represented as meshes, which are collections of vertices, edges, and faces. Figure S1 depicts an example of a mesh representation for a brain tumor and ventricles. Recently, methods have been developed to generate automated geometric morphometrics for mesh representations [8–11]. However, despite these advancements, both user-specified and automated landmark-based methods are known to suffer from structural errors when comparing shapes that are highly dissimilar. Some examples of structural errors include: inaccurate pairwise correspondences between landmarks, alignment problems between dissimilar shapes, and global inconsistency of pairwise mappings. These structural errors tend to accumulate as the number of landmarks imposed on each shape increases, and a high number of these points is often required to accurately capture shape information (especially when analyzing diverse shapes) [12]. Such complications generally make landmark-based approaches less attractive.

Most recently, an approach known as the persistent homology transform (PHT) was developed to comprehensively address issues induced by landmark-based methods, and to maintain robust quantification performance for highly dissimilar and non-isomorphic shapes [13]. While the PHT allows for the comparison of shapes without requiring landmarks, it does so by producing a collection of persistence diagrams, which are multiscale topological summaries used extensively in topological data analysis (TDA). This is restrictive because the geometry of the resulting summary statistics does not allow for an inner product structure that is amenable to (generalized) functional data models [14]. We propose the smooth Euler characteristic transform (SECT) because it builds upon the theory of the PHT, in that it also produces a topological summary statistic, but it is constructed to be able to integrate shape information in regression-based methods. This proves to be particularly useful in our case study on predicting clinical outcomes in GBM.

2.2 Homology and Persistence

We begin by developing an intuition for persistent homology [15,16], which is a foundational concept in TDA. Briefly, persistent homology can be viewed as the data-analytic counterpart to homology, a theoretical concept from algebraic topology, where the goal is to study the shape of abstract mathematical objects, such as sets and spaces, by counting occurrences of geometric patterns. In homology, the geometric patterns of interest are holes: homology groups provide a mathematical language for describing and keeping track of holes of an abstract mathematical object. The motivation behind classical algebraic topology is to then use these holes to distinguish between or suggest similarities among different abstract mathematical objects. For a more detailed review and theoretical discussion of these concepts, see the Supplementary Material.

Homology. Homology is particularly relevant to our application and case study in GBM. Intuitively, not only does it describe contrasting physical tumor characteristics, but it also implicitly captures some information about the stage of disease progression. For example, necrosis is a form of cell injury which results in the premature death of cells. Multifocality is a radiological observation where individual tumor cells separate from the main mass and disperse elsewhere within the brain. From an imaging perspective, necrotic regions show up as dark regions (or holes) within a tumor, while multifocal tumors appear as segregated masses. Examples of both necrosis and multifocality captured by MRI images are shown in Figure S2. It has been suggested that the more necrosis or multifocality there is in a GBM tumor, the more aggressive the disease [17, 18]. Applying homology to radiomic studies not only identifies such phenomena, but also tracks the number of times they occur and thereby provides a notional measure of disease severity.

Homology is indexed by integers: the 0th-degree homology captures the number of connected components in the shape, the 1st-degree homology captures the number of loops, and the 2nd-degree homology captures the number of voids. In the context of our GBM application, degree 0 homology corresponds to tumor masses and lesions. Degrees 1 and 2 homology correspond to necrosis, depending on whether we are analyzing 2-dimensional image slices from an MRI or the 3-dimensional tumor as a whole.

Despite its intuitive description, computing homology can be challenging. To this end, it is often convenient to represent the shape as a discrete union of simple building blocks, “glued” together in a combinatorial fashion. An important example of such a building block is the simplex: simplices are skeletal elements that take the form of vertices, edges, triangles (faces), tetrahedra, and other higher dimensional structures. A simplicial complex K is a collection of simplices and represents the discretization of a shape or tumor. Meshes that represent three-dimensional shapes are particular examples of finite simplicial complexes (again see Figure S1). There are two key interests in discretizing shapes into simplicial complexes. First, there exist efficient algorithms to compute homology for such discretizations; and second, discretization is essential for applying these abstract concepts to real data, where any given dataset will necessarily be finite.

In this paper, we use the notation $H_k(K)$ to denote the $k$-th homology group for the simplicial complex $K$. This corresponds to the collection of the $k$-dimensional elements of the simplicial complex. For example, $H_0(K)$ corresponds to the collection of vertices of the simplicial complex or, equivalently, to the collection of connected components of the shape (e.g. the masses and lesions of a tumor).
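To make the degree-0 case concrete, the following is a minimal sketch that counts the connected components of a simplicial complex from its vertices and edges using a union-find structure. The function name `count_components` and the toy complex are illustrative assumptions and do not come from the paper.

```python
def count_components(num_vertices, edges):
    """Count connected components (the degree-0 Betti number) from vertices and edges."""
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the representative, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components
            components -= 1
    return components


# Hypothetical "multifocal" toy complex: a triangle on vertices 0, 1, 2
# and a separate edge on vertices 3, 4.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
beta_0 = count_components(5, toy_edges)
print(beta_0)  # 2
```

In the imaging analogy used above, each component would play the role of a separate tumor mass or lesion.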

Persistent Homology. Persistent homology applies homology to data by continuously tracking the evolution of homology in the data at different scales (or resolutions). It can thus be seen as a way to extract and summarize geometric information. In persistent homology, the index $s$ of a filtration tracks the homological evolution. A filtration is a collection of simplicial complexes $\{K_s\}$ where the index $s$ induces totally ordered sets $K_{s_i} \subseteq K_{s_j}$ for $i < j$. As $s$ increases, the sequence of simplicial complexes $\{K_s\}$ also changes and grows. In this way, the index $s$ of the filtration $\{K_s\}$ tracks the scale according to which the “shape” of the data changes and grows. The shape information at each scale $s$ is encoded by the homology groups $H_k(K_s)$ of the simplicial complex $K_s$. More specifically, $H_0$ corresponds to the vertices, $H_1$ corresponds to edges, and $H_2$ corresponds to the faces of the simplicial complex or discretized shape. An example of a filtration is depicted in Figure 1. Here, the index $s$ corresponds to the value of the height function (which depends on some variable $x$ and is discussed in detail further below) in the vertical direction $\nu$. We see the evolution of vertices, edges, and a face appearing sequentially with height. Higher order structures are revealed as $s$ increases.
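The growth of a filtration with height can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: the helper `sublevel_complex`, the toy triangle, and the chosen heights are all hypothetical. A simplex is taken to enter the filtration once all of its vertices lie at or below the current height in the direction `nu`.

```python
import numpy as np


def sublevel_complex(vertices, simplices, direction, height):
    """Return the simplices whose vertices all satisfy x . direction <= height."""
    heights = np.asarray(vertices, dtype=float) @ np.asarray(direction, dtype=float)
    return {s for s in simplices if all(heights[v] <= height for v in s)}


# Toy complex: one filled triangle whose vertices sit at heights 0, 1, and 2.
vertices = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
nu = (0.0, 1.0)  # vertical direction

levels = [sublevel_complex(vertices, simplices, nu, s) for s in (0.0, 1.0, 2.0)]
sizes = [len(k) for k in levels]
# Each complex in the sequence contains the previous one.
nested = all(a <= b for a, b in zip(levels, levels[1:]))
print(sizes, nested)  # [1, 3, 7] True
```

At height 0 only one vertex is present, at height 1 a second vertex and the connecting edge appear, and at height 2 the remaining vertex, edges, and face complete the triangle.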

Computing persistent homology produces a collection of intervals for each degree of homology, where each interval represents a k-dimensional topological feature (e.g. a connected component, loop, or void for a general, three-dimensional shape) that is “born” at the parameter value given by the left endpoint of the interval, and “dies” at the value at the right endpoint. The length of the interval corresponds to how long the topological feature “lives,” or persists. In this paper, we consider these intervals to be represented by a persistence diagram. Persistence diagrams treat the start and end points of each interval as an ordered pair, and display them as plotted points on a plane where the x-axis corresponds to birth time and the y-axis is the death time. Thus, one can consider a persistence diagram as a collection of points on and above the diagonal, with the set of points on the diagonal having infinite multiplicity (and included for regularity conditions; see the Appendix for further detail).
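As a small illustration of the interval-to-diagram correspondence described above, the sketch below turns a list of (birth, death) intervals into diagram points and records how long each feature persists. The intervals are made-up values for illustration only, and `to_diagram` is a hypothetical helper rather than part of any library.

```python
def to_diagram(intervals):
    """Map (birth, death) intervals to diagram points annotated with persistence."""
    points = []
    for birth, death in intervals:
        if death < birth:
            raise ValueError("a feature cannot die before it is born")
        points.append({"birth": birth, "death": death, "persistence": death - birth})
    return points


# Hypothetical degree-0 intervals: one long-lived component and one short-lived one.
intervals_h0 = [(0.0, 3.0), (1.0, 1.5)]
diagram = to_diagram(intervals_h0)

# Every point lies on or above the diagonal (death >= birth).
on_or_above_diagonal = all(p["death"] >= p["birth"] for p in diagram)
longest = max(diagram, key=lambda p: p["persistence"])
print(longest)  # {'birth': 0.0, 'death': 3.0, 'persistence': 3.0}
```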

Persistent Homology Transform. The PHT captures shape information by collecting persistence diagrams of all degrees of homology, for all possible orientations of the shape. More formally, for a d-dimensional shape, the PHT results in d-many persistence diagrams arising from height function filtrations over infinitely-many direction vectors on the surface of the sphere. The space of persistence diagrams is a complicated, but theoretically well-defined probability space [19]. In particular, it is a metric space, meaning that distances between persistence diagrams may be defined. This is important because distances between PHT summary statistics provide a way of comparing shapes. The injectivity of the PHT for two- and three-dimensional shapes [13], or the one-to-one relation between the shape itself and its infinite collection of persistence diagrams, guarantees that the PHT effectively summarizes all relevant information about the shape.

Considering all possible directions on the surface of the sphere to summarize shape information is particularly well-suited to our radiomics application. MRI scans of the brain are known to be subject to noise: the positioning of patients’ heads could vary both between patients and individual scans, causing image registration issues. Considering all directions on the surface of the sphere bypasses this problem, and incorporates perturbations directly into the statistic. This is an important feature of the PHT that we retain in the development of the SECT. We expand upon the PHT to produce a collection of continuous, piecewise linear functions that live in Hilbert space $L^2$. The corresponding inner product structure inherent to Hilbert spaces allows us to apply the SECT to a much broader set of statistical methodologies. It is worth noting that for select covariance functions, the PHT can be adapted to nonparametric statistical models [20–22], but this class is considerably limited.
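One way to see why using many directions absorbs changes in orientation is the identity $(Rx) \cdot \nu = x \cdot (R^\top \nu)$ for a rotation $R$: rotating the shape is equivalent to relabeling the directions. The sketch below checks this numerically in two dimensions. It is only an illustration under simple assumptions (a finite grid of eight directions on the unit circle and a rotation by exactly one grid step), and the helper names are hypothetical.

```python
import numpy as np


def unit_circle_directions(m):
    """Return m evenly spaced unit vectors on the circle."""
    angles = 2.0 * np.pi * np.arange(m) / m
    return np.column_stack([np.cos(angles), np.sin(angles)])


def rotation(theta):
    """Return the 2x2 counterclockwise rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.5]])  # toy shape vertices
directions = unit_circle_directions(8)
R = rotation(np.pi / 4)  # exactly one step of the 8-direction grid

heights_original = points @ directions.T         # rows: points, columns: directions
heights_rotated = (points @ R.T) @ directions.T  # same heights for the rotated shape

# Rotating the shape by one grid step only shifts the direction index by one.
shift_matches = np.allclose(heights_rotated, np.roll(heights_original, 1, axis=1))
print(shift_matches)  # True
```

For rotations that do not align with the grid, the match is only approximate, which is one reason a dense set of directions is useful in practice.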

2.3 Smooth Euler Characteristic Transform

While the SECT uses the same underlying mathematical principles as the PHT, it produces a collection of continuous, piecewise linear functions rather than persistence diagrams. The SECT implements persistent homology via the Euler characteristic (EC), which is a topological invariant that appears in many branches of mathematics. In terms of homology, the EC counts the ranks of the homology groups (i.e. the Betti numbers, $\beta_k$, for the $k$-th homology group $H_k$) in an alternating sum and thus reduces the mathematical description of holes in a topological space from an algebraic group structure to an integer.

Definition 1. Let $X$ be an arbitrary topological space, $H_k(X)$ be the $k$-th homology group of $X$, and $\beta_k$ be the rank of $H_k(X)$. The Euler characteristic (EC) $\chi(X)$ of $X$ is the alternating sum

$$\chi(X) = \beta_0 - \beta_1 + \beta_2 - \beta_3 + \cdots = \sum_{k=0}^{\infty} (-1)^k \beta_k.$$

For a discretized shape or surface in three dimensions represented as a simplicial complex $K$, the EC may be analogously defined by the number of simplices in $K$ by

$$\chi(K) = V - E + F,$$

where $V$, $E$, and $F$ are the numbers of vertices (0-simplices), edges (1-simplices), and faces (2-simplices), respectively.
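The simplex-counting formula is easy to check on a small mesh. The sketch below computes $V - E + F$ for a triangle mesh given as a list of faces; it assumes that every vertex and edge of the mesh belongs to at least one listed face and that faces are not repeated. The function name and the tetrahedron example are illustrative and not taken from the paper.

```python
def euler_characteristic(faces):
    """Compute chi = V - E + F for a triangle mesh given as vertex-index triples."""
    vertices = set()
    edges = set()
    for a, b, c in faces:
        vertices.update((a, b, c))
        # Store edges as unordered pairs so shared edges are counted once.
        edges.update(frozenset(e) for e in ((a, b), (b, c), (a, c)))
    return len(vertices) - len(edges) + len(faces)


# Surface of a tetrahedron: 4 vertices, 6 edges, 4 faces.
tetrahedron_surface = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
chi = euler_characteristic(tetrahedron_surface)
print(chi)  # 2
```

The value 2 agrees with the alternating sum of Betti numbers for a sphere-like surface, which has one connected component, no loops, and one enclosed void: $1 - 0 + 1 = 2$.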

Just as homology may be augmented to persistent homology by considering a filtration, ECs may also be calculated with respect to a filtration. The result is an EC curve, which tracks the progression of the EC as a function with respect to the filtration. Let the dimension $d \in \{2, 3\}$, and fix a direction $\nu$ on the surface of the unit circle or sphere $S^{d-1}$ (where $\nu \in S^{d-1}$). Let $\mathcal{M}_{d-1}$ be the set of all closed, compact subsets (shapes) embedded in $\mathbb{R}^d$ that can be represented in a finite, discrete manner as simplicial complexes [23]. Next, denote the simplicial complex representation of $M \in \mathcal{M}_{d-1}$ by $K$, and let $K_\nu$ indicate the $\nu$-orientation of $K$. The sublevel set filtration of $K_\nu$ parameterized by a height function $r(\bullet, \bullet)$ is the set $\{x \in K : x \cdot \nu \le r\}$. The $\nu$-directional parameter height function $r_\nu(\bullet, \bullet)$ is

$$r : K \times S^{d-1} \to \mathbb{R}, \qquad \{x, \nu\} \mapsto x \cdot \nu. \tag{1}$$

Denote the extremal heights from this filtration by

$$a_\nu := \min\{r_\nu(x),\, x \in K\}, \qquad b_\nu := \max\{r_\nu(x),\, x \in K\}.$$

We use the subscript notation to denote the simplicial complex representation $K$ of a shape $M$, in the direction $\nu$, as $K_\nu$ for $d \in \{2, 3\}$. Similarly, we use the superscript notation $K_\nu^x$ to denote the varying simplicial complex of $K_\nu$, generated by a sublevel set filtration with respect to Equation (1) and defined by varying $x \in K_\nu$.

Definition 2. The EC curve of $K$ (which discretizes $M$) in the direction $\nu$ is defined by

$$\chi_\nu^K : [a_\nu, b_\nu] \to \mathbb{Z} \subset \mathbb{R}, \qquad x \mapsto \chi(K_\nu^x). \tag{2}$$
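The EC curve can be sketched numerically by evaluating the Euler characteristic of the sublevel complex on a grid of heights between the extremal values. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function `ec_curve`, the rule that a simplex enters at the height of its highest vertex, the hollow-square toy complex, and the three-point grid are all choices made for this example.

```python
import numpy as np


def ec_curve(vertices, simplices, direction, grid):
    """Evaluate the EC of the sublevel complex {x . direction <= h} for each h in grid."""
    heights = np.asarray(vertices, dtype=float) @ np.asarray(direction, dtype=float)
    # A simplex is present once its highest vertex is at or below the current height.
    entry = np.array([max(heights[v] for v in s) for s in simplices])
    # Vertices count +1, edges -1, faces +1, matching chi = V - E + F.
    sign = np.array([(-1) ** (len(s) - 1) for s in simplices])
    return np.array([int(sign[entry <= h].sum()) for h in grid])


# Hollow unit square: four vertices and four boundary edges, no face.
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
simplices = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3)]
nu = (0.0, 1.0)

all_heights = np.asarray(vertices, dtype=float) @ np.asarray(nu)
a_nu, b_nu = all_heights.min(), all_heights.max()  # extremal heights
grid = np.linspace(a_nu, b_nu, 3)                  # heights 0.0, 0.5, 1.0

curve = ec_curve(vertices, simplices, nu, grid)
print(curve.tolist())  # [1, 1, 0]
```

The curve starts at 1 while only the bottom edge is present (one component, no loop) and drops to 0 once the square closes into a loop at the top height.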
