Interactive Visual Analysis of Heterogeneous Cohort Study Data
Paolo Angelelli, Steffen Oeltze, Judit Haász, Cagatay Turkay, Erlend Hodneland, Arvid Lundervold, Astri J. Lundervold, Bernhard Preim and Helwig Hauser
Abstract—Cohort studies in medicine are conducted to enable the study of medical hypotheses in large samples. Often, a large amount of heterogeneous data is acquired from many subjects. The analysis is usually hypothesis-driven, i.e., a specific subset of such data is studied to confirm or reject specific hypotheses. In this paper, we demonstrate how we enable the interactive visual exploration and analysis of such data, helping with the generation of new hypotheses and contributing to the process of validating them. We propose a data-cube-based model which handles partially overlapping data subsets during the interactive visualization. This model enables seamless integration of the heterogeneous data, as well as linking spatial and non-spatial views on these data. We implemented this model in an application prototype, and used it to analyze data acquired in the context of a cohort study on cognitive aging. We present case-study analyses of selected aspects of brain connectivity, carried out with the prototype implementation of the presented model, to demonstrate its potential and flexibility.
Index Terms—heterogeneous data, medical visualization, IVA
1 INTRODUCTION
Cohort studies in medicine are becoming increasingly common, partly thanks to the availability of, and recent improvements in, medical imaging technologies. Such studies are a type of observational study that follows one or more groups of people (samples), called cohorts, over time. They are used to evaluate medical hypotheses in samples sharing common characteristics, for example being healthy or presenting specific risk factors, to gain a better understanding of the absolute risks of certain pathologies and of how these pathologies develop. Cohort study data are often acquired over long time periods, following strictly defined protocols, so such studies are not trivial to set up. Because of this, they are often designed to deliver a larger variety of data than the focus of the initial study requires; this additional data can later serve as the basis for retrospective analyses evaluating further sets of hypotheses.
There are means to evaluate specific hypotheses based on such cohort study data, often involving accordingly designed data extraction, transformation, and fusion approaches. However, there is a lack of technology to support the flexible and open-ended exploration of such data, mostly because of its heterogeneity: collections of image and non-image (quantitative, often image-derived) data, which in turn can be categorical or numerical, and which are defined on domains that only partly overlap. Due to the complexities posed by this heterogeneity, analysts often have to limit their attention to subsets of the data, so the analysis loses the overall relations across different modalities. Integrating all the available data within one visual analysis tool that allows them to be combined seamlessly, on demand, is expected to support experts in exploring heterogeneous cohort study data and in generating and verifying hypotheses, and to accelerate their research workflow.
The exploration and analysis of heterogeneous cohort study data generates specific new challenges for visualization. The contribution of this article is therefore twofold. First, in Section 2, we characterize these challenges, in relation to the substantial heterogeneity of the data and in relation to the analysis tasks, goals, and typical analysis workflow in the specific context of a cohort study on cognitive aging. Second, in Section 4, we describe our solution, based on a new, general multi-data-cube model that supports heterogeneous data and that can also be adapted to other highly heterogeneous problems. Finally, in Section 5 we describe our prototype implementation of the model, which, in Section 6, we use to exemplify how our novel approach can enable the generation of new hypotheses, as well as the swift analysis of relations between otherwise unconnected parts of the data, thus improving the analysis and exploration process. In Section 6 we also provide an evaluation of our method by two domain experts from the medical and neuropsychological domains.

Paolo Angelelli and Helwig Hauser are with the Department of Informatics at the University of Bergen. E-mail: paolo.angelelli@uib.no.
Cagatay Turkay is with the giCentre at City University London.
Steffen Oeltze and Bernhard Preim are with the Department of Informatics at the University of Magdeburg.
Judit Haász, Erlend Hodneland and Arvid Lundervold are with the Department of Biomedicine at the University of Bergen.
Astri J. Lundervold is with the Department of Biological and Medical Psychology at the University of Bergen.
2 A SCENARIO OF HETEROGENEOUS DATA IN A COHORT STUDY
One major goal of this work is to create a solution that enables the explorative visualization and analysis of data acquired as part of a longitudinal study on cognitive aging. During this study, more than 100 healthy individuals (mean age 60.8 (7.8), 65% females at inclusion) were recruited through advertisements in local newspapers. At inclusion, all the subjects who responded were interviewed, to exclude those reporting previous or present neurological or psychiatric disorders, a history of substance abuse, or other significant medical conditions. A neuropsychological evaluation confirmed that the participants showed no symptoms indicating mild cognitive impairment (MCI) or dementia. Each participant was examined every three years, first in 2004/2005 and then in 2008. The participants were subjected to neuropsychological testing, genetic analysis (data not available for this work), and multimodal MR imaging. The result of each examination consisted of data on white matter fiber integrity, expressed by anisotropy measures computed from diffusion tensor imaging (DTI); cortical and subcortical gray matter measures, automatically calculated from structural MR images; and a number of neuropsychological tests, including the California Verbal Learning Test–Second Version (CVLT-II), the Color–Word Interference Test (CWIT), the Digit Symbol Substitution Task from WAIS-R, and the Mini Mental State Exam (MMSE). To summarize, each examination (per subject and year) consists of:
- white matter fiber bundles with anisotropy measures, where each individual fiber was divided into 100 segments of equal length for the derivation of associated measures;
- gray matter cortical and subcortical regions with quantitative measures for each region;
- scores from different neuropsychological tests.
For a detailed description of the study protocol and for selected previous analyses of this longitudinal study, please refer to Ystad et al. [14].
Fig. 1. a) Illustration of the dimensions (red), measures (green), and entities (blue) in the dataset of the cohort study on cognitive aging. The hierarchy in the figure is used only for presentation, as the presented model treats the dimensions independently. b) Simplified illustration of the proposed model. User interactions are colored red; automatic operations, transparent to the user, are green; information sources are blue; and the components necessary to implement the model are black. Note that the selections require interaction to be used as filters, but they are also automatically re-aggregated upon measure changes in views, or brush changes, and the result is automatically updated in the views. c) Illustration of the projection operation. The dimensions which are not common (in red) are processed using a statistical estimator (e.g., average). This operation can be steered by using a selection for each data-cube to filter the elements that are aggregated.
2.1 A heterogeneous dataset
This study yields a number of measures related to different aspects of cognitive aging. One specific challenge with respect to data exploration and analysis is that the measures' domains overlap only partially. Taking a scatterplot as an example, how should two heterogeneous measures be combined? In our case, these measures could be the fractional anisotropy (FA) of white matter fibers, which describes the degree of anisotropy of water diffusion along a fiber and is defined for each segment of each fiber bundle, and the thickness of the cortex, available for each cortical region in both the left and right brain hemispheres. This partial incompatibility of the data domains proved to be one of the key challenges of this work, if not the key challenge. To overcome it, we developed the method presented in this article, which can seamlessly combine heterogeneous measures on the fly.
2.2 Abstract and physical data and their representation
In such studies, certain measures, such as white matter FA or gray matter region volume, are quantitative abstract measures that relate to physical (anatomical) entities; in this example, the white matter fibers or the gray matter regions. For these entities, additional qualitative data is often also acquired, such as the bundle trajectories, or brain region meshes or volumes. While analyses are often performed on the quantitative measures, it occasionally becomes necessary to fetch and inspect the related anatomical data, for example to explain data outliers, or to see what effects certain conditions have on the anatomy. For these reasons, domain experts would benefit from a system that can link different types of data and bring up the appropriate sets on demand, e.g., in linked views.
In addition, when dealing with abstract views of measures related to physical entities, domain experts often need to relate groups of entities, such as selections, in abstract views to their physical location. To ease this process, we propose to use a view with an illustrative physical model, or atlas, of the entities, which is linked to the other views. Through this atlas, the content of the selections is put in its physical context, improving the understanding of the data. The definition of this model for the specific case described in this article, and its use, are described in Section 4.5.
3 RELATED WORK
While the majority of visualization research (in particular also medical visualization) was, and still is, focused on the visualization of individual datasets, the visualization of data from population studies has not been a research topic until recently. One recent exception is the work of Bruckner et al. [1], presenting a system to retrieve and visualize anatomical brain data of Drosophila, covered in a large database of such flies' brains. This system enables a novel way to perform visual queries, combined with a volume rendering solution called Maximum Intensity Difference Accumulation (MIDA). Still in the biology domain, Jeanquartier and Holzinger presented a visual analytics approach for cell physiology to support the exploration and sense-making process [5]. Steenwijk et al. [10] also presented a novel visual analytics framework to query and visualize data from a cohort study, consisting of imaging and non-imaging data for each subject. Their approach was to preprocess and store the imaging and non-imaging data in a searchable relational database, against which a visual interface would perform dynamic queries. Still in the healthcare domain, Simonic et al. [9] presented a visualization system to improve the prediction and treatment of patients based on longitudinal data.
More generally, a few other visual analysis methods have been proposed for the analysis of higher-dimensional and heterogeneous data. One relevant related solution was presented by North et al. [7], who introduced visualization schemas to achieve the concurrent analysis of different sources of information in relational databases. Their system enables building coordinated visualizations in a similar fashion as constructing relational data schemas. More recently, Weaver used a method called cross-filtered views [13] to interactively drill down into multidimensional relations between multiple datasets. In his method, different variables are visualized in particular views, and brushes in these multiple views are cross-filtered to discover complex relations in the data.
4 A DATA-CUBE BASED MODEL TO ENABLE INTERACTIVE VISUAL ANALYSIS
The typical workflow for analyzing the data coming from such studies is to manually extract the relevant pieces of data from the dataset (e.g., using custom scripts or programs for each analysis), and then process them using mathematical and statistical packages. Finally, plots of the results are generated either using custom scripts, or by importing the results into applications that can plot the data.
The first, and perhaps biggest, challenge in designing an interactive visualization system targeted at this problem is storing the data acquired in such studies in a way that allows fast and flexible access while retaining the meta-information expressing the relationships between the different pieces of data. Organizing the data in a relational database, similarly to Steenwijk et al. [10], is probably the first solution at hand, and possibly the easiest to design from scratch.
However, organizing data in a relational database is relatively inflexible: the database schema, together with the queries associated with it, is bound to the specific structure of a particular study. Using a system designed in such a way to analyze a different dataset would require redefining the database schema, as well as reprogramming the logic for data access. In addition, processing the queried data with mathematical or statistical methods that are not implemented in the database itself would require an additional application layer into which the data would have to be loaded, thus voiding the benefits of using a relational database. Finally, from a performance point of view, using a relational database to perform complex queries touching all the rows of a large amount of data quickly becomes a bottleneck in interactive operations, and this is even more problematic when item selection and measure filtering based on multiple attributes, which require table joins, are used.
With Polaris, Stolte et al. [11] showed how visualization systems can also be built on data organized in an n-dimensional, possibly hierarchical, data-cube, also known as an OLAP cube (for On-Line Analytical Processing) in the field of data warehousing. It has been reported that executing complex queries using OLAP cubes can be about a hundred times faster than doing the same on relational data [4]. A single, hierarchical data-cube organization, however, shows its limitations when the dataset, and its dimensionality, become heterogeneous.
4.1 Data-cubes: dimensions, entities and measures
In our model, data-cubes are constructed using categorical attributes as dimensions, while quantitative numerical values are stored as measures [11]. The dimensions and measures can be thought of as independent and dependent variables, respectively, and dimension coordinates are used to access the measures. Practically, after assigning an order to the dimensions of a cube, a data-cube can be implemented as an in-memory n-dimensional array. As an example from the system presented in this paper, a measure for segments of white matter fiber bundles in our dataset, e.g., FA, is represented as a floating-point n-dimensional array with n = 4 dimensions: subject, year, bundle, and segment.
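To make this concrete, the following sketch shows one way such a measure cube could be held in memory. It is a minimal illustration in Python/NumPy; all names and array sizes are hypothetical and not taken from the actual prototype, which is not described at the code level in this article.

```python
import numpy as np

# A minimal sketch of the data-cube idea (names and sizes are hypothetical).
# Categorical dimensions index the cube; the quantitative measure fills it.
dims = ("subject", "year", "bundle", "segment")
n_subjects, n_years, n_bundles, n_segments = 100, 2, 18, 100

# One measure (fractional anisotropy) as an in-memory 4-D array.
fa = np.full((n_subjects, n_years, n_bundles, n_segments), np.nan)

# Dimension coordinates map categorical values to array indices.
years = {"2004/2005": 0, "2008": 1}

# Accessing a measure value via dimension coordinates:
# FA of subject 12, examination 2008, bundle 3, segment 57.
value = fa[12, years["2008"], 3, 57]
```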
Compared to the model proposed for Polaris, we also introduce a third concept, called the entity. An entity can be thought of as a row in a database table, whose quantitative fields would be the measures for that entity. In the example above, the measure fibersegment.fa (fa for fractional anisotropy) would be related to the entity fibersegment, being a measure of that entity. When a data selection is defined in our model, it also contains selection values for entities, which are then propagated to the related measures when necessary.
4.2 Multiple data-cubes and seamless dimension aggregation
A challenging feature of the data acquired in cohort studies is their heterogeneity. This means that measures are collected for different entities, which do not share the same set of dimensions. In our specific case, the entities are white matter fiber segments, gray matter subcortical regions, gray matter cortical regions, and neuropsychological tests. As shown in Figure 1a, the dimension sets of the measures overlap only partially, with all these entities having only two dimensions in common: subject and year. The standard way to organize these data into a single data-cube would be to build a denormalized cube characterized by all the dimensions in the dataset, which would contain all the data. When the data is significantly heterogeneous, however, this strategy may lead to an explosion of the memory requirements, caused by the denormalization.
In the model that we present here, the solution to this problem is twofold. On the one hand, we store all the data in multiple, normalized data-cubes, to eliminate any kind of information redundancy and minimize the memory footprint. On the other hand, we propose runtime aggregation of the measures' data-cubes whenever data held in data-cubes belonging to different entities have to be combined or cross-checked. Such an aggregation operation is also referred to as the projection of a data-cube [11] (see Fig. 1c). Our model includes an engine to perform this aggregation on the fly, reducing the data-cubes' dimensions to their largest common subset, without any embedded knowledge of the relations between measures. Such knowledge would, in contrast, be necessary when using a relational model for the data, as the system would need to incorporate knowledge about each specific database schema, together with the logic for performing the operations.
In our model, when multiple measures are combined in a visualization (e.g., in a scatterplot, a parallel-coordinates view, a curve view, etc.), each measure is aggregated across those dimensions not belonging to the intersection. For the moment we can consider the mean as the measure aggregator, but there are several other options, such as different statistical estimators, which can be selected by the user.
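As an illustration, the following sketch projects two measure cubes from the running example onto their common dimensions. It is a simplified rendition of the projection operation, using the same hypothetical names and shapes as before and the mean as the aggregator.

```python
import numpy as np

# Two heterogeneous measures with only partially overlapping dimensions
# (hypothetical sizes, following the running example).
fa = np.random.rand(100, 2, 18, 100)      # (subject, year, bundle, segment)
thickness = np.random.rand(100, 2, 68)    # (subject, year, cortical_region)

def project(cube, dims, keep):
    """Aggregate a cube over all dimensions not in `keep` (mean estimator)."""
    drop = tuple(i for i, d in enumerate(dims) if d not in keep)
    return np.nanmean(cube, axis=drop)

common = ("subject", "year")
fa_common = project(fa, ("subject", "year", "bundle", "segment"), common)
th_common = project(thickness, ("subject", "year", "cortical_region"), common)

# Both are now (subject, year) arrays and can be combined, e.g. in a scatterplot.
assert fa_common.shape == th_common.shape == (100, 2)
```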
In certain cases, it is also useful to change the level of detail. To allow this, we enable toggling which common dimensions to keep during the aggregation. This is similar to a roll-up operation, with the difference that the dimension structure is treated as hierarchy-less.
Finally, even if some of the dimensions may embed a hierarchy, others are independent of each other. For example, it is easy to imagine that subject is independent of the other dimensions, while bundle and segment are logically nested, as segments are part of a bundle. Nevertheless, an imposed hierarchy over all the dimensions is useful to represent the data in a tree-like visualization and to let the user navigate the dataset (as shown in Fig. 1a). To compute such a hierarchy, we group entities recursively by the number of common dimensions, with each group reflecting dimensions occurring in the same number of entities. By letting the dimensions that occur in more entities float higher in the tree hierarchy, and then proceeding recursively on subgroups, we can generate a complete hierarchy. Having defined such a hierarchy, it is possible to represent the measures in our cohort study data as in Fig. 1a.
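A minimal sketch of this ordering idea follows. The entity-to-dimension mapping below is hypothetical (loosely mirroring Fig. 1a), and since the article does not spell out the recursive grouping in detail, the sketch only shows the first step: ranking dimensions by how many entities they occur in.

```python
from collections import Counter

# Hypothetical entity -> dimensions mapping, loosely mirroring Fig. 1a.
entities = {
    "fiber_segment":      {"subject", "year", "bundle", "segment"},
    "cortical_region":    {"subject", "year", "cort_region"},
    "subcortical_region": {"subject", "year", "subcort_region"},
    "neuropsych_test":    {"subject", "year", "test"},
}

# Count in how many entities each dimension occurs; dimensions shared by
# more entities float higher in the presentation hierarchy.
occurrence = Counter(d for dims in entities.values() for d in dims)
hierarchy = sorted(occurrence, key=lambda d: occurrence[d], reverse=True)
print(hierarchy)  # subject and year first, entity-specific dimensions below
```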
4.3 Selections and selection-based filtering
In Section 4.2 we explained how to create projections of a measure by aggregating it over entire dimensions. Obtaining an aggregate of a measure over a whole brain, however, may not always produce specific enough data to answer the questions of interest. To enable a more focused analysis, selection techniques can be used to restrict the processed or visualized data to the specific subsets under investigation. One example is the Polaris specifications [11], introduced for defining selections. Interactive visual analysis has introduced the related concept of brushing, a visual method to select items with certain characteristics (e.g., fitting certain ranges of specific measures) by defining a visual brush over a view on the data. These brushes normally contain a value for each data item, either a binary or a percentage value, expressing whether or how much the data item is selected. Our model makes use of brushing to let the user define data selections. Using data-cubes, such a brush is transformed into a data-cube itself, where each item contains the tag information for the related entity. In our case, having several entities in the dataset generates an additional challenge: when tagging one entity, we must also propagate the selection to all those other entities in the dataset sharing at least one dimension with the tagged one. As a clarifying example, let us consider a selection of only those white matter fiber segments above a certain FA threshold. Such a selection does not necessarily involve all the examinations, or even all the subjects. Let us say the user wants to cross only the items in this selection with the cortical thickness. Then this selection has to be propagated to the entity cortical region, knowing that the shared dimensions between the entities cortical region and fiber segment are subject and year. This has to be done in an appropriate manner, so that only those (subject, year) pairs selected in one entity are also selected in the other.

Fig. 2. Screenshot of the prototype of the proposed model. The Measure Browser (a) lets the user drag desired measures into a view, and the Selection Manager (b) allows the user to add new selections, activate them, enable one of them for editing, and drag them into views to be used as filters. The Dimension Brusher (c) enables slicing the data-cubes in the data collection, while the other views (d, e, f), in this setup a scatterplot, a curve view and a histogram view, can be seen as projections of the data, and allow a more advanced definition of the selections by means of brushing ranges of measures. In each view a drop-down menu lets the user adjust the aggregation dimensions as well as the additional analyses to perform. Finally, the Atlas view (g) represents the selections in their anatomical context using a brain model. Of the two visualized selections, the first contains both the fibers and the brain region of the anterior corpus callosum, and the second contains both the fibers of the corticospinal tract and the brainstem region (colors represent different bundles).
In our model, we propose a propagation scheme where a brush on one entity is propagated to all the other entities in the dataset that share dimensions with the brushed one. The propagation is done by first computing a projection of the brushed entity onto the dimensions it has in common with each of the other entities. These projections of the brush are generated using the max operator, which produces, for each set of items being aggregated along one aggregation coordinate, the equivalent of a Boolean value indicating whether or not at least one item was selected. This scheme also allows multiple selections to be combined using Boolean logic, giving the user the necessary flexibility in building up expressive item selections.
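The following sketch illustrates this propagation for the running example. It is again a hypothetical NumPy rendition with assumed shapes, not the prototype's code.

```python
import numpy as np

# A brush on fiber segments, as a Boolean data-cube over that entity's
# dimensions (subject, year, bundle, segment); sizes are hypothetical.
fa = np.random.rand(100, 2, 18, 100)
brush_fa = fa > 0.9

# Project the brush onto the dimensions shared with cortical regions,
# (subject, year), using the max operator: a pair is selected if at
# least one of its fiber segments is selected.
brush_common = brush_fa.max(axis=(2, 3))                  # shape (100, 2)

# Broadcast the propagated brush to the cortical-region entity
# (subject, year, cortical_region) and combine it, via Boolean logic,
# with another (here random) selection on that entity.
n_regions = 68
brush_cortical = np.broadcast_to(brush_common[:, :, None], (100, 2, n_regions))
other_selection = np.random.rand(100, 2, n_regions) > 0.5
combined = brush_cortical & other_selection
```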
Once a selection has been defined, it can be used in two ways. First, selections can be visually highlighted in the views, and thus compared with the whole dataset or with other selections. Second, since most of the views are built upon aggregated data-cubes, this aggregation can be steered, or filtered, using a selection (Fig. 2d). When a selection is set as the aggregation filter, the aggregation is performed using only those items that are tagged in the selection. In this way, carefully selected information from the dataset can be cross-checked against other aspects, enabling the user to analyze virtually any aspect of the dataset.
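A filtered aggregation of this kind could look as follows. As before, this is a minimal sketch with assumed shapes, realizing the filter by NaN-masking with the mean estimator.

```python
import numpy as np

# Selection-steered aggregation: only items tagged in the selection
# contribute to the aggregate (hypothetical shapes as in the examples above).
fa = np.random.rand(100, 2, 18, 100)      # (subject, year, bundle, segment)
selection = fa > 0.4                      # a propagated brush on fiber segments

masked = np.where(selection, fa, np.nan)        # hide unselected items
fa_per_exam = np.nanmean(masked, axis=(2, 3))   # filtered mean per (subject, year)
```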
4.4 Unrolling dimensions: a first step toward iterated visual analysis
Interactively using a system that implements our model is a flexible way to cross-analyze a wide variety of information in such heterogeneous datasets. In some cases, however, the analysis can benefit from automating certain steps, like repeating selected tests or analyses, following a scheme defined by the user, on different data or with varying parameters or methods. This can be seen as extending the purely interactive visual analysis metaphor by using it as an analysis-setup tool for defining which types of actions to automate. The result of this extension can be thought of as an iterated visual analysis. A clarifying example is correlating age with subcortical region volume. The user could first define a selection, for example by filtering specific ages, or other parameters such as the IQ. This selection could then be used to filter the aggregation, which would conclude the interactive analysis step. Since it is also interesting to see in detail how the volume of each specific subcortical region correlates with age, the user might want to combine the interactively specified selection with another one selecting only a specific subcortical region, and repeat the process for every subcortical region. To ease this process, while also producing comparable results, we propose a method to automatically dissect and process the measures present in a specific view, by iteratively slabbing each measure's data-cube along those dimensions that are specific to that data-cube (i.e., not common). In the example above, the only non-common dimension in a view containing only age and subcortical region volume is the subcortical region, as the year and subject dimensions are common to both entities (see Fig. 1a). The expression "unrolling a dimension" here means automatically generating a sequence of selections for an entity having that dimension, each selection containing only the data items along one specific coordinate of that dimension at a time. The user can choose one or more of the non-common dimensions in the view to unroll, and the automatically generated selections are combined with a user-specified one, if present, before aggregation and further analysis take place.
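In code, unrolling could look like the following sketch: a sequence of one-coordinate selections generated along one dimension and combined with a user brush. This is a hypothetical NumPy rendition, consistent with the earlier examples.

```python
import numpy as np

# Hypothetical volume cube for subcortical regions: (subject, year, region).
volume = np.random.rand(100, 2, 14)
user_brush = volume > 0.1                 # an interactively defined selection

# Unroll the non-common dimension `region` (axis 2): generate one selection
# per coordinate, combine it with the user brush, and aggregate each result.
results = []
for region in range(volume.shape[2]):
    unrolled = np.zeros_like(user_brush)  # selection for this coordinate only
    unrolled[:, :, region] = True
    combined = user_brush & unrolled
    masked = np.where(combined, volume, np.nan)
    results.append(np.nanmean(masked, axis=(1, 2)))  # per-subject aggregate

per_region = np.stack(results)            # (region, subject) table of aggregates
```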
When performing dimension unrolling, however, a large amount of data is generated, and we currently deal with it by outputting only the
References
S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 1997.
C. Stolte, D. Tang, and P. Hanrahan. Polaris: a system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8(1), 2002.
C. Stolte and P. Hanrahan. Polaris: a system for query, analysis and visualization of multi-dimensional relational databases. In Proc. IEEE Symposium on Information Visualization (InfoVis), 2000.
M. P. Milham et al. Practice-related effects demonstrate complementary roles of anterior cingulate and prefrontal cortices in attentional control. NeuroImage, 2003.