Interactive Visual Analysis of Heterogeneous Cohort Study Data
Paolo Angelelli, Steffen Oeltze, Judit Haász, Cagatay Turkay, Erlend Hodneland, Arvid Lundervold, Astri J. Lundervold, Bernhard Preim and Helwig Hauser
Abstract—Cohort studies in medicine are conducted to enable the study of medical hypotheses in large samples. Often, a large amount of heterogeneous data is acquired from many subjects. The analysis is usually hypothesis-driven, i.e., a specific subset of such data is studied to confirm or reject specific hypotheses. In this paper, we demonstrate how we enable the interactive visual exploration and analysis of such data, helping with the generation of new hypotheses and contributing to the process of validating them. We propose a data-cube-based model which handles partially overlapping data subsets during the interactive visualization. This model enables seamless integration of the heterogeneous data, as well as linking spatial and non-spatial views on these data. We implemented this model in an application prototype, and used it to analyze data acquired in the context of a cohort study on cognitive aging. We present case-study analyses of selected aspects of brain connectivity, carried out with the prototype implementation of the presented model, to demonstrate its potential and flexibility.
Index Terms—heterogeneous data, medical visualization, IVA
1 INTRODUCTION
Cohort studies in medicine are becoming increasingly common, partly thanks to the availability of, and recent improvements in, medical imaging technologies. Such studies are a type of observational study that follows one or more groups of people (samples), called cohorts, over time. They are used to evaluate medical hypotheses in samples sharing common characteristics, for example being healthy or presenting specific risk factors, to gain a better understanding of the absolute risks of certain pathologies and of how these pathologies develop. Cohort study data are often acquired over long time periods, following strictly defined protocols, so such studies are not trivial to set up. Because of this, they are often designed to deliver a larger variety of data than the focus of the initial study requires; this additional data can later serve as the basis for retrospective analyses evaluating further sets of hypotheses.
There are means to evaluate specific hypotheses based on such cohort study data, often involving accordingly designed data extraction, transformation, and fusion approaches. However, there is a lack of technology to support the flexible and open-ended exploration of such data, mostly because of its heterogeneity: collections of image and non-image (quantitative, often image-derived) data, which in turn can be categorical or numerical, and which are defined on domains that only partly overlap. Due to the complexities posed by this heterogeneity, analysts often have to limit their attention to subsets of the data, so the analysis loses the overall relations across different modalities. Integrating all the available data within one visual analysis tool that allows them to be combined seamlessly, on demand, is expected to support experts in exploring heterogeneous cohort study data and in generating and verifying hypotheses, and to accelerate their research workflow.
The exploration and analysis of heterogeneous cohort study data generates specific new challenges for visualization. The contribution of this article is therefore twofold. First, in Section 2, we characterize these challenges, in relation to the substantial heterogeneity of the data and in relation to the analysis tasks, goals, and typical analysis workflow in the specific context of a cohort study on cognitive aging. Second, in Section 4, we describe our solution, based on a new, general multi-data-cube model that supports heterogeneous data and that can also be adapted to other highly heterogeneous problems. Finally, in Section 5 we describe our prototype implementation of the model, which, in Section 6, we use to exemplify how our novel approach can enable the generation of new hypotheses, as well as the swift analysis of relations between otherwise unconnected parts of the data, thus improving the analysis and exploration process. In Section 6 we also provide an evaluation of our method by two domain experts from the medical and neuropsychological domains.

Paolo Angelelli and Helwig Hauser are with the Department of Informatics at the University of Bergen. E-mail: paolo.angelelli@uib.no.
Cagatay Turkay is with the giCentre at City University London.
Steffen Oeltze and Bernhard Preim are with the Department of Informatics at the University of Magdeburg.
Judit Haász, Erlend Hodneland and Arvid Lundervold are with the Department of Biomedicine at the University of Bergen.
Astri J. Lundervold is with the Department of Biological and Medical Psychology at the University of Bergen.
2 A SCENARIO OF HETEROGENEOUS DATA IN A COHORT STUDY
One major goal of this work is to create a solution that enables the explorative visualization and analysis of data acquired as part of a longitudinal study on cognitive aging. During this study, more than 100 healthy individuals (mean age 60.8 (7.8), 65% females at inclusion) were recruited through advertisements in local newspapers. At inclusion, all the subjects who responded were interviewed, to exclude those reporting previous or present neurological or psychiatric disorders, a history of substance abuse, or other significant medical conditions. A neuropsychological evaluation confirmed that the participants showed no symptoms indicating mild cognitive impairment (MCI) or dementia. Each participant was examined every three years, first in 2004/2005 and then in 2008. The participants were subjected to neuropsychological testing, genetic analysis (data not available for this work), and multimodal MR imaging. The result of each examination consisted of data on white matter fiber integrity, expressed by anisotropy measures computed from diffusion tensor imaging (DTI); cortical and subcortical gray matter measures, automatically calculated from structural MR images; and a number of neuropsychological tests, including the California Verbal Learning Test–Second Version (CVLT-II), the Color–Word Interference Test (CWIT), the Digit Symbol Substitution Task from WAIS-R, and the Mini Mental State Exam (MMSE). To summarize, each examination (per subject and year) consists of:
- white matter fiber bundles with anisotropy measures, where each individual fiber was divided into 100 segments of equal length for the derivation of associated measures;
- gray matter cortical and subcortical regions with quantitative measures for each region;
- scores from different neuropsychological tests.
For a detailed description of the study protocol and for selected previous analyses of this longitudinal study, please refer to Ystad et al. [14].
Fig. 1. a) Illustration of the dimensions (red), measures (green), and entities (blue) in the dataset of the cohort study on cognitive aging. The hierarchy in the figure is used only for presentation, as the presented model treats the dimensions independently. b) Simplified illustration of the proposed model. User interactions are colored red; automatic operations, transparent to the user, are green; information sources are blue; and the components necessary to implement the model are black. Note that the selections require interaction to be used as filters, but they are also automatically re-aggregated upon measure changes in views, or brush changes, and the result is automatically updated in the views. c) Illustration of the projection operation. The dimensions which are not common (in red) are processed using a statistical estimator (e.g., average). This operation can be steered by using a selection for each data-cube to filter the elements that are aggregated.
2.1 A heterogeneous dataset
This study yields a number of measures related to different aspects of cognitive aging. One specific challenge with respect to data exploration and analysis is that the measures' domains overlap only partially. Taking a scatterplot as an example, how should two heterogeneous measures be combined? In our case, these measures could be the fractional anisotropy (FA) of white matter fibers, which describes the degree of anisotropy of water diffusion along a fiber and is defined for each segment of each fiber bundle, and the thickness of the cortex, available for each cortical region in both the left and right brain hemispheres. This partial incompatibility of the data domains proved to be one of the key challenges of this work, if not the key challenge. To overcome it, we developed the method presented in this article, which can seamlessly combine heterogeneous measures on the fly.
2.2 Abstract and physical data and their representation
In such studies, certain measures, such as white matter FA or gray matter region volume, are quantitative abstract measures that relate to physical (anatomical) entities; in this example, the white matter fibers or the gray matter regions. For these entities, additional qualitative data is often also acquired, such as the bundle trajectories, or brain region meshes or volumes. While analyses are often performed on the quantitative measures, it occasionally becomes necessary to fetch and inspect the related anatomical data, for example to explain data outliers, or to see what effects certain conditions have on the anatomy. For these reasons, domain experts would benefit from a system that can link different types of data and bring up the appropriate sets on demand, e.g., in linked views.
In addition, when dealing with abstract views of measures related to physical entities, domain experts often need to relate groups of entities, such as selections, in abstract views to their physical location. To ease this process, we propose to use a view with an illustrative physical model, or atlas, of the entities, which is linked to the other views. Through this atlas, the content of the selections is put in its physical context, improving the understanding of the data. The definition of this model for the specific case described in this article, and its use, are described in Section 4.5.
3 RELATED WORK
While the majority of visualization research (in particular also medical visualization) was, and still is, focused on the visualization of individual datasets, the visualization of data from population studies has not been a research topic until recently. One recent exception is the work of Bruckner et al. [1], presenting a system to retrieve and visualize anatomical brain data of Drosophila, covered in a large database of such flies' brains. This system enables a novel way to perform visual queries, combined with a volume rendering solution called Maximum Intensity Difference Accumulation (MIDA). Still in the biology domain, Jeanquartier and Holzinger presented a visual analytics approach for cell physiology to support the exploration and sense-making process [5]. Steenwijk et al. [10] also presented a novel visual analytics framework to query and visualize data from a cohort study, consisting of imaging and non-imaging data for each subject. Their approach was to preprocess and store the imaging and non-imaging data in a searchable relational database, against which a visual interface would perform dynamic queries. Still in the healthcare domain, Simonic et al. [9] presented a visualization system to improve the prediction and treatment of patients based on longitudinal data.
More generally, a few other visual analysis methods have been proposed for the analysis of higher-dimensional and heterogeneous data. One relevant related solution was presented by North et al. [7], who introduced visualization schemas to achieve the concurrent analysis of different sources of information in relational databases. Their system enables building coordinated visualizations in a similar fashion as constructing relational data schemas. More recently, Weaver used a method called cross-filtered views [13] to interactively drill down into multidimensional relations between multiple datasets. In his method, different variables are visualized in particular views, and brushes in these multiple views are cross-filtered to discover complex relations in the data.
4 A DATA-CUBE BASED MODEL TO ENABLE INTERACTIVE VISUAL ANALYSIS
The typical workflow for analyzing the data coming from such studies is to manually extract the relevant pieces of data from the dataset (e.g., using custom scripts or programs for each analysis), and then process them using mathematical and statistical packages. Finally, plots of the results are generated either using custom scripts, or by importing the results into applications that can plot the data.
The first, and perhaps biggest, challenge in designing an interactive visualization system targeted at this problem is storing the data acquired in such studies in a way that allows fast and flexible access while retaining the meta-information expressing the relationships between the different pieces of data. Organizing the data in a relational database, similarly to Steenwijk et al. [10], is probably the first solution at hand, and possibly the easiest to design from scratch.
However, organizing data in a relational database is relatively inflexible: the database schema, together with the queries associated with it, is bound to the specific structure of a particular study. Using a system designed in such a way to analyze a different dataset would require redefining the database schema, as well as reprogramming the logic for data access. In addition, processing the queried data with mathematical or statistical methods that are not implemented in the database itself would require an additional application layer into which the data would have to be loaded, thus voiding the benefits of using a relational database. Finally, from a performance point of view, using a relational database to perform complex queries touching all the rows of a large amount of data quickly becomes a bottleneck in interactive operations, and this is even more problematic when item selection and measure filtering based on multiple attributes, which require table joins, are used.
With Polaris, Stolte et al. [11] showed how visualization systems can also be built on data organized in an n-dimensional, possibly hierarchical, data-cube, also known as an OLAP cube (for On-Line Analytical Processing) in the field of data warehousing. It has been reported that executing complex queries using OLAP cubes can be about a hundred times faster than doing the same on relational data [4]. A single, hierarchical data-cube organization, however, shows its limitations when the dataset, and its dimensionality, become heterogeneous.
4.1 Data-cubes: dimensions, entities and measures
In our model, data-cubes are constructed using categorical attributes as dimensions, while quantitative numerical values are stored as measures [11]. The dimensions and measures can be thought of as independent and dependent variables, respectively, and dimension coordinates are used to access the measures. Practically, after assigning an order to the dimensions of a cube, a data-cube can be implemented as an in-memory n-dimensional array. As an example from the system presented in this paper, a measure for segments of white matter fiber bundles in our dataset, e.g., FA, is represented as a floating-point n-dimensional array with n = 4 dimensions: subject, year, bundle, and segment.
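To make this concrete, the following sketch shows one way such a measure cube could be held in memory. It is a minimal illustration in Python/NumPy; all names and array sizes are hypothetical and not taken from the actual prototype, which is not described at the code level in this article.

```python
import numpy as np

# A minimal sketch of the data-cube idea (names and sizes are hypothetical).
# Categorical dimensions index the cube; the quantitative measure fills it.
dims = ("subject", "year", "bundle", "segment")
n_subjects, n_years, n_bundles, n_segments = 100, 2, 18, 100

# One measure (fractional anisotropy) as an in-memory 4-D array.
fa = np.full((n_subjects, n_years, n_bundles, n_segments), np.nan)

# Dimension coordinates map categorical values to array indices.
years = {"2004/2005": 0, "2008": 1}

# Accessing a measure value via dimension coordinates:
# FA of subject 12, examination 2008, bundle 3, segment 57.
value = fa[12, years["2008"], 3, 57]
```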
Compared to the model proposed for Polaris, we also introduce a third concept, called the entity. An entity can be thought of as a row in a database table, whose quantitative fields would be the measures for that entity. In the example above, the measure fibersegment.fa (fa for fractional anisotropy) would be related to the entity fibersegment, being a measure of that entity. When a data selection is defined in our model, it also contains selection values for entities, which are then propagated to the related measures when necessary.
4.2 Multiple data-cubes and seamless dimension aggregation
A challenging feature of the data acquired in cohort studies is their heterogeneity. This means that measures are collected for different entities, which do not share the same set of dimensions. In our specific case, the entities are white matter fiber segments, gray matter subcortical regions, gray matter cortical regions, and neuropsychological tests. As shown in Figure 1a, the dimension sets of the measures overlap only partially, with all these entities having only two dimensions in common: subject and year. The standard way to organize these data into a single data-cube would be to build a denormalized cube characterized by all the dimensions in the dataset, which would contain all the data. When the data is significantly heterogeneous, however, this strategy may lead to an explosion of the memory requirements, caused by the denormalization.
In the model that we present here, the solution to this problem is twofold. On the one hand, we store all the data in multiple, normalized data-cubes, to eliminate any kind of information redundancy and minimize the memory footprint. On the other hand, we propose runtime aggregation of the measures' data-cubes whenever data held in data-cubes belonging to different entities have to be combined or cross-checked. Such an aggregation operation is also referred to as the projection of a data-cube [11] (see Fig. 1c). Our model includes an engine to perform this aggregation on the fly, reducing the data-cubes' dimensions to their largest common subset, without any embedded knowledge of the relations between measures. Such knowledge would, in contrast, be necessary when using a relational model for the data, as the system would need to incorporate knowledge about each specific database schema, together with the logic for performing the operations.
In our model, when multiple measures are combined in a visualization (e.g., in a scatterplot, a parallel-coordinates view, a curve view, etc.), each measure is aggregated across those dimensions not belonging to the intersection. For the moment we can consider the mean as the measure aggregator, but there are several other options, such as different statistical estimators, which can be selected by the user.
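As an illustration, the following sketch projects two measure cubes from the running example onto their common dimensions. It is a simplified rendition of the projection operation, using the same hypothetical names and shapes as before and the mean as the aggregator.

```python
import numpy as np

# Two heterogeneous measures with only partially overlapping dimensions
# (hypothetical sizes, following the running example).
fa = np.random.rand(100, 2, 18, 100)      # (subject, year, bundle, segment)
thickness = np.random.rand(100, 2, 68)    # (subject, year, cortical_region)

def project(cube, dims, keep):
    """Aggregate a cube over all dimensions not in `keep` (mean estimator)."""
    drop = tuple(i for i, d in enumerate(dims) if d not in keep)
    return np.nanmean(cube, axis=drop)

common = ("subject", "year")
fa_common = project(fa, ("subject", "year", "bundle", "segment"), common)
th_common = project(thickness, ("subject", "year", "cortical_region"), common)

# Both are now (subject, year) arrays and can be combined, e.g. in a scatterplot.
assert fa_common.shape == th_common.shape == (100, 2)
```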
In certain cases, it is also useful to change the level of detail. To allow this, we enable toggling which common dimensions to keep during the aggregation. This is similar to a roll-up operation, with the difference that the dimension structure is treated as hierarchy-less.
Finally, even if some of the dimensions may embed a hierarchy, others are independent of each other. For example, it is easy to imagine that subject is independent of the other dimensions, while bundle and segment are logically nested, as segments are part of a bundle. Nevertheless, an imposed hierarchy over all the dimensions is useful to represent the data in a tree-like visualization and to let the user navigate the dataset (as shown in Fig. 1a). To compute such a hierarchy, we group entities recursively by the number of common dimensions, with each group reflecting dimensions occurring in the same number of entities. By letting the dimensions that occur in more entities float higher in the tree hierarchy, and then proceeding recursively on subgroups, we can generate a complete hierarchy. Having defined such a hierarchy, it is possible to represent the measures in our cohort study data as in Fig. 1a.
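A minimal sketch of this ordering idea follows. The entity-to-dimension mapping below is hypothetical (loosely mirroring Fig. 1a), and since the article does not spell out the recursive grouping in detail, the sketch only shows the first step: ranking dimensions by how many entities they occur in.

```python
from collections import Counter

# Hypothetical entity -> dimensions mapping, loosely mirroring Fig. 1a.
entities = {
    "fiber_segment":      {"subject", "year", "bundle", "segment"},
    "cortical_region":    {"subject", "year", "cort_region"},
    "subcortical_region": {"subject", "year", "subcort_region"},
    "neuropsych_test":    {"subject", "year", "test"},
}

# Count in how many entities each dimension occurs; dimensions shared by
# more entities float higher in the presentation hierarchy.
occurrence = Counter(d for dims in entities.values() for d in dims)
hierarchy = sorted(occurrence, key=lambda d: occurrence[d], reverse=True)
print(hierarchy)  # subject and year first, entity-specific dimensions below
```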
4.3 Selections and selection-based filtering
In Section 4.2 we explained how to create projections of a measure by aggregating it over entire dimensions. Obtaining an aggregate of a measure over a whole brain, however, may not always produce specific enough data to answer the questions of interest. To enable a more focused analysis, selection techniques can be used to restrict the processed or visualized data to the specific subsets under investigation. One example is the Polaris specifications [11], introduced for defining selections. Interactive visual analysis has introduced the related concept of brushing, a visual method to select items with certain characteristics (e.g., fitting certain ranges of specific measures) by defining a visual brush over a view on the data. These brushes normally contain a value for each data item, either a binary or a percentage value, expressing whether or how much the data item is selected. Our model makes use of brushing to let the user define data selections. Using data-cubes, such a brush is transformed into a data-cube itself, where each item contains the tag information for the related entity. In our case, having several entities in the dataset generates an additional challenge: when tagging one entity, we must also propagate the selection to all those other entities in the dataset sharing at least one dimension with the tagged one. As a clarifying example, let us consider a selection of only those white matter fiber segments above a certain FA threshold. Such a selection does not necessarily involve all the examinations, or even all the subjects. Let us say the user wants to cross only the items in this selection with the cortical thickness. Then this selection has to be propagated to the entity cortical region, knowing that the shared dimensions between the entities cortical region and fiber segment are subject and year. This has to be done in an appropriate manner, so that only those (subject, year) pairs selected in one entity are also selected in the other.

Fig. 2. Screenshot of the prototype of the proposed model. The Measure Browser (a) lets the user drag desired measures into a view, and the Selection Manager (b) allows the user to add new selections, activate them, enable one of them for editing, and drag them into views to be used as filters. The Dimension Brusher (c) enables slicing the data-cubes in the data collection, while the other views (d, e, f), in this setup a scatterplot, a curve view and a histogram view, can be seen as projections of the data, and allow a more advanced definition of the selections by means of brushing ranges of measures. In each view a drop-down menu lets the user adjust the aggregation dimensions as well as the additional analyses to perform. Finally, the Atlas view (g) represents the selections in their anatomical context using a brain model. Of the two visualized selections, the first contains both the fibers and the brain region of the anterior corpus callosum, and the second contains both the fibers of the corticospinal tract and the brainstem region (colors represent different bundles).
In our model, we propose a propagation scheme where a brush on one entity is propagated to all the other entities in the dataset that share dimensions with the brushed one. The propagation is done by first computing a projection of the brushed entity onto the dimensions it has in common with each of the other entities. These projections of the brush are generated using the max operator, which produces, for each set of items being aggregated along one aggregation coordinate, the equivalent of a Boolean value indicating whether or not at least one item was selected. This scheme also allows multiple selections to be combined using Boolean logic, giving the user the necessary flexibility in building up expressive item selections.
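The following sketch illustrates this propagation for the running example. It is again a hypothetical NumPy rendition with assumed shapes, not the prototype's code.

```python
import numpy as np

# A brush on fiber segments, as a Boolean data-cube over that entity's
# dimensions (subject, year, bundle, segment); sizes are hypothetical.
fa = np.random.rand(100, 2, 18, 100)
brush_fa = fa > 0.9

# Project the brush onto the dimensions shared with cortical regions,
# (subject, year), using the max operator: a pair is selected if at
# least one of its fiber segments is selected.
brush_common = brush_fa.max(axis=(2, 3))                  # shape (100, 2)

# Broadcast the propagated brush to the cortical-region entity
# (subject, year, cortical_region) and combine it, via Boolean logic,
# with another (here random) selection on that entity.
n_regions = 68
brush_cortical = np.broadcast_to(brush_common[:, :, None], (100, 2, n_regions))
other_selection = np.random.rand(100, 2, n_regions) > 0.5
combined = brush_cortical & other_selection
```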
Once a selection has been defined, it can be used in two ways. First, selections can be visually highlighted in the views, and thus compared with the whole dataset or with other selections. Second, since most of the views are built upon aggregated data-cubes, this aggregation can be steered, or filtered, using a selection (Fig. 2d). When a selection is set as the aggregation filter, the aggregation is performed using only those items that are tagged in the selection. In this way, carefully selected information from the dataset can be cross-checked against other aspects, enabling the user to analyze virtually any aspect of the dataset.
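A filtered aggregation of this kind could look as follows. As before, this is a minimal sketch with assumed shapes, realizing the filter by NaN-masking with the mean estimator.

```python
import numpy as np

# Selection-steered aggregation: only items tagged in the selection
# contribute to the aggregate (hypothetical shapes as in the examples above).
fa = np.random.rand(100, 2, 18, 100)      # (subject, year, bundle, segment)
selection = fa > 0.4                      # a propagated brush on fiber segments

masked = np.where(selection, fa, np.nan)        # hide unselected items
fa_per_exam = np.nanmean(masked, axis=(2, 3))   # filtered mean per (subject, year)
```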
4.4 Unrolling dimensions: a first step toward iterated visual analysis
Interactively using a system that implements our model is a flexible way to cross-analyze a wide variety of information in such heterogeneous datasets. In some cases, however, the analysis can benefit from automating certain steps, like repeating selected tests or analyses, following a scheme defined by the user, on different data or with varying parameters or methods. This can be seen as extending the purely interactive visual analysis metaphor by using it as an analysis-setup tool for defining which types of actions to automate. The result of this extension can be thought of as an iterated visual analysis. A clarifying example is correlating age with subcortical region volume. The user could first define a selection, for example by filtering specific ages, or other parameters such as the IQ. This selection could then be used to filter the aggregation, which would conclude the interactive analysis step. Since it is also interesting to see in detail how the volume of each specific subcortical region correlates with age, the user might want to combine the interactively specified selection with another one selecting only a specific subcortical region, and repeat the process for every subcortical region. To ease this process, while also producing comparable results, we propose a method to automatically dissect and process the measures present in a specific view, by iteratively slabbing each measure's data-cube along those dimensions that are specific to that data-cube (i.e., not common). In the example above, the only non-common dimension in a view containing only age and subcortical region volume is the subcortical region, as the year and subject dimensions are common to both entities (see Fig. 1a). The expression "unrolling a dimension" here means automatically generating a sequence of selections for an entity having that dimension, each selection containing only the data items along one specific coordinate of that dimension at a time. The user can choose one or more of the non-common dimensions in the view to unroll, and the automatically generated selections are combined with a user-specified one, if present, before aggregation and further analysis take place.
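In code, unrolling could look like the following sketch: a sequence of one-coordinate selections generated along one dimension and combined with a user brush. This is a hypothetical NumPy rendition, consistent with the earlier examples.

```python
import numpy as np

# Hypothetical volume cube for subcortical regions: (subject, year, region).
volume = np.random.rand(100, 2, 14)
user_brush = volume > 0.1                 # an interactively defined selection

# Unroll the non-common dimension `region` (axis 2): generate one selection
# per coordinate, combine it with the user brush, and aggregate each result.
results = []
for region in range(volume.shape[2]):
    unrolled = np.zeros_like(user_brush)  # selection for this coordinate only
    unrolled[:, :, region] = True
    combined = user_brush & unrolled
    masked = np.where(combined, volume, np.nan)
    results.append(np.nanmean(masked, axis=(1, 2)))  # per-subject aggregate

per_region = np.stack(results)            # (region, subject) table of aggregates
```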
When performing dimension unrolling, however, a large amount of data is generated, and we currently deal with it by outputting only the
References
S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 1997.
C. Stolte, D. Tang, and P. Hanrahan. Polaris: a system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8(1), 2002.
C. Stolte and P. Hanrahan. Polaris: a system for query, analysis and visualization of multi-dimensional relational databases. In Proc. IEEE Symposium on Information Visualization (InfoVis), 2000.
M. P. Milham et al. Practice-related effects demonstrate complementary roles of anterior cingulate and prefrontal cortices in attentional control. NeuroImage, 2003.