Compositional data analysis for physical activity, sedentary time and sleep research

Abstract
The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.

Citation for published version:
Dumuid, D, Stanford, TE, Martin-Fernández, JA, Pedisic, Z, Maher, C, Lewis, L, Hron, K, Katzmarzyk, PT,
Chaput, J-P, Fogelholm, M, Hu, G, Lambert, EV, Maia, J, Sarmiento, OL, Standage, M, Barreira, TV, Broyles,
ST, Tudor-Locke, C, Tremblay, MS & Olds, T 2018, 'Compositional data analysis for physical activity, sedentary
time and sleep research', Statistical Methods in Medical Research, vol. 27, no. 12, pp. 3726-3738.
https://doi.org/10.1177/0962280217710835
DOI: 10.1177/0962280217710835
Publication date: 2018
Document version: Peer reviewed version
Dorothea Dumuid, Tyman E Stanford, Josep-Antoni Martin-Fernández, Željko Pedišić, Carol A Maher, Lucy K
Lewis, Karel Hron, Peter T Katzmarzyk, Jean-Philippe Chaput, Mikael Fogelholm, Gang Hu, Estelle V Lambert,
José Maia, Olga L Sarmiento, Martyn Standage, Tiago V Barreira, Stephanie T Broyles, Catrine Tudor-Locke,
Mark S Tremblay, Timothy Olds, Compositional data analysis for physical activity, sedentary time and sleep
research, Statistical Methods in Medical Research. Copyright © 2017 The Author(s). Reprinted by permission of
SAGE Publications.
University of Bath

1 Introduction: Daily activity behaviours are compositional by nature

Physical inactivity is considered to be a major risk factor for non-communicable disease and premature death.[1] A global economic analysis has estimated the health-care system cost of physical inactivity in 2013 alone to be INT$ 53.8 billion (international dollars).[2] To produce such economic estimates, physical inactivity is defined as relatively lower amounts of moderate-to-vigorous-intensity physical activity (MVPA), but the underlying analyses fail to account for the fact that when time in MVPA is reduced, an equal amount of time must be redistributed among the remaining behaviour domains (sleep, sedentary time and light-intensity physical activity (light PA)), because any given day is limited to 24 hours, or 1440 minutes. These other non-MVPA behaviours may themselves have positive or negative effects on health and mortality.[3-6] Therefore, the health and economic burden of physical inactivity per se remains unclear. Similar estimates for sedentary time[7] are uncertain for the same reason, i.e., they fail to adequately account for other behaviours. Pedišić[8] argued that studies on health outcomes of sleep duration and light PA can be subjected to the same scrutiny.

Time spent in MVPA represents one of the exhaustive and mutually exclusive components of an individual’s 24-h day. The non-MVPA time remaining within an individual’s day can be partitioned into light PA, sedentary time and sleep, and all of them can be considered as relative contributions to the overall time budget. The defined behaviours (MVPA, light PA, sedentary time and sleep) are therefore compositional data, and have important properties that must be respected.[9]

Consider a vector $\mathbf{x} = [x_1, x_2, \ldots, x_D]$ with positive components, where $\sum_{i=1}^{D} x_i = \kappa$, and $\kappa$ is the closure constant. The sample space of the vector $\mathbf{x}$ can thus be represented by the D-part simplex ($S^D$), which is a (D-1)-dimensional subset of the real space ($R^D$) due to the constant sum constraint of $\kappa$. Compositional data are scale invariant[9] because multiplying the parts $x_1, x_2, \ldots, x_D$ by a common factor $a$, where $a > 0$, leaves the ratios between the parts unchanged, as $\frac{x_i}{x_j} = \frac{a x_i}{a x_j}$. The numerical value of the closure constant (e.g., 24 h, one week, one month) is irrelevant. Daily behaviours could equivalently be measured in hours, minutes or percentages, as the data convey only relative information. The property of scale invariance means compositional data are in fact elements of equivalence classes of proportional vectors.[10] Accordingly, the simplex is the sample space of representatives of compositional data with a chosen constant sum constraint. The specific properties of compositional data are captured by a natural geometry, known as the Aitchison geometry.[11] The closure constant representation of compositions imposes perfect multi-collinearity among the components, causing the covariance structure of the data to be negatively biased.[9] Accordingly, traditional statistical methods for unconstrained variables in real space (e.g., t-tests, multiple linear regression) are not coherent with the specific geometry of the simplex sample space, and should not be used for absolute or raw measures of time spent in daily behaviours.[12]

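The properties described above can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not code from the paper: the helper `closure`, the example day and the simulated sample are all hypothetical and chosen only for illustration. It shows that ratios between parts are unchanged when the same day is expressed in minutes, hours or proportions, and that a design matrix containing an intercept and all four behaviours is rank deficient.

```python
import numpy as np

def closure(x, kappa=1.0):
    """Rescale each row of positive parts so that it sums to kappa."""
    x = np.asarray(x, dtype=float)
    return kappa * x / x.sum(axis=-1, keepdims=True)

# Hypothetical day in minutes: sleep, sedentary time, light PA, MVPA (sums to 1440).
day_minutes = np.array([540.0, 480.0, 360.0, 60.0])
day_hours = day_minutes / 60.0       # same day, closure constant 24
day_share = closure(day_minutes)     # same day, closure constant 1

# Scale invariance: the sleep-to-MVPA ratio does not depend on the unit.
ratio_minutes = day_minutes[0] / day_minutes[3]
ratio_hours = day_hours[0] / day_hours[3]
ratio_share = day_share[0] / day_share[3]

# Perfect multicollinearity: simulate 50 hypothetical days closed to 1440 minutes.
rng = np.random.default_rng(0)
sample = closure(rng.uniform(1.0, 10.0, size=(50, 4)), kappa=1440.0)

# Intercept plus all four parts: the parts sum to a constant, so the
# intercept column is a linear combination of the other columns.
design = np.column_stack([np.ones(len(sample)), sample])
rank = np.linalg.matrix_rank(design)  # one less than the 5 columns

# Each row of the covariance matrix sums to zero, which forces
# at least one negative covariance per part.
cov = np.cov(sample, rowvar=False)
row_sums = cov.sum(axis=1)
```

Because the design matrix is rank deficient, ordinary least squares has no unique solution when every behaviour is entered in raw form, which is the adjustment problem the text describes.
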
2 The log-ratio approach for compositional data analysis

The invalidity of standard multivariate techniques for analyzing untransformed or raw compositional data was recognized in scientific fields decades ago,[13] and in the 1980s Aitchison proposed a new methodology for the analysis of compositional data.[9] The methodology is based on the premise that any composition (e.g., an individual’s daily time budget) can be expressed in terms of ratios of its parts (e.g., duration of sleep, sedentary time, light PA and MVPA). The expression of compositional data as log-ratio coordinates transfers them from the constrained simplex space to the unconstrained real space, where traditional multivariate statistics may be applied.[14] The presence of zeros in a compositional dataset prevents the use of log-ratio coordinates, because the logarithm of zero is undefined. Several methods have been proposed to deal with zeros;[15] however, they are beyond the scope of this paper.

A number of log-ratio coordinate systems for compositional data have been described.[11] One such coordinate system, the additive log-ratio (alr), has coordinates, $\mathbf{y}$, defined by

$$\mathbf{y} = [y_1, y_2, \ldots, y_{D-1}] = \operatorname{alr}(\mathbf{x}) = \left[ \ln\frac{x_1}{x_D}, \ln\frac{x_2}{x_D}, \ldots, \ln\frac{x_{D-1}}{x_D} \right]. \tag{1}$$

However, the alr coordinates are asymmetric, because the components $x_1, x_2, \ldots, x_{D-1}$ are divided by the component $x_D$. Moreover, they are not isometric, i.e., distances and angles in the Aitchison geometry are not preserved by the alr coordinates, limiting their use in statistical applications. This means that the system of alr coordinates in the Aitchison geometry is oblique, and traditional statistical methods which assume orthogonal coordinates therefore cannot be directly applied. Another coordinate system is the centred log-ratio (clr) coordinate system, $\mathbf{c}$, defined as

$$\mathbf{c} = [c_1, c_2, \ldots, c_D] = \operatorname{clr}(\mathbf{x}) = \left[ \ln\frac{x_1}{g(\mathbf{x})}, \ln\frac{x_2}{g(\mathbf{x})}, \ldots, \ln\frac{x_D}{g(\mathbf{x})} \right], \tag{2}$$

where $g(\mathbf{x}) = \left( \prod_{i=1}^{D} x_i \right)^{1/D}$ is the geometric mean of all the D components of the vector $\mathbf{x}$. The clr are symmetric and isometric, but they produce a singular covariance matrix because $\sum_{i=1}^{D} c_i = 0$. The clr are, strictly speaking, not coordinates but coefficients with respect to a generating system. Because of this singularity, the clr coefficients cannot be fully utilized as independent variables in multiple regression analysis.

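The alr and clr definitions can be written out directly. The sketch below is illustrative only, in Python with NumPy: the function names `alr` and `clr`, the example day and the simulated sample are assumptions, not the authors' code. It computes both transformations with the last part as the alr divisor, checks that the clr coefficients sum to zero, and inspects the smallest eigenvalue of the clr covariance matrix.

```python
import numpy as np

def alr(x):
    """Additive log-ratio: log of each of the first D-1 parts over the last part."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    log_x = np.log(np.asarray(x, dtype=float))
    # Subtracting the mean of the logs is the same as dividing by the geometric mean.
    return log_x - log_x.mean(axis=-1, keepdims=True)

# Hypothetical day in minutes: sleep, sedentary time, light PA, MVPA.
day = np.array([540.0, 480.0, 360.0, 60.0])

y = alr(day)   # D-1 = 3 coordinates, all relative to MVPA
c = clr(day)   # D = 4 coefficients that sum to zero

# Both transformations give the same result for minutes and for proportions.
same_alr = np.allclose(alr(day), alr(day / day.sum()))
same_clr = np.allclose(clr(day), clr(day / day.sum()))

# Singular covariance of the clr coefficients, shown on simulated positive data.
rng = np.random.default_rng(1)
sample = rng.uniform(1.0, 10.0, size=(200, 4))
cov_clr = np.cov(clr(sample), rowvar=False)
smallest_eig = np.linalg.eigvalsh(cov_clr)[0]  # numerically zero
```

A zero eigenvalue means the clr covariance matrix cannot be inverted, which is why all D clr coefficients cannot be entered together as regressors.
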
The singularity problem of the clr can be overcome by the use of an isometric log-ratio (ilr) coordinate system. Isometric log-ratio coordinates form an isometric mapping of the composition from the simplex sample space to the real space.[16] To construct the ilr coordinates, an orthonormal basis coherent with the Aitchison geometry is created in the (D-1)-dimensional hyperplane of the clr coordinates. Many possible orthonormal

Citations

Integrating sleep, sedentary behaviour, and physical activity research in the emerging field of time-use epidemiology: definitions, concepts, statistical methods, theoretical framework, and future directions

TL;DR: This paper defines the emerging research field's position among established branches of science, explains its main concepts and defines associated terms, recommends suitable data analysis methods, proposes a theoretical model for future research, and identifies key research questions.

Health outcomes associated with reallocations of time between sleep, sedentary behaviour, and physical activity: a systematic scoping review of isotemporal substitution studies

TL;DR: It seems that reallocations of sedentary time to LPA or MVPA are associated with significant reduction in mortality risk, and the strongest association with health outcomes is observed when time is reallocated from sedentary behaviour to MVPA.

The compositional isotemporal substitution model: A method for estimating changes in a health outcome for reallocation of time between sleep, physical activity and sedentary behaviour.

TL;DR: A way of applying compositional data analysis to estimate change in a health outcome when fixed durations of time are reallocated from one part of a particular time-use composition to another, while the remaining parts are kept constant, based on a multiple linear regression model on isometric log ratio coordinates is presented.

Reallocating time between sleep, sedentary and active behaviours: Associations with obesity and health in Canadian adults.

TL;DR: Findings confirm previous research indicating a strong association between MVPA and markers of obesity and health, particularly among older and overweight/obese individuals and provide evidence that increasing LPA is an important health promotion message for these two subpopulations.

Compositional Data Analysis in Time-Use Epidemiology: What, Why, How.

TL;DR: It is illustrated how to comprehensively interpret the CoDA findings in a meaningful way and explore the relationship between daily time use (sleep, sedentary behavior and physical activity) and a health outcome (in this example, adiposity).
References

Effect of physical inactivity on major non-communicable diseases worldwide: an analysis of burden of disease and life expectancy

TL;DR: In this article, the authors quantify the effect of physical inactivity on these major non-communicable diseases by estimating how much disease could be averted if inactive people were to become active and to estimate gain in life expectancy at the population level.

Development of a WHO growth reference for school-aged children and adolescents

TL;DR: The new curves are closely aligned with the WHO Child Growth Standards at 5 years, and the recommended adult cut-offs for overweight and obesity at 19 years.

Systematic review of the health benefits of physical activity and fitness in school-aged children and youth

TL;DR: A systematic review of studies examining the relation between physical activity, fitness, and health in school-aged children and youth found that even modest amounts of physical activity can have health benefits in high-risk youngsters (e.g., obese).
Frequently Asked Questions (9)
Q1. What are the contributions in this paper?

In this paper, Dumuid et al. describe and apply compositional data analysis for physical activity, sedentary time and sleep research.

Four sets of ilr coordinate systems were constructed, each time rotating the sequence of activity behaviours, so that each behaviour was iteratively represented as the first compositional part.
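The rotation described here can be sketched as follows. This excerpt does not state which orthonormal basis was used, so the sketch assumes pivot-type ilr coordinates, in which the first coordinate contrasts the first part with the geometric mean of the remaining parts. The function `pivot_ilr`, the behaviour labels and the example day are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def pivot_ilr(x):
    """Pivot-type ilr coordinates of a D-part composition (assumed basis).

    For k = 1, ..., D-1:
    z_k = sqrt((D-k)/(D-k+1)) * ln(x_k / geometric mean of x_{k+1}, ..., x_D)
    """
    x = np.asarray(x, dtype=float)
    D = x.shape[-1]
    log_x = np.log(x)
    z = np.empty(x.shape[:-1] + (D - 1,))
    for k in range(D - 1):  # k is zero-based here
        rest = log_x[..., k + 1:].mean(axis=-1)  # log geometric mean of later parts
        z[..., k] = np.sqrt((D - k - 1) / (D - k)) * (log_x[..., k] - rest)
    return z

# Hypothetical day in minutes.
behaviours = ["sleep", "sedentary", "light PA", "MVPA"]
day = np.array([540.0, 480.0, 360.0, 60.0])

# Rotate the sequence so that each behaviour takes the first position once,
# and keep the first coordinate of each of the four coordinate systems.
first_coordinates = {}
for i, name in enumerate(behaviours):
    order = np.roll(np.arange(len(behaviours)), -i)
    first_coordinates[name] = pivot_ilr(day[order])[0]
```

Under this assumed basis, each entry of `first_coordinates` expresses one behaviour relative to the geometric mean of the other three, so a regression on each rotated system yields one coefficient per behaviour that has this relative interpretation.
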

Since almost all previous analyses of the associations between time use and health outcomes have used methods incompatible with compositional data, they are all to some extent vitiated, and should be interpreted with caution.

zBMI is predicted to increase by 0.057 when sleep is decreased by 5% from the reference/starting composition, relative to remaining behaviours.

The regression coefficient $\beta_1$ represents the change in the response variable when the first ilr coordinate is changed while the remaining ilr coordinates are all kept constant and the total sum is maintained, i.e., $\sum_{j=1}^{D} x_{ij} = 1$.

The regression coefficient for the second ilr coordinate from the SBP (Table 2) implies that the increase in sleep relative to sedentary time is associated with lower expected zBMI.

The minute values can be calculated from the linear models, as detailed in Supplementary file 2. BMI: Body Mass Index; SED: Sedentary Time; LPA: Light-Intensity Physical Activity; MVPA: Moderate-to-Vigorous-Intensity Physical Activity.

The log-ratio approach for compositional data analysis is well established in many scientific fields (e.g., geology, biology, hydrology, ecology and economics), and is considered the gold-standard for analyzing compositional data.[11]