Understanding and using time series analyses in addiction research


This article has been accepted for publication and undergone full peer review but has not
been through the copyediting, typesetting, pagination and proofreading process which may
lead to differences between this version and the Version of Record. Please cite this article as
doi: 10.1111/add.14643
This article is protected by copyright. All rights reserved.
Beard Emma (Orcid ID: 0000-0001-8586-1261)
Marsden John (Orcid ID: 0000-0002-1307-2498)
Brown Jamie (Orcid ID: 0000-0002-2797-5428)
Emma Beard [1,2], John Marsden [3], Jamie Brown [1,2], Ildiko Tombor [2], John Stapleton [1,3], Susan Michie [1], Robert West [2]
[1] Research Department of Clinical, Educational and Health Psychology, University College London, London
[2] Department of Behavioural Science and Health, University College London, London
[3] Addictions Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, United Kingdom
Abstract: 181 words
Main text: 7114 words
Suggested running head: Time series analysis in addiction research
Keywords: time series, ARIMA, ARIMAX, VAR, SVAR, VECM, addiction
* Corresponding author: Emma Beard, Senior Research Associate, Research Department of Clinical,
Educational and Health Psychology, University College London, London; e-mail address:
e.beard@ucl.ac.uk
ABSTRACT
Time series analyses are statistical methods used to assess trends in repeated measurements taken
at regular intervals and their associations with other trends or events, taking account of the
temporal structure of such data. Addiction research often involves assessing associations between
trends in target variables (e.g. population cigarette smoking prevalence) and predictor variables
(e.g. average price of a cigarette), known as a multiple time series design, or interventions or events
(e.g. introduction of an indoor smoking ban), known as an interrupted time series design. There are
many analytical tools available, each with its own strengths and limitations. This paper provides
addiction researchers with an overview of many of the methods available (GLM, GLMM, GLS,
GAMM, ARIMA, ARIMAX, VAR, SVAR, VECM), and guidance on when and how they should be used,
sample size determination, reporting, and interpretation. The aim is to provide increased clarity for
researchers proposing to undertake these analyses concerning what is likely to be acceptable for
publication in journals such as Addiction. Given the large number of choices that need to be made
when setting up time series models, the guidance emphasises the importance of pre-registering
hypotheses and analysis plans before the analyses are undertaken.

INTRODUCTION
Time series analyses (TSA) are statistical methods for the analysis of multiple measurements of one
or more variables over time. Sometimes these data reflect responses collected from a single
research participant; but more commonly in social, behavioural and epidemiological research, TSA
are used to study a variable of interest aggregated for a group, region or country. TSA can be a
powerful tool for informing public health policy. There are comprehensive modules on TSA in
statistical software (e.g. R and Stata) and several textbooks (1-4). However, TSA involve a
process of statistical modelling that demands a grasp of concepts, terminology and
parameters that will be new to many in the addiction sciences.
This article provides an introduction to the topic, tailored to addiction research, setting out when it
is appropriate to use each method and how to report and interpret findings. The paper is
structured in four parts: Part 1 covers uses of TSA and how to plan the analyses; Part 2 focuses on
TSA concepts and requirements; Part 3 looks at how to conduct TSA; and Part 4 describes how to
report the results. For space reasons, we limit the article to the main TSA approaches that are
supported by major statistical packages. The main types of analysis covered are: Generalised Least-
Squares (GLS) and Generalised Linear Mixed Models (GLMM), Generalised Additive Mixed Models
(GAMM), Autoregressive Integrated Moving Average (ARIMA) and Autoregressive Integrated
Moving Average with Exogenous Variables (ARIMAX) models, Vector Autoregression (VAR) and
Structural Vector Autoregressive models (SVAR), and Vector Error Correction Models (VECM).
PART 1: USES OF TSA AND PLANNING THE ANALYSES
Uses of TSA
There are several types of question that can be addressed by TSA. Simple ‘trend analysis’ assesses
whether there is evidence for a change in the level of a series over time. For example, a study of
first year college students used TSA to assess whether there was an increase in use of tobacco,
alcohol and cannabis at the beginning and end of the academic year (5).
‘Multiple TSA’ assess whether a temporal trend in a target variable is linked to trends in other
variables. In the smoking field, Beard et al examined whether the growth in prevalence of e-
cigarette use was linked to a decline in the use of licensed nicotine products such as nicotine skin
patches (6). Langley et al assessed the temporal association between a standardised measure of
tobacco control advertising exposure on television and the number of calls to a national stop
smoking helpline (7). Brunt et al assessed the association between changes in the price and quality of
cocaine and changes in the incidence of addiction treatment episodes and hospital admissions (8).
‘Interrupted TSA’ assess whether an event or shift in policy was associated with a change in the
trend of a target variable. For example, Holder and Wagenaar studied changes in the rate of road
traffic crashes after the introduction of a law on training for responsible alcohol-serving in licensed
premises (9). An interrupted TSA was used to evaluate the effect on smoking prevalence of the
partial tobacco point of sale display ban in large shops in England (10). Other studies have used
interrupted TSA to evaluate: the impact on smoking cessation of temporarily suspending large-scale
tobacco mass media campaigns (11); the impact of introducing the smoking cessation drug varenicline
on prescribing of smoking cessation medications (12); the impact of introducing flexible alcohol
trading hours on rates of violence, robbery and total crime (13); and changes in sales of alcohol
following a ban on discounted alcohol products in shops and supermarkets (14).
TSA are also used in ‘forecasting’: projecting forward from past values of a series. For example,
alcohol consumption over a prospective 10-year period was forecast for the Czech Republic (15). In
the United States, age-specific mortality rates for men and women have been forecast using the
decline in the prevalence of tobacco smoking (16).
Data sources and design issues
There are many suitable data sources for TSA, including official registries, repeated cross-sectional
surveys and longitudinal studies of cohorts and panels. Examples include the Swiss HIV Cohort
Study established in 1988 which continuously enrolled HIV-infected people who attended out-
patient clinics at seven centres (17), and The Health Improvement Network (THIN) database in
England (18, 19). TSA can be difficult to use in cohort and panel samples because of follow-up
attrition (20), but this situation is improving with the advent of mobile/smartphone technologies
(21-23).
At the individual level, TSA can draw on repeated measurements taken from the same person in
n-of-1 studies. For example, ecological momentary assessment generates large quantities of data
from individuals (24), and digital mobile applications can similarly generate time series on usage,
inputs, processes and outcomes (25).
In multiple TSA the coefficient linking a given input series to an output series can be interpreted as
the association between the input and output series after adjusting for or removing any underlying
trend and other input series included in the analysis. In interrupted TSA, a ‘dummy’ variable (taking
the value 0 or 1) is used in the input series to reflect time points when events occur, or pre- versus
post-initiation of a policy etc. The resulting coefficient can be interpreted as the change in the value
or trend of the outcome variable linked to the presence or onset of the event or policy after
adjusting for or removing any underlying trend and other input variables (11). TSA can be used
with many types of data, including counts and percentages (e.g. number of heroin users arrested,
the number of fatal drug-related poisoning cases, and the prevalence of adult smoking), binary data
and continuous measures (e.g. amount spent on alcohol per week).
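To make the role of the dummy variable concrete, the following is a minimal sketch of an interrupted time series regression of the form y_t = b0 + b1*t + b2*D_t, where D_t is 0 before the policy and 1 from its introduction onwards. It is an illustration only: the function names, the simulated prevalence series and all parameter values are assumptions rather than anything taken from the studies cited, and ordinary least squares is used for brevity even though it ignores the autocorrelation and seasonality that a real TSA would need to handle.

```python
import random


def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_interrupted_series(y, start_of_policy):
    """OLS fit of y_t = b0 + b1*t + b2*D_t, with D_t = 1 from start_of_policy onwards.

    Returns [b0, b1, b2]; b2 is the level change adjusted for the linear trend.
    Illustrative only: errors are treated as independent.
    """
    rows = [[1.0, float(t), 1.0 if t >= start_of_policy else 0.0]
            for t in range(len(y))]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    months, policy_month = 48, 24
    # Hypothetical series: 30% prevalence, slow decline, 2-point drop at the policy.
    y = [30.0 - 0.1 * t - (2.0 if t >= policy_month else 0.0) + rng.gauss(0.0, 0.2)
         for t in range(months)]
    b0, b1, b2 = fit_interrupted_series(y, policy_month)
    print(f"baseline level {b0:.2f}, monthly trend {b1:.3f}, level change {b2:.2f}")
```

The coefficient on the dummy (b2) is the quantity described in the text: the change in the level of the outcome linked to the policy after adjusting for the underlying trend. A change in slope could be examined by adding a further column that counts time since the policy began.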
It is important to appreciate limitations of TSA. First, they can only assess associations at the
temporal granularity of the series. Thus if the data are weekly, TSA are assessing week-by-week
changes, not changes over a longer or shorter time frame. For example, showing an association
between monthly spend on anti-tobacco mass media campaigns and attempts to stop smoking
does not mean that a similar association would be found with annual spend. Secondly, TSA have
limited ability to detect associations between input and output variables that accumulate over a
long period. For example, anti-alcohol mass media campaigns may not have a detectable effect in
the short term but may contribute to cultural change that accumulates over a period of years.
Detecting such an effect would be very problematic for TSA.
Sample size
Statistical power calculations can inform study planning (26). Unfortunately, power calculations
such as those provided for linear regression and ANOVA in G*Power (27, 28) are not suitable for
time series data. In TSA it is necessary to account for autocorrelation, seasonality and lag effects
(see below). Calculations are provided by McLeod and Vingilis (29, 30) for interrupted time series
designs. However, the recommended method is to use a power simulation, and there are several
statistical packages that can do this (e.g. R). The procedure involves running many TSA models on
randomly generated data with expected parameter estimates and calculating power as the
proportion of simulated data sets which return results at a given level of statistical significance.
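The simulation procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not a recommended implementation: the series has a flat baseline, a single level change and first-order autoregressive (AR(1)) errors; the test uses a one-step Cochrane-Orcutt correction with a normal critical value of 1.96; and every function name and parameter value (series length, effect size, autocorrelation, noise) is hypothetical. In practice the simulated data and the fitted model should match the planned analysis, including seasonality and covariates.

```python
import random


def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Return OLS coefficients, residuals and the standard error of the last coefficient."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    # Last diagonal element of (X'X)^-1, obtained by solving against a unit vector.
    var_last = sigma2 * solve(xtx, [0.0] * (k - 1) + [1.0])[-1]
    return beta, resid, var_last ** 0.5


def simulate_series(n, start, level_change, rho, sd, rng):
    """Flat baseline of 0 plus a step at `start`, with AR(1) errors (assumed structure)."""
    y, err = [], 0.0
    for t in range(n):
        err = rho * err + rng.gauss(0.0, sd)
        y.append((level_change if t >= start else 0.0) + err)
    return y


def step_is_significant(y, start, z_crit=1.96):
    """Test the step coefficient after a one-step Cochrane-Orcutt AR(1) correction."""
    rows = [[1.0, float(t), 1.0 if t >= start else 0.0] for t in range(len(y))]
    _, resid, _ = ols(rows, y)
    rho_hat = (sum(a * b for a, b in zip(resid[1:], resid[:-1]))
               / sum(e * e for e in resid[:-1]))
    # Quasi-difference outcome and predictors to remove the estimated AR(1) structure.
    y_star = [y[t] - rho_hat * y[t - 1] for t in range(1, len(y))]
    rows_star = [[a - rho_hat * b for a, b in zip(rows[t], rows[t - 1])]
                 for t in range(1, len(y))]
    beta, _, se = ols(rows_star, y_star)
    return abs(beta[-1] / se) > z_crit


def estimate_power(n=60, start=30, level_change=-1.0, rho=0.3, sd=1.0,
                   n_sims=300, seed=1):
    """Proportion of simulated series in which the level change is detected."""
    rng = random.Random(seed)
    hits = sum(
        step_is_significant(simulate_series(n, start, level_change, rho, sd, rng), start)
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    for n_points in (40, 60, 100):
        print(n_points, "time points: estimated power", estimate_power(n=n_points, start=n_points // 2))
```

Running the simulation with a level change of zero gives a rough check on the false-positive rate of the chosen test, which is worth inspecting before relying on the power estimates.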
In general, the sample size required will increase with the number of parameters to be estimated
and the amount of noise in the data. It is important to accurately reflect changes over time in the
simulation and to include all covariate estimates of interest. There should always be more time
points than the total number of variables, autocorrelation and lag terms, and some experts
recommend at least 50-100 time points (31-33). At least two years of monthly data have been
proposed to allow adjustment for seasonality, with a preference for equal proportions of data
collection before and after the event or change in the input variable for an interrupted TSA (26).
