Evaluating the performance of infectious disease
forecasts: A comparison of climate-driven and
seasonal dengue forecasts for Mexico
Citation
Johansson, Michael A., Nicholas G. Reich, Aditi Hota, John S. Brownstein, and Mauricio
Santillana. 2016. “Evaluating the performance of infectious disease forecasts: A comparison
of climate-driven and seasonal dengue forecasts for Mexico.” Scientific Reports 6 (1): 33707.
doi:10.1038/srep33707. http://dx.doi.org/10.1038/srep33707.
Published Version
doi:10.1038/srep33707
Permanent link
http://nrs.harvard.edu/urn-3:HUL.InstRepos:29407787
Terms of Use
This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Scientific Reports | 6:33707 | DOI: 10.1038/srep33707
www.nature.com/scientificreports
Evaluating the performance of
infectious disease forecasts: A
comparison of climate-driven and
seasonal dengue forecasts for
Mexico
Michael A. Johansson (1,2), Nicholas G. Reich (3), Aditi Hota (4), John S. Brownstein (4,5) & Mauricio Santillana (4,5,6)
Dengue viruses, which infect millions of people per year worldwide, cause large epidemics that strain healthcare systems. Despite diverse efforts to develop forecasting tools including autoregressive time series, climate-driven statistical, and mechanistic biological models, little work has been done to understand the contribution of different components to improved prediction. We developed a framework to assess and compare dengue forecasts produced from different types of models and evaluated the performance of seasonal autoregressive models with and without climate variables for forecasting dengue incidence in Mexico. Climate data did not significantly improve the predictive power of seasonal autoregressive models. Short-term and seasonal autocorrelation were key to improving short-term and long-term forecasts, respectively. Seasonal autoregressive models captured a substantial amount of dengue variability, but better models are needed to improve dengue forecasting. This framework contributes to the sparse literature of infectious disease prediction model evaluation, using state-of-the-art validation techniques such as out-of-sample testing and comparison to an appropriate reference model.
Dengue is a substantial public health problem in most of the tropical and subtropical regions of the world [1]. In most of these areas, dengue is endemic with cases occurring year-round, yet there is marked variation in incidence of dengue both within and between years. In Mexico, for example, yearly reported incidence over the last few decades has varied from several thousand cases to over 100,000 cases in 2009 [2]. Even understanding the current burden of dengue can be challenging. There is often an extended delay between symptom onset and official reports reflecting care-seeking behavior or confirmed clinical reports of illness. In some settings, complex, multi-tiered reporting systems may contribute to delays. Because of the vastly greater burden of disease in larger epidemic years and difficulty in understanding current and future needs, much effort has been placed on developing early warning systems to predict or detect large epidemics as early as possible with the hope of being able to control epidemics in their early stages [3,4].
Numerous mechanistic models have been developed to use detailed knowledge of dengue virus transmission biology to predict the evolution of dengue epidemics [5,6]. However, difficulties emerge when attempting to use mechanistic models for forecasting: often the key mechanistic assumptions are unclear, and the data needed to parameterize and feed the model are often difficult or impossible to obtain.
1 Dengue Branch, Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, San Juan, Puerto Rico. 2 Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA. 3 Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts, USA. 4 Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA. 5 Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA. 6 J.A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA. Correspondence and requests for materials should be addressed to M.A.J. (email: eyq9@cdc.gov) or M.S. (email: msantill@g.harvard.edu)
Received: 17 March 2016
Accepted: 24 August 2016
Published: 26 September 2016
Models for large-scale dengue early warning systems have therefore mostly focused on two components, temporal autocorrelation and an association with weather or climate. Temporal autocorrelation results from the infectious nature of dengue viruses; cases are more likely in the near future when the current prevalence of infection is high. The influence of weather is due to the mosquito vectors of dengue viruses, Aedes aegypti and Ae. albopictus. Temperature, humidity, and precipitation are important determinants of mosquito reproduction and longevity [7-9], and temperature has a strong influence on the ability of the mosquitoes to transmit dengue viruses [10].
These two components, autocorrelation and weather or climate, form the basis for a wide variety of efforts to predict dengue incidence in countries such as Australia [11], Bangladesh [12], Barbados [13], Brazil [14-17], Colombia [18-20], Costa Rica [21], China [22], Ecuador [23], Guadeloupe [24], India [25], Indonesia [26,27], Mexico [28], New Caledonia [29], Peru [30], Philippines [31,32], Puerto Rico [33], Singapore [34-38], Sri Lanka [39], Taiwan [40,41], Thailand [42-46], and Vietnam [47].
In this manuscript, we focus on developing prediction models for dengue incidence in Mexico based on observed dengue incidence and weather. We focus explicitly on building models that directly predict dengue incidence rather than those designed to classify future transmission, for example as low or high [29,30,34,42]. Although classification may be more useful for public health decision-making, the classification process is subjective [48], making estimation of uncertainty a less straightforward process compared to direct prediction of incidence.
Autoregressive integrated moving average (ARIMA) models [49] have been used extensively in dengue prediction efforts [14,18,19,35,37,40,45], often incorporating a seasonal component (SARIMA) [11,14,15,24,25,27,43,46,47]. We used the SARIMA framework to define a dynamic suite of prediction models for dengue in Mexico that is at once flexible and wide-ranging, while remaining manageable in size for the purposes of careful evaluation and comparison. While the fundamental biology leading to autocorrelation and associations between weather and dengue incidence is clear, the specific contributions of these two components to dengue prediction remain less so. In other words, we wanted to answer the question: to what extent does incorporating climate data into models improve dengue forecasts at different spatial scales? Therefore, we assessed three specific features of these dengue prediction models: (i) the most important autoregressive and climatological components for predicting dengue incidence; (ii) the variability in importance of these components across different geographical areas; and (iii) the limits of prediction accuracy across models at different time horizons. Another important motivation of this study was to establish a forecast assessment framework that can serve as a reference for any infectious disease forecasting problem, including comparison to a non-naïve baseline model and validation on completely out-of-sample data.
Materials and Methods
Data. The number of dengue and dengue hemorrhagic fever cases reported for each month from January 1985 to December 2012 was obtained from the Mexican Health Secretariat [2]. To model these counts as a linear process, we log-transformed the highly skewed monthly dengue cases after adding one to the observed count. This eliminates computational problems associated with taking the log of zero, and can be thought of as accounting for the potential entrance of new cases [50]. Weather data from January 1985 to December 2012 were obtained from the National Oceanic and Atmospheric Administration North American Regional Reanalysis dataset (www.esrl.noaa.gov/psd/, accessed on May 1st, 2013). For each month and state, we extracted the average temperature (°C), daily precipitation (mm), and relative humidity (%).
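As an illustration only (the paper reports all fitting in R, not Python), the log-plus-one transform described above can be sketched as follows, where `cases` is a hypothetical monthly count series:

```python
import numpy as np

cases = np.array([0, 12, 150, 2300, 480])  # hypothetical monthly dengue counts
log_cases = np.log(cases + 1)              # adding one avoids log(0) in zero-count months
recovered = np.exp(log_cases) - 1          # back-transform to the original count scale
```

Zero-count months map to exactly zero on the log scale, and the back-transform recovers the original counts.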
Models. We analyzed three different types of temporal models: linear models, autoregressive (AR) models, and seasonal autoregressive models (SAR). We label these models as (p,d,q)(P,D,Q)_s + covar_L. For the ARIMA component, (p,d,q), p indicates the autoregressive order, d indicates differencing, and q is the order of the moving average [49]. For the seasonal component, (P,D,Q)_s, s indicates the season length (s = 12 months for the monthly data presented here), P indicates the autoregressive order, D indicates differencing, and Q is the order of the moving average. The final component, + covar_L, indicates a particular covariate at a lag of L months included as a linear regression term.
We used a systematic procedure to select SARIMA models to fit to the data and evaluate for predictive performance. Each time series was assessed for stationarity using the Augmented Dickey-Fuller test [51]. We used domain-specific knowledge about infectious disease models to create a limited model space that we could explore fully and justify scientifically. Defining the model space in this way mitigated the risk of overfitting the model in the training period. Specifically, we chose to focus on models that (a) included lagged observations of up to three months (p = 1, 2, or 3) and three years (P = 0, 1, 2, or 3), (b) included differencing terms of up to order 1 (i.e. d or D = 0 or 1), and (c) excluded all consideration of moving averages (i.e. q and Q both fixed at 0).
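A constrained model space like this can be enumerated mechanically. The sketch below (illustrative Python, not the authors' R code) applies only constraints (a)-(c) above, yielding 48 candidate structures; the 39 models the paper ultimately evaluates reflect additional selection and covariate choices not shown here:

```python
from itertools import product

# Constraints from the text: p in 1..3, P in 0..3, d and D in {0, 1},
# and q = Q = 0 (no moving-average terms); season length s = 12 months.
model_space = [
    ((p, d, 0), (P, D, 0, 12))
    for p, d, P, D in product([1, 2, 3], [0, 1], [0, 1, 2, 3], [0, 1])
]
print(len(model_space))  # 3 * 2 * 4 * 2 = 48 candidate SARIMA structures
```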
Prediction models that include a differencing term (i.e. d = 1 or D = 1) have a term added to the model formula that captures the difference between the lag-1 and lag-2 case counts. These models therefore incorporate information about the most recently observed slope of dengue incidence: e.g. are reported cases increasing or decreasing? In some formulations, this can be considered a crude approximation to the reproductive rate of the disease, an important parameter for judging the trajectory of the outbreak [52]. As an example of a formula for a model including differencing, a (1,0,0)(2,1,0)_12 multiplicative SARIMA model can be expressed as:
X_t = X_{t-12} + α_1(X_{t-1} - X_{t-13}) + φ_1(X_{t-12} - X_{t-24}) - α_1 φ_1(X_{t-13} - X_{t-25}) + φ_2(X_{t-24} - X_{t-36}) - α_1 φ_2(X_{t-25} - X_{t-37}) + ε_t    (1)
where X_t are the observed numbers of cases of dengue at time t; ε_t are the residual error terms, assumed to be normally distributed; and the coefficients α_1, φ_1, and φ_2 are determined to minimize the residual squared error.
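To make equation (1) concrete, a one-step-ahead point forecast under this model can be written out directly. This is an illustrative Python sketch with coefficient values supplied by the caller (the paper estimates them with R's arima function):

```python
def sarima_210_forecast(x, a1, phi1, phi2):
    """One-step point forecast from eq. (1) for a (1,0,0)(2,1,0)_12 model.

    x: sequence of at least 37 past (log-transformed) observations, x[-1] most recent.
    a1, phi1, phi2: the non-seasonal and seasonal AR coefficients.
    """
    X = lambda k: x[-k]  # X(k) is the observation k months back
    return (X(12)
            + a1 * (X(1) - X(13))
            + phi1 * (X(12) - X(24)) - a1 * phi1 * (X(13) - X(25))
            + phi2 * (X(24) - X(36)) - a1 * phi2 * (X(25) - X(37)))
```

For a series that repeats exactly every 12 months, every differenced term vanishes and the forecast reduces to the value observed 12 months earlier, which is the behavior the seasonal differencing term is designed to encode.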
To evaluate the added predictive value of weather variables, we examined the best SARIMA models from the model space defined above by adding individual weather variables as covariates, with lags of 1, 2, and 3 months and the month with the maximum Pearson correlation with dengue incidence (labeled with the subscript MX in the figures), and assessing the change in predictive performance. Our aim was to single out the effect of each variable (precipitation, relative humidity, and temperature). Additional models including multiple weather variables at once were considered. Ultimately, the models presented represent a given SARIMA model plus the most highly correlated lagged weather covariate. All models were fitted in R [53], using the arima function from the stats package.
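The MX lag-selection step can be sketched as follows, a hypothetical Python illustration of choosing the lag at which the lagged weather series has the highest Pearson correlation with incidence (function name and arguments are ours, not the paper's):

```python
import numpy as np

def best_lag(weather, dengue, max_lag=4):
    """Return the lag (in months) at which lagged weather best correlates with dengue."""
    corrs = {}
    for lag in range(1, max_lag + 1):
        # Pair weather at month t - lag with dengue incidence at month t
        corrs[lag] = np.corrcoef(weather[:-lag], dengue[lag:])[0, 1]
    return max(corrs, key=corrs.get)
```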
Model evaluation. The data were separated into three subsets. The first five years of data (1985-1989) were reserved for model training. Models trained on those data were used to dynamically predict dengue incidence over an 18-year model evaluation period (1990-2007). The final five-year dataset (2008-2012) was reserved for a complete out-of-sample validation after model selection.
At the end of the training period, predictions were made for the next 1-6 months (e.g., predictions are made for January-June based on data through December of the preceding year). Then the model was refitted with data from the next month (i.e., January) and predictions were made for the following 1-6 months (February-July). This process was repeated over all months of the evaluation period. Predictions were evaluated by comparing observed incidence with model-predicted incidence using two metrics: mean absolute error (MAE) and the coefficient of determination (R^2). For comparisons of predictions across different locations and prediction horizons, we calculated the change in MAE relative to the best prediction among all models for a given location and horizon:
relMAE(m_i) = MAE(m_i) / min_{m in M}(MAE(m))    (2)

where m_i is a single model in the set of models M for a given time and location. Thus the model with the best point prediction has relMAE = 1 and predictions further from the observations will have increasing relMAE values.
To compare two specific models (m_1 and m_2), we calculated the relMAE:

relMAE(m_1, m_2) = MAE(m_1) / MAE(m_2)    (3)
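Equations (2) and (3) are straightforward to compute; a hypothetical Python sketch (not the authors' code) for equation (2) over a set of named models:

```python
import numpy as np

def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(obs))))

def rel_mae(model_preds, obs):
    """Eq. (2): each model's MAE relative to the best (minimum) MAE in the set."""
    maes = {name: mae(p, obs) for name, p in model_preds.items()}
    best = min(maes.values())
    return {name: m / best for name, m in maes.items()}
```

The best model in the set gets relMAE = 1 by construction; equation (3) is the same ratio applied to a specific pair of models.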
State-level models. We systematically assessed the predictive performance of a wide range of models at different spatial scales, using data aggregated at the national level and at the level of states. The primary interest of this manuscript is to assess models for forecasting dengue incidence in endemic locations, so we restricted the analysis to the 17 states reporting cases in at least half of the months in the training and development time period (1985-2007). Some of the other states, such as Sonora, Coahuila, and San Luis Potosi, had high median annual incidence over this period due to sporadic outbreaks but reported cases in less than 50% of the months.
Using the selection algorithm described briefly above in the "Models" subsection and in more detail below in the "National dengue forecasts" subsection, we fitted and evaluated 39 models at each spatial scale. The same selection process was repeated for each location separately. We assessed the predictions of all models across each location using the average relative MAE and R^2 for each prediction horizon. For each state we selected an optimal model for short-term forecasts by identifying the model with minimum average relative MAE across the 1-3 month prediction horizons; we refer to this model as the "local" model for that state. Assessing model performance across all spatial scales in the training period, we identified a single "common" model that performed well across all spatial scales. During the prospective out-of-sample forecast phase, after model selection, we compared the performance between the common and local models.
Results
National dengue forecasts. During the 5-year training period (1985-1989) and 18-year evaluation period (1990-2007) the monthly dengue incidence in Mexico showed strong seasonality (Fig. 1A). Incidence also varied substantially between years, with less than 2,000 cases reported in 2000 and more than 50,000 reported in 1997.
We first compared two naïve models for predicting national dengue incidence several months into the future, one estimating that future incidence follows the historical average incidence and another assuming that future incidence for a specific month will be the historical average for that particular month of the calendar year (Fig. 1B). These models were implemented at the beginning of the evaluation period, with predictions made for months 1 through 6 of that period. An additional month of data was then acquired (month 1), the model was refitted, and predictions for the next 6 months (i.e., months 2-7) were made, ensuring that all forecasts were made on out-of-sample data. As predictions were made at each month throughout the 18-year evaluation period, a total of 216 predictions were analyzed for each prediction horizon and each model. For all models, we used the same dynamically increasing training set strategy to calculate the respective out-of-sample predictions, strictly avoiding the use of forward-looking information. This allowed the use of predictive metrics to compare models directly.
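The dynamically increasing (rolling-origin) scheme described above can be outlined generically. In this illustrative Python sketch, `fit` and `predict` are hypothetical stand-ins for any of the models compared, not functions from the paper:

```python
def rolling_origin_forecasts(series, fit, predict, start, horizon=6):
    """Refit on all data through month t, then forecast months t+1 .. t+horizon.

    Every forecast uses only data up to its origin t, so all predictions
    are strictly out of sample.
    """
    out = []
    for t in range(start, len(series) - horizon):
        model = fit(series[: t + 1])  # training set grows by one month each iteration
        for h, yhat in enumerate(predict(model, horizon), start=1):
            out.append((t, h, yhat))  # forecast made at origin t for month t + h
    return out

# Toy stand-ins: a seasonal-naive "model" that repeats the value 12 months earlier.
fit = lambda history: history
predict = lambda history, horizon: [history[-13 + h] for h in range(1, horizon + 1)]
```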
Across all of these out-of-sample predictions, the model using the month of the year outperforms the long-term mean model by both MAE (lower error) and R^2 (higher correlation). Despite being relatively naïve, predictions from the seasonal model have an R^2 of approximately 0.24.
We then assessed twelve linear models including data on average temperature, daily precipitation, and relative humidity in previous months. The lag of maximum cross-correlation between each of these weather variables and dengue incidence was 1 month for relative humidity, 3 months for precipitation, and 4 months for temperature. We assessed lags of 1-3 months for each variable and also a 4-month lag for temperature. The 4-month lag temperature model performed best, but did not outperform the simple monthly model by either metric.
Next we assessed sixteen models with only autocorrelation and differencing terms: eight models including only short-term autocorrelation and eight models including both short-term and seasonal autocorrelation. The first-order AR model, (1,0,0)(0,0,0)_12, substantially improved the 1-month predictions compared to all previous models. At 2 months, the predictions were less accurate than those from the monthly model and some of the covariate models. Increasing the order of the AR model slightly improved the predictions, while differencing
decreased their accuracy. Adding a seasonal component markedly improved the predictions; the (1,0,0)(1,0,0)_12 SAR model outperformed all of the previous models with lower MAE and higher R^2 at all prediction times. Increasing the short-term AR component slightly decreased accuracy and 1-month differencing decreased accuracy substantially. Adding a seasonal differencing term, however, improved accuracy by both metrics for longer prediction horizons (e.g. (1,0,0)(1,1,0)_12). Increasing the order of the seasonal AR term to (1,0,0)(3,1,0)_12 further improved predictions.
Finally, we assessed nine AR and SAR models with weather covariates. At all prediction horizons, models containing first-order autoregressive terms and covariates consistently outperformed models containing only the covariate information (e.g. (1,0,0)(0,0,0)_12 + temp_MX versus temp_MX) and simple AR models (e.g. (1,0,0)(0,0,0)_12 + temp_MX versus (1,0,0)(0,0,0)_12). The SAR models with covariates have very similar accuracy to the SAR models without covariates at 1-4 month prediction horizons. For 5- and 6-month horizons, the covariate models could not make predictions as they extended beyond the 4-month lag for temperature. Additional analyses showed that once about 12 years of training data were used, the predictive performance of the models reached a stable average value (between 0.85 and 0.9 correlation), although year-specific values showed some variation around the average (data not shown). Finally, in a non-exhaustive search, we noticed that using all weather variables at once as input in a given model did not yield noticeable improvements.
Overall, the best performing models were (1,0,0)(3,1,0)_12 for horizons of 1-2 months, (1,0,0)(3,1,0)_12 + temp_MX for the 3-month horizon, (1,0,0)(1,1,0)_12 for the 4-month horizon, and (2,0,0)(1,1,0)_12 for 5- and 6-month horizons (Fig. 1C). The best model across all horizons was (1,0,0)(3,1,0)_12, followed by (1,0,0)(1,1,0)_12 and (1,0,0)(2,1,0)_12.
State-level dengue forecasts. Dengue incidence varies substantially between states within Mexico (Fig. 2). Similar to the national-level results, the first-order AR model (1,0,0)(0,0,0)_12 outperformed the monthly model at shorter prediction horizons of 1-3 months (Fig. 3, Supplementary Figure 1). However, across all horizons and states, the most accurate model was the (1,0,0)(2,1,0)_12 model, which we now refer to as the "common" model. The coefficients for the (1,0,0)(2,1,0)_12 model in each state varied, but the pattern was consistent across all
Figure 1. National-level forecast metrics. (A) National-level incidence is shown during the training period (1985-1989) and evaluation period (1990-2007). For each of 39 models considered, the MAE (B) and R^2 (C) values for prospective forecasts over the entire evaluation period are shown for each prediction horizon (dark red to yellow corresponds to 1-6 months). For models including lagged weather covariates, forecasts were not possible at prediction horizons beyond the lag and are not shown. An equivalent plot for each state is shown in Figure S1.