SARIMA Forecasts of Dengue Incidence in Brazil, Mexico, Singapore, Sri Lanka, and Thailand: Model Performance and the Significance of Reporting Delays

TLDR
This work robustly assesses time-series-based forecasting approaches against a null model (historical average incidence) for forecasting incidence up to four months ahead, and finds that the time series methods are more accurate than the null model across all populations, especially for 1- and 2-month-ahead forecasts.
Pete Riley (1,5), Michal Ben-Nun (1,5), James Turtle (1), David Bacon (3), and Steven Riley (4)
1 Predictive Science Inc., San Diego, CA, U.S.A.
2 Dengue Branch, Division of Vector-Borne Diseases, Centers for Disease Control and
Prevention, San Juan, Puerto Rico.
3 Leidos, Arlington, VA, U.S.A.
4 Imperial College London, London, England, U.K.
5 These authors contributed equally to this work.
* pete@predsci.com
medRxiv preprint doi: https://doi.org/10.1101/2020.06.26.20141093; this version posted June 29, 2020.
The copyright holder for this preprint (which was not certified by peer review) is the
author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
NOTE: This preprint reports new research that has not been certified by peer review and
should not be used to guide clinical practice.
Abstract
Timely and accurate knowledge of Dengue incidence is of value to public health
professionals because it helps to enable the precise communication of risk, improved
allocation of resources to potential interventions, and improved planning for the
provision of clinical care of severe cases. Therefore, many national public health
organizations make local Dengue incidence data publicly available for individuals and
organizations to use to manage current risk. The availability of these data has also
resulted in active research into the forecasting of Dengue incidence as a way to increase
the public health value of incidence data. Here, we robustly assess time-series-based
forecasting approaches against a null model (historical average incidence) for the
forecasting of incidence up to four months ahead. We used publicly available data from
multiple countries (Brazil, Mexico, Singapore, Sri Lanka, and Thailand) and found that
our time series methods are more accurate than the null model across all populations,
especially for 1- and 2-month-ahead forecasts. We tested whether the inclusion of
climatic data improved forecast accuracy and found only modest, if any, improvements.
We also tested whether national time-series forecasts are more accurate if made from
aggregated sub-national forecasts, and found mixed results. We used our forecasting
results to illustrate the high value of increased reporting speed. This framework and
test data are available as an R package. The non-mechanistic approaches described here
motivate further research into the use of disease-dynamic models to increase the
accuracy of medium-term Dengue forecasting across multiple populations.
Author summary
Dengue is a mosquito-borne disease caused by the Dengue virus. Since the Second
World War it has evolved into a global problem, securing a foothold in more than 100
countries. Each year, hundreds of millions of people become infected, and upwards of
10,000 die from the disease. Thus, being able to accurately forecast the number of cases
likely to emerge in particular locations is vital for public health professionals to be able
to develop appropriate plans. In this study, we have refined a technique that allows us
to forecast the number of cases of Dengue in a particular location, up to four months in
advance. We test the approach using state-level and national-level data from Brazil,
Mexico, Singapore, Sri Lanka, and Thailand. We found that the model can generally
make useful forecasts, particularly on a two-month horizon. We tested whether
information about climatic conditions improved the forecast, and found only modest
improvements. Our results highlight the need for both timely and
accurate reports. We also anticipate that this approach may be more generally useful to
the scientific community; thus, we are releasing a framework that will allow interested
parties to replicate our work and to apply it to other sources of Dengue data, as well
as to other infectious diseases.
Introduction
Forecasting the near- and long-term evolution of Dengue incidence within a country has
obvious value for policy makers. Dengue is a mosquito-borne disease caused by the
Dengue virus, affecting most tropical regions of the world [1]. Each year, between 50
and 500 million people are infected with Dengue. Of these, between 10,000 and 20,000
people die [2, 3]. Although the disease is endemic, seasons vary dramatically from
one to the next, sometimes by more than an order of magnitude [4]. Knowledge of the
estimated total number of cases, the timing of the peak, and near-term incidence can
allow public health personnel to allocate limited resources appropriately, particularly
when Dengue may be competing with other diseases. Accurate predictions of large
increases in incidence would allow health care managers to prepare for a surge of
patients and to undertake more proactive interventions, such as vector control.
A number of statistical and mechanistic models have been developed with the aim of
modeling or forecasting Dengue in various settings [5–13]. While it is likely that
mechanistic approaches [10, 14] should ultimately outperform statistical [15, 16] and
machine learning (ML) approaches [17], our current understanding of the complex dynamics
associated with vector-borne diseases, as well as the limitations in available data,
suggests that statistical techniques should be considered first. Of the statistical
approaches, Seasonal Autoregressive Integrated Moving Average (SARIMA) models
have received the most attention [18–20]. Recently, these models have been applied in a
pseudo-forecasting mode to assess their performance. In one study, the best overall
performing model relied on lagged observations of one month, with three yearly lag
terms and the first yearly difference [6], suggesting that there was a long-term trend in
the data and that the average of the last three years' observations for a given month was
a good model if adjusted by the single most recent observation from the current year.
Notably, climate data did not appreciably improve the power of the SARIMA models.
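For reference, a model with a one-month lag, three yearly lag terms, and the first yearly difference is written $(1, 0, 0)(3, 1, 0)_{12}$ in the usual $(p, d, q)(P, D, Q)_{s}$ notation. A sketch of its standard backshift-operator form follows. The symbols $y_t$ (monthly incidence, possibly after a transformation), $\phi_1$, $\Phi_k$, $c$, and $\varepsilon_t$ are our own notation and are assumptions, not taken from the source.

```latex
% B is the backshift operator: B y_t = y_{t-1}
\left(1 - \phi_1 B\right)
\left(1 - \Phi_1 B^{12} - \Phi_2 B^{24} - \Phi_3 B^{36}\right)
\left(1 - B^{12}\right) y_t = c + \varepsilon_t
```

Dropping the seasonal-difference factor $(1 - B^{12})$ gives the undifferenced $(1, 0, 0)(3, 0, 0)_{12}$ variant.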
In this study, we describe a statistical technique for predicting Dengue incidence
rates from one to four months in the future using a family of SARIMA models. We
apply the model to districts/provinces/states within five distinct countries (Brazil,
Mexico, Singapore, Sri Lanka, and Thailand) for which reliable monthly or weekly data
are available.
Results
We first examined the data for evidence of seasonality. We then applied a reasonably
exhaustive set of SARIMA models to these data, first without, and then with the
addition of co-variate data. Next we explored the effects of direct versus aggregate
forecasting, and finally, we investigated the effects of reporting time delays.
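The null model against which the SARIMA fits are compared is described only as the historical average incidence. A minimal Python sketch of one way to compute such a baseline is given below. The function name, the tuple layout of the input, the restriction to records dated before the target month, and the example counts are all assumptions made for illustration; they are not the authors' R implementation.

```python
from collections import defaultdict
from statistics import mean


def historical_average(history, target_year, target_month):
    """Null forecast: mean of past counts for the same calendar month.

    history is a list of (year, month, cases) tuples. Only records dated
    strictly before the target month are used, mimicking a pseudo-forecast
    (this restriction is an assumption of the sketch).
    """
    by_month = defaultdict(list)
    for year, month, cases in history:
        if (year, month) < (target_year, target_month):
            by_month[month].append(cases)
    if not by_month[target_month]:
        raise ValueError("no earlier observations for this calendar month")
    return mean(by_month[target_month])


# Made-up monthly counts for one hypothetical region (illustration only).
history = [
    (2015, 1, 10), (2015, 2, 40),
    (2016, 1, 20), (2016, 2, 60),
    (2017, 1, 30), (2017, 2, 80),
]

january_2018 = historical_average(history, 2018, 1)  # averages the 2015-2017 Januaries
january_2017 = historical_average(history, 2017, 1)  # averages 2015 and 2016 only
```

Because the baseline depends only on the calendar month, it returns the same value for every forecast horizon, which is what makes it a natural reference point for 1- to 4-month-ahead forecasts.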
Periodicity/Seasonality in the incidence data
There is a clear seasonal component to the incidence profiles for Brazil (data publicly
available for the interval 2001-2012), Thailand (2007-2018), Mexico (1985-2017), and
Singapore (2005-2019) (Fig. 1). For Sri Lanka (2010-2019), the picture is more complex.
This is, at least in part, because the peak values in 2017 were more
than six times higher than the average peak values over the previous decade. To explore
whether possible seasonal signatures exist, we applied a wavelet transform to
district-level data of Sri Lanka (Fig. 2). With the exception of Ratnapura, there is no
evidence for a stable, annual peak (with a period of 12 months). On the other hand, there is
some evidence for a sustained signal at 28-32 months that is present in all of the top-five
districts (and most of the other 19 districts). In the right column of Fig. 2 we explore
the idea that outbreaks spread out from the capital, Colombo (blue line in each panel), to
other districts (red line in each panel) by plotting the average phase of the amplitude
with a period of 10-14 months; however, we find no consistent evidence for a lead/lag.
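As a rough illustration of how a dominant period can be looked for in a monthly series, the sketch below uses a plain Fourier periodogram. This is a simpler, swapped-in technique, not the wavelet transform used in the analysis above: it gives one power value per period for the whole series and cannot show how a signal changes over time. The function names and the synthetic series are assumptions made for illustration.

```python
import cmath
import math


def power_at_period(series, period):
    """Periodogram power of the mean-removed series at one period (in samples)."""
    n = len(series)
    avg = sum(series) / n
    coeff = sum(
        (x - avg) * cmath.exp(-2j * math.pi * t / period)
        for t, x in enumerate(series)
    )
    return abs(coeff) ** 2 / n


def dominant_period(series, candidate_periods):
    """Return the candidate period with the largest periodogram power."""
    return max(candidate_periods, key=lambda p: power_at_period(series, p))


# Synthetic example: ten years of monthly counts with a pure annual cycle.
monthly_cases = [100 + 80 * math.sin(2 * math.pi * t / 12) for t in range(120)]

# Scan periods from 6 to 36 months, which covers both the annual band and
# the 28-32 month band mentioned in the text.
best = dominant_period(monthly_cases, range(6, 37))
```

For real district-level data, a time-resolved method such as the wavelet transform is the more appropriate tool, because the strength of a periodic signal can change from year to year.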
SARIMA Analysis
Application of the leading eight SARIMA models to each of the
states/provinces/regions within Brazil, Thailand, Mexico, and Singapore generally
demonstrated that the $(1, 0, 0)(3, 0, 0)_{12}$ SARIMA model performed best. For example,
comparison of eight SARIMA models across 76 Thai provinces with our null historical
model showed that a simple monthly historical average with a 1-month-3-year lagged
regression model of either the direct observations or their first difference (i.e.,
$(1, 0, 0)(3, 0, 0)_{12}$ or $(1, 0, 0)(3, 1, 0)_{12}$, respectively) performed best across
all provinces (Fig. 3). Unsurprisingly, the Mean Absolute Error (MAE) tended to decrease
moving from the most populous to the least populous provinces, while the Mean Relative
Absolute Error (MRAE) remained approximately constant from one province to another. Using
the ratio of MAE(SARIMA) to MAE(NULL) as a measure of Skill Score ($SS$) for each
SARIMA model, we infer that almost all eight of the SARIMA models outperformed
[Figure 1: heat-map panels for Brazil (2001-2012, five regions), Mexico (1985-2017, by state), Sri Lanka (2010-2019, by district), Thailand (2007-2018, four regions), and Singapore (2005-2019, national).]
Fig 1. Heat maps for provinces or regions within (a) Brazil, (b) Mexico, (c) Sri Lanka
and (d) Thailand. Monthly cadence is shown in all four panels. (e) Weekly national
level incidence data for Singapore. In each panel, values represent $\log_{10}(I + 1)$,
where $I$ is the number of monthly or weekly cases in that region (or country).
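The MAE-based comparison described in the SARIMA Analysis section (the ratio of MAE(SARIMA) to MAE(NULL)) can be sketched in Python as follows. The function names and the example numbers are made up for illustration, and the sketch covers only the MAE ratio; the MRAE is not defined in the text shown here, so it is not implemented.

```python
def mean_absolute_error(observed, forecast):
    """Average absolute difference between observed and forecast values."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def mae_ratio(observed, model_forecast, null_forecast):
    """Ratio MAE(model) / MAE(null); values below 1 favor the model."""
    null_mae = mean_absolute_error(observed, null_forecast)
    if null_mae == 0:
        raise ValueError("null model has zero error; ratio is undefined")
    return mean_absolute_error(observed, model_forecast) / null_mae


# Made-up monthly case counts and forecasts (illustration only).
observed = [100, 150, 200, 120]
sarima_forecast = [110, 140, 190, 130]   # every forecast is off by 10
null_forecast = [80, 170, 220, 100]      # every forecast is off by 20

ratio = mae_ratio(observed, sarima_forecast, null_forecast)
```

With this definition a ratio below 1 means the candidate model has a smaller mean absolute error than the null model on the same observations.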