
CoViD--19: An Automatic, Semiparametric Estimation Method for the Population Infected in Italy

18 Mar 2020 - medRxiv (Cold Spring Harbor Laboratory Press)
TL;DR: While official data for March 12 report 12,839 cases in Italy, the number of people infected with SARS-CoV-2 could be as high as 105,789. The proposed bootstrap-driven estimation method is designed to be robust, automatic and suitable for generating estimates at the regional level.
Abstract: To date, official data on the number of people infected with SARS-CoV-2, the virus responsible for CoViD–19, have been released by the Italian Government just on the basis of a non-representative sample of the population which tested positive for the swab. However, a reliable estimation of the number of infected, including asymptomatic people, turns out to be crucial in the preparation of operational schemes and in estimating the future number of people who will require, to different extents, medical attention. In order to overcome the current data shortcoming, this paper proposes a bootstrap-driven estimation procedure for the number of people infected with SARS-CoV-2. This method is designed to be robust, automatic and suitable for generating estimations at the regional level. Obtained results show that, while official data at March the 12th report 12,839 cases in Italy, people infected with SARS-CoV-2 could be as high as 105,789.

Summary (2 min read)

1. Introduction

  • Cases of COVID-19 have broken out in Italy, the first European country to record a capillary spread of this disease after the Asian continent: the scenario that is developing in these days is creating an example that unfortunately will certainly be repeated in other states all over the world.
  • This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
  • The presented procedure is designed to overcome these problems.

2. The proposed method

  • While the former does not pose problems in terms of DOF, the latter clearly does.
  • The DOF-saving strategy is also the driving force behind the choice not to consider the georeferencing of Regions as an exogenous parameter, nor to include the regional population in a regression-like scheme, but to implicitly assume these variables to be embedded in the dynamics of the time series in question.

3. Data and contagion indicator

  • The paper makes use of official data published by Italian Authorities, on the following two variables of interest 1. number of deaths from CoViD–19 (denoted by the Latin letter M) 2. number of currently positive cases recorded after the administration of the test (denoted by the Latin letter C).
  • The total number of Italian regions considered is 20.
  • Two different subsets are built from Ω i.e.
  • Ω, w is the ratio between current positive cases (C) and number of deaths (M) (2), τ the average doubling time for the CoViD–19 (i.e. the average span of time needed for the virus to double the cases) and δ the average time for an infected person to die.
  • These two constant terms have been kept fixed, as estimated according to the data so far available worldwide (see Pueyo (2020)).

4. The Resampling Method

  • The bootstrap scheme adopted proved to be a real asset for the problem at hand.
  • Given its pivotal role, it will be briefly presented.
  • In essence, the choice of the most appropriate resampling method is far from being an easy task, especially when the independent and identically distributed (iid) assumption (Efron’s initial bootstrap method) is violated.
  • This is especially true under the actual conditions (small sample sizes).
  • Technically, the MEB algorithm can be broken down, following Koutris et al. (2008), into 8 steps.

5. The application of the maximum entropy bootstrap

  • Once the data become available, one has just to divide them according to the subsets of Ω, and the code will process the new data automatically.
  • The procedure is also very fast, as generating the bootstrap samples takes less than 2 minutes.
  • Both the code and the data used for this paper are made freely available to any researcher who would consider using them.

6. Empirical evidence

  • Note that sudden variations are due to the small number of tests administered (denominator of the variable CT (2)).
  • That said, the main result of the paper is summarized in Table 2, where three estimates of the number of infected people are reported by region.
  • The regions belonging to the set Ω◦ (i.e. no deaths) are in italics (all the others belong to the set Ω•).
  • The columns “Mean” and “Lower Bounds” report, respectively, the bootstrap estimates computed according to Eqns 5 and 6 and the lower bootstrap CIs.
  • The column denominated “Official Cases” accounts for the number of official cases released by the Italian Authorities, whereas the column “Morbidity” expresses the percentage ratio between µ• (5) or µ◦ (6) and the actual population of each region.

7. Conclusions

  • It is a widespread opinion in the scientific community that current official data on the diffusion of SARS-CoV-2, responsible for the correlated disease COVID-19, among the population are likely to suffer from a strong downward bias.
  • The aim of this instant paper is twofold: first, it can compute realistic figures on the effective number of people infected with SARS-CoV-2 in Italy; second, it can provide a methodology which improves the current state of the art and can be used to compute similar figures in other countries.
  • The entire procedure has been written in the programming language R and uses official data as published by the Italian Government.
  • To overcome the crisis, international solidarity, together with strong and coordinated actions among countries, will be crucial.
  • Stay at home and, if you can, do research on this topic: every contribution could be crucial.


L. Fenga COVID-19 Estimation
CoViD–19: An Automatic,
Semiparametric Estimation Method
for the Population Infected in Italy
Livio Fenga
Italian National Institute of Statistics
ISTAT, Rome, Italy 00184
livio.fenga@istat.it
Abstract: To date, official data on the number of people infected with SARS-CoV-2,
the virus responsible for CoViD–19, have been released by the Italian Government just
on the basis of a non-representative sample of the population which tested positive for
the swab. However, a reliable estimation of the number of infected, including
asymptomatic people, turns out to be crucial in the preparation of operational schemes
and in estimating the future number of people who will require, to different extents,
medical attention. In order to overcome the current data shortcoming, this paper
proposes a bootstrap-driven estimation procedure for the number of people infected
with SARS-CoV-2. This method is designed to be robust, automatic and suitable for
generating estimations at the regional level. Obtained results show that, while official
data at March the 12th report 12,839 cases in Italy, people infected with SARS-CoV-2
could be as high as 105,789.
KEYWORDS: Autoregressive metric; CoViD–19; maximum entropy bootstrap; model uncertainty;
number of Italian people infected
1. Introduction
Cases of COVID-19 have broken out in Italy, the first European country to record a
capillary spread of this disease after the Asian continent: the scenario that is developing
in these days is creating an example that unfortunately will certainly be repeated in other
states all over the world. In this framework, the availability of reliable data sources
on the diffusion of SARS-CoV-2 (the virus responsible for this disease) is crucial in
many ways. It is needed to maximize coordination among emergency services located
in different parts of the Country and within the EU, it is crucial for the preparation of
operational schemes, and it is pivotal to allow a proper prediction of the development of
the pandemic.
At the moment, official data on the infection in Italy are based on non-random,
non-representative samples of the population: as a matter of fact, people are tested for
SARS-CoV-2 on the condition that some symptoms related to the virus are present.
These data can ensure a proper estimation of total deaths and total hospitalizations
due to the virus-related disease: this is crucial to proceed in terms of optimization
of available resources, of rationalization of accesses to hospitals and other health
facilities, and so forth. Nonetheless, from a purely statistical point of view they are not
suitable to provide a reliable source of information on the real number of infected people
(hereafter “positive cases”).
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
medRxiv preprint doi: https://doi.org/10.1101/2020.03.14.20036103; this version posted March 18, 2020.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Starting from the number of deaths and the number of people who tested positive
for the virus, and improving on the methodology originally proposed by Pueyo (2020),
this paper aims to estimate the real number of people infected by SARS-CoV-2,
simply called CORONAVIRUS, in each of the 20 Italian regions.
The two major obstacles to reliable estimation are the small sample size, which is
liable to lead to a strong bias in asymptotic results and is very likely to imply the
construction of incorrect confidence intervals, and the distortion of the sample
introduced by the mentioned testing strategy.
The presented procedure is designed to overcome these problems. As will
be detailed in the sequel, in order to reduce the impact of biasing components on the
parameter estimations, a recent bootstrap scheme, called Maximum Entropy Bootstrap
and proposed by Vinod et al. (2009), has been employed. In addition to that, a distance
measure based on the theory of stochastic processes and proposed by Piccolo (1990)
has been employed to guarantee statistical coherence among all the Italian regions.
2. The proposed method
In small data sets it is essential to save degrees of freedom (DOF). In this perspective,
the adopted model, of the semiparametric type, consists of two parts: a purely
nonparametric one and a parametric one. While the former does not pose problems in terms
of DOF, the latter clearly does. However, the sacrifice in terms of DOF is very limited,
as an autoregressive model of order 1 (employed in a suitable distance function, as
illustrated below) has proved sufficient for the purpose. The DOF-saving strategy is also
the driving force behind the choice not to consider the georeferencing of Regions as an
exogenous parameter, nor to include the regional population in a regression-like scheme,
but to implicitly assume these variables to be embedded in the dynamics of the time series
in question.
3. Data and contagion indicator
The paper makes use of official data published by Italian Authorities, on the following
two variables of interest
1. number of deaths from CoViD–19 (denoted by the Latin letter M)
2. number of currently positive cases recorded after the administration of the test
(denoted by the Latin letter C).
The data set includes 18 daily data points collected at the regional level during the
period from February 24th to March 12th. The total number of Italian regions considered
is 20. However, one special administrative area (Trentino Alto Adige) is divided into two
subregions, i.e. Trento and Bolzano. Therefore, the set containing all the Italian regions,
called $\Omega$, has cardinality $|\Omega| = 22$ (the cardinality function is denoted by
the symbol $|\cdot|$). Two different subsets are built from $\Omega$, i.e. $\Omega^{\bullet}$,
containing the regions for which at least one death, out of the group of tested people, has
been recorded, and $\Omega^{\circ}$ (no recorded deaths):
1. $\Omega^{\bullet}$ = {Piemonte, Lombardia, Veneto, Friuli, Liguria, Emilia, Toscana, Marche,
Lazio, Abruzzo, Valle d'Aosta, Bolzano, Campania, Puglia, Sicilia}
2. $\Omega^{\circ}$ = {Trento, Umbria, Molise, Basilicata, Calabria, Sardegna}.
In what follows, the two superscripts $\bullet$ and $\circ$ will always be used with
reference to the regions in $\Omega^{\bullet}$, $\{r_1, r_2, \dots, r_{15}\}$, and in
$\Omega^{\circ}$, $\{s_1, s_2, \dots, s_6\}$, respectively. The time span is denoted as
$\{1, 2, \dots, T\}$.
In the case of the regions included in $\Omega^{\bullet}$, following Pueyo (2020), the
total number of people infected by CoViD–19 is estimated as follows:
$$y^{\bullet}_{j,T} = w_T \, 2^{\tau/\delta}, \tag{1}$$
$$w_T = \frac{C_T}{M_T}, \tag{2}$$
where the superscript $\bullet$ identifies the regions $\{r_1, r_2, \dots, r_{15}\} \in \Omega^{\bullet}$,
$w$ is the ratio between current positive cases ($C$) and number of deaths ($M$) (2), $\tau$ the
average doubling time for the CoViD–19 (i.e. the average span of time needed for the virus
to double the cases) and $\delta$ the average time for an infected person to die. These two
constant terms have been kept fixed, as estimated according to the data so far available
worldwide (see Pueyo (2020)). They are as follows: $\tau = 17.3$ and $\delta = 6.2$.
The case of the regions belonging to $\Omega^{\circ}$ is more complicated. The approach
adopted is as follows:
1. Given the series $s_j \in \Omega^{\circ}$, a series $c^{\pi} \in \Omega^{\bullet}$ minimizing a
suitable distance function, denoted by the Greek letter $\pi(\cdot)$, is found. In symbols:
$c^{\pi} = \operatorname{argmin}_{c \in \Omega^{\bullet}} \pi(s, c)$;
2. the estimated number of infected at the population level found for $c^{\pi}$, say
$I_{c^{\pi}}$, becomes the weight applied to the total cases recorded for $s_j$, i.e.
$I_{c^{\pi}} \, C_{s_j} / C_{r_j}$.
Therefore, the estimate of the variable of interest for this case is as follows:
$$y^{\circ}_{j,T} = \frac{I_{c^{\pi}} \, C_{s_j}}{C_{r_j}} \tag{3}$$
The distance function adopted ($\pi$), called AR distance, has been introduced by
Piccolo (2007). Briefly, the series of interest are considered a realization of an ARMA
(Autoregressive Moving Average) model (see, e.g., Makridakis and Hibon (1997)), so
that each of them can be expressed as an autoregressive model of infinite order, i.e.
AR($\infty$), whose infinite sequence of AR parameters is $\alpha_1, \alpha_2, \dots$.
Without loss of generality, the distance between the series $s$ and $c$, $\pi(s, c)$ (Eqn
3), is expressed as
$$\pi(s, c) = \sqrt{\sum_{j=1}^{\infty} \left(\alpha_j(s) - \alpha_j(c)\right)^2} \tag{4}$$
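As an illustration of how the AR distance can drive the choice of the closest series, the following is a minimal Python sketch and not the author's R code. It assumes the infinite AR expansion is truncated to a single AR(1) coefficient estimated by least squares on the mean-centred series; the helper names `ar1_coefficient`, `ar_distance` and `closest_series`, as well as the toy series, are hypothetical.

```python
import math

def ar1_coefficient(series):
    """Least-squares AR(1) coefficient of the mean-centred series."""
    mean = sum(series) / len(series)
    d = [v - mean for v in series]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(v * v for v in d[:-1])
    return num / den if den else 0.0

def ar_distance(alpha_s, alpha_c):
    """Euclidean distance between two truncated AR coefficient sequences.

    The shorter sequence is padded with zeros, which is what a finite-order
    AR model implies for the remaining terms of its expansion.
    """
    n = max(len(alpha_s), len(alpha_c))
    a = list(alpha_s) + [0.0] * (n - len(alpha_s))
    b = list(alpha_c) + [0.0] * (n - len(alpha_c))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def closest_series(target, candidates):
    """Name of the candidate whose AR(1) coefficient is nearest to the target's."""
    a_t = [ar1_coefficient(target)]
    return min(
        candidates,
        key=lambda name: ar_distance(a_t, [ar1_coefficient(candidates[name])]),
    )

# Toy data only: a trending target, one trending and one oscillating candidate.
toy = {"a": [2, 4, 6, 8, 10], "b": [1, -1, 1, -1, 1]}
print(closest_series([1, 2, 3, 4, 5], toy))
```

With a richer AR order the same `ar_distance` function applies unchanged, since it only compares coefficient sequences.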
4. The Resampling Method
The bootstrap scheme adopted proved to be a real asset for the problem at hand. Given
its pivotal role, it will be briefly presented. In essence, the choice of the most
appropriate resampling method is far from being an easy task, especially when the
independent and identically distributed (iid) assumption (Efron’s initial bootstrap method)
is violated. Under dependence structures embedded in the data, simple sampling with
replacement has been proved (see, for example, Carlstein et al. (1986)) to yield suboptimal
results. As a matter of fact, iid-based bootstrap schemes are not designed to capture,
and therefore replicate, dependence structures. This is especially true under the actual
conditions (small sample sizes). In such cases, selecting the “right” resampling scheme
becomes a particularly challenging task. Several ad hoc methods have therefore been
proposed, many of which are now freely and publicly available in the form of powerful
routines working under software packages such as Python® or R®. In more detail, while
in the classic bootstrap an ensemble $\Omega$ represents the population of reference the
observed time series is drawn from, in MEB a large number of ensembles (subsets), say
$\{\omega_1, \dots, \omega_N\}$, become the elements belonging to $\Omega$, each of them
containing a large number of replicates $\{x_1, \dots, x_J\}$. Perhaps the most important
characteristic of the MEB algorithm is that its design guarantees the inference process to
satisfy the ergodic theorem. Formally, denoting by the symbol $|\cdot|$ the cardinality
function (counting function) of a given ensemble of time series
$\{x_t \in \omega_i;\ i = 1, \dots, N\}$, the MEB procedure generates a set of disjoint subsets
$\Omega_N \equiv \omega_1 \cup \omega_2 \cup \dots \cup \omega_N$ s.t. $E(\Omega_N) \approx \mu(x_t)$,
being $\mu(\cdot)$ the sample mean. Furthermore, basic shape and probabilistic structure
(dependency) is guaranteed to be retained $\forall\, x_{t,j} \in \omega_i \subset \Omega$.
The MEB resampling scheme has non-negligible advantages over many of the available
bootstrap methods: it does not require complicated tuning procedures (unavoidable,
for example, in the case of resampling methods of the Block Bootstrap type) and it
is effective under non-stationarity. The MEB method relies on entropy theory and the
related concept of (un)informativeness of a system. In particular, the Maximum Entropy
of a given density $\delta(x)$ is chosen so that the expectation of the Shannon Information
$H = E(-\log \delta(x))$ is maximized, i.e.
$$\max_{\delta} H = E(-\log \delta(x)).$$
Under mass and mean preserving constraints, this resampling scheme generates
an ensemble of time series from a density function satisfying (4). Technically, the MEB
algorithm can be broken down, following Koutris et al. (2008), into 8 steps. They are:
1. a sorting matrix of dimension $T \times 2$, say $S_1$, accommodates in its first column the
time series of interest $x_t$ and an Index Set, i.e. $I_{ind} = \{1, 2, \dots, T\}$, in the other
one;
2. $S_1$ is sorted according to the numbers placed in the first column. As a result,
the order statistics $x_{(t)}$ and the vector $I_{ord}$ of sorted $I_{ind}$ are generated and
respectively placed in the first and second column;
3. compute “intermediate points”, averaging over successive order statistics, i.e.
$c_t = \frac{x_{(t)} + x_{(t+1)}}{2}$, $t = 1, \dots, T-1$, and define intervals $I_t$ constructed
on $c_t$ and $r_t$, using ad hoc weights obtained by solving the following set of equations:
i) $f(x) = \frac{1}{r_1} \exp\left(\frac{[x - c_1]}{r_1}\right);\ x \in I_1;\ r_1 = \frac{3x_{(1)}}{4} + \frac{x_{(2)}}{4}$
ii) $f(x) = \frac{1}{c_k - c_{k-1}};\ x \in (c_k; c_{k+1}],\ r_k = \frac{x_{(k-1)}}{4} + \frac{x_{(k)}}{2} + \frac{x_{(k+1)}}{4};\ k = 1, \dots, T-1$;
iii) $f(x) = \frac{1}{r_T} \exp\left(\frac{[c_{T-1} - x]}{r_T}\right);\ x \in I_T;\ r_T = \frac{x_{(T-1)}}{4} + \frac{3x_{(T)}}{4}$;
4. from a uniform distribution in $[0, 1]$, generate $T$ pseudorandom numbers $p_j$ and
define the intervals $R_t = (t/T;\ (t+1)/T]$ for $t = 0, 1, \dots, T-1$, in which each $p_j$
falls;
5. create a matching between $R_t$ and $I_t$ according to the following equations:
$x_{j,t,me} = c_{T-1} - |\theta| \ln(1 - p_j)$ if $p_j \in R_0$,
$x_{j,t,me} = c_1 - |\theta| \, |\ln(1 - p_j)|$ if $p_j \in R_{T-1}$,
so that a set of $T$ values $\{x_{j,t}\}$, as the $j$-th resample, is obtained. Here $\theta$ is the
mean of the standard exponential distribution;
6. a new $T \times 2$ sorting matrix $S_2$ is defined and the $T$ members of the set $\{x_{j,t}\}$
for the $j$-th resample obtained in Step 5 are reordered in increasing order of
magnitude and placed in column 1. The sorted $I_{ord}$ values (Step 2) are placed in
column 2 of $S_2$;
7. matrix $S_2$ is sorted according to the second column so that the order $\{1, 2, \dots, T\}$
is there restored. The jointly sorted elements of column 1 are denoted by $\{x_{S,j,t}\}$,
where $S$ recalls the sorting step;
8. Repeat Steps 1 to 7 a large number of times.
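The overall structure of the steps above (sort, build intermediate points, draw uniform numbers, map them to quantiles, re-sort, restore the time order) can be sketched in Python as follows. This is a deliberately simplified illustration, not the reference algorithm: it assumes a uniform density on every interval, including the two tails (whereas the steps above use exponential tails), it sets the tail limits from the mean absolute change between consecutive observations, and it omits the mean-preserving adjustment. The function names and the toy series are hypothetical.

```python
import random

def meboot_replicate(x, rng):
    """One simplified, rank-preserving replicate of the series x (len(x) >= 2)."""
    T = len(x)
    # Steps 1-2: sort the series while remembering the original time index.
    order = sorted(range(T), key=lambda t: x[t])
    xs = [x[t] for t in order]
    # Step 3: intermediate points between successive order statistics.
    z = [(xs[t] + xs[t + 1]) / 2 for t in range(T - 1)]
    # Simplification: finite tail limits instead of exponential tails.
    spread = sum(abs(x[t] - x[t - 1]) for t in range(1, T)) / (T - 1)
    z = [xs[0] - spread] + z + [xs[-1] + spread]
    # Steps 4-5: each of the T intervals carries probability mass 1/T, so a
    # uniform draw p selects interval k and a position inside that interval.
    draws = []
    for _ in range(T):
        p = rng.random()
        k = min(int(p * T), T - 1)
        frac = p * T - k
        draws.append(z[k] + frac * (z[k + 1] - z[k]))
    # Step 6: sort the generated values.
    draws.sort()
    # Step 7: the i-th smallest draw goes where the i-th smallest observation
    # sat in time, which restores the shape of the original series.
    out = [0.0] * T
    for rank, t in enumerate(order):
        out[t] = draws[rank]
    return out

def meboot(x, n_rep=100, seed=0):
    """Step 8: repeat the procedure n_rep times with a reproducible seed."""
    rng = random.Random(seed)
    return [meboot_replicate(x, rng) for _ in range(n_rep)]

toy_series = [4, 12, 36, 20, 8]  # toy values, not data from the paper
ensemble = meboot(toy_series, n_rep=100, seed=1)
print(ensemble[0])
```

Because the draws are re-ordered by the ranks of the observed series, every replicate peaks and dips at the same time points as the original, which is the property that lets the scheme retain dependence structure without tuning a block length.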
5. The application of the maximum entropy bootstrap
In what follows, the proposed procedure is presented in a step-by-step fashion.
1. For each time series $y^{\bullet}_t$ and $y^{\circ}_t$ the bootstrap procedure is applied so
that $B = 100$ “bona fide” replications are available, i.e. $\tilde{y}^{\bullet}_{t,b}$;
$b = 1, 2, \dots, B$ and $\tilde{y}^{\circ}_{t,b}$; $b = 1, 2, \dots, B$;
2. for both the series, the row vector related to the last observation $T$ is extracted,
i.e. $v^{\bullet} = \{\tilde{y}^{\bullet}_{T,1}, \tilde{y}^{\bullet}_{T,2}, \dots, \tilde{y}^{\bullet}_{T,B}\}$
and $v^{\circ} = \{\tilde{y}^{\circ}_{T,1}, \tilde{y}^{\circ}_{T,2}, \dots, \tilde{y}^{\circ}_{T,B}\}$;
3. the expected values ($E(v^{\bullet})$, $E(v^{\circ})$) are then extracted as well as the 95%
confidence intervals ($CI^{\bullet}$ and $CI^{\circ}$), computed according to the t–percentile method.
The explanation of the t–percentile method goes beyond the scope of this paper,
therefore the interested reader is referred to the excellent paper by Berkowitz
and Kilian (2000).
In particular, the lower (upper) CIs will be the lower (upper) bounds of our
estimator, while the quantities $E(v^{\bullet})$ and $E(v^{\circ})$ are estimated through the
mean operator, i.e.
$$\mu^{\bullet} = \sum_{j=1}^{6} v^{\bullet}_j \tag{5}$$
and
$$\mu^{\circ} = \sum_{j=1}^{6} v^{\circ}_j \tag{6}$$
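To make steps 2 and 3 concrete, the sketch below extracts the last observation from each bootstrap replicate and summarizes it with a mean and an interval. It is only an illustration under stated assumptions: it uses the plain percentile interval as a simpler stand-in for the t–percentile method referred to above, it takes the mean as the average over all replicates, and the names `percentile` and `summarize_last_point` and the toy replicates are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def summarize_last_point(replicates, level=0.95):
    """Mean and percentile bounds of the last observation across replicates."""
    # Step 2: collect the value at the last time point T from every replicate.
    v = sorted(rep[-1] for rep in replicates)
    # Step 3: average over the replicates and read off the two tail quantiles.
    mean = sum(v) / len(v)
    alpha = (1 - level) / 2
    return mean, percentile(v, alpha), percentile(v, 1 - alpha)

# Toy replicates only: 101 two-point series whose last values are 0, 1, ..., 100.
toy_replicates = [[0.0, float(i)] for i in range(101)]
mean, lower, upper = summarize_last_point(toy_replicates)
print(mean, lower, upper)
```

In a full pipeline, `replicates` would be the ensemble produced by the bootstrap for one regional series, and the lower and upper values would play the role of the bounds of the estimator.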
At this point, it is worth emphasizing that the procedure not only, as just seen,
requires very little in terms of data but can be run in an automatic fashion. Once the
data become available, one has just to divide them according to the subsets i.e.

References
TL;DR: In this article, the authors proposed a variance estimator for a general statistic, where the subseries values are used as replicates to model the sampling variability of the sample variance.
Abstract: Let $Z_i:-\infty

733 citations


"CoViD--19: An Automatic, Semiparame..." refers methods in this paper

  • ...Under dependence structures embedded in the data, simple sampling with replacement has been proved -see, for example Carlstein et al....

TL;DR: It is shown that the block size plays an important role in determining the success of the block bootstrap, and a data-based block size selection procedure is proposed, which would account for lag order uncertainty in resampling.
Abstract: In recent years, several new parametric and nonparametric bootstrap methods have been proposed for time series data. Which of these methods should applied researchers use? We provide evidence that for many applications in time series econometrics parametric methods are more accurate, and we identify directions for future research on improving nonparametric methods. We explicitly address the important but often neglected issue of model selection in bootstrapping. In particular, we emphasize the advantages of the AIC over other lag order selection criteria and the need to account for lag order uncertainty in resampling. We also show that the block size plays an important role in determining the success of the block bootstrap, and we propose a data-based block size selection procedure.

321 citations


"CoViD--19: An Automatic, Semiparame..." refers methods in this paper

  • ...The explanation of the T–percentile method goes beyond the scope of this paper, therefore the interested reader is referred to the excellent paper by Berkowitz and Kilian (2000)....

TL;DR: It is demonstrated that AR(1), AR(2) and ARMA(1,1) models can produce more accurate post-sample forecasts than those found through the application of Box‐ Jenkins methodology.
Abstract: The purpose of this paper is to apply the Box‐Jenkins methodology to ARIMA models and determine the reasons why in empirical tests it is found that the post-sample forecasting the accuracy of such models is generally worse than much simpler time series methods. The paper concludes that the major problem is the way of making the series stationary in its mean (i.e. the method of diAerencing) that has been proposed by Box and Jenkins. If alternative approaches are utilized to remove and extrapolate the trend in the data, ARMA models outperform the models selected through Box‐Jenkins methodology. In addition, it is shown that using ARMA models to seasonally adjusted data slightly improves post-sample accuracies while simplifying the use of ARMA models. It is also confirmed that transformations slightly improve post-sample forecasting accuracy, particularly for long forecasting horizons. Finally, it is demonstrated that AR(1), AR(2) and ARMA(1,1) models can produce more accurate post-sample forecasts than those found through the application of Box‐ Jenkins methodology. #1997 by John Wiley & Sons, Ltd.

306 citations


"CoViD--19: An Automatic, Semiparame..." refers methods in this paper

  • ...Briefly, the series of interest are considered a realization of an ARMA (Autoregressive Moving Average) model (see, e.g. Makridakis and Hibon (1997)) so that, each of them can be expressed as an autoregressive model of infinite order, i.e. AR(∞) whose infinite sequence of AR parameters is α1, α2, .…...

TL;DR: In this article, a parametric approach is proposed in order to introduce a well-defined metric on the class of autoregressive integrated moving-average (ARIMA) invertible models as the Euclidean distance between their auto-gressive expansions.
Abstract: . In a number of practical problems where clustering or choosing from a set of dynamic structures is needed, the introduction of a distance between the data is an early step in the application of multivariate statistical methods. In this paper a parametric approach is proposed in order to introduce a well-defined metric on the class of autoregressive integrated moving-average (ARIMA) invertible models as the Euclidean distance between their autoregressive expansions. Two case studies for clustering economic time series and for assessing the consistency of seasonal adjustment procedures are discussed. Finally, some related proposals are surveyed and some suggestions for further research are made.

269 citations


"CoViD--19: An Automatic, Semiparame..." refers methods in this paper

  • ...In addition to that, a distance measure – based on the theory of stochastic processes and proposed by Piccolo (1990) – has been employed to guarantee statistical coherence among all the Italian regions....

TL;DR: The maximum entropy bootstrap is an algorithm that creates an ensemble for time series inference and its scope is illustrated by means of several guided applications.
Abstract: The maximum entropy bootstrap is an algorithm that creates an ensemble for time series inference. Stationarity is not required and the ensemble satisfies the ergodic theorem and the central limit theorem. The meboot R package implements such algorithm. This document introduces the procedure and illustrates its scope by means of several guided applications.

172 citations


"CoViD--19: An Automatic, Semiparame..." refers methods in this paper

  • ...As it will be detailed in the sequel, in order to reduce the impact of biasing components on the parameter estimations, a recent bootstrap scheme, called Maximum Entropy Bootstrap and proposed by Vinod et al....

