
Nonhomogeneous Boosting for Predictor Selection in Ensemble Postprocessing

Jakob W. Messner, Georg J. Mayr, Achim Zeileis
01 Jan 2017
Vol. 145, Iss. 1, pp. 137-147
Abstract
Nonhomogeneous regression is often used to statistically postprocess ensemble forecasts. Usually only ensemble forecasts of the predictand variable are used as input, but other potentially useful information sources are ignored. Although it is straightforward to add further input variables, overfitting can easily deteriorate the forecast performance for increasing numbers of input variables. This paper proposes a boosting algorithm to estimate the regression coefficients, while automatically selecting the most relevant input variables by restricting the coefficients of less important variables to zero. A case study with ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) shows that this approach effectively selects important input variables to clearly improve minimum and maximum temperature predictions at five central European stations.


Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
Working Paper
Non-homogeneous boosting for predictor selection in
ensemble post-processing
Working Papers in Economics and Statistics, No. 2016-04
Provided in Cooperation with:
Institute of Public Finance, University of Innsbruck
Suggested Citation: Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim (2016): Non-homogeneous boosting for predictor selection in ensemble post-processing, Working Papers in Economics and Statistics, No. 2016-04, University of Innsbruck, Research Platform Empirical and Experimental Economics (eeecon), Innsbruck.
This Version is available at:
http://hdl.handle.net/10419/146121
Terms of use:
Documents in EconStor may be saved and copied for your
personal and scholarly purposes.
You are not to copy documents for public or commercial
purposes, to exhibit the documents publicly, to make them
publicly available on the internet, or to distribute or otherwise
use the documents in public.
If the documents have been made available under an Open
Content Licence (especially Creative Commons Licences), you
may exercise further usage rights as specified in the indicated
licence.

Non-homogeneous boosting for
predictor selection in ensemble
post-processing
Jakob W. Messner, Georg J. Mayr, Achim Zeileis
Working Papers in Economics and Statistics
2016-04
University of Innsbruck
http://eeecon.uibk.ac.at/

University of Innsbruck
Working Papers in Economics and Statistics
The series is jointly edited and published by
- Department of Banking and Finance
- Department of Economics
- Department of Public Finance
- Department of Statistics
Contact address of the editor:
Research platform “Empirical and Experimental Economics”
University of Innsbruck
Universitaetsstrasse 15
A-6020 Innsbruck
Austria
Tel: +43 512 507 7171
Fax: +43 512 507 2970
E-mail: eeecon@uibk.ac.at
The most recent version of all working papers can be downloaded at
http://eeecon.uibk.ac.at/wopec/
For a list of recent papers see the backpages of this paper.

Non-Homogeneous Boosting for Predictor Selection in Ensemble Post-Processing
Jakob W. Messner
Universität Innsbruck
Georg J. Mayr
Universität Innsbruck
Achim Zeileis
Universität Innsbruck
Abstract
Non-homogeneous regression is often used to statistically post-process ensemble forecasts. Usually only ensemble forecasts of the predictand variable are used as input but other potentially useful information sources are ignored. Although it is straightforward to add further input variables, overfitting can easily deteriorate the forecast performance for increasing numbers of input variables. This paper proposes a boosting algorithm to estimate the regression coefficients while automatically selecting the most relevant input variables by restricting the coefficients of less important variables to zero. A case study with ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) shows that this approach effectively selects important input variables to clearly improve minimum and maximum temperature predictions at five central European stations.
Keywords: non-homogeneous regression, variable selection, boosting, statistical ensemble post-processing.
1. Introduction
Over the past decades ensemble forecasts have become an important tool for estimating the uncertainty of numerical weather prediction models. To account for initial condition and model errors, numerical models are integrated several times with slightly different initial conditions and sometimes different parameterization schemes. However, because of insufficient representation of these errors such ensembles of predictions are often biased and do not fully represent the forecast uncertainty. Therefore ensemble forecasts are often statistically post-processed to obtain unbiased and calibrated probabilistic forecasts.
Over the past years a variety of different ensemble post-processing methods have been proposed. Aside from, e.g., ensemble dressing (Roulston and Smith 2003), Bayesian model averaging (Raftery, Gneiting, Balabdaoui, and Polakowski 2005), or (extended) logistic regression (Hamill, Whitaker, and Wei 2004; Wilks 2009; Messner, Zeileis, Mayr, and Wilks 2014b), non-homogeneous regression (Gneiting, Raftery, Westveld, and Goldman 2005) is particularly popular. It assumes a parametric predictive distribution and models the distribution parameters as linear functions of predictor variables such as the ensemble mean and ensemble standard deviation. In recent years it has been used for several different forecast variables (e.g., Thorarinsdottir and Gneiting 2010; Scheuerer 2014; Scheuerer and Hamill 2015) and has been extended to account for covariance structures (Pinson 2012; Schuhen, Thorarinsdottir, and Gneiting 2012; Schefzik, Thorarinsdottir, and Gneiting 2013; Feldmann, Scheuerer, and Thorarinsdottir 2015) or to predict full spatial fields (Scheuerer and Büermann 2014; Feldmann et al. 2015). In most publications only the ensemble forecast of the predictand variable was used as input for the non-homogeneous regression model. However, Scheuerer (2014) and Scheuerer and Hamill (2015) showed that additional input variables can easily be incorporated and can clearly improve the forecast performance. The set of potentially useful input variables is huge and includes, among others, ensemble forecasts for other variables or locations, deterministic forecasts, current observations, and transformations and interactions of all of these. Since using too many input variables can deteriorate the forecast accuracy through overfitting, the input variables should be selected carefully. Doing this by hand is a cumbersome task that requires expert knowledge and has to be repeated for each forecast variable, station, and lead time.
For post-processing of deterministic predictions, stepwise regression has commonly been used to automatically select the most important input variables (e.g., Glahn and Lowry 1972; Wilson and Vallée 2002). However, to our knowledge, automatic variable selection has not yet been used for ensemble post-processing with non-homogeneous regression. In this paper we propose a boosting algorithm to automatically select the most relevant predictor variables in non-homogeneous regression. Boosting was originally proposed for classification problems (Freund and Schapire 1997) but has also been extended and used for regression (Friedman, Hastie, and Tibshirani 2000; Bühlmann and Yu 2003; Bühlmann and Hothorn 2007; Hastie, Tibshirani, and Friedman 2013). Like other optimization algorithms, boosting finds the minimum of the loss function iteratively, but in each step it only updates the coefficient that improves the current fit most. Thus, if it is stopped before convergence, only the most important predictor variables have non-zero coefficients, so that less relevant variables are ignored.
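
To make this update rule concrete, the following R sketch implements one plausible version of such componentwise boosting for a Gaussian predictive distribution with location mu and scale sigma (cf. Section 2.1). All names (ngr_boost, mstop, nu) are illustrative, and details such as intercept handling and the stopping rule are simplified; the paper's actual algorithm and the authors' accompanying software may differ.

ngr_boost <- function(y, X, Z, mstop = 100, nu = 0.1) {
  ## standardize predictors so coefficient updates are comparable
  X <- scale(X); Z <- scale(Z)
  beta  <- rep(0, ncol(X))            # location coefficients
  gamma <- rep(0, ncol(Z))            # log-scale coefficients
  mu   <- rep(mean(y), length(y))     # initialize location at climatology
  lsig <- rep(log(sd(y)), length(y))  # initialize log(sigma) likewise
  for (m in seq_len(mstop)) {
    sigma <- exp(lsig)
    ## gradients of the Gaussian log-likelihood w.r.t. mu and log(sigma)
    g_mu <- (y - mu) / sigma^2
    g_ls <- (y - mu)^2 / sigma^2 - 1
    ## agreement of each standardized predictor with its gradient
    c_mu <- drop(crossprod(X, g_mu)) / length(y)
    c_ls <- drop(crossprod(Z, g_ls)) / length(y)
    ## update only the single coefficient that improves the fit most
    if (max(abs(c_mu)) >= max(abs(c_ls))) {
      j <- which.max(abs(c_mu))
      beta[j] <- beta[j] + nu * c_mu[j]
      mu <- mu + nu * c_mu[j] * X[, j]
    } else {
      j <- which.max(abs(c_ls))
      gamma[j] <- gamma[j] + nu * c_ls[j]
      lsig <- lsig + nu * c_ls[j] * Z[, j]
    }
  }
  list(beta = beta, gamma = gamma)
}

Stopping after a moderate number of iterations mstop (chosen, e.g., by cross-validation) leaves many coefficients exactly zero, which is what performs the variable selection; the step size nu additionally shrinks the non-zero coefficients.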
To investigate this novel boosting approach and to compare its performance against ordinary non-homogeneous regression we use maximum and minimum temperature forecasts at five stations in central Europe. As potential input variables we use ensemble forecasts for different weather variables from the European Centre for Medium-Range Weather Forecasts (ECMWF).
The remainder of this paper is structured as follows: Section 2 describes the non-homogeneous regression approach and introduces the boosting algorithm used to estimate the regression coefficients. Section 3 then describes the data used to compute the results presented in Section 4. Finally, Section 5 provides a summary and conclusion.
2. Methods
This section first describes the non-homogeneous regression approach of Gneiting et al. (2005) and subsequently presents a boosting algorithm to automatically select the most relevant input variables.
2.1. Non-homogeneous regression
Non-homogeneous regression, sometimes also called ensemble model output statistics, was first proposed by Gneiting et al. (2005) for normally distributed predictands such as temperature and sea level pressure. Later publications extended this method to variables described by non-normal distributions, e.g., wind (truncated normal: Thorarinsdottir and Gneiting 2010), …
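
In the Gaussian case the model can be stated compactly as follows; this is a standard formulation consistent with Gneiting et al. (2005), and the paper's own Equations 1 and 2 may use slightly different notation:

\[
y \mid \mathbf{x}, \mathbf{z} \sim \mathcal{N}(\mu, \sigma^2), \qquad
\mu = \mathbf{x}^\top \boldsymbol{\beta}, \qquad
\log \sigma = \mathbf{z}^\top \boldsymbol{\gamma},
\]

where x collects the location predictors (e.g., the ensemble mean), z collects the scale predictors (e.g., the ensemble standard deviation), and the log link keeps the scale positive. The coefficient vectors β and γ are estimated on training data, classically by minimizing the CRPS or the negative log-likelihood.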

Citations
Journal Article

Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R

TL;DR: In R, functions for covariances in clustered or panel models have been somewhat scattered or available only for certain modeling functions, notably the (generalized) linear regression model, but now these are directly applicable to models from many packages, e.g., including MASS, pscl, countreg, betareg, among others.
Journal Article

Neural Networks for Postprocessing Ensemble Weather Forecasts

TL;DR: A flexible alternative based on neural networks that can incorporate nonlinear relationships between arbitrary predictor variables and forecast distribution parameters that are automatically learned in a data-driven way rather than requiring prespecified link functions is proposed.
Journal Article

BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond)

TL;DR: A unified modeling architecture for distributional GAMs is established that exploits distributions, estimation techniques, and model terms and it is shown that within this framework implementing algorithms for complex regression problems, as well as the integration of already existing software, is relatively straightforward.
References
Journal Article

R: A language and environment for statistical computing.

R Core Team
01 Jan 2014
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Journal ArticleDOI

Regression Shrinkage and Selection via the Lasso

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Journal ArticleDOI

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Book

Statistical Methods in the Atmospheric Sciences

TL;DR: "Statistical Methods in the Atmospheric Sciences" (second edition) presents and explains techniques used in atmospheric data summarization, analysis, testing, and forecasting.
Frequently Asked Questions
Q1. What contributions have the authors mentioned in the paper "Non-homogeneous boosting for predictor selection in ensemble post-processing"?

This paper proposes a boosting algorithm to estimate the regression coefficients while automatically selecting the most relevant input variables by restricting the coefficients of less important variables to zero. A case study with ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) shows that this approach effectively selects important input variables to clearly improve minimum and maximum temperature predictions at five central European stations.

In addition to automatically selecting the most important input variables, boosting also regularizes the non-zero coefficients, i.e., the coefficients are shrunk compared to their maximum-likelihood values.

For shorter training data lengths the number of selected input variables decreases but is still proportionally high compared to the training data length. 

With all coefficients being zero in the beginning, the daily mean maximum temperature ensemble mean (tmax2m dmean mean) is the first variable to get a non-zero coefficient, which indicates that it explains the observations best.

After approximately 20 iterations the ensemble standard deviation of daily maximum evaporation (ske dmax sd) enters with a negative coefficient for the log-scale.

While the RMSE shows the deterministic performance, the authors employ the continuous ranked probability score (CRPS; Hersbach 2000) to measure the probabilistic quality of the forecasts. 
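
For a Gaussian predictive distribution the CRPS has a well-known closed form (a standard result, e.g., given in Gneiting et al. 2005, not quoted from this paper), which the following R snippet sketches:

## closed-form CRPS of a Gaussian forecast N(mu, sigma^2) for observation y;
## lower average values indicate better probabilistic forecasts
crps_normal <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}
## e.g., mean(crps_normal(obs, mu_hat, sigma_hat)) scores a whole test set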

For the log-scale (Figure 3 bottom), ensemble standard deviations of various variables are selected but also ensemble mean forecasts (e.g., of 1000 hPa divergence d1000 dmax mean) seem to contain forecast uncertainty information. 

The direct predictor, the daily minimum minimum temperature ensemble mean, is clearly the most relevant variable over all lead times, unlike for maximum temperatures.

Figure 5 shows the root mean squared error (RMSE) of the location forecasts (µ in Equation 2) of NGB, NGR, and the subset NGR, which is an NGR with the non-zero coefficients from boosting as input.

Many variables, such as temperatures, have strongly pronounced seasonal patterns that probably affect the statistical properties of forecasts and observations.