WIND ENERGY
Wind Energ. 2017; 20:657–675
Published online 19 September 2016 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/we.2029
RESEARCH ARTICLE
LASSO vector autoregression structures for very short-term wind power forecasting
Laura Cavalcante¹, Ricardo J. Bessa¹, Marisa Reis¹ and Jethro Browell²
¹ INESC Technology and Science (INESC TEC), Campus da FEUP, Rua Dr. Roberto Frias, Porto 4200-465, Portugal
² Royal College Building, University of Strathclyde, 204 George Street, Glasgow, Scotland
ABSTRACT
The deployment of smart grids and renewable energy dispatch centers motivates the development of forecasting techniques
that take advantage of near real-time measurements collected from geographically distributed sensors. This paper describes
a forecasting methodology that explores a set of different sparse structures for the vector autoregression (VAR) model using
the least absolute shrinkage and selection operator (LASSO) framework. The alternating direction method of multipliers is
applied to fit the different LASSO-VAR variants and create a scalable forecasting method supported by parallel computing
and fast convergence, which can be used by system operators and renewable power plant operators. A test case with 66
wind power plants is used to show the improvement in forecasting skill from exploring distributed sparse structures. The
proposed solution outperformed the conventional autoregressive and vector autoregressive models, as well as a sparse VAR
model from the state of the art. Copyright © 2016 John Wiley & Sons, Ltd.
KEYWORDS
wind power; vector autoregression; scalability; sparse; renewable energy; parallel computing
Correspondence
Ricardo J. Bessa, INESC Technology and Science (INESC TEC), Campus da FEUP, Rua Dr. Roberto Frias, Porto 4200-465, Portugal.
E-mail: ricardo.j.bessa@inesctec.pt
Received 8 March 2016; Revised 11 July 2016; Accepted 23 August 2016
1. INTRODUCTION
Operating a power system with high integration levels of wind power is challenging and demands continuous improvement of wind power forecast tools [1, 2]. Furthermore, the participation of wind power in the electricity market also requires accurate forecasts in order to mitigate the financial risks associated with energy imbalances [3, 4].
The recent advent of smart grid technologies will increase the monitoring capability of the electric power system [5]. Furthermore, the investment in renewable energy dispatch centers enables real-time acquisition of time series measurements from wind power plants (WPPs) [6]. The availability of the most recent WPP measurements improves the forecast skill during the first lead times, commonly called the very short-term horizon [7].
For this time horizon, it is generally established that statistical models are more accurate than physical models, while for longer time horizons the most relevant inputs come from numerical weather prediction (NWP) models [7]. Even recent advances in physical models, such as the high-resolution rapid refresh model developed by the U.S. National Oceanic and Atmospheric Administration, are outperformed by statistical models that use recent WPP observations [8].
In the state of the art, a broad family of statistical models is available for the very short-term horizon. Two examples are the conditional parametric autoregression (AR) and regime-switching models that incorporate online observed local variables (i.e., wind speed and direction) to reduce the wind power forecast error for 10 min-ahead forecasting [9]. Another example is the use of automatic self-tuning Kalman filters that incorporate NWP information [10].
In this context, information from WPP time series distributed in space can be used to improve the forecast skill of each WPP. The first results were presented by Gneiting et al. for 2 h-ahead wind speed forecasting [11]. The authors showed that a regime-switching space-time diurnal model that takes advantage of temporal and spatial correlation from geographically dispersed meteorological stations as off-site predictors can have a root mean square error (RMSE) 28.6% lower than the persistence forecasts. Expert knowledge and empirical results were used to select the predictors. In Hering and Genton [12], two additional statistical models are proposed: the trigonometric direction diurnal model and the bivariate skew-t model. These results were generalized by Tastu et al. by studying the spatiotemporal propagation of wind power forecast errors [13]. The authors showed evidence of cross-correlation functions with significant dependency in lags of a few hours.
These works motivated recent research that explores information from neighboring WPPs. Figure 1 groups the state-of-the-art methods applied to wind power by category.
Figure 1. Groups of models from the state of the art.
The first group consists of machine learning methods, such as artificial neural networks. To the authors’ knowledge, there is little research concerning the application of machine learning models to this problem. Kou et al. [14] describe an online sparse Bayesian model based on a warped Gaussian process to generate probabilistic wind power forecasts. A sparsification strategy is used to reduce the computational cost, and the model includes wind speed observations from nearby WPPs and NWP data. Also in this category, but applied to solar power forecasting, Vaz et al. [15] use multilayer perceptron neural networks to combine measurements of neighboring PV systems, and Bessa et al. [16] use component-wise gradient boosting to explore PV observations from a smart grid.
The following limitations were identified for this first group: (i) a separate model is fitted to each location, which increases the computational time; (ii) the scalability of the solution decreases when the number of predictors increases; and (iii) with the exception of the sparse Bayesian model, the others do not provide a sparse vector of coefficients.
The second group consists of random fields. To the authors’ knowledge, the only work that explores this theory is from Wytock and Kolter [17]. The model is based on a sparse Gaussian conditional random field and uses a new second-order active set method to solve the problem. The main limitation of the method is that it requires a copula transformation in order to have a Gaussian marginal distribution, which might not solve the boundary problem of variables with limited support (e.g., wind power between zero and rated power). Moreover, the computational time for a solution with high accuracy is around 160 min for a case study with seven WPPs [18].
The third group is related to classical time series theory. Tastu et al. extended their previous work [13] to the multivariate framework [19], i.e., from an AR to a vector AR (VAR) model. The VAR coefficients are allowed to vary with external variables, average wind direction in this case. The main limitation is a non-sparse matrix of coefficients since feature selection is not performed. A similar methodology was applied in Tastu et al. [20] to generate probabilistic forecasts based on geographically distributed sensors. Also in this case, the predictors are manually selected based on cross-correlation analysis.
He et al. present a two-stage approach [21]: (i) offline spatial–temporal analysis carried out on historical data with multiple finite-state Markov chains and (ii) online forecasting by feeding a Markov chain with real-time measurements of the wind turbines. Similar to previous works, different sparse structures of the spatial–temporal relations are not fully explored. The same authors in He et al. [22] propose a different approach based on VAR model fitting with sparsity-constrained maximum likelihood. The main limitation of this approach is that the sparse coefficients are not automatically defined; instead, expert knowledge and partial correlation analysis are employed.
Aiming to generate forecasts on a large spatial scale, e.g., hundreds of locations, Dowell and Pinson proposed the sparse-VAR (sVAR) approach for 5 min-ahead forecasts [23]. The sVAR method generates probabilistic forecasts based on the logit-normal distribution [24], whose mean is estimated with a VAR model and variance by a modified exponential smoothing. A state-of-the-art technique from Davis et al. [25] is employed to fit a VAR model with a sparse coefficient matrix. The work proposed in the present paper is closely related to the sVAR and provides the following original contributions:
1. Explores a set of different sparse structures for the VAR framework using the least absolute shrinkage and selection operator (LASSO) framework [26];
2. Applies the alternating direction method of multipliers (ADMM) [27] to fit the different LASSO-VAR variants;
3. Proposes a scalable forecasting method based on parallel computing, a fast-converging optimization algorithm and matrix calculations.
The proposed method will be compared with the sVAR approach in terms of advantages and limitations, applied to a case study with 66 WPPs located in the same control area. It should be stressed that the proposed approach is compatible with previous works from the literature. For instance, it can be used for spatial–temporal correction of forecast errors [13], extended to conditional VAR [19], or used to generate probabilistic forecasts based on the logit-normal distribution [23].
The paper is organized as follows. Section 2 presents the different sparse structures for the VAR model. Section 3
describes the application of the ADMM method to fit the VAR model in its different LASSO variants. The test case results
are presented in Section 4. Section 5 presents the conclusion and future work.
2. SPARSE STRUCTURES FOR THE VAR MODEL
The VAR model allows a simultaneous forecast of the wind generation in several neighboring sites by combining time series information. However, forecasting with VAR models may be intractable for high-dimensional data since the non-sparse coefficient matrix grows quadratically with the number of series included in the model. In order to overcome this limitation, Hsu et al. [28] proposed combining the LASSO and VAR frameworks, an approach that is further explored in this paper for very short-term forecasting of wind power.
2.1. Formulation of the forecasting problem
The VAR model allows us to model the joint dynamic behavior of a collection of WPPs by capturing the linear interde-
pendencies between its time series. In this multivariate (or spatiotemporal) framework, the future trajectory of output from
each WPP in the model is based on its own past values (lagged values) and the past values of the other WPPs included in
the model.
Suppose $y_{i,t}$ is the time series containing the average power measured at WPP $i$ and time interval $t$. Using an autoregressive (AR) process of order $p$ (AR[$p$]), it is possible to describe a future trajectory based on its past observations as
$$ y_{i,t} = \nu + \sum_{l=1}^{p} \beta^{(l)} y_{i,t-l} + \epsilon_t \tag{1} $$
where $\beta^{(1)}, \ldots, \beta^{(p)}$ are the model coefficients, $\nu$ is a constant (or intercept) term, $p$ is the order of the AR model, and $\epsilon_t$ is a contemporaneous white noise (or residuals) with zero mean and constant variance $\sigma^2$.
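As a small illustration of (1), the one-step-ahead AR forecast is the intercept plus a weighted sum of the p most recent observations. The sketch below assumes the coefficients have already been estimated; the function name `ar_forecast` and the numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np

def ar_forecast(history, intercept, beta):
    """One-step-ahead AR(p) forecast following equation (1), without the noise term.

    history   : past observations of one WPP, oldest first
    intercept : constant term of the model
    beta      : coefficients for lags 1..p; beta[0] multiplies the latest value
    """
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    # Reverse the last p observations so that lags[l - 1] holds the lag-l value.
    lags = np.asarray(history, dtype=float)[-p:][::-1]
    return intercept + float(beta @ lags)

# Hypothetical normalised power values and coefficients.
forecast = ar_forecast([0.10, 0.20, 0.30, 0.40], intercept=0.05, beta=[0.5, 0.25])
```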
Let $\{Y_t\} = \{(y_{1,t}, y_{2,t}, \ldots, y_{k,t})'\}$ denote a $k$-dimensional vector time series. Modeling it as a vector AR process of order $p$ (VAR$_k$[$p$]), we obtain an expression relating the future observations at each of the $k$ WPPs to the past observations of all WPPs in the model, given by
$$ Y_t = \nu + \sum_{l=1}^{p} B^{(l)} Y_{t-l} + e_t \tag{2} $$
in which $\nu$ is a vector of constant terms, each $B^{(l)} \in \mathbb{R}^{k \times k}$ represents a coefficient matrix related to the lag $l$, and $e_t \sim (0, \Sigma_e)$ denotes a white noise disturbance term.
In order to obtain a compact matrix notation, let $Y = (Y_1, Y_2, \ldots, Y_T)$ define the $k \times T$ response matrix, $B = (B^{(1)}, B^{(2)}, \ldots, B^{(p)})$ the $k \times kp$ matrix of coefficients, $Z = (Z_1, Z_2, \ldots, Z_T)$ the $kp \times T$ matrix of explanatory variables (or predictors), in which $Z_t = (Y'_{t-1}, Y'_{t-2}, \ldots, Y'_{t-p})'$, and $E = (e_1, e_2, \ldots, e_T)$ the $k \times T$ error matrix. To simplify the notation, consider $m = kp$. Then it is possible to express (2) as
$$ Y = \nu 1' + BZ + E \tag{3} $$
with $1$ denoting a $T \times 1$ vector of ones.
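The compact notation in (3) can be made concrete by building the response and predictor matrices from raw measurements. The following is a minimal sketch, assuming the measurements are stored as a k x N array with one row per WPP and columns in time order; the function name `build_var_matrices` is illustrative only.

```python
import numpy as np

def build_var_matrices(series, p):
    """Return the k x T response matrix Y and the kp x T predictor matrix Z.

    series : k x N array, one row per WPP, columns ordered in time
    p      : VAR order; T = N - p usable columns remain
    """
    series = np.asarray(series, dtype=float)
    k, n_obs = series.shape
    # Targets start at the first time step that has p past values available.
    Y = series[:, p:]
    # Row block number l of Z holds the observations shifted back by l steps.
    Z = np.vstack([series[:, p - l : n_obs - l] for l in range(1, p + 1)])
    return Y, Z
```

Each column of Z stacks the lag-1 values of all k plants first, then the lag-2 values, and so on, which matches the column ordering of the coefficient matrix B.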
The matrix of unknown coefficients needs to be correctly estimated to obtain the model that ‘best’ characterizes the
data. Commonly, this is achieved using the least squares statistical methodology by choosing the coefficients that minimize
the sum of squared errors. The predictor that will be deduced gives, for a given sample, the in-sample forecasts of the
variable of interest.
Usually this methodology is applied with centered variables instead of the original ones. This allows simplifications in the calculation, including handling the model without an intercept term. The intercept can be easily estimated after the model has been fitted. As a result, and assuming centered variables Y and Z, the intercept term will no longer appear in the least squares objective function.
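The centering step can be sketched as follows: fit the coefficient matrix on centered data by ordinary least squares, then recover the intercept from the row means. This is an unpenalised illustration under the stated assumptions, not the authors' implementation; `fit_var_ols` is a hypothetical name.

```python
import numpy as np

def fit_var_ols(Y, Z):
    """Least squares fit of Y = intercept + B Z using centered variables.

    Y : k x T response matrix, Z : m x T predictor matrix.
    Returns B (k x m) and the intercept (k x 1).
    """
    y_mean = Y.mean(axis=1, keepdims=True)
    z_mean = Z.mean(axis=1, keepdims=True)
    Yc, Zc = Y - y_mean, Z - z_mean
    # lstsq solves Zc' X = Yc' in the least squares sense, so B is X transposed.
    B = np.linalg.lstsq(Zc.T, Yc.T, rcond=None)[0].T
    # The intercept follows from the means once B is known.
    intercept = y_mean - B @ z_mean
    return B, intercept
```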
The multi-period forecasts can be generated with two alternative strategies: the iterative approach or the direct approach [29]. In this paper, a direct approach, in which a specific model is created for each lead time, is adopted to generate 6 h-ahead wind power forecasts.
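One common way to set up the direct approach is to pair each block of p consecutive observations with the value observed h steps after the most recent one, and to fit a separate model for every h. The sketch below follows that reading; the function name `build_direct_matrices` and the k x N data layout are assumptions, not details given in the paper.

```python
import numpy as np

def build_direct_matrices(series, p, h):
    """Response and predictor matrices for a direct h-step-ahead model.

    series : k x N array, one row per WPP, columns ordered in time
    p      : number of lags used as predictors
    h      : lead time in steps (h = 1 gives the usual one-step setup)
    """
    series = np.asarray(series, dtype=float)
    k, n_obs = series.shape
    T = n_obs - p - h + 1
    # Target columns lie h steps after the most recent predictor column.
    Y = series[:, p + h - 1 :]
    Z = np.vstack([series[:, p - l : p - l + T] for l in range(1, p + 1)])
    return Y, Z
```

Calling the function once per lead time yields one (Y, Z) pair, and therefore one fitted model, for each horizon.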
2.2. Sparse structures with LASSO
This section presents a set of different sparse structures for the LASSO-VAR model, inspired by Nicholson et al. [30], to capture the dynamics of the underlying system.
The LASSO framework is powerful and convenient to use when handling high-dimensional data. The loss function is a regularized version of least squares that introduces an $L_1$ penalty on the coefficients. The penalty function shrinks some of the coefficients to zero, performing variable selection and producing a sparse solution. Instead of assuming that all the predictors are contributing to the model, this framework extracts the most important predictors, i.e., those with the strongest contribution to the prediction of the target variable.
Let $\|\cdot\|_r$ represent both vector and matrix $L_r$ norms. The standard LASSO-VAR (sLV) loss function is expressed as [30]
$$ \frac{1}{2} \|Y - BZ\|_2^2 + \lambda \|B\|_1 \tag{4} $$
where $\lambda > 0$ is a scalar regularization (or penalty) parameter controlling the amount of shrinkage.
The $L_1$ penalty works as a sparsity-inducing term over the individual entries of the coefficient matrix $B$, zeroing some of them in an element-wise manner.
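To see how the penalty zeroes individual entries, the sketch below evaluates the sLV loss and applies element-wise soft-thresholding, the standard proximal operator of the $L_1$ norm. It assumes the matrix norms are taken entry-wise (squared Frobenius norm for the residual, sum of absolute values for the penalty); the function names are illustrative.

```python
import numpy as np

def slv_loss(Y, Z, B, lam):
    """Standard LASSO-VAR loss: half the squared residual norm plus the L1 penalty."""
    resid = Y - B @ Z
    return 0.5 * np.sum(resid ** 2) + lam * np.sum(np.abs(B))

def soft_threshold(X, tau):
    """Shrink every entry towards zero by tau; entries within [-tau, tau] become zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```

Entries whose magnitude does not exceed the threshold are set exactly to zero, which is what produces an unstructured, element-wise sparsity pattern.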
Since the same predictors are available for each target variable (each WPP), the VAR coefficients can be estimated with ordinary least squares applied independently for the regression of each individual target variable [31]. The problem is then re-formulated for each row of the matrix Y, with a different penalization parameter for each, resulting in a separable loss function for each variable.
The main advantage of this approach, here called row LASSO-VAR (rLV), is the possibility of distributed computing, since each equation can be solved in parallel. Its loss function can be expressed as
$$ \frac{1}{2} \|Y_i - B_i Z\|_2^2 + \lambda \|B_i\|_1 \tag{5} $$
where $Y_i$ and $B_i$, $i = 1, \ldots, k$, correspond to the $i$th rows of the $Y$ and $B$ matrices, respectively.
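Because the loss in (5) is separable, each row can be fitted on its own. The sketch below uses a plain proximal gradient (ISTA) loop as a simple stand-in for the ADMM solver applied in the paper; the function names, the iteration count and the per-row penalty list `lams` are assumptions made for illustration.

```python
import numpy as np

def lasso_row_ista(y_i, Z, lam, n_iter=500):
    """Minimise 0.5 * ||y_i - b Z||^2 + lam * ||b||_1 over one coefficient row b."""
    b = np.zeros(Z.shape[0])
    # Step size is the inverse of the gradient's Lipschitz constant (largest eigenvalue of Z Z').
    step = 1.0 / np.linalg.norm(Z @ Z.T, 2)
    for _ in range(n_iter):
        grad = (b @ Z - y_i) @ Z.T
        u = b - step * grad
        # Soft-thresholding is the proximal step for the L1 penalty.
        b = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)
    return b

def fit_rlv(Y, Z, lams):
    """Fit every row independently; the loop body could be dispatched to parallel workers."""
    return np.vstack([lasso_row_ista(Y[i], Z, lams[i]) for i in range(Y.shape[0])])
```

Since the rows share no state, the list comprehension in `fit_rlv` can be replaced by any parallel map without changing the result.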
An alternative to dealing with the model’s coefficients individually, which results in an unstructured sparsity pattern, is to make some simple modifications to the sLV penalty in order to capture different sparsity patterns according to the inherent structure of the VAR [30]. These modifications produce more interpretable models that offer great flexibility in the detection of the true underlying dynamics of the system, which is especially fruitful in the high-dimensional context.
To take into account characteristics such as lag selection, within-group sparsity and delineation between a component’s own lags and those of another component, and to evaluate which variables add forecast improvement, the following LASSO-VAR sparse structures are explored: lag-group LASSO-VAR (lLV), lag-sparse-group LASSO-VAR (lsLV), own/other-group LASSO-VAR (ooLV) and causality-group LASSO-VAR (cLV). These LASSO schemes search for sparsity in distinct group structures, trying to find the ideal sparsity pattern.
The lLV model considers the coefficients grouped by their time lags and looks for time lags that add forecast improvement. Its objective function is
$$ \frac{1}{2} \|Y - BZ\|_2^2 + \lambda \sum_{l=1}^{p} \|B^{(l)}\|_2 \tag{6} $$
where each $B^{(l)}$ is a sub-matrix containing the lag $l$ coefficients.
This structure can be relevant if the interest is to perform lag selection. However, although it is advantageous when all time series tend to exhibit similar dynamics, it might be too restrictive for certain applications, since all the coefficients of some lags are excluded from the prediction, and sometimes inefficient, since an entire lag is included even if only a few of its coefficients are significant.
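The all-or-nothing behaviour of the lag-group penalty can be illustrated with block shrinkage, the proximal operator of an unsquared group norm. The sketch below assumes B stores the lag blocks side by side (lag 1 in the first k columns) and that the matrix $L_2$ norm is the Frobenius norm; the function names are hypothetical.

```python
import numpy as np

def llv_penalty(B, k, p, lam):
    """Sum of the Frobenius norms of the p lag blocks of B, scaled by lam."""
    return lam * sum(np.linalg.norm(B[:, l * k : (l + 1) * k]) for l in range(p))

def group_soft_threshold(block, tau):
    """Shrink a whole block; it is zeroed entirely when its norm does not exceed tau."""
    norm = np.linalg.norm(block)
    if norm <= tau:
        return np.zeros_like(block)
    return (1.0 - tau / norm) * block
```

A lag block with small overall norm is removed as a unit, while a retained block keeps all of its entries non-zero, including the insignificant ones.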
In an attempt to overcome some of these limitations, the lsLV model adds within-group (or lag) sparsity to the lLV through the loss function
$$ \frac{1}{2k} \|Y - BZ\|_2^2 + \lambda (1 - \alpha) \sum_{l=1}^{p} \|B^{(l)}\|_2 + \alpha \lambda \|B\|_1 \tag{7} $$
where $0 \le \alpha \le 1$ is a parameter regulating the trade-off between the group and within-group importance.
As can be easily seen, the lLV and the sLV are obtained by considering $\alpha = 0$ and $\alpha = 1$, respectively. Here, as proposed in Nicholson et al. [30], the within-group sparsity is set based on the number of time series/variables, as $\alpha = 1/(k + 1)$. In this sense, as the number of variables increases, the group-wise sparsity becomes greater and the within-group sparsity smaller. This variant allows exploring the significance of each lag and, at the same time, assessing the importance of each coefficient within each lag.
Figure 2. Example of sparsity patterns produced by LASSO-VAR structures.
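The role of the trade-off parameter in the lsLV penalty can be checked numerically. The sketch below evaluates only the penalty part of the loss, with the default weight set to 1/(k + 1) as stated in the text; the function name and the column layout of B (lag blocks side by side) are assumptions.

```python
import numpy as np

def lslv_penalty(B, k, p, lam, alpha=None):
    """Sparse-group penalty: (1 - alpha) weights the lag groups, alpha the single entries."""
    if alpha is None:
        alpha = 1.0 / (k + 1)  # default suggested in the text
    group_term = sum(np.linalg.norm(B[:, l * k : (l + 1) * k]) for l in range(p))
    entry_term = np.sum(np.abs(B))
    return lam * (1.0 - alpha) * group_term + alpha * lam * entry_term
```

Setting `alpha=0` leaves only the lag-group term and `alpha=1` leaves only the element-wise term, matching the two limiting cases mentioned above.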
The ooLV model addresses the possibility that, in many settings, the prediction of a variable is more influenced by its own past observations than by past observations of other variables. To address this question in a lag context, the coefficients of each $B^{(l)}$ are grouped by the diagonal entries, representing a variable’s own lags, and by the off-diagonal entries, representing cross dependencies with other variables, using the loss function
$$ \frac{1}{2} \|Y - BZ\|_2^2 + \lambda \sqrt{k} \sum_{l=1}^{p} \left\| \operatorname{diag}\left(B^{(l)}\right) \right\|_2 + \lambda \sqrt{k(k-1)} \sum_{l=1}^{p} \left\| B^{(l-)} \right\|_2 \tag{8} $$
where $B^{(l-)} = \{ [B^{(l)}]_{ij} : i \neq j \}$. Since the groups differ in cardinality, it is necessary to weight the penalty accordingly to avoid favoring the larger groups of off-diagonal entries.
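A short numerical sketch of the own/other grouping follows: for every lag block, the diagonal and the off-diagonal entries form two groups, weighted by the square root of their cardinalities (k and k(k - 1)). The function name and the side-by-side layout of the lag blocks in B are assumptions.

```python
import numpy as np

def oolv_penalty(B, k, p, lam):
    """Own/other-group penalty with cardinality-based weights for the two group types."""
    off_diagonal = ~np.eye(k, dtype=bool)
    own = 0.0
    other = 0.0
    for l in range(p):
        block = B[:, l * k : (l + 1) * k]
        own += np.linalg.norm(np.diag(block))       # a plant's own lag-l coefficients
        other += np.linalg.norm(block[off_diagonal])  # cross-plant lag-l coefficients
    return lam * (np.sqrt(k) * own + np.sqrt(k * (k - 1)) * other)
```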
If not all time series share the same dynamics, one may be interested in finding which of them do. Recent studies have been addressing this question by considering causal structures in multivariate series, also called Granger causality. The idea is that a time series $y_i$ is Granger-caused by another time series $y_j$ if knowing the past values of $y_j$ helps to improve the prediction of $y_i$ [32].
With the aim of inferring causal relations from the data, the cLV model [33] groups the coefficients by the corresponding variables (that they affect). Its loss function is
$$ \frac{1}{2} \|Y - BZ\|_2^2 + \lambda \sum_{i \neq j} \left\| \left( B^{(1)}_{ij}, B^{(2)}_{ij}, \ldots, B^{(p)}_{ij} \right) \right\|_2 \tag{9} $$
The $L_2$ norm of the $p$-tuple of $B^{(l)}_{ij}$ is a composite penalty that will force all $p$ matrices $B^{(l)}$ to share the same sparsity pattern, as can be observed in Figure 2. This structure can be useful to detect which locations can improve the forecasts at some other location.
For a better understanding of the presented LASSO-VAR variants, Figure 2 illustrates an example of the corresponding generated sparsity patterns.
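The causality grouping can be sketched by collecting, for every ordered pair of plants (i, j), the p coefficients that link past values of j to the forecast of i. The code below computes the penalty over pairs with i different from j and a boolean map of which plants influence which; the function names and the side-by-side layout of the lag blocks in B are assumptions.

```python
import numpy as np

def pair_norms(B, k, p):
    """Entry (i, j) is the L2 norm of the coefficients from plant j to plant i over all lags."""
    stacked = np.asarray(B, dtype=float).reshape(k, p, k)  # stacked[i, l, j]
    return np.sqrt((stacked ** 2).sum(axis=1))

def clv_penalty(B, k, p, lam):
    """Causality-group penalty: sum of the pair norms over all pairs with i != j."""
    off_diagonal = ~np.eye(k, dtype=bool)
    return lam * pair_norms(B, k, p)[off_diagonal].sum()

def influence_pattern(B, k, p, tol=0.0):
    """True at (i, j) when any lag of plant j has a non-zero coefficient for plant i."""
    return pair_norms(B, k, p) > tol
```

Because a pair is kept or dropped as a whole, the resulting pattern is identical for every lag block, which is the shared sparsity pattern described above.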