Improving Precipitation Estimation Using Convolutional Neural Network
Summary
1. Introduction
- The modeling of the atmosphere is typically based on a particular set of partial differential equations, which is derived by applying the conservation laws and thermodynamic laws on the continuous “control volume” of the atmosphere (Bjerknes, 1906; Holton & Hakim, 2012).
- Precipitation estimation involves explicit and implicit representations of the cloud physics, such as the water vapor convection, phase change, and particle coalescence.
- In numerical models, such unresolved processes are inferred from the resolved dynamics on the computational grid (Kalnay, 2003).
- Accordingly, parameterization schemes and statistical downscaling (SD) differ in their model inputs/outputs, resolution, usage, and complexity.
- The model is described and tested thereafter.
2.1. Statistical Downscaling
- Following the survey in Maraun et al. (2010), SD approaches are classified into perfect prognosis (PP), model output statistics (MOS), and weather generators.
- The simplest form is linear regression, which estimates precipitation using an optimized linear combination of the local circulation features (Hannachi et al., 2007; Jeong et al., 2012; Li & Smith, 2009; Murphy, 2000).
- The predictors usually consist of the raw variables or the leading principal components (PCs) of the moisture, pressure, and wind field (Wilby & Wigley, 2000).
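The PC-based regression described above can be sketched in a few lines: project the circulation field onto its leading principal components, then regress local precipitation on the PC scores. The sketch below uses synthetic data; the dimensions, the PC count of 16, and all variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_grid = 200, 64                  # days x flattened circulation grid
X = rng.standard_normal((n_days, n_grid))
X[:, 0] *= 5.0                            # give one direction dominant variance
y = X[:, 0] * 2.0 + rng.standard_normal(n_days) * 0.1  # synthetic local target

# Leading PCs via SVD of the centered field
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 16
pcs = Xc @ Vt[:k].T                       # (n_days, k) PC scores

# Ordinary least squares on the PCs (with intercept column)
A = np.column_stack([pcs, np.ones(n_days)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef                          # in-sample precipitation estimate
```

In a real PP setting, `X` would be reanalysis circulation fields and `y` gauge or gridded precipitation records.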
- A typical application of MOS is to correct the biases of the numerical model's raw precipitation estimates (Jakob Themeßl et al., 2011).
2.2. DNNs and Their Applications for Physical Processes
- DNNs belong to the domain of ML, which covers a general scope of computer-aided statistical modeling.
- On the other hand, DNNs, together with a broader family of representation learning approaches (Bengio et al., 2013), offer an "end-to-end" modeling workflow: the feature extraction process is integrated into the modeling process, which allows the model to learn customized features rather than being restricted to pre-engineered features.
- For modeling natural physical processes, where principled solutions have been established through analytic descriptions of prior knowledge of the underlying processes (de Bezenac et al., 2017), dynamical simulations are preferred to ML-based approaches.
- While many recent research works have started to explore the applicability of DNN for parameterizing the unresolved processes in fluid and geofluid modeling (Ling et al., 2016; Rasp et al., 2018), it remains a question how DNN can translate the big data of observations and numerical simulations into precipitation estimation improvements (Pan et al., 2017).
3. Problem Formulation
- To formulate the precipitation estimation problem, the authors first clarify the context by introducing a real-world precipitation scenario.
- The well-established models offer accessible concepts for describing the circulation-precipitation connection.
- Dynamically, the precipitation process in Figure 1 is associated with the extratropical cyclone.
- P̂ = E[P | X, C]. (1) In equation (1), E denotes the expected value, P denotes the precipitation estimates, X denotes the predictors, and C denotes the local climate condition.
- The authors point out two common deficiencies when applying this conventional approach for weather-scale precipitation estimation.
4.1. Convolutional Neural Network
- CNNs share many similarities with regular neural networks.
- For a regular neural network, a statistical connection between the inputs and the outputs is constructed through hierarchical connected layers of neurons.
- Each convolution operation is performed by computing the element-wise dot product between the kernel tensor and different patches of the input, which is represented as a c × x × y array.
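The convolution operation described above can be written out directly. This is a minimal numpy sketch assuming a "valid" cross-correlation with no stride or padding; the shapes are illustrative (the FAQ mentions 20 × c × 4 × 4 kernels in the default network), and this is not the authors' Mathematica implementation.

```python
import numpy as np

def conv2d_valid(inp, kernels):
    """Cross-correlate a (c, x, y) input with (k, c, kx, ky) kernels.

    Each output value is the element-wise dot product between one kernel
    and one c x kx x ky patch of the input.
    """
    c, x, y = inp.shape
    k, c2, kx, ky = kernels.shape
    assert c == c2, "kernel channels must match input channels"
    out = np.zeros((k, x - kx + 1, y - ky + 1))
    for f in range(k):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = inp[:, i:i + kx, j:j + ky]
                out[f, i, j] = np.sum(patch * kernels[f])  # dot product
    return out

# Toy example: all-ones input and kernels, so each output value is 2*4*4 = 32
out = conv2d_valid(np.ones((2, 5, 5)), np.ones((3, 2, 4, 4)))
```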
- Following previous works (Wilby & Wigley, 2000), the predictors consist of the circulation constraint and moisture constraint.
- To estimate the total daily precipitation, the authors usually have several snapshots of its surrounding dynamical field at different hours through the day.
4.2.1. Regularization
- DNNs usually have much more complicated structures and more parameters than conventional ML algorithms, which make it possible for models to perform exceptionally well on the training data but predict the test data poorly.
- Regularization refers to the strategies to avoid overfitting and make the model generalize better to unseen data.
- The idea of dropout is to assign a probability of existence to the neurons and their associated connections.
- This prevents neurons from coadapting and has shown significant improvements in reducing overfitting (Srivastava et al., 2014).
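The dropout idea above (a probability of existence for each neuron) can be sketched as follows. The "inverted" scaling convention shown here is one common implementation choice, not necessarily the one used by the authors.

```python
import numpy as np

def dropout(activations, p_drop, rng, train=True):
    """Inverted dropout: each neuron is kept with probability 1 - p_drop,
    and surviving activations are rescaled so the expected value is
    unchanged; at test time the layer is the identity."""
    if not train or p_drop == 0.0:
        return activations
    keep = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep   # per-neuron existence
    return activations * mask / keep
```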
- Batch normalization addresses the problem of internal covariate shift in training DNNs (Ioffe & Szegedy, 2015).
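Batch normalization for a dense layer can be sketched as below. Running statistics for inference and the gradient updates of the learnable parameters are omitted for brevity; this is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch of shape (n, features) to zero mean and unit
    variance per feature, then rescale with learnable gamma and shift
    with learnable beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```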
4.2.2. Loss Function and Skill Metrics
- The root mean square error (RMSE) between the precipitation simulations and observations is used as the loss function here: RMSE = sqrt((1/n) Σ (P_obser − P_simu)²). (2) Here P_obser denotes the observed daily precipitation records, and P_simu denotes the simulated daily precipitation records.
- The Pearson correlation coefficient (r) between simulated and observed daily precipitation is also used as a supplementary skill metric for measuring model performance: r = cov(P_obser, P_simu) / (σ_P_obser · σ_P_simu). (3) Here cov denotes covariance, and σ denotes standard deviation.
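Equations (2) and (3) translate directly into code; a minimal numpy version:

```python
import numpy as np

def rmse(p_obs, p_sim):
    # Equation (2): root mean square error over n daily records
    return np.sqrt(np.mean((p_obs - p_sim) ** 2))

def pearson_r(p_obs, p_sim):
    # Equation (3): covariance normalized by the two standard deviations
    cov = np.mean((p_obs - p_obs.mean()) * (p_sim - p_sim.mean()))
    return cov / (p_obs.std() * p_sim.std())
```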
- The method requires estimating the partial derivative of the loss function with respect to each parameter in the network, including those from both the convolutional and dense layers.
- The parameters are then adjusted along the gradient descent direction by a predefined stride, named the "learning rate."
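The update rule above can be sketched for a single linear layer trained with a squared-error loss; the synthetic data, the learning rate, and the iteration count are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                 # noiseless synthetic targets

w = np.zeros(3)
lr = 0.1                       # the "learning rate" (predefined stride)
for _ in range(200):
    grad = 2.0 / len(X) * X.T @ (X @ w - y)  # dLoss/dw for the MSE loss
    w -= lr * grad                            # step along the negative gradient
```

In a real network, the same step is applied to every parameter, with the gradients obtained by backpropagation through the convolutional and dense layers.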
4.3. Model Implementation
- The authors implement the network using the Wolfram Mathematica V11.3 Deep Learning Platform (Wolfram, 2018).
- The authors use the Nvidia Quadro P5000 GPU (Graphics Processing Unit) to accelerate model training.
5.1. Data
- The predictors used for building the network models are the GPH and PW field data from the National Centers for Environmental Prediction (NCEP) North American Regional Reanalysis (NARR) data set (Mesinger et al., 2006).
- The data set is generated by regional downscaling of the NCEP Global Reanalysis for the North America region, using the NCEP Eta Model and the 3-D Variational Data Assimilation System.
- The data set covers 1979 to near present and is provided every 3 hr, with a spatial resolution of 32 km and 45 vertical layers.
- Besides the pressure and moisture data, the precipitation product from the NARR is used as baseline here.
- This poses a significant challenge for the DNN model to provide comparable precipitation estimates.
5.2. Experiments Design
- To test the applicability of the model for different climate conditions, the authors selected 14 sample grids that roughly cover the characteristic climate divisions of the contiguous United States.
- Here μ and σ are scalar values that are calculated based on the flattened circulation field for the entire data set.
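The standardization with scalar μ and σ over the flattened field can be sketched as follows; the GPH-like magnitudes and array shape are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for a geopotential-height field: (days, x, y)
field = np.random.default_rng(2).normal(5500.0, 120.0, (100, 16, 16))

mu = field.mean()      # scalar over the whole flattened data set,
sigma = field.std()    # not per grid cell or per day
field_norm = (field - mu) / sigma
```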
- The training and validation sets are used to calibrate the model parameters and prevent overfitting.
- The network simulation results are evaluated against the CPC precipitation records, using skill metrics of RMSE and r.
6. Results
- The CNN estimated precipitation (PCNN) and the NARR estimated precipitation are compared against the CPC precipitation records.
- Without a careful tuning of hyperparameters, the CNN models perform relatively well compared to the NARR precipitation product.
- As indicated by the two skill scores, PCNN outperforms PNARR for most sample points from the west and east coasts, where precipitation is more copious than in other areas.
- Different implementations of CNN show similar skills.
- The overfitting may be due to the limited precipitation samples available for training the model.
7.1. Network Architecture
- The results above are achieved using the same default network architecture as presented in Figure 4.
- Here the authors focus on two dominant configurations in CNN design, namely, the receptive field and the network depth.
- For processing convenience, the authors use a single geogrid to carry out the experiments.
- The authors keep the other two network configurations the same as the default setting.
- The above experiments verified that an explicit encoding of local spatial circulation structures enhances the estimation of precipitation.
7.1.2. Network Depth: Shallow or Deep?
- The network depth can be roughly represented as the number of layers in the neural network.
- These layers learn representations of the data with multiple levels of abstraction (LeCun et al., 2015).
- The shallower CNN model is constructed by removing the latter two convolutional layers and the last pooling layer from the default network in Figure 4.
- Compared to the deeper network models, the model with a single convolutional layer achieves significantly lower skill scores in estimating precipitation.
- The model with 5 convolutional layers achieves optimal performance for the training and test set.
7.2. Model Interpretations
- The network models applied here involve much more complicated structures and more parameters compared to the existing SD approaches.
- In response to this requirement, many approaches for understanding CNNs have been developed in recent years (Erhan et al., 2009; Simonyan et al., 2013; Zeiler & Fergus, 2014).
- Zeiler and Fergus (2014) offered an excellent example in illustrating how layer activation can be used for interpreting and diagnosing CNNs.
- Similar distinctions within the same channel for the two events can be depicted in Conv 2.
7.2.2. Perturbation Sensitivity
- For image classification problems, the occlusion sensitivity analysis tells the impact of different portions of the image on the classification result.
- The rescaling matrix is applied to different portions of the input.
- The relation between perturbation location and model output change is visualized in Figure 7.
- This is the area where the target geogrid point lies.
- The surrounding dynamics also provide important context for inferring precipitation.
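The perturbation sensitivity analysis of this subsection can be sketched as below. The toy "model", the patch size, and the zero rescaling factor are hypothetical stand-ins for the trained CNN and the paper's rescaling matrix.

```python
import numpy as np

def occlusion_sensitivity(model, inp, patch=4, scale=0.0):
    """Slide a perturbation patch over a (c, x, y) input, rescale the
    covered portion, and record how much the model output changes at
    each patch location."""
    base = model(inp)
    c, x, y = inp.shape
    heat = np.zeros((x - patch + 1, y - patch + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            perturbed = inp.copy()
            perturbed[:, i:i + patch, j:j + patch] *= scale
            heat[i, j] = abs(model(perturbed) - base)
    return heat

# Toy "model" that only responds to the top-left corner of channel 0,
# so the sensitivity map should peak there and vanish elsewhere
model = lambda a: float(a[0, :2, :2].sum())
heat = occlusion_sensitivity(model, np.ones((1, 8, 8)), patch=2)
```

Locations whose occlusion changes the output most (here the top-left corner) are the portions of the dynamical field the model relies on, analogous to Figure 7.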
7.3. Comparison Experiments
- Previous sections have compared the CNN precipitation estimates with (1) NARR precipitation product and (2) precipitation estimates using fully connected DNN.
- For each of the models, the authors adopt the same input variables as for the CNN, with optional feature extraction before feeding the input to the model.
- The best performance in the comparison experiments is achieved by the linear regression model using input of the leading 16 PCs of the circulation field (r = 0.81, RMSE = 6.98).
- The skill can be further improved if the authors apply the convolution and pooling operations.
- To sum up, the comparison experiments empirically suggest that CNN is competitive in making precipitation estimations based on the resolved surrounding atmospheric dynamics.
8. Conclusion
- Precipitation estimation provides fundamental information to better understand the land-atmosphere water budget, improve water resources management, and aid in preparation for increasingly extreme hydrometeorological events.
- The authors introduce the CNN model to overcome these two deficiencies in improving precipitation estimation.
- The authors focus on a single geogrid point to examine the influence of the network architecture on model performance.
- By varying the network depth, the authors found that deep networks generally have better performance compared to shallow networks.
- The performance improvement provides important implications for improving precipitation-related parameterization schemes using a data-driven approach.
Frequently Asked Questions (16)
Q2. What are the future works mentioned in the paper "Improving precipitation estimation using convolutional neural network" ?
In the following studies, the authors plan to make a more comprehensive examination of the impact of different information-processing units in the network. The authors also wish to explore novel network architectures and advanced regularization approaches to support more accurate and higher-resolution precipitation estimation.
Q3. How do the authors disentangle the impact of the cyclone?
To disentangle the impact of the cyclone's geometric shape and position, the authors adopt the convolution mechanism in the network modeling.
Q4. How many PCs are used in the comparison experiments?
The best performance in the comparison experiments is achieved by the linear regression model using input of the leading 16 PCs of the circulation field (r = 0.81, RMSE = 6.98).
Q5. What are the two modules used to improve the performance of the model?
The authors include the dropout (Srivastava et al., 2014) and batchnormalization (Ioffe & Szegedy, 2015) modules to enhance the model's performance.
Q6. How is the CNN model used to extract the salient features from the resolved dynamical field?
The kernels that are used to extract the salient features from the resolved dynamical field are optimized by backpropagating the precipitation estimation error through the convolutional layers.
Q7. What is the main reason for using a data-driven model?
The computationally demanding components in numerical simulations can be replaced by data-driven model counterparts to accelerate the simulation without significant loss of accuracy.
Q8. How many PCs are used in the simulations?
The authors carry out simulations using input composed of the leading 2, 8, 16, 64, and 256 PCs of the circulation field data, as well as simulations using the raw circulation field data.
Q9. What is the way to guarantee the model's robustness?
To guarantee the model's robustness with respect to parameter initialization, the authors carry out several implementations with different parameter initializations.
Q10. How many sample grids are used to test the applicability of the model for different climate?
To test the applicability of the model for different climate conditions, the authors selected 14 sample grids that roughly cover the characteristic climate divisions of the contiguous United States.
Q11. How many convolutional layers are included in the default network?
The kernel size of the included convolutional layers is set to 20 × c × 4 × 4, where c is the channel number of the previous layer.
Q12. What are the predictors used for building the network models?
The predictors used for building the network models are the GPH and PW field data from the National Centers for Environmental Prediction (NCEP) North American Regional Reanalysis (NARR) data set (Mesinger et al., 2006).
Q13. How do the authors determine the accuracy of the CNN model?
By varying the receptive field of the convolutional layers, the authors verify that the CNN model outperforms conventional fully connected ANN SD in estimating precipitation through explicit encoding of local spatial circulation structures.
Q14. What is the reason why the CNN model shows worse performance?
For the middle part of the continent, the CNN model shows slightly worse performance, which can be attributed to model overfitting when there are limited precipitation samples for training the model.
Q15. How deep is the CNN model constructed?
The deeper CNN models are constructed by adding two/four extra convolutional layers before the first pooling layer for the default network architecture in Figure 4.
Q16. How does the DNN improve the capacity of the model to process high-dimensional data?
This is achieved by utilizing the inner structure of the data to reduce the model structural redundancy and foster effective information extraction.