# A large sample analysis of seasonal river flow correlation and its physical drivers

## Summary

### 1 Introduction

- Recent analyses for the Po River and the Danube River highlighted that catchments may exhibit significant correlation between peak river flows and average flows in the previous months (Aguilar et al., 2017).
- The presence of long-term persistence in streamflow has been known for a long time, since the pioneering works of Hurst (1951), and has been actively studied ever since (e.g. Koutsoyiannis, 2011; Montanari, 2012; O’Connell et al., 2016 and references therein).
- While a number of seasonal flow forecasting methods have been explored in the literature (e.g. Bierkens and van Beek, 2009; Dijk et al., 2013), attempts to explicitly exploit streamflow persistence in seasonal forecasting through information from past flows have been, in general, limited.
- These questions are relevant for gaining a better comprehension of catchment dynamics and planning mitigation strategies for natural hazards.

### 2 Methodology

- The above steps are described in detail in the following sections.

### 2.1 Season identification

- Season identification is performed algorithmically to identify the high-flow season (HFS) and low-flow season (LFS) for each river time series.
- In all other cases, the method allows for the search of a second peak month and the identification of a minor HFS, but the authors do not further elaborate on this analysis here, because they are only interested in the most extreme seasons for the purpose of predicting high and low flows.
- The method proposed by Lee et al. (2015) has several advantages that make it suitable for the purpose of this research.
- Most importantly, it is capable of handling conditions of bimodality, which is usually a major issue for traditional methods, e.g. directional statistics (Cunderlik et al., 2004).
- The LFS is herein identified as the 1-month period with the lowest amount of mean monthly flow.
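The LFS rule above (the single month with the lowest mean monthly flow) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `low_flow_month`, the pooling of all daily values by calendar month, and the synthetic flow regime are assumptions made for the example.

```python
import numpy as np

def low_flow_month(months, flows):
    """Return (month, mean_flow) for the calendar month with the lowest mean flow.

    months: calendar month (1-12) of each daily observation
    flows:  daily flow values, same length as months
    Assumes every calendar month has at least one observation.
    """
    months = np.asarray(months)
    flows = np.asarray(flows, dtype=float)
    # Mean flow per calendar month, pooled over all years (an assumption)
    monthly_means = np.array([flows[months == m].mean() for m in range(1, 13)])
    lfs = int(np.argmin(monthly_means)) + 1
    return lfs, float(monthly_means[lfs - 1])

# Synthetic regime with its maximum in February and its minimum in August
months = np.repeat(np.arange(1, 13), 30)
flows = 100 + 50 * np.cos(2 * np.pi * (months - 2) / 12)
lfs_month, lfs_mean = low_flow_month(months, flows)
```

The HFS identification of Lee et al. (2015) is not reproduced here because its details are not given in this summary.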

### 2.2.1 Correlation analysis

- In the case of HFS, a correlation is sought between the maximum daily flow occurring in the HFS period and the mean flow in the previous months, before the onset of HFS.
- For LFS, correlation is computed between the mean flow in the LFS itself and the mean flow in the previous months.
- The authors use the mean flow in the previous month as a robust proxy of “storage” in the catchment that is expected to reflect the state of the catchment, i.e. wetter or drier than usual.
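As a rough sketch of the correlation described above, one value per year of the seasonal signature (HFS peak or LFS mean) is paired with the mean flow of the preceding month, and a Pearson coefficient is computed. The helper name and the yearly values below are hypothetical.

```python
import numpy as np

def seasonal_lag_correlation(signature, previous_mean_flow):
    """Pearson correlation between a seasonal flow signature (one value per
    year) and the mean flow of the preceding month (one value per year)."""
    x = np.asarray(previous_mean_flow, dtype=float)
    y = np.asarray(signature, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical yearly values for one river (not from the paper)
pre_hfs_mean = np.array([10.0, 12.0, 8.0, 15.0, 11.0, 9.0])   # mean flow before HFS
hfs_peak = np.array([50.0, 58.0, 41.0, 70.0, 52.0, 47.0])     # max daily flow in HFS
rho_hfs = seasonal_lag_correlation(hfs_peak, pre_hfs_mean)
```

The same function applies to the LFS case by passing the LFS mean flow as `signature`.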

### 2.2.2 Analysis of physical drivers

- Catchment, geological, and climatic descriptors.
- As catchment descriptors, the authors consider the basin area (A), the baseflow index (BI), the mean specific runoff (SR), the percentage of basin area covered by lakes (percentage of lakes – PL) and glaciers (percentage of glaciers – PG), and altitude as candidates for explanatory variables for streamflow correlation.
- BI is calculated from the daily flow series of the rivers following the hydrograph separation procedure detailed in Gustard et al. (2008).
- Corresponding gridded data are retrieved from the WorldClim database (http://www.worldclim.org/, last access: 20 March 2017) at a spatial resolution of 10 arcminutes (approximately 18.55 km).
- These properties can be efficiently visualized in the biplot (Gabriel, 1971), which is the combined plot of the scores of the data for the first two principal components along with the relative position of the p variables as vectors in the two-dimensional space.
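The two ingredients of a biplot (scores of the data on the first two principal components, and the p variables as vectors) can be computed with a singular value decomposition. The sketch below assumes standardized descriptors and uses random data. The function name and scaling convention are illustrative choices, not taken from the paper.

```python
import numpy as np

def pca_biplot_coords(X):
    """Return (scores, loadings, explained_ratio) for the first two PCs.

    X: array of shape (n_rivers, p_descriptors).
    Columns are standardized before the decomposition (an assumption).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :2] * s[:2]          # points in the biplot
    loadings = Vt[:2].T                # one vector per descriptor
    explained = s**2 / np.sum(s**2)    # share of variance per component
    return scores, loadings, explained[:2]

# Random stand-in for a descriptor table (50 rivers, 4 descriptors)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]               # make two descriptors correlated
scores, loadings, explained = pca_biplot_coords(X)
```

Plotting `scores` as points and each row of `loadings` as an arrow from the origin gives the combined display.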

### 2.3 Technical experiment: real-time updating of the frequency distribution of high and low flows

- In order to evaluate the usefulness of the information provided by the 1-month-lag seasonal correlation for flow signatures in HFS and LFS, the authors perform a real-time updating of the frequency distribution of high and low flows based on the average river flow in the previous month.
- A similar analysis for the high flows was carried out by Aguilar et al. (2017) for the Po and Danube Rivers.
- In detail, a bi-variate meta-Gaussian probability distribution (Kelly and Krzysztofowicz, 1997; Montanari and Brath, 2004) is fitted between the observed flow signatures, i.e. peak flow in the HFS, QP, and average flow in the LFS, QL, and the average flow in the pre-HFS and pre-LFS months, Qm.
- The authors define the generic random variable NQfs to represent any dependent flow signature, i.e. NQP and NQL in their case.
- The mean and standard deviation of NQfs(t), given the observed NQm(t − h), are

  $$\mu(NQ_{fs}(t)) = \rho(NQ_m, NQ_{fs})\, NQ_m(t-h), \qquad (3)$$

  $$\sigma(NQ_{fs}(t)) = \left(1 - \rho^2(NQ_m, NQ_{fs})\right)^{0.5}. \qquad (4)$$

- To derive the probability distribution of Qfs(t) conditioned on the observed Qm(t − h), the authors first apply the inverse NQT, i.e. they use linear segments to connect the points of the previous discrete quantile mapping of the original quantiles into the Gaussian domain, and accordingly obtain Qfs(t) for any NQfs(t).
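The updating procedure can be sketched end to end: transform both samples to the Gaussian domain (NQT), apply Eqs. (3) and (4), and map Gaussian quantiles back with linear segments. This is a simplified illustration under stated assumptions: Weibull plotting positions i/(n+1) are used for the NQT, ties are not treated specially, values outside the observed range are clamped rather than extrapolated, and all names and data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def nqt(values):
    """Normal quantile transform with Weibull plotting positions (assumed)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = np.argsort(np.argsort(values)) + 1          # ranks 1..n
    return np.array([_STD_NORMAL.inv_cdf(r / (n + 1)) for r in ranks])

def conditional_quantiles(qm, qfs, qm_observed, probs):
    """Quantiles of Qfs conditioned on an observed Qm (bivariate meta-Gaussian)."""
    qm = np.asarray(qm, dtype=float)
    qfs = np.asarray(qfs, dtype=float)
    nqm, nqfs = nqt(qm), nqt(qfs)
    rho = float(np.corrcoef(nqm, nqfs)[0, 1])
    # Map the observed Qm into the Gaussian domain by linear interpolation
    order_m = np.argsort(qm)
    nqm_obs = np.interp(qm_observed, qm[order_m], nqm[order_m])
    mu = rho * nqm_obs                                   # Eq. (3)
    sigma = np.sqrt(1.0 - rho**2)                        # Eq. (4)
    z = np.array([mu + sigma * _STD_NORMAL.inv_cdf(p) for p in probs])
    # Inverse NQT: linear segments between the discrete quantile pairs
    order_fs = np.argsort(qfs)
    return np.interp(z, nqfs[order_fs], qfs[order_fs]), rho

# Synthetic 55-year sample: the signature depends on previous-month flow
rng = np.random.default_rng(1)
qm = rng.gamma(2.0, 50.0, size=55)
qfs = 3.0 * qm + rng.gamma(2.0, 30.0, size=55)
probs = [0.1, 0.5, 0.9]
wet, rho = conditional_quantiles(qm, qfs, np.quantile(qm, 0.9), probs)
dry, _ = conditional_quantiles(qm, qfs, np.quantile(qm, 0.1), probs)
```

With positive correlation, the conditioned quantiles after a wet month (`wet`) sit above those after a dry month (`dry`), which is the effect the updating is meant to capture.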

### 3 Data and catchment description

- The dataset includes 224 records spanning more than 50 years of daily river flow observations from gauging stations, mostly from non-regulated streams.
- The rest of the Swedish catchments are impacted by a Dfc climatic type, i.e. a snow climate, fully humid with cool summers.
- A summary of the river basins under study, in terms of the selected descriptors, is also provided in Table 1, showing that the investigated rivers cover a wide range of catchment area sizes, flow regimes, and climatic conditions.
- Given that detailed information is generally lacking on the impact of regulation (Kuentz et al., 2017), the authors assume stationarity of the river flows for all the catchments herein considered and, additionally, assume that river management does not significantly affect the identification of the physical drivers.
- Hydrol. Earth Syst. Sci., 23, 73–91, 2019, www.hydrol-earth-syst-sci.net/23/73/2019/.

### 4.1 Season identification

- Approximately half of the 224 rivers are characterized by at least one high-flow season with medium or higher significance (PAMF of HFS ≥ 60 %).
- Bimodality regimes are found with low and moderate significance in rivers located mostly in Austria and Sweden, but the authors focus here on the major high-flow season, as they are interested in the most extreme events.
- A minor HFS analysis would be perhaps relevant in other regions of the world where bimodal flood regimes are more prominent, as suggested by the analysis of Lee et al. (2015).
- Regarding the LFS identification, the two considered approaches (see Sect. 2.1) agree for 139 out of 224 stations, but the first method, i.e. the 1-month period with the lowest amount of mean monthly flow, is selected as being more relevant to the purpose of computing mean flow correlations.

### 4.2 Seasonal correlation

- LFS correlation is markedly higher than the corresponding HFS correlation for lags 1–6, and its median remains higher than 0 for more lags (see Fig. 2).
- For the case of HFS correlation, the authors focus only on the most significant first lag, for which 73 rivers are found to have correlation significantly higher than 0 at a 5 % significance level.
- In Fig. 3, the autocorrelation of the whole monthly series is compared to the LFS correlation for lags of 1 and 2 months, in order to show that the seasonal correlation for LFS is significantly higher than its counterpart computed by considering the whole year.
- Figure 4 shows the spatial pattern of HFS and LFS streamflow correlations.
- It is interesting to notice the emergence of spatial clustering in the correlation magnitude, which implies its dependence on different spatially varying physical mechanisms.
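The summary does not say which test was used to decide that a correlation is "significantly higher than 0 at a 5 % significance level". One common option, shown here purely as an assumption, is a one-sided test based on the Fisher z approximation. The function name and the example values (r = 0.30, 55 years) are illustrative.

```python
import math
from statistics import NormalDist

def correlation_p_value(r, n):
    """One-sided p value for H0: rho <= 0, using the Fisher z approximation.

    r: sample correlation coefficient, with |r| < 1
    n: number of paired observations (years), with n > 3
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 1.0 - NormalDist().cdf(z)

# Illustrative case: r = 0.30 estimated from 55 years of data
p = correlation_p_value(0.30, 55)
is_significant = p < 0.05
```

For short records the threshold correlation is noticeably higher, which is one reason long series (more than 50 years here) matter for this kind of analysis.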

### 5 Physical interpretation of correlation

- To attribute the detected correlations to physical drivers, the authors define six groups of potential drivers of seasonal correlation magnitude: basin size, flow indices, the presence of lakes and glaciers, catchment elevation, catchment geology, and hydroclimatic forcing.
- For some of the descriptors the information is only available for a few countries.
- In what follows, the authors will use the term “positive impact on correlation” to imply that an increasing value of the considered descriptor is associated with increasing correlation.
- For each descriptor, the authors also report, between parentheses, the Spearman’s rank correlation coefficient rs (Spearman, 1904) between its value and the considered (LFS or HFS) correlation and the p value of the null hypothesis rs = 0.
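Spearman's rank correlation, as used for each descriptor above, is the Pearson correlation of the ranks. The sketch below uses only NumPy and averages ranks over ties. It returns the coefficient rs but not the p value, and the descriptor values are made up for illustration.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation coefficient between x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical descriptor (basin area, km2) and lag-1 correlation per river
area = [120.0, 450.0, 900.0, 3000.0, 15000.0]
lag1_correlation = [0.20, 0.35, 0.30, 0.50, 0.60]
rs = spearman_rs(area, lag1_correlation)
```

Because only ranks enter the calculation, log-transforming the basin area would leave rs unchanged.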

### 5.1 Catchment area – descriptor A

- Figure 5 shows that there is only a weak positive impact of the catchment area (log transformed) on correlation for HFS (rs = 0.17, p = 0.01) but a more significant positive one for LFS (rs = 0.27, p = 5.5 × 10⁻⁵).

### 5.2 Flow indices – descriptors BI and SR

- For SR (Fig. 6b), it appears that both LFS and HFS streamflow correlations drop for increasing wetness (rs = −0.4, p = 4 × 10⁻¹⁰, and rs = −0.28, p = 2.8 × 10⁻⁵, respectively).

### 5.3 Presence of lakes and glaciers – descriptors PL and PG

- Detailed information on the presence of lakes is available for the 69 Swedish catchments, while the areal extension of glaciers is known for the 108 Austrian catchments.
- Figure S1 in the Supplement shows that the impact of lake area (Fig. S1a) on correlation for LFS and HFS is positive but not significant (rs = 0.10, p = 0.399, and rs = 0.12, p = 0.347).
- The results for glaciers show a positive impact for LFS (rs = 0.28, p = 0.081) but a negative impact for HFS (rs = −0.34, p = 0.032).
- Thus the observed result for LFS more likely portrays the impact of low temperature (low evapotranspiration) and snow accumulation, the latter generally being a slowly varying process.
- For HFS, which typically occurs in the summer months for the considered catchments, flows are mainly determined by snowmelt, which is associated with reduced persistence (Fig. S1b).

### 5.4 Catchment elevation

- The areal coverage of the SRTM data is limited to 60° N and 54° S; therefore, data for the northern part of the Swedish catchments are not available.
- The rest of the rivers are divided into three regions based on proximity: Region I, including the central and eastern part of the Alps and encompassing Austrian, Slovenian, and Italian catchments; Region II, including the western part of the Alps and encompassing French and Spanish territory; and Region III, including the southern part of Sweden.
- For HFS correlation there is not a prevailing pattern.
- Figure 8 confirms that there is a positive correlation pattern emerging with elevation for LFS.
- Based on local climatological information, it can be concluded that the spatial pattern for LFS correlation is reflective of the timing and strength of seasonality of the low flows in Austria, where dry months occur in lowlands during the summer due to increased evapotranspiration and in the mountains during winter (mostly February) due to snow accumulation which is characterized by stronger seasonality compared to the lowlands flow regime (Parajka et al., 2016; see Fig. 1).

### 5.5 Catchment geology – descriptors PK and PF

- Two different geological behaviours are identified which may impact river correlation.
- Figure 9 shows box plots of the estimated lag-1 correlation coefficient for both HFS and LFS, comparing rivers where karstic areas dominate against rivers where PK < 50 %.
- It is clear that there is a significant decrease in correlation where karstic areas dominate for both HFS and LFS.
- In a second analysis, the authors focus on Austrian catchments and investigate the relationship between correlation and percentage of flysch coverage, PF.
- Figure S2 shows that there is not a prevailing pattern in either case (rs = 0.13, p = 0.6 for LFS, and rs = −0.19, p = 0.446 for HFS).

### 5.6 Atmospheric forcing – descriptors P and T

- Figure 10 shows the lag-1 HFS and LFS correlations against estimates of the annual precipitation P and annual mean temperature T as well as the IDM.
- LFS correlation appears to be more sensitive than HFS to the above climatic indices, showing a decrease with increasing temperature and also a decrease with increasing precipitation (rs = −0.44, p = 3.1 × 10⁻¹² for P, and rs = −0.57, p = 1.8 × 10⁻²⁰ for T).
- The IDM (Fig. 10c) shows a mild decrease of both LFS (rs = −0.06, p = 0.368) and HFS correlation with increasing IDM (rs = −0.17, p = 0.01), while for the latter there seems to be a clearer trend (lower correlation with higher IDM) in very humid areas (dark blue points in Fig. 10c).
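The summary does not define IDM. Assuming it denotes the De Martonne aridity index, it combines the two other descriptors as P / (T + 10), with P the annual precipitation in mm and T the annual mean temperature in °C. The function name and example values below are illustrative only.

```python
def de_martonne_index(annual_precip_mm, mean_temp_c):
    """De Martonne aridity index P / (T + 10); higher values mean more humid.

    Assumes T > -10 degrees C so that the denominator stays positive.
    """
    return annual_precip_mm / (mean_temp_c + 10.0)

# Illustrative catchments (values are not from the paper)
idm_humid = de_martonne_index(1500.0, 5.0)    # cool, wet catchment
idm_drier = de_martonne_index(600.0, 14.0)    # warm, drier catchment
```

Because the index rises with precipitation and falls with temperature, its relation to correlation need not mirror the separate relations found for P and T.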

### 5.7 Physical drivers of high correlation

- To gain further insight into the results, the authors select the 20 catchments with the highest streamflow seasonal correlation coefficients for both HFS and LFS periods in order to investigate their physical characteristics in relation to the remaining set of rivers.
- Table 2 summarizes statistics for selected descriptors in order to identify dominant behaviours.
- More robust considerations can be drawn for the LFS; higher seasonal correlation is found for larger catchments with a higher baseflow index and lower specific runoff, precipitation, and wetness.
- The presence of lakes plays a significant role, both for lag-1 and lag-2 correlations, with the latter also being significantly influenced by the presence of glaciers.
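The comparison between the 20 highest-correlation catchments and the remaining rivers can be sketched as below. The statistic actually reported in Table 2 is not given in this summary, so the median is used here as an assumption, and the data are synthetic.

```python
import numpy as np

def compare_top_k(correlation, descriptor, k=20):
    """Median of a descriptor for the k highest-correlation rivers vs the rest."""
    correlation = np.asarray(correlation, dtype=float)
    descriptor = np.asarray(descriptor, dtype=float)
    top = np.argsort(correlation)[::-1][:k]      # indices of the k largest
    mask = np.zeros(len(correlation), dtype=bool)
    mask[top] = True
    return float(np.median(descriptor[mask])), float(np.median(descriptor[~mask]))

# Synthetic sample of 224 rivers where correlation increases with baseflow index
rng = np.random.default_rng(2)
bi = rng.uniform(0.2, 0.9, size=224)
lfs_corr = 0.2 + 0.6 * bi + rng.normal(0.0, 0.05, size=224)
top_median, rest_median = compare_top_k(lfs_corr, bi, k=20)
```

Repeating the call for each descriptor gives a table with one row per descriptor and one column per group.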

### 6 Principal component analysis of the predictors and linear regression

- To avoid the impact of multicollinearity in the regression while additionally summarizing river information, the authors apply PCA (see Sect. 2.2).
- The authors avoid including highly correlated variables in the analysis.
- A log transformation is applied to the basin area to reduce the impact of outliers.
- Slovenian rivers cluster towards the direction of increasing SR and T, whereas Swedish rivers cluster towards the opposite direction of increasing BI and decreasing T.
- The coefficients for the first three PCs are found significantly different from zero at a 0.1 % significance level and are included in the regression (see Table 4).
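Regressing the seasonal correlation on the first three principal components, rather than on the raw descriptors, can be sketched as follows. The predictors are standardized and the data are random. The function name and the choice of ordinary least squares are assumptions for illustration.

```python
import numpy as np

def pc_regression(X, y, n_components=3):
    """Least-squares regression of y on the first principal components of X.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T             # uncorrelated predictors
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

# Random stand-in: 224 rivers, 6 descriptors, correlation as the response
rng = np.random.default_rng(3)
X = rng.normal(size=(224, 6))
y = 0.4 + 0.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0.0, 0.05, size=224)
coef, r2 = pc_regression(X, y, n_components=3)
```

Since the component scores are mutually uncorrelated, each coefficient can be tested on its own, which is the practical benefit over regressing on collinear descriptors.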

### 7 Real-time updating of the frequency distribution of high and low flows for the Oise River

- The authors apply the technical experiment (see Sect. 2.3) for high and low flows to the Oise River in France and assess the difference in the estimated flood and low-flow magnitudes.
- The Oise River at Sempigny in France (55 years of daily flow values) has a basin area of 4320 km², and its gauging station is part of the French national real-time monitoring system (https://www.vigicrues.gouv.fr/, last access: 23 July 2018), which is in place to monitor and forecast floods in the main French rivers.
- The selected river has a high technical relevance, since it experiences both types of extremes with large impacts.
- It is characterized by HFS correlation ρ = 0.54, which is the third largest lag-1 correlation for the HFS in the dataset, and LFS correlation ρ = 0.80, which corresponds to the 70 % quantile of the sample lag-1 correlation for LFS.
- Figure 13c, d shows the conditioned and unconditioned probability distributions of peak and low flows in the Gaussian domain.
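As a rough worked example of Eq. (4) for the Oise River, assume the reported lag-1 correlations apply in the Gaussian domain, where the unconditioned standard deviation is 1 by construction of the NQT:

```latex
\begin{align}
\sigma_{\mathrm{HFS}} &= \left(1 - 0.54^2\right)^{0.5} = (0.7084)^{0.5} \approx 0.84, \\
\sigma_{\mathrm{LFS}} &= \left(1 - 0.80^2\right)^{0.5} = (0.36)^{0.5} = 0.60.
\end{align}
```

Under that assumption, conditioning on the previous-month flow narrows the Gaussian-domain spread by about 16 % for peak flows and 40 % for low flows.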

### 8 Discussion

- The methodology presented herein aims to advance the physical understanding of seasonal river flow persistence for the sake of exploiting the related information to improve probabilistic prediction of high and low flows.
- The correlation of average flow in the previous months with the LFS flow and HFS peak flow was found to be relevant, with the former prevailing over the latter.
- It was postulated that this is due to wet catchments showing increased short-term variability compared to drier catchments (Szolgayova et al., 2014) and having a faster response to rainfall due to saturated soil.
- Yet, these studies refer to generally humid regions and cannot be extrapolated to more arid climates.
- Yet the mountainous, glacier-dominated rivers still show increased LFS correlation compared to rivers in the lowlands, which agrees well with other studies that have found less uncertainty in the rainfall–runoff modelling in this regime owing to the greater seasonality of the runoff process and the decreased impact of rainfall compared to the rainfall-dominated regime of the lowlands (e.g. Parajka et al., 2016).

### 9 Conclusions and outlook

- This research investigates the presence of persistence in river flow at the seasonal scale, the associated physical drivers, and the prospect for employing the related information to improve probabilistic prediction of high and low flows by exploring a large sample of European rivers.
- Storage mechanisms, groundwater-dominated basins, and slower catchment response time, as reflected by [...]
- Indeed, the presence of river memory at the seasonal scale represents a possible opportunity to improve the prediction of water-related natural hazards by reducing uncertainty of associated estimates and allowing significant lag time for decision-making and hazard prevention.
- Nejc Bezak gratefully acknowledges funding by the Slovenian Research Agency (grants J2-7322 and P2-0180).
- Edited by: Louise Slater Reviewed by: three anonymous referees.


