A large sample analysis of seasonal river flow correlation and its physical drivers

03 Apr 2018-Hydrology and Earth System Sciences Discussions (Copernicus GmbH)-pp 1-34

Abstract: The geophysical and hydrological processes governing river flow formation exhibit persistence at several timescales, which may manifest itself with the presence of positive seasonal correlation of streamflow at several different time lags. We investigate here how persistence propagates along subsequent seasons and affects low and high flows. We define the High Flow Season (HFS) and the Low Flow Season (LFS) as the three-month and the one-month periods which usually exhibit the higher and lower river flows, respectively. A dataset of 224 European rivers spanning more than 50 years of daily flow data is exploited. We compute the lagged seasonal correlation between selected river flow signatures, in HFS and LFS, and the average river flow in the antecedent months. Signatures are peak and average river flow for HFS and LFS, respectively. We investigate the links between seasonal streamflow correlation and various physiographic catchment characteristics and hydro-climatic properties. We find persistence to be more intense for LFS signatures than HFS. To exploit the seasonal correlation in flood frequency estimation, we fit a bivariate Meta-Gaussian probability distribution to peak HFS flow and average pre-HFS flow in order to condition the peak flow distribution in the HFS upon river flow observations in the previous months. The benefit of the suggested methodology is demonstrated by updating the flood frequency distribution one season in advance in real-world cases. Our findings suggest that there is a traceable physical basis for river memory which in turn can be statistically assimilated into flood frequency estimation to reduce uncertainty and improve predictions for technical purposes.
Summary

1 Introduction

  • Recent analyses for the Po River and the Danube River highlighted that catchments may exhibit significant correlation between peak river flows and average flows in the previous months (Aguilar et al., 2017).
  • The presence of long-term persistence in streamflow has been known for a long time, since the pioneering works of Hurst (1951), and has been actively studied ever since (e.g. Koutsoyiannis, 2011; Montanari, 2012; O’Connell et al., 2016 and references therein).
  • While a number of seasonal flow forecasting methods have been explored in the literature (e.g. Bierkens and van Beek, 2009; Dijk et al., 2013), attempts to explicitly exploit streamflow persistence in seasonal forecasting through information from past flows have been, in general, limited.
  • These questions are relevant for gaining a better comprehension of catchment dynamics and planning mitigation strategies for natural hazards.

2 Methodology

  • The above steps are described in detail in the following sections.

2.1 Season identification

  • Season identification is performed algorithmically to identify the high-flow season (HFS) and low-flow season (LFS) for each river time series.
  • In all other cases, the method allows for the search of a second peak month and the identification of a minor HFS, but the authors do not further elaborate on this analysis here, because they are only interested in the most extreme seasons for the purpose of predicting high and low flows.
  • The method proposed by Lee et al. (2015) has several advantages that make it suitable for the purpose of this research.
  • Most importantly, it is capable of handling conditions of bimodality, which is usually a major issue for traditional methods, e.g. directional statistics (Cunderlik et al., 2004).
  • The LFS is herein identified as the 1-month period with the lowest amount of mean monthly flow.

2.2.1 Correlation analysis

  • In the case of HFS, a correlation is sought between the maximum daily flow occurring in the HFS period and the mean flow in the previous months, before the onset of HFS.
  • For LFS, correlation is computed between the mean flow in the LFS itself and the mean flow in the previous months.
  • The authors use the mean flow in the previous month as a robust proxy of “storage” in the catchment that is expected to reflect the state of the catchment, i.e. wetter or drier than usual.

2.2.2 Analysis of physical drivers

  • Catchment, geological, and climatic descriptors.
  • As catchment descriptors, the authors consider the basin area (A), the baseflow index (BI), the mean specific runoff (SR), the percentage of basin area covered by lakes (percentage of lakes – PL) and glaciers (percentage of glaciers – PG), and altitude as candidates for explanatory variables for streamflow correlation.
  • BI is calculated from the daily flow series of the rivers following the hydrograph separation procedure detailed in Gustard et al. (2008).
  • Corresponding gridded data are retrieved from the WorldClim database (http://www.worldclim.org/, last access: 20 March 2017) at a spatial resolution of 10 arcminutes (approximately 18.55 km).
  • These properties can be efficiently visualized in the biplot (Gabriel, 1971), which is the combined plot of the scores of the data for the first two principal components along with the relative position of the p variables as vectors in the two-dimensional space.

2.3 Technical experiment: real-time updating of the frequency distribution of high and low flows

  • In order to evaluate the usefulness of the information provided by the 1-month-lag seasonal correlation for flow signatures in HFS and LFS, the authors perform a real-time updating of the frequency distribution of high and low flows based on the average river flow in the previous month.
  • A similar analysis for the high flows was carried out by Aguilar et al. (2017) for the Po and Danube Rivers.
  • In detail, a bi-variate meta-Gaussian probability distribution (Kelly and Krzysztofowicz, 1997; Montanari and Brath, 2004) is fitted between the observed flow signatures, i.e. peak flow in the HFS, QP, and average flow in the LFS, QL, and the average flow in the pre-HFS and pre-LFS months, Qm.
  • The authors define the generic random variable NQfs to represent any dependent flow signature, i.e. NQP and NQL in their case.
  • $\mu(NQ_{fs}(t)) = \rho(NQ_m, NQ_{fs})\, NQ_m(t-h)$, (3) $\sigma(NQ_{fs}(t)) = \left(1 - \rho^{2}(NQ_m, NQ_{fs})\right)^{0.5}$. (4) To derive the probability distribution of Qfs(t) conditioned on the observed Qm(t − h), the authors first apply the inverse NQT, i.e. they use linear segments to connect the points of the previous discrete quantile mapping of the original quantiles into the Gaussian domain, and accordingly obtain Qfs(t) for any NQfs(t).
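The conditioning step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `nqt` and `conditional_quantile` are hypothetical, the Weibull plotting position used for the normal quantile transform (NQT) is an assumption, and quantiles beyond the observed range are simply clipped rather than extrapolated.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def nqt(x):
    """Return the sorted sample and the matching standard-normal quantiles."""
    xs = np.sort(np.asarray(x, dtype=float))
    # Weibull plotting position (an assumption; other choices are possible).
    p = np.arange(1, xs.size + 1) / (xs.size + 1)
    z = np.array([_STD_NORMAL.inv_cdf(pi) for pi in p])
    return xs, z

def conditional_quantile(q_m, q_fs, q_m_obs, prob):
    """Quantile of Qfs(t) with non-exceedance probability prob, given Qm(t-h)."""
    m_sorted, z_m = nqt(q_m)
    fs_sorted, z_fs = nqt(q_fs)
    # Map every observation into the Gaussian domain.
    nq_m = np.interp(q_m, m_sorted, z_m)
    nq_fs = np.interp(q_fs, fs_sorted, z_fs)
    rho = np.corrcoef(nq_m, nq_fs)[0, 1]
    # Conditional mean (Eq. 3) and standard deviation (Eq. 4).
    mu = rho * np.interp(q_m_obs, m_sorted, z_m)
    sigma = np.sqrt(max(0.0, 1.0 - rho**2))
    z = mu + sigma * _STD_NORMAL.inv_cdf(prob)
    # Inverse NQT: linear segments between the mapped quantiles (tails clipped).
    return float(np.interp(z, z_fs, fs_sorted))
```

Calling `conditional_quantile` with a wet and a dry antecedent flow for the same `prob` shows how the conditioned distribution shifts when the correlation is positive.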

3 Data and catchment description

  • The dataset includes 224 records spanning more than 50 years of daily river flow observations from gauging stations, mostly from non-regulated streams.
  • The rest of the Swedish catchments are impacted by a Dfc climatic type, i.e. a snow climate, fully humid with cool summers.
  • A summary of the river basins under study, in terms of the selected descriptors, is also provided in Table 1, showing that the investigated rivers cover a wide range of catchment area sizes, flow regimes, and climatic conditions.
  • Given that detailed information is generally lacking on the impact of regulation (Kuentz et al., 2017), the authors assume stationarity of the river flows for all the catchments herein considered and, additionally, assume that river management does not significantly affect the identification of the physical drivers.

4.1 Season identification

  • Approximately half of the 224 rivers are characterized by at least one high-flow season with medium or higher significance (PAMF of HFS ≥ 60 %).
  • Bimodality regimes are found with low and moderate significance in rivers located mostly in Austria and Sweden, but the authors focus here on the major high-flow season, as they are interested in the most extreme events.
  • A minor HFS analysis would be perhaps relevant in other regions of the world where bimodal flood regimes are more prominent, as suggested by the analysis of Lee et al. (2015).
  • Regarding the LFS identification, the two considered approaches (see Sect. 2.1) agree for 139 out of 224 stations, but the first method, i.e. the 1-month period with the lowest amount of mean monthly flow, is selected as being more relevant to the purpose of computing mean flow correlations.

4.2 Seasonal correlation

  • LFS correlation is markedly higher than the corresponding HFS correlation for lags 1–6, and its median remains higher than 0 for more lags (see Fig. 2).
  • For the case of HFS correlation, the authors focus only on the most significant first lag, for which 73 rivers are found to have correlation significantly higher than 0 at a 5 % significance level.
  • In Fig. 3, the autocorrelation of the whole monthly series is compared to the LFS correlation for lag of 1 and 2 months, in order to prove that the seasonal correlation for LFS is significantly higher than its counterpart computed by considering the whole year.
  • Figure 4 shows the spatial pattern of HFS and LFS streamflow correlations.
  • It is interesting to notice the emergence of spatial clustering in the correlation magnitude, which implies its dependence on different spatially varying physical mechanisms.

5 Physical interpretation of correlation

  • To attribute the detected correlations to physical drivers, the authors define six groups of potential drivers of seasonal correlation magnitude: basin size, flow indices, the presence of lakes and glaciers, catchment elevation, catchment geology, and hydroclimatic forcing.
  • For some of the descriptors the information is only available for a few countries.
  • In what follows, the authors will use the term “positive impact on correlation” to imply that an increasing value of the considered descriptor is associated with increasing correlation.
  • For each descriptor, the authors also report, between parentheses, the Spearman’s rank correlation coefficient rs (Spearman, 1904) between its value and the considered (LFS or HFS) correlation and the p value of the null hypothesis rs = 0.
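As an illustration of the statistic reported for each descriptor, Spearman's rs can be computed as the Pearson correlation of the ranks. The sketch below uses only NumPy; the helper names `average_ranks` and `spearman_rs` are hypothetical, and the p value of the null hypothesis rs = 0 is not computed here (a statistics library would normally supply it).

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size)
    ranks[order] = np.arange(1, x.size + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman_rs(descriptor, correlation):
    """Spearman's rank correlation between a descriptor and the seasonal correlation."""
    return float(np.corrcoef(average_ranks(descriptor), average_ranks(correlation))[0, 1])
```

Because only ranks enter the calculation, any monotonic transformation of a descriptor (for example a log transform of the basin area) leaves rs unchanged.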

5.1 Catchment area – descriptor A

  • Figure 5 shows that there is only a weak positive impact of the catchment area (log transformed) on correlation for HFS (rs = 0.17, p = 0.01) but a more significant positive one for LFS (rs = 0.27, p = 5.5 × 10^−5).

5.2 Flow indices – descriptors BI and SR

  • For SR (Fig. 6b), it appears that both LFS and HFS streamflow correlations drop for increasing wetness (rs = −0.4, p = 4 × 10^−10, and rs = −0.28, p = 2.8 × 10^−5, respectively).

5.3 Presence of lakes and glaciers – descriptors PL and PG

  • Detailed information on the presence of lakes is available for the 69 Swedish catchments, while the areal extent of glaciers is known for the 108 Austrian catchments.
  • Figure S1 in the Supplement shows that the impact of lake area (Fig. S1a) on correlation for LFS and HFS is positive but not significant (rs = 0.10, p = 0.399, and rs = 0.12, p = 0.347).
  • The results for glaciers show a positive impact for LFS (rs = 0.28, p = 0.081) but a negative impact for HFS (rs = −0.34, p = 0.032).
  • Thus the observed result for LFS more likely portrays the impact of low temperature (low evapotranspiration) and snow accumulation, the latter generally being a slowly varying process.
  • For HFS, which typically occurs in the summer months for the considered catchments, flows are mainly determined by snowmelt, which is associated with reduced persistence (Fig. S1b).

5.4 Catchment elevation

  • The areal coverage of the SRTM data is limited to 60° N and 54° S; therefore, data for the northern part of the Swedish catchments are not available.
  • The rest of the rivers are divided in three regions based on proximity: Region I, including the central and eastern part of the Alps and encompassing Austrian, Slovenian, and Italian catchments; Region II, including the western part of the Alps and encompassing French and Spanish territory; and Region III, including the southern part of Sweden.
  • For HFS correlation there is not a prevailing pattern.
  • Figure 8 confirms that there is a positive correlation pattern emerging with elevation for LFS.
  • Based on local climatological information, it can be concluded that the spatial pattern for LFS correlation is reflective of the timing and strength of seasonality of the low flows in Austria, where dry months occur in lowlands during the summer due to increased evapotranspiration and in the mountains during winter (mostly February) due to snow accumulation which is characterized by stronger seasonality compared to the lowlands flow regime (Parajka et al., 2016; see Fig. 1).

5.5 Catchment geology – descriptors PK and PF

  • Two different geological behaviours are identified which may impact river correlation.
  • Figure 9 shows box plots of the estimated lag-1 correlation coefficient for both HFS and LFS against rivers where PK < 50 %.
  • It is clear that there is a significant decrease in correlation where karstic areas dominate, for both HFS and LFS.
  • In a second analysis, the authors focus on Austrian catchments and investigate the relationship between correlation and percentage of flysch coverage, PF.
  • Figure S2 shows that there is not a prevailing pattern in either case (rs = 0.13, p = 0.6 for LFS, and rs = −0.19, p = 0.446 for HFS).

5.6 Atmospheric forcing – descriptors P and T

  • Figure 10 shows the lag-1 HFS and LFS correlations against estimates of the annual precipitation P and annual mean temperature T as well as the IDM.
  • LFS correlation appears to be more sensitive than HFS to the above climatic indices, showing a decrease with increasing temperature and also a decrease with increasing precipitation (rs = −0.44, p = 3.1 × 10^−12 for P, and rs = −0.57, p = 1.8 × 10^−20 for T).
  • The IDM (Fig. 10c) shows a mild decrease of both LFS (rs = −0.06, p = 0.368) and HFS correlation with increasing IDM (rs = −0.17, p = 0.01), while for the latter there seems to be a clearer trend (lower correlation with higher IDM) in very humid areas (dark blue points in Fig. 10c).

5.7 Physical drivers of high correlation

  • To gain further insight into the results the authors select the 20 catchments with the highest streamflow seasonal correlation coefficients for both HFS and LFS periods in order to investigate their physical characteristics in relation to the remaining set of rivers.
  • Table 2 summarizes statistics for selected descriptors in order to identify dominant behaviours.
  • More robust considerations can be drawn for the LFS; higher seasonal correlation is found for larger catchments with a higher baseflow index and lower specific runoff, precipitation, and wetness.
  • The presence of lakes plays a significant role, both for lag-1 and lag-2 correlations, with the latter also being significantly influenced by the presence of glaciers.

6 Principal component analysis of the predictors and linear regression

  • To avoid the impact of multicollinearity in the regression while additionally summarizing river information, the authors apply PCA (see Sect. 2.2).
  • The authors avoid including highly correlated variables in the analysis.
  • A transformation is applied to the basin area to reduce the impact of outliers.
  • Slovenian rivers cluster towards the direction of increasing SR and T , whereas Swedish rivers cluster towards the opposite direction of increasing BI and decreasing T .
  • The coefficients for the first three PCs are found significantly different from zero at a 0.1 % significance level and are included in the regression (see Table 4).
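A principal component regression of this kind can be sketched as below. This is an illustrative outline under stated assumptions, not the authors' code: the function name `pc_regression` is hypothetical, the descriptors are standardized before the PCA (an assumption made so that variables with different units are comparable), and the significance test on the coefficients is omitted.

```python
import numpy as np

def pc_regression(X, y, n_components=3):
    """Regress y (e.g. lag-1 correlation) on the leading principal components of X.

    X: array of shape (n_rivers, p) holding the catchment descriptors.
    Returns (coefficients incl. intercept, explained variance ratios, loadings).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each descriptor (assumed preprocessing step).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA via singular value decomposition; rows of Vt are the loadings.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components]
    scores = Z @ loadings.T
    explained = s**2 / np.sum(s**2)
    # Ordinary least squares on the mutually uncorrelated PC scores.
    A = np.column_stack([np.ones(y.size), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, explained[:n_components], loadings
```

Since the PC scores are uncorrelated by construction, the regression coefficients are not affected by the multicollinearity present among the original descriptors.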

7 Real-time updating of the frequency distribution of high and low flows for the Oise River

  • The authors apply the technical experiment (see Sect. 2.3) for high and low flows to the Oise River in France and assess the difference in the estimated flood and low-flow magnitudes.
  • The Oise River (55 years of daily flow values) at Sempigny in France has a basin area of 4320 km², and its gauging station at Sempigny is part of the French national real-time monitoring system (https://www.vigicrues.gouv.fr/, last access: 23 July 2018), which is in place to monitor and forecast floods in the main French rivers.
  • The selected river has a high technical relevance, since it experiences both types of extremes with large impacts.
  • It is characterized by HFS correlation ρ = 0.54, which is the third largest lag-1 correlation for the HFS in the dataset, and LFS correlation ρ = 0.80, which corresponds to the 70 % quantile of the sample lag-1 correlation for LFS.
  • Figure 13c, d shows the conditioned and unconditioned probability distributions of peak and low flows in the Gaussian domain.

8 Discussion

  • The methodology presented herein aims to advance the physical understanding of seasonal river flow persistence for the sake of exploiting the related information to improve probabilistic prediction of high and low flows.
  • The correlation of average flow in the previous months with the LFS flow and HFS peak flow was found to be relevant, with the former prevailing over the latter.
  • It was postulated that this is due to wet catchments showing increased short-term variability compared to drier catchments (Szolgayova et al., 2014) and having a faster response to rainfall due to saturated soil.
  • Yet, these studies refer to generally humid regions and cannot be extrapolated to more arid climates.
  • Yet the mountainous, glacier-dominated rivers still show increased LFS correlation compared to rivers in the lowlands, which agrees well with other studies that have found less uncertainty in the rainfall–runoff modelling in this regime owing to the greater seasonality of the runoff process and the decreased impact of rainfall compared to the rainfall-dominated regime of the lowlands (e.g. Parajka et al., 2016).

9 Conclusions and outlook

  • This research investigates the presence of persistence in river flow at the seasonal scale, the associated physical drivers, and the prospect for employing the related information to improve probabilistic prediction of high and low flows by exploring a large sample of European rivers.
  • Storage mechanisms, groundwater-dominated basins, and slower catchment response time, as reflected by
  • Indeed, the presence of river memory at the seasonal scale represents a possible opportunity to improve the prediction of water-related natural hazards by reducing uncertainty of associated estimates and allowing significant lag time for decision-making and hazard prevention.
  • Nejc Bezak gratefully acknowledges funding by the Slovenian Research Agency (grants J2-7322 and P2-0180).
  • Edited by: Louise Slater Reviewed by: three anonymous referees.

Hydrol. Earth Syst. Sci., 23, 73–91, 2019
https://doi.org/10.5194/hess-23-73-2019
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.
A large sample analysis of European rivers on seasonal river flow
correlation and its physical drivers
Theano Iliopoulou (1), Cristina Aguilar (2), Berit Arheimer (3), María Bermúdez (4), Nejc Bezak (5), Andrea Ficchì (6,a), Demetris Koutsoyiannis (1), Juraj Parajka (7), María José Polo (2), Guillaume Thirel (8), and Alberto Montanari (9)
(1) Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Zographou, 15780, Greece
(2) Fluvial Dynamics and Hydrology Research Group, Andalusian Institute of Earth System Research, University of Córdoba, Córdoba, 14071, Spain
(3) Swedish Meteorological and Hydrological Institute, 601 76 Norrköping, Sweden
(4) Water and Environmental Engineering Group, Department of Civil Engineering, University of A Coruña, 15071 A Coruña, Spain
(5) Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, 1000 Ljubljana, Slovenia
(6) Department of Geography and Environmental Science, University of Reading, Reading, RG6 6AB, UK
(7) Vienna University of Technology, Institute of Hydraulic Engineering and Water Resources Management, Karlsplatz 13/222, 1040 Vienna, Austria
(8) IRSTEA, Hydrology Research Group (HYCAR), 92761, Antony, France
(9) Department DICAM, University of Bologna, Bologna, 40136, Italy
(a) formerly at: IRSTEA, Hydrology Research Group (HYCAR), 92761, Antony, France
Correspondence: Theano Iliopoulou (anyily@central.ntua.gr)
Received: 15 March 2018 Discussion started: 3 April 2018
Revised: 16 November 2018 Accepted: 6 December 2018 Published: 7 January 2019
Abstract. The geophysical and hydrological processes governing river flow formation exhibit persistence at several timescales, which may manifest itself with the presence of positive seasonal correlation of streamflow at several different time lags. We investigate here how persistence propagates along subsequent seasons and affects low and high flows. We define the high-flow season (HFS) and the low-flow season (LFS) as the 3-month and the 1-month periods which usually exhibit the higher and lower river flows, respectively. A dataset of 224 rivers from six European countries spanning more than 50 years of daily flow data is exploited. We compute the lagged seasonal correlation between selected river flow signatures, in HFS and LFS, and the average river flow in the antecedent months. Signatures are peak and average river flow for HFS and LFS, respectively. We investigate the links between seasonal streamflow correlation and various physiographic catchment characteristics and hydro-climatic properties. We find persistence to be more intense for LFS signatures than HFS. To exploit the seasonal correlation in the frequency estimation of high and low flows, we fit a bi-variate meta-Gaussian probability distribution to the selected flow signatures and average flow in the antecedent months in order to condition the distribution of high and low flows in the HFS and LFS, respectively, upon river flow observations in the previous months. The benefit of the suggested methodology is demonstrated by updating the frequency distribution of high and low flows one season in advance in a real-world case. Our findings suggest that there is a traceable physical basis for river memory which, in turn, can be statistically assimilated into high- and low-flow frequency estimation to reduce uncertainty and improve predictions for technical purposes.
Published by Copernicus Publications on behalf of the European Geosciences Union.

1 Introduction
Recent analyses for the Po River and the Danube River highlighted that catchments may exhibit significant correlation between peak river flows and average flows in the previous months (Aguilar et al., 2017). Such correlation is the result of the behaviours of the physical processes involved in the rainfall–runoff transformation that may induce memory in river flows at several different timescales. The presence of long-term persistence in streamflow has been known for a long time, since the pioneering works of Hurst (1951), and has been actively studied ever since (e.g. Koutsoyiannis, 2011; Montanari, 2012; O’Connell et al., 2016 and references therein). While a number of seasonal flow forecasting methods have been explored in the literature (e.g. Bierkens and van Beek, 2009; Dijk et al., 2013), attempts to explicitly exploit streamflow persistence in seasonal forecasting through information from past flows have been, in general, limited. Koutsoyiannis et al. (2008) proposed a stochastic approach to incorporate persistence of past flows into a prediction methodology for monthly average streamflow and found the method to outperform the historical analogue method (see also Dimitriadis et al., 2016, for theory and applications of the latter) and artificial neural network methods in the case of the Nile River. Similarly, Svensson (2016) assumed that the standardized anomaly of the most recent month will not change during future months to derive monthly flow forecasts for 1–3 months lead time and found the predictive skill to be superior to the analogue approach for 93 UK catchments. The above-mentioned persistence approach has also been used operationally in the production of seasonal streamflow forecasts in the UK since 2013, within the framework of the Hydrological Outlook UK (Prudhomme et al., 2017). A few other studies have included past flow information in prediction schemes along with teleconnections or other climatic indices (Piechota et al., 2001; Chiew et al., 2003; Wang et al., 2009). Recently, it was shown that streamflow persistence, revealed as seasonal correlation, may also be relevant for prediction of extreme events by allowing one to update the flood frequency distribution based on river flow observations in the pre-flood season and reduce its bias and variability (Aguilar et al., 2017). These earlier studies postulated that seasonal streamflow correlation may be due to the persistence of the catchment storage and/or the weather, but no attempt was made to identify the physical drivers.
The present study aims to further inspect seasonal persistence in river flows and its determinants, by referring to a large sample of catchments in six European countries (Austria, Sweden, Slovenia, France, Spain, and Italy). We focus on persistence properties of both high and low flows by investigating the following research questions: (i) what are the physical conditions, in terms of catchment properties, i.e. geology and climate, which may induce seasonal persistence in river flow, and (ii) can floods and droughts be predicted, in probabilistic terms, by exploiting the information provided by average flows in the previous months? These questions are relevant for gaining a better comprehension of catchment dynamics and planning mitigation strategies for natural hazards. To reach the above goals, we identify a set of descriptors for catchment behaviours and climate and inspect their impact on correlation magnitude and predictability of river flows.
A few studies have analysed physical drivers of streamflow persistence on annual and deseasonalized monthly and daily time series (Mudelsee, 2007; Hirpa et al., 2010; Gudmundsson et al., 2011; Zhang et al., 2012; Szolgayova et al., 2014; Markonis et al., 2018), but the topic has been less studied on intra-annual scales relevant to seasonal forecasting of floods and droughts.
To demonstrate the high practical relevance of the identified seasonal correlations we present a technical experiment for one of the studied rivers (Sect. 7) in which the frequency distribution of both high and low flows is updated one season in advance by exploiting real-time information on the state of the catchment.
2 Methodology
The investigation of the persistence properties of river flows focuses separately on both high and low discharges and is articulated in the following steps: (a) identification of the high- and low-flow seasons, (b) correlation assessment between the peak flow in the high-flow season (average flow in the low-flow season) and average flows in the previous months, (c) analysis of the physical drivers for streamflow persistence and its predictability through a principal component analysis (PCA), and (d) real-time updating of the frequency distribution of high and low flows for a selected case study with significant seasonal correlation by employing a meta-Gaussian approach. The above steps are described in detail in the following sections.
2.1 Season identification
Season identification is performed algorithmically to identify the high-flow season (HFS) and low-flow season (LFS) for each river time series. For the estimation of HFS, we employ an automated method recently proposed by Lee et al. (2015), which identifies the high-flow season as the 3-month period centred around the month with the maximum number of occurrences of peaks over threshold (POT), with the threshold set to the highest 5 % of the daily flows. To evaluate the selection of HFS, a metric constructed as the percentage of annual maximum flows (PAMF) captured in the HFS is used. The PAMFs are classified in the subjective categories of “poor” (< 40 %), “low” (40 %–60 %), “medium” (60 %–80 %), and “high” (> 80 %) values, denoting the probability that the identified HFS is the dominant high-flow season in the record. If the identified peak month alone contains at least 80 % of the annual maximum flows, a unimodal regime is assumed and the identification procedure is terminated. In all other cases, the method allows for the search of a second peak month and the identification of a minor HFS, but we do not further elaborate on this analysis here, because we are only interested in the most extreme seasons for the purpose of predicting high and low flows.
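The core of the HFS identification and its PAMF check can be sketched as follows. This is a simplified illustration rather than the procedure of Lee et al. (2015) in full: the function name `identify_hfs` is hypothetical, every day above the threshold is counted as a POT occurrence (no declustering of peaks), calendar years are used for the annual maxima, and the search for a second, minor HFS is left out.

```python
import numpy as np

def identify_hfs(months, years, flows, pot_quantile=0.95):
    """Return (hfs_months, pamf) for one daily flow record.

    months, years, flows: equal-length 1-D arrays holding the calendar month
    (1-12), the calendar year, and the daily flow of each observation.
    """
    months = np.asarray(months).astype(int)
    years = np.asarray(years)
    flows = np.asarray(flows, dtype=float)

    # Peaks over threshold: days within the highest 5 % of daily flows.
    threshold = np.quantile(flows, pot_quantile)
    pot_months = months[flows > threshold]

    # Peak month: the month with the largest number of POT occurrences.
    counts = np.bincount(pot_months, minlength=13)[1:]
    peak = int(np.argmax(counts)) + 1

    # 3-month window centred on the peak month, wrapping around the year.
    hfs = [(peak - 2) % 12 + 1, peak, peak % 12 + 1]

    # PAMF: percentage of annual maximum flows that fall inside the HFS.
    hits = []
    for year in np.unique(years):
        in_year = years == year
        month_of_max = months[in_year][np.argmax(flows[in_year])]
        hits.append(month_of_max in hfs)
    pamf = 100.0 * float(np.mean(hits))
    return hfs, pamf
```

The returned PAMF can then be compared against the 40 %, 60 %, and 80 % class limits quoted above.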
The method proposed by Lee et al. (2015) has several advantages that make it suitable for the purpose of this research. Most importantly, it is capable of handling conditions of bimodality, which is usually a major issue for traditional methods, e.g. directional statistics (Cunderlik et al., 2004). A potential limitation is the assumption of symmetrical extension of HFS around the peak month, along with the uniform selection of its length (3-month period). The degree of subjectivity in the evaluation of the second HFS is another limitation, which is not relevant here, as we focus on the main HFS.
The LFS is herein identified as the 1-month period with the lowest amount of mean monthly flow. An alternative approach of estimating the relative frequencies of annual minima of monthly flow and selecting the month with the highest frequency as the LFS is also considered.
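Both LFS definitions reduce to a few array operations. The sketch below is illustrative only; the function name `identify_lfs` and the (years × 12) layout of the input are assumptions made for the example.

```python
import numpy as np

def identify_lfs(monthly_flow):
    """Return the LFS month (1-12) under the two definitions.

    monthly_flow: array of shape (n_years, 12) with mean monthly flows.
    """
    q = np.asarray(monthly_flow, dtype=float)
    # Definition 1: month with the lowest long-term mean monthly flow.
    lfs_by_mean = int(np.argmin(q.mean(axis=0))) + 1
    # Definition 2: month that most frequently holds the annual minimum.
    month_of_annual_min = np.argmin(q, axis=1)
    frequency = np.bincount(month_of_annual_min, minlength=12)
    lfs_by_frequency = int(np.argmax(frequency)) + 1
    return lfs_by_mean, lfs_by_frequency
```

The two definitions can disagree when a single very dry month in one year pulls down a long-term mean, which is why they are returned side by side.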
2.2 Correlation analysis and physical interpretation through principal component analysis
2.2.1 Correlation analysis
In the case of HFS, a correlation is sought between the maximum daily flow occurring in the HFS period and the mean flow in the previous months, before the onset of HFS. For LFS, correlation is computed between the mean flow in the LFS itself and the mean flow in the previous months. We use the mean flow in the previous month as a robust proxy of “storage” in the catchment that is expected to reflect the state of the catchment, i.e. wetter or drier than usual. Since we are interested in seasonal persistence, we compute the Pearson’s correlation coefficient for HFS lag up to 9 months and for LFS lag up to 11 months.
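A lagged seasonal correlation of this kind might be computed as in the sketch below. It assumes that lag k refers to the mean flow of the single month lying k months before the onset of the season, which is one reading of the description above; the function name `seasonal_lag_correlation` and the (years × 12) input layout are hypothetical.

```python
import numpy as np

def seasonal_lag_correlation(signature, monthly_flow, season_start, max_lag):
    """Pearson correlation between a seasonal signature and antecedent monthly flow.

    signature: shape (n_years,), e.g. HFS peak flow or LFS mean flow per year.
    monthly_flow: shape (n_years, 12), mean monthly flows by calendar year.
    season_start: first calendar month (1-12) of the season.
    Returns a dict {lag: r} for lag = 1..max_lag months before the season onset.
    """
    sig = np.asarray(signature, dtype=float)
    q = np.asarray(monthly_flow, dtype=float)
    flat = q.ravel()  # continuous monthly series, year after year
    result = {}
    for lag in range(1, max_lag + 1):
        # Index of the lagged month in the flat series, one entry per year.
        idx = np.arange(q.shape[0]) * 12 + (season_start - 1) - lag
        valid = idx >= 0  # skip years whose lagged month precedes the record
        result[lag] = float(np.corrcoef(sig[valid], flat[idx[valid]])[0, 1])
    return result
```

Working on the flattened series lets a lag reach back into the previous calendar year, which matters for seasons that start early in the year.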
2.2.2 Analysis of physical drivers
Catchment, geological, and climatic descriptors
An extensive investigation is carried out to identify physical drivers of seasonal streamflow correlation, in terms of catchment, geological, and climatic descriptors.
As catchment descriptors, we consider the basin area (A), the baseflow index (BI), the mean specific runoff (SR), the percentage of basin area covered by lakes (percentage of lakes – PL) and glaciers (percentage of glaciers – PG), and altitude as candidates for explanatory variables for streamflow correlation.
The area A (km²) is primarily investigated, as it is representative of the scale of the catchment, under the assumption that in larger basins the impact of the climatological and geophysical processes affecting river flow becomes more significant and may lead to a magnified seasonal correlation.
The BI is considered based on the assumption that high
groundwater storage may be a potential driver of correla-
tion. BI is calculated from the daily flow series of the rivers
following the hydrograph separation procedure detailed in
Gustard et al. (2008). Flow minima are sampled from
non-overlapping 5-day blocks of the daily flow series, and
turning points in the sequence of minima are identified when
90 % of a given minimum is smaller than or equal to its
adjacent values. Subsequently, linear interpolation is used
between the turning points to obtain the baseflow hydrograph.
The BI is obtained as the ratio of the volume of water beneath
the baseflow separation curve versus
the total volume of water from the observed hydrograph, and
an average value is computed over all the observed hydro-
graphs for a given catchment. A low index is indicative of
an impermeable catchment with rapid response, whereas a
high value suggests high storage capacity and a stable flow
regime.
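The separation procedure above can be sketched as follows. This is an illustrative reading of the steps in the text, not the reference implementation: the function name `baseflow_index` is hypothetical, the baseflow line is capped at the observed flow (an assumption not stated in the text), volumes are accumulated only between the first and last turning point, and the averaging over annual hydrographs is omitted.

```python
def baseflow_index(q, block=5, factor=0.9):
    """Baseflow index of a daily flow series q (list of floats).

    Returns None when fewer than two turning points are found.
    """
    # (a) Minimum of each non-overlapping block, with the day it occurs on
    minima = []
    for start in range(0, len(q) - block + 1, block):
        chunk = q[start:start + block]
        k = min(range(block), key=chunk.__getitem__)
        minima.append((start + k, chunk[k]))

    # (b) Turning points: 0.9 * minimum <= both neighbouring minima
    turning = [minima[i] for i in range(1, len(minima) - 1)
               if factor * minima[i][1] <= minima[i - 1][1]
               and factor * minima[i][1] <= minima[i + 1][1]]
    if len(turning) < 2:
        return None

    # (c) Linear interpolation between turning points gives the baseflow;
    #     it is capped at the observed flow (assumption of this sketch)
    base_vol = total_vol = 0.0
    for (d0, q0), (d1, q1) in zip(turning, turning[1:]):
        for d in range(d0, d1):
            b = q0 + (q1 - q0) * (d - d0) / (d1 - d0)
            base_vol += min(b, q[d])
            total_vol += q[d]
    return base_vol / total_vol

# Synthetic series: a steady river and a flashy one with short spikes
bi_steady = baseflow_index([10.0] * 50)
bi_flashy = baseflow_index([2.0, 2.0, 20.0, 2.0, 2.0] * 10)
```

The steady series yields an index of 1, while the flashy series yields a much lower value, consistent with the interpretation of low and high BI given in the text.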
SR (m³ s⁻¹ km⁻²) is computed as the mean daily flow
of the river standardized by the size of its basin area. It
may be an important physical driver, as it is an indica-
tor of the catchment’s wetness. PL (%) and PG (%) are
investigated for the Swedish and Austrian catchments, re-
spectively, as lakes and glaciers are expected to increase
catchment storage thus affecting persistence. Lake cover-
age data are based on cartography and are available from
the Swedish Water Archive (https://www.smhi.se/, last
access: 1 November 2016), while glacier coverage data are
estimated from the CORINE land cover database
(https://www.eea.europa.eu/publications/COR0-landcover, last
access: 6 November 2016).
The effect of catchment altitude is also inspected us-
ing relief maps from the Shuttle Radar Topography Mis-
sion (SRTM) data (http://srtm.csi.cgiar.org/, last access:
28 July 2017). The data are available for the whole globe and
are sampled at 3 arcsec resolution (approximately 90 m). To-
pographic information is available for all catchments located
at latitudes lower than 60° north, while a 1 km resolution
digital elevation model is available for Austria.
As geological descriptors we consider the percentage of
catchment area with the presence of flysch (percentage of fly-
sch PF) and karstic formations (percentage of karst PK)
for Austrian and Slovenian catchments, respectively, where
this type of information is available. A subset of Austrian
catchments is characterized by the dominant presence of fly-
sch, a sequence of sedimentary rocks characterized by low
permeability, which is known to generate a very fast flow
response. Karstic catchments, characterized by the irregular
presence of sinkholes and caves, are also known for having
rapid response times and complex behaviour; e.g. initiating
fast preferential groundwater flow and intermittent discharge
via karstic springs (Ravbar, 2013; Cervi et al., 2017).
Geological features are also presumed to be linked to
persistence properties, because geology is the main control for the
baseflow index across the European continent (Kuentz et al.,
2017). PK (%) and PF (%) are estimated from geological
maps of Slovenia and Austria, respectively.
As climatic descriptors, the mean annual precipitation P
(mm year⁻¹) and the mean annual temperature T (°C) are
selected. Corresponding gridded data are retrieved from the
WorldClim database (http://www.worldclim.org/, last access:
20 March 2017) at a spatial resolution of 10 arcminutes (ap-
proximately 18.55 km). We note that low mean temperature
regimes are also associated with snow, the presence of which
is also considered in the interpretation of the results. We also
adopt the De Martonne index (IDM; De Martonne, 1926) as
a climatic descriptor, which is given by IDM = P/(T + 10)
and enables classification of a region into one of the
following six climate classes, i.e. arid (IDM ≤ 5), semi-arid
(5 < IDM ≤ 10), dry subhumid (10 < IDM ≤ 20), wet subhumid
(20 < IDM ≤ 30), humid (30 < IDM ≤ 60), and very humid
(IDM > 60). Additionally, the Köppen–Geiger climatic
classification (Kottek et al., 2006) of the rivers is assessed.
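As a small worked example of the De Martonne index, the sketch below computes IDM from P and T and assigns one of the six classes. The function names are hypothetical, and the class limits are assumed to be inclusive at the upper bound (e.g. IDM = 5 counts as arid). The index is only meaningful for T above −10 °C.

```python
def de_martonne_index(p_mm_per_year, t_celsius):
    """IDM = P / (T + 10), with P in mm per year and T in degrees Celsius."""
    return p_mm_per_year / (t_celsius + 10.0)

def de_martonne_class(idm):
    """Climate class for a given IDM (upper bounds assumed inclusive)."""
    bounds = [(5, "arid"), (10, "semi-arid"), (20, "dry subhumid"),
              (30, "wet subhumid"), (60, "humid")]
    for upper, name in bounds:
        if idm <= upper:
            return name
    return "very humid"

# Illustrative values: P = 800 mm per year, T = 10 degrees Celsius
idm = de_martonne_index(800.0, 10.0)
climate = de_martonne_class(idm)
```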
Principal component analysis
To identify which catchment, physiographic, and climatic
characteristics may explain river memory, we attempt to
regress the seasonal streamflow correlation on the physical
descriptors introduced above. We expect the presence of mul-
ticollinearity among the predictor variables, and therefore
PCA (Pearson, 1901; Hotelling, 1933) was applied to con-
struct uncorrelated explanatory variables. In essence, PCA
is an orthonormal linear transformation of p data variables
into a new coordinate system of q ≤ p uncorrelated variables
(principal components PCs) ordered by decreasing degree
of variance retained when the original p variables are pro-
jected into them (Jolliffe, 2002). Therefore, the first princi-
pal axis contains the greatest degree of variance in the data,
while the second principal axis is the direction which max-
imizes the variance among all directions orthogonal to the
first principal axis, and each succeeding component in turn
has the highest variance possible while satisfying the
condition of orthogonality to the preceding components.
Specifically, let x be a random vector with mean µ and correlation
matrix Σ, and the principal component transformation of x
is then obtained as follows:

$$ y = C^{T} x', \tag{1} $$

where y is the transformed vector whose kth column is the
kth principal component (k = 1, 2, ..., p), C is the p × p
matrix of the coefficients or loadings for each principal
component, and $x'$ is the standardized x vector. Standardization is
applied in order to avoid the impact of the different variable
units on selecting the direction of maximum variance when
forming the PCs. The y values are the scores of each obser-
vation, i.e. the transformed values of each observation of the
original p variables in the kth principal component direction.
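The transformation in Eq. (1) can be illustrated with a dependency-free sketch. It standardizes the variables, forms their correlation matrix, and extracts the loadings by power iteration with deflation, which is a simple stand-in for a proper eigendecomposition routine and is an assumption of this example. All function names and the data are hypothetical.

```python
import math

def standardize(rows):
    """Column-wise z-scores (zero mean, unit sample standard deviation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(a, iters=500):
    """Dominant eigenpair of a symmetric matrix by power iteration.

    The start vector (1, 2, ..., p) is asymmetric so that it is unlikely
    to be orthogonal to the leading eigenvector.
    """
    p = len(a)
    v = [float(i + 1) for i in range(p)]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def pca(rows, n_components):
    """Return (variances, loadings, scores) of the first PCs."""
    z = standardize(rows)
    a = correlation_matrix(z)
    p = len(a)
    variances, loadings = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(a)
        variances.append(lam)
        loadings.append(v)
        # Deflation: remove the component just found
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores: y = C^T x' for each standardized observation x'
    scores = [[sum(c[j] * r[j] for j in range(p)) for c in loadings] for r in z]
    return variances, loadings, scores

# Synthetic descriptors: the first two columns are nearly collinear
rows = [[1.0, 2.1, 5.0], [2.0, 3.9, 3.0], [3.0, 6.2, 6.0],
        [4.0, 7.8, 2.0], [5.0, 10.1, 4.0], [6.0, 12.2, 7.0]]
variances, loadings, scores = pca(rows, n_components=3)
```

With standardized inputs the component variances sum to p, so `variances[k] / p` gives the share of variance retained by the kth component. The first two columns of `scores` and the first two rows of `loadings` are the ingredients of a biplot.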
PCA has useful descriptive properties of the underlying
structure of the data. These properties can be efficiently vi-
sualized in the biplot (Gabriel, 1971), which is the combined
plot of the scores of the data for the first two principal com-
ponents along with the relative position of the p variables as
vectors in the two-dimensional space. Herein, the distance
biplot type (Gower and Hand, 1995), which approximates
the Euclidean distances between the observations, is used.
Variable vector coordinates are obtained by the coefficients
of each variable for the first two principal components. After
construction of the PCs, a linear regression model is explored
for the case of HFS and LFS lag-1 correlation.
2.3 Technical experiment: real-time updating of the frequency distribution of high and low flows
In order to evaluate the usefulness of the information pro-
vided by the 1-month-lag seasonal correlation for flow signa-
tures in HFS and LFS, we perform a real-time updating of the
frequency distribution of high and low flows based on the av-
erage river flow in the previous month. A similar analysis for
the high flows was carried out by Aguilar et al. (2017) for the
Po and Danube Rivers. In principle, this is a data assimila-
tion approach, since real-time information, i.e. observations
of the average river flow, is used in order to update a prob-
abilistic model and inform the forecast of the flow signature
of the upcoming season.
In detail, a bi-variate meta-Gaussian probability distribu-
tion (Kelly and Krzysztofowicz, 1997; Montanari and Brath,
2004) is fitted between the observed flow signatures, i.e. peak
flow in the HFS, $Q_P$, average flow in the LFS, $Q_L$, and the
average flow in the pre-HFS and LFS months, $Q_m$. The peak
HFS flow and the average LFS flow are the dependent vari-
ables and are extracted as the peak river discharge observed
in the previously identified HFS and the average river dis-
charge observed in the previously identified LFS, respec-
tively. The average flow in the month preceding the HFS and
the LFS is the explanatory variable in both cases. In the fol-
lowing, random variables are denoted by an underscore and
their outcomes are written in plain form.
The normal quantile transform (NQT; Kelly and Krzysztofowicz,
1997) is used in order to make the marginal probability
distribution of dependent and explanatory variables Gaussian.
This is achieved as follows: (a) the sample quantiles Q
are sorted in increasing order, e.g.
$Q_{m,1}, Q_{m,2}, \ldots, Q_{m,n}$, (b) the cumulative frequency,
e.g. $FQ_{m,i}$, is computed via a Weibull plotting position,
and (c) the standard normal quantile, e.g. $NQ_{m,i}$, is
obtained as the inverse of the standard normal distribution
for each cumulative frequency, e.g. $G^{-1}(FQ_{m,i})$.
Therefore, all sample quantiles are discretely mapped into
the Gaussian domain. To get the inverse transformation for
any normal quantile, we connect the points in the above map-
ping with linear segments. The extreme segments are ex-
tended to allow extrapolation outside the range covered by
the observed sample.
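The NQT and its inverse, as described above, can be sketched with the standard library alone. The function names are hypothetical and the sample is synthetic; the sketch assumes distinct sample values, since tied values would produce a zero-length segment in the forward mapping.

```python
import bisect
from statistics import NormalDist

def nqt_fit(sample):
    """Sorted sample quantiles and their standard normal quantiles."""
    q = sorted(sample)
    n = len(q)
    std_normal = NormalDist()
    # Weibull plotting position i / (n + 1), then inverse standard normal
    nq = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return q, nq

def interp_extend(x, xs, ys):
    """Piecewise-linear map through (xs, ys).

    The first and last segments are extended to extrapolate outside
    the range covered by the sample.
    """
    k = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    slope = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
    return ys[k] + slope * (x - xs[k])

def nqt_forward(value, q, nq):
    """Map a flow value into the Gaussian domain."""
    return interp_extend(value, q, nq)

def nqt_inverse(z, q, nq):
    """Map a normal quantile back to the flow domain."""
    return interp_extend(z, nq, q)

# Synthetic sample of nine flow values (illustrative only)
sample = [12.0, 30.5, 18.2, 45.0, 22.1, 15.3, 60.4, 27.8, 35.6]
q, nq = nqt_fit(sample)
z_median = nqt_forward(27.8, q, nq)                       # sample median
back = nqt_inverse(nqt_forward(40.0, q, nq), q, nq)       # round trip
beyond = nqt_inverse(3.0, q, nq)                          # extrapolated
```

The extrapolated value depends entirely on the slope of the last segment, so quantiles far outside the observed range should be read with caution.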
In the Gaussian domain, a bi-variate Gaussian distribution
is fitted between the random explanatory variable $NQ_m$
and the dependent variables $NQ_P$ and $NQ_L$ by assuming the
stationarity and ergodicity of the variables. We define the
generic random variable $NQ_{fs}$ to represent any dependent
flow signature, i.e. $NQ_P$ and $NQ_L$ in our case. Then, the
predicted signature at time t can be written as

$$ NQ_{fs}(t) = \rho(NQ_m, NQ_{fs})\, NQ_m(t-h) + N\varepsilon(t), \tag{2} $$

where $\rho(NQ_m, NQ_{fs})$ is the Pearson’s cross-correlation
coefficient between $NQ_m$ and $NQ_{fs}$, h is the selected
correlation lag with h = 1 in the present application, and
$N\varepsilon(t)$ is an outcome of the stochastic process
$N\varepsilon$, which is independent, homoscedastic,
stochastically independent of $NQ_m$, and normally distributed
with zero mean and variance $1 - \rho^2(NQ_m, NQ_{fs})$. Then,
the joint bi-variate Gaussian probability distribution function
is defined by the mean ($\mu(NQ_m) = 0$ and $\mu(NQ_{fs}) = 0$),
the standard deviation ($\sigma(NQ_m) = 1$ and
$\sigma(NQ_{fs}) = 1$) of the standardized normalized series,
and the Pearson’s cross-correlation coefficient between the
normalized series, $\rho(NQ_m, NQ_{fs})$. From the Gaussian
bi-variate probability properties, it follows that for any
observed $NQ_m(t-h)$ the probability distribution function of
$NQ_{fs}(t)$ conditioned on $NQ_m$ is Gaussian, with parameters
given by

$$ \mu(NQ_{fs}(t)) = \rho(NQ_m, NQ_{fs})\, NQ_m(t-h), \tag{3} $$

$$ \sigma(NQ_{fs}(t)) = \left(1 - \rho^2(NQ_m, NQ_{fs})\right)^{0.5}. \tag{4} $$
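A short numerical illustration of Eqs. (3) and (4): given a correlation coefficient and an observed normal quantile of the antecedent flow, the conditional mean and standard deviation follow directly. The function names and the numbers (ρ = 0.6, an antecedent flow near its 95 % quantile) are illustrative assumptions, and |ρ| < 1 is required for a non-degenerate distribution.

```python
import math
from statistics import NormalDist

def conditional_params(rho, nq_m_obs):
    """Mean and standard deviation of NQ_fs(t) given NQ_m(t - h)."""
    mu = rho * nq_m_obs                  # Eq. (3)
    sigma = math.sqrt(1.0 - rho ** 2)    # Eq. (4)
    return mu, sigma

def conditional_exceedance(rho, nq_m_obs, threshold):
    """P(NQ_fs > threshold | NQ_m = nq_m_obs); requires |rho| < 1."""
    mu, sigma = conditional_params(rho, nq_m_obs)
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)

# Illustrative case: rho = 0.6 and a wet antecedent month (z = 1.645)
mu, sigma = conditional_params(0.6, 1.645)

# Threshold exceeded with probability 0.1 when no information is used
z90 = NormalDist().inv_cdf(0.9)
p_unconditional = 1.0 - NormalDist().cdf(z90)
p_conditional = conditional_exceedance(0.6, 1.645, z90)
```

With a positive correlation and a wetter-than-usual antecedent month, the conditional exceedance probability of a high threshold is larger than the unconditional one, which is the effect exploited in the updating experiment.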
To derive the probability distribution of $Q_{fs}(t)$ conditioned
on the observed $Q_m(t-h)$, we first apply the inverse NQT,
i.e. we use linear segments to connect the points of the
previous discrete quantile mapping of the original quantiles into
the Gaussian domain, and accordingly, obtain $Q_{fs}(t)$ for any
$NQ_{fs}(t)$. Subsequently, we estimate the parameters of an
assigned probability distribution for the obtained quantiles in
the untransformed domain. This is referred to as the
updated probability distribution of the considered flow
signature ($NQ_P$ and $NQ_L$, in our case). We use the extreme value
type I distribution for the peak flows and calculate the differ-
ences in the magnitude of estimated maxima for a given re-
turn period between the unconditioned and the updated distri-
bution. The latter is conditioned by the 95 % sample quantile
of the observed mean flow in the previous month. To model
the low flows we use the log-normal distribution, which was
found to exhibit the best fit for the river in question among
other typical candidates for average flows, i.e. the Weibull
and Gamma distribution. The low flows are conditioned by
the lower 5 % sample quantile of the observed mean flow in
the previous month.
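The updating chain for peak flows can be sketched end to end as follows. This is an illustration under several assumptions that are not taken from the text: the data are synthetic, the conditional distribution is represented by a regular grid of quantiles rather than by any particular sampling scheme, the extreme value type I distribution is fitted by the method of moments, and all function names are hypothetical. Sample values are assumed to be distinct.

```python
import bisect
import math
import random
from statistics import NormalDist, mean, stdev

STD = NormalDist()

def normal_scores(sample):
    """NQT score of each observation (Weibull position rank / (n + 1))."""
    n = len(sample)
    order = sorted(range(n), key=sample.__getitem__)
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD.inv_cdf(rank / (n + 1))
    return scores

def interp_extend(x, xs, ys):
    """Piecewise-linear map through (xs, ys); end segments extended."""
    k = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    return ys[k] + (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) * (x - xs[k])

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def gumbel_quantile(sample, return_period):
    """EV1 quantile from a method-of-moments fit (assumed fitting method)."""
    scale = stdev(sample) * math.sqrt(6.0) / math.pi
    loc = mean(sample) - 0.5772156649 * scale
    return loc - scale * math.log(-math.log(1.0 - 1.0 / return_period))

def updated_sample(q_m, q_p, q_m_obs, n_points=199):
    """Quantiles of the peak flow conditioned on an observed antecedent flow."""
    nq_m, nq_p = normal_scores(q_m), normal_scores(q_p)
    rho = pearson(nq_m, nq_p)
    z_obs = interp_extend(q_m_obs, sorted(q_m), sorted(nq_m))  # forward NQT
    cond = NormalDist(rho * z_obs, math.sqrt(1.0 - rho ** 2))  # Eqs. (3)-(4)
    z = [cond.inv_cdf(i / (n_points + 1)) for i in range(1, n_points + 1)]
    return [interp_extend(v, sorted(nq_p), sorted(q_p)) for v in z]  # inverse NQT

# Synthetic 60-year record with correlated antecedent and peak flows
rng = random.Random(0)
z_m = [rng.gauss(0.0, 1.0) for _ in range(60)]
z_p = [0.7 * a + math.sqrt(1.0 - 0.49) * rng.gauss(0.0, 1.0) for a in z_m]
q_m = [math.exp(3.0 + 0.4 * a) for a in z_m]
q_p = [math.exp(5.0 + 0.5 * b) for b in z_p]

q_m_obs = sorted(q_m)[57]  # rank 58 of 60: Weibull frequency 58/61, about 0.95
q10_unconditional = gumbel_quantile(q_p, 10)
q10_updated = gumbel_quantile(updated_sample(q_m, q_p, q_m_obs), 10)
```

The difference between `q10_updated` and `q10_unconditional` is the quantity of interest: it measures how much the estimated 10-year peak shifts once a wet antecedent month has been observed. The low-flow case would follow the same chain with a log-normal fit and conditioning on the lower 5 % quantile.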
3 Data and catchment description
The dataset includes 224 records spanning more than
50 years of daily river flow observations from gauging sta-
tions, mostly from non-regulated streams. A few catchments
are impacted by regulation. Among the 224 rivers, 108 are
located in Austria, 69 in Sweden, 31 in Slovenia, 13 in
France, two in Spain, and one in Italy. Catchment areas vary
significantly, the largest being the Po River basin in Italy
(70 091 km²) and the smallest being the Hallabäcken River
basin in Sweden (4.7 km²). The geographical location of the
river gauge stations as well as their climatic classification are
shown in Fig. 1. Most of the examined rivers belong to either
a warm temperate (C) or a boreal or snow climate (D) with
a subset impacted by polar climatic conditions (E), accord-
ing to the updated world map of the Köppen–Geiger climate
classification (Fig. 1) based on gridded temperature and pre-
cipitation data for the period 1951–2000 (Kottek et al., 2006).
More specifically, the majority of French and Slovenian and
approximately one third of the Swedish basins belong to the
warm temperate Cfb category characterized by precipitation
distributed throughout the year (fully humid) and warm sum-
mers. The rest of the Swedish catchments are impacted by a
Dfc climatic type, i.e. a snow climate, fully humid with cool
summers. The Austrian catchments belonging to the region
impacted by the European Alps have the most complicated
regime due to their topographic variability. At the lowest al-
titudes, Cfb is the prevailing regime, but as proximity to the
Alps increases, a Dfc regime dominates, and progressively, in
the highest altitude basins, the climate becomes a polar tun-
dra type (Et), characterized primarily by the very low temper-
atures present. The characteristics of all the climatic regimes
of the studied rivers are given in the legend of Fig. 1. A sum-
mary of the river basins under study, in terms of the selected
descriptors, is also provided in Table 1, showing that the in-
vestigated rivers cover a wide range of catchment area sizes,
flow regimes, and climatic conditions.
It is relevant to note that 16 of the Austrian rivers are sub-
ject to regulation, which may alter the persistence proper-
ties of river flows. This relates to generally “mild” forms of
regulation, i.e. upstream regulation with a very low degree
of flow attenuation, hydropower operations, and flow
diversions to and from the basin. A preliminary examination of
these rivers did not reveal any significant change in the flow
regime over time. The presence of regulation does not
preclude the exploitation of correlation for predicting river flows
in probabilistic terms, but it may affect the analysis of phys-
ical drivers, as it may enhance or reduce persistence in the
natural river flow regime. Given that detailed information is
generally lacking on the impact of regulation (Kuentz et al.,
2017), we assume stationarity of the river flows for all the
catchments herein considered and, additionally, assume that
river management does not significantly affect the identifica-
tion of the physical drivers.