Journal ArticleDOI

Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation

01 Jan 1999-Water Resources Research (Wiley-Blackwell)-Vol. 35, Iss: 1, pp 233-241
TL;DR: Alternative goodness-of-fit and relative error measures (including the coefficient of efficiency and the index of agreement) that overcome many of the limitations of correlation-based measures are discussed.
Abstract: Correlation and correlation-based measures (e.g., the coefficient of determination) have been widely used to evaluate the “goodness-of-fit” of hydrologic and hydroclimatic models. These measures are oversensitive to extreme values (outliers) and are insensitive to additive and proportional differences between model predictions and observations. Because of these limitations, correlation-based measures can indicate that a model is a good predictor, even when it is not. In this paper, useful alternative goodness-of-fit or relative error measures (including the coefficient of efficiency and the index of agreement) that overcome many of the limitations of correlation-based measures are discussed. Modifications to these statistics to aid in interpretation are presented. It is concluded that correlation and correlation-based measures should not be used to assess the goodness-of-fit of a hydrologic or hydroclimatic model and that additional evaluation measures (such as summary statistics and absolute error measures) should supplement model evaluation tools.
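For reference, the two alternatives named here have standard definitions; the NumPy sketch below is illustrative rather than code from the paper (the paper's proposed modifications further replace the squared terms with absolute differences):

```python
import numpy as np

def coefficient_of_efficiency(obs, sim):
    """Nash-Sutcliffe coefficient of efficiency E: 1 is a perfect fit;
    E <= 0 means the model predicts no better than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def index_of_agreement(obs, sim):
    """Willmott's index of agreement d, bounded between 0 and 1."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    denom = np.sum((np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return 1.0 - np.sum((obs - sim) ** 2) / denom
```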
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors present guidelines for watershed model evaluation based on the review results and project-specific considerations, including single-event simulation, quality and quantity of measured data, model calibration procedure, evaluation time step, and project scope and magnitude.
Abstract: Watershed models are powerful tools for simulating the effect of watershed processes and management on soil and water resources. However, no comprehensive guidance is available to facilitate model evaluation in terms of the accuracy of simulated data compared to measured flow and constituent values. Thus, the objectives of this research were to: (1) determine recommended model evaluation techniques (statistical and graphical), (2) review reported ranges of values and corresponding performance ratings for the recommended statistics, and (3) establish guidelines for model evaluation based on the review results and project-specific considerations; all of these objectives focus on simulation of streamflow and transport of sediment and nutrients. These objectives were achieved with a thorough review of relevant literature on model application and recommended model evaluation methods. Based on this analysis, we recommend that three quantitative statistics, Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and ratio of the root mean square error to the standard deviation of measured data (RSR), in addition to the graphical techniques, be used in model evaluation. The following model evaluation performance ratings were established for each recommended statistic. In general, model simulation can be judged as satisfactory if NSE > 0.50 and RSR < 0.70, and if PBIAS ± 25% for streamflow, PBIAS ± 55% for sediment, and PBIAS ± 70% for N and P. For PBIAS, constituent-specific performance ratings were determined based on uncertainty of measured data. Additional considerations related to model evaluation guidelines are also discussed. These considerations include: single-event simulation, quality and quantity of measured data, model calibration procedure, evaluation time step, and project scope and magnitude. A case study illustrating the application of the model evaluation guidelines is also provided.
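As a rough sketch, the three recommended statistics and the quoted "satisfactory" streamflow ratings can be computed as follows (standard formulas; the function names and the helper are mine, not from the guidelines):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias; positive values indicate underestimation bias under
    the sign convention common in the watershed-modelling literature."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2)) / obs.std()

def satisfactory_streamflow(obs, sim):
    """'Satisfactory' per the ratings quoted above:
    NSE > 0.50, RSR < 0.70, and |PBIAS| <= 25% for streamflow."""
    return (nse(obs, sim) > 0.50 and rsr(obs, sim) < 0.70
            and abs(pbias(obs, sim)) <= 25.0)
```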

9,386 citations


Cites background or methods from "Evaluating the use of “goodness-of-..."

  • ...Several graphical techniques are also described briefly because graphical techniques provide a visual comparison of simulated and measured constituent data and a first overview of model performance (ASCE, 1993) and are essential to appropriate model evaluation (Legates and McCabe, 1999)....

  • ...Although r and R2 have been widely used for model evaluation, these statistics are oversensitive to high extreme values (outliers) and insensitive to additive and proportional differences between model predictions and measured data (Legates and McCabe, 1999)....

  • ...The index of agreement can detect additive and proportional differences in the observed and simulated means and variances; however, d is overly sensitive to extreme values due to the squared differences (Legates and McCabe, 1999)....

  • ...A number of publications have addressed model evaluation statistics (Willmott, 1981; ASCE, 1993; Legates and McCabe, 1999), but they do not include recently developed statistics (e....

  • ...50(3): 885−900 a relative model evaluation assessment, and error indices quantify the deviation in the units of the data of interest (Legates and McCabe, 1999)....

Journal ArticleDOI
TL;DR: A diagnostically interesting decomposition of NSE is presented, which facilitates analysis of the relative importance of its different components in the context of hydrological modelling, and it is shown how model calibration problems can arise due to interactions among these components.
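Assuming this refers to the decomposition of Gupta et al. (2009), NSE can be rewritten as 2*alpha*r - alpha**2 - beta_n**2, where r is the linear correlation, alpha the ratio of simulated to observed standard deviations, and beta_n the mean bias normalized by the observed standard deviation; a minimal sketch:

```python
import numpy as np

def nse_components(obs, sim):
    """Decompose NSE into correlation, variability, and bias terms:
    NSE = 2*alpha*r - alpha**2 - beta_n**2."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()                   # variability ratio
    beta_n = (sim.mean() - obs.mean()) / obs.std()  # normalized bias
    return r, alpha, beta_n, 2 * alpha * r - alpha**2 - beta_n**2
```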

3,147 citations


Cites background or methods from "Evaluating the use of “goodness-of-..."

  • ...…of model skill, there has been a long and vivid discussion about the suitability of NSE (McCuen and Snyder, 1975; Martinec and Rango, 1989; Legates and McCabe, 1999; Krause et al., 2005; McCuen et al., 2006; Schaefli and Gupta, 2007; Jain and Sudheer, 2008) and several authors have…...

  • ...For such situations, various authors have recommended the use of the seasonal or climatological mean as a baseline model (Garrick et al., 1978; Murphy, 1988; Martinec and Rango, 1989; Legates and McCabe, 1999; Schaefli and Gupta, 2007)....

  • ...…or reporting results based on calibration with NSE, information about the correlation, bias, and variability of flows should also be given (interestingly, this was already proposed by Legates and McCabe (1999), although they did not discuss the interrelation between NSE and its three components)....

Journal ArticleDOI
TL;DR: In this paper, the authors used the PRISM (Parameter-elevation Relationships on Independent Slopes Model) interpolation method to develop data sets that reflected, as closely as possible, the current state of knowledge of spatial climate patterns in the United States.
Abstract: Spatial climate data sets of 1971–2000 mean monthly precipitation and minimum and maximum temperature were developed for the conterminous United States. These 30-arcsec (∼800-m) grids are the official spatial climate data sets of the US Department of Agriculture. The PRISM (Parameter-elevation Relationships on Independent Slopes Model) interpolation method was used to develop data sets that reflected, as closely as possible, the current state of knowledge of spatial climate patterns in the United States. PRISM calculates a climate–elevation regression for each digital elevation model (DEM) grid cell, and stations entering the regression are assigned weights based primarily on the physiographic similarity of the station to the grid cell. Factors considered are location, elevation, coastal proximity, topographic facet orientation, vertical atmospheric layer, topographic position, and orographic effectiveness of the terrain. Surface stations used in the analysis numbered nearly 13 000 for precipitation and 10 000 for temperature. Station data were spatially quality controlled, and short-period-of-record averages adjusted to better reflect the 1971–2000 period. PRISM interpolation uncertainties were estimated with cross-validation (C-V) mean absolute error (MAE) and the 70% prediction interval of the climate–elevation regression function. The two measures were not well correlated at the point level, but were similar when averaged over large regions. The PRISM data set was compared with the WorldClim and Daymet spatial climate data sets. The comparison demonstrated that using a relatively dense station data set and the physiographically sensitive PRISM interpolation process resulted in substantially improved climate grids over those of WorldClim and Daymet. The improvement varied, however, depending on the complexity of the region. Mountainous and coastal areas of the western United States, characterized by sparse data coverage, large elevation gradients, rain shadows, inversions, cold air drainage, and coastal effects, showed the greatest improvement. The PRISM data set benefited from a peer review procedure that incorporated local knowledge and data into the development process. Copyright © 2008 Royal Meteorological Society
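The per-cell regression idea can be sketched as a weighted least-squares fit of station climate values on station elevation; this is only an illustration of the concept (PRISM's actual station weighting and quality control are far more elaborate, and the helper names are mine):

```python
import numpy as np

def cell_climate_elevation_fit(elev, value, weights):
    """Weighted least-squares fit of climate value on elevation for one
    DEM grid cell; sqrt-weighting so lstsq minimizes sum(w * resid**2)."""
    elev, value = np.asarray(elev, float), np.asarray(value, float)
    sw = np.sqrt(np.asarray(weights, float))
    X = np.column_stack([np.ones_like(elev), elev])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], value * sw, rcond=None)
    return beta  # [intercept, slope]; the slope acts like a local lapse rate

def cv_mae(obs, loo_pred):
    """Cross-validation MAE between station observations and their
    leave-one-out predictions, as used to estimate uncertainty."""
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(loo_pred))))
```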

2,447 citations


Cites background from "Evaluating the use of “goodness-of-..."

  • ...Willmott et al., 1985; Legates and McCabe, 1999)....

Journal ArticleDOI
TL;DR: In this paper, the utility of several efficiency criteria is investigated in three examples using a simple observed streamflow hydrograph; selecting specific efficiency criteria and interpreting the results can be a challenge for even the most experienced hydrologist, since each criterion may place different emphasis on different types of simulated and observed behaviours.
Abstract: The evaluation of hydrologic model behaviour and performance is commonly made and reported through comparisons of simulated and observed variables. Frequently, comparisons are made between simulated and measured streamflow at the catchment outlet. In distributed hydrological modelling approaches, additional comparisons of simulated and observed measurements for multi-response validation may be integrated into the evaluation procedure to assess overall modelling performance. In both approaches, single and multi-response, efficiency criteria are commonly used by hydrologists to provide an objective assessment of the "closeness" of the simulated behaviour to the observed measurements. While there are a few efficiency criteria such as the Nash-Sutcliffe efficiency, coefficient of determination, and index of agreement that are frequently used in hydrologic modeling studies and reported in the literature, there are a large number of other efficiency criteria to choose from. The selection and use of specific efficiency criteria and the interpretation of the results can be a challenge for even the most experienced hydrologist since each criterion may place different emphasis on different types of simulated and observed behaviours. In this paper, the utility of several efficiency criteria is investigated in three examples using a simple observed streamflow hydrograph.
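The weighted coefficient of determination wr2 that appears in the excerpts below is commonly defined by scaling r2 with the slope b of the regression of simulated on observed values, so that systematic over- or under-prediction lowers the score even when r2 is high; a sketch assuming that definition:

```python
import numpy as np

def weighted_r2(obs, sim):
    """wr2: r2 scaled by the regression slope b (sim regressed on obs);
    |b| * r2 when |b| <= 1, otherwise r2 / |b|."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    b = np.polyfit(obs, sim, 1)[0]  # slope of the fitted line
    return abs(b) * r**2 if abs(b) <= 1 else r**2 / abs(b)
```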

2,375 citations


Cites background or methods from "Evaluating the use of “goodness-of-..."

  • ...As a result larger values in a time series are strongly overestimated whereas lower values are neglected (Legates and McCabe, 1999)....

  • ...4.1 Results of example 1: In example 1, the value of the coefficient of determination, r2, is 1.0, while the value of the weighted coefficient, wr2, is 0.7, reflecting the poor simulation better than r2 alone....

  • ...2.3 Index of agreement d: The index of agreement d was proposed by Willmott (1981) to overcome the insensitivity of E and r2 to differences in the observed and predicted means and variances (Legates and McCabe, 1999)....

  • ...…the arithmetic mean of the entire observed hydrograph; and in the remaining model simulations (3 to 136), the observed hydrograph values were progressively substituted for the arithmetic mean of the entire observed hydrograph until the last model simulation (number 136) was the actual observed hydrograph....

Journal ArticleDOI
TL;DR: SWAT is a comprehensive, semi-distributed river basin model whose large number of input parameters complicates model parameterization and calibration; the SWAT-CUP tool discussed by the authors provides a semi-automated calibration and uncertainty analysis framework (SUFI2) and statistics for goodness-of-fit.
Abstract: SWAT (Soil and Water Assessment Tool) is a comprehensive, semi-distributed river basin model that requires a large number of input parameters, which complicates model parameterization and calibration. Several calibration techniques have been developed for SWAT, including manual calibration procedures and automated procedures using the shuffled complex evolution method and other common methods. In addition, SWAT-CUP was recently developed and provides a decision-making framework that incorporates a semi-automated approach (SUFI2) using both manual and automated calibration and incorporating sensitivity and uncertainty analysis. In SWAT-CUP, users can manually adjust parameters and ranges iteratively between autocalibration runs. Parameter sensitivity analysis helps focus the calibration and uncertainty analysis and is used to provide statistics for goodness-of-fit. The user interaction or manual component of the SWAT-CUP calibration forces the user to obtain a better understanding of the overall hydrologic processes (e.g., baseflow ratios, ET, sediment sources and sinks, crop yields, and nutrient balances) and of parameter sensitivity. It is important for future calibration developments to spatially account for hydrologic processes; improve model run time efficiency; include the impact of uncertainty in the conceptual model, model parameters, and measured variables used in calibration; and assist users in checking for model errors. When calibrating a physically based model like SWAT, it is important to remember that all model input parameters must be kept within a realistic uncertainty range and that no automatic procedure can substitute for actual physical knowledge of the watershed.
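As a generic illustration of the iterate-and-narrow idea behind such semi-automated calibration (this is not SWAT-CUP code and uses none of its actual API; simulate, the scoring, and the range-narrowing rule are all hypothetical):

```python
import random

def iterative_calibration(simulate, param_ranges, n_runs=100, n_iters=3):
    """Sample parameter sets within ranges, keep the best-scoring run,
    then tighten each range around the best value before the next
    iteration (where a user would also inspect and adjust manually)."""
    best = None
    for _ in range(n_iters):
        for _ in range(n_runs):
            params = {k: random.uniform(lo, hi)
                      for k, (lo, hi) in param_ranges.items()}
            score = simulate(params)  # e.g., NSE of simulated vs. observed flow
            if best is None or score > best[0]:
                best = (score, params)
        param_ranges = {k: ((lo + best[1][k]) / 2, (hi + best[1][k]) / 2)
                        for k, (lo, hi) in param_ranges.items()}
    return best  # (best score, best parameter set)
```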

2,200 citations


Cites methods from "Evaluating the use of “goodness-of-..."

  • ..., 1997); (2) multiple evaluation techniques (ASCE, 1993; Legates and McCabe, 1999; Boyle et al., 2000); (3) calibrating all constituents to be evaluated; and (4) verification that other important model outputs are reasonable....

References
Journal ArticleDOI
TL;DR: In this article, the principles governing the application of the conceptual model technique to river flow forecasting are discussed; the necessity for a systematic approach to the development and testing of the model is explained, and some preliminary ideas are suggested.

19,601 citations

Journal ArticleDOI
TL;DR: Traditional methods of evaluating geographic models by statistical comparisons between observed and simulated variates are criticized; in particular, it is suggested that the correlation coefficient is an inadequate measure of model performance.
Abstract: Traditional methods of evaluating geographic models by statistical comparisons between observed and simulated variates are criticized. In particular, it is suggested that the correlation coefficien...

3,761 citations

Journal ArticleDOI
TL;DR: This paper reviews the nonparametric estimation of statistical error (mainly the bias and standard error of an estimator, or the error rate of a prediction rule) at a relaxed mathematical level, omitting most proofs, regularity conditions, and technical details.
Abstract: This is an invited expository article for The American Statistician. It reviews the nonparametric estimation of statistical error, mainly the bias and standard error of an estimator, or the error rate of a prediction rule. The presentation is written at a relaxed mathematical level, omitting most proofs, regularity conditions, and technical details.
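A minimal sketch of one technique the article reviews, the bootstrap estimate of an estimator's standard error (function and argument names are mine):

```python
import numpy as np

def bootstrap_standard_error(data, statistic, n_boot=2000, seed=0):
    """Resample the data with replacement, recompute the statistic on
    each replicate, and take the standard deviation across replicates."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = [statistic(rng.choice(data, size=data.size, replace=True))
            for _ in range(n_boot)]
    return np.std(reps, ddof=1)

# Example: standard error of the mean of 50 draws
# se = bootstrap_standard_error(np.random.default_rng(1).normal(size=50), np.mean)
```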

3,146 citations