Journal ArticleDOI

On the validation of models

01 Jul 1981-Physical Geography (Taylor & Francis Group)-Vol. 2, Iss: 2, pp 184-194
TL;DR: In this paper, traditional methods of evaluating geographic models by statistical comparison of observed and simulated variates are criticized, with particular attention to the correlation coefficient.
Abstract: Traditional methods of evaluating geographic models by statistical comparisons between observed and simulated variates are criticized. In particular, it is suggested that the correlation coefficien...
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors present guidelines for watershed model evaluation based on the review results and project-specific considerations, including single-event simulation, quality and quantity of measured data, model calibration procedure, evaluation time step, and project scope and magnitude.
Abstract: Watershed models are powerful tools for simulating the effect of watershed processes and management on soil and water resources. However, no comprehensive guidance is available to facilitate model evaluation in terms of the accuracy of simulated data compared to measured flow and constituent values. Thus, the objectives of this research were to: (1) determine recommended model evaluation techniques (statistical and graphical), (2) review reported ranges of values and corresponding performance ratings for the recommended statistics, and (3) establish guidelines for model evaluation based on the review results and project-specific considerations; all of these objectives focus on simulation of streamflow and transport of sediment and nutrients. These objectives were achieved with a thorough review of relevant literature on model application and recommended model evaluation methods. Based on this analysis, we recommend that three quantitative statistics, Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and ratio of the root mean square error to the standard deviation of measured data (RSR), in addition to the graphical techniques, be used in model evaluation. The following model evaluation performance ratings were established for each recommended statistic. In general, model simulation can be judged as satisfactory if NSE > 0.50 and RSR < 0.70, and if PBIAS ±25% for streamflow, PBIAS ±55% for sediment, and PBIAS ±70% for N and P. For PBIAS, constituent-specific performance ratings were determined based on uncertainty of measured data. Additional considerations related to model evaluation guidelines are also discussed. These considerations include: single-event simulation, quality and quantity of measured data, model calibration procedure, evaluation time step, and project scope and magnitude. A case study illustrating the application of the model evaluation guidelines is also provided.

9,386 citations
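
As a rough illustration of the three statistics recommended in the abstract above, the following sketch computes NSE, PBIAS, and RSR from their commonly cited definitions and applies the streamflow thresholds quoted there. The function names, the sample values, and the reading of the PBIAS criterion as "absolute value below 25%" are assumptions made for illustration; none of this is code from the paper.

```python
import math

def _error_sums(obs, sim):
    """Return (sum of squared errors, sum of squared deviations of obs from its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 matches the observed mean."""
    sse, sst = _error_sums(obs, sim)
    return 1.0 - sse / sst

def rsr(obs, sim):
    """Root mean square error divided by the standard deviation of the observations."""
    sse, sst = _error_sums(obs, sim)
    return math.sqrt(sse) / math.sqrt(sst)

def pbias(obs, sim):
    """Percent bias; with this sign convention, positive values mean underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def satisfactory_streamflow(obs, sim):
    """Apply the streamflow thresholds quoted in the abstract (assumed interpretation)."""
    return nse(obs, sim) > 0.50 and rsr(obs, sim) < 0.70 and abs(pbias(obs, sim)) < 25.0

if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0, 50.0]    # hypothetical measured flows
    simulated = [12.0, 18.0, 33.0, 37.0, 52.0]   # hypothetical model output
    print(nse(observed, simulated), rsr(observed, simulated), pbias(observed, simulated))
    print(satisfactory_streamflow(observed, simulated))
```

With these definitions RSR equals the square root of (1 - NSE), so the NSE and RSR thresholds are close to equivalent; PBIAS is the statistic that adds information about systematic over- or underestimation.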

Journal ArticleDOI
TL;DR: In this paper, alternative goodness-of-fit or relative error measures (including the coefficient of efficiency and the index of agreement) that overcome many of the limitations of correlation-based measures are discussed.
Abstract: Correlation and correlation-based measures (e.g., the coefficient of determination) have been widely used to evaluate the “goodness-of-fit” of hydrologic and hydroclimatic models. These measures are oversensitive to extreme values (outliers) and are insensitive to additive and proportional differences between model predictions and observations. Because of these limitations, correlation-based measures can indicate that a model is a good predictor, even when it is not. In this paper, useful alternative goodness-of-fit or relative error measures (including the coefficient of efficiency and the index of agreement) that overcome many of the limitations of correlation-based measures are discussed. Modifications to these statistics to aid in interpretation are presented. It is concluded that correlation and correlation-based measures should not be used to assess the goodness-of-fit of a hydrologic or hydroclimatic model and that additional evaluation measures (such as summary statistics and absolute error measures) should supplement model evaluation tools.

3,891 citations
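
The abstract's claim that correlation is insensitive to additive and proportional differences can be checked numerically. The sketch below is an illustration only: it builds a hypothetical "model" whose predictions are the observations scaled and shifted, then compares Pearson's r with the coefficient of efficiency and the index of agreement in their standard (unmodified) forms. The data and function names are invented for the example.

```python
import math

def pearson_r(obs, sim):
    """Pearson product-moment correlation coefficient."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)

def coefficient_of_efficiency(obs, sim):
    """E = 1 - sum((O - P)^2) / sum((O - mean(O))^2)."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)

def index_of_agreement(obs, sim):
    """d = 1 - sum((O - P)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    potential = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / potential

if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Hypothetical biased model: proportional error (x2) plus additive error (+3).
    predicted = [2.0 * o + 3.0 for o in observed]
    print("r =", pearson_r(observed, predicted))
    print("E =", coefficient_of_efficiency(observed, predicted))
    print("d =", index_of_agreement(observed, predicted))
```

For this data r stays at 1 because the predictions are a perfect linear function of the observations, while E becomes strongly negative and d falls well below 1, which is the behaviour the abstract describes.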

Journal ArticleDOI
TL;DR: In this article, it is suggested that the correlation between model-predicted and observed data, commonly described by Pearson's product-moment correlation coefficient, is an insufficient and often misleading measure of accuracy, and a complement of difference and summary univariate indices is presented as the nucleus of a more informative, albeit fundamentally descriptive, approach to model evaluation.
Abstract: Quantitative approaches to the evaluation of model performance were recently examined by Fox (1981). His recommendations are briefly reviewed and a revised set of performance statistics is proposed. It is suggested that the correlation between model-predicted and observed data, commonly described by Pearson's product-moment correlation coefficient, is an insufficient and often misleading measure of accuracy. A complement of difference and summary univariate indices is presented as the nucleus of a more informative, albeit fundamentally descriptive, approach to model evaluation. Two models that estimate monthly evapotranspiration are comparatively evaluated in order to illustrate how the recommended method(s) can be applied.

3,218 citations

Journal ArticleDOI
TL;DR: In this paper, the utility of several efficiency criteria is investigated in three examples using a simple observed streamflow hydrograph, and the selection and use of specific efficiency criteria and interpretation of the results can be a challenge for even the most experienced hydrologist since each criterion may place different emphasis on different types of simulated and observed behaviours.
Abstract: The evaluation of hydrologic model behaviour and performance is commonly made and reported through comparisons of simulated and observed variables. Frequently, comparisons are made between simulated and measured streamflow at the catchment outlet. In distributed hydrological modelling approaches, additional comparisons of simulated and observed measurements for multi-response validation may be integrated into the evaluation procedure to assess overall modelling performance. In both approaches, single and multi-response, efficiency criteria are commonly used by hydrologists to provide an objective assessment of the "closeness" of the simulated behaviour to the observed measurements. While there are a few efficiency criteria such as the Nash-Sutcliffe efficiency, coefficient of determination, and index of agreement that are frequently used in hydrologic modeling studies and reported in the literature, there are a large number of other efficiency criteria to choose from. The selection and use of specific efficiency criteria and the interpretation of the results can be a challenge for even the most experienced hydrologist since each criterion may place different emphasis on different types of simulated and observed behaviours. In this paper, the utility of several efficiency criteria is investigated in three examples using a simple observed streamflow hydrograph.

2,375 citations


Cites methods from "On the validation of models"

  • ...The index of agreement d was proposed by Willmot (1981) to overcome the insensitivity of E and r2 to differences in the observed and predicted means and variances (Legates and McCabe, 1999)....

Journal ArticleDOI
TL;DR: In this paper, a complementary set of difference measures is used to evaluate the operational performance of a wide spectrum of geophysical models, regardless of whether the model predictions are manifested as scalars, directions, or vectors.
Abstract: Procedures that may be used to evaluate the operational performance of a wide spectrum of geophysical models are introduced. Primarily using a complementary set of difference measures, both model accuracy and precision can be meaningfully estimated, regardless of whether the model predictions are manifested as scalars, directions, or vectors. It is additionally suggested that the reliability of the accuracy and precision measures can be determined from bootstrap estimates of confidence and significance. Recommended procedures are illustrated with a comparative evaluation of two models that estimate wind velocity over the South Atlantic Bight.

1,832 citations
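
The abstract above suggests using bootstrap estimates to judge how reliable an accuracy measure is. A minimal sketch of that idea for scalar predictions follows, assuming mean absolute error as the difference measure and a simple percentile interval; both choices, along with the function names and sample data, are illustrative assumptions rather than the procedure specified in the paper.

```python
import random

def mean_absolute_error(pairs):
    """Average absolute difference over (observed, predicted) pairs."""
    return sum(abs(o - p) for o, p in pairs) / len(pairs)

def bootstrap_interval(pairs, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`, resampling pairs with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pairs)
    estimates = sorted(
        stat([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

if __name__ == "__main__":
    observed = [4.1, 5.3, 6.0, 7.2, 3.8, 5.9, 6.4, 4.7]    # hypothetical observations
    predicted = [4.6, 5.0, 6.9, 6.8, 4.9, 5.5, 7.1, 4.2]   # hypothetical model output
    pairs = list(zip(observed, predicted))
    print("MAE:", mean_absolute_error(pairs))
    print("95% bootstrap interval:", bootstrap_interval(pairs, mean_absolute_error))
```

Resampling whole (observed, predicted) pairs keeps each prediction matched to its observation, so the interval reflects sampling variability in the errors themselves.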

References
Journal ArticleDOI
TL;DR: Box plots display batches of data using five values: the extremes, the upper and lower hinges (quartiles), and the median. They are widely used in exploratory data analysis and in preparing visual summaries.
Abstract: Box plots display batches of data. Five values from a set of data are conventionally used; the extremes, the upper and lower hinges (quartiles), and the median. Such plots are becoming a widely used tool in exploratory data analysis and in preparing visual summaries for statisticians and nonstatisticians alike. Three variants of the basic display, devised by the authors, are described. The first visually incorporates a measure of group size; the second incorporates an indication of rough significance of differences between medians; the third combines the features of the first two. These techniques are displayed by examples.

2,234 citations
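
The five values named in the abstract above can be computed directly. The sketch below is a minimal illustration that assumes the common convention of taking each hinge as the median of one half of the sorted data, with the overall median included in both halves when the batch size is odd; other quartile conventions give slightly different hinges. The function names are hypothetical, and the three variants described in the abstract are not implemented.

```python
def _median(sorted_vals):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Return the extremes, hinges, and median used to draw a basic box plot."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half; includes the median when n is odd
    return {
        "min": data[0],
        "lower_hinge": _median(data[:half]),
        "median": _median(data),
        "upper_hinge": _median(data[n - half:]),
        "max": data[-1],
    }

if __name__ == "__main__":
    batch = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical batch of data
    print(five_number_summary(batch))
```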

Journal ArticleDOI
TL;DR: In this paper, an equation that predicts downcoming long-wave radiation from clear skies on the basis of screen air temperature and vapor pressure has been developed, which fits measured data at extremes of temperature and humidity better than existing equations, and exhibits comparable accuracy at intermediate values.
Abstract: An equation that predicts downcoming long-wave radiation from clear skies on the basis of screen air temperature and vapor pressure has been developed. It fits measured data at extremes of temperature and humidity better than existing equations, and exhibits comparable accuracy at intermediate values.

181 citations

Journal ArticleDOI
TL;DR: In this paper, an empirical method for interpolating monthly precipitation totals within California is described and evaluated. Using 120 monthly precipitation totals observed from 1961-1970 at each of 90 randomly selected stations in California and a P-mode principal components analysis of a co-variance matrix, four independent sources of precipitation variability were identified and quantitatively paraphrased.
Abstract: An empirical method for interpolating monthly precipitation totals within California is described and evaluated. Using 120 monthly precipitation totals observed from 1961-1970 at each of 90 randomly selected stations in California and a P-mode principal components analysis of a co-variance matrix, four independent sources of precipitation variability were identified and quantitatively paraphrased. The four principal components were then linked to three representative stations by polynomial regression. From these relationships, monthly precipitation totals can be interpolated anywhere in the state by reversing the principal components computations. The required input includes: a monthly precipitation total, for the month of interest, from each of the three representative stations as well as isarithmically interpolated estimates of the component loadings and station means which were derived from the initial (1961-1970) data set. A major asset of the procedure is that it only requires three pieces of new inf...

175 citations

Journal ArticleDOI
TL;DR: In this article, a numerical model based on standard meteorological data is used to compute the direct solar flux, and it is suggested that such a model can be used to extend the data base for design purposes.

148 citations

Journal ArticleDOI
TL;DR: In this paper, the Pearson product-moment correlation coefficient is adjusted by using a ratio of the standard deviations of the two hydrographs being compared, and the resulting index is numerically the same as the regression of the hydrograph with the smaller variance on the hydrograph having the larger variance.
Abstract: Existing goodness of fit indices are inadequate for comparing model-generated hydrographs with measured hydrographs. The Pearson product-moment correlation coefficient, which is the most frequently used index of regeneration, is inadequate because it is insensitive to differences in the size of the two hydrographs. A better coefficient of regeneration is obtained if Pearson's coefficient is adjusted by using a ratio of the standard deviations of the two hydrographs being compared. The resulting index is numerically the same as the regression of the hydrograph with the smaller variance on the hydrograph having the larger variance. Examples are used to illustrate the superiority of the modified correlation index.

77 citations
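
The adjustment described in the abstract above can be sketched as follows, assuming that "a ratio of the standard deviations" means the smaller standard deviation divided by the larger one, which is the reading under which the index equals the regression slope mentioned in the abstract. The function names and sample hydrographs are hypothetical.

```python
import math

def _centered_sums(x, y):
    """Return (sum of squares of x, sum of squares of y, cross-product), all mean-centered."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def adjusted_correlation(measured, simulated):
    """Pearson's r scaled by (smaller standard deviation / larger standard deviation)."""
    sxx, syy, sxy = _centered_sums(measured, simulated)
    r = sxy / math.sqrt(sxx * syy)
    sd_ratio = math.sqrt(min(sxx, syy) / max(sxx, syy))
    return r * sd_ratio

def slope_small_on_large(measured, simulated):
    """Least-squares slope regressing the lower-variance series on the higher-variance one."""
    sxx, syy, sxy = _centered_sums(measured, simulated)
    return sxy / max(sxx, syy)

if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical measured hydrograph
    simulated = [2.0, 4.0, 6.0, 8.0, 10.0]   # same shape, twice the size
    print(adjusted_correlation(measured, simulated))
    print(slope_small_on_large(measured, simulated))
```

In this example Pearson's r alone would be 1 even though the simulated hydrograph is twice the size of the measured one; the adjusted index drops to 0.5 and matches the regression slope.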