Journal ArticleDOI

Improving uncertainty estimation in urban hydrological modeling by statistically describing bias

TL;DR: A structured approach is presented to select, among five variants, the optimal bias description for a given urban or natural case study; the results clearly show that flow simulations are much more reliable when bias is accounted for than when it is neglected.
Abstract: Hydrodynamic models are useful tools for urban water management. Unfortunately, it is still challenging to obtain accurate results and plausible uncertainty estimates when using these models. In particular, with the currently applied statistical techniques, flow predictions are usually overconfident and biased. In this study, we present a flexible and relatively efficient methodology (i) to obtain more reliable hydrological simulations in terms of coverage of validation data by the uncertainty bands and (ii) to separate prediction uncertainty into its components. Our approach acknowledges that urban drainage predictions are biased. This is mostly due to input errors and structural deficits of the model. We address this issue by describing model bias in a Bayesian framework. The bias becomes an autoregressive term additional to white measurement noise, the only error type accounted for in traditional uncertainty analysis. To allow for bigger discrepancies during wet weather, we make the variance of bias dependent on the input (rainfall) and/or output (runoff) of the system. Specifically, we present a structured approach to select, among five variants, the optimal bias description for a given urban or natural case study. We tested the methodology in a small monitored stormwater system described with a parsimonious model. Our results clearly show that flow simulations are much more reliable when bias is accounted for than when it is neglected. Furthermore, our probabilistic predictions can discriminate between three uncertainty contributions: parametric uncertainty, bias, and measurement errors. In our case study, the best performing bias description is the output-dependent bias using a log-sinh transformation of data and model results. The limitations of the framework presented are some ambiguity due to the subjective choice of priors for bias parameters and its inability to address the causes of model discrepancies. Further research should focus on quantifying and reducing the causes of bias by improving the model structure and propagating input uncertainty.
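As a rough illustration of the error model described in the abstract (a minimal sketch, not the authors' code; the log-sinh parameters, the bias correlation time and the noise levels below are assumed purely for illustration), observations can be thought of as the transformed model output plus an autoregressive bias term plus white measurement noise:

```python
# Minimal sketch of the bias-augmented error model described in the abstract.
# Parameter values (a, b, tau, sd_bias, sd_eps) are illustrative assumptions,
# not the values used in the study.
import numpy as np

def log_sinh(y, a=0.0, b=0.1):
    """Log-sinh variance-stabilizing transformation g(y) = log(sinh(a + b*y)) / b."""
    y = np.asarray(y, dtype=float)
    return np.log(np.sinh(a + b * y)) / b

def inv_log_sinh(z, a=0.0, b=0.1):
    """Inverse of the log-sinh transformation."""
    z = np.asarray(z, dtype=float)
    return (np.arcsinh(np.exp(b * z)) - a) / b

def simulate_observation(y_model, dt=1.0, tau=30.0, sd_bias=0.5, sd_eps=0.1, seed=0):
    """Simulate observations as transformed model output + AR(1) bias + white noise."""
    rng = np.random.default_rng(seed)
    z = log_sinh(y_model)                       # model output in transformed space
    phi = np.exp(-dt / tau)                     # AR(1) coefficient from correlation time tau
    bias = np.zeros_like(z)
    for t in range(1, len(z)):
        bias[t] = phi * bias[t - 1] + rng.normal(0.0, sd_bias * np.sqrt(1 - phi**2))
    eps = rng.normal(0.0, sd_eps, size=len(z))  # iid measurement noise
    return inv_log_sinh(z + bias + eps)         # back-transform to flow units
```

Because the bias is added in log-sinh-transformed space, its effect in flow units grows with simulated runoff, which is the output-dependent behaviour the abstract refers to.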


Citations
Journal ArticleDOI
TL;DR: The findings show that data-driven UWM allows us to develop and apply novel methods, to optimize the efficiency of the current network-based approach, and to extend functionality of today's systems.
Abstract: The promise of collecting and utilizing large amounts of data has never been greater in the history of urban water management (UWM). This paper reviews several data-driven approaches which play a key role in bringing forward a sea change. It critically investigates whether data-driven UWM offers a promising foundation for addressing current challenges and supporting fundamental changes in UWM. We discuss the examples of better rain-data management, urban pluvial flood-risk management and forecasting, drinking water and sewer network operation and management, integrated design and management, increasing water productivity, wastewater-based epidemiology and on-site water and wastewater treatment. The accumulated evidence from literature points toward a future UWM that offers significant potential benefits thanks to increased collection and utilization of data. The findings show that data-driven UWM allows us to develop and apply novel methods, to optimize the efficiency of the current network-based approach...

165 citations


Cites background from "Improving uncertainty estimation in..."

  • ...A Bayesian framework also allows this reduction to be expressed formally.(74) Many studies have also shown clear increase in the performance of hydrological models with the increase in the layout detail(75) and the quality of input data....


Journal ArticleDOI
TL;DR: Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking, and that bias‐free numerical methods should be preferred over ICs whenever computationally feasible.
Abstract: Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
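As a rough illustration of the brute-force Monte Carlo reference used above (a minimal sketch under an assumed iid Gaussian likelihood; the function names and defaults are illustrative), BME can be estimated by averaging the likelihood over samples drawn from the prior:

```python
# Minimal sketch of brute-force Monte Carlo estimation of Bayesian model evidence:
# BME = integral of p(D | theta) p(theta) d(theta) ~ mean of p(D | theta_i), theta_i ~ prior.
# The Gaussian likelihood, the prior sampler and sd_obs are illustrative assumptions.
import numpy as np

def log_likelihood(theta, data, model, sd_obs=0.1):
    """Assumed iid Gaussian likelihood of the data given parameters theta."""
    resid = np.asarray(data, float) - np.asarray(model(theta), float)
    return -0.5 * np.sum((resid / sd_obs) ** 2 + np.log(2 * np.pi * sd_obs ** 2))

def bme_monte_carlo(data, model, prior_sampler, n_samples=100_000, seed=0):
    """Estimate log-BME by averaging likelihoods over prior samples (log-sum-exp for stability)."""
    rng = np.random.default_rng(seed)
    log_liks = np.array([log_likelihood(prior_sampler(rng), data, model)
                         for _ in range(n_samples)])
    return np.logaddexp.reduce(log_liks) - np.log(n_samples)
```

Such an estimator needs many model runs when the posterior is much narrower than the prior, which is consistent with the study above treating brute-force integration as an expensive reference rather than a routine tool.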

124 citations

Journal ArticleDOI
TL;DR: In this paper, the authors focus on approaches for representing error heteroscedasticity with respect to simulated streamflow, i.e., the pattern of larger errors in higher streamflow predictions.
Abstract: Reliable and precise probabilistic prediction of daily catchment-scale streamflow requires statistical characterization of residual errors of hydrological models. This study focuses on approaches for representing error heteroscedasticity with respect to simulated streamflow, i.e., the pattern of larger errors in higher streamflow predictions. We evaluate 8 common residual error schemes, including standard and weighted least squares, the Box-Cox transformation (with fixed and calibrated power parameter λ) and the log-sinh transformation. Case studies include 17 perennial and 6 ephemeral catchments in Australia and the USA, and two lumped hydrological models. Performance is quantified using predictive reliability, precision and volumetric bias metrics. We find that the choice of heteroscedastic error modelling approach has a significant impact on predictive performance, though no single scheme simultaneously optimizes all performance metrics. The set of Pareto optimal schemes, reflecting performance trade-offs, comprises Box-Cox schemes with λ of 0.2 and 0.5, and the log scheme (λ=0, perennial catchments only). These schemes significantly outperform even the average-performing remaining schemes (e.g., across ephemeral catchments, median precision tightens from 105% to 40% of observed streamflow, and median biases decrease from 25% to 4%). Theoretical interpretations of empirical results highlight the importance of capturing the skew/kurtosis of raw residuals and reproducing zero flows. Paradoxically, calibration of λ is often counterproductive: in perennial catchments, it tends to overfit low flows at the expense of abysmal precision in high flows. The log-sinh transformation is dominated by the simpler Pareto optimal schemes listed above. Recommendations are provided for researchers and practitioners seeking robust residual error schemes for practical work.
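As a rough sketch of one of the residual error schemes compared above (the Box-Cox variant; λ = 0.2 is one of the Pareto-optimal values reported, while the offset and the assumption of iid Gaussian residuals in transformed space are illustrative simplifications):

```python
# Minimal sketch of a heteroscedastic residual error scheme based on the Box-Cox
# transformation. lambda = 0.2 mirrors one Pareto-optimal value reported above;
# the offset and example usage are assumptions for illustration only.
import numpy as np

def box_cox(q, lam=0.2, offset=0.0):
    """Box-Cox transformation z = ((q + offset)**lam - 1) / lam; log for lam = 0."""
    q = np.asarray(q, dtype=float) + offset
    return np.log(q) if lam == 0 else (q ** lam - 1.0) / lam

def transformed_residuals(q_obs, q_sim, lam=0.2, offset=0.0):
    """Residuals in transformed space, treated as iid Gaussian in this scheme."""
    return box_cox(q_obs, lam, offset) - box_cox(q_sim, lam, offset)
```

Working with residuals in transformed space shrinks large raw errors at high flows, which is how these schemes represent heteroscedasticity.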

113 citations

Journal ArticleDOI

75 citations


Cites background from "Improving uncertainty estimation in..."

  • ...…they cannot well capture the spatial variability of rainfall, which has a significant impact on hydrological systems and thus on runoff modeling, particularly in the case of small urban catchments (Del Giudice et al., 2013; Gires et al., 2012; Ochoa‐Rodriguez et al., 2015; Schellart et al., 2012)....


References
Journal ArticleDOI
TL;DR: In this article, the principles governing the application of the conceptual model technique to river flow forecasting are discussed; the necessity for a systematic approach to the development and testing of such models is explained, and some preliminary ideas are suggested.

19,601 citations


"Improving uncertainty estimation in..." refers methods in this paper

  • ...Besides these two criteria, the Nash–Sutcliffe efficiency index (Nash and Sutcliffe, 1970), a metric often used in hydrology, is applied to evaluate goodness of fit of the deterministic model to the data....

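For reference, the Nash-Sutcliffe efficiency mentioned in the excerpt above is one minus the ratio of the model's squared errors to the variance of the observations; a minimal sketch (function name is illustrative):

```python
# Nash-Sutcliffe efficiency: NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
# NSE = 1 is a perfect fit; NSE = 0 means the model does no better than the mean of the observations.
import numpy as np

def nash_sutcliffe(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```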

Journal ArticleDOI
TL;DR: A generalization of the sampling method introduced by Metropolis et al. is presented, along with an exposition of the relevant theory, techniques of application, and the methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high-dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral $J = \int f(x)\,p(x)\,dx = E_p(f)$, where $p(x)$ is a probability density function, instead of obtaining independent samples $x_1, \ldots, x_N$ from $p(x)$ and using the estimate $\hat{J}_1 = \sum_i f(x_i)/N$, we instead obtain the sample from a distribution with density $q(x)$ and use the estimate $\hat{J}_2 = \sum_i \{f(x_i)\,p(x_i)\}/\{q(x_i)\,N\}$. This may be advantageous if it is easier to sample from $q(x)$ than $p(x)$, but it is a difficult method to use in a large number of dimensions, since the values of the weights $w(x_i) = p(x_i)/q(x_i)$ for reasonable values of $N$ may all be extremely small, or a few may be extremely large. In estimating the probability of an event $A$, however, these difficulties may not be as serious since the only values of $w(x)$ which are important are those for which $x \in A$. Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from $p(x)$ or if $p(x)$ is unknown, sample from some distribution $q(y)$ and obtain the sample $x$ values as some function of the corresponding $y$ values. If we want samples from the conditional dis…
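A minimal sketch of the Metropolis-Hastings algorithm in its random-walk form, the variant most commonly used for posterior sampling (with a symmetric Gaussian proposal the Hastings ratio reduces to the ratio of target densities); the step size, iteration count and names are illustrative assumptions, not the setup of the citing study:

```python
# Minimal random-walk Metropolis-Hastings sketch. With a symmetric Gaussian proposal
# the correction q(x|x')/q(x'|x) equals 1, so only the ratio of target densities remains.
# log_target, step, and the starting point are illustrative assumptions.
import numpy as np

def metropolis_hastings(log_target, x0, n_iter=10_000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    chain = np.empty((n_iter, x.size))
    log_p = log_target(x)
    for i in range(n_iter):
        proposal = x + rng.normal(0.0, step, size=x.size)   # symmetric random-walk proposal
        log_p_new = log_target(proposal)
        if np.log(rng.uniform()) < log_p_new - log_p:        # accept with prob min(1, ratio)
            x, log_p = proposal, log_p_new
        chain[i] = x
    return chain
```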

14,965 citations


"Improving uncertainty estimation in..." refers methods in this paper

  • ...2.2.2 by using a Metropolis–Hastings MCMC algorithm (Hastings, 1970)....


Journal ArticleDOI
TL;DR: In this article, the authors make the less restrictive assumption that such a normal, homoscedastic, linear model is appropriate after some suitable transformation has been applied to the y's.
Abstract: [Read at a RESEARCH METHODS MEETING of the SOCIETY, April 8th, 1964, Professor D. V. LINDLEY in the Chair] In the analysis of data it is often assumed that observations $y_1, y_2, \ldots, y_n$ are independently normally distributed with constant variance and with expectations specified by a model linear in a set of parameters $\theta$. In this paper we make the less restrictive assumption that such a normal, homoscedastic, linear model is appropriate after some suitable transformation has been applied to the $y$'s. Inferences about the transformation and about the parameters of the linear model are made by computing the likelihood function and the relevant posterior distribution. The contributions of normality, homoscedasticity and additivity to the transformation are separated. The relation of the present methods to earlier procedures for finding transformations is discussed. The methods are illustrated with examples.
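For reference, the transformation family analysed in this paper is usually written, for positive $y$ and power parameter $\lambda$, as

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\log y, & \lambda = 0,
\end{cases}
$$

so that $\lambda = 1$ leaves the data essentially unchanged and $\lambda = 0$ corresponds to the log transformation.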

12,158 citations


"Improving uncertainty estimation in..." refers methods in this paper

  • ...The two variance stabilization techniques which are, in our view, most promising for urban drainage applications are: the Box–Cox transformation (Box and Cox, 1964) and the log-sinh transformation (Wang et al....


  • ...The Box–Cox transformation (Box and Cox, 1964) has indeed been successfully used in several case studies, both rural (e.g., Kuczera, 1983; Bates and Campbell, 2001; Yang et al., 2007b, a; Frey et al., 2011; Sikorska et al., 2012) and urban (e.g., Freni et al., 2009b; Dotto et al., 2011; Breinholt…...


  • ...The Box–Cox transformation (Box and Cox, 1964) has indeed been successfully used in several case studies, both rural (e....


Journal ArticleDOI
TL;DR: A Bayesian calibration technique is presented which improves on the traditional approach in two respects and attempts to correct for any inadequacy of the model revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values.
Abstract: We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
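In generic notation, the statistical model behind this calibration approach can be sketched as observation = simulator output at the best-fitting input + model discrepancy + measurement error (omitting, for brevity, the multiplicative scaling of the simulator output that appears in the original formulation):

$$
y_i = \eta(x_i, \theta) + \delta(x_i) + \varepsilon_i,
\qquad \delta(\cdot) \sim \mathcal{GP}\big(0,\, k(\cdot,\cdot)\big),
\qquad \varepsilon_i \sim N\big(0,\, \sigma_\varepsilon^2\big),
$$

where $\eta$ is the computer model with calibration parameters $\theta$, $\delta$ the discrepancy (bias) described by a Gaussian process, and $\varepsilon_i$ independent observation error, matching the structure referred to in the excerpts below.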

3,745 citations


"Improving uncertainty estimation in..." refers background or methods in this paper

  • ...To address these issues, here we adapt the framework of Kennedy and O’Hagan (2001), as formulated by Reichert and Schuwirth (2012), to assess model bias along with other uncertainty components....


  • ...This has been originally suggested in the statistical literature (Craig et al., 2001; Kennedy and O’Hagan, 2001; Higdon et al., 2005; Bayarri et al., 2007) and later adapted to environmental modeling (Reichert and Schuwirth, 2012)....


  • ...(6), the covariance in the original formulation by Kennedy and O’Hagan (2001) had an exponent α for the term |ti − tj |....


  • ...To additionally separate bias from random measurement errors, Kennedy and O’Hagan (2001), Higdon et al. (2005), Bayarri et al. (2007) and others suggested using a Gaussian stochastic process to describe the knowledge about the bias, plus an independent error term for observation error....


  • ...Regarding accounting for difficult-to-reduce input and structural errors responsible for autocorrelated residuals, it has been suggested to describe prior knowledge of model bias by means of a stochastic process and to update this knowledge through conditioning with the data (Craig et al., 2001; Kennedy and O’Hagan, 2001; Higdon et al., 2005; Bayarri et al., 2007)....


Journal ArticleDOI
TL;DR: In this paper, the mean values of all the powers of the velocity $u$ and the displacement $s$ of a free particle in Brownian motion are calculated, and exact expressions are obtained for the mean square deviation of a harmonically bound particle in Brownian motion as a function of the time and the initial deviation, together with a discussion of the connection with the Fokker-Planck partial differential equation.
Abstract: With a method first indicated by Ornstein, the mean values of all the powers of the velocity $u$ and the displacement $s$ of a free particle in Brownian motion are calculated. It is shown that $u - u_0 \exp(-\beta t)$ and $s - \frac{u_0}{\beta}\left[1 - \exp(-\beta t)\right]$, where $u_0$ is the initial velocity and $\beta$ the friction coefficient divided by the mass of the particle, follow the normal Gaussian distribution law. For $s$ this gives the exact frequency distribution corresponding to the exact formula for $s^2$ of Ornstein and Fürth. Discussion is given of the connection with the Fokker-Planck partial differential equation. By the same method exact expressions are obtained for the square of the deviation of a harmonically bound particle in Brownian motion as a function of the time and the initial deviation. Here the periodic, aperiodic and overdamped cases have to be treated separately. In the last case, when $\beta$ is much larger than the frequency and for values of $t \gg \beta^{-1}$, the formula takes the form of that previously given by Smoluchowski.

3,394 citations


"Improving uncertainty estimation in..." refers methods in this paper

  • ...The simplest bias formulation is a mean-reverting OU process (Uhlenbeck and Ornstein, 1930), the discretization of which would be a first-order autoregressive process (AR(1)) with Gaussian iid noise....


  • ...Constant bias The simplest bias formulation is a mean-reverting OU process (Uhlenbeck and Ornstein, 1930), the discretization of which would be a first-order autoregressive process (AR(1)) with Gaussian iid noise....

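As the excerpts above state, discretizing a mean-reverting OU process on a regular time grid yields an AR(1) process with Gaussian iid innovations; a minimal sketch (the correlation time, asymptotic standard deviation and step length are illustrative assumptions):

```python
# Exact discretization of a zero-mean Ornstein-Uhlenbeck process on a regular grid:
# B_{t+dt} = B_t * exp(-dt/tau) + eta_t,  eta_t ~ N(0, sigma^2 * (1 - exp(-2*dt/tau))),
# i.e. an AR(1) process with Gaussian iid innovations. tau, sigma and dt are assumptions.
import numpy as np

def simulate_ou_as_ar1(n_steps=500, dt=1.0, tau=30.0, sigma=0.5, b0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    phi = np.exp(-dt / tau)                       # AR(1) coefficient
    sd_innov = sigma * np.sqrt(1.0 - phi ** 2)    # innovation standard deviation
    b = np.empty(n_steps)
    b[0] = b0
    for t in range(1, n_steps):
        b[t] = phi * b[t - 1] + rng.normal(0.0, sd_innov)
    return b
```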