
Methodology for fitting and updating predictive accident models with trend [forthcoming]

01 Jan 2013
TL;DR: This paper addresses a number of methodological issues that arise in seeking practical and efficient ways to update PAMs, whether by re-calibration or by re-fitting, including the choice of distributional assumption for overdispersion, and considerations about the most efficient and convenient ways to fit the required models.
Abstract: Reliable predictive accident models (PAMs) have a variety of important uses in traffic safety research and practice. They are used to help identify sites in need of remedial treatment, in the design of transport schemes to assess safety implications, and to estimate the effectiveness of remedial treatments. The PAMs currently in use in the UK are now quite old; the data used in their development was gathered up to 30 years ago. Many changes have occurred over that period in road and vehicle design, in road safety campaigns and legislation, and the national accident rate has fallen substantially. It seems unlikely that these aging models can be relied upon to provide accurate and reliable predictions of accident frequencies on the roads today. This paper addresses a number of methodological issues that arise in seeking practical and efficient ways to update PAMs. Models for accidents on rural single carriageway roads have been chosen to illustrate these issues, including the choice of distributional assumption for overdispersion, the choice of goodness-of-fit measures, questions of independence between observations in different years and between links on the same scheme, the estimation of trends in the models, the uncertainty of predictions, as well as considerations about the most efficient and convenient ways to fit the required models, given the considerable advances that have been seen in statistical computing software in recent years.
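To make the modelling task concrete, here is a minimal, hypothetical sketch (not the authors' code) of fitting a negative binomial predictive accident model with a linear trend in year, using Python's statsmodels; the file name and the columns accidents, aadt and year are assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: one row per road link per year
df = pd.read_csv("crashes.csv")
df["log_aadt"] = np.log(df["aadt"])            # traffic flow enters on the log scale
df["trend"] = df["year"] - df["year"].min()    # linear trend term

# NB2 model: mean = exp(b0 + b1*log(AADT) + b2*trend), variance = mu + alpha*mu^2.
# statsmodels' GLM treats alpha as fixed; in practice it would be estimated
# (e.g. by profiling alpha or using the discrete NegativeBinomial model).
model = smf.glm("accidents ~ log_aadt + trend", data=df,
                family=sm.families.NegativeBinomial(alpha=1.0))
result = model.fit()
print(result.summary())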
Citations
01 Jan 2016
TL;DR: Sample-size guidelines were prepared based on the coefficient of variation of the crash data that are needed for the calibration process, and they can be used for all facility types and for both segment and intersection prediction models.
Abstract: The Highway Safety Manual (HSM) prediction models are fitted and validated based on crash data collected from a selected number of states in the United States. Therefore, for a jurisdiction to be able to fully benefit from applying these models, it is necessary to calibrate them to local conditions. The first edition of the HSM recommends calibrating the models using a one-size-fits-all sample size of 30 to 50 locations with a total of at least 100 crashes per year. However, the HSM recommendation is not fully supported by documented studies. The objectives of this paper are consequently to: 1) examine the required sample size based on the characteristics of the data that will be used for the recalibration process; and 2) propose revised guidelines. The objectives were accomplished using simulation runs for different scenarios that characterized the sample mean and variance of the data. The simulation results indicate that as the ratio of the standard deviation to the mean (i.e., coefficient of variation) of the crash data increases, a larger sample size is warranted to fulfil certain levels of accuracy. Taking this observation into account, sample-size guidelines were prepared based on the coefficient of variation of the crash data that are needed for the recalibration process. The guidelines were then successfully applied to the two observed datasets. The proposed guidelines can be used for all facility types and for both segment and intersection prediction models.
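As a rough illustration of the quantities this guidance turns on (made-up numbers, not the paper's simulation), the sketch below computes the HSM-style scalar calibration factor and the coefficient of variation of the observed crash counts for a hypothetical sample of sites.

import numpy as np

# Hypothetical observed and HSM-predicted crashes at a sample of calibration sites
observed = np.array([3, 0, 5, 2, 7, 1, 4, 0, 2, 6], dtype=float)
predicted = np.array([2.4, 0.8, 4.1, 1.9, 5.5, 1.2, 3.3, 0.6, 2.0, 4.8])

# HSM scalar calibration factor: total observed over total predicted
C = observed.sum() / predicted.sum()

# Coefficient of variation of the crash data, the quantity the proposed
# sample-size guidelines are keyed to (larger CV -> larger sample needed)
cv = observed.std(ddof=1) / observed.mean()

print(f"calibration factor C = {C:.2f}, coefficient of variation = {cv:.2f}")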

39 citations


Cites background from "Methodology for fitting and updatin..."

  • ...Connors et al. (2013) documented several methodological issues that arise when updating predictive models initially developed in England through both the scalar calibration and the re-fitting of the models....

    [...]

  • ...Wood et al. (2013) analyzed two issues regarding the updating of the same crash prediction models addressed in the Connors et al. (2013) study....

    [...]

01 Jan 2016
TL;DR: Two popular techniques from the two approaches are compared: negative binomial models for the parametric approach and kernel regression for the nonparametric counterpart. The kernel regression method is shown to outperform the model-based approach in predictive performance, and that performance advantage increases noticeably as the data available for calibration grow.
Abstract: Crash data for road safety analysis and modeling are growing steadily in size and completeness due to the latest advances in information technology. This increased availability of large datasets has generated resurgent interest in applying data-driven nonparametric approaches as an alternative to the traditional parametric models for crash risk prediction. This paper investigates the question of how the relative performance of these two alternative approaches changes as crash data grow. The authors focus on comparing two popular techniques from the two approaches: negative binomial (NB) models for the parametric approach and kernel regression (KR) for the nonparametric counterpart. Using two large crash datasets, the authors investigate the performance of these two methods as a function of the amount of training data. Through a rigorous bootstrapping validation process, the study found that the two approaches exhibit strikingly different patterns, especially in terms of sensitivity to data size. The kernel regression method outperforms the model-based NB approach in terms of predictive performance, and that performance advantage increases noticeably as the data available for calibration grow. With the arrival of the Big Data era and the added benefits of enabling automated road safety analysis and improved responsiveness to the latest safety issues, nonparametric techniques (especially modern machine learning approaches) could be included as one of the important tools for road safety studies.
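A much-simplified sketch of this kind of comparison (synthetic data and a single covariate, not the authors' experiment) is shown below: a negative binomial GLM and a kernel regression are fitted on training sets of increasing size and scored on a common hold-out set.

import numpy as np
import statsmodels.api as sm
from statsmodels.nonparametric.kernel_regression import KernelReg

rng = np.random.default_rng(0)

# Synthetic overdispersed crash counts driven nonlinearly by log traffic flow
n = 2000
log_aadt = rng.uniform(6, 10, n)
mu = np.exp(-4.0 + 0.8 * log_aadt + 0.3 * np.sin(log_aadt))
y = rng.poisson(rng.gamma(shape=2.0, scale=mu / 2.0))   # Poisson-gamma mixture

test = slice(1000, 2000)                                 # common hold-out set
for m in (100, 300, 600, 1000):                          # growing training sets
    nb = sm.GLM(y[:m], sm.add_constant(log_aadt[:m]),
                family=sm.families.NegativeBinomial(alpha=0.5)).fit()
    nb_pred = nb.predict(sm.add_constant(log_aadt[test]))

    # Fixed bandwidth keeps the sketch fast; in practice it would be tuned
    kr = KernelReg(endog=y[:m], exog=log_aadt[:m], var_type="c", bw=[0.3])
    kr_pred, _ = kr.fit(log_aadt[test])

    mse = lambda p: np.mean((y[test] - p) ** 2)
    print(m, round(mse(nb_pred), 2), round(mse(kr_pred), 2))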

10 citations


Cites methods from "Methodology for fitting and updatin..."

  • ...However, the performance of the Poisson model is limited by its single-parameter form; therefore, it was extended to various other forms, including Poisson–gamma, also known as the NB model (1, 4, 13–19); the generalized NB model (14, 17, 20, 21); the Poisson–Weibull model (22, 23); and the Poisson–lognormal model (21, 24)....

    [...]

Graham Wood
01 Jan 2004
TL;DR: In this article, the authors describe how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals (for example, for the number of accidents at a new site with given flows) can be produced using spreadsheet technology.
Abstract: Generalised linear models, with "log" link and either Poisson or negative binomial errors, are commonly used for relating accident rates to explanatory variables. This paper adds to the toolkit for such models. It describes how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals (for example, for the number of accidents at a new site with given flows) can be produced using spreadsheet technology.
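Sketched in Python rather than a spreadsheet (and not using the paper's exact formulas), the same idea looks like this: a confidence interval for the true accident rate at given flows comes from the variance of the linear predictor, while a prediction interval for the count at a new site also folds in the negative binomial variation about that mean. The data and the dispersion value are hypothetical.

import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Tiny hypothetical dataset: accident counts vs log(flow)
log_flow = rng.uniform(6, 10, 200)
X = sm.add_constant(log_flow)
y = rng.poisson(np.exp(-5.0 + 0.9 * log_flow))

alpha = 0.5                                         # dispersion, assumed known here
res = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=alpha)).fit()

x0 = np.array([1.0, np.log(8000.0)])                # new site with given flow
eta = x0 @ res.params                               # linear predictor at x0
se = np.sqrt(x0 @ res.cov_params() @ x0)            # its standard error

ci_mu = np.exp([eta - 1.96 * se, eta + 1.96 * se])  # 95% CI for the true rate
mu = np.exp(eta)
n_nb, p_nb = 1.0 / alpha, 1.0 / (1.0 + alpha * mu)  # NB2 parameterisation
pi = stats.nbinom.ppf([0.025, 0.975], n_nb, p_nb)   # 95% PI for a new site's count
print(ci_mu, pi)                                    # parameter uncertainty ignored in the PI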

8 citations

Journal ArticleDOI
06 Apr 2022
TL;DR: In this article, the authors compare the performance of a scalar calibration factor and a calibration function for different ranges of data characteristics (i.e., sample mean and variance) as well as the sample size.
Abstract: The Highway Safety Manual (HSM) recommends calibrating Safety Performance Functions using a scalar calibration factor. Recently, a few studies explored the merits of estimating a calibration function instead of a calibration factor. Although this seems a promising approach, it is not clear when a calibration function should be preferred over a scalar calibration factor. On the one hand, estimating a scalar factor is easier than estimating a calibration function; on the other hand, the calibration results may improve using a calibration function. This paper performs a simulation study to compare the two calibration strategies for different ranges of data characteristics (i.e., sample mean and variance) as well as the sample size. A measure of prediction accuracy is used to compare the two methods. The results show that as the sample size increases, or the variation of the data decreases, the calibration function performs better than the scalar calibration factor. If the analyst can collect a sample of at least 150 locations, a calibration function is recommended over the scalar factor. If the HSM recommendation of 30-50 locations is used and the analyst desires better accuracy, a calibration function is recommended only if the coefficient of variation of the data is less than 2. Otherwise, a calibration factor yields better results.
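A minimal sketch of the two strategies under comparison (synthetic data and an assumed power-law form, not the paper's simulation design): the scalar factor rescales all SPF predictions by a single ratio, while a calibration function re-estimates the relationship between observed and predicted crashes.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical SPF predictions and locally observed crashes at calibration sites
predicted = rng.gamma(shape=2.0, scale=1.5, size=300)
observed = rng.poisson(0.8 * predicted ** 1.1)

# Strategy 1: HSM scalar calibration factor
C = observed.sum() / predicted.sum()
factor_pred = C * predicted

# Strategy 2: calibration function, e.g. observed ~ a * predicted^b,
# fitted as a Poisson GLM with log(predicted) as the only covariate
X = sm.add_constant(np.log(predicted))
fun = sm.GLM(observed, X, family=sm.families.Poisson()).fit()
function_pred = fun.predict(X)

for name, pred in (("scalar factor", factor_pred), ("calibration function", function_pred)):
    print(name, round(float(np.mean((observed - pred) ** 2)), 3))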

1 citation

References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Journal ArticleDOI
01 May 1972
TL;DR: In this paper, the authors used iterative weighted linear regression to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation.
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components). The implications of the approach in designing statistics courses are discussed.
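The iteratively weighted least squares scheme described in this reference can be written out in a few lines; the sketch below is a bare-bones Poisson regression with a log link on made-up data, intended only to show the mechanics.

import numpy as np

rng = np.random.default_rng(3)

# Made-up data: Poisson counts with a log-linear mean
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))

beta = np.zeros(X.shape[1])
for _ in range(25):                        # IRLS iterations
    eta = X @ beta                         # linear predictor
    mu = np.exp(eta)                       # inverse of the log link
    z = eta + (y - mu) / mu                # working response
    w = mu                                 # working weights for Poisson / log link
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new

print(beta)                                # maximum likelihood estimates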

8,793 citations

Journal ArticleDOI
TL;DR: How and why various modern computing concepts, such as object-orientation and run-time linking, feature in the software's design are discussed and how the framework may be extended.
Abstract: WinBUGS is a fully extensible modular framework for constructing and analysing Bayesian full probability models. Models may be specified either textually via the BUGS language or pictorially using a graphical interface called DoodleBUGS. WinBUGS processes the model specification and constructs an object-oriented representation of the model. The software offers a user-interface, based on dialogue boxes and menu commands, through which the model may then be analysed using Markov chain Monte Carlo techniques. In this paper we discuss how and why various modern computing concepts, such as object-orientation and run-time linking, feature in the software's design. We also discuss how the framework may be extended. It is possible to write specific applications that form an apparently seamless interface with WinBUGS for users with specialized requirements. It is also possible to interface with WinBUGS at a lower level by incorporating new object types that may be used by WinBUGS without knowledge of the modules in which they are implemented. Neither of these types of extension require access to, or even recompilation of, the WinBUGS source-code.

5,620 citations


"Methodology for fitting and updatin..." refers methods in this paper

  • ...These models were all fitted using MCMC methods, in WinBUGS (Lunn et al, 2000), using a burn-in stage of 5000 iterations, followed by 25000 further iterations to collect the statistics on the parameters....

    [...]

  • ...The advent of MCMC methods implemented in open-source packages such as WinBUGS (Lunn et al, 2000) has vastly widened the range of possible models that can be fitted, and in a relatively simple manner....

    [...]

Journal ArticleDOI
TL;DR: This work considers approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non‐Gaussian response variables and can directly compute very accurate approximations to the posterior marginals.
Abstract: Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
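As a toy, one-parameter illustration of the Laplace approximation idea that INLA builds on (a caricature, not the INLA algorithm itself), the posterior of a log accident rate under a Poisson likelihood and a Gaussian prior is approximated below by a Gaussian centred at the posterior mode with curvature-based variance; the counts and the prior standard deviation are made up.

import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([3, 1, 4, 2, 5, 0, 3, 2])     # hypothetical yearly accident counts
prior_sd = 2.0                              # Gaussian prior on theta = log(rate)

def neg_log_post(theta):
    # Poisson likelihood (constants dropped) plus Gaussian prior on theta
    return len(y) * np.exp(theta) - y.sum() * theta + theta**2 / (2 * prior_sd**2)

mode = minimize_scalar(neg_log_post).x      # posterior mode
curv = len(y) * np.exp(mode) + 1.0 / prior_sd**2   # second derivative at the mode

# Laplace approximation: theta | y is approximately N(mode, 1/curv)
print("approx. posterior mean", mode, "and sd", np.sqrt(1.0 / curv))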

4,164 citations


"Methodology for fitting and updatin..." refers methods in this paper

  • ...One recently-developed approach that removes the problem of excessive run-times in Bayesian MCMC methods for some classes of model formulation is that known as INLA (Integrated Nested Laplace Approximations) – see Rue et al (2009)....

    [...]

Journal ArticleDOI
TL;DR: In the absence of detailed driving data that would help improve the identification of cause-and-effect relationships with individual vehicle crashes, most researchers have addressed this problem by framing it in terms of understanding the factors that affect the frequency of crashes: the number of crashes occurring in some geographical space (usually a roadway segment or intersection) over some specified time period.
Abstract: Gaining a better understanding of the factors that affect the likelihood of a vehicle crash has been an area of research focus for many decades. However, in the absence of detailed driving data that would help improve the identification of cause and effect relationships with individual vehicle crashes, most researchers have addressed this problem by framing it in terms of understanding the factors that affect the frequency of crashes - the number of crashes occurring in some geographical space (usually a roadway segment or intersection) over some specified time period. This paper provides a detailed review of the key issues associated with crash-frequency data as well as the strengths and weaknesses of the various methodological approaches that researchers have used to address these problems. While the steady march of methodological innovation (including recent applications of random parameter and finite mixture models) has substantially improved our understanding of the factors that affect crash-frequencies, it is the prospect of combining evolving methodologies with far more detailed vehicle crash data that holds the greatest promise for the future.

1,483 citations