scispace - formally typeset

Showing papers in "Journal of Agricultural, Biological and Environmental Statistics" in 2022




Journal Article
TL;DR: In this article, a multi-level spatiotemporal model is proposed to predict the daily maximum temperature in the summer period over 60 years in a region containing Aragón, Spain.
Abstract: Acknowledging a considerable literature on modeling daily temperature data, we propose a multi-level spatiotemporal model which introduces several innovations in order to explain the daily maximum temperature in the summer period over 60 years in a region containing Aragón, Spain. The model operates over continuous space but adopts two discrete temporal scales: year, and day within year. It captures temporal dependence through autoregression on days within year and also on years. Spatial dependence is captured through spatial process modeling of intercepts, slope coefficients, variances, and autocorrelations. The model is expressed in a form which separates fixed effects from random effects and also separates space, years, and days for each type of effect. Motivated by exploratory data analysis, fixed effects are employed to capture the influence of elevation, seasonality, and a linear trend. Pure errors are introduced for years, for locations within years, and for locations at days within years. The performance of the model is checked using leave-one-out cross-validation. Applications of the model are presented, including prediction of the daily temperature series at unobserved or partially observed sites and inference for climate change comparisons. Supplementary materials accompanying this paper appear online.

6 citations


Journal Article
TL;DR: A Bayesian data-driven approach to discovering nonlinear dynamic equations is presented; it accommodates measurement noise and missing data, both common in complex nonlinear systems, and accounts for model parameter uncertainty.

5 citations


Journal Article
TL;DR: In this article, a time-varying functional principal components analysis (FPCA) for non-stationary functional time series (FTS) is proposed to investigate how the variability and auto-covariance structures in an FTS change over time.
Abstract: Outgassing of carbon dioxide (CO₂) from river surface waters, estimated using the partial pressure of dissolved CO₂, has recently been considered an important component of the global carbon budget. However, little is still known about the high-frequency dynamics of CO₂ emissions in small-order rivers and streams. To analyse such highly dynamic systems, we propose a time-varying functional principal components analysis (FPCA) for non-stationary functional time series (FTS). This time-varying FPCA is performed in the frequency domain to investigate how the variability and auto-covariance structures in an FTS change over time. This methodology, and the associated proposed inference, enables investigation of the changes over time in the variability structure of the diurnal profiles of the partial pressure of CO₂ and identification of the drivers of those changes. By means of a simulation study, the performance of the time-varying dynamic FPCs is investigated under different scenarios of complete and incomplete FTS. Although the time-varying dynamic FPCA has been applied here to study the daily processes of consuming and producing CO₂ in a small catchment of the river Dee in Scotland, this methodology can be applied more generally to any dynamic time series. Supplementary materials accompanying this paper appear online.

3 citations
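For readers new to FPCA, a minimal sketch of the ordinary (static) version may help fix ideas for the entry above. It assumes every curve (for example, one diurnal CO₂ profile per day) is observed on the same equally spaced grid, treats the grid values as ordinary multivariate data, and ignores quadrature weights. It is not the time-varying, frequency-domain method the authors propose, and the function name `fpca` is hypothetical.

```python
import numpy as np

def fpca(curves, n_components=2):
    """Static FPCA for curves sampled on a common, equally spaced grid.

    curves: array of shape (n_curves, n_grid), one curve per row.
    Returns the mean curve, leading eigenvalues, eigenfunctions (as grid
    vectors, one per column) and the principal component scores.
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    centred = curves - mean
    # Sample covariance between grid points, shape (n_grid, n_grid).
    cov = centred.T @ centred / (curves.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores: projection of each centred curve onto the eigenfunctions.
    scores = centred @ eigvecs
    return mean, eigvals, eigvecs, scores
```

Each curve is then approximated by `mean + scores[i] @ eigvecs.T`; when the curves vary along only a few modes, a handful of components reproduces them closely.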



Journal Article
TL;DR: In this paper, the authors propose to calculate unexplained variations conditional on individual random and/or fixed effects so as to retain the individual heterogeneity brought by available predictors; for a generalized linear mixed model, the resulting measures can be defined using a distance measured along its variance function, accounting for its heteroscedasticity.
Abstract: The coefficient of determination is well defined for linear models, and its extension to mixed-effects models has long been wanted in agricultural, biological, and ecological research. We revisit this extension to define measures for the proportions of variation explained by the whole model, by fixed effects only, and by random effects only. We propose to calculate unexplained variations conditional on individual random and/or fixed effects so as to retain the individual heterogeneity brought by available predictors. While these measures are naturally defined for linear mixed models, they can also be defined for a generalized linear mixed model using a distance measured along its variance function, accounting for its heteroscedasticity. We demonstrate the promising performance and utility of our proposed methods via simulation studies as well as applications to real data sets in agricultural and ecological studies.

2 citations
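The abstract above computes unexplained variation conditional on the fitted effects. A minimal sketch for the linear mixed model case, assuming you already have fitted values from the fixed effects alone and from the fixed plus predicted random effects; taking the random-effect share as the difference of the two proportions is our simplifying assumption, not necessarily the authors' definition, and the GLMM variance-function distance is not shown. The function name `r2_mixed` is hypothetical.

```python
import numpy as np

def r2_mixed(y, fitted_fixed, fitted_full):
    """Proportions of variation explained in a linear mixed model (sketch).

    fitted_fixed: predictions from fixed effects only (X @ beta_hat).
    fitted_full:  predictions conditional on fixed and predicted random
                  effects (X @ beta_hat + Z @ b_hat).
    """
    y = np.asarray(y, dtype=float)
    sst = np.sum((y - y.mean()) ** 2)  # total variation about the mean
    # Unexplained variation conditional on the fitted effects.
    sse_full = np.sum((y - np.asarray(fitted_full, dtype=float)) ** 2)
    sse_fixed = np.sum((y - np.asarray(fitted_fixed, dtype=float)) ** 2)
    r2_model = 1.0 - sse_full / sst
    r2_fixed = 1.0 - sse_fixed / sst
    # Assumption: the share attributed to random effects is the difference.
    return {"model": r2_model, "fixed": r2_fixed, "random": r2_model - r2_fixed}
```

For example, with y = [1, 2, 3, 4], fixed-effect fits [1.5, 1.5, 3.5, 3.5] and full fits [1.1, 1.9, 3.1, 3.9], the three proportions are about 0.992, 0.8 and 0.192.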


Journal Article
TL;DR: In this paper, the authors use hidden semi-Markov models (HSMMs) and autoregressive HSMMs to classify accelerometer data from Merino sheep and distinguish between four different behaviors of interest.
Abstract: Hidden Markov models (HMMs) and their extensions have proven to be powerful tools for classifying observations that stem from systems with temporal dependence, because they take into account that observations close in time are likely generated from the same state (i.e., class). When information on the classes of the observations is available in advance, supervised methods can be applied. In this paper, we provide details for the implementation of four models for classification in a supervised learning context: HMMs, hidden semi-Markov models (HSMMs), autoregressive HMMs, and autoregressive HSMMs. Using simulations, we study the classification performance under various degrees of model misspecification to characterize when it would be important to extend a basic HMM to an HSMM. As an application of these techniques, we use the models to classify accelerometer data from Merino sheep to distinguish between four different behaviors of interest. In the field of movement ecology in particular, collecting fine-scale animal movement data over time to identify behavioral states has become ubiquitous, necessitating models that can account for the dependence structure in the data. We demonstrate that when the aim is classification, various degrees of model misspecification of the proposed model may not impede good classification performance unless there is high overlap between the state-dependent distributions, that is, unless the observation distributions of the different states are difficult to differentiate. Supplementary materials accompanying this paper appear online.

2 citations
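In the supervised setting described above, state labels are known for the training data, so HMM parameters can be estimated directly from counts and per-state summaries, and new sequences can then be classified with the Viterbi algorithm. A minimal sketch for a basic HMM, assuming Gaussian state-dependent distributions for a single feature per time step, a pseudo-count of 1 to smooth the transition estimates, and smoothed state frequencies as the initial distribution. The HSMM and autoregressive variants covered by the paper are not shown, and the function names are hypothetical.

```python
import numpy as np

def fit_supervised_hmm(obs, labels, n_states, pseudo=1.0):
    """Estimate Gaussian-HMM parameters from a labelled training sequence.

    Assumes every state occurs at least twice in `labels`.
    """
    obs = np.asarray(obs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Transition counts with a pseudo-count so unseen transitions keep p > 0.
    counts = np.full((n_states, n_states), pseudo)
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    trans = counts / counts.sum(axis=1, keepdims=True)
    # Initial distribution: smoothed state frequencies (an assumption).
    init = np.bincount(labels, minlength=n_states) + pseudo
    init = init / init.sum()
    means = np.array([obs[labels == k].mean() for k in range(n_states)])
    sds = np.array([obs[labels == k].std(ddof=1) for k in range(n_states)])
    return init, trans, means, sds


def viterbi(obs, init, trans, means, sds):
    """Most probable state sequence under a Gaussian HMM, on the log scale."""
    obs = np.asarray(obs, dtype=float)
    # Log emission densities, shape (T, K).
    log_emit = (-0.5 * np.log(2 * np.pi * sds**2)
                - (obs[:, None] - means) ** 2 / (2 * sds**2))
    log_trans = np.log(trans)
    n_obs, n_states = log_emit.shape
    delta = np.log(init) + log_emit[0]
    back = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        scores = delta[:, None] + log_trans  # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace the best path backwards.
    path = np.empty(n_obs, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_obs - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

In an HMM, the time spent in a state is implicitly geometric; an HSMM replaces this with an explicit duration distribution, which is the extension whose benefit the paper's misspecification study examines.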



Journal Article
Hanfei Peng
TL;DR: In this article, a hierarchical Bayesian spatial model for extreme precipitation is proposed to compute intensity–duration–frequency (IDF) curves on a large, sparse domain, using a reconstruction of the historical meteorology as a covariate.
Abstract: An intensity–duration–frequency (IDF) curve describes the relationship between rainfall intensity and duration for a given return period and location. Such curves are obtained through frequency analysis of rainfall data and are commonly used in infrastructure design, flood protection, water management, and urban drainage systems. However, they are typically available only at sparse locations, and data for other sites must be interpolated as the need arises. This paper describes how extreme precipitation of several durations can be interpolated to compute IDF curves on a large, sparse domain. In the absence of local data, a reconstruction of the historical meteorology is used as a covariate for interpolating extreme precipitation characteristics. This covariate is included in a hierarchical Bayesian spatial model for extreme precipitation. The model is especially well suited to covariates with a gridded structure, which enables fast and precise computations. As an illustration, the methodology is used to construct IDF curves over Eastern Canada. An extensive cross-validation study shows that at locations where data are available, the proposed method generally improves on the current practice of Environment and Climate Change Canada, which relies on a moment-based fit of the Gumbel extreme-value distribution.

2 citations
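The benchmark in the cross-validation study above is a moment-based Gumbel fit. A minimal sketch of the textbook method-of-moments Gumbel fit to annual maxima at one station and one duration, together with the corresponding return level; the exact procedure used by Environment and Climate Change Canada may differ in its details, and the function names are hypothetical. Repeating the fit for each duration and return period yields the points of an IDF curve.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def gumbel_moments(annual_maxima):
    """Method-of-moments Gumbel fit: match the sample mean and standard deviation.

    For a Gumbel(loc, scale) variable, mean = loc + gamma * scale and
    sd = pi * scale / sqrt(6).
    """
    mean = statistics.fmean(annual_maxima)
    sd = statistics.stdev(annual_maxima)  # sample SD, n - 1 denominator
    scale = sd * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def return_level(loc, scale, return_period):
    """Level exceeded with probability 1/T in a given year, i.e. the x solving
    exp(-exp(-(x - loc) / scale)) = 1 - 1/T."""
    p = 1.0 - 1.0 / return_period
    return loc - scale * math.log(-math.log(p))
```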


Journal Article
TL;DR: In this paper, the authors develop a spatiotemporal model to estimate the association between exposure to fine particulate matter (PM2.5) and COVID-19 mortality, accounting for several social and environmental factors.
Abstract: The world is experiencing a pandemic of COVID-19, the disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The USA is also suffering a catastrophic death toll from COVID-19. Several studies provide preliminary evidence that short- and long-term exposure to air pollution might increase the severity of COVID-19 outcomes, including a higher risk of death. In this study, we develop a spatiotemporal model to estimate the association between exposure to fine particulate matter (PM2.5) and mortality, accounting for several social and environmental factors. More specifically, we implement a Bayesian zero-inflated negative binomial regression model with random effects that vary in time and space. Our goal is to estimate the association between air pollution and mortality while accounting for the spatiotemporal variability left unexplained by the measured confounders. We applied our model to four regions of the USA, with weekly data available for each county within each region. We analyze the data separately for each region because each region shows a different disease spread pattern. We found a positive association between long-term exposure to PM2.5 and COVID-19 mortality in all four regions, with three of the four being statistically significant. Data and code are available at our GitHub repository. Supplementary material for this article is available online at 10.1007/s13253-022-00487-1.
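As a reminder of what the zero-inflated negative binomial likelihood in the entry above looks like, here is a minimal sketch of its log-probability mass function. It assumes the common mean/size parameterization of the negative binomial (variance μ + μ²/size). In a regression like the one described, μ for each county and week would be exp(offset + covariate effects + space-time random effects); here μ is just a number, and the function names are hypothetical.

```python
import math

def nb_logpmf(y, mu, size):
    """Negative binomial log-pmf with mean mu > 0 and size > 0;
    the variance is mu + mu**2 / size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def zinb_logpmf(y, mu, size, pi_zero):
    """Zero-inflated NB: a structural zero with probability pi_zero,
    otherwise a draw from NB(mu, size)."""
    if y == 0:
        # A zero can come from either component.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(nb_logpmf(0, mu, size)))
    return math.log(1.0 - pi_zero) + nb_logpmf(y, mu, size)
```

The extra probability mass at zero is what lets such a model handle the many county-weeks with no recorded deaths without inflating the variance of the positive counts.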

Journal Article
TL;DR: In this article, the authors focus on optimizing the precision of disease prevalence estimators calculated from multiplex pooled testing data and determine the pooling strategies that offer the highest benefits when jointly estimating the prevalence of multiple diseases, such as theileriosis and anaplasmosis.
Abstract: Pooled testing can enhance the efficiency of diagnosing individuals with diseases of low prevalence. Pooling is often implemented using standard pool sizes (2, 5, 10, etc.); optimization theory, on the other hand, can provide specific guidelines for finding the ideal pool size and pooling strategy. This article focuses on optimizing the precision of disease prevalence estimators calculated from multiplex pooled testing data. In the context of a surveillance application for animal diseases, we study the estimation efficiency (i.e., precision) and cost efficiency of the estimators, with adjustments for the number of tests expended. This enables us to determine the pooling strategies that offer the highest benefits when jointly estimating the prevalence of multiple diseases, such as theileriosis and anaplasmosis. The outcomes of our work can be used in designing pooled testing protocols, not only in simple pooling scenarios but also in more complex scenarios where individual retesting is performed to identify positive cases. A software application built with the shiny package in R is provided with this article to facilitate implementation of our methods. Supplementary materials for this article are available online at 10.1007/s13253-022-00511-4.
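The precision trade-off behind pool-size optimization can be seen in the simplest case. Assume (all simplifications, not the paper's multiplex setting with retesting) a single disease, n pools of equal size k, and a perfectly sensitive and specific assay. A pool is then positive with probability 1 − (1 − p)^k, which gives the closed-form maximum-likelihood estimator below and, via the delta method, an approximate variance that can be compared across candidate pool sizes. The function names are hypothetical.

```python
def pooled_prevalence_mle(positive_pools, n_pools, pool_size):
    """MLE of prevalence p from n pools of equal size k (perfect assay assumed).

    P(pool positive) = 1 - (1 - p)**k, so p_hat = 1 - (1 - x/n)**(1/k).
    """
    frac_positive = positive_pools / n_pools
    return 1.0 - (1.0 - frac_positive) ** (1.0 / pool_size)


def pooled_prevalence_var(p, n_pools, pool_size):
    """Delta-method variance of p_hat:
    [1 - (1 - p)**k] / [n * k**2 * (1 - p)**(k - 2)]."""
    q = 1.0 - p
    return (1.0 - q**pool_size) / (n_pools * pool_size**2 * q ** (pool_size - 2))


def best_pool_size(p, n_pools, candidate_sizes):
    """Candidate pool size with the smallest approximate variance at prevalence p."""
    return min(candidate_sizes, key=lambda k: pooled_prevalence_var(p, n_pools, k))
```

With k = 1 the variance reduces to the familiar p(1 − p)/n. For p = 0.01 and 100 pools, the approximate variance keeps shrinking as k grows from 1 to 10, so `best_pool_size(0.01, 100, [1, 2, 5, 10])` returns 10. With a fixed number of pools, however, larger pools also mean sampling more individuals, which is one reason precision and cost have to be weighed jointly.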

Journal Article
Babette Ludowici
TL;DR: In this paper, a causal mediation model that simultaneously accommodates longitudinal mediators on arbitrary time grids and survival outcomes is proposed to investigate the causal pathways linking an exposure, a mediator that lies in between, and an outcome.
Abstract: In animal behavior studies, a common goal is to investigate the causal pathways between an exposure and an outcome, and a mediator that lies in between. Causal mediation analysis provides a principled approach for such studies. Although many applications involve longitudinal data, existing causal mediation models are not directly applicable to settings where the mediators are measured on irregular time grids. In this paper, we propose a causal mediation model that simultaneously accommodates longitudinal mediators on arbitrary time grids and survival outcomes. We take a functional data analysis perspective and view longitudinal mediators as realizations of underlying smooth stochastic processes. We define causal estimands of direct and indirect effects accordingly and provide corresponding identification assumptions. We employ a functional principal component analysis approach to estimate the mediator process and propose a Cox hazard model for the survival outcome that flexibly adjusts for the mediator process. We then derive a g-computation formula to express the causal estimands using the model coefficients. The proposed method is applied to a longitudinal data set from the Amboseli Baboon Research Project to investigate the causal relationships between early adversity, adult physiological stress responses, and survival among wild female baboons. We find that adversity experienced in early life has a significant direct effect on females’ life expectancy and survival probability, but we find little evidence that these effects are mediated by markers of the stress response in adulthood. We further develop a sensitivity analysis method to assess the impact of potential violations of the key assumption of sequential ignorability. Supplementary materials accompanying this paper appear online.


Journal Article
TL;DR: In this article, the authors investigate the impacts of two successive sets of regulations, the Woods Modification Guidelines and the Modifying Industrial Operations Protocol (MIOP), on the frequency of industrial forestry-caused (IDF) wildland fires in the province of Ontario, Canada.
Abstract: Wildland fire prevention and mitigation are of mutual interest to both government and the forest industry. In 1989, the Ontario Ministry of Natural Resources and Forestry introduced the Woods Modification Guidelines, which provided rules on how forestry operations should be modified based on local fire danger conditions. Those guidelines were replaced by the Modifying Industrial Operations Protocol (MIOP) in 2008. One objective of MIOP is to allow forestry operations to continue safely for as long as possible as the fire danger increases. We investigate the impacts of these sets of regulations on the frequency of industrial forestry-caused (IDF) wildland fires in the province of Ontario, Canada. Data from 1976 to 2019 are analyzed. A case-crossover study finds no evidence to suggest that MIOP’s greater flexibility in operating hours has increased the probability of IDF fire occurrences. This result indicates that MIOP’s regulations have had the desired effect of allowing longer working hours on days with heightened fire risk without adding to the seasonal wildland fire load.


Journal Article
TL;DR: In this paper, a spatially varying mixture model is developed to compare the distribution of precipitation in the High Mountain Asia region as produced by climate models with the corresponding distribution from in situ observations in the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) data product.
Abstract: The high mountain regions of Asia contain more glacial ice than anywhere on the planet outside of the polar regions. Because a large population living in the Indus watershed region relies on melt from these glaciers for fresh water, understanding the factors that affect glacial melt, along with the impacts of climate change on the region, is important for managing these natural resources. While multiple climate data products (e.g., reanalyses and global climate models) are available to study the impact of climate change on this region, each product will have a different amount of skill in projecting a given climate variable, such as precipitation. In this research, we develop a spatially varying mixture model to compare the distribution of precipitation in the High Mountain Asia region as produced by climate models with the corresponding distribution from in situ observations in the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) data product. Parameter estimation is carried out via a computationally efficient Markov chain Monte Carlo algorithm. The estimated precipitation distribution from each climate data product is then validated against APHRODITE using a spatially varying Kullback–Leibler divergence measure. Supplementary materials for this article are available online at 10.1007/s13253-022-00515-0.
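The validation step above compares distributions through a Kullback–Leibler divergence at each location. A minimal sketch of a numerical KL divergence between two densities evaluated on a common, equally spaced grid (a simple Riemann sum). The direction KL(P ‖ Q) used here is an assumption, since the divergence is asymmetric and the abstract does not say which direction the paper uses; the paper also works with fitted mixture distributions rather than the normal densities used in the check. The function name is hypothetical.

```python
import numpy as np

def kl_divergence_grid(p_vals, q_vals, dx):
    """Approximate KL(P || Q) = integral of p(x) * log(p(x) / q(x)) dx
    from density values on a uniform grid with spacing dx.

    Grid points where p(x) = 0 contribute nothing; if q(x) = 0 where
    p(x) > 0, the divergence is infinite.
    """
    p_vals = np.asarray(p_vals, dtype=float)
    q_vals = np.asarray(q_vals, dtype=float)
    mask = p_vals > 0
    integrand = p_vals[mask] * np.log(p_vals[mask] / q_vals[mask])
    return float(integrand.sum() * dx)
```

Computing this once per grid cell, with P and Q the two fitted densities at that location, gives a map of the divergence; for two normal densities the result can be checked against the closed form ln(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − 1/2.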

Journal Article
TL;DR: In this paper, it is shown that the information the data have about the correlation parameters varies substantially depending on the true model and sampling design; in particular, the information about the smoothness parameter can be large, in some cases larger than the information about the range parameter.
Abstract: The Matérn family of covariance functions is currently the most commonly used family for the analysis of geostatistical data, owing to its ability to describe different smoothness behaviors. Yet, in many applications the smoothness parameter is set at an arbitrary value. This practice is due partly to computational challenges faced when attempting to estimate all covariance parameters and partly to unqualified claims in the literature stating that geostatistical data have little or no information about the smoothness parameter. This work critically investigates this claim and shows that it is not true in general. Specifically, it is shown that the information the data have about the correlation parameters varies substantially depending on the true model and sampling design and, in particular, the information about the smoothness parameter can be large, in some cases larger than the information about the range parameter. In light of these findings, we suggest reassessing the aforementioned practice and instead basing inferences on data-based estimates of both the range and smoothness parameters, especially for strongly dependent non-smooth processes observed on irregular sampling designs. A data set of daily rainfall totals is used to motivate the discussion and gauge this common practice.
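Since the discussion above turns on the Matérn smoothness parameter ν, a minimal sketch of the Matérn covariance may help. It assumes the common parameterization C(h) = σ² 2^(1−ν)/Γ(ν) (√(2ν) h/ρ)^ν K_ν(√(2ν) h/ρ), with variance σ², range ρ, and modified Bessel function of the second kind K_ν; other scalings of the range are also in use, so ρ is not directly comparable across papers. It relies on SciPy's `scipy.special.kv` and `scipy.special.gamma`; the function name is hypothetical.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(h, sigma2=1.0, rho=1.0, nu=0.5):
    """Matérn covariance at distances h (always returned as a 1-D array).

    C(h) = sigma2 * 2**(1 - nu) / Gamma(nu) * (sqrt(2 nu) h / rho)**nu
           * K_nu(sqrt(2 nu) h / rho), with C(0) = sigma2.
    """
    h = np.atleast_1d(np.asarray(h, dtype=float))
    scaled = np.sqrt(2.0 * nu) * h / rho
    cov = np.full(h.shape, sigma2, dtype=float)  # limiting value as h -> 0
    pos = scaled > 0  # K_nu diverges at 0, so evaluate positive distances only
    cov[pos] = (sigma2 * 2.0 ** (1.0 - nu) / gamma(nu)
                * scaled[pos] ** nu * kv(nu, scaled[pos]))
    return cov
```

With ν = 1/2 this reduces to the exponential covariance σ² exp(−h/ρ), and with ν = 3/2 to σ²(1 + √3 h/ρ) exp(−√3 h/ρ); larger ν gives smoother processes, which is why fixing ν arbitrarily can matter.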

Journal Article
TL;DR: In this article, a sparse discretized kernel is proposed to reduce the computational cost of the movement submodel in open spatially explicit capture–recapture (SECR) models.
Abstract: Spatially explicit capture–recapture (SECR) models treat detection probability as a function of the distance between each animal and its notional activity centre. Open-population variants of these models (open SECR) are increasingly used to estimate the vital rates (survival and recruitment) of spatial populations subject to turnover between sampling times. If activity centres also move between sampling times, then modelling the movement can reduce bias in estimates of vital rates. The usual movement model in open SECR is a random walk with step length governed by a probability kernel. Space is discretized in open SECR for computational convenience, and in some implementations this includes truncation of the probability kernel. Computations for the movement submodel are nevertheless very time-consuming owing to the repeated convolution steps and the need to manage boundary effects. A novel ‘sparse’ discretized kernel is proposed that greatly reduces fitting time. The sparse kernel was tested by simulation and applied to two datasets. Differences between models fitted using the sparse and full kernels were minor and unlikely to matter in practice. The sparse kernel extends the practical limits of movement modelling in open SECR to greater dispersal distances and finer spatial resolution. Supplementary materials accompanying this paper appear online.
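Each movement step described above amounts to convolving the discretized activity-centre distribution with a movement kernel, and mass that moves off the grid has to be handled somehow. A minimal sketch, assuming a square grid, a bivariate normal kernel truncated at a fixed radius (in cells), and SciPy's `scipy.signal.convolve2d`. The paper's sparse kernel is not reproduced here; it would be passed in place of `kernel` with far fewer non-zero cells, which is where the savings in convolution work come from. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(sigma, radius):
    """Bivariate normal kernel on cell offsets -radius..radius,
    truncated at `radius` and renormalised to sum to one."""
    offsets = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(offsets, offsets)
    kernel = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()


def move_step(ac_prob, kernel, renormalise=True):
    """One movement step: convolve the activity-centre distribution with the kernel.

    Zero fill outside the grid means mass pushed off the edge is lost;
    optionally renormalise so the result is again a distribution on the grid.
    """
    moved = convolve2d(ac_prob, kernel, mode="same", boundary="fill", fillvalue=0)
    if renormalise:
        moved = moved / moved.sum()
    return moved
```

Whether to discard the lost mass or renormalise is one of the boundary-effect choices the abstract alludes to; the sketch exposes both through the `renormalise` flag.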



Journal Article
TL;DR: In this paper, the authors develop a new Diversity-Interactions mixed model for jointly assessing many species interactions and within-plot species planting patterns over multiple years; pairwise interactions are modeled using a small number of fixed parameters that incorporate spatial effects, supplemented by all pairwise interaction variables included as random effects, each constrained to have the same variance within each year.
Abstract: In grassland ecosystems, it is well known that increasing plant species diversity can improve ecosystem functions (i.e., ecosystem responses), for example, by increasing productivity and reducing weed invasion. Diversity-Interactions models use species proportions and their interactions as predictors in a regression framework to assess biodiversity and ecosystem function relationships. However, it can be difficult to model numerous interactions if there are many species, and interactions may be temporally variable or dependent on spatial planting patterns. We developed a new Diversity-Interactions mixed model for jointly assessing many species interactions and within-plot species planting patterns over multiple years. We model pairwise interactions using a small number of fixed parameters that incorporate spatial effects and supplement this by including all pairwise interaction variables as random effects, each constrained to have the same variance within each year. The random effects are indexed by pairs of species within plots, rather than by a plot-level factor as is typical in mixed models, and parsimoniously capture the remaining variation due to pairwise species interactions. We apply our novel methodology to three years of weed invasion data from a 16-species grassland experiment that manipulated plant species diversity and spatial planting pattern, and we test its statistical properties in a simulation study. Supplementary materials accompanying this paper appear online.
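Diversity-Interactions models regress the response on the sown species proportions p_i and on the pairwise products p_i p_j. A minimal sketch of building these predictors, plus the single "average pairwise" variable Σ_{i<j} p_i p_j, which is one common way to summarize interactions with a single fixed parameter. The spatial-pattern terms, year structure, and random pairwise effects of the paper are not shown, and the function names are hypothetical. With 16 species there are 120 pairwise variables, which is why the paper treats them as random effects.

```python
import itertools
import numpy as np

def di_predictors(props):
    """Species-proportion columns, all pairwise products p_i * p_j (i < j),
    and their row sum (the 'average pairwise' interaction variable)."""
    props = np.asarray(props, dtype=float)
    pairs = list(itertools.combinations(range(props.shape[1]), 2))
    pairwise = np.column_stack([props[:, i] * props[:, j] for i, j in pairs])
    average = pairwise.sum(axis=1, keepdims=True)
    return props, pairwise, average, pairs


def fit_average_pairwise(props, y):
    """Least-squares fit of y = sum_i beta_i p_i + delta * sum_{i<j} p_i p_j.

    No intercept: the proportions sum to one, so an intercept would be
    collinear with the species-proportion columns.
    """
    ident, _, average, _ = di_predictors(props)
    design = np.hstack([ident, average])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef  # beta_1..beta_S followed by delta
```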


Journal Article
TL;DR: In this paper, the authors present a new methodology to model total abundance by merging count data from surveys with different sampling protocols, which can be used to produce biogeographical abundance maps from transboundary information covering more than one country.
Abstract: Quantifying the total number of individuals (abundance) of species is the basis for spatial ecology and biodiversity conservation. Abundance data are mostly collected through professional surveys as part of monitoring programs, often at a national level. These surveys rarely follow exactly the same sampling protocol in different countries, which represents a challenge for producing biogeographical abundance maps based on the transboundary information available covering more than one country. Moreover, not all species are properly covered by a single monitoring scheme, and countries typically collect abundance data for target species through different monitoring schemes. We present a new methodology to model total abundance by merging count data information from surveys with different sampling protocols. The proposed methods are applied to data from national breeding bird monitoring programs in Norway and Sweden. In each country, the census collects abundance data following two different sampling protocols, so together these protocols provide data from four different sampling processes. The modeling framework assumes a common Gaussian random field shared by both the observed and true abundance, with either a linear or a relaxed linear association between them. The models account for the particularities of each sampling protocol by including terms that affect each observation process, i.e., accounting for differences in observation units and detectability. Bayesian inference is performed using the Integrated Nested Laplace Approximation (INLA) and the Stochastic Partial Differential Equation (SPDE) approach for spatial modeling. We also present the results of a simulation study based on the empirical census data from mid-Scandinavia to assess the performance of the models under model misspecification. Finally, maps of the expected abundance of birds in our study region in mid-Scandinavia are presented with uncertainty estimates.
We found that the framework allows for consistent integration of data from surveys with different sampling protocols. Further, the simulation study showed that models with a relaxed linear specification are less sensitive to misspecification, compared to the model that assumes linear association between counts. Relaxed linear specifications of total bird abundance in mid-Scandinavia improved both goodness of fit and the predictive performance of the models.

Journal Article
TL;DR: In this article, a multistage method is proposed for making inference at all levels of a Bayesian hierarchical model (BHM), using natural data partitions to increase efficiency by allowing computations to take place in parallel with the software that is most appropriate for each data partition.
Abstract: We propose a multistage method for making inference at all levels of a Bayesian hierarchical model (BHM) using natural data partitions to increase efficiency by allowing computations to take place in parallel, using the software that is most appropriate for each data partition. The full hierarchical model is then approximated by the product of independent normal distributions for the data component of the model. In the second stage, the Bayesian maximum a posteriori (MAP) estimator is found by maximizing the approximated posterior density with respect to the parameters. If the parameters of the model can be represented as normally distributed random effects, then the second-stage optimization is equivalent to fitting a multivariate normal linear mixed model. We consider a third stage that updates the estimates of distinct parameters for each data partition based on the results of the second stage. The method is demonstrated with two ecological data sets and models: a generalized linear mixed effects model (GLMM) and an integrated population model (IPM). The multistage results were compared with estimates from models fit in a single stage to the entire data set. In both cases, the multistage results were very similar to those of a full MCMC analysis. Supplementary materials accompanying this paper appear online.
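The second stage described above rests on a familiar fact: a product of independent normal densities in the same parameter is proportional to a normal density whose precision is the sum of the precisions and whose mean (which is also its mode, the MAP) is the precision-weighted average. A minimal sketch for a single scalar parameter shared by all partitions, assuming each first-stage fit is summarized by an approximate mean and variance (as from a flat-prior fit) and allowing an optional normal prior. The paper's general case, with partition-specific parameters treated as random effects, corresponds to a linear mixed model fit and is not shown; the function name is hypothetical.

```python
import numpy as np

def combine_normal_approximations(means, variances, prior_mean=0.0, prior_var=np.inf):
    """Mean (= MAP) and variance of the product of independent normal
    approximations N(theta; means[j], variances[j]), optionally multiplied
    by a normal prior N(prior_mean, prior_var). prior_var=inf means no prior."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    prior_precision = 0.0 if np.isinf(prior_var) else 1.0 / prior_var
    total_precision = precisions.sum() + prior_precision
    post_mean = (np.sum(precisions * means) + prior_precision * prior_mean) / total_precision
    return float(post_mean), float(1.0 / total_precision)
```

For example, partitions reporting (mean, variance) of (0, 1) and (10, 4) combine to mean 2 and variance 0.8: the more precise partition dominates.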


Journal Article
TL;DR: In this paper, a new sampling strategy that incorporates ranking information from nearby locations into a spatially balanced sample is proposed; it can improve the precision of commonly used estimators when surveying natural resources.
Abstract: A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with good precision. Spatially balanced designs draw samples with good spatial spread and provide precise results for commonly used estimators when surveying natural resources. In this article, we propose a new sampling strategy that incorporates ranking information from nearby locations into a spatially balanced sample. If the population exhibits spatial trends, our simple local ranking strategy can improve the precision of commonly used estimators. Numerical results on several test populations with different spatial structures show that local ranking can improve the performance of a spatially balanced design. To show that local ranking is simple and effective in practice, we provide an example application for the health and productivity assessment of a Shiraz vineyard in South Australia. Supplementary materials accompanying this paper appear online.