
Showing papers in "Journal of Agricultural Biological and Environmental Statistics in 2013"


Journal ArticleDOI
TL;DR: In this article, the authors propose an approach to the analysis of multi-species data when the goal is to understand how each species responds to its environment, using a finite mixture of regression models to group species into archetypes according to their environmental response.
Abstract: Understanding how species distributions respond as a function of environmental gradients is a key question in ecology, and will benefit from a multi-species approach. Multi-species data are often high dimensional, in that the number of species sampled is often large relative to the number of sites, and are commonly quantified as either presence–absence, counts of individuals, or biomass of each species. In this paper, we propose a novel approach to the analysis of multi-species data when the goal is to understand how each species responds to its environment. We use a finite mixture of regression models, grouping species into “Archetypes” according to their environmental response, thereby significantly reducing the dimension of the regression model. Previous research introduced such Species Archetype Models (SAMs), but only for binary assemblage data. Here, we extend this basic framework with three key innovations: (1) the method is expanded to handle count and biomass data, (2) we propose grouping on the slope coefficients only, whilst the intercept terms and nuisance parameters remain species-specific, and (3) we develop model diagnostic tools for SAMs. By grouping on environmental responses only, the model allows for inter-species variation in terms of overall prevalence and abundance. The application of our expanded SAM framework is illustrated on marine survey data and through simulation.

57 citations
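The archetype idea above can be illustrated with a toy EM fit: a finite mixture of simple linear regressions, where each species' response vector is softly assigned to one of K shared slopes. This is a minimal sketch, not the authors' SAM implementation (which handles count and biomass data and keeps intercepts species-specific); the simulated data and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two "archetypes": species whose response slopes differ.
n_species, n_sites = 40, 30
x = rng.uniform(-2, 2, n_sites)                       # one environmental gradient
true_slope = np.where(np.arange(n_species) < 20, 1.5, -1.0)
Y = true_slope[:, None] * x[None, :] + rng.normal(0, 0.5, (n_species, n_sites))

def em_mixture_of_regressions(Y, x, K=2, n_iter=200):
    """EM for a K-component mixture of simple linear regressions.

    Each species' response vector is assigned (softly) to an archetype
    with its own slope; residual sd and mixing weights are shared.
    """
    slopes = np.linspace(-2.0, 2.0, K)                # deterministic spread-out init
    pi, sigma = np.full(K, 1.0 / K), 1.0
    for _ in range(n_iter):
        # E-step: log-likelihood of each species under each archetype
        resid = Y[:, None, :] - slopes[None, :, None] * x[None, None, :]
        loglik = (-0.5 * (resid ** 2).sum(-1) / sigma ** 2
                  - Y.shape[1] * np.log(sigma) + np.log(pi)[None, :])
        loglik -= loglik.max(1, keepdims=True)
        R = np.exp(loglik)
        R /= R.sum(1, keepdims=True)                  # responsibilities
        # M-step: weighted least squares per archetype (no intercept here)
        for k in range(K):
            w = R[:, k]
            slopes[k] = (w[:, None] * Y * x).sum() / (w * (x ** 2).sum()).sum()
        pi = R.mean(0)
        resid = Y[:, None, :] - slopes[None, :, None] * x[None, None, :]
        sigma = np.sqrt((R[:, :, None] * resid ** 2).sum() / Y.size)
    return np.sort(slopes), R

slopes, R = em_mixture_of_regressions(Y, x)
print(slopes)   # should be close to the true slopes -1.0 and 1.5
```

Grouping only the slopes, as the paper proposes, would replace the single pooled slope update with species-specific intercepts estimated outside the mixture.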


Journal ArticleDOI
TL;DR: In this article, a hierarchical model prescribing a power scaling first stage and using latent variables at the second stage with spatial structure for these variables supplied through a multivariate CAR specification is proposed.
Abstract: Compositional data analysis considers vectors of nonnegative-valued variables subject to a unit-sum constraint. Our interest lies in spatial compositional data, in particular, land use/land cover (LULC) data in the northeastern United States. Here, the observations are vectors providing the proportions of LULC types observed in each 3 km × 3 km grid cell, yielding on the order of 10^4 cells. On the same grid cells, we have an additional compositional dataset supplying forest fragmentation proportions. Potentially useful and available covariates include elevation range, road length, population, median household income, and housing levels. We propose a spatial regression model that is also able to capture flexible dependence among the components of the observation vectors at each location as well as spatial dependence across the locations of the simplex-restricted measurements. A key issue is the high incidence of observed zero proportions for the LULC dataset, requiring incorporation of local point masses at 0. We build a hierarchical model prescribing a power scaling first stage and using latent variables at the second stage with spatial structure for these variables supplied through a multivariate CAR specification. Analyses for the LULC and forest fragmentation data illustrate the interpretation of the regression coefficients and the benefit of incorporating spatial smoothing.

43 citations


Journal ArticleDOI
TL;DR: A Bayesian spatio-temporal Conway–Maxwell Poisson model with dynamic dispersion for predicting migratory bird settling patterns is proposed, together with a threshold vector-autoregressive model for the CMP intensity parameter that allows for regime switching based on climate conditions.
Abstract: Modeling spatio-temporal count processes is often a challenging endeavor. That is, in many real-world applications the complexity and high-dimensionality of the data and/or process do not allow for routine model specification. For example, spatio-temporal count data often exhibit temporally varying over/underdispersion within the spatial domain. In order to accommodate such structure, while quantifying different sources of uncertainty, we propose a Bayesian spatio-temporal Conway–Maxwell Poisson (CMP) model with dynamic dispersion. Motivated by the problem of predicting migratory bird settling patterns, we propose a threshold vector-autoregressive model for the CMP intensity parameter that allows for regime switching based on climate conditions. Additionally, to reduce the inherent high-dimensionality of the underlying process, we consider nonlinear dimension reduction through kernel principal component analysis. Finally, we demonstrate the effectiveness of our approach through out-of-sample one-year-ahead prediction of waterfowl migratory patterns across the United States and Canada. The proposed approach is of independent interest and illustrates the potential benefits of dynamic dispersion in terms of superior forecasting.

29 citations
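The Conway–Maxwell Poisson family at the heart of this model has no closed-form normalizing constant; a standard numerical approach is to truncate the infinite sum. A minimal sketch of the pmf (not the authors' spatio-temporal implementation) is:

```python
import math

def cmp_pmf(y, lam, nu, truncation=200):
    """Conway–Maxwell Poisson pmf: P(Y = y) ∝ lam^y / (y!)^nu.

    nu > 1 gives underdispersion, nu < 1 overdispersion, nu = 1 is Poisson.
    The normalizing constant has no closed form; the sum is truncated here
    and accumulated in the log domain for stability.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(truncation)]
    m = max(log_terms)
    log_Z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_Z)

# nu = 1 recovers the Poisson pmf
poisson = math.exp(-2.0) * 2.0 ** 3 / math.factorial(3)
print(abs(cmp_pmf(3, 2.0, 1.0) - poisson) < 1e-12)  # True
```

Letting nu vary over time, as the abstract describes, is what allows dispersion itself to be dynamic.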


Journal ArticleDOI
TL;DR: In this paper, a capture–recapture–resighting data set and counts of unmarked birds collected at a stop-over site are analyzed simultaneously to estimate the total number of birds that used the site and the average duration of stop-over.
Abstract: The models presented in this paper are motivated by a stop-over study of semipalmated sandpipers, Calidris pusilla. Two sets of data were collected at the stop-over site: a capture–recapture–resighting data set and a vector of counts of unmarked birds. The two data sets are analyzed simultaneously by combining a new model for the capture–recapture–resighting data set with a binomial likelihood for the counts. The aim of the analysis is to estimate the total number of birds that used the site and the average duration of stop-over. The combined analysis is shown to be highly efficient, even when just 1 % of birds are recaptured, and is recommended for similar investigations. This article has supplementary material online.

29 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian model for mixed ordinal and continuous multivariate data is proposed to evaluate a latent spatial Gaussian process; it can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement.
Abstract: We propose a Bayesian model for mixed ordinal and continuous multivariate data to evaluate a latent spatial Gaussian process. Our proposed model can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement. In our example, the latent, or unobservable measurement is wetland condition. While predicted values of the latent wetland condition variable produced by the model at each location do not hold any intrinsic value, the relative magnitudes of the wetland condition values are of interest. In addition, by including point-referenced covariates in the model, we are able to make predictions at new locations for both the latent random variable and the multivariate response. Lastly, the model produces ranks of the multivariate responses in relation to the unobserved latent random field. This is an important result as it allows us to determine which response variables are most closely correlated with the latent variable. Our approach offers an alternative to traditional indices based on best professional judgment that are frequently used in ecology. We apply our model to assess wetland condition in the North Platte and Rio Grande River Basins in Colorado. The model facilitates a comparison of wetland condition at multiple locations and ranks the importance of in-field measurements.

28 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an approach to impose computationally advantageous changes of support in statistical implementations of partial differential equations (PDEs) and demonstrate its utility through simulation using a form of PDE known as "ecological diffusion".
Abstract: Statistical models using partial differential equations (PDEs) to describe dynamically evolving natural systems are appearing in the scientific literature with some regularity in recent years. Often such studies seek to characterize the dynamics of temporal or spatio-temporal phenomena such as invasive species, consumer-resource interactions, community evolution, and resource selection. Specifically, in the spatial setting, data are often available at varying spatial and temporal scales. Additionally, the necessary numerical integration of a PDE may be computationally infeasible over the spatial support of interest. We present an approach to impose computationally advantageous changes of support in statistical implementations of PDE models and demonstrate its utility through simulation using a form of PDE known as “ecological diffusion.” We also apply a statistical ecological diffusion model to a data set involving the spread of mountain pine beetle (Dendroctonus ponderosae) in Idaho, USA.

26 citations


Journal ArticleDOI
TL;DR: A nonparametric covariance estimator is proposed for the spatial data, as well as its extension to the spatio-temporal data based on the class of space-time covariance models developed by Gneiting (J. Am. Stat. Assoc. 97:590–600, 2002).
Abstract: Covariance structure modeling plays a key role in the spatial data analysis. Various parametric models have been developed to accommodate the idiosyncratic features of a given dataset. However, the parametric models may impose unjustified restrictions to the covariance structure and the procedure of choosing a specific model is often ad hoc. To avoid the choice of parametric forms, we propose a nonparametric covariance estimator for the spatial data, as well as its extension to the spatio-temporal data based on the class of space-time covariance models developed by Gneiting (J. Am. Stat. Assoc. 97:590–600, 2002). Our estimator is obtained via a nonparametric approximation of completely monotone functions. It is easy to implement and our simulation shows it outperforms the parametric models when there is no clear information on model specification. Two real datasets are analyzed to illustrate our approach and provide further comparison between the nonparametric estimator and parametric models.

26 citations
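By Bernstein's theorem, completely monotone functions are exactly nonnegative mixtures of decaying exponentials, so one simplified, hypothetical version of such a covariance estimator is nonnegative least squares over an exponential basis (this is a sketch of the idea, not the authors' exact procedure):

```python
import numpy as np
from scipy.optimize import nnls

# "Empirical" covariances at a grid of distances, generated here from a
# known exponential model C(d) = exp(-d/2) plus a little noise.
rng = np.random.default_rng(1)
d = np.linspace(0.0, 5.0, 40)
c_hat = np.exp(-d / 2.0) + rng.normal(0, 0.01, d.size)

# Completely monotone approximation: C(d) ≈ sum_k w_k exp(-theta_k d)
# with w_k >= 0, fitted by nonnegative least squares over a theta grid.
thetas = np.geomspace(0.05, 5.0, 25)
B = np.exp(-np.outer(d, thetas))        # basis matrix, 40 x 25
w, _ = nnls(B, c_hat)

c_fit = B @ w
print(np.max(np.abs(c_fit - np.exp(-d / 2.0))))   # small fit error
```

The nonnegativity of the weights is what guarantees the fitted function is itself completely monotone, and hence a valid isotropic covariance.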


Journal ArticleDOI
TL;DR: This paper uses Generalized Additive Models to evaluate model-based designs for wildlife abundance surveys where substantial pre-existing data are available, often the case in fisheries with historical catch and effort data.
Abstract: This paper uses Generalized Additive Models to evaluate model-based designs for wildlife abundance surveys where substantial pre-existing data are available. This is often the case in fisheries with historical catch and effort data. Compared to conventional stratified design or design-based designs, our model-based designs can be both efficient and flexible, for example in allowing uneven sampling due to survey logistics, and providing a general framework to answer specific design questions. As an example, we describe the design and preliminary implementation of a trawl survey for eleven fish species along the continental slope off South-East Australia.

25 citations


Journal ArticleDOI
TL;DR: A class of nonlinear multivariate time-frequency functional models that can identify important features of each signal as well as the interaction of signals corresponding to the response variable of interest are introduced.
Abstract: Time-frequency analysis has become a fundamental component of many scientific inquiries. Due to improvements in technology, the amount of high-frequency signals that are collected for ecological and other scientific processes is increasing at a dramatic rate. In order to facilitate the use of these data in ecological prediction, we introduce a class of nonlinear multivariate time-frequency functional models that can identify important features of each signal as well as the interaction of signals corresponding to the response variable of interest. Our methodology is of independent interest and utilizes stochastic search variable selection to improve model selection and performs model averaging to enhance prediction. We illustrate the effectiveness of our approach through simulation and by application to predicting spawning success of shovelnose sturgeon in the Lower Missouri River.

23 citations


Journal ArticleDOI
TL;DR: Various loss-function-based ranking approaches for comparing ENM within experiments and toxicity parameters are presented and a framework for the aggregation of ranks across different sources of evidence is proposed, allowing for differential weighting of this evidence based on its reliability and importance in risk ranking.
Abstract: The development of high throughput screening (HTS) assays in the field of nanotoxicology provide new opportunities for the hazard assessment and ranking of engineered nanomaterials (ENMs). It is often necessary to rank lists of materials based on multiple risk assessment parameters, often aggregated across several measures of toxicity and possibly spanning an array of experimental platforms. Bayesian models coupled with the optimization of loss functions have been shown to provide an effective framework for conducting inference on ranks. In this article we present various loss-function-based ranking approaches for comparing ENM within experiments and toxicity parameters. Additionally, we propose a framework for the aggregation of ranks across different sources of evidence while allowing for differential weighting of this evidence based on its reliability and importance in risk ranking. We apply these methods to high throughput toxicity data on two human cell-lines, exposed to eight different nanomaterials, and measured in relation to four cytotoxicity outcomes. This article has supplementary material online.

21 citations
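Under squared-error loss on the ranks, the optimal Bayes estimate orders items by their posterior expected ranks. A small sketch with made-up posterior draws for five hypothetical materials (one of several loss functions the article considers):

```python
import numpy as np

rng = np.random.default_rng(2)

# Posterior samples of a toxicity parameter for 5 hypothetical
# nanomaterials (rows: MCMC draws, columns: materials).
draws = rng.normal(loc=[0.1, 0.5, 0.9, 0.3, 0.7], scale=0.2, size=(4000, 5))

# Rank within each draw (1 = smallest parameter), then average:
# under squared-error loss on ranks, ordering items by their posterior
# expected rank is the optimal rank estimate.
per_draw_ranks = draws.argsort(axis=1).argsort(axis=1) + 1
expected_rank = per_draw_ranks.mean(axis=0)
final_rank = expected_rank.argsort().argsort() + 1
print(final_rank)   # materials ordered by expected toxicity rank
```

Aggregation across cell lines or outcomes, as in the article, would weight such expected ranks across several sources of evidence before the final ordering.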


Journal ArticleDOI
TL;DR: It is shown how a particular low-rank process, the predictive process, which has been widely used to model large geostatistical datasets, can be effectively deployed to model non-degenerate cross-covariance processes.
Abstract: Advances in geo-spatial technologies have created data-rich environments which provide extraordinary opportunities to understand the complexity of large and spatially indexed data in ecology and the natural sciences. Our current application concerns analysis of soil nutrients data collected at La Selva Biological Station, Costa Rica, where inferential interest lies in capturing the spatially varying relationships among the nutrients. The objective here is to interpolate not just the nutrients across space, but also associations among the nutrients that are posited to vary spatially. This requires spatially varying cross-covariance models. Fully process-based specifications using matrix-variate processes are theoretically attractive but computationally prohibitive. Here we develop fully process-based low-rank but non-degenerate spatially varying cross-covariance processes that can effectively interpolate cross-covariances at arbitrary locations. We show how a particular low-rank process, the predictive process, which has been widely used to model large geostatistical datasets, can be effectively deployed to model non-degenerate cross-covariance processes. We produce substantive inferential tools such as maps of nonstationary cross-covariances that constitute the premise of further mechanistic modeling and have hitherto not been easily available for environmental scientists and ecologists.

Journal ArticleDOI
TL;DR: In this article, a model-based clustering approach to examine abundance trends in a metapopulation is proposed, which incorporates a clustering method that is an extension of the classic Chinese Restaurant Process and the associated Dirichlet process prior, which allows for inclusion of distance covariates between sites.
Abstract: We consider a model-based clustering approach to examining abundance trends in a metapopulation. When examining trends for an animal population with management goals in mind one is often interested in those segments of the population that behave similarly to one another with respect to abundance. Our proposed trend analysis incorporates a clustering method that is an extension of the classic Chinese Restaurant Process, and the associated Dirichlet process prior, which allows for inclusion of distance covariates between sites. This approach has two main benefits: (1) nonparametric spatial association of trends and (2) reduced dimension of the spatio-temporal trend process. We present a transdimensional Gibbs sampler for making Bayesian inference that is efficient in the sense that all of the full conditionals can be directly sampled from save one. To demonstrate the proposed method we examine long term trends in northern fur seal pup production at 19 rookeries in the Pribilof Islands, Alaska. There was strong evidence that clustering of similar year-to-year deviation from linear trends was associated with whether rookeries were located on the same island. Clustering of local linear trends did not seem to be strongly associated with any of the distance covariates. In the fur seal trends analysis an overwhelming proportion of the MCMC iterations produced a 73–79 % reduction in the dimension of the spatio-temporal trend process, depending on the number of cluster groups.

Journal ArticleDOI
TL;DR: In this article, a novel idea of modeling plant growth in the framework of non-homogeneous hidden Markov models (Cappe, Moulines, and Ryden 2005) for a certain class of plants with known organogenesis is presented.
Abstract: Parametric identification of plant growth models formalized as discrete dynamical systems is a challenging problem due to specific data acquisition (system observation is generally done with destructive measurements), non-linear dynamics, model uncertainties and high-dimensional parameter space. In this study, we present a novel idea of modeling plant growth in the framework of non-homogeneous hidden Markov models (Cappe, Moulines, and Ryden 2005), for a certain class of plants with known organogenesis (structural development). Unknown parameters of the models are estimated via a stochastic variant of a generalized EM (Expectation-Maximization) algorithm and approximate confidence intervals are given via parametric bootstrap. The complexity of the model makes both the E-step (expectation step) and the M-step (maximization step) non-explicit. For this reason, the E-step is approximated via a sequential Monte Carlo procedure (sequential importance sampling with resampling) and the M-step is separated into two steps (Conditional-Maximization), where before applying a numerical maximization procedure (quasi-Newton type), a large subset of unknown parameters is updated explicitly conditioned on the other subset. A simulation study and a case-study with real data from the sugar beet are considered and a model comparison is performed based on these data. Appendices are available online.

Journal ArticleDOI
TL;DR: In this article, a discrete Gamma Markov Random Field (MRF) prior is introduced for modeling spatial relations among regions in geo-referenced health data, which is incorporated into a generalized linear mixed model zero-inflated framework that accounts for excess zeroes not explained by usual parametric (Poisson or Negative Binomial) assumptions.
Abstract: In this paper, we introduce a novel discrete Gamma Markov random field (MRF) prior for modeling spatial relations among regions in geo-referenced health data. Our proposition is incorporated into a generalized linear mixed model zero-inflated (ZI) framework that accounts for excess zeroes not explained by usual parametric (Poisson or Negative Binomial) assumptions. The ZI framework categorizes subjects into low-risk and high-risk groups. Zeroes arising from the low-risk group contribute to structural zeroes, while the high-risk members contribute to random zeroes. We aim to identify explanatory covariates that might have significant effect on (i) the probability of subjects being in the low-risk group, and (ii) the intensity of the high-risk group, after controlling for spatial association and subject-specific heterogeneity. Model fitting and parameter estimation are carried out under a Bayesian paradigm through relevant Markov chain Monte Carlo (MCMC) schemes. Simulation studies and application to real data on hypertensive disorder of pregnancy confirm that our model provides superior fit over the widely used conditionally auto-regressive proposition.

Journal ArticleDOI
TL;DR: Inverse sampling for proportions is useful when there is a need to estimate the prevalence of a disease without delay; it can be combined with group (pooled) testing, in which individuals are pooled together and tested as a group for the disease.
Abstract: Inverse sampling for proportions is useful when there is a need to estimate the prevalence of a disease without delay. This can be combined with group (pooled) testing, in which individuals are pooled together and tested as a group for the disease. Pritchard and Tebbs (in Journal of Agricultural, Biological, and Environmental Statistics 16, 70–87, 2011a) introduced this combination to the statistical literature, and we have addressed some of the key problems raised, for groups of equal size. Most point estimators of the proportion are biased, especially the MLE, but by applying a suitable correction we have developed an estimator which is almost unbiased in the region of interest. We propose two interval estimators which improve on existing methods and have excellent coverage properties. Our recommendation is a score-based method with a correction for skewness, but a good alternative is an exact method with a mid-P correction.
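The sampling scheme can be illustrated by simulation: test pools of size s until r positive pools are observed, then invert the pool-level MLE to the individual level. The estimator below is the naive (uncorrected) MLE whose bias motivates the authors' correction; the parameter values and a perfect test are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def inverse_group_sampling(p, pool_size, r):
    """Test pools of `pool_size` individuals until r positive pools are seen.

    Returns the total number of pools tested.  A pool is positive when at
    least one member is infected (a perfect test is assumed here).
    """
    theta = 1.0 - (1.0 - p) ** pool_size       # P(pool tests positive)
    # number of pools until the r-th positive ~ r + NegativeBinomial(r, theta)
    return r + rng.negative_binomial(r, theta)

p_true, s, r = 0.02, 10, 5
estimates = []
for _ in range(5000):
    g = inverse_group_sampling(p_true, s, r)
    theta_hat = r / g                          # naive MLE of pool positivity
    estimates.append(1.0 - (1.0 - theta_hat) ** (1.0 / s))

bias = np.mean(estimates) - p_true
print(round(bias, 4))   # positive: the uncorrected MLE overestimates p
```

The upward bias follows from Jensen's inequality (E[r/G] > theta under inverse sampling), which is exactly the behavior the corrected estimator in the paper is designed to remove.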

Journal ArticleDOI
TL;DR: In this paper, the cumulative distribution function (cumulative emergence) of the cumulative hydrothermal time (CHTT) is estimated nonparametrically to predict weed emergence, an alternative to the classical parametric regression often employed in this framework.
Abstract: Hydrothermal time (HTT) is a valuable environmental index to predict weed emergence. In this paper, we focus on the problem of predicting weed emergence given some HTT observations from a distribution point of view. This is an alternative approach to classical parametric regression, often employed in this framework. The cumulative distribution function (cumulative emergence) of the cumulative hydrothermal time (CHTT) is considered for this task. Due to the monitoring process, it is not possible to observe the exact emergence time of every seedling. On the contrary, these emergence times are observed in an aggregated way. To address these facts, a new nonparametric distribution function estimator has been proposed. A bootstrap bandwidth selection method is also presented. Moreover, bootstrap techniques are also used to develop simultaneous confidence intervals for the HTT cumulative distribution function. The proposed methods have been applied to an emergence data set of Bromus diandrus.

Journal ArticleDOI
TL;DR: It is shown how the approach may be used to obtain predictions of pure stand additive and non-additive effects in the context of a single field trial using an example from an Australian sorghum breeding program.
Abstract: There are two key types of selection in a plant breeding program, namely selection of hybrids for potential commercial use and the selection of parents for use in future breeding. Oakey et al. (in Theoretical and Applied Genetics 113, 809–819, 2006) showed how both of these aims could be achieved using pedigree information in a mixed model analysis in order to partition genetic effects into additive and non-additive effects. Their approach was developed for field trial data subject to spatial variation. In this paper we extend the approach for data from trials subject to interplot competition. We show how the approach may be used to obtain predictions of pure stand additive and non-additive effects. We develop the methodology in the context of a single field trial using an example from an Australian sorghum breeding program.

Journal ArticleDOI
TL;DR: It is shown that, for a wide range of models, the empirical velocity of processive motor proteins has a limiting Pearson type VII distribution with finite mean but infinite variance, and maximum likelihood inference is developed for this Pearson type VII distribution.
Abstract: We show that, for a wide range of models, the empirical velocity of processive motor proteins has a limiting Pearson type VII distribution with finite mean but infinite variance. We develop maximum likelihood inference for this Pearson type VII distribution. In two simulation studies, we compare the performance of our MLE with the performance of standard Student’s t-based inference. The studies show that incorrectly assuming normality (1) can lead to imprecise inference regarding motor velocity in the one-sample case, and (2) can significantly reduce power in the two-sample case. These results should be of interest to experimentalists who wish to engineer motors possessing specific functional characteristics.
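A Pearson type VII with shape m is a rescaled Student's t with nu = 2m - 1 degrees of freedom, so data can be simulated from the t and the location/scale fitted by maximum likelihood with m held fixed. The shape value and sample size below are arbitrary; this sketches the likelihood, not the authors' full inference procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def pearson7_negloglik(params, x, m):
    """Negative log-likelihood of Pearson type VII with fixed shape m.

    Density: f(x) = Gamma(m) / (Gamma(m - 1/2) sqrt(pi) a)
                    * [1 + ((x - mu)/a)^2]^(-m).
    For m <= 3/2 the variance is infinite, which is why normal or
    naive t-based inference can mislead.
    """
    mu, log_a = params
    a = np.exp(log_a)                    # log parameterization keeps a > 0
    z = (x - mu) / a
    log_norm = gammaln(m) - gammaln(m - 0.5) - 0.5 * np.log(np.pi) - np.log(a)
    return -np.sum(log_norm - m * np.log1p(z ** 2))

# Simulate from a Pearson VII with m = 1.25 (heavy tails, infinite
# variance) via its Student-t representation: t_nu with nu = 2m - 1,
# for which the implied scale is a = sqrt(nu).
rng = np.random.default_rng(4)
m, nu = 1.25, 1.5
x = rng.standard_t(nu, size=2000)

fit = minimize(pearson7_negloglik, x0=[0.0, 0.0], args=(x, m))
mu_hat, a_hat = fit.x[0], np.exp(fit.x[1])
print(round(mu_hat, 2), round(a_hat, 2))   # near 0 and sqrt(1.5) ≈ 1.22
```

In practice the shape would also be estimated or fixed from the motor-protein model, as in the paper's two-sample comparisons.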

Journal ArticleDOI
TL;DR: This work models counts of marked and unmarked animals as multinomial random variables, using the capture frequencies of marked animals for inference about the latent multinomial frequencies for unmarked animals, and discusses undesirable behavior of the commonly used discrete uniform prior distribution on the population size parameter.
Abstract: Mark-resight designs for estimation of population abundance are common and attractive to researchers. However, inference from such designs is very limited when faced with sparse data, either from a low number of marked animals, a low probability of detection, or both. In the Greater Yellowstone Ecosystem, yearly mark-resight data are collected for female grizzly bears with cubs-of-the-year (FCOY), and inference suffers from both limitations. To overcome difficulties due to sparseness, we assume homogeneity in sighting probabilities over 16 years of bi-annual aerial surveys. We model counts of marked and unmarked animals as multinomial random variables, using the capture frequencies of marked animals for inference about the latent multinomial frequencies for unmarked animals. We discuss undesirable behavior of the commonly used discrete uniform prior distribution on the population size parameter and provide OpenBUGS code for fitting such models. The application provides valuable insights into subtleties of implementing Bayesian inference for latent multinomial models. We tie the discussion to our application, though the insights are broadly useful for applications of the latent multinomial model.

Journal ArticleDOI
TL;DR: In this article, a penalized pseudolikelihood estimation method and an approximation of the variance of the parameter estimates are proposed for big ecological data; a simulation study evaluates the performance of the proposed method, followed by a data example in a study of land cover in relation to land ownership characteristics.
Abstract: Autologistic regression models are suitable for relating spatial binary responses in ecology to covariates such as environmental factors. For big ecological data, pseudolikelihood estimation is appealing due to its ease of computation, but at least two challenges remain. Although an important issue, it is unclear how model selection may be carried out under pseudolikelihood. In addition, for assessing the variation of pseudolikelihood estimates, parametric bootstrap using Monte Carlo simulation is often used but may be infeasible for very large data sizes. Here both these issues are addressed by developing a penalized pseudolikelihood estimation method and an approximation of the variance of the parameter estimates. A simulation study is conducted to evaluate the performance of the proposed method, followed by a data example in a study of land cover in relation to land ownership characteristics. Extension of these models and methods to spatial-temporal binary data is further discussed. This article has supplementary material online.
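The computational appeal of pseudolikelihood comes from the fact that maximizing it is just a logistic regression of each cell on its neighbor sum (the autocovariate). The toy sketch below simulates a grid by checkerboard Gibbs updates and recovers the parameters; it omits covariates and the penalty, so it illustrates the baseline estimator rather than the authors' penalized method.

```python
import numpy as np

rng = np.random.default_rng(5)

def neighbor_sum(Y):
    """Sum of the 4 nearest neighbors at each grid cell (zero-padded edges)."""
    S = np.zeros(Y.shape)
    S[1:, :] += Y[:-1, :]
    S[:-1, :] += Y[1:, :]
    S[:, 1:] += Y[:, :-1]
    S[:, :-1] += Y[:, 1:]
    return S

def simulate_autologistic(shape, beta0, eta, sweeps=300):
    """Checkerboard Gibbs sampler: logit P(y=1 | rest) = beta0 + eta * neighbors."""
    Y = rng.integers(0, 2, shape)
    color = np.add.outer(np.arange(shape[0]), np.arange(shape[1])) % 2
    for _ in range(sweeps):
        for c in (0, 1):   # same-color cells are conditionally independent
            p = 1.0 / (1.0 + np.exp(-(beta0 + eta * neighbor_sum(Y))))
            upd = color == c
            Y[upd] = (rng.random(shape) < p)[upd]
    return Y

def pseudolikelihood_fit(Y, n_newton=25):
    """Maximum pseudolikelihood = logistic regression on the autocovariate."""
    X = np.column_stack([np.ones(Y.size), neighbor_sum(Y).ravel()])
    y, beta = Y.ravel().astype(float), np.zeros(2)
    for _ in range(n_newton):                        # Newton-Raphson / IRLS
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

Y = simulate_autologistic((60, 60), beta0=-1.0, eta=0.4)
beta0_hat, eta_hat = pseudolikelihood_fit(Y)
print(round(beta0_hat, 2), round(eta_hat, 2))   # near the true (-1.0, 0.4)
```

Adding an L1 penalty to this logistic objective is one way the model selection question raised in the abstract could be handled.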

Journal ArticleDOI
James S. Clark, David M. Bell, Matthew Kwit, Amanda S. Powell, Kai Zhu
TL;DR: DIP, a synthetic approach to SA, is described; it quantifies how inputs affect the combined (multivariate) output and provides a synthetic index of important inputs, including climate vulnerability in the context of competition for light and soil moisture, based on the full (multivariate) response.
Abstract: Sensitivity analysis (SA) of environmental models is inefficient when there are large numbers of inputs and outputs and interactions cannot be directly linked to input variables. Traditional SA is based on coefficients relating the importance of an input to an output response, generating as many as one coefficient for each combination of model input and output. In many environmental models multiple outputs are part of an integrated response that should be considered synthetically, rather than by separate coefficients for each output. For example, there may be interactions between output variables that cannot be defined by standard interaction terms for input variables. We describe dynamic inverse prediction (DIP), a synthetic approach to SA that quantifies how inputs affect the combined (multivariate) output. We distinguish input interactions (specified as a traditional product of input variables) from output interactions (relationships between outputs not directly linked to inputs). Both contribute to traditional SA coefficients and DIP in ways that permit interpretation of unexpected model results. An application of broad and timely interest, anticipating effects of climate change on biodiversity, illustrates how DIP helps to quantify the important input variables and the role of interactions. Climate affects individual trees in competition with neighboring trees, but interest lies at the scale of species and landscapes. Responses of individuals to climate and competition for resources involve a number of output variables, such as birth rates, growth, and mortality. They are all components of ‘individual health’, and they interact in ways that cannot be linked to observed inputs, through allocation constraints. We show how prior dependence is introduced to aid interpretation of inputs in the context of ecological resource modeling. We further demonstrate that a new approach to multiplicity (multiple-testing) correction can be implemented in such models to filter through the large number of input combinations. DIP provides a synthetic index of important inputs, including climate vulnerability in the context of competition for light and soil moisture, based on the full (multivariate) response. By aggregating in specific ways (over individuals, years, and other input variables) we provide ways to summarize and rank species in terms of their vulnerability to climate change. This article has supplementary material online.

Journal ArticleDOI
TL;DR: In this article, default priors are developed for the anisotropic Gaussian random field model with and without a nugget parameter accounting for the effects of microscale variations and measurement errors.
Abstract: Anisotropic models are often used in spatial statistics to analyze spatially referenced data. Within a Bayesian framework we develop default priors for the anisotropic Gaussian random field model with and without a nugget parameter accounting for the effects of microscale variations and measurement errors. We present Jeffreys priors and a reference prior and study their posterior propriety. Moreover, we show that the predictive distributions at ungauged locations have finite variance. We also show that the seemingly uninformative uniform prior for the anisotropy parameters, ratio and angle, yields an improper posterior. Finally, we find that the proposed priors have good frequentist properties and we illustrate our approach by analyzing two data sets for which we discuss model choice as well as predictions and uncertainty estimates.

Journal ArticleDOI
TL;DR: A Bayes neutral zone classifier is derived and shown to outperform previous neutral zone classifiers with respect to both the expected cost of misclassifications and computational complexity.
Abstract: Neutral zone classifiers allow for a region of neutrality when there is inadequate information to assign a predicted class with suitable confidence. A neutral zone classifier is defined by classification regions that trade off the cost of an incorrect classification against the cost of remaining neutral. In this paper, we derive a Bayes neutral zone classifier and demonstrate that it outperforms previous neutral zone classifiers with respect to the expected cost of misclassifications and also with respect to computational complexity. We apply the neutral zone classifier to a microbial community profiling application in which no training data are available, thereby illustrating how it can be extended to unsupervised settings. This article has supplementary material online.
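The cost trade-off that defines a neutral zone can be sketched in a few lines: each action is priced by its expected cost, and the classifier abstains whenever neutrality is cheapest. The two-class setting, cost values, and function name are illustrative assumptions rather than the paper's Bayes derivation:

```python
def neutral_zone_classify(p, cost_error=1.0, cost_neutral=0.3):
    """Neutral zone decision for a two-class problem (sketch).

    p is the posterior probability of class 1. Declaring a class risks a
    misclassification cost; staying neutral always costs `cost_neutral`.
    The rule returns whichever action has the smallest expected cost."""
    costs = {
        1: cost_error * (1.0 - p),   # declare class 1, wrong with prob 1 - p
        0: cost_error * p,           # declare class 0, wrong with prob p
        "neutral": cost_neutral,     # abstain at a fixed, known cost
    }
    return min(costs, key=costs.get)
```

With these costs the neutral zone is exactly the region where min(p, 1 - p) exceeds the cost ratio 0.3, so confident posteriors get a class label and ambiguous ones do not.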

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new sampling design with two independent samples A and B each addressing one of these objectives: estimating the national agricultural income and providing detailed data at production branch level.
Abstract: A farm accountancy data network normally serves two objectives: estimating the national agricultural income and providing detailed data at production branch level. This study proposes a new sampling design with two independent samples A and B, each addressing one of these objectives. The design is based on stratified sampling along with a combination of optimal and power allocation. Violations of precision targets are avoided by collapsing strata. We assess the accuracy of key structural and economic variables by means of a Monte Carlo simulation. Multiple linear regression has been shown to be a powerful tool for imputing financial data to individual census farms. The results illustrate that the proposed design meets prescribed precision and feasibility restrictions at both the single-stratum and national levels. It is further demonstrated that unifying samples A and B helps to significantly reduce the survey costs.
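The two allocation rules combined in the proposed design can each be sketched independently; the function names and the simple rounding step are illustrative assumptions:

```python
def neyman_allocation(n_total, N, S):
    """Neyman (optimal) allocation sketch: the sample size in stratum h is
    proportional to N_h * S_h, stratum size times stratum standard deviation."""
    w = [Nh * Sh for Nh, Sh in zip(N, S)]
    total = sum(w)
    return [round(n_total * wh / total) for wh in w]

def power_allocation(n_total, N, q=0.5):
    """Power allocation sketch: n_h proportional to N_h ** q; q < 1 shifts
    sample toward small strata relative to proportional allocation."""
    w = [Nh ** q for Nh in N]
    total = sum(w)
    return [round(n_total * wh / total) for wh in w]
```

For example, with two equally sized strata whose standard deviations differ by a factor of three, Neyman allocation sends three quarters of the sample to the more variable stratum.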

Journal ArticleDOI
TL;DR: In this paper, a spatio-temporal model is proposed to describe the spread of apple scab within an orchard composed of several plots, where the model is defined on a regular lattice and evolves in continuous time.
Abstract: We consider a spatio-temporal model to describe the spread of apple scab within an orchard composed of several plots. The model is defined on a regular lattice and evolves in continuous time. Based on ordinal categorical data observed only at some discrete instants, we adopt a continuous-time approach and apply a Bayesian framework for estimating unknown parameters.
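A continuous-time spread process on a regular lattice can be simulated with a Gillespie-style algorithm. The rook neighbourhood, rate parameter, and centre-start initial condition below are illustrative assumptions, not the fitted apple-scab model:

```python
import random

def simulate_spread(size=10, beta=1.0, t_max=5.0, seed=1):
    """Continuous-time spread on a size x size lattice (Gillespie sketch).

    Each healthy site becomes infected at rate beta times its number of
    infected rook neighbours; waiting times between events are exponential."""
    rng = random.Random(seed)
    infected = {(size // 2, size // 2)}          # start from the centre site
    t = 0.0
    while t < t_max:
        # total infection rate for every susceptible site on the frontier
        rates = {}
        for (i, j) in infected:
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ni < size and 0 <= nj < size and (ni, nj) not in infected:
                    rates[(ni, nj)] = rates.get((ni, nj), 0.0) + beta
        total = sum(rates.values())
        if total == 0:                           # lattice fully infected
            break
        t += rng.expovariate(total)              # exponential waiting time
        if t >= t_max:
            break
        # choose the next infected site with probability proportional to its rate
        x = rng.uniform(0.0, total)
        for site, r in rates.items():
            x -= r
            if x <= 0:
                infected.add(site)
                break
    return infected
```

Observing such a process only at a few discrete instants, as in the paper's data, is precisely what makes the continuous-time Bayesian estimation non-trivial.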

Journal ArticleDOI
TL;DR: This work focuses on a multiplicative-lognormal structural measurement error scenario and approaches to address it when external validation data are available, and favors a pseudo-likelihood approach that exhibits fewer computational problems than direct full maximum likelihood (ML) yet maintains consistency under the assumed models without necessitating small exposure effects and/or small measurement error assumptions.
Abstract: A common goal in environmental epidemiologic studies is to undertake logistic regression modeling to associate a continuous measure of exposure with binary disease status, adjusting for covariates. A frequent complication is that exposure may only be measurable indirectly, through a collection of subject-specific variables assumed associated with it. Motivated by a specific study to investigate the association between lung function and exposure to metal working fluids, we focus on a multiplicative-lognormal structural measurement error scenario and approaches to address it when external validation data are available. Conceptually, we emphasize the case in which true untransformed exposure is of interest in modeling disease status, but measurement error is additive on the log scale and thus multiplicative on the raw scale. Methodologically, we favor a pseudo-likelihood (PL) approach that exhibits fewer computational problems than direct full maximum likelihood (ML) yet maintains consistency under the assumed models without necessitating small exposure effects and/or small measurement error assumptions. Such assumptions are required by computationally convenient alternative methods like regression calibration (RC) and ML based on probit approximations. We summarize simulations demonstrating considerable potential for bias in the latter two approaches, while supporting the use of PL across a variety of scenarios. We also provide accessible strategies for obtaining adjusted standard errors to accompany RC and PL estimates.
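For contrast with the favored pseudo-likelihood approach, the simpler regression-calibration step can be sketched on the log scale, where the multiplicative error becomes additive. The exact-linear validation relationship used below is an illustrative assumption:

```python
import numpy as np

def regression_calibration(logW_main, logX_val, logW_val):
    """Regression calibration sketch for multiplicative-lognormal error.

    On the log scale the error is additive, so we regress true log-exposure
    log X on mismeasured log-exposure log W in the external validation data,
    then impute E[log X | log W] for the main study. Illustrative only; the
    paper's pseudo-likelihood approach goes beyond this approximation."""
    # least-squares fit of log X on log W in the validation sample
    A = np.column_stack([np.ones_like(logW_val), logW_val])
    coef, *_ = np.linalg.lstsq(A, logX_val, rcond=None)
    return coef[0] + coef[1] * logW_main   # imputed log-exposure for main study
```

The imputed values then replace the unobserved exposure in the logistic disease model, which is where the bias documented in the paper's simulations can enter.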

Journal ArticleDOI
TL;DR: In this article, the authors extend the standard natural mortality model by including a random effect to account for overdispersion, and introduce the likelihood ratio test, effective-dose estimation, and a simulated residual envelope for model checking.
Abstract: When fitting dose–response models to entomological data it is often necessary to take account of natural mortality and/or overdispersion. The standard approach to handle natural mortality is to use Abbott’s formula, which allows for a constant underlying mortality rate. Commonly used overdispersion models include the beta-binomial model, logistic-normal, and discrete mixtures. Here we extend the standard natural mortality model by including a random effect to account for overdispersion. Parameter estimation is based on a combined EM Newton–Raphson algorithm, which provides a simple framework for maximum likelihood estimation of the natural mortality model. We consider the application of this model to data from an experiment on the use of a virus (PhopGV) for the biological control of worm larvae (Phthorimaea operculella) in potatoes. For this natural mortality model with a random effect we introduce the likelihood ratio test, effective dose, and the use of a simulated residual envelope for model checking. Comparisons are made with an equivalent beta-binomial model. The procedures are implemented in the R system.
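The fixed-effect backbone of the natural mortality model can be written down directly from Abbott's formula; the random effect the paper adds for overdispersion is omitted here, and the logistic link and parameter names are assumptions for illustration:

```python
import math

def abbott_mortality(dose, control_mortality, intercept, slope):
    """Observed mortality under a natural-mortality model (sketch):

        p(d) = c + (1 - c) * logistic(a + b * log d),

    where c is the constant background (natural) mortality rate and the
    logistic term is the dose-response component acting on survivors."""
    dose_response = 1.0 / (1.0 + math.exp(-(intercept + slope * math.log(dose))))
    return control_mortality + (1.0 - control_mortality) * dose_response
```

At dose 1 with a zero linear predictor the dose response is 0.5, so with 10% natural mortality the observed rate is 0.55; mortality rises monotonically with dose for a positive slope and is floored at the control rate.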

Journal ArticleDOI
TL;DR: In this article, a multivariate t-distribution is used for the calculation of multiplicity-adjusted p-values and simultaneous confidence intervals, thereby accounting for both the number of variables and their correlations.
Abstract: Agricultural experiments often have a completely randomized design, and multiple, correlated variables are measured. This paper addresses an appropriate statistical evaluation. A multivariate t-distribution is used for the calculation of multiplicity-adjusted p-values and simultaneous confidence intervals. In this way both the number of variables and their correlations are taken into account. We consider ratios of means instead of differences, and comparisons versus the overall mean instead of all-pair comparisons. A data set from a greenhouse experiment with glucosinolates of several cultivars of Chinese cabbage (Brassica rapa subsp. pekinensis) is used as an example. Related code based on the R-package SimComp is presented. This package allows a wide application in many agricultural experiments with a similar design.
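The single-step max-statistic idea behind multiplicity-adjusted p-values can be sketched by simulation. Note that SimComp uses the exact multivariate t-distribution; for brevity this sketch substitutes a multivariate normal approximation, and the function name and defaults are assumptions:

```python
import numpy as np

def maxt_adjusted_pvalues(tstats, corr, n_sim=100_000, seed=0):
    """Single-step multiplicity adjustment via the joint null distribution
    of the test statistics (sketch of the multivariate-t idea).

    The adjusted p-value for statistic t_k is P(max_j |T_j| >= |t_k|) under
    the joint null, so the correlation among endpoints is taken into account.
    A multivariate normal stands in for the exact multivariate t here."""
    rng = np.random.default_rng(seed)
    sims = rng.multivariate_normal(np.zeros(len(tstats)), corr, size=n_sim)
    max_abs = np.abs(sims).max(axis=1)   # null distribution of the max statistic
    return np.array([(max_abs >= abs(t)).mean() for t in tstats])
```

With independent endpoints this reproduces the familiar inflation over a single test; with strongly correlated endpoints the adjustment is automatically milder, which is the point of using the joint distribution.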

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of how to optimally allocate sampling effort at the depletion sites, which involves determining how many times to repeat the depletion process at a given site versus how many different sites to include in the sampling.
Abstract: We consider the problem of designing a depletion or removal survey as part of estimating animal abundance for populations with imperfect capture or detection rates. In a depletion survey, animals are captured from a given area, counted, and withheld from the population. This process is then repeated some number of times at the same location, and the decreasing catches jointly inform local abundance and the capture or detection rate, which we call catchability. The aim of such a survey may be to learn about the catchability process, and this information may then be applied to a broader survey of the population so as to accurately estimate total abundance. In this manuscript we consider the problem of how to optimally allocate sampling effort at the depletion sites. Allocating sampling effort involves determining how many times to repeat the depletion process at a given site versus how many different sites to include in the sampling. By maximizing the Fisher information of the parameter describing catchability as a function of the survey design, we estimate the optimal number of depletions per site, which depends on the catchability value itself. We also discuss other aspects of depletion sampling apparent from the derivation of Fisher information, including the difficulties of sampling with low catchability values (e.g. below 0.15), and we consider our results with respect to the annual Chesapeake Bay blue crab abundance survey conducted by the Maryland Department of Natural Resources.
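The Fisher-information calculation at the heart of the design question can be sketched under a Poisson approximation to the catches; the approximation, the parameter values, and the function name are assumptions for illustration, not the paper's exact derivation:

```python
def fisher_info_catchability(q, n_passes, abundance=100.0):
    """Fisher information for catchability q from one depletion site (sketch).

    The expected catch on pass k is mu_k = N * q * (1 - q)**(k - 1): the
    remaining fraction (1 - q)**(k - 1) of the population is caught at rate q.
    Treating catches as independent Poisson counts, the information about q is
    sum_k (d mu_k / d q)**2 / mu_k."""
    info = 0.0
    for k in range(1, n_passes + 1):
        mu = abundance * q * (1 - q) ** (k - 1)
        # derivative of mu_k with respect to q (product rule)
        dmu = abundance * ((1 - q) ** (k - 1) - q * (k - 1) * (1 - q) ** (k - 2))
        info += dmu ** 2 / mu
    return info
```

Each additional pass adds a non-negative term, so information grows with repeated depletions at a site; comparing this gain against the cost of visiting new sites is the allocation trade-off the paper optimizes.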

Journal ArticleDOI
TL;DR: In this article, the authors developed a nonlinear model that predicts the abundance of the important zooplankton species Calanus finmarchicus from Gulf of Maine hydrographic data.
Abstract: Time series of physical and biological properties of the ocean are a valuable resource for developing models for ecological forecasting and ecosystem-based management. Both the physics of the oceans and the organisms living in them can exhibit nonlinear dynamics. We describe the development of a nonlinear model that predicts the abundance of the important zooplankton species Calanus finmarchicus from Gulf of Maine hydrographic data. The results of a neural network model, including model diagnostics and forecasts, are presented. The best neural network model based on generalized cross-validation includes variables of C. finmarchicus abundance, herring abundance, and the state of the Gulf of Maine waters, with meaningful time lags. Forecasts are constructed for the model fit to 1978–2003 bimonthly data, and corresponding forecast intervals are obtained by the stationary bootstrap.
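The stationary bootstrap used for the forecast intervals can be sketched as Politis–Romano-style block resampling with geometric block lengths; the mean block length and seeding below are illustrative assumptions:

```python
import random

def stationary_bootstrap(series, mean_block=4, seed=0):
    """One stationary-bootstrap resample of a time series (sketch).

    Blocks start at uniformly random positions and have geometric lengths
    with mean `mean_block`, wrapping around the end of the series. Keeping
    nearby observations together preserves local time dependence, which is
    what makes the resulting forecast intervals valid for dependent data."""
    rng = random.Random(seed)
    n = len(series)
    p = 1.0 / mean_block
    out = []
    i = rng.randrange(n)
    while len(out) < n:
        out.append(series[i])
        if rng.random() < p:           # with prob p, start a fresh block
            i = rng.randrange(n)
        else:                          # otherwise continue the current block
            i = (i + 1) % n            # wrap around at the series end
    return out
```

Repeating this resampling, refitting, and forecasting many times yields the empirical distribution from which the forecast intervals are read off.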