
Showing papers in "Environmetrics in 2002"


Journal ArticleDOI
TL;DR: Basic concepts based on actual avian, amphibian, and fish monitoring studies are presented in this article and it is believed that the estimation of detection probability should be built into the monitoring design through a double sampling approach.
Abstract: Techniques for estimation of absolute abundance of wildlife populations have received a lot of attention in recent years. The statistical research has been focused on intensive small-scale studies. Recently, however, wildlife biologists have desired to study populations of animals at very large scales for monitoring purposes. Population indices are widely used in these extensive monitoring programs because they are inexpensive compared to estimates of absolute abundance. A crucial underlying assumption is that the population index (C) is directly proportional to the population density (D). The proportionality constant, β, is simply the probability of ‘detection’ for animals in the survey. As spatial and temporal comparisons of indices are crucial, it is necessary to also assume that the probability of detection is constant over space and time. Biologists intuitively recognize this when they design rigid protocols for the studies where the indices are collected. Unfortunately, however, in many field studies the assumption is clearly invalid. We believe that the estimation of detection probability should be built into the monitoring design through a double sampling approach. A large sample of points provides an abundance index, and a smaller sub-sample of the same points is used to estimate detection probability. There is an important need for statistical research on the design and analysis of these complex studies. Some basic concepts based on actual avian, amphibian, and fish monitoring studies are presented in this article. Copyright © 2002 John Wiley & Sons, Ltd.

581 citations
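
A minimal sketch of the double sampling idea described in this abstract, with simulated counts and illustrative variable names (this is not the authors' code): a large sample of points yields the index C, an intensive sub-sample yields an estimate of the detection probability β, and the adjusted density estimate is C/β̂.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Simulated survey ----------------------------------------------------
# True density D (animals per point) and detection probability beta.
D_true, beta_true, n_points, n_sub = 8.0, 0.4, 200, 25

# Index survey at all points: counts C ~ Binomial(N_i, beta)
N = rng.poisson(D_true, size=n_points)          # true abundance per point
C = rng.binomial(N, beta_true)                  # index counts (detections)

# Intensive sub-sample: at n_sub points the true abundance is also measured
sub = rng.choice(n_points, size=n_sub, replace=False)

# --- Double sampling estimator -------------------------------------------
beta_hat = C[sub].sum() / N[sub].sum()          # estimated detection probability
C_bar = C.mean()                                # mean index over all points
D_hat = C_bar / beta_hat                        # adjusted density estimate

print(f"mean index C-bar      : {C_bar:.2f}")
print(f"estimated beta        : {beta_hat:.3f}")
print(f"adjusted density D-hat: {D_hat:.2f}  (true D = {D_true})")
```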


Journal ArticleDOI
TL;DR: In this article, trend analyses of time series of environmental data are carried out to assess the human impact on the environment under the influence of natural fluctuations in temperature, precipitation, an...
Abstract: Trend analyses of time series of environmental data are often carried out to assess the human impact on the environment under the influence of natural fluctuations in temperature, precipitation, an ...

380 citations


Journal ArticleDOI
TL;DR: This work proposes several strategies for utilizing external data (such as might be obtained using GIS) to aid in the completion of species lists, and demonstrates the potential of these approaches using simulation and case studies from Oklahoma.
Abstract: A substantial body of literature has accumulated on the topic of the estimation of species richness by extrapolation. However, most of these methods rely on an objective sampling of nature. This condition is difficult to meet and seldom achieved for large regions. Furthermore, scientists conducting biological surveys often already have preliminary but subjectively gathered species lists, and would like to assess the completeness of such lists, and/or to find a way to perfect them. We propose several strategies for utilizing external data (such as might be obtained using GIS) to aid in the completion of species lists. These include: (i) using existing species lists to develop predictive models; (ii) using the uniqueness of the environment as a guide to find underrepresented species; (iii) using spectral heterogeneity to locate environmentally heterogeneous regions; (iv) combining surveys with statistical model-building in an iterative manner. We demonstrate the potential of these approaches using simulation and case studies from Oklahoma. Copyright © 2002 John Wiley & Sons, Ltd.

342 citations


Journal ArticleDOI
TL;DR: In this article, a new technique for ozone forecasting is proposed, which considers stochastic processes with values in function spaces, and makes use of the essential characteristic of this type of phenomenon by taking into account theoretically and practically the continuous time evolution of pollution.
Abstract: In this article, we propose a new technique for ozone forecasting. The approach is functional, that is, we consider stochastic processes with values in function spaces. We make use of the essential characteristic of this type of phenomenon by taking into account, both theoretically and practically, the continuous-time evolution of pollution. One main methodological enhancement of this article is the incorporation of exogenous variables (wind speed and temperature) in those models. The application is carried out on a six-year data set of hourly ozone concentrations and meteorological measurements from Bethune (France). The study examines the summer periods because of the higher values observed. We explain the non-parametric estimation procedure for autoregressive Hilbertian models with or without exogenous variables (considering two alternative versions in the latter case) as well as for the functional kernel model. The comparison of all these models is based on up to 24-hour-ahead predictions of hourly ozone concentrations. We analyzed the daily forecast curves according to several criteria of two kinds: functional criteria, and aggregated criteria where attention is focused on the daily maximum. It appears that autoregressive Hilbertian models with exogenous variables show the best predictive power. Copyright © 2002 John Wiley & Sons, Ltd.

106 citations
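
As a loose illustration of the functional kernel idea mentioned in the abstract (not the autoregressive Hilbertian estimator itself), the sketch below predicts tomorrow's 24-hour ozone curve as a kernel-weighted average of historical next-day curves; the data and the bandwidth are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated hourly ozone curves: one row per day, 24 columns per hour.
n_days, hours = 120, np.arange(24)
base = 40 + 25 * np.sin(np.pi * (hours - 6) / 18).clip(min=0)   # diurnal shape
level = 1 + 0.6 * np.sin(2 * np.pi * np.arange(n_days) / 30)    # slow modulation
curves = level[:, None] * base + rng.normal(0, 4, (n_days, 24))

def kernel_predict(curves, today, h):
    """Predict tomorrow's 24-h curve as a kernel-weighted average of the
    day-after curves of historical days whose curves resemble `today`."""
    past, nxt = curves[:-1], curves[1:]              # (day t, day t+1) pairs
    d = np.linalg.norm(past - today, axis=1)         # L2 distance between curves
    w = np.exp(-0.5 * (d / h) ** 2)                  # Gaussian kernel weights
    return (w[:, None] * nxt).sum(axis=0) / w.sum()

train, today, actual_next = curves[:-2], curves[-2], curves[-1]
pred = kernel_predict(train, today, h=20.0)
rmse = np.sqrt(np.mean((pred - actual_next) ** 2))
print(f"RMSE of 24-hour-ahead functional kernel prediction: {rmse:.1f}")
```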


Journal ArticleDOI
TL;DR: In this article, three load estimation techniques are investigated using field data from an upland catchment in Wales, using approximately weekly concentration measurements with corresponding discharge measurements, supplemented by 15 min measurements of discharge.
Abstract: The transport of dissolved organic carbon (DOC) from land to ocean is a significant component of the global carbon cycle, and good estimates of the load transported in rivers are needed. Three load estimation techniques are investigated using field data from an upland catchment in Wales. The methods use approximately weekly concentration measurements with corresponding discharge measurement, supplemented by 15 min measurements of discharge. Annual load estimates are obtained for the years 1985–2000 with the following treatment of supplementary information: (i) exclusion using simple extrapolation; (ii) inclusion using the ratio method; (iii) inclusion using the rating curve method with a straight line fitted to the logarithms of load and discharge data. A variance estimate is given for the three load estimation methods. The influence of additional opportunistic DOC concentration measurements on load estimation is discussed and investigated. The importance of an unbiased sampling procedure for simple extrapolation is demonstrated, but the inclusion of opportunistic data from high discharge samples is shown to improve rating curve estimates of load, reducing their variance. The strong influence of parameter uncertainty in the load variance for the rating curve method is demonstrated, as well as the sensitivity of the method to model assumptions. Copyright © 2002 John Wiley & Sons, Ltd.

67 citations
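
A hedged sketch of method (iii), the rating curve approach, contrasted with simple extrapolation (i), using synthetic data in place of the Welsh catchment record; the Duan smearing factor used for back-transformation is a common choice, not necessarily the variance treatment used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# --- Synthetic data (illustrative only) -----------------------------------
# 15-min discharge record for one year and ~weekly DOC samples drawn from it.
n_15min = 365 * 96
logQ = 0.8 * np.sin(2 * np.pi * np.arange(n_15min) / (96 * 60)) + \
       rng.normal(0, 0.4, n_15min)
Q = np.exp(logQ)                                   # discharge, m3/s
C_true = 3.0 * Q ** 0.3 * np.exp(rng.normal(0, 0.2, n_15min))  # DOC, mg/l
weekly = np.arange(0, n_15min, 96 * 7)             # weekly sampling instants

# --- (iii) rating curve: log(load) regressed on log(Q) --------------------
load_w = C_true[weekly] * Q[weekly]                # sampled instantaneous load
X = np.column_stack([np.ones(weekly.size), np.log(Q[weekly])])
coef, *_ = np.linalg.lstsq(X, np.log(load_w), rcond=None)
resid = np.log(load_w) - X @ coef
smear = np.exp(resid).mean()                       # Duan smearing back-transform factor
load_15 = smear * np.exp(coef[0] + coef[1] * np.log(Q))
rating_curve_load = load_15.sum() * 900            # integrate over 900-s intervals

# --- (i) simple extrapolation from weekly samples only --------------------
simple_load = load_w.mean() * n_15min * 900

print(f"rating curve annual load : {rating_curve_load:.3e}")
print(f"simple extrapolation load: {simple_load:.3e}")
```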


Journal ArticleDOI
TL;DR: In this article, a spatial and temporal algorithm (geostatistical temporal-spatial or GTS) was developed for optimizing long-term monitoring (LTM) networks.
Abstract: In a pilot project, a spatial and temporal algorithm (geostatistical temporal-spatial or GTS) was developed for optimizing long-term monitoring (LTM) networks. Data from two monitored ground-water plumes were used to test the algorithm. The primary objective was to determine the degree to which sampling, laboratory analysis, and/or well construction resources could be pared without losing key statistical information concerning the plumes. Optimization of an LTM network requires an accurate assessment of both ground-water quality over time and trends or other changes in individual monitoring wells. Changes in interpolated concentration maps over time indicate whether ground-water quality has improved or declined. GTS separately identifies temporal and spatial redundancies. Temporal redundancy may be reduced by lengthening the time between sample collections. Spatial redundancy may be reduced by removing from the network wells that do not significantly affect the assessment of ground-water quality. Part of the temporal algorithm in GTS involves computation of a composite temporal variogram to determine the least redundant overall sampling interval. Under this measure of autocorrelation between sampling events, the lag time at which the variogram reaches a sill is the sampling interval at which same-well measurements lack correlation and are therefore non-redundant. The spatial algorithm assumes that well locations are redundant if nearby wells offer nearly the same statistical information about the underlying plume. A well was considered redundant if its removal did not significantly change: (i) an interpolated map of the plume; (ii) the local kriging variances in that section of the plume; and (iii) the average global kriging variance. To identify well redundancy, local kriging weights were accumulated into global weights and used to gauge each well's relative contribution to the interpolated plume map. By temporarily removing that subset of wells with the lowest global kriging weights and re-mapping the plume, it was possible to determine how many wells could be removed without losing critical information. Test results from the Massachusetts Military Reserve (MMR) indicated that substantial savings in sampling, analysis and operational costs could be realized by utilizing GTS. Annual budgetary savings that would accrue were estimated at between 35 per cent and 5 per cent for both LTM networks under study.

66 citations
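
The composite temporal variogram step can be illustrated as below; this is a generic sketch with simulated well series, not the GTS implementation, and the sill detection rule is an arbitrary choice.

```python
import numpy as np

def composite_temporal_variogram(series_by_well, max_lag):
    """Pooled semivariance gamma(h) over all wells for integer lags h (in
    sampling periods). series_by_well: list of 1-D concentration arrays."""
    gamma = np.zeros(max_lag)
    for h in range(1, max_lag + 1):
        sq_diffs = np.concatenate(
            [(s[h:] - s[:-h]) ** 2 for s in series_by_well if len(s) > h])
        gamma[h - 1] = 0.5 * sq_diffs.mean()
    return gamma

rng = np.random.default_rng(3)
# Simulated quarterly measurements in 12 wells with short-range autocorrelation.
wells = []
for _ in range(12):
    e = rng.normal(0, 1, 60)
    s = np.empty(60)
    s[0] = e[0]
    for t in range(1, 60):
        s[t] = 0.6 * s[t - 1] + e[t]          # AR(1): correlation dies off in a few lags
    wells.append(s)

gamma = composite_temporal_variogram(wells, max_lag=12)
sill = gamma[-4:].mean()                       # rough sill estimate from long lags
lag_at_sill = 1 + int(np.argmax(gamma >= 0.95 * sill))
print("semivariance by lag:", np.round(gamma, 2))
print(f"measurements ~uncorrelated beyond lag {lag_at_sill} sampling periods")
```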


Journal ArticleDOI
TL;DR: In this paper, the benthic index of biotic integrity (B-IBI) developed for the Chesapeake Bay was statistically verified using simulations and a suite of multivariate statistical techniques.
Abstract: The benthic index of biotic integrity (B-IBI) developed for the Chesapeake Bay was statistically verified using simulations and a suite of multivariate statistical techniques. The B-IBI uses a simple scoring system for benthic community metrics to assess benthic community health and to infer environmental quality of benthic habitats in the Bay. Overall, the B-IBI was verified as being sensitive, stable, robust and statistically sound. Classification effectiveness of the B-IBI increased with salinity, from marginal performance for tidal freshwater ecosystems to excellent results for polyhaline areas. The greater classification uncertainty in low salinity habitats may be due to difficulties in reliably identifying naturally unstressed areas or may be due to regional ecotones created by stress gradients. Pollution-indicative species abundance, pollution-sensitive species abundance, and diversity (Shannon's index) were the most important metrics in discriminating between degraded and non-degraded conditions in the majority of the habitats. Single metrics often performed as well as the multi-metric B-IBI in correctly classifying the relative quality of sites. However, the redundancy in the multi-metric B-IBI provided a stable 'weight of evidence' system which increased confidence in general conclusions. Confidence limits developed for the B-IBI scores were used to distinguish among habitats that were degraded, non-degraded, of intermediate quality, or of indeterminate condition.

55 citations


Journal ArticleDOI
TL;DR: Source profiles (pollution recipes) and their contributions are estimated by constrained nonlinear least squares, with the constraints given by nonnegativity and identifiability conditions on the model parameters.
Abstract: Multivariate receptor models aim to identify the pollution sources based on multivariate air pollution data. This article is concerned with estimation of the source profiles (pollution recipes) and their contributions (amounts of pollution). The estimation procedures are based on constrained nonlinear least squares methods with the constraints given by nonnegativity and identifiability conditions of the model parameters. We investigate several identifiability conditions that are appropriate in the context of receptor models, and also present new sets of identifiability conditions, which are often reasonable in practice when the other traditional identifiability conditions fail. The resulting estimators are consistent under appropriate identifiability conditions, and standard errors for the estimators are also provided. Simulation and application to real air pollution data illustrate the results. Copyright © 2002 John Wiley & Sons, Ltd.

47 citations
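
A simplified sketch of the constrained estimation problem (X ≈ GF with nonnegative contributions G and profiles F), solved here by alternating nonnegative least squares on simulated data; the paper's identifiability conditions and standard errors are not reproduced, and the source ordering is arbitrary.

```python
import numpy as np
from scipy.optimize import nnls

def alternating_nnls(X, q, n_iter=100, seed=0):
    """Fit X ~ G @ F with G >= 0 (source contributions) and F >= 0 (source
    profiles) by alternating nonnegative least squares."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    F = rng.uniform(0.1, 1.0, (q, p))
    G = np.zeros((n, q))
    for _ in range(n_iter):
        for i in range(n):                      # update contributions row-wise
            G[i], _ = nnls(F.T, X[i])
        for j in range(p):                      # update profiles column-wise
            F[:, j], _ = nnls(G, X[:, j])
        F /= F.sum(axis=1, keepdims=True) + 1e-12   # scale profiles to sum to 1
    return G, F

# Synthetic example: 2 sources, 6 chemical species, 100 ambient samples.
rng = np.random.default_rng(4)
F_true = np.array([[0.5, 0.3, 0.1, 0.05, 0.05, 0.0],
                   [0.0, 0.1, 0.2, 0.30, 0.20, 0.2]])
G_true = rng.gamma(2.0, 5.0, (100, 2))
X = G_true @ F_true + rng.normal(0, 0.05, (100, 6))

G_hat, F_hat = alternating_nnls(np.clip(X, 0, None), q=2)
print("estimated profiles (rows sum to 1):\n", np.round(F_hat, 2))
```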


Journal ArticleDOI
TL;DR: This work proposes a class of air quality indices that are simple to read and easy for citizens and policy-makers to understand, and recommends using more than one index of the selected class simultaneously and associating a measure of variability with every index.
Abstract: Interest in air quality indices has been increasing in recent years. This is strictly connected with the development and the easy availability of web-communication and on-line information. By means of web pages it is indeed possible to give quick and easy-to-consult information about air quality in a specific area. We propose a class of air quality indices which are simple to read and easy to understand by citizens and policy-makers. They are constructed in order to be able to compare situations that differ in time and space. In particular, interest is focused on situations where many monitoring stations are operating in the same area. In this case, which occurs frequently, air pollution data are collected according to three dimensions: time, space and type of pollutant. In order to obtain a synthetic value, the dimensions are reduced by means of aggregation processes that occur by successively applying some aggregating function. The final index may be influenced by the order of aggregation. The hierarchical aggregation here proposed is based on the successive selection of order statistics, i.e. on percentiles and on maxima. The variety of pollutants measured in each area imposes a standardization due to their different effects on the human health. This evaluation comes from epidemiological studies and influences the final value of the index. We propose to use simultaneously more than one index of the selected class and to associate a measure of variability with every index. Such measures of dispersion account for very important additional information. Copyright © 2002 John Wiley & Sons, Ltd.

45 citations
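
A small sketch of the hierarchical aggregation by order statistics, with illustrative pollutants, standardization values and aggregation order (the abstract notes that the order of aggregation can influence the final index).

```python
import numpy as np

rng = np.random.default_rng(5)

# data[time, station, pollutant]: hourly concentrations at several stations.
n_hours, n_stations = 24, 5
pollutants = ["PM10", "NO2", "O3"]
reference = np.array([50.0, 200.0, 120.0])        # illustrative standardization values
data = rng.gamma(4.0, 1.0, (n_hours, n_stations, 3)) * reference / 6

# Step 1: standardize each pollutant by its reference value.
std = data / reference                            # dimensionless sub-indices

# Steps 2-4: hierarchical aggregation by order statistics (maxima/percentiles).
by_pollutant = std.max(axis=2)                    # worst pollutant, per hour & station
by_station = np.percentile(by_pollutant, 90, axis=1)   # 90th percentile over stations
daily_index = by_station.max()                    # worst hour of the day

# A companion measure of variability across stations, as the abstract suggests.
spread = by_pollutant.std(axis=1).mean()

print(f"daily air quality index : {daily_index:.2f}")
print(f"between-station spread  : {spread:.2f}")
```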


Journal ArticleDOI
TL;DR: In this paper, the authors used a hierarchical Bayesian approach to predict average hourly concentrations of ambient PM10 in Vancouver, using a multivariate AR(3) temporal process with common spatial parameters.
Abstract: In this article we describe an approach for predicting average hourly concentrations of ambient PM10 in Vancouver. We know our solution also applies to hourly ozone fields and believe it may be quite generally applicable. We use a hierarchical Bayesian approach. At the primary level we model the logarithmic field as a trend model plus Gaussian stochastic residual. That trend model depends on hourly meteorological predictors and is common to all sites. The stochastic component consists of a 24-hour vector response that we model as a multivariate AR(3) temporal process with common spatial parameters. Removing the trend and AR structure leaves 'whitened' vector time series. With this approach (as opposed to using 24 separate univariate time series models), there is little loss of spatial correlation in these residuals compared with that in just the detrended residuals (prior to removing the AR component). Moreover, our multivariate approach enables predictions for any given hour to 'borrow strength' through its correlation with adjoining hours. On this basis we develop a spatial predictive distribution for these residuals at unmonitored sites. By transforming the predicted residuals back to the original data scales we can impute Vancouver's hourly PM10 field.

41 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used distance sampling (line transects) and mark-resight to estimate habitat and local area (area size, range: 3.9-44.5 ha) population density of ungulates.
Abstract: We used distance sampling (line transects) and mark-resight to estimate habitat and local area (area size, range: 3.9-44.5 ha) population density of ungulates. The distance sampling study was performed on fallow deer (Dama dama), roe deer (Capreolus capreolus) and wild boar (Sus scrofa) in a Mediterranean forest. The mark-resight study was performed on a roe deer population in a hilly area of the Apennines. The first study allowed us to estimate animal density in four different habitats (deciduous oak wood, evergreen oak wood, maquis and open areas with domestic-pine woods). The between-habitat differences in population density are large for the three species: fallow deer are more abundant in the open areas (22.22 heads/km², c.i. 12.42-39.74), roe deer in the deciduous oak wood (14.50 heads/km², c.i. 7.01-30.10) and wild boar in both the open areas (11.29 heads/km², c.i. 4.86-26.20) and the evergreen oak wood (10.42 heads/km², c.i. 6.78-16.02). The roe deer population in the Apennines is characterized by large between-zone variations in population density (range: 1997: 13.25-131.31; 1998: 29.06-78.01; 1999: 10.67-88.58). Moreover, trends in local zones may be quite different from the average trend of a wider study area, suggesting a well-defined short-scale spatial structure for this population. We conclude that both survey methods may be very useful in population assessment, but they need intense field effort and careful statistical design. Care needs to be taken to satisfy the assumptions of the statistical models.
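
A textbook-style sketch of line-transect density estimation with a half-normal detection function, on simulated perpendicular distances; the authors' actual analyses would involve model selection and variance estimation beyond this.

```python
import numpy as np

rng = np.random.default_rng(6)

# Perpendicular distances (m) of detected animals from the transect line.
# Simulated from a half-normal detection process with sigma = 25 m.
sigma_true, n_detected, total_line_km = 25.0, 80, 30.0
x = np.abs(rng.normal(0, sigma_true, n_detected))

# Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2)).
# MLE of sigma^2 under the half-normal distance density is sum(x^2) / n.
sigma2_hat = np.sum(x ** 2) / n_detected
mu_hat = np.sqrt(np.pi * sigma2_hat / 2)        # effective strip half-width (m)

# Line-transect density estimator: D = n / (2 * L * mu).
L_m = total_line_km * 1000.0
D_per_m2 = n_detected / (2 * L_m * mu_hat)
D_per_km2 = D_per_m2 * 1e6

print(f"effective strip half-width: {mu_hat:.1f} m")
print(f"estimated density         : {D_per_km2:.1f} heads/km²")
```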

Journal ArticleDOI
TL;DR: In this paper, an indicator kriging-based approach is used to account for measurement errors in the modeling of uncertainty prevailing at unsampled locations; probability field simulation is then used to assess the probability that the average pollutant concentration within remediation units exceeds a regulatory threshold, and probability maps are used to identify hazardous units that need to be remediated.
Abstract: In many environmental studies spatial variability is viewed as the only source of uncertainty while measurement errors tend to be ignored. This article presents an indicator kriging-based approach to account for measurement errors in the modeling of uncertainty prevailing at unsampled locations. Probability field simulation is then used to assess the probability that the average pollutant concentration within remediation units exceeds a regulatory threshold, and probability maps are used to identify hazardous units that need to be remediated. This approach is applied to two types of dioxin data (composite and single spoon samples) with different measurement errors which were collected at the Piazza Road dioxin site, an EPA Superfund site located in Missouri. A validation study shows that the proportion of contaminated soil cores provides a reasonable probability threshold to identify hazardous remediation units. When a lower probability threshold is chosen, the total remediation costs are unreasonably high while false negatives are unacceptably frequent for a higher probability threshold. The choice of this threshold becomes critical as the sampling density decreases.

Journal ArticleDOI
TL;DR: In this article, the bivariate lognormal distribution is proposed as a model for the joint distribution of storm peak (maximum rainfall intensity) and storm amount, which is suitable for representing multiple episodic storm events at the Motoyama meteorological observation station in Japan.
Abstract: The bivariate lognormal distribution is proposed as a model for the joint distribution of storm peak (maximum rainfall intensity) and storm amount. Using the marginal distributions, the joint distribution, the conditional distributions, and the associated return periods are derived. The model is found appropriate for representing multiple episodic storm events at the Motoyama meteorological observation station in Japan. Copyright © 2002 John Wiley & Sons, Ltd.
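
A sketch of the model on simulated storm data: the bivariate lognormal is fitted by estimating the bivariate normal law of the logs, and a joint return period follows from the joint exceedance probability (the design event and the assumed mean annual storm count are illustrative).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(7)

# Observed storms: (peak intensity mm/h, storm amount mm), lognormally related.
mean_log = np.array([2.5, 3.5])
cov_log = np.array([[0.30, 0.18],
                    [0.18, 0.40]])
storms = np.exp(rng.multivariate_normal(mean_log, cov_log, size=300))

# --- Fit the bivariate lognormal: estimate the normal law of the logs -----
logs = np.log(storms)
mu_hat = logs.mean(axis=0)
S_hat = np.cov(logs, rowvar=False)

# --- Joint exceedance probability and return period -----------------------
peak0, amount0 = 60.0, 150.0                     # design event of interest
z = np.log([peak0, amount0])
F_joint = multivariate_normal(mu_hat, S_hat).cdf(z)
F_peak = norm(mu_hat[0], np.sqrt(S_hat[0, 0])).cdf(z[0])
F_amt = norm(mu_hat[1], np.sqrt(S_hat[1, 1])).cdf(z[1])
p_exceed_both = 1 - F_peak - F_amt + F_joint     # P(peak > p0 and amount > a0)

storms_per_year = 20.0                           # assumed mean annual storm count
return_period = 1.0 / (storms_per_year * p_exceed_both)
print(f"P(peak > {peak0}, amount > {amount0}) = {p_exceed_both:.4f}")
print(f"approximate joint return period      = {return_period:.1f} years")
```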

Journal ArticleDOI
TL;DR: In this paper, the authors proposed models and methods for the estimation of a no effect concentration (NEC), which is included as a threshold parameter in a non-linear model.
Abstract: The use of a no effect concentration (NEC), instead of the commonly used no observed effect concentration (NOEC), has been advocated recently. In this article models and methods for the estimation of an NEC are proposed and it is shown that the NEC overcomes many of the objections to the NOEC. The NEC is included as a threshold parameter in a non-linear model. Numerical methods are then used for point estimation and several techniques are proposed for interval estimation (based on bootstrap, profile likelihood and asymptotic normality). The adequacy of these methods is empirically confirmed by the results of a simulation study. The profile likelihood based interval has emerged as the best method. Finally the methodology is illustrated with data obtained from a 21 day Daphnia magna reproduction test with a reference substance, 3,4-dichloroaniline (3,4-DCA), and with a real effluent. Copyright © 2002 John Wiley & Sons, Ltd.
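
A sketch of the NEC as a threshold parameter in a non-linear model, fitted by nonlinear least squares with a bootstrap interval; the model form (constant below the NEC, exponential decline above) and the data are illustrative, not the paper's exact specification.

```python
import numpy as np
from scipy.optimize import curve_fit

def nec_model(conc, y0, slope, nec):
    """Mean response: constant y0 below the NEC, exponential decline above it."""
    return y0 * np.exp(-slope * np.clip(conc - nec, 0, None))

rng = np.random.default_rng(8)
conc = np.repeat([0, 0.5, 1, 2, 4, 8, 16], 10)          # test concentrations (mg/l)
true = nec_model(conc, y0=60, slope=0.25, nec=1.5)      # e.g. offspring counts
y = rng.poisson(true).astype(float)

# Point estimation by nonlinear least squares.
p0 = [y.max(), 0.1, 1.0]
popt, pcov = curve_fit(nec_model, conc, y, p0=p0,
                       bounds=([0, 0, 0], [np.inf, np.inf, conc.max()]))

# Bootstrap interval for the NEC (one of the interval methods the paper compares).
boot_nec = []
for _ in range(300):
    idx = rng.integers(0, len(y), len(y))
    try:
        b, _ = curve_fit(nec_model, conc[idx], y[idx], p0=popt,
                         bounds=([0, 0, 0], [np.inf, np.inf, conc.max()]))
        boot_nec.append(b[2])
    except RuntimeError:
        pass                                             # skip non-converged resamples
lo, hi = np.percentile(boot_nec, [2.5, 97.5])
print(f"NEC estimate: {popt[2]:.2f} mg/l  (bootstrap 95% CI: {lo:.2f}-{hi:.2f})")
```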

Journal ArticleDOI
TL;DR: In this paper, the authors developed several models for particulate matter in an urban region that allow spatial dependence to be represented in different manners over a time period of one year, based on a Markov random field approach, and a conceptualization of observed data as arising from two random processes, a conditionally independent observation process and a spatially dependent latent pollution process.
Abstract: Researchers are beginning to realize the need to take spatial structure into account when modeling data on air pollutants. We develop several models for particulate matter in an urban region that allow spatial dependence to be represented in different manners over a time period of one year. The models are based on a Markov random field approach, and a conceptualization of observed data as arising from two random processes, a conditionally independent observation process and a spatially dependent latent pollution process. Optimal predictors are developed for both of these processes, and predictions of the observation process are used for model assessment.

Journal ArticleDOI
TL;DR: In this article, the Chesapeake Bay Benthic Index of Biotic Integrity (B-IBI) and the Environmental Monitoring and Assessment Program (EMAP-VP BI) were compared.
Abstract: The Chesapeake Bay Benthic Index of Biotic Integrity (B-IBI) and the Environmental Monitoring and Assessment Program’s Virginian Province Benthic Index (EMAP-VP BI) were applied to 294 sampling events in Chesapeake Bay and the results were compared. These estuarine benthic indices are intended to identify benthic invertebrate assemblages that have been degraded by low dissolved oxygen concentrations or high concentrations of chemical contaminants. The B-IBI includes several community measures and weights them equally using a simple scoring system that compares them against values expected for undegraded sites. It includes 11 measures of species diversity, productivity, indicator species and trophic composition. The EMAP-VP BI uses discriminant function coefficients to weight contributions of species diversity and the abundances of two indicator families. The two indices agreed on degraded or undegraded classifications for benthos at 81.3% of the sites. This level of agreement is within the level of accuracy achieved during index development and, therefore, may approach the limits that can be achieved. The indices were strongly associated (Pearson’s r = 0.75). The B-IBI was more conservative than the EMAP-VP BI, classifying 72.7% of the disagreements as degraded. The 55 sites where the indices disagreed were distributed in different habitats throughout the Bay except polyhaline sand. Many of the classification disagreements were at sites with index values close to, but on opposite sides of, the degraded–undegraded thresholds, with 49.1% of the B-IBI values within 0.5 units and 81.8% within 1.0 units; the corresponding values for sites where both indices agreed were only 23.4% and 62.7%, respectively. The pattern for the EMAP-VP BI was similar, with 61.8% and 74.6% of disagreements and only 18.8% and 38.9% of agreements within 0.5 and 1.0 units of the threshold. Although the close agreement suggests that either index is suitable for evaluating the benthic condition, the B-IBI offers some additional advantages. Copyright © 2002 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: An alternative SVIS is investigated that takes direct account of uncertainty and variation, and is statistically interpretable for the illustrative case of independently and identically normally distributed pollution data, using a more statistically appropriate and less conservative principle than the ‘divide-by-n’ rule.
Abstract: Current environmental standards commonly consist of a statement of some upper or lower limit to be met by pollutant levels ‘at large’ or of prescriptions of required outcomes from sampling procedures, with no consideration of the effects of uncertainty or variation. Barnett and O'Hagan (1997) classified these as ideal standards and realizable standards, respectively, concluding that neither form was satisfactory and recommending that they be replaced with statistically meaningful standards. Such standards should conjointly incorporate a regulatory limit coupled with a standard that provides some prescribed level of statistical assurance that the limit is actually being met, and are termed statistically verifiable ideal standards (SVISs). This recommendation was endorsed by the U.K. government Royal Commission on Environmental Pollution (1998). In many cases, realizable standards specify a particular sampling scheme and the outcome that is required from such a scheme to demonstrate compliance with the standard, but without consideration of the statistical interpretation of the outcome that might infer some property of the underlying population. The sampling methods take various forms and may feature composite sampling. One example of a set of realizable standards that consider composite sampling is found in the Australian Environmental Investigation Limits (EILs) for contaminated land sites (ANZECC/NHMRC, 1992). Under specific sampling guidelines (Standards Australia, 1997), composite sampling is allowed only if the conservative ‘divide-by-n’ principle is employed for adjustment of the standard limit. Using this set of standards and the associated sampling guidelines as motivation, we investigate an alternative SVIS that takes direct account of uncertainty and variation, and is statistically interpretable. The approach is developed initially for the illustrative case of independently and identically normally distributed pollution data, using a more statistically appropriate and less conservative principle than the ‘divide-by-n’ rule. We also consider other distributional assumptions for the pollution data. Copyright © 2002 John Wiley & Sons, Ltd.
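
A small simulation, under the illustrative i.i.d. normal assumption, contrasting the conservative 'divide-by-n' compliance rule for an n-increment composite with a rule that demands statistical assurance about the site mean; the specific alternative rule shown here is illustrative, not the SVIS derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(15)

# i.i.d. normal contamination measurements on a site; regulatory limit L.
L_limit, mu, sigma, n, n_sites = 100.0, 80.0, 25.0, 9, 20000

# Simulate many hypothetical sites whose true mean is below the limit.
increments = rng.normal(mu, sigma, (n_sites, n))
composite = increments.mean(axis=1)              # one composite of n increments per site

# Rule A ('divide-by-n'): the composite must not exceed L / n.
pass_divide_by_n = composite <= L_limit / n

# Rule B (illustrative SVIS-style rule): require statistical assurance that the
# site mean is below L, via a one-sided 95% upper confidence bound computed
# from the composite and an assumed known increment sigma.
ucb = composite + 1.645 * sigma / np.sqrt(n)
pass_ucb = ucb <= L_limit

print(f"compliance rate, divide-by-n rule : {pass_divide_by_n.mean():.3f}")
print(f"compliance rate, 95% UCB rule     : {pass_ucb.mean():.3f}")
```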

Journal ArticleDOI
TL;DR: In this paper, two new systems for modeling, estimating, and predicting a multivariate spatio-temporal process are given, one based on a linear or nonlinear within-cylinder trend function using minimum distance, and the other based on an autocovariance function of residual processes.
Abstract: Two new systems for modeling, estimating, and predicting a multivariate spatio–temporal process are given. In the first system, a cylinder containing a fraction of the multivariate data with axis along the temporal dimension is centered at a spatio–temporal prediction location. After fitting a linear or nonlinear within-cylinder trend function using minimum distance, an optimal prediction of the observable multivariate process is computed at the cylinder's center. The local nature of this system makes it suited to the analysis of large data sets often associated with ecosystem analyses. The second system consists of a global model of both the spatio–temporal trend and autocovariance function of the residual process. Each of these is formed by the weighted sum of component global models with weights computed from spatio–temporal kernels centered at each component's most characteristic spatio–temporal location. Minimum distance is again used to estimate all parameters. Predictions are then computed from this fitted global model. Spatio–temporal autocovariance within and between residual processes is modeled with parametric functions and need not be separable in space and time. The fast Fourier transform is used to approximately verify Hermitian positive semi-definiteness of the covariogram matrix. Asymmetry of the cross-covariogram in the temporal dimension is modeled as the sum of parametric even and odd functions. These functions have interpretable parameters. Long temporal memory is also modeled with an interpretable parametric function—obviating the need for fractional differencing across time. These covariance structure models are believed to be new. The use of a combined John–Draper and Box–Cox transformation allows continuous, count, categorical–ordinal, and categorical–nominal process data to be fitted within the same modeling, estimation, and prediction system, thus allowing mixed continuous–discrete multivariate predictions to be computed that take into account correlations between all constituent processes. Numerical examples consist of, first, the local system being used to redesign a spatial monitoring network with simulated data, and then both systems being applied to several different real data sets: multivariate spatio–temporal data (prediction of sulfate deposition in the conterminous U.S. with and without nitrate observations), multivariate temporal-only data (prediction of mink–muskrat pelt counts), and univariate temporal-only data with long memory (prediction of 5000 years of annual Lake Saki mud thickness). Copyright © 2002 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this article, a one-dimensional numerical model was developed and used to simulate the fate and transport of methyl bromide from a fumigated field, and three boundary conditions were used to assess their accuracy in predicting the volatilization rates.
Abstract: Due to concerns about public health and environmental contamination, there has been great interest in improving our understanding of the processes and mechanisms that affect pesticide emissions from fields. For many situations, predicting pesticide volatilization has been limited to simple situations that often neglect important environmental conditions such as changes in ambient temperature and/or the effect of micrometeorological conditions. Recent research has shown that changes in ambient temperature can strongly affect methyl bromide (MeBr) volatilization under field conditions. Little research has been conducted that couples atmospheric processes to the volatilization of pesticides from soils. A field study was conducted to measure the volatilization of methyl bromide from a 3.5 ha field. Four methods were used to obtain the volatilization rate as a function of time. A one-dimensional numerical model was developed and used to simulate the fate and transport of methyl bromide from the fumigated field. The numerical simulation simultaneously solves water, heat, and solute transport equations including chemical transport in the vapor phase. Three volatilization boundary conditions were used to assess their accuracy in predicting the volatilization rates. The first two boundary conditions follow stagnant boundary layer theory and use no atmospheric information. For these boundary conditions, one assumes isothermal conditions and the other assumes temperature-dependent conditions. The third boundary condition couples soil and atmospheric processes and was found to provide an accurate and credible simulation of the instantaneous volatilization rates compared to a stagnant boundary layer condition. For some information such as cumulative emissions, the simulations for each boundary condition provided similar results. This indicates that simplified methods may be appropriate for obtaining certain information.

Journal ArticleDOI
TL;DR: Design and model based approaches in sampling and experiments, and in particular studies which combine both elements, are examined in this article.
Abstract: Sampling generally concerns how a sample of units is selected from a population, while experiments deal with the effects of a treatment or exposure on units and are concerned with the assignment of treatments to units. Real studies typically involve elements of both, with varying control by investigators over sample selection and treatment assignment aspects. Design and model based approaches in sampling and experiments, and in particular studies which combine both elements, are examined in this article. Within a model based approach design based methods can be used based on a conditioning argument which is necessarily somewhat more complex in the case of experimental studies than in studies involving only sampling.

Journal ArticleDOI
TL;DR: In this paper, a parametric model which includes seasonal fractionally integrated components, self-exciting threshold autoregressive components, covariates and autoregressive conditionally heteroscedastic errors with high tails is introduced.
Abstract: The problem of describing hourly data of ground ozone is considered. The complexity of high frequency environmental data dynamics often requires models covering covariates, multiple frequency periodicities, long memory, non-linearity and heteroscedasticity. For these reasons we introduce a parametric model which includes seasonal fractionally integrated components, self-exciting threshold autoregressive components, covariates and autoregressive conditionally heteroscedastic errors with high tails. For the general model, we present estimation and identification techniques. To show the model descriptive capability and its use, we analyse a five year hourly ozone data set from an air traffic pollution station located in Bergamo, Italy. The role of meteo and precursor covariates, periodic components, long memory and non-linearity is assessed. Copyright © 2002 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This work addresses the issue of improving the forecasting capability of the single indices by combining them in a nonparametric multidimensional regression model, and applying discriminant analysis to the resulting predicted values.
Abstract: Forecasting aircraft (clear-air) turbulence is currently based on a system of observations by pilots combined with a mostly subjective evaluation of turbulence indices derived from numerical weather prediction models. We address the issue of improving the forecasting capability of the single indices by combining them in a nonparametric multidimensional regression model, and applying discriminant analysis to the resulting predicted values. Thus we enhance the predictive skills of the indices considered in isolation and provide a more robust algorithm. We adopt the paradigm of flexible discriminant analysis (FDA), and use multivariate adaptive regression splines (MARS) and neural networks (NN) in the regression stage. The data for this case study covers the period 12–15 March 1999, for the United States. Results of the analyses suggest that our statistical approach improves upon current practice to the point that it holds promise for operational forecasts. Copyright © 2002 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this article, a contaminant plume is described by a function defined in space-time, which can quantify characteristics such as the total volume of the plume, the total concentration of the contaminant, rates of change of the volume, and rates of increase or decrease of concentration.
Abstract: A contaminant plume might be described by a function defined in space–time. Spatial integrals or time derivatives of this function as well as time derivatives of spatial integrals will quantify characteristics such as the total volume of the plume, the total concentration of the contaminant in the plume, rates of change of the volume, and rates of change of concentration. The plume function usually cannot be derived in analytic form but instead must be estimated or approximated. The dual form of the kriging estimator, which is equivalent to the use of radial basis functions, provides a tool for modeling this function in analytic form. The extension of the kriging estimator, in its usual form or in its dual form, to space–time poses no problems since the estimator and the equations are essentially dimension free. The difficulty is an adequate choice of space–time variograms or covariances. The product–sum and integrated product–sum models provide an extensive array of valid models and also lead to a simple process for fitting the models by the use of marginal variograms. Examples are given and an application to air pollution data from the Milan District (Italy) illustrates the method. Copyright © 2002 John Wiley & Sons, Ltd.
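
A sketch of the dual-kriging/radial-basis-function idea: an analytic plume function is built from scattered space-time observations with a generic thin-plate RBF (not the product-sum space-time models of the paper), and spatial integrals and their time derivatives are approximated numerically.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(9)

# Scattered space-time observations of a contaminant concentration c(x, y, t).
n_obs = 300
pts = rng.uniform(0, 1, (n_obs, 3))                        # columns: x, y, t

def true_plume(p):
    x, y, t = p[:, 0], p[:, 1], p[:, 2]
    return 100 * np.exp(-((x - 0.3 - 0.3 * t) ** 2 + (y - 0.5) ** 2) / 0.02)

c_obs = true_plume(pts) + rng.normal(0, 2, n_obs)

# Dual-kriging-style analytic model of the plume: RBF interpolation in (x, y, t).
plume = RBFInterpolator(pts, c_obs, kernel="thin_plate_spline", smoothing=1.0)

# Spatial integral of the plume at time t0, approximated on a grid.
t0, n_grid = 0.5, 80
gx, gy = np.meshgrid(np.linspace(0, 1, n_grid), np.linspace(0, 1, n_grid))
grid = np.column_stack([gx.ravel(), gy.ravel(), np.full(gx.size, t0)])
conc = np.clip(plume(grid), 0, None)                       # negative artefacts -> 0
cell_area = (1.0 / n_grid) ** 2
total = conc.sum() * cell_area

# Rate of change of the spatial integral: finite difference in time.
grid_dt = grid.copy()
grid_dt[:, 2] += 0.01
total_dt = np.clip(plume(grid_dt), 0, None).sum() * cell_area
print(f"spatial integral at t={t0}: {total:.2f}")
print(f"d/dt of spatial integral : {(total_dt - total) / 0.01:.2f}")
```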

Journal ArticleDOI
TL;DR: In this article, a Bayesian Transformed Gaussian (BTG) random field model was used to model the nitrogen concentration in the Chesapeake Bay, which combines the Box-Cox family of power transformations and a spatial trend.
Abstract: Extreme concentrations of water quality variables can cause serious adverse effects in an ecosystem, making their detection an important environmental issue. In Chesapeake Bay, a decreasing gradient of total nitrogen concentration extends from the highest values in the north at the mouth of the Susquehanna river to the lowest values in the south near the Atlantic ocean. We propose a general definition of ‘hot spot’ that includes previous definitions and is appealing for processes with a spatial trend. We model these data using the Bayesian Transformed Gaussian (BTG) random field model proposed by De Oliveira et al. (1997), which combines the Box–Cox family of power transformations and a spatial trend. The median function is used as the measure of spatial trend, which offers some advantages over the customarily used mean function. The BTG model is fitted by an enhanced Monte Carlo algorithm, and the methodology is applied to the nitrogen concentration data. Copyright © 2002 John Wiley & Sons, Ltd.
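
A deliberately simplified, non-Bayesian caricature of the ingredients mentioned in the abstract (Box-Cox transformation, spatial trend, hot spots defined relative to the trend); the BTG model itself involves kriging and Monte Carlo fitting that are not reproduced here, and all data and thresholds below are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(16)

# Total nitrogen observations along a north-south gradient (illustrative).
n = 200
lat = rng.uniform(36.9, 39.5, n)                 # pseudo-latitude within the Bay
mean_tn = 0.3 + 0.55 * (lat - 36.9) / (39.5 - 36.9)
tn = mean_tn * np.exp(rng.normal(0, 0.35, n))    # skewed, trend-dependent data

# Box-Cox transformation estimated by maximum likelihood (scipy profiles lambda).
tn_bc, lam = stats.boxcox(tn)

# Spatial trend on the transformed scale: simple linear trend in latitude.
X = np.column_stack([np.ones(n), lat])
beta, *_ = np.linalg.lstsq(X, tn_bc, rcond=None)
resid = tn_bc - X @ beta

# 'Hot spots' relative to the trend: sites whose residual exceeds a high quantile,
# i.e. unusually high concentration given the location's expected level.
threshold = np.quantile(resid, 0.95)
hot = resid > threshold
print(f"Box-Cox lambda: {lam:.2f}; flagged {hot.sum()} candidate hot spots")
print("latitudes of flagged sites:", np.round(np.sort(lat[hot]), 2))
```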

Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of estimating the population mean and standard deviation based on a ranked set sample with some data being censored, and proposed maximum likelihood estimators when the data are assumed to follow a lognormal distribution.
Abstract: The ranked set sampling (RSS) technique has been shown to be superior to classical simple random sampling (SRS) in the sense that it always provides a more precise estimator of the population mean. However, it is quite often that some measurements are below the limit of detection and hence become censored. In such situations, the superiority of RSS over SRS may no longer be held. In this article we consider the problem of estimating the population mean and standard deviation based on a ranked set sample with some data being censored. Maximum likelihood estimators are proposed when the data are assumed to follow a lognormal distribution. In the case where the distribution is unknown, a variant of the Kaplan–Meier estimator is proposed in the estimation of the population mean. A simulation study is conducted to compare the performance of the proposed RSS estimators with the corresponding SRS estimators. The impact of imperfect judgment ranking is also discussed. The proposed methods are applied to a real data set on mercury concentration in swordfish. Copyright © 2002 John Wiley & Sons, Ltd.
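
A sketch of lognormal maximum likelihood with non-detects (values below a detection limit contribute through the CDF at the limit); for simplicity it treats the data as one pooled sample and ignores the ranked-set structure that the paper's estimators exploit, and all numbers are simulated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(10)

# Mercury concentrations (illustrative), lognormal, with a detection limit.
mu_true, sigma_true, DL, n = 0.0, 0.6, 0.7, 120
conc = np.exp(rng.normal(mu_true, sigma_true, n))
detected = conc >= DL                          # non-detects are only known to be < DL

def neg_loglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    z_obs = (np.log(conc[detected]) - mu) / sigma
    # detected values: lognormal density; non-detects: P(X < DL)
    ll = np.sum(norm.logpdf(z_obs) - np.log(sigma) - np.log(conc[detected]))
    ll += (~detected).sum() * norm.logcdf((np.log(DL) - mu) / sigma)
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Population mean and s.d. of a lognormal from (mu, sigma) of the logs.
mean_hat = np.exp(mu_hat + sigma_hat ** 2 / 2)
sd_hat = mean_hat * np.sqrt(np.exp(sigma_hat ** 2) - 1)
print(f"censoring rate: {(~detected).mean():.0%}")
print(f"estimated mean {mean_hat:.3f}, sd {sd_hat:.3f}")
```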

Journal ArticleDOI
TL;DR: In this article, the authors proposed the jackknife procedure in order to reduce the bias of the diversity index estimator for finite samples, where the study area is partitioned into a frame of sub-areas, and a suitable design is constituted by adaptive sampling.
Abstract: Diversity indices are adopted in surveys of biological populations to quantify species diversity. However, when the population is clustered and spread over a very wide area, usual sampling designs provide estimators with large variances. In this case, if the study area is partitioned into a frame of sub-areas, a suitable design is adaptive sampling. Adaptive sampling ensures that the abundance vector estimator is unbiased and more accurate than that obtained with simple random sampling. However, the corresponding diversity index estimator, which can be viewed as a function of the abundance vector estimator, is biased for finite samples. Accordingly, we propose the jackknife procedure in order to reduce the bias.
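
A minimal sketch of the jackknife bias correction applied to a plug-in Shannon diversity index computed from sampled units; the adaptive-sampling design itself is not simulated here, and the community data are illustrative.

```python
import numpy as np

def shannon(counts):
    """Plug-in Shannon diversity index H' = -sum(p_i log p_i)."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def jackknife_shannon(unit_counts):
    """First-order jackknife of the Shannon index computed from a sample of
    units (e.g. quadrats or sampling networks); unit_counts is an
    (n_units x n_species) abundance matrix."""
    n = unit_counts.shape[0]
    h_all = shannon(unit_counts.sum(axis=0))
    h_loo = np.array([shannon(np.delete(unit_counts, i, axis=0).sum(axis=0))
                      for i in range(n)])
    return n * h_all - (n - 1) * h_loo.mean()     # bias-corrected estimate

rng = np.random.default_rng(11)
# Clustered community: a few abundant species, many rare ones, 15 sampled units.
true_p = np.sort(rng.dirichlet(np.full(25, 0.3)))[::-1]
units = rng.multinomial(30, true_p, size=15)

print(f"true Shannon index        : {shannon(true_p * 1000):.3f}")
print(f"plug-in estimate          : {shannon(units.sum(axis=0)):.3f}")
print(f"jackknife-corrected value : {jackknife_shannon(units):.3f}")
```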

Journal ArticleDOI
TL;DR: In this article, nonparametric estimators for the distribution function and the mean of Y utilizing the concomitant variable and auxiliary information in a ranked set sampling setup are proposed.
Abstract: The method of ranked set sampling is widely applicable in environmental research, mainly in the estimation of the mean and distribution function of the variable of interest, Y. Ranking of the Ys by visual judgment may sometimes be imperfect. When the Ys are expensive to measure, it would be more convenient to determine the ‘rankings’ of the Ys by a concomitant variable, X, which is relatively easy and cheap to measure. Unless extra distributional or linearity assumptions are made, the estimation methods available in the literature use the information carried in X only to determine the rankings of the Ys. However, these assumptions may be too stringent in environmental research. Nonparametric estimators for the distribution function and the mean of Y utilizing the concomitant variable and auxiliary information in a ranked set sampling setup are proposed in this article. The estimators are robust to model misspecification, and the performance of the estimators is highly satisfactory, supported by some simulation studies. The estimators are applied to a real data set to estimate the mean and distribution function of plutonium concentration in surface soil on the Nevada Test Site, Nevada, U.S.A. Copyright © 2002 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the authors examined three types of double sampling methods (ranked set sampling, weighted double sampling and double sampling with ratio estimation), with accompanying examples from Oregon stream habitat data.
Abstract: Environmental sampling can be difficult and expensive to carry out. Those taking the samples would like to integrate their knowledge of the system of study or their judgment about the system into the sample selection process to decrease the number of necessary samples. However, mere convenience or non-random sampling can severely limit statistical inference. Methods do exist that integrate prior knowledge into a random sampling procedure that allows for valid statistical inference. Double sampling methods use this extra information to select samples for measurement, thus reducing the number of necessary samples (in order to achieve a desired objective) and thereby reducing sampling costs. The level of prior information required can range from a linear relationship with a known auxiliary variable to simple ranking based on auxiliary information. We examine three types of double sampling methods (ranked set sampling, weighted double sampling and double sampling with ratio estimation), with accompanying examples from Oregon stream habitat data. All three methods can provide increased precision and/or lower sampling costs over simple random sampling. The appropriate double sampling method for the data and research situation depends upon the type of prior information available. The categories of prior information are summarized in a table and illustrated using the example data.
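
A sketch of double sampling with ratio estimation on simulated stream data: the cheap auxiliary variable is measured on a large first-phase sample, the expensive variable only on a second-phase subsample (variable names and the linear relationship are illustrative).

```python
import numpy as np

rng = np.random.default_rng(12)

# Auxiliary variable x is cheap to measure (e.g. stream width from maps);
# y is the expensive habitat measurement of interest, roughly proportional to x.
N_phase1, n_phase2 = 400, 40
x_all = rng.gamma(5.0, 2.0, N_phase1)                  # large first-phase sample
y_all = 3.0 * x_all + rng.normal(0, 3.0, N_phase1)     # y measured only on a subsample

sub = rng.choice(N_phase1, n_phase2, replace=False)    # second-phase subsample
x_sub, y_sub = x_all[sub], y_all[sub]

# Double sampling (two-phase) ratio estimator of the mean of y:
#   ybar_ratio = (ybar_2 / xbar_2) * xbar_1
ratio_hat = y_sub.mean() / x_sub.mean()
ybar_ratio = ratio_hat * x_all.mean()

print(f"subsample-only mean of y : {y_sub.mean():.2f}")
print(f"two-phase ratio estimate : {ybar_ratio:.2f}")
print(f"'true' phase-1 mean of y : {y_all.mean():.2f}")
```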

Journal ArticleDOI
TL;DR: This incomplete overview aims to encourage stochasticians to take more interest in wind problems, and it is hoped that meteorologists will help them with the high-quality data needed to verify the proposed models.
Abstract: This article is an attempt to summarize some of the existing problems in stochastic aspects of wind. Different types of wind are listed with their specific properties. For most of them no statistical model or stochastic process has been constructed as yet. At the same time, existing problems with data are very diverse and possible improvements are proposed. For example, the quality of wind speed and wind direction data might be upgraded by a careful inclusion of measurable covariates while developing models. Other problems concern extreme winds, which can hardly be measured accurately. In this connection, interesting and important questions for insurance companies and for construction engineers can be tackled by applying extreme value theory. This admittedly incomplete overview aims to encourage stochasticians to take more interest in wind problems. At the same time it is hoped that meteorologists will help them with the high-quality data needed to verify the proposed models. Copyright © 2002 John Wiley & Sons, Ltd.
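
As a small illustration of the extreme value route mentioned for extreme winds, the sketch below fits a GEV distribution to simulated annual maximum gusts and reads off a return level (all numbers are illustrative).

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(13)

# Annual maximum gust speeds (m/s) at one station; simulated here, but in
# practice taken from a long, quality-controlled wind record.
years = 50
annual_max = genextreme.rvs(c=-0.1, loc=28.0, scale=4.0, size=years,
                            random_state=rng)

# Fit a GEV distribution to the block (annual) maxima.
c_hat, loc_hat, scale_hat = genextreme.fit(annual_max)

# T-year return level: the gust speed exceeded on average once every T years.
T = 50
return_level = genextreme.ppf(1 - 1 / T, c_hat, loc=loc_hat, scale=scale_hat)
print(f"GEV shape {c_hat:.2f}, location {loc_hat:.1f}, scale {scale_hat:.1f}")
print(f"estimated {T}-year return gust: {return_level:.1f} m/s")
```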

Journal ArticleDOI
TL;DR: In this paper, two stochastic models that capture the main features of daily exposure of global radiation in Kuwait are proposed, which are based on removing the annual periodicity and seasonal variation of solar radiation.
Abstract: Two stochastic models that capture the main features of daily exposure of global radiation in Kuwait are proposed. The development of these models is based on removing the annual periodicity and seasonal variation of solar radiation. Thus the daily radiation is decomposed as the sum of the trend component and a stochastic component. In many situations, there are dramatic changes in the radiation series through the year due to the condition of the weather, as is the case of the data from Kuwait. This would affect the accuracy of the model, and therefore the series is divided into two regimes: one corresponds to clear days where the value of the global radiation would be normal and the other to non-clear days where the value of global radiation would be very low. Then the trend component is expressed as a Fourier series taking into account such apparent breaks in the series. The stochastic component is first tested for linearity and Gaussianity and it is found that it does not satisfy these assumptions. Therefore, a linear time series model (ARMA modeling) may not be adequate and, to overcome this problem, a bilinear time series is used to model the stochastic component of daily global radiation in Kuwait. The method proposed considers first fitting an AR model to the data and then seeing whether a further reduction in the mean sum of squares can be achieved by introducing extra bilinear terms. The Akaike Information Criterion (AIC) is used to select the best model. Copyright © 2002 John Wiley & Sons, Ltd.
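
A reduced sketch of the modelling steps described in the abstract (Fourier trend removal, then autoregressive fitting with AIC order selection) on simulated daily radiation; the clear/non-clear regime split and the bilinear extension terms of the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(14)

# Daily global radiation (illustrative units) with an annual cycle plus
# weakly autocorrelated noise.
n_days = 5 * 365
t = np.arange(n_days)
trend_true = 20 + 8 * np.cos(2 * np.pi * t / 365 - 2.8)
e = rng.normal(0, 2.0, n_days)
noise = np.zeros(n_days)
for i in range(2, n_days):
    noise[i] = 0.5 * noise[i - 1] - 0.2 * noise[i - 2] + e[i]
radiation = trend_true + noise

# --- Trend component: low-order Fourier series fitted by least squares ----
K = 2                                              # number of harmonics
F = np.column_stack([np.ones(n_days)] +
                    [f(2 * np.pi * k * t / 365) for k in range(1, K + 1)
                     for f in (np.sin, np.cos)])
beta, *_ = np.linalg.lstsq(F, radiation, rcond=None)
resid = radiation - F @ beta                       # stochastic component

# --- AR(p) fit to the residuals, order chosen by AIC ----------------------
def fit_ar(x, p, p_max):
    """Least-squares AR(p) fit on a common sample (t >= p_max) and its AIC."""
    y = x[p_max:]
    X = np.column_stack([x[p_max - i - 1:len(x) - i - 1] for i in range(p)])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ phi) ** 2)
    aic = len(y) * np.log(rss / len(y)) + 2 * (p + 1)
    return phi, aic

results = {p: fit_ar(resid, p, 6)[1] for p in range(1, 7)}
best_p = min(results, key=results.get)
print("AIC by AR order:", {p: round(a, 1) for p, a in results.items()})
print(f"selected AR order: {best_p}")
```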