
Showing papers on "Sampling (statistics)" published in 1987


Journal ArticleDOI
TL;DR: A set of very simple estimators of efficiency is presented and illustrated with a variety of biological examples, and a nomogram is provided for predicting the necessary number of points when performing point counting.
Abstract: The superior efficiency of systematic sampling at all levels in stereological studies is emphasized and various commonly used ways of implementing it are briefly described. Summarizing recent theoretical and experimental studies, a set of very simple estimators of efficiency is presented and illustrated with a variety of biological examples. In particular, a nomogram for predicting the necessary number of points when performing point counting is provided. The very efficient and simple unbiased estimator of the volume of an arbitrary object based on Cavalieri's principle is dealt with in some detail. The efficiency of the systematic fractionating of an object is also illustrated.

3,396 citations
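
As a rough illustration of the Cavalieri volume estimator mentioned above, here is a minimal sketch assuming point counting on systematic sections; the function name and the numbers in the example are hypothetical.

```python
# Hypothetical sketch of the Cavalieri volume estimator with point counting.
# Each section area is estimated as (area per test point) x (points hitting the object);
# the volume estimate is the section spacing times the summed area estimates.

def cavalieri_volume(point_counts, section_spacing, area_per_point):
    """Volume estimate from systematic sections a fixed distance apart.

    point_counts    -- grid points hitting the object on each systematic section
    section_spacing -- distance between consecutive sections
    area_per_point  -- area represented by one grid point
    """
    estimated_areas = [area_per_point * count for count in point_counts]
    return section_spacing * sum(estimated_areas)

# Eight sections 2 mm apart, each grid point representing 4 mm^2 of section area:
print(cavalieri_volume([12, 18, 25, 30, 28, 20, 11, 5], 2.0, 4.0))  # volume in mm^3
```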


Book
01 Jan 1987
TL;DR: The authors present designs for sampling environmental populations, including simple random, stratified, two-stage, three-stage, systematic, and double sampling, together with methods for estimating the mean and variance from censored data sets, detecting outliers, and detecting and estimating trends.
Abstract: Sampling Environmental Populations. Environmental Sampling Design. Simple Random Sampling. Stratified Random Sampling. Two-Stage Sampling. Compositing and Three-Stage Sampling. Systematic Sampling. Double Sampling. Locating Hot Spots. Quantiles, Proportions, and Means. Skewed Distributions and Goodness-of-Fit Tests. Characterizing Lognormal Populations. Estimating the Mean and Variance from Censored Data Sets. Outlier Detection and Control Charts. Detecting and Estimating Trends. Trends and Seasonality. Comparing Populations. Appendices. Symbols. Glossary. Bibliography. Index.

2,253 citations


Book
23 Nov 1987
TL;DR: The design of experiments, sampling and observational studies, analysis of the means of small samples using the t distribution, choice of statistical method, clinical measurement, and mortality statistics and the structure of human populations are reviewed.
Abstract: Introduction. The design of experiments. Sampling and observational studies. Summarizing data. Presenting data. Probability. The Normal Distribution. Estimation, standard error, and confidence intervals. Significance tests. Analysis of the means of small samples using the t distribution. Choosing the statistical method. Clinical measurement. Mortality statistics and the structure of human populations. Solutions to exercises.

2,245 citations


Journal ArticleDOI
TL;DR: The asymptotic variance and normality of estimates obtained with Latin hypercube sampling are derived, and a method is described for producing Latin hypercube samples when the components of the input variables are statistically dependent.
Abstract: Latin hypercube sampling (McKay, Conover, and Beckman 1979) is a method of sampling that can be used to produce input values for estimation of expectations of functions of output variables. The asymptotic variance of such an estimate is obtained. The estimate is also shown to be asymptotically normal. Asymptotically, the variance is less than that obtained using simple random sampling, with the degree of variance reduction depending on the degree of additivity in the function being integrated. A method for producing Latin hypercube samples when the components of the input variables are statistically dependent is also described. These techniques are applied to a simulation of the performance of a printer actuator.

1,750 citations
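
A minimal sketch of basic Latin hypercube sampling for independent inputs on the unit cube (the paper's extension to statistically dependent components is not shown); numpy is assumed and the test function is arbitrary.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Basic Latin hypercube sample on [0, 1)^n_dims with independent components."""
    rng = np.random.default_rng(seed)
    # One stratum per sample in each dimension, jittered within the stratum,
    # then permuted so that strata are paired at random across dimensions.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    return u

# Estimate E[f(X)] for f(x) = (x1 + x2 + x3)^2 with X uniform on the unit cube;
# the stratification typically gives a lower-variance estimate than simple random sampling.
x = latin_hypercube(1000, 3, seed=0)
print(np.mean(np.sum(x, axis=1) ** 2))
```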


Journal ArticleDOI
TL;DR: Maximum entropy sampling selects the set of sampling points whose observations have maximum entropy, providing a criterion for choosing informative designs.
Abstract: (1987). Maximum entropy sampling. Journal of Applied Statistics: Vol. 14, No. 2, pp. 165-170.

553 citations



Book ChapterDOI
01 Jan 1987
TL;DR: Survey interviews have become the dominant method of data collection in empirical social research (Phillips, 1971; Kaase, Ott, & Scheuch, 1983); however, despite the popularity of the survey interview, the processes underlying the responses to survey questions are not well understood, and a "theory of asking questions" has never been developed.
Abstract: Since the early 1940s, survey interviews have become the dominant method of data collection in empirical social research (Phillips, 1971; Kaase, Ott, & Scheuch, 1983). Despite the popularity of the survey interview, the processes underlying the responses to survey questions are not well understood, and a "theory of asking questions" (Hyman, 1954) has never been developed. Thus, survey methodology today is characterised by rigorous knowledge about sampling procedures on the one hand, and a surprising lack of knowledge about the "art" (sic!) of asking questions (e.g., Noelle-Neumann, 1963) on the other hand. Unfortunately, however, empirical research (e.g., Sudman & Bradburn, 1974) suggests that nonsampling error places considerable limitations on the usefulness of survey data.

417 citations



Journal ArticleDOI
TL;DR: In this article, a procedure has been developed which uses environmental data to predict the probabilities of macro-invertebrate taxa occurring at running-water sites in Great Britain.
Abstract: SUMMARY. 1. A procedure has been developed which uses environmental data to predict the probabilities of macro-invertebrate taxa occurring at running-water sites in Great Britain. 2. Biological, physical and chemical data were collected from twenty-one sites on three river systems in order to evaluate the procedure. 3. For most sites the number and type of taxa recorded, using a standard sampling programme, were very close to those predicted using twenty-eight environmental variables. 4. Comparison with other studies at the same sites showed that most taxa whose probability of occurrence was ≥0.5 could be found with more intensive sampling. 5. Reducing the number of variables used in making the predictions from twenty-eight to five resulted in only a slight loss of predictive accuracy. 6. Combinations of chemical and physical variables gave better predictions than equivalent numbers of physical variables only, but the latter may be more appropriate where chemical pollution is known or suspected to occur. 7. The procedure is of practical value in the detection and assessment of pollution. 8. It may also be used to explore patterns in the structure and functioning of stream communities.

334 citations


Journal ArticleDOI
TL;DR: Statistical correction factors are derived for the bias in river-load estimates obtained when continuous discharge data are combined, via a rating curve, with relatively infrequent sampling of sediment, solute, or pollutant concentrations.
Abstract: River loads often have to be estimated from continuous discharge data but relatively infrequent sampling of sediment, solute, or pollutant concentrations. Two standard ways of doing this are to multiply mean concentration by mean discharge, and to use a rating curve to predict unmeasured concentrations. Both methods are known from previous empirical studies to underestimate true load. Statistical considerations explain these biases and yield correction factors which can be used to obtain unbiased estimates of load. Simulation experiments with normally-distributed scatter about log-linear trends, and sampling experiments using a natural data set, show that the corrected rating curve method has lower sampling variability than other unbiased methods based on average instantaneous load and is thus the recommended procedure when the rating plot is of the assumed form. The precision of all methods increases with sample size and decreases with increasing rating-curve slope and scatter.

293 citations
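
A hedged sketch of the corrected rating-curve procedure: a log-log rating curve is fitted to the sampled concentrations and the retransformed predictions are multiplied by a correction factor. The exp(s²/2) factor used here is the standard parametric correction under lognormal residuals, and may differ in detail from the correction factors derived in the paper.

```python
import numpy as np

def rating_curve_load(q_sampled, c_sampled, q_continuous, dt_seconds):
    """Total load from a log-log rating curve with a lognormal bias correction.

    q_sampled, c_sampled -- discharge and concentration on the (infrequent) sampling dates
    q_continuous         -- the continuous discharge record
    dt_seconds           -- time step of the continuous record, in seconds
    """
    # Fit log C = a + b log Q by least squares.
    b, a = np.polyfit(np.log(q_sampled), np.log(c_sampled), 1)
    resid = np.log(c_sampled) - (a + b * np.log(q_sampled))
    s2 = resid.var(ddof=2)           # residual variance about the rating line
    correction = np.exp(s2 / 2.0)    # parametric correction for the retransformation bias
    c_pred = np.exp(a + b * np.log(q_continuous)) * correction
    return np.sum(c_pred * q_continuous) * dt_seconds
```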


Book
01 Jan 1987
TL;DR: This book presents an ordered list of research designs for statistical studies, covering representation, randomization, and realism; analytical uses of sample surveys; designs for comparisons; controls for disturbing variables; samples and censuses; and sample designs over time.
Abstract: Chapter and Section Contents. Tables and Figures. 1. Representation, Randomization, and Realism. 1.1 Three Criteria. 1.2 Four Classes of Variables. 1.3 Surveys, Experiments, and Controlled Investigations. 1.4 Randomization of Subjects Over Treatments and Over Populations. 1.5 Statistical Tests. 1.6 An Ordered List of Research Designs. 1.7 Representation and Probability Sampling. 1.8 Model-Dependent Inference. 2. Analytical Use of Sample Surveys. 2.1 Populations of Elements and Sampling Units. 2.2 Inferences from Complex Samples. 2.3 Domains and Subclasses: Classifications. 2.4 Overview of Subclass Effects. 2.5 Proportionate Stratified Element Sampling (PRES). 2.6 Cluster Sampling. 2.7 Four Obstacles to Representation in Analytic Studies. 3. Designs for Comparisons. 3.1 Substitutes for Probability Sampling. 3.2 Basic Modules for Comparisons. 3.3 Four Modules: Costs, Variances, Bias Sources. 3.4 Five Basic Designs for Comparisons. 3.5 Classification for 22 Sources of Bias. 3.6 Time Curves of Responses. 3.7 Evaluation Research. 4. Controls for Disturbing Variables. 4.1 Control Strategies. 4.2 Analysis in Separate Subclasses. 4.3 Selecting Matched Units. 4.4 Matched Subclasses. 4.5 Standardization: Adjustment by Weighting Indexes. 4.6 Covariances and Residuals from Linear Regressions; Categorical Data Analyses. 4.7 Ratio Estimates. 5. Samples and Censuses. 5.1 Censuses and Researchers. 5.2 Samples Compared to Censuses. 5.3 Samples Attached to Censuses. 6. Sample Designs Over Time. 6.1 Technology and Concepts. 6.2 Purposes and Designs for Periodic Samples. 6.3 Changing and Mobile Populations. 6.4 Panel Effects. 6.5 Split-Panel Designs. 6.6 Cumulating Cases and Combining Statistics from Samples. 7. Several Distinct Problems of Design. 7.1 Analytical Statistics from Complex Samples. 7.2 Generalizations Beyond the Modules of 3.3. 7.3 Multipurpose Designs. 7.4 Weighted Means: Selection, Bias, Variance. 7.5 Observational Units of Variable Sizes. 7.6 On Falsifiability in Statistical Design. Problems. References. Index.

Journal ArticleDOI
TL;DR: In this article, the free induction decays are sampled exponentially, using many points where the signal-to-noise ratio (S/N) is high and a few where it is low.

Journal ArticleDOI
TL;DR: An expression for the noise power spectrum of images reconstructed by the discrete filtered backprojection algorithm has been derived, which explicitly includes sampling within the projections, angular sampling, and the two-dimensional sampling implicit in the discrete representation of the image.
Abstract: An expression for the noise power spectrum of images reconstructed by the discrete filtered backprojection algorithm has been derived. The formulation explicitly includes sampling within the projections, angular sampling, and the two-dimensional sampling implicit in the discrete representation of the image. The effects of interpolation are also considered. Noise power spectra predicted by this analysis differ from those predicted using continuous theory in two respects: they are rotationally asymmetric, and they do not approach zero at zero frequency. Both of these properties can be attributed to two-dimensional aliasing due to pixel sampling. The predictions were confirmed by measurement of noise power spectra of both simulated images and images from a commercial x-ray transmission CT scanner.
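
As a rough sketch of how a two-dimensional noise power spectrum can be measured from an ensemble of simulated noise-only images (the periodogram-average approach; normalization conventions vary and this is not the paper's analytical derivation):

```python
import numpy as np

def noise_power_spectrum(noise_images, pixel_size):
    """Ensemble (periodogram-average) estimate of a 2-D noise power spectrum.

    noise_images -- sequence of equally sized 2-D arrays containing noise only
    pixel_size   -- sampling interval of the image grid
    """
    noise_images = [np.asarray(img, dtype=float) for img in noise_images]
    ny, nx = noise_images[0].shape
    nps = np.zeros((ny, nx))
    for img in noise_images:
        dft = np.fft.fft2(img - img.mean())   # remove the mean before transforming
        nps += np.abs(dft) ** 2
    # Average over the ensemble and normalize by the number of pixels and pixel area.
    return nps / len(noise_images) * pixel_size ** 2 / (nx * ny)

# White Gaussian noise should give an approximately flat spectrum near 1.0:
rng = np.random.default_rng(0)
images = [rng.standard_normal((64, 64)) for _ in range(50)]
print(noise_power_spectrum(images, pixel_size=1.0).mean())
```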


Journal ArticleDOI
TL;DR: In this paper, the authors examined the geometrical relationship of three sampling designs, namely the square, the equilateral triangle, and the regular hexagon, and compared the maximum mean square error for each of these designs.
Abstract: Although several researchers have pointed out some advantages and disadvantages of various soil sampling designs in the presence of spatial autocorrelation, a more detailed study is presented herein which examines the geometrical relationship of three sampling designs, namely the square, the equilateral triangle, and the regular hexagon. Both advantages and disadvantages exist in the use of these designs with respect to estimation of the semivariogram and their effect on the mean square error or variance of error. This research could be used to design optimal sampling strategies; it is based on the theory of regionalized variables, in which the intrinsic hypothesis is satisfied. Among alternative designs, an equilateral triangle design gives the most reliable estimate of the semivariogram. It also gives the minimum maximum mean square error of point estimation of the concentration over the other two designs for the same number of measurements when the nugget effect is small relative to the variance. If the nugget effect is large (0.90σ² or more), and the linear sampling density is >0.85r where r is the range, the hexagonal design is best. This study computes and compares the maximum mean square error for each of these designs.
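
For concreteness, a small sketch that generates sample locations for two of the three designs compared above (square and equilateral-triangle grids); the spacings and grid extents are arbitrary and the hexagonal design is omitted.

```python
import numpy as np

def square_grid(spacing, n_cols, n_rows):
    """Sample locations on a square sampling grid."""
    xs, ys = np.meshgrid(np.arange(n_cols) * spacing, np.arange(n_rows) * spacing)
    return np.column_stack([xs.ravel(), ys.ravel()])

def triangular_grid(spacing, n_cols, n_rows):
    """Equilateral-triangle grid: alternate rows offset by half a spacing,
    with rows spacing * sqrt(3) / 2 apart."""
    row_height = spacing * np.sqrt(3) / 2.0
    points = []
    for row in range(n_rows):
        offset = spacing / 2.0 if row % 2 else 0.0
        for col in range(n_cols):
            points.append((col * spacing + offset, row * row_height))
    return np.array(points)

# Same number of locations under each design for a given spacing:
print(square_grid(10.0, 5, 5).shape, triangular_grid(10.0, 5, 5).shape)
```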

Journal ArticleDOI
TL;DR: In this paper, a method is presented and demonstrated for optimizing the selection of sample locations for variogram estimation, where the distribution of distance classes is decided a priori and the problem therefore is to closely approximate the preselected distribution, although the dispersion within individual classes can also be considered.
Abstract: A method is presented and demonstrated for optimizing the selection of sample locations for variogram estimation. It is assumed that the distribution of distance classes is decided a priori and the problem therefore is to closely approximate the preselected distribution, although the dispersion within individual classes can also be considered. All of the locations may be selected, or points added to an existing set of sites or to those chosen on regular patterns. In the examples, the sum of squares characterizing the deviation from the desired distribution of couples is reduced by as much as 2 orders of magnitude between random and optimized points. The calculations may be carried out on a microcomputer. Criteria for what constitutes the best estimator of the variogram are discussed, but a study of variogram estimators is not the object of this paper.
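
The paper's own optimization procedure is not reproduced here; the following is a simple greedy-swap sketch of the same idea, choosing sample locations so that the achieved counts of point pairs per distance class approach a preselected target distribution. All names and the stopping rule are hypothetical.

```python
import numpy as np

def distance_class_counts(points, bin_edges):
    """Count point pairs ("couples") falling in each lag-distance class."""
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    iu = np.triu_indices(len(points), k=1)
    return np.histogram(d[iu], bins=bin_edges)[0]

def optimize_locations(candidates, n_select, target_counts, bin_edges, n_iter=2000, seed=None):
    """Greedy random swaps that pull the distance-class counts toward the target.

    candidates    -- array of shape (m, 2) of possible sampling locations
    target_counts -- desired number of pairs in each class (len(bin_edges) - 1 values)
    """
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(candidates), n_select, replace=False)
    best = np.sum((distance_class_counts(candidates[chosen], bin_edges) - target_counts) ** 2)
    for _ in range(n_iter):
        trial = chosen.copy()
        drop = rng.integers(n_select)                            # position to replace
        pool = np.setdiff1d(np.arange(len(candidates)), trial)   # unused candidates
        trial[drop] = rng.choice(pool)
        ss = np.sum((distance_class_counts(candidates[trial], bin_edges) - target_counts) ** 2)
        if ss < best:                                            # keep the swap if it helps
            chosen, best = trial, ss
    return candidates[chosen], best
```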


Journal ArticleDOI
TL;DR: In this paper, the concentrations of eight dissolved trace metals were measured in the lower Mississippi River seven times at various flow stages during a two-year interval using trace metal sampling and analysis techniques demonstrated to be reliable.


Journal ArticleDOI
TL;DR: The data suggest that reserve status is causal in these differing abundance and size structure estimates for the large, common, reef fish Cheilodactylus spectabilis (Hutton).
Abstract: Total abundance estimates for the large, common, reef fish Cheilodactylus spectabilis (Hutton) were obtained for a marine reserve and adjacent section of coast in north-eastern New Zealand during 1985. Visual strip-transects were used to estimate abundance and size structure in both areas. The accuracy, precision and cost efficiency of five transect sizes (500, 375, 250, 100, 75 m²) were examined at three times of day (dawn, midday and dusk), by simulating transects over mapped C. spectabilis populations. Two transect sizes showed similarly high efficiency. The smaller of the two (20 x 5 m) was chosen for the survey because of the general advantages attributable to small sampling units. Biases related to strip-transect size are discussed. Preliminary sampling indicated that C. spectabilis was distributed heterogeneously, and that density was habitat-related. An optimal stratified-random design was employed in both locations, to obtain total abundance and size-structure estimates. This reduced the between-habitat source of variability in density. The total number of sampling units used was governed by the time available. The resulting total abundance estimates obtained were 18 338±2 886 (95% confidence limit) for the 5 km marine reserve, compared to 3 987±1 117 for an adjacent, heavily fished 4 km section of coast. When corrected for total area and habitat area sampled, this represented a 2.3-fold difference in abundance. If sampling had been designed to detect an arbitrary 10% difference in abundance within each habitat, an infeasible 440 h of sampling would have been required. Size-frequency distributions of C. spectabilis at the reserve had a larger modal size class than distributions from the adjacent area. The data suggest that reserve status is causal in these differing abundance and size structure estimates.
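
A minimal sketch of the stratified (habitat-weighted) expansion estimator that underlies total-abundance figures like those above, ignoring finite-population corrections; the strata, unit areas, and counts in the example are invented.

```python
import numpy as np

def stratified_abundance(strata):
    """Stratified estimate of total abundance and its standard error.

    strata -- list of dicts with keys:
        'area'   total area of the habitat stratum,
        'a_unit' area covered by one sampling unit (e.g., a 20 x 5 m transect),
        'counts' fish counts in the sampled units of that stratum.
    """
    total, var = 0.0, 0.0
    for s in strata:
        counts = np.asarray(s['counts'], dtype=float)
        n_units = s['area'] / s['a_unit']        # units that would tile the stratum
        total += n_units * counts.mean()
        var += (n_units ** 2) * counts.var(ddof=1) / len(counts)
    return total, np.sqrt(var)

# Two made-up habitat strata sampled by 100 m^2 transects:
est, se = stratified_abundance([
    {'area': 30000.0, 'a_unit': 100.0, 'counts': [3, 5, 2, 4, 6]},
    {'area': 12000.0, 'a_unit': 100.0, 'counts': [0, 1, 0, 2, 1]},
])
print(est, est - 1.96 * se, est + 1.96 * se)   # estimate with an approximate 95% interval
```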

Journal ArticleDOI
01 Apr 1987-Ecology
TL;DR: Cost-efficient ecological and natural resource surveys need: (1) flexible, logistically simple, and statistically sound sampling methods, and (2) sensitive, computationally simple, and ecologically robust data analysis methods.
Abstract: Cost-efficient ecological and natural resource surveys need: (1) flexible, logistically simple, and statistically sound sampling methods, and (2) sensitive, computationally simple, and ecologically robust data analysis methods. "Gradsects" (gradient-oriented transects) have recently been shown to be a more efficient sampling method than random sampling using nongradient-oriented transects, especially when surveying large, biologically diverse areas (Gillison and Brewer 1985). If biotic (e.g., vegetation) sampling is oriented along a defined environmental gradient, and the purpose is classification, then simple, efficient analysis techniques are needed. This note briefly reviews appropriate analysis techniques in order to identify ecotones or boundaries along gradsects. If the vegetation survey or resource inventory is made by sampling at fixed intervals along a steep gradient, then the sampling units (e.g., plots, quadrats, lines) may traverse different biotic zones reflecting underlying environmental (e.g., topographic, soil) discontinuities. And, if the data set obtained at each sampling unit is multivariate (e.g., many species are observed), then these data can be used to locate ecotones separating different biotic zones along the gradsect. A simple but robust method for locating the boundaries or ecotones between communities sampled along gradsects is the computation of moving split-window distances, a procedure described by Whittaker (1960) who was working on "... quantitative methods by which relative discontinuities ... might be objectively revealed from the transect tables" for his Siskiyou Mountains gradsect data. The basic procedure is: (1) obtain multivariate gradsect data by sampling along a defined gradient; (2) bracket or block a set of sampling positions into a window of preassigned width (i.e., including the data for two or more adjacent sampling positions, as for calculating moving averages; Legendre and Legendre 1983:344); (3) split this window of transect samples into two equal groups; (4) average the data for each variate within each group; (5) compute a distance or dissimilarity between these two groups (Legendre and Legendre 1983:Chapter 6); (6) move the window one position further along the gradsect and compute another distance; and (7) after moving the split-window along the gradsect from one end to the other, with a distance computed for each window midpoint position, plot distances (ordinate) against gradsect positions (abscissa). Sharp, high peaks identify the location of boundaries between adjacent biotic community zones. For continuous gradations the expected graph would be "... points generally at the same level, but with some zig-zag up and down due to chance variations in stand composition" (Whittaker 1960). As an example, Wierenga et al. (1987) used both squared Euclidean distances (SED) and Hotelling-Lawley trace F values (HLF) with the moving split-window procedure to examine the coincidence of boundaries separating vegetation zones and soil zones along a gradsect in the northern Chihuahuan Desert. Seven vegetation zones were revealed by distance peaks (Fig. 1), which were strongly coincident with eight soil series zones. These results substantiated earlier vegetation-soil studies in the area (Stein and Ludwig 1979). A window width of 10 revealed a smoother pattern of distance peaks, whereas a width of 2 (the SED between adjacent gradsect positions) had greater sample-to-sample noise, as expected.
Although SED was used, several other distance coefficients were examined and gave similar results. Any one of several association, similarity, distance, or dependence coefficients (Legendre and Legendre 1983:Chapter 6) could have been used. When defining soil zones using HLF, Wierenga et al. (1987) reduced the original data set to a smaller set with principal components analysis, as described by Webster (1973). Relative to the simple computation of SED, the HLF calculation is more powerful, but HLF is more complex and is limited to fewer variables (e.g., principal components) and to wider window widths (Wierenga et al. 1987). Many variables can be used to compute SEDs, with a larger number of variables likely to produce a more accurate distance. If a large number of variables is used to compute HLF, window width must also be large, which may obscure boundaries because the window may include two or more boundaries (Webster 1973). In our analysis, varying window width from 6 to 10 did not appreciably affect the interpretation of boundary locations, only the emphasis of certain peaks. Abrupt shifts in community types are evident as high, narrow peaks (e.g., as between zones 5 and 6 in Fig. 1), whereas gradual ecotone shifts or fuzzy community boundaries are evident by wider and lower SED peaks (e.g., as between zones 3 and 4 in Fig. 1). The technique described above is generally similar to, but differs in detail from, other boundary analysis techniques described in the literature. Beals (1969) used the coefficient of dissimilarity or percentage difference (Legendre and Legendre 1983:201) to compare successive adjacent segments of five samples (a split-window with a window width of 10) for two altitudinal
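
The moving split-window procedure enumerated above translates almost directly into code. A minimal sketch using squared Euclidean distance (SED) as the dissimilarity, with hypothetical input of sampling positions by variates:

```python
import numpy as np

def moving_split_window(data, window_width):
    """Squared Euclidean distances between the two halves of a moving window.

    data         -- 2-D array, rows = sampling positions along the gradsect,
                    columns = variates (e.g., species abundances)
    window_width -- even number of adjacent sampling positions in each window
    """
    data = np.asarray(data, dtype=float)
    half = window_width // 2
    midpoints, distances = [], []
    for start in range(data.shape[0] - window_width + 1):
        left = data[start:start + half].mean(axis=0)              # average each variate
        right = data[start + half:start + window_width].mean(axis=0)
        distances.append(np.sum((left - right) ** 2))             # SED between the halves
        midpoints.append(start + half - 0.5)                      # window midpoint position
    return np.array(midpoints), np.array(distances)

# Plotting distances against midpoints reveals boundaries as sharp, high peaks:
# midpoints, d = moving_split_window(species_by_position, window_width=10)
```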


Journal ArticleDOI
TL;DR: In this article, an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation is presented, and several principles account for the observed influence of sensitivities on parameter uncertainty.
Abstract: The spatial and temporal variability of sensitivities has a significant impact on parameter estimation and sampling design for studies of solute transport in porous media. Physical insight into the behavior of sensitivities is offered through an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation. When parameters are estimated in regression models of one-dimensional transport, the spatial and temporal variability in sensitivities influences variance and covariance of parameter estimates. Several principles account for the observed influence of sensitivities on parameter uncertainty. (1) Information about a physical parameter may be most accurately gained at points in space and time with a high sensitivity to the parameter. (2) As the distance of observation points from the upstream boundary increases, maximum sensitivity to velocity during passage of the solute front increases and the consequent estimate of velocity tends to have lower variance. (3) The frequency of sampling must be “in phase” with the S shape of the dispersion sensitivity curve to yield the most information on dispersion. (4) The sensitivity to the dispersion coefficient is usually at least an order of magnitude less than the sensitivity to velocity. (5) The assumed probability distribution of random error in observations of solute concentration determines the form of the sensitivities. (6) If variance in random error in observations is large, trends in sensitivities of observation points may be obscured by noise and thus have limited value in predicting variance in parameter estimates among designs. (7) Designs that minimize the variance of one parameter may not necessarily minimize the variance of other parameters. (8) The time and space interval over which an observation point is sensitive to a given parameter depends on the actual values of the parameters in the underlying physical system.
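
As an illustrative (not the paper's) calculation, sensitivities for a one-dimensional advection-dispersion model can be approximated by central finite differences around an analytic solution; here the common step-input erfc solution is assumed and scipy provides erfc.

```python
import numpy as np
from scipy.special import erfc

def concentration(x, t, v, D, c0=1.0):
    """Approximate 1-D advection-dispersion solution for a continuous step input
    (the usual small second erfc term is neglected)."""
    return 0.5 * c0 * erfc((x - v * t) / (2.0 * np.sqrt(D * t)))

def sensitivity(x, t, v, D, param, rel_step=1e-4):
    """Central finite-difference sensitivity dC/d(param) at observation points (x, t)."""
    params = {'v': v, 'D': D}
    h = rel_step * params[param]
    lo, hi = dict(params), dict(params)
    lo[param] -= h
    hi[param] += h
    return (concentration(x, t, **hi) - concentration(x, t, **lo)) / (2.0 * h)

# Sensitivity to velocity peaks during passage of the solute front (x ~ v t),
# and its maximum grows with the distance of the observation point from the boundary:
t = np.linspace(0.1, 50.0, 200)
print(np.abs(sensitivity(5.0, t, v=1.0, D=0.5, param='v')).max(),
      np.abs(sensitivity(25.0, t, v=1.0, D=0.5, param='v')).max())
```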

Journal ArticleDOI
TL;DR: This paper examined the effect of disproportionate sampling on estimates of recreational fishing demand with data from the 1980 National Survey of Fishing, Hunting, and Wildlife-Associated Recreation (1980 Survey) and found that household characteristics determining sampling ratios have little relation to the level of fishing by household members.
Abstract: Data on recreation activity often are obtained from national surveys using stratified, disproportionate sampling. One such survey is the 1980 National Survey of Fishing, Hunting, and Wildlife‐Associated Recreation (1980 Survey). This paper examines the effect of disproportionate sampling on estimates of recreational fishing demand with data from the 1980 Survey. Contrary to some expectations in the literature, disproportionate sampling appears to cause few problems for demand estimation with the 1980 Survey. The evidence suggests that household characteristics determining sampling ratios have little relation to the level of fishing by household members.



Journal ArticleDOI
TL;DR: Unified methods for incorporating misclassification information and general variance expressions into analyses based on log-linear models and maximum likelihood estimation are presented.
Abstract: Misclassification is a common source of bias and reduced efficiency in the analysis of discrete data. Several methods have been proposed to adjust for misclassification using information on error rates (i) gathered by resampling the study population, (ii) gathered by sampling a separate population, or (iii) assumed a priori. We present unified methods for incorporating these types of information into analyses based on log-linear models and maximum likelihood estimation. General variance expressions are developed. Examples from epidemiologic studies are used to demonstrate the proposed methodology.
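
The log-linear maximum likelihood machinery of the paper is not reproduced here; as a much simpler illustration of adjusting discrete data with known error rates, the classical "matrix method" back-corrects observed category counts by inverting an assumed misclassification matrix.

```python
import numpy as np

def correct_misclassified_counts(observed_counts, classification_matrix):
    """Back-correct observed category counts for misclassification.

    classification_matrix[i, j] = P(observed category i | true category j),
    so each column sums to one and observed = M @ true.
    """
    m = np.asarray(classification_matrix, dtype=float)
    observed = np.asarray(observed_counts, dtype=float)
    return np.linalg.solve(m, observed)

# Binary exposure measured with assumed 90% sensitivity and 95% specificity:
M = [[0.90, 0.05],
     [0.10, 0.95]]
print(correct_misclassified_counts([120, 380], M))  # estimated true counts per category
```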

Journal ArticleDOI
TL;DR: In this article, the authors used both classical and Bayesian viewpoints to estimate the exceedance probabilities of the largest floods using only their rank, the number of observed historical floods, and the lengths of the historical period and the systematic record.
Abstract: Plotting positions are needed for situations where, in addition to a systematically recorded annual flood series, one would have a record of any large floods which occurred during an extended historical period, if they occurred. Many of the published estimators are based on uncensored sampling theory which is not appropriate for such data sets. Here such historical and systematic flood records are viewed as resulting from a partially censored sampling experiment. Plotting positions are derived for such experiments using both classical and Bayesian viewpoints. In general, it is impossible to construct highly accurate estimates of the exceedance probabilities of the largest floods using only their rank, the number of observed historical floods, and the lengths of the historical period and the systematic record. For the largest flood, the coefficient of variation of exceedance-probability estimators is of the order of 1, as it is for complete systematic records. Examples illustrate the bias and precision of a variety of plotting position formulas. The differences among the different plotting positions are generally small in comparison to the sampling variability. However, plotting positions which are unbiased with uncensored samples are often the most biased when used with a combination of historical and systematic data. Three appendices consider the effect of misspecification of the length of the historical period, the effect of misspecification of the threshold of perception or observation level, and plotting positions for situations with several perception thresholds.
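
Purely as a toy sketch (not the estimators derived in the paper), one simple way to assign threshold-based plotting positions splits the exceedance probability at the perception threshold: floods above the threshold share the top e/n of probability, and below-threshold systematic floods share the remainder, each ranked with a Weibull-type rule. All variable names are hypothetical.

```python
def threshold_plotting_positions(above, below, n_historical, e_exceedances):
    """Exceedance-probability plotting positions with a single perception threshold.

    above         -- flood magnitudes exceeding the threshold (historical + systematic)
    below         -- systematic-record floods below the threshold
    n_historical  -- length of the historical period, in years
    e_exceedances -- number of threshold exceedances in those n_historical years
    """
    p_threshold = e_exceedances / n_historical            # P(annual flood exceeds threshold)
    above_sorted = sorted(above, reverse=True)
    below_sorted = sorted(below, reverse=True)
    # Above-threshold floods share the probability mass above the threshold ...
    pp_above = [p_threshold * i / (len(above_sorted) + 1)
                for i in range(1, len(above_sorted) + 1)]
    # ... and below-threshold systematic floods share the remainder.
    pp_below = [p_threshold + (1.0 - p_threshold) * j / (len(below_sorted) + 1)
                for j in range(1, len(below_sorted) + 1)]
    return list(zip(above_sorted, pp_above)) + list(zip(below_sorted, pp_below))
```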

Journal ArticleDOI
P.M. Hahn, M. Jeruchim
TL;DR: It is found that the variance improvement may be severely limited by the dimensionality of the system, and a means for circumventing this limitation is described through the definition of a statistically equivalent impulse response.
Abstract: The assessment of bit error rate (BER) performance of a digital communication system via computer simulation has traditionally been done using the Monte Carlo method. For very low BER, this method requires excessive computer time. This time can be substantially reduced by using extrapolation based on importance sampling (IS). In applying IS to a complex system, many considerations must be addressed, chief among which is the reliability (variance) of the estimator as a function of the system particulars. We discuss a number of these considerations and, specifically, derive a number of expressions for the variance. We find that the variance improvement may be severely limited by the dimensionality (or memory) of the system. We describe a means for circumventing this limitation through the definition of a statistically equivalent impulse response. For a linear system, this amounts to the ordinary impulse response. The simulation can be structured to estimate the equivalent impulse response using statistical regression. This new approach has been implemented and found to yield significant runtime improvement over conventional importance sampling for linear systems of large dimensionality. We believe this technique will also work for mildly nonlinear systems, as might be encountered in typical satellite communications.
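
A minimal sketch of the variance advantage of importance sampling for rare-event (low BER) estimation, using a toy one-dimensional Gaussian-noise error event rather than a full system simulation; the biasing scheme (a mean shift toward the error region) and all numbers are assumptions.

```python
import numpy as np

def error_rate_monte_carlo(threshold, n, rng):
    """Plain Monte Carlo estimate of P(noise > threshold) for unit-variance Gaussian noise."""
    return np.mean(rng.standard_normal(n) > threshold)

def error_rate_importance_sampling(threshold, n, rng, shift=None):
    """Importance-sampling estimate: draw from a Gaussian shifted toward the error region
    and weight each sample by the likelihood ratio of the true to the biased density."""
    shift = threshold if shift is None else shift
    noise = rng.standard_normal(n) + shift
    weights = np.exp(-shift * noise + 0.5 * shift ** 2)   # N(0,1) density / N(shift,1) density
    return np.mean((noise > threshold) * weights)

rng = np.random.default_rng(0)
# A 5-sigma error probability (~2.9e-7) is hopeless for plain Monte Carlo at this sample
# size, but the shifted, reweighted estimator recovers it with far fewer samples:
print(error_rate_monte_carlo(5.0, 100_000, rng))
print(error_rate_importance_sampling(5.0, 100_000, rng))
```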

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an ideal sampling approach (ISA) for elucidating, formulating, and predicting minimum disturbance effects in deep tube samples of saturated clays, which relies on approximate solutions based on the strain path method to incorporate tube penetration disturbances.
Abstract: The “ideal sampling approach” (ISA) for elucidating, formulating, and predicting minimum disturbance effects in deep tube samples of saturated clays is proposed. The ISA relies on approximate solutions based on the strain path method to incorporate tube penetration disturbances. Laboratory test results on normally consolidated Boston blue clay indicate that the ISA provides more realistic predictions than the existing perfect sampling approach and that tube penetration disturbances are significant in “undisturbed” tube samples of soft clays obtained by means of existing thin‐walled sampling techniques.