Sampling bias

About: Sampling bias is a research topic. Over its lifetime, 1,075 publications have been published within this topic, receiving 52,895 citations. The topic is also known as ascertainment bias or biased sample.


Papers
Journal Article
TL;DR: Seven major types of sampling for observational studies of social behavior have been found in the literature and the major strengths and weaknesses of each method are pointed out.
Abstract: Seven major types of sampling for observational studies of social behavior have been found in the literature. These methods differ considerably in their suitability for providing unbiased data of various kinds. Below is a summary of the major recommended uses of each technique: In this paper, I have tried to point out the major strengths and weaknesses of each sampling method. Some methods are intrinsically biased with respect to many variables, others to fewer. In choosing a sampling method the main question is whether the procedure results in a biased sample of the variables under study. A method can produce a biased sample directly, as a result of intrinsic bias with respect to a study variable, or secondarily due to some degree of dependence (correlation) between the study variable and a directly-biased variable. In order to choose a sampling technique, the observer needs to consider carefully the characteristics of behavior and social interactions that are relevant to the study population and the research questions at hand. In most studies one will not have adequate empirical knowledge of the dependencies between relevant variables. Under the circumstances, the observer should avoid intrinsic biases to whatever extent possible, in particular those that directly affect the variables under study. Finally, it will often be possible to use more than one sampling method in a study. Such samples can be taken successively or, under favorable conditions, even concurrently. For example, we have found it possible to take Instantaneous Samples of the identities and distances of nearest neighbors of a focal individual at five or ten minute intervals during Focal-Animal (behavior) Samples on that individual. Often during Focal-Animal Sampling one can also record All Occurrences of Some Behaviors, for the whole social group, for categories of conspicuous behavior, such as predation, intergroup contact, drinking, and so on.
The extent to which concurrent multiple sampling is feasible will depend very much on the behavior categories and rate of occurrence, the observational conditions, etc. Where feasible, such multiple sampling can greatly aid in the efficient use of research time.

12,470 citations

Journal Article
TL;DR: In this paper, a simulation study is used to determine the influence of different sample sizes at the group level on the accuracy of the estimates (regression coefficients and variances) and their standard errors.
Abstract: An important problem in multilevel modeling is what constitutes a sufficient sample size for accurate estimation. In multilevel analysis, the major restriction is often the higher-level sample size. In this paper, a simulation study is used to determine the influence of different sample sizes at the group level on the accuracy of the estimates (regression coefficients and variances) and their standard errors. In addition, the influence of other factors, such as the lowest-level sample size and different variance distributions between the levels (different intraclass correlations), is examined. The results show that only a small sample size at level two (meaning a sample of 50 or less) leads to biased estimates of the second-level standard errors. In all of the other simulated conditions the estimates of the regression coefficients, the variance components, and the standard errors are unbiased and accurate.

2,931 citations

Journal Article
TL;DR: It is argued that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions, and that the choice of background data has as large an effect on predictive performance as the choice of modeling method.
Abstract: Most methods for modeling species distributions from occurrence records require additional data representing the range of environmental conditions in the modeled region. These data, called background or pseudo-absence data, are usually drawn at random from the entire region, whereas occurrence collection is often spatially biased toward easily accessed areas. Since the spatial bias generally results in environmental bias, the difference between occurrence collection and background sampling may lead to inaccurate models. To correct the estimation, we propose choosing background data with the same bias as occurrence data. We investigate theoretical and practical implications of this approach. Accurate information about spatial bias is usually lacking, so explicit biased sampling of background sites may not be possible. However, it is likely that an entire target group of species observed by similar methods will share similar bias. We therefore explore the use of all occurrences within a target group as biased background data. We compare model performance using target-group background and randomly sampled background on a comprehensive collection of data for 226 species from diverse regions of the world. We find that target-group background improves average performance for all the modeling methods we consider, with the choice of background data having as large an effect on predictive performance as the choice of modeling method. The performance improvement due to target-group background is greatest when there is strong bias in the target-group presence records. Our approach applies to regression-based modeling methods that have been adapted for use with occurrence data, such as generalized linear or additive models and boosted regression trees, and to Maxent, a probability density estimation method. 
We argue that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions.
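
The contrast between randomly sampled background and target-group background described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `records_by_species` mapping, and the representation of sites as coordinate tuples are all hypothetical, and the sketch assumes that every occurrence record in the target group is pooled without deduplication.

```python
import random


def random_background(region_sites, n, rng):
    """Conventional background: n sites drawn uniformly from the whole region."""
    return rng.sample(region_sites, n)


def target_group_background(records_by_species, n=None, rng=None):
    """Background that inherits the survey bias of the occurrence data.

    records_by_species (hypothetical structure) maps each species in the
    target group to the list of sites where it was recorded. Pooling all of
    these records gives background points that are concentrated wherever
    collectors actually went.
    """
    pooled = [site for sites in records_by_species.values() for site in sites]
    if n is None or n >= len(pooled):
        return pooled
    rng = rng or random.Random(0)
    # Optional subsample, still drawn only from surveyed sites.
    return rng.sample(pooled, n)
```

Because both functions return plain lists of sites, either result could be passed to the same downstream model, which makes it straightforward to compare the two background choices on equal terms.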

2,307 citations

Journal Article
TL;DR: This work uses causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment.
Abstract: Overadjustment is defined inconsistently. This term is meant to describe control (eg, by regression adjustment, stratification, or restriction) for a variable that either increases net bias or decreases precision without affecting bias. We define overadjustment bias as control for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome. We define unnecessary adjustment as control for a variable that does not affect bias of the causal relation between exposure and outcome but may affect its precision. We use causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment. Using simulations, we quantify the amount of bias associated with overadjustment. Moreover, we show that this bias is based on a different causal structure from confounding or selection biases. Overadjustment bias is not a finite sample bias, while inefficiencies due to control for unnecessary variables are a function of sample size.
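
The definition above (control for an intermediate variable on a causal path from exposure to outcome) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the simulation used in the paper: the linear model, the coefficients 0.8, 0.5 and 0.6, and the function names are all hypothetical choices made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    # Hypothetical linear causal model: exposure x -> mediator m -> outcome y,
    # plus a direct x -> y path. Total effect of x on y = 0.5 + 0.8 * 0.6 = 0.98.
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.5 * xi + 0.6 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

    # Unadjusted slope of y on x: estimates the total effect.
    unadjusted = cov(x, y) / cov(x, x)

    # Coefficient of x in the regression of y on x and m (two-regressor OLS).
    # Controlling for the mediator blocks the indirect path, so this estimate
    # recovers only the direct effect and is biased for the total effect.
    vx, vm, cxm = cov(x, x), cov(m, m), cov(x, m)
    adjusted = (cov(x, y) * vm - cxm * cov(m, y)) / (vx * vm - cxm ** 2)
    return unadjusted, adjusted
```

In this toy setup the gap between the two estimates does not shrink as `n` grows, which is consistent with the abstract's statement that overadjustment bias is not a finite sample bias.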

1,480 citations

Journal Article
TL;DR: This work provides a worked example of spatial thinning of species occurrence records for the Caribbean spiny pocket mouse, where the results obtained match those of manual thinning.
Abstract: Spatial thinning of species occurrence records can help address problems associated with spatial sampling biases. Ideally, thinning removes the fewest records necessary to substantially reduce the effects of sampling bias, while simultaneously retaining the greatest amount of useful information. Spatial thinning can be done manually; however, this is prohibitively time consuming for large datasets. Using a randomization approach, the ‘thin’ function in the spThin R package returns a dataset with the maximum number of records for a given thinning distance, when run for sufficient iterations. We here provide a worked example for the Caribbean spiny pocket mouse, where the results obtained match those of manual thinning.
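
The randomization idea described above can be sketched in a few lines. This is an illustrative sketch only and not the spThin implementation: it assumes planar coordinates with Euclidean distance (real occurrence records would normally need geographic distances), and the function names `thin_once` and `thin_records` and their parameters are hypothetical.

```python
import math
import random


def thin_once(points, min_dist, rng):
    """One randomized thinning pass over a list of (x, y) points."""
    kept = list(range(len(points)))
    while True:
        # For every kept record, count the other kept records that are too close.
        counts = {
            i: sum(
                1
                for j in kept
                if j != i and math.dist(points[i], points[j]) < min_dist
            )
            for i in kept
        }
        worst = max(counts.values(), default=0)
        if worst == 0:
            # No remaining pair is closer than min_dist.
            return [points[i] for i in kept]
        # Remove one record, chosen at random among the most crowded ones.
        candidates = [i for i in kept if counts[i] == worst]
        kept.remove(rng.choice(candidates))


def thin_records(points, min_dist, reps=100, seed=0):
    """Repeat the randomized pass and keep the largest thinned dataset found."""
    rng = random.Random(seed)
    best = []
    for _ in range(reps):
        result = thin_once(points, min_dist, rng)
        if len(result) > len(best):
            best = result
    return best
```

Repeating the pass matters because each run makes random choices among equally crowded records; more repetitions give more chances to find a run that retains the largest number of records.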

1,016 citations


Network Information
Related Topics (5)
Regression analysis: 31K papers, 1.7M citations, 82% related
Inference: 36.8K papers, 1.3M citations, 81% related
Sampling (statistics): 65.3K papers, 1.2M citations, 80% related
Linear regression: 21.3K papers, 1.2M citations, 79% related
Population: 2.1M papers, 62.7M citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2023  22
2022  58
2021  87
2020  74
2019  66
2018  59