Topic

Sampling bias

About: Sampling bias is a research topic. Over its lifetime, 1,075 publications have appeared within this topic, receiving 52,895 citations. The topic is also known as ascertainment bias or biased sample.

Papers

Journal Article

Observational study of behavior: sampling methods.

Jeanne Altmann¹•Institutions (1)

University of Chicago¹

01 Jan 1974-Behaviour

TL;DR: Seven major types of sampling for observational studies of social behavior have been found in the literature and the major strengths and weaknesses of each method are pointed out.

Abstract: Seven major types of sampling for observational studies of social behavior have been found in the literature. These methods differ considerably in their suitability for providing unbiased data of various kinds. Below is a summary of the major recommended uses of each technique: In this paper, I have tried to point out the major strengths and weaknesses of each sampling method. Some methods are intrinsically biased with respect to many variables, others to fewer. In choosing a sampling method the main question is whether the procedure results in a biased sample of the variables under study. A method can produce a biased sample directly, as a result of intrinsic bias with respect to a study variable, or secondarily due to some degree of dependence (correlation) between the study variable and a directly-biased variable. In order to choose a sampling technique, the observer needs to consider carefully the characteristics of behavior and social interactions that are relevant to the study population and the research questions at hand. In most studies one will not have adequate empirical knowledge of the dependencies between relevant variables. Under the circumstances, the observer should avoid intrinsic biases to whatever extent possible, in particular those that directly affect the variables under study. Finally, it will often be possible to use more than one sampling method in a study. Such samples can be taken successively or, under favorable conditions, even concurrently. For example, we have found it possible to take Instantaneous Samples of the identities and distances of nearest neighbors of a focal individual at five or ten minute intervals during Focal-Animal (behavior) Samples on that individual. Often during Focal-Animal Sampling one can also record All Occurrences of Some Behaviors, for the whole social group, for categories of conspicuous behavior, such as predation, intergroup contact, drinking, and so on.
The extent to which concurrent multiple sampling is feasible will depend very much on the behavior categories and rate of occurrence, the observational conditions, etc. Where feasible, such multiple sampling can greatly aid in the efficient use of research time.

12,470 citations

Journal Article

Sufficient Sample Sizes for Multilevel Modeling

Cora J. M. Maas¹, Joop J. Hox¹•Institutions (1)

Utrecht University¹

01 Jan 2005-Methodology: European Journal of Research Methods for The Behavioral and Social Sciences

TL;DR: In this paper, a simulation study is used to determine the influence of different sample sizes at the group level on the accuracy of the estimates (regression coefficients and variances) and their standard errors.

Abstract: An important problem in multilevel modeling is what constitutes a sufficient sample size for accurate estimation. In multilevel analysis, the major restriction is often the higher-level sample size. In this paper, a simulation study is used to determine the influence of different sample sizes at the group level on the accuracy of the estimates (regression coefficients and variances) and their standard errors. In addition, the influence of other factors, such as the lowest-level sample size and different variance distributions between the levels (different intraclass correlations), is examined. The results show that only a small sample size at level two (meaning a sample of 50 or less) leads to biased estimates of the second-level standard errors. In all of the other simulated conditions the estimates of the regression coefficients, the variance components, and the standard errors are unbiased and accurate.

2,931 citations

Journal Article

Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data

Steven J. Phillips¹, Miroslav Dudík², Jane Elith³, Catherine H. Graham⁴, Anthony Lehmann⁵, John R. Leathwick⁶, Simon Ferrier•Institutions (6)

AT&T Labs¹, Princeton University², University of Melbourne³, Stony Brook University⁴, University of Geneva⁵, National Institute of Water and Atmospheric Research⁶

01 Jan 2009-Ecological Applications

TL;DR: It is argued that increased awareness of the implications of spatial bias in surveys, and of possible modeling remedies, will substantially improve predictions of species distributions; the choice of background data is found to have as large an effect on predictive performance as the choice of modeling method.

Abstract: Most methods for modeling species distributions from occurrence records require additional data representing the range of environmental conditions in the modeled region. These data, called background or pseudo-absence data, are usually drawn at random from the entire region, whereas occurrence collection is often spatially biased toward easily accessed areas. Since the spatial bias generally results in environmental bias, the difference between occurrence collection and background sampling may lead to inaccurate models. To correct the estimation, we propose choosing background data with the same bias as occurrence data. We investigate theoretical and practical implications of this approach. Accurate information about spatial bias is usually lacking, so explicit biased sampling of background sites may not be possible. However, it is likely that an entire target group of species observed by similar methods will share similar bias. We therefore explore the use of all occurrences within a target group as biased background data. We compare model performance using target-group background and randomly sampled background on a comprehensive collection of data for 226 species from diverse regions of the world. We find that target-group background improves average performance for all the modeling methods we consider, with the choice of background data having as large an effect on predictive performance as the choice of modeling method. The performance improvement due to target-group background is greatest when there is strong bias in the target-group presence records. Our approach applies to regression-based modeling methods that have been adapted for use with occurrence data, such as generalized linear or additive models and boosted regression trees, and to Maxent, a probability density estimation method. 
We argue that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions.
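
The target-group idea in the abstract above can be illustrated with a minimal sketch. It assumes occurrence records are stored as hypothetical `(species, cell_id)` pairs on a gridded region; the function names and toy data are illustrative and do not come from the paper. The sketch contrasts a uniformly random background with a background made of every cell in which any species of the target group was recorded, so that the background inherits the survey bias of the occurrence data.

```python
import random

def random_background(all_cells, k, rng):
    """Draw k background cells uniformly at random from the whole region."""
    return rng.sample(sorted(all_cells), k)

def target_group_background(records):
    """Use every cell holding a record of any target-group species as background.

    records: iterable of (species, cell_id) pairs (hypothetical layout).
    The result is concentrated wherever survey effort was concentrated.
    """
    return sorted({cell for _species, cell in records})

# Toy data: surveys only ever visited a handful of easily accessed cells.
records = [("sp_a", 1), ("sp_b", 1), ("sp_b", 2), ("sp_c", 3), ("sp_a", 5)]

biased_bg = target_group_background(records)
uniform_bg = random_background(range(100), 4, random.Random(0))
print("target-group background:", biased_bg)
print("random background:", uniform_bg)
```

Whether duplicate sites should be kept or weighted by record count is a modeling decision this sketch does not settle.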

2,307 citations

Journal Article

Overadjustment bias and unnecessary adjustment in epidemiologic studies.

Enrique F. Schisterman¹, Stephen R. Cole², Robert W. Platt³•Institutions (3)

National Institutes of Health¹, University of North Carolina at Chapel Hill², McGill University³

01 Jul 2009-Epidemiology

TL;DR: This work uses causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment.

Abstract: Overadjustment is defined inconsistently. This term is meant to describe control (eg, by regression adjustment, stratification, or restriction) for a variable that either increases net bias or decreases precision without affecting bias. We define overadjustment bias as control for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome. We define unnecessary adjustment as control for a variable that does not affect bias of the causal relation between exposure and outcome but may affect its precision. We use causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment. Using simulations, we quantify the amount of bias associated with overadjustment. Moreover, we show that this bias is based on a different causal structure from confounding or selection biases. Overadjustment bias is not a finite sample bias, while inefficiencies due to control for unnecessary variables are a function of sample size.
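
The definition of overadjustment bias given above (control for an intermediate variable on a causal path from exposure to outcome) can be made concrete with a small simulation. This is a sketch under assumed conditions, not the authors' simulation: the linear model, the effect sizes (0.8, 0.5, 0.3), and all names are made up for illustration. The exposure `x` affects the outcome `y` both directly and through a mediator `m`, so the total effect is 0.3 + 0.8 × 0.5 = 0.7; adjusting for `m` recovers only the direct effect of 0.3.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

def slope_on_x(y, x, m=None):
    """OLS coefficient on x when regressing y on x, optionally adjusting for m."""
    if m is None:
        return cov(x, y) / cov(x, x)
    # Closed-form solution of the two-regressor normal equations.
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    return (sxy * smm - smy * sxm) / (sxx * smm - sxm ** 2)

rng = random.Random(42)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]                          # exposure
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]                     # intermediate variable
y = [0.3 * xi + 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # outcome

unadjusted = slope_on_x(y, x)     # estimates the total effect (true value 0.7)
adjusted = slope_on_x(y, x, m)    # adjusting for m estimates only the direct effect (0.3)
print(f"unadjusted: {unadjusted:.3f}, adjusted for mediator: {adjusted:.3f}")
```

Increasing `n` narrows both estimates but leaves the gap between them unchanged, consistent with the abstract's point that overadjustment bias is not a finite-sample bias.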

1,480 citations

Journal Article

spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models

Matthew E. Aiello-Lammens¹,², Robert A. Boria³, Aleksandar Radosavljevic³, Bruno Vilela, Robert P. Anderson³,⁴•Institutions (4)

Stony Brook University¹, University of Connecticut², City University of New York³, American Museum of Natural History⁴

01 May 2015-Ecography

TL;DR: This work provides a worked example of spatial thinning of species occurrence records for the Caribbean spiny pocket mouse, where the results obtained match those of manual thinning.

Abstract: Spatial thinning of species occurrence records can help address problems associated with spatial sampling biases. Ideally, thinning removes the fewest records necessary to substantially reduce the effects of sampling bias, while simultaneously retaining the greatest amount of useful information. Spatial thinning can be done manually; however, this is prohibitively time consuming for large datasets. Using a randomization approach, the ‘thin’ function in the spThin R package returns a dataset with the maximum number of records for a given thinning distance, when run for sufficient iterations. We here provide a worked example for the Caribbean spiny pocket mouse, where the results obtained match those of manual thinning.
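
The randomization approach described above can be sketched in a few lines. This is an illustration of the general idea only, not the code of the spThin package: it assumes planar coordinates with Euclidean distance (real occurrence data would normally need geographic distances), and the function names `thin_once` and `thin` are hypothetical. Each run repeatedly drops one of the records with the most neighbors closer than the thinning distance, breaking ties at random, until no two retained records are too close; the largest result across repeated runs is kept.

```python
import math
import random

def thin_once(points, min_dist, rng):
    """One randomized thinning pass over a list of (x, y) records."""
    kept = list(points)
    while True:
        # For each retained record, count neighbors closer than min_dist.
        counts = [
            sum(1 for j, q in enumerate(kept) if j != i and math.dist(p, q) < min_dist)
            for i, p in enumerate(kept)
        ]
        worst = max(counts, default=0)
        if worst == 0:
            return kept
        # Remove one of the most crowded records, chosen at random.
        crowded = [i for i, c in enumerate(counts) if c == worst]
        kept.pop(rng.choice(crowded))

def thin(points, min_dist, reps=50, seed=0):
    """Repeat the randomized pass and keep the largest thinned dataset found."""
    rng = random.Random(seed)
    best = []
    for _ in range(reps):
        result = thin_once(points, min_dist, rng)
        if len(result) > len(best):
            best = result
    return best

points = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.2, 5), (10, 10)]
thinned = thin(points, min_dist=1.0)
print(len(thinned), "records retained:", thinned)
```

Because removal is randomized, a single pass may not retain the maximum number of records on larger datasets, which is why the sketch repeats the pass and keeps the best result.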

1,016 citations

Performance

Metrics

Papers: 1,155

Citations: 61,170

No. of papers in the topic in previous years
Year	Papers
2023	22
2022	58
2021	87
2020	74
2019	66
2018	59
