
Showing papers on "Sampling (statistics) published in 2017"


01 Jan 2017
TL;DR: Piper's Soil and Plant Analysis is reviewed as a laboratory manual that, rather than cataloguing all existing methods, presents a selection of procedures recommended from long experience at the Waite Agricultural Research Institute, with concise and precise working details.
Abstract: BOOKS on methods of analysis can be divided into two classes, one of which is the ‘collected methods’ type. Here each chapter or section of the book is devoted to a specific analysis or group of related analyses, and gives the working details of all or nearly all the existing methods. Such books are not only very useful and convenient but also are a necessity for those analysts to whom the original papers in the literature are not easily accessible. It is, however, the other class of book that the analyst most appreciates, namely, the book in which he is not bewildered by an array of methods but is presented with a selection recommended from considerable experience. Dr. Piper has compiled his book along these lines, and all the methods, with a very few exceptions, are those in use at the Waite Agricultural Research Institute. Concise, and more important still, precise working details are given with ample explanation and a wealth of guidance and help. Soil and Plant Analysis: A Laboratory Manual of Methods for the Examination of Soils and the Determination of the Inorganic Constituents of Plants. By Dr. C. S. Piper. (A Monograph from the Waite Agricultural Research Institute.) Pp. xiv + 368. (Adelaide: University of Adelaide, 1942.) 15s.

4,022 citations


Journal ArticleDOI
30 Sep 2017
TL;DR: In this article, the authors describe snowball sampling as a purposeful method of data collection in qualitative research, which can be applied to facilitate scientific research, provide community-based data, and support health education programs.
Abstract: Background and Objectives: Snowball sampling is applied when samples with the target characteristics are not easily accessible. This research describes snowball sampling as a purposeful method of data collection in qualitative research. Methods: This paper is a descriptive review of previous research papers. Data were gathered using English keywords, including “review,” “declaration,” “snowball,” and “chain referral,” as well as Persian keywords that are equivalents of the following: “purposeful sampling,” “snowball,” “qualitative research,” and “descriptive review.” The databases included Google Scholar, Scopus, Irandoc, ProQuest, Science Direct, SID, MagIran, Medline, and Cochrane. The search was limited to Persian and English articles written between 2005 and 2013. Results: The preliminary search yielded 433 articles from PubMed, 88 articles from Scopus, 1 article from SID, and 18 articles from MagIran. Among the 125 articles examined, methodological and non-research articles were omitted. Finally, 11 relevant articles that met the criteria were selected for review. Conclusions: Different methods of snowball sampling can be applied to facilitate scientific research, provide community-based data, and support health education programs. Snowball sampling can be effectively used to analyze vulnerable groups or individuals under special care, as it allows researchers to access these susceptible populations. Thus, researchers are advised to consider snowball sampling strategies when working with attendees of educational programs or with research samples. Keywords: Purposeful Sampling, Snowball, Qualitative Research, Descriptive Review
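As a rough illustration of the chain-referral mechanism described in this abstract, the sketch below simulates snowball sampling on a synthetic contact network; the network, seed participants, and referral limit are hypothetical and not drawn from the paper (networkx is assumed available purely to build a toy graph).

```python
import random
import networkx as nx  # assumed available; any adjacency structure would do

def snowball_sample(graph, seeds, waves=3, referrals_per_person=2, rng=None):
    """Simulate chain-referral (snowball) sampling: each recruited person
    refers up to `referrals_per_person` new contacts, for a fixed number of waves."""
    rng = rng or random.Random(0)
    sampled = set(seeds)
    current_wave = list(seeds)
    for _ in range(waves):
        next_wave = []
        for person in current_wave:
            contacts = [c for c in graph.neighbors(person) if c not in sampled]
            for referred in rng.sample(contacts, min(referrals_per_person, len(contacts))):
                sampled.add(referred)
                next_wave.append(referred)
        current_wave = next_wave
    return sampled

# Toy usage: a random contact network with two initial seed participants.
G = nx.erdos_renyi_graph(200, 0.03, seed=1)
sample = snowball_sample(G, seeds=[0, 1], waves=4)
print(f"Recruited {len(sample)} participants via chain referral")
```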

717 citations



Journal Article
TL;DR: In this paper, the authors clarify the proper meaning of sampling and discuss the different techniques and types of sampling, concentrating mainly on probability and non-probability sampling and their subcategories.
Abstract: In research, different sampling techniques are used in different fields, and it is essential to choose an adequate sampling technique. In this paper we first clarify the proper meaning of sampling. We then discuss the different techniques and types of sampling, concentrating mainly on the two broad types, probability and non-probability sampling, and their subcategories. We further discuss the pros and cons of these techniques: pros are the primary positive aspects of an idea, process or thing, while cons are the primary negative aspects. It is necessary to choose the right sampling technique for a specific research work, and before choosing a technique it is necessary to know its pros and cons. A researcher who knows the pros and cons will be able to select the adequate sampling technique for his or her research work.
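To make the probability-sampling categories discussed above concrete, here is a minimal NumPy sketch contrasting simple random sampling with proportional stratified sampling on an invented population; the strata sizes and means are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: 10,000 units in three strata with different means.
sizes = {"A": 5000, "B": 3000, "C": 2000}
means = {"A": 50.0, "B": 60.0, "C": 80.0}
strata = np.concatenate([np.full(n, s) for s, n in sizes.items()])
values = np.concatenate([rng.normal(means[s], 10.0, n) for s, n in sizes.items()])

n_sample = 500

# Simple random sampling: every unit has the same inclusion probability.
srs_idx = rng.choice(len(values), size=n_sample, replace=False)
srs_mean = values[srs_idx].mean()

# Proportional stratified sampling: sample within each stratum in proportion
# to its share of the population, then combine the strata samples.
strat_idx = []
for s, n in sizes.items():
    stratum_positions = np.flatnonzero(strata == s)
    k = round(n_sample * n / len(values))
    strat_idx.append(rng.choice(stratum_positions, size=k, replace=False))
strat_idx = np.concatenate(strat_idx)
strat_mean = values[strat_idx].mean()

print(f"Population mean {values.mean():.2f}, SRS {srs_mean:.2f}, stratified {strat_mean:.2f}")
```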

407 citations


Journal ArticleDOI
TL;DR: This article presents basic concepts and recent research directions about the stability of sampled-data systems with aperiodic sampling, and indicates the sources of conservatism, the problems that remain open and the possible directions of improvement.

344 citations


Posted Content
TL;DR: This paper argued that clustering is in essence a design problem, either a sampling design or an experimental design issue, and that the clustering adjustment is justified by the fact that there are clusters in the population that we do not see in the sample.
Abstract: In empirical work in economics it is common to report standard errors that account for clustering of units. Typically, the motivation given for the clustering adjustments is that unobserved components in outcomes for units within clusters are correlated. However, because correlation may occur across more than one dimension, this motivation makes it difficult to justify why researchers use clustering in some dimensions, such as geographic, but not others, such as age cohorts or gender. It also makes it difficult to explain why one should not cluster with data from a randomized experiment. In this paper, we argue that clustering is in essence a design problem, either a sampling design or an experimental design issue. It is a sampling design issue if sampling follows a two stage process where in the first stage, a subset of clusters were sampled randomly from a population of clusters, while in the second stage, units were sampled randomly from the sampled clusters. In this case the clustering adjustment is justified by the fact that there are clusters in the population that we do not see in the sample. Clustering is an experimental design issue if the assignment is correlated within the clusters. We take the view that this second perspective best fits the typical setting in economics where clustering adjustments are used. This perspective allows us to shed new light on three questions: (i) when should one adjust the standard errors for clustering, (ii) when is the conventional adjustment for clustering appropriate, and (iii) when does the conventional adjustment of the standard errors matter.
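The paper's argument concerns when a clustering adjustment is warranted; for reference, the sketch below shows the conventional cluster-robust ("sandwich") variance estimator for OLS that the abstract calls the conventional adjustment. The data-generating process and the small-sample correction factor are standard textbook choices, not taken from the paper.

```python
import numpy as np

def ols_cluster_se(X, y, cluster_ids):
    """OLS estimates with conventional cluster-robust (CR1) standard errors:
    V = c * (X'X)^{-1} (sum_g X_g' u_g u_g' X_g) (X'X)^{-1},
    with the usual small-sample factor c = G/(G-1) * (N-1)/(N-K)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    clusters = np.unique(cluster_ids)
    meat = np.zeros((k, k))
    for g in clusters:
        Xg = X[cluster_ids == g]
        ug = resid[cluster_ids == g]
        score = Xg.T @ ug
        meat += np.outer(score, score)

    g_count = len(clusters)
    c = g_count / (g_count - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Toy example: a cluster-level shock induces within-cluster correlation.
rng = np.random.default_rng(0)
G, n_per = 50, 20
cluster_ids = np.repeat(np.arange(G), n_per)
x = rng.normal(size=G * n_per)
shock = rng.normal(size=G)[cluster_ids]
y = 1.0 + 0.5 * x + shock + rng.normal(size=G * n_per)
X = np.column_stack([np.ones_like(x), x])
beta, se = ols_cluster_se(X, y, cluster_ids)
print("coefficients:", beta, "cluster-robust SEs:", se)
```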

329 citations


Journal ArticleDOI
TL;DR: It is argued that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been realized, and it is proposed that future research map multivariate results in GIS to pinpoint specific anthropogenic sources.

321 citations


Journal ArticleDOI
TL;DR: The protocol for Gaussian Boson Sampling with single-mode squeezed states is presented, and it is shown that, through the Hafnian matrix function, the proposal retains the higher photon-number contributions at the input.
Abstract: Boson sampling has emerged as a tool to explore the advantages of quantum over classical computers, as it does not require universal control over the quantum system, which favors current photonic experimental platforms. Here, we introduce Gaussian Boson Sampling, a classically hard-to-solve problem that uses squeezed states as a nonclassical resource. We relate the probability of measuring specific photon patterns from a general Gaussian state in the Fock basis to a matrix function called the Hafnian, which answers the last remaining question of sampling from Gaussian states. Based on this result, we design Gaussian Boson Sampling, a #P-hard problem, using squeezed states. This demonstrates that Boson sampling from Gaussian states is possible, with significant advantages in the photon generation probability compared to existing protocols.
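Because the output probabilities in Gaussian Boson Sampling are expressed through the Hafnian, a brute-force reference implementation (summing over perfect matchings, so feasible only for very small matrices) may help fix the definition; this is a generic illustration, not the method used in the paper.

```python
import numpy as np

def hafnian(A):
    """Hafnian of a symmetric matrix A, computed by recursing over the perfect
    matchings of its index set: haf(A) = sum over matchings of the product of
    A[i, j] over matched pairs (i, j). Exponential cost, so only usable for
    small matrices."""
    n = A.shape[0]
    if n == 0:
        return 1.0
    if n % 2 == 1:
        return 0.0  # no perfect matching exists for an odd number of modes
    total = 0.0
    for j in range(1, n):
        rest = [k for k in range(1, n) if k != j]
        total += A[0, j] * hafnian(A[np.ix_(rest, rest)])
    return total

# Sanity check: for a 2x2 symmetric matrix [[a, b], [b, c]], the Hafnian is b.
print(hafnian(np.array([[0.0, 3.0], [3.0, 0.0]])))  # -> 3.0
```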

311 citations


Journal ArticleDOI
TL;DR: In this article, the analytical techniques for measuring microplastics in sediment have been evaluated and four primary areas of the analytical process have been identified that include sampling, extraction, quantitation and quality assurance/quality control.
Abstract: In this review the analytical techniques for measuring microplastics in sediment are evaluated. Four primary areas of the analytical process have been identified: (1) sampling, (2) extraction, (3) quantitation and (4) quality assurance/quality control (QA/QC). Each of these areas has its own subject-specific challenges and requires further method development and harmonisation. The most common approach to extracting microplastics from sediments is density separation. Following extraction, visual counting with an optical microscope is the most common technique for quantifying microplastics; a technique that is labour intensive and prone to human error. Spectroscopic techniques (FTIR, Raman) are the most commonly applied for identifying polymers collected through visual sorting. Improvements to, and harmonisation of, size fractions, sampling approaches, extraction protocols and units for reporting plastic abundance would aid comparison of data generated by different research teams. Further, we advocate the development of strong QA/QC procedures, as adopted in other fields of analytical chemistry. Finally, inter-laboratory proficiency testing is recommended to give an indication of the variation and reliability in measurements reported in the scientific literature, which may be under- or overestimations of environmental burdens.

261 citations


Journal ArticleDOI
TL;DR: It is shown that the leader-following consensus problem with stochastic sampling can be transformed into a master–slave synchronization problem with only one master system and two slave systems.
Abstract: This paper is concerned with sampled-data leader-following consensus of a group of agents with nonlinear characteristics. A distributed consensus protocol with probabilistic sampling over two sampling periods is proposed. First, a general consensus criterion is derived for multiagent systems under a directed graph. A number of results in several special cases, without transmittal delays or with deterministic sampling, are obtained. Second, a dimension-reduced condition is obtained for multiagent systems under an undirected graph. It is shown that the leader-following consensus problem with stochastic sampling can be transformed into a master–slave synchronization problem with only one master system and two slave systems. Solving the problem is then independent of the number of agents, which greatly facilitates its application to large-scale networked agents. Third, the network design issue is further addressed, demonstrating the positive and active roles of the network structure in reaching consensus. Finally, two examples are given to verify the theoretical results.

247 citations


Journal ArticleDOI
TL;DR: A standard operating procedure for sampling and extracting microplastics from beach sand is provided, finding that sampling depth, sampling location, number of repeat extractions, and settling times are the critical parameters of variation.

Journal ArticleDOI
Arnak S. Dalalyan1
TL;DR: In this paper, nonasymptotic bounds are established for the error of approximating a target distribution with a smooth and log-concave density by the distribution obtained from the Langevin Monte Carlo method and its variants.
Abstract: Sampling from various kinds of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, test procedures or confidence intervals. In many situations, exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, there is no well-developed theory providing meaningful nonasymptotic guarantees for approximate sampling procedures, especially in high-dimensional problems. This paper makes some progress in this direction by considering the problem of sampling from a distribution having a smooth and log-concave density defined on R^p, for some integer p > 0. We establish nonasymptotic bounds for the error of approximating the true distribution by the one obtained by the Langevin Monte Carlo method and its variants. We illustrate the effectiveness of the established guarantees with various experiments. Underlying our analysis are insights from the theory of continuous-time diffusion processes, which may be of interest beyond the framework of distributions with log-concave densities considered in the present work.
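For orientation, here is a minimal sketch of the (unadjusted) Langevin Monte Carlo iteration analyzed in the paper, applied to a toy smooth, log-concave target; the step size, iteration count, and target are illustrative choices rather than the paper's tuned constants.

```python
import numpy as np

def lmc_sample(grad_potential, dim, n_iter=5000, step=1e-2, rng=None):
    """Unadjusted Langevin Monte Carlo:
    theta_{k+1} = theta_k - h * grad U(theta_k) + sqrt(2h) * xi_k,
    where U = -log density and xi_k ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    theta = np.zeros(dim)
    samples = np.empty((n_iter, dim))
    for k in range(n_iter):
        noise = rng.standard_normal(dim)
        theta = theta - step * grad_potential(theta) + np.sqrt(2.0 * step) * noise
        samples[k] = theta
    return samples

# Toy log-concave target: N(mu, I), so U(theta) = 0.5 * ||theta - mu||^2.
mu = np.array([1.0, -2.0, 0.5])
samples = lmc_sample(lambda t: t - mu, dim=3)
print(samples[1000:].mean(axis=0))  # should be close to mu after burn-in
```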

Journal ArticleDOI
TL;DR: In this article, the authors review their experience of TLS sampling strategies from 27 campaigns conducted over the past 5 years, across tropical and temperate forest plots, where data was captured with a RIEGL VZ-400 laser scanner.

DOI
01 Jan 2017
TL;DR: The GEOTRACES Standards and Intercalibration (S&I) Committee is charged with ensuring that the data generated during GEOTRACES are as precise and accurate as possible, which includes all the steps from sampling to analysis.
Abstract: The GEOTRACES Standards and Intercalibration (S&I) Committee is charged with ensuring that the data generated during GEOTRACES are as precise and accurate as possible, which includes all the steps from sampling to analysis. Thus, sampling methods for dissolved and particulate constituents must take a representative (of the water depth/water mass) and uncontaminated sample, the samples must be stored (or immediately analyzed) in a fashion that preserves the concentrations (activities) and chemical speciation, and the analyses of these samples must yield accurate data (concentration, activity, isotopic composition, chemical speciation). To this end, experiences from the 2008-2010 GEOTRACES Intercalibration Program, and other related intercalibration efforts, helped to create the protocols in this document. However, methods continually evolve and the GEOTRACES S&I Committee will monitor these advances as validated by intercalibrations and modify the methods as warranted. The protocols here are divided into trace element and isotope groups: Hydrography and Ancillary Parameters, Radioactive Isotopes, Radiogenic Isotopes, Trace Elements, and Nutrient Isotopes. Those who contributed to preparing these protocols are listed in Appendix 1 and are sincerely thanked for their efforts in helping GEOTRACES and the worldwide TEI community.

Journal ArticleDOI
TL;DR: A classical algorithm solves the boson sampling problem for 30 bosons with standard computing hardware, suggesting that a much larger experimental effort will be needed to reach a regime where quantum hardware outperforms classical methods.
Abstract: A classical algorithm solves the boson sampling problem for 30 bosons with standard computing hardware, suggesting that a much larger experimental effort will be needed to reach a regime where quantum hardware outperforms classical methods. It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling [1] is a rudimentary quantum algorithm tailored to the platform of linear optics, which has sparked interest as a rapid way to demonstrate such quantum supremacy [2-6]. Photon statistics are governed by intractable matrix functions, which suggests that sampling from the distribution obtained by injecting photons into a linear optical network could be solved more quickly by a photonic experiment than by a classical computer. The apparently low resource requirements for large boson sampling experiments have raised expectations of a near-term demonstration of quantum supremacy by boson sampling [7,8]. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. Our classical algorithm, based on Metropolised independence sampling, allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. Compared to current experiments, a demonstration of quantum supremacy over a successful implementation of these classical methods on a supercomputer would require the number of photons and experimental components to increase by orders of magnitude, while tackling exponentially scaling photon loss.
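The classical algorithm described above is built on Metropolised independence sampling; the sketch below shows the generic accept/reject step of that sampler on a toy one-dimensional target, not the permanent-based boson-sampling distribution used in the paper.

```python
import numpy as np

def metropolised_independence_sampler(log_target, log_proposal, draw_proposal,
                                      n_iter=10000, rng=None):
    """Metropolised independence sampling: propose y ~ q independently of the
    current state x and accept with probability
    min(1, [p(y) q(x)] / [p(x) q(y)])."""
    rng = rng or np.random.default_rng(0)
    x = draw_proposal(rng)
    chain = []
    for _ in range(n_iter):
        y = draw_proposal(rng)
        log_alpha = (log_target(y) + log_proposal(x)) - (log_target(x) + log_proposal(y))
        if np.log(rng.uniform()) < log_alpha:
            x = y
        chain.append(x)
    return np.array(chain)

# Toy example: target is a standard Gaussian, proposal a wider Gaussian.
chain = metropolised_independence_sampler(
    log_target=lambda x: -0.5 * x**2,
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
    draw_proposal=lambda rng: 2.0 * rng.standard_normal(),
)
print(chain.mean(), chain.var())  # roughly 0 and 1
```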

Journal ArticleDOI
TL;DR: It is concluded that bridge sampling is an attractive method for mathematical psychologists who typically aim to approximate the marginal likelihood for a limited set of possibly high-dimensional models.

Journal ArticleDOI
TL;DR: The precision, accuracy, and stability of the RF, ANN, and SVM models were improved by the inclusion of STR sampling; the RF model is suitable for estimating LAI when sample plots and variation are relatively large, such as over the whole growth period or more than one growth period.
Abstract: Leaf area index (LAI) is an important indicator of plant growth and yield that can be monitored by remote sensing. Several models were constructed using datasets derived from SRS and STR sampling methods to determine the optimal model for soybean (multiple strains) LAI inversion for the whole crop growth period and a single growth period. Random forest (RF), artificial neural network (ANN), and support vector machine (SVM) regression models were compared with a partial least-squares regression (PLS) model. The RF model yielded the highest precision, accuracy, and stability with V-R2, SDR2, V-RMSE, and SDRMSE values of 0.741, 0.031, 0.106, and 0.005, respectively, over the whole growth period based on STR sampling. The ANN model had the highest precision, accuracy, and stability (0.452, 0.132, 0.086, and 0.009, respectively) over a single growth phase based on STR sampling. The precision, accuracy, and stability of the RF, ANN, and SVM models were improved by inclusion of STR sampling. The RF model is suitable for estimating LAI when sample plots and variation are relatively large (i.e., the whole growth period or more than one growth period). The ANN model is more appropriate for estimating LAI when sample plots and variation are relatively low (i.e., a single growth period).

Journal ArticleDOI
TL;DR: The robust stabilization of nonlinear systems subject to exogenous inputs using event-triggered output feedback laws is addressed, with time-driven (and so periodic) sampling covered as a particular case, for which the results are new.

Journal ArticleDOI
TL;DR: In this paper, the authors present a framework for global estimation of depth to bedrock (DTB) from a global compilation of soil profile data (ca. 130,000 locations) and borehole data (ca. 1.6 million locations).
Abstract: Depth to bedrock serves as the lower boundary of land surface models, which controls hydrologic and biogeochemical processes. This paper presents a framework for global estimation of depth to bedrock (DTB). Observations were extracted from a global compilation of soil profile data (ca. 130,000 locations) and borehole data (ca. 1.6 million locations). Additional pseudo-observations generated by expert knowledge were added to fill in large sampling gaps. The model training points were then overlaid on a stack of 155 covariates including DEM-based hydrological and morphological derivatives, lithologic units, MODIS surface reflectance bands and vegetation indices derived from the MODIS land products. Global spatial prediction models were developed using random forest and gradient boosting tree algorithms. The final predictions were generated at a spatial resolution of 250 m as an ensemble prediction of the two independently fitted models. The 10-fold cross-validation shows that the models explain 59% of the variation for absolute DTB and 34% for censored DTB (depths deeper than 200 cm are predicted as 200 cm). The model for occurrence of the R horizon (bedrock) within 200 cm performs well. Visual comparisons of predictions in study areas where more detailed maps of depth to bedrock exist show a general match with spatial patterns from similar local studies. Limitations of the data set and extrapolation in data-sparse areas should not be ignored in applications. To improve the accuracy of spatial prediction, more borehole drilling logs will need to be added to supplement the existing training points in under-represented areas.
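As a rough sketch of the two-model ensemble described above (random forest plus gradient-boosted trees, averaged), here is a minimal scikit-learn version on synthetic data; the covariates, hyperparameters, and sample sizes are placeholders, not the paper's 155-covariate global stack.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the covariate stack and depth-to-bedrock observations.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
gbt = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Ensemble prediction: simple average of the two independently fitted models.
pred = 0.5 * (rf.predict(X_test) + gbt.predict(X_test))
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
print(f"Ensemble R^2 on held-out data: {1 - ss_res / ss_tot:.2f}")
```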

Journal ArticleDOI
01 May 2017-Catena
TL;DR: In this paper, the authors used maximum entropy (ME) as a machine learning model, with two sampling strategies: Mahalanobis distance (MEMD) and random sampling (MERS), to map landslide susceptibility over the Ziarat watershed in the Golestan Province, Iran.
Abstract: The aim of the current study is to map landslide susceptibility over the Ziarat watershed in the Golestan Province, Iran, using Maximum Entropy (ME), as a machine learning model, with two sampling strategies: Mahalanobis distance (MEMD) and random sampling (MERS). To this aim, a total of 92 landslides in the watershed were recorded as point features using a GPS (Global Positioning System) device, along with several field surveys and available local data. By reviewing landslide-related studies and using principal component analysis, 12 landslide-controlling factors were chosen, namely altitude, slope percent, slope aspect, lithological formations, proximity (to faults, streams, and roads), land use/cover, precipitation, plan and profile curvature, and the state-of-the-art topo-hydrological factor known as height above the nearest drainage (HAND). Two sampling methods were used to divide landslides into training (70%) and test (30%) sets. The area under the success rate curve (AUSRC) and the area under the prediction rate curve (AUPRC) were used to evaluate the results of MEMD and MERS. The results showed that both MEMD and MERS strategies, with respective AUSRC values of 0.884 and 0.878, performed well in modelling landslide susceptibility in the study area. However, the AUPRC test showed slightly different results, in which MEMD, with a value of 0.906, showed excellent predictive power in comparison with MERS, with an AUPRC value of 0.846. The higher AUPRC value relative to the AUSRC indicated MEMD as the premier model in the current study. According to MEMD, three landslide-controlling factors, namely lithological formations, proximity to roads, and precipitation, with respective contribution percentages of 25.1%, 23.3%, and 19.1%, contained more information than the rest. Moreover, according to a one-by-one factor removal test, lithological formations and proximity to faults were identified as carrying unique information compared to the rest. According to MEMD, about 13.8% of the study area lies within high to very high susceptibility classes, which can be a matter of great interest to decision makers and local authorities when formulating land use planning strategies and implementing pragmatic measures.
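Setting aside the specifics of the paper's MEMD strategy, the sketch below shows how a Mahalanobis distance to the multivariate centre of the conditioning factors can be computed and used to rank candidate sample points; the factors, threshold, and selection rule are invented for illustration.

```python
import numpy as np

def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X to the sample mean of X."""
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Hypothetical conditioning factors (e.g. altitude, slope, distance to roads)
# at candidate sampling locations.
rng = np.random.default_rng(7)
factors = rng.normal(size=(500, 5))

d = mahalanobis_distances(factors)
# One possible Mahalanobis-based selection: keep candidates closest to the
# multivariate centre of the factor space (threshold chosen for illustration).
selected = np.flatnonzero(d < np.quantile(d, 0.7))
print(f"{selected.size} of {factors.shape[0]} candidates retained")
```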

Proceedings Article
06 Aug 2017
TL;DR: A Bayesian expected regret bound of Õ(H√(SAT)) is established for PSRL in finite-horizon episodic Markov decision processes, which improves upon the best previous bound of Õ(HS√(AT)) for any reinforcement learning algorithm.
Abstract: Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an Õ(H√(SAT)) Bayesian regret bound for PSRL in finite-horizon episodic Markov decision processes. This improves upon the best previous Bayesian regret bound of Õ(HS√(AT)) for any reinforcement learning algorithm. Our theoretical results are supported by extensive empirical evaluation.
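A minimal sketch of the posterior sampling loop (PSRL) referred to above, for a small finite-horizon MDP with Dirichlet posteriors over transitions and Beta posteriors over Bernoulli reward means; the environment, priors, and episode count are toy choices, not the paper's experimental setup.

```python
import numpy as np

def psrl(true_P, true_R, H, n_episodes=200, rng=None):
    """Posterior Sampling for Reinforcement Learning on a finite MDP.

    Each episode: sample an MDP from the posterior (Dirichlet over transitions,
    Beta over Bernoulli reward means), solve it by finite-horizon value
    iteration, act greedily in the true environment, then update the posterior."""
    rng = rng or np.random.default_rng(0)
    S, A = true_R.shape
    trans_counts = np.ones((S, A, S))   # Dirichlet(1, ..., 1) prior
    rew_counts = np.ones((S, A, 2))     # Beta(1, 1) prior: [successes, failures]

    for _ in range(n_episodes):
        # Sample a plausible MDP from the current posterior.
        P = np.array([[rng.dirichlet(trans_counts[s, a]) for a in range(A)] for s in range(S)])
        R = rng.beta(rew_counts[..., 0], rew_counts[..., 1])

        # Solve the sampled MDP by backward induction over the horizon.
        Q = np.zeros((H + 1, S, A))
        for h in range(H - 1, -1, -1):
            V_next = Q[h + 1].max(axis=1)
            Q[h] = R + P @ V_next

        # Act greedily in the true environment and update the posterior.
        s = 0
        for h in range(H):
            a = int(Q[h, s].argmax())
            r = rng.random() < true_R[s, a]          # Bernoulli reward draw
            s_next = rng.choice(S, p=true_P[s, a])
            rew_counts[s, a, 0 if r else 1] += 1
            trans_counts[s, a, s_next] += 1
            s = s_next
    return Q

# Tiny toy MDP: 3 states, 2 actions, horizon 5.
rng = np.random.default_rng(1)
true_P = rng.dirichlet(np.ones(3), size=(3, 2))
true_R = rng.uniform(size=(3, 2))
psrl(true_P, true_R, H=5)
```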


Journal ArticleDOI
TL;DR: A new class of functionals consisting of multiple Lyapunov functions and looped functionals is developed to present sufficient conditions on the exponential stabilization for such systems with sampled-data state feedback.
Abstract: This paper is concerned with sampled-data control of switched linear systems with average dwell time switching under variable sampling. Since the subsystem switching could occur during the sampling intervals while the associated controller does not switch, an asynchronous switching phenomenon arises. A new class of functionals consisting of multiple Lyapunov functions and looped functionals is developed to present sufficient conditions on exponential stabilization for such systems with sampled-data state feedback. The obtained results are verified on an F-18 aircraft example.

Journal ArticleDOI
TL;DR: This synthesis suggests that all current root sampling categories present both advantages and pitfalls and that no single method can appropriately tackle the main current challenge of root functional ecology: i.e. linking fine roots to plant and ecosystem functions in a truly comparable way across all plants.
Abstract: Roots vary in anatomy, morphology and physiology, both spatially (different parts of the same root system) and temporally (plastic changes, root ageing), suggesting that root trait measurements are strongly affected by root sampling categories. In this context, it is urgent to clarify the functional significance of current root sampling categories (e.g. fine roots of the first order, the first three orders, ≤1 mm or ≤2 mm), establish guidelines for choosing between sampling methods and revise root ontology to account for functional differences between traits measured on distinct root categories. Here, we used a worldwide database of fine-root traits to test the hypothesis that distinct fine-root trait values – with links to fine-root functions – are generally affected by different root sampling categories. We observed a clear functional break between first-order roots and roots of all three other sampling categories, and a smaller but substantial break between roots of the first three orders and the ≤2 mm category, demonstrating globally that different sampling methodologies capture different functional parts of roots. Our synthesis suggests that all current root sampling categories present both advantages and pitfalls and that no single method can appropriately tackle the main current challenge of root functional ecology, i.e. linking fine roots to plant and ecosystem functions in a truly comparable way across all plants. We argue instead that a small set of complementary standardized sampling methods is necessary to capture the linkages between root forms and functions. To assist experimenters in selecting adequate sampling methods, we developed a decision table following three logical questions: (i) what plant or ecosystem function must be addressed; (ii) what root categories are involved in this function; and (iii) what traits should be measured on these root categories. Challenging, strengthening and expanding such a common reference framework would be a substantial step towards wider comparability of future functional trait datasets. A lay summary is available for this article.

Journal ArticleDOI
TL;DR: A new sequential sampling strategy called Progressive Latin Hypercube Sampling (PLHS), which sequentially generates sample points while progressively preserving the distributional properties of interest (Latin hypercube properties, space-filling, etc.), as the sample size grows, is proposed.
Abstract: Efficient sampling strategies that scale with the size of the problem, computational budget, and users' needs are essential for various sampling-based analyses, such as sensitivity and uncertainty analysis. In this study, we propose a new strategy, called Progressive Latin Hypercube Sampling (PLHS), which sequentially generates sample points while progressively preserving the distributional properties of interest (Latin hypercube properties, space-filling, etc.) as the sample size grows. Unlike Latin hypercube sampling, PLHS generates a series of smaller sub-sets (slices) such that (1) the first slice is Latin hypercube, (2) the progressive union of slices remains Latin hypercube and achieves maximum stratification in any one-dimensional projection, and as such (3) the entire sample set is Latin hypercube. The performance of PLHS is compared with benchmark sampling strategies across multiple case studies for Monte Carlo simulation, sensitivity and uncertainty analysis. Our results indicate that PLHS leads to improved efficiency, convergence, and robustness of sampling-based analyses. Highlights: A new sequential sampling strategy called PLHS is proposed for sampling-based analysis of simulation models. PLHS is evaluated across multiple case studies for Monte Carlo simulation, sensitivity and uncertainty analysis. PLHS provides better performance compared with the other sampling strategies in terms of convergence rate and robustness. PLHS can be used to monitor the performance of the associated sampling-based analysis and to avoid over- or under-sampling.
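For orientation, the sketch below draws a plain Latin hypercube design with SciPy and checks the one-dimensional stratification property that PLHS preserves as slices accumulate; it is not an implementation of the progressive (sliced) construction itself.

```python
import numpy as np
from scipy.stats import qmc

# Plain Latin hypercube design: n points in d dimensions, with each 1-D
# projection placing exactly one point in each of the n equal-width bins.
d, n = 3, 16
sampler = qmc.LatinHypercube(d=d, seed=0)
X = sampler.random(n)

# Verify the one-dimensional stratification (the property PLHS maintains
# progressively as successive slices are appended).
bins = np.floor(X * n).astype(int)
for j in range(d):
    assert (np.sort(bins[:, j]) == np.arange(n)).all()
print("Each 1-D projection hits every bin exactly once.")
```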

Journal ArticleDOI
TL;DR: Quantum computing could greatly speed up machine learning techniques that rely on sampling complex probability distributions, but practical limitations have prevented demonstrations of feasibility; a new technique implemented on a quantum annealer surpasses these limitations and shows how quantum computing can assist in machine learning tasks.
Abstract: Quantum computing could greatly speed up machine learning techniques that rely on sampling complex probability distributions, but limitations prevent demonstrations of feasibility. A new technique implemented on a quantum annealer surpasses these limitations and shows how quantum computing can assist in machine learning tasks.

Journal ArticleDOI
26 Jul 2017-PLOS ONE
TL;DR: The sample size required in qualitative research to reach theoretical saturation is explored, seven guidelines for purposive sampling are formulated, and it is recommended that researchers follow a minimal information scenario.
Abstract: I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximum information,” which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimum information scenario.
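A quick simulation in the spirit of the "random chance" scenario described above: draw sampling steps at random and record when every code in a hypothetical population has been observed at least once; the number of codes and the observation probability are invented.

```python
import numpy as np

def sample_size_to_saturation(n_codes, p_observe, max_steps=10_000, rng=None):
    """Simulate the 'random chance' scenario: at each sampling step every code
    is observed independently with probability p_observe; saturation is the
    step at which all codes have been seen at least once."""
    rng = rng or np.random.default_rng(0)
    seen = np.zeros(n_codes, dtype=bool)
    for step in range(1, max_steps + 1):
        seen |= rng.random(n_codes) < p_observe
        if seen.all():
            return step
    return max_steps

# Average over repeated simulations for a hypothetical population of 30 codes.
rng = np.random.default_rng(1)
sizes = [sample_size_to_saturation(30, 0.10, rng=rng) for _ in range(500)]
print(f"Mean sample size to reach saturation: {np.mean(sizes):.1f}")
```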

Journal ArticleDOI
TL;DR: ALAMO's constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs.

01 Sep 2017
TL;DR: This second edition of a classic text gives students what they need to apply critical reasoning when reading behavioral science research, including a new chapter on meta-analyses and a series of fictitious journal articles containing built-in flaws in method and interpretation.
Abstract: To become informed consumers of research, students need to thoughtfully evaluate the research they read rather than accept it without question. This second edition of a classic text gives students what they need to apply critical reasoning when reading behavioral science research. It updates the original text with recent developments in research methods, including a new chapter on meta-analyses. Part I gives a thorough overview of the steps in a research project. It focuses on how to assess whether the conclusions drawn in a behavioral science report are warranted by the methods used in the research. Topics include research hypotheses, sampling, experimental design, data analysis, interpretation of results, and ethics. Part II allows readers to practice critical thinking with a series of fictitious journal articles containing built-in flaws in method and interpretation. Clever and engaging, each article is accompanied by a commentary that points out the errors of procedure and logic that have been deliberately embedded in the article. This combination of instruction and practical application will promote active learning and critical thinking in students studying the behavioral sciences.

Posted Content
TL;DR: This paper analyzes several methods of approximate sampling based on discretizations of the Langevin diffusion, establishes guarantees on their error measured in the Wasserstein-2 distance, and provides an upper bound on the error of the first-order Langevin Monte Carlo (LMC) algorithm with optimized varying step-size.
Abstract: In this paper, we revisit the recently established theoretical guarantees for the convergence of the Langevin Monte Carlo algorithm for sampling from a smooth and (strongly) log-concave density. We improve, in terms of constants, the existing results when the accuracy of sampling is measured in the Wasserstein distance, and provide further insights on the relations between, on the one hand, the Langevin Monte Carlo for sampling and, on the other hand, the gradient descent for optimization. More importantly, we establish non-asymptotic guarantees for the accuracy of a version of the Langevin Monte Carlo algorithm that is based on inaccurate evaluations of the gradient. Finally, we propose a variable-step version of the Langevin Monte Carlo algorithm that has two advantages: first, its step-sizes are independent of the target accuracy and, second, its rate provides a logarithmic improvement over the constant-step Langevin Monte Carlo algorithm.