
Showing papers on "Sampling (statistics)" published in 2016


Journal ArticleDOI
TL;DR: It is concluded that the choice of the techniques (Convenience Sampling and Purposive Sampling) depends on the nature and type of the research.
Abstract: This article studied and compared two nonprobability sampling techniques, namely Convenience Sampling and Purposive Sampling. Convenience Sampling and Purposive Sampling are nonprobability sampling techniques that a researcher uses to choose a sample of subjects/units from a population. Although nonprobability sampling has many limitations due to the subjective nature of choosing the sample, and thus is not a good representative of the population, it is useful especially when randomization is impossible, such as when the population is very large. It can be useful when the researcher has limited resources, time and workforce. It can also be used when the research does not aim to generate results that will be used to create generalizations pertaining to the entire population. Therefore, there is a need to use nonprobability sampling techniques. The aim of this study is to compare the two nonrandom sampling techniques in order to determine whether one technique is better or more useful than the other. Different articles were reviewed to compare Convenience Sampling and Purposive Sampling, and it is concluded that the choice of technique depends on the nature and type of the research.

4,956 citations
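
To make the distinction concrete, here is a minimal Python sketch (not from the article) contrasting the two techniques on a toy population; the on-site availability flag and the experience criterion are invented purely for illustration.

```python
import random

random.seed(1)
# toy population of 500 potential respondents
population = [{"id": i,
               "on_site": i < 80,                      # easy for the researcher to reach
               "years_experience": random.randint(0, 20)}
              for i in range(500)]

# Convenience sampling: take whoever is easiest to reach, up to the target size.
convenience = [p for p in population if p["on_site"]][:50]

# Purposive sampling: apply a judgement-based inclusion criterion first,
# then select from the units that satisfy it.
eligible = [p for p in population if p["years_experience"] >= 10]
purposive = random.sample(eligible, k=min(50, len(eligible)))

print(len(convenience), "convenience units,", len(purposive), "purposive units")
```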


Proceedings Article
27 May 2016
TL;DR: The authors extend the space of probabilistic models using real-valued non-volume preserving transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space.
Abstract: Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.

1,221 citations
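
As a rough illustration of the mechanism behind real NVP (a sketch, not the paper's implementation), the snippet below shows one affine coupling layer in NumPy: because part of the input passes through unchanged, the Jacobian is triangular and the exact log-determinant is just the sum of the predicted log-scales, which is what makes exact log-likelihood and exact sampling tractable. The toy linear "networks" stand in for the deep nets used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling_forward(x, shift_fn, log_scale_fn, mask):
    """One affine coupling layer: y = x on the masked part, and
    y = x * exp(s(x_masked)) + t(x_masked) on the remaining part."""
    x_masked = x * mask
    s = log_scale_fn(x_masked) * (1 - mask)   # log-scales for the transformed part
    t = shift_fn(x_masked) * (1 - mask)       # shifts for the transformed part
    y = x_masked + (1 - mask) * (x * np.exp(s) + t)
    log_det_jacobian = s.sum(axis=-1)         # exact and cheap: Jacobian is triangular
    return y, log_det_jacobian

# toy setup: linear shift/scale functions stand in for deep networks
d = 4
W_s = rng.normal(size=(d, d)) * 0.1
W_t = rng.normal(size=(d, d)) * 0.1
mask = np.array([1.0, 1.0, 0.0, 0.0])         # first half is left unchanged
x = rng.normal(size=(3, d))                   # batch of 3 inputs
y, logdet = coupling_forward(x, lambda z: z @ W_t, lambda z: z @ W_s, mask)
print(y.shape, logdet)                        # (3, 4) and one log|det J| per sample
```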


Posted Content
TL;DR: This work extends the space of probabilistic models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space.
Abstract: Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.

908 citations


Journal ArticleDOI
TL;DR: As there are different types of sampling techniques/methods, the researcher needs to understand the differences in order to select the proper sampling method for the research.
Abstract: In order to answer the research questions, it is unlikely that the researcher will be able to collect data from all cases. Thus, there is a need to select a sample. This paper presents the steps to go through to conduct sampling. Furthermore, as there are different types of sampling techniques/methods, the researcher needs to understand the differences in order to select the proper sampling method for the research. In this regard, this paper also presents the different types of sampling techniques and methods.

685 citations
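
As a quick illustration of three of the probability sampling designs such a paper typically covers, here is a short Python sketch on a toy sampling frame; the strata and sample sizes are invented for the example.

```python
import random

random.seed(0)
# toy sampling frame of 300 units split across two strata
population = [{"id": i, "stratum": "urban" if i % 3 else "rural"} for i in range(300)]
n = 30

# Simple random sampling: every unit has the same chance of selection.
srs = random.sample(population, k=n)

# Systematic sampling: a random start, then every k-th unit in the frame.
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k][:n]

# Stratified sampling: draw within each stratum in proportion to its size.
stratified = []
for name in ("urban", "rural"):
    stratum = [u for u in population if u["stratum"] == name]
    share = round(n * len(stratum) / len(population))
    stratified.extend(random.sample(stratum, k=share))

print(len(srs), len(systematic), len(stratified))   # 30 30 30
```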


Journal ArticleDOI
TL;DR: Four new supervised methods to detect the number of clusters were developed, tested, and found to outperform the existing methods on both evenly and unevenly sampled data sets; a subsampling strategy aiming to reduce sampling unevenness between subpopulations is also presented and tested.
Abstract: Inferences of population structure and more precisely the identification of genetically homogeneous groups of individuals are essential to the fields of ecology, evolutionary biology and conservation biology. Such population structure inferences are routinely investigated via the program structure implementing a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. While the method is performing relatively well under various population models with even sampling between subpopulations, the robustness of the method to uneven sample size between subpopulations and/or hierarchical levels of population structure has not yet been tested despite being commonly encountered in empirical data sets. In this study, I used simulated and empirical microsatellite data sets to investigate the impact of uneven sample size between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that uneven sampling often leads to wrong inferences on hierarchical structure and downward-biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged together, while at the same time, individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study and were found to outperform the existing methods using both evenly and unevenly sampled data sets. Additionally, a subsampling strategy aiming to reduce sampling unevenness between subpopulations is presented and tested. These results altogether demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.

631 citations


Journal ArticleDOI
TL;DR: As demonstrated, metadynamics is not just a practical tool but can also be considered an important development in the theory of statistical mechanics.
Abstract: Atomistic simulations play a central role in many fields of science. However, their usefulness is often limited by the fact that many systems are characterized by several metastable states separated by high barriers, leading to kinetic bottlenecks. Transitions between metastable states are thus rare events that occur on significantly longer timescales than one can simulate in practice. Numerous enhanced sampling methods have been introduced to alleviate this timescale problem, including methods based on identifying a few crucial order parameters or collective variables and enhancing the sampling of these variables. Metadynamics is one such method that has proven successful in a great variety of fields. Here we review the conceptual and theoretical foundations of metadynamics. As demonstrated, metadynamics is not just a practical tool but can also be considered an important development in the theory of statistical mechanics.

496 citations
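
The following is a minimal, self-contained sketch of the core metadynamics idea on a 1-D double-well toy potential (the collective variable is simply the particle position; the Monte Carlo dynamics and all parameters are illustrative assumptions, not the review's own example): repulsive Gaussian hills are periodically deposited at the visited CV values, progressively filling the free-energy basins so the system can escape over high barriers.

```python
import numpy as np

rng = np.random.default_rng(1)

def potential(x):
    return (x**2 - 1.0)**2             # double well with minima at x = -1 and x = +1

w, sigma, stride = 0.01, 0.1, 100      # Gaussian hill height, width, deposition stride
kT, step_size, n_steps = 0.1, 0.05, 20000
centers = []                           # CV values where hills have been deposited

def bias(x):
    if not centers:
        return 0.0
    c = np.asarray(centers)
    return float(np.sum(w * np.exp(-(x - c)**2 / (2 * sigma**2))))

x = -1.0                               # start in the left basin
for i in range(n_steps):
    # Metropolis dynamics on the biased energy V(x) + V_bias(x)
    x_new = x + rng.normal(scale=step_size)
    dE = potential(x_new) + bias(x_new) - potential(x) - bias(x)
    if dE < 0 or rng.random() < np.exp(-dE / kT):
        x = x_new
    if i % stride == 0:
        centers.append(x)              # deposit a repulsive Gaussian at the current CV value

# The accumulated bias flattens the basins; -V_bias approximates the free energy along the CV.
print(f"{len(centers)} hills deposited, final position {x:.2f}")
```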


Posted Content
TL;DR: The Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps, is introduced.
Abstract: The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce T-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar.

342 citations


Journal ArticleDOI
TL;DR: The snowball sampling method achieved greater participation, with more Hispanics but also more individuals with disabilities, than a purposive-convenience sampling method; however, priorities for research on chronic pain from both stakeholder groups were similar.
Abstract: Effective community-partnered and patient-centered outcomes research needs to address community priorities. However, optimal sampling methods to engage stakeholders from hard-to-reach, vulnerable communities to generate research priorities have not been identified. In two similar rural, largely Hispanic communities, a community advisory board guided recruitment of stakeholders affected by chronic pain using a different method in each community: 1) snowball sampling, a chain-referral method, or 2) purposive sampling to recruit diverse stakeholders. In both communities, three groups of stakeholders attended a series of three facilitated meetings to orient, brainstorm, and prioritize ideas (9 meetings/community). Using mixed methods analysis, we compared stakeholder recruitment and retention as well as priorities from both communities’ stakeholders on mean ratings of their ideas based on importance and feasibility for implementation in their community. Of 65 eligible stakeholders in one community recruited by snowball sampling, 55 (85 %) consented, 52 (95 %) attended the first meeting, and 36 (65 %) attended all 3 meetings. In the second community, the purposive sampling method was supplemented by convenience sampling to increase recruitment. Of 69 stakeholders recruited by this combined strategy, 62 (90 %) consented, 36 (58 %) attended the first meeting, and 26 (42 %) attended all 3 meetings. Snowball sampling recruited more Hispanics and disabled persons (all P < 0.05). Despite differing recruitment strategies, stakeholders from the two communities identified largely similar ideas for research, focusing on non-pharmacologic interventions for management of chronic pain. Ratings on importance and feasibility for community implementation differed only on the importance of massage services (P = 0.045), which was higher for the purposive/convenience sampling group, and for city improvements/transportation services (P = 0.004), which was higher for the snowball sampling group. In each of the two similar hard-to-reach communities, a community advisory board partnered with researchers to implement a different sampling method to recruit stakeholders. The snowball sampling method achieved greater participation, with more Hispanics but also more individuals with disabilities, than a purposive-convenience sampling method. However, priorities for research on chronic pain from both stakeholder groups were similar. Although utilizing a snowball sampling method appears to be superior, further research is needed on implementation costs and resources.

315 citations


Journal ArticleDOI
01 Mar 2016-Geoderma
TL;DR: This study provides a comprehensive comparison of machine-learning techniques for classification purposes in soil science and may assist in model selection for digital soil mapping and geomorphic modeling studies in the future.

314 citations


Proceedings Article
01 Jan 2016
TL;DR: In this article, the authors introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps.
Abstract: The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network’s own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce T-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar.

306 citations


Journal ArticleDOI
01 Jan 2016-Geoderma
TL;DR: In this article, the authors used the Land Use and Cover Area Frame Statistical Survey (LUCAS) dataset to map soil properties at a continental scale over the geographical extent of Europe.

Journal ArticleDOI
TL;DR: A more general sampling scheme is presented, under which both the aggregation approach and the alternative approach of sampling a graph signal by observing the value of the signal at a subset of nodes can be viewed as particular cases.
Abstract: A new scheme to sample signals defined on the nodes of a graph is proposed. The underlying assumption is that such signals admit a sparse representation in a frequency domain related to the structure of the graph, which is captured by the so-called graph-shift operator. Instead of using the value of the signal observed at a subset of nodes to recover the signal in the entire graph, the sampling scheme proposed here uses as input observations taken at a single node. The observations correspond to sequential applications of the graph-shift operator, which are linear combinations of the information gathered by the neighbors of the node. When the graph corresponds to a directed cycle (which is the support of time-varying signals), our method is equivalent to the classical sampling in the time domain. When the graph is more general, we show that the Vandermonde structure of the sampling matrix, critical when sampling time-varying signals, is preserved. Sampling and interpolation are analyzed first in the absence of noise, and then noise is considered. We then study the recovery of the sampled signal when the specific set of frequencies that is active is not known. Moreover, we present a more general sampling scheme under which both our aggregation approach and the alternative approach of sampling a graph signal by observing the value of the signal at a subset of nodes can be viewed as particular cases. Numerical experiments illustrating the results in both synthetic and real-world graphs close the paper.
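
A tiny NumPy sketch of the aggregation idea described above (the toy path graph and the 3-sparse frequency content are assumptions for illustration, not the paper's experiments): a single observing node records successive applications of the graph-shift operator, and the resulting Vandermonde-structured linear system recovers the sparse frequency coefficients, provided the relevant eigenvalues are distinct and the node "sees" those frequencies.

```python
import numpy as np

# undirected path graph on N nodes; its adjacency serves as the graph-shift operator
N, K = 6, 3                                     # graph size and frequency sparsity
S = np.zeros((N, N))
for i in range(N - 1):
    S[i, i + 1] = S[i + 1, i] = 1.0

lam, V = np.linalg.eigh(S)                      # graph Fourier basis (distinct eigenvalues here)

x_hat = np.zeros(N)
x_hat[:K] = np.array([1.0, -0.5, 2.0])          # K-sparse frequency representation
x = V @ x_hat                                   # bandlimited graph signal

node = 0                                        # the single observing node
# k-th observation = value at `node` after k applications of the shift (local aggregations)
obs = np.array([np.linalg.matrix_power(S, k)[node] @ x for k in range(K)])

# (S^k x)_node = sum_j lam_j^k * V[node, j] * x_hat[j], so recovery is a small linear solve
Phi = np.array([[lam[j]**k * V[node, j] for j in range(K)] for k in range(K)])
x_hat_rec = np.linalg.solve(Phi, obs)
print(np.allclose(x_hat_rec, x_hat[:K]))        # True: the sparse coefficients are recovered
```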

Journal ArticleDOI
TL;DR: It is shown that the explicit modeling of fossilization and sampling processes can improve divergence time estimates, but only if all important model aspects, including sampling biases, are adequately addressed.
Abstract: Bayesian total-evidence dating involves the simultaneous analysis of morphological data from the fossil record and morphological and sequence data from recent organisms, and it accommodates the uncertainty in the placement of fossils while dating the phylogenetic tree. Due to the flexibility of the Bayesian approach, total-evidence dating can also incorporate additional sources of information. Here, we take advantage of this and expand the analysis to include information about fossilization and sampling processes. Our work is based on the recently described fossilized birth-death (FBD) process, which has been used to model speciation, extinction, and fossilization rates that can vary over time in a piecewise manner. So far, sampling of extant and fossil taxa has been assumed to be either complete or uniformly at random, an assumption which is only valid for a minority of data sets. We therefore extend the FBD process to accommodate diversified sampling of extant taxa, which is standard practice in studies of higher-level taxa. We verify the implementation using simulations and apply it to the early radiation of Hymenoptera (wasps, ants, and bees). Previous total-evidence dating analyses of this data set were based on a simple uniform tree prior and dated the initial radiation of extant Hymenoptera to the late Carboniferous (309 Ma). The analyses using the FBD prior under diversified sampling, however, date the radiation to the Triassic and Permian (252 Ma), slightly older than the age of the oldest hymenopteran fossils. By exploring a variety of FBD model assumptions, we show that it is mainly the accommodation of diversified sampling that causes the push toward more recent divergence times. Accounting for diversified sampling thus has the potential to close the long-discussed gap between rocks and clocks. We conclude that the explicit modeling of fossilization and sampling processes can improve divergence time estimates, but only if all important model aspects, including sampling biases, are adequately addressed.

Journal ArticleDOI
TL;DR: The Latinized partially stratified sampling method is applied to identify the best sample strategy for uncertainty quantification on a plate buckling problem.

Journal ArticleDOI
TL;DR: The article provides a description and comparison of two non-random sampling methods: snowball (chain-referral) sampling and sequential sampling.
Abstract: The article provides a description and comparison of two non-random sampling methods: snowball (or chain-referral) sampling and sequential sampling. Snowball sampling has been widely used in qualitative sociological research, especially in the study of deviant behavior, and is used where the population is hard to reach. The article also describes different forms of the snowball sampling method. In sequential sampling, by contrast, samples are taken at given time intervals, and the research and sampling method can be adjusted along the way to focus the analysis and reach a satisfactory decision.
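
A small Python sketch of the chain-referral mechanics behind snowball sampling (the contact network, seed participants and wave counts are invented for illustration): starting from a few seeds, each wave adds contacts referred by the previous wave, which is why the method can reach hidden populations but yields a non-random sample.

```python
import random

random.seed(2)
people = range(200)
# toy contact network: everyone knows four other people
contacts = {p: random.sample([q for q in people if q != p], k=4) for p in people}

def snowball(seeds, waves, referrals_per_person=2):
    sampled, frontier = set(seeds), list(seeds)
    for _ in range(waves):
        next_frontier = []
        for person in frontier:
            # each participant refers a few contacts not yet in the sample
            new = [c for c in contacts[person] if c not in sampled][:referrals_per_person]
            sampled.update(new)
            next_frontier.extend(new)
        frontier = next_frontier
    return sampled

sample = snowball(seeds=[0, 17, 33], waves=3)
print("sample size after 3 waves:", len(sample))
```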

Journal ArticleDOI
TL;DR: Why and how purposeful sampling was used in a qualitative evidence synthesis about ‘sexual adjustment to a cancer trajectory’ is discussed, including the possible inclusion of new perspectives in the line-of-argument, which could make the results more conceptually aligned with the synthesis purpose.
Abstract: An increasing number of qualitative evidence synthesis papers are found in health care literature. Many of these syntheses use a strictly exhaustive search strategy to collect articles, mirroring the standard template developed by major review organizations such as the Cochrane and Campbell Collaboration. The hegemonic idea behind it is that non-comprehensive samples in systematic reviews may introduce selection bias. However, exhaustive sampling in a qualitative evidence synthesis has been questioned, and a more purposeful way of sampling papers has been proposed as an alternative, although there is a lack of transparency on how these purposeful sampling strategies might be applied to a qualitative evidence synthesis. We discuss in our paper why and how we used purposeful sampling in a qualitative evidence synthesis about ‘sexual adjustment to a cancer trajectory’, by giving a worked example. We chose a mixed purposeful sampling approach, combining three different strategies that we considered the most consistent with our research purpose: intensity sampling, maximum variation sampling and confirming/disconfirming case sampling. The concept of purposeful sampling on the meta-level could not readily be borrowed from the logic applied in basic research projects. It also demands a considerable amount of flexibility, and is labour-intensive, which goes against the argument of many authors that using purposeful sampling provides a pragmatic solution or a short cut for researchers, compared with exhaustive sampling. Opportunities of purposeful sampling included the possible inclusion of new perspectives in the line-of-argument and the enhancement of the theoretical diversity of the papers included, which could make the results more conceptually aligned with the synthesis purpose. This paper helps researchers to make decisions related to purposeful sampling in a more systematic and transparent way. Future research could confirm or disconfirm the hypothesis of conceptual enhancement by comparing the findings of a purposefully sampled qualitative evidence synthesis with those drawing on an exhaustive sample of the literature.

Journal ArticleDOI
01 Jul 2016
TL;DR: An adaptive path planning algorithm is proposed for multiple AUVs to estimate the scalar field over a region of interest, and the sampling positions of the AUVs are determined to improve the quality of future samples by maximizing the mutual information between the scalar field model and observations.
Abstract: Autonomous underwater vehicles (AUVs) have been widely employed in ocean survey, monitoring, and search and rescue tasks for both civil and military applications. It is beneficial to use multiple AUVs that perform environmental sampling and sensing tasks for the purposes of efficiency and cost effectiveness. In this paper, an adaptive path planning algorithm is proposed for multiple AUVs to estimate the scalar field over a region of interest. In the proposed method, a measurable model composed of multiple basis functions is defined to represent the scalar field. A selective basis function Kalman filter is developed to achieve model estimation through the information collected by multiple AUVs. In addition, a path planning method, the multidimensional rapidly exploring random trees star algorithm, which uses mutual information, is proposed for the multi-AUV system. Employing the path planning algorithm, the sampling positions of the AUVs are determined to improve the quality of future samples by maximizing the mutual information between the scalar field model and observations. Extensive simulation results are provided to demonstrate the effectiveness of the proposed algorithm. Additionally, an indoor experiment using four robotic fishes is carried out to validate the algorithms presented.

Journal ArticleDOI
TL;DR: The knowledge on species richness, species composition and endemism of Brazilian biodiversity is strongly spatially biased, and despite differences in sampling effort for each taxonomic group, roadside bias affected them equally.
Abstract: Aim The knowledge of biodiversity facets such as species composition, distribution and ecological niche is fundamental for the construction of biogeographic hypotheses and conservation strategies. However, the knowledge on these facets is affected by major shortfalls, which are even more pronounced in the tropics. This study aims to evaluate the effect of sampling bias and variation in collection effort on the Linnean, Wallacean and Hutchinsonian shortfalls and on diversity measures such as species richness, endemism and beta-diversity. Location Brazil. Methods We have built a database with over 1.5 million records of arthropods, vertebrates and angiosperms of Brazil, based on specimens deposited in scientific collections and on the taxonomic literature. We used null models to test the collection bias regarding the proximity to access routes. We also tested the influence of sampling effort on diversity measures by regression models. To investigate the Wallacean shortfall, we modelled the geographic distribution of over 4000 species and compared their observed distribution with models. To quantify the Hutchinsonian shortfall, we used environmental Euclidean distance of the records to identify regions with poorly sampled environmental conditions. To estimate the Linnean shortfall, we measured the similarity of species composition between regions close to and far from access routes. Results We demonstrated that despite the differences in sampling effort, the strong collection bias affects all taxonomic groups equally, generating a pattern of spatially biased sampling effort. This collection pattern contributes greatly to the biodiversity knowledge shortfalls, which directly affects the knowledge on the distribution patterns of diversity. Main conclusions The knowledge on species richness, species composition and endemism of Brazilian biodiversity is strongly biased spatially. Despite differences in sampling effort for each taxonomic group, roadside bias affected them equally. Species composition similarity decreased with the distance from access routes, suggesting collection surveys at sites far from roads could increase the probability of sampling new geographic records or new species.

Journal ArticleDOI
TL;DR: The basic elements related to the selection of participants for health research are discussed, and sample representativeness, the sample frame, types of sampling, as well as the impact that non-respondents may have on the results of a study are described.
Abstract: Background: In this paper, the basic elements related to the selection of participants for health research are discussed. Sample representativeness, the sample frame, types of sampling, as well as the impact that non-respondents may have on the results of a study are described. The whole discussion is supported by practical examples to facilitate the reader's understanding. Objective: To introduce readers to issues related to sampling.

Journal ArticleDOI
TL;DR: This paper presents a simple, easily-implemented algorithm for dynamically adapting the temperature configuration of a sampler while sampling, and dynamically adjusts the temperature spacing to achieve a uniform rate of exchanges between chains at neighbouring temperatures.
Abstract: Modern problems in astronomical Bayesian inference require efficient methods for sampling from complex, high-dimensional, often multi-modal probability distributions. Most popular methods, such as Markov chain Monte Carlo sampling, perform poorly on strongly multi-modal probability distributions, rarely jumping between modes or settling on just one mode without finding others. Parallel tempering addresses this problem by sampling simultaneously with separate Markov chains from tempered versions of the target distribution with reduced contrast levels. Gaps between modes can be traversed at higher temperatures, while individual modes can be efficiently explored at lower temperatures. In this paper, we investigate how one might choose the ladder of temperatures to achieve more efficient sampling, as measured by the autocorrelation time of the sampler. In particular, we present a simple, easily-implemented algorithm for dynamically adapting the temperature configuration of a sampler while sampling. This algorithm dynamically adjusts the temperature spacing to achieve a uniform rate of exchanges between chains at neighbouring temperatures. We compare the algorithm to conventional geometric temperature configurations on a number of test distributions and on an astrophysical inference problem, reporting efficiency gains by a factor of 1.2-2.5 over a well-chosen geometric temperature configuration and by a factor of 1.5-5 over a poorly chosen configuration. On all of these problems a sampler using the dynamical adaptations to achieve uniform acceptance ratios between neighbouring chains outperforms one that does not.
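
Below is a minimal parallel-tempering sketch on a bimodal 1-D target, with a crude version of the adaptive idea: nudge the temperature spacing toward a uniform swap-acceptance rate between neighbouring chains. The target, ladder and the adaptation rule are simplified stand-ins for illustration, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    # well-separated bimodal target: mixture of two unit-variance Gaussians
    return np.logaddexp(-0.5 * (x - 5.0)**2, -0.5 * (x + 5.0)**2)

n_chains, n_steps = 6, 20000
log_T = np.linspace(0.0, np.log(50.0), n_chains)   # geometric initial temperature ladder
x = rng.normal(size=n_chains)
swap_acc = np.zeros(n_chains - 1)
swap_try = np.zeros(n_chains - 1)

for step in range(1, n_steps + 1):
    T = np.exp(log_T)
    # independent Metropolis updates of each chain on its tempered target pi(x)^(1/T)
    prop = x + rng.normal(scale=np.sqrt(T))
    accept = np.log(rng.random(n_chains)) < (log_target(prop) - log_target(x)) / T
    x = np.where(accept, prop, x)
    # propose a swap between one random pair of neighbouring temperatures
    i = rng.integers(n_chains - 1)
    log_alpha = (1.0 / T[i] - 1.0 / T[i + 1]) * (log_target(x[i + 1]) - log_target(x[i]))
    swap_try[i] += 1
    if np.log(rng.random()) < log_alpha:
        x[i], x[i + 1] = x[i + 1], x[i]
        swap_acc[i] += 1
    # crude adaptation: widen gaps that swap too often, shrink gaps that swap rarely,
    # pushing the ladder toward uniform swap rates between neighbours
    if step % 2000 == 0:
        rates = swap_acc / np.maximum(swap_try, 1)
        gaps = np.diff(log_T) * (1.0 + 0.2 * (rates - rates.mean()))
        log_T = np.concatenate(([0.0], np.cumsum(np.maximum(gaps, 1e-3))))

print("swap acceptance between neighbours:", np.round(swap_acc / np.maximum(swap_try, 1), 2))
```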

Posted Content
23 Mar 2016
TL;DR: The Manual for Sampling Techniques used in Social Sciences is an effort to describe various types of sampling methodologies that are used in researches of social sciences in an easy and understandable way.
Abstract: The Manual for Sampling Techniques used in Social Sciences is an effort to describe various types of sampling methodologies that are used in researches of social sciences in an easy and understandable way. Characteristics, benefits, crucial issues/ draw backs, and examples of each sampling type are provided separately. The manual begins by describing What is Sampling and its Purposes then it moves forward discussing the two broader types: probability sampling and non-probability sampling. Later in the text various types of each of the broader category are discussed. Reading the manual from beginning to the end you will find some points are repeated under various headings. This is done to make each topic exclusively a complete whole so that there might not remain any requirement to read other topics for understanding the one. Also, similar examples with a little modification are used in the description of different sampling techniques. The purpose behind doing this is to clarify the minor distinction in the applicability and usage of different types of sampling techniques. I have also included a section Comparison of some Resembling Sampling Techniques, the purpose of which is to eliminate confusions among the techniques that look somewhat similar to each other. Both types of characteristics are described: that make the techniques resembling, and that create the difference between them. In the section Which Sampling Technique to use in your Research, it has been tried to describe what techniques are most suitable for the various sorts of researches. So one may easily decide which particular technique is applicable and most suitable of his or her research project. There are three appendices in the manual which are giving a concise view of all the techniques discussed in the text. Appendix I is giving a comparison of two broader categories of sampling methods: probability, and non probability. Appendix II is portraying a brief summary of various types of probability sampling technique. Appendix III is presenting a brief summary of various types of non-probability sampling technique. A glossary is also provided in the manual. The words that are used as synonyms to one another are mentioned. Moreover, definitions of the terms that are repetitively used throughout the manual are provided. The words defined in the glossary are written with italic letters in the text.

Journal Article
TL;DR: In this paper, the authors evaluate the performance of sampling and projection algorithms for the low-rank approximation of symmetric positive semi-definite matrices such as Laplacian and kernel matrices.
Abstract: We reconsider randomized algorithms for the low-rank approximation of symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods; they characterize the effects of common data preprocessing steps on the performance of these algorithms; and they point to important differences between uniform sampling and nonuniform sampling methods based on leverage scores. In addition, our empirical results illustrate that existing theory is so weak that it does not provide even a qualitative guide to practice. Thus, we complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds (e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error), and they point to future directions to make these algorithms useful in even larger-scale machine learning applications.
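
A short NumPy sketch (not the paper's code) of one comparison in this spirit: a Nystrom-style column-sampling approximation of a toy SPSD kernel matrix, with columns drawn either uniformly or according to leverage scores of the top rank-k eigenspace. The data, kernel, rank and sample sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy SPSD kernel matrix (RBF kernel on random 5-d points)
X = rng.normal(size=(300, 5))
sq = ((X[:, None, :] - X[None, :, :])**2).sum(axis=-1)
K = np.exp(-sq)

def nystrom(K, cols):
    """Column-sampling (Nystrom-style) low-rank approximation K ~ C W^+ C^T."""
    C = K[:, cols]
    W = K[np.ix_(cols, cols)]
    return C @ np.linalg.pinv(W) @ C.T

k, c = 10, 30                                   # target rank and number of sampled columns
_, U = np.linalg.eigh(K)
lev = (U[:, -k:]**2).sum(axis=1)                # leverage scores w.r.t. the top-k eigenspace
p = lev / lev.sum()

cols_uniform = rng.choice(K.shape[0], size=c, replace=False)
cols_leverage = rng.choice(K.shape[0], size=c, replace=False, p=p)

for name, cols in [("uniform", cols_uniform), ("leverage", cols_leverage)]:
    err = np.linalg.norm(K - nystrom(K, cols)) / np.linalg.norm(K)
    print(f"{name:9s} relative Frobenius error: {err:.4f}")
```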

Proceedings Article
Daniel Russo1
06 Jun 2016
TL;DR: In this paper, the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs is studied. But the authors focus on the problem of selecting the best design after a small number of measurements.
Abstract: This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality, with the goal of confidently identifying the best design after a small number of measurements. I propose three simple Bayesian algorithms for adaptively allocating measurement effort. One is Top-Two Probability sampling, which computes the two designs with the highest posterior probability of being optimal and then randomizes to select among these two. Another is a variant of top-two sampling which considers not only the probability that a design is optimal, but the expected amount by which its quality exceeds that of other designs. The final algorithm is a modified version of Thompson sampling that is tailored for identifying the best design. I prove that these simple algorithms satisfy a strong optimality property. In a frequentist setting where the true quality of the designs is fixed, one hopes the posterior definitively identifies the optimal design, in the sense that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. I show that under the proposed algorithms this convergence occurs at an exponential rate, and the corresponding exponent is the best possible among all allocation rules.
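
A compact sketch of the top-two idea in a Gaussian setting (the arm means, prior and the tuning parameter beta are illustrative assumptions, following the general recipe rather than the paper's exact procedure): sample from the posterior to find a leading arm, resample to find a challenger, and measure one of the two at random.

```python
import numpy as np

rng = np.random.default_rng(4)

true_means = np.array([0.2, 0.5, 0.45])   # unknown arm qualities (unit-variance Gaussian noise)
beta = 0.5                                # chance of measuring the leader rather than the challenger
n = np.zeros(3)                           # number of measurements per design
s = np.zeros(3)                           # sum of observed rewards per design

for t in range(2000):
    # Gaussian posterior for each arm under a standard-normal prior
    post_mean = s / (n + 1.0)
    post_std = 1.0 / np.sqrt(n + 1.0)
    # leader: best arm under one joint posterior sample
    leader = int(np.argmax(rng.normal(post_mean, post_std)))
    # challenger: resample the posterior until a different arm comes out on top
    challenger = leader
    while challenger == leader:
        challenger = int(np.argmax(rng.normal(post_mean, post_std)))
    arm = leader if rng.random() < beta else challenger
    reward = rng.normal(true_means[arm], 1.0)
    n[arm] += 1
    s[arm] += reward

print("measurements per design:", n.astype(int),
      "-> estimated best:", int(np.argmax(s / np.maximum(n, 1))))
```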

Proceedings Article
12 Feb 2016
TL;DR: A new, tight lower bound on the sample complexity of best-arm identification in one-parameter bandit problems is proved, and the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal, is proposed.
Abstract: We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis.

Journal ArticleDOI
TL;DR: Citizen science datasets, which rely on untrained amateurs, are more heavily prone to spatial biases from infrastructure and human population density, and objectives and protocols of mass-participating projects should be designed with this in mind.
Abstract: Aim To understand how the integration of contextual spatial data on land cover and human infrastructure can help reduce spatial bias in sampling effort, and improve the utilization of citizen science-based species recording schemes. By comparing four different citizen science projects, we explore how the sampling design's complexity affects the role of these spatial biases. Location Denmark, Europe. Methods We used a point process model to estimate the effect of land cover and human infrastructure on the intensity of observations from four different citizen science species recording schemes. We then use these results to predict areas of under- and oversampling as well as relative biodiversity ‘hotspots’ and ‘deserts’, accounting for common spatial biases introduced in unstructured sampling designs. Results We demonstrate that the explanatory power of spatial biases such as infrastructure and human population density increased as the complexity of the sampling schemes decreased. Despite a low absolute sampling effort in agricultural landscapes, these areas still appeared oversampled compared to the observed species richness. Conversely, forests and grassland appeared undersampled despite higher absolute sampling efforts. We also present a novel and effective analytical approach to address spatial biases in unstructured sampling schemes and a new way to address such biases, when more structured sampling is not an option. Main conclusions We show that citizen science datasets, which rely on untrained amateurs, are more heavily prone to spatial biases from infrastructure and human population density. Objectives and protocols of mass-participating projects should thus be designed with this in mind. Our results suggest that, where contextual data is available, modelling the intensity of individual observation can help understand and quantify how spatial biases affect the observed biological patterns.

Journal ArticleDOI
TL;DR: This work aims to review enhanced sampling methods that do not require predefined system-dependent CVs for biomolecular simulations and as such do not suffer from the hidden energy barrier problem as encountered in the CV-biasing methods.
Abstract: Free energy calculations are central to understanding the structure, dynamics and function of biomolecules. Yet insufficient sampling of biomolecular configurations is often regarded as one of the main sources of error. Many enhanced sampling techniques have been developed to address this issue. Notably, enhanced sampling methods based on biasing collective variables (CVs), including the widely used umbrella sampling, adaptive biasing force and metadynamics, have been discussed in a recent excellent review (Abrams and Bussi, Entropy, 2014). Here, we aim to review enhanced sampling methods that do not require predefined system-dependent CVs for biomolecular simulations and as such do not suffer from the hidden energy barrier problem as encountered in the CV-biasing methods. These methods include, but are not limited to, replica exchange/parallel tempering, self-guided molecular/Langevin dynamics, essential energy space random walk and accelerated molecular dynamics. While it is overwhelming to describe all details of each method, we provide a summary of the methods along with the applications and offer our perspectives. We conclude with challenges and prospects of the unconstrained enhanced sampling methods for accurate biomolecular free energy calculations.

Posted Content
TL;DR: Inspired by generative adversarial networks, this work proposes to train a deep directed generative model (not a Markov chain) so that its sampling distribution approximately matches the energy function that is being trained.
Abstract: Training energy-based probabilistic models is confronted with apparently intractable sums, whose Monte Carlo estimation requires sampling from the estimated probability distribution in the inner loop of training. This can be approximately achieved by Markov chain Monte Carlo methods, but may still face a formidable obstacle that is the difficulty of mixing between modes with sharp concentrations of probability. Whereas an MCMC process is usually derived from a given energy function based on mathematical considerations and requires an arbitrarily long time to obtain good and varied samples, we propose to train a deep directed generative model (not a Markov chain) so that its sampling distribution approximately matches the energy function that is being trained. Inspired by generative adversarial networks, the proposed framework involves training of two models that represent dual views of the estimated probability distribution: the energy function (mapping an input configuration to a scalar energy value) and the generator (mapping a noise vector to a generated configuration), both represented by deep neural networks.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method for quantifying methane emissions from lakes using distributed measurement stations and long-term sampling campaigns, but the method is not suitable for large-scale monitoring.
Abstract: Methane emissions from lakes are widely thought to be highly irregular and difficult to quantify with anything other than numerous distributed measurement stations and long-term sampling campaigns. ...

Journal ArticleDOI
TL;DR: A virtual special issue is introduced that reviews the development of analytical approaches to the determination of phosphorus species in natural waters, focusing on sampling and sample treatment, analytical methods and quality assurance of the data.

Journal ArticleDOI
TL;DR: The mechanisms of popular bioaerosol sampling devices such as impingers, cyclones, impactors, and filters are described, explaining both their strengths and weaknesses, and the consequences for microbial bioefficiency.