
Showing papers on "Sampling (statistics)" published in 2003


Journal ArticleDOI
TL;DR: The following techniques for uncertainty and sensitivity analysis are briefly summarized: Monte Carlo analysis, differential analysis, response surface methodology, Fourier amplitude sensitivity test, Sobol' variance decomposition, and fast probability integration.
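
As a hedged illustration of the first technique in this list, the sketch below propagates input uncertainty through a made-up model by simple Monte Carlo and uses Spearman rank correlations as a crude sensitivity measure; the model, input distributions, and sample size are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical model y = f(x1, x2, x3); a stand-in for any simulation code.
def model(x):
    return x[:, 0] ** 2 + 2.0 * x[:, 1] + 0.1 * x[:, 2]

# Propagate uncertainty in the inputs (assumed independent and uniform here).
n = 10_000
x = rng.uniform(low=[0.0, 0.0, 0.0], high=[1.0, 2.0, 5.0], size=(n, 3))
y = model(x)

# Uncertainty analysis: the distribution of the output.
print("mean", y.mean(), "std", y.std(), "95% interval", np.percentile(y, [2.5, 97.5]))

# Crude sensitivity analysis: Spearman rank correlation of each input with the output.
for j in range(3):
    rho, _ = spearmanr(x[:, j], y)
    print(f"x{j + 1}: rank correlation {rho:+.2f}")
```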

1,780 citations


Journal ArticleDOI
TL;DR: The paper presents the necessity and usefulness of multivariate statistical assessment of large and complex databases for obtaining better information about surface water quality, for the design of sampling and analytical protocols, and for effective pollution control and management of surface waters.

1,136 citations


Journal ArticleDOI
TL;DR: In this paper, a range of monitoring techniques are used to measure pollutant concentrations in urban street canyons, such as continuous monitoring, passive and active pre-concentration sampling, and grab sampling.

1,003 citations


Book ChapterDOI
01 Jan 2003
TL;DR: In this article, Monte Carlo sampling methods for solving large scale stochastic programming problems are discussed, where a random sample is generated outside of an optimization procedure, and then the constructed, so-called sample average approximation (SAA), problem is solved by an appropriate deterministic algorithm.
Abstract: In this chapter we discuss Monte Carlo sampling methods for solving large scale stochastic programming problems. We concentrate on the “exterior” approach where a random sample is generated outside of an optimization procedure, and then the constructed, so-called sample average approximation (SAA), problem is solved by an appropriate deterministic algorithm. We study statistical properties of the obtained SAA estimators. The developed statistical inference is incorporated into validation analysis and error estimation. We describe some variance reduction techniques which may enhance convergence of sampling based estimates. We also discuss difficulties in extending this methodology to multistage stochastic programming. Finally, we briefly discuss the SAA method applied to stochastic generalized equations and variational inequalities.
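
A minimal sketch of the exterior/SAA idea on a toy newsvendor problem, assuming a lognormal demand and cost parameters invented for illustration; the chapter's setting covers far more general stochastic programs.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Toy newsvendor: choose order quantity q to minimize the expected cost
# c*q - r*E[min(q, D)] with random demand D.
c, r = 1.0, 2.0
demand_sample = rng.lognormal(mean=3.0, sigma=0.5, size=5_000)   # "exterior" sample

def saa_objective(q):
    # Sample average approximation of the expected cost at order quantity q.
    return c * q - r * np.mean(np.minimum(q, demand_sample))

res = minimize_scalar(saa_objective, bounds=(0.0, 200.0), method="bounded")
print("SAA solution q* ~", round(res.x, 2), " SAA optimal value ~", round(res.fun, 2))
```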

990 citations


Journal ArticleDOI
01 Jan 2003
TL;DR: In this paper, a Markov chain is constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal "slice" defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant.
Abstract: Markov chain sampling methods that adapt to characteristics of the distribution being sampled can be constructed using the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal "slice" defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant. Such "slice sampling" methods are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn. This approach is often easier to implement than Gibbs sampling and more efficient than simple Metropolis updates, due to the ability of slice sampling to adaptively choose the magnitude of changes made. It is therefore attractive for routine and automated use. Slice sampling methods that update all variables simultaneously are also possible. These methods can adaptively choose the magnitudes of changes made to each variable, based on the local properties of the density function. More ambitiously, such methods could potentially adapt to the dependencies between variables by constructing local quadratic approximations. Another approach is to improve sampling efficiency by suppressing random walks. This can be done for univariate slice sampling by "overrelaxation," and for multivariate slice sampling by "reflection" from the edges of the slice.
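
A minimal univariate slice sampler with stepping-out and shrinkage, roughly in the spirit of the procedure described above; the step width, target density, and starting point are arbitrary choices for illustration.

```python
import numpy as np

def slice_sample(logf, x0, n_samples, w=1.0, rng=None):
    """Univariate slice sampling with stepping-out and shrinkage (sketch)."""
    rng = np.random.default_rng(rng)
    samples, x = [], x0
    for _ in range(n_samples):
        # Draw the auxiliary "height" that defines the horizontal slice.
        logy = logf(x) + np.log(rng.uniform())
        # Step out to find an interval containing the slice.
        left = x - w * rng.uniform()
        right = left + w
        while logf(left) > logy:
            left -= w
        while logf(right) > logy:
            right += w
        # Sample uniformly from the interval, shrinking it on rejection.
        while True:
            x_new = rng.uniform(left, right)
            if logf(x_new) > logy:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)

# Example: sample a standard normal via its unnormalized log-density.
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
print(draws.mean(), draws.std())
```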

968 citations


Book
24 Feb 2003
TL;DR: This book discusses the evolution of Survey Process Quality and its implications for Questionnaire Design, as well as practical Survey Design for Minimizing Total Survey Error.
Abstract: Preface. Chapter 1. The Evolution of Survey Process Quality. 1.1 The Concept of a Survey. 1.2 Types of Surveys. 1.3 Brief History of Survey Methodology. 1.4 The Quality Revolution. 1.5 Definitions of Quality and Quality in Statistical Organizations. 1.6 Measuring Quality. 1.7 Improving Quality. 1.8 Quality in a Nutshell. Chapter 2. The Survey Process and Data Quality. 2.1 Overview of the Survey Process. 2.2 Data Quality and Total Survey Error. 2.3 Decomposing Nonsampling Error into Its Component Parts. 2.4 Gauging the Magnitude of Total Survey Error. 2.5 Mean Squared Error. 2.6 An Illustration of the Concepts. Chapter 3. Coverage and Nonresponse Error. 3.1 Coverage Error. 3.2 Measures of Coverage Bias. 3.3 Reducing Coverage Bias. 3.4 Unit Nonresponse Error. 3.5 Calculating Response Rates. 3.6 Reducing Nonresponse Bias. Chapter 4. The Measurement Process and Its Implications for Questionnaire Design. 4.1 Components of Measurement Error. 4.2 Errors Arising from the Questionnaire Design. 4.3 Understanding the Response Process. Chapter 5. Errors Due to Interviewers and Interviewing. 5.1 Role of the Interviewer. 5.2 Interviewer Variability. 5.3 Design Factors that Influence Interviewer Effects. 5.4 Evaluation of Interviewer Performance. Chapter 6. Data Collection Modes and Associated Errors. 6.1 Modes of Data Collection. 6.2 Decision Regarding Mode. 6.3 Some Examples of Mode Effects. Chapter 7. Data Processing: Errors and Their Control. 7.1 Overview of Data Processing Steps. 7.2 Nature of Data Processing Error. 7.3 Data Capture Errors. 7.4 Post-Data Capture Editing. 7.5 Coding. 7.6 File Preparation. 7.7 Applications of Continuous Quality Improvement: The Case of Coding. 7.8 Integration Activities. Chapter 8. Overview of Survey Error Evaluation Methods. 8.1 Purposes of Survey Error Evaluation. 8.2 Evaluation Methods for Designing and Pretesting Surveys. 8.3 Methods for Monitoring and Controlling Data Quality. 8.4 Postsurvey Evaluations. 8.5 Summary of Evaluation Methods. Chapter 9. Sampling Error. 9.1 Brief History of Sampling. 9.2 Nonrandom Sampling Methods. 9.3 Simple Random Sampling. 9.4 Statistical Inference in the Presence of Nonsampling Errors. 9.5 Other Methods of Random Sampling. 9.6 Concluding Remarks. Chapter 10. Practical Survey Design for Minimizing Total Survey Error. 10.1 Balance Between Cost, Survey Error, and Other Quality Features. 10.2 Planning a Survey for Optimal Quality. 10.3 Documenting Survey Quality. 10.4 Organizational Issues Related to Survey Quality. References. Index.

795 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that more than 50% of the computer effort can be saved by using Latin hypercubes instead of simple Monte Carlo in importance sampling, however, the exact savings are dependent on details in the use of Latin Hypercubes and on the shape of the failure surfaces of the problems.

586 citations


Journal ArticleDOI
TL;DR: A statistical algorithm to sample rigorously and exactly from the Boltzmann ensemble of secondary structures is presented, showing that a sample of moderate size from the ensemble of an enormous number of possible structures is sufficient to guarantee statistical reproducibility in the estimates of typical sampling statistics.
Abstract: An RNA molecule, particularly a long-chain mRNA, may exist as a population of structures. Furthermore, multiple structures have been demonstrated to play important functional roles. Thus, a representation of the ensemble of probable structures is of interest. We present a statistical algorithm to sample rigorously and exactly from the Boltzmann ensemble of secondary structures. The forward step of the algorithm computes the equilibrium partition functions of RNA secondary structures with recent thermodynamic parameters. Using conditional probabilities computed with the partition functions in a recursive sampling process, the backward step of the algorithm quickly generates a statistically representative sample of structures. With cubic run time for the forward step, quadratic run time in the worst case for the sampling step, and quadratic storage, the algorithm is efficient for broad applicability. We demonstrate that, by classifying sampled structures, the algorithm enables a statistical delineation and representation of the Boltzmann ensemble. Applications of the algorithm show that alternative biological structures are revealed through sampling. Statistical sampling provides a means to estimate the probability of any structural motif, with or without constraints. For example, the algorithm enables probability profiling of single-stranded regions in RNA secondary structure. Probability profiling for specific loop types is also illustrated. By overlaying probability profiles, a mutual accessibility plot can be displayed for predicting RNA:RNA interactions. Boltzmann probability-weighted density of states and free energy distributions of sampled structures can be readily computed. We show that a sample of moderate size from the ensemble of an enormous number of possible structures is sufficient to guarantee statistical reproducibility in the estimates of typical sampling statistics. Our applications suggest that the sampling algorithm may be well suited to prediction of mRNA structure and target accessibility. The algorithm is applicable to the rational design of small interfering RNAs (siRNAs), antisense oligonucleotides, and trans-cleaving ribozymes in gene knock-down studies.
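
The actual algorithm samples via partition-function recursions without ever enumerating structures; the toy sketch below only illustrates the target distribution, drawing from a small, hypothetical set of "structures" with Boltzmann weights e^{-E/RT}/Z.

```python
import numpy as np

# Hypothetical free energies (kcal/mol) for a handful of enumerated structures.
energies = np.array([-12.3, -11.8, -10.5, -9.9, -8.0])
RT = 0.616   # kcal/mol at 37 degrees C

weights = np.exp(-energies / RT)
probs = weights / weights.sum()          # Boltzmann probabilities e^{-E/RT} / Z

rng = np.random.default_rng(3)
sample = rng.choice(len(energies), size=1000, p=probs)
print("sampled frequencies:   ", np.bincount(sample, minlength=len(energies)) / 1000)
print("Boltzmann probabilities:", probs.round(3))
```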

558 citations


Posted Content
TL;DR: This work shows that the optimal sampling frequency at which to estimate the parameters of a discretely sampled continuous-time model can be finite when the observations are contaminated by market microstructure effects, and addresses the question of what to do about the presence of the noise.
Abstract: Classical statistics suggest that for inference purposes one should always use as much data as is available. We study how the presence of market microstructure noise in high-frequency financial data can change that result. We show that the optimal sampling frequency at which to estimate the parameters of a discretely sampled continuous-time model can be finite when the observations are contaminated by market microstructure effects. We then address the question of what to do about the presence of the noise. We show that modelling the noise term explicitly restores the first order statistical effect that sampling as often as possible is optimal. But, more surprisingly, we also demonstrate that this is true even if one misspecifies the assumed distribution of the noise term. Not only is it still optimal to sample as often as possible, but the estimator has the same variance as if the noise distribution had been correctly specified, implying that attempts to incorporate the noise into the analysis cannot do more harm than good. Finally, we study the same questions when the observations are sampled at random time intervals, which are an essential feature of transaction-level data.
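
A hedged simulation of the effect described above: realized variance computed from noise-contaminated prices is heavily biased at the highest sampling frequencies, which is what makes the choice of sampling interval non-trivial. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# One trading day of second-by-second efficient log-prices (Brownian motion)
# contaminated with i.i.d. microstructure noise.
n_sec, sigma, noise_sd = 23_400, 0.01, 5e-4
efficient = np.cumsum(rng.normal(0.0, sigma / np.sqrt(n_sec), n_sec))
observed = efficient + rng.normal(0.0, noise_sd, n_sec)

for step in (1, 5, 30, 300, 1800):                # sampling interval in seconds
    r = np.diff(observed[::step])
    rv = np.sum(r ** 2)                           # realized variance at this frequency
    print(f"sample every {step:>4d} s: RV = {rv:.6f}   (true integrated variance = {sigma**2:.6f})")
```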

520 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that the bias increases with sample size, and is affected by the underlying shape of the species habitat, the magnitude of errors in locations, and the spatial and temporal distribution of sampling effort.
Abstract: Minimum convex polygons (convex hulls) are an internationally accepted, standard method for estimating species’ ranges, particularly in circumstances in which presence-only data are the only kind of spatially explicit data available. One of their main strengths is their simplicity. They are used to make area statements and to assess trends in occupied habitat, and are an important part of the assessment of the conservation status of species. We show by simulation that these estimates are biased. The bias increases with sample size, and is affected by the underlying shape of the species habitat, the magnitude of errors in locations, and the spatial and temporal distribution of sampling effort. The errors affect both area statements and estimates of trends. Some of these errors may be reduced through the application of α-hulls, which are generalizations of convex hulls, but they cannot be eliminated entirely. α-hulls provide an explicit means for excluding discontinuities within a species range. Strengths and weaknesses of alternatives including kernel estimators were examined. Convex hulls exhibit larger bias than α-hulls when used to quantify habitat extent and to detect changes in range, and when subject to differences in the spatial and temporal distribution of sampling effort and spatial accuracy. α-hulls should be preferred for estimating the extent of and trends in species’ ranges.
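
A small sketch of the bias mechanism, assuming a hypothetical C-shaped habitat: the convex hull bridges the gap in the range, and its area keeps growing as more presence records accumulate.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(5)

# Presence-only records from a C-shaped habitat (a three-quarter annulus).
def habitat_points(n):
    theta = rng.uniform(0.25 * np.pi, 1.75 * np.pi, n)
    r = rng.uniform(0.8, 1.0, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

for n in (10, 50, 500, 5000):
    hull = ConvexHull(habitat_points(n))
    print(f"n = {n:>4d}: convex-hull area = {hull.volume:.3f}")   # 2-D 'volume' is area
```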

462 citations


Posted Content
Abstract: The article of record as published may be located at http://dx.doi.org/10.1287/ijoc.1050.0136


Proceedings ArticleDOI
01 Jan 2003
TL;DR: An overview of modern design of experiments (DOE) techniques that can be applied in computational engineering design studies. Several types of modern DOE methods are described, including pseudo-Monte Carlo sampling, quasi-Monte Carlo sampling, Latin hypercube sampling, orthogonal array sampling, and Hammersley sequence sampling.
Abstract: The intent of this paper is to provide an overview of modern design of experiments (DOE) techniques that can be applied in computational engineering design studies. The term modern refers to DOE techniques specifically designed for use with deterministic computer simulations. In addition, this term is used to contrast classical DOE techniques that were developed for laboratory and field experiments that possess random error sources. Several types of modern DOE methods are described including pseudo-Monte Carlo sampling, quasi-Monte Carlo sampling, Latin hypercube sampling, orthogonal array sampling, and Hammersley sequence sampling.
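
A sketch of one of the listed methods, the Hammersley sequence, built from radical inverses in successive prime bases; the other methods follow analogous constructions. The implementation below supports up to 11 dimensions as written.

```python
import numpy as np

def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += f * (i % base)
        i //= base
        f /= base
    return inv

def hammersley(n, dim):
    """n points of the Hammersley sequence in [0, 1)^dim: first coordinate i/n,
    remaining coordinates radical inverses in successive prime bases."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][: dim - 1]
    pts = np.empty((n, dim))
    for i in range(n):
        pts[i, 0] = i / n
        for d, b in enumerate(primes, start=1):
            pts[i, d] = radical_inverse(i, b)
    return pts

print(hammersley(8, 3))
```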

Journal ArticleDOI
TL;DR: To design and apply statistical tests for measuring sampling bias in the raw data used to determine priority areas for conservation, and to discuss their impact on conservation analyses for the region.
Abstract: Aim: To design and apply statistical tests for measuring sampling bias in the raw data used to determine priority areas for conservation, and to discuss their impact on conservation analyses for the region. Location: Sub-Saharan Africa. Methods: An extensive data set comprising 78,083 vouchered locality records for 1068 passerine birds in sub-Saharan Africa has been assembled. Using geographical information systems, we designed and applied two tests to determine if sampling of these taxa was biased. First, we detected possible biases because of accessibility by measuring the proximity of each record to cities, rivers and roads. Second, we quantified the intensity of sampling of each species inside and surrounding proposed conservation priority areas and compared it with sampling intensity in non-priority areas. We applied statistical tests to determine if the distribution of these sampling records deviated significantly from random distributions. Results: The analyses show that the location and intensity of collecting have historically been heavily influenced by accessibility. Sampling localities show dense, significant aggregation around city limits, and along rivers and roads. When examining the collecting sites of each individual species, the pattern of sampling has been significantly concentrated within and immediately surrounding areas now designated as conservation priorities. Main conclusions: Assessment of patterns of species richness and endemicity at the scale useful for establishing conservation priorities, below the continental level, undoubtedly reflects biases in taxonomic sampling. This is especially problematic for priorities established using the criterion of complementarity because the estimated spatial costs of this approach are highly sensitive to sampling artefacts. Hence such conservation priorities should be interpreted with caution proportional to the bias found. We argue that conservation priority setting analyses require (1) statistical tests to detect these biases, and (2) data treatment to reflect species distribution rather than patterns of collecting effort.
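
A hedged sketch of an accessibility-bias test in the same spirit: compare the mean nearest-distance from records to roads against a Monte Carlo null of uniformly placed localities. Coordinates, road geometry, and sample sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def nearest_road_distance(points, road_xy):
    # Euclidean distance from each point to its nearest digitised road vertex.
    d = np.linalg.norm(points[:, None, :] - road_xy[None, :, :], axis=2)
    return d.min(axis=1)

# Hypothetical inputs: collection localities and road vertices, as (x, y) in km.
records = rng.normal(loc=[50.0, 50.0], scale=5.0, size=(300, 2))   # records cluster near the road
roads = np.column_stack([np.linspace(0.0, 100.0, 200), np.full(200, 50.0)])

observed = nearest_road_distance(records, roads).mean()

# Null hypothesis: the same number of localities placed uniformly at random.
null = np.array([
    nearest_road_distance(rng.uniform(0.0, 100.0, size=(300, 2)), roads).mean()
    for _ in range(999)
])
p_value = (np.sum(null <= observed) + 1) / (len(null) + 1)
print(f"mean distance to road: {observed:.2f} km, Monte Carlo p = {p_value:.3f}")
```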

Journal ArticleDOI
TL;DR: In this article, two artificial neural networks (ANNs), an unsupervised and a supervised learning algorithm, were applied to suggest practical approaches for the analysis of ecological data; the results suggested that methodologies successively using the two different neural networks are helpful for understanding ecological data through ordination first, and then for predicting target variables.

Proceedings ArticleDOI
10 Nov 2003
TL;DR: A hybrid sampling strategy in the PRM framework for finding paths through narrow passages is presented, which enables relatively small roadmaps to reliably capture the connectivity of configuration spaces with difficult narrow passages.
Abstract: Probabilistic roadmap (PRM) planners have been successful in path planning of robots with many degrees of freedom, but narrow passages in a robot's configuration space create significant difficulty for PRM planners. This paper presents a hybrid sampling strategy in the PRM framework for finding paths through narrow passages. A key ingredient of the new strategy is the bridge test, which boosts the sampling density inside narrow passages. The bridge test relies on simple tests of local geometry and can be implemented efficiently in high-dimensional configuration spaces. The strengths of the bridge test and uniform sampling complement each other naturally and are combined to generate the final hybrid sampling strategy. Our planner was tested on point robots and articulated robots in planar workspaces. Preliminary experiments show that the hybrid sampling strategy enables relatively small roadmaps to reliably capture the connectivity of configuration spaces with difficult narrow passages.
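
A minimal sketch of the bridge test on a toy 2-D configuration space with a narrow slit; the collision checker, bridge length, and workspace are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(7)

def bridge_test_samples(in_collision, dim, n_samples, bridge_sd=0.4, lo=0.0, hi=1.0):
    """Bridge test: keep the midpoint of a short random segment whose two
    endpoints are both in collision but whose midpoint is free. Such midpoints
    tend to lie inside narrow passages. `in_collision(q)` is a user-supplied
    collision checker (assumed here)."""
    samples = []
    while len(samples) < n_samples:
        a = rng.uniform(lo, hi, dim)
        if not in_collision(a):
            continue
        b = a + rng.normal(0.0, bridge_sd, dim)          # nearby second endpoint
        if not in_collision(b):
            continue
        mid = 0.5 * (a + b)
        if not in_collision(mid):
            samples.append(mid)                          # midpoint of a "bridge"
    return np.array(samples)

# Toy 2-D configuration space: free space is a narrow horizontal slit of width 0.04.
slit_world = lambda q: not (0.48 < q[1] < 0.52)          # True means "in collision"
pts = bridge_test_samples(slit_world, dim=2, n_samples=200)
print(pts[:, 1].min(), pts[:, 1].max())                  # all midpoints lie inside the slit
```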

Proceedings ArticleDOI
09 Jul 2003
TL;DR: It is shown that when graphs are sampled using traceroute-like methods, the resulting degree distribution can differ sharply from that of the underlying graph, and the reasons why this effect arises are explored.
Abstract: Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute-like methods have led some to conclude that the router graph of the Internet is well modeled as a power-law random graph. In such a graph, the degree distribution of nodes follows a distribution with a power-law tail. We argue that the evidence to date for this conclusion is at best insufficient. We show that when graphs are sampled using traceroute-like methods, the resulting degree distribution can differ sharply from that of the underlying graph. For example, given a sparse Erdos-Renyi random graph, the subgraph formed by a collection of shortest paths from a small set of random sources to a larger set of random destinations can exhibit a degree distribution remarkably like a power-law. We explore the reasons why this effect arises, and show that in such a setting, edges are sampled in a highly biased manner. This insight allows us to formulate tests for determining when sampling bias is present. When we apply these tests to a number of well-known datasets, we find strong evidence for sampling bias.
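
A hedged reproduction of the qualitative effect using networkx: union the shortest paths from a few random sources to many random destinations in a sparse Erdos-Renyi graph and compare degree statistics. The graph size and source/destination counts are arbitrary.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(8)

# Sparse Erdos-Renyi graph: Poisson-like degrees, no power-law tail.
n, avg_deg = 20_000, 6
G = nx.gnp_random_graph(n, avg_deg / n, seed=8)
G = G.subgraph(max(nx.connected_components(G), key=len)).copy()

nodes = list(G.nodes)
sources = rng.choice(nodes, size=5, replace=False)
targets = rng.choice(nodes, size=2000, replace=False)

# "Traceroute" measurement: union of one shortest path per (source, target) pair.
sampled = nx.Graph()
for s in sources:
    paths = nx.single_source_shortest_path(G, s)
    for t in targets:
        if t in paths:
            nx.add_path(sampled, paths[t])

true_deg = np.array([d for _, d in G.degree()])
obs_deg = np.array([d for _, d in sampled.degree()])
print("underlying graph: mean degree %.2f, share with degree 1: %.3f"
      % (true_deg.mean(), (true_deg == 1).mean()))
print("sampled subgraph: mean degree %.2f, share with degree 1: %.3f"
      % (obs_deg.mean(), (obs_deg == 1).mean()))
# The sampled degree distribution is far more skewed than the Poisson original.
```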

Journal ArticleDOI
TL;DR: A novel web tool for the statistical analysis of gene expression data in multiple tag sampling experiments, using six different test statistics to detect differentially expressed genes.
Abstract: Here we present a novel web tool for the statistical analysis of gene expression data in multiple tag sampling experiments. Differentially expressed genes are detected by using six different test statistics.

Journal ArticleDOI
TL;DR: These devices are part of an emerging strategy for monitoring exposure to hydrophobic organic chemicals.
Abstract: These devices are part of an emerging strategy for monitoring exposure to hydrophobic organic chemicals.

Journal ArticleDOI
TL;DR: The conditions under which importance sampling is applicable in high dimensions are investigated and it is found that importance sampling densities using design points are applicable if the covariance matrix associated with each design point does not deviate significantly from the identity matrix.
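
A minimal importance-sampling sketch with the sampling density centred at the design point and identity covariance, on a linear limit-state in standard normal space where the exact failure probability is known; the dimension and reliability index are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Failure probability P[g(X) <= 0] for standard normal X in d dimensions,
# with a linear limit-state g(x) = beta - x_1 (exact answer: Phi(-beta)).
d, beta, n = 10, 3.5, 20_000
g = lambda x: beta - x[:, 0]
design_point = np.zeros(d)
design_point[0] = beta                       # closest point of the failure domain to the origin

f = multivariate_normal(mean=np.zeros(d))    # nominal standard normal density
h = multivariate_normal(mean=design_point)   # sampling density at the design point,
                                             # identity covariance (per the condition above)
x = h.rvs(size=n, random_state=9)
weights = f.pdf(x) / h.pdf(x)                # importance weights
p_hat = np.mean((g(x) <= 0) * weights)
print("importance sampling estimate:", p_hat, " exact:", norm.cdf(-beta))
```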

Proceedings ArticleDOI
25 Aug 2003
TL;DR: This paper provides methods that use flow statistics formed from a sampled packet stream to infer the frequencies of the number of packets per flow in the unsampled stream, exploiting protocol-level detail reported in flow records.
Abstract: Passive traffic measurement increasingly employs sampling at the packet level. Many high-end routers form flow statistics from a sampled substream of packets. Sampling is necessary in order to control the consumption of resources by the measurement operations. However, knowledge of the statistics of flows in the unsampled stream remains useful, for understanding both characteristics of source traffic, and consumption of resources in the network. This paper provides methods that use flow statistics formed from a sampled packet stream to infer the absolute frequencies of lengths of flows in the unsampled stream. A key part of our work is inferring the numbers and lengths of flows of original traffic that evaded sampling altogether. We achieve this through statistical inference, and by exploiting protocol level detail reported in flow records. The method has applications to detection and characterization of network attacks: we show how to estimate, from sampled flow statistics, the number of compromised hosts that are sending attack traffic past the measurement point. We also investigate the impact on our results of different implementations of packet sampling.
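
A hedged toy illustrating the inversion problem (not the paper's estimators): under independent 1-in-N packet sampling, many small flows evade sampling entirely, while per-flow SYN counts give a simple unbiased handle on the number of TCP flows. The flow-size distribution and sampling rate are invented.

```python
import numpy as np

rng = np.random.default_rng(10)

# Simulate original flows: heavy-tailed packet counts, exactly one SYN per flow (TCP-like).
n_flows, p = 100_000, 1 / 30                     # 1-in-30 independent packet sampling
flow_pkts = rng.zipf(2.2, n_flows)               # original packets per flow

syn_seen = rng.random(n_flows) < p               # was the flow's single SYN packet sampled?
other_seen = rng.binomial(flow_pkts - 1, p)      # sampled non-SYN packets
sampled_pkts = other_seen + syn_seen             # packets of each flow that survive sampling

print("true flows:                    ", n_flows)
print("flows visible after sampling:  ", int(np.sum(sampled_pkts > 0)))   # many flows evade sampling
print("total packets, scaled by 1/p:  ", int(sampled_pkts.sum() / p), "(true:", int(flow_pkts.sum()), ")")
print("flows estimated from SYNs / p: ", int(syn_seen.sum() / p))         # simple unbiased flow count
```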

Journal ArticleDOI
TL;DR: In this article, the authors evaluated the effect of data variability and the strength of spatial correlation in the data on the performance of grid soil sampling of different sampling density and two interpolation procedures, ordinary point kriging and optimal inverse distance weighting (IDW).
Abstract: Effectiveness of precision agriculture depends on accurate and efficient mapping of soil properties. Among the factors that most affect soil property mapping are the number of soil samples, the distance between sampling locations, and the choice of interpolation procedures. The objective of this study is to evaluate the effect of data variability and the strength of spatial correlation in the data on the performance of (i) grid soil sampling of different sampling density and (ii) two interpolation procedures, ordinary point kriging and optimal inverse distance weighting (IDW). Soil properties with coefficients of variation (CV) ranging from 12 to 67% were sampled in a 20-ha field using a regular grid with a 30-m distance between grid points. Data sets with different spatial structures were simulated based on the soil sample data using a simulated annealing procedure. The strength of simulated spatial structures ranged from weak with nugget to sill (N/S) ratio of 0.6 to strong (N/S ratio of 0.1). The results indicated that regardless of CV values, soil properties with a strong spatial structure were mapped more accurately than those that had weak spatial structure. Kriging with known variogram parameters performed significantly better than the IDW for most of the studied cases (P < 0.01). However, when variogram parameters were determined from sample variograms, kriging was as accurate as the IDW only for sufficiently large data sets, but was less precise when a reliable sample variogram could not be obtained from the data.
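
A minimal inverse distance weighting interpolator as a sketch of the second procedure; the grid spacing, the simulated soil property, and the power parameter are hypothetical, and the study additionally optimises the IDW power and compares against kriging.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse distance weighted interpolation (minimal sketch)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                  # avoid division by zero at sample points
    w = 1.0 / d ** power
    return (w @ z_known) / w.sum(axis=1)

# Hypothetical 30 m grid soil samples (x, y in metres; z = measured property).
rng = np.random.default_rng(11)
xy = np.array([(x, y) for x in range(0, 450, 30) for y in range(0, 450, 30)], float)
z = np.sin(xy[:, 0] / 120) + 0.5 * np.cos(xy[:, 1] / 90) + rng.normal(0, 0.1, len(xy))

query = np.array([[100.0, 200.0], [310.0, 45.0]])
print(idw(xy, z, query))
```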

Book ChapterDOI
14 Apr 2003
TL;DR: A straightforward active learning heuristic, representative sampling, is described, which explores the clustering structure of 'uncertain' documents and identifies the representative samples to query the user opinions, for the purpose of speeding up the convergence of Support Vector Machine (SVM) classifiers.
Abstract: In order to reduce human efforts, there has been increasing interest in applying active learning for training text classifiers. This paper describes a straightforward active learning heuristic, representative sampling, which explores the clustering structure of 'uncertain' documents and identifies the representative samples to query the user opinions, for the purpose of speeding up the convergence of Support Vector Machine (SVM) classifiers. Compared with other active learning algorithms, the proposed representative sampling explicitly addresses the problem of selecting more than one unlabeled document. In an empirical study we compared representative sampling both with random sampling and with SVM active learning. The results demonstrated that representative sampling offers excellent learning performance with fewer labeled documents and thus can reduce human efforts in text classification tasks.
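
A hedged sketch of the representative-sampling idea using scikit-learn: rank pool documents by distance to the SVM boundary, cluster the most uncertain ones, and query the items nearest each cluster centre. The synthetic data and batch sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(12)

# Stand-in for a document collection: small labelled seed set + unlabelled pool.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
labelled = rng.choice(len(X), size=20, replace=False)
pool = np.setdiff1d(np.arange(len(X)), labelled)

clf = SVC(kernel="linear").fit(X[labelled], y[labelled])

# 1. 'Uncertain' documents: those closest to the SVM decision boundary.
margin = np.abs(clf.decision_function(X[pool]))
uncertain = pool[np.argsort(margin)[:200]]

# 2. Representative sampling: cluster the uncertain documents and query the
#    documents nearest each cluster centre (a batch of 10 queries).
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X[uncertain])
closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X[uncertain])
queries = uncertain[closest]
print("documents to send to the annotator:", queries)
```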

Journal ArticleDOI
TL;DR: The Continuous Plankton Recorder (CPR) has been deployed for 70 years and has been used to sample plankton in the surface waters of the European continental shelf, as mentioned in this paper.

Book
03 Nov 2003
TL;DR: This book presents the theory and applications of ranked set sampling, covering balanced and unbalanced designs, nonparametric and parametric inference, optimal designs, distribution-free tests, sampling with concomitant variables, data reduction, and case studies.
Abstract: 1 Introduction.- 2 Balanced Ranked Set Sampling I: Nonparametric.- 3 Balanced Ranked Set Sampling II: Parametric.- 4 Unbalanced Ranked Set Sampling and Optimal Designs.- 5 Distribution-Free Tests with Ranked Set Sampling.- 6 Ranked Set Sampling with Concomitant Variables.- 7 Ranked Set Sampling as Data Reduction Tools.- 8 Case Studies.- References.
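
A small simulation of balanced ranked set sampling under perfect ranking, showing the variance reduction of the RSS sample mean relative to simple random sampling for the same number of measured units; the population and set size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(13)

def balanced_rss(draw, set_size, cycles):
    """Balanced ranked set sampling with perfect ranking: in each cycle, draw
    set_size sets of set_size units, rank each set, and measure the r-th order
    statistic of the r-th set."""
    out = []
    for _ in range(cycles):
        for r in range(set_size):
            out.append(np.sort(draw(set_size))[r])
    return np.array(out)

draw = lambda m: rng.lognormal(0.0, 1.0, m)      # skewed hypothetical population
n_measured, reps = 300, 500

srs_means = [draw(n_measured).mean() for _ in range(reps)]
rss_means = [balanced_rss(draw, set_size=5, cycles=n_measured // 5).mean() for _ in range(reps)]

print("SRS mean-estimator std:", np.std(srs_means))
print("RSS mean-estimator std:", np.std(rss_means))   # typically clearly smaller for the same n
```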


Journal ArticleDOI
TL;DR: In this paper, a general framework for sampling and reconstruction procedures based on a consistency requirement was introduced, which allows for almost arbitrary sampling and reconstruction spaces, as well as arbitrary input signals.
Abstract: This article introduces a general framework for sampling and reconstruction procedures based on a consistency requirement, introduced by Unser and Aldroubi in [29]. The procedures we develop allow for almost arbitrary sampling and reconstruction spaces, as well as arbitrary input signals. We first derive a nonredundant sampling procedure. We then introduce the new concept of oblique dual frame vectors, that lead to frame expansions in which the analysis and synthesis frame vectors are not constrained to lie in the same space. Based on this notion, we develop a redundant sampling procedure that can be used to reduce the quantization error when quantizing the measurements prior to reconstruction.
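
A minimal numerical sketch of consistent (oblique) reconstruction with generic finite-dimensional sampling and reconstruction subspaces, assuming S^T W is invertible; the paper works with shift-invariant spaces and frames rather than these random subspaces.

```python
import numpy as np

rng = np.random.default_rng(14)

# Generic finite-dimensional stand-ins for the sampling space S and the
# reconstruction space W (columns span each subspace of R^8).
n, k = 8, 3
S = rng.normal(size=(n, k))       # sampling vectors
W = rng.normal(size=(n, k))       # reconstruction vectors

x = rng.normal(size=n)            # arbitrary input signal
c = S.T @ x                       # measurements <s_i, x>

# Consistent reconstruction x_hat = W (S^T W)^{-1} c: an oblique projection onto
# range(W) that reproduces the original measurements exactly.
x_hat = W @ np.linalg.solve(S.T @ W, c)
print("consistency error:", np.linalg.norm(S.T @ x_hat - c))

# Signals already in the reconstruction space are recovered perfectly.
w = W @ rng.normal(size=k)
w_hat = W @ np.linalg.solve(S.T @ W, S.T @ w)
print("error for a signal in range(W):", np.linalg.norm(w_hat - w))
```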

Proceedings ArticleDOI
01 Jul 2003
TL;DR: This work introduces structured importance sampling, a new technique for efficiently rendering scenes illuminated by distant natural illumination given in an environment map, and presents a novel hierarchical stratification algorithm that uses the authors' metric to automatically stratify the environment map into regular strata.
Abstract: We introduce structured importance sampling, a new technique for efficiently rendering scenes illuminated by distant natural illumination given in an environment map. Our method handles occlusion, high-frequency lighting, and is significantly faster than alternative methods based on Monte Carlo sampling. We achieve this speedup as a result of several ideas. First, we present a new metric for stratifying and sampling an environment map taking into account both the illumination intensity as well as the expected variance due to occlusion within the scene. We then present a novel hierarchical stratification algorithm that uses our metric to automatically stratify the environment map into regular strata. This approach enables a number of rendering optimizations, such as pre-integrating the illumination within each stratum to eliminate noise at the cost of adding bias, and sorting the strata to reduce the number of sample rays. We have rendered several scenes illuminated by natural lighting, and our results indicate that structured importance sampling is better than the best previous Monte Carlo techniques, requiring one to two orders of magnitude fewer samples for the same image quality.
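
A hedged sketch of luminance-proportional sampling of a lat-long environment map (weighting by sin θ for solid angle); the paper's metric additionally accounts for expected occlusion variance and uses hierarchical stratification rather than independent draws.

```python
import numpy as np

rng = np.random.default_rng(15)

# A hypothetical lat-long environment map: mostly dim, one small bright source.
H, W = 64, 128
env = rng.uniform(0.0, 0.2, (H, W))
env[10:12, 40:44] = 50.0

# Weight each pixel by luminance times the solid angle it subtends
# (proportional to sin(theta) in a lat-long parametrisation), then sample
# pixel indices from the resulting probability mass function.
theta = (np.arange(H) + 0.5) / H * np.pi
pmf = (env * np.sin(theta)[:, None]).ravel()
pmf /= pmf.sum()

idx = rng.choice(env.size, size=16, p=pmf)
rows, cols = np.unravel_index(idx, env.shape)
print(list(zip(rows.tolist(), cols.tolist())))   # most samples land on the bright region
# An unbiased lighting estimate divides each sample's contribution by pmf[idx].
```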

Journal ArticleDOI
TL;DR: In this article, a framework for determining a sampling approach in international studies is proposed, based on an assessment of the way in which sampling affects the validity of research results, and shows how different research objectives impact upon the desired sampling method and the desired sample characteristics.
Abstract: Sampling in the international environment needs to satisfy the same requirements as sampling in the domestic environment, but there are additional issues to consider, such as the need to balance within-country representativeness with cross-national comparability. However, most international marketing research studies fail to provide theoretical justification for their choice of sampling approach. This is because research design theory and sampling theory have not been well integrated in the context of international research. This paper seeks to fill the gap by developing a framework for determining a sampling approach in international studies. The framework is based on an assessment of the way in which sampling affects the validity of research results, and shows how different research objectives impact upon (a) the desired sampling method and (b) the desired sample characteristics. The aim is to provide researchers with operational guidance in choosing a sampling approach that is theoretically appropriate to their particular research aims.

Journal ArticleDOI
TL;DR: In this paper, two general classes of density estimation models have been developed: models that use data sets from capture-recapture or removal sampling techniques (often derived from trapping grids) from which separate estimates of population size (N) and effective sampling area (Â) are used to calculate density (D = N/Â), and models applicable to sampling regimes using distance-sampling theory (typically transect lines or trapping webs) to estimate detection functions and densities directly from the distance data.
Abstract: Statistical models for estimating absolute densities of field populations of animals have been widely used over the last century in both scientific studies and wildlife management programs. To date, two general classes of density estimation models have been developed: models that use data sets from capture–recapture or removal sampling techniques (often derived from trapping grids) from which separate estimates of population size (N) and effective sampling area (Â) are used to calculate density (D = N/Â); and models applicable to sampling regimes using distance-sampling theory (typically transect lines or trapping webs) to estimate detection functions and densities directly from the distance data. However, few studies have evaluated these respective models for accuracy, precision, and bias on known field populations, and no studies have been conducted that compare the two approaches under controlled field conditions. In this study, we evaluated both classes of density estimators on known densities of e...
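
A worked toy of the first class of estimators (D = N/Â), using Chapman's capture-recapture estimator and an effective area obtained by buffering the grid with a boundary strip of half the mean maximum distance moved; all numbers are invented, and real studies vary in how Â is defined.

```python
# Capture-recapture on a trapping grid (illustrative numbers only).
n1, n2, m2 = 60, 55, 25                          # first captures, second captures, recaptures
N_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1       # Chapman's bias-corrected Lincoln-Petersen

# Effective sampling area: buffer the square grid by a strip of width MMDM/2.
grid_side = 0.9                                  # km
mmdm = 0.24                                      # km, mean maximum distance moved
A_hat = (grid_side + mmdm) ** 2                  # km^2: side extended by MMDM/2 on each side

D_hat = N_hat / A_hat
print(f"N_hat = {N_hat:.1f}, A_hat = {A_hat:.2f} km^2, D = {D_hat:.1f} animals per km^2")
```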