
Showing papers on "Sampling (statistics)" published in 2006


Journal ArticleDOI
TL;DR: The objective is to enhance texture quality as much as possible, with only a minor sacrifice in efficiency, in order to support the conjecture that a pixel-based approach can yield high-quality images.

1,462 citations


Proceedings ArticleDOI
20 Aug 2006
TL;DR: The best performing methods are the ones based on random walks and "forest fire"; they match both static and evolutionary graph patterns very accurately, with sample sizes down to about 15% of the original graph.
Abstract: Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential. The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success? We answer the above questions and test our answers by thorough experiments on several diverse datasets spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample. In addition to the theoretical contributions, the practical conclusions from our work are: sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best performing methods are the ones based on random walks and "forest fire"; they match both static and evolutionary graph patterns very accurately, with sample sizes down to about 15% of the original graph.
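As a concrete illustration of the random-walk family of samplers favoured above, here is a minimal Python sketch of random-walk node sampling with a fly-back probability; the adjacency-list representation, the parameter names, and the restart rule for stuck walks are our own simplifications, not the authors' implementation.

import random

def random_walk_sample(adj, target_fraction=0.15, fly_back=0.15, seed=None):
    """Sample roughly target_fraction of the nodes of an undirected graph
    by a random walk with a fly-back (restart) probability.
    adj: dict mapping each node to a list of its neighbours."""
    rng = random.Random(seed)
    nodes = list(adj)
    target = max(1, int(target_fraction * len(nodes)))
    start = rng.choice(nodes)
    current = start
    sampled = {start}
    stale = 0                                  # steps since the last new node was added
    while len(sampled) < target:
        if rng.random() < fly_back or not adj[current]:
            current = start                    # fly back to the walk's start node
        else:
            current = rng.choice(adj[current])
        before = len(sampled)
        sampled.add(current)
        stale = 0 if len(sampled) > before else stale + 1
        if stale > 100 * len(nodes):           # walk appears stuck (e.g. small component):
            start = rng.choice(nodes)          # restart from a fresh random node
            current, stale = start, 0
    return sampled

# Example on a tiny toy graph:
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk_sample(g, target_fraction=0.5, seed=1))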

1,290 citations


Journal ArticleDOI
TL;DR: Sampling-based methods for uncertainty and sensitivity analysis are reviewed and special attention is given to the determination of sensitivity analysis results.

1,179 citations


Book ChapterDOI
07 May 2006
TL;DR: In this article, the authors show experimentally that for a representative selection of commonly used test databases and for moderate to large numbers of samples, random sampling gives equal or better classifiers than the sophisticated multiscale interest operators that are in common use.
Abstract: Bag-of-features representations have recently become popular for content based image classification owing to their simplicity and good performance. They evolved from texton methods in texture analysis. The basic idea is to treat images as loose collections of independent patches, sampling a representative set of patches from the image, evaluating a visual descriptor vector for each patch independently, and using the resulting distribution of samples in descriptor space as a characterization of the image. The four main implementation choices are thus how to sample patches, how to describe them, how to characterize the resulting distributions and how to classify images based on the result. We concentrate on the first issue, showing experimentally that for a representative selection of commonly used test databases and for moderate to large numbers of samples, random sampling gives equal or better classifiers than the sophisticated multiscale interest operators that are in common use. Although interest operators work well for small numbers of samples, the single most important factor governing performance is the number of patches sampled from the test image and ultimately interest operators can not provide enough patches to compete. We also study the influence of other factors including codebook size and creation method, histogram normalization method and minimum scale for feature extraction.
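A minimal sketch of the uniform random patch sampling the abstract argues for, assuming a grayscale image stored as a 2-D numpy array; the function name and the size range are illustrative, and descriptor extraction and codebook construction are left out.

import numpy as np

def sample_random_patches(image, n_patches=1000, min_size=16, max_size=64, seed=None):
    """Extract square patches at uniformly random positions and sizes.
    image: 2-D (grayscale) numpy array."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    patches = []
    for _ in range(n_patches):
        size = int(rng.integers(min_size, max_size + 1))
        size = min(size, h, w)                      # guard against small images
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        patches.append(image[y:y + size, x:x + size])
    return patches

# Each patch would then be fed to a visual descriptor (e.g. SIFT) and the
# descriptors quantized against a codebook to build the bag-of-features histogram.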

1,099 citations


Journal ArticleDOI
TL;DR: In this article, a sampling strategy is described that combines Latin hypercube and one-factor-at-a-time sampling, allowing a global sensitivity analysis for a long list of parameters with only a limited number of model runs.
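Since only the TL;DR is shown here, the sketch below covers just plain Latin hypercube sampling, the LH half of the LH-OAT strategy described above; the one-factor-at-a-time perturbations made around each Latin hypercube point are omitted, and the function name is ours.

import numpy as np

def latin_hypercube(n_samples, n_params, seed=None):
    """Return an (n_samples, n_params) array of points in [0, 1)^n_params with
    exactly one point in each of the n_samples strata of every dimension."""
    rng = np.random.default_rng(seed)
    sample = np.empty((n_samples, n_params))
    for j in range(n_params):
        # one uniform draw per stratum, then shuffle the order of the strata
        strata = (rng.permutation(n_samples) + rng.random(n_samples)) / n_samples
        sample[:, j] = strata
    return sample

# Example: 10 design points for 4 parameters.
print(latin_hypercube(10, 4, seed=0))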

1,069 citations


Journal ArticleDOI
TL;DR: The cLHS (conditioned Latin hypercube sampling) method, with a search algorithm based on heuristic rules combined with an annealing schedule, is presented and illustrated with a simple 3-D example and an application to digital soil mapping of part of the Hunter Valley of New South Wales, Australia.

744 citations


Proceedings ArticleDOI
20 Aug 2006
TL;DR: This paper proposes sparse random projections for approximating distances between pairs of points in a high-dimensional vector space: the data matrix A is multiplied by a random matrix R ∈ R^(D×k), reducing the D dimensions down to just k to speed up the computation.
Abstract: There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ R^(n×D) be our n points in D dimensions. The method multiplies A by a random matrix R ∈ R^(D×k), reducing the D dimensions down to just k to speed up the computation. R typically consists of entries drawn from the standard normal N(0,1). It is well known that random projections preserve pairwise distances (in expectation). Achlioptas proposed sparse random projections by replacing the N(0,1) entries in R with entries in {−1, 0, +1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R with entries in {−1, 0, +1} with probabilities {1/(2√D), 1 − 1/√D, 1/(2√D)}, achieving a significant √D-fold speedup with little loss in accuracy.
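A minimal numpy sketch of the sparse projection described above, parameterized by the sparsity s (s = √D gives the very sparse setting recommended in the abstract); the function name and the 1/√k output scaling convention are ours, not the paper's code.

import numpy as np

def sparse_random_projection(A, k, s=None, seed=None):
    """Project the rows of A (n x D) down to k dimensions using a sparse random
    matrix with entries sqrt(s) * {+1, 0, -1} drawn with probabilities
    1/(2s), 1 - 1/s, 1/(2s).  s = sqrt(D) gives the very sparse setting."""
    rng = np.random.default_rng(seed)
    n, D = A.shape
    if s is None:
        s = np.sqrt(D)
    probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
    R = rng.choice([1.0, 0.0, -1.0], size=(D, k), p=probs) * np.sqrt(s)
    return A @ R / np.sqrt(k)   # pairwise distances are preserved in expectation

# Example: project 5 points from 10,000 dimensions down to 200.
A = np.random.default_rng(0).normal(size=(5, 10_000))
B = sparse_random_projection(A, k=200, seed=1)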

668 citations


Journal ArticleDOI
TL;DR: A theory of decision by sampling (DbS) is presented in which, in contrast with traditional models, there are no underlying psychoeconomic scales; instead, an attribute's subjective value is assumed to be constructed from a series of binary, ordinal comparisons with a sample of attribute values drawn from memory.

623 citations


Book
22 Feb 2006
TL;DR: This presentation is broader than standard statistical texts, as the authors pay much attention to how statistical methodology can be employed and embedded in real-life spatial inventory and monitoring projects.
Abstract: The book presents the statistical knowledge and methodology of sampling and data analysis useful for spatial inventory and monitoring of natural resources. The authors omitted all theory not essential for applications or for basic understanding. This presentation is broader than standard statistical texts, as the authors pay much attention to how statistical methodology can be employed and embedded in real-life spatial inventory and monitoring projects. Thus they discuss in detail how efficient sampling schemes and monitoring systems can be designed in view of the aims and constraints of the project.

495 citations


Journal ArticleDOI
TL;DR: In this paper, the authors exploit three methods of sampling and investigate topological properties such as the degree and betweenness centrality distributions, average path length, assortativity, and clustering coefficient of sampled networks compared with those of the original networks.
Abstract: We study the statistical properties of sampled scale-free networks, a problem deeply related to the proper identification of various real-world networks. We exploit three methods of sampling and investigate topological properties such as the degree and betweenness centrality distributions, average path length, assortativity, and clustering coefficient of sampled networks compared with those of the original networks. It is found that the quantities related to those properties are estimated quite differently under each sampling method. We explain why such biased estimates emerge from the sampling procedure and give appropriate criteria for each sampling method to prevent the quantities from being overestimated or underestimated.

487 citations


Journal ArticleDOI
TL;DR: The model-based approach to sampling for rare species helps in the discovery of new populations of the target species in remote areas where the predicted habitat suitability is high and may save up to 70% of the time spent in the field.
Abstract: Because data on rare species usually are sparse, it is important to have efficient ways to sample additional data. Traditional sampling approaches are of limited value for rare species because a very large proportion of randomly chosen sampling sites are unlikely to shelter the species. For these species, spatial predictions from niche-based distribution models can be used to stratify the sampling and increase sampling efficiency. New data sampled are then used to improve the initial model. Applying this approach repeatedly is an adaptive process that may allow increasing the number of new occurrences found. We illustrate the approach with a case study of a rare and endangered plant species in Switzerland and a simulation experiment. Our field survey confirmed that the method helps in the discovery of new populations of the target species in remote areas where the predicted habitat suitability is high. In our simulations the model-based approach provided a significant improvement (by a factor of 1.8 to 4 times, depending on the measure) over simple random sampling. In terms of cost this approach may save up to 70% of the time spent in the field.

Journal ArticleDOI
TL;DR: In this article, the authors show that if there is a sample of n functions, or subjects, then estimation of eigenvalues is a semiparametric problem, with root-n consistent estimators, even if only a few observations are made of each function, and if each observation is encumbered by noise.
Abstract: The use of principal component methods to analyze functional data is appropriate in a wide range of different settings. In studies of "functional data analysis," it has often been assumed that a sample of random functions is observed precisely, in the continuum and without noise. While this has been the traditional setting for functional data analysis, in the context of longitudinal data analysis a random function typically represents a patient, or subject, who is observed at only a small number of randomly distributed points, with nonnegligible measurement error. Nevertheless, essentially the same methods can be used in both these cases, as well as in the vast number of settings that lie between them. How is performance affected by the sampling plan? In this paper we answer that question. We show that if there is a sample of n functions, or subjects, then estimation of eigenvalues is a semiparametric problem, with root-n consistent estimators, even if only a few observations are made of each function, and if each observation is encumbered by noise. However, estimation of eigenfunctions becomes a nonparametric problem when observations are sparse. The optimal convergence rates in this case are those which pertain to more familiar function-estimation settings. We also describe the effects of sampling at regularly spaced points, as opposed to random points. In particular, it is shown that there are often advantages in sampling randomly. However, even in the case of noisy data there is a threshold sampling rate (depending on the number of functions treated) above which the rate of sampling (either randomly or regularly) has negligible impact on estimator performance, no matter whether eigenfunctions or eigenvectors are being estimated.

Journal ArticleDOI
TL;DR: An extended system is discussed in which the collective variables are treated as dynamical variables; it is shown that this allows the free-energy landscape of these variables to be sampled directly, and that the method can be generalized and used as an alternative to the Kirkwood generalized thermodynamic integration approach for the calculation of free energy differences.

Journal ArticleDOI
20 Jan 2006-Sensors
TL;DR: It is shown that the lower bound of the send-on-delta effectiveness is independent of the sampling resolution and constitutes a built-in feature of the input signal.
Abstract: The paper addresses the issue of the send-on-delta data collecting strategy for capturing information from the environment. The send-on-delta concept is a signal-dependent temporal sampling scheme in which a sample is triggered when the signal deviates by delta, defined as a significant change of its value. It is an attractive scheme for wireless sensor networking owing to its efficient energy consumption. Quantitative evaluations of the send-on-delta scheme for a general class of continuous-time bandlimited signals are presented in the paper. The bounds on the mean traffic of reports for a given signal and assumed sampling resolution are evaluated. Furthermore, the send-on-delta effectiveness, defined as the reduction of the mean rate of reports in comparison to periodic sampling at a given resolution, is derived. It is shown that the lower bound of the send-on-delta effectiveness (i.e. the guaranteed reduction) is independent of the sampling resolution and constitutes a built-in feature of the input signal. The calculation of the effectiveness is exemplified for standard signals that model the state evolution of a dynamic environment in time. Finally, an example of send-on-delta programming is shown.
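A minimal sketch of the send-on-delta triggering rule described above, applied to an already densely sampled signal; the function and parameter names are illustrative, not the paper's formulation.

import math

def send_on_delta(samples, delta):
    """Yield (index, value) reports whenever the value deviates by at least
    delta from the last reported value."""
    reports = []
    last = None
    for i, x in enumerate(samples):
        if last is None or abs(x - last) >= delta:
            reports.append((i, x))
            last = x
    return reports

# Example: a slowly varying signal produces far fewer reports than periodic sampling.
signal = [math.sin(0.01 * t) for t in range(1000)]
print(len(send_on_delta(signal, delta=0.1)), "reports instead of 1000")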

Journal ArticleDOI
TL;DR: A bootstrap method is presented for constructing confidence intervals around respondent-driven sampling estimates and it is demonstrated in simulations that it outperforms the naive method currently in use.
Abstract: Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling.

Journal ArticleDOI
TL;DR: It is demonstrated that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results.
Abstract: Models of the geographic distributions of species have wide application in ecology. But the nonspatial, single-level, regression models that ecologists have often employed do not deal with problems of irregular sampling intensity or spatial dependence, and do not adequately quantify uncertainty. We show here how to build statistical models that can handle these features of spatial prediction and provide richer, more powerful inference about species niche relations, distributions, and the effects of human disturbance. We begin with a familiar generalized linear model and build in additional features, including spatial random effects and hierarchical levels. Since these models are fully specified statistical models, we show that it is possible to add complexity without sacrificing interpretability. This step-by-step approach, together with attached code that implements a simple, spatially explicit, regression model, is structured to facilitate self-teaching. All models are developed in a Bayesian framework. We assess the performance of the models by using them to predict the distributions of two plant species (Proteaceae) from South Africa's Cape Floristic Region. We demonstrate that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results. Adding hierarchical levels to the models has further advantages in allowing human transformation of the landscape to be taken into account, as well as additional features of the sampling process.

Journal ArticleDOI
TL;DR: Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabling thousand-way simulation parallelism.
Abstract: Timing-accurate full-system multiprocessor simulations can take years because of architecture and application complexity. Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabling thousand-way simulation parallelism.

Journal ArticleDOI
TL;DR: In this article, a nonparametric bootstrap is proposed for estimating location parameters and the corresponding variances, and an estimate of bias and a measure of variance of the point estimate are computed using the Monte Carlo method.
Abstract: Purposive sampling is described as a random selection of sampling units within the segment of the population with the most information on the characteristic of interest. A nonparametric bootstrap is proposed for estimating location parameters and the corresponding variances. An estimate of bias and a measure of variance of the point estimate are computed using the Monte Carlo method. The bootstrap estimator of the population mean is efficient and consistent in the homogeneous, heterogeneous, and two-segment populations simulated. The design-unbiased approximation of the standard error estimate differs substantially from the bootstrap estimate in severely heterogeneous and positively skewed populations.
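A minimal sketch of a nonparametric bootstrap for a location statistic, using Monte Carlo resampling to approximate bias and standard error; this is a generic illustration, not the authors' design for purposive samples.

import numpy as np

def bootstrap_location(sample, stat=np.mean, n_boot=2000, seed=None):
    """Bootstrap estimates of bias and standard error for a location statistic."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    point = stat(sample)
    replicates = np.array([
        stat(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(n_boot)
    ])
    bias = replicates.mean() - point
    std_error = replicates.std(ddof=1)
    return point, bias, std_error

# Example on a positively skewed sample:
data = np.random.default_rng(0).lognormal(size=50)
print(bootstrap_location(data, stat=np.median, seed=1))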

Journal ArticleDOI
TL;DR: In this paper, a particular type of Markov chain Monte Carlo (MCMC) sampling algorithm is highlighted which allows probabilistic sampling in variable dimension spaces, and it is shown that once evidence calculations are performed, the results of complex variable dimension sampling algorithms can be replicated with simple and more familiar fixed dimensional MCMC sampling techniques.
Abstract: In most geophysical inverse problems the properties of interest are parametrized using a fixed number of unknowns. In some cases arguments can be used to bound the maximum number of parameters that need to be considered. In others the number of unknowns is set at some arbitrary value and regularization is used to encourage simple, non-extravagant models. In recent times variable or self-adaptive parametrizations have gained in popularity. Rarely, however, is the number of unknowns itself directly treated as an unknown. This situation leads to a trans-dimensional inverse problem, that is, one where the dimension of the parameter space is a variable to be solved for. This paper discusses trans-dimensional inverse problems from the Bayesian viewpoint. A particular type of Markov chain Monte Carlo (MCMC) sampling algorithm is highlighted which allows probabilistic sampling in variable dimension spaces. A quantity termed the evidence or marginal likelihood plays a key role in this type of problem. It is shown that once evidence calculations are performed, the results of complex variable dimension sampling algorithms can be replicated with simple and more familiar fixed dimensional MCMC sampling techniques. Numerical examples are used to illustrate the main points. The evidence can be difficult to calculate, especially in high-dimensional non-linear inverse problems. Nevertheless some general strategies are discussed and analytical expressions given for certain linear problems.
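The paper concerns trans-dimensional (variable-dimension) samplers and evidence calculations; as background only, here is the simple fixed-dimensional random-walk Metropolis sampler that such algorithms generalize, applied to an assumed log-posterior. The function and parameter names are ours.

import numpy as np

def metropolis(log_post, x0, n_steps=10_000, step=0.5, seed=None):
    """Fixed-dimension random-walk Metropolis sampler.
    log_post: function returning the unnormalized log posterior density."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = []
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=x.shape)
        lp_prop = log_post(proposal)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob. min(1, ratio)
            x, lp = proposal, lp_prop
        chain.append(x.copy())
    return np.array(chain)

# Example: sample a 2-D standard normal posterior.
chain = metropolis(lambda x: -0.5 * np.dot(x, x), x0=[0.0, 0.0], seed=1)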

Journal ArticleDOI
TL;DR: The application of 3D radial sampling of the free‐induction decay to proton ultrashort echo‐time (UTE) imaging is reported and a maximal signal‐to‐noise ratio (SNR) with negligible decay‐induced loss in spatial resolution is obtained.
Abstract: The application of 3D radial sampling of the free-induction decay to proton ultrashort echo-time (UTE) imaging is reported. The effects of T2 decay during signal acquisition on the 3D radial point-spread function are analyzed and compared to 2D radial and 1D sampling. It is found that in addition to the use of ultrashort TE, the proper choice of the acquisition-window duration T_AQ is essential for imaging short-T2 components. For 3D radial sampling, a maximal signal-to-noise ratio (SNR) with negligible decay-induced loss in spatial resolution is obtained for an acquisition-window duration of T_AQ ≈ 0.69 T2. For 2D and 1D sampling, corresponding values are derived as well. Phantom measurements confirm the theoretical findings and demonstrate the impact of different acquisition-window durations on SNR and spatial resolution for a given T2 component. In vivo scans show the potential of 3D UTE imaging with T2-adapted sampling for musculoskeletal imaging using standard MR equipment. The visualization of complex anatomy is demonstrated by extracting curved slices from the isotropically resolved 3D UTE image data.

Book
14 Sep 2006
TL;DR: The book covers forest inventories in overview, forest mensuration, sampling in forest surveys, remote sensing, geographic and forest information systems, and multiresource forest inventory.
Abstract: Forest Inventories - an Overview; Forest Mensuration; Sampling in Forest Surveys; Remote Sensing; Geographic and Forest Information Systems; Multiresource Forest Inventory.

Journal ArticleDOI
TL;DR: A simulation study is conducted to determine the effect various factors have on the MPML estimation method, and a multi-stage procedure based on the MPML method that can be used in practical applications is recommended.
Abstract: In this article we study the approximately unbiased multi-level pseudo maximum likelihood (MPML) estimation method for general multi-level modeling with sampling weights. We conduct a simulation study to determine the effect various factors have on the estimation method. The factors we included in this study are scaling method, size of clusters, invariance of selection, informativeness of selection, intraclass correlation, and variability of standardized weights. The scaling method is an indicator of how the weights are normalized on each level. The invariance of the selection is an indicator of whether or not the same selection mechanism is applied across clusters. The informativeness of the selection is an indicator of how biased the selection is. We summarize our findings and recommend a multi-stage procedure based on the MPML method that can be used in practical applications.
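For illustration only, the sketch below shows two cluster-level weight scalings commonly compared in this literature: rescaling the within-cluster weights to sum to the cluster size, or to the effective cluster size. The article should be consulted for its exact definitions; the function name and labels are ours.

import numpy as np

def scale_weights(w, method="cluster_size"):
    """Rescale within-cluster sampling weights w (1-D array for one cluster).
    'cluster_size':   scaled weights sum to the number of sampled units in the cluster.
    'effective_size': scaled weights sum to the effective cluster size,
                      (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    if method == "cluster_size":
        return w * (w.size / w.sum())
    if method == "effective_size":
        return w * (w.sum() / np.sum(w ** 2))
    raise ValueError("unknown scaling method")

# Example: unequal weights within a cluster of 4 sampled units.
print(scale_weights([1.0, 2.0, 2.0, 5.0], "cluster_size"))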

Journal ArticleDOI
01 Jul 2006
TL;DR: A new method for sampling by dart-throwing in O(N log N) time is presented and a novel and efficient variation for generating Poisson-disk distributions in O(N) time and space is introduced.
Abstract: Sampling distributions with blue noise characteristics are widely used in computer graphics. Although Poisson-disk distributions are known to have excellent blue noise characteristics, they are generally regarded as too computationally expensive to generate in real time. We present a new method for sampling by dart-throwing in O(N log N) time and introduce a novel and efficient variation for generating Poisson-disk distributions in O(N) time and space.
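For contrast with the paper's O(N log N) and O(N) algorithms, here is the naive quadratic dart-throwing baseline they accelerate, which shows the Poisson-disk acceptance rule; the parameter names are illustrative.

import math
import random

def dart_throwing(radius, n_darts=10_000, seed=None):
    """Naive Poisson-disk sampling in the unit square: throw random darts and
    keep one only if it is at least `radius` away from every kept point.
    (The paper's methods produce the same kind of distribution far more efficiently.)"""
    rng = random.Random(seed)
    points = []
    for _ in range(n_darts):
        p = (rng.random(), rng.random())
        if all(math.dist(p, q) >= radius for q in points):
            points.append(p)
    return points

print(len(dart_throwing(0.05, seed=1)), "points accepted")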

Proceedings ArticleDOI
23 Jul 2006
TL;DR: This article addresses the problem of counting the number of peers in a peer-to-peer system and more generally of aggregating statistics of individual peers over the whole system, and proposes two generic techniques to solve this problem.
Abstract: In this article we address the problem of counting the number of peers in a peer-to-peer system, and more generally of aggregating statistics of individual peers over the whole system. This functionality is useful in many applications, but hard to achieve when each node has only a limited, local knowledge of the whole system. We propose two generic techniques to solve this problem. The Random Tour method is based on the return time of a continuous time random walk to the node originating the query. The Sample and Collide method is based on counting the number of random samples gathered until a target number of redundant samples are obtained. It is inspired by the "birthday paradox" technique of [6], upon which it improves by achieving a target variance with fewer samples. The latter method relies on a sampling sub-routine which returns randomly chosen peers. Such a sampling algorithm is of independent interest. It can be used, for instance, for neighbour selection by new nodes joining the system. We use a continuous time random walk to obtain such samples. We analyse the complexity and accuracy of the two methods. We illustrate in particular how expansion properties of the overlay affect their performance.
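A crude sketch of the birthday-paradox idea behind Sample and Collide: draw uniformly random peers until the first repeat and invert E[T] ≈ √(πn/2). The actual method controls variance by waiting for a target number of collisions, so this is only an illustration; the function names are ours.

import math
import random

def birthday_estimate(sample_peer, n_runs=50):
    """Crude birthday-paradox estimate of the number of peers in a system.
    sample_peer() must return a uniformly random peer identifier."""
    draws_to_collision = []
    for _ in range(n_runs):
        seen, draws = set(), 0
        while True:
            draws += 1
            peer = sample_peer()
            if peer in seen:               # first repeated peer: stop this run
                break
            seen.add(peer)
        draws_to_collision.append(draws)
    t_bar = sum(draws_to_collision) / n_runs
    return 2.0 * t_bar ** 2 / math.pi      # inverts E[T] ~ sqrt(pi * n / 2)

# Example with a simulated system of 10,000 peers:
rng = random.Random(0)
print(round(birthday_estimate(lambda: rng.randrange(10_000), n_runs=200)))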

Book ChapterDOI
18 Sep 2006
TL;DR: A novel sampling-based approximation technique for classical multidimensional scaling that yields an extremely fast layout algorithm suitable even for very large graphs, and is among the fastest methods available.
Abstract: We present a novel sampling-based approximation technique for classical multidimensional scaling that yields an extremely fast layout algorithm suitable even for very large graphs. It produces layouts that compare favorably with other methods for drawing large graphs, and it is among the fastest methods available. In addition, our approach allows for progressive computation, i.e. a rough approximation of the layout can be produced even faster, and then be refined until satisfaction.

Journal ArticleDOI
TL;DR: This study relies on highly efficient sequential acceptance sampling tests, which enable statistical solution techniques to quickly return a result with some uncertainty in CSL model checking, and proposes a novel combination of the two solution techniques for verifying CSL queries with nested probabilistic operators.
Abstract: Numerical analysis based on uniformisation and statistical techniques based on sampling and simulation are two distinct approaches for transient analysis of stochastic systems. We compare the two solution techniques when applied to the verification of time-bounded until formulae in the temporal stochastic logic CSL, both theoretically and through empirical evaluation on a set of case studies. Our study differs from most previous comparisons of numerical and statistical approaches in that CSL model checking is a hypothesis-testing problem rather than a parameter-estimation problem. We can therefore rely on highly efficient sequential acceptance sampling tests, which enables statistical solution techniques to quickly return a result with some uncertainty. We also propose a novel combination of the two solution techniques for verifying CSL queries with nested probabilistic operators.
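As an illustration of sequential acceptance sampling, here is Wald's sequential probability ratio test for deciding whether the probability of a path property lies above or below a threshold, with an indifference region of width 2δ; this is a generic sketch, not the exact test or tool used in the study, and the names are ours.

import math
import random

def sprt(run_trial, theta, delta=0.01, alpha=0.05, beta=0.05, max_samples=1_000_000):
    """Wald's sequential probability ratio test for H0: p >= theta + delta
    against H1: p <= theta - delta.  run_trial() returns True/False for one
    simulated path.  Returns 'H0' (probability above threshold) or 'H1'."""
    p0, p1 = theta + delta, theta - delta
    log_a = math.log((1 - beta) / alpha)    # accept H1 once the ratio exceeds this
    log_b = math.log(beta / (1 - alpha))    # accept H0 once the ratio drops below this
    llr = 0.0
    for _ in range(max_samples):
        x = 1 if run_trial() else 0
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= log_a:
            return "H1"
        if llr <= log_b:
            return "H0"
    return "undecided"

# Example: the property holds on ~80% of simulated paths; is it above 0.7?
rng = random.Random(1)
print(sprt(lambda: rng.random() < 0.8, theta=0.7, delta=0.05))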

Proceedings ArticleDOI
25 Oct 2006
TL;DR: This paper sampled packet traces captured from a Tier-1 IP-backbone using four popular methods: random packet sampling, random flow sampling, smart sampling, and sample-and-hold to identify the traffic features critical for anomaly detection and analyze how they are affected by sampling.
Abstract: Sampling techniques are widely used for traffic measurements at high link speed to conserve router resources. Traditionally, sampled traffic data is used for network management tasks such as traffic matrix estimations, but recently it has also been used in numerous anomaly detection algorithms, as security analysis becomes increasingly critical for network providers. While the impact of sampling on traffic engineering metrics such as flow size and mean rate is well studied, its impact on anomaly detection remains an open question. This paper presents a comprehensive study on whether existing sampling techniques distort traffic features critical for effective anomaly detection. We sampled packet traces captured from a Tier-1 IP-backbone using four popular methods: random packet sampling, random flow sampling, smart sampling, and sample-and-hold. The sampled data is then used as input to detect two common classes of anomalies: volume anomalies and port scans. Since it is infeasible to enumerate all existing solutions, we study three representative algorithms: a wavelet-based volume anomaly detection and two portscan detection algorithms based on hypothesis testing. Our results show that all four sampling methods introduce fundamental bias that degrades the performance of the three detection schemes; however, the degradation curves are very different. We also identify the traffic features critical for anomaly detection and analyze how they are affected by sampling. Our work demonstrates the need for better measurement techniques, since anomaly detection operates on a drastically different information region, which is often overlooked by existing traffic accounting methods that target heavy-hitters.
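Minimal sketches of two of the four sampling methods compared above (random packet sampling and hash-based random flow sampling); the packet representation with a "5tuple" flow key is assumed purely for illustration, and smart sampling and sample-and-hold are omitted.

import random
import zlib

def sample_packets(packets, rate=0.01, seed=None):
    """Random packet sampling: keep each packet independently with probability `rate`."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < rate]

def sample_flows(packets, rate=0.01, flow_key=lambda p: p["5tuple"]):
    """Random flow sampling: keep every packet of a flow if a hash of its flow key
    falls below the sampling rate, so each flow is kept or dropped as a whole."""
    threshold = rate * 2 ** 32
    return [p for p in packets
            if zlib.crc32(repr(flow_key(p)).encode()) < threshold]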

Journal ArticleDOI
TL;DR: In this article, the authors document the prevalence of sampling designs used in mixed-methods research and examine the interpretive consistency between interpretations made in mixedmethods studies and the sampling design used.
Abstract: The purpose of this mixed-methods study was to document the prevalence of sampling designs utilised in mixed-methods research and to examine the interpretive consistency between interpretations made in mixed-methods studies and the sampling design used. Classification of studies was based on a two-dimensional mixed-methods sampling model. This model provides a typology in which sampling designs can be classified according to the time orientation of the components (i.e. concurrent versus sequential) and the relationship of the qualitative and quantitative samples (i.e. identical versus parallel versus nested versus multilevel). A quantitative analysis of the 42 mixed-methods studies that were published in the four leading school psychology journals revealed that a sequential design using multilevel samples was the most frequent sampling design, being used in 40.5% (n=17) of the studies. More studies utilised a sampling design that was sequential (66.6%; n=28) than concurrent (33.4%; n=14). Also, multilevel...

Journal ArticleDOI
TL;DR: A survey of sampling and probe methods for the solution of inverse acoustic and electromagnetic scattering problems can be found in this paper, where the main ideas, approaches and convergence results of the methods are presented.
Abstract: The goal of the review is to provide a state-of-the-art survey on sampling and probe methods for the solution of inverse problems. Further, a configuration approach to some of the problems will be presented. We study the concepts and analytical results for several recent sampling and probe methods. We will give an introduction to the basic idea behind each method using a simple model problem and then provide some general formulation in terms of particular configurations to study the range of the arguments which are used to set up the method. This provides a novel way to present the algorithms and the analytic arguments for their investigation in a variety of different settings. In detail we investigate the probe method (Ikehata), linear sampling method (Colton–Kirsch) and the factorization method (Kirsch), singular sources method (Potthast), no response test (Luke–Potthast), range test (Kusiak, Potthast and Sylvester) and the enclosure method (Ikehata) for the solution of inverse acoustic and electromagnetic scattering problems. The main ideas, approaches and convergence results of the methods are presented. For each method, we provide a historical survey about applications to different situations.

Journal ArticleDOI
TL;DR: A Bayesian design criterion is proposed which focuses on the goal of efficient spatial prediction whilst allowing for the fact that model parameter values are unknown.
Abstract: This paper describes the use of model-based geostatistics for choosing the optimal set of sampling locations, collectively called the design, for a geostatistical analysis. Two types of design situations are considered. These are retrospective design, which concerns the addition of sampling locations to, or deletion of locations from, an existing design, and prospective design, which consists of choosing optimal positions for a new set of sampling locations. We propose a Bayesian design criterion which focuses on the goal of efficient spatial prediction whilst allowing for the fact that model parameter values are unknown. The results show that in this situation a wide range of inter-point distances should be included in the design, and the widely used regular design is therefore not the optimal choice.