Showing papers on "Sampling (statistics) published in 2013"

PDF

Open Access

Journal Article•DOI•

Making better use of accuracy data in land change studies: Estimating accuracy and area and quantifying uncertainty using stratified estimation

[...]

Pontus Olofsson¹, Giles M. Foody², Stephen V. Stehman³, Curtis E. Woodcock¹•Institutions (3)

Boston University¹, University of Nottingham², State University of New York System³

15 Feb 2013-Remote Sensing of Environment

TL;DR: In this article, an error-adjusted estimator of area can be easily produced once an accuracy assessment has been performed and an error matrix constructed, which can then be incorporated into an uncertainty analysis for applications using land change area as an input (e.g., a carbon flux model).

...read moreread less

749 citations

Posted Content•

Minimizing Finite Sums with the Stochastic Average Gradient

[...]

Mark Schmidt¹, Nicolas Le Roux, Francis Bach²•Institutions (2)

University of British Columbia¹, École Normale Supérieure²

10 Sep 2013-arXiv: Optimization and Control

TL;DR: In this paper, the stochastic average gradient (SAG) method was proposed to optimize the sum of a finite number of smooth convex functions, which achieves a faster convergence rate than black-box SG methods.

...read moreread less

Abstract: We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p \textless{} 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.

...read moreread less

744 citations

Journal Article•DOI•

Experimental boson sampling

[...]

Max Tillmann¹, Max Tillmann², Borivoje Dakić¹, René Heilmann³, Stefan Nolte³, Alexander Szameit³, Philip Walther¹, Philip Walther² - Show less +4 more•Institutions (3)

University of Vienna¹, Austrian Academy of Sciences², University of Jena³

01 Jul 2013-Nature Photonics

TL;DR: In this paper, Aaronson and Arkhipov's model of computation with photons in integrated optical circuits was implemented and the authors set a benchmark for a type of quantum computer that can potentially outperform a conventional computer by using only a few photons and linear optical elements.

...read moreread less

Abstract: The boson-sampling problem is experimentally solved by implementing Aaronson and Arkhipov's model of computation with photons in integrated optical circuits. These results set a benchmark for a type of quantum computer that can potentially outperform a conventional computer by using only a few photons and linear optical elements.

...read moreread less

710 citations

Journal Article•DOI•

Importance Nested Sampling and the MultiNest Algorithm

[...]

Farhan Feroz¹, Michael P. Hobson, Ewan Cameron, Anthony N. Pettitt•Institutions (1)

University of Cambridge¹

10 Jun 2013-arXiv: Instrumentation and Methods for Astrophysics

TL;DR: In this article, an alternative summation of the MultiNest draws, called importance nested sampling (INS), is presented, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than vanilla NS with no change in the way Multi-Nest explores the parameter space.

...read moreread less

Abstract: Bayesian inference involves two main computational challenges. First, in estimating the parameters of some model for the data, the posterior distribution may well be highly multi-modal: a regime in which the convergence to stationarity of traditional Markov Chain Monte Carlo (MCMC) techniques becomes incredibly slow. Second, in selecting between a set of competing models the necessary estimation of the Bayesian evidence for each is, by definition, a (possibly high-dimensional) integration over the entire parameter space; again this can be a daunting computational task, although new Monte Carlo (MC) integration algorithms offer solutions of ever increasing efficiency. Nested sampling (NS) is one such contemporary MC strategy targeted at calculation of the Bayesian evidence, but which also enables posterior inference as a by-product, thereby allowing simultaneous parameter estimation and model selection. The widely-used MultiNest algorithm presents a particularly efficient implementation of the NS technique for multi-modal posteriors. In this paper we discuss importance nested sampling (INS), an alternative summation of the MultiNest draws, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than `vanilla' NS with no change in the way MultiNest explores the parameter space. This is accomplished by treating as a (pseudo-)importance sample the totality of points collected by MultiNest, including those previously discarded under the constrained likelihood sampling of the NS algorithm. We apply this technique to several challenging test problems and compare the accuracy of Bayesian evidences obtained with INS against those from vanilla NS.

...read moreread less

674 citations

Journal Article•DOI•

Photonic Boson Sampling in a Tunable Circuit

[...]

Matthew A. Broome¹, Alessandro Fedrizzi¹, Saleh Rahimi-Keshari¹, Justin Dove², Scott Aaronson², Timothy C. Ralph¹, Andrew White¹ - Show less +3 more•Institutions (2)

University of Queensland¹, Massachusetts Institute of Technology²

15 Feb 2013-Science

TL;DR: The central premise of boson sampling was tested, experimentally verifying that three-photon scattering amplitudes are given by the permanents of submatrices generated from a unitary describing a six-mode integrated optical circuit.

...read moreread less

Abstract: Quantum computers are unnecessary for exponentially efficient computation or simulation if the Extended Church-Turing thesis is correct. The thesis would be strongly contradicted by physical devices that efficiently perform tasks believed to be intractable for classical computers. Such a task is boson sampling: sampling the output distributions of n bosons scattered by some passive, linear unitary process. We tested the central premise of boson sampling, experimentally verifying that three-photon scattering amplitudes are given by the permanents of submatrices generated from a unitary describing a six-mode integrated optical circuit. We find the protocol to be robust, working even with the unavoidable effects of photon loss, non-ideal sources, and imperfect detection. Scaling this to large numbers of photons should be a much simpler task than building a universal quantum computer.

...read moreread less

671 citations

Proceedings Article•DOI•

Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation

[...]

David Ferstl¹, Christian Reinbacher¹, Rene Ranftl¹, Matthias Ruether¹, Horst Bischof¹ - Show less +1 more•Institutions (1)

Graz University of Technology¹

01 Dec 2013

TL;DR: This work formulate a convex optimization problem using higher order regularization for depth image up sampling, and derives a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second.

...read moreread less

Abstract: In this work we present a novel method for the challenging problem of depth image up sampling. Modern depth cameras such as Kinect or Time-of-Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image up sampling. In this optimization an an isotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the up sampling. We derive a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel up sampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate ground truth, which, for the first time, enable to benchmark depth up sampling methods using real sensor data.

...read moreread less

538 citations

Journal Article•DOI•

Sampling: why and how of it?

[...]

Anita Shankar Acharya, Anupam Prakash, Pikee Saxena, Aruna Nigam

07 Jul 2013-Indian Journal of Medical Specialities

521 citations

Book•

Sampling and Choosing Cases in Qualitative Research: A Realist Approach

[...]

Nick Emmel

11 Oct 2013

TL;DR: In this paper, the authors discuss the basics of realist sampling and choose cases based on theoretical or purposive sampling strategies in a realist sample collection strategy in the context of qualitative research.

...read moreread less

Abstract: Introduction From Sampling to Choosing Cases PART ONE: THE CASES Theoretical Sampling Purposeful Sampling Theoretical or Purposive Sampling PART TWO: CHOOSING CASES The Basics of Realist Sampling Purposive Work in a Realist Sampling Strategy Purposefully Choosing Cases Interpretation and Explanation Sample Size Choosing Cases in Qualitative Research

...read moreread less

512 citations

Journal Article•DOI•

Compressed Sensing Signal and Data Acquisition in Wireless Sensor Networks and Internet of Things

[...]

Shancang Li¹, Li Da Xu, Xinheng Wang¹•Institutions (1)

Swansea University¹

01 Jan 2013-IEEE Transactions on Industrial Informatics

TL;DR: In this paper, a compressed sensing-based data sampling and data acquisition in wireless sensor networks and the Internet of Things (IoT) has been investigated, in which the end nodes measure, transmit, and store the sampled data in the framework.

...read moreread less

Abstract: The emerging compressed sensing (CS) theory can significantly reduce the number of sampling points that directly corresponds to the volume of data collected, which means that part of the redundant data is never acquired. It makes it possible to create standalone and net-centric applications with fewer resources required in Internet of Things (IoT). CS-based signal and information acquisition/compression paradigm combines the nonlinear reconstruction algorithm and random sampling on a sparse basis that provides a promising approach to compress signal and data in information systems. This paper investigates how CS can provide new insights into data sampling and acquisition in wireless sensor networks and IoT. First, we briefly introduce the CS theory with respect to the sampling and transmission coordination during the network lifetime through providing a compressed sampling process with low computation costs. Then, a CS-based framework is proposed for IoT, in which the end nodes measure, transmit, and store the sampled data in the framework. Then, an efficient cluster-sparse reconstruction algorithm is proposed for in-network compression aiming at more accurate data reconstruction and lower energy efficiency. Performance is evaluated with respect to network size using datasets acquired by a real-life deployment.

...read moreread less

478 citations

Journal Article•DOI•

Estimating occupancy and abundance of stream amphibians using environmental DNA from filtered water samples

[...]

David S. Pilliod¹, Caren S. Goldberg², Robert S. Arkle¹, Lisette P. Waits²•Institutions (2)

United States Geological Survey¹, University of Idaho²

22 May 2013-Canadian Journal of Fisheries and Aquatic Sciences

TL;DR: In this article, the authors compared sampling results of eDNA-based methods for detecting aquatic species with field protocols and precision of the resulting estimates, but with little evaluation of field protocols or precision of results.

...read moreread less

Abstract: Environmental DNA (eDNA) methods for detecting aquatic species are advancing rapidly, but with little evaluation of field protocols or precision of resulting estimates. We compared sampling results...

...read moreread less

475 citations

Posted Content•

Deep Learning of Representations: Looking Forward

[...]

Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

02 May 2013-arXiv: Learning

TL;DR: In this paper, the authors examine some of the challenges of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data.

...read moreread less

Abstract: Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts. Although the study of deep learning has already led to impressive theoretical results, learning algorithms and breakthrough experiments, several challenges lie ahead. This paper proposes to examine some of these challenges, centering on the questions of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data. It also proposes a few forward-looking research directions aimed at overcoming these challenges.

...read moreread less

Journal Article•DOI•

Efficient sampling of fast and slow cosmological parameters

[...]

Antony Lewis¹•Institutions (1)

University of Sussex¹

16 Apr 2013-Physical Review D

TL;DR: This work describes a method for decorrelating fast and slow parameters so that parameter sampling in the full space becomes almost as efficient as sampled in the slow subspace when the covariance is well known and the distributions are simple.

...read moreread less

Abstract: Physical parameters are often constrained from the data likelihoods using sampling methods. Changing some parameters can be much more computationally expensive (`slow') than changing other parameters (`fast parameters'). I describe a method for decorrelating fast and slow parameters so that parameter sampling in the full space becomes almost as efficient as sampling in the slow subspace when the covariance is well known and the distributions are simple. This gives a large reduction in computational cost when there are many fast parameters. The method can also be combined with a fast 'dragging' method proposed by Neal (2005) that can be more robust and efficient when parameters cannot be fully decorrelated a priori or have more complicated dependencies. I illustrate these methods for the case of cosmological parameter estimation using data likelihoods from the Planck satellite observations with dozens of fast nuisance parameters, and demonstrate a speed up by a factor of five or more. In more complicated cases, especially where the fast subspace is very fast but complex or highly correlated, the fast-slow sampling methods can in principle give arbitrarily large performance gains. The new samplers are implemented in the latest version of the publicly available CosmoMC code.

...read moreread less

Proceedings Article•

Fast dropout training

[...]

Sida I. Wang¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

16 Jun 2013

TL;DR: This work shows how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective, which gives an order of magnitude speedup and more stability.

...read moreread less

Abstract: Preventing feature co-adaptation by encouraging independent contributions from different features often improves classification and regression performance. Dropout training (Hinton et al., 2012) does this by randomly dropping out (zeroing) hidden units and input features during training of neural networks. However, repeatedly sampling a random subset of input features makes training much slower. Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective. This approximation, justified by the central limit theorem and empirical evidence, gives an order of magnitude speedup and more stability. We show how to do fast dropout training for classification, regression, and multilayer neural networks. Beyond dropout, our technique is extended to integrate out other types of noise and small image transformations.

...read moreread less

Journal Article•DOI•

Mass Spectrometry: Recent Advances in Direct Open Air Surface Sampling/Ionization

[...]

María Eugenia Monge¹, Glenn A. Harris², Prabha Dwivedi¹, Facundo M. Fernández¹•Institutions (2)

Georgia Institute of Technology¹, Vanderbilt University²

09 Jan 2013-Chemical Reviews

TL;DR: This review focuses on laser-Based Hybrid Techniques Coupled to ESI or Plasma Ionization, as well as two-Step Thermal/Mechanical Desorption/ Ablation (Non-Laser) Techniques, which were developed in this review.

...read moreread less

Abstract: 1. Scope of this Review 2270 2. Ambient Ionization Techniques 2272 2.1. Solid−Liquid Extraction-Based Techniques 2272 2.1.1. Desorption Electrospray Ionization (DESI) 2272 2.1.2. Desorption Ionization by Charge Exchange (DICE) 2277 2.1.3. Easy Ambient Sonic-Spray Ionization (EASI) 2278 2.1.4. Liquid Micro Junction Surface Sampling Probe (LMJ-SSP) 2279 2.1.5. Liquid Extraction Surface Analysis (LESA) 2279 2.1.6. Nanospray Desorption Electrospray Ionization (nanoDESI) 2280 2.1.7. Desorption Atmospheric Pressure Photoionization (DAPPI) 2280 2.2. Plasma-Based Techniques 2281 2.2.1. Direct Analysis in Real Time (DART) 2282 2.2.2. Flowing Atmospheric-Pressure Afterglow (FAPA) 2286 2.2.3. Low Temperature Plasma (LTP) & Dielectric Barrier Discharge Ionization (DBDI) 2286 2.2.4. Chemical Sputtering/Ionization Techniques 2287 2.3. Two-Step Thermal/Mechanical Desorption/ Ablation (Non-Laser) Techniques 2288 2.3.1. Neutral Desorption Extractive Electrospray Ionization (ND-EESI) 2288 2.3.2. Beta Electron-Assisted Direct Chemical Ionization (BADCI) 2288 2.3.3. Atmospheric Pressure Thermal Desorption-Secondary Ionization (AP-TD/SI) 2289 2.3.4. Probe Electrospray Ionization (PESI) 2289 2.4. Two-Step Laser-Based Desorption Ablation Techniques 2290 2.4.1. Laser-Based Hybrid Techniques Coupled to ESI or Plasma Ionization 2290 2.4.2. Laser Electrospray Mass Spectrometry (LEMS) 2292 2.4.3. Laser Ablation Atmospheric Pressure Photoionization (LAAPPI) 2293 2.4.4. Laser Ablation Sample Transfer 2293 2.5. Acoustic Desorption Techniques 2294 2.5.1. Laser-Induced Acoustic Desorption (LIAD) 2294 2.5.2. Radiofrequency Acoustic Desorption Ionization (RADIO) 2295 2.5.3. Surface Acoustic Wave-Based Techniques 2295 2.6. Multimode Techniques 2296 2.6.1. Desorption Electrospray/Metastable-Induced Ionization (DEMI) 2296 2.7. Other Techniques 2296 2.7.1. Rapid Evaporative Ionization Mass Spectrometry (REIMS) 2296 2.7.2. Laser Desorption Ionization (LDI) 2297 2.7.3. Switched Ferroelectric Plasma Ionizer (SwiFerr) 2297 2.7.4. Laserspray Ionization (LSI) 2297 3. Remote Sampling 2298 3.1. Nonproximate Ambient MS 2298 3.2. Fundamentals of Neutral/Ion Transport 2298 3.3. Transport of Neutrals 2298 3.4. Transport of Ions 2299 4. Future Directions 2300 Author Information 2300 Corresponding Author 2300 Author Contributions 2300 Notes 2300 Biographies 2300 Acknowledgments 2301 References 2301

...read moreread less

Journal Article•DOI•

Sampling in Developmental Science: Situations, Shortcomings, Solutions, and Standards

[...]

Marc H. Bornstein¹, Justin Jager¹, Diane L. Putnick¹•Institutions (1)

National Institutes of Health¹

01 Dec 2013-Developmental Review

TL;DR: This work describes, discusses, and evaluates four prominent sampling strategies in developmental science: population-based probability sampling, convenience sampling, quota sampling, and homogeneous sampling.

...read moreread less

Journal Article•DOI•

Rhythmic sampling within and between objects despite sustained attention at a cued location.

[...]

Ian C. Fiebelkorn¹, Yuri B. Saalmann¹, Sabine Kastner¹•Institutions (1)

Princeton University¹

16 Dec 2013-Current Biology

TL;DR: The data demonstrate that even under static conditions, there is a moment-to-moment reweighting of attentional priorities based on object properties, revealed through rhythmic patterns of visual-target detection both within (at 8 Hz) and between (at 4 Hz).

...read moreread less

Journal Article•DOI•

Evaluation of uncertainties associated with the determination of community drug use through the measurement of sewage drug biomarkers.

[...]

Sara Castiglioni¹, Lubertus Bijlsma², Adrian Covaci³, Erik Emke, Félix Hernández², Malcolm J. Reid⁴, Christoph Ort⁵, Kevin V. Thomas⁴, Alexander L.N. van Nuijs³, Pim de Voogt⁶, Ettore Zuccato¹ - Show less +7 more•Institutions (6)

Mario Negri Institute for Pharmacological Research¹, James I University², University of Antwerp³, Norwegian Institute for Water Research⁴, Swiss Federal Institute of Aquatic Science and Technology⁵, University of Amsterdam⁶

11 Jan 2013-Environmental Science & Technology

TL;DR: For each step of the estimate of community drug consumption through the chemical analysis of sewage biomarkers of illicit drugs, a best practice protocol has been suggested and discussed to reduce and keep to a minimum the uncertainty of the entire procedure and to improve the reliability of the estimates of drug use.

...read moreread less

Abstract: The aim of this study was to integrally address the uncertainty associated with all the steps used to estimate community drug consumption through the chemical analysis of sewage biomarkers of illicit drugs. Uncertainty has been evaluated for sampling, chemical analysis, stability of drug biomarkers in sewage, back-calculation of drug use (specific case of cocaine), and estimation of population size in a catchment using data collected from a recent Europe-wide investigation and from the available literature. The quality of sampling protocols and analytical measurements has been evaluated by analyzing standardized questionnaires collected from 19 sewage treatments plants (STPs) and the results of an interlaboratory study (ILS), respectively. Extensive reviews of the available literature have been used to evaluate stability of drug biomarkers in sewage and the uncertainty related to back-calculation of cocaine use. Different methods for estimating population size in a catchment have been compared and the variability among the collected data was very high (7-55%). A reasonable strategy to reduce uncertainty was therefore to choose the most reliable estimation case by case. In the other cases, the highest uncertainties are related to the analysis of sewage drug biomarkers (uncertainty as relative standard deviation; RSD: 6-26% from ILS) and to the back-calculation of cocaine use (uncertainty; RSD: 26%). Uncertainty can be kept below 10% in the remaining steps, if specific requirements outlined in this work are considered. For each step, a best practice protocol has been suggested and discussed to reduce and keep to a minimum the uncertainty of the entire procedure and to improve the reliability of the estimates of drug use.

...read moreread less

Proceedings Article•

Revisiting the Nystrom method for improved large-scale machine learning

[...]

Alex Gittens¹, Michael W. Mahoney²•Institutions (2)

California Institute of Technology¹, Stanford University²

16 Jun 2013

TL;DR: An empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices and a suite of worst-case theoretical bounds for both random sampling and random projection methods are complemented.

...read moreread less

Abstract: We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds--e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.

...read moreread less

Journal Article•DOI•

Enhanced sampling in molecular dynamics using metadynamics, replica-exchange, and temperature-acceleration

[...]

Cameron F. Abrams¹, Giovanni Bussi¹•Institutions (1)

Drexel University¹

27 Dec 2013-Entropy

TL;DR: In this article, a selection of methods for performing enhanced sampling in molecular dynamics simulations is presented, based on collective variable biasing (CVB) and collective variable sampling (CVB).

...read moreread less

Abstract: We review a selection of methods for performing enhanced sampling in 1 molecular dynamics simulations. We consider methods based on collective variable biasing 2

...read moreread less

Posted Content•

Weighing and Integrating Evidence for Stochastic Simulation in Bayesian Networks

[...]

Robert Fung, Kuo-Chu Chang

27 Mar 2013-arXiv: Artificial Intelligence

TL;DR: In this paper, the authors propose evidence weighting, which modifies the logic sampling algorithm by weighting each simulation trial by the likelihood of a network's evidence given the sampled state node values for that trial.

...read moreread less

Abstract: Stochastic simulation approaches perform probabilistic inference in Bayesian networks by estimating the probability of an event based on the frequency that the event occurs in a set of simulation trials. This paper describes the evidence weighting mechanism, for augmenting the logic sampling stochastic simulation algorithm [Henrion, 1986]. Evidence weighting modifies the logic sampling algorithm by weighting each simulation trial by the likelihood of a network's evidence given the sampled state node values for that trial. We also describe an enhancement to the basic algorithm which uses the evidential integration technique [Chin and Cooper, 1987]. A comparison of the basic evidence weighting mechanism with the Markov blanket algorithm [Pearl, 1987], the logic sampling algorithm, and the evidence integration algorithm is presented. The comparison is aided by analyzing the performance of the algorithms in a simple example network.

...read moreread less

Posted Content•

k-Sparse Autoencoders

[...]

Alireza Makhzani¹, Brendan J. Frey¹•Institutions (1)

University of Toronto¹

19 Dec 2013-arXiv: Learning

TL;DR: In this article, an autoencoder with linear activation function is proposed, where in hidden layers only the k highest activities are kept, which achieves better classification results than denoising autoencoders, networks trained with dropout, and RBMs.

...read moreread less

Abstract: Recently, it has been observed that when representations are learnt in a way that encourages sparsity, improved performance is obtained on classification tasks. These methods involve combinations of activation functions, sampling steps and different kinds of penalties. To investigate the effectiveness of sparsity by itself, we propose the k-sparse autoencoder, which is an autoencoder with linear activation function, where in hidden layers only the k highest activities are kept. When applied to the MNIST and NORB datasets, we find that this method achieves better classification results than denoising autoencoders, networks trained with dropout, and RBMs. k-sparse autoencoders are simple to train and the encoding stage is very fast, making them well-suited to large problem sizes, where conventional sparse coding algorithms cannot be applied.

...read moreread less

Journal Article•DOI•

A best practices guide for generating forest inventory attributes from airborne laser scanning data using an area-based approach

[...]

Joanne C. White, Michael A. Wulder, Andrés Varhola, Mikko Vastaranta, Nicholas C. Coops, Bruce D. Cook, Doug Pitt, Murray Woods - Show less +4 more

20 Dec 2013-Forestry Chronicle

TL;DR: A best practices guide for the use of airborne laser scanning data (ALS; also referred to as Light Detection and Ranging or LiDAR) in forest inventory applications is now available for download from the Canadian Forest Service bookstore.

...read moreread less

Abstract: A best practices guide for the use of airborne laser scanning data (ALS; also referred to as Light Detection and Ranging or LiDAR) in forest inventory applications is now available for download from the Canadian Forest Service bookstore (White et al ., 2013; http://cfs.nrcan.gc.ca/publications?id= 34887 ). The guide, produced by the Canadian Forest Service, Natural Resources Canada, brings together state-of-the-art approaches, methods, and data to enable readers interested in using ALS data to characterize large forest areas in a costeffective manner. The best practices presented in the guide are based on more than 25 years of scientific research on the application of ALS data to forest inventory. The guide describes the entire process for generating forest inventory attributes from ALS data and recommends best practices for each step of the process—from ground sampling through to metric generation and model development. The collection of ground plot data for model calibration and validation is a crit ical component of the recommended approach and is described in detail in the guide. Appendices to the guide pro vide additional details on ALS data acquisition and metric generation. The area-based approach is typically accomplished in two steps (Fig. 1). In the first step, ALS data are acquired for the entire area of interest (wall-to-wall coverage), tree-level meas ures are acquired from sampled ground plots and summa rized to the plot level, and predictive models are developed (e.g., using regression or non-parametric methods). For the purposes of model development, the ALS data is clipped to correspond to the area and shape of each ground plot. A set of descriptive statistics (referred to as “metrics”) are calculated from the clipped ALS data and include measures such as mean height, height percentiles, and canopy cover (Woods et al . 2011). Inventory attributes of interest are either measured by ground crews (i.e., height, diameter) or modelled (i.e., vol ume, biomass) for each ground plot. It is critical that ground plots represent the full range of variability in the attribute(s) of interest and to accomplish this, the use of a stratified sampling approach is recommended, preferably with strata that are defined using the ALS metrics themselves. Thus, the ALS data must be acquired and processed prior to ground sampling. Finally, predictive models are constructed using the ground plot attributes as the response variable and the ALS-derived metrics as predictors. In the second step of the area-based approach, models that were developed using co-located ground plots and ALS data are then applied to the entire area of interest to generate the desired wall-to-wall estimates and maps of specific forest inventory attributes. The same metrics that are calculated for the clipped ALS data (as described above) are generated for the wall-to-wall ALS data and the predictive equations devel oped from the modelling in the first step are applied to the entire area of interest using the wall-to-wall metrics. The pre diction unit for this application is a grid cell, the size of which relates to the size of the ground-measured plot. Once the pre dictive equations are applied to the wall-to-wall ALS data, each grid cell will have an estimate for the attribute of interest. The primary advantage of the area-based approach is hav ing complete (i.e., wall-to-wall) spatially explicit measures of canopy height, associated metrics, and all modelled attributes for an area of interest (Fig. 2). The area-based approach described in the guide also enables more precise estimates of certain forest variables and the calculation of confidence intervals for stand-level estimates (Woods et al . 2011).

...read moreread less

Journal Article•DOI•

Spatially explicit models for inference about density in unmarked or partially marked populations

[...]

Richard B. Chandler, J. Andrew Royle

01 Jun 2013-The Annals of Applied Statistics

TL;DR: In this article, a spatial capture-recapture (SCR) model is proposed to estimate the number and location of the activity centers of a parula in a collection of closely spaced sample units.

...read moreread less

Abstract: Recently developed spatial capture–recapture (SCR) models represent a major advance over traditional capture–recapture (CR) models because they yield explicit estimates of animal density instead of population size within an unknown area. Furthermore, unlike nonspatial CR methods, SCR models account for heterogeneity in capture probability arising from the juxtaposition of animal activity centers and sample locations. Although the utility of SCR methods is gaining recognition, the requirement that all individuals can be uniquely identified excludes their use in many contexts. In this paper, we develop models for situations in which individual recognition is not possible, thereby allowing SCR concepts to be applied in studies of unmarked or partially marked populations. The data required for our model are spatially referenced counts made on one or more sample occasions at a collection of closely spaced sample units such that individuals can be encountered at multiple locations. Our approach includes a spatial point process for the animal activity centers and uses the spatial correlation in counts as information about the number and location of the activity centers. Camera-traps, hair snares, track plates, sound recordings, and even point counts can yield spatially correlated count data, and thus our model is widely applicable. A simulation study demonstrated that while the posterior mean exhibits frequentist bias on the order of 5–10% in small samples, the posterior mode is an accurate point estimator as long as adequate spatial correlation is present. Marking a subset of the population substantially increases posterior precision and is recommended whenever possible. We applied our model to avian point count data collected on an unmarked population of the northern parula (Parula americana) and obtained a density estimate (posterior mode) of 0.38 (95% CI: 0.19–1.64) birds/ha. Our paper challenges sampling and analytical conventions in ecology by demonstrating that neither spatial independence nor individual recognition is needed to estimate population density—rather, spatial dependence can be informative about individual distribution and density.

...read moreread less

Journal Article•DOI•

Network Sampling: From Static to Streaming Graphs

[...]

Nesreen K. Ahmed¹, Jennifer Neville¹, Ramana Rao Kompella¹•Institutions (1)

Purdue University¹

01 Jun 2013-ACM Transactions on Knowledge Discovery From Data

TL;DR: A family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs.

...read moreread less

Abstract: Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Experimental results indicate that our proposed family of sampling methods more accurately preserve the underlying properties of the graph in both static and streaming domains. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.

...read moreread less

Journal Article•DOI•

Sampled-Data Exponential Synchronization of Complex Dynamical Networks With Time-Varying Coupling Delay

[...]

Zheng-Guang Wu¹, Peng Shi², Hongye Su¹, Jian Chu¹•Institutions (2)

Zhejiang University¹, Victoria University, Australia²

08 Apr 2013-IEEE Transactions on Neural Networks

TL;DR: A criterion is derived to ensure the exponential stability of the error dynamics, which fully utilizes the available information about the actual sampling pattern, and the design method of the desired sampled-data controllers is proposed to make the CDNs exponentially synchronized and obtain a lower-bound estimation of the largest sampling interval.

...read moreread less

Abstract: This paper studies the problem of sampled-data exponential synchronization of complex dynamical networks (CDNs) with time-varying coupling delay and uncertain sampling. By combining the time-dependent Lyapunov functional approach and convex combination technique, a criterion is derived to ensure the exponential stability of the error dynamics, which fully utilizes the available information about the actual sampling pattern. Based on the derived condition, the design method of the desired sampled-data controllers is proposed to make the CDNs exponentially synchronized and obtain a lower-bound estimation of the largest sampling interval. Simulation examples demonstrate that the presented method can significantly reduce the conservatism of the existing results, and lead to wider applications.

...read moreread less

Journal Article•DOI•

Sampling environmental acoustic recordings to determine bird species richness.

[...]

Jason Wimmer¹, Michael Towsey¹, Paul Roe¹, Ian Williamson¹•Institutions (1)

Queensland University of Technology¹

01 Sep 2013-Ecological Applications

TL;DR: In this article, the authors examined the use of sampling methods to reduce the cost of analyzing large volumes of acoustic sensor data, while retaining high levels of species detection accuracy, and found that randomly selecting 120 one-minute samples from the three hours immediately following dawn over five days of recordings, detected the highest number of species.

...read moreread less

Abstract: Acoustic sensors can be used to estimate species richness for vocal species such as birds. They can continuously and passively record large volumes of data over extended periods. These data must subsequently be analyzed to detect the presence of vocal species. Automated analysis of acoustic data for large numbers of species is complex and can be subject to high levels of false positive and false negative results. Manual analysis by experienced surveyors can produce accurate results; however the time and effort required to process even small volumes of data can make manual analysis prohibitive. This study examined the use of sampling methods to reduce the cost of analyzing large volumes of acoustic sensor data, while retaining high levels of species detection accuracy. Utilizing five days of manually analyzed acoustic sensor data from four sites, we examined a range of sampling frequencies and methods including random, stratified, and biologically informed. We found that randomly selecting 120 one-minute samples from the three hours immediately following dawn over five days of recordings, detected the highest number of species. On average, this method detected 62% of total species from 120 one-minute samples, compared to 34% of total species detected from traditional area search methods. Our results demonstrate that targeted sampling methods can provide an effective means for analyzing large volumes of acoustic sensor data efficiently and accurately. Development of automated and semi-automated techniques is required to assist in analyzing large volumes of acoustic sensor data.

...read moreread less

Proceedings Article•DOI•

Improving Image Matting Using Comprehensive Sampling Sets

[...]

Ehsan Shahrian¹, Deepu Rajan¹, Brian Price², Scott Cohen²•Institutions (2)

Nanyang Technological University¹, Adobe Systems²

23 Jun 2013

TL;DR: A new image matting algorithm is presented that achieves state-of-the-art performance on a benchmark dataset of images by solving two major problems encountered by current sampling based algorithms.

...read moreread less

Abstract: In this paper, we present a new image matting algorithm that achieves state-of-the-art performance on a benchmark dataset of images. This is achieved by solving two major problems encountered by current sampling based algorithms. The first is that the range in which the foreground and background are sampled is often limited to such an extent that the true foreground and background colors are not present. Here, we describe a method by which a more comprehensive and representative set of samples is collected so as not to miss out on the true samples. This is accomplished by expanding the sampling range for pixels farther from the foreground or background boundary and ensuring that samples from each color distribution are included. The second problem is the overlap in color distributions of foreground and background regions. This causes sampling based methods to fail to pick the correct samples for foreground and background. Our design of an objective function forces those foreground and background samples to be picked that are generated from well-separated distributions. Comparison on the dataset at and evaluation by www.alphamatting.com shows that the proposed method ranks first in terms of error measures used in the website.

...read moreread less

Journal Article•DOI•

Counting and sampling triangles from a graph stream

[...]

Aduri Pavan¹, Kanat Tangwongsan², Srikanta Tirthapura¹, Kun-Lung Wu²•Institutions (2)

Iowa State University¹, IBM²

01 Sep 2013

TL;DR: This paper presents a new space-efficient algorithm for counting and sampling triangles--and more generally, constant-sized cliques--in a massive graph whose edges arrive as a stream.

...read moreread less

Abstract: This paper presents a new space-efficient algorithm for counting and sampling triangles--and more generally, constant-sized cliques--in a massive graph whose edges arrive as a stream. Compared to prior work, our algorithm yields significant improvements in the space and time complexity for these fundamental problems. Our algorithm is simple to implement and has very good practical performance on large graphs.

...read moreread less

Journal Article•DOI•

Sampled-Data Synchronization of Chaotic Lur'e Systems With Time Delays

[...]

Zheng-Guang Wu¹, Peng Shi², Hongye Su¹, Jian Chu¹•Institutions (2)

Zhejiang University¹, Victoria University, Australia²

09 Jan 2013-IEEE Transactions on Neural Networks

TL;DR: This paper studies the problem of sampled-data control for master-slave synchronization schemes that consist of identical chaotic Lur'e systems with time delays with a novel Lyapunov functional, which is positive definite at sampling times but not necessarily positive definite inside the sampling intervals.

...read moreread less

Abstract: This paper studies the problem of sampled-data control for master-slave synchronization schemes that consist of identical chaotic Lur'e systems with time delays. It is assumed that the sampling periods are arbitrarily varying but bounded. In order to take full advantage of the available information about the actual sampling pattern, a novel Lyapunov functional is proposed, which is positive definite at sampling times but not necessarily positive definite inside the sampling intervals. Based on the Lyapunov functional, an exponential synchronization criterion is derived by analyzing the corresponding synchronization error systems. The desired sampled-data controller is designed by a linear matrix inequality approach. The effectiveness and reduced conservatism of the developed results are demonstrated by the numerical simulations of Chua's circuit and neural network.

...read moreread less

Journal Article•DOI•

An AUC-based permutation variable importance measure for random forests

[...]

Silke Janitza¹, Carolin Strobl², Anne-Laure Boulesteix¹•Institutions (2)

Ludwig Maximilian University of Munich¹, University of Zurich²

05 Apr 2013-BMC Bioinformatics

TL;DR: The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance, while the performance of both VIMs is very similar in the case of balanced classes.

...read moreread less

Abstract: The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings. The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html

...read moreread less

Collapse