Author

Yves Tillé

Bio: Yves Tillé is an academic researcher at the University of Neuchâtel. His research focuses on the topics of Sampling (statistics) & Population. He has an h-index of 20 and has co-authored 79 publications receiving 1,356 citations. Previous affiliations of Yves Tillé include the Université libre de Bruxelles and the École Normale Supérieure.


Papers
Journal ArticleDOI
TL;DR: The cube method, presented in this paper, selects approximately balanced samples with equal or unequal inclusion probabilities and any number of auxiliary variables; the variance reduction achieved for each variable of interest depends on its correlation with the controlled variables.
Abstract: A balanced sampling design is defined by the property that the Horvitz-Thompson estimators of the population totals of a set of auxiliary variables equal the known totals of these variables. Therefore the variances of estimators of totals of all the variables of interest are reduced, depending on the correlations of these variables with the controlled variables. In this paper, we develop a general method, called the cube method, for selecting approximately balanced samples with equal or unequal inclusion probabilities and any number of auxiliary variables.

242 citations
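
The balancing property described above is easy to check numerically: for any candidate sample, the Horvitz-Thompson estimate of each auxiliary total should be close to the known population total. The sketch below only performs that check on a toy population; it is not an implementation of the cube method itself, and the function and variable names (ht_totals, x, pik, sample) are invented for illustration, assuming NumPy.

```python
import numpy as np

def ht_totals(x, pik, sample):
    """Horvitz-Thompson estimates of the auxiliary totals from one sample.

    x      : (N, p) matrix of auxiliary variables, known for all N units
    pik    : (N,)   vector of inclusion probabilities
    sample : (N,)   boolean indicator of the selected units
    """
    return (x[sample] / pik[sample][:, None]).sum(axis=0)

# Toy population: two auxiliary variables, equal inclusion probabilities n/N.
rng = np.random.default_rng(0)
N, n = 1000, 100
x = np.column_stack([np.ones(N), rng.gamma(2.0, 5.0, size=N)])
pik = np.full(N, n / N)
true_totals = x.sum(axis=0)

# A plain simple random sample; a balanced design would drive the relative
# imbalance below close to zero for every auxiliary variable.
srs = np.zeros(N, dtype=bool)
srs[rng.choice(N, size=n, replace=False)] = True
print((ht_totals(x, pik, srs) - true_totals) / true_totals)
```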

Journal ArticleDOI
TL;DR: In this paper, a general class of sampling methods without replacement and with unequal probabilities is proposed. It consists of splitting the inclusion probability vector into several new inclusion probability vectors; one of these vectors is chosen at random, so that the initial problem is reduced to another, simpler sampling problem with unequal probabilities.
Abstract: A very general class of sampling methods without replacement and with unequal probabilities is proposed. It consists of splitting the inclusion probability vector into several new inclusion probability vectors. One of these vectors is chosen randomly; thus, the initial problem is reduced to another sampling problem with unequal probabilities. This splitting is then repeated on these new vectors of inclusion probabilities; at each step, the sampling problem is reduced to a simpler problem. The simplicity of this technique makes it easy to generate new sampling procedures with unequal probabilities. The splitting method also generalises well-known methods such as the Midzuno method, the elimination procedure and the Chao procedure. A sufficient condition is then given under which a splitting method satisfies the Sen-Yates-Grundy condition. Finally, it is shown that the elimination procedure satisfies the Gabler sufficient condition.

136 citations
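
The splitting idea can be made concrete with its simplest member, a pivotal-type step in which only two inclusion probabilities are modified at a time: the pair is split into two candidate pairs, and one of them is kept with the probability that preserves the original inclusion probabilities. The sketch below is written from the description above as an illustrative instance of the splitting family, not code from the paper; the function name and the handling of a non-integer probability sum are my own choices.

```python
import numpy as np

def pivotal_sample(pik, rng=None):
    """Unequal-probability sampling without replacement by repeated splitting.

    At each step the inclusion probabilities of two undecided units are split
    into two candidate pairs, and one pair is kept at random with probabilities
    chosen so that the original inclusion probabilities are preserved. The
    procedure stops when every component of pik has become 0 or 1.
    """
    rng = rng or np.random.default_rng()
    pik = np.asarray(pik, dtype=float).copy()
    eps = 1e-12
    undecided = lambda: np.flatnonzero((pik > eps) & (pik < 1 - eps))
    while undecided().size > 1:
        i, j = rng.choice(undecided(), size=2, replace=False)
        a, b = pik[i], pik[j]
        if a + b <= 1:
            # One unit is eliminated, the other carries the combined probability.
            if rng.random() < b / (a + b):
                pik[i], pik[j] = 0.0, a + b
            else:
                pik[i], pik[j] = a + b, 0.0
        else:
            # One unit is selected, the other keeps the excess probability.
            if rng.random() < (1 - b) / (2 - a - b):
                pik[i], pik[j] = 1.0, a + b - 1
            else:
                pik[i], pik[j] = a + b - 1, 1.0
    left = undecided()
    if left.size:  # non-integer sum of probabilities: finish with a Bernoulli trial
        pik[left] = (rng.random(left.size) < pik[left]).astype(float)
    return pik.round().astype(bool)

# The expected sample size equals the sum of the inclusion probabilities (here 2).
print(pivotal_sample([0.2, 0.5, 0.3, 0.6, 0.4]).astype(int))
```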

Journal ArticleDOI
TL;DR: In this article, a new spatial sampling method is proposed in order to achieve a double property of balancing, where the sample is spatially balanced or well spread so as to avoid selecting neighbouring units.
Abstract: A new spatial sampling method is proposed in order to achieve a double property of balancing. The sample is spatially balanced, or well spread, so as to avoid selecting neighbouring units. Moreover, the method also makes it possible to satisfy balancing equations on auxiliary variables available for all the sampling units, because the Horvitz–Thompson estimators are almost equal to the population totals of these variables. The method works with any definition of distance in a multidimensional space and supports the use of unequal inclusion probabilities. The algorithm is simple and fast. Examples show that the method succeeds in using more information than the local pivotal method, the cube method and the Generalized Random-Tessellation Stratified sampling method, and thus performs better. An estimator of the variance for this sampling design is proposed so that inference can take the effect of the sampling design into account.

93 citations
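
A common way to make the notion of a "well spread" sample concrete is a Voronoi-based spatial balance index: for each selected unit, sum the inclusion probabilities of the population units lying closer to it than to any other selected unit; a perfectly spread sample gives a sum of 1 for every selected unit. This index is an evaluation tool from the spatial sampling literature rather than anything defined in the paper, and the sketch below (NumPy and SciPy, invented names, toy data) is only a rough illustration of how such spread can be measured.

```python
import numpy as np
from scipy.spatial import cKDTree

def spatial_balance(coords, pik, sample):
    """Voronoi-type spatial balance index (smaller means better spread).

    For each selected unit, 'captured' is the total inclusion probability of
    the population units closest to it; the index is the mean squared
    deviation of these totals from 1.
    """
    tree = cKDTree(coords[sample])
    _, owner = tree.query(coords)  # nearest selected unit for every population unit
    captured = np.bincount(owner, weights=pik, minlength=int(sample.sum()))
    return np.mean((captured - 1.0) ** 2)

# Toy population on the unit square with equal inclusion probabilities n/N.
rng = np.random.default_rng(1)
N, n = 400, 40
coords = rng.uniform(size=(N, 2))
pik = np.full(N, n / N)

srs = np.zeros(N, dtype=bool)
srs[rng.choice(N, size=n, replace=False)] = True
print(spatial_balance(coords, pik, srs))  # spatially balanced designs give smaller values
```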

Journal ArticleDOI
TL;DR: In this article, the authors propose novel resampling methods that may be directly applied to variance estimation. These methods consist of selecting subsamples under a sampling scheme completely different from the one that generated the original sample, a scheme composed of several sampling designs.
Abstract: In complex designs, classical bootstrap methods result in a biased variance estimator when the sampling design is not taken into account. Resampled units are usually rescaled or weighted in order to achieve unbiasedness in the linear case. In the present article, we propose novel resampling methods that may be directly applied to variance estimation. These methods consist of selecting subsamples under a completely different sampling scheme from that which generated the original sample, which is composed of several sampling designs. In particular, a portion of the subsampled units is selected without replacement, while another is selected with replacement, thereby adjusting for the finite population setting. We show that these bootstrap estimators directly and precisely reproduce unbiased estimators of the variance in the linear case in a time-efficient manner, and eliminate the need for classical adjustment methods such as rescaling, correction factors, or artificial populations. Moreover, we show via sim...

92 citations
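
The bias that motivates these methods is easy to reproduce in the simplest setting: under simple random sampling without replacement, a naive with-replacement bootstrap of the expansion estimator of a total ignores the finite population correction and overstates the variance. The sketch below only illustrates that gap with made-up data; it is not the authors' resampling scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, B = 500, 100, 5000
y = rng.gamma(2.0, 10.0, size=N)              # finite population of interest
s = rng.choice(N, size=n, replace=False)      # simple random sample without replacement
ys = y[s]

# Unbiased variance estimator of the expansion estimator N * ybar under SRSWOR,
# which includes the finite population correction (1 - n/N).
v_unbiased = N**2 * (1 - n / N) * ys.var(ddof=1) / n

# Naive with-replacement bootstrap of the same estimator, ignoring the design.
boot_totals = np.array([N * rng.choice(ys, size=n, replace=True).mean() for _ in range(B)])
v_naive = boot_totals.var(ddof=1)

# With n/N = 0.2, the naive bootstrap overstates the variance by roughly 1/(1 - n/N).
print(v_unbiased, v_naive)
```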

Journal ArticleDOI
TL;DR: In this paper, the authors derived a general approximation of variance based on a residual technique, which is useful even in the particular case of unequal probability sampling with fixed sample size, and validated this approximation with a set of numerical studies.

87 citations


Cited by
Book
21 Jun 2006
TL;DR: Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition is a revised and substantially updated edition of the original book, reflecting the research in the field since the first edition appeared.
Abstract: It's been over a decade since the first edition of Measurement Error in Nonlinear Models splashed onto the scene, and research in the field has certainly not cooled in the interim. In fact, quite the opposite has occurred. As a result, Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition has been revamped and ex

1,515 citations

Journal ArticleDOI
TL;DR: In his seminal book, Shewhart (1931) makes no demand on the distribution of the characteristic to be plotted on a control chart; the article asks how, then, the idea arose that normality is, if not required, at least highly desirable.
Abstract: In his seminal book, Shewhart (1931) makes no demand on the distribution of the characteristic to be plotted on a control chart. How then can we explain the idea that normality is, if not required, at least highly desirable? I believe that it has come about through the many statistical studies of control-chart behavior. If one is to study how a control chart behaves, it is necessary to relate it to some distribution. The obvious choice is the normal distribution because of its ubiquity as a satisfactory model. This is bolstered by the existence of the Central Limit Theorem.

896 citations
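
The article's point that chart behaviour can only be studied relative to some assumed distribution can be illustrated by simulating the in-control false-alarm rate of ordinary 3-sigma limits for the subgroup mean under a normal and a skewed distribution. The simulation below is my own illustration, not taken from the article; the subgroup size and the exponential alternative are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n = 200_000, 5  # number of in-control subgroups and subgroup size

def false_alarm_rate(draw):
    """Fraction of in-control subgroup means falling outside 3-sigma limits."""
    x = draw(size=(reps, n))
    xbar = x.mean(axis=1)
    mu, sigma = x.mean(), x.std()        # long-run mean and sd, treated as known
    limit = 3 * sigma / np.sqrt(n)
    return np.mean(np.abs(xbar - mu) > limit)

# The same 3-sigma chart behaves differently under different in-control distributions.
print(false_alarm_rate(rng.normal))                                # about 0.0027
print(false_alarm_rate(lambda size: rng.exponential(size=size)))   # several times larger
```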

Journal Article
TL;DR: A review of two books on statistical and mathematical demography: a new edition of Keyfitz's Applied Mathematical Demography, whose new material reflects the added co-author's focus on matrix population models (see, e.g., Caswell 2000), and Alho and Spencer's book, which puts a stronger emphasis on statistics within demography.
Abstract: Here are two books on a topic new to Technometrics: statistical and mathematical demography. The first author of Applied Mathematical Demography wrote the first two editions of this book alone. The second edition was published in 1985. Professor Keyfitz noted in the Preface (p. vii) that at age 90 he had no interest in doing another edition; however, the publisher encouraged him to find a coauthor. The result is an additional focus for the book on the world of biology that makes it much more relevant for the sciences. The book is now part of the publisher's series on Statistics for Biology and Health. Much of it, of course, focuses on the many aspects of human populations. The new material focuses on matrix population models, the particular focus of the new author (see, e.g., Caswell 2000). As one might expect from a book that was originally written in the 1970s, it does not include a lot of information on statistical computing. The new book by Alho and Spencer is focused on putting a better emphasis on statistics in the discipline of demography (Preface, p. vii). It is part of the publisher's Series in Statistics. The authors are both statisticians, so the focus is on statistics as used for demographic problems. The authors are targeting human applications, so their perspective on science does not extend any further than epidemiology. The book actually strikes a good balance between statistical tools and demographic applications. The authors use the first two chapters to teach statisticians the concepts of demography. The next four chapters are very similar to the statistics content found in introductory books on survival analysis, such as the recent book by Kleinbaum and Klein (2005), reported by Ziegel (2006). The next three chapters focus on various aspects of forecasting demographic rates. The book concludes with chapters on three areas of application: errors in census numbers, financial applications, and small-area estimates.

710 citations

Journal ArticleDOI
TL;DR: The study finds that there is no need to under-sample until there are as many churners as non-churners in the training set, although under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC.
Abstract: Customer churn is often a rare event in service industries, but of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how we can better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigate the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric compared to accuracy. Lift is very much related to accuracy, but has the advantage of being well used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press], we find that there is no need to under-sample so that there are as many churners in the training set as non-churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, perform significantly better than random forests and are therefore advised. They should, however, always be compared to logistic regression. Boosting is a very robust classifier, but it never outperforms any other technique.

462 citations
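
The kind of comparison described above can be sketched with scikit-learn on synthetic data: a plain random forest on the imbalanced training set, the same model after random under-sampling of the majority class (deliberately not down to a 1:1 ratio), and a cost-sensitive variant via class_weight as a rough stand-in for weighted random forests. The data, sizes and hyperparameters below are invented, so this does not reproduce the study's models or results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced "churn" data: about 5% positives (churners).
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.3, random_state=0)

def test_auc(model, X_fit, y_fit):
    """Fit on the given training data and report AUC on the common test set."""
    model.fit(X_fit, y_fit)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# 1) Plain random forest on the imbalanced training set.
print(test_auc(RandomForestClassifier(n_estimators=200, random_state=0), X_tr, y_tr))

# 2) Random under-sampling of the majority class, here to a 1:3 ratio rather than 1:1.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_tr == 1)
neg = rng.choice(np.flatnonzero(y_tr == 0), size=3 * pos.size, replace=False)
keep = np.concatenate([pos, neg])
print(test_auc(RandomForestClassifier(n_estimators=200, random_state=0),
               X_tr[keep], y_tr[keep]))

# 3) Cost-sensitive alternative: reweight classes instead of discarding data.
print(test_auc(RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                      random_state=0), X_tr, y_tr))
```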

Journal ArticleDOI
TL;DR: In this article, the authors couple European-wide databases with physical fractionation of soil organic matter to determine continental-scale forest and grassland topsoil carbon and nitrogen stocks and their distribution between mineral-associated and particulate organic matter pools.
Abstract: Effective land-based solutions to climate change mitigation require actions that maximize soil carbon storage without generating surplus nitrogen. Land management for carbon sequestration is most often informed by bulk soil carbon inventories, without considering the form in which carbon is stored, its capacity, persistence and nitrogen demand. Here, we couple European-wide databases with physical fractionation of soil organic matter to determine continental-scale forest and grassland topsoil carbon and nitrogen stocks and their distribution between mineral-associated and particulate organic matter pools. Grasslands and arbuscular mycorrhizal forests store more soil carbon in mineral-associated organic carbon, which is more persistent but has a higher nitrogen demand and saturates. Ectomycorrhizal forests store more carbon in particulate organic matter, which is more vulnerable to disturbance but has a lower nitrogen demand and can potentially accumulate indefinitely. The share of carbon between mineral-associated and particulate organic matter and the ratio between carbon and nitrogen affect soil carbon stocks and mediate the effects of other variables on soil carbon stocks. Understanding the physical distribution of organic matter in pools of mineral-associated versus particulate organic matter can inform land management for nitrogen-efficient carbon sequestration, which should be driven by the inherent soil carbon capacity and nitrogen availability in ecosystems. Land management strategies for enhancing soil carbon sequestration need to be tailored to different soil types, depending on how much organic matter is stored in pools of mineral-associated and particulate organic matter, suggests an analysis of soil organic matter across Europe.

455 citations