
Showing papers in "International Statistical Review in 2009"




Journal ArticleDOI
TL;DR: A unified treatment of ICC as used in CRTs is provided, presenting a general definition of the ICC that may be expressed in different ways depending on the modelling approach used to describe the data, and illustrating how this general definition is applied to continuous and dichotomous outcomes.
Abstract: Summary The intra-cluster correlation coefficient (ICC) of the primary outcome plays a key role in the design and analysis of cluster randomized trials (CRTs), but the precise definition of this parameter is somewhat elusive, especially in the context of non-normally distributed outcomes. In this paper, we provide a unified treatment of the ICC as used in CRTs. We present a general definition of the ICC that may be expressed in different ways depending on the modelling approach used to describe the data, illustrating how this general definition is applied to continuous and dichotomous outcomes. Greater complexity arises for dichotomous outcomes; in particular, the usual definition of the ICC cannot be related directly to the parameters of the logistic-normal model that is commonly used for dichotomous outcomes. We show how the definition of the ICC is different when covariates are introduced. Finally, we use our framework and definition of the ICC to draw out implications for those interpreting and choosing values of the ICC when planning CRTs. The ICC plays a key role in CRTs, in which clusters such as health care organizations, school classes, or geographic areas are randomized to trial arms, and outcomes are measured on individuals within those clusters. The ICC, usually denoted ρ, quantifies the degree of similarity in the responses of individuals from the same cluster for the outcome. In the design of CRTs, an estimate of ρ is commonly used to calculate the variance inflation factor (Donner & Klar, 2000), also known as the design effect (Kish, 1965), which is then used to adjust the sample size that would be required for a trial in which individuals are randomized rather than clusters.
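As a concrete illustration of the quantities mentioned above (standard CRT formulas, not results specific to this paper): under the usual continuous-outcome model with between-cluster variance \sigma_b^2 and within-cluster variance \sigma_w^2, the ICC and the design effect for clusters of common size m are

\[ \rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \qquad \mathrm{DE} = 1 + (m-1)\rho , \]

and the sample size required under individual randomization is multiplied by DE to obtain the CRT sample size. For example, with \rho = 0.05 and m = 20, DE = 1 + 19(0.05) = 1.95, so the required sample size nearly doubles.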

152 citations


Journal ArticleDOI
TL;DR: Penalized regression techniques for ordered categorical predictors are proposed, based on dummy coding, with two types of penalization: a difference penalty and a ridge-type refitting procedure; the approach is compared with methods often seen in practice.
Abstract: Ordered categorical predictors are a common case in regression modeling. In contrast to the case of ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article, penalized regression techniques are proposed. Based on dummy coding, two types of penalization are explicitly developed: the first imposes a difference penalty, the second is a ridge-type refitting procedure. A Bayesian motivation as well as alternative ways of derivation are provided. Simulation studies and real-world data serve for illustration and to compare the approach to methods often seen in practice, namely linear regression on the group labels and pure dummy coding. The proposed regression techniques turn out to be highly competitive. On the basis of GLMs, the concept is generalized to non-normal outcomes by performing penalized likelihood estimation.
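A minimal sketch of the first penalty type under standard dummy coding (the notation is illustrative, not copied from the article): for an ordinal predictor with ordered categories 1, ..., K, reference category 1, and dummy coefficients \beta_2, ..., \beta_K, the penalized least-squares criterion adds a first-order difference penalty,

\[ \sum_{i=1}^{n} \bigl( y_i - x_i^{\top}\beta \bigr)^2 \; + \; \lambda \sum_{k=2}^{K} (\beta_k - \beta_{k-1})^2 , \qquad \beta_1 = 0 , \]

so that the effects of adjacent categories are shrunk towards each other rather than towards zero, giving a smooth effect profile over the ordered categories; the tuning parameter \lambda would typically be chosen by cross-validation or a similar criterion.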

71 citations


Journal ArticleDOI
David Cox1
TL;DR: In this paper, a general review of the role of randomization in experimental design is given, with three objectives distinguished, the avoidance of bias, the establishment of a secure base for the estimation of error in traditional designs, and the provision of formally exact tests of significance and confidence limits.
Abstract: Summary A general review is given of the role of randomization in experimental design. Three objectives are distinguished, the avoidance of bias, the establishment of a secure base for the estimation of error in traditional designs, and the provision of formally exact tests of significance and confidence limits. The approximate randomization theory associated with analysis of covariance is outlined and conditionality considerations are used to explain the limited role of randomization in experiments with very small numbers of experimental units. The relation between the so-called design-based and model-based analyses is discussed. Corresponding results in sampling theory are mentioned briefly.

64 citations






Journal ArticleDOI
Paul Kabaila1
TL;DR: The important case in which the inference of interest is a confidence region is considered, and the literature on utilizing uncertain prior information directly in the construction of confidence regions, without the intermediate step of a preliminary statistical model selection, is reviewed.
Abstract: Summary It is very common in applied frequentist (“classical”) statistics to carry out a preliminary statistical (i.e. data-based) model selection by, for example, using preliminary hypothesis tests or minimizing AIC. This is usually followed by the inference of interest, using the same data, based on the assumption that the selected model had been given to us a priori. This assumption is false and it can lead to an inaccurate and misleading inference. We consider the important case that the inference of interest is a confidence region. We review the literature that shows that the resulting confidence regions typically have very poor coverage properties. We also briefly review the closely related literature that describes the coverage properties of prediction intervals after preliminary statistical model selection. A possible motivation for preliminary statistical model selection is a wish to utilize uncertain prior information in the inference of interest. We review the literature in which the aim is to utilize uncertain prior information directly in the construction of confidence regions, without requiring the intermediate step of a preliminary statistical model selection. We also point out this aim as a future direction for research.
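A small Monte Carlo sketch of the coverage problem described above (a toy setup with hypothetical parameter values and function names, not an example taken from the paper): a nominal 95% confidence interval for one regression coefficient is computed after a preliminary t-test decides whether a second, correlated regressor stays in the model, treating the selected model as if it had been given a priori.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def naive_coverage_after_pretest(n=50, beta1=1.0, beta2=0.3, reps=5000, alpha=0.05):
    """Estimate the actual coverage of the naive post-selection CI for beta1."""
    hits = 0
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.7 * x1 + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)   # correlated regressors
        y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

        # Step 1: fit the full model y ~ x1 + x2 (no intercept) and t-test beta2.
        X = np.column_stack([x1, x2])
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ (X.T @ y)
        s2 = np.sum((y - X @ b) ** 2) / (n - 2)
        t2 = b[1] / np.sqrt(s2 * XtX_inv[1, 1])

        # Step 2: build a CI for beta1 from the selected model, ignoring the selection step.
        if abs(t2) > stats.t.ppf(1 - alpha / 2, n - 2):
            est, se, df = b[0], np.sqrt(s2 * XtX_inv[0, 0]), n - 2    # keep x2
        else:
            est = (x1 @ y) / (x1 @ x1)                                # drop x2, refit y ~ x1
            s2r = np.sum((y - est * x1) ** 2) / (n - 1)
            se, df = np.sqrt(s2r / (x1 @ x1)), n - 1
        half = stats.t.ppf(1 - alpha / 2, df) * se
        hits += (est - half <= beta1 <= est + half)
    return hits / reps

print(naive_coverage_after_pretest())   # typically noticeably below the nominal 0.95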

47 citations


Journal ArticleDOI
TL;DR: The theoretical development of kurtosis is surveyed from the historical perspective of Pearson's work on evolution, and it surprisingly emerges that there was no emphasis in Pearson's papers on kurtosis as measuring (in part) tail heaviness.
Abstract: Summary Although the kurtosis index proposed by Karl Pearson in 1905 is introduced in statistical textbooks at all levels, the measure is not easily interpreted and has been a subject of considerable debate. In this study, the theoretical development of kurtosis is surveyed from a historical perspective of Pearson's work on evolution. It surprisingly emerges that there was no emphasis in Pearson's papers on kurtosis as measuring (in part) tail heaviness. However, it is found that Pearson frequently adjusted the formalisation of kurtosis to suit his changing needs. This complex development partly explains the confusion that would surround kurtosis in subsequent literature. Our conclusion is that most misunderstandings arise from improper use of the kurtosis coefficient outside the Pearson system of frequency curves.
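For reference (the standard definition, not a claim about this article's historical findings), Pearson's kurtosis index is the standardized fourth moment

\[ \beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{E\!\left[(X-\mu)^4\right]}{\left(E\!\left[(X-\mu)^2\right]\right)^{2}} , \]

which equals 3 for any normal distribution; the excess \beta_2 - 3 is the quantity that later literature variously reads as tail heaviness, peakedness, or both, which is precisely the interpretive ambiguity the paper traces.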

44 citations



Journal ArticleDOI
Jerome P. Reiter1
TL;DR: Several approaches based on multiple imputation for disclosure limitation, usually called synthetic data, are proposed that could be used to facilitate data integration and dissemination while protecting data confidentiality.
Abstract: Summary In data integration contexts, two statistical agencies seek to merge their separate databases into one file. The agencies also may seek to disseminate data to the public based on the integrated file. These goals may be complicated by the agencies' need to protect the confidentiality of database subjects, which could be at risk during the integration or dissemination stage. This article proposes several approaches based on multiple imputation for disclosure limitation, usually called synthetic data, that could be used to facilitate data integration and dissemination while protecting data confidentiality. It reviews existing methods for obtaining inferences from synthetic data and points out where new methods are needed to implement the data integration proposals.

Journal ArticleDOI
TL;DR: In this paper, the difference between Bayesian and frequentist statistics in making statements about the relationship between observable values was examined and it was shown that such values are never negatively correlated, and are generally positively correlated under the models used in Bayesian statistics.
Abstract: Summary We examine the difference between Bayesian and frequentist statistics in making statements about the relationship between observable values. We show how standard models under both paradigms can be based on an assumption of exchangeability and we derive useful covariance and correlation results for values from an exchangeable sequence. We find that such values are never negatively correlated, and are generally positively correlated under the models used in Bayesian statistics. We discuss the significance of this result as well as a phenomenon which often follows from the differing methodologies and practical applications of these paradigms – a phenomenon we call Bayes' effect.
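A short worked argument behind the correlation claim (standard, and consistent with the summary above): if X_1, X_2, ... is an exchangeable sequence with common variance \sigma^2 and common pairwise covariance c, then for every n

\[ 0 \le \operatorname{Var}\!\left( \sum_{i=1}^{n} X_i \right) = n\sigma^2 + n(n-1)\,c \quad\Longrightarrow\quad c \ge -\frac{\sigma^2}{n-1} . \]

For a finite exchangeable sequence the common correlation can therefore be at most mildly negative (no less than -1/(n-1)), while for an infinitely exchangeable sequence letting n grow forces c \ge 0; the latter is the setting of the usual Bayesian models obtained from exchangeability via de Finetti's theorem.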


Journal ArticleDOI
TL;DR: In this paper, a general algorithm for constructing strata in a population using X, a univariate stratification variable known for all the units in the population, is presented, where the sample is allocated to the strata using a general rule that features proportional allocation, Neyman allocation, and power allocation as special cases.
Abstract: Summary This paper presents a general algorithm for constructing strata in a population using X, a univariate stratification variable known for all the units in the population. Stratum h consists of all the units with an X value in the interval (b_{h-1}, b_h]. The stratum boundaries {b_h} are obtained by minimizing the anticipated sample size for estimating the population total of a survey variable Y with a given level of precision. The stratification criterion allows the presence of a take-none and of a take-all stratum. The sample is allocated to the strata using a general rule that features proportional allocation, Neyman allocation, and power allocation as special cases. The optimization can take into account a stratum-specific anticipated non-response and a model for the relationship between the stratification variable X and the survey variable Y. A loglinear model with stratum-specific mortality for Y given X is presented in detail. Two numerical algorithms for determining the optimal stratum boundaries, attributable to Sethi and Kozak, are compared in a numerical study. Several examples illustrate the stratified designs that can be constructed with the proposed methodology. All the calculations presented in this paper were carried out with stratification, an R package that will be available on CRAN (Comprehensive R Archive Network).
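For orientation, the two most familiar special cases of the allocation rule (standard survey-sampling formulas, not specific to this paper): with strata of sizes N_h, anticipated within-stratum standard deviations S_h, and total sample size n,

\[ n_h = n\,\frac{N_h}{N} \ \ \text{(proportional)}, \qquad n_h = n\,\frac{N_h S_h}{\sum_{k} N_k S_k} \ \ \text{(Neyman)} , \]

while power allocation generalizes these by raising the stratum-level terms to a chosen power; a take-all stratum is enumerated completely (n_h = N_h) and a take-none stratum receives no sample.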

Journal ArticleDOI
TL;DR: In this paper, the authors provide a critical discussion on real-time estimation of dynamic generalized linear models, and compare three estimation schemes, the first is based on conjugate analysis and linear Bayes methods, the second based on posterior mode estimation, and the third based on sequential Monte Carlo sampling methods.
Abstract: Summary The purpose of this paper is to provide a critical discussion on real-time estimation of dynamic generalized linear models. We describe and contrast three estimation schemes, the first of which is based on conjugate analysis and linear Bayes methods, the second based on posterior mode estimation, and the third based on sequential Monte Carlo sampling methods, also known as particle filters. For the first scheme, we give a summary of inference components, such as prior/posterior and forecast densities, for the most common response distributions. Considering data of arrivals of tourists in Cyprus, we illustrate the Poisson model, providing a comparative analysis of the above three schemes.
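A minimal sketch of the third scheme for a Poisson response, under an assumed local-level (random-walk) state equation; the model structure, parameter values, and function names here are illustrative assumptions rather than the paper's exact specification.

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter_poisson(y, n_particles=2000, sigma_w=0.1):
    # State:       theta_t = theta_{t-1} + w_t,   w_t ~ N(0, sigma_w^2)
    # Observation: y_t | theta_t ~ Poisson(exp(theta_t))
    # Returns the filtered means E[theta_t | y_1:t].
    T = len(y)
    theta = rng.normal(np.log(max(y[0], 1)), 1.0, size=n_particles)   # diffuse initialization
    means = np.empty(T)
    for t in range(T):
        theta = theta + rng.normal(0.0, sigma_w, size=n_particles)    # propagate particles
        logw = y[t] * theta - np.exp(theta)                           # Poisson log-likelihood (up to a constant)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * theta)                                  # weighted filtered mean
        theta = theta[rng.choice(n_particles, size=n_particles, p=w)] # multinomial resampling
    return means

# Toy usage: simulated monthly arrival counts with a slowly rising level.
y = rng.poisson(np.exp(np.linspace(2.0, 3.0, 60)))
print(np.round(bootstrap_filter_poisson(y)[:5], 2))

A bootstrap filter like this propagates particles through the state equation and reweights them by the Poisson likelihood; the conjugate/linear Bayes and posterior-mode schemes contrasted in the paper replace this simulation step with analytic or optimization-based updates.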

Journal ArticleDOI
TL;DR: At the end of the nineteenth century, the content and practice of statistics underwent a series of transitions that led to its emergence as a highly specialised mathematical discipline, a shift brought about in part by a mathematical-statistical translation of Charles Darwin's redefinition of the biological species as something that could be viewed in terms of populations.
Abstract: Summary At the end of the nineteenth century, the content and practice of statistics underwent a series of transitions that led to its emergence as a highly specialised mathematical discipline. These intellectual and later institutional changes were, in part, brought about by a mathematical-statistical translation of Charles Darwin's redefinition of the biological species as something that could be viewed in terms of populations. Karl Pearson and W.F.R. Weldon's mathematical reconceptualisation of Darwinian biological variation and “statistical” population of species in the 1890s provided the framework within which a major paradigmatic shift occurred in statistical techniques and theory. Weldon's work on the shore crab in Naples and Plymouth from 1892 to 1895 not only brought them into the forefront of ideas of speciation and provided the impetus to Pearson's earliest statistical innovations, but it also led to Pearson shifting his professional interests from having had an established career as a mathematical physicist to developing one as a biometrician. The innovative statistical work Pearson undertook with Weldon in 1892 and later with Francis Galton in 1894 enabled him to lay the foundations of modern mathematical statistics. While Pearson's diverse publications, his establishment of four laboratories and the creation of new academic departments underscore the plurality of his work, the main focus of his life-long career was in the establishment and promulgation of his statistical methodology.

Journal ArticleDOI
TL;DR: A case study considers variations in life expectancy in 1,118 small areas (known as wards) in Eastern England over the five-year period 1999-2003.
Abstract: Summary Monitoring small area contrasts in life expectancy is important for health policy purposes but subject to difficulties under conventional life table analysis. Additionally, the implicit model underlying conventional life table analysis involves a highly parametrized fixed effect approach. An alternative strategy proposed here involves an explicit model based on random effects for both small areas and age groups. The area effects are assumed to be spatially correlated, reflecting unknown mortality risk factors that are themselves typically spatially correlated. Often mortality observations are disaggregated by demographic category as well as by age and area, e.g. by gender or ethnic group, and multivariate area and age random effects will be used to pool over such groups. A case study considers variations in life expectancy in 1,118 small areas (known as wards) in Eastern England over the five-year period 1999–2003. The case study deaths data are classified by gender, age, and area, and a bivariate model for area and age effects is therefore applied. The interrelationship between the random area effects and two major influences on small area life expectancy is demonstrated in the study, these being area socio-economic status (or deprivation) and the location of nursing and residential homes for the frail elderly.




Journal ArticleDOI
TL;DR: The Rule of Three, which states that 3/n is an approximate 95% upper bound for the binomial parameter when no events are observed in n trials, is shown to follow from the one-sided Clopper-Pearson limit and, among the classical non-informative priors, to be matched only by the Bayes-Laplace and Zellner priors.
Abstract: The Rule of Three states that 3/n is an approximate 95% upper bound for the binomial parameter when no events are observed in n trials. The rule is based on the one-sided Clopper-Pearson limit, but it is shown that none of the other popular frequentist methods reduces to it. It can be seen as a special case of a Bayesian Rule of Three, but it is shown that, among the classical choices of non-informative prior, only the Bayes-Laplace and Zellner priors conform to it. The Rule of Three has also been inexactly extended to the claim that 3 is a 'reasonable' upper bound for the number of events in a future experiment of the same (large) size. In Bayesian estimation, such a bound should follow from the posterior predictive distribution. This approach appears to give more sensible results than the frequentist prediction-limit approach (although, when based on the Bayes-Laplace prior, it technically coincides with it), which indicates a confidence level between 87.5% and 93.75% for this extended Rule of Three. These results shed new light on the Rule of Three in general, suggest an extended Rule of Four for the number of events, provide a unique comparison of Bayesian and frequentist limits, and reinforce the choice of the Bayes-Laplace prior among competing non-informative priors.
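A one-line justification of the rule (standard reasoning, consistent with the Clopper-Pearson basis described above): with zero events in n trials, the exact one-sided 95% upper limit p_U solves

\[ (1 - p_U)^n = 0.05 \quad\Longrightarrow\quad p_U = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n} , \]

since -\ln 0.05 \approx 2.996.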


Journal ArticleDOI
TL;DR: The relationship between Karl Pearson and the Scandinavian statisticians was more competitive than collaborative in nature; the leading statisticians and stochasticists of the Scandinavian school are described, and some of their work is related to that of Pearson.
Abstract: The relationship between Karl Pearson and the Scandinavian statisticians was more of a competitive than a collaborative nature. We describe the leading statisticians and stochasticists of the Scandinavian school, and relate some of their work to the work of Pearson.

Journal ArticleDOI
TL;DR: The history of quality management and the role of statistics in quality management is inextricably bound to the reconstruction of Japan immediately following the Second World War, and then to developments in the United States over three decades later as discussed by the authors.
Abstract: Summary The history of Quality Management, and of the role of Statistics in Quality Management, is inextricably bound to the reconstruction of Japan immediately following the Second World War, and then to developments in the United States over three decades later. Even though these periods are, in societal history, just moments ago, there is a profound lack of agreement about what was actually done and who should be recognized for their contributions. This paper draws on historical materials recently made publicly available in order to clarify what actually took place between 1946 and 1950, and in particular the contribution of a remarkable engineer, Homer Sarasohn.


Journal ArticleDOI
TL;DR: One-sided coverage intervals for a proportion based on a stratified simple random sample are developed, using an Edgeworth expansion to speed up the asymptotics; the underlying assumption of independent values with a common mean within each stratum, with stratum means free to vary, is equivalent to a model-free randomization-based framework when the finite population correction is ignored.
Abstract: Summary Using an Edgeworth expansion to speed up the asymptotics, we develop one-sided coverage intervals for a proportion based on a stratified simple random sample. To this end, we assume the values of the population units are generated from independent random variables with a common mean within each stratum. These stratum means, in turn, may either be free to vary or are assumed to be equal. The more general assumption is equivalent to a model-free randomization-based framework when finite population correction is ignored. Unlike when an Edgeworth expansion is used to construct one-sided intervals under simple random sampling, it is necessary to estimate the variance of the estimator for the population proportion when the stratum means are allowed to differ. As a result, there may be accuracy gains from replacing the normal z-score in the Edgeworth expansion with a t-score.
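For context, the target of these intervals is the usual stratified estimator of a proportion (standard formulas, not specific to this paper): with stratum weights W_h = N_h / N, stratum sample sizes n_h, and stratum sample proportions \hat{p}_h,

\[ \hat{p}_{st} = \sum_h W_h\, \hat{p}_h , \qquad \widehat{\operatorname{Var}}(\hat{p}_{st}) = \sum_h W_h^{2}\, \frac{\hat{p}_h (1 - \hat{p}_h)}{n_h - 1} \quad \text{(no finite population correction)} . \]

The one-sided intervals studied in the paper refine the normal-theory endpoint \hat{p}_{st} + z_{1-\alpha}\,\widehat{\operatorname{SE}} using an Edgeworth expansion that accounts for the skewness of the estimator, which is the sense in which the asymptotics are "sped up".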

Journal ArticleDOI
TL;DR: In this paper, the authors show how mixture-amount designs used in industrial experiments may be used to separate amount and mixture effects in traditional and choice-based conjoint analyses (CAs).
Abstract: Summary Traditional and choice-based conjoint analyses (CAs) have used full or fractional factorial designs to generate concept profile descriptions. However, these designs confound two factors when costs are associated with attributes: first is the total cost of the concept profile, and second is the allocation of costs among the attributes. Both factors may influence consumers' preferences. So far, these issues have not been separated in the CA literature. The present paper shows how mixture-amount designs used in industrial experiments may be used to separate amount and mixture effects in traditional CA. The extension to choice-based CA using balanced incomplete block (BIB) designs is also given.

Journal ArticleDOI
TL;DR: The confluence of statistics and probability into mathematical statistics in the Russian Empire, through the interaction of A.A. Chuprov and A.A. Markov in 1910-1917, was influenced by the writings of the English Biometric School, especially those of Karl Pearson; the Russian-language exposition of Pearsonian ideas by E.E. Slutsky in 1912 was instrumental in this confluence.
Abstract: Summary The confluence of statistics and probability into mathematical statistics in the Russian Empire through the interaction, 1910–1917, of A.A. Chuprov and A.A. Markov was influenced by the writings of the English Biometric School, especially those of Karl Pearson. The appearance of the Russian-language exposition of Pearsonian ideas by E. E. Slutsky in 1912 was instrumental in this confluence. Slutsky's predecessors in such writings (Lakhtin, Orzhentskii, and Leontovich) were variously of mathematical, political economy, and biological backgrounds. Work emanating from the interpolational nature of Pearson's system of frequency curves was continued subsequently through the work of Markov, Bernstein, Romanovsky, and Kravchuk (Krawtchouk), who laid a solid probabilistic foundation. The correlational nature in the interpolational early work of Chebyshev, and work of the English Biometric School in the guise of linear least-squares fitting exposited as the main component of Slutsky's book, was developed in population as well as sample context by Chuprov. He also championed the expectation operation in providing exact relations between sample and population moments, in direct interaction with Karl Pearson. Romanovsky emerges as the most adaptive and modern mathematical statistician.

Journal ArticleDOI
TL;DR: A measure of the individual risk of disclosure, the local outlier factor, which estimates the density around a unit, is proposed, and a selective masking method based on the nearest-neighbour principle and microaggregation is introduced.
Abstract: Summary For continuous key variables, a measure of the individual risk of disclosure is proposed. This risk measure, the local outlier factor, estimates the density around a unit. A selective masking method based on the nearest-neighbour principle and microaggregation is also introduced. Some results of an application to the Italian sample of the Community Innovation Survey are presented.
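A minimal sketch of the screening idea using an off-the-shelf local-outlier-factor implementation (scikit-learn's LocalOutlierFactor); the data, variable names, and threshold are illustrative assumptions, and the paper's selective masking step, nearest-neighbour microaggregation of the flagged records, is only indicated by a comment.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Stand-in for continuous key variables (e.g. turnover, employees) of survey units.
X = np.exp(rng.normal(size=(1000, 3)))          # skewed, as business data often are
X[:5] *= 20                                     # a few very atypical (high-risk) units

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
risk = -lof.negative_outlier_factor_            # LOF score: about 1 in dense regions, much larger for isolated units

high_risk = np.where(risk > 1.5)[0]             # illustrative threshold for "risky" records
print(f"{len(high_risk)} records flagged for masking")

# In the paper's proposal, the flagged records would then be masked selectively,
# e.g. by microaggregation: replacing each one's key variables with the average
# of its nearest neighbours, leaving low-risk records unperturbed.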