
Showing papers in "Statistical Science in 1994"


Journal Article•DOI•
TL;DR: This paper informs a statistical readership about Artificial Neural Networks (ANNs), points out some of the links with statistical methodology, encourages cross-disciplinary research in the directions most likely to bear fruit, and then treats several topics (perceptrons, Hopfield-type recurrent networks and associative memory networks) in more depth.
Abstract: This paper informs a statistical readership about Artificial Neural Networks (ANNs), points out some of the links with statistical methodology and encourages cross-disciplinary research in the directions most likely to bear fruit. The areas of statistical interest are briefly outlined, and a series of examples indicates the flavor of ANN models. We then treat various topics in more depth. In each case, we describe the neural network architectures and training rules and provide a statistical commentary. The topics treated in this way are perceptrons (from single-unit to multilayer versions), Hopfield-type recurrent networks (including probabilistic versions strongly related to statistical physics and Gibbs distributions) and associative memory networks trained by so-called unsupervised learning rules. Perceptrons are shown to have strong associations with discriminant analysis and regression, and unsupervised networks with cluster analysis. The paper concludes with some thoughts on the future of the interface between neural networks and statistics.
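
The perceptron-regression link the abstract mentions can be made concrete: a single logistic unit trained by gradient ascent on the Bernoulli log-likelihood is just logistic regression fitted by a crude optimiser. A minimal illustrative sketch (not the paper's own code; data and learning rate are hypothetical):

```python
import numpy as np

def train_single_unit(X, y, lr=0.1, epochs=500):
    """One logistic unit trained by gradient ascent on the Bernoulli
    log-likelihood; statistically, this is logistic regression.
    X is (n, p), y is a 0/1 vector of length n."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # the unit's activation
        w += lr * X.T @ (y - p) / len(y)      # likelihood gradient step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(size=200) > 0).astype(float)
print(train_single_unit(X, y))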

1,114 citations


Journal Article•DOI•
TL;DR: When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules.
Abstract: Conducting sample surveys, imputing incomplete observations, and analyzing the resulting data are three indispensable phases of modern practice with public-use data files and with many other statistical applications. Each phase inherits different input, including the information preceding it and the intellectual assessments available, and aims to provide output that is one step closer to arriving at statistical inferences with scientific relevance. However, the role of the imputation phase has often been viewed as merely providing computational convenience for users of data. Although facilitating computation is very important, such a viewpoint ignores the imputer's assessments and information inaccessible to the users. This view underlies the recent controversy over the validity of multiple-imputation inference when a procedure for analyzing multiply imputed data sets cannot be derived from (is "uncongenial" to) the model adopted for multiple imputation. Given sensible imputations and complete-data analysis procedures, inferences from standard multiple-imputation combining rules are typically superior to, and thus different from, users' incomplete-data analyses. The latter may suffer from serious nonresponse biases because such analyses often must rely on convenient but unrealistic assumptions about the nonresponse mechanism. When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules. These points are reviewed and explored by simple examples and general theory, from both Bayesian and frequentist perspectives, particularly from the randomization perspective. Some convenient terms are suggested for facilitating communication among researchers from different perspectives when evaluating multiple-imputation inferences with uncongenial sources of input.
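
The "standard combining rules" are Rubin's rules for multiply imputed data. As a concrete reference point, here is a minimal sketch (not from the paper) of how the m completed-data results are pooled:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's combining rules.
    Returns the combined estimate, its total variance and the
    reference degrees of freedom."""
    m = len(estimates)
    q_bar = np.mean(estimates)              # combined point estimate
    u_bar = np.mean(variances)              # within-imputation variance
    b = np.var(estimates, ddof=1)           # between-imputation variance
    t = u_bar + (1 + 1 / m) * b             # total variance
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

# e.g. the same analysis run on five imputed data sets:
print(rubin_combine([2.1, 2.4, 1.9, 2.2, 2.0],
                    [0.30, 0.28, 0.33, 0.29, 0.31]))
```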

790 citations


Journal Article•DOI•
TL;DR: Empirical best linear unbiased prediction as well as empirical and hierarchical Bayes seem to have a distinct advantage over other methods in small area estimation.
Abstract: Small area estimation is becoming important in survey sampling due to a growing demand for reliable small area statistics from both public and private sectors. It is now widely recognized that direct survey estimates for small areas are likely to yield unacceptably large standard errors due to the smallness of sample sizes in the areas. This makes it necessary to "borrow strength" from related areas to find more accurate estimates for a given area or, simultaneously, for several areas. This has led to the development of alternative methods such as synthetic, sample size dependent, empirical best linear unbiased prediction, empirical Bayes and hierarchical Bayes estimation. The present article is largely an appraisal of some of these methods. The performance of these methods is also evaluated using some synthetic data resembling a business population. Empirical best linear unbiased prediction as well as empirical and hierarchical Bayes, for most purposes, seem to have a distinct advantage over other methods.
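
To make the "borrowing strength" concrete: under the widely used Fay-Herriot area-level model (one of the settings appraised here), the best linear unbiased predictor shrinks each direct estimate toward a regression prediction. A minimal sketch, assuming known sampling variances D_i and a given model variance; in practice sigma2_v is estimated from the data, which is what makes the predictor "empirical":

```python
import numpy as np

def fay_herriot_eblup(y, X, D, sigma2_v):
    """BLUP of small area means under the Fay-Herriot model
    y_i = x_i'beta + v_i + e_i, with v_i ~ N(0, sigma2_v) and known
    sampling variances D_i; plugging in an estimated sigma2_v gives
    the EBLUP."""
    w = 1.0 / (sigma2_v + D)                       # GLS weights
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    gamma = sigma2_v / (sigma2_v + D)              # shrinkage factors
    return gamma * y + (1 - gamma) * (X @ beta)    # direct vs. synthetic

# Areas with noisy direct estimates (large D_i) are shrunk hardest:
y = np.array([10.0, 14.0, 8.0]); D = np.array([4.0, 0.5, 9.0])
X = np.column_stack([np.ones(3), [1.0, 2.0, 0.5]])
print(fay_herriot_eblup(y, X, D, sigma2_v=2.0))
```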

738 citations


Journal Article•DOI•
TL;DR: In this article, the authors discuss some aspects of estimation and inference that arise in the study of mitochondrial DNA sequence variability, focusing in particular on the estimation of substitution rates and their use in calibrating estimates of the time since the most recent common ancestor of a sample of sequences.
Abstract: Mitochondrial DNA sequence variation is now being used to study the history of our species. In this paper we discuss some aspects of estimation and inference that arise in the study of such variability, focusing in particular on the estimation of substitution rates and their use in calibrating estimates of the time since the most recent common ancestor of a sample of sequences. Observed DNA sequence variation is generated by superimposing the effects of mutation on the ancestral tree of the sequences. For data of the type studied here, this ancestral tree has to be modeled as a random process. Superimposing the effects of mutation produces complicated sampling distributions that form the basis of any statistical model for the data. Using such distributions--for example, for maximum likelihood estimation of rates--poses some difficult computational problems. We describe a Monte Carlo method, a cousin of the popular "Markov chain Monte Carlo," that has proved very useful in addressing some of these issues.
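
The paper's Monte Carlo machinery is a Markov-chain cousin of the following simpler idea: simulate the random ancestral tree, superimpose Poisson mutations, and match moments. A toy sketch under the standard coalescent (not the authors' algorithm), recovering the mutation rate via Watterson's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

def segregating_sites(n, theta):
    """Number of variable sites in a sample of n sequences: while k
    lineages remain, the coalescence time is exponential with mean
    2/(k(k-1)), and mutations fall as a Poisson process of rate
    theta/2 along the total branch length."""
    length = sum(k * rng.exponential(2.0 / (k * (k - 1)))
                 for k in range(n, 1, -1))
    return rng.poisson(theta * length / 2.0)

# Watterson's estimator inverts E[S] = theta * sum_{i=1}^{n-1} 1/i:
n, theta = 20, 5.0
s_bar = np.mean([segregating_sites(n, theta) for _ in range(2000)])
print(s_bar / sum(1.0 / i for i in range(1, n)))   # roughly 5.0
```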

426 citations


Journal Article•DOI•
TL;DR: A conversation with Sir David Cox, who worked at the Royal Aircraft Establishment and at the Wool Industries Research Association in Leeds before becoming an assistant lecturer at the University of Cambridge from 1950 to 1955 and then visiting the United States for 15 months, mainly at the University of North Carolina.
Abstract: David Roxbee Cox was born in Birmingham on July 15, 1924. He attended Handsworth Grammar School and St. John's College, Cambridge. From 1944 to 1946 he was employed at the Royal Aircraft Establishment, and from 1946 to 1950 he was employed at the Wool Industries Research Association in Leeds. He obtained his Ph.D. from the University of Leeds in 1949. He was an assistant lecturer at the University of Cambridge from 1950 to 1955, and then visited the United States for 15 months, mainly at the University of North Carolina. From 1956 to 1966 he was Reader and then Professor of Statistics at Birkbeck College, London, and from 1966 to 1988 was Professor of Statistics at Imperial College, London. In 1988 he moved to Oxford to become the Warden of Nuffield College, a post from which he retired on July 31, 1994. He is now an Honorary Fellow of Nuffield College and a member of the Department of Statistics at the University of Oxford. In 1947 he married Joyce Drummond. They have four children and two grandchildren. Among his many honours, Sir David has received to date 10 honorary doctorates, an honorary fellowship from St. John's College, Cambridge, and honorary membership in four international academies. He has been awarded the Guy medals in Silver (1961) and Gold (1973) by the Royal Statistical Society. He was elected Fellow of the Royal Society of London in 1973 and was knighted in 1985. In 1990 he won the Kettering prize and gold medal for cancer research. He has authored or coauthored over 200 papers and 15 books. A list of his publications through 1988 is included in Hinkley, Reid and Snell (1991). From 1966 through 1991 he was the editor of Biometrika. He has supervised, encouraged and collaborated with innumerable students, postdoctoral fellows and colleagues. He has served as president of the Bernoulli Society and the Royal Statistical Society, and he is president-elect of the International Statistical Institute. This conversation took place in Sir David's office at Nuffield College on October 26 and 27, 1993.

159 citations


Journal Article•DOI•
TL;DR: A critical review of recent research activity on bootstrap and related procedures is given in this paper, where the authors argue that much theoretical work is not serving the immediate needs of statistical practice.
Abstract: A critical review is given of recent research activity on bootstrap and related procedures. Theoretical work has shown the bootstrap approach to be a potentially powerful addition to the statistician's toolkit. We consider its impact on statistical practice and argue that, measured against the hopes raised by theoretical advances, this has been until now fairly modest. We suggest that while this state of affairs is a consequence to be expected of the sophisticated character of the bootstrap procedures required to cope reliably in many of the settings of most interest, much theoretical work is not serving the immediate needs of statistical practice. Emerging lines of research are reviewed and important future research directions suggested. In particular, we appeal for greater focussing of research activity on practicalities.
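
For orientation, the basic resampling idea under review fits in a few lines. The percentile interval below is the simplest of the procedures the paper has in mind, and precisely the kind of naive recipe the authors caution can be unreliable in harder settings (a minimal sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_percentile_ci(x, stat, n_boot=2000, alpha=0.05):
    """Nonparametric bootstrap percentile interval for stat(x):
    resample the data with replacement, recompute the statistic,
    and read off the empirical quantiles."""
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

data = rng.exponential(size=50)
print(bootstrap_percentile_ci(data, np.median))
```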

158 citations


Journal Article•DOI•
TL;DR: This paper extends Poisson approximation techniques, via the Aldous clumping heuristic, into a practical method for estimating the statistical significance of sequence alignment scores with gaps.
Abstract: The Chen-Stein method of Poisson approximation has been used to establish theorems about comparison of two DNA or protein sequences. The most useful result for sequence alignment applies to alignment scoring with no gaps. However, there has not been a valid method to assign statistical significance to alignment scores with gaps. In this paper we extend Poisson approximation techniques using the Aldous clumping heuristic to a practical method of estimating statistical significance.
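
In the established no-gap case, the Poisson (Chen-Stein) approximation yields the familiar Karlin-Altschul tail formula sketched below; the paper's contribution is extending this style of calculation to gapped scores. The constants K and lambda depend on the scoring scheme, and the values used here are hypothetical:

```python
import math

def ungapped_alignment_pvalue(score, m, n, K, lam):
    """Poisson approximation for the best ungapped local alignment
    score of sequences of lengths m and n: the number of segment
    pairs scoring >= score is roughly Poisson with mean
    K*m*n*exp(-lam*score), so P(max score >= score) is one minus
    the Poisson probability of zero such segments."""
    e_value = K * m * n * math.exp(-lam * score)   # expected count
    return 1.0 - math.exp(-e_value)

# Hypothetical scoring-scheme constants, for illustration only:
print(ungapped_alignment_pvalue(score=45.0, m=1000, n=1000,
                                K=0.1, lam=0.27))
```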

152 citations


Journal Article•DOI•
TL;DR: In this paper, a study of the use of citation data to investigate the role statistics journals play in communication within the field and between statistics and other fields is presented.
Abstract: This is a study of the use of citation data to investigate the role statistics journals play in communication within that field and between statistics and other fields. The study looks at citations as import-export statistics reflecting intellectual influence. The principal findings include: there is little variability in both the number and diversity of imports, but great variability in both the number and diversity of exports and hence in the balance of trade; there is a tendency for influence to flow from theory to applications to a much greater extent than in the reverse direction; there is little communication between statistics and probability journals. The export scores model is introduced and employed to map a set of journals' bilateral intellectual influences onto a one-dimensional scale, and the Cox effect is identified as a phenomenon that can occur when a disciplinary paper attracts a large degree of attention from outside its discipline.
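
The import-export metaphor reduces to simple arithmetic on a journal-by-journal citation matrix. A toy illustration with hypothetical counts (not the study's data): citing another journal imports influence, being cited exports it.

```python
import numpy as np

# Toy citation matrix: C[i, j] = citations from journal i to journal j.
C = np.array([[50, 10,  2],
              [ 4, 60,  1],
              [20, 15, 40]])

internal = np.diag(C)
imports = C.sum(axis=1) - internal   # citations each journal makes
exports = C.sum(axis=0) - internal   # citations each journal receives
balance = exports - imports          # the "balance of trade"
print(balance)
```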

135 citations



Journal Article•DOI•
TL;DR: Number-theoretic methods (NTMs) are a class of techniques by which representative points of the uniform distribution on the unit cube of $R^s$ can be generated.
Abstract: Number-theoretic methods (NTMs) are a class of techniques by which representative points of the uniform distribution on the unit cube of $R^s$ can be generated. NTMs have been widely used in numerical analysis, especially in evaluation of high-dimensional integrals. Recently, NTMs have been extended to generate representative points for many useful multivariate distributions and have been systematically applied in statistics. In this paper, we shall introduce NTMs and review their applications in statistics, such as evaluation of the expected value of a random vector, statistical inference, regression analysis, geometric probability and experimental design.
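
The flavor of such point sets is easy to convey in code. The sketch below uses the Halton sequence, a standard low-discrepancy construction (the paper's NTMs, such as good lattice points, differ in detail), to estimate a 5-dimensional integral:

```python
import numpy as np

def halton(n, dim):
    """First n points of the Halton sequence in [0,1)^dim, a simple
    low-discrepancy ("uniformly scattered") point set."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:dim]
    pts = np.empty((n, dim))
    for j, base in enumerate(primes):
        for i in range(1, n + 1):
            f, x, k = 1.0, 0.0, i
            while k > 0:                 # radical inverse of i in base
                f /= base
                x += f * (k % base)
                k //= base
            pts[i - 1, j] = x
    return pts

# Estimate of the integral of prod_j cos(x_j) over [0,1]^5 (= sin(1)^5):
pts = halton(4096, 5)
print(np.mean(np.prod(np.cos(pts), axis=1)))
```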

95 citations


Journal Article•DOI•
TL;DR: In this work, multilocus segregation indicators are defined and proposed as the latent variables of choice in the case of very few individuals observed in each pedigree structure, such as occurs in homozygosity mapping and affected relative pair methods of genetic mapping.
Abstract: Monte Carlo likelihood is becoming increasingly used where exact likelihood analysis is computationally infeasible. One area in which such likelihoods arise is that of genetic mapping, where, increasingly, researchers wish to extract additional information from limited trait data through the use of multiple genetic markers. In the genetic analysis context, Monte Carlo likelihood is most conveniently considered as a latent variable problem. Markov chain Monte Carlo provides a method of obtaining realisations of underlying latent variables simulated under a genetic model, conditional upon observed data. Hence a Monte Carlo estimate of the likelihood surface can be formed. Choice of the latent variables can be as critical as choice of sampler. In the case of very few individuals observed in each pedigree structure, such as occurs in homozygosity mapping and affected relative pair methods of genetic mapping, multilocus segregation indicators are defined and proposed as the latent variables of choice. An example of five Werner's syndrome pedigrees is given; these are a subset of the 21 pedigrees on which homozygosity mapping has recently confirmed the location of the Werner's syndrome gene on chromosome 8. However, multilocus computations on these pedigrees are impractical with standard methods of exact likelihood computation.
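
The latent-variable idea can be shown with a deliberately tiny toy model (not the pedigree samplers of the paper, which draw the latent variables conditional on the data by MCMC): if Z is latent and the data are modeled given Z, then L(theta) = E_Z[p(data | Z)] can be estimated by simulating Z under theta and averaging:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_likelihood(theta, data, n_sims=20_000):
    """Monte Carlo likelihood for a toy latent-variable model:
    Z ~ Bernoulli(theta) and, given Z, the observations are i.i.d.
    N(Z, 1). Averaging p(data | Z) over simulated Z estimates
    L(theta)."""
    z = rng.binomial(1, theta, size=n_sims).astype(float)
    dens = np.ones(n_sims)
    for y in data:
        dens *= np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2 * np.pi)
    return dens.mean()

# A crude likelihood surface over a grid of theta values:
data = [1.2, 0.7, 1.5]
print([round(mc_likelihood(t, data), 5) for t in (0.1, 0.5, 0.9)])
```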

Journal Article•DOI•
TL;DR: The thesis in this article is that, for most cases, the tremendous genetic variability among individuals obviates concern arising from minor violations of modeling assumptions.
Abstract: Forensic scientists have used genetic material (DNA) as evidence in criminal cases such as rape and murder since the middle of the last decade. The forensic scientist's interpretation of the evidence, however, has been subject to some criticism, especially when it involves statistical issues (including relevant areas of population genetics in the realm of statistics). These issues include the appropriate method of summarizing data subject to measurement error, independence of events in a DNA pattern or profile; characterization of heterogeneity of populations; appropriate sampling methods to develop reference databases; and probabilistic evaluation of evidence under uncertainty of appropriate reference database. I review these issues, with the goal of making them accessible to the statistical community. My thesis in this article is that, for most cases, the tremendous genetic variability among individuals obviates concern arising from minor violations of modeling assumptions.
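
The modeling assumptions at issue (Hardy-Weinberg equilibrium within a locus, independence across loci) enter through the standard "product rule" profile-frequency calculation. A minimal sketch with hypothetical allele frequencies:

```python
def genotype_frequency(p, q=None):
    """Hardy-Weinberg single-locus genotype frequency:
    p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

# Product rule: multiply across loci assumed independent.
# Hypothetical allele frequencies at three loci:
loci = [(0.1,), (0.05, 0.2), (0.15,)]
profile_frequency = 1.0
for alleles in loci:
    profile_frequency *= genotype_frequency(*alleles)
print(profile_frequency)   # about 4.5e-06
```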

Journal Article•DOI•
TL;DR: It turns out that with the use of suitable statistical estimation techniques, computer simulation procedures and numerical discretization methods it is possible to construct approximations of stochastic integrals with stable measures as integrators, yielding an effective, general method that gives approximate solutions for a wide class of stochastic differential equations involving such integrals.
Abstract: In this paper, we demonstrate some properties of $\alpha$-stable (stable) random variables and processes. It turns out that with the use of suitable statistical estimation techniques, computer simulation procedures and numerical discretization methods it is possible to construct approximations of stochastic integrals with stable measures as integrators. As a consequence we obtain an effective, general method giving approximate solutions for a wide class of stochastic differential equations involving such integrals. Application of computer graphics provides interesting quantitative and visual information on those features of stable variates which distinguish them from their commonly used Gaussian counterparts. It is possible to demonstrate evolution in time of densities with heavy tails of appropriate processes, to visualize the effect of jumps of trajectories, etc. We try to demonstrate that stable variates can be very useful in stochastic modeling of problems of different kinds, arising in science and engineering, which often provide better description of real life phenomena than their Gaussian counterparts.
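
The computer simulation procedures the abstract alludes to start from the ability to generate stable variates at all; the standard route is the Chambers-Mallows-Stuck method, sketched here for the symmetric case (alpha != 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_stable(alpha, size):
    """Chambers-Mallows-Stuck simulation of symmetric alpha-stable
    variates (0 < alpha <= 2, alpha != 1); alpha = 2 recovers a
    (scaled) Gaussian."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # exponential mixer
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))

x = symmetric_stable(1.5, 100_000)
print(np.mean(np.abs(x) > 5))   # far heavier tails than the Gaussian
```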

Journal Article•DOI•
TL;DR: This paper found that when the Book of Genesis is written as two-dimensional arrays, equidistant letter sequences spelling words with related meanings often appear in close proximity and showed that the effect is significant at the level of 0.00002.
Abstract: It has been noted that when the Book of Genesis is written as two-dimensional arrays, equidistant letter sequences spelling words with related meanings often appear in close proximity. Quantitative tools for measuring this phenomenon are developed. Randomization analysis shows that the effect is significant at the level of 0.00002.
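
The reported significance level comes from a randomization analysis, whose general shape is easy to state in code: compare the observed statistic with its distribution over randomly perturbed data. A generic sketch only (the paper's actual proximity statistic and perturbation scheme are far more elaborate), assuming smaller values of the statistic indicate a stronger effect:

```python
import numpy as np

def randomization_pvalue(statistic, data, perturb, n_perm=9999):
    """Generic randomization test: the p-value is the rank of the
    observed statistic among statistics of randomly perturbed data,
    with small values taken as evidence of the effect."""
    observed = statistic(data)
    hits = sum(statistic(perturb(data)) <= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy use: test whether scores are centered below zero via sign flips.
rng = np.random.default_rng(0)
data = rng.normal(loc=-0.3, size=40)
perturb = lambda d: d * rng.choice([-1, 1], size=len(d))
print(randomization_pvalue(np.mean, data, perturb))
```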

Journal Article•DOI•
TL;DR: Careful scrutiny of the Census Bureau's evaluation studies, together with auxiliary sources of information provided by the Bureau, is used to examine the issue of whether the data gathered in the Post Enumeration Survey can provide reliable undercount estimates.
Abstract: The question of whether to adjust the 1990 [US] census using a capture-recapture model has been hotly argued in statistical journals and courtrooms. Most of the arguments to date concern methodological issues rather than data quality. Following the Post Enumeration Survey, which was designed to provide the basic data for adjustment, the Census Bureau carried out various evaluation studies to try to determine the accuracy of the adjusted counts as compared to the census counts. This resulted in the P-project reports, which totaled over a thousand pages of evaluation descriptions and tables. Careful scrutiny of these studies together with auxiliary sources of information provided by the Census Bureau is used to examine the issue of whether the data gathered in the Post Enumeration Survey can provide reliable undercount estimates. Comments and rejoinders on this and related papers are included (pp. 508-37).
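
The capture-recapture model at issue rests on the Lincoln-Petersen (dual-system) estimate, sketched below with made-up block counts; the paper's question is whether the matching data that feed m are reliable enough for the resulting undercount estimates to be trusted:

```python
def dual_system_estimate(census_count, pes_count, matched):
    """Lincoln-Petersen estimate: if the census catches n1 people,
    the PES catches n2, and m appear in both, then N ~ n1 * n2 / m."""
    return census_count * pes_count / matched

n_hat = dual_system_estimate(950, 900, 870)   # toy counts for one block
print(n_hat, 1 - 950 / n_hat)                 # size and undercount rate
```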


Journal Article•DOI•
TL;DR: In this paper, an alternative solution is proposed to Lewis Carroll's problem of the chance that three points taken at random on an infinite plane form the vertices of an obtuse-angled triangle; the solution seems rather natural, should be especially appealing to statisticians, and suggests a method for using transformation groups to give meaning to the phrase "at random" in general situations.
Abstract: On the 100th anniversary (1993) of Lewis Carroll's Pillow Problems, Eugene Seneta presented a selection of the problems the author, Charles Dodgson, claims to have solved while in bed. The selection omits the one problem in continuous probability: "Three points are taken at random on an infinite plane. Find the chance of their being the vertices of an obtuse-angled triangle." Charles Dodgson presents a solution that involves a clear error in conditioning. An alternative solution is suggested here. This solution seems rather natural and should be especially appealing to statisticians. The nature of the solution suggests a method for using transformation groups to give meaning to the phrase "at random" in somewhat general situations.
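
The difficulty with "three points at random on an infinite plane" can be seen numerically: the answer depends on the region from which the points are drawn, so the phrase has no meaning until a convention is fixed. A small simulation (illustrative only, not the paper's solution) for points uniform in the unit square; swapping in a disk sampler gives a visibly different fraction:

```python
import numpy as np

rng = np.random.default_rng(0)

def obtuse_fraction(sampler, n=200_000):
    """Fraction of triangles on three i.i.d. random points that are
    obtuse: with squared side lengths sorted as a <= b <= c, the
    triangle is obtuse iff a + b < c."""
    pts = sampler((n, 3, 2))
    d2 = np.sort([np.sum((pts[:, i] - pts[:, j]) ** 2, axis=1)
                  for i, j in [(0, 1), (0, 2), (1, 2)]], axis=0)
    return np.mean(d2[0] + d2[1] < d2[2])

uniform_square = lambda shape: rng.uniform(0.0, 1.0, shape)
print(obtuse_fraction(uniform_square))   # about 0.725 for the square
```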

Journal Article•DOI•
TL;DR: The authors used 1990 census data to assess the synthetic assumption and found that heterogeneity within poststrata is quite large, with a corresponding impact on local undercount rates estimated by the synthetic method, so any comparison of error rates between the census and adjusted counts should take heterogeneity into account.
Abstract: Current techniques for census adjustment involve the "synthetic assumption" that undercount rates are constant within "post-strata" across geographical areas. A poststratum is a subgroup of people with given demographic characteristics; poststrata are chosen to minimize heterogeneity in undercount rates. This paper will use 1990 census data to assess the synthetic assumption. We find that heterogeneity within poststrata is quite large, with a corresponding impact on local undercount rates estimated by the synthetic method. Thus, any comparison of error rates between the census and adjusted counts should take heterogeneity into account.
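
Mechanically, synthetic estimation applies one national adjustment factor per poststratum to every area; the assumption under test is that the true local factors equal these shared ones. A toy sketch with hypothetical counts:

```python
import numpy as np

# Rows = areas, columns = poststrata (hypothetical census counts).
census = np.array([[400.0, 100.0],
                   [ 50.0, 450.0]])
factor = np.array([1.05, 1.01])    # one national factor per poststratum

adjusted = census * factor         # synthetic: same factor everywhere
undercount = 1 - census.sum(axis=1) / adjusted.sum(axis=1)
print(undercount)                  # local rates implied by the model
```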

Journal Article•DOI•
TL;DR: A major goal of the Human Genome Project is to construct physical maps of the entire human genome; this article describes what physical maps are, how they have been used to isolate genes responsible for serious inherited diseases, and the statistical issues involved in making them.
Abstract: One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like Huntington's disease, cystic fibrosis and myotonic dystrophy. Instrumental in these efforts has been the construction of so-called physical maps of regions of human chromosomes. A major goal of the Human Genome Project is to construct physical maps of the entire human genome. Such maps will reduce the time and expense required to isolate and study interesting chromosomal regions by many orders of magnitude. This article describes what physical maps are and how they have been used, and it outlines some of the statistical issues involved in making them.
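
One classic statistical calculation behind physical mapping is the Lander-Waterman model for random clone libraries, stated here in its simplest form (no minimum detectable overlap; the clone numbers are illustrative):

```python
import math

def lander_waterman(n_clones, clone_len, genome_len):
    """Random-clone mapping expectations: with coverage c = N*L/G,
    a fraction 1 - exp(-c) of the genome is covered and about
    N*exp(-c) separate "islands" (contigs) remain."""
    c = n_clones * clone_len / genome_len
    return 1 - math.exp(-c), n_clones * math.exp(-c)

# e.g. 33,000 clones of 40 kb against a 3.3 Gb genome (c = 0.4):
print(lander_waterman(33_000, 40_000, 3.3e9))
```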


Journal Article•DOI•
TL;DR: In this paper, the authors provide context for decisions about census-taking strategy and comment on the recent literature on census adjustment, including the papers by Freedman and Wachter and by Breiman contained in this issue; they also discuss the Census Bureau's plans for the year 2000.
Abstract: After providing context for decisions about census-taking strategy, we comment on the recent literature on census adjustment, including the papers by Freedman and Wachter and by Breiman contained in this issue; we also discuss the Census Bureau's plans for the year 2000. We conclude that the 1990 approach to summarizing the accuracy of an adjusted census can be improved upon but that many of the criticisms of census adjustment do not reflect a balanced decision-making perspective. We also conclude that the Census Bureau is pursuing constructive research in evaluating a 'one-number census' and we suggest that statisticians have a role to play in avoiding the costly legal battles that have plagued recent censuses by assisting in the process of deciding on a design for the 2000 census. Comments and rejoinders on this and related papers are included (pp. 508-37).


Journal Article•DOI•
TL;DR: Physical oceanography is the study of the physics of the ocean, a discipline that encompasses a very broad diversity of phenomena, ranging from the smallest space and time scales of order 1 second and 1 cm associated with vertical turbulent mixing, to the largest space and time scales of order centuries and 10,000 km associated with global climate variations.
Abstract: Physical oceanography is the study of the physics of the ocean. As such, the discipline encompasses a very broad diversity of phenomena, ranging from the smallest space and time scales of order 1 second and 1 cm associated with vertical turbulent mixing, to the largest space and time scales of order centuries and 10,000 km associated with global climate variations. The processes occurring at different scales interact in very complicated ways. The multiscale characteristics of physical oceanographic data require sophisticated statistical analysis techniques to investigate a specific process and its interactions with other processes. Collaborative interactions between physical oceanographers and statisticians could potentially result in the development of new and innovative statistical techniques that could improve the present understanding of physical oceanography. The Statistics and Physical Oceanography report reproduced in this volume represents one element of an effort by the Office of Naval Research to stimulate more collaborations between the two disciplines. This introduction to the report provides a framework for understanding the context of the report. For the benefit of statisticians with little or no prior exposure to physical oceanography, this introduction also provides a brief survey of the general topics of physical oceanographic research, a description of the temporal and spatial scales of physical oceanographic data, and a summary of the demographics of physical oceanographers.

Journal Article•DOI•
TL;DR: The polymerase chain reaction makes possible rapid generation of a very large number of copies of a specific region of DNA, which enables the typing of quantities of DNA as small as a single molecule.
Abstract: The polymerase chain reaction (PCR) makes possible rapid generation of a very large number of copies of a specific region of DNA. This enables the typing of quantities of DNA as small as a single molecule. PCR has led to the development of laboratory experiments which provide new approaches to many classic problems in genetics, such as estimation of linkage, marker ordering and genetic disease diagnosis. We describe some of these experiments and the statistical techniques that have been used to design them and to analyze the data they produce.
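
The amplification that makes single-molecule typing possible is geometric: each cycle multiplies the copy count by (1 + efficiency). A minimal sketch; the 0.8 efficiency is an illustrative value, not from the paper:

```python
def pcr_copies(initial, cycles, efficiency=0.8):
    """Expected copy number after PCR: each cycle multiplies the
    count by (1 + efficiency), where efficiency is the per-molecule
    duplication probability (1.0 means perfect doubling)."""
    return initial * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))   # one molecule -> roughly 4.6e7 copies
```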


Journal Article•DOI•
TL;DR: This comment discusses what opportunities there might be to expand the class of statistical models for small area data and to consider multivariate aspects of small area estimation.
Abstract: Malay Ghosh and Jon Rao have presented us with a well written exposition of the topic of small area estimation. The past literature has been decidedly influenced by linear modeling, and we see that clearly in their paper. There has also been a tendency to judge the performance of the estimation methods by concentrating on a single, arbitrary small area. In our comment, we shall discuss what opportunities there might be to expand the class of statistical models for small area data and to consider multivariate aspects of small area estimation.

Journal Article•DOI•
TL;DR: Kruskal was a member of the 1970-71 President's Commission on Federal Statistics, out of which grew the present Committee on National Statistics in the National Academy of Sciences-National Research Council.
Abstract: William Henry Kruskal was born in New York City on 10 October 1919. His basic education was primarily in the public schools of New Rochelle, New York, a suburb of New York City. He attended Antioch College for two years and then transferred to Harvard College, from which he received the S.B. degree in 1940 and an M.S. in mathematics in 1941. Then he went to the U.S. Naval Proving Ground in Dahlgren, Virginia, first as a civilian and later with a USN commission. After World War II, he worked toward the Ph.D. in mathematical statistics at Columbia University, but joined the faculty of the newly formed statistics group at the University of Chicago before he completed and received his degree in 1955. He has remained at Chicago except for a summer at Harvard University and visits to the University of California at Berkeley and the Center for Advanced Study in the Behavioral Sciences at Stanford. Kruskal's research and teaching are closely linked. Among his primary research areas have been linear structures, nonparametric procedures, the taking of censuses, government statistics in general, the history of statistics, clarification of such concepts as representative sampling and normality, miracles and statistics, and the relative importances of causelike variables. From 1958 to 1961, Kruskal was the Editor of The Annals of Mathematical Statistics. He was a member of the 1970-71 President's Commission on Federal Statistics, out of which grew the present Committee on National Statistics in the National Academy of Sciences-National Research Council. Kruskal headed this Committee during its first six years. A different kind of activity was his editorship of the statistical part of the International Encyclopedia of the Social Sciences and his co-editorship (with Judith M. Tanur) of the International Encyclopedia of Statistics. He has for years been a trustee of the National Opinion Research Center (NORC). He held a Senior Postdoctoral NSF fellowship and was a Fellow of the John Simon Guggenheim Memorial Foundation. He has held a number of offices in professional organizations, including the presidencies of both the Institute of Mathematical Statistics and the American Statistical Association. At the University of Chicago, Kruskal chaired the Department of Statistics for six years and later was Dean of the University's Division of the Social Sciences for nine years. He also served in 1988-89 as Dean Pro Tempore of what is now the Irving B. Harris Graduate School of Public Policy Studies. Since 1973, he has been an Ernest DeWitt Burton Distinguished Service Professor (now Emeritus). In 1942 he and Norma Jane Evans, alas no longer alive, were married. There are three children: Vincent Joseph, Thomas Evan and Jonas David.