
Showing papers in "Statistical Science in 1994"


Journal Article•DOI•
TL;DR: This paper informs a statistical readership about Artificial Neural Networks (ANNs), points out some of the links with statistical methodology, encourages cross-disciplinary research in the directions most likely to bear fruit, and then treats several topics (perceptrons, Hopfield-type recurrent networks and associative memory networks) in more depth.
Abstract: This paper informs a statistical readership about Artificial Neural Networks (ANNs), points out some of the links with statistical methodology and encourages cross-disciplinary research in the directions most likely to bear fruit. The areas of statistical interest are briefly outlined, and a series of examples indicates the flavor of ANN models. We then treat various topics in more depth. In each case, we describe the neural network architectures and training rules and provide a statistical commentary. The topics treated in this way are perceptrons (from single-unit to multilayer versions), Hopfield-type recurrent networks (including probabilistic versions strongly related to statistical physics and Gibbs distributions) and associative memory networks trained by so-called unsupervised learning rules. Perceptrons are shown to have strong associations with discriminant analysis and regression, and unsupervised networks with cluster analysis. The paper concludes with some thoughts on the future of the interface between neural networks and statistics.
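
The perceptron-regression link the abstract mentions can be made concrete: a single logistic unit trained by gradient ascent on the Bernoulli log-likelihood is just logistic regression fitted by a crude optimiser. A minimal illustrative sketch (not the paper's own code; data and learning rate are hypothetical):

```python
import numpy as np

def train_single_unit(X, y, lr=0.1, epochs=500):
    """One logistic unit trained by gradient ascent on the Bernoulli
    log-likelihood; statistically, this is logistic regression.
    X is (n, p), y is a 0/1 vector of length n."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # the unit's activation
        w += lr * X.T @ (y - p) / len(y)      # likelihood gradient step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(size=200) > 0).astype(float)
print(train_single_unit(X, y))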

1,114 citations


Journal Article•DOI•
TL;DR: When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules.
Abstract: Conducting sample surveys, imputing incomplete observations, and analyzing the resulting data are three indispensable phases of modern practice with public-use data files and with many other statistical applications. Each phase inherits different input, including the information preceding it and the intellectual assessments available, and aims to provide output that is one step closer to arriving at statistical inferences with scientific relevance. However, the role of the imputation phase has often been viewed as merely providing computational convenience for users of data. Although facilitating computation is very important, such a viewpoint ignores the imputer's assessments and information inaccessible to the users. This view underlies the recent controversy over the validity of multiple-imputation inference when a procedure for analyzing multiply imputed data sets cannot be derived from (is "uncongenial" to) the model adopted for multiple imputation. Given sensible imputations and complete-data analysis procedures, inferences from standard multiple-imputation combining rules are typically superior to, and thus different from, users' incomplete-data analyses. The latter may suffer from serious nonresponse biases because such analyses often must rely on convenient but unrealistic assumptions about the nonresponse mechanism. When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules. These points are reviewed and explored by simple examples and general theory, from both Bayesian and frequentist perspectives, particularly from the randomization perspective. Some convenient terms are suggested for facilitating communication among researchers from different perspectives when evaluating multiple-imputation inferences with uncongenial sources of input.
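
The "standard combining rules" are Rubin's rules for multiply imputed data. As a concrete reference point, here is a minimal sketch (not from the paper) of how the m completed-data results are pooled:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's combining rules.
    Returns the combined estimate, its total variance and the
    reference degrees of freedom."""
    m = len(estimates)
    q_bar = np.mean(estimates)              # combined point estimate
    u_bar = np.mean(variances)              # within-imputation variance
    b = np.var(estimates, ddof=1)           # between-imputation variance
    t = u_bar + (1 + 1 / m) * b             # total variance
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

# e.g. the same analysis run on five imputed data sets:
print(rubin_combine([2.1, 2.4, 1.9, 2.2, 2.0],
                    [0.30, 0.28, 0.33, 0.29, 0.31]))
```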

790 citations


Journal Article•DOI•
TL;DR: Empirical best linear unbiased prediction as well as empirical and hierarchical Bayes seem to have a distinct advantage over other methods in small area estimation.
Abstract: Small area estimation is becoming important in survey sampling due to a growing demand for reliable small area statistics from both public and private sectors. It is now widely recognized that direct survey estimates for small areas are likely to yield unacceptably large standard errors due to the smallness of sample sizes in the areas. This makes it necessary to "borrow strength" from related areas to find more accurate estimates for a given area or, simultaneously, for several areas. This has led to the development of alternative methods such as synthetic, sample size dependent, empirical best linear unbiased prediction, empirical Bayes and hierarchical Bayes estimation. The present article is largely an appraisal of some of these methods. The performance of these methods is also evaluated using some synthetic data resembling a business population. Empirical best linear unbiased prediction as well as empirical and hierarchical Bayes, for most purposes, seem to have a distinct advantage over other methods.
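
To make the "borrowing strength" concrete: under the widely used Fay-Herriot area-level model (one of the settings appraised here), the best linear unbiased predictor shrinks each direct estimate toward a regression prediction. A minimal sketch, assuming known sampling variances D_i and a given model variance; in practice sigma2_v is estimated from the data, which is what makes the predictor "empirical":

```python
import numpy as np

def fay_herriot_eblup(y, X, D, sigma2_v):
    """BLUP of small area means under the Fay-Herriot model
    y_i = x_i'beta + v_i + e_i, with v_i ~ N(0, sigma2_v) and known
    sampling variances D_i; plugging in an estimated sigma2_v gives
    the EBLUP."""
    w = 1.0 / (sigma2_v + D)                       # GLS weights
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    gamma = sigma2_v / (sigma2_v + D)              # shrinkage factors
    return gamma * y + (1 - gamma) * (X @ beta)    # direct vs. synthetic

# Areas with noisy direct estimates (large D_i) are shrunk hardest:
y = np.array([10.0, 14.0, 8.0]); D = np.array([4.0, 0.5, 9.0])
X = np.column_stack([np.ones(3), [1.0, 2.0, 0.5]])
print(fay_herriot_eblup(y, X, D, sigma2_v=2.0))
```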

738 citations


Journal Article•DOI•
TL;DR: In this article, the authors discuss some aspects of estimation and inference that arise in the study of mitochondrial DNA sequence variability, focusing in particular on the estimation of substitution rates and their use in calibrating estimates of the time since the most recent common ancestor of a sample of sequences.
Abstract: Mitochondrial DNA sequence variation is now being used to study the history of our species. In this paper we discuss some aspects of estimation and inference that arise in the study of such variability, focusing in particular on the estimation of substitution rates and their use in calibrating estimates of the time since the most recent common ancestor of a sample of sequences. Observed DNA sequence variation is generated by superimposing the effects of mutation on the ancestral tree of the sequences. For data of the type studied here, this ancestral tree has to be modeled as a random process. Superimposing the effects of mutation produces complicated sampling distributions that form the basis of any statistical model for the data. Using such distributions--for example, for maximum likelihood estimation of rates--poses some difficult computational problems. We describe a Monte Carlo method, a cousin of the popular "Markov chain Monte Carlo," that has proved very useful in addressing some of these issues.
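
The paper's Monte Carlo machinery is a Markov-chain cousin of the following simpler idea: simulate the random ancestral tree, superimpose Poisson mutations, and match moments. A toy sketch under the standard coalescent (not the authors' algorithm), recovering the mutation rate via Watterson's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

def segregating_sites(n, theta):
    """Number of variable sites in a sample of n sequences: while k
    lineages remain, the coalescence time is exponential with mean
    2/(k(k-1)), and mutations fall as a Poisson process of rate
    theta/2 along the total branch length."""
    length = sum(k * rng.exponential(2.0 / (k * (k - 1)))
                 for k in range(n, 1, -1))
    return rng.poisson(theta * length / 2.0)

# Watterson's estimator inverts E[S] = theta * sum_{i=1}^{n-1} 1/i:
n, theta = 20, 5.0
s_bar = np.mean([segregating_sites(n, theta) for _ in range(2000)])
print(s_bar / sum(1.0 / i for i in range(1, n)))   # roughly 5.0
```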

426 citations


Journal Article•DOI•
TL;DR: A conversation with Sir David Cox, who worked at the Royal Aircraft Establishment and at the Wool Industries Research Association in Leeds before becoming an assistant lecturer at the University of Cambridge from 1950 to 1955 and then visiting the United States for 15 months, mainly at the University of North Carolina.
Abstract: David Roxbee Cox was born in Birmingham on July 15, 1924. He attended Handsworth Grammar School and St. John's College, Cambridge. From 1944 to 1946 he was employed at the Royal Aircraft Establishment, and from 1946 to 1950 he was employed at the Wool Industries Research Association in Leeds. He obtained his Ph.D. from the University of Leeds in 1949. He was an assistant lecturer at the University of Cambridge from 1950 to 1955, and then visited the United States for 15 months, mainly at the University of North Carolina. From 1956 to 1966 he was Reader and then Professor of Statistics at Birkbeck College, London, and from 1966 to 1988 was Professor of Statistics at Imperial College, London. In 1988 he moved to Oxford to become the Warden of Nuffield College, a post from which he retired on July 31, 1994. He is now an Honorary Fellow of Nuffield College and a member of the Department of Statistics at the University of Oxford. In 1947 he married Joyce Drummond. They have four children and two grandchildren. Among his many honours, Sir David has received to date 10 honorary doctorates, an honorary fellowship from St. John's College, Cambridge, and honorary membership in four international academies. He has been awarded the Guy medals in Silver (1961) and Gold (1973) by the Royal Statistical Society. He was elected Fellow of the Royal Society of London in 1973 and was knighted in 1985. In 1990 he won the Kettering prize and gold medal for cancer research. He has authored or coauthored over 200 papers and 15 books. A list of his publications through 1988 is included in Hinkley, Reid and Snell (1991). From 1966 through 1991 he was the editor of Biometrika. He has supervised, encouraged and collaborated with innumerable students, postdoctoral fellows and colleagues. He has served as president of the Bernoulli Society and the Royal Statistical Society, and he is president-elect of the International Statistical Institute. This conversation took place in Sir David's office at Nuffield College on October 26 and 27, 1993.

159 citations


Journal Article•DOI•
TL;DR: A critical review of recent research activity on bootstrap and related procedures is given in this paper, where the authors argue that much theoretical work is not serving the immediate needs of statistical practice.
Abstract: A critical review is given of recent research activity on bootstrap and related procedures. Theoretical work has shown the bootstrap approach to be a potentially powerful addition to the statistician's toolkit. We consider its impact on statistical practice and argue that, measured against the hopes raised by theoretical advances, this has been until now fairly modest. We suggest that while this state of affairs is a consequence to be expected of the sophisticated character of the bootstrap procedures required to cope reliably in many of the settings of most interest, much theoretical work is not serving the immediate needs of statistical practice. Emerging lines of research are reviewed and important future research directions suggested. In particular, we appeal for greater focussing of research activity on practicalities.
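
For orientation, the basic resampling idea under review fits in a few lines. The percentile interval below is the simplest of the procedures the paper has in mind, and precisely the kind of naive recipe the authors caution can be unreliable in harder settings (a minimal sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_percentile_ci(x, stat, n_boot=2000, alpha=0.05):
    """Nonparametric bootstrap percentile interval for stat(x):
    resample the data with replacement, recompute the statistic,
    and read off the empirical quantiles."""
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

data = rng.exponential(size=50)
print(bootstrap_percentile_ci(data, np.median))
```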

158 citations


Journal Article•DOI•
TL;DR: This paper extends Poisson approximation techniques, via the Aldous clumping heuristic, into a practical method for estimating the statistical significance of sequence alignment scores with gaps.
Abstract: The Chen-Stein method of Poisson approximation has been used to establish theorems about comparison of two DNA or protein sequences. The most useful result for sequence alignment applies to alignment scoring with no gaps. However, there has not been a valid method to assign statistical significance to alignment scores with gaps. In this paper we extend Poisson approximation techniques using the Aldous clumping heuristic to a practical method of estimating statistical significance.
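
In the established no-gap case, the Poisson (Chen-Stein) approximation yields the familiar Karlin-Altschul tail formula sketched below; the paper's contribution is extending this style of calculation to gapped scores. The constants K and lambda depend on the scoring scheme, and the values used here are hypothetical:

```python
import math

def ungapped_alignment_pvalue(score, m, n, K, lam):
    """Poisson approximation for the best ungapped local alignment
    score of sequences of lengths m and n: the number of segment
    pairs scoring >= score is roughly Poisson with mean
    K*m*n*exp(-lam*score), so P(max score >= score) is one minus
    the Poisson probability of zero such segments."""
    e_value = K * m * n * math.exp(-lam * score)   # expected count
    return 1.0 - math.exp(-e_value)

# Hypothetical scoring-scheme constants, for illustration only:
print(ungapped_alignment_pvalue(score=45.0, m=1000, n=1000,
                                K=0.1, lam=0.27))
```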

152 citations


Journal Article•DOI•
TL;DR: In this paper, a study of the use of citation data to investigate the role statistics journals play in communication within the field and between statistics and other fields is presented.
Abstract: This is a study of the use of citation data to investigate the role statistics journals play in communication within that field and between statistics and other fields. The study looks at citations as import-export statistics reflecting intellectual influence. The principal findings include: there is little variability in both the number and diversity of imports, but great variability in both the number and diversity of exports and hence in the balance of trade; there is a tendency for influence to flow from theory to applications to a much greater extent than in the reverse direction; there is little communication between statistics and probability journals. The export scores model is introduced and employed to map a set of journals' bilateral intellectual influences onto a one-dimensional scale, and the Cox effect is identified as a phenomenon that can occur when a disciplinary paper attracts a large degree of attention from outside its discipline.
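
The import-export metaphor reduces to simple arithmetic on a journal-by-journal citation matrix. A toy illustration with hypothetical counts (not the study's data): citing another journal imports influence, being cited exports it.

```python
import numpy as np

# Toy citation matrix: C[i, j] = citations from journal i to journal j.
C = np.array([[50, 10,  2],
              [ 4, 60,  1],
              [20, 15, 40]])

internal = np.diag(C)
imports = C.sum(axis=1) - internal   # citations each journal makes
exports = C.sum(axis=0) - internal   # citations each journal receives
balance = exports - imports          # the "balance of trade"
print(balance)
```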

135 citations



Journal Article•DOI•
TL;DR: Number-theoretic methods (NTMs) are a class of techniques by which representative points of the uniform distribution on the unit cube of $R^s$ can be generated.
Abstract: Number-theoretic methods (NTMs) are a class of techniques by which representative points of the uniform distribution on the unit cube of $R^s$ can be generated. NTMs have been widely used in numerical analysis, especially in evaluation of high-dimensional integrals. Recently, NTMs have been extended to generate representative points for many useful multivariate distributions and have been systematically applied in statistics. In this paper, we shall introduce NTMs and review their applications in statistics, such as evaluation of the expected value of a random vector, statistical inference, regression analysis, geometric probability and experimental design.
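
The flavor of such point sets is easy to convey in code. The sketch below uses the Halton sequence, a standard low-discrepancy construction (the paper's NTMs, such as good lattice points, differ in detail), to estimate a 5-dimensional integral:

```python
import numpy as np

def halton(n, dim):
    """First n points of the Halton sequence in [0,1)^dim, a simple
    low-discrepancy ("uniformly scattered") point set."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:dim]
    pts = np.empty((n, dim))
    for j, base in enumerate(primes):
        for i in range(1, n + 1):
            f, x, k = 1.0, 0.0, i
            while k > 0:                 # radical inverse of i in base
                f /= base
                x += f * (k % base)
                k //= base
            pts[i - 1, j] = x
    return pts

# Estimate of the integral of prod_j cos(x_j) over [0,1]^5 (= sin(1)^5):
pts = halton(4096, 5)
print(np.mean(np.prod(np.cos(pts), axis=1)))
```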

95 citations


Journal Article•DOI•
TL;DR: In this work, multilocus segregation indicators are defined and proposed as the latent variables of choice in the case of very few individuals observed in each pedigree structure, such as occurs in homozygosity mapping and affected relative pair methods of genetic mapping.
Abstract: Monte Carlo likelihood is becoming increasingly used where exact likelihood analysis is computationally infeasible. One area in which such likelihoods arise is that of genetic mapping, where, increasingly, researchers wish to extract additional information from limited trait data through the use of multiple genetic markers. In the genetic analysis context, Monte Carlo likelihood is most conveniently considered as a latent variable problem. Markov chain Monte Carlo provides a method of obtaining realisations of underlying latent variables simulated under a genetic model, conditional upon observed data. Hence a Monte Carlo estimate of the likelihood surface can be formed. Choice of the latent variables can be as critical as choice of sampler. In the case of very few individuals observed in each pedigree structure, such as occurs in homozygosity mapping and affected relative pair methods of genetic mapping, multilocus segregation indicators are defined and proposed as the latent variables of choice. An example of five Werner's syndrome pedigrees is given; these are a subset of the 21 pedigrees on which homozygosity mapping has recently confirmed the location of the Werner's syndrome gene on chromosome 8. However, multilocus computations on these pedigrees are impractical with standard methods of exact likelihood computation.
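
The latent-variable idea can be shown with a deliberately tiny toy model (not the pedigree samplers of the paper, which draw the latent variables conditional on the data by MCMC): if Z is latent and the data are modeled given Z, then L(theta) = E_Z[p(data | Z)] can be estimated by simulating Z under theta and averaging:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_likelihood(theta, data, n_sims=20_000):
    """Monte Carlo likelihood for a toy latent-variable model:
    Z ~ Bernoulli(theta) and, given Z, the observations are i.i.d.
    N(Z, 1). Averaging p(data | Z) over simulated Z estimates
    L(theta)."""
    z = rng.binomial(1, theta, size=n_sims).astype(float)
    dens = np.ones(n_sims)
    for y in data:
        dens *= np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2 * np.pi)
    return dens.mean()

# A crude likelihood surface over a grid of theta values:
data = [1.2, 0.7, 1.5]
print([round(mc_likelihood(t, data), 5) for t in (0.1, 0.5, 0.9)])
```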

Journal Article•DOI•
TL;DR: The thesis in this article is that, for most cases, the tremendous genetic variability among individuals obviates concern arising from minor violations of modeling assumptions.
Abstract: Forensic scientists have used genetic material (DNA) as evidence in criminal cases such as rape and murder since the middle of the last decade. The forensic scientist's interpretation of the evidence, however, has been subject to some criticism, especially when it involves statistical issues (including relevant areas of population genetics in the realm of statistics). These issues include the appropriate method of summarizing data subject to measurement error, independence of events in a DNA pattern or profile; characterization of heterogeneity of populations; appropriate sampling methods to develop reference databases; and probabilistic evaluation of evidence under uncertainty of appropriate reference database. I review these issues, with the goal of making them accessible to the statistical community. My thesis in this article is that, for most cases, the tremendous genetic variability among individuals obviates concern arising from minor violations of modeling assumptions.
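
The modeling assumptions at issue (Hardy-Weinberg equilibrium within a locus, independence across loci) enter through the standard "product rule" profile-frequency calculation. A minimal sketch with hypothetical allele frequencies:

```python
def genotype_frequency(p, q=None):
    """Hardy-Weinberg single-locus genotype frequency:
    p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

# Product rule: multiply across loci assumed independent.
# Hypothetical allele frequencies at three loci:
loci = [(0.1,), (0.05, 0.2), (0.15,)]
profile_frequency = 1.0
for alleles in loci:
    profile_frequency *= genotype_frequency(*alleles)
print(profile_frequency)   # about 4.5e-06
```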

Journal Article•DOI•
TL;DR: It turns out that with the use of suitable statistical estimation techniques, computer simulation procedures and numerical discretization methods it is possible to construct approximations of stochastic integrals with stable measures as integrators, yielding an effective, general method that gives approximate solutions for a wide class of stochastic differential equations involving such integrals.
Abstract: In this paper, we demonstrate some properties of $\alpha$-stable (stable) random variables and processes. It turns out that with the use of suitable statistical estimation techniques, computer simulation procedures and numerical discretization methods it is possible to construct approximations of stochastic integrals with stable measures as integrators. As a consequence we obtain an effective, general method giving approximate solutions for a wide class of stochastic differential equations involving such integrals. Application of computer graphics provides interesting quantitative and visual information on those features of stable variates which distinguish them from their commonly used Gaussian counterparts. It is possible to demonstrate evolution in time of densities with heavy tails of appropriate processes, to visualize the effect of jumps of trajectories, etc. We try to demonstrate that stable variates can be very useful in stochastic modeling of problems of different kinds, arising in science and engineering, which often provide better description of real life phenomena than their Gaussian counterparts.
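
The computer simulation procedures the abstract alludes to start from the ability to generate stable variates at all; the standard route is the Chambers-Mallows-Stuck method, sketched here for the symmetric case (alpha != 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_stable(alpha, size):
    """Chambers-Mallows-Stuck simulation of symmetric alpha-stable
    variates (0 < alpha <= 2, alpha != 1); alpha = 2 recovers a
    (scaled) Gaussian."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # exponential mixer
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))

x = symmetric_stable(1.5, 100_000)
print(np.mean(np.abs(x) > 5))   # far heavier tails than the Gaussian
```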

Journal Article•DOI•
TL;DR: This paper found that when the Book of Genesis is written as two-dimensional arrays, equidistant letter sequences spelling words with related meanings often appear in close proximity and showed that the effect is significant at the level of 0.00002.
Abstract: It has been noted that when the Book of Genesis is written as two-dimensional arrays, equidistant letter sequences spelling words with related meanings often appear in close proximity. Quantitative tools for measuring this phenomenon are developed. Randomization analysis shows that the effect is significant at the level of 0.00002.
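
The reported significance level comes from a randomization analysis, whose general shape is easy to state in code: compare the observed statistic with its distribution over randomly perturbed data. A generic sketch only (the paper's actual proximity statistic and perturbation scheme are far more elaborate), assuming smaller values of the statistic indicate a stronger effect:

```python
import numpy as np

def randomization_pvalue(statistic, data, perturb, n_perm=9999):
    """Generic randomization test: the p-value is the rank of the
    observed statistic among statistics of randomly perturbed data,
    with small values taken as evidence of the effect."""
    observed = statistic(data)
    hits = sum(statistic(perturb(data)) <= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy use: test whether scores are centered below zero via sign flips.
rng = np.random.default_rng(0)
data = rng.normal(loc=-0.3, size=40)
perturb = lambda d: d * rng.choice([-1, 1], size=len(d))
print(randomization_pvalue(np.mean, data, perturb))
```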

Journal Article•DOI•
TL;DR: Careful scrutiny of the Census Bureau's evaluation studies, together with auxiliary sources of information provided by the Bureau, is used to examine the issue of whether the data gathered in the Post Enumeration Survey can provide reliable undercount estimates.
Abstract: The question of whether to adjust the 1990 [US] census using a capture-recapture model has been hotly argued in statistical journals and courtrooms. Most of the arguments to date concern methodological issues rather than data quality. Following the Post Enumeration Survey, which was designed to provide the basic data for adjustment, the Census Bureau carried out various evaluation studies to try to determine the accuracy of the adjusted counts as compared to the census counts. This resulted in the P-project reports, which totaled over a thousand pages of evaluation descriptions and tables. Careful scrutiny of these studies together with auxiliary sources of information provided by the Census Bureau is used to examine the issue of whether the data gathered in the Post Enumeration Survey can provide reliable undercount estimates. Comments and rejoinders on this and related papers are included (pp. 508-37).
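
The capture-recapture model at issue rests on the Lincoln-Petersen (dual-system) estimate, sketched below with made-up block counts; the paper's question is whether the matching data that feed m are reliable enough for the resulting undercount estimates to be trusted:

```python
def dual_system_estimate(census_count, pes_count, matched):
    """Lincoln-Petersen estimate: if the census catches n1 people,
    the PES catches n2, and m appear in both, then N ~ n1 * n2 / m."""
    return census_count * pes_count / matched

n_hat = dual_system_estimate(950, 900, 870)   # toy counts for one block
print(n_hat, 1 - 950 / n_hat)                 # size and undercount rate
```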


Journal Article•DOI•
TL;DR: In this paper, an alternative solution is proposed to Lewis Carroll's problem of the chance that three points taken at random on an infinite plane form the vertices of an obtuse-angled triangle; the solution seems rather natural, should be especially appealing to statisticians, and suggests a method for using transformation groups to give meaning to the phrase "at random" in general situations.
Abstract: On the 100th anniversary (1993) of Lewis Carroll's Pillow Problems, Eugene Seneta presented a selection of the problems the author, Charles Dodgson, claims to have solved while in bed. The selection omits the one problem in continuous probability: "Three points are taken at random on an infinite plane. Find the chance of their being the vertices of an obtuse-angled triangle." Charles Dodgson presents a solution that involves a clear error in conditioning. An alternative solution is suggested here. This solution seems rather natural and should be especially appealing to statisticians. The nature of the solution suggests a method for using transformation groups to give meaning to the phrase "at random" in somewhat general situations.
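
The difficulty with "three points at random on an infinite plane" can be seen numerically: the answer depends on the region from which the points are drawn, so the phrase has no meaning until a convention is fixed. A small simulation (illustrative only, not the paper's solution) for points uniform in the unit square; swapping in a disk sampler gives a visibly different fraction:

```python
import numpy as np

rng = np.random.default_rng(0)

def obtuse_fraction(sampler, n=200_000):
    """Fraction of triangles on three i.i.d. random points that are
    obtuse: with squared side lengths sorted as a <= b <= c, the
    triangle is obtuse iff a + b < c."""
    pts = sampler((n, 3, 2))
    d2 = np.sort([np.sum((pts[:, i] - pts[:, j]) ** 2, axis=1)
                  for i, j in [(0, 1), (0, 2), (1, 2)]], axis=0)
    return np.mean(d2[0] + d2[1] < d2[2])

uniform_square = lambda shape: rng.uniform(0.0, 1.0, shape)
print(obtuse_fraction(uniform_square))   # about 0.725 for the square
```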

Journal Article•DOI•
TL;DR: The authors used 1990 census data to assess the synthetic assumption and found that heterogeneity within poststrata is quite large, with a corresponding impact on local undercount rates estimated by the synthetic method, so any comparison of error rates between the census and adjusted counts should take heterogeneity into account.
Abstract: Current techniques for census adjustment involve the "synthetic assumption" that undercount rates are constant within "post-strata" across geographical areas. A poststratum is a subgroup of people with given demographic characteristics; poststrata are chosen to minimize heterogeneity in undercount rates. This paper will use 1990 census data to assess the synthetic assumption. We find that heterogeneity within poststrata is quite large, with a corresponding impact on local undercount rates estimated by the synthetic method. Thus, any comparison of error rates between the census and adjusted counts should take heterogeneity into account.
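
Mechanically, synthetic estimation applies one national adjustment factor per poststratum to every area; the assumption under test is that the true local factors equal these shared ones. A toy sketch with hypothetical counts:

```python
import numpy as np

# Rows = areas, columns = poststrata (hypothetical census counts).
census = np.array([[400.0, 100.0],
                   [ 50.0, 450.0]])
factor = np.array([1.05, 1.01])    # one national factor per poststratum

adjusted = census * factor         # synthetic: same factor everywhere
undercount = 1 - census.sum(axis=1) / adjusted.sum(axis=1)
print(undercount)                  # local rates implied by the model
```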

Journal Article•DOI•
TL;DR: A major goal of the Human Genome Project is to construct physical maps of the entire human genome; this article describes what physical maps are, how they have been used to isolate genes responsible for serious inherited diseases, and the statistical issues involved in making them.
Abstract: One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like Huntington's disease, cystic fibrosis and myotonic dystrophy. Instrumental in these efforts has been the construction of so-called physical maps of regions of human chromosomes. A major goal of the Human Genome Project is to construct physical maps of the entire human genome. Such maps will reduce the time and expense required to isolate and study interesting chromosomal regions by many orders of magnitude. This article describes what physical maps are and how they have been used, and it outlines some of the statistical issues involved in making them.
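
One classic statistical calculation behind physical mapping is the Lander-Waterman model for random clone libraries, stated here in its simplest form (no minimum detectable overlap; the clone numbers are illustrative):

```python
import math

def lander_waterman(n_clones, clone_len, genome_len):
    """Random-clone mapping expectations: with coverage c = N*L/G,
    a fraction 1 - exp(-c) of the genome is covered and about
    N*exp(-c) separate "islands" (contigs) remain."""
    c = n_clones * clone_len / genome_len
    return 1 - math.exp(-c), n_clones * math.exp(-c)

# e.g. 33,000 clones of 40 kb against a 3.3 Gb genome (c = 0.4):
print(lander_waterman(33_000, 40_000, 3.3e9))
```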


Journal Article•DOI•
TL;DR: In this paper, the authors provide context for decisions about census-taking strategy and comment on the recent literature on census adjustment, including the papers by Freedman and Wachter and by Breiman contained in this issue; they also discuss the Census Bureau's plans for the year 2000.
Abstract: After providing context for decisions about census-taking strategy, we comment on the recent literature on census adjustment, including the papers by Freedman and Wachter and by Breiman contained in this issue; we also discuss the Census Bureau's plans for the year 2000. We conclude that the 1990 approach to summarizing the accuracy of an adjusted census can be improved upon but that many of the criticisms of census adjustment do not reflect a balanced decision-making perspective. We also conclude that the Census Bureau is pursuing constructive research in evaluating a 'one-number census' and we suggest that statisticians have a role to play in avoiding the costly legal battles that have plagued recent censuses by assisting in the process of deciding on a design for the 2000 census. Comments and rejoinders on this and related papers are included (pp. 508-37).


Journal Article•DOI•
TL;DR: Physical oceanography is the study of the physics of the ocean, a discipline that encompasses a very broad diversity of phenomena, ranging from the smallest space and time scales of order 1 second and 1 cm associated with vertical turbulent mixing, to the largest space and time scales of order centuries and 10,000 km associated with global climate variations.
Abstract: Physical oceanography is the study of the physics of the ocean. As such, the discipline encompasses a very broad diversity of phenomena, ranging from the smallest space and time scales of order 1 second and 1 cm associated with vertical turbulent mixing, to the largest space and time scales of order centuries and 10,000 km associated with global climate variations. The processes occurring at different scales interact in very complicated ways. The multiscale characteristics of physical oceanographic data require sophisticated statistical analysis techniques to investigate a specific process and its interactions with other processes. Collaborative interactions between physical oceanographers and statisticians could potentially result in the development of new and innovative statistical techniques that could improve the present understanding of physical oceanography. The Statistics and Physical Oceanography report reproduced in this volume represents one element of an effort by the Office of Naval Research to stimulate more collaborations between the two disciplines. This introduction to the report provides a framework for understanding the context of the report. For the benefit of statisticians with little or no prior exposure to physical oceanography, this introduction also provides a brief survey of the general topics of physical oceanographic research, a description of the temporal and spatial scales of physical oceanographic data, and a summary of the demographics of physical oceanographers.

Journal Article•DOI•
TL;DR: The polymerase chain reaction makes possible rapid generation of a very large number of copies of a specific region of DNA, which enables the typing of quantities of DNA as small as a single molecule.
Abstract: The polymerase chain reaction (PCR) makes possible rapid generation of a very large number of copies of a specific region of DNA. This enables the typing of quantities of DNA as small as a single molecule. PCR has led to the development of laboratory experiments which provide new approaches to many classic problems in genetics, such as estimation of linkage, marker ordering and genetic disease diagnosis. We describe some of these experiments and the statistical techniques that have been used to design them and to analyze the data they produce.
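
The amplification that makes single-molecule typing possible is geometric: each cycle multiplies the copy count by (1 + efficiency). A minimal sketch; the 0.8 efficiency is an illustrative value, not from the paper:

```python
def pcr_copies(initial, cycles, efficiency=0.8):
    """Expected copy number after PCR: each cycle multiplies the
    count by (1 + efficiency), where efficiency is the per-molecule
    duplication probability (1.0 means perfect doubling)."""
    return initial * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))   # one molecule -> roughly 4.6e7 copies
```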


Journal Article•DOI•
TL;DR: This comment discusses what opportunities there might be to expand the class of statistical models for small area data and to consider multivariate aspects of small area estimation.
Abstract: Malay Ghosh and Jon Rao have presented us with a well written exposition of the topic of small area estimation. The past literature has been decidedly influenced by linear modeling, and we see that clearly in their paper. There has also been a tendency to judge the performance of the estimation methods by concentrating on a single, arbitrary small area. In our comment, we shall discuss what opportunities there might be to expand the class of statistical models for small area data and to consider multivariate aspects of small area estimation.

Journal Article•DOI•
TL;DR: Kruskal was a member of the 1970-71 President's Commission on Federal Statistics, out of which grew the present Committee on National Statistics in the National Academy of Sciences-National Research Council.
Abstract: William Henry Kruskal was born in New York City on 10 October 1919. His basic education was primarily in the public schools of New Rochelle, New York, a suburb of New York City. He attended Antioch College for two years and then transferred to Harvard College, from which he received the S.B. degree in 1940 and an M.S. in mathematics in 1941. Then he went to the U.S. Naval Proving Ground in Dahlgren, Virginia, first as a civilian and later with a USN commission. After World War II, he worked toward the Ph.D. in mathematical statistics at Columbia University, but joined the faculty of the newly formed statistics group at the University of Chicago before he completed and received his degree in 1955. He has remained at Chicago except for a summer at Harvard University and visits to the University of California at Berkeley and the Center for Advanced Study in the Behavioral Sciences at Stanford. Kruskal's research and teaching are closely linked. Among his primary research areas have been linear structures, nonparametric procedures, the taking of censuses, government statistics in general, the history of statistics, clarification of such concepts as representative sampling and normality, miracles and statistics, and the relative importances of causelike variables. From 1958 to 1961, Kruskal was the Editor of The Annals of Mathematical Statistics. He was a member of the 1970-71 President's Commission on Federal Statistics, out of which grew the present Committee on National Statistics in the National Academy of Sciences-National Research Council. Kruskal headed this Committee during its first six years. A different kind of activity was his editorship of the statistical part of the International Encyclopedia of the Social Sciences and his co-editorship (with Judith M. Tanur) of the International Encyclopedia of Statistics. He has for years been a trustee of the National Opinion Research Center (NORC). He held a Senior Postdoctoral NSF fellowship and was a Fellow of the John Simon Guggenheim Memorial Foundation. He has held a number of offices in professional organizations, including the presidencies of both the Institute of Mathematical Statistics and the American Statistical Association. At the University of Chicago, Kruskal chaired the Department of Statistics for six years and later was Dean of the University's Division of the Social Sciences for nine years. He also served in 1988-89 as Dean Pro Tempore of what is now the Irving B. Harris Graduate School of Public Policy Studies. Since 1973, he has been an Ernest DeWitt Burton Distinguished Service Professor (now Emeritus). In 1942 he and Norma Jane Evans, alas no longer alive, were married. There are three children: Vincent Joseph, Thomas Evan and Jonas David.