
Showing papers in "Statistical Science in 1998"


Journal ArticleDOI
TL;DR: It is shown that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences.
Abstract: Computing (ratios of) normalizing constants of probability models is a fundamental computational problem for many statistical and scientific studies. Monte Carlo simulation is an effective technique, especially with complex and high-dimensional models. This paper aims to bring to the attention of general statistical audiences some effective methods originating from theoretical physics and at the same time to explore these methods from a more statistical perspective, through establishing theoretical connections and illustrating their uses with statistical problems. We show that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences. The former generalizes importance sampling through the use of a single "bridge" density and is thus a case of bridge sampling in the sense of Meng and Wong. Thermodynamic integration, which is also known in the numerical analysis literature as Ogata's method for high-dimensional integration, corresponds to the use of infinitely many and continuously connected bridges (and thus a "path"). Our path sampling formulation offers more flexibility and thus potential efficiency to thermodynamic integration, and the search for optimal paths turns out to have close connections with the Jeffreys prior density and the Rao and Hellinger distances between two densities. We provide an informative theoretical example as well as two empirical examples (involving 17- to 70-dimensional integrations) to illustrate the potential and implementation of path sampling. We also discuss some open problems.
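
For readers who want to see the baseline identities in code, here is a minimal sketch (ours, not the authors'; the densities and sample sizes are arbitrary illustrative choices) estimating a ratio of normalizing constants first by plain importance sampling and then with a single geometric "bridge" — the first two rungs of the ladder the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized densities; the true ratio of normalizing constants is sigma.
sigma = 2.0
q0 = lambda x: np.exp(-0.5 * x**2)            # N(0,1) kernel, Z0 = sqrt(2*pi)
q1 = lambda x: np.exp(-0.5 * (x / sigma)**2)  # N(0,sigma^2) kernel, Z1 = sigma*sqrt(2*pi)

# Importance sampling: Z1/Z0 = E_p0[ q1(X)/q0(X) ], X drawn from the normalized q0.
x0 = rng.normal(0.0, 1.0, 100_000)
r_is = np.mean(q1(x0) / q0(x0))

# One geometric bridge q_b = sqrt(q0*q1), using draws from both densities:
# Z1/Z0 = E_p0[q_b/q0] / E_p1[q_b/q1]  (bridge sampling in the sense of Meng and Wong).
x1 = rng.normal(0.0, sigma, 100_000)
qb = lambda x: np.sqrt(q0(x) * q1(x))
r_bridge = np.mean(qb(x0) / q0(x0)) / np.mean(qb(x1) / q1(x1))

print(f"true {sigma:.3f}  importance {r_is:.3f}  bridge {r_bridge:.3f}")
```

Path sampling then replaces the single bridge with a continuum of intermediate densities connecting q0 to q1.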

1,035 citations


Journal ArticleDOI
TL;DR: The Gifi system analyzes categorical data through nonlinear varieties of classical multivariate analysis techniques; it is characterized by the optimal scaling of categorical variables, implemented through alternating least squares algorithms.
Abstract: The Gifi system of analyzing categorical data through nonlinear varieties of classical multivariate analysis techniques is reviewed. The system is characterized by the optimal scaling of categorical variables which is implemented through alternating least squares algorithms. The main technique of homogeneity analysis is presented, along with its extensions and generalizations leading to nonmetric principal components analysis and canonical correlation analysis. Several examples are used to illustrate the methods. A brief account of stability issues and areas of applications of the techniques is also given.
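
To make the alternating least squares idea concrete, here is a schematic sketch of the homogeneity analysis (HOMALS) loop under the usual normalization X'X = nI — our toy version under the assumption that every category code actually occurs, not the Gifi system's implementation.

```python
import numpy as np

def homals(codes, ndim=2, n_iter=200, seed=0):
    """Minimal homogeneity analysis via alternating least squares.

    codes: (n, J) integer array, one column of category codes per variable
           (each category assumed to occur at least once).
    Returns object scores X (n, ndim) and category quantifications per variable.
    """
    n, J = codes.shape
    # Indicator (dummy) matrix for each categorical variable.
    G = [np.eye(codes[:, j].max() + 1)[codes[:, j]] for j in range(J)]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, ndim))
    for _ in range(n_iter):
        # Category quantifications: category means of the current object scores.
        Y = [np.linalg.solve(g.T @ g, g.T @ X) for g in G]
        # Object scores: average of each object's quantifications, then renormalize.
        X = sum(g @ y for g, y in zip(G, Y)) / J
        X -= X.mean(axis=0)             # center columns
        Q, _ = np.linalg.qr(X)          # orthonormalize columns
        X = np.sqrt(n) * Q              # impose X'X = n * I
    return X, Y
```

The extensions the abstract mentions (nonmetric principal components, canonical correlation) can be viewed, roughly, as adding rank or measurement-level restrictions on the quantifications Y.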

255 citations


Journal ArticleDOI
TL;DR: It is suggested that most approaches can be embedded into a suitable version of the multiple change-point problem, and the various methods are reviewed in this light.
Abstract: This article examines methods, issues and controversies that have arisen over the last decade in the effort to organize sequences of DNA base information into homogeneous segments. An array of different models and techniques have been considered and applied. We demonstrate that most approaches can be embedded into a suitable version of the multiple change-point problem, and we review the various methods in this light. We also propose and discuss a promising local segmentation method, namely, the application of split local polynomial fitting. The genome of bacteriophage $\lambda$ serves as an example sequence throughout the paper.
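
As one concrete instance of the multiple change-point formulation, here is a greedy binary-segmentation sketch with an iid multinomial segment model and an ad hoc split penalty — our illustration, not one of the paper's methods, and quadratic in sequence length, so suitable only for short sequences.

```python
import numpy as np

BASES = "ACGT"

def seg_cost(c):
    """Multinomial negative log-likelihood of a segment with base counts c."""
    n = c.sum()
    nz = c[c > 0]
    return -(nz * np.log(nz / n)).sum()

def binary_segmentation(seq, penalty=20.0):
    """Greedy multiple change-point detection on a DNA string: recursively
    place the best single change point while it improves the fit by more
    than `penalty` (an arbitrary threshold here)."""
    x = np.array([BASES.index(b) for b in seq])
    cum = np.zeros((len(x) + 1, 4))
    cum[1:] = np.cumsum(np.eye(4)[x], axis=0)      # prefix base counts

    def split(lo, hi, out):
        whole = seg_cost(cum[hi] - cum[lo])
        gains = [whole - seg_cost(cum[t] - cum[lo]) - seg_cost(cum[hi] - cum[t])
                 for t in range(lo + 1, hi)]
        if not gains:
            return
        t = lo + 1 + int(np.argmax(gains))
        if gains[t - lo - 1] > penalty:
            out.append(t)
            split(lo, t, out)
            split(t, hi, out)

    cps = []
    split(0, len(x), cps)
    return sorted(cps)

# e.g. binary_segmentation("A" * 200 + "G" * 200) recovers the change at 200.
```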

224 citations


Journal ArticleDOI
TL;DR: In this paper an exposition is given of likelihood based frequentist inference that shows in particular which aspects of such inference cannot be separated from consideration of the missing value mechanism.
Abstract: One of the most often quoted results from the original work of Rubin and Little on the classification of missing value processes is the validity of likelihood based inferences under missing at random (MAR) mechanisms. Although the sense in which this result holds was precisely defined by Rubin, and explored by him in later work, it appears to be now used by some authors in a general and rather imprecise way, particularly with respect to the use of frequentist modes of inference. In this paper an exposition is given of likelihood based frequentist inference under an MAR mechanism that shows in particular which aspects of such inference cannot be separated from consideration of the missing value mechanism. The development is illustrated with three simple setups (a bivariate binary outcome, a bivariate Gaussian outcome and a two-stage sequential procedure with Gaussian outcome) and with real longitudinal examples involving both categorical and continuous outcomes. In particular, it is shown that the classical expected information matrix is biased and the use of the observed information matrix is recommended.
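
For reference, the two matrices being contrasted are, in standard notation (with ℓ the observed-data log-likelihood):

```latex
% Observed information: curvature of the log-likelihood at the data in hand.
J(\hat\theta) \;=\; -\,\frac{\partial^2 \ell(\theta;\, y_{\mathrm{obs}})}
                          {\partial\theta\,\partial\theta^{\mathsf{T}}}
                    \bigg|_{\theta=\hat\theta}
\qquad
% Expected information: the same curvature averaged over the sampling
% distribution of Y_obs.
\mathcal{I}(\theta) \;=\; \mathrm{E}_{\theta}\!\left[
    -\,\frac{\partial^2 \ell(\theta;\, Y_{\mathrm{obs}})}
            {\partial\theta\,\partial\theta^{\mathsf{T}}}\right]
```

Roughly, the expectation in the second expression depends on which values go missing, so under MAR it cannot be computed while ignoring the missing value mechanism; the observed information sidesteps that average, which is the route to the paper's recommendation.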

181 citations


Journal ArticleDOI
TL;DR: Fisher's philosophy is characterized as a series of shrewd compromises between the Bayesian and frequentist viewpoints, augmented by some unique characteristics that are particularly useful in applied problems.
Abstract: Fisher is the single most important figure in 20th century statistics. This talk examines his influence on modern statistical thinking, trying to predict how Fisherian we can expect the 21st century to be. Fisher's philosophy is characterized as a series of shrewd compromises between the Bayesian and frequentist viewpoints, augmented by some unique characteristics that are particularly useful in applied problems. Several current research topics are examined with an eye toward Fisherian influence, or the lack of it, and what this portends for future statistical developments. Based on the 1996 Fisher lecture, the article closely follows the text of that talk.

135 citations


Book ChapterDOI
TL;DR: Handwritten digits taken from US envelopes are represented as 256-dimensional vectors of pixel values, which serve as the input features to a classifier that automatically assigns a digit class.
Abstract: Figure 1 shows some handwritten digits taken from US envelopes. Each image consists of 16 × 16 pixels of greyscale values ranging from 0 to 255. These 256 pixel values are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.
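
A minimal classifier of the kind this passage sets up — our illustrative sketch, not the chapter's method — is k-nearest neighbours on the raw 256-pixel feature vector:

```python
import numpy as np

def knn_digit_classifier(train_imgs, train_labels, test_img, k=3):
    """Classify a 16x16 greyscale digit image by k-nearest neighbours on the
    flattened 256-dimensional pixel vector (labels assumed to be ints 0-9)."""
    X = train_imgs.reshape(len(train_imgs), -1).astype(float)  # (n, 256)
    x = test_img.reshape(-1).astype(float)                     # (256,)
    d = np.linalg.norm(X - x, axis=1)                          # Euclidean distances
    nearest = train_labels[np.argsort(d)[:k]]                  # k closest training labels
    return np.bincount(nearest, minlength=10).argmax()         # majority vote
```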

132 citations


Journal ArticleDOI
TL;DR: In 1993 the International Organization for Standardization (ISO), in cooperation with several other international organizations, issued the Guide to the Expression of Uncertainty in Measurement in order to establish, and standardize for international use, a set of general rules for evaluating and expressing uncertainty in measurement.
Abstract: In 1993 the International Organization for Standardization (ISO), in cooperation with several other international organizations, issued the Guide to the Expression of Uncertainty in Measurement in order to establish, and standardize for international use, a set of general rules for evaluating and expressing uncertainty in measurement. The ISO recommendation has been of concern to many statisticians because it appears to combine frequentist performance measures and indices of subjective distributions in a way that neither frequentists nor Bayesians can fully endorse. The purpose of this review of the ISO Guide is to describe the essential recommendations made in the Guide, then to show how these recommendations can be regarded as approximate solutions to certain frequentist and Bayesian inference problems. The framework thus provided will, hopefully, allow statisticians to develop improvements to the ISO recommendations (particularly in the approximations used), and also better communicate with the physical science researchers who will be following the ISO guidelines.
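
The Guide's core first-order rule, for uncorrelated inputs, combines standard uncertainties through sensitivity coefficients. A small sketch of that rule (the measurement function and numbers below are made up for illustration):

```python
import numpy as np

def combined_standard_uncertainty(f, x, u, h=1e-6):
    """First-order propagation of uncorrelated standard uncertainties u
    through a measurement function y = f(x):
        u_c(y)^2 = sum_i (df/dx_i)^2 * u_i^2,
    with the sensitivity coefficients approximated by central differences."""
    x = np.asarray(x, dtype=float)
    c = np.empty_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        c[i] = (f(x + e) - f(x - e)) / (2 * h)   # dy/dx_i
    return np.sqrt(np.sum((c * np.asarray(u)) ** 2))

# Example: resistance R = V / I with standard uncertainties in V and I.
R = lambda z: z[0] / z[1]
u_c = combined_standard_uncertainty(R, x=[12.0, 2.0], u=[0.05, 0.01])
U = 2 * u_c   # expanded uncertainty with coverage factor k = 2
```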

81 citations


Journal ArticleDOI
TL;DR: This paper gives a brief account of the history of sample measures of dispersion, with major emphasis on early developments. The main contributors, in chronological order, are Lambert, Laplace, Gauss, Bienaymé, Abbe, Helmert and Galton.
Abstract: This paper attempts a brief account of the history of sample measures of dispersion, with major emphasis on early developments. The statistics considered include standard deviation, mean deviation, median absolute deviation, mean difference, range, interquartile distance and linear functions of order statistics. The multiplicity of measures is seen to result from constant efforts to strike a balance between efficiency and ease of computation, with some recognition also of the desirability of robustness and theoretical convenience. Many individuals shaped this history, especially Gauss. The main contributors to our story are, in chronological order, Lambert, Laplace, Gauss, Bienaymé, Abbe, Helmert and Galton.
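
In modern notation the statistics this history covers are all one-liners; a sketch follows (conventions such as the divisor in the standard deviation or the quantile rule vary across texts):

```python
import numpy as np

def dispersion_measures(x):
    """The sample dispersion measures surveyed in the paper, in modern form."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    return {
        "standard deviation":        x.std(ddof=1),
        "mean deviation":            np.mean(np.abs(x - x.mean())),
        "median absolute deviation": np.median(np.abs(x - np.median(x))),
        # Gini mean difference: average |x_i - x_j| over all pairs i != j.
        "mean difference":           np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1)),
        "range":                     x[-1] - x[0],
        "interquartile distance":    np.quantile(x, 0.75) - np.quantile(x, 0.25),
    }
```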

74 citations


Journal ArticleDOI
TL;DR: Alternative designs that have been proposed in the literature for randomized clinical trials that utilize clinician preferences differently from the standard randomized trial design are reviewed, the effects of clinician preference on the ability to estimate causal treatment differences from observational data are examined, and an alternative method of analysis for observational data that uses clinician preferences explicitly is proposed.
Abstract: Clinician treatment preferences affect the ability to perform randomized clinical trials and the ability to analyze observational data for treatment effects. In clinical trials, clinician preferences that are based on a subjective analysis of the patient can make it difficult to define eligibility criteria for which clinicians would agree to randomize all patients who satisfy the criteria. In addition, since each clinician typically has some preference for the choice of treatment for a given patient, there are concerns about how strong that preference needs to be before it is inappropriate for him to randomize the choice of treatment. In observational studies, the fact that clinician preferences affect the choice of treatment is a major source of selection bias when estimating treatment effects. In this paper we review alternative designs that have been proposed in the literature for randomized clinical trials that utilize clinician preferences differently than the standard randomized trial design. We also examine the effects of clinician preferences on the ability to estimate causal treatment differences from observational data, and propose an alternative method of analysis for observational data that uses clinician preferences explicitly. We report on our experience to date in using our alternative randomized clinical trial design and our new method of observational analysis to compare two treatments at the orthodontic clinics at the University of California San Francisco and the University of the Pacific, San Francisco.

52 citations


Journal ArticleDOI
TL;DR: In this article, a coherent method of decision-making is examined in detail for a simple trial of bioequivalence, and the result is shown to differ seriously from the inferential method, using significance tests, ordinarily used.
Abstract: It is argued that the determination of bioequivalence involves a decision, and is not purely a problem of inference. A coherent method of decision-making is examined in detail for a simple trial of bioequivalence. The result is shown to differ seriously from the inferential method, using significance tests, ordinarily used. The reason for the difference is explored. It is shown how the decision-analytic method can be used in more complicated and realistic trials and the case for its general use presented. A recent paper in this journal (Berger and Hsu, 1996, and its ensuing discussion, hereinafter referred to as BH) was on the topic of bioequivalence trials. The discussants included statisticians and practitioners in the pharmaceutical industry. There was nowhere any mention of an alternative approach to bioequivalence that is both simpler and more practically relevant than those in BH. The purpose of this note is to outline the principle behind the alternative, illustrate it on a simple example and indicate how it might be used in practice.

38 citations


Journal ArticleDOI
TL;DR: In this article, the authors use the standard chi-square significance level to compare two binomial distributions and show that the posterior probability that the proportion of successes in the first population is larger than in the second population can be estimated from the standard (uncorrected) chi-squared significance level.
Abstract: The $2\times2$ table is used as a vehicle for discussing different approaches to statistical inference. Several of these approaches (both classical and Bayesian) are compared, and difficulties with them are highlighted. More frequent use of one-sided tests is advocated. Given independent samples from two binomial distributions, and taking independent Jeffreys priors, we note that the posterior probability that the proportion of successes in the first population is larger than in the second can be estimated from the standard (uncorrected) chi-square significance level. An exact formula for this probability is derived. However, we argue that usually it will be more appropriate to use dependent priors, and we suggest a particular "standard prior" for the $2\times2$ table. For small numbers of observations this is more conservative than Fisher's exact test, but it is less conservative for larger sample sizes. Several examples are given.
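
The Bayesian quantity quoted here is easy to compute directly: with independent Jeffreys Beta(1/2, 1/2) priors the posteriors are Beta distributions, so P(p1 > p2) can be simulated. A sketch with made-up counts follows (the paper's exact formula and its link to the uncorrected chi-square level are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_p1_greater(x1, n1, x2, n2, draws=200_000):
    """Posterior P(p1 > p2) for two independent binomials under independent
    Jeffreys Beta(1/2, 1/2) priors: posteriors are Beta(x + 1/2, n - x + 1/2)."""
    p1 = rng.beta(x1 + 0.5, n1 - x1 + 0.5, draws)
    p2 = rng.beta(x2 + 0.5, n2 - x2 + 0.5, draws)
    return np.mean(p1 > p2)

# A hypothetical 2x2 table: 18/30 successes versus 10/30 successes.
print(prob_p1_greater(18, 30, 10, 30))
```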

Journal ArticleDOI
TL;DR: In this paper, the authors discuss selected applications of statistical theory and practice as motivated by and applied to environmental sciences and illustrate how the interaction between environmental scientists and quantitative researchers has been used to enhance and further learning in both areas, and how this interaction provides a source of further challenges and growth for the statistical community.
Abstract: We discuss selected applications of statistical theory and practice as motivated by and applied to environmental sciences. Included in the presentation are illustrations on how the interaction between environmental scientists and quantitative researchers has been used to enhance and further learning in both areas, and how this interaction provides a source of further challenges and growth for the statistical community.

Journal ArticleDOI
TL;DR: This paper describes a particular set of algorithms for clustering and shows how they lead to codes which can be used to compress images, and argues for digital approaches to imaging in general.
Abstract: In this paper, we describe a particular set of algorithms for clustering and show how they lead to codes which can be used to compress images. The approach is called tree-structured vector quantization (TSVQ) and amounts to a binary tree-structured two-means clustering, very much in the spirit of CART. This coding is thereafter put into the larger framework of information theory. Finally, we report the methodology for how image compression was applied in a clinical setting, where the medical issue was the measurement of major blood vessels in the chest and the technology was magnetic resonance (MR) imaging. Measuring the sizes of blood vessels, of other organs and of tumors is fundamental to evaluating aneurysms, especially prior to surgery. We argue for digital approaches to imaging in general, two benefits being improved archiving and transmission, and another improved clinical usefulness through the application of digital image processing. These goals seem particularly appropriate for technologies like MR that are inherently digital. However, even in this modern age, archiving the images of a busy radiological service is not possible without substantially compressing them. This means that the codes by which images are stored digitally, whether they arise from TSVQ or not, need to be "lossy," that is, not invertible. Since lossy coding necessarily entails the loss of digital information, it behooves those who recommend it to demonstrate that the quality of medicine practiced is not diminished thereby. There is a growing literature concerning the impact of lossy compression upon tasks that involve detection. However, we are not aware of similar studies of measurement. We feel that the study reported here of 30 scans compressed to 5 different levels, with measurements being made by 3 accomplished radiologists, is consistent with 16:1 lossy compression as we practice it being acceptable for the problem at hand.
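
A stripped-down sketch of the TSVQ construction described here — recursive two-means splitting to grow a binary codebook tree — may help fix ideas. This is our toy version; the clinical codec involves considerably more (bit allocation, entropy coding, and so on).

```python
import numpy as np

def two_means(X, n_iter=20, seed=0):
    """Plain 2-means (Lloyd's algorithm) on the rows of X."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), 2, replace=False)].astype(float)
    for _ in range(n_iter):
        lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
        for k in (0, 1):
            if np.any(lab == k):
                c[k] = X[lab == k].mean(0)
    return c, lab

def tsvq_codebook(X, depth):
    """Tree-structured VQ: split each cell by 2-means, recursively, giving a
    codebook of up to 2**depth vectors (the leaves of the binary tree)."""
    if depth == 0 or len(X) < 2:
        return [X.mean(0)]
    _, lab = two_means(X)
    if lab.min() == lab.max():              # degenerate split: stop here
        return [X.mean(0)]
    return (tsvq_codebook(X[lab == 0], depth - 1) +
            tsvq_codebook(X[lab == 1], depth - 1))

# Usage idea: flatten each small image block to a row of X, build the codebook,
# then store each block as the index of its nearest codeword (lossy coding).
```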

Journal ArticleDOI
TL;DR: It is shown that the singular sets of LAD and LMS are at least as large as that of LS and often much larger, so that LAD and LMS are frequently unstable.
Abstract: Say that a regression method is "unstable" at a data set if a small change in the data can cause a relatively large change in the fitted plane. A well-known example of this is the instability of least squares regression (LS) near (multi)collinear data sets. It is known that least absolute deviation (LAD) and least median of squares (LMS) linear regression can exhibit instability at data sets that are far from collinear. Clear-cut instability occurs at a "singularity"--a data set, arbitrarily small changes to which can substantially change the fit. For example, the collinear data sets are the singularities of LS. One way to measure the extent of instability of a regression method is to measure the size of its "singular set" (set of singularities). The dimension of the singular set is a tractable measure of its size that can be estimated without distributional assumptions or asymptotics. By applying a general theorem on the dimension of singular sets, we find that the singular sets of LAD and LMS are at least as large as that of LS and often much larger. Thus, prima facie, LAD and LMS are frequently unstable. This casts doubt on the trustworthiness of LAD and LMS as exploratory regression tools.
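
The phenomenon is easy to reproduce in miniature. The sketch below (our construction, not an example from the paper) fits LAD through the origin to two points; the data set with equal x-values is a singularity, so an arbitrarily small change in one coordinate flips the fitted slope between 0 and 1.

```python
import numpy as np

def lad_slope(x, y, grid=np.linspace(-2, 2, 40001)):
    """LAD fit through the origin: argmin_b sum |y - b*x|.
    Grid search is crude but adequate for a two-point illustration."""
    loss = np.abs(y[:, None] - grid[None, :] * x[:, None]).sum(0)
    return grid[np.argmin(loss)]

# Two observations with slopes 0 and 1. With x = (1, 1) the two slopes tie
# (any b in [0, 1] is optimal); nudging the second x across 1 flips the fit.
y = np.array([1.0, 0.0])
for eps in (-1e-6, 0.0, 1e-6):
    x = np.array([1.0, 1.0 + eps])
    print(f"eps={eps:+.0e}  fitted slope = {lad_slope(x, y):.3f}")
```

A least squares fit to the same data barely moves, which is the sense in which these data sit in the singular set of LAD but not of LS.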

Journal ArticleDOI
TL;DR: This article classifies statistical models into overlapping types called empirical, stochastic and predictive, all drawing on a common mathematical theory of probability, and all facilitating statements with logical and epistemic content.
Abstract: Arguments are presented to support increased emphasis on logical aspects of formal methods of analysis, depending on probability in the sense of R. A. Fisher. Formulating probabilistic models that convey uncertain knowledge of objective phenomena and using such models for inductive reasoning are central activities of individuals that introduce limited but necessary subjectivity into science. Statistical models are classified into overlapping types called here empirical, stochastic and predictive, all drawing on a common mathematical theory of probability, and all facilitating statements with logical and epistemic content. Contexts in which these ideas are intended to apply are discussed via three major examples.

Journal ArticleDOI
TL;DR: The Iowa State Statistical Laboratory was established in 1933, with George W. Snedecor as director, and the forces leading to this early creation of a formal unit are described.
Abstract: The Iowa State Statistical Laboratory was established in 1933, with George W. Snedecor as director. The forces leading to this early creation of a formal unit are described, including the roles played by Henry A. Wallace and R. A. Fisher. Preceding this account, the state of statistics in 1933 is outlined, with special emphasis on U.S. universities. The lives and contributions of several leading personalities are sketched.

Journal ArticleDOI
TL;DR: The Pullman meeting of IMS-WNAR had "Statistical consulting" as one of its themes. This overview of the case studies presented there attempts to draw together some of their lessons, showing the diverse role of the statistician in collecting, analyzing and presenting the information contained in the data.
Abstract: The Pullman meeting of IMS-WNAR had, as one of its themes, "Statistical consulting." In this overview of the case studies presented there, an attempt is made to draw together some of the lessons of these papers, showing the diverse role of the statistician in collecting, analyzing and presenting the information contained in the data.

Journal ArticleDOI
TL;DR: It has been suggested that Gauss used the method of least squares on a data set published in 1799; reexamination of the data set and its adjustment shows that Gauss's result cannot be obtained by the least squares method, nor by any other approach mentioned by Gauss.
Abstract: It has been suggested that Gauss used the method of least squares on a data set published in 1799. The data set and its adjustment are reexamined, and it is concluded that the result of Gauss cannot be obtained by the least-squares method nor by any other approach mentioned by Gauss.

Journal ArticleDOI
TL;DR: A conversation with Shanti Gupta, whose thesis launched the "subset approach" to ranking and selection and who, as founding head, built Purdue's Department of Statistics into one of the premier departments in the country.
Abstract: Shanti Gupta was born and raised in Saunasi, Mainpuri, India. He attended Delhi University and received B.A. Honours and M.S. degrees in mathematics. He then took a one-year diploma course in applied statistics at the Indian Council of Agricultural Research, New Delhi, and his distinguished career in statistics was launched. He came to the United States in 1953 and received his Ph.D. in mathematical statistics in 1956 from the University of North Carolina at Chapel Hill working under the guidance of Professor Raj Chandra Bose. His thesis, "On a decision rule for a problem in ranking means," began his prolific formulation of and investigation into a class of problems referred to as "subset approach to ranking and selection." Shanti has held research and teaching positions at Delhi College, Bell Telephone Laboratories, University of Alberta, Courant Institute of Mathematical Sciences at New York University, Stanford University, and, most notably, Purdue University, where he has spent most of his professional career. At Purdue, he became Head of the newly created Department of Statistics in 1968 and served in that capacity until 1995, when he stepped down to devote his time and energy more fully to teaching and research. Under his leadership, Purdue's Department of Statistics grew into one of the premier departments in the country. Twenty-eight Ph.D. dissertations were directed by Shanti over this period of time. Shanti has provided exemplary service to the profession. A partial list of his activities includes the following: Founding Editor of the IMS Lecture Notes--Monograph Series in Statistics and Probability (1979-1988); Chairman of the Joint Management Committee of the ASA and IMS for Current Index to Statistics (1981-1988); Member of the NRC Advisory Committee on U.S. Army Basic Scientific Research (1983-1988). He served as President of the IMS during 1989-1990. He has also been very active with many of the statistical journals, serving in both editorial and board-member capacities. Shanti has received many honors in recognition of his valuable contributions. He is a Fellow of the ASA, AAAS and IMS and is an elected member of the International Statistical Institute. He has held special short-term visiting positions such as Special Chair, Institute of Mathematics, Academia Sinica, Taipei, Taiwan; and Erskine Fellow, University of Canterbury, Christchurch, New Zealand. He was one of the special invitees at the 1990 Taipei International Symposium in Statistics and was presented with the key to the city by the Mayor. A recent book, Advances in Statistical Decision Theory and Applications, published in honor of Shanti, contains many papers in the areas where his influence as a teacher and a researcher has been felt: Bayesian inference; decision theory; point and interval estimation; tests of hypotheses; ranking and selection; distribution theory; and industrial applications. The following conversation took place at the Department of Statistics, Purdue University, on 22 September 1997.

Journal ArticleDOI
TL;DR: Geoffrey Watson's best-known contributions include the Durbin-Watson test for serial correlation, the Nadaraya-Watson estimator in nonparametric regression and fundamental methods for analyzing directional or axial data.
Abstract: Geoffrey Stuart Watson, Professor Emeritus at Princeton University, celebrated his 75th birthday on December 3, 1996. A native Australian, his early education included Bendigo High School and Scotch College in Melbourne. After graduating with a B.A. (Hons.) from Melbourne University in December 1942, he spent the next few years, during and after World War II, doing research and teaching on applied mathematical topics. His wandering as a scholar began in 1947, when he became a graduate student in the Institute of Statistics in Raleigh, North Carolina. Leaving Raleigh after two years, he wrote his thesis while visiting the Department of Applied Economics in Cambridge University. Raleigh awarded him the Ph.D. degree in 1951. That same year, he returned to Australia, to a Senior Lectureship in Statistics at Melbourne University. He moved in 1954 to a Senior Fellowship at the Australian National University. Three years later, he left for England and North America. In 1959, he became Associate Professor of Mathematics at the University of Toronto. In 1962, he became Professor of Statistics at The Johns Hopkins University in Baltimore. Soon thereafter he was appointed department chairman. In 1970, he moved to Princeton University as Professor and Chairman of Statistics. He became Professor Emeritus at Princeton in 1992. He has published numerous research papers on a broad range of topics in statistics and applied probability. His best known contributions are the Durbin-Watson test for serial correlation, the Nadaraya-Watson estimator in nonparametric regression and fundamental methods for analyzing directional or axial data. He is the author of an important monograph, Statistics on Spheres. His professional honors include Membership in the International Statistical Institute and Fellowships of the Institute of Mathematical Statistics and of the American Association for the Advancement of Science. In private life, he is an accomplished painter of watercolors, a few of which may be seen on his website (http://www.princeton.edu/gsw/) at Princeton University. He married Shirley Elwyn Jennings in 1952. Their four children, one son and three daughters, pursue careers in Japanese literature, health care in Uganda, singing opera, and administering opera and ballet.

Journal ArticleDOI
TL;DR: Ching Chun Li was a Fellow of the American Statistical Association (elected 1969), an elected member of the International Statistical Institute, a Fellow of the American Association for the Advancement of Science and an elected member of Academia Sinica.
Abstract: Ching Chun Li was born on October 27, 1912, in Tianjin, China. He received his B.S. degree in agronomy from the University of Nanjing, China, in 1936 and a Ph.D. in plant breeding and genetics from Cornell University in 1940. He did postgraduate work in mathematics, mathematical statistics and experimental statistics at the University of Chicago, Columbia University and North Carolina State College, 1940-1941. He is a Fellow of the American Statistical Association (elected 1969), an elected member of the International Statistical Institute, a Fellow of the American Association for the Advancement of Science and an elected member of Academia Sinica (Chinese Academy). He served as President of the American Society of Human Genetics in 1960. His tenure at the University of Pittsburgh began in 1951. He was Professor and Department Chairman, Biostatistics, from 1969 to 1975, and he was promoted to University Professor in 1975. Although he retired in 1983, he has remained active in research.