
Showing papers in "Statistical Science in 2000"


Journal ArticleDOI
TL;DR: In this paper, the authors propose a general class of prior distributions for arbitrary regression models, called power prior distributions, which are based on the idea of raising the likelihood function of the historical data to the power a0, where 0 < a0 < 1.
Abstract: We propose a general class of prior distributions for arbitrary regression models. We discuss parametric and semiparametric models. The prior specification for the regression coefficients focuses on observable quantities in that the elicitation is based on the availability of historical data D0 and a scalar quantity a0 quantifying the uncertainty in D0. Then D0 and a0 are used to specify a prior for the regression coefficients in a semiautomatic fashion. The most natural specification of D0 arises when the raw data from a similar previous study are available. The availability of historical data is quite common in clinical trials, carcinogenicity studies, and environmental studies, where large databases are available from similar previous studies. Although the methodology we present here is quite general, we will focus only on using historical data from similar previous studies to construct the prior distributions. The prior distributions are based on the idea of raising the likelihood function of the historical data to the power a0, where 0 < a0 < 1. We call such prior distributions power prior distributions. We examine the power prior for four commonly used classes of regression models. These include generalized linear models, generalized linear mixed models, semiparametric proportional hazards models, and cure rate models for survival data. For these classes of models, we discuss the construction of the power prior, prior elicitation issues, propriety conditions, model selection, and several other properties. For each class of models, we present real data sets to demonstrate the proposed methodology.
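
In symbols, the construction described above can be summarized as follows; here L(β | D0) is the historical-data likelihood, π0(β) denotes an initial prior (a label introduced here for exposition), and a0 discounts the historical information:

```latex
% Power prior built from historical data D_0, discounted by a_0
\pi(\beta \mid D_0, a_0) \;\propto\; L(\beta \mid D_0)^{a_0}\, \pi_0(\beta),
\qquad 0 < a_0 < 1.
% Combined with the current-study likelihood L(\beta \mid D), the posterior is
\pi(\beta \mid D, D_0, a_0) \;\propto\; L(\beta \mid D)\, L(\beta \mid D_0)^{a_0}\, \pi_0(\beta).
```

Values of a0 near 0 largely discard the historical study, while values near 1 pool it almost fully with the current data.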

628 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore the tasks where sensitivity analysis (SA) can be useful and try to assess the relevance of SA within the modeling process, and suggest that SA could considerably assist in the use of models, by providing objective criteria of judgement for different phases of the model building process: model identification and discrimination; model calibration; model corroboration.
Abstract: We explore the tasks where sensitivity analysis (SA) can be useful and try to assess the relevance of SA within the modeling process. We suggest that SA could considerably assist in the use of models, by providing objective criteria of judgement for different phases of the model-building process: model identification and discrimination; model calibration; model corroboration. We review some new global quantitative SA methods and suggest that these might enlarge the scope for sensitivity analysis in computational and statistical modeling practice. Among the advantages of the new methods are their robustness, model independence and computational convenience. The discussion is based on worked examples.
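
As a concrete illustration of the kind of global, variance-based quantity the abstract alludes to, the sketch below estimates first-order sensitivity indices S_i = Var(E[Y|X_i])/Var(Y) by simple binning; the test function, sample size and binning estimator are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x1, x2, x3):
    """Illustrative test function (Ishigami-type), not from the paper."""
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

# Monte Carlo sample of the inputs (independent, uniform on [-pi, pi])
n = 100_000
X = rng.uniform(-np.pi, np.pi, size=(n, 3))
y = model(X[:, 0], X[:, 1], X[:, 2])

def first_order_index(xi, y, bins=50):
    """Crude estimate of S_i = Var(E[Y | X_i]) / Var(Y) by binning X_i."""
    edges = np.quantile(xi, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, xi, side="right") - 1, 0, bins - 1)
    cond_means = np.array([y[idx == b].mean() for b in range(bins)])
    counts = np.array([(idx == b).sum() for b in range(bins)])
    var_cond_mean = np.average((cond_means - y.mean()) ** 2, weights=counts)
    return var_cond_mean / y.var()

for i in range(3):
    print(f"S_{i + 1} ~ {first_order_index(X[:, i], y):.2f}")
```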

558 citations


Journal ArticleDOI
TL;DR: In this paper, the authors use point processes and marked point processes to model forest stands, where the points are tree locations and the marks are tree characteristics such as diameter at breast height or degree of damage by environmental factors.
Abstract: Forestry statistics is an important field of applied statistics with a long tradition. Many forestry problems can be solved by means of point processes or marked point processes. There, the "points" are tree locations and the "marks" are tree characteristics such as diameter at breast height or degree of damage by environmental factors. Point process characteristics are valuable tools for exploratory data analysis in forestry, for describing the variability of forest stands and for understanding and quantifying ecological relationships. Models of point processes are also an important basis of modern single-tree modeling, which provides simulation tools for the investigation of forest structures and for the prediction of results of forestry operations such as plantation and thinning.
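
A minimal sketch of the basic data structure: tree locations simulated as a homogeneous Poisson process, with diameter at breast height attached as a mark. The intensity, mark distribution and summaries are illustrative, not from any forestry data set discussed in the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

# Homogeneous Poisson process of tree locations on a 100 m x 100 m plot
intensity = 0.05                                   # expected trees per square metre
area = 100.0 * 100.0
n_trees = rng.poisson(intensity * area)
xy = rng.uniform(0.0, 100.0, size=(n_trees, 2))    # the "points": tree locations

# Marks: diameter at breast height (dbh, in cm), lognormal purely for illustration
dbh = rng.lognormal(mean=3.0, sigma=0.3, size=n_trees)

print(f"{n_trees} trees, mean dbh {dbh.mean():.1f} cm")

# Nearest-neighbour distances, a basic point-process summary used in exploration
tree = cKDTree(xy)
d, _ = tree.query(xy, k=2)                         # k=2: first neighbour is the point itself
print(f"mean nearest-neighbour distance {d[:, 1].mean():.2f} m")
```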

498 citations


Journal ArticleDOI
TL;DR: In this article, an alternative parameterization for the multilevel model in which the marginal mean, rather than the conditional mean given random effects, is regressed on covariates is presented.
Abstract: Hierarchical or ‘‘multilevel’’ regression models typically parameterize the mean response conditional on unobserved latent variables or ‘‘random’’ effects and then make simple assumptions regarding their distribution. The interpretation of a regression parameter in such a model is the change in possibly transformed mean response per unit change in a particular predictor having controlled for all conditioning variables including the random effects. An often overlooked limitation of the conditional formulation for nonlinear models is that the interpretation of regression coefficients and their estimates can be highly sensitive to difficult-to-verify assumptions about the distribution of random effects, particularly the dependence of the latent variable distribution on covariates. In this article, we present an alternative parameterization for the multilevel model in which the marginal mean, rather than the conditional mean given random effects, is regressed on covariates. The impact of random effects model violations on the marginal and more traditional conditional parameters is compared through calculation of asymptotic relative biases. A simple two-level example from a study of teratogenicity is presented where the binomial overdispersion depends on the binary treatment assignment and greatly influences likelihood-based estimates of the treatment effect in the conditional model. A second example considers a three-level structure where attitudes toward abortion over time are correlated with person and district level covariates. We observe that regression parameters in conditionally specified models are more sensitive to random effects assumptions than their counterparts in the marginal formulation.
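
The distinction between conditional and marginal regression coefficients can be made concrete with a small numerical sketch: under a random-intercept logistic model, the slope implied for the marginal (population-averaged) response is attenuated relative to the conditional (subject-specific) slope. The parameter values below are illustrative, not taken from the paper's examples.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit, logit

# Conditional (subject-specific) model: logit P(Y=1 | x, b) = beta0 + beta1*x + b,
# with random intercept b ~ N(0, sigma^2).  Values are illustrative.
beta0, beta1, sigma = -1.0, 1.0, 2.0

def marginal_prob(x, n_grid=201):
    """P(Y=1 | x) = E_b[expit(beta0 + beta1*x + b)], by quadrature on a grid."""
    b = np.linspace(-6 * sigma, 6 * sigma, n_grid)
    w = norm.pdf(b, scale=sigma)
    w /= w.sum()
    return np.sum(expit(beta0 + beta1 * x + b) * w)

# Implied marginal slope between x=0 and x=1 on the logit scale
m0, m1 = marginal_prob(0.0), marginal_prob(1.0)
print(f"conditional slope: {beta1:.2f}")
print(f"marginal slope:    {logit(m1) - logit(m0):.2f}  (attenuated toward zero)")
```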

264 citations


Journal ArticleDOI
TL;DR: A nonparametric approach to estimating temporal trends when fitting parametric models to extreme values from a weakly dependent time series is suggested and the Gaussian distribution is shown to have special features that permit it to play a universal role as a "nominal" model for the marginal distribution.
Abstract: A topic of major current interest in extreme-value analysis is the investigation of temporal trends. For example, the potential influence of "greenhouse" effects may result in severe storms becoming gradually more frequent, or in maximum temperatures gradually increasing, with time. One approach to evaluating these possibilities is to fit, to data, a parametric model for temporal parameter variation, as well as a model describing the marginal distribution of data at any given point in time. However, structural trend models can be difficult to formulate in many circumstances, owing to the complex way in which different factors combine to influence data in the form of extremes. Moreover, it is not advisable to fit trend models without empirical evidence of their suitability. In this paper, motivated by datasets on windstorm severity and maximum temperature, we suggest a nonparametric approach to estimating temporal trends when fitting parametric models to extreme values from a weakly dependent time series. We illustrate the method through applications to time series where the marginal distributions are approximately Pareto, generalized-Pareto, extreme-value or Gaussian. We introduce time-varying probability plots to assess goodness of fit, we discuss local-likelihood approaches to fitting the marginal model within a window and we propose temporal cross-validation for selecting window width. In cases where both location and scale are estimated together, the Gaussian distribution is shown to have special features that permit it to play a universal role as a "nominal" model for the marginal distribution.
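
To give a flavour of the local-likelihood idea mentioned in the abstract, the sketch below estimates a slowly varying scale parameter for exceedances by kernel-weighted maximum likelihood within a moving window, using an exponential margin (generalized Pareto with shape zero) for which the weighted MLE has a closed form. The data, kernel and bandwidth are illustrative assumptions, not the paper's windstorm or temperature series.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data: yearly exceedances whose scale drifts upward over time
years = np.arange(1950, 2000)
true_scale = 1.0 + 0.02 * (years - years[0])
exceedances = rng.exponential(true_scale)

def local_scale(t0, t, x, bandwidth=10.0):
    """Kernel-weighted MLE of an exponential (GPD, shape=0) scale at time t0.

    The weighted log-likelihood sum_i w_i * (-log s - x_i / s) is maximised by
    s = sum(w_i x_i) / sum(w_i), a locally weighted mean of the exceedances.
    """
    w = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)   # Gaussian kernel weights
    return np.sum(w * x) / np.sum(w)

trend = np.array([local_scale(t0, years, exceedances) for t0 in years])
print(np.round(trend[::10], 2))   # smoothed scale estimates every 10 years
```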

158 citations


Journal ArticleDOI
TL;DR: The Bayesian procedures are shown to be straightforward and provide a convenient framework for model-averaging, which incorporates the uncertainty due to model selection into the inference process.
Abstract: We present the Bayesian approach to estimating parameters associated with animal survival on the basis of data arising from mark recovery and recapture studies. We provide two examples, beginning with a discussion of band-return models and examining data gathered from observations of blue-winged teal (Anas discors), ringed as nestlings. We then look at open population recapture models, focusing on the Cormack-Jolly-Seber model, and examine this model in the context of a data set on European dippers (Cinclus cinclus). The Bayesian procedures are shown to be straightforward and provide a convenient framework for model-averaging, which incorporates the uncertainty due to model selection into the inference process. Sufficient detail is provided so that readers who wish to employ the Bayesian approach in this field can do so with ease. An example of BUGS code is also provided.
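
A minimal sketch of the Bayesian treatment of a band-return model, assuming a single ringed cohort, constant annual survival S and recovery probability f, flat priors, and a grid approximation to the posterior; the counts are invented for illustration and are not the blue-winged teal data analysed in the paper.

```python
import numpy as np

# One-cohort band-return model: a ringed bird survives each year with probability S
# and, if it dies in year j, its band is recovered and reported with probability f.
N = 500                                 # birds ringed (illustrative)
recoveries = np.array([38, 22, 14])     # recovered dead in years 1, 2, 3 (illustrative)

S_grid = np.linspace(0.01, 0.99, 200)
f_grid = np.linspace(0.01, 0.99, 200)
S, f = np.meshgrid(S_grid, f_grid, indexing="ij")

# Cell probabilities: recovered in year j with probability S^(j-1) * (1-S) * f
p = np.stack([S ** j * (1 - S) * f for j in range(len(recoveries))])
p_never = 1.0 - p.sum(axis=0)

# Multinomial log-likelihood on the grid, flat priors on S and f
loglik = (recoveries[:, None, None] * np.log(p)).sum(axis=0) \
         + (N - recoveries.sum()) * np.log(p_never)
post = np.exp(loglik - loglik.max())
post /= post.sum()

print("posterior mean S:", np.sum(S * post).round(3))
print("posterior mean f:", np.sum(f * post).round(3))
```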

150 citations


Journal ArticleDOI
TL;DR: Data quality is a particularly troublesome issue in data mining applications, and this is examined.
Abstract: Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.

150 citations


Journal ArticleDOI
TL;DR: In this paper, a general procedure for posterior sampling from additive and generalized additive models is proposed, which is a stochastic generalization of the well-known backfitting algorithm for fitting additive models.
Abstract: We propose general procedures for posterior sampling from additive and generalized additive models. The procedure is a stochastic generalization of the well-known backfitting algorithm for fitting additive models. One chooses a linear operator (“smoother”) for each predictor, and the algorithm requires only the application of the operator and its square root. The procedure is general and modular, and we describe its application to nonparametric, semiparametric and mixed models.
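
A minimal sketch of the stochastic backfitting idea, assuming Gaussian errors with known variance and a simple symmetric ridge-type smoother for each predictor: each sweep draws a component from a Gaussian full conditional whose mean is the smoothed partial residual and whose covariance involves the smoother, so only the operator and its square root are applied, as the abstract indicates. The basis, penalty and fixed variance are simplifying assumptions of this sketch rather than the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy additive model y = f1(x1) + f2(x2) + noise
n = 300
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
sigma = 0.3
y = np.sin(x1) + 0.5 * x2 ** 2 + rng.normal(0, sigma, n)
y = y - y.mean()

def ridge_smoother(x, lam=1.0, degree=5):
    """Symmetric smoother matrix S = B (B'B + lam I)^(-1) B' on a polynomial basis."""
    B = np.vander(x, degree + 1, increasing=True)[:, 1:]   # drop intercept column
    return B @ np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T)

def sym_sqrt(S):
    """Symmetric square root via eigendecomposition (S is symmetric PSD)."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

smoothers = [ridge_smoother(x1), ridge_smoother(x2)]
roots = [sym_sqrt(S) for S in smoothers]
f = [np.zeros(n), np.zeros(n)]

for sweep in range(200):                     # Gibbs-style sweeps
    for j in range(2):
        partial_resid = y - sum(f[k] for k in range(2) if k != j)
        # Stochastic update: smoothed partial residual plus smoother-root noise
        # (Gaussian full conditional under this simplified setup)
        f[j] = smoothers[j] @ partial_resid \
               + sigma * roots[j] @ rng.standard_normal(n)

print("posterior draw, component 1 (first 5 fitted values):", np.round(f[0][:5], 2))
```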

138 citations


Journal ArticleDOI
TL;DR: This note corrects a printing malfunction that caused all minus signs and some left parentheses to be omitted from the paper "Bayesian Model Averaging: A Tutorial" by Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky.
Abstract: A printing malfunction caused all minus signs and some left parentheses to be omitted from the paper "Bayesian Model Averaging: A Tutorial" by Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky in the November 1999 issue of Statistical Science (volume 14, pages 382–417). These errors occurred after the proof stage and were not the fault of the authors. Corrections to the paper are listed below. A corrected version of the paper is also available at http://www.stat.washington.edu/www/research/online/hoeting1999.pdf. Please cite this article as follows: Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999) "Bayesian Model Averaging: A Tutorial (with discussion)," Statistical Science 14:4, 382–417. Corrected version available at http://www.stat.washington.edu/www/research/online/hoeting1999.pdf.

124 citations


Journal ArticleDOI
TL;DR: A history of the speed of light up to the time of Michelson's study is presented, and the larger history together with the details of a single study allows the method of statistics to be placed within the larger context of science.
Abstract: What is "statistical method"? Is it the same as "scientific method"? This paper answers the first question by specifying the elements and procedures common to all statistical investigations and organizing these into a single structure. This structure is illustrated by careful examination of the first scientific study on the speed of light carried out by A. A. Michelson in 1879. Our answer to the second question is negative. To understand this, a history of the speed of light up to the time of Michelson's study is presented. The larger history and the details of a single study allow us to place the method of statistics within the larger context of science.

82 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian formulation of the sample size problem for planning clinical trials is proposed, where the error probabilities are conditional on the true hypotheses instead of the outcome of the trial.
Abstract: We propose a Bayesian formulation of the sample size problem for planning clinical trials. The frequentist paradigm for calculating sample sizes for clinical trials is to prespecify the type I and II error probabilities. These error probabilities are conditional on the true hypotheses. Instead we propose prespecifying posterior probabilities, which are conditional on the outcome of the trial. Our method is easy to implement and has intuitive interpretations. We illustrate an application of our method to the planning of cancer clinical trials for the Eastern Cooperative Oncology Group (ECOG).
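
A minimal sketch of a posterior-probability sample-size criterion in the spirit of the abstract, for a two-arm trial with a normal outcome and known variance: the sample size is increased until, at an assumed planning value of the treatment effect, the posterior probability that the effect is positive reaches a prespecified level. The prior, effect size and threshold are illustrative, not the ECOG settings used in the paper.

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0                           # known outcome SD per patient (illustrative)
delta = 0.3                           # assumed true treatment effect used for planning
prior_mean, prior_sd = 0.0, 10.0      # vague normal prior on the effect
target = 0.95                         # required posterior P(effect > 0 | data)

def posterior_prob_positive(n_per_arm):
    """P(theta > 0 | observed effect = delta) under the normal-normal model."""
    se2 = 2 * sigma ** 2 / n_per_arm          # variance of the observed difference
    post_var = 1.0 / (1.0 / prior_sd ** 2 + 1.0 / se2)
    post_mean = post_var * (prior_mean / prior_sd ** 2 + delta / se2)
    return norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var))

n = 2
while posterior_prob_positive(n) < target:
    n += 1
print(f"{n} patients per arm give P(theta > 0 | data) >= {target}")
```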

Journal ArticleDOI
TL;DR: The main purpose of the research is to uncover basic elements of applied statistical practice and statistical thinking for the use of teachers of statistics.
Abstract: Advancing computer technology is allowing us to downplay instruction in mechanical procedures and shift emphasis towards teaching the "art" of statistics. This paper is based upon interviews with six professional statisticians about statistical thinking and statistical practice. It presents themes emerging from their professional experience, emphasizing dimensions that were surprising to them and were not part of their statistical training. Emerging themes included components of statistical thinking, pointers to good statistical practices and the subtleties of interacting with the thinking of others, particularly coworkers and clients. The main purpose of the research is to uncover basic elements of applied statistical practice and statistical thinking for the use of teachers of statistics.

Journal ArticleDOI
TL;DR: The spectral envelope was introduced as a statistical basis for the frequency domain analysis and scaling of qualitative-valued time series, and in developing the methodology many other interesting extensions became evident.
Abstract: The concept of the spectral envelope was recently introduced as a statistical basis for the frequency domain analysis and scaling of qualitative-valued time series. In the process of developing the spectral envelope methodology, many other interesting extensions became evident. In this article we explain the basic concept and give numerous examples of the usefulness of the technology. These examples include analyses of DNA sequences, finding optimal transformations for the analysis of real-valued time series, residual analysis, detecting common signals in many time series, and the analysis of textures.
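
A rough sketch of the basic computation for a categorical sequence, under the usual formulation in which categories are encoded as indicator vectors and, at each frequency, one seeks the scaling that maximizes the real part of the smoothed spectral matrix normalized by the covariance of the indicator process. The toy sequence, smoothing span and normalization details are assumptions of this sketch rather than the paper's exact recipe.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)

# Toy categorical (e.g., DNA-like) sequence with 4 levels and a crude period-3 pattern
n = 1024
seq = rng.choice(4, size=n)
seq[::3] = 0

# Indicator matrix, dropping the last category to avoid singularity, then centring
Y = np.zeros((n, 3))
for j in range(3):
    Y[:, j] = (seq == j).astype(float)
Y = Y - Y.mean(axis=0)
V = np.cov(Y, rowvar=False)        # variance-covariance of the indicator process

# DFT of the indicator process and crudely smoothed spectral matrices
D = np.fft.rfft(Y, axis=0) / np.sqrt(n)
freqs = np.fft.rfftfreq(n)
span = 5                           # moving-average smoothing of the periodogram
env = np.zeros(len(freqs))
for j in range(len(freqs)):
    lo, hi = max(0, j - span), min(len(freqs), j + span + 1)
    f_hat = np.zeros((3, 3), dtype=complex)
    for m in range(lo, hi):
        f_hat += np.outer(D[m], D[m].conj())
    f_hat /= (hi - lo)
    # Spectral envelope at this frequency: largest eigenvalue of the pencil (Re f_hat, V)
    env[j] = eigh(f_hat.real, V, eigvals_only=True)[-1]

print("peak near frequency", freqs[np.argmax(env[1:]) + 1], "(about 1/3 expected)")
```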

Journal ArticleDOI
TL;DR: A review of the literature on the multiple root problem can be found in this article, where various approaches are discussed for selecting among the roots, including iterating from consistent estimators, examining the asymptotics when explicit formulas for roots are available, testing the consistency of each root, selecting by bootstrapping and using information-theoretic methods for certain parametric models.
Abstract: Estimating functions, such as the score or quasiscore, can have more than one root. In many of these cases, theory tells us that there is a unique consistent root of the estimating function. However, in practice, there may be considerable doubt as to which root is appropriate as a parameter estimate. The problem is of practical importance to data analysts and theoretically challenging as well. In this paper, we review the literature on this problem. A variety of examples are provided to illustrate the diversity of situations in which multiple roots can arise. Some methods are suggested to investigate the possibility of multiple roots, search for all roots and compute the distributions of the roots. Various approaches are discussed for selecting among the roots. These methods include (1) iterating from consistent estimators, (2) examining the asymptotics when explicit formulas for roots are available, (3) testing the consistency of each root, (4) selecting by bootstrapping and (5) using information-theoretic methods for certain parametric models. As an alternative approach to the problem, we consider how an estimating function can be modified to reduce the number of roots. Finally, we survey some techniques of artificial likelihoods for semiparametric models and discuss their relationship to the multiple root problem.
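
A small sketch of the multiple-root phenomenon and of selection strategy (1), using the Cauchy location score, whose estimating equation can have several roots: the example scans for sign changes, refines each root by bisection, and keeps the root closest to a consistent estimator (the sample median). The sample and grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_cauchy(30) + 2.0        # illustrative sample, true location 2

def score(theta):
    """Cauchy location score: sum of 2(x_i - theta) / (1 + (x_i - theta)^2)."""
    d = x - theta
    return np.sum(2 * d / (1 + d ** 2))

# Locate sign changes of the score on a fine grid spanning the data
grid = np.linspace(x.min(), x.max(), 4000)
vals = np.array([score(t) for t in grid])
sign_changes = np.where(np.diff(np.sign(vals)) != 0)[0]

def bisect(lo, hi, tol=1e-8):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(lo) * score(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

roots = [bisect(grid[i], grid[i + 1]) for i in sign_changes]
median = np.median(x)                    # consistent estimator used to pick a root
chosen = min(roots, key=lambda r: abs(r - median))
print(f"{len(roots)} root(s) found; root nearest the median: {chosen:.3f}")
```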

Journal ArticleDOI
TL;DR: In this article, saddlepoint methods are used to approximate reliabilities and failure rates of finite stochastic systems with feedback loops, as well as some countably infinite state systems including birth-death processes.
Abstract: It is shown how saddlepoint methods may be used to approximate reliabilities and failure rates of finite stochastic systems with feedback loops. Some countably infinite state systems including birth-death processes are also considered. The use of saddlepoint methods requires as input the moment generating functions (MGFs) for the system failure time distributions. Some new explicit formulas for these MGFs are given that are amenable to symbolic computation and which also make the numerical computation of saddlepoint approximations quite simple and convenient.
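
A minimal sketch of a saddlepoint density approximation driven only by a cumulant generating function, illustrated on the time to traverse n exponential stages (an Erlang/gamma time, so the exact density is available for comparison). The system and parameter values are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma

n, rate = 5, 1.0                 # number of exponential stages and their rate

def K(s):        # CGF of the sum of n iid exponential(rate) variables
    return -n * np.log(1.0 - s / rate)

def K1(s):       # first derivative of the CGF
    return n / (rate - s)

def K2(s):       # second derivative of the CGF
    return n / (rate - s) ** 2

def saddlepoint_pdf(x):
    """f(x) ~ exp(K(s) - s*x) / sqrt(2*pi*K''(s)), with s solving K'(s) = x."""
    s_hat = brentq(lambda s: K1(s) - x, -50.0, rate - 1e-9)
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2 * np.pi * K2(s_hat))

for x in (2.0, 5.0, 10.0):
    print(f"x={x:4.1f}  saddlepoint {saddlepoint_pdf(x):.4f}"
          f"  exact {gamma.pdf(x, a=n, scale=1 / rate):.4f}")
```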

Journal ArticleDOI
TL;DR: John Wilder Tukey, Donner Professor of Science Emeritus at Princeton University, has led the way in the fields of exploratory data analysis (EDA) and robust estimation and has coauthored several books.
Abstract: John Wilder Tukey, Donner Professor of Science Emeritus at Princeton University, was born in New Bedford, Massachusetts, on June 16, 1915. After earning bachelor's and master's degrees in chemistry at Brown University in 1936 and 1937, respectively, he started his career at Princeton University with a Ph.D. in mathematics in 1939 followed by an immediate appointment as Henry B. Fine Instructor in Mathematics. A decade later, at age 35, he was advanced to a full professorship. He directed the Statistical Research Group at Princeton University from its founding in 1956; when the Department of Statistics was formed in 1965, he was named its first chairman and held that post until 1970. He was appointed to the Donner Chair in 1976 and remained at Princeton until reaching emeritus status in 1985. At the same time, he was a Member of Technical Staff at AT&T Bell Laboratories since 1945, advancing to Assistant Director of Research, Communications Principles, in 1958 and, in 1961, to Associate Executive Director, Research Information Sciences, a position he held until retirement in 1985. Throughout World War II he participated in projects assigned to the Princeton Branch of the Frankford Arsenal Fire Control Design Division. This wartime service marked the beginning of his close and continuing association with governmental committees and agencies. Among other activities he was a member of the U.S. Delegation to the Conference on the Discontinuance of Nuclear Weapons Tests in Geneva in 1959, served on the President's Science Advisory Committee from 1960 to 1964 and was a member of President Johnson's Task Force on Environmental Pollution and President Nixon's Task Force on Air Pollution. The long list of awards and honors that Tukey has received includes the S. S. Wilks Medal from the American Statistical Association (ASA) (1965), the National Medal of Science (1973), the Medal of Honor from the IEEE (1982), the Deming Medal from the American Society of Quality Control (1983) and the Educational Testing Service Award (1990). He holds honorary degrees from Case Institute of Technology, the University of Chicago and Brown, Temple, Yale and Waterloo Universities; in June 1998, he was awarded an honorary degree from Princeton University. He has led the way to the fields of exploratory data analysis (EDA) and robust estimation. His contributions to the spectral analysis of time series and other aspects of digital signal processes have been widely used in engineering and science. His collaboration with a fellow mathematician resulted in the discovery of the fast Fourier transform (FFT) algorithm. Author of Exploratory Data Analysis and eight volumes of collected papers, he has contributed to a wide variety of areas and has coauthored several books. He has guided more than 50 graduate students to successful Ph.D.'s and inspired their careers. A detailed list of his students as well as a complete curriculum vitae can be found in The Practice of Data Analysis (1997), edited by D. Brillinger, L. Fernholz, and S. Morgenthaler, Princeton University Press. John W. Tukey married Elizabeth Louise Rapp in 1950. Before their marriage, she was Personnel Director of the Educational Testing Service in Princeton, New Jersey.

Journal ArticleDOI
TL;DR: Interesting findings to emerge from examination of quantiles of standardized matching scores include (i) formal significance is not attainable when querying a database for a given fingerprint pattern and (ii) maximal matching probabilities are not necessarily monotonely decreasing with increasing numbers of fingerprint bands.
Abstract: Genotypes of infectious organisms are becoming the foundation for epidemiologic studies of infectious disease. Central to the use of such data is a means for comparing genotypes. We develop methods for this purpose in the context of DNA fingerprint genotyping of tuberculosis, but our approach is applicable to many fingerprint-based genotyping systems and/or organisms. Data available on replicate (laboratory) strains here reveal that (i) error in fingerprint band size is proportional to band size and (ii) errors are positively correlated within a fingerprint. Comparison (or matching) scores computed to account for this error structure need to be "standardized" in order to properly rank the comparisons. We demonstrate the utility of using extreme value distributions to effect such standardization. Several estimation issues for the extreme value parameters are discussed, including a lack of robustness of (approximate) maximum likelihood estimates. Interesting findings to emerge from examination of quantiles of standardized matching scores include (i) formal significance is not attainable when querying a database for a given fingerprint pattern and (ii) maximal matching probabilities are not necessarily monotonely decreasing with increasing numbers of fingerprint bands.
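
A toy sketch of the standardization idea: treat the best matching score from each database query as an extreme value, fit an extreme value (Gumbel) distribution to a reference collection of such maxima, and express new maxima as upper-tail probabilities under that fit. The simulated scores are purely illustrative; this is not the authors' exact matching-score construction.

```python
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(6)

# Reference set: best score from each of n_queries searches of a database of
# db_size unrelated fingerprints (scores simulated here purely for illustration)
n_queries, db_size = 500, 2000
reference_maxima = rng.normal(0, 1, size=(n_queries, db_size)).max(axis=1)

loc, scale = gumbel_r.fit(reference_maxima)     # location/scale of the Gumbel fit

def standardised_score(max_score):
    """Upper-tail probability of a maximum this large under the fitted Gumbel."""
    return gumbel_r.sf(max_score, loc=loc, scale=scale)

for s in (3.2, 3.8, 4.5):
    print(f"max score {s}:  P(at least this good by chance) ~ {standardised_score(s):.3f}")
```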

Journal ArticleDOI
TL;DR: In this article, an influence-based regression diagnostic for tectonic data is proposed to study the influence of a group of data points on a subparameter of interest; the methodology can also be used in treatment-block designs to analyze the influence of the blocks on the estimated treatment effects.
Abstract: We discuss a linearized model to analyze the errors in the reconstruction of the relative motion of two tectonic plates using marine magnetic anomaly data. More complicated geometries, consisting of several plates, can be analyzed by breaking the geometry into its stochastically independent parts and repeatedly applying a few simple algorithms to recombine these parts. A regression version of Welch's solution to the Behrens-Fisher problem is needed in the recombination process. The methodology is illustrated using data from the Indian Ocean. Through a historical perspective we show how improving data density and improving statistical techniques have led to more sophisticated models for the Indo-Australian plate. We propose an influence-based regression diagnostic for tectonic data. A generalization of the standardized influence matrix of Lu, Ko and Chang is applied to study the influence of a group of data points on a subparameter of interest. This methodology could also be used in treatment-block designs to analyze the influence of the blocks on the estimated treatment effects.



Journal ArticleDOI
TL;DR: Milton Sobel has made substantial contributions in several areas of statistics and mathematics, including decision theory, sequential analysis, selection and ranking, reliability analysis, combinatorial problems, Dirichlet processes, as well as statistical tables and computing.
Abstract: Milton Sobel was born in New York City on August 30, 1919. He earned his B.A. degree in mathematics from the City College of New York in 1940, an M.A. degree in mathematics and a Ph.D. degree in mathematical statistics from Columbia University in 1946 and 1951, respectively. His Ph.D. thesis advisor was Abraham Wald. He has made substantial contributions in several areas of statistics and mathematics, including decision theory, sequential analysis, selection and ranking, reliability analysis, combinatorial problems, Dirichlet processes, as well as statistical tables and computing. He has been particularly credited for path-breaking contributions in selection and ranking, sequential analysis and reliability, including the landmark book, Sequential Identification and Ranking Procedures (1968), coauthored with Robert E. Bechhofer and Jack C. Kiefer. Later, he collaborated with Jean D. Gibbons and Ingram Olkin to write a methodologically oriented book, Selecting and Ordering Populations (1977), on the subject. He has published authoritative books on Dirichlet distributions, Type 1 and Type 2, with V. R. R. Uppuluri and K. Frankowski. He is the author or coauthor of more than one hundred and twenty research publications, many of which are part of today's statistical folklore. During the period July 1940 through June 1960, his career path led him to work at the Census Bureau, the Army War College (Fort McNair), Columbia University, Wayne State University, Cornell University and Bell Laboratories. From September 1960 through June 1975, he was Professor of Statistics at the University of Minnesota, and from July 1975 through June 1989 he was a Professor in the Department of Probability and Statistics at the University of California at Santa Barbara. He has since been a Professor Emeritus at UC Santa Barbara. He has earned many honors and awards, including Fellow of the Institute of Mathematical Statistics (1956) and Fellow of the American Statistical Association (1958), a Guggenheim Fellowship (1967–1968), an NIH Fellowship (1968–1969) and elected membership in the International Statistical Institute (1974). He continues to think and work harder than many half his age and still goes to his department at UC Santa Barbara every day. Milton Sobel remains vigorous in attacking and solving hard problems.

Journal ArticleDOI
TL;DR: Joe Waksberg joined the Census Bureau in 1940 and stayed there for 33 years, retiring in 1973 as the Associate Director for Statistical Methods, Research, and Standards; he then joined Westat, a statistical research firm in suburban Maryland, where he has continued to work for the last 26 years.
Abstract: Joseph Waksberg was born September 20, 1915, in Kielce, Poland; his family emigrated to the United States in 1921. Soon after graduating from the City University of New York (CUNY) in 1936, he moved to the Washington, D.C. area. He joined the Census Bureau in 1940, and stayed there for 33 years, retiring in 1973 as the Associate Director for Statistical Methods, Research, and Standards. Joe then joined Westat, a statistical research firm in suburban Maryland. He has continued to work at Westat for the last 26 years, serving as Chairman of the Board of Westat since 1990. From 1967 to 1997, he also served as a consultant to CBS and other TV networks for election night predictions. He has served the profession of statistics in many roles and received numerous awards, including the Department of Commerce Gold Medal and the Roger Herriot Memorial Award from the American Statistical Association. He has been active in the American Statistical Association, serving on the Board of Directors, as chair of both the Survey Research Methods Section and the Social Statistics Section, and on a number of committees. He has been president of the Washington Statistical Society and is currently an Associate Editor of Survey Methodology.