
Showing papers in "Technometrics in 2005"


Journal ArticleDOI
TL;DR: This chapter discusses the development of the Spatial Point Pattern Analysis Code in S–PLUS, which was developed in 1993 by P. J. Diggle and D. C. Griffith.
Abstract: (2005). Combining Pattern Classifiers: Methods and Algorithms. Technometrics: Vol. 47, No. 4, pp. 517-518.

3,933 citations


Journal ArticleDOI
TL;DR: This chapter discusses the development of the Spatial Point Pattern Analysis Code in S–PLUS, which was developed in 1993 by P. J. Diggle and D. C. Griffith.
Abstract: (2005). Applied Multivariate Statistical Analysis. Technometrics: Vol. 47, No. 4, pp. 517-517.

3,932 citations


Journal ArticleDOI
TL;DR: The book does a good job explaining some fundamental computational methods in statistics and econometrics and will serve students well as a reference book for upper-level undergraduate courses or graduate courses in computational statistics, time series analysis, or econometric methods.
Abstract: (2005). A Beginner's Guide to Structural Equation Modeling. Technometrics: Vol. 47, No. 4, pp. 522-522.

2,948 citations


Journal ArticleDOI
TL;DR: This is an outstanding monograph on current research on skew-elliptical models and their generalizations; it does an excellent job presenting the depth of methodological research as well as the breadth of application regimes.
Abstract: (2005). Atmospheric Modeling, Data Assimilation, and Predictability. Technometrics: Vol. 47, No. 4, pp. 521-521.

1,580 citations


Journal ArticleDOI
TL;DR: This is a very useful handbook for engineers, especially those working in signal processing; it provides real-data bootstrap applications to illustrate the theory covered in the earlier chapters.
Abstract: The bootstrap has found many applications in engineering fields, including artificial neural networks, biomedical engineering, environmental engineering, image processing, and radar and sonar signal processing. Basic concepts of the bootstrap are summarized in each section as a step-by-step algorithm for ease of implementation. Most of the applications are taken from the signal processing literature.

The principles of the bootstrap are introduced in Chapter 2. Both the nonparametric and parametric bootstrap procedures are explained. Babu and Singh (1984) demonstrated that, in general, these two procedures behave similarly for pivotal (Studentized) statistics. That the bootstrap is not the solution to every problem has long been known to the statistics community; however, this fact is rarely touched on in manuscripts meant for practitioners. It was first observed by Babu (1984) that the bootstrap does not work in the infinite-variance case. Bootstrap Techniques for Signal Processing explains the limitations of the bootstrap method with an example. I especially liked the presentation style. The basic results are stated without proofs; however, the application of each result is presented as a simple step-by-step process, easy for nonstatisticians to follow. Bootstrap procedures such as the moving block bootstrap for dependent data, along with applications to autoregressive models and to estimation of power spectral density, are also presented in Chapter 2.

Signal detection in the presence of noise is generally formulated as a hypothesis-testing problem. Chapter 3 introduces the principles of bootstrap hypothesis testing. The topics are introduced with interesting real-life examples. Flow charts, typical in the engineering literature, are used to aid explanations of the bootstrap hypothesis-testing procedures. The bootstrap yields a second-order correction due to pivoting; this improvement is also explained. In the second part of Chapter 3, signal processing is treated as a regression problem. The performance of the bootstrap for matched filters, as well as for constant false-alarm rate matched filters, is also illustrated.

Chapters 2 and 3 focus on estimation problems. Chapter 4 introduces bootstrap methods used in model selection. Due to the inherent structure of the subject matter, this chapter may be difficult for nonstatisticians to follow. Chapter 5 is the most impressive chapter in the book, especially from the standpoint of statisticians. It provides real-data bootstrap applications to illustrate the theory covered in the earlier chapters, including applications to optimal sensor placement for knock detection and land-mine detection. The authors also provide a MATLAB toolbox comprising frequently used routines. Overall, this is a very useful handbook for engineers, especially those working in signal processing.
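The review singles out the step-by-step algorithmic presentation and the second-order accuracy gained by pivoting. As a rough illustration of that idea (not code from the book; the data, statistic, and B = 2000 are hypothetical choices), here is a bootstrap-t confidence interval for a mean:

```python
# Minimal sketch of the nonparametric bootstrap for a pivotal (Studentized)
# statistic. Sample, statistic, and number of resamples are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=50)   # hypothetical observed sample
n, B = x.size, 2000

mean_hat = x.mean()
se_hat = x.std(ddof=1) / np.sqrt(n)

# Bootstrap the Studentized pivot t* = (mean* - mean_hat) / se*
t_star = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    t_star[b] = (xb.mean() - mean_hat) / (xb.std(ddof=1) / np.sqrt(n))

# Bootstrap-t 95% confidence interval for the population mean
q_hi, q_lo = np.quantile(t_star, [0.975, 0.025])
print((mean_hat - q_hi * se_hat, mean_hat - q_lo * se_hat))
```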

1,292 citations


Journal ArticleDOI

1,131 citations


Journal ArticleDOI
TL;DR: Overall, Linear Models With R is well written and, given the increasing popularity of R, it is an important contribution.
Abstract: (2005). Time Series Analysis by State Space Methods. Technometrics: Vol. 47, No. 3, pp. 373-373.

1,115 citations


Journal ArticleDOI
TL;DR: The ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation, yields more accurate estimates for noncontaminated data and more robust estimates for contaminated data.
Abstract: We introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and hence is highly sensitive to outlying observations. Two robust approaches have been developed to date. The first approach is based on the eigenvectors of a robust scatter matrix such as the minimum covariance determinant or an S-estimator and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data. Here we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. ROBPCA yields more accurate estimates for noncontaminated data and more robust estimates for contaminated data. ROBPCA can be computed rapidly, and is able to detect exact-fit situations. As a by-product, ROBPCA produces a diagnostic plot that displays and classifies the outliers. We apply the algorithm to several datasets from chemometrics and engineering.
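ROBPCA itself involves several carefully designed stages; the sketch below shows only the projection-pursuit ingredient the abstract mentions, with the MAD as the robust scale and a crude random search over directions. It is a simplified stand-in for the authors' algorithm, and the choices here (coordinatewise median centering, 5,000 random directions) are assumptions for illustration:

```python
# Projection-pursuit flavor of robust PCA: maximize a robust scale (MAD)
# instead of the classical variance. Not the ROBPCA algorithm itself.
import numpy as np

def mad(z):
    """Median absolute deviation, a robust scale estimate."""
    return 1.4826 * np.median(np.abs(z - np.median(z)))

def robust_pp_directions(X, k=2, n_dirs=5000, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - np.median(X, axis=0)      # robust centering, coordinatewise median
    p = Xc.shape[1]
    comps = []
    for _ in range(k):
        dirs = rng.standard_normal((n_dirs, p))
        # deflate: restrict candidates to the orthogonal complement of found comps
        for v in comps:
            dirs -= np.outer(dirs @ v, v)
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        scores = np.array([mad(Xc @ d) for d in dirs])
        comps.append(dirs[scores.argmax()])
    return np.array(comps)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 6))
X[:10] += 10                    # outliers have limited pull on the MAD search
V = robust_pp_directions(X, k=2)
print(V @ V.T)                  # approximately the 2x2 identity
```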

935 citations


Journal ArticleDOI
TL;DR: This book contains things I have never seen in other introductory statistics textbooks, such as the first names of Bonferroni, Venn, and Likert, and a photograph of George Gallup.
Abstract: (2005). Statistical Analysis of Spatial Point Patterns. Technometrics: Vol. 47, No. 4, pp. 516-517.

836 citations


Journal ArticleDOI
TL;DR: This book is well intentioned for science majors, but it seems to miss the mark by being overly complicated with the unnecessary use of calculus on one hand and the recommendation of some poor statistical techniques on the other hand.
Abstract: There are some nice examples and datasets to cover the points. The section on “fitting a scatterplot by eye” has to be a first for a book claiming to be a serious statistics text for students. With all of the software available to do regression, this approach is inexcusable. Chapters 16 and 17 cover the analysis of categorical data and resampling methods. This book is well intentioned for science majors, but it seems to miss the mark by being overly complicated with the unnecessary use of calculus on one hand and the recommendation of some poor statistical techniques on the other hand. Starting out science majors on their careers with a useful basis in statistics will serve them well, so care should be taken to avoid techniques that complicate or underwhelm. The authors have the makings of an excellent text, but they need to pare it back a little before I can recommend it.

783 citations


Journal ArticleDOI
TL;DR: To fully grasp some of the more complex aspects of sensitivity analysis, additional references are needed to supplement this text; the technical material seems overly complicated and lacks sufficient explanation.


Journal ArticleDOI
TL;DR: Basic Statistics and Data Analysis is not suitable for students majoring in math, science, and engineering, because of the sparse coverage of statistical topics applicable to these fields.
Abstract: To summarize, Basic Statistics and Data Analysis is not suitable for students majoring in math, science, and engineering, because of the sparse coverage of statistical topics applicable to these fields. As a general audience text, the book is well written in terms of style and readability. However, the instructor would have to be predisposed to include a heavy dose of nonparametric statistics in an introductory course and plan on presenting clearer guidelines on when such tests are appropriate.

Journal ArticleDOI
TL;DR: Following Cook and Weisberg (1999, p. 432), the most important idea from the recent literature is that MLR (multiple linear regression) is the study of the conditional distribution of the response variable given the predictors, and this distribution can be visualized with a plot of the fitted values versus the response variable.
Abstract: Weisberg (1985). Also, very little recent literature (after 1984) is covered (with the exception of Sec. 7.3, which covers radial basis functions). Following Cook and Weisberg (1999, p. 432), the most important idea from the recent literature is that MLR is the study of the conditional distribution of the response variable given the predictors, and this distribution can be visualized with a plot of the fitted values versus the response variable. Texts that do not discuss this plot may be obsolete.
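For concreteness, the plot the review highlights is simple to produce; the following sketch uses simulated (hypothetical) data and ordinary least squares via numpy:

```python
# A minimal illustration of the fitted-values-versus-response plot for MLR.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(100)

Xd = np.column_stack([np.ones(100), X])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted = Xd @ beta

plt.scatter(fitted, y)       # a good MLR fit clusters around the identity line
plt.xlabel("fitted values"); plt.ylabel("response")
plt.show()
```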

Journal ArticleDOI
TL;DR: This book did not live up to my expectations and disappointed in three specific ways, including that the amount of the text specifically devoted to probabilistic reasoning is relatively small compared to the book’s total length.
Abstract: At 520 pages and with a title of Cognition and Chance: The Psychology of Probabilistic Reasoning, I had hoped that this book would deepen my understanding of how human beings reason probabilistically. My interest stems from teaching and practicing statistics, where I have learned to be wary of quantitative intuition. I think that this is a common feeling among statisticians, who as a group often directly experience the inherent limitations and biases of how human beings observe, process, and interpret quantitative information. We know, for example, that human beings tend to find patterns in data where none exist, are wonderfully adept at post hoc rationalization of results and outcomes, quickly leap from correlation to causation, are subject to many subtle biases, and so on. So an authoritative treatment of the psychology of probabilistic reasoning would be quite useful and could help us understand when to trust and when to question human intuition. For example, what types of quantitative reasoning is the human brain naturally better or worse at? When are we good intuitive probabilists and when are we bad? Do we know why? Thus in reading this book, I hoped to gain a better understanding of such issues, if for no other reason than to help me better know when to trust my own intuition. Unfortunately, this book did not live up to my expectations. As I discuss more fully later in this review, it disappointed in three specific ways. The first disappointment is that the amount of the text specifically devoted to probabilistic reasoning is relatively small compared to the book’s total length. At 520 pages and 12 chapters, I expected a fairly deep and thorough discussion of the book’s titled topic. Yet only Chapter 11 (“People as Intuitive Probabilists”) is devoted to the particular subject of probabilistic reasoning. Chapters 8–10 (“Estimation and Prediction,” “Perception of Covariation and Contingency,” and “Choice Under Uncertainty”) are also related, discussing other aspects of quantitative reasoning, but fully two-thirds of the book is devoted to topics that are largely general background material. In particular, Chapters 1–7 focus on topics such as a general history of the field of probability as it developed from games of chance; the various meanings, interpretations, and misinterpretations of the concepts of randomness and coincidence; an entire chapter explaining Bayes’ theorem; another chapter devoted to a discussion of various paradoxes (e.g., St. Petersburg, Simpson’s); and, a general exposition of the field of statistics. These chapters seem to have been written for a lay audience and are largely nonquantitative. Other than perhaps the first chapter, they are likely to be of only passing interest to someone with advanced statistical training. The second disappointment is that the exposition tends to be more broad than deep. A typical discussion in the later chapters is of the form “researcher A found this and researcher B found that, whereas researcher C found a contrary result,” with the results described in only the briefest and most general terms. Few, if any, specifics about the various research efforts are described, and little effort seems to have been made to discuss the results in anything more than a superficial manner. 
For example, in Chapter 8 the author writes, “When asked to observe a set of numbers and to estimate some measure of central tendency, such as its mean, people are able under some conditions to produce reasonably accurate estimates (Beach & Swensson, 1966; Edwards, 1967; C.R. Peterson & Beach, 1967), although systematic deviations from actual values have also been reported (N.H. Anderson, 1964; I.P. Levin, 1974, 1975)” (p. 284). Now, although I have chosen one of the more egregious examples of a singularly unhelpful “discussion,” the lack of detail here is not atypical of the book’s general tone and approach. As a result, the reader is often left without sufficient information to truly understand the strengths or limitations of the cited results or the author’s summary conclusion. In a related vein, although the author’s grasp of a very large body of material is quite impressive, the narration often feels more like a wandering discussion than a focused examination. Furthermore, although each chapter does have a summary section, as does the book, each of these is superficial, simply regurgitating various general discussions from each chapter in an even more general fashion. In fact, after 435 pages of text, the summary chapter for the entire book is less than two full pages long.

The third disappointment—related to the second—is the failure of the text to go beyond lists and discussions of individual studies and provide the reader with a broader context in which to place the information. That is, on completion of the book, the reader is left with various categories of research study results generally summarized, but little to no information about what this means for the broader question of how humans reason quantitatively and whether or not there are theories or models that help explain, summarize, or synthesize the various study results into some larger framework of human probabilistic reasoning. For example, there are no charts or graphics or tabularizations anywhere in the book that provide the reader with an overview or taxonomy of the field of research. Similarly, there is no outline or description of how psychologists think about or summarize the observed phenomena, nor any real discussion about various theories that may exist to explain human quantitative or probabilistic reasoning. Nothing in the book provides a reader with any sort of “big picture” within which to understand how the various lengthy expositions fit.

Criticism aside, in reading Cognition and Chance: The Psychology of Probabilistic Reasoning, I did expand my knowledge about probabilistic reasoning. My disappointment may be the result of unrealistic expectations on my part or perhaps insufficient editorial assistance by the publisher. On the positive side, the book does bring together many diverse sources and results on a host of topics. As such, it could serve as a useful starting point for a new researcher beginning a study of some aspect of quantitative reasoning.

Journal ArticleDOI
TL;DR: A new method for selecting a common subset of explanatory variables where the aim is to model several response variables based on the (joint) residual sum of squares while constraining the parameter estimates to lie within a suitable polyhedral region is proposed.
Abstract: We propose a new method for selecting a common subset of explanatory variables where the aim is to model several response variables. The idea is a natural extension of the LASSO technique proposed by Tibshirani (1996) and is based on the (joint) residual sum of squares while constraining the parameter estimates to lie within a suitable polyhedral region. The properties of the resulting convex programming problem are analyzed for the special case of an orthonormal design. For the general case, we develop an efficient interior point algorithm. The method is illustrated on a dataset with infrared spectrometry measurements on 14 qualitatively different but correlated responses using 770 wavelengths. The aim is to select a subset of the wavelengths suitable for use as predictors for as many of the responses as possible.
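The paper's exact polyhedral constraint is not reproduced in the abstract, so the sketch below shows only the familiar baseline it extends: under an orthonormal design, the single-response LASSO solution is soft thresholding of the least squares coefficients, and a row-wise (group) threshold is one plausible way to couple several responses. This is an illustration of the selection idea, not the authors' method:

```python
# Orthonormal-design baseline for LASSO-type selection across responses.
# Row-wise thresholding (one row = one predictor, columns = responses) is
# an assumed stand-in for the paper's polyhedral constraint.
import numpy as np

def soft_threshold(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def row_threshold(B, lam):
    """Shrink whole coefficient rows so a predictor is kept or dropped jointly."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return B * scale

rng = np.random.default_rng(2)
X, _ = np.linalg.qr(rng.standard_normal((100, 10)))   # orthonormal design
Y = X[:, :3] @ rng.standard_normal((3, 4)) + 0.1 * rng.standard_normal((100, 4))

B_ls = X.T @ Y                       # least squares under orthonormality
B_joint = row_threshold(B_ls, 0.5)   # predictors selected for all responses at once
print(np.flatnonzero(np.linalg.norm(B_joint, axis=1)))   # indices kept
```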

Journal ArticleDOI
TL;DR: This book presents multi-way analysis methods and surveys their applications in the chemical sciences.
Abstract: (2005). Multi-Way Analysis: Applications in the Chemical Sciences. Technometrics: Vol. 47, No. 4, pp. 518-519.

Journal ArticleDOI
TL;DR: This book would have been more useful had some more detailed discussion of the choices of ranked set size k and cycle number m been added; however, overall I would highly recommend this well-written and reasonably priced book to researchers and practitioners.
Abstract: The book comprises eight chapters of varying length. The inclusion of sections at the end of the main chapters to collect more technical arguments and to give bibliographic notes works well and helps readers explore in more depth aspects of RSS in which they are interested. Chapter 1 introduces the notion and general procedure of RSS. This very useful chapter will enable readers to quickly enter into the realm of RSS, learn about its historical developments, and identify applications of particular interest. Chapters 2 and 3 discuss balanced RSS. In particular, Chapter 2 focuses on nonparametric RSS, in which no assumption on the underlying distribution of the variable of interest is made. This chapter studies in detail the relative efficiency of RSS with respect to simple random sampling (SRS) in the estimation of a population mean, a smooth function of means, and population quantiles. The authors also consider the inference procedures, such as the construction of confidence intervals and hypothesis testing. To facilitate the inference procedures based on RSS sample quantiles, they also discuss the kernel method of density estimation. This section is quite interesting. The chapter also presents some robust procedures based on M-estimates with RSS data. Chapter 3 addresses parametric RSS, where the underlying distribution of the variable of interest is assumed to belong to some parametric family (e.g., location-scale family and shape-scale family) of distributions. The authors nicely lay out the theoretical foundation for the parametric RSS via Fisher information. The maximum likelihood estimate (MLE) based on RSS and its relative efficiency with respect to MLE based on SRS are studied, and the best linear unbiased estimate for the location family of distributions is dealt with. Chapter 4 studies unbalanced RSS. This chapter first develops the methodology of analyzing RSS data for the inferences on distribution functions and quantiles, as well as general statistical functionals. The optimal designs for the parametric location-scale family and for nonparametric estimation of quantiles are discussed in detail. This chapter also contains methods of Bayes design and adaptive design. Chapter 5 explores classical distribution-free tests in the context of RSS. The authors consider the sign test, signed rank test, and Mann–Whitney–Wilcoxon tests and revisit the issue of the optimal design for distribution-free tests. Readers with a prior knowledge of nonparametric tests at the level of Gibbons and Chakraborti (2003) will find this chapter informative and easy to understand. For readers not familiar with these standard topics, some brief additional explanation and references might have been beneficial for wider accessibility. Chapter 6 describes RSS with concomitant variables. A multilayer RSS scheme and an adaptive RSS scheme using multiple concomitant variables are developed; the general regression analysis using RSS is discussed; and the design of optimal RSS schemes for regression analysis, on the basis of the concomitant variables, is explored. Chapter 7 illustrates RSS as a data reduction tool for data mining, whereas Chapter 8 exemplifies the practical features of RSS via case studies. The inclusion of this last chapter on case studies with RSS further enhances the value of this monograph for practitioners and applied statisticians. In the development of RSS, the choices of ranked set size k and cycle number m are directly pertinent to practical problems.
This book would have been more useful had some more detailed discussion of those choices been added. However, overall I would highly recommend this well-written and reasonably priced book to researchers and practitioners, all of whom are likely to use one or more of the methods it discusses.
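The relative-efficiency results discussed in Chapter 2 are easy to reproduce in miniature. Below is a small simulation sketch (assuming perfect rankings and standard normal data, both idealizations; the values of k, m, and the replication count are arbitrary choices) comparing the RSS and SRS estimators of a mean at equal total sample size:

```python
# Ranked set sampling vs. simple random sampling, equal total size n = k*m.
import numpy as np

rng = np.random.default_rng(3)
k, m, reps = 4, 5, 20000          # set size, cycle number, Monte Carlo reps

rss_means, srs_means = [], []
for _ in range(reps):
    # RSS: for each cycle and each rank r, draw a fresh set of k units,
    # rank them (here by their true values: perfect ranking), and
    # measure only the r-th order statistic.
    sample = [np.sort(rng.standard_normal(k))[r]
              for _ in range(m) for r in range(k)]
    rss_means.append(np.mean(sample))
    srs_means.append(rng.standard_normal(k * m).mean())

print("var RSS mean:", np.var(rss_means), " var SRS mean:", np.var(srs_means))
```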

Journal ArticleDOI
TL;DR: This review discusses Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives.
Abstract: (2005). Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives. Technometrics: Vol. 47, No. 4, pp. 519-519.

Journal ArticleDOI
TL;DR: The author makes amply clear the cases where R2 is not to be used, such as a clustered scatterplot where R2 is virtually determined by a single point, making a line unreliable if not meaningless.
Abstract: ingenious uses and revealing examples of scatterplot matrices and smoothers like loess smooth mean function estimates. I find the titles and the contents of Chapter 4 (“Drawing Conclusions”), Chapter 9 (“Outliers and Influence”), and Chapter 10 (“Variable Selection”) user-friendly and helpful for a practitioner who is not necessarily a statistician; by keeping away from the jargon of statistics, they ease its use. These are among the hot issues that can confront practitioners, and the user-friendly style will help make applied statistics amenable to them. A prominent supplementary feature of the book is the rich and extensive collection of datasets that has been made available via the Internet. The datasets are used throughout the book without interrupting the flow of the text’s presentation. This intelligent use of the Internet will make it possible for students, applied statisticians, and practitioners to readily relate the concepts to actual statistical data drawn from a wide spectrum of real sources without making the book voluminous or requiring time-consuming manual entry of the data. Some of these data have historical flavors, like the height inheritance traits data from Pearson and Lee (1903). All of the datasets used to illustrate examples throughout the book or needed for the exercise problems are made available on-line at the webpage created for the book, http://www.stat.umn.edu/alr. The website also accommodates an errata section for the typos in this edition. Many regression books inadvertently tempt practitioners to make inappropriate use of R2 as a “useful” summary of regression. This author makes amply clear the cases where R2 is not to be used, such as a clustered scatterplot where R2 is virtually determined by a single point, making a line unreliable if not meaningless. It may seem odd to have Chapter 11 (“Nonlinear Regression”) and Chapter 12 (“Logistic Regression”) in a linear regression book, but I was pleased to see their inclusion. This is because many practitioners often seek solutions to their problems in the linear regression arena as the natural and most popular initial candidate. However, the analysis or the nature of the data may force them to consider nonlinear models or logistic regression, as in the case of categorical data. In these cases readers and practitioners may not need to consult another specialized book, because of the thoughtful coverage provided in this text. The book is intended to be focused on methodology, and hence no extensive discussion is directly provided about computer packages and graphical software, even though computation and simulation are used throughout. This is facilitated by the very intelligent use of the Internet as an optional supplement to the book via the previously mentioned URL. There the readers can also learn how to use many popular statistical packages in applying the examples of the book. Supported by on-line R and S–PLUS libraries, the website materials are invaluable.
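The author's point about R2 being virtually determined by a single point is easy to verify numerically. A small sketch with hypothetical data (not from the book): a patternless cluster plus one remote point manufactures a large R2.

```python
# R^2 inflated by a single remote point in an otherwise trendless cluster.
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 30), [15.0]])   # cluster + one far point
y = np.concatenate([rng.normal(0, 1, 30), [15.0]])

def r_squared(x, y):
    b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)      # OLS slope
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    return 1 - resid.var(ddof=1) / y.var(ddof=1)     # 1 - SSE/SST

print("with the point   :", r_squared(x, y))           # close to 1
print("without the point:", r_squared(x[:-1], y[:-1])) # near 0
```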

Journal ArticleDOI
TL;DR: This introductory treatment of generalized linear models (GLMs) is aimed at an audience familiar with regression modeling and interested in extending those ideas to GLMs, and it provides an easy introduction to the subject.
Abstract: This introductory treatment of generalized linear models (GLMs) is aimed at an audience familiar with the use of regression modeling and interested in extending those ideas to the application of GLMs. The book’s goal is to provide an easy introduction to the subject and to motivate and illustrate the application of GLMs through the use of examples. The examples chosen to illustrate these models tend to be from the social sciences, and the text illustrates the application of the techniques through either the Stata, SAS, or SPSS statistical software packages, with Stata used most of the time. The book does not try to provide an in-depth explanation of the theory underlying the use of the GLM, although it does have numerous references to books and articles where that theory can be explored (e.g., McCullagh and Nelder 1989). The book is organized into eight chapters that cover three main areas: an introduction that reviews the linear model and introduces GLMs, a main body that discusses some common types of GLMs and illustrates those types through examples, and a closing chapter that briefly introduces some more advanced topics and suggests sources for learning more about those topics. The primary goal is to provide an introduction that will allow readers to start applying the techniques to their own data with one of the aforementioned statistics packages. The book also stresses correct interpretation of the models and points out some potential pitfalls of these techniques. Within the introductory material, Chapter 1 focuses on a review of the linear model, with emphasis on the form and interpretation of the regression model and the fact that within regression, the dependent variable is assumed to be continuous. Chapter 1 also reviews eight critical assumptions underlying the use of ordinary least squares for model estimation and diagnostic methods for detecting violation of these assumptions. Finally, it discusses interaction effects among the independent variables and some caveats on fitting models with interaction terms. Chapter 2 introduces GLMs as a method for extending linear model techniques into situations where the dependent variable is not continuous and where some of the key assumptions for ordinary least squares are violated. The overview of GLMs introduces the link function, explains its role in modeling, and reviews some of the common distributions used in GLMs. The distributions reviewed are the binomial, multinomial, Poisson, and negative binomial. A nontechnical explanation of the method of maximum likelihood for estimating the GLM is also provided, although no theory behind the technique is presented. Finally, some methods for checking the fit of a GLM are discussed. The methods covered include the likelihood ratio test, the deviance, pseudo-R2, the Akaike information criterion, and the Bayesian information criterion, with examples of their use provided through the Stata statistics package. The main body of the text consists of Chapters 3–7. These chapters introduce and discuss the application of GLMs for a number of common types of data. The form of the dependent random variable is emphasized throughout, and in most cases it takes the form of a discrete random variable. Each chapter also presents appropriate interpretations of the models and techniques for performing diagnostics on the fitted models. Chapter 3 discusses the use of logistic and probit regression models for a dichotomous variable.
This chapter also discusses and fully illustrates the odds ratio, its uses, and its interpretation. To illustrate the logistic and probit regression techniques, simple examples are used initially, with complications gradually added to fully demonstrate the usefulness of these models. Within the simplest examples, methods that can be manually calculated, like cross-tabulation, are used to illuminate and interpret the more complex regression techniques. Chapters 4 and 5 discuss the extension of logistic and probit analysis techniques to situations where the dependent variable has more than two possible categories. Chapter 4 focuses on cases where the dependent variable is an ordered categorical variable, whereas Chapter 5 focuses on unordered categorical or polytomous variables. In each chapter, examples are again used to illustrate the methods, with output from Stata forming the basis of the examples. The book also stresses the underlying assumptions made in each of the types of models, along with methods for performing diagnostics to explore the validity of those assumptions where possible. Chapter 6 discusses using Poisson and negative binomial regression for modeling count data and standardized count rates, such as cases per 100,000 individuals. The effects of underdispersion and overdispersion on these models are covered, along with methods for estimating the dispersion parameter. This chapter also introduces the concepts of zero-inflated and hurdle models and provides references that can be investigated to learn more about those models. Finally, Chapter 7 presents methods for estimating survival or event history models. Most attention is focused on fitting parametric regression models for log-normal, exponential, and Weibull distributions, as well as using the Cox proportional hazards model as a semiparametric approach. The concept of censoring is explained, and its potential effect on the model estimates is explored through the examples. This chapter includes many references to more detailed works on the theory and application of these techniques and has a balance of references from both the social sciences and biostatistics. The last chapter of the book provides a brief discussion of some further topics related to GLMs and some suggestions for pursuing further study of those topics. The topics covered include sample selection bias, endogeneity, longitudinal and panel data, multilevel or hierarchical models, and nonparametric regression models. Overall, the book provides some very good nontechnical discussion of many of the issues and finer points in fitting and interpreting GLMs. However, the book is very focused on using the Stata package to fit the models. For a reader who is using or has access to Stata, this book will be a very useful introduction to the application of GLMs. An appendix also provides SPSS and SAS function calls to reproduce all of the examples, so the book could be useful for a reader interested in those packages as well, although differences between their output and Stata’s may force the reader to work at comparing his or her results with those in the book.
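The odds-ratio interpretation emphasized in Chapter 3 translates directly into code. The book works in Stata (with SAS and SPSS calls in an appendix); the following is a rough Python analogue on simulated (hypothetical) data, not an example from the book:

```python
# Logistic GLM with odds-ratio interpretation: exp(coefficient) is the
# multiplicative change in the odds per unit increase in the predictor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.standard_normal(500)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))     # true logit: 0.5 + 1.2*x
ybin = rng.binomial(1, p)

model = sm.GLM(ybin, sm.add_constant(x),
               family=sm.families.Binomial()).fit()
print(model.params)              # coefficients on the log-odds scale
print(np.exp(model.params))      # odds ratios
```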

Journal ArticleDOI
TL;DR: Although the author has achieved the purposes he sets out in Chapter 1, I can recommend this book only to experienced statisticians and analysts who either have access to SAS EM or are trying to justify its purchase to their management.
Abstract: Several of the logic structures and formats used leave much to be desired. Part I leaves the reader with only the barest explanation of which methods apply to which problems. There are rather large leaps in mathematical symbolism that span algebra, integral calculus, matrices, sets, and directed graphs. Questions regarding the details of computation, code, algorithms, mathematics, and statistics are directed to references. This tactic would be more appealing had the citations referred to specific chapters and pages. Another frequent irritation is that much terminology is introduced without definition, and many terms are not explained until subsequent chapters. References to case studies in the methods section are few, and the case studies themselves do not obviously point back to the methods section that they support. Consequently, it is easy to get lost in tangential concepts, which could have been avoided with a more sequential presentation. In Part II, each case is presented according to the following outline: objectives, description of the data, EDA, model building, model comparison, and summary report. Only the models change between cases. The individual cases are well organized, and the structure of the presentation will appeal to any scientifically inclined reader. My only serious criticism of Part II is that the defense of the “best” method chosen often seems highly subjective, especially when computational methods are being evaluated. Although the author has achieved the purposes he sets out in Chapter 1, I can recommend this book only to experienced statisticians and analysts who either have access to SAS EM or are trying to justify its purchase to their management. It might also be of value to users trying to assemble a set of data mining (DM) procedures, because SAS has clearly defined the state of the art in DM packages. Those interested in the do-it-yourself SAS approach may wish to consult the texts by Fernandez (2003) and Rud (2001), reviewed in Technometrics by Caby (2004) and Ziegel (2002).

Journal ArticleDOI
TL;DR: This book is an excellent addition to my modeling library; Volume II, on unbalanced models, has been published and looks to be as good a book as Volume I.
Abstract: In general, the sections of each chapter cover the mathematical model, analysis of variance, point and interval estimation of variance components, hypothesis tests, and Bayesian estimation. Estimation methods include method of moments, maximum likelihood, restricted maximum likelihood, and Bayesian estimators. Comparisons of estimators are given for each case. Confidence intervals and hypothesis tests for variance components and functions of variance components are discussed thoroughly. Each chapter includes plenty of examples, most of which contain hand calculations. There are also numerous examples of software output, including SAS, SPSS, and BMDP output. The book also includes sample size calculations based on cost of sampling. The layout of this comprehensive book makes it easy to find exactly what you need. The writing is clear, giving extensive explanations of advanced topics. The authors make it clear in the Preface that they are providing comprehensive coverage of linear models with random effects. They do not, in general, provide proofs of theorems. There is also no material on methods other than the univariate models named in the chapter titles, such as sequential or nonparametric methods. However, plenty of references are given for the reader who wishes to delve into the theory more deeply. The Remarks section of each topic contains these references, as well as small details and historical information. In short, this book is an excellent addition to my modeling library. Volume II on unbalanced models has been published and looks to be as good a book as Volume I.

Journal ArticleDOI
TL;DR: Comparisons based on both simulated and real data show that the proposed procedure is more robust than its competitors, especially for large m.
Abstract: Both principal components analysis (PCA) and orthogonal regression deal with finding a p-dimensional linear manifold minimizing a scale of the orthogonal distances of the m-dimensional data points to the manifold. The main conceptual difference is that in PCA p is estimated from the data, to attain a small proportion of unexplained variability, whereas in orthogonal regression p equals m − 1. The two main approaches to robust PCA are using the eigenvectors of a robust covariance matrix and searching for the projections that maximize or minimize a robust (univariate) dispersion measure. This article is more akin to the second approach, but rather than finding the components one by one, we directly undertake the problem of finding, for a given p, a p-dimensional linear manifold minimizing a robust scale of the orthogonal distances of the data points to the manifold. The scale may be either a smooth M-scale or a “trimmed” scale. An iterative algorithm is developed that is shown to converge to a local minimum. A ...
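The paper's algorithm is not reproduced here, but the core idea — minimize a trimmed scale of orthogonal distances by alternating a hyperplane fit with reselection of the best-fitting points — can be sketched as follows for the orthogonal-regression case p = m − 1. The trimming fraction, iteration count, and random start are illustrative assumptions:

```python
# Trimmed-scale orthogonal regression: concentration-style iterations.
import numpy as np

def trimmed_orthogonal_fit(X, keep=0.75, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = int(keep * n)
    idx = rng.choice(n, size=h, replace=False)       # random starting subset
    for _ in range(iters):
        sub = X[idx]
        center = sub.mean(axis=0)
        # hyperplane normal = eigenvector of the smallest covariance eigenvalue
        _, V = np.linalg.eigh(np.cov((sub - center).T))
        normal = V[:, 0]
        d = np.abs((X - center) @ normal)            # distances for ALL points
        idx = np.argsort(d)[:h]                      # keep the closest fraction
    return center, normal, np.sort(d)[:h]

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
X[:, 2] = 0.5 * X[:, 0] - X[:, 1]    # points on an exact plane
X[:20] += 8.0                        # contaminate 10% of the points
center, normal, dists = trimmed_orthogonal_fit(X)
print(normal)                        # ~ proportional to (0.5, -1, -1), up to sign
```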

Journal ArticleDOI
TL;DR: The proposed penalized likelihood approach is applied to the reduction of piston slap, an unwanted engine noise due to piston secondary motion; the approach is particularly important in the context of a computationally intensive simulation model where the number of simulation runs must be kept small because collection of a large sample set is prohibitive.
Abstract: Kriging is a popular analysis approach for computer experiments for the purpose of creating a cheap-to-compute “meta-model” as a surrogate to a computationally expensive engineering simulation model. The maximum likelihood approach is used to estimate the parameters in the kriging model. However, the likelihood function near the optimum may be flat in some situations, which leads to maximum likelihood estimates for the parameters in the covariance matrix that have very large variance. To overcome this difficulty, a penalized likelihood approach is proposed for the kriging model. Both theoretical analysis and empirical experience using real world data suggest that the proposed method is particularly important in the context of a computationally intensive simulation model where the number of simulation runs must be kept small because collection of a large sample set is prohibitive. The proposed approach is applied to the reduction of piston slap, an unwanted engine noise due to piston secondary motion. Issu...
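The paper's specific penalty is not given in the abstract; the sketch below uses a generic quadratic penalty on log θ purely to illustrate how penalization stabilizes a flat kriging likelihood (1-D Gaussian correlation, constant mean, hypothetical data; all choices here are assumptions):

```python
# Penalized profile likelihood for a 1-D kriging correlation parameter.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 12))
y = np.sin(4 * x) + 0.01 * rng.standard_normal(12)

def neg_pen_loglik(log_theta, lam=1.0):
    theta = np.exp(log_theta)
    # Gaussian correlation with a tiny nugget for numerical stability
    R = np.exp(-theta * (x[:, None] - x[None, :]) ** 2) + 1e-8 * np.eye(12)
    one = np.ones(12)
    mu = (one @ np.linalg.solve(R, y)) / (one @ np.linalg.solve(R, one))
    r = y - mu
    sigma2 = r @ np.linalg.solve(R, r) / 12          # profiled-out variance
    _, logdet = np.linalg.slogdet(R)
    # profile -log likelihood plus an illustrative penalty shrinking log(theta)
    return 0.5 * (12 * np.log(sigma2) + logdet) + lam * log_theta ** 2

res = minimize_scalar(neg_pen_loglik, bounds=(-4, 6), method="bounded")
print("penalized MLE of theta:", np.exp(res.x))
```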

Journal ArticleDOI
TL;DR: This work demonstrates that the changepoint formulation is competitive with the best of the traditional formulations for detecting step changes in parameters, and it notes an immediate benefit of the formulation: the run behavior is controlled despite the lack of a large phase I sample.
Abstract: Statistical process control (SPC) involves ongoing checks to ensure that neither the mean nor the variability of the process readings has changed. Conventionally, this is done by pairs of charts—Shewhart X and S (or R) charts, cumulative sum charts for mean and for variance, or exponentially weighted moving average charts for mean and variance. The traditional methods of calculating the statistical properties of control charts are based on the assumption that the in-control true mean and variance are known exactly, and use these assumed true values to set center lines, control limits, and decision intervals. The reality, however, is that true parameter values are seldom if ever known exactly; rather, they are commonly estimated from a phase I sample. The random errors in the estimates lead to an uncertain run-length distribution for the resulting charts. An attractive alternative to the traditional charting methods is a single chart using the unknown-parameter likelihood ratio test for a change in mean and/o...
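A minimal version of the changepoint formulation for a mean shift can be sketched as follows (the paper also treats variance changes, and its control limits are calibrated for run-length behavior; the limit h is left unspecified here):

```python
# Scan all split points, compute a two-sample statistic, signal if the
# maximum exceeds a control limit h (calibration not shown).
import numpy as np

def changepoint_stat(x):
    """Max over split points of the standardized two-sample mean difference."""
    n = x.size
    best = 0.0
    for k in range(2, n - 1):
        a, b = x[:k], x[k:]
        sp2 = ((k - 1) * a.var(ddof=1) + (n - k - 1) * b.var(ddof=1)) / (n - 2)
        t = abs(a.mean() - b.mean()) / np.sqrt(sp2 * (1 / k + 1 / (n - k)))
        best = max(best, t)
    return best

rng = np.random.default_rng(7)
x = np.concatenate([rng.standard_normal(30), 1.5 + rng.standard_normal(20)])
print(changepoint_stat(x))   # compare against h chosen for the desired run length
```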

Journal ArticleDOI
TL;DR: The signal resistance is calculated as the largest standardized deviation from target not leading to an immediate out-of-control signal; the values support the recommendation that Shewhart limits be used with exponentially weighted moving average charts, especially when the smoothing parameter is small.
Abstract: Many types of control charts have an ability to detect process changes that can weaken over time depending on the past data observed. This is often referred to as the “inertia problem.” We propose a new measure of inertia, the signal resistance, to be the largest standardized deviation from target not leading to an immediate out-of-control signal. We calculate the signal resistance values for several types of univariate and multivariate charts. Our conclusions support the recommendation that Shewhart limits should be used with exponentially weighted moving average charts, especially when the smoothing parameter is small.
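For an EWMA chart on standardized observations, the definition translates into a one-line calculation. Assuming the chart signals when the EWMA statistic exceeds ±h (a simplification; the specific limits and worst-case states in the paper are not reproduced), the largest non-signaling observation given the chart's current value w is:

```python
# Signal resistance sketch for an EWMA chart: w_next = (1-lam)*w + lam*z <= h
# gives z <= (h - (1-lam)*w) / lam; a Shewhart limit caps this value.
def ewma_signal_resistance(w, lam, h, shewhart_limit=None):
    z = (h - (1.0 - lam) * w) / lam          # EWMA limit h binds
    if shewhart_limit is not None:
        z = min(z, shewhart_limit)           # Shewhart limit caps the resistance
    return z

# small smoothing parameter => huge resistance when w sits near the lower limit
print(ewma_signal_resistance(w=-2.5, lam=0.1, h=2.7))                      # 49.5
print(ewma_signal_resistance(w=-2.5, lam=0.1, h=2.7, shewhart_limit=3.5))  # 3.5
```

The second call shows the mechanism behind the authors' recommendation: the added Shewhart limit bounds the inertia regardless of the chart's current state.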

Journal ArticleDOI
TL;DR: This book is a reasonable place to start if you are faced with this type of data, and there are many references with authors that both chemists and statisticians will recognize.
Abstract: Overall, at this price, I wish I could have left out conclusions 4 and 5. In addition, some fully worked out (with software) real examples are needed. As I found out long ago, being able to prove theorems does not mean that one can apply them. The novice (e.g., someone like me) needs training in both. However, one good thing is that there are many references with authors that both chemists and statisticians will recognize. Hopefully, that helps. If you are faced with this type of data, perhaps this book is a reasonable place to start. I know of no others.

Journal ArticleDOI
TL;DR: This work presents a rather straightforward and rich extension of penalized signal regression using penalized B-spline tensor products, where appropriate difference penalties are placed on the rows and columns of the tensor product coefficients.
Abstract: We propose a general approach to regression on digitized multidimensional signals that can pose severe challenges to standard statistical methods. The main contribution of this work is to build a two-dimensional coefficient surface that allows for interaction across the indexing plane of the regressor array. We aim to use the estimated coefficient surface for reliable (scalar) prediction. We assume that the coefficients are smooth along both indices. We present a rather straightforward and rich extension of penalized signal regression using penalized B-spline tensor products, where appropriate difference penalties are placed on the rows and columns of the tensor product coefficients. Our methods are grounded in standard penalized regression, and thus cross-validation, effective dimension, and other diagnostics are accessible. Further, the model is easily transplanted into the generalized linear model framework. An illustrative example motivates our proposed methodology, and performance comparisons are mad...
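The penalty construction the abstract describes can be sketched concretely. The following builds a B-spline tensor-product basis and places second-order difference penalties on the rows and columns of the coefficient surface, P-spline style; basis sizes, the penalty weight, and the data are illustrative, and this is a generic penalized least squares fit rather than the paper's scalar-on-signal setup:

```python
# Tensor-product B-spline surface with row/column difference penalties.
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_basis, degree=3):
    """Evaluate an open-uniform B-spline basis on [0, 1] at points x."""
    inner = np.linspace(0, 1, n_basis - degree + 1)
    t = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    return BSpline.design_matrix(x, t, degree).toarray()

kx, ky = 8, 8
Bx = bspline_basis(np.linspace(0, 1, 40), kx)   # 40 x kx
By = bspline_basis(np.linspace(0, 1, 40), ky)   # 40 x ky
B = np.kron(Bx, By)                             # tensor-product basis, 1600 x 64

D = np.diff(np.eye(kx), n=2, axis=0)            # second-order difference matrix
# penalize differences along the rows and the columns of the coefficient surface
P = np.kron(D.T @ D, np.eye(ky)) + np.kron(np.eye(kx), D.T @ D)

rng = np.random.default_rng(8)
alpha_true = rng.standard_normal(kx * ky)
y = B @ alpha_true + 0.1 * rng.standard_normal(B.shape[0])

lam = 1.0
alpha_hat = np.linalg.solve(B.T @ B + lam * P, B.T @ y)  # penalized least squares
print(alpha_hat.shape)   # vectorized kx-by-ky coefficient surface
```

Because the fit stays within penalized regression, cross-validation and effective dimension diagnostics apply exactly as the abstract notes.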