scispace - formally typeset
Search or ask a question
JournalISSN: 0003-1305

The American Statistician 

Taylor & Francis
About: The American Statistician is an academic journal published by Taylor & Francis. The journal publishes majorly in the area(s): Statistician & Population. It has an ISSN identifier of 0003-1305. Over the lifetime, 3939 publications have been published receiving 196129 citations. The journal is also known as: American Statistician.


Papers
More filters
Journal ArticleDOI
Abstract: (2003). Applied Multivariate Statistics for the Social Sciences. The American Statistician: Vol. 57, No. 1, pp. 68-69.

7,141 citations

Journal ArticleDOI
TL;DR: Data Mining Methods and Models is the second volume of a three-book series on data mining authored by Larose and is a fairly readable book for adoption in a graduate-level introductory course on datamining.
Abstract: Data Mining Methods and Models is the second volume of a three-book series on data mining authored by Larose. The following review was performed independently of LaRose’s other two books. Paraphrasing from the Preface, the goal of this book is to “explore the process of data mining from the point of view of model building.” Nevertheless, the reader will soon be aware that this book is not intended to provide a systematic or comprehensive coverage of various data mining algorithms. Instead, it considers supervised learning or predictive modeling only, and it walks the reader through the data mining process merely with a few selected modeling methods such as (generalized) linear modeling and the Bayesian approach. The book has seven chapters. Chapter 1 introduces dimension reduction, with a focus on principal components analysis (PCA) types of techniques. Chapters 2, 3, and 4 provide a detailed coverage of simple linear regression, multiple linear regression, and logistic regression, respectively. Chapter 5 introduces naive Bayes estimation and Bayesian networks. In Chapter 6, the basic idea of genetic algorithms is discussed. Finally, Chapter 7 presents a case study example of modeling response to direct mail marketing within the CRISP (crossindustry standard process) framework. This book is very easy to read, and this is absolutely the strength which many readers, especially those nonstatistically oriented ones, will greatly appreciate. Predictive modeling is perhaps the most technical part in a data mining process. The author has done an excellent job in making this difficult topic accessible to a broad audience. For example, I like the way in which Bayesian networks are introduced in Chapter 5. After the reader goes through a churn example on naive Bayes estimation in a step-by-step manner, Bayesian belief networks become easily understood as natural extensions. The overall style of the book is clear and patient. The main limitation of the book is its limited coverage. An inspired reader would expect to see a much more extended list of topics. Hastie, Tibishirani, and Friedman (2001) gave a full and more technical account of various data mining algorithms. The inclusion of genetic algorithms in Chapter 6 seems novel when compared to Hastie, Tibishirani, and Friedman (2001), but at the same time, a little unexpected as a separate chapter, since a genetic algorithm involves a stochastics search scheme, which is somewhat involved given the elementary nature of this text. Another noteworthy issue is that the author does not make an attempt to distinguish between conventional statistical analysis and data mining. I found a few errors. On Page 25, for example, it should be ai = 1, instead of ai = 1/4. Also, in the frame on the top of Page 211, it might have been “Posterior Odds,” instead of “Posterior Odds Ratio.” The book uses three different software packages to implement the ideas including SPSS with Clementine, Minitab, and WEKA, which might not be appealing. On the other hand, it is justifiable as it allows one to perform data mining with affordable costs. In summary, I recommend this fairly readable book for adoption in a graduate-level introductory course on data mining, especially when the students come from varied backgrounds.

6,409 citations

Journal ArticleDOI
TL;DR: This article used multivariate matching methods in an observational study of the effects of prenatal exposure to barbiturates on subsequent psychological development, using the propensity score as a distinct matching variable.
Abstract: Matched sampling is a method for selecting units from a large reservoir of potential controls to produce a control group of modest size that is similar to a treated group with respect to the distribution of observed covariates. We illustrate the use of multivariate matching methods in an observational study of the effects of prenatal exposure to barbiturates on subsequent psychological development. A key idea is the use of the propensity score as a distinct matching variable.

5,633 citations

Journal ArticleDOI
TL;DR: The American Statistical Association (ASA) released a policy statement on p-values and statistical significance in 2015 as discussed by the authors, which was based on a discussion with the ASA Board of Trustees and concerned with reproducibility and replicability of scientific conclusions.
Abstract: Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as p< 0.05: “We teach it because it’s what we do; we do it because it’s what we teach.” This concern was brought to the attention of the ASA Board. The ASA Board was also stimulated by highly visible discussions over the last few years. For example, ScienceNews (Siegfried 2010) wrote: “It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation.” A November 2013, article in Phys.org Science News Wire (2013) cited “numerous deep flaws” in null hypothesis significance testing. A ScienceNews article (Siegfried 2014) on February 7, 2014, said “statistical techniques for testing hypotheses...havemore flaws than Facebook’s privacy policies.” Aweek later, statistician and “Simply Statistics” blogger Jeff Leek responded. “The problem is not that people use P-values poorly,” Leek wrote, “it is that the vast majority of data analysis is not performed by people properly trained to perform data analysis” (Leek 2014). That same week, statistician and science writer Regina Nuzzo published an article in Nature entitled “Scientific Method: Statistical Errors” (Nuzzo 2014). That article is nowone of the most highly viewedNature articles, as reported by altmetric.com (http://www.altmetric.com/details/2115792#score). Of course, it was not simply a matter of responding to some articles in print. The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions. Without getting into definitions and distinctions of these terms, we observe that much confusion and even doubt about the validity of science is arising. Such doubt can lead to radical choices, such as the one taken by the editors of Basic andApplied Social Psychology, who decided to ban p-values (null hypothesis significance testing) (Trafimow and Marks 2015). Misunderstanding or misuse of statistical inference is only one cause of the “reproducibility crisis” (Peng 2015), but to our community, it is an important one. When the ASA Board decided to take up the challenge of developing a policy statement on p-values and statistical significance, it did so recognizing this was not a lightly taken step. The ASA has not previously taken positions on specific matters of statistical practice. The closest the association has come to this is a statement on the use of value-added models (VAM) for educational assessment (Morganstein and Wasserstein 2014) and a statement on risk-limiting post-election audits (American Statistical Association 2010). However, these were truly policy-related statements. The VAM statement addressed a key educational policy issue, acknowledging the complexity of the issues involved, citing limitations of VAMs as effective performance models, and urging that they be developed and interpreted with the involvement of statisticians. The statement on election auditing was also in response to a major but specific policy issue (close elections in 2008), and said that statistically based election audits should become a routine part of election processes. By contrast, the Board envisioned that the ASA statement on p-values and statistical significance would shed light on an aspect of our field that is too often misunderstood and misused in the broader research community, and, in the process, provides the community a service. The intended audience would be researchers, practitioners, and science writers who are not primarily statisticians. Thus, this statementwould be quite different from anything previously attempted. The Board tasked Wasserstein with assembling a group of experts representing a wide variety of points of view. On behalf of the Board, he reached out to more than two dozen such people, all of whom said theywould be happy to be involved. Several expressed doubt about whether agreement could be reached, but those who did said, in effect, that if there was going to be a discussion, they wanted to be involved. Over the course of many months, group members discussed what format the statement should take, tried to more concretely visualize the audience for the statement, and began to find points of agreement. That turned out to be relatively easy to do, but it was just as easy to find points of intense disagreement. The time came for the group to sit down together to hash out these points, and so in October 2015, 20 members of the group met at the ASA Office in Alexandria, Virginia. The 2-day meeting was facilitated by Regina Nuzzo, and by the end of the meeting, a good set of points around which the statement could be built was developed. The next 3 months saw multiple drafts of the statement, reviewed by group members, by Board members (in a lengthy discussion at the November 2015 ASA Board meeting), and by members of the target audience. Finally, on January 29, 2016, the Executive Committee of the ASA approved the statement. The statement development process was lengthier and more controversial than anticipated. For example, there was considerable discussion about how best to address the issue of multiple potential comparisons (Gelman and Loken 2014). We debated at some length the issues behind the words “a p-value near 0.05 taken by itself offers only weak evidence against the null

4,361 citations

Journal ArticleDOI
Naomi Altman1
TL;DR: Kernel and nearest-neighbor regression estimators are local versions of univariate location estimators, and so they can readily be introduced to beginning students and consulting clients who are familiar with such summaries as the sample mean and median.
Abstract: Nonparametric regression is a set of techniques for estimating a regression curve without making strong assumptions about the shape of the true regression function. These techniques are therefore useful for building and checking parametric models, as well as for data description. Kernel and nearest-neighbor regression estimators are local versions of univariate location estimators, and so they can readily be introduced to beginning students and consulting clients who are familiar with such summaries as the sample mean and median.

4,298 citations

Performance
Metrics
No. of papers from the Journal in previous years
YearPapers
202323
202273
202185
202064
2019101
201852