Journal•ISSN: 0003-1305

The American Statistician

Taylor & Francis

About: The American Statistician is an academic journal published by Taylor & Francis. It publishes mainly in the areas of Statistician and Population. Its ISSN is 0003-1305. Over its lifetime, the journal has published 3939 publications, which have received 196129 citations. The journal is also known as American Statistician.

Topics: Statistician, Population, Estimator, Confidence interval, Regression analysis

Papers

Journal Article•DOI•

Applied Multivariate Statistics for the Social Sciences

Richard Gonzalez¹•Institutions (1)

University of Michigan¹

01 Feb 2003-The American Statistician

Abstract: (2003). Applied Multivariate Statistics for the Social Sciences. The American Statistician: Vol. 57, No. 1, pp. 68-69.

7,141 citations

Journal Article•DOI•

Confirmatory Factor Analysis for Applied Research

Phil Wood

01 Feb 2008-The American Statistician

TL;DR: Data Mining Methods and Models is the second volume of a three-book series on data mining authored by Larose and is a fairly readable book for adoption in a graduate-level introductory course on data mining.

Abstract: Data Mining Methods and Models is the second volume of a three-book series on data mining authored by Larose. The following review was performed independently of Larose’s other two books. Paraphrasing from the Preface, the goal of this book is to “explore the process of data mining from the point of view of model building.” Nevertheless, the reader will soon be aware that this book is not intended to provide a systematic or comprehensive coverage of various data mining algorithms. Instead, it considers supervised learning or predictive modeling only, and it walks the reader through the data mining process merely with a few selected modeling methods such as (generalized) linear modeling and the Bayesian approach. The book has seven chapters. Chapter 1 introduces dimension reduction, with a focus on principal components analysis (PCA) types of techniques. Chapters 2, 3, and 4 provide a detailed coverage of simple linear regression, multiple linear regression, and logistic regression, respectively. Chapter 5 introduces naive Bayes estimation and Bayesian networks. In Chapter 6, the basic idea of genetic algorithms is discussed. Finally, Chapter 7 presents a case study example of modeling response to direct mail marketing within the CRISP (cross-industry standard process) framework. This book is very easy to read, and this is absolutely the strength which many readers, especially those nonstatistically oriented ones, will greatly appreciate. Predictive modeling is perhaps the most technical part in a data mining process. The author has done an excellent job in making this difficult topic accessible to a broad audience. For example, I like the way in which Bayesian networks are introduced in Chapter 5. After the reader goes through a churn example on naive Bayes estimation in a step-by-step manner, Bayesian belief networks become easily understood as natural extensions. The overall style of the book is clear and patient.
The main limitation of the book is its limited coverage. An inspired reader would expect to see a much more extended list of topics. Hastie, Tibshirani, and Friedman (2001) gave a full and more technical account of various data mining algorithms. The inclusion of genetic algorithms in Chapter 6 seems novel when compared to Hastie, Tibshirani, and Friedman (2001), but at the same time, a little unexpected as a separate chapter, since a genetic algorithm involves a stochastic search scheme, which is somewhat involved given the elementary nature of this text. Another noteworthy issue is that the author does not make an attempt to distinguish between conventional statistical analysis and data mining. I found a few errors. On Page 25, for example, it should be ai = 1, instead of ai = 1/4. Also, in the frame on the top of Page 211, it might have been “Posterior Odds,” instead of “Posterior Odds Ratio.” The book uses three different software packages to implement the ideas including SPSS with Clementine, Minitab, and WEKA, which might not be appealing. On the other hand, it is justifiable as it allows one to perform data mining with affordable costs. In summary, I recommend this fairly readable book for adoption in a graduate-level introductory course on data mining, especially when the students come from varied backgrounds.

6,409 citations

Journal Article•DOI•

Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score

Paul R. Rosenbaum¹, Donald B. Rubin²•Institutions (2)

Princeton University¹, Harvard University²

01 Feb 1985-The American Statistician

TL;DR: This article used multivariate matching methods in an observational study of the effects of prenatal exposure to barbiturates on subsequent psychological development, using the propensity score as a distinct matching variable.

Abstract: Matched sampling is a method for selecting units from a large reservoir of potential controls to produce a control group of modest size that is similar to a treated group with respect to the distribution of observed covariates. We illustrate the use of multivariate matching methods in an observational study of the effects of prenatal exposure to barbiturates on subsequent psychological development. A key idea is the use of the propensity score as a distinct matching variable.

5,633 citations
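
As a rough illustration of the matched sampling idea in the Rosenbaum and Rubin abstract above, the sketch below picks one control per treated unit by nearest available propensity score. It is a simplified stand-in, not the authors' procedure: the function name `greedy_propensity_match`, the `caliper` parameter, the descending processing order, and the toy scores are all assumptions made for illustration, and the propensity scores are assumed to have been estimated beforehand.

```python
def greedy_propensity_match(treated, controls, caliper=None):
    """Pair each treated unit with the closest unused control on the propensity score.

    treated, controls: dicts mapping a unit id to an already-estimated propensity score.
    caliper: optional maximum allowed score difference; treated units with no
             control inside the caliper are left unmatched.
    """
    available = dict(controls)  # copy, so the caller's reservoir is not modified
    pairs = []
    # Process treated units in descending score order (an illustrative choice).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Nearest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is None or abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


# Hypothetical scores: three treated units and a reservoir of five controls.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.10, "c2": 0.28, "c3": 0.52, "c4": 0.60, "c5": 0.75}
pairs = greedy_propensity_match(treated, controls, caliper=0.1)
print(pairs)
```

Matching without replacement keeps the control group the same size as the treated group; the caliper trades completeness of matching for closeness of each pair.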

Journal Article•DOI•

The ASA's Statement on p-Values: Context, Process, and Purpose

Ronald L. Wasserstein¹, Nicole A. Lazar¹•Institutions (1)

American Statistical Association¹

09 Jun 2016-The American Statistician

TL;DR: The authors describe the context and process behind the American Statistical Association (ASA) policy statement on p-values and statistical significance, which the ASA Executive Committee approved on January 29, 2016, and which was motivated in part by concern about the reproducibility and replicability of scientific conclusions.

Abstract: Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as p < 0.05: “We teach it because it’s what we do; we do it because it’s what we teach.” This concern was brought to the attention of the ASA Board. The ASA Board was also stimulated by highly visible discussions over the last few years. For example, ScienceNews (Siegfried 2010) wrote: “It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation.” A November 2013 article in Phys.org Science News Wire (2013) cited “numerous deep flaws” in null hypothesis significance testing. A ScienceNews article (Siegfried 2014) on February 7, 2014, said “statistical techniques for testing hypotheses...have more flaws than Facebook’s privacy policies.” A week later, statistician and “Simply Statistics” blogger Jeff Leek responded. “The problem is not that people use P-values poorly,” Leek wrote, “it is that the vast majority of data analysis is not performed by people properly trained to perform data analysis” (Leek 2014). That same week, statistician and science writer Regina Nuzzo published an article in Nature entitled “Scientific Method: Statistical Errors” (Nuzzo 2014). That article is now one of the most highly viewed Nature articles, as reported by altmetric.com (http://www.altmetric.com/details/2115792#score). Of course, it was not simply a matter of responding to some articles in print. The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions. Without getting into definitions and distinctions of these terms, we observe that much confusion and even doubt about the validity of science is arising. Such doubt can lead to radical choices, such as the one taken by the editors of Basic and Applied Social Psychology, who decided to ban p-values (null hypothesis significance testing) (Trafimow and Marks 2015).
Misunderstanding or misuse of statistical inference is only one cause of the “reproducibility crisis” (Peng 2015), but to our community, it is an important one. When the ASA Board decided to take up the challenge of developing a policy statement on p-values and statistical significance, it did so recognizing this was not a lightly taken step. The ASA has not previously taken positions on specific matters of statistical practice. The closest the association has come to this is a statement on the use of value-added models (VAM) for educational assessment (Morganstein and Wasserstein 2014) and a statement on risk-limiting post-election audits (American Statistical Association 2010). However, these were truly policy-related statements. The VAM statement addressed a key educational policy issue, acknowledging the complexity of the issues involved, citing limitations of VAMs as effective performance models, and urging that they be developed and interpreted with the involvement of statisticians. The statement on election auditing was also in response to a major but specific policy issue (close elections in 2008), and said that statistically based election audits should become a routine part of election processes. By contrast, the Board envisioned that the ASA statement on p-values and statistical significance would shed light on an aspect of our field that is too often misunderstood and misused in the broader research community, and, in the process, provides the community a service. The intended audience would be researchers, practitioners, and science writers who are not primarily statisticians. Thus, this statement would be quite different from anything previously attempted. The Board tasked Wasserstein with assembling a group of experts representing a wide variety of points of view. On behalf of the Board, he reached out to more than two dozen such people, all of whom said they would be happy to be involved.
Several expressed doubt about whether agreement could be reached, but those who did said, in effect, that if there was going to be a discussion, they wanted to be involved. Over the course of many months, group members discussed what format the statement should take, tried to more concretely visualize the audience for the statement, and began to find points of agreement. That turned out to be relatively easy to do, but it was just as easy to find points of intense disagreement. The time came for the group to sit down together to hash out these points, and so in October 2015, 20 members of the group met at the ASA Office in Alexandria, Virginia. The 2-day meeting was facilitated by Regina Nuzzo, and by the end of the meeting, a good set of points around which the statement could be built was developed. The next 3 months saw multiple drafts of the statement, reviewed by group members, by Board members (in a lengthy discussion at the November 2015 ASA Board meeting), and by members of the target audience. Finally, on January 29, 2016, the Executive Committee of the ASA approved the statement. The statement development process was lengthier and more controversial than anticipated. For example, there was considerable discussion about how best to address the issue of multiple potential comparisons (Gelman and Loken 2014). We debated at some length the issues behind the words “a p-value near 0.05 taken by itself offers only weak evidence against the null

4,361 citations

Journal Article•DOI•

An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression

Naomi Altman¹•Institutions (1)

Cornell University¹

01 Aug 1992-The American Statistician

TL;DR: Kernel and nearest-neighbor regression estimators are local versions of univariate location estimators, and so they can readily be introduced to beginning students and consulting clients who are familiar with such summaries as the sample mean and median.

Abstract: Nonparametric regression is a set of techniques for estimating a regression curve without making strong assumptions about the shape of the true regression function. These techniques are therefore useful for building and checking parametric models, as well as for data description. Kernel and nearest-neighbor regression estimators are local versions of univariate location estimators, and so they can readily be introduced to beginning students and consulting clients who are familiar with such summaries as the sample mean and median.

4,298 citations
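
The Altman abstract above describes kernel and nearest-neighbor regression estimators as local versions of the sample mean and median. A minimal sketch of that idea follows; the function names, the Gaussian kernel, the bandwidth value, and the toy data are assumptions chosen for illustration and are not taken from the article.

```python
import math
from statistics import mean, median


def kernel_regression(xs, ys, x0, bandwidth):
    """Kernel estimate at x0: a weighted mean of ys, with Gaussian weights
    that shrink as the corresponding x moves away from x0."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def knn_regression(xs, ys, x0, k, summary=mean):
    """Nearest-neighbor estimate at x0: apply a location summary (mean or
    median) to the responses of the k observations closest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x0))[:k]
    return summary([y for _, y in nearest])


# Hypothetical data, roughly following y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.2, 4.8]

knn_mean = knn_regression(xs, ys, 2.4, k=2)                    # local mean
knn_median = knn_regression(xs, ys, 2.4, k=3, summary=median)  # local median
kernel_fit = kernel_regression(xs, ys, 2.4, bandwidth=1.0)     # weighted local mean
print(knn_mean, knn_median, kernel_fit)
```

Both estimators reduce to the ordinary sample summary when the neighborhood covers all the data (k equal to the sample size, or a very large bandwidth), which is the sense in which they are local versions of those summaries.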

Performance

Metrics

Papers: 3,952

Citations: 196,149

No. of papers from the Journal in previous years
Year	Papers
2023	23
2022	73
2021	85
2020	64
2019	101
2018	52