Research Commentary---Too Big to Fail: Large Samples and the p-Value Problem

doi:10.1287/ISRE.2013.0480

Journal Article

Mingfeng Lin, +2 more

- 01 Dec 2013 -

Information Systems Research

- Vol. 24, Iss: 4, pp 906-917

Abstract:

The Internet has provided IS researchers with the opportunity to conduct studies with extremely large samples, frequently well over 10,000 observations. There are many advantages to large samples, but researchers using statistical inference must be aware of the p-value problem associated with them. In very large samples, p-values go quickly to zero, and solely relying on p-values can lead the researcher to claim support for results of no practical significance. In a survey of large sample IS research, we found that a significant number of papers rely on a low p-value and the sign of a regression coefficient alone to support their hypotheses. This research commentary recommends a series of actions the researcher can take to mitigate the p-value problem in large samples and illustrates them with an example of over 300,000 camera sales on eBay. We believe that addressing the p-value problem will increase the credibility of large sample IS research as well as provide more insights for readers.
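
The claim that p-values go quickly to zero in very large samples can be illustrated with a short calculation. The sketch below is not the commentary's eBay analysis; it assumes a one-sample z-test with a fixed, practically negligible standardized effect (the value 0.01 and the function names are illustrative assumptions) and shows how the two-sided p-value shrinks as the sample size grows while the effect itself stays the same.

```python
import math


def two_sided_p_value(effect_size: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test.

    Assumes the observed standardized mean difference equals
    `effect_size` and the sample holds `n` observations.
    """
    z = abs(effect_size) * math.sqrt(n)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def ci_half_width_95(n: int) -> float:
    """Half-width of a 95% confidence interval, in standard-deviation units."""
    return 1.96 / math.sqrt(n)


# Hypothetical effect size: tiny and held constant across sample sizes.
EFFECT = 0.01

for n in (100, 10_000, 300_000, 1_000_000):
    p = two_sided_p_value(EFFECT, n)
    hw = ci_half_width_95(n)
    print(f"n={n:>9,}  effect={EFFECT}  95% CI=+/-{hw:.4f}  p={p:.3g}")
```

With these assumptions the p-value is about 0.92 at 100 observations and falls below 1e-6 at 300,000, even though the effect size never changes. Printing the effect and its confidence interval next to the p-value keeps the practical magnitude in view, in line with the commentary's warning against relying on the p-value and coefficient sign alone.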

Citations

Journal Article

Does the weather influence sentencing? Empirical evidence from Czech data

Jakub Drápal, +1 more

- 01 Mar 2019 -

International Journal of Law Crime and J...

TL;DR: In this paper, the authors examined the extent to which the weather may influence sentencing and concluded that no meaningful unwarranted disparities in sentencing are caused by the weather in Prague, Czech Republic.

Journal Article

Investigation of the Effects of Exposure to Chemical Substances on Child Health

Miyuki Iwai-Shimada, +5 more

- 01 Jan 2019 -

Japanese journal of hygiene

TL;DR: The past and ongoing birth cohort studies carried out worldwide on the association between environmental exposure and children's health and development demonstrate that intervention to reduce exposure to certain chemicals whose exposure routes were well documented was also successful.

Proceedings Article

Evolving Probabilistically Significant Epistatic Classification Rules for Heterogeneous Big Datasets

John P. Hanley, +3 more

TL;DR: An age-layered evolutionary algorithm generates conjunctive clauses to model multivariate interactions in datasets that are too large to be analyzed using traditional methods such as logistic regression, thereby dramatically reducing the size of the search space for future analysis.

Journal Article

Comparison of data analysis procedures for real-time nanoparticle sampling data using classical regression and ARIMA models

Seunghon Ham, +8 more

- 12 Mar 2017 -

Journal of Applied Statistics

TL;DR: It is suggested that the ARIMA model could be used to process real-time monitoring data especially for non-stationary data, and averaging time setting is flexible depending on the data interval required to capture the effects of processes for occupational and environmental nano measurements.

Continuous data imputation applied to massive instances

Jorge Zapatero Sánchez

TL;DR: In this paper, both imputation and regression techniques are used to impute the missing values in an incomplete dataset containing 556,950 instances with 31 continuous variables, and a validation framework is designed to evaluate imputation quality under different amounts of incomplete instances and different missing data mechanisms.

Eric Vittinghoff, +3 more

TL;DR: This work provides a unified, in-depth, readable introduction to the multipredictor regression methods most widely used in biostatistics: linear models for continuous outcomes, logistic models for binary outcomes, the Cox model for right-censored survival times, repeated-measures models for longitudinal and hierarchical outcomes, and generalized linear models for counts and other outcomes.

References

Things I Have Learned (So Far).

To Explain or to Predict

Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets

Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models
