Journal ArticleDOI

Research Commentary---Too Big to Fail: Large Samples and the p-Value Problem

TL;DR: This research commentary recommends a series of actions the researcher can take to mitigate the p-value problem in large samples and illustrates them with an example of over 300,000 camera sales on eBay.
Abstract: The Internet has provided IS researchers with the opportunity to conduct studies with extremely large samples, frequently well over 10,000 observations. There are many advantages to large samples, but researchers using statistical inference must be aware of the p-value problem associated with them. In very large samples, p-values go quickly to zero, and solely relying on p-values can lead the researcher to claim support for results of no practical significance. In a survey of large sample IS research, we found that a significant number of papers rely on a low p-value and the sign of a regression coefficient alone to support their hypotheses. This research commentary recommends a series of actions the researcher can take to mitigate the p-value problem in large samples and illustrates them with an example of over 300,000 camera sales on eBay. We believe that addressing the p-value problem will increase the credibility of large sample IS research as well as provide more insights for readers.
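
The abstract's central point, that p-values collapse toward zero as samples grow even when the underlying effect is too small to matter, is easy to reproduce in a short simulation. The sketch below is not from the paper; it is a minimal Python illustration that assumes an arbitrary true difference of 0.02 standard deviations between two groups and uses only NumPy and SciPy.

    # Illustration of the p-value problem: a practically negligible difference in
    # means becomes "highly significant" once the sample is large enough.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    true_effect = 0.02  # assumed difference of 0.02 standard deviations (arbitrary)

    for n in [100, 1_000, 10_000, 100_000, 300_000]:
        a = rng.normal(loc=0.0, scale=1.0, size=n)
        b = rng.normal(loc=true_effect, scale=1.0, size=n)
        t_stat, p_value = stats.ttest_ind(a, b)
        # Cohen's d: standardized mean difference; on average it does not change with n
        d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        print(f"n={n:>7}  p-value={p_value:.3g}  Cohen's d={d:.3f}")

At the largest sample sizes the p-value is essentially zero while the standardized effect stays near 0.02, which is exactly the combination of statistical significance and practical insignificance the commentary warns against.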
Citations
Journal ArticleDOI
TL;DR: A first step toward an inclusive big data research agenda for IS is offered by focusing on the interplay between big data’s characteristics, the information value chain encompassing people-process-technology, and the three dominant IS research traditions (behavioral, design, and economics of IS).
Abstract: Big data has received considerable attention from the information systems (IS) discipline over the past few years, with several recent commentaries, editorials, and special issue introductions on the topic appearing in leading IS outlets. These papers present varying perspectives on promising big data research topics and highlight some of the challenges that big data poses. In this editorial, we synthesize and contribute further to this discourse. We offer a first step toward an inclusive big data research agenda for IS by focusing on the interplay between big data’s characteristics, the information value chain encompassing people-process-technology, and the three dominant IS research traditions (behavioral, design, and economics of IS). We view big data as a disruption to the value chain that has widespread impacts, which include but are not limited to changing the way academics conduct scholarly work. Importantly, we critically discuss the opportunities and challenges for behavioral, design science, and economics of IS research and the emerging implications for theory and methodology arising due to big data’s disruptive effects.

543 citations


Cites background from "Research Commentary---Too Big to Fa..."

  • ...In addition to statistical significance and co-efficient signs, one may also need to consider effect sizes and variance when testing hypotheses on big data sets (Lin et al., 2013; George et al., 2014)....

Journal ArticleDOI
TL;DR: In this paper, the authors draw on information economics to examine when signals and endorsements obtained from multiple information sources enhance or diminish one another's effects, and propose that signals from different information sources can have different effects.
Abstract: This article draws on information economics to examine when signals and endorsements obtained from multiple information sources enhance or diminish one another's effects. We propose that signals th...

434 citations

Journal ArticleDOI
TL;DR: In this article, a two-step structural equation modeling approach was applied to test both the measurement and the structural model, in order to identify major antecedents of everyday green purchasing behavior and to determine their relative importance.
Abstract: Purpose – The theory of planned behavior (TPB) served as a framework for identifying major antecedents of everyday green purchasing behavior and for determining their relative importance. Design/methodology/approach – The German market research institute GfK provided data (n = 12,113) from their 2012 household panel survey. A two-step structural equation modeling approach was applied to test both the measurement and the structural model. Findings – Willingness to pay (WTP) was the strongest predictor of green purchasing behavior, followed by personal norms. The impact of attitude is insignificant. This implies an attitude – behavior gap. Research limitations/implications – Individuals overestimate their self-reported WTP and behavior, which suggests that the share of explained variance is in reality lower. It has to be doubted whether consumers are objectively able to judge products by their environmental impact. Even if consumers are willing to buy a “greener” product, their subjective evaluation might b...

401 citations

01 Jan 2008

274 citations

Journal ArticleDOI
TL;DR: In this article, the importance of sample size and its relationship to effect size (ES) and statistical significance is discussed. However, there is no straightforward way of calculating the effective sample size for reaching an accurate conclusion, and use of a statistically incorrect sample size may lead to inadequate results in both clinical and laboratory studies.
Abstract: Calculating the sample size in scientific studies is one of the critical issues as regards the scientific contribution of the study. The sample size critically affects the hypothesis and the study design, and there is no straightforward way of calculating the effective sample size for reaching an accurate conclusion. Use of a statistically incorrect sample size may lead to inadequate results in both clinical and laboratory studies as well as resulting in time loss, cost, and ethical problems. This review holds two main aims. The first aim is to explain the importance of sample size and its relationship to effect size (ES) and statistical significance. The second aim is to assist researchers planning to perform sample size estimations by suggesting and elucidating available alternative software, guidelines and references that will serve different scientific purposes.
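
Because the abstract points readers to software for sample size estimation, a minimal sketch of an a priori calculation for a two-sample t-test is shown below. It uses statsmodels' TTestIndPower; the chosen effect size (Cohen's d = 0.2), significance level (0.05), and power (0.8) are conventional illustrative values, not numbers from the article.

    # A priori sample-size and minimum-detectable-effect calculations for a
    # two-sample t-test, using statsmodels' power module.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Observations per group needed to detect a "small" effect (d = 0.2)
    # with 80% power at alpha = 0.05 (on the order of 400 per group).
    n_required = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
    print(f"required n per group: {n_required:.0f}")

    # Conversely, with 150,000 observations per group the smallest effect
    # detectable at 80% power is tiny (on the order of d = 0.01), which is why
    # very large samples make almost any nonzero effect statistically significant.
    min_effect = analysis.solve_power(nobs1=150_000, alpha=0.05, power=0.8)
    print(f"minimum detectable effect size: {min_effect:.4f}")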

250 citations

References
Journal ArticleDOI
Jacob Cohen
TL;DR: The author reflects on what he has learned about the application of statistics to psychology and the other sociobiomedical sciences, including the principles "less is more" (fewer variables, more highly targeted issues, sharp rounding off), "simple is better" (graphic representation, unit weighting for linear composites), and "some things you learn aren't so."
Abstract: This is an account of what I have learned (so far) about the application of statistics to psychology and the other sociobiomedical sciences. It includes the principles "less is more" (fewer variables, more highly targeted issues, sharp rounding off), "simple is better" (graphic representation, unit weighting for linear composites), and "some things you learn aren't so." I have learned to avoid the many misconceptions that surround Fisherian null hypothesis testing. I have also learned the importance of power analysis and the determination of just how big (rather than how statistically significant) are the effects that we study. Finally, I have learned that there is no royal road to statistical induction, that the informed judgment of the investigator is the crucial element in the interpretation of data, and that things take time.
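
Cohen's emphasis on asking how big an effect is, rather than only how statistically significant it is, can be made concrete by reporting the effect estimate with a confidence interval alongside a standardized effect size. The sketch below is a generic illustration on simulated data, not an analysis from the paper; the assumed true mean difference of 0.15 standard deviations is arbitrary.

    # Report magnitude, not just significance: mean difference, 95% CI, and Cohen's d.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    a = rng.normal(0.0, 1.0, 2_000)
    b = rng.normal(0.15, 1.0, 2_000)   # assumed true difference of 0.15 (arbitrary)

    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se   # normal-approximation 95% CI
    d = diff / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    t_stat, p_value = stats.ttest_ind(a, b)
    print(f"difference={diff:.3f}  95% CI=({ci_low:.3f}, {ci_high:.3f})  d={d:.3f}  p={p_value:.3g}")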

1,764 citations

Journal ArticleDOI
TL;DR: This paper clarifies the distinction between explanatory and predictive modeling, discusses the differences that arise when modeling for an explanatory versus a predictive goal, and traces the practical implications of the distinction for each step in the modeling process.
Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
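
The explanatory-versus-predictive distinction can be made operational in code: an explanatory analysis judges a fitted model by coefficient estimates, p-values, and in-sample fit, whereas a predictive analysis judges it by error on data held out from estimation. The sketch below is a generic illustration on synthetic data with statsmodels, not an example from the article.

    # Same linear model, two evaluation mindsets: explanatory (inference on
    # coefficients, in-sample R^2) versus predictive (error on held-out data).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5_000
    X = rng.normal(size=(n, 3))
    y = 1.0 + 0.5 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=2.0, size=n)

    # Explanatory use: fit on all data, inspect coefficients, p-values, R^2.
    X_c = sm.add_constant(X)
    fit = sm.OLS(y, X_c).fit()
    print(fit.params)      # coefficient estimates
    print(fit.pvalues)     # statistical significance
    print(fit.rsquared)    # in-sample explanatory power

    # Predictive use: hold out 20% of the data and judge by out-of-sample error.
    split = int(0.8 * n)
    train_fit = sm.OLS(y[:split], X_c[:split]).fit()
    pred = train_fit.predict(X_c[split:])
    rmse = np.sqrt(np.mean((y[split:] - pred) ** 2))
    print(f"holdout RMSE: {rmse:.3f}")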

1,747 citations

Journal ArticleDOI
TL;DR: It is suggested that identity-relevant information about reviewers shapes community members' judgment of products and reviews, and it is shown that shared geographical location increases the relationship between disclosure and product sales, thus highlighting the important role of geography in electronic commerce.
Abstract: Consumer-generated product reviews have proliferated online, driven by the notion that consumers' decision to purchase or not purchase a product is based on the positive or negative information about that product they obtain from fellow consumers. Using research on information processing (Chaiken 1980) as a foundation, we suggest that in the context of an online community, reviewer disclosure of identity-descriptive information is used by consumers to supplement or replace product information when making purchase decisions and evaluating the helpfulness of online reviews. Using a unique dataset based on both chronologically compiled ratings as well as reviewer characteristics for a given set of products and geographical location-based purchasing behavior from Amazon, we provide evidence that community norms are an antecedent to reviewer disclosure of identity-descriptive information. Amazon members rate reviews containing identity-descriptive information more positively, and the prevalence of reviewer disclosure of identity information is associated with increases in subsequent online product sales. In addition, we show that when reviewers are from a particular geographic location, subsequent product sales are higher in that region, thus highlighting the important role of geography in electronic commerce. Taken together, our results suggest that identity-relevant information about reviewers shapes community members' judgment of products and reviews. Implications for research on the relationship between online reviews and sales, peer recognition systems, and conformity to online community norms are discussed.

1,377 citations


"Research Commentary---Too Big to Fa..." refers background or methods in this paper

  • ...However, to our knowledge, this approach has not been used and there have been no proposed rules of thumb in terms of how such adjustments should be made....

  • ...…involve a categorical variable can be studied by splitting the data into the separate categories and fitting separate models (Asvanund et al. 2004, Forman et al. 2008, Gefen and Carmel 2008, Ghose 2009, Gordon et al. 2010, Li and Hitt 2008, Mithas and Lucas 2010, Overby and Jap 2009, Yao et al.…...

  • ...A large sample also enables the researcher to incorporate many control variables into the model without worrying about power loss (Forman et al. 2008, Ghose 2009, Mithas and Lucas 2010), thereby reducing concerns for alternative explanations and strengthen the main arguments if results remain…...

  • ...conducted robustness/sensitivity analysis, modifying the independent measures (Forman et al. 2008), or the variable structure (Brynjolfsson et al. 2009, Ghose 2009)....

Journal ArticleDOI
TL;DR: In this paper, the authors used a unique data set based on both chronologically compiled ratings as well as reviewer characteristics for a given set of products and geographical location-based purchasing behavior from Amazon, and provided evidence that community norms are an antecedent to reviewer disclosure of identity-descriptive information.
Abstract: Consumer-generated product reviews have proliferated online, driven by the notion that consumers' decision to purchase or not purchase a product is based on the positive or negative information about that product they obtain from fellow consumers. Using research on information processing as a foundation, we suggest that in the context of an online community, reviewer disclosure of identity-descriptive information is used by consumers to supplement or replace product information when making purchase decisions and evaluating the helpfulness of online reviews. Using a unique data set based on both chronologically compiled ratings as well as reviewer characteristics for a given set of products and geographical location-based purchasing behavior from Amazon, we provide evidence that community norms are an antecedent to reviewer disclosure of identity-descriptive information. Online community members rate reviews containing identity-descriptive information more positively, and the prevalence of reviewer disclosure of identity information is associated with increases in subsequent online product sales. In addition, we show that shared geographical location increases the relationship between disclosure and product sales, thus highlighting the important role of geography in electronic commerce. Taken together, our results suggest that identity-relevant information about reviewers shapes community members' judgment of products and reviews. Implications for research on the relationship between online word-of-mouth (WOM) and sales, peer recognition and reputation systems, and conformity to online community norms are discussed.

1,233 citations

Book
30 Mar 2006
TL;DR: Vittinghoff et al. provide a unified, in-depth, readable introduction to the multipredictor regression methods most widely used in biostatistics: linear models for continuous outcomes, logistic models for binary outcomes, the Cox model for right-censored survival times, repeated-measures models for longitudinal and hierarchical outcomes, and generalized linear models for counts and other outcomes.
Abstract: This new book provides a unified, in-depth, readable introduction to the multipredictor regression methods most widely used in biostatistics: linear models for continuous outcomes, logistic models for binary outcomes, the Cox model for right-censored survival times, repeated-measures models for longitudinal and hierarchical outcomes, and generalized linear models for counts and other outcomes. Treating these topics together takes advantage of all they have in common. The authors point out the many-shared elements in the methods they present for selecting, estimating, checking, and interpreting each of these models. They also show that these regression methods deal with confounding, mediation, and interaction of causal effects in essentially the same way. The examples, analyzed using Stata, are drawn from the biomedical context but generalize to other areas of application. While a first course in statistics is assumed, a chapter reviewing basic statistical methods is included. Some advanced topics are covered but the presentation remains intuitive. A brief introduction to regression analysis of complex surveys and notes for further reading are provided. For many students and researchers learning to use these methods, this one book may be all they need to conduct and interpret multipredictor regression analyses. The authors are on the faculty in the Division of Biostatistics, Department of Epidemiology and Biostatistics, University of California, San Francisco, and are authors or co-authors of more than 200 methodological as well as applied papers in the biological and biomedical sciences. The senior author, Charles E. McCulloch, is head of the Division and author of Generalized Linear Mixed Models (2003), Generalized, Linear, and Mixed Models (2000), and Variance Components (1992). From the reviews: "This book provides a unified introduction to the regression methods listed in the title...The methods are well illustrated by data drawn from medical studies...A real strength of this book is the careful discussion of issues common to all of the multipredictor methods covered." Journal of Biopharmaceutical Statistics, 2005 "This book is not just for biostatisticians. It is, in fact, a very good, and relatively nonmathematical, overview of multipredictor regression models. Although the examples are biologically oriented, they are generally easy to understand and follow...I heartily recommend the book" Technometrics, February 2006 "Overall, the text provides an overview of regression methods that is particularly strong in its breadth of coverage and emphasis on insight in place of mathematical detail. As intended, this well-unified approach should appeal to students who learn conceptually and verbally." Journal of the American Statistical Association, March 2006
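
As a rough illustration of the model families the book covers, the sketch below fits a linear model, a logistic model, and a Poisson generalized linear model to synthetic data with statsmodels. It is not taken from the book (whose examples use Stata); the coefficients and data-generating assumptions are arbitrary, and the Cox and repeated-measures models are omitted for brevity.

    # Three multipredictor regression models on synthetic data with statsmodels.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 1_000
    X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept plus two predictors

    # Linear model for a continuous outcome
    y_cont = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
    print(sm.OLS(y_cont, X).fit().params)

    # Logistic model for a binary outcome
    p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 0.8, 0.2]))))
    y_bin = rng.binomial(1, p)
    print(sm.Logit(y_bin, X).fit(disp=0).params)

    # Generalized linear model (Poisson) for a count outcome
    mu = np.exp(X @ np.array([0.2, 0.4, -0.1]))
    y_count = rng.poisson(mu)
    print(sm.GLM(y_count, X, family=sm.families.Poisson()).fit().params)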

1,117 citations


"Research Commentary---Too Big to Fa..." refers background in this paper

  • ...However, when dealing with models like the probit, one has to specify whether an effect size is being calculated... Table 2, "Interpreting Effect Sizes for Common Regression Models" (Vittinghoff et al. 2005), gives the interpretation for each functional form, where β is the coefficient (a worked numeric example follows below):
      y = f(x): a unit change in x is associated with an average change of β units in y.
      ln(y) = f(x): for a unit increase in x, y increases on average by the percentage 100(e^β − 1) (≈ 100β when β < 0.1).
      y = f(ln(x)): for a 1% increase in x, y increases on average by ln(1.01) × β (≈ β/100).
      ln(y) = f(ln(x)): for a 1% increase in x, y increases on average by the percentage 100(e^(β·ln(1.01)) − 1) (≈ β when β < 0.1)....

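As a worked numeric illustration of the effect-size interpretations reconstructed above, the snippet below plugs an arbitrary coefficient of beta = 0.04 (not a value from the paper) into each log-transformation formula.

    # Worked example of the effect-size interpretations for log-transformed models.
    # beta = 0.04 is an arbitrary illustrative coefficient, not taken from the paper.
    import math

    beta = 0.04

    # ln(y) = f(x): a one-unit increase in x changes y by 100*(e^beta - 1) percent
    print(100 * (math.exp(beta) - 1))                    # ~4.08%, close to 100*beta = 4%

    # y = f(ln(x)): a 1% increase in x changes y by ln(1.01)*beta units
    print(math.log(1.01) * beta)                         # ~0.0004, close to beta/100

    # ln(y) = f(ln(x)): a 1% increase in x changes y by 100*(e^(beta*ln(1.01)) - 1) percent
    print(100 * (math.exp(beta * math.log(1.01)) - 1))   # ~0.04%, close to beta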

Trending Questions (1)
What are the positives of large samples in research?

Large samples in research provide researchers with more statistical power, increased generalizability of findings, and the ability to detect smaller effect sizes.