Journal ArticleDOI

Experimental design

TL;DR: Experimental design is reviewed here for broad classes of data collection and analysis problems, including: fractioning techniques based on orthogonal arrays, Latin hypercube designs and their variants for computer experimentation, efficient design for data mining and machine learning applications, and sequential design for active learning.
Abstract: Maximizing data information requires careful selection, termed design, of the points at which data are observed. Experimental design is reviewed here for broad classes of data collection and analysis problems, including: fractioning techniques based on orthogonal arrays, Latin hypercube designs and their variants for computer experimentation, efficient design for data mining and machine learning applications, and sequential design for active learning. © 2012 Wiley Periodicals, Inc.
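As a concrete illustration of one design family named above, the following minimal sketch generates a small Latin hypercube design with scipy.stats.qmc; the tooling and the dimension/size parameters are assumptions for illustration, not something the paper prescribes.

```python
# Minimal sketch: a Latin hypercube design via SciPy's QMC module.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)   # 3 input dimensions
X = sampler.random(n=10)                    # 10 design points in [0, 1)^3

# Each one-dimensional projection hits every one of the 10 equal-width
# strata exactly once: the defining property of a Latin hypercube.
for j in range(3):
    strata = np.floor(X[:, j] * 10).astype(int)
    assert sorted(strata) == list(range(10))
print(X)
```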
Citations
Journal ArticleDOI
TL;DR: The pbkrtest package, as discussed by the authors, implements two alternatives to such approximate χ² tests: (1) a Kenward-Roger approximation for performing F tests for reduction of the mean structure, and (2) parametric bootstrap methods for achieving the same goal.
Abstract: When testing for reduction of the mean value structure in linear mixed models, it is common to use an asymptotic χ² test. Such tests can, however, be very poor for small and moderate sample sizes. The pbkrtest package implements two alternatives to such approximate χ² tests: (1) a Kenward-Roger approximation for performing F tests for reduction of the mean structure, and (2) parametric bootstrap methods for achieving the same goal. The implementation is focused on linear mixed models with independent residual errors. In addition to describing the methods and aspects of their implementation, the paper also contains several examples and a comparison of the various methods.
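pbkrtest itself is an R package; the following is only a hedged sketch of the parametric-bootstrap idea it implements, simplified to nested ordinary linear models (not the mixed-model case) and assuming statsmodels is available.

```python
# Sketch of a parametric bootstrap for a likelihood-ratio test of a
# reduced mean structure, instead of trusting the asymptotic chi-squared
# approximation. Simplified to OLS; illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1.0 + rng.normal(size=n)            # true model has no x effect

X_full = sm.add_constant(x)
X_null = np.ones((n, 1))
fit_full = sm.OLS(y, X_full).fit()
fit_null = sm.OLS(y, X_null).fit()
lrt_obs = 2 * (fit_full.llf - fit_null.llf)

# Simulate the LRT statistic's null distribution from the fitted null model.
boot = []
for _ in range(2000):
    y_sim = fit_null.fittedvalues + rng.normal(scale=np.sqrt(fit_null.scale), size=n)
    ll_f = sm.OLS(y_sim, X_full).fit().llf
    ll_0 = sm.OLS(y_sim, X_null).fit().llf
    boot.append(2 * (ll_f - ll_0))
p_boot = np.mean(np.array(boot) >= lrt_obs)
print(f"bootstrap p = {p_boot:.3f}")
```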

1,072 citations

Journal ArticleDOI
TL;DR: The main challenges raised by imbalanced domains are discussed, a definition of the problem is proposed, the main approaches to these tasks are described, and a taxonomy of the methods is proposed.
Abstract: Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.
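One of the simplest techniques such surveys cover is random oversampling of the minority class; the sketch below is an illustrative assumption about one such method, not code from the article.

```python
# Minimal sketch: random oversampling to rebalance a binary target.
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate minority-class rows until all classes are equal in size."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c, n_c in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=n_max - n_c, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

X = np.random.default_rng(1).normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)            # strongly imbalanced target
X_bal, y_bal = random_oversample(X, y, rng=1)
print(np.bincount(y_bal))                   # [95 95]
```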

730 citations


Cites methods from "Experimental design"

  • ...More recently, a new experimental design was proposed [Batista et al. 2012; Prati et al. 2014] to overcome the difficulty in assessing the capability of recovering from the losses in performance caused by imbalance....


Journal ArticleDOI
TL;DR: This study provides a systematic examination of F‐test robustness to violations of normality in terms of Type I error, considering a wide variety of distributions commonly found in the health and social sciences.
Abstract: BACKGROUND: The robustness of F-test to non-normality has been studied from the 1930s through to the present day. However, this extensive body of research has yielded contradictory results, there being evidence both for and against its robustness. This study provides a systematic examination of F-test robustness to violations of normality in terms of Type I error, considering a wide variety of distributions commonly found in the health and social sciences. METHOD: We conducted a Monte Carlo simulation study involving a design with three groups and several known and unknown distributions. The manipulated variables were: equal and unequal group sample sizes; group sample size and total sample size; coefficient of sample size variation; shape of the distribution and equal or unequal shapes of the group distributions; and pairing of group size with the degree of contamination in the distribution. RESULTS: The results showed that in terms of Type I error the F-test was robust in 100% of the cases studied, independently of the manipulated conditions.
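A hedged sketch of this kind of Monte Carlo check follows; the distribution, group sizes, and replication count are illustrative assumptions, not the study's actual design.

```python
# Estimate the empirical Type I error of the one-way ANOVA F-test when
# all three groups come from the same skewed (exponential) population,
# so the null hypothesis of equal means is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_rep, n_per_group, alpha = 10_000, 30, 0.05
rejections = 0
for _ in range(n_rep):
    g1, g2, g3 = (rng.exponential(size=n_per_group) for _ in range(3))
    if stats.f_oneway(g1, g2, g3).pvalue < alpha:
        rejections += 1
print(f"empirical Type I error: {rejections / n_rep:.3f}")  # near 0.05 if robust
```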

721 citations


Cites background or result from "Experimental design"

  • ...…studied here, with those classical handbooks which conclude that F-test is only robust if the departure from normality is moderate (Keppel, 1982; Montgomery, 1991), the populations have the same distributional shape (Kirk, 2013), and the sample sizes are large and equal (Winer et al., 1991)....


  • ...Based on most early studies, many classical handbooks on research methods in education and psychology draw the following conclusions: Moderate departures from normality are of little concern in the fixed-effects analysis of variance (Montgomery, 1991); violations of normality do not constitute a serious problem, unless the violations are especially severe (Keppel, 1982); F-test is robust to moderate departures from normality when sample sizes are reasonably large and are equal (Winer, Brown, & Michels, 1991); and researchers do not need to be concerned about moderate departures from normality provided that the populations are homogeneous in form (Kirk, 2013)....


  • ...By contrast, however, our results do not concur, at least for the conditions studied here, with those classical handbooks which conclude that F-test is only robust if the departure from normality is moderate (Keppel, 1982; Montgomery, 1991), the populations have the same distributional shape (Kirk, 2013), and the sample sizes are large and equal (Winer et al....


  • ...…F-test is robust to moderate departures from normality when sample sizes are reasonably large and are equal (Winer, Brown, & Michels, 1991); and researchers do not need to be concerned about moderate departures from normality provided that the populations are homogeneous in form (Kirk, 2013)....


Journal ArticleDOI
TL;DR: A definition of effect size is proposed, which is purposely more inclusive than the way many have defined and conceptualized effect size, and it is unique with regard to linking effect size to a question of interest.
Abstract: The call for researchers to report and interpret effect sizes and their corresponding confidence intervals has never been stronger. However, there is confusion in the literature on the definition of effect size, and consequently the term is used inconsistently. We propose a definition for effect size, discuss 3 facets of effect size (dimension, measure/index, and value), outline 10 corollaries that follow from our definition, and review ideal qualities of effect sizes. Our definition of effect size is general and subsumes many existing definitions of effect size. We define effect size as a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest. Our definition of effect size is purposely more inclusive than the way many have defined and conceptualized effect size, and it is unique with regard to linking effect size to a question of interest. Additionally, we review some important developments in the effect size literature and discuss the importance of accompanying an effect size with an interval estimate that acknowledges the uncertainty with which the population value of the effect size has been estimated. We hope that this article will facilitate discussion and improve the practice of reporting and interpreting effect sizes.
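As one concrete instance of pairing an effect-size index with an interval estimate, here is a hedged sketch computing Cohen's d with a rough normal-approximation confidence interval; the index choice and the approximate standard-error formula are illustrative assumptions, not the authors' proposal.

```python
# Cohen's d for two independent groups, with an approximate 95% CI.
import numpy as np

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
x, y = rng.normal(0.5, 1, 50), rng.normal(0.0, 1, 50)
d = cohens_d(x, y)
n1, n2 = len(x), len(y)
# Standard large-sample approximation to the standard error of d.
se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
print(f"d = {d:.2f}, approx 95% CI = ({d - 1.96 * se:.2f}, {d + 1.96 * se:.2f})")
```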

689 citations


Cites background from "Experimental design"

  • ...Effect size has often been defined relative to the value of the null hypothesis for a corresponding NHST (Berry & Mielke, 2002; Henson, 2006; Kirk, 1996, 2002)....


Journal ArticleDOI
TL;DR: In this article, the authors examined whether reappraising stress-induced arousal could improve cardiovascular outcomes and decrease attentional bias for emotionally negative information, and found that participants who were instructed to reappraise their arousal exhibited more adaptive cardiovascular stress responses.
Abstract: Researchers have theorized that changing the way we think about our bodily responses can improve our physiological and cognitive reactions to stressful events. However, the underlying processes through which mental states improve downstream outcomes are not well-understood. To this end, we examined whether reappraising stress-induced arousal could improve cardiovascular outcomes and decrease attentional bias for emotionally-negative information. Participants were randomly assigned to either a reappraisal condition in which they were instructed to think about their physiological arousal during a stressful task as functional and adaptive, or to one of two control conditions: attention reorientation and no instructions. Relative to controls, participants instructed to reappraise their arousal exhibited more adaptive cardiovascular stress responses – increased cardiac efficiency and lower vascular resistance – and decreased attentional bias. Thus, reappraising arousal shows physiological and cognitive benefits. Implications for health and potential clinical applications are discussed.

347 citations

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
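A minimal sketch of the lasso's tendency to shrink coefficients exactly to zero, using scikit-learn (a tooling assumption; the paper predates the library):

```python
# Fit a lasso to data where only 3 of 10 coefficients are truly nonzero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                 # only 3 truly active coefficients
y = X @ beta + rng.normal(size=n)

fit = Lasso(alpha=0.1).fit(X, y)
print(np.round(fit.coef_, 2))               # most entries shrunk exactly to 0
```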

40,785 citations

Book
Vladimir Vapnik
01 Jan 1995
TL;DR: The book covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

Journal ArticleDOI
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability.
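A hedged sketch of the iteration described above (the response curve, noise level, and step-size constant are illustrative assumptions): step from x_n toward the root θ of M(x) = α using only noisy observations, with step sizes a_n = c/n satisfying the usual summability conditions.

```python
# Robbins-Monro stochastic approximation for the root of M(x) = alpha.
import numpy as np

rng = np.random.default_rng(0)
alpha, c = 0.5, 1.0
M = lambda x: 1 / (1 + np.exp(-x))          # unknown monotone response curve
x = 0.5                                     # arbitrary starting level
for n in range(1, 10_001):
    y_n = M(x) + rng.normal(scale=0.1)      # noisy measurement at level x
    x = x - (c / n) * (y_n - alpha)         # Robbins-Monro update
print(x)                                    # tends toward theta = 0 (M(0) = 0.5)
```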

9,312 citations

Journal ArticleDOI
TL;DR: In this paper, two sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies, and they are shown to be improvements over simple random sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.
Abstract: Two types of sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies. These plans are shown to be improvements over simple random sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.
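A minimal sketch of the variance comparison (the integrand and sample sizes are illustrative assumptions): estimate the mean of a smooth function on [0, 1] by simple random sampling and by taking one point per equal-width stratum, then compare the variances of the two estimators across replications.

```python
# Stratified sampling vs. simple random sampling for the sample mean.
import numpy as np

rng = np.random.default_rng(0)
f = lambda u: np.exp(u)                     # integrand with a monotone trend
n, n_rep = 16, 5_000

srs_means, strat_means = [], []
for _ in range(n_rep):
    u_srs = rng.random(n)                            # simple random sample
    u_str = (np.arange(n) + rng.random(n)) / n       # one point per stratum
    srs_means.append(f(u_srs).mean())
    strat_means.append(f(u_str).mean())

print(f"SRS variance:        {np.var(srs_means):.2e}")
print(f"stratified variance: {np.var(strat_means):.2e}")  # much smaller
```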

8,328 citations

Journal ArticleDOI
TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...
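The abstract does not name the penalty, but it matches the SCAD penalty of Fan and Li; the sketch below evaluates SCAD piecewise (treating it as the penalty family described above is an inference, and a = 3.7 is the commonly cited recommended value).

```python
# The SCAD penalty: symmetric, singular at the origin (sparsity), and
# bounded by a constant for large arguments (bias reduction).
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|), piecewise over three regions."""
    t = np.abs(t)
    linear = lam * t                                          # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |t| > a*lam: bounded
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

print(scad_penalty(np.array([0.0, 0.5, 2.0, 10.0]), lam=1.0))
```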

8,314 citations

Trending Questions (1)
Can you give some examples of experimental studies with a 2x2 between-subjects design?

The paper does not provide specific examples of experimental studies using a 2x2 between-subjects design.