Journal ArticleDOI

A comparison of methods to combine speed and accuracy measures of performance: A rejoinder on the binning procedure

01 Apr 2017-Behavior Research Methods (Springer US)-Vol. 49, Iss: 2, pp 653-673
TL;DR: The present paper compares the relative utility of seven integrated performance measures, namely four variations on a binning procedure that weights response times of correct and incorrect trials differently and three measures that combine averaged speed and accuracy scores, and concludes that the combination measures are useful provided that the speed and accuracy data are also inspected.
Abstract: In cognitive research, speed and accuracy are two important aspects of performance. When analyzed separately, these performance variables sometimes lead to contradictory conclusions about the effect of a manipulation. To avoid such conflicts, several measures that integrate speed and accuracy have been proposed, but the added value of using such measures remains unclear. The present paper compares the relative utility of seven integrated performance measures, namely four variations on a binning procedure that weights response times of correct and incorrect trials differently, and three measures that combine averaged speed and accuracy scores. The properties of these integrated measures were explored in three simulation studies. The first study compared three binning measures and showed that one measure failed to grasp the performance difference between two conditions. The second study showed that the sampling distributions of the measures were symmetric, except for a strong skewness on the rate correct score. The third study varied the trade-off and the effect sizes of speed and accuracy in four different combinations of size and direction of speed and accuracy effects. These studies highlighted some further shortcomings of the binning measures. The combination measures performed well, but linear integration of speed and accuracy and rate correct score were most efficient in detecting effects and accounting for a larger proportion of the variance. The paper concludes that these combination measures are useful provided that the speed and accuracy data are also inspected.
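As an illustration of two of the combination measures named in the abstract, the sketch below computes a rate correct score (RCS) and a linear integrated speed-accuracy score (LISAS) for one participant in one condition. The formulas follow the commonly cited definitions (RCS as correct responses divided by total response time; LISAS as mean correct RT plus the proportion of errors weighted by the ratio of the participant's RT and error standard deviations). They are not spelled out on this page, so treat them, the function names, and the example values as assumptions rather than the paper's exact procedure.

```python
from statistics import mean


def rate_correct_score(rts, correct):
    """RCS: number of correct responses per unit of total response time.

    rts     -- response times of all trials (correct and incorrect)
    correct -- 1 for a correct trial, 0 for an error, same order as rts
    """
    return sum(correct) / sum(rts)


def lisas(rts, correct, s_rt, s_pe):
    """LISAS: mean correct RT + (s_rt / s_pe) * proportion of errors.

    s_rt and s_pe are the participant's overall standard deviations of RT
    and of the error indicator; they are passed in here (an assumption of
    this sketch) so the caller decides how to estimate them.
    """
    correct_rts = [rt for rt, ok in zip(rts, correct) if ok]
    proportion_errors = 1 - sum(correct) / len(correct)
    if s_pe == 0:
        # No variability in errors: fall back to mean correct RT.
        return mean(correct_rts)
    return mean(correct_rts) + (s_rt / s_pe) * proportion_errors


# Hypothetical data: four trials, RTs in seconds, third trial is an error.
example_rts = [0.5, 0.6, 0.7, 0.8]
example_correct = [1, 1, 0, 1]

rcs_value = rate_correct_score(example_rts, example_correct)
lisas_value = lisas(example_rts, example_correct, s_rt=0.1, s_pe=0.5)
```

Higher RCS values indicate better performance, whereas higher LISAS values indicate worse performance, because LISAS stays on the RT scale and adds a penalty for errors.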


Citations
Journal ArticleDOI
TL;DR: Analysis of simulated data generated with the standard diffusion model shows that IES, RCS, and LISAS put unequal weights on speed and accuracy, depending on the accuracy level, and that these measures are actually very sensitive to speed–accuracy trade-offs.
Abstract: In psychological experiments, participants are typically instructed to respond as fast as possible without sacrificing accuracy. How they interpret this instruction and, consequently, which speed-accuracy trade-off they choose might vary between experiments, between participants, and between conditions. Consequently, experimental effects can appear unpredictably in either RTs or error rates (i.e., accuracy). Even more problematic, spurious effects might emerge that are actually due only to differential speed-accuracy trade-offs. An often-suggested solution is the inverse efficiency score (IES; Townsend & Ashby, 1983), which combines speed and accuracy into a single score. Alternatives are the rate-correct score (RCS; Woltz & Was, 2006) and the linear-integrated speed-accuracy score (LISAS; Vandierendonck, 2017, 2018). We report analyses on simulated data generated with the standard diffusion model (Ratcliff, 1978) showing that IES, RCS, and LISAS put unequal weights on speed and accuracy, depending on the accuracy level, and that these measures are actually very sensitive to speed-accuracy trade-offs. These findings stand in contrast to a fourth alternative, the balanced integration score (BIS; Liesefeld, Fu, & Zimmer, 2015), which was devised to integrate speed and accuracy with equal weights. Although all of the measures maintain "real" effects, only BIS is relatively insensitive to speed-accuracy trade-offs.
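To make the contrast between unequal and equal weighting concrete, the sketch below computes the inverse efficiency score (IES) and the balanced integration score (BIS) from per-participant summaries. It assumes the commonly cited forms: IES as mean correct RT divided by proportion correct, and BIS as the standardized proportion correct minus the standardized mean RT. The function names, the use of the population standard deviation, and the example values are assumptions of this sketch, not details taken from the abstract.

```python
from statistics import mean, pstdev


def inverse_efficiency_score(mean_correct_rt, proportion_correct):
    """IES: mean correct RT divided by proportion correct (lower is better)."""
    return mean_correct_rt / proportion_correct


def zscores(values):
    """Standardize values using the population SD (an assumption here)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def balanced_integration_scores(mean_rts, proportions_correct):
    """BIS: z(proportion correct) - z(mean RT), one score per observation.

    Standardizing both components first is what gives speed and accuracy
    equal weight; higher scores indicate better performance.
    """
    z_pc = zscores(proportions_correct)
    z_rt = zscores(mean_rts)
    return [pc - rt for pc, rt in zip(z_pc, z_rt)]


# Hypothetical summaries for three participants (RT in ms).
example_mean_rts = [500, 600, 700]
example_pcs = [0.9, 0.8, 0.7]

ies_values = [inverse_efficiency_score(rt, pc)
              for rt, pc in zip(example_mean_rts, example_pcs)]
bis_values = balanced_integration_scores(example_mean_rts, example_pcs)
```

In this toy data the fastest participant is also the most accurate, so both measures rank the participants the same way; the measures diverge when speed and accuracy pull in opposite directions, which is the trade-off situation the abstract addresses.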

140 citations

Journal ArticleDOI
TL;DR: Two main problems with reaction time (RT) measures, unreliability and sensitivity to speed–accuracy interactions, are identified and evaluated, and researchers comparing individuals of differing cognitive and developmental levels are encouraged to strongly consider alternatives to RT, specifically to RT difference scores.
Abstract: Reaction time is believed to be a good indicator of the speed and efficiency of mental processes and is a ubiquitous variable in the behavioral sciences. Despite this popularity, there are numerous issues associated with using reaction time (RT), specifically in differential and developmental research. Here, we identify and focus on two main problems: unreliability and sensitivity to speed-accuracy interactions. The use of difference scores is a primary factor that leads to many RT measures having demonstrably low reliability, and RT measures in general often do not properly account for speed-accuracy interactions. Both factors jeopardize the validity and interpretability of results based on RT. Here, we evaluate conceptually and empirically how these issues affect individual differences research. Although the empirical evidence we provide is primarily within the domains of attention control and task switching, we highlight examples from various other areas of psychological inquiry. We also discuss many of the statistical and methodological alternatives available to researchers conducting correlational studies. Ultimately, we encourage researchers comparing individuals of differing cognitive and developmental levels to strongly consider using these alternatives in lieu of RT, specifically RT difference scores. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

131 citations


Cites background from "A comparison of methods to combine ..."

  • ...See Hughes et al. (2014) or Vandierendonck (2017) for a more thorough explanation....

    [...]

  • ...In confirming this, we hoped more researchers would become aware of these techniques and continue to test and develop them, just as Vandierendonck (2017, 2018) and Liesefeld and Janczyk (2018) did....

    [...]

  • ...Vandierendonck (2017) assessed seven integrative scoring techniques for cognitive data, including multiple binning procedures, the inverse efficiency score, the rate residual score, and his own linear integrated speed–accuracy score....

    [...]

  • ...Recently, researchers have suggested that one solution to speed– accuracy interactions is to meaningfully combine them into a single metric (e.g., Draheim et al., 2016; Hughes et al., 2014; Liesefeld, Fu, & Zimmer, 2015; Liesefeld & Janczyk, 2018; Vandierendonck, 2017, 2018)....

    [...]

  • ...…both within the measurement of attention and task switching (e.g., Friedman & Miyake, 2004; Hughes et al., 2014; Rey-Mermet et al., 2018; Vandierendonck, 2017, 2018) as well as other areas and disciplines (e.g., Collins, 1996; DeGutis et al., 2013; Edwards & Parry, 1993; Gottman &…...

    [...]

Journal ArticleDOI
Tian Gao, Tian Zhang, Ling Zhu, Yanan Gao, Ling Qiu
TL;DR: The findings showed that as the popularity of a natural environment increased, so did the benefits of human health and well-being, and VR technology may be utilized as a possible surrogate measure to real scenes in evaluating human physiological and psychological restoration in the future.
Abstract: Accumulated evidence claims that urban green spaces (UGS) have a positive impact on the physical and mental health of humans. However, little information is available to clearly reveal what the most important driving factors are for human psychophysiological restoration. In order to unveil this uncertainty, this study employed virtual reality (VR) technology to investigate the physiological (electroencephalogram, EEG), and psychological (attention, positive mood, negative mood) responses and individual preferences for different urban environments. Participants (120) were recruited and randomly assigned to experience six different types of environments varying in land use and vegetation structures, which were: Grey space, blue space, open green space, partly open green space, partly closed green space, and closed green space. The results showed that the experience of the six environmental types through VR devices had positive restorative effects on the individuals’ attentional fatigue and negative mood; however, all the participants obtained the highest levels of physiological stress restoration when asked to close their eyes for relaxation. The physiological measurements of the EEG showed no significant differences among the selected types of environments. Meanwhile, the results of the psychological measures suggested that only negative mood showed significant differences of change among the six types of environments, and while the partly open green space had the most positive effect on negative mood, the closed green space had the worst. The blue space and partly closed green space received higher recreational preference ratings than the other four environments, while the closed green space received the lowest recreational preference rating. Moreover, the findings showed that there was a strong positive correlation between people’s preferences and the improvement of their positive mood. 
This indicated that as the popularity of a natural environment increased, so did the benefits of human health and well-being. In addition, this study shows that VR technology may be utilized as a possible surrogate measure to real scenes in evaluating human physiological and psychological restoration in the future. The present findings can provide the theoretical basis and practical guidance for future optimal planning of urban restorative environments.

89 citations


Cites result from "A comparison of methods to combine ..."

  • ...Fourth, as for measurement of attentional fatigue, the present study only measured the number of Stroop task items recorded within a timeframe, which could obtain less variability, and thus less of an effect, compared with those measured Stroop versions that integrated speed and accuracy [61]....

    [...]

Journal ArticleDOI
TL;DR: The existence of executive control as a psychometric construct and the assumption that WMC and gF are closely related to the ability to control ongoing thoughts and actions are called into question.
Abstract: In the last two decades, individual-differences research has put forward 3 cognitive psychometric constructs: executive control (i.e., the ability to monitor and control ongoing thoughts and actions), working memory capacity (WMC, i.e., the ability to retain access to a limited amount of information in the service of complex tasks), and fluid intelligence (gF, i.e., the ability to reason with novel information). These constructs have been proposed to be closely related, but previous research failed to substantiate a strong correlation between executive control and the other two constructs. This might arise from the difficulty in establishing executive control as a latent variable and from differences in the way the 3 constructs are measured (i.e., executive control is typically measured through reaction times, whereas WMC and gF are measured through accuracy). The purpose of the present study was to overcome these difficulties by measuring executive control through accuracy. Despite good reliabilities of all measures, structural equation modeling identified no coherent factor of executive control. Furthermore, WMC and gF, modeled as distinct but correlated factors, were unrelated to the individual measures of executive control. Hence, measuring executive control through accuracy did not overcome the difficulties of establishing executive control as a latent variable. These findings call into question the existence of executive control as a psychometric construct and the assumption that WMC and gF are closely related to the ability to control ongoing thoughts and actions. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

77 citations


Cites methods from "A comparison of methods to combine ..."

  • ...There is, however, no principled way of combining separate measures of RT and accuracy into a single score (Bruyer & Brysbaert, 2011; Dennis & Evans, 1996; Hughes, Linck, Bowles, Koeth, & Bunting, 2014; Vandierendonck, 2017)....

    [...]

Journal ArticleDOI
12 Jan 2018
TL;DR: It is concluded that while the rate correct score is better avoided, and the usage of the inverse efficiency score should be restricted to data with low overall error rates, the linear integrated speed-accuracy score proves to be valid.
Abstract: Speed and accuracy of performance are central to many theoretical accounts of cognitive processing. In recent years, several integrated performance measures have been proposed. A comparative study of the available measures [Vandierendonck, A. (2017). A comparison of methods to combine speed and accuracy measures of performance: A rejoinder on the binning procedure. Behavior Research Methods, 49, 653-673. DOI: https://doi.org/10.3758/s13428-016-0721-5] concluded that three of the measures, namely inverse efficiency score, rate correct score, and linear integrated speed-accuracy score achieved a balanced integration of speed and accuracy. As a follow-up on that study, these three measures were examined in data analyses from 13 (published and unpublished) experiments in the context of task switching. The correlations of the effect sizes in these integrated scores with the effect sizes obtained in latency and accuracy were high, but varied across the three integrated measures. The efficiency to detect effects supported by the speed and accuracy data was examined by means of signal detection analyses. The three measures efficiently detected effects present in either speed or accuracy, but the rate correct score was less efficient than the other two measures and it signalled a larger number of strong effects unsupported by the speed and accuracy data. It is concluded that while the rate correct score is better avoided, and the usage of the inverse efficiency score should be restricted to data with low overall error rates, the linear integrated speed-accuracy score proves to be valid.

72 citations


Cites background or methods from "A comparison of methods to combine ..."

  • ...As a drawback, this score neither reflects any contribution of variance in the RTs of the repetition condition nor of the errors committed in the repetition condition, but fortunately, other variants of the bin score that overcome these disadvantages are possible (Vandierendonck, 2017)....

    [...]

  • ..., 2014), even though its statistical distribution was skewed at the sample level (Vandierendonck, 2017)....

    [...]

  • ...In the remainder of the paper, these data sets, which include as well published as unpublished data, will be used to assess the utility of the three integrated measures that were found to be acceptable in the comparative Monte Carlo simulations of Vandierendonck (2017), namely LISAS, IES and RCS....

    [...]

  • ...…SAT was included in a Monte Carlo study of the efficiency of integrated measures and showed that when the trade-off was unrelated to the experimental factor, estimates of its effect on speed, accuracy and the integrated measures were not distorted by the trade-off (Vandierendonck, 2017, Study 3)....

    [...]

  • ...Thus far, also some doubts have been raised about the general usefulness of IES (Bruyer & Brysbaert, 2011; Hughes et al., 2014; Vandierendonck, 2017), while RCS was considered trustworthy (Hughes et al....

    [...]

References
Journal ArticleDOI
TL;DR: It is concluded that recent theories placing the explanatory weight on parallel processing of the irrelevant and the relevant dimensions are likely to be more successful than are earlier theories attempting to locate a single bottleneck in attention.
Abstract: The literature on interference in the Stroop Color-Word Task, covering over 50 years and some 400 studies, is organized and reviewed. In so doing, a set of 18 reliable empirical findings is isolated that must be captured by any successful theory of the Stroop effect. Existing theoretical positions are summarized and evaluated in view of this critical evidence, and the 2 major candidate theories (relative speed of processing and automaticity of reading) are found to be wanting. It is concluded that recent theories placing the explanatory weight on parallel processing of the irrelevant and the relevant dimensions are likely to be more successful than are earlier theories attempting to locate a single bottleneck in attention.

5,172 citations


"A comparison of methods to combine ..." refers background in this paper

  • ...…are typically performed slower and are often more error-prone than tasks with compatible stimuli and/or responses (Kornblum, Hasbroucq, & Osman, 1990; MacLeod, 1991; Stroop, 1935); similarly, situations requiring task switching lead to slower responding and/or increased error rates (e.g., Kiesel et…...

    [...]

Journal ArticleDOI
TL;DR: The model provides a systematic account of SRC effects, a taxonomy of simple performance tasks that were hitherto thought to be unrelated, and suggestive parallels between these tasks and the experimental paradigms that have traditionally been used to study attentional, controlled, and automatic processes.
Abstract: The classic problem of stimulus-response (S-R) compatibility (SRC) is addressed. A cognitive model is proposed that views the stimulus and response sets in S-R ensembles as categories with dimensions that may or may not overlap. If they do overlap, the task may be compatible or incompatible, depending on the assigned S-R mapping. If they do not overlap, the task is noncompatible regardless of the assigned mapping. The overlapping dimensions may be relevant or not. The model provides a systematic account of SRC effects, a taxonomy of simple performance tasks that were hitherto thought to be unrelated, and suggestive parallels between these tasks and the experimental paradigms that have traditionally been used to study attentional, controlled, and automatic processes. In this article, we address the classic problem of stimulus-response (S-R) compatibility (SRC). A model is proposed that attempts to provide a systematic account of performance in highly compatible, incompatible, and noncompatible tasks. At the core of our model is the idea that when a particular S-R ensemble produces either high or low compatibility effects, it is because the stimulus and response sets in the ensemble have properties in common, and elements in the stimulus set automatically activate corresponding elements in the response set. Noncompatible tasks are those in which the stimulus and response sets have nothing in common. If the activated response is the required one, it will be executed rapidly and correctly; if it is not, then it will be relatively slow and error prone. Whether a particular S-R ensemble will produce compatibility effects is often quite easy to determine because of the relationship between the stimulus and response sets. In the part of the model that treats the representational aspects of the problem, we postulate that this relationship is based on the commonality, simi

1,785 citations

Journal ArticleDOI
TL;DR: The task-switching paradigm offers enormous possibilities to study cognitive control as well as task interference, and the current review provides an overview of recent research on both topics.
Abstract: The task-switching paradigm offers enormous possibilities to study cognitive control as well as task interference. The current review provides an overview of recent research on both topics. First, we review different experimental approaches to task switching, such as comparing mixed-task blocks with single-task blocks, predictable task-switching and task-cuing paradigms, intermittent instructions, and voluntary task selection. In the 2nd part, we discuss findings on preparatory control mechanisms in task switching and theoretical accounts of task preparation. We consider preparation processes in two-stage models, consider preparation as an all-or-none process, address the question of whether preparation is switch-specific, reflect on preparation as interaction of cue encoding and memory retrieval, and discuss the impact of verbal mediation on preparation. In the 3rd part, we turn to interference phenomena in task switching. We consider proactive interference of tasks and inhibition of recently performed tasks indicated by asymmetrical switch costs and n-2 task-repetition costs. We discuss stimulus-based interference as a result of stimulus-based response activation and stimulus-based task activation, and response-based interference because of applying bivalent rather than univalent responses, response repetition effects, and carryover of response selection and execution. In the 4th and final part, we mention possible future research fields.

1,223 citations