
Showing papers in "Behavior Research Methods in 2016"


Journal ArticleDOI
TL;DR: In three online studies, participants from MTurk and collegiate populations participated in a task that included a measure of attentiveness to instructions (an instructional manipulation check: IMC), and MTurkers were more attentive to the instructions than were college students, even on novel IMCs.
Abstract: Participant attentiveness is a concern for many researchers using Amazon’s Mechanical Turk (MTurk). Although studies comparing the attentiveness of participants on MTurk versus traditional subject pool samples have provided mixed support for this concern, attention check questions and other methods of ensuring participant attention have become ubiquitous in MTurk studies. Because MTurk is a population that learns, we hypothesized that MTurkers would be more attentive to instructions than are traditional subject pool samples. In three online studies, participants from MTurk and collegiate populations completed a task that included a measure of attentiveness to instructions (an instructional manipulation check: IMC). In all studies, MTurkers were more attentive to the instructions than were college students, even on novel IMCs (Studies 2 and 3), and MTurkers showed larger effects in response to a minute text manipulation. These results have implications for the sustainable use of MTurk samples for social science research and for the conclusions drawn from research with MTurk and college subject pool samples.

1,346 citations


Journal ArticleDOI
TL;DR: Diagonally weighted least squares was less biased and more accurate than MLR in estimating the factor loadings across nearly every condition and the proposed model tended to be over-rejected by chi-square test statistics under both MLR and WLSMV in the condition of small sample size N = 200.
Abstract: In confirmatory factor analysis (CFA), the use of maximum likelihood (ML) assumes that the observed indicators follow a continuous and multivariate normal distribution, which is not appropriate for ordinal observed variables. Robust ML (MLR) has been introduced into CFA models when this normality assumption is slightly or moderately violated. Diagonally weighted least squares (WLSMV), on the other hand, is specifically designed for ordinal data. Although WLSMV makes no distributional assumptions about the observed variables, a normal latent distribution underlying each observed categorical variable is instead assumed. A Monte Carlo simulation was carried out to compare the effects of different configurations of latent response distributions, numbers of categories, and sample sizes on model parameter estimates, standard errors, and chi-square test statistics in a correlated two-factor model. The results showed that WLSMV was less biased and more accurate than MLR in estimating the factor loadings across nearly every condition. However, WLSMV yielded moderate overestimation of the interfactor correlations when the sample size was small and/or when the latent distributions were moderately nonnormal. With respect to standard error estimates of the factor loadings and the interfactor correlations, MLR outperformed WLSMV when the latent distributions were nonnormal with a small sample size of N = 200. Finally, the proposed model tended to be over-rejected by chi-square test statistics under both MLR and WLSMV in the small-sample condition (N = 200).

1,319 citations


Journal ArticleDOI
TL;DR: It was found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom, and the average prevalence of inconsistent p-values has been stable over the years or has declined.
Abstract: This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.

304 citations
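statcheck itself is an R package; purely as an illustration of the consistency check it automates, the following Python sketch recomputes a two-tailed p-value from a z statistic (where the p-value has a closed form) and flags a reported value that disagrees. The tolerance, alpha level, and the example result are hypothetical choices, not statcheck's actual rules.

```python
import math

def p_from_z(z):
    """Two-tailed p-value for a z statistic (closed form via erfc)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def check_report(z, reported_p, alpha=0.05, tol=0.0005):
    """Flag a reported p-value that disagrees with its test statistic.

    Returns (recomputed_p, inconsistent, gross); 'gross' means the
    disagreement would flip the significance decision at alpha.
    """
    p = p_from_z(z)
    inconsistent = abs(p - reported_p) > tol
    gross = inconsistent and ((p <= alpha) != (reported_p <= alpha))
    return p, inconsistent, gross

# Hypothetical reported result: "z = 2.20, p = .01" -- the recomputed
# p (about .028) is inconsistent, but both values stay below .05,
# so the inconsistency is not "gross".
p, inconsistent, gross = check_report(2.20, 0.01)
```

statcheck's real work is in the extraction step, parsing APA-formatted results (t, F, r, χ², z) out of manuscript text before making this comparison; only the comparison logic is sketched here.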


Journal ArticleDOI
TL;DR: The basic architecture of the psiTurk system is described and new users are introduced to the overall goals, which aims to reduce the technical hurdles for researchers developing online experiments while improving the transparency and collaborative nature of the behavioral sciences.
Abstract: Online data collection has begun to revolutionize the behavioral sciences. However, conducting carefully controlled behavioral experiments online introduces a number of new technical and scientific challenges. The project described in this paper, psiTurk, is an open-source platform that helps researchers develop experiment designs that can be conducted over the Internet. The tool primarily interfaces with Amazon's Mechanical Turk, a popular crowd-sourcing labor market. This paper describes the basic architecture of the system and introduces new users to the overall goals. psiTurk aims to reduce the technical hurdles for researchers developing online experiments while improving the transparency and collaborative nature of the behavioral sciences.

214 citations


Journal ArticleDOI
TL;DR: This article distinguishes between “micro” and “macro” definitions of multicollinearity, shows how both sides of the debate can be correct, and thereby clarifies the issues and reconciles the discrepancy.
Abstract: There seems to be confusion among researchers regarding whether it is good practice to center variables at their means prior to calculating a product term to estimate an interaction in a multiple regression model. Many researchers use mean-centered variables because they believe it’s the thing to do or because reviewers ask them to, without quite understanding why. Adding to the confusion is the fact that there is also a perspective in the literature that mean centering does not reduce multicollinearity. In this article, we clarify the issues and reconcile the discrepancy. We distinguish between “micro” and “macro” definitions of multicollinearity and show how both sides of such a debate can be correct. To do so, we use proofs, an illustrative dataset, and a Monte Carlo simulation to show the precise effects of mean centering on both individual correlation coefficients and overall model indices. We hope to contribute to the literature by clarifying the issues, reconciling the two perspectives, and quelling the current confusion regarding whether and how mean centering can be a useful practice.

203 citations
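A minimal numeric sketch of the "micro" effect at issue, using a squared term as the simplest possible product term: centering the predictor removes its correlation with the product term without changing the underlying data. This is an illustration of the general point, not the article's own proofs or simulation.

```python
def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

x = list(range(1, 11))                       # raw predictor
raw_corr = pearson(x, [a * a for a in x])    # x vs. its product term x*x

xm = sum(x) / len(x)
xc = [a - xm for a in x]                     # mean-centered predictor
centered_corr = pearson(xc, [a * a for a in xc])
```

Here centered_corr is exactly zero because the centered predictor happens to be symmetric around zero; with real data the correlation drops substantially rather than vanishing, which is the "micro" sense in which centering reduces multicollinearity.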


Journal ArticleDOI
TL;DR: Predictive validation of TAACO is provided, supporting the notion that expert judgments of text coherence and quality are either negatively correlated with or not predicted by local and overall text cohesion indices, but are positively predicted by global indices of cohesion.
Abstract: This study introduces the Tool for the Automatic Analysis of Cohesion (TAACO), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, and Linux), is housed on a user’s hard drive (rather than having an Internet interface), allows for the batch processing of text files, and incorporates over 150 classic and recently developed indices related to text cohesion. The study validates TAACO by investigating how its indices related to local, global, and overall text cohesion can predict expert judgments of text coherence and essay quality. The findings of this study provide predictive validation of TAACO and support the notion that expert judgments of text coherence and quality are either negatively correlated with or not predicted by local and overall text cohesion indices, but are positively predicted by global indices of cohesion. Combined, these findings provide supporting evidence that coherence for expert raters is a property of global cohesion and not of local cohesion, and that expert ratings of text quality are positively related to global cohesion.

190 citations
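TAACO is a standalone tool with over 150 indices; purely to convey the flavor of a local cohesion index, here is a hypothetical toy measure: the mean Jaccard overlap of word types between adjacent sentences. It is not one of TAACO's actual indices.

```python
def sentence_types(sentence):
    """Lowercased word types in a sentence, with edge punctuation stripped."""
    return {w.strip(".,;:!?").lower()
            for w in sentence.split() if w.strip(".,;:!?")}

def local_cohesion(sentences):
    """Mean proportion of word types shared by adjacent sentence pairs
    (a toy analogue of a local overlap index)."""
    overlaps = []
    for a, b in zip(sentences, sentences[1:]):
        ta, tb = sentence_types(a), sentence_types(b)
        overlaps.append(len(ta & tb) / len(ta | tb))  # Jaccard overlap
    return sum(overlaps) / len(overlaps)

text = ["The cat sat on the mat.",
        "The cat then left the mat.",
        "Rain fell all afternoon."]
score = local_cohesion(text)  # high overlap for pair 1, none for pair 2
```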


Journal ArticleDOI
TL;DR: This database contains 64 primary novel object images and additional novel exemplars for ten basic- and nine global-level object categories; object novelty was found to correlate with qualifying naming responses pertaining to the objects’ colors.
Abstract: Many experimental research designs require images of novel objects. Here we introduce the Novel Object and Unusual Name (NOUN) Database. This database contains 64 primary novel object images and additional novel exemplars for ten basic- and nine global-level object categories. The objects’ novelty was confirmed by both self-report and a lack of consensus on questions that required participants to name and identify the objects. We also found that object novelty correlated with qualifying naming responses pertaining to the objects’ colors. The results from a similarity sorting task (and a subsequent multidimensional scaling analysis on the similarity ratings) demonstrated that the objects are complex and distinct entities that vary along several featural dimensions beyond simply shape and color. A final experiment confirmed that additional item exemplars comprised both sub- and superordinate categories. These images may be useful in a variety of settings, particularly for developmental psychology and other research on language, categorization, perception, visual memory, and related domains.

187 citations


Journal ArticleDOI
TL;DR: In this article, the authors present results from a visual search experiment in which they measured response time distributions with both Psychophysics Toolbox (PTB) and JavaScript, and concluded that JavaScript is a suitable tool for measuring response times in behavioral research.
Abstract: Behavioral researchers are increasingly using Web-based software such as JavaScript to conduct response time experiments. Although there has been some research on the accuracy and reliability of response time measurements collected using JavaScript, it remains unclear how well this method performs relative to standard laboratory software in psychologically relevant experimental manipulations. Here we present results from a visual search experiment in which we measured response time distributions with both Psychophysics Toolbox (PTB) and JavaScript. We developed a methodology that allowed us to simultaneously run the visual search experiment with both systems, interleaving trials between two independent computers, thus minimizing the effects of factors other than the experimental software. The response times measured by JavaScript were approximately 25 ms longer than those measured by PTB. However, we found no reliable difference in the variability of the distributions related to the software, and both software packages were equally sensitive to changes in the response times as a result of the experimental manipulations. We concluded that JavaScript is a suitable tool for measuring response times in behavioral research.

169 citations


Journal ArticleDOI
TL;DR: By considering the structural dependency of the measures derived from the CRT and assessing their respective associations with self-report measures of intuitive–analytic cognitive styles, evidence is provided that the CRT is a valid measure of reflective but not of intuitive thinking.
Abstract: The Cognitive Reflection Test (CRT) is one of the most widely used tools to assess individual differences in intuitive–analytic cognitive styles. The CRT is of broad interest because each of its items reliably cues a highly available and superficially appropriate but incorrect response, conventionally deemed the “intuitive” response. To do well on the CRT, participants must reflect on and question the intuitive responses. The CRT score typically employed is the sum of correct responses, assumed to indicate greater “reflectiveness” (i.e., CRT–Reflective scoring). Some recent researchers have, however, inverted the rationale of the CRT by summing the number of intuitive incorrect responses, creating a putative measure of intuitiveness (i.e., CRT–Intuitive). We address the feasibility and validity of this strategy by considering the problem of the structural dependency of these measures derived from the CRT and by assessing their respective associations with self-report measures of intuitive–analytic cognitive styles: the Faith in Intuition and Need for Cognition scales. Our results indicated that, to the extent that the dependency problem can be addressed, the CRT–Reflective but not the CRT–Intuitive measure predicts intuitive–analytic cognitive styles. These results provide evidence that the CRT is a valid measure of reflective but not of intuitive thinking.

160 citations
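The structural dependency discussed above is visible directly in how the two scores are computed: each item response is either correct, the conventionally "intuitive" error, or some other error, so the two sums are negatively constrained by construction. A sketch with a hypothetical answer key (the responses and key are invented, not actual CRT items):

```python
# Hypothetical answer key for a 3-item CRT-style test: each item has one
# correct answer and one conventionally "intuitive" incorrect answer.
KEY = [{"correct": "5", "intuitive": "10"},
       {"correct": "5", "intuitive": "100"},
       {"correct": "47", "intuitive": "24"}]

def score(responses):
    """Return (CRT-Reflective, CRT-Intuitive) sums for one participant."""
    reflective = sum(r == k["correct"] for r, k in zip(responses, KEY))
    intuitive = sum(r == k["intuitive"] for r, k in zip(responses, KEY))
    return reflective, intuitive

refl, intu = score(["5", "100", "24"])
# Structural dependency: the two scores cannot jointly exceed the item
# count, so they are negatively related by construction.
assert refl + intu <= len(KEY)
```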


Journal ArticleDOI
TL;DR: It is shown that problems remain when multiple pairs of series are aggregated in several different ways for a cross-correlation analysis, and it is illustrated how to obtain a transfer function describing such relationships, informed by any genuine cross-correlation functions.
Abstract: Many articles on perception, performance, psychophysiology, and neuroscience seek to relate pairs of time series through assessments of their cross-correlations. Most such series are individually autocorrelated: they do not comprise independent values. Given this situation, an unfounded reliance is often placed on cross-correlation as an indicator of relationships (e.g., referent vs. response, leading vs. following). Such cross-correlations can indicate spurious relationships, because of autocorrelation. Given these dangers, we here simulated how and why such spurious conclusions can arise, to provide an approach to resolving them. We show that when multiple pairs of series are aggregated in several different ways for a cross-correlation analysis, problems remain. Finally, even a genuine cross-correlation function does not answer key motivating questions, such as whether there are likely causal relationships between the series. Thus, we illustrate how to obtain a transfer function describing such relationships, informed by any genuine cross-correlations. We illustrate the confounds and the meaningful transfer functions by two concrete examples, one each in perception and performance, together with key elements of the R software code needed. The approach involves autocorrelation functions, the establishment of stationarity, prewhitening, the determination of cross-correlation functions, the assessment of Granger causality, and autoregressive model development. Autocorrelation also limits the interpretability of other measures of possible relationships between pairs of time series, such as mutual information. We emphasize that further complexity may be required as the appropriate analysis is pursued fully, and that causal intervention experiments will likely also be needed.

144 citations
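A deterministic toy example of the danger described above: two series that share only a trend show a large raw cross-correlation that nearly vanishes after first differencing (a crude stand-in for the full prewhitening procedure, which would fit a time-series model and filter both series). The series themselves are invented for illustration.

```python
import math

def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Two series that share only a deterministic trend; their oscillatory
# parts (sine vs. cosine) are out of phase and essentially unrelated.
t = range(100)
a = [k + 5 * math.sin(0.3 * k) for k in t]
b = [k + 5 * math.cos(0.3 * k) for k in t]

raw = corr(a, b)                         # inflated by the shared trend
da = [y - x for x, y in zip(a, a[1:])]   # first differences
db = [y - x for x, y in zip(b, b[1:])]
differenced = corr(da, db)               # near zero once the trend is gone
```

In a real analysis one would, as the abstract outlines, check autocorrelation functions, establish stationarity, prewhiten with a fitted model rather than plain differencing, and only then interpret the cross-correlation function.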


Journal ArticleDOI
TL;DR: This work systematically maps pupil foreshortening error (PFE) using an artificial eye model and then applies a geometric model to correct it.
Abstract: Pupil size is correlated with a wide variety of important cognitive variables and is increasingly being used by cognitive scientists. Pupil data can be recorded inexpensively and non-invasively by many commonly used video-based eye-tracking cameras. Despite the relative ease of data collection and increasing prevalence of pupil data in the cognitive literature, researchers often underestimate the methodological challenges associated with controlling for confounds that can result in misinterpretation of their data. One serious confound that is often not properly controlled is pupil foreshortening error (PFE)—the foreshortening of the pupil image as the eye rotates away from the camera. Here we systematically map PFE using an artificial eye model and then apply a geometric model correction. Three artificial eyes with different fixed pupil sizes were used to systematically measure changes in pupil size as a function of gaze position with a desktop EyeLink 1000 tracker. A grid-based map of pupil measurements was recorded with each artificial eye across three experimental layouts of the eye-tracking camera and display. Large, systematic deviations in pupil size were observed across all nine maps. The measured PFE was corrected by a geometric model that expressed the foreshortening of the pupil area as a function of the cosine of the angle between the eye-to-camera axis and the eye-to-stimulus axis. The model reduced the root mean squared error of pupil measurements by 82.5 % when the model parameters were pre-set to the physical layout dimensions, and by 97.5 % when they were optimized to fit the empirical error surface.
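The correction described above divides the measured pupil area by the cosine of the angle between the eye-to-camera axis and the eye-to-stimulus axis. A geometric sketch of that step, with hypothetical coordinates (the paper's full model also involves parameters for the physical camera/display layout, optionally optimized to the error surface):

```python
import math

def unit(v):
    """Normalize a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def pfe_corrected_area(measured_area, eye, camera, stimulus):
    """Undo pupil foreshortening by dividing the measured pupil area by
    the cosine of the angle between the eye-to-camera and eye-to-stimulus
    axes (the geometric relation described in the abstract)."""
    to_cam = unit(tuple(c - e for c, e in zip(camera, eye)))
    to_stim = unit(tuple(s - e for s, e in zip(stimulus, eye)))
    cos_theta = sum(a * b for a, b in zip(to_cam, to_stim))
    return measured_area / cos_theta

# Hypothetical layout (units arbitrary): camera mounted below the display.
eye = (0.0, 0.0, 0.0)
camera = (0.0, -0.2, 0.6)
gaze_up = (0.0, 0.3, 0.6)   # fixation near the top of the display

corrected = pfe_corrected_area(100.0, eye, camera, gaze_up)
looking_at_camera = pfe_corrected_area(100.0, eye, camera, camera)
```

When the eye looks straight at the camera the angle is zero and no correction is applied; as gaze rotates away, the measured area shrinks and the correction scales it back up.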

Journal ArticleDOI
TL;DR: It is concluded that large AOIs are a noise-robust solution in face stimuli and, when implemented using the Voronoi method, are the most objective of the researcher-defined AOIs.
Abstract: A problem in eyetracking research is choosing areas of interest (AOIs): Researchers in the same field often use widely varying AOIs for similar stimuli, making cross-study comparisons difficult or even impossible. Subjective choices while choosing AOIs cause differences in AOI shape, size, and location. On the other hand, not many guidelines for constructing AOIs, or comparisons between AOI-production methods, are available. In the present study, we addressed this gap by comparing AOI-production methods in face stimuli, using data collected with infants and adults (with autism spectrum disorder [ASD] and matched controls). Specifically, we report that the attention-attracting and attention-maintaining capacities of AOIs differ between AOI-production methods, and that this matters for statistical comparisons in one of three groups investigated (the ASD group). In addition, we investigated the relation between AOI size and an AOI’s attention-attracting and attention-maintaining capacities, as well as the consequences for statistical analyses, and report that adopting large AOIs solves the problem of statistical differences between the AOI methods. Finally, we tested AOI-production methods for their robustness to noise, and report that large AOIs—using the Voronoi tessellation method or the limited-radius Voronoi tessellation method with large radii—are most robust to noise. We conclude that large AOIs are a noise-robust solution in face stimuli and, when implemented using the Voronoi method, are the most objective of the researcher-defined AOIs. Adopting Voronoi AOIs in face-scanning research should allow better between-group and cross-study comparisons.

Journal ArticleDOI
TL;DR: The present experiment contradicts the still common preconception that reaction time effects of only a few hundred milliseconds cannot be detected in Web experiments, and disconfirms any substantial influence of increased technical or situational variation.
Abstract: Although Web-based research is now commonplace, it continues to spur skepticism from reviewers and editors, especially whenever reaction times are of primary interest. Such persistent preconceptions are based on arguments referring to increased variation, the limits of certain software and technologies, and a noteworthy lack of comparisons (between Web and lab) in fully randomized experiments. To provide a critical test, participants were randomly assigned to complete a lexical decision task either (a) in the lab using standard experimental software (E-Prime), (b) in the lab using a browser-based version (written in HTML and JavaScript), or (c) via the Web using the same browser-based version. The classical word frequency effect was typical in size and corresponded to a very large effect in all three conditions. There was no indication that the Web- or browser-based data collection was in any way inferior. In fact, if anything, a larger effect was obtained in the browser-based conditions than in the condition relying on standard experimental software. No differences between Web and lab (within the browser-based conditions) could be observed, thus disconfirming any substantial influence of increased technical or situational variation. In summary, the present experiment contradicts the still common preconception that reaction time effects of only a few hundred milliseconds cannot be detected in Web experiments.

Journal ArticleDOI
TL;DR: A suite of Bayes factor hypothesis tests that allow researchers to grade the decisiveness of the evidence that the data provide for the presence versus the absence of a correlation between two variables are presented.
Abstract: We present a suite of Bayes factor hypothesis tests that allow researchers to grade the decisiveness of the evidence that the data provide for the presence versus the absence of a correlation between two variables. For concreteness, we apply our methods to the recent work of Donnellan et al. (in press), who conducted nine replication studies with over 3,000 participants and failed to replicate the phenomenon that lonely people compensate for a lack of social warmth by taking warmer baths or showers. We show how the Bayes factor hypothesis test can quantify evidence in favor of the null hypothesis, and how the prior specification for the correlation coefficient can be used to define a broad range of tests that address complementary questions. Specifically, we show how the prior specification can be adjusted to create a two-sided test, a one-sided test, a sensitivity analysis, and a replication test.
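For intuition only, here is a crude grid approximation of a correlation Bayes factor, using the Fisher-z normal approximation to the likelihood and a uniform prior on the correlation. The paper's actual tests rest on specific prior choices and analytic results, so treat this as a sketch of the idea of quantifying evidence for the null, not their method.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bf01_correlation(r, n, grid=2000):
    """Crude BF01 for rho = 0 versus rho ~ Uniform(-1, 1), using the
    Fisher-z approximation atanh(r) ~ Normal(atanh(rho), 1/sqrt(n - 3))."""
    z, sd = math.atanh(r), 1.0 / math.sqrt(n - 3)
    like_h0 = normal_pdf(z, 0.0, sd)
    # Marginal likelihood under H1: midpoint rule over the prior on rho.
    step = 2.0 / grid
    like_h1 = sum(0.5 * normal_pdf(z, math.atanh(-1.0 + (i + 0.5) * step), sd)
                  for i in range(grid)) * step
    return like_h0 / like_h1

# A near-zero sample correlation from a moderately large sample yields
# BF01 > 1, i.e., evidence in favor of the null hypothesis.
bf01 = bf01_correlation(r=0.02, n=100)
```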

Journal ArticleDOI
TL;DR: A generalized item response tree model with a flexible parametric form, dimensionality, and choice of covariates for modeling item response processes with a tree structure is presented.
Abstract: Item response theory (IRT) models with a tree structure have recently been introduced for modeling item response processes that follow a tree structure. In this paper, we present a generalized item response tree model with a flexible parametric form, dimensionality, and choice of covariates. The utilities of the model are demonstrated with two applications in psychological assessment: investigating Likert scale item responses and modeling omitted item responses. The proposed model is estimated with the freely available R package flirt (Jeon et al., 2014b).

Journal ArticleDOI
TL;DR: Affective norms are introduced for a new set of Spanish words that were scored on two emotional dimensions (valence and arousal), on five discrete emotional categories, and on concreteness by 660 native Spanish speakers.
Abstract: In the present study, we introduce affective norms for a new set of Spanish words, the Madrid Affective Database for Spanish (MADS), that were scored on two emotional dimensions (valence and arousal) and on five discrete emotional categories (happiness, anger, sadness, fear, and disgust), as well as on concreteness, by 660 Spanish native speakers. Measures of several objective psycholinguistic variables—grammatical class, word frequency, number of letters, and number of syllables—for the words are also included. We observed high split-half reliabilities for every emotional variable and a strong quadratic relationship between valence and arousal. Additional analyses revealed several associations between the affective dimensions and discrete emotions, as well as with some psycholinguistic variables. This new corpus complements and extends prior databases in Spanish and allows for designing new experiments investigating the influence of affective content in language processing under both dimensional and discrete theoretical conceptions of emotion. These norms can be downloaded as supplemental materials for this article from www.dropbox.com/s/o6dpw3irk6utfhy/Hinojosa%20et%20al_Supplementary%20materials.xlsx?dl=0.

Journal ArticleDOI
TL;DR: The MR2 includes 74 extremely high resolution images of European, African, and East Asian faces, providing a high-quality, diverse, naturalistic, and well-controlled facial image set for use in research.
Abstract: Faces impart exhaustive information about their bearers, and are widely used as stimuli in psychological research. Yet many extant facial stimulus sets have substantially less detail than faces encountered in real life. In this paper, we describe a new database of facial stimuli, the Multi-Racial Mega-Resolution database (MR2). The MR2 includes 74 extremely high resolution images of European, African, and East Asian faces. This database provides a high-quality, diverse, naturalistic, and well-controlled facial image set for use in research. The MR2 is available under a Creative Commons license, and may be accessed online.

Journal ArticleDOI
TL;DR: Strong support is provided for the reliability of LENA in French, although two age groups (7–12 months and 13–18 months) had a significant effect on the AWC data and the second day of recording had a significant effect on the CVC data.
Abstract: In this study, we examined the accuracy of the Language ENvironment Analysis (LENA) system in European French. LENA is a digital recording device with software that facilitates the collection and analysis of audio recordings from young children, providing automated measures of the speech overheard and produced by the child. Eighteen native French-speaking children, divided into six age groups ranging from 3 to 48 months old, were recorded about 10–16 h per day, three days a week. A total of 324 samples (six 10-min chunks of recordings) were selected and then transcribed according to the CHAT format. Simple and mixed linear models between the LENA and human adult word count (AWC) and child vocalization count (CVC) estimates were performed, to determine to what extent the automatic and the human methods agreed. Both the AWC and CVC estimates were very reliable (r = .64 and .71, respectively) for the 324 samples. When controlling for the random factors of participants and recordings, 1 h was sufficient to obtain a reliable sample. It was, however, found that two age groups (7–12 months and 13–18 months) had a significant effect on the AWC data and that the second day of recording had a significant effect on the CVC data. When noise-related factors were added to the model, only a significant effect of signal-to-noise ratio was found on the AWC data. All of these findings and their clinical implications are discussed, providing strong support for the reliability of LENA in French.

Journal ArticleDOI
TL;DR: The short version of the Geneva Emotion Recognition Test (the GERT-S) is introduced, and two studies examining the internal consistency, factor structure, and convergent and discriminant validity of the test show that it is a unidimensional test with good internal consistency.
Abstract: The ability to accurately interpret others’ emotional expressions in the face, voice, and body is a crucial component of successful social functioning and has been shown to predict better outcomes in private and professional life. To date, emotion recognition ability (ERA) has mostly been measured with tests that heavily rely on static pictures of the face and on few emotions, restricting their content validity. Recently, Schlegel, Grandjean, and Scherer (Psychological Assessment, 26, 666–672, 2014) published a new test that measures ERA in a more comprehensive fashion, by (1) including a wide range of 14 positive and negative emotions and (2) using video clips with sound that simultaneously present facial, vocal, and bodily emotional cues. This article introduces the short version of the Geneva Emotion Recognition Test (the GERT-S), and presents two studies (total N = 425) that examine the internal consistency, factor structure, and convergent and discriminant validity of the test. The results show that the GERT-S is a unidimensional test with good internal consistency. Furthermore, the GERT-S was substantially positively correlated with other ERA tests, with tests of emotional understanding and emotion management, and with cognitive ability. Taken together, the present studies demonstrate the usefulness of the GERT-S as an instrument for the brief and reliable assessment of ERA. It is available, free of charge and in seven different languages, for academic research use. Given the brief test-taking time (approx. 10 min) and its possible administration via different online platforms, the GERT-S can easily be integrated by researchers into their own studies.

Journal ArticleDOI
TL;DR: Several analysis modules included in ANSLAB are reviewed, and it is described how they address some of the current needs and methodological challenges of psychophysiological science.
Abstract: Psychophysiological science employs a large variety of signals from the human body that index the activity of the peripheral nervous system. This allows for studying interactions of psychological and physiological processes that are relevant for understanding cognition, emotion, and psychopathology. The multidimensional nature of the data and the interactions between different physiological signals represent a methodological and computational challenge. Analysis software in this domain is often limited in its coverage of the signals from different physiological systems, and therefore only partially addresses these challenges. ANSLAB (short for Autonomic Nervous System Laboratory) is an integrated software suite that supports data visualization, artifact detection, data reduction, automated processing, and statistical analysis for a large range of autonomic, respiratory, and muscular measures. Analysis modules for cardiovascular (e.g., electrocardiogram, heart rate variability, blood pressure wave, pulse wave, and impedance cardiography), electrodermal (skin conductance level and responses), respiratory (respiratory pattern, timing, and volume variables, as well as capnography), and muscular (eye-blink startle, facial and bodily electromyography) systems are complemented by specialized modules (e.g., body temperature and accelerometry, cross-spectral analysis of respiratory and cardiac measures, signal averaging, and statistical analysis) and productivity-enhancing features (batched processing, fully automatized analyses, and data management). ANSLAB also facilitates the analysis of long-term recordings from ambulatory assessment studies. The present article reviews several analysis modules included in ANSLAB and describes how these address some of the current needs and methodological challenges of psychophysiological science.

Journal ArticleDOI
TL;DR: A database that provides subjective ratings for 1,400 Spanish words for valence, arousal, concreteness, imageability, context availability, and familiarity is described, suitable for experimental research into the effects of both affective properties and lexico-semantic variables on word processing and memory.
Abstract: Studies of semantic variables (e.g., concreteness) and affective variables (i.e., valence and arousal) have traditionally tended to run in different directions. However, in recent years there has been growing interest in studying the relationship, as well as the potential overlaps, between the two. This article describes a database that provides subjective ratings for 1,400 Spanish words for valence, arousal, concreteness, imageability, context availability, and familiarity. Data were collected online through a process involving 826 university students. The results showed a high interrater reliability for all of the variables examined, as well as high correlations between our affective and semantic values and norms currently available in other Spanish databases. Regarding the affective variables, the typical quadratic correlation between valence and arousal ratings was obtained. Likewise, significant correlations were found between the lexico-semantic variables. Importantly, we obtained moderate negative correlations between emotionality and both concreteness and imageability. This is in line with the claim that abstract words have more affective associations than concrete ones (Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011). The present Spanish database is suitable for experimental research into the effects of both affective properties and lexico-semantic variables on word processing and memory.

Journal ArticleDOI
TL;DR: The validation results, as well as participant ratings of the emotional valence, arousal and intensity of the visual stimuli from this emotion stimulus set, are presented.
Abstract: The EU-Emotion Stimulus Set is a newly developed collection of dynamic multimodal emotion and mental state representations. A total of 20 emotions and mental states are represented through facial expressions, vocal expressions, body gestures and contextual social scenes. This emotion set is portrayed by a multi-ethnic group of child and adult actors. Here we present the validation results, as well as participant ratings of the emotional valence, arousal and intensity of the visual stimuli from this emotion stimulus set. The EU-Emotion Stimulus Set is available for use by the scientific community and the validation data are provided as a supplement available for download.

Journal ArticleDOI
TL;DR: The Gardony Map Drawing Analyzer (GMDA), an open-source software package for sketch map analysis, is presented; the article describes the software and its operation and provides a formal specification of the calculation procedures for its unique measures.
Abstract: Sketch maps are effective tools for assessing spatial memory. However, despite their widespread use in cognitive science research, sketch map analysis techniques remain unstandardized and carry limitations. In the present article, we present the Gardony Map Drawing Analyzer (GMDA), an open-source software package for sketch map analysis. GMDA combines novel and established analysis techniques into a graphical user interface that permits rapid computational sketch map analysis. GMDA calculates GMDA-unique measures based on pairwise comparisons between landmarks, as well as bidimensional regression parameters (Friedman & Kohler, 2003), which together reflect sketch map quality at two levels: configural and individual landmark. The configural measures assess the overall landmark configuration and provide a whole-map analysis. Individual landmark measures, introduced in GMDA, assess individual landmark placement and indicate how individual landmarks contribute to the configural scores. Together, these measures provide a more complete psychometric picture of sketch map analysis, allowing for comparisons between sketch maps and between landmarks. The calculated measures reflect specific and cognitively relevant aspects of interlandmark spatial relationships, including distance and angular representation. GMDA supports complex environments (up to 48 landmarks) and two software modes that capture aspects of maps not addressed by existing techniques, such as landmark size and shape variation and interlandmark containment relationships. We describe the software and its operation and present a formal specification of calculation procedures for its unique measures. We then validate the software by demonstrating the capabilities and reliability of its measures using simulation and experimental data. The most recent version of GMDA is available at www.aarongardony.com/tools/map-drawing-analyzer.
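The bidimensional regression parameters GMDA computes (Friedman & Kohler, 2003) can be sketched in the Euclidean case by treating each (x, y) landmark as a complex number and fitting sketch = b0 + b1·actual by least squares; |b1| gives the scale, arg(b1) the rotation, and the bidimensional correlation r measures configural correspondence. A minimal numpy sketch, not GMDA's implementation (function name and toy layout are ours):

```python
import numpy as np

def bidimensional_regression(actual_xy, sketch_xy):
    """Euclidean bidimensional regression (after Friedman & Kohler, 2003).

    Fits sketch = b0 + b1 * actual in the complex plane and returns the
    bidimensional correlation r plus the implied scale and rotation (deg)."""
    a = np.asarray(actual_xy, float).view(complex).ravel()
    s = np.asarray(sketch_xy, float).view(complex).ravel()
    X = np.column_stack([np.ones_like(a), a])
    (b0, b1), *_ = np.linalg.lstsq(X, s, rcond=None)
    resid = s - (b0 + b1 * a)
    sse = np.sum(np.abs(resid) ** 2)
    sst = np.sum(np.abs(s - s.mean()) ** 2)
    r = np.sqrt(max(0.0, 1.0 - sse / sst))
    return r, abs(b1), np.degrees(np.angle(b1))

# A sketch map that is the true layout scaled by 2 and rotated 90 degrees
actual = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sketch = [(-2.0 * y, 2.0 * x) for x, y in actual]
print(bidimensional_regression(actual, sketch))  # perfect fit: r = 1
```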

Journal ArticleDOI
TL;DR: A large sample of real-world data was used to illustrate the base rate dependence of correlations when applied to dichotomous or ordinal data, leading to the recommendation that AUCs, Pearson/Thorndike adjusted correlations, Cohen’s d, or polychoric correlations be considered as alternative effect size statistics in many contexts.
Abstract: Correlations are the simplest and most commonly understood effect size statistic in psychology. The purpose of the current paper was to use a large sample of real-world data (109 correlations with 60,415 participants) to illustrate the base rate dependence of correlations when applied to dichotomous or ordinal data. Specifically, we examined the influence of the base rate on different effect size metrics. Correlations decreased when the dichotomous variable did not have a 50 % base rate. The higher the deviation from a 50 % base rate, the smaller the observed Pearson’s point-biserial and Kendall’s tau correlation coefficients. In contrast, the relationship between base rate deviations and the more commonly proposed alternatives (i.e., polychoric correlation coefficients, AUCs, Pearson/Thorndike adjusted correlations, and Cohen’s d) were less remarkable, with AUCs being most robust to attenuation due to base rates. In other words, the base rate makes a marked difference in the magnitude of the correlation. As such, when using dichotomous data, the correlation may be more sensitive to base rates than is optimal for the researcher’s goals. Given the magnitude of the association between the base rate and point-biserial correlations (r = −.81) and Kendall’s tau (r = −.80), we recommend that AUCs, Pearson/Thorndike adjusted correlations, Cohen’s d, or polychoric correlations should be considered as alternate effect size statistics in many contexts.
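The central claim, that the point-biserial correlation shrinks as the base rate departs from 50 % while the AUC stays roughly constant, is easy to verify by simulation. A minimal sketch with synthetic data (the sample size, seed, and true effect of d = 0.8 are illustrative choices, not values from the study):

```python
import numpy as np

def auc(x0, x1):
    """P(random group-1 score > random group-0 score), ties counted as 0.5."""
    gt = (x1[:, None] > x0[None, :]).mean()
    eq = (x1[:, None] == x0[None, :]).mean()
    return gt + 0.5 * eq

rng = np.random.default_rng(42)
n, d_true = 4000, 0.8  # fixed group separation (Cohen's d = 0.8)
results = {}
for base_rate in (0.50, 0.25, 0.10, 0.05):
    y = (rng.random(n) < base_rate).astype(int)  # dichotomous variable
    x = rng.normal(0.0, 1.0, n) + d_true * y     # continuous variable, group 1 shifted
    r_pb = np.corrcoef(y, x)[0, 1]               # point-biserial correlation
    results[base_rate] = (r_pb, auc(x[y == 0], x[y == 1]))
    print(f"base rate {base_rate:.2f}: r_pb = {results[base_rate][0]:.3f}, "
          f"AUC = {results[base_rate][1]:.3f}")
```

Despite the identical underlying separation, r_pb is markedly smaller at extreme base rates, whereas the AUC hovers near the same value throughout.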

Journal ArticleDOI
TL;DR: Arousal correlated positively with figurativeness (supporting the idea that figurative expressions are more emotionally engaging than literal expressions) and with concreteness and semantic transparency, suggesting that idioms may convey a more direct reference to sensory representations, mediated by the meanings of their constituting words.
Abstract: Despite flourishing research on the relationship between emotion and literal language, and despite the pervasiveness of figurative expressions in communication, the role of figurative language in conveying affect has been underinvestigated. This study provides affective and psycholinguistic norms for 619 German idiomatic expressions and explores the relationships between affective and psycholinguistic idiom properties. German native speakers rated each idiom for emotional valence, arousal, familiarity, semantic transparency, figurativeness, and concreteness. They also described the figurative meaning of each idiom and rated how confident they were about the attributed meaning. The results showed that idioms rated high in valence were also rated high in arousal. Negative idioms were rated as more arousing than positive ones, in line with results from single words. Furthermore, arousal correlated positively with figurativeness (supporting the idea that figurative expressions are more emotionally engaging than literal expressions) and with concreteness and semantic transparency. This suggests that idioms may convey a more direct reference to sensory representations, mediated by the meanings of their constituting words. Arousal correlated positively with familiarity. In addition, positive idioms were rated as more familiar than negative idioms. Finally, idioms without a literal counterpart were rated as more emotionally valenced and arousing than idioms with a literal counterpart. Although the meanings of ambiguous idioms were less correctly defined than those of unambiguous idioms, ambiguous idioms were rated as more concrete than unambiguous ones. We also discuss the relationships between the various psycholinguistic variables characterizing idioms, with reference to the literature on idiom structure and processing.

Journal ArticleDOI
TL;DR: The basic-emotion normative ratings for the NAPS BE system are introduced, which will allow researchers to control and manipulate stimulus properties specifically for their experimental questions of interest.
Abstract: The Nencki Affective Picture System (NAPS; Marchewka, Żurawski, Jednorog, & Grabowska, Behavior Research Methods, 2014) is a standardized set of 1,356 realistic, high-quality photographs divided into five categories (people, faces, animals, objects, and landscapes). NAPS has been primarily standardized along the affective dimensions of valence, arousal, and approach-avoidance, yet the characteristics of discrete emotions expressed by the images have not been investigated thus far. The aim of the present study was to collect normative ratings according to categorical models of emotions. A subset of 510 images from the original NAPS set was selected in order to proportionally cover the whole dimensional affective space. Among these, using three available classification methods, we identified images eliciting distinguishable discrete emotions. We introduce the basic-emotion normative ratings for the Nencki Affective Picture System (NAPS BE), which will allow researchers to control and manipulate stimulus properties specifically for their experimental questions of interest. The NAPS BE system is freely accessible to the scientific community for noncommercial use as supplementary materials to this article.

Journal ArticleDOI
TL;DR: The present work provides researchers with a large database to aid in stimulus construction and selection; the findings reveal that the color RED was most commonly associated with negative emotion and emotion-laden words, whereas YELLOW and WHITE were associated with positive emotion and emotion-laden words, respectively.
Abstract: Color has the ability to influence a variety of human behaviors, such as object recognition, the identification of facial expressions, and the ability to categorize stimuli as positive or negative. Researchers have started to examine the relationship between emotional words and colors, and the findings have revealed that brightness is often associated with positive emotional words and darkness with negative emotional words (e.g., Meier, Robinson, & Clore, Psychological Science, 15, 82-87, 2004). In addition, words such as anger and failure seem to be inherently associated with the color red (e.g., Kuhbandner & Pekrun). The purpose of the present study was to construct norms for positive and negative emotion and emotion-laden words and their color associations. Participants were asked to provide the first color that came to mind for a set of 160 emotional items. The results revealed that the color RED was most commonly associated with negative emotion and emotion-laden words, whereas YELLOW and WHITE were associated with positive emotion and emotion-laden words, respectively. The present work provides researchers with a large database to aid in stimulus construction and selection.

Journal ArticleDOI
TL;DR: Custom plugins can integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses; the system facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers’ analysis workload.
Abstract: This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers’ analysis workload. The project website (http://timestudioproject.com) contains the latest releases of TimeStudio, together with documentation and user forums.

Journal ArticleDOI
TL;DR: Analytical derivations and numerical examinations are presented to assess the bias and mean square error of the alternative estimators; the results suggest that more advantageous indices can be recommended over ICC(2) for their theoretical implications and computational ease.
Abstract: The intraclass correlation coefficient (ICC)(2) index from a one-way random effects model is widely used to describe the reliability of mean ratings in behavioral, educational, and psychological research. Despite its apparent utility, the essential property of ICC(2) as a point estimator of the average score intraclass correlation coefficient is seldom mentioned. This article considers several potential measures and compares their performance with ICC(2). Analytical derivations and numerical examinations are presented to assess the bias and mean square error of the alternative estimators. The results suggest that more advantageous indices can be recommended over ICC(2) for their theoretical implication and computational ease.
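The estimator under discussion, the average-score intraclass correlation from a one-way random-effects model, is conventionally computed as (MSB − MSW)/MSB, where MSB and MSW are the between-target and within-target mean squares. A minimal sketch, assuming this Shrout-and-Fleiss-style ICC(1,k) form matches the article's ICC(2) (the function name and toy ratings are ours):

```python
import numpy as np

def icc2_mean_ratings(X):
    """Reliability of mean ratings from a one-way random-effects model.

    X: n_targets x k_raters matrix of scores.
    Returns the average-score estimator (MSB - MSW) / MSB."""
    X = np.asarray(X, float)
    n, k = X.shape
    row_means = X.mean(axis=1)
    msb = k * np.sum((row_means - X.mean()) ** 2) / (n - 1)        # between targets
    msw = np.sum((X - row_means[:, None]) ** 2) / (n * (k - 1))    # within targets
    return (msb - msw) / msb

# Toy example: 5 targets each rated by 3 raters
X = [[9, 8, 9],
     [6, 7, 6],
     [8, 8, 9],
     [4, 5, 4],
     [7, 6, 6]]
print(round(icc2_mean_ratings(X), 3))  # 0.964
```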

Journal ArticleDOI
TL;DR: The results showed that Aw and dr were generally robust to these violations, and Aw slightly outperformed dr. Implications for the use of Aw and dr in real-world research are discussed.
Abstract: In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs - the unscaled robust d (dr*; Hogarty & Kromrey, 2001), scaled robust d (dr; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (rpb; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and nonparametric estimator for CL (Aw; Ruscio, Psychological Methods, 13, 19-30, 2008) - may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that Aw and dr were generally robust to these violations, and Aw slightly outperformed dr. Implications for the use of Aw and dr in real-world research are discussed.
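Two of the compared effect sizes can be computed directly from raw scores: Cohen's d with a pooled standard deviation, and Aw, the nonparametric probability-of-superiority estimator of the common-language ES (the probability that a randomly drawn score from one group exceeds one from the other, with ties counted as half). A minimal sketch with made-up toy data, not values from the simulation study:

```python
import numpy as np

def cohens_d(x1, x2):
    """Standardized mean difference with pooled SD (Cohen, 1988)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    sp = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
                 / (n1 + n2 - 2))
    return (x1.mean() - x2.mean()) / sp

def a_w(x1, x2):
    """Nonparametric common-language ES (Ruscio, 2008):
    P(random group-1 score > random group-2 score), ties counted as 0.5."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    gt = (x1[:, None] > x2[None, :]).mean()
    eq = (x1[:, None] == x2[None, :]).mean()
    return gt + 0.5 * eq

g1 = [12, 14, 15, 15, 17, 18]
g2 = [10, 11, 12, 13, 13, 15]
print(f"d = {cohens_d(g1, g2):.2f}, Aw = {a_w(g1, g2):.2f}")
```

Unlike d, Aw depends only on the rank order of the scores, which is why it remains interpretable when normality or variance homogeneity is violated.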