
Showing papers in "Behavior Research Methods in 2019"


Journal ArticleDOI
TL;DR: The most notable addition has been the Builder interface, allowing users to create studies with minimal or no programming, while also allowing the insertion of Python code for maximal flexibility.
Abstract: PsychoPy is an application for the creation of experiments in behavioral science (psychology, neuroscience, linguistics, etc.) with precise spatial control and timing of stimuli. It now provides a choice of interface; users can write scripts in Python if they choose, while those who prefer to construct experiments graphically can use the new Builder interface. Here we describe the features that have been added over the last 10 years of its development. The most notable addition has been the Builder interface, allowing users to create studies with minimal or no programming, while also allowing the insertion of Python code for maximal flexibility. We also present some of the other new features, including further stimulus options, asynchronous time-stamped hardware polling, and better support for open science and reproducibility. Tens of thousands of users now launch PsychoPy every month, and more than 90 people have contributed to the code. We discuss the current state of the project, as well as plans for the future.

1,747 citations
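For readers unfamiliar with the scripting side, a minimal sketch of a PsychoPy trial in Python might look as follows; the window size, stimulus text, and timing are illustrative choices, not taken from the article.

```python
# Minimal PsychoPy sketch (assumes PsychoPy is installed); illustrative only,
# not the Builder-generated code the article describes.
from psychopy import visual, core, event

win = visual.Window(size=(800, 600), color='grey', units='pix')
stim = visual.TextStim(win, text='Press any key', height=32)

stim.draw()              # draw to the back buffer
win.flip()               # shown on the next screen refresh (precise timing)
keys = event.waitKeys()  # wait for a keypress; returns a list of key names

win.close()
core.quit()
```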


Journal ArticleDOI
TL;DR: The results showed that DWLS and ULS lead to smaller RMSEA and larger CFI and TLI values than does ML for all manipulated conditions, regardless of whether or not the indices are scaled.
Abstract: In structural equation modeling, application of the root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker–Lewis index (TLI) highly relies on the conventional cutoff values developed under normal-theory maximum likelihood (ML) with continuous data. For ordered categorical data, unweighted least squares (ULS) and diagonally weighted least squares (DWLS) based on polychoric correlation matrices have been recommended in previous studies. Although no clear suggestions exist regarding the application of these fit indices when analyzing ordered categorical variables, practitioners are still tempted to adopt the conventional cutoff rules. The purpose of our research was to answer the question: Given a population polychoric correlation matrix and a hypothesized model, if ML results in a specific RMSEA value (e.g., .08), what is the RMSEA value when ULS or DWLS is applied? CFI and TLI were investigated in the same fashion. Both simulated and empirical polychoric correlation matrices with various degrees of model misspecification were employed to address the above question. The results showed that DWLS and ULS lead to smaller RMSEA and larger CFI and TLI values than does ML for all manipulated conditions, regardless of whether or not the indices are scaled. Applying the conventional cutoffs to DWLS and ULS, therefore, has a pronounced tendency not to discover model–data misfit. Discussions regarding the use of RMSEA, CFI, and TLI for ordered categorical data are given.

475 citations
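The cutoff debate presupposes the standard index definitions, which help in reading the abstract. In the usual notation (T is the estimator-specific chi-square statistic, df its degrees of freedom, N the sample size, and subscripts M and B the hypothesized and baseline models):

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max(T_M - df_M,\, 0)}{df_M\,(N - 1)}}, \qquad
\mathrm{CFI} = 1 - \frac{\max(T_M - df_M,\, 0)}{\max(T_B - df_B,\, 0)}, \qquad
\mathrm{TLI} = \frac{T_B/df_B - T_M/df_M}{T_B/df_B - 1}
```

Because T depends on the estimator, the same model and data can yield different index values under ML, ULS, and DWLS, which is exactly the pattern the study quantifies.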


Journal ArticleDOI
TL;DR: It is concluded that online research panels offer a unique opportunity for research, yet one with some important trade-offs, as compared with traditional student subject pools.
Abstract: Amazon Mechanical Turk (MTurk) is widely used by behavioral scientists to recruit research participants. MTurk offers advantages over traditional student subject pools, but it also has important limitations. In particular, the MTurk population is small and potentially overused, and some groups of interest to behavioral scientists are underrepresented and difficult to recruit. Here we examined whether online research panels can avoid these limitations. Specifically, we compared sample composition, data quality (measured by effect sizes, internal reliability, and attention checks), and the non-naivete of participants recruited from MTurk and Prime Panels-an aggregate of online research panels. Prime Panels participants were more diverse in age, family composition, religiosity, education, and political attitudes. Prime Panels participants also reported less exposure to classic protocols and produced larger effect sizes, but only after screening out several participants who failed a screening task. We conclude that online research panels offer a unique opportunity for research, yet one with some important trade-offs.

306 citations


Journal ArticleDOI
TL;DR: This work describes the collection of word associations for over 12,000 cue words, currently the largest such English-language resource in the world, and shows that measures based on a mechanism of spreading activation derived from this new resource are highly predictive of direct judgments of similarity.
Abstract: Word associations have been used widely in psychology, but the validity of their application strongly depends on the number of cues included in the study and the extent to which they probe all associations known by an individual. In this work, we address both issues by introducing a new English word association dataset. We describe the collection of word associations for over 12,000 cue words, currently the largest such English-language resource in the world. Our procedure allowed subjects to provide multiple responses for each cue, which permits us to measure weak associations. We evaluate the utility of the dataset in several different contexts, including lexical decision and semantic categorization. We also show that measures based on a mechanism of spreading activation derived from this new resource are highly predictive of direct judgments of similarity. Finally, a comparison with existing English word association sets further highlights systematic improvements provided through these new norms.

192 citations
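To make the spreading-activation idea concrete, here is a toy Python sketch of one way similarity can be derived from association norms; the data and the one-step cosine measure are illustrative, not the authors' actual dataset or metric.

```python
# Toy sketch: similarity from word-association norms via one step of
# spreading activation. Counts below are invented.

# cue -> {response: count}, e.g. aggregated over multiple responses per cue
norms = {
    'dog': {'cat': 40, 'bone': 25, 'bark': 20, 'pet': 15},
    'cat': {'dog': 45, 'meow': 25, 'pet': 20, 'fur': 10},
}

def activation(cue):
    """Normalize response counts into activation probabilities."""
    total = sum(norms[cue].values())
    return {w: c / total for w, c in norms[cue].items()}

def similarity(cue1, cue2):
    """Cosine overlap of the activation each cue spreads to its associates."""
    a, b = activation(cue1), activation(cue2)
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (sum(v * v for v in a.values()) ** 0.5
           * sum(v * v for v in b.values()) ** 0.5)
    return num / den

print(similarity('dog', 'cat'))  # higher values = more shared associates
```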


Journal ArticleDOI
TL;DR: A short review of the literature on pupil size measurements is provided, the most important sources of noise are highlighted along with how they can be detected, and step-by-step guidelines are given to help those interested in pupil size preprocess their data correctly.
Abstract: Pupillometry has been one of the most widely used response systems in psychophysiology. Changes in pupil size can reflect diverse cognitive and emotional states, ranging from arousal, interest and effort to social decisions, but they are also widely used in clinical practice to assess patients’ brain functioning. As a result, research involving pupil size measurements has been reported in practically all psychology, psychiatry, and psychophysiological research journals, and now it has found its way into the primatology literature as well as into more practical applications, such as using pupil size as a measure of fatigue or a safety index during driving. The different systems used for recording pupil size are almost as variable as its applications, and all yield, as with many measurement techniques, a substantial amount of noise in addition to the real pupillometry data. Before analyzing pupil size, it is therefore of crucial importance first to detect this noise and deal with it appropriately, even prior to (if need be) resampling and baseline-correcting the data. In this article we first provide a short review of the literature on pupil size measurements, then we highlight the most important sources of noise and show how these can be detected. Finally, we provide step-by-step guidelines that will help those interested in pupil size to preprocess their data correctly. These guidelines are accompanied by an open source MATLAB script (available at https://github.com/ElioS-S/pupil-size). Given that pupil diameter is easily measured by standard eyetracking technologies and can provide fundamental insights into cognitive and emotional processes, it is hoped that this article will further motivate scholars from different disciplines to study pupil size.

191 citations
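As a rough illustration of the kind of preprocessing the guidelines walk through (the authors' own implementation is the MATLAB script linked above), a Python sketch on a hypothetical trace could flag blink samples, interpolate across them, and then baseline-correct:

```python
# Illustrative pupil preprocessing on a made-up trace: flag blink samples,
# interpolate over them, subtract a pre-stimulus baseline.
import numpy as np

def preprocess(pupil, baseline_window, invalid_value=0.0):
    pupil = pupil.astype(float)
    bad = pupil <= invalid_value            # blinks are often recorded as 0
    good_idx = np.flatnonzero(~bad)
    # linear interpolation across blink gaps
    pupil[bad] = np.interp(np.flatnonzero(bad), good_idx, pupil[good_idx])
    # subtractive baseline correction against a pre-stimulus window
    baseline = pupil[baseline_window].mean()
    return pupil - baseline

trace = np.array([3.1, 3.2, 0.0, 0.0, 3.4, 3.6, 3.8])  # pupil size, mm
print(preprocess(trace, baseline_window=slice(0, 2)))
```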


Journal ArticleDOI
TL;DR: Analysis of simulated data generated with the standard diffusion model shows that IES, RCS, and LISAS put unequal weights on speed and accuracy, depending on the accuracy level, and that these measures are actually very sensitive to speed–accuracy trade-offs.
Abstract: In psychological experiments, participants are typically instructed to respond as fast as possible without sacrificing accuracy. How they interpret this instruction and, consequently, which speed-accuracy trade-off they choose might vary between experiments, between participants, and between conditions. Consequently, experimental effects can appear unpredictably in either RTs or error rates (i.e., accuracy). Even more problematic, spurious effects might emerge that are actually due only to differential speed-accuracy trade-offs. An often-suggested solution is the inverse efficiency score (IES; Townsend & Ashby, 1983), which combines speed and accuracy into a single score. Alternatives are the rate-correct score (RCS; Woltz & Was, 2006) and the linear-integrated speed-accuracy score (LISAS; Vandierendonck, 2017, 2018). We report analyses on simulated data generated with the standard diffusion model (Ratcliff, 1978) showing that IES, RCS, and LISAS put unequal weights on speed and accuracy, depending on the accuracy level, and that these measures are actually very sensitive to speed-accuracy trade-offs. These findings stand in contrast to a fourth alternative, the balanced integration score (BIS; Liesefeld, Fu, & Zimmer, 2015), which was devised to integrate speed and accuracy with equal weights. Although all of the measures maintain "real" effects, only BIS is relatively insensitive to speed-accuracy trade-offs.

140 citations
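For orientation, the four composite scores can be sketched in a few lines of Python; the conventions below are simplified (e.g., LISAS is computed here from per-trial SDs), and BIS additionally requires standardizing across the whole sample, so only its formula is indicated.

```python
# Illustrative formulas for the composite scores compared in the article,
# computed per participant/condition on toy data with simplified conventions.
import numpy as np

rt = np.array([0.52, 0.61, 0.58, 0.70, 0.55])   # seconds
correct = np.array([1, 1, 0, 1, 1])             # 1 = correct

pc = correct.mean()                    # proportion correct
pe = 1.0 - pc                          # proportion of errors
mrt = rt[correct == 1].mean()          # mean RT of correct trials

ies = mrt / pc                         # inverse efficiency score
rcs = correct.sum() / rt.sum()         # rate-correct score (correct per s)
lisas = mrt + (rt.std(ddof=1) / np.std(correct, ddof=1)) * pe

# BIS standardizes mean RT and PC across the whole sample before
# subtracting, e.g. bis = zscore(pc_all) - zscore(mrt_all);
# it therefore needs more than one participant.
print(ies, rcs, lisas)
```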


Journal ArticleDOI
TL;DR: The article shows how to probe interactions in a two-instance repeated measures design using both the pick-a-point approach and the Johnson–Neyman procedure, and describes some alternative methods of analysis, including structural equation models and multilevel models.
Abstract: Moderation hypotheses appear in every area of psychological science, but the methods for testing and probing moderation in two-instance repeated measures designs are incomplete. This article begins with a short overview of testing and probing interactions in between-participant designs. Next I review the methods outlined in Judd, McClelland, and Smith (Psychological Methods, 1, 366–378, 1996) and Judd, Kenny, and McClelland (Psychological Methods, 6, 115–134, 2001) for estimating and conducting inference on an interaction between a repeated measures factor and a single between-participant moderator using linear regression. I extend these methods in two ways: First, the article shows how to probe interactions in a two-instance repeated measures design using both the pick-a-point approach and the Johnson-Neyman procedure. Second, I extend the models described by Judd et al. (1996) to multiple-moderator models, including additive and multiplicative moderation. Worked examples with a published dataset are included, to demonstrate the methods described throughout the article. Additionally, I demonstrate how to use Mplus and MEMORE (Mediation and Moderation for Repeated Measures; available at http://akmontoya.com ), an easy-to-use tool available for SPSS and SAS, to estimate and probe interactions when the focal predictor is a within-participant factor, reducing the computational burden for researchers. I describe some alternative methods of analysis, including structural equation models and multilevel models. The conclusion touches on some extensions of the methods described in the article and potentially fruitful areas of further research.

130 citations
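A bare-bones Python sketch of the difference-score regression with a pick-a-point probe may clarify the mechanics; the data are simulated, and MEMORE or Mplus would handle the details (including the Johnson-Neyman solution) in a real analysis.

```python
# Sketch of the Judd/McClelland/Smith difference-score approach with a
# pick-a-point probe; hypothetical simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 80
w = rng.normal(size=n)                      # between-participant moderator
y1 = rng.normal(size=n)                     # instance 1
y2 = y1 + 0.5 + 0.4 * w + rng.normal(scale=0.5, size=n)  # instance 2

d = y2 - y1                                 # within-participant difference
X = sm.add_constant(w)
fit = sm.OLS(d, X).fit()   # intercept = effect at w = 0, slope = interaction
b, V = fit.params, fit.cov_params()

for point in (-1.0, 0.0, 1.0):              # pick-a-point probing
    eff = b[0] + b[1] * point
    se = np.sqrt(V[0, 0] + point**2 * V[1, 1] + 2 * point * V[0, 1])
    print(f"effect of instance at W={point:+.1f}: {eff:.3f} (SE {se:.3f})")
```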


Journal ArticleDOI
TL;DR: A novel method for constructing IATs using online survey software (Qualtrics) is introduced and its validity is empirically assessed; survey-software IATs appear to be reliable and valid, offer numerous advantages, and make IATs accessible for researchers who use survey software to conduct online research.
Abstract: The implicit association test (IAT) is widely used in psychology. Unfortunately, the IAT cannot be run within online surveys, requiring researchers who conduct online surveys to rely on third-party tools. We introduce a novel method for constructing IATs using online survey software (Qualtrics); we then empirically assess its validity. Study 1 (student n = 239) revealed good psychometric properties, expected IAT effects, and expected correlations with explicit measures for survey-software IATs. Study 2 (MTurk n = 818) showed predicted IAT effects across four survey-software IATs (ds = 0.82 [Black-White IAT] to 2.13 [insect-flower IAT]). Study 3 (MTurk n = 270) compared survey-software IATs and IATs run via Inquisit, yielding nearly identical results and intercorrelations that would be expected for identical IATs. Survey-software IATs appear to be reliable and valid, offer numerous advantages, and make IATs accessible for researchers who use survey software to conduct online research. We present all the materials, links to tutorials, and an open-source tool that rapidly automates survey-software IAT construction and analysis.

117 citations
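The IAT effects reported as ds are conventionally D scores; a simplified sketch of that computation (omitting the error penalties and trial trimming of the full scoring algorithm) looks like this:

```python
# Simplified IAT D score on toy latencies (ms); the full Greenwald et al.
# algorithm adds error penalties and latency trimming not shown here.
import numpy as np

compatible = np.array([610, 580, 650, 700, 640], dtype=float)
incompatible = np.array([820, 760, 900, 870, 810], dtype=float)

pooled_sd = np.concatenate([compatible, incompatible]).std(ddof=1)
d_score = (incompatible.mean() - compatible.mean()) / pooled_sd
print(round(d_score, 2))   # positive = faster in the compatible pairing
```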


Journal ArticleDOI
TL;DR: This tutorial paper provides an introduction to BFDA and analyzes how the use of informed prior distributions affects the results of the BFDA; it also presents a user-friendly web-based BFDA application that allows researchers to conduct BFDAs with ease.
Abstract: Well-designed experiments are likely to yield compelling evidence with efficient sample sizes. Bayes Factor Design Analysis (BFDA) is a recently developed methodology that allows researchers to balance the informativeness and efficiency of their experiment (Schönbrodt & Wagenmakers, Psychonomic Bulletin & Review, 25(1), 128–142, 2018). With BFDA, researchers can control the rate of misleading evidence but, in addition, they can plan for a target strength of evidence. BFDA can be applied to fixed-N and sequential designs. In this tutorial paper, we provide an introduction to BFDA and analyze how the use of informed prior distributions affects the results of the BFDA. We also present a user-friendly web-based BFDA application that allows researchers to conduct BFDAs with ease. Two practical examples highlight how researchers can use a BFDA to plan for informative and efficient research designs.

112 citations
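The simulation logic of a fixed-N BFDA can be sketched in Python; note that this uses a default JZS prior via pingouin rather than the informed priors the tutorial emphasizes, and all numbers are illustrative.

```python
# Monte Carlo sketch of a fixed-N BFDA: simulate many experiments under H1,
# compute a default-prior t-test Bayes factor for each, and tally how often
# the evidence reaches a target threshold.
import numpy as np
from scipy import stats
import pingouin as pg

rng = np.random.default_rng(7)
n, effect, n_sims, threshold = 50, 0.4, 2000, 10.0

bfs = []
for _ in range(n_sims):
    x = rng.normal(loc=effect, scale=1.0, size=n)   # data under H1
    t, _ = stats.ttest_1samp(x, 0.0)
    bfs.append(float(pg.bayesfactor_ttest(t, n)))   # BF10, one-sample JZS

bfs = np.array(bfs)
print("P(BF10 > 10):", (bfs > threshold).mean())        # strong evidence
print("P(BF10 < 1/10):", (bfs < 1 / threshold).mean())  # misleading evidence
```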


Journal ArticleDOI
TL;DR: The Glasgow Norms provide a valuable resource for researchers investigating the role of word recognition in language comprehension; their validity was established via comparisons of the ratings to 18 different sets of current psycholinguistic norms.
Abstract: The Glasgow Norms are a set of normative ratings for 5,553 English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size, and gender association. The Glasgow Norms are unique in several respects. First, the corpus itself is relatively large, while simultaneously providing norms across a substantial number of lexical dimensions. Second, for any given subset of words, the same participants provided ratings across all nine dimensions (33 participants/word, on average). Third, two novel dimensions—semantic size and gender association—are included. Finally, the corpus contains a set of 379 ambiguous words that are presented either alone (e.g., toast) or with information that selects an alternative sense (e.g., toast (bread), toast (speech)). The relationships between the dimensions of the Glasgow Norms were initially investigated by assessing their correlations. In addition, a principal component analysis revealed four main factors, accounting for 82% of the variance (Visualization, Emotion, Salience, and Exposure). The validity of the Glasgow Norms was established via comparisons of our ratings to 18 different sets of current psycholinguistic norms. The dimension of size was tested with megastudy data, confirming findings from past studies that have explicitly examined this variable. Alternative senses of ambiguous words (i.e., disambiguated forms), when discordant on a given dimension, seemingly led to appropriately distinct ratings. Informal comparisons between the ratings of ambiguous words and of their alternative senses showed different patterns that likely depended on several factors (the number of senses, their relative strengths, and the rating scales themselves). Overall, the Glasgow Norms provide a valuable resource—in particular, for researchers investigating the role of word recognition in language comprehension.

108 citations


Journal ArticleDOI
TL;DR: This work outlines how to fit evidence-accumulation models using the flexible, open-source, R-based Dynamic Models of Choice (DMC) software, and guides the reader through the practical details of a Bayesian hierarchical analysis.
Abstract: Parameter estimation in evidence-accumulation models of choice response times is demanding of both the data and the user. We outline how to fit evidence-accumulation models using the flexible, open-source, R-based Dynamic Models of Choice (DMC) software. DMC provides a hands-on introduction to the Bayesian implementation of two popular evidence-accumulation models: the diffusion decision model (DDM) and the linear ballistic accumulator (LBA). It enables individual and hierarchical estimation, as well as assessment of the quality of a model’s parameter estimates and descriptive accuracy. First, we introduce the basic concepts of Bayesian parameter estimation, guiding the reader through a simple DDM analysis. We then illustrate the challenges of fitting evidence-accumulation models using a set of LBA analyses. We emphasize best practices in modeling and discuss the importance of parameter- and model-recovery simulations, exploring the strengths and weaknesses of models in different experimental designs and parameter regions. We also demonstrate how DMC can be used to model complex cognitive processes, using as an example a race model of the stop-signal paradigm, which is used to measure inhibitory ability. We illustrate the flexibility of DMC by extending this model to account for mixtures of cognitive processes resulting from attention failures. We then guide the reader through the practical details of a Bayesian hierarchical analysis, from specifying priors to obtaining posterior distributions that encapsulate what has been learned from the data. Finally, we illustrate how the Bayesian approach leads to a quantitatively cumulative science, showing how to use posterior distributions to specify priors that can be used to inform the analysis of future experiments.
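DMC itself is R software; as a language-neutral illustration of the first model it covers, a minimal Euler-scheme simulation of the diffusion decision model (drift v, boundary a, relative start point z, non-decision time t0, and no across-trial variability parameters) can be written as:

```python
# Minimal Euler-Maruyama simulation of the diffusion decision model (DDM);
# a sketch of the model DMC estimates, not DMC's own API.
import numpy as np

def simulate_ddm(v=1.0, a=1.0, z=0.5, t0=0.3, dt=0.001, sigma=1.0,
                 n_trials=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    rts = np.empty(n_trials)
    responses = np.empty(n_trials, dtype=int)
    for i in range(n_trials):
        x, t = z * a, 0.0                       # start between 0 and a
        while 0.0 < x < a:
            x += v * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts[i] = t0 + t
        responses[i] = int(x >= a)              # 1 = upper boundary
    return rts, responses

rts, resp = simulate_ddm()
print(f"accuracy {resp.mean():.2f}, mean RT {rts.mean():.3f}s")
```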

Journal ArticleDOI
TL;DR: Word prevalence predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.
Abstract: We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people. Word prevalence data are useful for gauging the difficulty of words and, as such, for matching stimulus materials in experimental conditions or selecting stimulus materials for vocabulary tests. Word prevalence also predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.

Journal ArticleDOI
TL;DR: A novel pipeline and metric for event detection in eye-tracking recordings are introduced, enforcing stricter criteria on algorithmically produced events before they are considered potentially correct detections; results show that the deep approach outperforms all others, including the state-of-the-art multi-observer smooth pursuit detector.
Abstract: Deep learning approaches have achieved breakthrough performance in various domains. However, the segmentation of raw eye-movement data into discrete events is still done predominantly either by hand or by algorithms that use hand-picked parameters and thresholds. We propose and make publicly available a small 1D-CNN in conjunction with a bidirectional long short-term memory network that classifies gaze samples as fixations, saccades, smooth pursuit, or noise, simultaneously assigning labels in windows of up to 1 s. In addition to unprocessed gaze coordinates, our approach uses different combinations of the speed of gaze, its direction, and acceleration, all computed at different temporal scales, as input features. Its performance was evaluated on a large-scale hand-labeled ground truth data set (GazeCom) and against 12 reference algorithms. Furthermore, we introduced a novel pipeline and metric for event detection in eye-tracking recordings, which enforce stricter criteria on the algorithmically produced events in order to consider them as potentially correct detections. Results show that our deep approach outperforms all others, including the state-of-the-art multi-observer smooth pursuit detector. We additionally test our best model on an independent set of recordings, where our approach stays highly competitive compared to literature methods.
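As an architectural sketch only (layer sizes and feature counts here are guesses, not the paper's), the described combination of a small 1D-CNN and a bidirectional LSTM that labels every gaze sample could look like this in PyTorch:

```python
# Architectural sketch: small 1D-CNN feeding a bidirectional LSTM that
# assigns each gaze sample one of four labels (fixation, saccade, smooth
# pursuit, noise). All sizes are illustrative.
import torch
import torch.nn as nn

class GazeEventNet(nn.Module):
    def __init__(self, n_features=5, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(                 # local temporal features
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.blstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)   # per-sample class scores

    def forward(self, x):            # x: (batch, time, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.blstm(h)
        return self.head(h)          # (batch, time, n_classes) logits

logits = GazeEventNet()(torch.randn(2, 250, 5))    # 250 samples ~ 1 s window
print(logits.shape)                                # torch.Size([2, 250, 4])
```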

Journal ArticleDOI
TL;DR: A general method that allows experimenters to quantify the evidence from the data of a direct replication attempt given data already acquired from an original study is described.
Abstract: We describe a general method that allows experimenters to quantify the evidence from the data of a direct replication attempt given data already acquired from an original study. These so-called replication Bayes factors are a reconceptualization of the ones introduced by Verhagen and Wagenmakers (Journal of Experimental Psychology: General, 143(4), 1457–1475 2014) for the common t test. This reconceptualization is computationally simpler and generalizes easily to most common experimental designs for which Bayes factors are available.
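The idea can be illustrated under a simple normal model, where the original study's posterior serves as the prior for the replication and everything is available in closed form; this is a conceptual sketch, not the paper's general recipe for t tests.

```python
# Conceptual replication Bayes factor under a normal model: the posterior
# for the effect from the original study acts as the prior for the
# replication and is compared against H0: delta = 0. Numbers are invented.
import numpy as np
from scipy.stats import norm

mu_orig, sd_orig = 0.45, 0.15   # posterior for the effect (original study)
y_rep, se_rep = 0.30, 0.12      # replication estimate and standard error

# Under H1, the marginal likelihood integrates the normal likelihood over
# the original posterior (another normal); under H0 the effect is fixed at 0.
m1 = norm.pdf(y_rep, loc=mu_orig, scale=np.sqrt(se_rep**2 + sd_orig**2))
m0 = norm.pdf(y_rep, loc=0.0, scale=se_rep)
print("replication BF10:", m1 / m0)
```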

Journal ArticleDOI
TL;DR: gazeNet is presented, a new framework for creating event detectors that do not require hand-crafted signal features or signal thresholding; it employs an end-to-end deep learning approach, which takes raw eye-tracking data as input and classifies it into fixations, saccades, and post-saccadic oscillations.
Abstract: Existing event detection algorithms for eye-movement data almost exclusively rely on thresholding one or more hand-crafted signal features, each computed from the stream of raw gaze data. Moreover, this thresholding is largely left for the end user. Here we present and develop gazeNet, a new framework for creating event detectors that do not require hand-crafted signal features or signal thresholding. It employs an end-to-end deep learning approach, which takes raw eye-tracking data as input and classifies it into fixations, saccades and post-saccadic oscillations. Our method thereby challenges an established tacit assumption that hand-crafted features are necessary in the design of event detection algorithms. The downside of the deep learning approach is that a large amount of training data is required. We therefore first develop a method to augment hand-coded data, so that we can strongly enlarge the data set used for training, minimizing the time spent on manual coding. Using this extended hand-coded data, we train a neural network that produces eye-movement event classification from raw eye-movement data without requiring any predefined feature extraction or post-processing steps. The resulting classification performance is at the level of expert human coders. Moreover, an evaluation of gazeNet on two other datasets showed that gazeNet generalized to data from different eye trackers and consistently outperformed several other event detection algorithms that we tested.

Journal ArticleDOI
TL;DR: This study examined whether source overlap between the speaking samples found in the TOEFL-iBT integrated speaking tasks and the responses produced by test-takers was predictive of human ratings of speaking proficiency, and found that global semantic similarity as reported by word2vec was an important predictor of coherence ratings.
Abstract: This article introduces the second version of the Tool for the Automatic Analysis of Cohesion (TAACO 2.0). Like its predecessor, TAACO 2.0 is a freely available text analysis tool that works on the Windows, Mac, and Linux operating systems; is housed on a user's hard drive; is easy to use; and allows for batch processing of text files. TAACO 2.0 includes all the original indices reported for TAACO 1.0, but it adds a number of new indices related to local and global cohesion at the semantic level, reported by latent semantic analysis, latent Dirichlet allocation, and word2vec. The tool also includes a source overlap feature, which calculates lexical and semantic overlap between a source and a response text (i.e., cohesion between the two texts based on measures of text relatedness). In the first study in this article, we examined the effects that cohesion features, prompt, essay elaboration, and enhanced cohesion had on expert ratings of text coherence, finding that global semantic similarity as reported by word2vec was an important predictor of coherence ratings. A second study was conducted to examine the source and response indices. In this study we examined whether source overlap between the speaking samples found in the TOEFL-iBT integrated speaking tasks and the responses produced by test-takers was predictive of human ratings of speaking proficiency. The results indicated that the percentage of keywords found in both the source and response and the similarity between the source document and the response, as reported by word2vec, were significant predictors of speaking quality. Combined, these findings help validate the new indices reported for TAACO 2.0.
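A word2vec source-response overlap can be approximated by averaging word vectors and taking a cosine; the vector file path below is hypothetical, and TAACO 2.0 packages this kind of computation so users need not code it themselves.

```python
# Sketch of word2vec-based source-response overlap: average the word
# vectors of each text, then take the cosine similarity. Assumes a
# pretrained vector file at an illustrative (hypothetical) path.
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)

def doc_vector(text):
    vecs = [kv[w] for w in text.lower().split() if w in kv]
    return np.mean(vecs, axis=0)

def semantic_overlap(source, response):
    a, b = doc_vector(source), doc_vector(response)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(semantic_overlap("the lecture covered glacier melt",
                       "the talk was about melting ice"))
```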

Journal ArticleDOI
TL;DR: Phasic pupillary responses scaled according to a linear function across all lighting and task conditions, demonstrating that the TEPR is independent of its baseline amplitude; methodological implications are discussed, along with the need to reevaluate past pupillometry studies.
Abstract: The human task-evoked pupillary response provides a sensitive physiological index of the intensity and online resource demands of numerous cognitive processes (e.g., memory retrieval, problem solving, or target detection). Cognitive pupillometry is a well-established technique that relies upon precise measurement of these subtle response functions. Baseline variability of pupil diameter is a complex artifact that typically necessitates mathematical correction. A methodological paradox within pupillometry is that linear and nonlinear forms of baseline scaling both remain accepted baseline correction techniques, despite yielding highly disparate results. The task-evoked pupillary response (TEPR) could potentially scale nonlinearly, similar to autonomic functions such as heart rate, in which the amplitude of an evoked response diminishes as the baseline rises. Alternatively, the TEPR could scale similarly to the cortical hemodynamic response, as a linear function that is independent of its baseline. However, the TEPR cannot scale both linearly and nonlinearly. Our aim was to adjudicate between linear and nonlinear scaling of human TEPR. We manipulated baseline pupil size by modulating the illuminance in the testing room as participants heard abrupt pure-tone transitions (Exp. 1) or visually monitored word lists (Exp. 2). Phasic pupillary responses scaled according to a linear function across all lighting (dark, mid, bright) and task (tones, words) conditions, demonstrating that the TEPR is independent of its baseline amplitude. We discuss methodological implications and identify a need to reevaluate past pupillometry studies.
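The two correction schemes at issue are easy to state side by side. In the quick Python sketch below (with made-up values), subtractive correction embodies a linear, baseline-independent account like the one the data supported, while divisive (percent-change) correction is one common nonlinear alternative:

```python
# Subtractive vs. divisive baseline correction on a made-up pupil trace;
# a generic illustration of the two scaling schemes, not the authors' code.
import numpy as np

def subtractive(trace, baseline):   # linear: response minus baseline
    return trace - baseline

def divisive(trace, baseline):      # nonlinear: percent change from baseline
    return (trace - baseline) / baseline

trace = np.array([4.02, 4.10, 4.25, 4.31, 4.18])   # pupil diameter, mm
baseline = 4.00                                     # pre-stimulus mean
print(subtractive(trace, baseline))
print(divisive(trace, baseline) * 100)              # in percent
```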

Journal ArticleDOI
TL;DR: Childes-db is introduced, a database-formatted mirror of CHILDES that improves data accessibility and usability by offering novel interfaces, including browsable web applications and an R application programming interface (API).
Abstract: The Child Language Data Exchange System (CHILDES) has played a critical role in research on child language development, particularly in characterizing the early language learning environment. Access to these data can be both complex for novices and difficult to automate for advanced users, however. To address these issues, we introduce childes-db, a database-formatted mirror of CHILDES that improves data accessibility and usability by offering novel interfaces, including browsable web applications and an R application programming interface (API). Along with versioned infrastructure that facilitates reproducibility of past analyses, these interfaces lower barriers to analyzing naturalistic parent-child language, allowing for a wider range of researchers in language and cognitive development to easily leverage CHILDES in their work.

Journal ArticleDOI
TL;DR: It is found that the synchrony measures only partially correlated with each other and only some synchrony scores were able to predict improvement at the end of therapy, suggesting that the considered TSAMs do not measure the same synchrony construct, but different facets of synchrony.
Abstract: Nonverbal synchrony describes coordination of the nonverbal behavior of two interacting partners. Additionally, it seems to be important in human interactions, such as during psychotherapy. Currently, there are several options for the automated determination of synchrony based on linear time series analysis methods (TSAMs). However, investigations into whether the different methods measure the same construct have been missing. In this study, N = 84 patient-therapist dyads were videotaped during psychotherapy sessions. Motion energy analysis was used to assess body movements. We applied seven different TSAMs and recorded multiple output scores (average synchrony, maximum synchrony, and frequency of synchrony; in total, N = 16 scores). Convergent validity was examined using correlations of the output scores and exploratory factor analysis. Additionally, two criterion-based validations were conducted: investigations of concordant validity with a more generalized nonlinear method, and of the predictive validity of the synchrony scores for improvement in interpersonal problems at the end of therapy. We found that the synchrony measures only partially correlated with each other. The factor analysis did not support a common-factor model. A three-factor model with a second-order synchrony variable showed the best fit for eight of the selected synchrony scores. Only some synchrony scores were able to predict improvement at the end of therapy. We concluded that the considered TSAMs do not measure the same synchrony construct, but different facets of synchrony: the strength of synchrony of the total interaction, the strength of synchrony during synchronization intervals, and the frequency of synchrony.
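One common linear TSAM family is windowed cross-lagged correlation of the two motion energy series; the toy Python sketch below conveys the idea, though window, lag, and aggregation choices are arbitrary here, and real implementations differ in exactly these respects (which is part of why their outputs diverge).

```python
# Toy windowed cross-lagged correlation of two motion energy time series:
# per window, take the best |correlation| across lags, then average.
import numpy as np

def windowed_sync(a, b, win=125, max_lag=25, step=62):
    scores = []
    for start in range(max_lag, len(a) - win - max_lag, step):
        x = a[start:start + win]
        best = max(
            abs(np.corrcoef(x, b[start + lag:start + lag + win])[0, 1])
            for lag in range(-max_lag, max_lag + 1)
        )
        scores.append(best)
    return float(np.mean(scores))

rng = np.random.default_rng(3)
patient = rng.normal(size=1000)
therapist = np.roll(patient, 10) + rng.normal(scale=0.8, size=1000)
print(windowed_sync(patient, therapist))   # higher = more synchrony
```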

Journal ArticleDOI
TL;DR: The Massive Auditory Lexical Decision database is an end-to-end, freely available auditory and production data set for speech and psycholinguistic research, providing time-aligned stimulus recordings and response data for 227,179 auditory lexical decisions from 231 unique monolingual English listeners.
Abstract: The Massive Auditory Lexical Decision (MALD) database is an end-to-end, freely available auditory and production data set for speech and psycholinguistic research, providing time-aligned stimulus recordings for 26,793 words and 9592 pseudowords, and response data for 227,179 auditory lexical decisions from 231 unique monolingual English listeners. In addition to the experimental data, we provide many precompiled listener- and item-level descriptor variables. This data set makes it easy to explore responses, build and test theories, and compare a wide range of models. We present summary statistics and analyses.

Journal ArticleDOI
Andrey Anikin
TL;DR: Soundgen is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app; it may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
Abstract: Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen ( https://CRAN.R-project.org/package=soundgen ) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.

Journal ArticleDOI
TL;DR: Words derived from the nonclinical subreddits predicted future postings to clinical subreddits, implying that everyday language contains signal about the likelihood of future mental illness, possibly before people are aware of their mental health condition.
Abstract: In the present research, we investigated whether people’s everyday language contains sufficient signal to predict the future occurrence of mental illness. Language samples were collected from the social media website Reddit, drawing on posts to discussion groups focusing on different kinds of mental illness (clinical subreddits), as well as on posts to discussion groups focusing on nonmental health topics (nonclinical subreddits). As expected, words drawn from the clinical subreddits could be used to distinguish several kinds of mental illness (ADHD, anxiety, bipolar disorder, and depression). Interestingly, words drawn from the nonclinical subreddits (e.g., travel, cooking, cars) could also be used to distinguish different categories of mental illness, implying that the impact of mental illness spills over into topics unrelated to mental illness. Most importantly, words derived from the nonclinical subreddits predicted future postings to clinical subreddits, implying that everyday language contains signal about the likelihood of future mental illness, possibly before people are aware of their mental health condition. Finally, whereas models trained on clinical subreddits learned to focus on words indicating disorder-specific symptoms, models trained to predict future mental illness learned to focus on words indicating life stress, suggesting that the kinds of features that are predictive of mental illness may change over time. Implications for the underlying causes of mental illness are discussed.
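The modeling setup can be miniaturized as a bag-of-words classifier over a user's nonclinical posts; the toy data and pipeline below are purely illustrative of the approach, not the study's actual features or models.

```python
# Toy version of the prediction setup: bag-of-words features from
# nonclinical posts predicting a later clinical-subreddit posting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "planning a road trip through the mountains next month",
    "can't sleep again, work deadlines piling up and no energy",
    "tried a new pasta recipe, came out great",
    "everything feels overwhelming lately and I keep canceling plans",
]
later_clinical = [0, 1, 0, 1]   # 1 = later posted to a clinical subreddit

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, later_clinical)
print(model.predict_proba(["stressed and exhausted all the time"])[:, 1])
```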

Journal ArticleDOI
TL;DR: A group communication analysis (GCA) is developed by combining automated computational linguistic techniques with analyses of the sequential interactions of online group communication; the results indicated that participants’ patterns of linguistic coordination and cohesion are representative of the roles that individuals play in collaborative discussions.
Abstract: Roles are one of the most important concepts in understanding human sociocognitive behavior. During group interactions, members take on different roles within the discussion. Roles have distinct patterns of behavioral engagement (i.e., active or passive, leading or following), contribution characteristics (i.e., providing new information or echoing given material), and social orientation (i.e., individual or group). Different combinations of roles can produce characteristically different group outcomes, and thus can be either less or more productive with regard to collective goals. In online collaborative-learning environments, this can lead to better or worse learning outcomes for the individual participants. In this study, we propose and validate a novel approach for detecting emergent roles from participants' contributions and patterns of interaction. Specifically, we developed a group communication analysis (GCA) by combining automated computational linguistic techniques with analyses of the sequential interactions of online group communication. GCA was applied to three large collaborative interaction datasets (participant N = 2,429, group N = 3,598). Cluster analyses and linear mixed-effects modeling were used to assess the validity of the GCA approach and the influence of learner roles on student and group performance. The results indicated that participants' patterns of linguistic coordination and cohesion are representative of the roles that individuals play in collaborative discussions. More broadly, GCA provides a framework for researchers to explore the micro intra- and interpersonal patterns associated with participants' roles and the sociocognitive processes related to successful collaboration.

Journal ArticleDOI
TL;DR: Three of the seven indices in this study appear to be the best estimators for detecting nonhuman response sets and every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
Abstract: With the development of online data collection and instruments such as Amazon’s Mechanical Turk (MTurk), the appearance of malicious software that generates responses to surveys in order to earn money represents a major issue, for both economic and scientific reasons. Indeed, even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has been undertaken to test the extent to which they actually detect nonhuman response sets. Thus, we proposed to conduct an empirical comparison of these indices. Assuming that most botnet programs are based on random uniform distributions of responses, we present and compare seven indices in this study to detect nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (i.e., from 5% to 50%) of simulated random response sets. Three of the seven indices (i.e., response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of those indices—Mahalanobis distance and person–total correlation—are calculated easily, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
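The two easily calculated indices the authors highlight can be sketched directly on a respondent-by-item matrix; the simulated data below mix scale-consistent "humans" with uniform-random "bots", and cutoff choices are left to the researcher.

```python
# Sketch of two of the recommended screens: Mahalanobis distance and
# person-total correlation, on simulated human + random-uniform responders.
import numpy as np

def mahalanobis_distances(X):
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    diffs = X - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs))

def person_total_correlations(X):
    """Correlation of each respondent's answers with the item means."""
    item_means = X.mean(axis=0)
    return np.array([np.corrcoef(row, item_means)[0, 1] for row in X])

rng = np.random.default_rng(0)
humans = rng.normal(loc=[4, 4, 2, 5, 3, 4], scale=0.8, size=(50, 6))
bots = rng.uniform(1, 7, size=(5, 6))          # random uniform responding
X = np.vstack([humans, bots])

print(mahalanobis_distances(X)[-5:])           # bots: larger distances
print(person_total_correlations(X)[-5:])       # bots: near-zero correlations
```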

Journal ArticleDOI
TL;DR: This study expands three previous databases of concepts to over 4,000 words, including nouns, verbs, adjectives, and other parts of speech, and examines the relation of semantic similarity statistics to semantic priming in tandem with other psycholinguistic variables.
Abstract: A limiting factor in understanding memory and language is often the availability of large numbers of stimuli to use and explore in experimental studies. In this study, we expand on three previous databases of concepts to over 4000 words including nouns, verbs, adjectives, and other parts of speech. Participants in the study were asked to provide lists of features for each concept presented (a semantic feature production task), which were combined with previous research in this area. These feature lists for each concept were then coded into their root word form and affixes (i.e., cat and s for cats) to explore the impact of word form on semantic similarity measures, which are often calculated by comparing concept feature lists (feature overlap). All concept features, coding, and calculated similarity information is provided in a searchable database for easy access and utilization for future researchers when designing experiments that use word stimuli. The final database of word pairs was combined with the Semantic Priming Project to examine the relation of semantic similarity statistics on semantic priming in tandem with other psycholinguistic variables.

Journal ArticleDOI
TL;DR: This work presents a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions.
Abstract: To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (non-negative matrix inter-joint factorization; topic alignment) and qualitative (thematic analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.
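A miniature version of the compress-then-read workflow, substituting plain scikit-learn NMF for the paper's non-negative matrix inter-joint factorization, might look like this (the tweets are invented):

```python
# Miniature compress-then-read workflow: factorize a tweet-term matrix with
# NMF, then inspect top terms per topic to pick subsets for close reading.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

tweets = [
    "heatwave again, climate change is here",
    "carbon tax debate in parliament today",
    "bushfire smoke over the city all week",
    "renewables investment beats coal on cost",
    "another record hot summer, scary stuff",
    "emissions policy stalls in the senate",
]

vec = TfidfVectorizer(stop_words='english')
X = vec.fit_transform(tweets)
nmf = NMF(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[-4:][::-1]]
    print(f"topic {k}:", ", ".join(top))
```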

Journal ArticleDOI
TL;DR: It is concluded that randomized AB phase designs are experimentally valid, but that the power of these designs is sufficient only for large treatment effects and large sample sizes.
Abstract: Single-case experimental designs (SCEDs) are increasingly used in fields such as clinical psychology and educational psychology for the evaluation of treatments and interventions in individual participants. The AB phase design, also known as the interrupted time series design, is one of the most basic SCEDs used in practice. Randomization can be included in this design by randomly determining the start point of the intervention. In this article, we first introduce this randomized AB phase design and review its advantages and disadvantages. Second, we present some data-analytical possibilities and pitfalls related to this design and show how the use of randomization tests can mitigate or remedy some of these pitfalls. Third, we demonstrate that the Type I error of randomization tests in randomized AB phase designs is under control in the presence of unexpected linear trends in the data. Fourth, we report the results of a simulation study investigating the effect of unexpected linear trends on the power of the randomization test in randomized AB phase designs. The implications of these results for the analysis of randomized AB phase designs are discussed. We conclude that randomized AB phase designs are experimentally valid, but that the power of these designs is sufficient only for large treatment effects and large sample sizes. For small treatment effects and small sample sizes, researchers should turn to more complex phase designs, such as randomized ABAB phase designs or randomized multiple-baseline designs.
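The core randomization test is compact enough to sketch: with the intervention start drawn at random from a set of admissible points, the observed B-minus-A mean difference is referred to its distribution over all admissible start points. The toy data below are invented.

```python
# Minimal randomization test for a randomized AB phase design.
import numpy as np

def randomization_test(y, actual_start, possible_starts):
    def effect(start):
        return y[start:].mean() - y[:start].mean()   # B phase minus A phase
    observed = effect(actual_start)
    null = np.array([effect(s) for s in possible_starts])
    # p: share of admissible assignments at least as extreme as observed
    return observed, (np.abs(null) >= abs(observed)).mean()

y = np.array([2, 3, 2, 3, 2, 3, 6, 7, 6, 8, 7, 6], dtype=float)
obs, p = randomization_test(y, actual_start=6, possible_starts=range(4, 9))
print(f"effect = {obs:.2f}, randomization p = {p:.3f}")
```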

Journal ArticleDOI
TL;DR: This study examined the internal consistency and 6-month test–retest reliability of attentional bias indices derived from a free-viewing eye-tracking paradigm and found that the internal consistency of the fixation indices for threat, sad, and positive images over the full 8-s display was moderate to excellent.
Abstract: Cognitive theories of anxiety disorders and depression posit that attentional biases play a role in the development, maintenance, and recurrence of these disorders. Several paradigms have been used to examine attentional biases in anxiety and depression, but information on the reliability of different attentional bias indices is limited. In this study we examined the internal consistency and 6-month test-retest reliability of attentional bias indices derived from a free-viewing eye-tracking paradigm. Participants completed two versions of an eye-tracking task-one that used naturalistic images as stimuli, and one that used face images. In both tasks, participants viewed displays of four images, each display consisting of one threat image, one sad image, one positive/happy image, and one neutral image. The internal consistency of the fixation indices (dwell time and number of fixations) for threat, sad, and positive images over the full 8-s display was moderate to excellent. When the 8-s display was divided into 2-s intervals, the dwell times for the 0- to 2-s and 2- to 4-s intervals showed lower reliability, particularly for the face images. The attentional bias indices for the naturalistic images showed adequate to good stability over the test-retest period, whereas the test-retest reliability estimates for the face images were in the low to moderate range. The implications of these results for attentional bias research are discussed.

Journal ArticleDOI
TL;DR: It is reported that rPPG is highly accurate when the camera is aimed at facial skin tissue, but that the heart rate recordings from wrist regions are less reliable, and recordings from the calves are unreliable.
Abstract: Recent developments in computer science and digital image processing have enabled the extraction of an individual’s heart pulsations from pixel changes in recorded video images of human skin surfaces. This method is termed remote photoplethysmography (rPPG) and can be achieved with consumer-level cameras (e.g., a webcam or mobile camera). The goal of the present publication is two-fold. First, we aim to organize future rPPG software developments in a tractable and nontechnical manner, such that the public gains access to a basic open-source rPPG code, comes to understand its utility, and can follow its most recent progressions. The second goal is to investigate rPPG’s accuracy in detecting heart rates from the skin surfaces of several body parts after physical exercise and under ambient lighting conditions with a consumer-level camera. We report that rPPG is highly accurate when the camera is aimed at facial skin tissue, but that the heart rate recordings from wrist regions are less reliable, and recordings from the calves are unreliable. Facial rPPG remained accurate despite the high heart rates after exercise. The proposed research procedures and the experimental findings provide guidelines for future studies on rPPG.
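The core of rPPG fits in a few lines: average the green channel over a skin region frame by frame, then read off the dominant frequency in a plausible heart rate band. The synthetic clip below is illustrative; real pipelines add face tracking, detrending, and more robust spectral estimation.

```python
# Core rPPG idea: mean green-channel intensity per frame, then the dominant
# frequency in the heart rate band. Synthetic input pulsing at 1.2 Hz.
import numpy as np

def heart_rate_bpm(frames, fps):
    """frames: (n, h, w, 3) video of a skin patch; returns bpm estimate."""
    signal = frames[..., 1].mean(axis=(1, 2))        # mean green per frame
    signal = signal - signal.mean()
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)           # 42-240 bpm
    return 60.0 * freqs[band][np.argmax(power[band])]

# synthetic 10-s clip whose brightness pulses at 1.2 Hz (72 bpm)
fps, t = 30, np.arange(300) / 30
frames = (128 + 2 * np.sin(2 * np.pi * 1.2 * t))[:, None, None, None]
frames = np.repeat(np.repeat(np.repeat(frames, 8, 1), 8, 2), 3, 3)
print(round(heart_rate_bpm(frames, fps)))            # ~72
```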

Journal ArticleDOI
TL;DR: A new corpus of eye movements in silent reading, the Russian Sentence Corpus (RSC), is introduced; it allows Slavic languages using Cyrillic script (exemplified by Russian) to be added to the growing number of languages with different orthographies whose basic eye movement benchmarks conform to the universal comparative science of reading.
Abstract: This article introduces a new corpus of eye movements in silent reading-the Russian Sentence Corpus (RSC). Russian uses the Cyrillic script, which has not yet been investigated in cross-linguistic eye movement research. As in every language studied so far, we confirmed the expected effects of low-level parameters, such as word length, frequency, and predictability, on the eye movements of skilled Russian readers. These findings allow us to add Slavic languages using Cyrillic script (exemplified by Russian) to the growing number of languages with different orthographies, ranging from the Roman-based European languages to logographic Asian ones, whose basic eye movement benchmarks conform to the universal comparative science of reading (Share, 2008). We additionally report basic descriptive corpus statistics and three exploratory investigations of the effects of Russian morphology on the basic eye movement measures, which illustrate the kinds of questions that researchers can answer using the RSC. The annotated corpus is freely available from its project page at the Open Science Framework: https://osf.io/x5q2r/ .