
Showing papers in "Journal of Vision in 2015"


Journal ArticleDOI
TL;DR: The purpose of this article is to describe the fundamental stimulation paradigms for steady-state visual evoked potentials and to illustrate these principles through research findings across a range of applications in vision science.
Abstract: Periodic visual stimulation and analysis of the resulting steady-state visual evoked potentials were first introduced over 80 years ago as a means to study visual sensation and perception. From the first single-channel recording of responses to modulated light to the present use of sophisticated digital displays composed of complex visual stimuli and high-density recording arrays, steady-state methods have been applied in a broad range of scientific and applied settings. The purpose of this article is to describe the fundamental stimulation paradigms for steady-state visual evoked potentials and to illustrate these principles through research findings across a range of applications in vision science.

875 citations


Journal ArticleDOI
TL;DR: Examination of participants' orientation report errors for targets crowded by similar or dissimilar flankers concludes that-at least for the displays used here-crowding likely results from a probabilistic substitution of targets and distractors, regardless of target-distractor feature similarity.
Abstract: Visual crowding refers to a phenomenon whereby objects that appear in the periphery of the visual field are more difficult to identify when embedded within clutter. Pooling models assert that crowding results from an obligatory averaging or other combination of target and distractor features that occurs prior to awareness. One well-known manifestation of pooling is feature averaging, with which the features of target and nontarget stimuli are combined at an early stage of visual processing. Conversely, substitution models assert that crowding results from binding a target and nearby distractors to incorrect spatial locations. Recent evidence suggests that substitution predominates when target–flanker feature similarity is low, but it is unclear whether averaging or substitution best explains crowding when similarity is high. Here, we examined participants' orientation report errors for targets crowded by similar or dissimilar flankers. In two experiments, we found evidence inconsistent with feature averaging regardless of target–flanker similarity. However, the observed data could be accommodated by a probabilistic substitution model in which participants occasionally “swap” a target for a distractor. Thus, we conclude that—at least for the displays used here—crowding likely results from a probabilistic substitution of targets and distractors, regardless of target–distractor feature similarity.
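
The substitution account favored here can be illustrated with a small simulation. The sketch below (Python; all parameter values are illustrative, not taken from the paper) shows how occasional target-flanker swaps produce a bimodal distribution of report errors, unlike the unimodal, shifted distribution predicted by feature averaging.

```python
# Toy sketch of a probabilistic substitution model of crowding: on some
# proportion of trials the reported orientation is centered on a flanker
# rather than on the target. Swap rate and noise values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n_trials = 10_000
target_ori, flanker_ori = 0.0, 30.0      # degrees
p_swap, report_noise = 0.3, 8.0          # swap probability and report SD (deg)

swapped = rng.random(n_trials) < p_swap
centers = np.where(swapped, flanker_ori, target_ori)
reports = centers + rng.normal(0, report_noise, n_trials)

# Unlike feature averaging, the error distribution is bimodal rather than
# unimodal and centered midway between target and flanker.
print("mean report:", round(reports.mean(), 1))
print("fraction of reports near the flanker:",
      round(np.mean(np.abs(reports - flanker_ori) < 15), 2))
```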

169 citations


Journal ArticleDOI
TL;DR: The aim is to review the literature supporting the important role of crowding in developmental dyslexia and propose new possible studies in order to clarify whether the observed excessive crowding could be a cause rather than an effect of DD.
Abstract: Developmental dyslexia (DD) is the most common neurodevelopmental disorder (about 10% of children across cultures), characterized by severe difficulties in learning to read. According to the dominant view, DD is considered a phonological processing impairment that might be linked to a cross-modal, letter-to-speech sound integration deficit. However, new theories, supported by consistent data, suggest that mild deficits in low-level visual and auditory processing can lead to DD. This evidence supports the probabilistic and multifactorial approach to DD. Among others, an interesting visual deficit that is often associated with DD is excessive visual crowding. Crowding is defined as the difficulty in recognizing objects when they are surrounded by similar items. Crowding, typically observed in peripheral vision, can be modulated by attentional processes. The direct consequence of stronger crowding on reading is the inability to recognize letters when they are surrounded by other letters, which translates into reading more slowly and making more errors. Our aim is to review the literature supporting the important role of crowding in DD. Moreover, we propose new possible studies in order to clarify whether the observed excessive crowding could be a cause rather than an effect of DD. Finally, we also suggest possible remediation and even prevention programs that could be based on reducing crowding in children with or at risk for DD, without involving any phonological or orthographic training.

165 citations


Journal ArticleDOI
TL;DR: This fast periodic visual stimulation approach provides a direct signature of natural face categorization and opens an avenue for efficiently measuring categorization responses of complex visual stimuli in the human brain.
Abstract: We designed a fast periodic visual stimulation approach to identify an objective signature of face categorization incorporating both visual discrimination (from nonface objects) and generalization (across widely variable face exemplars). Scalp electroencephalographic (EEG) data were recorded in 12 human observers viewing natural images of objects at a rapid frequency of 5.88 images/s for 60 s. Natural images of faces were interleaved every five stimuli, i.e., at 1.18 Hz (5.88/5). Face categorization was indexed by a high signal-to-noise ratio response, specifically at an oddball face stimulation frequency of 1.18 Hz and its harmonics. This face-selective periodic EEG response was highly significant for every participant, even for a single 60-s sequence, and was generally localized over the right occipitotemporal cortex. The periodicity constraint and the large selection of stimuli ensured that this selective response to natural face images was free of low-level visual confounds, as confirmed by the absence of any oddball response for phase-scrambled stimuli. Without any subtraction procedure, time-domain analysis revealed a sequence of differential face-selective EEG components between 120 and 400 ms after oddball face image onset, progressing from medial occipital (P1-faces) to occipitotemporal (N1-faces) and anterior temporal (P2-faces) regions. Overall, this fast periodic visual stimulation approach provides a direct signature of natural face categorization and opens an avenue for efficiently measuring categorization responses of complex visual stimuli in the human brain.
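
The frequency-domain analysis at the heart of this approach can be sketched briefly. The Python snippet below simulates one 60-s sequence and quantifies the response at the 1.18-Hz oddball frequency and its harmonics as the amplitude of that frequency bin relative to neighboring bins; the sampling rate, noise level, and exact neighbor-bin convention are illustrative assumptions, not the paper's specifications.

```python
# Minimal sketch: quantify a frequency-tagged response in an EEG spectrum as
# the amplitude at the target bin divided by the mean of surrounding bins.
import numpy as np

rng = np.random.default_rng(0)
fs, duration = 512.0, 60.0                 # sampling rate (Hz) and sequence length (s), assumed
t = np.arange(0, duration, 1 / fs)

base_f, oddball_f = 5.88, 1.18             # stimulation and oddball frequencies from the abstract
eeg = (0.5 * np.sin(2 * np.pi * base_f * t)        # base (general visual) response
       + 0.3 * np.sin(2 * np.pi * oddball_f * t)   # face-selective oddball response
       + rng.standard_normal(t.size))              # broadband noise

amp = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def snr_at(f_target, n_neighbors=10, skip=1):
    """Amplitude at the target bin over the mean amplitude of neighboring bins
    (excluding `skip` bins on each side of the target)."""
    idx = np.argmin(np.abs(freqs - f_target))
    lo = np.r_[idx - skip - n_neighbors: idx - skip]
    hi = np.r_[idx + skip + 1: idx + skip + 1 + n_neighbors]
    return amp[idx] / amp[np.r_[lo, hi]].mean()

for harmonic in (1, 2, 3):
    f = harmonic * oddball_f
    print(f"SNR at {f:.2f} Hz: {snr_at(f):.2f}")
```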

150 citations


Journal ArticleDOI
TL;DR: The hypothesis that appearance (i.e., how stimuli look) is a good predictor for crowding is put forward, because both crowding and appearance reflect the output of recurrent processing rather than interactions during the initial phase of visual processing.
Abstract: In crowding, the perception of a target strongly deteriorates when neighboring elements are presented. Crowding is usually assumed to have the following characteristics: (a) Crowding is determined only by nearby elements within a restricted region around the target (Bouma's law). (b) Increasing the number of flankers can only deteriorate performance. (c) Target-flanker interference is feature-specific. These characteristics are usually explained by pooling models, which are well in the spirit of classic models of object recognition. In this review, we summarize recent findings showing that crowding is not determined by the above characteristics, thus challenging most models of crowding. We propose that the spatial configuration across the entire visual field determines crowding. Only when one understands how all elements of a visual scene group with each other can one determine crowding strength. We put forward the hypothesis that appearance (i.e., how stimuli look) is a good predictor for crowding, because both crowding and appearance reflect the output of recurrent processing rather than interactions during the initial phase of visual processing.

112 citations


Journal ArticleDOI
TL;DR: The results suggest that top-down expectations play a larger role when bottom-up input is ambiguous, in line with predictive processing accounts of perception, and support the hypothesis that conscious access depends on verification of perceptual predictions.
Abstract: How do expectations influence transitions between unconscious and conscious perceptual processing? According to the influential predictive processing framework, perceptual content is determined by predictive models of the causes of sensory signals. On one interpretation, conscious contents arise when predictive models are verified by matching sensory input (minimizing prediction error). On another, conscious contents arise when surprising events falsify current perceptual predictions. Finally, the cognitive impenetrability account posits that conscious perception is not affected by such higher level factors. To discriminate these positions, we combined predictive cueing with continuous flash suppression (CFS) in which the relative contrast of a target image gradually increases over time. In four experiments we established that expected stimuli enter consciousness faster than neutral or unexpected stimuli. These effects are difficult to account for in terms of response priming, pre-existing stimulus associations, or the attentional mechanisms that cause asynchronous temporal order judgments (of simultaneously presented stimuli). Our results further suggest that top-down expectations play a larger role when bottom-up input is ambiguous, in line with predictive processing accounts of perception. Taken together, our findings support the hypothesis that conscious access depends on verification of perceptual predictions.

110 citations


Journal ArticleDOI
TL;DR: The findings support the idea that low-level contributions to perceived lightness are primarily determined by the luminance contrast at surface boundaries, and that spatial filtering models do not capture the mechanisms responsible for producing many of the lightness phenomena observed in human perception.
Abstract: Spatial filtering models are currently a widely accepted mechanistic account of human lightness perception. Their popularity can be ascribed to two reasons: They correctly predict how human observers perceive a variety of lightness illusions, and the processing steps involved in the models bear an apparent resemblance with known physiological mechanisms at early stages of visual processing. Here, we tested the adequacy of these models by probing their response to stimuli that have been modified by adding narrowband noise. Psychophysically, it has been shown that noise in the range of one to five cycles per degree (cpd) can drastically reduce the strength of some lightness phenomena, while noise outside this range has little or no effect on perceived lightness. Choosing White's illusion (White, 1979) as a test case, we replicated and extended the psychophysical results, and found that none of the spatial filtering models tested was able to reproduce the spatial frequency specific effect of narrowband noise. We discuss the reasons for failure for each model individually, but we argue that the failure is indicative of the general inadequacy of this class of spatial filtering models. Given the present evidence we do not believe that spatial filtering models capture the mechanisms that are responsible for producing many of the lightness phenomena observed in human perception. Instead we think that our findings support the idea that low-level contributions to perceived lightness are primarily determined by the luminance contrast at surface boundaries.

107 citations


Journal ArticleDOI
TL;DR: The findings show that the human visual system can effectively use peripheral and foveal information about object features and that visual perception does not simply correspond to disconnected snapshots during each fixation.
Abstract: Due to the inhomogeneous visual representation across the visual field, humans use peripheral vision to select objects of interest and foveate them by saccadic eye movements for further scrutiny. Thus, there is usually peripheral information available before and foveal information after a saccade. In this study we investigated the integration of information across saccades. We measured reliabilities (i.e., the inverse of variance) separately in a presaccadic peripheral and a postsaccadic foveal orientation-discrimination task. From this, we predicted trans-saccadic performance and compared it to observed values. We show that the integration of incongruent peripheral and foveal information is biased according to their relative reliabilities and that the reliability of the trans-saccadic information equals the sum of the peripheral and foveal reliabilities. Both results are consistent with and indistinguishable from statistically optimal integration according to the maximum-likelihood principle. Additionally, we tracked the gathering of information around the time of the saccade with high temporal precision by using a reverse correlation method. Information gathering starts to decline between 100 and 50 ms before saccade onset and recovers immediately after saccade offset. Altogether, these findings show that the human visual system can effectively use peripheral and foveal information about object features and that visual perception does not simply correspond to disconnected snapshots during each fixation.
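
The maximum-likelihood prediction tested here is compact enough to state directly. In the sketch below (Python), reliabilities are inverse variances, the predicted trans-saccadic reliability is their sum, and the integrated estimate weights each cue by its relative reliability; the numerical values are illustrative rather than taken from the data.

```python
# Sketch of the maximum-likelihood integration prediction described in the
# abstract. All numbers are illustrative.
import numpy as np

sigma_peripheral = 4.0   # presaccadic (peripheral) orientation noise (deg)
sigma_foveal = 2.0       # postsaccadic (foveal) orientation noise (deg)

r_p, r_f = 1 / sigma_peripheral**2, 1 / sigma_foveal**2
r_trans = r_p + r_f                        # predicted trans-saccadic reliability
sigma_trans = np.sqrt(1 / r_trans)         # predicted trans-saccadic noise

# Integrated estimate for (possibly incongruent) peripheral and foveal cues
theta_p, theta_f = 45.0, 49.0              # peripheral and foveal estimates (deg)
theta_hat = (r_p * theta_p + r_f * theta_f) / r_trans

print(f"predicted trans-saccadic sigma: {sigma_trans:.2f} deg")
print(f"integrated estimate: {theta_hat:.2f} deg (pulled toward the more reliable fovea)")
```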

103 citations


Journal ArticleDOI
TL;DR: This psychophysics study presented participants with spatially congruent and discrepant audiovisual signals at four levels of visual reliability and demonstrated that Bayesian CI is fundamental for integrating signals of variable reliabilities.
Abstract: To obtain a coherent percept of the environment, the brain should integrate sensory signals from common sources and segregate those from independent sources. Recent research has demonstrated that humans integrate audiovisual information during spatial localization consistent with Bayesian Causal Inference (CI). However, the decision strategies that human observers employ for implicit and explicit CI remain unclear. Further, despite the key role of sensory reliability in multisensory integration, Bayesian CI has never been evaluated across a wide range of sensory reliabilities. This psychophysics study presented participants with spatially congruent and discrepant audiovisual signals at four levels of visual reliability. Participants localized the auditory signals (implicit CI) and judged whether auditory and visual signals came from common or independent sources (explicit CI). Our results demonstrate that humans employ model averaging as a decision strategy for implicit CI; they report an auditory spatial estimate that averages the spatial estimates under the two causal structures weighted by their posterior probabilities. Likewise, they explicitly infer a common source during the common-source judgment when the posterior probability for a common source exceeds a fixed threshold of 0.5. Critically, sensory reliability shapes multisensory integration in Bayesian CI via two distinct mechanisms: First, higher sensory reliability sensitizes humans to spatial disparity and thereby sharpens their multisensory integration window. Second, sensory reliability determines the relative signal weights in multisensory integration under the assumption of a common source. In conclusion, our results demonstrate that Bayesian CI is fundamental for integrating signals of variable reliabilities.
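
The two decision rules described, model averaging for the implicit localization task and a 0.5 posterior criterion for the explicit common-source judgment, can be sketched with the standard Bayesian causal-inference equations. In the Python sketch below, the sensory noise levels, the spatial prior (centered at zero), and the prior probability of a common cause are illustrative assumptions, not the paper's fitted parameters.

```python
# Toy sketch of Bayesian Causal Inference with model averaging for a single
# pair of auditory/visual position measurements (x_a, x_v).
import numpy as np

def causal_inference(x_a, x_v, sig_a=8.0, sig_v=2.0, sig_p=15.0, p_common=0.5):
    # Likelihood of the measurements under a common source (C = 1),
    # integrating out the source location under a zero-centered spatial prior
    var1 = sig_a**2 * sig_v**2 + sig_a**2 * sig_p**2 + sig_v**2 * sig_p**2
    like_c1 = (np.exp(-0.5 * ((x_a - x_v)**2 * sig_p**2
                              + x_a**2 * sig_v**2
                              + x_v**2 * sig_a**2) / var1)
               / (2 * np.pi * np.sqrt(var1)))
    # Likelihood under independent sources (C = 2)
    like_c2 = (np.exp(-0.5 * x_a**2 / (sig_a**2 + sig_p**2))
               / np.sqrt(2 * np.pi * (sig_a**2 + sig_p**2))
               * np.exp(-0.5 * x_v**2 / (sig_v**2 + sig_p**2))
               / np.sqrt(2 * np.pi * (sig_v**2 + sig_p**2)))
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Auditory location estimates under each causal structure
    s_c1 = (x_a / sig_a**2 + x_v / sig_v**2) / (1/sig_a**2 + 1/sig_v**2 + 1/sig_p**2)
    s_c2 = (x_a / sig_a**2) / (1/sig_a**2 + 1/sig_p**2)

    s_hat = post_c1 * s_c1 + (1 - post_c1) * s_c2   # implicit CI: model averaging
    same_source = post_c1 > 0.5                      # explicit CI: 0.5 criterion
    return s_hat, same_source

print(causal_inference(x_a=10.0, x_v=2.0))
```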

102 citations


Journal ArticleDOI
TL;DR: An inexpensive LED-based five-primary photostimulator that can control the excitations of rods, S-, M-, L-cones, and melanopsin-containing ipRGCs in humans at constant background photoreceptor excitation levels is introduced.
Abstract: Intrinsically photosensitive retinal ganglion cells (ipRGCs) can respond to light directly through a self-contained photopigment, melanopsin. IpRGCs also receive synaptic inputs from rods and cones. Thus, studying ipRGC functions requires a novel photostimulating method that can account for all of the photoreceptor inputs. Here, we introduced an inexpensive LED-based five-primary photostimulator that can control the excitations of rods, S-, M-, L-cones, and melanopsin-containing ipRGCs in humans at constant background photoreceptor excitation levels, a critical requirement for studying the adaptation behavior of ipRGCs with rod, cone, or melanopsin input. We described the theory and technical aspects (including optics, electronics, software, and calibration) of the five-primary photostimulator. Then we presented two preliminary studies using the photostimulator we have implemented to measure melanopsin-mediated pupil responses and the temporal contrast sensitivity function (TCSF). The results showed that the S-cone input to pupil responses was antagonistic to the L-, M-, or melanopsin inputs, consistent with an S-OFF and (L + M)-ON response property of primate ipRGCs (Dacey et al., 2005). In addition, the melanopsin-mediated TCSF had a distinctive pattern compared with the L + M or S-cone mediated TCSF. Besides controlling individual photoreceptor excitations independently, the five-primary photostimulator has the flexibility to present stimuli modulating any combination of photoreceptor excitations, which allows researchers to study the mechanisms by which ipRGCs combine various photoreceptor inputs.
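
The computational core of such a photostimulator is the silent-substitution calculation: with five independent primaries and five photoreceptor classes, the primary modulations needed to change one class while holding the others constant are the solution of a 5 x 5 linear system. The sketch below uses a random placeholder matrix; a real calibration would integrate the measured primary spectra against photoreceptor spectral sensitivities.

```python
# Silent-substitution sketch for a five-primary photostimulator.
# The matrix A is a random placeholder standing in for calibrated data.
import numpy as np

rng = np.random.default_rng(0)
# A[i, j] = excitation of photoreceptor class i (rod, S, M, L, melanopsin)
# produced by one unit of primary j
A = rng.uniform(0.1, 1.0, size=(5, 5))

# Desired change in excitation: modulate melanopsin only, silence the others
target = np.array([0.0, 0.0, 0.0, 0.0, 0.2])

delta_primaries = np.linalg.solve(A, target)
print("required primary modulations:", np.round(delta_primaries, 3))
print("resulting receptor modulations:", np.round(A @ delta_primaries, 3))
```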

88 citations


Journal ArticleDOI
TL;DR: The results demonstrate that perceived face attractiveness was pulled by the attractiveness level of facial images encountered up to 6 s prior, and this perceptual pull increased as the difference in attractiveness between previous and current stimuli increased.
Abstract: The perception of attractiveness is essential for choices of food, object, and mate preference. Like perception of other visual features, perception of attractiveness is stable despite constant changes of image properties due to factors like occlusion, visual noise, and eye movements. Recent results demonstrate that perception of low-level stimulus features and even more complex attributes like human identity are biased towards recent percepts. This effect is often called serial dependence. Some recent studies have suggested that serial dependence also exists for perceived facial attractiveness, though there is also concern that the reported effects are due to response bias. Here we used an attractiveness-rating task to test the existence of serial dependence in perceived facial attractiveness. Our results demonstrate that perceived face attractiveness was pulled by the attractiveness level of facial images encountered up to 6 s prior. This effect was not due to response bias and did not rely on the previous motor response. This perceptual pull increased as the difference in attractiveness between previous and current stimuli increased. Our results reconcile previously conflicting findings and extend previous work, demonstrating that sequential dependence in perception operates across different levels of visual analysis, even at the highest levels of perceptual interpretation.

Journal ArticleDOI
TL;DR: In this paper, the authors developed a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths, relying, first, on activation dynamics via spatially limited (foveated) access to saliency information and, second, on a leaky memory process controlling the re-inspection of target regions.
Abstract: In humans and in foveated animals, visual acuity is highly concentrated at the center of gaze, so that choosing where to look next is an important example of online, rapid decision-making. Computational neuroscientists have developed biologically inspired models of visual attention, termed saliency maps, which successfully predict where people fixate on average. Using point process theory for spatial statistics, we show that scanpaths contain, however, important statistical structure, such as spatial clustering on top of distributions of gaze positions. Here, we develop a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths. Our model relies, first, on activation dynamics via spatially limited (foveated) access to saliency information and, second, on a leaky memory process controlling the re-inspection of target regions. This theoretical framework models a form of context-dependent decision-making, linking neural dynamics of attention to behavioral gaze data.

Journal ArticleDOI
TL;DR: These findings support previous research suggesting that viewers form representations of both the exemplars and the set average, and provide further evidence that the average representation is invariant to several high-level characteristics.
Abstract: Research on ensemble encoding has found that viewers extract summary information from sets of similar items. When shown a set of four faces of different people, viewers merge identity information from the exemplars into a representation of the set average. Here, we presented sets containing unconstrained images of the same identity. In response to a subsequent probe, viewers recognized the exemplars accurately. However, they also reported having seen a merged average of these images. Importantly, viewers reported seeing the matching average of the set (the average of the four presented images) more often than a nonmatching average (an average of four other images of the same identity). These results were consistent for both simultaneous and sequential presentation of the sets. Our findings support previous research suggesting that viewers form representations of both the exemplars and the set average. Given the unconstrained nature of the photographs, we also provide further evidence that the average representation is invariant to several high-level characteristics.

Journal ArticleDOI
TL;DR: The attentional enhancement of pupil responses indicates that pupillometry is not only an index of retinal and brainstem function, but also an objective measure of complex constructs such as attention and its effects on sensory processing.
Abstract: We measured pupil size in adult human subjects while we manipulated both the luminance of the visual scene and the location of attention. We found that, with central fixation maintained, pupillary constrictions and dilations evoked by peripheral luminance increments and decrements are larger when spatial attention is covertly (i.e., with no eye movements) directed to the stimulus region versus when it is directed to the opposite hemifield. Irrespective of the size of the attended region (focused at the center of the stimulus or spread within and outside the stimulus), the attentional enhancement is large: more than 20% of the response to stimuli in the unattended hemifield. This indicates that a sizable portion of this simple ocular behavior, often considered a subcortical "reflex," in fact depends on cortical processing. Together, these features indicate that pupillometry is not only an index of retinal and brainstem function, but also an objective measure of complex constructs such as attention and its effects on sensory processing.

Journal ArticleDOI
TL;DR: It is found that humans perform best when an oriented target is visible both before (peripherally) and after a saccade (foveally), suggesting that humans integrate the two views.
Abstract: We perceive a stable environment despite the fact that visual information is essentially acquired in a sequence of snapshots separated by saccadic eye movements. The resolution of these snapshots varies (high in the fovea and lower in the periphery), and thus the formation of a stable percept presumably relies on the fusion of information acquired at different resolutions. To test if, and to what extent, foveal and peripheral information are integrated, we examined human orientation-discrimination performance across saccadic eye movements. We found that humans perform best when an oriented target is visible both before (peripherally) and after a saccade (foveally), suggesting that humans integrate the two views. Integration relied on eye movements, as we found no evidence of integration when the target was artificially moved during stationary viewing. Perturbation analysis revealed that humans combine the two views using a weighted sum, with weights assigned based on the relative precision of foveal and peripheral representations, as predicted by ideal observer models. However, our subjects displayed a systematic overweighting of the fovea, relative to the ideal observer, indicating that human integration across saccades is slightly suboptimal.

Journal ArticleDOI
TL;DR: The results support the hypothesis that summary representations operate differently in different feature domains, and may be subserved by distinct mechanisms.
Abstract: Over the past 15 years, a number of behavioral studies have shown that the human visual system can extract the average value of a set of items along a variety of feature dimensions, often with great facility and accuracy. These efficient representations of sets of items are commonly referred to as summary representations, but very little is known about whether their computation constitutes a single unitary process or if it involves different mechanisms in different domains. Here, we asked participants to report the average value of a set of items presented serially over time in four different feature dimensions. We then measured the contribution of different parts of the information stream to the reported summaries. We found that this temporal weighting profile differs greatly across domains. Specifically, summaries of mean object location (Experiment 1) were influenced approximately 2.5 times more by earlier items than by later items. Summaries of mean object size (Experiment 1), mean facial expression (Experiment 2), and mean motion direction (Experiment 3), however, were more influenced by later items. These primacy and recency effects show that summary representations computed across time do not incorporate all items equally. Furthermore, our results support the hypothesis that summary representations operate differently in different feature domains, and may be subserved by distinct mechanisms.
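
One common way to estimate such a temporal weighting profile (the paper's own analysis may differ in detail) is to regress the reported averages onto the item values at each serial position, as in the simulated Python sketch below; all values are illustrative.

```python
# Estimate the weight given to each serial position by regressing reported
# averages onto item values. Simulated data with a toy "recency" profile.
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_items = 500, 8
items = rng.uniform(0, 10, size=(n_trials, n_items))

true_weights = np.linspace(0.5, 1.5, n_items)     # later items weighted more
true_weights /= true_weights.sum()
reports = items @ true_weights + rng.normal(0, 0.5, n_trials)

# Least-squares estimate of the weighting profile
w_hat, *_ = np.linalg.lstsq(items, reports, rcond=None)
print(np.round(w_hat, 3))   # recovers the recency-weighted profile
```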

Journal ArticleDOI
TL;DR: The results reinforce the notion that numerosity and texture are mediated by two distinct processes, depending on whether the individual elements are perceptually segregable, and that which mechanism is engaged follows the laws that determine crowding.
Abstract: We have recently provided evidence that the perception of number and texture density is mediated by two independent mechanisms: numerosity mechanisms at relatively low numbers, obeying Weber's law, and texture-density mechanisms at higher numerosities, following a square root law. In this study we investigated whether the switch between the two mechanisms depends on the capacity to segregate individual dots, and therefore follows similar laws to those governing visual crowding. We measured numerosity discrimination for a wide range of numerosities at three eccentricities. We found that the point where the numerosity regime (Weber's law) gave way to the density regime (square root law) depended on eccentricity. In central vision, the regime changed at 2.3 dots/°², while at 15° eccentricity, it changed at 0.5 dots/°², three times less dense. As a consequence, thresholds for low numerosities increased with eccentricity, while at higher numerosities thresholds remained constant. We further showed that, like crowding, the regime change was independent of dot size, depending on the distance between dot centers, not the distance between dot edges or ink coverage. Performance was not affected by stimulus contrast or blur, indicating that the transition does not depend on low-level stimulus properties. Our results reinforce the notion that numerosity and texture are mediated by two distinct processes, depending on whether the individual elements are perceptually segregable. Which mechanism is engaged follows laws that determine crowding.
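
The two regimes imply different scaling of discrimination thresholds: proportional to N in the Weber (numerosity) regime and proportional to the square root of N in the density regime. The sketch below illustrates this using the central-vision transition density quoted in the abstract (2.3 dots/°²); the Weber fraction and stimulus area are illustrative assumptions.

```python
# Illustration of Weber-law vs. square-root-law threshold scaling with a
# transition at a fixed dot density. Coefficients are illustrative only.
import numpy as np

def threshold(n_dots, area_deg2, weber_frac=0.2, transition_density=2.3):
    density = n_dots / area_deg2
    if density < transition_density:                 # numerosity regime: Weber's law
        return weber_frac * n_dots
    # density regime: square-root law, matched to the Weber line at the transition
    k = weber_frac * np.sqrt(transition_density * area_deg2)
    return k * np.sqrt(n_dots)

for n in (8, 16, 32, 64, 128):
    print(n, round(threshold(n, area_deg2=14), 2))
```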

Journal ArticleDOI
TL;DR: The range of expressions that participants perceived as lying at the center of the trajectory narrowed in both conditions, a pattern that is not predicted by the central-channel model but can be explained by the opponent-coding model.
Abstract: Facial expression is theorized to be visually represented in a multidimensional expression space, relative to a norm. This norm-based coding is typically argued to be implemented by a two-pool opponent coding system. However, the evidence supporting the opponent coding of expression cannot rule out the presence of a third channel tuned to the center of each coded dimension. Here we used a paradigm not previously applied to facial expression to determine whether a central-channel model is necessary to explain expression coding. Participants identified expressions taken from a fear/antifear trajectory, first at baseline and then in two adaptation conditions. In one condition, participants adapted to the expression at the center of the trajectory. In the other condition, participants adapted to alternating images from the two ends of the trajectory. The range of expressions that participants perceived as lying at the center of the trajectory narrowed in both conditions, a pattern that is not predicted by the central-channel model but can be explained by the opponent-coding model. Adaptation to the center of the trajectory also increased identification of both fear and antifear, which may indicate a functional benefit for adaptive coding of facial expression.

Journal ArticleDOI
TL;DR: Our understanding of the mechanisms underlying the RF is undergoing a quantum leap: top-down effects that guide attention and are tuned to task-relevant information now complement the bottom-up analysis.
Abstract: Following the pioneering studies of the receptive field (RF), the RF concept gained further significance for visual perception by the discovery of input effects from beyond the classical RF. These studies demonstrated that neuronal responses could be modulated by stimuli outside their RFs, consistent with the perception of induced brightness, color, orientation, and motion. Lesion scotomata are similarly modulated perceptually from the surround by RFs that have migrated from the interior to the outer edge of the scotoma and in this way provide filling-in of the void. Large RFs are advantageous to this task. In higher visual areas, such as the middle temporal and inferotemporal lobe, RFs increase in size and lose most of their retinotopic organization while encoding increasingly complex features. Whereas lower-level RFs mediate perceptual filling-in, contour integration, and figure-ground segregation, RFs at higher levels serve the perception of grouping by common fate, biological motion, and other biologically relevant stimuli, such as faces. Studies in alert monkeys while freely viewing natural scenes showed that classical and nonclassical RFs cooperate in forming representations of the visual world. Today, our understanding of the mechanisms underlying the RF is undergoing a quantum leap. What had started out as a hierarchical feed-forward concept for simple stimuli, such as spots, lines, and bars, now refers to mechanisms involving ascending, descending, and lateral signal flow. By extension of the bottom-up paradigm, RFs are nowadays understood as adaptive processors, enabling the predictive coding of complex scenes. Top-down effects guiding attention and tuned to task-relevant information complement the bottom-up analysis.

Journal ArticleDOI
TL;DR: It is demonstrated that there is substantial variance in display difficulty within a single set size, suggesting that limits based on the number of individual items alone cannot explain working memory storage.
Abstract: Influential slot and resource models of visual working memory make the assumption that items are stored in memory as independent units, and that there are no interactions between them. Consequently, these models predict that the number of items to be remembered (the set size) is the primary determinant of working memory performance, and therefore these models quantify memory capacity in terms of the number and quality of individual items that can be stored. Here we demonstrate that there is substantial variance in display difficulty within a single set size, suggesting that limits based on the number of individual items alone cannot explain working memory storage. We asked hundreds of participants to remember the same sets of displays, and discovered that participants were highly consistent in terms of which items and displays were hardest or easiest to remember. Although a simple grouping or chunking strategy could not explain this individual-display variability, a model with multiple, interacting levels of representation could explain some of the display-by-display differences. Specifically, a model that includes a hierarchical representation of items plus the mean and variance of sets of the colors on the display successfully accounts for some of the variability across displays. We conclude that working memory representations are composed only in part of individual, independent object representations, and that a major factor in how many items are remembered on a particular display is interitem representations such as perceptual grouping, ensemble, and texture representations.

Journal ArticleDOI
TL;DR: Recent evidence is reviewed suggesting that patterns of response in high-level visual areas may be better explained by response to image properties that are characteristic of different object categories.
Abstract: Neuroimaging research over the past 20 years has begun to reveal a picture of how the human visual system is organized. A key distinction that has arisen from these studies is the difference in the organization of low-level and high-level visual regions. Low-level regions contain topographic maps that are tightly linked to properties of the image. In contrast, high-level visual areas are thought to be arranged in modules that are tightly linked to categorical or semantic information in the image. To date, an unresolved question has been how the strong functional selectivity for object categories in high-level visual regions might arise from the image-based representations found in low-level visual regions. Here, we review recent evidence suggesting that patterns of response in high-level visual areas may be better explained by response to image properties that are characteristic of different object categories.

Journal ArticleDOI
TL;DR: The results suggest that nonvisual cognitive processing can suppress MS rate and that the extent of this suppression is related to task difficulty.
Abstract: Microsaccades (MSs) are small eye movements that occur during attempted visual fixation. While most studies concerning MSs focus on their roles in visual processing, some also suggest that the MS rate can be modulated by the amount of mental exertion involved in nonvisual processing. The current study focused on the effects of task difficulty on MS rate in a nonvisual mental arithmetic task. Experiment 1 revealed a general inverse relationship between MS rate and subjective task difficulty. During Experiment 2, three task phases with different requirements were identified: during calculation (between stimulus presentation and response), postcalculation (after reporting an answer), and a control condition (undergoing a matching sequence of events without the need to make a calculation). MS rate approximately doubled from the during-calculation phase to the postcalculation phase, and was significantly higher in the control condition than in the postcalculation phase. Only during calculation did the MS rate generally decrease with greater task difficulty. Our results suggest that nonvisual cognitive processing can suppress MS rate and that the extent of this suppression is related to task difficulty.

Journal ArticleDOI
TL;DR: The data show that value modulates perception in a similar way to the volitional deployment of attention, even though the relative effect of value is largely unaffected by an attention task.
Abstract: Our perception does not provide us with an exact imprint of the outside world, but is continuously adapted to our internal expectations, task sets, and behavioral goals. Although effects of reward (or value in general) on perception therefore seem likely, how valuation modulates perception and how such modulation relates to attention is largely unknown. We probed effects of reward on perception by using a binocular-rivalry paradigm. Distinct gratings drifting in opposite directions were presented to each observer's eyes. To objectify their subjective perceptual experience, the optokinetic nystagmus was used as a measure of current perceptual dominance. In a first experiment, one of the percepts was either rewarded or attended. We found that reward and attention similarly biased perception. In a second experiment, observers performed an attentionally demanding task either on the rewarded stimulus, the other stimulus, or both. We found that, on top of an attentional effect on perception, reward still modulated perception at each level of attentional load by increasing the dominance of the rewarded percept. Similarly, penalizing one percept increased dominance of the other at each level of attentional load. In turn, rewarding (and similarly not punishing) a percept yielded performance benefits that are typically associated with selective attention. In conclusion, our data show that value modulates perception in a similar way to the volitional deployment of attention, even though the relative effect of value is largely unaffected by an attention task.

Journal ArticleDOI
TL;DR: It is shown how the probability (and additive) summation formulas can be used to simulate psychometric functions, which, when fitted with Weibull functions, make signature predictions for how thresholds and psychometric function slopes vary as a function of τ, n, and Q.
Abstract: Many studies have investigated how multiple stimuli combine to reach threshold. There are broadly speaking two ways this can occur: additive summation (AS) where inputs from the different stimuli add together in a single mechanism, or probability summation (PS) where different stimuli are detected independently by separate mechanisms. PS is traditionally modeled under high threshold theory (HTT); however, tests have shown that HTT is incorrect and that signal detection theory (SDT) is the better framework for modeling summation. Modeling the equivalent of PS under SDT is, however, relatively complicated, leading many investigators to use Monte Carlo simulations for the predictions. We derive formulas that employ numerical integration to predict the proportion correct for detecting multiple stimuli assuming PS under SDT, for the situations in which stimuli are either equal or unequal in strength. Both formulas are general purpose, calculating performance for forced-choice tasks with M alternatives, n stimuli, in Q monitored mechanisms, each subject to a non-linear transducer with exponent τ. We show how the probability (and additive) summation formulas can be used to simulate psychometric functions, which when fitted with Weibull functions make signature predictions for how thresholds and psychometric function slopes vary as a function of τ, n, and Q. We also show how one can fit the formulas directly to real psychometric functions using data from a binocular summation experiment, and show how one can obtain estimates of τ and test whether binocular summation conforms more to PS or AS. The methods described here can be readily applied using software functions newly added to the Palamedes toolbox.
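
For readers who want the intuition behind these formulas, probability summation under SDT with a MAX-rule observer is easy to approximate by simulation (the route the abstract notes many investigators take; the paper itself derives numerical-integration formulas). In the Python sketch below, mechanism noise is unit-variance Gaussian and the transducer exponent τ is applied to the stimulus strength; all parameter values are illustrative.

```python
# Monte Carlo sketch of probability summation under SDT for an M-alternative
# forced choice: choose the alternative whose monitored mechanisms give the
# largest response. Assumes n <= Q and equal-strength stimuli.
import numpy as np

def pc_prob_summation(g, n=2, Q=2, M=2, tau=1.3, n_trials=200_000, seed=0):
    """Proportion correct when n of the Q monitored mechanisms in the target
    alternative carry signal strength g, passed through a power-law transducer
    with exponent tau; mechanism noise is unit-variance Gaussian."""
    rng = np.random.default_rng(seed)
    target = rng.normal(0.0, 1.0, size=(n_trials, Q))
    target[:, :n] += g ** tau                           # signal in n mechanisms
    others = rng.normal(0.0, 1.0, size=(n_trials, M - 1, Q))  # noise-only alternatives
    correct = target.max(axis=1) > others.max(axis=2).max(axis=1)
    return correct.mean()

for g in (0.5, 1.0, 1.5, 2.0):
    print(g, round(pc_prob_summation(g), 3))
```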

Journal ArticleDOI
TL;DR: It is found that increasing the number of alternatives of the forced-choice task greatly improved the efficiency of CSF assessment in both simulation and psychophysical studies.
Abstract: The contrast sensitivity function (CSF) provides a fundamental characterization of spatial vision, important for basic and clinical applications, but its long testing times have prevented easy, widespread assessment. The original quick CSF method was developed using a two-alternative forced choice (2AFC) grating orientation identification task (Lesmes, Lu, Baek, & Albright, 2010), and obtained precise CSF assessments while reducing the testing burden to only 50 trials. In this study, we attempt to further improve the efficiency of the quick CSF method by exploiting the properties of psychometric functions in multiple-alternative forced choice (m-AFC) tasks. A simulation study evaluated the effect of the number of alternatives m on the efficiency of the sensitivity measurement by the quick CSF method, and a psychophysical study validated the quick CSF method in a 10AFC task. We found that increasing the number of alternatives of the forced-choice task greatly improved the efficiency of CSF assessment in both simulation and psychophysical studies. The quick CSF method based on a 10-letter identification task can assess the CSF with an average standard deviation of 0.10 decimal log unit in less than 2 minutes.
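
Part of the efficiency gain comes from the lower guessing rate of an m-AFC task (1/m), which makes each trial more informative. The sketch below illustrates this with a generic Weibull psychometric function; the functional form and parameter values are illustrative, not those of the quick CSF method itself.

```python
# How the number of alternatives changes the psychometric function: the guess
# rate is 1/m, so larger m yields a wider response range per trial.
import numpy as np

def weibull_pc(contrast, threshold, slope=3.0, m=2, lapse=0.02):
    guess = 1.0 / m
    p_detect = 1.0 - np.exp(-(contrast / threshold) ** slope)
    return guess + (1.0 - guess - lapse) * p_detect

c = np.array([0.25, 0.5, 1.0, 2.0])
for m in (2, 4, 10):
    print(f"m={m:2d}:", np.round(weibull_pc(c, threshold=1.0, m=m), 3))
```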

Journal ArticleDOI
TL;DR: The findings suggest that the availability of visual information about the terrain near a particular step is most essential during the latter half of the preceding step, which constitutes a critical control phase in the bipedal gait cycle.
Abstract: The aim of this study was to examine how visual information is used to control stepping during locomotion over terrain that demands precision in the placement of the feet. More specifically, we sought to determine the point in the gait cycle at which visual information about a target is no longer needed to guide accurate foot placement. Subjects walked along a path while stepping as accurately as possible on a series of small, irregularly spaced target footholds. In various conditions, each of the targets became invisible either during the step to the target or during the step to the previous target. We found that making targets invisible after toe off of the step to the target had little to no effect on stepping accuracy. However, when targets disappeared during the step to the previous target, foot placement became less accurate and more variable. The findings suggest that visual information about a target is used prior to initiation of the step to that target but is not needed to continuously guide the foot throughout the swing phase. We propose that this style of control is rooted in the biomechanics of walking, which facilitates an energetically efficient strategy in which visual information is primarily used to initialize the mechanical state of the body leading into a ballistic movement toward the target foothold. Taken together with previous studies, the findings suggest that the availability of visual information about the terrain near a particular step is most essential during the latter half of the preceding step, which constitutes a critical control phase in the bipedal gait cycle.

Journal ArticleDOI
TL;DR: This work explored how various factors that the authors could manipulate influenced people's precision when intercepting virtual targets and found that temporal precision was highest for fast targets that subjects were free to intercept wherever they wished.
Abstract: People can hit rapidly moving balls with amazing precision. To determine how they manage to do so, we explored how various factors that we could manipulate influenced people's precision when intercepting virtual targets. We found that temporal precision was highest for fast targets that subjects were free to intercept wherever they wished. Temporal precision was much poorer when the point of interception was specified in advance. Examining responses to abrupt perturbations of the target's motion revealed that people adjusted where rather than when they would hit the target if given the choice. A model that combines judging how long it will take to reach the target's path with estimating the target's position at that time from its visually perceived position and velocity could account for the observed precision with reasonable values for all the parameters. The model considers all relevant sources of errors, together with the delays with which the various aspects can be adjusted. Our analysis provides a biologically plausible explanation for how light falling on the eye can guide the hand to intercept a moving ball with such high precision.
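
The essence of the proposed account, extrapolating the target's position over the estimated remaining time from its perceived position and velocity, can be written in a few lines. The function below is a simplified, one-dimensional sketch; the visuomotor delay and all numbers are illustrative assumptions rather than the paper's fitted parameters.

```python
# Simplified sketch: predict where a moving target will be when the hand
# reaches its path, from perceived position, perceived velocity, and the
# estimated remaining time (including a visuomotor delay).
def predicted_interception_point(perceived_pos, perceived_vel,
                                 time_to_reach_path, visuomotor_delay=0.1):
    """Linear extrapolation of the target over the remaining time."""
    return perceived_pos + perceived_vel * (time_to_reach_path + visuomotor_delay)

# Target seen at x = 0.30 m moving at 1.2 m/s; hand needs another 150 ms
print(predicted_interception_point(0.30, 1.2, 0.15))   # aim point in metres
```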

Journal ArticleDOI
TL;DR: A novel framework for estimating visual sensitivity using a continuous target-tracking task in concert with a dynamic internal model of human visual performance provides estimates of decision variable variance that are directly comparable with those obtained from traditional psychophysics.
Abstract: We introduce a novel framework for estimating visual sensitivity using a continuous target-tracking task in concert with a dynamic internal model of human visual performance. Observers used a mouse cursor to track the center of a two-dimensional Gaussian luminance blob as it moved in a random walk in a field of dynamic additive Gaussian luminance noise. To estimate visual sensitivity, we fit a Kalman filter model to the human tracking data under the assumption that humans behave as Bayesian ideal observers. Such observers optimally combine prior information with noisy observations to produce an estimate of target position at each time step. We found that estimates of human sensory noise obtained from the Kalman filter fit were highly correlated with traditional psychophysical measures of human sensitivity (R² > 97%). Because each frame of the tracking task is effectively a "minitrial," this technique reduces the amount of time required to assess sensitivity compared with traditional psychophysics. Furthermore, because the task is fast, easy, and fun, it could be used to assess children, certain clinical patients, and other populations that may get impatient with traditional psychophysics. Importantly, the modeling framework provides estimates of decision variable variance that are directly comparable with those obtained from traditional psychophysics. Further, we show that easily computed summary statistics of the tracking data can also accurately predict relative sensitivity (i.e., traditional sensitivity to within a scale factor).
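
A one-dimensional version of the underlying ideal-observer model is easy to sketch: a Kalman filter tracks a random-walk target seen through additive Gaussian observation noise, and the observation-noise term plays the role of the sensory noise estimated from the human data. The process and observation variances below are illustrative, not the paper's values.

```python
# Minimal 1-D Kalman filter tracking a random-walk target observed with
# Gaussian sensory noise; illustrative stand-in for the paper's model.
import numpy as np

rng = np.random.default_rng(2)
n_steps = 600
q = 0.05      # random-walk (process) variance per frame
r = 0.50      # observation (sensory) noise variance

target = np.cumsum(rng.normal(0, np.sqrt(q), n_steps))     # true target path
obs = target + rng.normal(0, np.sqrt(r), n_steps)          # noisy observations

x_hat, p = 0.0, 1.0
estimates = []
for z in obs:
    p += q                      # predict: target may have drifted
    k = p / (p + r)             # Kalman gain = relative reliability of the observation
    x_hat += k * (z - x_hat)    # update estimate toward the observation
    p *= (1 - k)                # posterior variance
    estimates.append(x_hat)

rmse = np.sqrt(np.mean((np.array(estimates) - target) ** 2))
print(f"tracking RMSE: {rmse:.3f}")
```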

Journal ArticleDOI
TL;DR: It was found that averaging was harder for ensembles with more colors but that changing the number of elements had no effect on accuracy, supportive of a distributed-attention account of rapid color averaging.
Abstract: The ability to extract the mean of features from a rapidly viewed, heterogeneous array of objects has been demonstrated for a number of different visual properties. Few studies have previously investigated the rapid averaging of color; those that did had insufficient stimulus control or inappropriate methods. This study reports three experiments that directly test observers' ability to extract the mean hue from a rapidly presented, multielement color ensemble. In Experiment 1, ensembles varied in number of elements and number of colors. It was found that averaging was harder for ensembles with more colors but that changing the number of elements had no effect on accuracy, supportive of a distributed-attention account of rapid color averaging. Experiment 2a manipulated the hue range present in any single ensemble (varying the perceptual difference between ensemble elements) while still varying the number of colors. Range had a strong effect on ability to pick the mean hue. Experiment 2b found no effect of color categories on the accuracy or speed of mean selection. The results indicate that perceptual difference of elements is the dominant factor affecting ability to average rapidly seen color ensembles. Findings are discussed both in the context of perception and memory of multiple colors and ensemble perception generally.

Journal ArticleDOI
TL;DR: The results show that human subjects combine and optimally integrate vestibular and visual information, each signaling self-motion around a different rotation axis, suggesting that the experience of two temporally co-occurring but spatially unrelated self-motion cues leads to inferring a common cause for these two initially unrelated sources of information about self-motion.
Abstract: Humans integrate multisensory information to reduce perceptual uncertainty when perceiving the world and self. Integration fails, however, if a common causality is not attributed to the sensory signals, as would occur in conditions of spatiotemporal discrepancies. In the case of passive self-motion, visual and vestibular cues are integrated according to statistical optimality, yet the extent of cue conflicts that do not compromise this optimality is currently underexplored. Here, we investigate whether human subjects can learn to integrate two arbitrary, but co-occurring, visual and vestibular cues of self-motion. Participants made size comparisons between two successive whole-body rotations using only visual, only vestibular, and both modalities together. The vestibular stimulus provided a yaw self-rotation cue, the visual a roll (Experiment 1) or pitch (Experiment 2) rotation cue. Experimentally measured thresholds in the bimodal condition were compared with theoretical predictions derived from the single-cue thresholds. Our results show that human subjects combine and optimally integrate vestibular and visual information, each signaling self-motion around a different rotation axis (yaw vs. roll and yaw vs. pitch). This finding suggests that the experience of two temporally co-occurring but spatially unrelated self-motion cues leads to inferring a common cause for these two initially unrelated sources of information about self-motion. We discuss our results in terms of specific task demands, cross-modal adaptation, and spatial compatibility. The importance of these results for the understanding of bodily illusions is also discussed.
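
The theoretical prediction against which the bimodal thresholds are compared follows directly from the single-cue thresholds under optimal integration. The short sketch below states it; the single-cue values are illustrative only.

```python
# Optimal-integration prediction for the bimodal discrimination threshold,
# derived from the single-cue thresholds. Values are illustrative.
import numpy as np

sigma_visual, sigma_vestibular = 3.0, 5.0   # single-cue thresholds (deg), illustrative
sigma_bimodal = np.sqrt((sigma_visual**2 * sigma_vestibular**2) /
                        (sigma_visual**2 + sigma_vestibular**2))
w_visual = sigma_vestibular**2 / (sigma_visual**2 + sigma_vestibular**2)
print(f"predicted bimodal threshold: {sigma_bimodal:.2f} deg, visual weight: {w_visual:.2f}")
```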