
Showing papers in "Journal of Vision in 2017"


Journal ArticleDOI
TL;DR: In this article, a "bag of feature maps" approach is proposed that avoids assumptions about the type of distortion contained in an image and instead captures consistencies of, or departures from, the statistics of real-world images in different color spaces and transform domains.
Abstract: Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a "bag of feature maps" approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies-or departures therefrom-of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it is able to achieve good-quality prediction power that is better than other leading models.

204 citations
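
For readers who want a feel for this family of features, the sketch below (Python, illustrative only) computes mean-subtracted contrast-normalized (MSCN) coefficients from a grayscale image, summarizes them with a few sample statistics, and feeds the resulting vector to a support-vector regressor. The specific statistics, the single luminance channel, and the `load_training_set` placeholder are assumptions for illustration, not the authors' exact feature maps or training pipeline.

```python
# Illustrative sketch (not the authors' exact model): NSS-style features
# from mean-subtracted contrast-normalized (MSCN) coefficients, pooled
# into a feature vector and regressed onto quality scores.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVR

def mscn(image, sigma=7/6):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image**2, sigma) - mu**2
    sd = np.sqrt(np.maximum(var, 0))
    return (image - mu) / (sd + 1.0)

def nss_features(image):
    """Simple summary statistics of MSCN coefficients (illustrative feature 'bag')."""
    coeffs = mscn(image)
    feats = [coeffs.var(),
             ((coeffs - coeffs.mean())**3).mean(),   # skewness-like moment
             ((coeffs - coeffs.mean())**4).mean()]   # kurtosis-like moment
    # products of neighboring coefficients capture horizontal/vertical dependencies
    for shifted in (coeffs[:, 1:] * coeffs[:, :-1], coeffs[1:, :] * coeffs[:-1, :]):
        feats += [shifted.mean(), shifted.var()]
    return np.array(feats)

# Hypothetical training data: grayscale images and their mean opinion scores.
# images, mos = load_training_set()          # placeholder loader
# X = np.stack([nss_features(im) for im in images])
# model = SVR(kernel='rbf').fit(X, mos)
# predicted_quality = model.predict(nss_features(test_image)[None, :])
```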


Journal ArticleDOI
TL;DR: This work demonstrates that strong serial dependencies occur within both perceptual and decision processes, with very little contribution from the response, and shows that serial dependence is important for perception, exploiting temporal redundancies to enhance perceptual efficiency.
Abstract: There is good evidence that biological perceptual systems exploit the temporal continuity in the world: When asked to reproduce or rate sequentially presented stimuli (varying in almost any dimension), subjects typically err toward the previous stimulus, exhibiting so-called "serial dependence." At this stage it is unclear whether the serial dependence results from averaging within the perceptual system, or at later stages. Here we demonstrate that strong serial dependencies occur within both perceptual and decision processes, with very little contribution from the response. Using a technique to isolate pure perceptual effects (Fritsche, Mostert, & de Lange, 2017), we show strong serial dependence in orientation judgements, over the range of orientations where theoretical considerations predict the effects to be maximal. In a second experiment we dissociate responses from stimuli to show that serial dependence occurs only between stimuli, not responses. The results show that serial dependence is important for perception, exploiting temporal redundancies to enhance perceptual efficiency.

141 citations


Journal ArticleDOI
TL;DR: This review covers the history of OCTA and surveys its most important clinical applications; the salient problems in the interpretation and analysis of OCTA are described, and recent advances are highlighted.
Abstract: Citation: Gao SS, Jia Y, Zhang M, et al. Optical coherence tomography angiography. Invest Ophthalmol Vis Sci. 2016;57:OCT27–OCT36. DOI:10.1167/iovs.15-19043 Optical coherence tomography angiography (OCTA) is a noninvasive approach that can visualize blood vessels down to the capillary level. With the advent of high-speed OCT and efficient algorithms, practical OCTA of ocular circulation is now available to ophthalmologists. Clinical investigations that used OCTA have increased exponentially in the past few years. This review will cover the history of OCTA and survey its most important clinical applications. The salient problems in the interpretation and analysis of OCTA are described, and recent advances are highlighted.

125 citations


Journal ArticleDOI
TL;DR: QUEST+ is a Bayesian adaptive psychometric testing method that allows an arbitrary number of stimulus dimensions, psychometric function parameters, and trial outcomes and provides a general method to accelerate data collection in many areas of cognitive and perceptual science.
Abstract: QUEST+ is a Bayesian adaptive psychometric testing method that allows an arbitrary number of stimulus dimensions, psychometric function parameters, and trial outcomes. It is a generalization and extension of the original QUEST procedure and incorporates many subsequent developments in the area of parametric adaptive testing. With a single procedure, it is possible to implement a wide variety of experimental designs, including conventional threshold measurement; measurement of psychometric function parameters, such as slope and lapse; estimation of the contrast sensitivity function; measurement of increment threshold functions; measurement of noise-masking functions; Thurstone scale estimation using pair comparisons; and categorical ratings on linear and circular stimulus dimensions. QUEST+ provides a general method to accelerate data collection in many areas of cognitive and perceptual science.

105 citations
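
The core loop of this style of Bayesian adaptive testing can be sketched compactly: keep a gridded posterior over psychometric-function parameters and, on every trial, present the stimulus that minimizes the expected posterior entropy. The Weibull form, parameter grids, and simulated observer below are illustrative assumptions, not the published QUEST+ implementation.

```python
# Minimal sketch of grid-based Bayesian adaptive testing in the spirit of QUEST+
# (illustrative; parameter grids and the psychometric form are arbitrary choices).
import numpy as np

contrasts = np.linspace(0.01, 1.0, 40)            # candidate stimulus intensities
thresholds = np.linspace(0.05, 0.8, 60)           # parameter grid: threshold
slopes = np.array([2.0, 3.5, 5.0])                # parameter grid: slope
T, S = np.meshgrid(thresholds, slopes, indexing='ij')
prior = np.ones(T.shape) / T.size                 # flat prior over (threshold, slope)

def p_correct(contrast, thresh, slope, guess=0.5, lapse=0.02):
    """Weibull psychometric function for a 2AFC task."""
    return guess + (1 - guess - lapse) * (1 - np.exp(-(contrast / thresh) ** slope))

def next_stimulus(posterior):
    """Choose the contrast whose outcome minimizes expected posterior entropy."""
    best_c, best_h = None, np.inf
    for c in contrasts:
        p = p_correct(c, T, S)
        p_resp1 = np.sum(posterior * p)           # predictive probability of "correct"
        expected_h = 0.0
        for resp_p, lik in ((p_resp1, p), (1 - p_resp1, 1 - p)):
            post = posterior * lik
            post /= post.sum()
            expected_h += resp_p * -np.sum(post * np.log(post + 1e-12))
        if expected_h < best_h:
            best_c, best_h = c, expected_h
    return best_c

def update(posterior, contrast, correct):
    lik = p_correct(contrast, T, S)
    posterior = posterior * (lik if correct else 1 - lik)
    return posterior / posterior.sum()

# Example run against a simulated observer with threshold 0.3 and slope 3.5.
rng = np.random.default_rng(0)
posterior = prior.copy()
for trial in range(40):
    c = next_stimulus(posterior)
    correct = rng.random() < p_correct(c, 0.3, 3.5)
    posterior = update(posterior, c, correct)
print("posterior-mean threshold:", np.sum(posterior * T))
```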


Journal ArticleDOI
TL;DR: The Places Database is described, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world.
Abstract: The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state of the art Convolutional Neural Networks, we provide impressive baseline performances at scene classification. With its high-coverage and high-diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.

102 citations
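
A minimal transfer-learning sketch of the kind of baseline the abstract refers to is shown below, assuming images arranged in per-category folders under a hypothetical `scenes/train` directory. It uses a generic ImageNet-pretrained backbone from torchvision rather than the Places-trained networks reported in the paper.

```python
# Sketch: fine-tune an ImageNet-pretrained CNN for scene classification
# (illustrative; the paper's baselines use networks trained on Places itself).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Hypothetical folder layout: scenes/train/<category>/<image>.jpg
train_set = datasets.ImageFolder("scenes/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # scene categories
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:            # one pass shown; train longer in practice
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```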


Journal ArticleDOI
TL;DR: Bayesian microsaccade detection (BMD) is developed, which performs inference based on a simple statistical model of eye positions; it returns probabilities rather than binary judgments and can be straightforwardly adapted as the generative model is refined.
Abstract: Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the "true" microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise-although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the "true" microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package.

89 citations
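
The idea of inferring a hidden drift/microsaccade state can be illustrated with a toy two-state hidden Markov model over instantaneous eye speed, smoothed with the forward-backward algorithm. This is a much-reduced stand-in for BMD, which models eye positions as a biased random walk and samples from the posterior over whole state time series; all parameters below are illustrative guesses.

```python
# Toy illustration of state inference for microsaccade detection: a two-state
# HMM (drift vs. microsaccade) over instantaneous eye speed, smoothed with the
# forward-backward algorithm. Not the authors' model.
import numpy as np

def gaussian(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def microsaccade_posterior(speed, p_stay=0.99, p_enter=0.002,
                           drift=(0.5, 0.3), saccade=(8.0, 4.0)):
    """Posterior probability of the 'microsaccade' state at each time sample.

    speed: eye speed in deg/s; drift/saccade: (mean, sd) of speed in each state.
    Transition and emission parameters are illustrative guesses.
    """
    A = np.array([[1 - p_enter, p_enter],      # drift -> drift / microsaccade
                  [1 - p_stay,  p_stay]])      # microsaccade -> drift / microsaccade
    e = np.stack([gaussian(speed, *drift), gaussian(speed, *saccade)], axis=1)
    n = len(speed)
    alpha = np.zeros((n, 2)); beta = np.ones((n, 2))
    alpha[0] = np.array([0.99, 0.01]) * e[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):                       # forward pass (scaled)
        alpha[t] = (alpha[t - 1] @ A) * e[t]
        alpha[t] /= alpha[t].sum()
    for t in range(n - 2, -1, -1):              # backward pass (scaled)
        beta[t] = A @ (e[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post[:, 1] / post.sum(axis=1)

# Example: a speed trace with a brief high-velocity event around samples 100-110.
speed = np.abs(np.random.default_rng(1).normal(0.5, 0.3, 500))
speed[100:110] += 8.0
p_ms = microsaccade_posterior(speed)
print("samples with P(microsaccade) > 0.5:", np.where(p_ms > 0.5)[0])
```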


Journal ArticleDOI
TL;DR: This work investigates the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants, and demonstrates the dynamics with which individuals adapt to changes in the environment's statistics.
Abstract: Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics-that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.

81 citations
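
A small simulation in the spirit of this design is sketched below: sequences are generated from a first-order Markov process, and a count-based learner predicts the upcoming symbol either by "maximizing" (always choosing the most probable continuation) or by "matching" (sampling in proportion to the estimated probabilities). The transition matrix and strategy labels are illustrative, not the study's stimuli.

```python
# Sketch: context-based sequence statistics and two prediction strategies
# (maximizing vs. matching). The transition matrix is an arbitrary example.
import numpy as np

rng = np.random.default_rng(0)
symbols = np.arange(4)
# Context-based statistics: next-symbol probability depends on the current symbol.
transition = np.array([[0.7, 0.1, 0.1, 0.1],
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.1, 0.7],
                       [0.25, 0.25, 0.25, 0.25]])

def generate(n, transition):
    seq = [rng.integers(4)]
    for _ in range(n - 1):
        seq.append(rng.choice(symbols, p=transition[seq[-1]]))
    return np.array(seq)

def predict(seq, strategy="maximize"):
    """Predict each upcoming symbol from running transition counts."""
    counts = np.ones((4, 4))                       # Laplace-smoothed counts
    correct = 0
    for prev, nxt in zip(seq[:-1], seq[1:]):
        p = counts[prev] / counts[prev].sum()
        guess = p.argmax() if strategy == "maximize" else rng.choice(symbols, p=p)
        correct += (guess == nxt)
        counts[prev, nxt] += 1
    return correct / (len(seq) - 1)

seq = generate(2000, transition)
print("maximizing accuracy:", predict(seq, "maximize"))
print("matching accuracy:  ", predict(seq, "matching"))
```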


Journal ArticleDOI
TL;DR: It is suggested that mid-level perceptual features, including curvature, contain cues to whether an object may be animate versus manmade, and that the visual system capitalizes on these early cues to facilitate object detection, recognition, and classification.
Abstract: While substantial work has focused on how the visual system achieves basic-level recognition, less work has asked about how it supports large-scale distinctions between objects, such as animacy and real-world size. Previous work has shown that these dimensions are reflected in our neural object representations (Konkle & Caramazza, 2013), and that objects of different real-world sizes have different mid-level perceptual features (Long, Konkle, Cohen, & Alvarez, 2016). Here, we test the hypothesis that animates and manmade objects also differ in mid-level perceptual features. To do so, we generated synthetic images of animals and objects that preserve some texture and form information ("texforms"), but are not identifiable at the basic level. We used visual search efficiency as an index of perceptual similarity, as search is slower when targets are perceptually similar to distractors. Across three experiments, we find that observers can find animals faster among objects than among other animals, and vice versa, and that these results hold when stimuli are reduced to unrecognizable texforms. Electrophysiological evidence revealed that this mixed-animacy search advantage emerges during early stages of target individuation, and not during later stages associated with semantic processing. Lastly, we find that perceived curvature explains part of the mixed-animacy search advantage and that observers use perceived curvature to classify texforms as animate/inanimate. Taken together, these findings suggest that mid-level perceptual features, including curvature, contain cues to whether an object may be animate versus manmade. We propose that the visual system capitalizes on these early cues to facilitate object detection, recognition, and classification.

77 citations


Journal ArticleDOI
TL;DR: This review provides an introduction to measurements and methods for studying the human visual white matter using diffusion MRI, and surveys findings from recent studies on connections between different visual field maps, the effects of visual impairment on the white matter, and the properties of the networks that process visual information supporting face recognition.
Abstract: Visual neuroscience has traditionally focused much of its attention on understanding the response properties of single neurons or neuronal ensembles. The visual white matter and the long-range neuronal connections it supports are fundamental in establishing such neuronal response properties and visual function. This review article provides an introduction to measurements and methods to study the human visual white matter using diffusion MRI. These methods allow us to measure the microstructural and macrostructural properties of the white matter in living human individuals; they allow us to trace long-range connections between neurons in different parts of the visual system and to measure the biophysical properties of these connections. We also review a range of findings from recent studies on connections between different visual field maps, the effects of visual impairment on the white matter, and the properties underlying networks that process visual information supporting visual face recognition. Finally, we discuss a few promising directions for future studies. These include new methods for analysis of MRI data, open datasets that are becoming available to study brain connectivity and white matter properties, and open source software for the analysis of these data.

67 citations


Journal ArticleDOI
TL;DR: The results suggest that to estimate stiffness, the visual system strongly relies on measures of the extent to which an object changes shape in response to forces.
Abstract: Nonrigid materials, such as jelly, rubber, or sponge move and deform in distinctive ways depending on their stiffness. Which cues do we use to infer stiffness? We simulated cubes of varying stiffness and optical appearance (e.g., wood, metal, wax, jelly) being subjected to two kinds of deformation: (a) a rigid cylinder pushing downwards into the cube to various extents (shape change, but little motion: shape dominant), (b) a rigid cylinder retracting rapidly from the cube (same initial shapes, differences in motion: motion dominant). Observers rated the apparent softness/hardness of the cubes. In the shape-dominant condition, ratings mainly depended on how deeply the rod penetrated the cube and were almost unaffected by the cube's intrinsic physical properties. In contrast, in the motion-dominant condition, ratings varied systematically with the cube's intrinsic stiffness, and were less influenced by the extent of the perturbation. We find that both results are well predicted by the absolute magnitude of deformation, suggesting that when asked to judge stiffness, observers resort to simple heuristics based on the amount of deformation. Softness ratings for static, unperturbed cubes varied substantially and systematically depending on the optical properties. However, when animated, the ratings were again dominated by the extent of the deformation, and the effect of optical appearance was negligible. Together, our results suggest that to estimate stiffness, the visual system strongly relies on measures of the extent to which an object changes shape in response to forces.

62 citations


Journal ArticleDOI
TL;DR: Significant effects of set size and delay duration are found on both model-free and model-based measures of dispersion; the relative stability of working memory even at higher set sizes is consistent with earlier results for motion direction and spatial frequency.
Abstract: We used a delayed-estimation paradigm to characterize the joint effects of set size (one, two, four, or six) and delay duration (1, 2, 3, or 6 s) on visual working memory for orientation. We conducted two experiments: one with delay durations blocked, another with delay durations interleaved. As dependent variables, we examined four model-free metrics of dispersion as well as precision estimates in four simple models. We tested for effects of delay time using analyses of variance, linear regressions, and nested model comparisons. We found significant effects of set size and delay duration on both model-free and model-based measures of dispersion. However, the effect of delay duration was much weaker than that of set size, dependent on the analysis method, and apparent in only a minority of subjects. The highest forgetting slope found in either experiment at any set size was a modest 1.14°/s. As secondary results, we found a low rate of nontarget reports, and significant estimation biases towards oblique orientations (but no dependence of their magnitude on either set size or delay duration). Relative stability of working memory even at higher set sizes is consistent with earlier results for motion direction and spatial frequency. We compare with a recent study that performed a very similar experiment.
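
One common model-free dispersion measure for such delayed-estimation data is the circular standard deviation of report errors; a minimal sketch follows (angles are doubled so that the 180° orientation space maps onto the full circle). The simulated targets and responses are placeholders.

```python
# Sketch: circular standard deviation of orientation report errors,
# a model-free dispersion measure for delayed-estimation data.
# Orientations live on a 180-deg space, so angles are doubled before wrapping.
import numpy as np

def circular_sd_deg(target_deg, response_deg):
    err = np.deg2rad(2 * (np.asarray(response_deg) - np.asarray(target_deg)))
    R = np.abs(np.mean(np.exp(1j * err)))            # mean resultant length
    return np.rad2deg(np.sqrt(-2 * np.log(R))) / 2   # undo the angle doubling

# Example with made-up data: responses scattered around the true orientations.
rng = np.random.default_rng(0)
targets = rng.uniform(0, 180, 200)
responses = targets + rng.normal(0, 12, 200)         # ~12 deg of report noise
print("dispersion (deg):", circular_sd_deg(targets, responses))
```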

Journal ArticleDOI
TL;DR: It is argued that in this particular experimental condition, the visual system is able to synthesize a higher resolution percept from multiple views of a poorly resolved image, a hypothesis that might extend the current understanding of how fixational eye motion serves high acuity vision.
Abstract: Even during fixation, our eyes are constantly in motion, creating an ever-changing signal in each photoreceptor. Neuronal processes can exploit such transient signals to serve spatial vision, but it is not known how our finest visual acuity-one that we use for deciphering small letters or identifying distant faces and objects-is maintained when confronted with such change. We used an adaptive optics scanning laser ophthalmoscope to precisely control the spatiotemporal input on a photoreceptor scale in human observers during a visual discrimination task under conditions with habitual, cancelled or otherwise manipulated retinal image motion. We found that when stimuli moved, acuities were about 25% better than when no motion occurred, regardless of whether that motion was self-induced, a playback of similar motion, or an external simulation. We argue that in our particular experimental condition, the visual system is able to synthesize a higher resolution percept from multiple views of a poorly resolved image, a hypothesis that might extend the current understanding of how fixational eye motion serves high acuity vision.

Journal ArticleDOI
TL;DR: Findings suggest that individual differences in motion sensitivity reflect decision making and attentional control rather than integration in MT/V5 or V3A, and that its neural underpinnings may be related to Duncan's “multiple-demand” (MD) system.
Abstract: Research in the Visual Development Unit on "dorsal stream vulnerability" (DSV) arose from research in two somewhat different areas. In the first, using cortical milestones for local and global processing from our neurobiological model, we identified cerebral visual impairment in infants in the first year of life. In the second, using photo/videorefraction in population refractive screening programs, we showed that infant spectacle wear could reduce the incidence of strabismus and amblyopia, but many preschool children, who had been significantly hyperopic earlier, showed visuo-motor and attentional deficits. This led us to compare developing dorsal and ventral streams, using sensitivity to global motion and form as signatures, finding deficits in motion sensitivity relative to form in children with Williams syndrome, or perinatal brain injury in hemiplegia or preterm birth. Later research showed that this "DSV" was common across many disorders, both genetic and acquired, from autism to amblyopia. Here, we extend DSV to be a cluster of problems, common to many disorders, including poor motion sensitivity, visuo-motor spatial integration for planning actions, attention, and number skills. In current research, we find that individual differences in motion coherence sensitivity in typically developing children are correlated with MRI measures of area variations in parietal lobe, fractional anisotropy (from TBSS) of the superior longitudinal fasciculus, and performance on tasks of mathematics and visuo-motor integration. These findings suggest that individual differences in motion sensitivity reflect decision making and attentional control rather than integration in MT/V5 or V3A. Its neural underpinnings may be related to Duncan's "multiple-demand" (MD) system.

Journal ArticleDOI
TL;DR: Comparison of the present data with an earlier study by Uchikawa & Boynton (1987) suggests that some changes in the Japanese color lexicon have occurred over the last 30 years.
Abstract: Despite numerous prior studies, important questions about the Japanese color lexicon persist, particularly about the number of Japanese basic color terms and their deployment across color space. Here, 57 native Japanese speakers provided monolexemic terms for 320 chromatic and 10 achromatic Munsell color samples. Through k-means cluster analysis we revealed 16 statistically distinct Japanese chromatic categories. These included eight chromatic basic color terms (aka/red, ki/yellow, midori/green, ao/blue, pink, orange, cha/brown, and murasaki/purple) plus eight additional terms: mizu ("water")/light blue, hada ("skin tone")/peach, kon ("indigo")/dark blue, matcha ("green tea")/yellow-green, enji/maroon, oudo ("sand or mud")/mustard, yamabuki ("globeflower")/gold, and cream. Of these additional terms, mizu was used by 98% of informants, and emerged as a strong candidate for a 12th Japanese basic color term. Japanese and American English color-naming systems were broadly similar, except for color categories in one language (mizu, kon, teal, lavender, magenta, lime) that had no equivalent in the other. Our analysis revealed two statistically distinct Japanese motifs (or color-naming systems), which differed mainly in the extension of mizu across our color palette. Comparison of the present data with an earlier study by Uchikawa & Boynton (1987) suggests that some changes in the Japanese color lexicon have occurred over the last 30 years.
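
The category-extraction step can be sketched schematically: represent each color sample by the distribution of terms informants used for it and cluster the samples with k-means. The naming matrix below is randomly generated and the data layout is an assumption; only the choice of 16 clusters mirrors the reported result.

```python
# Sketch of the category-extraction step: cluster color samples by their
# naming profiles with k-means. Data shapes are placeholders; the study
# derived 16 chromatic categories from 57 informants and 330 samples.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical naming matrix: rows = color samples, columns = color terms,
# entries = proportion of informants using that term for that sample.
naming = np.random.default_rng(0).dirichlet(np.ones(20), size=330)

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(naming)
labels = kmeans.labels_                                   # category per color sample
dominant_term = kmeans.cluster_centers_.argmax(axis=1)    # most-used term per cluster
print(np.bincount(labels), dominant_term)                 # cluster sizes and dominant terms
```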

Journal ArticleDOI
TL;DR: A recent parametric model of texture appearance (the convolutional neural network [CNN] model), which uses the features encoded by a deep CNN (VGG-19), is compared psychophysically with two other models: the venerable Portilla and Simoncelli model and an extension of the CNN model in which the power spectrum is additionally matched.
Abstract: Our visual environment is full of texture-"stuff" like cloth, bark, or gravel as distinct from "things" like dresses, trees, or paths-and humans are adept at perceiving subtle variations in material properties. To investigate image features important for texture perception, we psychophysically compare a recent parametric model of texture appearance (convolutional neural network [CNN] model) that uses the features encoded by a deep CNN (VGG-19) with two other models: the venerable Portilla and Simoncelli model and an extension of the CNN model in which the power spectrum is additionally matched. Observers discriminated model-generated textures from original natural textures in a spatial three-alternative oddity paradigm under two viewing conditions: when test patches were briefly presented to the near-periphery ("parafoveal") and when observers were able to make eye movements to all three patches ("inspection"). Under parafoveal viewing, observers were unable to discriminate 10 of 12 original images from CNN model images, and remarkably, the simpler Portilla and Simoncelli model performed slightly better than the CNN model (11 textures). Under foveal inspection, matching CNN features captured appearance substantially better than the Portilla and Simoncelli model (nine compared to four textures), and including the power spectrum improved appearance matching for two of the three remaining textures. None of the models we test here could produce indiscriminable images for one of the 12 textures under the inspection condition. While deep CNN (VGG-19) features can often be used to synthesize textures that humans cannot discriminate from natural textures, there is currently no uniformly best model for all textures and viewing conditions.
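
The CNN texture model belongs to the family of Gram-matrix texture descriptors computed from VGG-19 feature maps; a compact sketch of computing such descriptors with torchvision follows. The particular layers chosen are an illustrative assumption, and synthesis itself (optimizing an image to match these statistics) is only indicated in a comment.

```python
# Sketch: Gram-matrix texture descriptors from VGG-19 feature maps
# (the family of features behind the CNN texture model; layer choice is illustrative).
import torch
from torchvision import models

vgg = models.vgg19(weights="DEFAULT").features.eval()
layers_of_interest = {1, 6, 11, 20, 29}          # a few conv outputs across depth

def gram_matrices(image):
    """image: (1, 3, H, W) tensor, ImageNet-normalized."""
    grams, x = [], image
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in layers_of_interest:
                b, c, h, w = x.shape
                f = x.reshape(c, h * w)
                grams.append((f @ f.T) / (c * h * w))   # channel-by-channel correlations
    return grams

# Texture synthesis (not shown) would optimize a noise image so that its
# Gram matrices match those of the target texture at every chosen layer.
example = torch.randn(1, 3, 256, 256)
print([g.shape for g in gram_matrices(example)])
```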

Journal ArticleDOI
TL;DR: This study reports on data from ∼13,000 observers who were surveyed online and shows that assumptions about the illumination of the dress affect the subjective interpretation of observers, compared to demographic factors, such as age or gender, which have a relatively smaller influence.
Abstract: There has been considerable interest in a stimulus ("the dress") that yields starkly divergent subjective color percepts between observers. It has been proposed that individual differences in the subjective interpretation of this stimulus are due to the different assumptions that individuals make about how the dress was illuminated. In this study, we address this possible explanation empirically by reporting on data from ∼13,000 observers who were surveyed online. We show that assumptions about the illumination of the dress-i.e., whether the stimulus was illuminated by natural or artificial light or whether it was in a shadow-strongly affects the subjective interpretation of observers, compared to demographic factors, such as age or gender, which have a relatively smaller influence. We interpret these findings in a Bayesian framework by also showing that prior exposure to long- or short-wavelength lights due to circadian type shapes the subjective experience of the dress stimulus in theoretically expected ways.

Journal ArticleDOI
TL;DR: It is shown that the perceived color of the dress is negatively correlated with the assumed illumination along the daylight locus; this finding confirms the idea that the perceived colors of the dress depend on the assumptions about the illumination.
Abstract: Millions of Internet users around the world challenged science by asking why a certain photo of a dress led different observers to have surprisingly different judgments about the color of the dress. The reason this particular photo produces so diverse a variety of judgments presumably is that the photo allows a variety of interpretations about the illumination of the dress. The most obvious explanation from color science should be that observers have different implicit assumptions about the illumination in the photo. We show that the perceived color of the dress is negatively correlated with the assumed illumination along the daylight locus. Moreover, by manipulating the observers' assumptions prior to seeing the photo, we can steer how observers will see the colors of the dress. These findings confirm the idea that the perceived colors of the dress depend on the assumptions about the illumination. The phenomenon illustrates the power of unconscious inferences and implicit assumptions in perception.

Journal ArticleDOI
TL;DR: Overall, these findings reveal that an engagement of working memory can have an impact on microsaccadic rate, consistent with the view that microsaccade generation is pervious to top-down processes.
Abstract: Microsaccades are tiny eye movements that individuals perform unconsciously during fixation. Although the nature and functions of microsaccades are still actively debated, recent evidence has shown an association between these micro eye movements and higher-order cognitive processes. Here, in two experiments, we specifically focused on working memory and addressed whether differential memory load could be reflected in a modulation of microsaccade dynamics. In Experiment 1, participants memorized a numerical sequence composed of either two (low-load condition) or five digits (high-load condition), appearing at fixation. The results showed a reduction in the microsaccadic rate in the high-load compared to the low-load condition. In Experiment 2, five red or green digits were always presented at fixation. Participants either memorized the color (low-load condition) or the five digits (high-load condition). Hence, visual stimuli were exactly the same in both conditions. Consistent with Experiment 1, microsaccadic rate was lower in the high-load than in the low-load condition. Overall, these findings reveal that an engagement of working memory can have an impact on microsaccadic rate, consistent with the view that microsaccade generation is pervious to top-down processes.

Journal ArticleDOI
TL;DR: The findings suggest that participants integrate shape, motion, and optical cues to infer stiffness, with optical cues playing a major role for this range of stimuli.
Abstract: Visually inferring the stiffness of objects is important for many tasks but is challenging because, unlike optical properties (e.g., gloss), mechanical properties do not directly affect image values. Stiffness must be inferred either (a) by recognizing materials and recalling their properties (associative approach) or (b) from shape and motion cues when the material is deformed (estimation approach). Here, we investigated interactions between these two inference types. Participants viewed renderings of unfamiliar shapes with 28 materials (e.g., nickel, wax, cork). In Experiment 1, they viewed nondeformed, static versions of the objects and rated 11 material attributes (e.g., soft, fragile, heavy). The results confirm that the optical materials elicited a wide range of apparent properties. In Experiment 2, using a blue plastic material with intermediate apparent softness, the objects were subjected to physical simulations of 12 shape-transforming processes (e.g., twisting, crushing, stretching). Participants rated softness and extent of deformation. Both correlated with the physical magnitude of deformation. Experiment 3 combined variations in optical cues with shape cues. We find that optical cues completely dominate. Experiment 4 included the entire motion sequence of the deformation, yielding significant contributions of optical as well as motion cues. Our findings suggest participants integrate shape, motion, and optical cues to infer stiffness, with optical cues playing a major role for our range of stimuli.

Journal ArticleDOI
TL;DR: The eidolon factory is an algorithm that generates stimuli in a meaningful and transparent way, establishing equivalence classes of appearance along meaningful dimensions; results for the example of tarachopic amblyopia show that scrambled vision is indeed an apt interpretation.
Abstract: Meanings and qualities are fundamental attributes of visual awareness. We propose "eidolons" as a tool for establishing equivalence classes of appearance along meaningful dimensions. The "eidolon factory" is an algorithm that generates stimuli in such a meaningful and transparent way. The algorithm allows us to focus on location, scale, and size of perceptually salient structures, proto-objects, and perhaps even semantics rather than global overall parameters, such as contrast and spatial frequency. The eidolon factory is based on models of the psychogenesis of visual awareness. It affects the image in terms of the disruption of image structure across space and spatial scales. This is a very general method with many potential applications. We illustrate a few instances. We present results for the example of tarachopic amblyopia, showing that scrambled vision is indeed an apt interpretation.
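
As a toy illustration only of what "disruption of image structure across space" can look like, the sketch below applies a smooth random displacement field to an image at a single scale. The real eidolon factory operates over multiple spatial scales in a principled way; the function name and parameters here are made up for illustration.

```python
# Toy illustration only: smooth local "disarray" of an image via a low-pass
# random displacement field. The real eidolon factory disrupts structure
# across multiple spatial scales; this shows only the single-scale flavor.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def disarray(image, reach=8.0, grain=16.0, seed=0):
    """reach: displacement amplitude (pixels); grain: smoothness of the field."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    dy, dx = (gaussian_filter(rng.standard_normal((h, w)), grain) for _ in range(2))
    for d in (dy, dx):
        d *= reach / (np.abs(d).max() + 1e-12)       # scale field to the requested reach
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(image, [yy + dy, xx + dx], order=1, mode='reflect')

# Example: apply weak and strong disarray to a synthetic grid image.
img = np.zeros((128, 128)); img[::16, :] = 1.0; img[:, ::16] = 1.0
weak, strong = disarray(img, reach=2), disarray(img, reach=10)
```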

Journal ArticleDOI
TL;DR: This work implements a standard early spatial vision model in an image-computable way, allowing it to take arbitrary luminance images as input, and shows that contrast gain-control with the fitted parameters results in a very sparse encoding of luminance information, in line with notions from efficient coding.
Abstract: A large part of classical visual psychophysics was concerned with the fundamental question of how pattern information is initially encoded in the human visual system. From these studies a relatively standard model of early spatial vision emerged, based on spatial frequency and orientation-specific channels followed by an accelerating nonlinearity and divisive normalization: contrast gain-control. Here we implement such a model in an image-computable way, allowing it to take arbitrary luminance images as input. Testing our implementation on classical psychophysical data, we find that it explains contrast detection data including the ModelFest data, contrast discrimination data, and oblique masking data, using a single set of parameters. Leveraging the advantage of an image-computable model, we test our model against a recent dataset using natural images as masks. We find that the model explains these data reasonably well, too. To explain data obtained at different presentation durations, our model requires different parameters to achieve an acceptable fit. In addition, we show that contrast gain-control with the fitted parameters results in a very sparse encoding of luminance information, in line with notions from efficient coding. Translating the standard early spatial vision model to be image-computable resulted in two further insights: First, the nonlinear processing requires a denser sampling of spatial frequency and orientation than optimal coding suggests. Second, the normalization needs to be fairly local in space to fit the data obtained with natural image masks. Finally, our image-computable model can serve as tool in future quantitative analyses: It allows optimized stimuli to be used to test the model and variants of it, with potential applications as an image-quality metric. In addition, it may serve as a building block for models of higher level processing.
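
The ingredients named here can be sketched in a few lines: an orientation- and frequency-tuned filter bank (Gabor filters), an accelerating nonlinearity, and divisive normalization across the channel responses at each location. The filter parameters and exponents below are illustrative choices, not the fitted values from the paper.

```python
# Sketch of the standard early-vision pipeline: oriented band-pass channels,
# an accelerating nonlinearity, and divisive contrast gain-control.
# Parameters and exponents are illustrative, not the paper's fitted values.
import numpy as np
from scipy.signal import fftconvolve

def gabor(size, wavelength, theta, sigma):
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * xr / wavelength)

def channel_responses(image, wavelengths=(4, 8, 16), n_orient=8, size=31):
    resp = []
    for lam in wavelengths:
        for k in range(n_orient):
            g = gabor(size, lam, np.pi * k / n_orient, sigma=lam * 0.6)
            resp.append(fftconvolve(image, g, mode='same'))
    return np.stack(resp)                      # (channels, H, W)

def gain_control(resp, p=2.4, q=2.0, c50=0.1):
    """Accelerating nonlinearity plus divisive normalization over channels."""
    excit = np.abs(resp) ** p
    pool = c50 ** q + np.sum(np.abs(resp) ** q, axis=0, keepdims=True)
    return excit / pool

image = np.random.default_rng(0).standard_normal((128, 128)) * 0.1
output = gain_control(channel_responses(image))
print(output.shape)                            # (24, 128, 128)
```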

Journal ArticleDOI
TL;DR: Results indeed confirm that contextual cues predominantly affect white perceivers, and provide direct support for the idea that the ambiguity in the perceived color of the dress can be explained by the different assumptions that people have about the illumination chromaticity in the foreground of the scene.
Abstract: We investigated whether people who report different colors for #thedress do so because they have different assumptions about the illumination in #thedress scene. We introduced a spherical illumination probe (Koenderink, Pont, van Doorn, Kappers, & Todd, 2007) into the original photograph, placed in fore-, or background of the scene and-for each location-let observers manipulate the probe's chromaticity, intensity and the direction of the illumination. Their task was to adjust the probe such that it would appear as a white sphere in the scene. When the probe was located in the foreground, observers who reported the dress to be white (white perceivers) tended to produce bluer adjustments than observers who reported it as blue (blue perceivers). Blue perceivers tended to perceive the illumination as less chromatic. There were no differences in chromaticity settings between perceiver types for the probe placed in the background. Perceiver types also did not differ in their illumination intensity and direction estimates across probe locations. These results provide direct support for the idea that the ambiguity in the perceived color of the dress can be explained by the different assumptions that people have about the illumination chromaticity in the foreground of the scene. In a second experiment we explore the possibility that blue perceivers might overall be less sensitive to contextual cues, and measure white and blue perceivers' dress color matches and labels for manipulated versions of the original photo. Results indeed confirm that contextual cues predominantly affect white perceivers.

Journal ArticleDOI
TL;DR: A robust oblique effect is revealed, whereby asymptotic performance for oblique orientations was substantially lower than for cardinal orientations, which is interpreted as the result of multiplicative attenuation of contrast responses for oblique orientations.
Abstract: Orientation perception is not comparable across all orientations-a phenomenon commonly referred to as the oblique effect. Here, we first assessed the interaction between stimulus contrast and the oblique effect. Specifically, we examined whether the impairment in behavioral performance for oblique versus cardinal orientations is best explained by a contrast or a response gain modulation of the contrast psychometric function. Results revealed a robust oblique effect, whereby asymptotic performance for oblique orientations was substantially lower than for cardinal orientations, which we interpret as the result of multiplicative attenuation of contrast responses for oblique orientations. Next, we assessed how orientation anisotropies interact with attention by measuring psychometric functions for orientations under low or high attentional load. Interestingly, attentional load affects the performance for cardinal and oblique orientations differently: While it multiplicatively attenuates contrast psychometric functions for both cardinal and oblique orientation conditions, the magnitude of this effect is greater for the obliques. Thus, having less attentional resources available seems to impair the response for oblique orientations to a larger degree than for cardinal orientations.
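
The contrast-gain versus response-gain distinction can be made concrete by fitting a psychometric function in which the two accounts change different parameters (the semisaturation contrast versus the asymptote). The sketch below uses scipy's curve_fit on placeholder proportion-correct data; the functional form and numbers are assumptions for illustration.

```python
# Sketch: distinguishing response gain (lower asymptote) from contrast gain
# (shifted semisaturation contrast) by fitting a psychometric function to
# proportion correct at each contrast. Data arrays are placeholders.
import numpy as np
from scipy.optimize import curve_fit

def pf(contrast, c50, n, asymptote, chance=0.5):
    """Naka-Rushton-shaped psychometric function rising from chance to asymptote."""
    gain = contrast**n / (contrast**n + c50**n)
    return chance + (asymptote - chance) * gain

contrast = np.array([0.02, 0.04, 0.08, 0.16, 0.32, 0.64])
p_cardinal = np.array([0.55, 0.68, 0.86, 0.95, 0.97, 0.98])   # placeholder data
p_oblique  = np.array([0.53, 0.62, 0.75, 0.84, 0.86, 0.87])

fits = {}
for name, p in (("cardinal", p_cardinal), ("oblique", p_oblique)):
    popt, _ = curve_fit(pf, contrast, p, p0=[0.08, 2.0, 0.95],
                        bounds=([0.001, 0.5, 0.5], [1.0, 8.0, 1.0]))
    fits[name] = dict(zip(("c50", "slope", "asymptote"), popt))
print(fits)   # a lower oblique asymptote with similar c50 indicates response gain
```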


Journal ArticleDOI
TL;DR: Observers do not have strong initial priors for distribution shapes and quickly learn simple ones but have the ability to adjust their representations to more complex feature distributions as information accumulates with further repetitions of the same distractor distribution.
Abstract: We recently demonstrated that observers are capable of encoding not only summary statistics, such as mean and variance of stimulus ensembles, but also the shape of the ensembles. Here, for the first time, we show the learning dynamics of this process, investigate the possible priors for the distribution shape, and demonstrate that observers are able to learn more complex distributions, such as bimodal ones. We used speeding and slowing of response times between trials (intertrial priming) in visual search for an oddly oriented line to assess internal models of distractor distributions. Experiment 1 demonstrates that two repetitions are sufficient for enabling learning of the shape of uniform distractor distributions. In Experiment 2, we compared Gaussian and uniform distractor distributions, finding that following only two repetitions Gaussian distributions are represented differently than uniform ones. Experiment 3 further showed that when distractor distributions are bimodal (with a 30° distance between two uniform intervals), observers initially treat them as uniform, and only with further repetitions do they begin to treat the distributions as bimodal. In sum, observers do not have strong initial priors for distribution shapes and quickly learn simple ones but have the ability to adjust their representations to more complex feature distributions as information accumulates with further repetitions of the same distractor distribution.

Journal ArticleDOI
TL;DR: A surprising difference was found between the two types of movements: Although saccades targeted the physical location of the flashes, pointing movements were strongly biased toward the perceived location (about 63% of the perceptual illusion).
Abstract: Our visual system allows us to localize objects in the world and plan motor actions toward them. We have recently shown that the localization of moving objects differs between perception and saccadic eye movements (Lisi & Cavanagh, 2015), suggesting different localization mechanisms for perception and action. This finding, however, could reflect a unique feature of the saccade system rather than a general dissociation between perception and action. To disentangle these hypotheses, we compared object localization between saccades and hand movements. We flashed brief targets on top of double-drift stimuli (moving Gabors with the internal pattern drifting orthogonally to their displacement, inducing large distortions in perceived location and direction) and asked participants to point or make saccades to them. We found a surprising difference between the two types of movements: Although saccades targeted the physical location of the flashes, pointing movements were strongly biased toward the perceived location (about 63% of the perceptual illusion). The same bias was found when pointing movements were made in open-loop conditions (without vision of the hand). These results indicate that dissociations are present between different types of actions (not only between action and perception) and that visual processing for saccadic eye movements differs from that for other actions. Because the position bias in the double-drift stimulus depends on a persisting influence of past sensory signals, we suggest that spatial maps for saccades might reflect only recent, short-lived signals, and the spatial representations supporting conscious perception and hand movements integrate visual input over longer temporal intervals.

Journal ArticleDOI
TL;DR: It is suggested that presaccadic attention exerts its influence on vision in a spatially and feature-selective manner, enhancing performance and sharpening feature tuning at the future gaze location before the eyes start moving.
Abstract: Saccadic eye movements cause a rapid sweep of the visual image across the retina and bring the saccade's target into high-acuity foveal vision. Even before saccade onset, visual processing is selectively prioritized at the saccade target. To determine how this presaccadic attention shift exerts its influence on visual selection, we compare the dynamics of perceptual tuning curves before movement onset at the saccade target and in the opposite hemifield. Participants monitored a 30-Hz sequence of randomly oriented gratings for a target orientation. Combining a reverse correlation technique previously used to study orientation tuning in neurons and general additive mixed modeling, we found that perceptual reports were tuned to the target orientation. The gain of orientation tuning increased markedly within the last 100 ms before saccade onset. In addition, we observed finer orientation tuning right before saccade onset. This increase in gain and tuning occurred at the saccade target location and was not observed at the incongruent location in the opposite hemifield. The present findings suggest, therefore, that presaccadic attention exerts its influence on vision in a spatially and feature-selective manner, enhancing performance and sharpening feature tuning at the future gaze location before the eyes start moving.
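
The reverse-correlation logic can be illustrated with a toy analysis: compare the distribution of orientations shown a fixed lag before each response with the overall orientation distribution, which yields a behavioral tuning curve whose height (gain) and width (tuning) can be compared across conditions. Everything below is simulated placeholder data, not the paper's reverse-correlation-plus-additive-modeling analysis.

```python
# Toy reverse-correlation analysis: orientations presented shortly before a
# response, compared against the overall orientation distribution, yield a
# behavioral tuning curve. All data below are simulated placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_frames, lag = 20000, 6                 # rapid stream; responses lag the driving frame
orients = rng.uniform(-90, 90, n_frames)
# Simulated observer: responses are more likely ~6 frames after target-like (0 deg) frames.
drive = np.exp(-0.5 * (orients / 15.0) ** 2)
responses = rng.random(n_frames) < 0.002 + 0.05 * np.roll(drive, lag)

idx = np.flatnonzero(responses)
idx = idx[idx >= lag]                    # drop responses too early to have a valid lag
triggers = orients[idx - lag]            # orientations shown just before each response
bins = np.linspace(-90, 90, 19)
tuning = (np.histogram(triggers, bins, density=True)[0]
          / np.histogram(orients, bins, density=True)[0])
print(np.round(tuning, 2))               # peaks near 0 deg; peak height ~ gain, width ~ tuning
```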

Journal ArticleDOI
TL;DR: In this article, the authors explore the plausibility of the central claims of a modular, reward-based account of gaze allocation in the context of a task where subjects walk through a virtual environment performing interception, avoidance, and path following.
Abstract: While it is universally acknowledged that both bottom up and top down factors contribute to allocation of gaze, we currently have limited understanding of how top-down factors determine gaze choices in the context of ongoing natural behavior. One purely top-down model by Sprague, Ballard, and Robinson (2007) suggests that natural behaviors can be understood in terms of simple component behaviors, or modules, that are executed according to their reward value, with gaze targets chosen in order to reduce uncertainty about the particular world state needed to execute those behaviors. We explore the plausibility of the central claims of this approach in the context of a task where subjects walk through a virtual environment performing interceptions, avoidance, and path following. Many aspects of both walking direction choices and gaze allocation are consistent with this approach. Subjects use gaze to reduce uncertainty for task-relevant information that is used to inform action choices. Notably the addition of motion to peripheral objects did not affect fixations when the objects were irrelevant to the task, suggesting that stimulus saliency was not a major factor in gaze allocation. The modular approach of independent component behaviors is consistent with the main aspects of performance, but there were a number of deviations suggesting that modules interact. Thus the model forms a useful, but incomplete, starting point for understanding top-down factors in active behavior.

Journal ArticleDOI
TL;DR: It is shown that the advantage of peripheral vision in scene recognition, as well as the efficiency advantage of central vision, can be replicated using state-of-the-art deep neural network models, and support is provided for the hypothesis that the peripheral advantage comes from the inherent usefulness of peripheral features.
Abstract: What are the roles of central and peripheral vision in human scene recognition? Larson and Loschky (2009) showed that peripheral vision contributes more than central vision in obtaining maximum scene recognition accuracy. However, central vision is more efficient for scene recognition than peripheral, based on the amount of visual area needed for accurate recognition. In this study, we model and explain the results of Larson and Loschky (2009) using a neurocomputational modeling approach. We show that the advantage of peripheral vision in scene recognition, as well as the efficiency advantage for central vision, can be replicated using state-of-the-art deep neural network models. In addition, we propose and provide support for the hypothesis that the peripheral advantage comes from the inherent usefulness of peripheral features. This result is consistent with data presented by Thibaut, Tran, Szaffarczyk, and Boucart (2014), who showed that patients with central vision loss can still categorize natural scenes efficiently. Furthermore, by using a deep mixture-of-experts model ("The Deep Model," or TDM) that receives central and peripheral visual information on separate channels simultaneously, we show that the peripheral advantage emerges naturally in the learning process: When trained to categorize scenes, the model weights the peripheral pathway more than the central pathway. As we have seen in our previous modeling work, learning creates a transform that spreads different scene categories into different regions in representational space. Finally, we visualize the features for the two pathways, and find that different preferences for scene categories emerge for the two pathways during the training process.
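
A schematic of a two-pathway mixture-of-experts is sketched below: separate "central" and "peripheral" channels feed per-pathway experts whose class logits are combined by a learned gate. Layer sizes and the gating scheme are placeholders; this is not the TDM implementation described in the paper.

```python
# Schematic two-pathway mixture-of-experts: separate "central" and "peripheral"
# channels feed per-pathway experts whose logits are mixed by a learned gate.
# Layer sizes are placeholders; this is not the paper's TDM implementation.
import torch
import torch.nn as nn

class TwoPathwayMoE(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        def expert():
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))
        self.central, self.peripheral = expert(), expert()
        self.gate = nn.Sequential(nn.Linear(2 * n_classes, 2), nn.Softmax(dim=1))

    def forward(self, central_img, peripheral_img):
        zc, zp = self.central(central_img), self.peripheral(peripheral_img)
        w = self.gate(torch.cat([zc, zp], dim=1))          # pathway weights per image
        return w[:, :1] * zc + w[:, 1:] * zp               # weighted class logits

model = TwoPathwayMoE()
central = torch.randn(4, 3, 64, 64)      # e.g., a foveal crop
peripheral = torch.randn(4, 3, 64, 64)   # e.g., the scene with the center masked
print(model(central, peripheral).shape)  # torch.Size([4, 10])
```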

Journal ArticleDOI
TL;DR: The size and shape of the FOV cover the region of the visual field that contains the information relevant for reading English; because they vary between subjects, they may prove useful in predicting behavioral aspects of reading.
Abstract: Skilled reading requires rapidly recognizing letters and word forms; people learn this skill best for words presented in the central visual field. Measurements over the last decade have shown that when children learn to read, responses within ventral occipito-temporal cortex (VOT) become increasingly selective to word forms. We call these regions the VOT reading circuitry (VOTRC). The portion of the visual field that evokes a response in the VOTRC is called the field of view (FOV). We measured the FOV of the VOTRC and found that it is a small subset of the entire field of view available to the human visual system. For the typical subject, the FOV of the VOTRC in each hemisphere is contralaterally and foveally biased. The FOV of the left VOTRC extends ∼9° into the right visual field and ∼4° into the left visual field along the horizontal meridian. The FOV of the right VOTRC is roughly mirror symmetric to that of the left VOTRC. The size and shape of the FOV covers the region of the visual field that contains relevant information for reading English. It may be that the size and shape of the FOV, which varies between subjects, will prove useful in predicting behavioral aspects of reading.