
Showing papers on "Visual perception published in 2019"


Journal ArticleDOI
09 Aug 2019-Science
TL;DR: This study developed and implemented several key technological advances that together enable writing neural activity into dozens of single neurons in mouse V1 at physiological time scales and developed an experimental approach to drive large numbers of individually specified neurons, distributed across V1 volumes and targeted on the basis of natural response-selectivity properties observed during specific visual stimuli.
Abstract: INTRODUCTION: Perceptual experiences in mammals may arise from patterns of neural circuit activity in cerebral cortex. For example, primary visual cortex (V1) is causally capable of initiating visual perception; in human neurosurgery patients, V1 electrical microstimulation has been reported to elicit basic visual percepts including spots of light, patterns, shapes, motions, and colors. Related phenomena have been studied in laboratory animals using similar electrical stimulation procedures, although detailed investigation has been difficult because studies of percept initiation in cortex have not involved groups of neurons individually selected for stimulation. Therefore, it is not clear how different percepts arise in cortex, nor why some stimuli fail to generate perceptual experiences. Answering these questions will require working with basic cellular elements within cortical circuit architecture during perception.

RATIONALE: To understand how circuits in V1 are specifically involved in visual perception, it is essential to probe, at the most basic cellular level, how behaviorally consequential percepts are initiated and maintained. In this study, we developed and implemented several key technological advances that together enable writing neural activity into dozens of single neurons in mouse V1 at physiological time scales. These methods also enabled us to simultaneously read out the impact of this stimulation on downstream network activity across hundreds of nearby neurons. Successful training of alert mice to discriminate the precisely defined circuit inputs enabled systematic investigation of basic cortical dynamics underlying perception.

RESULTS: We developed an experimental approach to drive large numbers of individually specified neurons, distributed across V1 volumes and targeted on the basis of natural response-selectivity properties observed during specific visual stimuli (movies of drifting horizontal or vertical gratings). To implement this approach, we built an optical read-write system capable of kilohertz speed, millimeter-scale lateral scope, and three-dimensional (3D) access across superficial to deep layers of cortex to tens or hundreds of individually specified neurons. This system was integrated with an unusual microbial opsin gene identified by crystal structure–based genome mining: ChRmine, named after the deep-red color carmine. This newly identified opsin confers properties crucial for cellular-resolution percept-initiation experiments: red-shifted light sensitivity, extremely large photocurrents alongside millisecond spike-timing fidelity, and compatibility with simultaneous two-photon Ca2+ imaging. Using ChRmine together with custom holographic devices to create arbitrarily specified light patterns, we were able to measure naturally occurring large-scale 3D ensemble activity patterns during visual experience and then replay these natural patterns at the level of many individually specified cells. We found that driving specific ensembles of cells on the basis of natural stimulus selectivity resulted in recruitment of a broad network with dynamical patterns corresponding to those elicited by real visual stimuli and also gave rise to the correctly selective behaviors even in the absence of visual input. This approach allowed mapping of the cell numbers, layers, network dynamics, and adaptive events underlying generation of behaviorally potent percepts in neocortex, via precise control over naturally occurring, widely distributed, and finely resolved temporal parameters and cellular elements of the corresponding neural representations.

CONCLUSION: The cortical population dynamics that emerged after optogenetic stimulation both predicted the correctly elicited behavior and mimicked the natural neural representations of visual stimuli.
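
As a rough illustration of the ensemble-targeting step described above (choosing cells by their natural selectivity for vertical versus horizontal grating movies), the following minimal Python sketch computes a selectivity index from trial-averaged responses and picks the most selective cells for each stimulation ensemble. The array contents, index definition, and ensemble size are illustrative assumptions, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)
# Simulated trial-averaged responses of 500 neurons to two grating movies (assumed data).
resp_vert = rng.gamma(2.0, 1.0, size=500)
resp_horz = rng.gamma(2.0, 1.0, size=500)

# Simple selectivity index in [-1, 1]: +1 prefers vertical, -1 prefers horizontal.
si = (resp_vert - resp_horz) / (resp_vert + resp_horz + 1e-9)

k = 30  # illustrative ensemble size
vert_ensemble = np.argsort(si)[-k:]   # most vertical-preferring cells
horz_ensemble = np.argsort(si)[:k]    # most horizontal-preferring cells
print(vert_ensemble[:5], horz_ensemble[:5])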

396 citations


Journal ArticleDOI
03 May 2019-Science
TL;DR: An artificial neural network built to model the behavior of the target visual system was used to construct images predicted to either broadly activate large populations of neurons or selectively activate one population while keeping the others unchanged, demonstrating that these models can partially generalize and provide a shareable way to embed collective knowledge of visual processing.
Abstract: INTRODUCTION: The pattern of light that strikes the eyes is processed and re-represented via patterns of neural activity in a “deep” series of six interconnected cortical brain areas called the ventral visual stream. Visual neuroscience research has revealed that these patterns of neural activity underlie our ability to recognize objects and their relationships in the world. Recent advances have enabled neuroscientists to build ever more precise models of this complex visual processing. Currently, the best such models are particular deep artificial neural network (ANN) models in which each brain area has a corresponding model layer and each brain neuron has a corresponding model neuron. Such models are quite good at predicting the responses of brain neurons, but their contribution to an understanding of primate visual processing remains controversial.

RATIONALE: These ANN models have at least two potential limitations. First, because they aim to be high-fidelity computerized copies of the brain, the total set of computations performed by these models is difficult for humans to comprehend in detail. In that sense, each model seems like a “black box,” and it is unclear what form of understanding has been achieved. Second, the generalization ability of these models has been questioned because they have only been tested on visual stimuli that are similar to those used to “teach” the models. Our goal was to assess both of these potential limitations through nonhuman primate neurophysiology experiments in a mid-level visual brain area. We sought to answer two questions: (i) Despite these ANN models’ opacity to simple “understanding,” is the knowledge embedded in them already useful for a potential application (i.e., neural activity control)? (ii) Do these models accurately predict brain responses to novel images?

RESULTS: We conducted several closed-loop neurophysiology experiments: After matching model neurons to each of the recorded brain neural sites, we used the model to synthesize entirely novel “controller” images based on the model’s implicit knowledge of how the ventral visual stream works. We then presented those images to each subject to test the model’s ability to control the subject’s neurons. In one test, we asked the model to try to control each brain neuron so strongly as to activate it beyond its typically observed maximal activation level. We found that the model-generated synthetic stimuli successfully drove 68% of neural sites beyond their naturally observed activation levels (chance level is 1%). In an even more stringent test, the model revealed that it is capable of selectively controlling an entire neural subpopulation, activating a particular neuron while simultaneously inactivating the other recorded neurons (76% success rate; chance is 1%). Next, we used these non-natural synthetic controller images to ask whether the model’s ability to predict the brain responses would hold up for these highly novel images. We found that the model was indeed quite accurate, predicting 54% of the image-evoked patterns of brain response (chance level is 0%), but it is clearly not yet perfect.

CONCLUSION: Even though the nonlinear computations of deep ANN models of visual processing are difficult to accurately summarize in a few words, they nonetheless provide a shareable way to embed collective knowledge of visual processing, and they can be refined by new knowledge. Our results demonstrate that the currently embedded knowledge already has potential application value (neural control) and that these models can partially generalize outside the world in which they “grew up.” Our results also show that these models are not yet perfect and that more accurate ANN models would produce even more precise neural control. Such noninvasive neural control is not only a potentially powerful tool in the hands of neuroscientists but also could lead to a new class of therapeutic applications.
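
The "stretch" test described above (driving a model neuron beyond its natural maximum by synthesizing an image) is, at its core, gradient ascent on the pixels of an input image with respect to one model unit's activation. The sketch below illustrates that idea with a toy, randomly initialized network standing in for the brain-matched ANN; the architecture, target unit, and optimization settings are assumptions for illustration, not the authors' model.

import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for the brain-matched ANN; one output unit plays the "target neural site".
model = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
model.eval()
target_unit = 3                                             # arbitrary unit to "stretch"

img = (0.1 * torch.randn(1, 3, 64, 64)).requires_grad_()   # start near a gray image
opt = torch.optim.Adam([img], lr=0.05)
for step in range(200):
    opt.zero_grad()
    activation = model(img)[0, target_unit]
    (-activation).backward()                                # gradient ascent on the unit's response
    opt.step()
    with torch.no_grad():
        img.clamp_(-1.0, 1.0)                               # keep pixels in an assumed input range
print(float(model(img)[0, target_unit]))                    # activation after synthesis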

265 citations


Journal ArticleDOI
TL;DR: An artificial optoelectronic neuromorphic device array to emulate the light‐adaptable synaptic functions (photopic and scotopic adaptation) of the biological visual perception system is presented and successfully demonstrates diverse visual synaptic functions such as phototriggered short‐term plasticity, long‐term potentiation, and neural facilitation.
Abstract: Emulating the biological visual perception system typically requires a complex architecture that integrates an artificial retina and optic nerves with various synaptic behaviors. However, self-adaptive synaptic behaviors, by which the visual nerves adjust to environmental light intensities, have remained one of the serious challenges for artificial visual perception systems. Here, an artificial optoelectronic neuromorphic device array that emulates the light-adaptable synaptic functions (photopic and scotopic adaptation) of the biological visual perception system is presented. By employing an artificial visual perception circuit comprising a metal chalcogenide photoreceptor transistor and a metal oxide synaptic transistor, the optoelectronic neuromorphic device successfully demonstrates diverse visual synaptic functions such as phototriggered short-term plasticity, long-term potentiation, and neural facilitation. More importantly, the environment-adaptable perception behaviors at various levels of light illumination are well reproduced by adjusting the load transistor in the circuit, exhibiting the variable dynamic ranges of the biological system. This development paves the way toward fabricating an environment-adaptable artificial visual perception system, with profound implications for future neuromorphic electronics.

149 citations


Journal ArticleDOI
TL;DR: There is a large overlap in neural processing during perception and imagery: neural representations of imagined and perceived stimuli are similar in the visual, parietal, and frontal cortex, and perception and imagery seem to rely on similar top-down connectivity.

143 citations


Journal ArticleDOI
TL;DR: The results suggest that the laterality of visuospatial attention affects the sensorimotor gating system depending on the attentional condition, and that visual information processing may differ between the left and right hemispheres.
Abstract: The integration of multiple sensory modalities allows us to adapt to the environment of the outside world. It is widely known that visual stimuli interfere with the processing of auditory information, which is involved in the ability to pay attention. Additionally, visuospatial attention has the characteristic of laterality. It is unclear whether this laterality of visuospatial attention affects the processing of auditory stimuli. The sensorimotor gating system is a neurological process that filters unnecessary environmental stimuli out of processing in the brain. Prepulse inhibition (PPI) is an operational measure of the sensorimotor gating system, in which a weaker prestimulus (prepulse), such as a visual stimulus, inhibits the startle reflex elicited by a subsequent robust startling stimulus (pulse), such as a tone. Therefore, we investigated whether a visual stimulus from the left or right visual space affects the sensorimotor gating system in a "rest" task (low attentional condition) and a "selective attention" task (high attentional condition). In the selective attention task, we found that the target prepulse presented in the left and bilateral visual fields suppressed the startle reflex more than that presented in the right visual field. By contrast, there was no laterality of PPI in the no-target prepulse condition, and no laterality of PPI in the rest task. These results suggest that the laterality of visuospatial attention affects the sensorimotor gating system depending on the attentional condition. Moreover, visual information processing may differ between the left and right hemispheres.
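
PPI itself is quantified as the percentage reduction of the startle response caused by the prepulse; a minimal computation, with hypothetical amplitude values, is shown below.

def prepulse_inhibition(pulse_alone_amp, prepulse_pulse_amp):
    """Percent reduction of the startle reflex caused by the prepulse."""
    return 100.0 * (pulse_alone_amp - prepulse_pulse_amp) / pulse_alone_amp

# Hypothetical mean startle amplitudes (arbitrary EMG units).
print(prepulse_inhibition(850.0, 510.0))  # 40.0 -> 40% inhibition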

117 citations


Journal ArticleDOI
TL;DR: An overview of eye-tracking technology, the perceptual and cognitive processes involved in medical interpretation, how eye tracking has been employed to understand medical interpretation and promote medical education and training, and some of the promises and challenges for future applications of this technology are provided.
Abstract: Inspecting digital imaging for primary diagnosis introduces perceptual and cognitive demands for physicians tasked with interpreting visual medical information and arriving at appropriate diagnoses and treatment decisions. The process of medical interpretation and diagnosis involves a complex interplay between visual perception and multiple cognitive processes, including memory retrieval, problem-solving, and decision-making. Eye-tracking technologies are becoming increasingly available in the consumer and research markets and provide novel opportunities to learn more about the interpretive process, including differences between novices and experts, how heuristics and biases shape visual perception and decision-making, and the mechanisms underlying misinterpretation and misdiagnosis. The present review provides an overview of eye-tracking technology, the perceptual and cognitive processes involved in medical interpretation, how eye tracking has been employed to understand medical interpretation and promote medical education and training, and some of the promises and challenges for future applications of this technology.

101 citations


Journal ArticleDOI
29 Nov 2019-eLife
TL;DR: In this paper, the authors quantified the amount of stimulus-specific information represented within the BOLD signal on every trial, and found a significant negative correlation indicating that, as post-stimulus alpha/beta power decreased, stimulus-specific information increased.
Abstract: Massed synchronised neuronal firing is detrimental to information processing. When networks of task-irrelevant neurons fire in unison, they mask the signal generated by task-critical neurons. On a macroscopic level, such synchronisation can contribute to alpha/beta (8–30 Hz) oscillations. Reducing the amplitude of these oscillations, therefore, may enhance information processing. Here, we test this hypothesis. Twenty-one participants completed an associative memory task while undergoing simultaneous EEG-fMRI recordings. Using representational similarity analysis, we quantified the amount of stimulus-specific information represented within the BOLD signal on every trial. When correlating this metric with concurrently-recorded alpha/beta power, we found a significant negative correlation which indicated that as post-stimulus alpha/beta power decreased, stimulus-specific information increased. Critically, we found this effect in three unique tasks: visual perception, auditory perception, and visual memory retrieval, indicating that this phenomenon transcends both stimulus modality and cognitive task. These results indicate that alpha/beta power decreases parametrically track the fidelity of both externally-presented and internally-generated stimulus-specific information represented within the cortex.
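
A schematic version of the trial-wise analysis (a similarity-based, stimulus-specific information score per trial, correlated against concurrent alpha/beta power) might look like the Python sketch below; the arrays are synthetic and the simple template-correlation metric is only a stand-in for the representational similarity analysis used in the paper.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_trials, n_voxels = 120, 200
bold = rng.normal(size=(n_trials, n_voxels))       # single-trial BOLD patterns (synthetic)
template = rng.normal(size=n_voxels)               # stimulus-specific reference pattern (synthetic)
alpha_beta_power = rng.normal(size=n_trials)       # post-stimulus 8-30 Hz power per trial (synthetic)

# Trial-wise "stimulus-specific information": similarity of each trial to the template.
info = np.array([np.corrcoef(trial, template)[0, 1] for trial in bold])

rho, p = spearmanr(alpha_beta_power, info)
print(rho, p)   # the paper reports a negative correlation in real data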

94 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work proposes a novel video captioning approach that takes into account both visual perception and syntax representation learning to generate accurate descriptions of videos, and achieves substantially better performance than the state-of-the-art methods.
Abstract: Video captioning is a challenging task that involves not only visual perception but also syntax representation learning. Recent progress in video captioning has been achieved through visual perception, but syntax representation learning is still under-explored. We propose a novel video captioning approach that takes into account both visual perception and syntax representation learning to generate accurate descriptions of videos. Specifically, we use sentence templates composed of Part-of-Speech (POS) tags to represent the syntax structure of captions, and accordingly, syntax representation learning is performed by directly inferring POS tags from videos. Visual perception is implemented by a mixture model that translates visual cues into lexical words conditioned on the learned syntactic structure of sentences. Thus, a video captioning task consists of two sub-tasks: video POS tagging and visual cue translation, which are jointly modeled and trained in an end-to-end fashion. Evaluations on three public benchmark datasets demonstrate that our proposed method achieves substantially better performance than the state-of-the-art methods, which validates the superiority of joint modeling of syntax representation learning and visual perception for video captioning.

93 citations


Journal ArticleDOI
15 May 2019
TL;DR: This work used DVS for visual perception and showed that the visual component can be bound with the system velocity to enable dynamic world perception, which creates an opportunity for real-time navigation and obstacle avoidance.
Abstract: The hallmark of modern robotics is the ability to directly fuse the platform's perception with its motoric ability, a concept often referred to as "active perception." Nevertheless, we find that action and perception are often kept in separate spaces, a consequence of traditional vision being frame-based and existing only in the moment, while motion is a continuous entity. This gap is bridged by the dynamic vision sensor (DVS), a neuromorphic camera that can see motion. We propose a method of encoding actions and perceptions together into a single space that is meaningful, semantically informed, and consistent, using hyperdimensional binary vectors (HBVs). We used the DVS for visual perception and showed that the visual component can be bound with the system velocity to enable dynamic world perception, which creates an opportunity for real-time navigation and obstacle avoidance. Actions performed by an agent are directly bound to the perceptions experienced to form its own "memory." Furthermore, because HBVs can encode entire histories of actions and perceptions (from atomic events to arbitrary sequences) as constant-sized vectors, autoassociative memory was combined with deep learning paradigms for control. We demonstrate these properties on a quadcopter drone ego-motion inference task and the MVSEC (multivehicle stereo event camera) dataset.
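
The binding and bundling operations on hyperdimensional binary vectors are simple bitwise manipulations; the sketch below shows XOR binding of a "percept" vector with a "velocity" vector and majority-vote bundling into a constant-size memory. The vector names and dimensionality are illustrative; this is not the authors' implementation.

import numpy as np

rng = np.random.default_rng(2)
D = 8192                                   # dimensionality of the hypervectors (assumed)

def rand_hbv():
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bind(a, b):
    return np.bitwise_xor(a, b)            # binding: XOR (self-inverse)

def bundle(vectors):
    return (np.sum(vectors, axis=0) > len(vectors) / 2).astype(np.uint8)  # majority vote

def similarity(a, b):
    return 1.0 - np.mean(a != b)           # 1 - normalized Hamming distance

percept, velocity = rand_hbv(), rand_hbv()
event = bind(percept, velocity)            # a single perception-action "moment"
memory = bundle([event, rand_hbv(), rand_hbv()])   # constant-size memory of several moments

# Unbinding with the velocity vector recovers something close to the original percept.
print(similarity(bind(memory, velocity), percept))   # well above the ~0.5 chance level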

92 citations


Journal ArticleDOI
TL;DR: It is shown that network units tuned to abstract numerosity, and therefore reminiscent of real number neurons, spontaneously emerge in a biologically inspired deep neural network that was merely trained on visual object recognition.
Abstract: Humans and animals have a "number sense," an innate capability to intuitively assess the number of visual items in a set, its numerosity. This capability implies that mechanisms to extract numerosity indwell the brain's visual system, which is primarily concerned with visual object recognition. Here, we show that network units tuned to abstract numerosity, and therefore reminiscent of real number neurons, spontaneously emerge in a biologically inspired deep neural network that was merely trained on visual object recognition. These numerosity-tuned units underlay the network's number discrimination performance that showed all the characteristics of human and animal number discriminations as predicted by the Weber-Fechner law. These findings explain the spontaneous emergence of the number sense based on mechanisms inherent to the visual system.
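
The Weber-Fechner signature reported here amounts to numerosity tuning that is approximately Gaussian on a logarithmic axis, so that discriminability depends on the ratio of two numerosities rather than their absolute difference. A compact illustration with an assumed tuning width:

import numpy as np

def tuning(n, preferred, sigma=0.35):
    """Numerosity tuning curve: Gaussian on a log axis (Weber-Fechner compression)."""
    return np.exp(-0.5 * ((np.log(n) - np.log(preferred)) / sigma) ** 2)

def discriminability(n1, n2, sigma=0.35):
    """Separation of two log-scaled numerosities in tuning-width units."""
    return abs(np.log(n1) - np.log(n2)) / sigma

# Pairs with the same ratio are equally discriminable, regardless of absolute size.
print(discriminability(4, 8), discriminability(16, 32))    # identical values
print(tuning(np.array([2, 4, 8, 16]), preferred=8))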

84 citations


Journal ArticleDOI
06 Mar 2019-PLOS ONE
TL;DR: Incidence and point prevalence of visual problems in acute stroke are alarmingly high, affecting over half of survivors; crucial information can be provided on visual status and its functional significance to the stroke team, patients and carers, enabling early intervention.
Abstract: Background: Visual problems are under-reported sequelae following stroke. The aim of this study is to report the annual incidence and point prevalence of visual problems in an acute adult stroke population and to explore the feasibility of early timing of visual assessment.

Methods and findings: Multi-centre acute stroke unit, prospective, epidemiology study (1st July 2014 to 30th June 2015). Orthoptists reviewed all patients with assessment of visual acuity, visual fields, ocular alignment, ocular motility, visual inattention and visual perception. 1033 patients underwent visual screening at a median of 3 days (IQR 2) and full visual assessment at a median of 4 days (IQR 7) after the incident stroke: 52% men, 48% women, mean age 73 years and 87% ischaemic strokes. Excluding pre-existent eye problems, the incidence of new-onset visual sequelae was 48% for all stroke admissions and 60% in stroke survivors. Nearly three quarters, 752/1033 (73%), had visual problems (point prevalence): 56% with impaired central vision, 40% eye movement abnormalities, 28% visual field loss, 27% visual inattention, 5% visual perceptual disorders. 281/1033 (27%) had normal eye exams.

Conclusions: Incidence and point prevalence of visual problems in acute stroke are alarmingly high, affecting over half of survivors. For most, visual screening and full visual assessment were achieved within about 5 days of stroke onset. Crucial information can thus be provided on visual status and its functional significance to the stroke team, patients and carers, enabling early intervention.

Journal ArticleDOI
TL;DR: It is posited that adaptations to real-world structure collectively support optimal usage of limited cortical processing resources; taking positional regularities into account will thus be essential for understanding efficient object vision in the real world.

Posted ContentDOI
Joshua H. Siegle1, Xiaoxuan Jia1, Séverine Durand1, Samuel D. Gale1, Corbett Bennett1, Nile Graddis1, Greggory Heller1, Tamina K. Ramirez1, Hannah Choi2, Hannah Choi1, Jennifer Luviano1, Peter A. Groblewski1, Ruweida Ahmed1, Anton Arkhipov1, Amy Bernard1, Yazan N. Billeh1, Brown D1, Michael A. Buice1, Nicholas Cain1, Shiella Caldejon1, Linzy Casal1, Cho A1, Chvilicek M1, Timothy C. Cox3, Kael Dai1, Daniel J. Denman4, Daniel J. Denman1, de Vries Sej1, Dietzman R1, Luke Esposito1, Colin Farrell1, David Feng1, John Galbraith1, Marina Garrett1, Emily Gelfand1, Nicole Hancock1, Julie A. Harris1, Robert Howard1, Brian Hu1, Hytnen R1, Ramakrishnan Iyer1, Jessett E1, Johnson K1, India Kato1, Justin T. Kiggins1, Sophie Lambert1, Jérôme Lecoq1, Peter Ledochowitsch1, Jung Hoon Lee1, Arielle Leon1, Yang Li1, Liang E1, Fuhui Long1, Kyla Mace1, Melchior J1, Daniel Millman1, Mollenkopf Ts1, Chelsea Nayan1, Lindsay Ng1, Kiet Ngo1, Thuyanh V. Nguyen1, Philip R. Nicovich1, Kat North1, Gabriel Koch Ocker1, Douglas R. Ollerenshaw1, Michael Oliver1, Marius Pachitariu, Jed Perkins1, Melissa Reding1, David Reid1, Miranda Robertson1, Kara Ronellenfitch1, Sam Seid1, Clifford R. Slaughterbeck1, Michelle Stoecklin1, David Sullivan1, Sutton B1, Jackie Swapp1, Carol L. Thompson1, Karly M. Turner1, Wayne Wakeman1, Jennifer D. Whitesell1, Derric Williams1, Ali Williford1, R.D. Young1, Hongkui Zeng1, Sarah A. Naylor1, John W. Phillips1, Robert Reid1, Stefan Mihalas1, Olsen1, Christof Koch1 
16 Oct 2019-bioRxiv
TL;DR: A large, open dataset that surveys spiking from units in six cortical and two thalamic regions responding to a battery of visual stimuli finds that inter-area functional connectivity mirrors the anatomical hierarchy from the Allen Mouse Brain Connectivity Atlas and provides a foundation for understanding coding and dynamics in the mouse cortico-thalamic visual system.
Abstract: The mammalian visual system, from retina to neocortex, has been extensively studied at both anatomical and functional levels. Anatomy indicates the cortico-thalamic system is hierarchical, but characterization of cellular-level functional interactions across multiple levels of this hierarchy is lacking, partially due to the challenge of simultaneously recording activity across numerous regions. Here, we describe a large, open dataset (part of the Allen Brain Observatory) that surveys spiking from units in six cortical and two thalamic regions responding to a battery of visual stimuli. Using spike cross-correlation analysis, we find that inter-area functional connectivity mirrors the anatomical hierarchy from the Allen Mouse Brain Connectivity Atlas. Classical functional measures of hierarchy, including visual response latency, receptive field size, phase-locking to a drifting grating stimulus, and autocorrelation timescale are all correlated with the anatomical hierarchy. Moreover, recordings during a visual task support the behavioral relevance of hierarchical processing. Overall, this dataset and the hierarchy we describe provide a foundation for understanding coding and dynamics in the mouse cortico-thalamic visual system.
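
The functional-connectivity measure rests on spike cross-correlograms between simultaneously recorded units; a minimal version for two synthetic binned spike trains, where a lagged correlogram peak hints at directionality, is sketched below (bin size, rates, and lag are illustrative).

import numpy as np

rng = np.random.default_rng(3)
n_bins = 60000                              # 1 ms bins, 60 s of synthetic data
rate = 0.02                                 # ~20 spikes/s in the source unit
src = rng.random(n_bins) < rate
# Target unit partly follows the source with a 3 ms lag, plus independent spikes.
tgt = np.roll(src, 3) & (rng.random(n_bins) < 0.5)
tgt |= rng.random(n_bins) < rate * 0.5

max_lag = 25
lags = np.arange(-max_lag, max_lag + 1)
ccg = np.array([np.sum(src & np.roll(tgt, -lag)) for lag in lags], dtype=float)
ccg /= src.sum()                            # coincidences per source spike

print(lags[np.argmax(ccg)])                 # peak near +3 ms suggests the source leads the target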

Journal ArticleDOI
TL;DR: A comprehensive evaluation framework for visual recognition models that is underpinned by Visual Psychophysics is introduced, and over millions of procedurally rendered 3D scenes and 2D images, the performance of well-known convolutional neural networks is compared.
Abstract: By providing substantial amounts of data and standardized evaluation protocols, datasets in computer vision have helped fuel advances across all areas of visual recognition. But even in light of breakthrough results on recent benchmarks, it is still fair to ask if our recognition algorithms are doing as well as we think they are. The vision sciences at large make use of a very different evaluation regime known as Visual Psychophysics to study visual perception. Psychophysics is the quantitative examination of the relationships between controlled stimuli and the behavioral responses they elicit in experimental test subjects. Instead of using summary statistics to gauge performance, psychophysics directs us to construct item-response curves made up of individual stimulus responses to find perceptual thresholds, thus allowing one to identify the exact point at which a subject can no longer reliably recognize the stimulus class. In this article, we introduce a comprehensive evaluation framework for visual recognition models that is underpinned by this methodology. Over millions of procedurally rendered 3D scenes and 2D images, we compare the performance of well-known convolutional neural networks. Our results bring into question recent claims of human-like performance, and provide a path forward for correcting newly surfaced algorithmic deficiencies.
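
The item-response-curve methodology is essentially the standard psychometric-function fit: accuracy as a function of a stimulus perturbation level, with the perceptual threshold read off at a criterion performance level. A generic sketch with synthetic data and an assumed logistic form:

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k, lapse):
    """Psychometric function rising from chance (0.5) to 1 - lapse."""
    return 0.5 + (0.5 - lapse) / (1.0 + np.exp(-k * (x - x0)))

# Synthetic accuracies at increasing stimulus quality (e.g., decreasing perturbation).
levels = np.linspace(0.0, 1.0, 9)
accuracy = np.array([0.50, 0.52, 0.55, 0.62, 0.74, 0.86, 0.93, 0.97, 0.98])

params, _ = curve_fit(logistic, levels, accuracy, p0=[0.5, 10.0, 0.02],
                      bounds=([0.0, 0.1, 0.0], [1.0, 100.0, 0.2]))
x0, k, lapse = params
threshold_75 = x0 - np.log((0.5 - lapse) / 0.25 - 1.0) / k   # level giving 75% correct
print(threshold_75)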

Journal ArticleDOI
TL;DR: Results indicate that selectively allocating attention toward important task-related information is the most important skill developed in experts across domains, whereas expertise in medicine is reflected more in an extended visual span.
Abstract: Perceptual-cognitive skills enable an individual to integrate environmental information with existing knowledge to process stimuli and execute appropriate responses on complex tasks. Various underlying processes could explain how perceptual-cognitive skills affect expert performance, as articulated in three theoretical accounts: (a) the long-term working memory theory, which argues that experts are able to encode and retrieve visual information from long-term working memory more effectively than less experienced counterparts; (b) the information-reduction hypothesis, which suggests that experts can optimize the amount of information processed by selectively allocating their attentional resources to task-relevant stimuli and ignoring irrelevant stimuli; and (c) the holistic model of image perception, which proposes that experts are able to extract visual information from distal and para-foveal regions, allowing more efficient global-local processing of the scene. In this systematic review, we examine the validity of the aforementioned theories based on gaze features associated with the proposed processes. The information-reduction hypothesis was supported in most studies, except in medicine, where the holistic model of image perception garners stronger support. These results indicate that selectively allocating attention toward important task-related information is the most important skill developed in experts across domains, whereas expertise in medicine is reflected more in an extended visual span. Large discrepancies in the outcomes of the papers reviewed suggest that there is not one theory that fits all domains of expertise. The review provides some essential building blocks, however, to help synthesize theoretical concepts across expertise domains.

Journal ArticleDOI
TL;DR: A novel deep generative multiview model for the accurate visual image reconstruction from the human brain activities measured by functional magnetic resonance imaging (fMRI) by using two view-specific generators with a shared latent space is proposed.
Abstract: Neural decoding, which aims to predict information about external visual stimuli from evoked brain activities, plays an important role in understanding the human visual system. Many existing methods are based on linear models, and most of them focus only on either brain activity pattern classification or visual stimulus identification. Accurate reconstruction of the perceived images from measured human brain activities still remains challenging. In this paper, we propose a novel deep generative multiview model for accurate visual image reconstruction from human brain activities measured by functional magnetic resonance imaging (fMRI). Specifically, we model the statistical relationships between the two views (i.e., the visual stimuli and the evoked fMRI) by using two view-specific generators with a shared latent space. On the one hand, we adopt a deep neural network architecture for visual image generation, which mimics the stages of human visual processing. On the other hand, we design a sparse Bayesian linear model for fMRI activity generation, which can effectively capture voxel correlations, suppress data noise, and avoid overfitting. Furthermore, we devise an efficient mean-field variational inference method to train the proposed model. The proposed method can accurately reconstruct visual images via Bayesian inference. In particular, we exploit a posterior regularization technique in the Bayesian inference to regularize the model posterior. The quantitative and qualitative evaluations conducted on multiple fMRI data sets demonstrate that the proposed method can reconstruct visual images more accurately than the state of the art.

Journal ArticleDOI
TL;DR: BOLD5000 is a large-scale human functional MRI (fMRI) dataset that includes almost 5,000 distinct images depicting real-world scenes, drawn from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets.
Abstract: Vision science, particularly machine vision, has been revolutionized by the introduction of large-scale image datasets and statistical learning approaches. Yet, human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To apply statistical learning approaches in neuroscience, the number of images used in neuroimaging studies must be significantly increased. We present BOLD5000, a human functional MRI (fMRI) study that includes almost 5,000 distinct images depicting real-world scenes. Beyond dramatically increasing image dataset size relative to prior fMRI studies, BOLD5000 also accounts for image diversity, overlapping with standard computer vision datasets by incorporating images from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets. The scale and diversity of these image datasets, combined with a slow event-related fMRI design, enable fine-grained exploration into the neural representation of a wide range of visual features, categories, and semantics. Concurrently, BOLD5000 brings us closer to realizing Marr’s dream of a singular vision science: the intertwined study of biological and computer vision.

Journal ArticleDOI
01 Jul 2019
TL;DR: It is argued that processes like recollection constitute inadequate labels for characterizing neural mechanisms, and the authors instead advocate considering the component operations and representations of processes like recollection in isolation.
Abstract: Thanks to patients Phineas Gage and Henry Molaison, we have long known that behavioral control depends on the frontal lobes, whereas declarative memory depends on the medial temporal lobes (MTL). For decades, cognitive functions (behavioral control, declarative memory) have served as labels for characterizing the division of labor in cortex. This approach has made enormous contributions to understanding how the brain enables the mind, providing a systems-level explanation of brain function that constrains lower-level investigations of neural mechanism. Today, the approach has evolved such that functional labels are often applied to brain networks rather than focal brain regions. Furthermore, the labels have diversified to include both broadly defined cognitive functions (declarative memory, visual perception) and more circumscribed mental processes (recollection, familiarity, priming). We ask whether a process, a high-level mental phenomenon corresponding to an introspectively identifiable cognitive event, is the most productive label for dissecting memory. For example, recollection conflates a neurocomputational operation (pattern completion-based retrieval) with a class of representational content (associative, high-dimensional memories). Because a full theory of memory must identify operations and representations separately, and specify how they interact, we argue that processes like recollection constitute inadequate labels for characterizing neural mechanisms. Instead, we advocate considering the component operations and representations of processes like recollection in isolation. For the organization of memory, the evidence suggests that pattern completion is recapitulated widely across the ventral visual stream and MTL, but the division of labor between sites within this pathway can be explained by representational content.

Journal ArticleDOI
TL;DR: The results suggest that alpha power is crucial for isolating a subject from the environment and moving attention from external to internal cues, and they emphasize that the emerging combination of VR and EEG may have important implications for studying brain rhythms and supporting the design of artificial systems.
Abstract: Variations in alpha rhythm have a significant role in perception and attention. Recently, alpha decrease has been associated with externally directed attention, especially in the visual domain, whereas alpha increase has been related to internal processing such as mental arithmetic. However, the role of alpha oscillations and how the different components of a task (processing of external stimuli, internal manipulation/representation, and task demand) interact to affect alpha power are still unclear. Here, we investigate how alpha power is differently modulated by attentional tasks depending both on task difficulty (less/more demanding task) and direction of attention (internal/external). To this aim, we designed two experiments that differently manipulated these aspects. Experiment 1, outside Virtual Reality (VR), involved two tasks both requiring internal and external attentional components (intake of visual items for their internal manipulation) but with different internal task demands (arithmetic vs. reading). Experiment 2 took advantage of the VR (mimicking an aircraft cabin interior) to manipulate attention direction: it included a condition of VR immersion only, characterized by visual external attention, and a condition of a purely mental arithmetic task during VR immersion, requiring neglect of sensory stimuli. Results show that: (1) In line with previous studies, visual external attention caused a significant alpha decrease, especially in parieto-occipital regions; (2) Alpha decrease was significantly larger during the more demanding arithmetic task, when the task was driven by external visual stimuli; (3) Alpha dramatically increased during the purely mental task in VR immersion, whereby the external stimuli had no relation with the task. Our results suggest that alpha power is crucial to isolate a subject from the environment, and move attention from external to internal cues. Moreover, they emphasize that the emerging use of VR associated with EEG may have important implications to study brain rhythms and support the design of artificial systems.
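
The alpha-power comparisons in such studies rest on standard band-power estimates; a minimal illustration using Welch's method on two synthetic signals (an "internal task" with a strong 10 Hz rhythm and an "external task" with a weak one) is given below. The signals and band limits are assumptions for illustration only.

import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(4)
fs = 250.0
t = np.arange(0, 30, 1 / fs)

def alpha_power(signal):
    f, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    band = (f >= 8) & (f <= 13)
    return psd[band].mean()          # mean spectral power in the 8-13 Hz alpha band

# Synthetic EEG: internal task with a strong 10 Hz rhythm vs. external task with a weak one.
internal = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(size=t.size)
external = 0.5 * np.sin(2 * np.pi * 10 * t) + rng.normal(size=t.size)

print(alpha_power(internal) > alpha_power(external))   # True: alpha higher when attention is internal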

Journal ArticleDOI
TL;DR: It is argued that, in a functional model of visual perception featuring probabilistic inference over a hierarchy of features, inferences about high-level features modulate inferences about low-level features, predicting specific modulations in the fine structure of SCCs as stimulus identity and, more importantly, stimulus complexity vary.
Abstract: Spike count correlations (SCCs) are ubiquitous in sensory cortices, are characterized by rich structure, and arise from structured internal dynamics. However, most theories of visual perception treat contributions of neurons to the representation of stimuli independently and focus on mean responses. Here, we argue that, in a functional model of visual perception featuring probabilistic inference over a hierarchy of features, inferences about high-level features modulate inferences about low-level features, ultimately introducing structured internal dynamics and patterns in SCCs. Specifically, high-level inferences for complex stimuli establish the local context in which neurons in the primary visual cortex (V1) interpret stimuli. Since the local context differentially affects multiple neurons, this conjecture predicts specific modulations in the fine structure of SCCs as stimulus identity and, more importantly, stimulus complexity vary. We designed experiments with natural and synthetic stimuli to measure the fine structure of SCCs in V1 of awake behaving macaques and assessed their dependence on stimulus identity and stimulus statistics. We show that the fine structure of SCCs is specific to the identity of natural stimuli and that changes in SCCs are independent of changes in response mean. Critically, we demonstrate that stimulus specificity of SCCs in V1 can be directly manipulated by altering the amount of high-order structure in synthetic stimuli. Finally, we show that simple phenomenological models of V1 activity cannot account for the observed SCC patterns and conclude that the stimulus dependence of SCCs is a natural consequence of structured internal dynamics in a hierarchical probabilistic model of natural images.
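
Spike count correlations are simply trial-by-trial correlations of spike counts across repeated presentations of the same stimulus; a minimal computation of the SCC matrix on synthetic counts (with an injected shared fluctuation to produce correlated variability) is sketched below.

import numpy as np

rng = np.random.default_rng(5)
n_trials, n_neurons = 200, 8
# A shared fluctuation injects correlated variability across repeated identical stimuli.
shared = rng.normal(size=(n_trials, 1))
counts = rng.poisson(5.0, size=(n_trials, n_neurons)) + (shared * rng.uniform(0, 2, n_neurons)).round()
counts = np.clip(counts, 0, None)

scc = np.corrcoef(counts.T)                 # n_neurons x n_neurons spike count correlations
off_diag = scc[~np.eye(n_neurons, dtype=bool)]
print(off_diag.mean())                      # mean pairwise SCC; the "fine structure" is the full matrix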

Journal ArticleDOI
TL;DR: This work exploited stimulus tuning to highlight the functional dissociation of these distinct signals, reconciling prior inconsistencies across species and stimuli regarding the ubiquity of visual gamma oscillations during natural vision.

Journal ArticleDOI
TL;DR: The main objective of this study is to survey the recently conducted studies on depth perception in VR, augmented reality (AR), and mixed reality (MR).
Abstract: Depth perception is one of the important elements in virtual reality (VR). Perceived depth is influenced by head-mounted displays, which inevitably reduce the virtual content's depth perception. While several questions within this area are still under research, the main objective of this study is to survey recently conducted studies on depth perception in VR, augmented reality (AR), and mixed reality (MR). First, depth perception in the human visual system is discussed, including the different visual cues involved in depth perception. Second, research performed to understand and confirm depth perception issues is examined. The contributions made to improve depth perception, and specifically distance perception, are discussed with their main proposed design ideas, advantages, and limitations. Most of the contributions were based on using one or two depth cues to improve depth perception in VR, AR, and MR.

Journal ArticleDOI
TL;DR: Objective tasks showed that similar contrast and colour appearance can be produced in the virtual environment, with minor impact on fine details due to limited resolution, indicating the proposed methodology's capability to provide realistic immersive environments.

Journal ArticleDOI
TL;DR: A significant new perspective is that noise in neural representation limits the precision of recall, and several recent models incorporate this view to account for failures of binding in WM.
Abstract: How does visual working memory (WM) store the binding between different features of a visual object (like colour, orientation, and location), and does memorizing these bindings require additional resources beyond memorizing individual features? These questions have traditionally been addressed by comparing performance across different types of change detection task. More recently, experimental tasks such as analogue (cued) recall, combined with analysis methods including Bayesian hypothesis testing and formal model comparison, have shed new light on the properties of WM. A significant new perspective is that noise in neural representation limits the precision of recall, and several recent models incorporate this view to account for failures of binding in WM. We review the literature on feature binding with a focus on these new developments and discuss their implications for the interpretation of classical findings.
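
The "noise limits precision" view is commonly formalized with mixture models of analogue recall error, for example a circular normal (von Mises) component centred on the target plus a uniform guessing component. The sketch below writes down that likelihood and fits it to synthetic errors; the parameter values and fitting choices are illustrative, not a specific published model.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import vonmises

rng = np.random.default_rng(6)
# Synthetic recall errors (radians): 80% remembered with limited precision, 20% guesses.
errors = np.concatenate([vonmises.rvs(kappa=8.0, size=160, random_state=7),
                         rng.uniform(-np.pi, np.pi, size=40)])

def neg_log_lik(params):
    kappa, p_mem = params
    like = p_mem * vonmises.pdf(errors, kappa) + (1 - p_mem) / (2 * np.pi)
    return -np.sum(np.log(like))

res = minimize(neg_log_lik, x0=[5.0, 0.7], bounds=[(0.1, 100.0), (0.01, 0.99)])
kappa_hat, p_mem_hat = res.x
print(kappa_hat, p_mem_hat)    # memory precision (concentration) and probability of being in memory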

Journal ArticleDOI
TL;DR: Clear differences in neurons' preferred disparities across areas are found, suggesting that higher visual area RL is specialized for encoding visual stimuli very close to the mouse, likely reflecting an adaptation to natural image statistics.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work proposes to train an agent to fuse a large set of visual representations that correspond to diverse visual perception abilities, and develops an action-level representation fusion scheme, which predicts an action candidate from each representation and adaptively consolidates these action candidates into the final action.
Abstract: A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities. For example, to "go to the nearest chair'', the agent might need to identify a chair in a living room using semantics, follow along a hallway using vanishing point cues, and avoid obstacles using depth. Therefore, utilizing the appropriate visual perception abilities based on a situational understanding of the visual environment can empower these navigation models in unseen visual environments. We propose to train an agent to fuse a large set of visual representations that correspond to diverse visual perception abilities. To fully utilize each representation, we develop an action-level representation fusion scheme, which predicts an action candidate from each representation and adaptively consolidate these action candidates into the final action. Furthermore, we employ a data-driven inter-task affinity regularization to reduce redundancies and improve generalization. Our approach leads to a significantly improved performance in novel environments over ImageNet-pretrained baseline and other fusion methods.
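
The action-level fusion scheme can be summarized as: each visual representation proposes action scores, a gate scores each representation's reliability in the current situation, and the gated sum selects the final action. The sketch below uses random placeholder predictors and gates purely to show the data flow; it is not the authors' trained model.

import numpy as np

rng = np.random.default_rng(8)
n_actions, reps = 4, {"semantic": 32, "depth": 16, "vanishing_point": 8}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

features = {name: rng.normal(size=dim) for name, dim in reps.items()}
predictors = {name: rng.normal(size=(n_actions, dim)) for name, dim in reps.items()}
gate_w = {name: rng.normal(size=dim) for name, dim in reps.items()}

# Each representation proposes an action distribution; a gate scores its reliability.
proposals = {n: softmax(predictors[n] @ features[n]) for n in reps}
gate = softmax(np.array([gate_w[n] @ features[n] for n in reps]))

fused = sum(g * proposals[n] for g, n in zip(gate, reps))   # adaptive consolidation
print(int(np.argmax(fused)))                                # final action index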

Journal ArticleDOI
TL;DR: While under limited circumstances CNNs are able to meet or outperform human task performance, it is found that they are not currently a good model for human graphical perception.
Abstract: Convolutional neural networks can successfully perform many computer vision tasks on images. For visualization, how do CNNs perform when applied to graphical perception tasks? We investigate this question by reproducing Cleveland and McGill's seminal 1984 experiments, which measured human perception efficiency of different visual encodings and defined elementary perceptual tasks for visualization. We measure the graphical perceptual capabilities of four network architectures on five different visualization tasks and compare to existing and new human performance baselines. While under limited circumstances CNNs are able to meet or outperform human task performance, we find that CNNs are not currently a good model for human graphical perception. We present the results of these experiments to foster the understanding of how CNNs succeed and fail when applied to data visualizations.
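
Evaluations in this line of work typically score an estimator with a log absolute error in the style of Cleveland and McGill; a minimal scorer of that kind, with hypothetical ratio estimates and the commonly used 1/8 offset (an assumption here), is shown below.

import numpy as np

def mlae(predicted, true):
    """Mean log absolute error, a Cleveland-McGill style score (lower is better)."""
    predicted, true = np.asarray(predicted, float), np.asarray(true, float)
    return np.mean(np.log2(np.abs(predicted - true) + 0.125))

# Hypothetical percent estimates of the smaller bar relative to the larger one.
true_ratios = np.array([10.0, 26.0, 38.0, 57.0, 83.0])
cnn_estimates = np.array([12.5, 22.0, 41.0, 60.5, 79.0])
print(mlae(cnn_estimates, true_ratios))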

Journal ArticleDOI
15 Sep 2019
TL;DR: Owing to rapid technological progress, larval zebrafish provide unique opportunities for obtaining a comprehensive understanding of the intermediate processing steps occurring between visual and motor centers, revealing how visuomotor transformations are implemented in a vertebrate brain.
Abstract: Visual stimuli can evoke complex behavioral responses, but the underlying streams of neural activity in mammalian brains are difficult to follow because of their size. Here, I review the visual system of zebrafish larvae, highlighting where recent experimental evidence has localized the functional steps of visuomotor transformations to specific brain areas. The retina of a larva encodes behaviorally relevant visual information in neural activity distributed across feature-selective ganglion cells such that signals representing distinct stimulus properties arrive in different areas or layers of the brain. Motor centers in the hindbrain encode motor variables that are precisely tuned to behavioral needs within a given stimulus setting. Owing to rapid technological progress, larval zebrafish provide unique opportunities for obtaining a comprehensive understanding of the intermediate processing steps occurring between visual and motor centers, revealing how visuomotor transformations are implemented in a vertebrate brain.

Journal ArticleDOI
TL;DR: The neural representations enabling perceived similarity are investigated using behavioral judgments, fMRI, MEG, and representational similarity analyses to characterize the relationship between perceived similarity of key object dimensions and neural activity.

Journal ArticleDOI
TL;DR: It is found that the pattern of small fixational eye movements called microsaccades changes around behaviorally relevant moments in a way that stabilizes the position of the eyes, a new behavioral correlate of voluntary, or goal-directed, temporal attention.
Abstract: Our visual input is constantly changing, but not all moments are equally relevant. Visual temporal attention, the prioritization of visual information at specific points in time, increases perceptual sensitivity at behaviorally relevant times. The dynamic processes underlying this increase are unclear. During fixation, humans make small eye movements called microsaccades, and inhibiting microsaccades improves perception of brief stimuli. Here, we investigated whether temporal attention changes the pattern of microsaccades in anticipation of brief stimuli. Human observers (female and male) judged stimuli presented within a short sequence. Observers were given either an informative precue to attend to one of the stimuli, which was likely to be probed, or an uninformative (neutral) precue. We found strong microsaccadic inhibition before the stimulus sequence, likely due to its predictable onset. Critically, this anticipatory inhibition was stronger when the first target in the sequence (T1) was precued (task-relevant) than when the precue was uninformative. Moreover, the timing of the last microsaccade before T1 and the first microsaccade after T1 shifted such that both occurred earlier when T1 was precued than when the precue was uninformative. Finally, the timing of the nearest pre- and post-T1 microsaccades affected task performance. Directing voluntary temporal attention therefore affects microsaccades, helping to stabilize fixation at the most relevant moments over and above the effect of predictability. Just as saccading to a relevant stimulus can be an overt correlate of the allocation of spatial attention, precisely timed gaze stabilization can be an overt correlate of the allocation of temporal attention.

SIGNIFICANCE STATEMENT: We pay attention at moments in time when a relevant event is likely to occur. Such temporal attention improves our visual perception, but how it does so is not well understood. Here, we discovered a new behavioral correlate of voluntary, or goal-directed, temporal attention. We found that the pattern of small fixational eye movements called microsaccades changes around behaviorally relevant moments in a way that stabilizes the position of the eyes. Microsaccades during a brief visual stimulus can impair perception of that stimulus. Therefore, such fixation stabilization may contribute to the improvement of visual perception at attended times. This link suggests that, in addition to cortical areas, subcortical areas mediating eye movements may be recruited with temporal attention.
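
Microsaccade analyses of this kind start from velocity-threshold detection on the eye-position trace; the sketch below injects a small displacement into a synthetic drift signal and flags supra-threshold velocity samples using a median-based threshold in the spirit of Engbert and Kliegl. The sampling rate, amplitudes, and threshold multiplier are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(9)
fs = 1000.0                                       # 1 kHz eye tracker (assumed)
x = np.cumsum(rng.normal(0, 0.001, size=2000))    # horizontal gaze position (deg), slow drift
ramp = np.zeros_like(x)
ramp[800:812] = np.linspace(0.0, 0.3, 12)         # ~0.3 deg displacement unfolding over 12 ms
ramp[812:] = 0.3                                  # the displacement persists after the movement
x = x + ramp

vel = np.gradient(x) * fs                         # velocity in deg/s
# Median-based velocity threshold (robust to slow drift), in the spirit of Engbert & Kliegl.
sigma = np.sqrt(np.median(vel ** 2) - np.median(vel) ** 2)
above = np.where(np.abs(vel) > 6.0 * sigma)[0]
print(above)                                      # flagged samples include the injected event near sample 800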