
Showing papers on "Visual perception published in 2021"


Journal ArticleDOI
Joshua H. Siegle1, Xiaoxuan Jia1, Séverine Durand1, Sam Gale1, Corbett Bennett1, Nile Graddis1, Greggory Heller1, Tamina K. Ramirez1, Hannah Choi2, Hannah Choi1, Jennifer Luviano1, Peter A. Groblewski1, Ruweida Ahmed1, Anton Arkhipov1, Amy Bernard1, Yazan N. Billeh1, Dillan Brown1, Michael A. Buice1, Nicolas Cain1, Shiella Caldejon1, Linzy Casal1, Andrew Cho1, Maggie Chvilicek1, Timothy C. Cox3, Kael Dai1, Daniel J. Denman4, Daniel J. Denman1, Saskia E. J. de Vries1, Roald Dietzman1, Luke Esposito1, Colin Farrell1, David Feng1, John Galbraith1, Marina Garrett1, Emily Gelfand1, Nicole Hancock1, Julie A. Harris1, Robert Howard1, Brian Hu1, Ross Hytnen1, Ramakrishnan Iyer1, Erika Jessett1, Katelyn Johnson1, India Kato1, Justin T. Kiggins1, Sophie Lambert1, Jérôme Lecoq1, Peter Ledochowitsch1, Jung Hoon Lee1, Arielle Leon1, Yang Li1, Elizabeth Liang1, Fuhui Long1, Kyla Mace1, Jose Melchior1, Daniel Millman1, Tyler Mollenkopf1, Chelsea Nayan1, Lydia Ng1, Kiet Ngo1, Thuyahn Nguyen1, Philip R. Nicovich1, Kat North1, Gabriel Koch Ocker1, Douglas R. Ollerenshaw1, Michael Oliver1, Marius Pachitariu, Jed Perkins1, Melissa Reding1, David Reid1, Miranda Robertson1, Kara Ronellenfitch1, Sam Seid1, Cliff Slaughterbeck1, Michelle Stoecklin1, David Sullivan1, Ben Sutton1, Jackie Swapp1, Carol L. Thompson1, Kristen Turner1, Wayne Wakeman1, Jennifer D. Whitesell1, Derric Williams1, Ali Williford1, R.D. Young1, Hongkui Zeng1, Sarah A. Naylor1, John W. Phillips1, R. Clay Reid1, Stefan Mihalas1, Shawn R. Olsen1, Christof Koch1 
20 Jan 2021-Nature
TL;DR: In this paper, a large-scale dataset of tens of thousands of units in six cortical and two thalamic regions in the brains of mice responding to a battery of visual stimuli is presented.
Abstract: The anatomy of the mammalian visual system, from the retina to the neocortex, is organized hierarchically1. However, direct observation of cellular-level functional interactions across this hierarchy is lacking due to the challenge of simultaneously recording activity across numerous regions. Here we describe a large, open dataset-part of the Allen Brain Observatory2-that surveys spiking from tens of thousands of units in six cortical and two thalamic regions in the brains of mice responding to a battery of visual stimuli. Using cross-correlation analysis, we reveal that the organization of inter-area functional connectivity during visual stimulation mirrors the anatomical hierarchy from the Allen Mouse Brain Connectivity Atlas3. We find that four classical hierarchical measures-response latency, receptive-field size, phase-locking to drifting gratings and response decay timescale-are all correlated with the hierarchy. Moreover, recordings obtained during a visual task reveal that the correlation between neural activity and behavioural choice also increases along the hierarchy. Our study provides a foundation for understanding coding and signal propagation across hierarchically organized cortical and thalamic visual areas.
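
A minimal sketch of the kind of spike-train cross-correlation analysis described above (synthetic data; binning choices, jitter correction, and significance testing are omitted, and this is not the authors' pipeline):

import numpy as np

def cross_correlogram(spikes_a, spikes_b, bin_size=0.001, max_lag=0.05):
    """Binned spike-count cross-correlogram between two units (spike times in seconds)."""
    duration = max(spikes_a.max(), spikes_b.max()) + bin_size
    edges = np.arange(0.0, duration + bin_size, bin_size)
    a = np.histogram(spikes_a, bins=edges)[0].astype(float)
    b = np.histogram(spikes_b, bins=edges)[0].astype(float)
    a -= a.mean()
    b -= b.mean()
    n_lags = int(max_lag / bin_size)
    lags = np.arange(-n_lags, n_lags + 1)
    # Circular shift keeps the sketch short; a peak at a positive lag means unit A leads unit B.
    cc = np.array([np.dot(a, np.roll(b, -k)) for k in lags]) / len(a)
    return lags * bin_size, cc

rng = np.random.default_rng(0)
v1_spikes = np.sort(rng.uniform(0.0, 100.0, 500))                    # synthetic "early" unit
lm_spikes = np.sort(v1_spikes + 0.01 + rng.normal(0.0, 0.002, 500))  # lags it by ~10 ms
lags, cc = cross_correlogram(v1_spikes, lm_spikes)
print("peak lag (s):", round(float(lags[np.argmax(cc)]), 3))

Ordering pairs of areas by the sign and size of such peak lags is one simple way to recover a functional hierarchy from directed lead-lag relationships.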

199 citations


Journal ArticleDOI
TL;DR: Previous research and applications of visual perception in different industrial fields, such as product surface defect detection, intelligent agricultural production, intelligent driving, image synthesis, and event reconstruction, are reviewed.
Abstract: Visual perception refers to the process of organizing, identifying, and interpreting visual information for environmental awareness and understanding. With the rapid progress of multimedia acquisition technology, research on visual perception has become a hot topic in both academia and industry. Especially after the introduction of artificial intelligence theory, intelligent visual perception has been widely used to promote the development of industrial production towards intelligence. In this article, we review previous research and applications of visual perception in different industrial fields such as product surface defect detection, intelligent agricultural production, intelligent driving, image synthesis, and event reconstruction. These applications cover most intelligent visual perception processing technologies, so this survey provides a comprehensive reference for research in this direction. Finally, the article also summarizes the current challenges of visual perception and predicts its future development trends.

127 citations


Journal ArticleDOI
31 Mar 2021-Nature
TL;DR: This article showed that prefrontal cortex acts as a domain-general controller for both attention and selection in rhesus monkeys, and that both selection and attention facilitated behavior by enhancing and transforming the representation of the selected memory or attended stimulus.
Abstract: Cognitive control guides behaviour by controlling what, when, and how information is represented in the brain1. For example, attention controls sensory processing; top-down signals from prefrontal and parietal cortex strengthen the representation of task-relevant stimuli2-4. A similar 'selection' mechanism is thought to control the representations held 'in mind'-in working memory5-10. Here we show that shared neural mechanisms underlie the selection of items from working memory and attention to sensory stimuli. We trained rhesus monkeys to switch between two tasks, either selecting one item from a set of items held in working memory or attending to one stimulus from a set of visual stimuli. Neural recordings showed that similar representations in prefrontal cortex encoded the control of both selection and attention, suggesting that prefrontal cortex acts as a domain-general controller. By contrast, both attention and selection were represented independently in parietal and visual cortex. Both selection and attention facilitated behaviour by enhancing and transforming the representation of the selected memory or attended stimulus. Specifically, during the selection task, memory items were initially represented in independent subspaces of neural activity in prefrontal cortex. Selecting an item caused its representation to transform from its own subspace to a new subspace used to guide behaviour. A similar transformation occurred for attention. Our results suggest that prefrontal cortex controls cognition by dynamically transforming representations to control what and when cognitive computations are engaged.
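
One common way to quantify this kind of subspace transformation is to compare PCA subspaces of population activity before and after the selection cue via principal angles. The sketch below uses synthetic data, and the dimensionality and alignment measure are assumptions rather than the authors' method:

import numpy as np
from scipy.linalg import subspace_angles

def top_pcs(X, k=3):
    """Return the top-k principal directions of a (trials x neurons) activity matrix."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T                                   # neurons x k orthonormal basis

rng = np.random.default_rng(1)
n_trials, n_neurons = 200, 50
pre_selection  = rng.normal(size=(n_trials, n_neurons))   # item held in working memory
post_selection = rng.normal(size=(n_trials, n_neurons))   # same item after the selection cue

angles = subspace_angles(top_pcs(pre_selection), top_pcs(post_selection))
print("largest principal angle (deg):", round(float(np.degrees(angles).max()), 1))
# Angles near 90 deg would indicate that selection moved the item's representation
# into a different part of neural activity space.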

106 citations


Journal ArticleDOI
TL;DR: An online survey on the impact of face coverings on hearing and communication finds a need for communication-friendly face coverings and emphasises the need to be communication-aware when wearing one.
Abstract: To understand the impact of face coverings on hearing and communication. An online survey consisting of closed-set and open-ended questions distributed within the UK to gain insights into experienc...

99 citations


Book
26 Aug 2021
TL;DR: This book discusses the role of attention in perception, the nature and function of memory, and theories of cognition, from metaphors to computational models.
Abstract: Introduction to Cognitive Psychology. Cognitive Processes. Experimental Psychology. Computer Models of Information Processing. Cognitive Neuropsychology. Minds, Brains and Computers. Perception and Attention. The Biological Bases of Perception. Psychological Approaches to Visual Perception. Visual Illusions. Marr's Theory. Object Recognition Processes. Perception: A Summary. Attention. The Role of Attention in Perception. Automaticity. The Spotlight Model of Visual Attention. Visual Attention. Perception, Attention and Consciousness. Disorders of Perception and Attention. Introduction. Blindsight. Unilateral Spatial Neglect. Visual Agnosia. Disorders of Face Processing - Prosopagnosia and Related Conditions. Memory. The Nature and Function of Memory. Multistore Models and Working Memory. Ebbinghaus and the First Long-term Memory Experiments. The Role of Knowledge, Meaning, and Schemas in Memory. Input Processing and Encoding. Retrieval Cues and Feature Overlap. Retrieval Mechanisms in Recall and Recognition. Automatic and Controlled Memory Processes. Memory in Real Life. Disorders of Memory. The Tragic Effects of Amnesia. The Causes of Organic Amnesia. Short-term and Long-term Memory Impairments. Anterograde and Retrograde Amnesia. Memory Functions Preserved in Amnesia. Other Types of Amnesia. Thinking, Problem-solving and Reasoning. Introduction. Early Research on Problem-solving. Problem-space Theory of Problem-solving. Problem-solving and Knowledge. Deductive and Inductive Reasoning. Statistical Reasoning. Everyday Reasoning. Disorders of Thinking. Executive Function and the Frontal Lobes. Introduction. The Frontal Lobes. Problem-solving and Reasoning Deficits. The Executive Functions of the Frontal Lobes. Language. Introduction. The Language System. Psychology and Linguistics. Recognising Spoken and Written Words. Production of Spoken Words. Sentence Comprehension. Sentence Production. Discourse Level. Disorders of Language. Introduction. Historical Perspective. The Psycholinguistic. Disruptions to Language Processing at Word Level. Disruption to Processing of Syntax. Disruption to Processing of Discourse. Theories of Cognition: From Metaphors to Computational Models. Symbol-based Systems. Connectionist Systems. Symbols and Neurons Compared.

91 citations


Proceedings ArticleDOI
21 Jun 2021
TL;DR: In this article, the authors introduce a framework for modeling long-form videos, develop evaluation protocols on large-scale datasets, and show that existing state-of-the-art short-term models are limited for long-form tasks.
Abstract: Our world offers a never-ending stream of visual stimuli, yet today’s vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset.

69 citations


Journal ArticleDOI
17 Mar 2021-Neuron
TL;DR: The superior colliculus is a conserved sensorimotor structure that integrates visual and other sensory information to drive reflexive behaviors, as discussed in this paper.

66 citations


Journal ArticleDOI
Yun Liu1, Yu-Chao Gu1, Xin-Yu Zhang1, Weiwei Wang1, Ming-Ming Cheng1 
TL;DR: A hierarchical visual perception (HVP) module is proposed to imitate the primate visual cortex for hierarchical perception learning, and with the HVP module incorporated, a lightweight SOD network, HVPNet, is designed.
Abstract: Recently, salient object detection (SOD) has witnessed vast progress with the rapid development of convolutional neural networks (CNNs). However, the improvement of SOD accuracy comes with the increase in network depth and width, resulting in large network size and heavy computational overhead. This prevents state-of-the-art SOD methods from being deployed into practical platforms, especially mobile devices. To promote the deployment of real-world SOD applications, we aim at developing a lightweight SOD model in this article. Our motivation comes from the observation that the primate visual system processes visual signals hierarchically, with different receptive fields and eccentricities in different visual cortex areas. Inspired by this, we propose a hierarchical visual perception (HVP) module to imitate the primate visual cortex for hierarchical perception learning. With the HVP module incorporated, we design a lightweight SOD network, namely, HVPNet. Extensive experiments on popular benchmarks demonstrate that HVPNet achieves highly competitive accuracy compared with state-of-the-art SOD methods while running at 4.3 frames/s CPU speed and 333.2 frames/s GPU speed with only 1.23M parameters.
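
A minimal PyTorch sketch of the general idea of perceiving at multiple receptive-field sizes in parallel; the use of dilated depthwise-separable branches here is an illustrative assumption, not the actual HVP module or HVPNet architecture:

import torch
import torch.nn as nn

class MultiReceptiveFieldBlock(nn.Module):
    """Parallel branches with increasing receptive fields, loosely inspired by the idea
    of hierarchical perception across visual-cortex areas (illustrative only)."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                          groups=channels, bias=False),   # depthwise conv keeps it lightweight
                nn.Conv2d(channels, channels, 1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ) for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

x = torch.randn(1, 32, 64, 64)
print(MultiReceptiveFieldBlock(32)(x).shape)   # torch.Size([1, 32, 64, 64])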

62 citations


Journal ArticleDOI
TL;DR: In this article, the authors analyzed large-scale optical and electrophysiological recordings from six visual cortical areas in behaving mice that were repeatedly presented with the same natural movies and found representational drift over timescales spanning minutes to days across multiple visual areas, cortical layers, and cell types.

58 citations


Journal ArticleDOI
TL;DR: A prototype neuromorphic vision system is proposed and demonstrated by networking a retinomorphic sensor with a memristive crossbar; the system allows for fast letter recognition and object tracking, indicating capabilities for image sensing, processing, and recognition in the full analog regime.
Abstract: Compared to human vision, conventional machine vision composed of an image sensor and processor suffers from high latency and large power consumption due to physically separated image sensing and processing. A neuromorphic vision system with brain-inspired visual perception provides a promising solution to the problem. Here we propose and demonstrate a prototype neuromorphic vision system by networking a retinomorphic sensor with a memristive crossbar. We fabricate the retinomorphic sensor by using WSe2/h-BN/Al2O3 van der Waals heterostructures with gate-tunable photoresponses, to closely mimic the human retinal capabilities in simultaneously sensing and processing images. We then network the sensor with a large-scale Pt/Ta/HfO2/Ta one-transistor-one-resistor (1T1R) memristive crossbar, which plays a similar role to the visual cortex in the human brain. The realized neuromorphic vision system allows for fast letter recognition and object tracking, indicating the capabilities of image sensing, processing and recognition in the full analog regime. Our work suggests that such a neuromorphic vision system may open up unprecedented opportunities in future visual perception applications.
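
The core operation a memristive crossbar offloads is an analog matrix-vector multiplication: by Ohm's and Kirchhoff's laws, each output line sums conductance-weighted input voltages. A minimal numerical sketch of that principle (illustrative values, not the fabricated 1T1R array):

import numpy as np

# Crossbar of programmable conductances G (rows: input lines, cols: output lines).
# Column currents I = V @ G implement a matrix-vector product in one analog step.
rng = np.random.default_rng(0)
n_inputs, n_outputs = 25, 4                    # e.g., a 5x5 binary "letter" mapped to 4 classes
G = rng.uniform(1e-6, 1e-4, size=(n_inputs, n_outputs))   # conductances in siemens (illustrative)

letter = rng.integers(0, 2, size=n_inputs)     # flattened 5x5 binary image
V = letter * 0.2                               # encode pixels as read voltages (0 V or 0.2 V)

I = V @ G                                      # Kirchhoff summation of per-device currents
print("column currents (A):", I)
print("predicted class:", int(np.argmax(I)))   # simple winner-take-all readout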

57 citations


Journal ArticleDOI
20 Sep 2021-ACS Nano
TL;DR: In this article, the authors proposed a fully memristor-based artificial visual perception nervous system (AVPNS) which consists of a quantum-dot-based photoelectric memristor and a nanosheet-based threshold-switching (TS) memristor.
Abstract: The visual perception system is the most important system for human learning since it receives over 80% of the learning information from the outside world. With the exponential growth of artificial intelligence technology, there is a pressing need for energy- and area-efficient visual perception systems capable of efficiently processing the received natural information. Currently, memristors with their elaborate dynamics, excellent scalability, and information (e.g., visual, pressure, sound, etc.) perception ability exhibit tremendous potential for the application of visual perception. Here, we propose a fully memristor-based artificial visual perception nervous system (AVPNS) which consists of a quantum-dot-based photoelectric memristor and a nanosheet-based threshold-switching (TS) memristor. We use a photoelectric and a TS memristor to implement the synapse and leaky integrate-and-fire (LIF) neuron functions, respectively. With the proposed AVPNS we successfully demonstrate biological image perception, integration and firing, as well as the biosensitization process. Furthermore, the self-regulation process of a speed meeting control system in driverless automobiles can be accurately and conceptually emulated by this system. Our work shows that the functions of the biological visual nervous system may be systematically emulated by a memristor-based hardware system, thus expanding the spectrum of memristor applications in artificial intelligence.
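
A minimal simulation of the leaky integrate-and-fire behaviour that the threshold-switching memristor is used to emulate; all parameter values are illustrative assumptions:

import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=20e-3, r=1e6, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire: leak toward rest, integrate input, spike at threshold."""
    v, spikes, trace = v_reset, [], []
    for i in input_current:
        v += dt / tau * (-(v - v_reset) + r * i)   # leaky integration
        if v >= v_th:                              # threshold crossing -> spike
            spikes.append(True)
            v = v_reset                            # reset (device returns to its off state)
        else:
            spikes.append(False)
        trace.append(v)
    return np.array(trace), np.array(spikes)

current = np.full(200, 1.5e-6)                     # constant 1.5 uA input for 200 ms
_, spikes = lif_neuron(current)
print("spike count in 200 ms:", int(spikes.sum()))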

Proceedings ArticleDOI
19 Jan 2021
TL;DR: ArtEmis as mentioned in this paper is a large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
Abstract: We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 455K emotion attributions and explanations from humans, on 80K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at https://artemisdataset.org.

Journal ArticleDOI
TL;DR: In this article, the authors performed chronic 2-photon imaging of mouse V1 populations to directly compare the representational stability of artificial versus naturalistic visual stimuli over weeks and found that responses to gratings were highly stable across sessions.
Abstract: To produce consistent sensory perception, neurons must maintain stable representations of sensory input. However, neurons in many regions exhibit progressive drift across days. Longitudinal studies have found stable responses to artificial stimuli across sessions in visual areas, but it is unclear whether this stability extends to naturalistic stimuli. We performed chronic 2-photon imaging of mouse V1 populations to directly compare the representational stability of artificial versus naturalistic visual stimuli over weeks. Responses to gratings were highly stable across sessions. However, neural responses to naturalistic movies exhibited progressive representational drift across sessions. Differential drift was present across cortical layers, in inhibitory interneurons, and could not be explained by differential response strength or higher order stimulus statistics. However, representational drift was accompanied by similar differential changes in local population correlation structure. These results suggest representational stability in V1 is stimulus-dependent and may relate to differences in preexisting circuit architecture of co-tuned neurons.
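
A minimal sketch of one common way to quantify representational drift: correlate population response matrices to the same stimulus across sessions and track how the similarity falls off. The data are synthetic and the similarity measure is an assumption, not the authors' pipeline:

import numpy as np

def population_similarity(resp_a, resp_b):
    """Pearson correlation between two (neurons x stimulus frames) response matrices,
    computed on the flattened population response vectors."""
    return np.corrcoef(resp_a.ravel(), resp_b.ravel())[0, 1]

rng = np.random.default_rng(0)
n_neurons, n_frames, n_sessions = 300, 100, 6
base = rng.normal(size=(n_neurons, n_frames))
sessions = [base + rng.normal(scale=0.3 * s, size=base.shape) for s in range(n_sessions)]

# Similarity to session 0 as a function of elapsed sessions: a monotonic decline
# is the signature of representational drift.
for s in range(1, n_sessions):
    print(f"session 0 vs {s}: r = {population_similarity(sessions[0], sessions[s]):.2f}")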

Journal ArticleDOI
TL;DR: The experimental results not only provide strong support for the modularity theory of brain cognitive function but also show, once again, the superiority of the proposed Bi-LSTM model with attention mechanism.

Journal ArticleDOI
TL;DR: The ecoset dataset as discussed by the authors is a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans.
Abstract: Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.

Journal ArticleDOI
TL;DR: In this paper, the authors measured visual acuity at isoeccentric peripheral locations (10 deg eccentricity), every 15° of polar angle; on each trial, observers judged the orientation (±45°) of one of four equidistant, suprathreshold grating stimuli varying in spatial frequency (SF).
Abstract: Human vision is heterogeneous around the visual field. At a fixed eccentricity, performance is better along the horizontal than the vertical meridian and along the lower than the upper vertical meridian. These asymmetric patterns, termed performance fields, have been found in numerous visual tasks, including those mediated by contrast sensitivity and spatial resolution. However, it is unknown whether spatial resolution asymmetries are confined to the cardinal meridians or whether and how far they extend into the upper and lower hemifields. Here, we measured visual acuity at isoeccentric peripheral locations (10 deg eccentricity), every 15° of polar angle. On each trial, observers judged the orientation (± 45°) of one of four equidistant, suprathreshold grating stimuli varying in spatial frequency (SF). On each block, we measured performance as a function of stimulus SF at 4 of 24 isoeccentric locations. We estimated the 75%-correct SF threshold, SF cutoff point (i.e., chance-level), and slope of the psychometric function for each location. We found higher SF estimates (i.e., better acuity) for the horizontal than the vertical meridian and for the lower than the upper vertical meridian. These asymmetries were most pronounced at the cardinal meridians and decreased gradually as the angular distance from the vertical meridian increased. This gradual change in acuity with polar angle reflected a shift of the psychometric function without changes in slope. The same pattern was found under binocular and monocular viewing conditions. These findings advance our understanding of visual processing around the visual field and help constrain models of visual perception.
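
A minimal sketch of estimating a 75%-correct spatial-frequency threshold by fitting a psychometric function to orientation-discrimination accuracy; the logistic parameterisation and the synthetic data are assumptions, not the authors' exact procedure:

import numpy as np
from scipy.optimize import curve_fit

def psychometric(sf, threshold, slope):
    """Discrimination accuracy falling from ~1.0 to chance (0.5) with spatial frequency."""
    return 0.5 + 0.5 / (1.0 + np.exp(slope * (sf - threshold)))

# Synthetic proportion-correct data at several spatial frequencies (cycles per degree).
sf = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
p_correct = np.array([0.99, 0.98, 0.93, 0.78, 0.62, 0.55, 0.51])

(threshold, slope), _ = curve_fit(psychometric, sf, p_correct, p0=[6.0, 1.0])
# With this parameterisation accuracy at sf == threshold is 0.5 + 0.5/2 = 0.75,
# so the fitted threshold parameter is itself the 75%-correct SF estimate.
print(f"75%-correct SF threshold: {threshold:.2f} cpd, slope: {slope:.2f}")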

Journal ArticleDOI
TL;DR: This paper conducted a large-scale study of the production and visual perception of facial expressions of emotion in the wild and found that of the 16,384 possible facial configurations that people can theoretically produce, only 35 were successfully used to transmit emotive information across cultures, and only 8 within a smaller number of cultures.
Abstract: Automatic recognition of emotion from facial expressions is an intense area of research, with a potentially long list of important applications. Yet, the study of emotion requires knowing which facial expressions are used within and across cultures in the wild, not in controlled lab conditions; but such studies do not exist. Which and how many cross-cultural and cultural-specific facial expressions do people commonly use? And, what affect variables does each expression communicate to observers? If we are to design technology that understands the emotion of users, we need answers to these two fundamental questions. In this paper, we present the first large-scale study of the production and visual perception of facial expressions of emotion in the wild. We find that of the 16,384 possible facial configurations that people can theoretically produce, only 35 are successfully used to transmit emotive information across cultures, and only 8 within a smaller number of cultures. Crucially, we find that visual analysis of cross-cultural expressions yields consistent perception of emotion categories and valence, but not arousal. In contrast, visual analysis of cultural-specific expressions yields consistent perception of valence and arousal, but not of emotion categories. Additionally, we find that the number of expressions used to communicate each emotion is also different, e.g., 17 expressions transmit happiness, but only 1 is used to convey disgust.
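
For reference, 16,384 equals 2^14, which is consistent with counting on/off combinations of 14 binary facial action components; note that this reading of the figure is an assumption, since the abstract does not state how the configuration space is defined:

# Hypothetical reading of the 16,384 figure: on/off combinations of 14 facial action units.
n_components = 14            # assumed, not stated in the abstract
print(2 ** n_components)     # 16384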

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis, and show that CNNs do not fully capture higher level visual representations of real-world objects, nor those of artificial objects, either at lower or higher levels of visual representations.
Abstract: Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs’ impressive ability to fully capture lower level visual representation of real-world objects, we show that CNNs do not fully capture higher level visual representations of real-world objects, nor those of artificial objects, either at lower or higher levels of visual representations. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates some fundamental differences exist in how the brain and CNNs represent visual information. Convolutional neural networks are increasingly used to model human vision. Here, the authors compare the performance of 14 different CNNs and human fMRI responses to real-world and artificial objects to show some fundamental differences exist between them.
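
A minimal sketch of representational similarity analysis as used in comparisons of this kind: build representational dissimilarity matrices (RDMs) for a model layer and for fMRI patterns over the same stimuli, then rank-correlate them. The data are synthetic and the distance and correlation choices are common defaults, not necessarily those of the paper:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed representational dissimilarity matrix from (stimuli x features) responses,
    using correlation distance between stimulus patterns."""
    return pdist(responses, metric="correlation")

rng = np.random.default_rng(0)
n_stimuli = 60
cnn_layer = rng.normal(size=(n_stimuli, 512))     # e.g., activations of one CNN layer
fmri_roi  = rng.normal(size=(n_stimuli, 200))     # e.g., voxel patterns in one visual ROI

rho, p = spearmanr(rdm(cnn_layer), rdm(fmri_roi))
print(f"model-brain RDM correlation: rho = {rho:.3f} (p = {p:.3f})")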

Journal ArticleDOI
TL;DR: Experimental results on three widely used image datasets show that the proposed visual saliency-detection model based on spatial position prior of attractive objects and sparse background features is effective and efficient, and is superior to other state-of-the-art saliency-detection models.
Abstract: In this paper, we propose an effective visual saliency-detection model based on spatial position prior of attractive objects and sparse background features. First, since multi-orientation features are among the key visual stimuli in the human visual system (HVS) for perceiving object spatial information, discrete wavelet frame transform (DWFT) is applied to extract directionality characteristics for calculating the centroid of remarkable objects in the original image. Second, the color contrast feature is used to represent the physical characteristics of salient objects. Third, in order to explore and utilize the background features of an input image, sparse dictionary learning is performed to statistically analyze and estimate the background feature map. Finally, three distinctive cues, namely the directional feature, the color contrast feature, and the background feature, are combined to generate a final robust saliency map. Experimental results on three widely used image datasets show that our proposed method is effective and efficient, and is superior to other state-of-the-art saliency-detection models.
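
A minimal sketch of the final fusion step described above, combining three normalised feature maps into one saliency map; the maps are random placeholders and the simple fusion rule is an assumption, not the paper's exact formulation:

import numpy as np

def normalize(feature_map):
    """Rescale a feature map to [0, 1]."""
    fmin, fmax = feature_map.min(), feature_map.max()
    return (feature_map - fmin) / (fmax - fmin + 1e-8)

rng = np.random.default_rng(0)
h, w = 120, 160
directional    = normalize(rng.random((h, w)))   # stands in for the wavelet orientation cue
color_contrast = normalize(rng.random((h, w)))   # stands in for the colour-contrast cue
background     = normalize(rng.random((h, w)))   # stands in for the sparse background estimate

# Foreground cues are averaged, then suppressed where the background evidence is strong.
saliency = normalize(0.5 * (directional + color_contrast) * (1.0 - background))
print(saliency.shape, float(saliency.min()), float(saliency.max()))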

Journal ArticleDOI
16 Jun 2021-Neuron
TL;DR: This paper found that responses of mouse lateral posterior nucleus (LP) neurons projecting to higher visual areas likely derive from feedforward input from primary visual cortex (V1) combined with information from many cortical and subcortical areas, including superior colliculus.

Journal ArticleDOI
TL;DR: In this paper, the authors showed that top-down signals originating in the frontal eye fields causally shape visual cortex activity and perception through mechanisms of oscillatory phase realignment at the beta frequency.
Abstract: Voluntary allocation of visual attention is controlled by top-down signals generated within the Frontal Eye Fields (FEFs) that can change the excitability of lower-level visual areas. However, the mechanism through which this control is achieved remains elusive. Here, we emulated the generation of an attentional signal using single-pulse transcranial magnetic stimulation to activate the FEFs and tracked its consequences over the visual cortex. First, we documented changes to brain oscillations using electroencephalography and found evidence for a phase reset over occipital sites at beta frequency. We then probed for perceptual consequences of this top-down triggered phase reset and assessed its anatomical specificity. We show that FEF activation leads to cyclic modulation of visual perception and extrastriate but not primary visual cortex excitability, again at beta frequency. We conclude that top-down signals originating in FEF causally shape visual cortex activity and perception through mechanisms of oscillatory realignment. Visual attention requires top-down modulation from the frontal eye fields to change cortical excitability of visual cortex. Here, the authors show that these top-down signals shape perception through mechanisms of oscillatory phase realignment at the beta frequency.
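
A minimal sketch of how a cyclic, beta-band modulation of perception can be quantified: fit a fixed-frequency sinusoid to detection accuracy as a function of the delay between the TMS pulse and the visual target. The data and the 20 Hz frequency are illustrative assumptions, not the authors' analysis:

import numpy as np
from scipy.optimize import curve_fit

def beta_cycle(delay_s, amplitude, phase, offset, freq_hz=20.0):
    """Accuracy oscillating at a fixed beta frequency (20 Hz here, an assumption)."""
    return offset + amplitude * np.cos(2 * np.pi * freq_hz * delay_s + phase)

# Synthetic detection accuracy at different TMS-to-target delays (in seconds).
delays = np.arange(0.0, 0.20, 0.01)
accuracy = (0.75 + 0.08 * np.cos(2 * np.pi * 20 * delays + 0.6)
            + np.random.default_rng(0).normal(0, 0.02, delays.size))

(amp, phase, offset), _ = curve_fit(beta_cycle, delays, accuracy, p0=[0.05, 0.0, 0.75])
print(f"fitted modulation depth: {abs(amp):.3f} around a mean accuracy of {offset:.2f}")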

Journal ArticleDOI
TL;DR: In this paper, six empirical studies present examples of how to capture visual perception in the complexity of a classroom lesson, and one theoretical contribution provides the very first model of teachers' cognitions during teaching in relation to their visual perception, which in turn will allow future research to move beyond explorations towards hypothesis testing.
Abstract: Classrooms full of pupils can be very overwhelming, both for teachers and students, as well as for their joint interactions. It is thus crucial that both can distil the relevant information in this complex scenario and interpret it appropriately. This distilling and interpreting happen to a large extent via visual perception, which is the core focus of the current Special Issue. Six empirical studies present examples of how to capture visual perception in the complexity of a classroom lesson. These examples open up new avenues that go beyond studying perception in restricted and artificial laboratory scenarios, ranging from studies using video recordings of authentic lessons to studies conducted in actual classrooms. This movement towards more realistic scenarios allows visual perception in classrooms to be studied from new perspectives, namely those of the teachers, the learners, and their interactions. This in turn sheds novel light on well-established theoretical concepts, namely students’ engagement during actual lessons, teachers’ professional vision while teaching, and the establishment of joint attention between teachers and students in a lesson. Additionally, one theoretical contribution provides the very first model of teachers’ cognitions during teaching in relation to their visual perception, which in turn will allow future research to move beyond exploration towards hypothesis testing. However, to fully thrive, this field of research has to address two crucial challenges: (i) the heterogeneity of its methodological approaches (e.g., varying age groups, subjects taught, lesson formats) and (ii) the recording and processing of personal data of many people (often minors). Hence, these new approaches bring not only new opportunities for insight but also new responsibilities for researchers.

Journal ArticleDOI
TL;DR: This work proposes a model, EEG-ChannelNet, to learn a brain manifold for EEG classification and introduces a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations.
Abstract: This work presents a novel method of exploring human brain-visual representations, with a view towards replicating these processes in machines. The core idea is to learn plausible computational and biological representations by correlating human neural activity and natural images. Thus, we first propose a model, EEG-ChannelNet , to learn a brain manifold for EEG classification. After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations. We then carry out image classification and saliency detection on the learned manifold. Performance analyses show that our approach satisfactorily decodes visual information from neural signals. This, in turn, can be used to effectively supervise the training of deep learning models, as demonstrated by the high performance of image classification and saliency detection on out-of-training classes. The obtained results show that the learned brain-visual features lead to improved performance and simultaneously bring deep models more in line with cognitive neuroscience work related to visual perception and attention.
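
A minimal PyTorch sketch of the siamese idea of learning a joint embedding in which matching image and EEG features are compatible; the tiny encoders and the InfoNCE-style loss are illustrative assumptions, not the EEG-ChannelNet design:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Tiny stand-in encoder mapping flattened inputs into a shared embedding space."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

image_encoder = Encoder(in_dim=2048)        # e.g., precomputed image features
eeg_encoder   = Encoder(in_dim=128 * 440)   # e.g., 128 channels x 440 samples, flattened

images = torch.randn(16, 2048)
eeg    = torch.randn(16, 128 * 440)

# Contrastive compatibility: matching image/EEG pairs should be more similar
# than mismatched pairs within the batch.
logits = image_encoder(images) @ eeg_encoder(eeg).T / 0.07
targets = torch.arange(16)
loss = F.cross_entropy(logits, targets)
loss.backward()
print("contrastive loss:", float(loss))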

Journal ArticleDOI
TL;DR: In this article, the authors present a checklist for comparative studies of visual reasoning in humans and machines, highlighting how to overcome potential pitfalls in design and inference and highlight the importance of aligning experimental conditions.
Abstract: With the rise of machines to human-level performance in complex recognition tasks, a growing amount of work is directed toward comparing information processing in humans and machines. These studies are an exciting chance to learn about one system by studying the other. Here, we propose ideas on how to design, conduct, and interpret experiments such that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas through three case studies. The first case study shows how human bias can affect the interpretation of results and that several analytic tools can help to overcome this human reference point. In the second case study, we highlight the difference between necessary and sufficient mechanisms in visual reasoning tasks. Thereby, we show that contrary to previous suggestions, feedback mechanisms might not be necessary for the tasks in question. The third case study highlights the importance of aligning experimental conditions. We find that a previously observed difference in object recognition does not hold when adapting the experiment to make conditions more equitable between humans and machines. In presenting a checklist for comparative studies of visual reasoning in humans and machines, we hope to highlight how to overcome potential pitfalls in design and inference.

Journal ArticleDOI
TL;DR: A critical review of the related significant aspects is provided, along with an overview of existing applications of deep learning in computational visual perception; the results show a significant improvement in accuracy when using dropout and data augmentation.
Abstract: Computational visual perception, also known as computer vision, is a field of artificial intelligence that enables computers to process digital images and videos in a similar way to biological vision. It involves developing methods that replicate the capabilities of biological vision, with the goal of surpassing biological vision in extracting useful information from visual data. The massive amount of data generated today is one of the driving factors for the tremendous growth of computer vision. This survey provides an overview of existing applications of deep learning in computational visual perception and explores various deep learning techniques adapted to solve computer vision problems using deep convolutional neural networks and deep generative adversarial networks. The pitfalls of deep learning and their solutions, namely dropout and data augmentation, are briefly discussed; the results show a significant improvement in accuracy when using dropout and data augmentation. Applications of deep convolutional neural networks, namely image classification, localization and detection, document analysis, and speech recognition, are discussed in detail, and an in-depth analysis of deep generative adversarial network applications, namely image-to-image translation, image denoising, face aging, and facial attribute editing, is given. Deep generative adversarial networks are trained in an unsupervised manner, but adding a certain number of labels in practical applications can improve their generative ability; since acquiring many data labels is challenging while a small number can usually be obtained, combining semisupervised learning with generative adversarial networks is one of the future directions. This article surveys the recent developments in this direction and provides a critical review of the related significant aspects, investigates the current opportunities and future challenges in the emerging domains, and discusses current opportunities in many emerging fields such as handwriting recognition, semantic mapping, webcam-based eye trackers, lumen center detection, query-by-string word, intermittently closed and open lakes and lagoons, and landslides.
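
A minimal PyTorch sketch of the two regularisation measures highlighted above, dropout and data augmentation (via torchvision transforms), in a small image classifier; the architecture and augmentation choices are illustrative only:

import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crops and flips enlarge the effective training set.
augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout: randomly silence units during training to reduce overfitting.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(32 * 16 * 16, 10),
)

x = torch.randn(8, 3, 32, 32)   # dummy batch; `augment` would be applied to the PIL images
print(model(x).shape)           # torch.Size([8, 10])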

Journal ArticleDOI
TL;DR: In the VR setting, the orange daylight led to a warmer thermal perception at (close-to-)comfortable temperatures, demonstrating a color-induced effect on thermal perception and indicating that orange glazing should be used with caution in slightly warm environments.
Abstract: ObjectiveTemperature–color interaction effects on subjective perception and physiological responses are investigated using a novel hybrid experimental method combining thermal and visual stimuli fr...

Journal ArticleDOI
TL;DR: Widespread stable visual organization beyond the traditional visual system, in the default-mode network and hippocampus, is demonstrated, indicating that visual–spatial organization is a fundamental coding principle that structures the communication between distant brain regions.
Abstract: The human visual system is organized as a hierarchy of maps that share the topography of the retina. Known retinotopic maps have been identified using simple visual stimuli under strict fixation, conditions different from everyday vision which is active, dynamic, and complex. This means that it remains unknown how much of the brain is truly visually organized. Here I demonstrate widespread stable visual organization beyond the traditional visual system, in default-mode network and hippocampus. Detailed topographic connectivity with primary visual cortex during movie-watching, resting-state, and retinotopic-mapping experiments revealed that visual-spatial representations throughout the brain are warped by cognitive state. Specifically, traditionally visual regions alternate with default-mode network and hippocampus in preferentially representing the center of the visual field. This visual role of default-mode network and hippocampus would allow these regions to interface between abstract memories and concrete sensory impressions. Together, these results indicate that visual-spatial organization is a fundamental coding principle that structures the communication between distant brain regions.

Journal ArticleDOI
03 Mar 2021-Neuron
TL;DR: In this paper, the authors investigated how larval zebrafish select between simultaneously presented visual stimuli and found that a mix of winner-take-all (WTA) and averaging strategies best simulates behavioral responses.
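
A minimal sketch of how winner-take-all and averaging predictions can be compared for two simultaneously presented stimuli; the decision rules and values are illustrative assumptions, not the fitted zebrafish model:

import numpy as np

def predicted_response(positions, saliences, strategy="wta"):
    """Predicted orienting direction (deg) toward two simultaneous stimuli."""
    positions = np.asarray(positions, float)
    saliences = np.asarray(saliences, float)
    if strategy == "wta":                   # respond to the most salient stimulus only
        return float(positions[np.argmax(saliences)])
    if strategy == "average":               # respond to the salience-weighted mean position
        return float(np.average(positions, weights=saliences))
    raise ValueError(strategy)

# Two prey-like stimuli at -30 deg and +20 deg, the left one slightly more salient.
for strategy in ("wta", "average"):
    print(strategy, predicted_response([-30, 20], [1.2, 1.0], strategy))
# Comparing such predictions with observed turn angles is one way to ask which strategy,
# or which mixture of the two, best explains behaviour.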

Journal ArticleDOI
07 Apr 2021-Neuron
TL;DR: This article shows that the anterior cingulate cortex area (ACA) is recruited selectively by recent errors, and that 30-Hz optogenetic stimulation of ACAVIS neurons in anesthetized mice recapitulates the increased gamma and reduced theta VIS oscillatory changes associated with endogenous post-error performance during behavior and subsequently increases visually evoked spiking, a hallmark feature of visual attention.

Journal ArticleDOI
TL;DR: Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units as discussed by the authors, the former one is a visual perception task while the latter corresponds to visual context reasoning.
Abstract: Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units. Essentially, the former one is a visual perception task while the latter corresponds to visual context reasoning. Remarkable advances in visual perception have been achieved due to the success of deep learning. In contrast, visual semantic information pursuit, a visual scene semantic interpretation task combining visual perception and visual context reasoning, is still in its early stage. It is the core task of many different computer vision applications, such as object detection, visual semantic segmentation, visual relationship detection, or scene graph generation. Since it helps to enhance the accuracy and the consistency of the resulting interpretation, visual context reasoning is often incorporated with visual perception in current deep end-to-end visual semantic information pursuit methods. Surprisingly, a comprehensive review for this exciting area is still lacking. In this survey, we present a unified theoretical paradigm for all these methods, followed by an overview of the major developments and the future trends in each potential direction. The common benchmark datasets, the evaluation metrics and the comparisons of the corresponding methods are also introduced.