Topic

Human visual system model

About: Human visual system model is a research topic. Over its lifetime, 8,697 publications have been published within this topic, receiving 259,440 citations.


Papers
Proceedings Article
04 Dec 1990
TL;DR: A model for image segmentation is described that captures the low-level depth reconstruction exhibited in early human vision, giving an important role to edge terminations; expressed as a minimization problem, it gives rise to a family of optimal contours, called nonlinear splines, that minimize length and the square of curvature.
Abstract: A model is described for image segmentation that tries to capture the low-level depth reconstruction exhibited in early human vision, giving an important role to edge terminations. The problem is to find a decomposition of the domain D of an image that has a minimum of disrupted edges (junctions of edges, crack tips, corners, and cusps) by creating suitable continuations for the disrupted edges behind occluding regions. The result is a decomposition of D into overlapping regions R_1 ∪ ... ∪ R_n ordered by occlusion, which is called the 2.1-D sketch. Expressed as a minimization problem, the model gives rise to a family of optimal contours, called nonlinear splines, that minimize length and the square of curvature. These are essential in the construction of the 2.1-D sketch of an image, as the continuations of disrupted edges. An algorithm is described that constructs the 2.1-D sketch of an image, and results are given for several example images. The algorithm yields the same interpretations of optical illusions as the human visual system.

173 citations
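The nonlinear splines above are minimizers of an elastica-type functional, the integral of (alpha + beta * kappa^2) along the contour, where kappa is curvature. Below is a minimal numpy sketch of that energy evaluated on a discrete contour; the turning-angle discretization and the weights alpha and beta are illustrative assumptions, not values from the paper.

```python
import numpy as np

def elastica_energy(points, alpha=1.0, beta=1.0):
    # points: (N, 2) array of samples along an open contour.
    # Approximates E(C) = integral of (alpha + beta * kappa^2) ds,
    # the length-plus-squared-curvature energy of a nonlinear spline.
    p = np.asarray(points, dtype=float)
    d = np.diff(p, axis=0)                    # chord vectors
    ds = np.linalg.norm(d, axis=1)            # segment lengths
    t = d / ds[:, None]                       # unit tangents
    # Turning angle between consecutive tangents at interior vertices.
    cross = t[:-1, 0] * t[1:, 1] - t[:-1, 1] * t[1:, 0]
    dot = np.sum(t[:-1] * t[1:], axis=1)
    theta = np.arctan2(cross, dot)
    # Discrete curvature: turning angle per unit of local arc length.
    local = 0.5 * (ds[:-1] + ds[1:])
    kappa = theta / local
    return alpha * ds.sum() + beta * np.sum(kappa**2 * local)

# A straight continuation has lower energy than a wiggly one between
# the same endpoints, as the functional intends.
u = np.linspace(0, 1, 50)
line = np.column_stack([u, np.zeros_like(u)])
wiggle = np.column_stack([u, 0.1 * np.sin(8 * np.pi * u)])
assert elastica_energy(line) < elastica_energy(wiggle)
```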

Book Chapter
07 Oct 2012
TL;DR: This work complements existing state-of-the-art large-scale dynamic computer vision datasets such as Hollywood-2 and UCF Sports with human eye movements collected under the ecological constraints of the visual action recognition task, and introduces novel dynamic consistency and alignment models, which underline the remarkable stability of patterns of visual search among subjects.
Abstract: Systems based on bag-of-words models operating on image features collected at maxima of sparse interest point operators have been extremely successful for both computer-based visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in "saccade and fixate" regimes, the knowledge, methodology, and emphasis in the human and the computer vision communities remain sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large-scale dynamic computer vision datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of the visual action recognition task. To our knowledge these are the first large human eye-tracking datasets to be collected for video (497,107 frames, each viewed by 16 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic, video stimuli, and (c) task control, as opposed to free viewing. Second, we introduce novel dynamic consistency and alignment models, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the massive amounts of collected data to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point sampling strategies and human fixations, and on their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted, and, when used in an end-to-end automatic system leveraging some of the most advanced computer vision practice, can lead to state-of-the-art results.

172 citations
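A rough, generic way to quantify the inter-subject consistency the paper reports is a leave-one-subject-out comparison: score each subject's fixations against a saliency map pooled from the remaining subjects. The NSS-style sketch below is a stand-in under that assumption, not the paper's dynamic consistency or alignment models.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def inter_subject_consistency(fixation_maps, sigma=25.0):
    # fixation_maps: list of binary (H, W) arrays, one per subject,
    # with 1s at fixated pixels for a given frame.
    maps = np.stack([m.astype(float) for m in fixation_maps])
    n = len(maps)
    scores = []
    for i in range(n):
        # Pool the other subjects' fixations into a blurred saliency map.
        others = maps[np.arange(n) != i].sum(axis=0)
        sal = gaussian_filter(others, sigma)
        sal = (sal - sal.mean()) / (sal.std() + 1e-8)  # zero mean, unit std
        fix = maps[i] > 0
        if fix.any():
            # Mean normalized saliency at subject i's fixations (NSS).
            scores.append(sal[fix].mean())
    return float(np.mean(scores))
```

High values mean subjects look at the same places; values near zero mean one subject's fixations are no better aligned with the others' than chance.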

Journal Article
TL;DR: A new saliency detection model is proposed, based on human visual sensitivity and the amplitude spectrum of the quaternion Fourier transform (QFT), which represents the color, intensity, and orientation distributions of image patches.
Abstract: With the wide application of saliency information in visual signal processing, many saliency detection methods have been proposed. However, some key characteristics of the human visual system (HVS) are still neglected in these saliency detection models. In this paper, we propose a new saliency detection model based on human visual sensitivity and the amplitude spectrum of the quaternion Fourier transform (QFT). We use the amplitude spectrum of the QFT to represent the color, intensity, and orientation distributions of image patches. The saliency value for each image patch is calculated not only from the differences between the QFT amplitude spectrum of that patch and those of the other patches in the image, but also from the visual impact of these differences as determined by human visual sensitivity. Experimental results show that the proposed saliency detection model outperforms state-of-the-art detection models. In addition, we apply the proposed model to image retargeting and achieve better performance than conventional algorithms.

171 citations
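The broad recipe above can be sketched as follows: describe each image patch by an amplitude spectrum, then score a patch by its spectrum differences to all other patches, with nearer patches weighted more heavily. In this sketch a per-channel 2-D FFT stands in for the true quaternion Fourier transform, and a Gaussian fall-off in patch distance is a crude placeholder for the paper's visual-sensitivity weighting; both are assumptions for illustration.

```python
import numpy as np

def patch_amplitude_saliency(features, patch=16, sigma=3.0):
    # features: (H, W, C) array with color/intensity/orientation
    # channels stacked; H and W are assumed divisible by `patch`.
    H, W, C = features.shape
    gh, gw = H // patch, W // patch
    amps = np.empty((gh, gw, patch * patch * C))
    for i in range(gh):
        for j in range(gw):
            blk = features[i*patch:(i+1)*patch, j*patch:(j+1)*patch, :]
            # Amplitude spectrum of the patch (per channel).
            amps[i, j] = np.abs(np.fft.fft2(blk, axes=(0, 1))).ravel()
    flat = amps.reshape(gh * gw, -1)
    ys, xs = np.mgrid[0:gh, 0:gw]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    sal = np.zeros(gh * gw)
    for k in range(gh * gw):
        diff = np.linalg.norm(flat - flat[k], axis=1)      # spectrum differences
        dist = np.linalg.norm(coords - coords[k], axis=1)  # patch distances
        # Distance-weighted sum of differences to every other patch.
        sal[k] = np.sum(diff * np.exp(-dist**2 / (2 * sigma**2)))
    return sal.reshape(gh, gw)
```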

Journal Article
TL;DR: The results support the hypothesis that the human visual system incorporates a stationary light-source constraint in the perceptual processing of the spatial layout of scenes.
Abstract: Phenomenally strong visual illusions are described in which the motion of an object's cast shadow determines the perceived 3-D trajectory of the object. Simply adjusting the motion of a shadow is sufficient to induce dramatically different apparent trajectories of the object casting the shadow. Psychophysical results, obtained using 3-D computer graphics, show that: (i) the information provided by the motion of an object's shadow overrides other strong sources of information and perceptual biases, such as the assumption of constant object size and of a general viewpoint; (ii) the natural constraint of shadow darkness plays a role in the interpretation of a moving image patch as a shadow, but under some conditions even unnatural light shadows can induce apparent motion in depth of an object; (iii) when shadow motion is caused by a moving light source, the visual system incorrectly interprets the shadow motion as consistent with a moving object rather than a moving light source. The results support the hypothesis that the human visual system incorporates a stationary light-source constraint in the perceptual processing of the spatial layout of scenes.

170 citations

Proceedings Article
15 Oct 2019
TL;DR: This work proposes an objective no-reference video quality assessment method that integrates content-dependency and temporal-memory effects into a deep neural network, and outperforms five state-of-the-art methods by a large margin.
Abstract: Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and the presence of shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two prominent effects of the human visual system, namely content dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method that integrates both effects into a deep neural network. For content dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that the proposed method outperforms five state-of-the-art methods by a large margin: 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method, VBLIINDS, in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, an ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.

170 citations
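The architecture described above compresses into a short PyTorch sketch: content-aware frame features from a pre-trained classifier feed a GRU, per-frame scores are blended with the worst recent score to mimic temporal hysteresis, and the blend is averaged into a video-level score. The class name, layer sizes, and the 50/50 blend here are illustrative guesses; the released code at the repository linked above is the authoritative implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoRefVQA(nn.Module):
    def __init__(self, feat_dim=4096, hidden=32, tau=12):
        super().__init__()
        self.reduce = nn.Linear(feat_dim, 128)  # compress CNN features
        self.gru = nn.GRU(128, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # per-frame quality score
        self.tau = tau                          # memory span, in frames

    def forward(self, feats):
        # feats: (B, T, feat_dim) frame features from a pre-trained
        # image classification network (content-aware by construction).
        x = F.relu(self.reduce(feats))
        h, _ = self.gru(x)                      # long-term dependencies
        q = self.head(h).squeeze(-1)            # (B, T) frame scores
        # Hysteresis-style pooling: viewers punish quality drops quickly
        # and forgive recoveries slowly, so blend each frame's score
        # with the worst score in the recent past.
        pooled = []
        for t in range(q.size(1)):
            past_min = q[:, max(0, t - self.tau):t + 1].min(dim=1).values
            pooled.append(0.5 * q[:, t] + 0.5 * past_min)
        return torch.stack(pooled, dim=1).mean(dim=1)  # video-level score
```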


Network Information
Related Topics (5)

Feature (computer vision): 128.2K papers, 1.7M citations (89% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Image segmentation: 79.6K papers, 1.8M citations (86% related)
Image processing: 229.9K papers, 3.5M citations (85% related)
Convolutional neural network: 74.7K papers, 2M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    49
2022    94
2021    279
2020    311
2019    351
2018    348