
Showing papers on "Human visual system model" published in 2014


Journal ArticleDOI
TL;DR: Extensive experiments performed on four large-scale benchmark databases demonstrate that the proposed IQA index VSI works better in terms of prediction accuracy than all state-of-the-art IQA indices the authors can find, while maintaining moderate computational complexity.
Abstract: Perceptual image quality assessment (IQA) aims to use computational models to measure image quality consistently with subjective evaluations. Visual saliency (VS) has been widely studied by psychologists, neurobiologists, and computer scientists during the last decade to investigate which areas of an image will attract the most attention of the human visual system. Intuitively, VS is closely related to IQA in that suprathreshold distortions can largely affect the VS maps of images. With this consideration, we propose a simple but very effective full-reference IQA method using VS. In our proposed IQA model, the role of VS is twofold. First, VS is used as a feature when computing the local quality map of the distorted image. Second, when pooling the quality score, VS is employed as a weighting function to reflect the importance of a local region. The proposed IQA index is called visual saliency-based index (VSI). Several prominent computational VS models have been investigated in the context of IQA and the best one is chosen for VSI. Extensive experiments performed on four large-scale benchmark databases demonstrate that the proposed IQA index VSI works better in terms of prediction accuracy than all state-of-the-art IQA indices we can find, while maintaining moderate computational complexity. The MATLAB source code of VSI and the evaluation results are publicly available online at http://sse.tongji.edu.cn/linzhang/IQA/VSI/VSI.htm.
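As a hedged illustration of the twofold role of VS described above, here is a minimal NumPy sketch of saliency-as-feature plus saliency-weighted pooling; the similarity form and the constants c1 and c2 are assumptions for illustration, not the published VSI parameters (the authors' MATLAB reference implementation at the URL above is the authoritative version).

```python
import numpy as np

def vs_weighted_quality(vs_ref, vs_dist, grad_ref, grad_dist, c1=1e-4, c2=1e-4):
    """Saliency-weighted quality pooling in the spirit of VSI (sketch only).

    vs_*   : visual saliency maps of the reference / distorted image
    grad_* : gradient magnitude maps of the same images
    c1, c2 : illustrative stabilizing constants (assumed, not published values)
    """
    # Role 1: saliency as a feature -- local similarity of the two VS maps.
    s_vs = (2 * vs_ref * vs_dist + c1) / (vs_ref**2 + vs_dist**2 + c1)
    # Structural similarity from gradient magnitudes.
    s_g = (2 * grad_ref * grad_dist + c2) / (grad_ref**2 + grad_dist**2 + c2)
    local_quality = s_vs * s_g
    # Role 2: saliency as a pooling weight -- salient regions count more.
    weight = np.maximum(vs_ref, vs_dist)
    return float((local_quality * weight).sum() / weight.sum())
```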

823 citations


Journal ArticleDOI
TL;DR: A serial dependence in perception characterized by a spatiotemporally tuned, orientation-selective operator (which the authors call a continuity field) that may promote visual stability over time is revealed.
Abstract: Visual input often arrives in a noisy and discontinuous stream, owing to head and eye movements, occlusion, lighting changes, and many other factors. Yet the physical world is generally stable; objects and physical characteristics rarely change spontaneously. How then does the human visual system capitalize on continuity in the physical environment over time? We found that visual perception in humans is serially dependent, using both prior and present input to inform perception at the present moment. Using an orientation judgment task, we found that, even when visual input changed randomly over time, perceived orientation was strongly and systematically biased toward recently seen stimuli. Furthermore, the strength of this bias was modulated by attention and tuned to the spatial and temporal proximity of successive stimuli. These results reveal a serial dependence in perception characterized by a spatiotemporally tuned, orientation-selective operator (which we call a continuity field) that may promote visual stability over time.
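Serial dependence of this kind is commonly quantified by fitting a derivative-of-Gaussian (DoG) curve to trial-wise response errors as a function of the previous-minus-current orientation. A sketch of that standard analysis follows; the DoG form and starting values are assumptions, not the paper's exact fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def dog(delta, amplitude, width):
    # First derivative of a Gaussian: zero at delta = 0, peaks at
    # intermediate orientation differences, and decays for large ones.
    return amplitude * delta * np.exp(-(width * delta) ** 2)

def fit_serial_dependence(delta_deg, error_deg):
    # delta_deg: previous minus current stimulus orientation, per trial
    # error_deg: signed response error, per trial
    (amp, width), _ = curve_fit(dog, delta_deg, error_deg, p0=[2.0, 0.05])
    return amp, width  # amp > 0 indicates attraction toward past stimuli
```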

522 citations


Journal ArticleDOI
TL;DR: A robust IR small target detection algorithm based on HVS is proposed to pursue good performance in detection rate, false alarm rate, and speed simultaneously.
Abstract: Robust human visual system (HVS) properties can effectively improve infrared (IR) small target detection capabilities such as detection rate, false alarm rate, and speed. However, current algorithms based on HVS usually improve one or two of the aforementioned detection capabilities while sacrificing the others. In this letter, a robust IR small target detection algorithm based on HVS is proposed to pursue good performance in detection rate, false alarm rate, and speed simultaneously. First, an HVS size-adaptation process is used, and the IR image after preprocessing is divided into subblocks to improve detection speed. Then, based on the HVS contrast mechanism, the improved local contrast measure, which can improve the detection rate and reduce the false alarm rate, is proposed to calculate the saliency map, and a threshold operation along with a rapid traversal mechanism based on the HVS attention shift mechanism is used to get the target subblocks quickly. Experimental results show the proposed algorithm has good robustness and efficiency for real IR small target detection applications.
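A rough sketch of a block-based local contrast measure of the kind this letter builds on; the cell size, the squared-center ratio, and the raster scan are illustrative assumptions rather than the published improved local contrast measure.

```python
import numpy as np

def local_contrast_saliency(img, cell=3):
    """Saliency map from local contrast (sketch, not the published ILCM).

    Slides a 3x3 grid of cells over the image; the contrast of the central
    cell is its squared mean divided by the maximum mean of the 8 neighbors,
    which boosts small bright targets and suppresses flat background.
    """
    h, w = img.shape
    sal = np.zeros_like(img, dtype=float)
    for i in range(cell, h - 2 * cell, cell):
        for j in range(cell, w - 2 * cell, cell):
            center = img[i:i+cell, j:j+cell].astype(float).mean()
            neigh = []
            for di in (-cell, 0, cell):
                for dj in (-cell, 0, cell):
                    if di == 0 and dj == 0:
                        continue
                    block = img[i+di:i+di+cell, j+dj:j+dj+cell].astype(float)
                    neigh.append(block.mean())
            m = max(neigh)
            sal[i:i+cell, j:j+cell] = center**2 / m if m > 0 else 0.0
    return sal  # a threshold on this map yields candidate target blocks
```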

324 citations


Journal ArticleDOI
TL;DR: In this article, the authors employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream.
Abstract: The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that are in the most posterior occipital regions at the early decoding times and then move temporally as invariant information develops. These results provide previously unknown latencies for key stages of human-invariant object recognition, as well as new and compelling evidence for a feed-forward hierarchical model of invariant object recognition where invariance increases at each successive visual area along the ventral stream.
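The time-resolved readout described here can be sketched as one classifier trained independently at each time point; the choice of a linear SVM and 5-fold cross-validation is an assumption, since the paper's actual classifier and validation protocol differ.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def time_resolved_decoding(X, y):
    """Decode object identity at each time point from MEG sensor data.

    X : (trials, sensors, timepoints) array
    y : (trials,) identity labels
    Returns cross-validated accuracy per time point; the onset latency is
    roughly the first time accuracy reliably exceeds chance.
    """
    n_t = X.shape[2]
    acc = np.empty(n_t)
    for t in range(n_t):
        acc[t] = cross_val_score(LinearSVC(dual=False), X[:, :, t], y, cv=5).mean()
    return acc
```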

265 citations


Journal ArticleDOI
TL;DR: A new stereoscopic saliency detection framework based on the feature contrast of color, luminance, texture, and depth, which shows superior performance over other existing methods in saliency estimation for 3D images, is proposed.
Abstract: Many saliency detection models for 2D images have been proposed for various multimedia processing applications during the past decades. Currently, the emerging applications of stereoscopic display require new saliency detection models for salient region extraction. Different from saliency detection for 2D images, the depth feature has to be taken into account in saliency detection for stereoscopic images. In this paper, we propose a novel stereoscopic saliency detection framework based on the feature contrast of color, luminance, texture, and depth. Four types of features, namely color, luminance, texture, and depth, are extracted from discrete cosine transform coefficients for feature contrast calculation. A Gaussian model of the spatial distance between image patches is adopted for consideration of local and global contrast calculation. Then, a new fusion method is designed to combine the feature maps to obtain the final saliency map for stereoscopic images. In addition, we adopt the center bias factor and human visual acuity, the important characteristics of the human visual system, to enhance the final saliency map for stereoscopic images. Experimental results on eye tracking databases show the superior performance of the proposed model over other existing methods.
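A compact sketch of the two core steps, DCT-based patch features and Gaussian-weighted feature contrast, under simplifying assumptions: two toy features stand in for the paper's color/luminance/texture/depth set, and the fusion and enhancement stages are omitted.

```python
import numpy as np
from scipy.fftpack import dct

def patch_features(patch):
    """Crude DCT features for one patch: the DC term (luminance proxy)
    and the total AC energy (texture proxy). Illustrative only."""
    c = dct(dct(patch.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    return np.array([c[0, 0], np.abs(c).sum() - abs(c[0, 0])])

def saliency_from_contrast(patches, centers, sigma=0.25):
    """Saliency of each patch = Gaussian-weighted feature contrast against
    all other patches (local and global contrast in a single sum).

    centers : (N, 2) patch centers in normalized [0, 1] image coordinates
    """
    feats = np.array([patch_features(p) for p in patches])
    centers = np.asarray(centers, dtype=float)
    sal = np.zeros(len(patches))
    for i in range(len(patches)):
        d = np.linalg.norm(centers - centers[i], axis=1)
        w = np.exp(-d**2 / (2 * sigma**2))   # nearby patches weigh more
        sal[i] = (w * np.linalg.norm(feats - feats[i], axis=1)).sum()
    return sal / sal.max()
```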

166 citations


Journal ArticleDOI
TL;DR: The results suggest that the appearance of category-selective regions at this coarse scale of representation may be explained by the systematic convergence of responses to low-level features that are characteristic of each category.
Abstract: Neuroimaging studies have revealed strong selectivity for object categories in high-level regions of the human visual system. However, it is unknown whether this selectivity is truly based on object category, or whether it reflects tuning for low-level features that are common to images from a particular category. To address this issue, we measured the neural response to different object categories across the ventral visual pathway. Each object category elicited a distinct neural pattern of response. Next, we compared the patterns of neural response between object categories. We found a strong positive correlation between the neural patterns and the underlying low-level image properties. Importantly, this correlation was still evident when the within-category correlations were removed from the analysis. Next, we asked whether basic image properties could also explain variation in the pattern of response to different exemplars from one object category (faces). A significant correlation was also evident between the similarity of neural patterns of response and the low-level properties of different faces, particularly in regions associated with face processing. These results suggest that the appearance of category-selective regions at this coarse scale of representation may be explained by the systematic convergence of responses to low-level features that are characteristic of each category.
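The core comparison, correlating the similarity structure of neural response patterns with that of low-level image properties, can be sketched as a representational-similarity analysis; the distance and correlation choices here are assumptions, not the paper's exact pipeline.

```python
import numpy as np
from scipy.stats import pearsonr
from scipy.spatial.distance import pdist

def pattern_feature_correlation(neural_patterns, image_features):
    """Correlate neural and image-property similarity structures.

    neural_patterns : (conditions, voxels) response patterns
    image_features  : (conditions, features) low-level properties,
                      e.g. spatial-frequency or orientation energy
    """
    neural_sim = 1 - pdist(neural_patterns, metric='correlation')
    feature_sim = 1 - pdist(image_features, metric='correlation')
    r, p = pearsonr(neural_sim, feature_sim)
    return r, p  # high r: patterns track low-level image structure
```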

144 citations


Journal ArticleDOI
TL;DR: A novel DFT watermarking scheme featuring perceptually optimal visibility versus robustness is proposed, and the robustness of the proposed method is globally slightly better than the state of the art.
Abstract: More than ever, the growing amount of exchanged digital content calls for efficient and practical techniques to protect intellectual property rights. During the past two decades, watermarking techniques have been proposed to embed and detect information within these contents, with four key requirements at hand: robustness, security, capacity, and invisibility. So far, researchers have mostly focused on the first three, but seldom addressed invisibility from a perceptual perspective, relying instead on objective quality metrics. In this paper, a novel DFT watermarking scheme featuring perceptually optimal visibility versus robustness is proposed. The watermark, a noise-like square patch of coefficients, is embedded by substitution within the Fourier domain; the amplitude component adjusts the watermark strength, and the phase component holds the information. A perceptual model of the human visual system (HVS) based on the contrast sensitivity function (CSF) and a local contrast pooling is used to determine the optimal strength at which the mark reaches the visibility threshold. A novel blind detection method is proposed to assess the presence of the watermark. The proposed approach exhibits high robustness to various kinds of attacks, including geometrical distortions. Experimental results show that the robustness of the proposed method is globally slightly better than the state of the art. A comparative study conducted at the visibility threshold (from subjective data) showed that the obtained performances are more stable across various kinds of content.
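A hedged sketch of the substitutive embedding idea: message bits go into the phase of selected mid-frequency DFT coefficients while a strength term scales their amplitude. The ring placement and parameters are illustrative; in the published scheme the strength comes from the CSF-based visibility model.

```python
import numpy as np

def embed_dft_watermark(img, bits, strength, radius=0.35):
    """Embed bits into the phase of mid-frequency DFT coefficients (sketch).

    img      : 2-D grayscale image
    bits     : iterable of 0/1 message bits
    strength : coefficient magnitude (the paper derives this from an HVS model)
    """
    F = np.fft.fftshift(np.fft.fft2(img.astype(float)))
    h, w = img.shape
    cy, cx = h // 2, w // 2
    r = int(radius * min(h, w) / 2)
    for k, bit in enumerate(bits):
        # spread bit positions over the upper half of a mid-frequency ring
        theta = np.pi * k / len(bits)
        y, x = cy + int(r * np.sin(theta)), cx + int(r * np.cos(theta))
        phase = 0.0 if bit else np.pi        # phase carries the information
        F[y, x] = strength * np.exp(1j * phase)
        F[2*cy - y, 2*cx - x] = np.conj(F[y, x])  # keep the image real-valued
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))
```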

113 citations


Journal ArticleDOI
TL;DR: A no-reference quality metric for stereoscopic images is proposed based on a top-down method modeling the binocular quality perception of the human visual system in the context of blurriness and blockiness.
Abstract: Quality perception of 3-D images is one of the most important parameters for accelerating advances in 3-D imaging fields. Despite active research in recent years for understanding the quality perception of 3-D images, binocular quality perception of asymmetric distortions in stereoscopic images is not thoroughly comprehended. In this paper, we explore the relationship between the perceptual quality of stereoscopic images and visual information, and introduce a model for binocular quality perception. Based on this binocular quality perception model, a no-reference quality metric for stereoscopic images is proposed. The proposed metric is a top-down method modeling the binocular quality perception of the human visual system in the context of blurriness and blockiness. Perceptual blurriness and blockiness scores of left and right images were computed using local blurriness, blockiness, and visual saliency information and then combined into an overall quality index using the binocular quality perception model. Experiments for image and video databases show that the proposed metric provides consistent correlations with subjective quality scores. The results also show that the proposed metric provides higher performance than existing full-reference methods even though the proposed method is a no-reference approach.
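A minimal sketch of a binocular combination rule of the kind such a metric rests on, assuming a fixed weight toward the higher-quality view; the published model derives its weighting from subjective data and treats blurriness and blockiness separately, so this is only one plausible instantiation.

```python
def binocular_score(q_left, q_right, w=0.7):
    """Combine monocular quality scores of a stereo pair (sketch).

    For asymmetric blur, perception tends to be dominated by the sharper
    view; w tilts the mix toward the better score. The value 0.7 and the
    linear form are assumptions, not the paper's fitted model.
    """
    lo, hi = min(q_left, q_right), max(q_left, q_right)
    return w * hi + (1 - w) * lo
```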

110 citations


Proceedings ArticleDOI
01 Jan 2014
TL;DR: The TESTIMAGES archive is presented, a huge and free collection of sample images designed for analysis and quality assessment of different kinds of displays and image processing techniques, along with plans to extend the archive with datasets for other kinds of specific analyses.
Abstract: We present the TESTIMAGES archive, a huge and free collection of sample images designed for analysis and quality assessment of different kinds of displays (i.e., monitors, televisions, and digital cinema projectors) and image processing techniques. The archive includes more than 2 million images originally acquired and divided into four categories: SAMPLING and SAMPLING_PATTERNS (aimed at testing resampling algorithms), COLOR (aimed at testing color rendering on different displays), and PATTERNS (aimed at testing the rendering of standard geometrical patterns). The archive is currently online as a SourceForge project and, even if not yet publicized in the scientific community, it has already been used in different contexts and cited in scientific publications. We plan to extend the archive with datasets for other kinds of specific analyses.

110 citations


Journal ArticleDOI
TL;DR: A physiological inverse tone mapping algorithm inspired by the properties of the Human Visual System (HVS) first imitates the retina response and deduces it to be locally adaptive; it then estimates the local adaptation luminance at each point in the image; finally, the LDR image and local luminance are applied to the inverse local retina response to reconstruct the dynamic range of the original scene.
Abstract: The mismatch between Low Dynamic Range (LDR) content and High Dynamic Range (HDR) displays has spurred research on inverse tone mapping algorithms. In this paper, we present a physiological inverse tone mapping algorithm inspired by the properties of the Human Visual System (HVS). It first imitates the retina response and deduces it to be locally adaptive; it then estimates the local adaptation luminance at each point in the image; finally, the LDR image and local adaptation luminance are applied to the inverse local retina response to reconstruct the dynamic range of the original scene. The good performance and high visual quality were validated on 40 test images. Comparison with several existing inverse tone mapping methods demonstrates the conciseness and efficiency of the proposed algorithm.
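A worked sketch of the inversion step, assuming a Naka-Rushton-style retina response R = L^n / (L^n + sigma^n); solving for L gives L = sigma * (R / (1 - R))^(1/n). The exponent and the mapping from adaptation luminance to sigma are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def inverse_retina_response(ldr, n=0.73, adapt_sigma=5.0):
    """Physiologically inspired inverse tone mapping (sketch).

    ldr : LDR luminance normalized to [0, 1], treated as the retina response
    """
    r = np.clip(ldr.astype(float), 1e-4, 1 - 1e-4)   # avoid division blow-up
    local_adapt = gaussian_filter(r, adapt_sigma)    # local adaptation luminance
    sigma = local_adapt ** 0.69 + 1e-6               # semi-saturation (assumed form)
    return sigma * (r / (1.0 - r)) ** (1.0 / n)      # inverted Naka-Rushton
```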

105 citations


Journal ArticleDOI
TL;DR: A new 3D saliency prediction model that accounts for diverse low-level luminance, chrominance, motion, and depth attributes of 3D videos as well as high-level classifications of scenes by type is described.
Abstract: We describe a new 3D saliency prediction model that accounts for diverse low-level luminance, chrominance, motion, and depth attributes of 3D videos as well as high-level classifications of scenes by type. The model also accounts for perceptual factors, such as the nonuniform resolution of the human eye, stereoscopic limits imposed by Panum's fusional area, and the predicted degree of (dis)comfort felt when viewing the 3D video. The high-level analysis involves classification of each 3D video scene by type with regard to estimated camera motion and the motions of objects in the videos. Decisions regarding the relative saliency of objects or regions are supported by data obtained through a series of eye-tracking experiments. The algorithm developed from the model elements operates by finding and segmenting salient 3D space-time regions in a video, then calculating the saliency strength of each segment using measured attributes of motion, disparity, texture, and the predicted degree of visual discomfort experienced. The saliency energy of both segmented objects and frames are weighted using models of human foveation and Panum's fusional area, yielding a single predictor of 3D saliency.

Journal ArticleDOI
TL;DR: The integration of color and texture information provides a robust feature set for color image retrieval and yields higher retrieval accuracy than some conventional methods even though its feature vector dimension is not higher than those of the latter for different test DBs.
Abstract: Content-based image retrieval (CBIR) has been an active research topic in the last decade. Feature extraction and representation is one of the most important issues in CBIR. In this paper, we propose a content-based image retrieval method based on an efficient integration of color and texture features. As its color features, pseudo-Zernike chromaticity distribution moments in the opponent chromaticity space are used. As its texture features, rotation-invariant and scale-invariant image descriptors in the steerable pyramid domain are adopted, which offer an efficient and flexible approximation of early processing in the human visual system. The integration of color and texture information provides a robust feature set for color image retrieval. Experimental results show that the proposed method yields higher retrieval accuracy than some conventional methods even though its feature vector dimension is not higher than those of the latter for different test DBs.
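The integration step can be sketched as a weighted combination of per-feature distances; the L2 distances and the even weighting are assumptions, since the paper defines its own similarity measure over the pseudo-Zernike and steerable-pyramid features.

```python
import numpy as np

def retrieve(query_color, query_texture, db_color, db_texture, wc=0.5):
    """Rank database images by combined color + texture distance (sketch).

    query_* : 1-D feature vectors of the query image
    db_*    : (N, dim) feature matrices of the database images
    wc      : color weight (illustrative; the paper's weighting may differ)
    """
    d_color = np.linalg.norm(db_color - query_color, axis=1)
    d_texture = np.linalg.norm(db_texture - query_texture, axis=1)
    score = wc * d_color + (1 - wc) * d_texture
    return np.argsort(score)   # database indices, best match first
```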

Journal ArticleDOI
TL;DR: This paper introduces the concept of QR images, an automatic method to embed QR codes into color images with bounded probability of detection error, compatible with standard decoding applications and can be applied to any color image with full area coverage.
Abstract: This paper introduces the concept of QR images, an automatic method to embed QR codes into color images with bounded probability of detection error. These embeddings are compatible with standard decoding applications and can be applied to any color image with full area coverage. The QR information bits are encoded into the luminance values of the image, taking advantage of the immunity of QR readers against local luminance disturbances. To mitigate the visual distortion of the QR image, the algorithm utilizes halftoning masks for the selection of modified pixels and nonlinear programming techniques to locally optimize luminance levels. A tractable model for the probability of error is developed, and models of the human visual system are considered in the quality metric used to optimize the luminance levels of the QR image. To minimize the processing time, the proposed optimization techniques consider the mechanics of a common binarization method and are designed to be amenable to parallel implementations. Experimental results show the graceful degradation of the decoding rate and the perceptual quality as a function of the embedding parameters. A visual comparison between the proposed and existing methods is presented.
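A hedged sketch of the embedding idea: pixels selected by a halftoning mask have their luminance pushed across the binarization threshold in the direction of the corresponding QR module. The global threshold and the fixed offset beta are placeholder assumptions; the published method optimizes luminance levels per pixel under an HVS-based quality metric.

```python
import numpy as np

def embed_qr_luminance(img_lum, qr_dark, mask, beta=40):
    """Push masked pixels toward the binarization target of their QR module.

    img_lum : (H, W) luminance in [0, 255]
    qr_dark : (H, W) boolean, True where the QR module should read as dark
    mask    : (H, W) boolean halftone mask selecting pixels to modify
    beta    : decodability-vs-distortion margin (illustrative constant)
    """
    out = img_lum.astype(float).copy()
    thresh = out.mean()                       # stand-in for a local threshold
    target = np.where(qr_dark, thresh - beta, thresh + beta)
    out[mask] = target[mask]                  # only masked pixels change
    return np.clip(out, 0, 255)
```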

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper proposes the lattice conditional ordinal random field model that incorporates local evidence as well as neighboring order agreement and implements the new model in the continuous domain and applies it to scoring actionness in both image and video datasets.
Abstract: Action analysis in image and video has been attracting more and more attention in computer vision. Recognizing specific actions in video clips has been the main focus. We move in a new, more general direction in this paper and ask the critical fundamental question: what is action, how is action different from motion, and in a given image or video where is the action? We study the philosophical and visual characteristics of action, which lead us to define actionness: intentional bodily movement of biological agents (people, animals). To solve the general problem, we propose the lattice conditional ordinal random field model that incorporates local evidence as well as neighboring order agreement. We implement the new model in the continuous domain and apply it to scoring actionness in both image and video datasets. Our experiments demonstrate not only that our new model can outperform the popular ranking SVM but also that indeed action is distinct from motion.

Journal ArticleDOI
TL;DR: This work proposes a novel fast algorithm for visually salient object detection, robust to real-world illumination conditions, and uses it to extract salient objects which can be efficiently used for training the machine learning-based object detection and recognition unit of the proposed system.
Abstract: Existing object recognition techniques often rely on human-labeled data, leading to severe limitations in designing a fully autonomous machine vision system. In this work, we present an intelligent machine vision system able to autonomously learn individual objects present in a real environment. This system relies on salient object detection. In its design, we were inspired by the early processing stages of the human visual system. In this context, we suggest a novel fast algorithm for visually salient object detection that is robust to real-world illumination conditions. We then use it to extract salient objects, which can be efficiently used for training the machine-learning-based object detection and recognition unit of the proposed system. We provide results of our salient object detection algorithm on the MSRA Salient Object Database benchmark, comparing its quality with other state-of-the-art approaches. The proposed system has been implemented on a humanoid robot, increasing its autonomy in learning and interaction with humans. We report and discuss the obtained results, validating the proposed concepts.

Journal ArticleDOI
TL;DR: The proposed scheme shows improved adaptive performance and resistance to several types of attacks in comparison with previous schemes; adaptive performance refers to the adaptive parameter of the luminance masking, used to improve the robustness of an image against attacks.
Abstract: This paper proposes an adaptive watermarking scheme for e-government document images. The adaptive scheme combines the discrete cosine transform (DCT) and the singular value decomposition (SVD) using luminance masking. As a core masking model of the human visual system (HVS), luminance masking is implemented to account for noise sensitivity. A genetic algorithm (GA) is subsequently employed to optimize the scaling factor of the masking. The proposed scheme begins by calculating the mask of the host image using luminance masking. It then transforms the mask of each area into the frequency domain. The watermark image, following this, is embedded by modifying the singular values of the DCT-transformed host image with the singular values of the mask coefficients of the host image and the control parameter of the DCT-transformed watermark image using the GA. The use of both the singular values and the control parameter is intended not only to improve the sensitivity of the watermark performance but also to avoid the false-positive problem. The watermark image, afterwards, is extracted from the distorted images. The experimental results show that the proposed scheme is resistant to several types of attacks in comparison with previous schemes; its adaptive performance stems from the adaptive parameter of the luminance masking, which improves the robustness of the watermarked image against attacks.
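A stripped-down sketch of DCT-SVD embedding with a single global scaling factor; the luminance-masking-derived local factors and the GA optimization described above are omitted, and equal-size square grayscale images are assumed for simplicity.

```python
import numpy as np
from scipy.fftpack import dct, idct

def embed_svd_watermark(host, wm, alpha=0.05):
    """Embed a watermark into the singular values of the host's DCT (sketch).

    host, wm : square grayscale arrays of equal size (assumed here)
    alpha    : global embedding strength; the published scheme instead sets
               it locally from luminance masking and tunes it with a GA
    """
    H = dct(dct(host.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    Uh, Sh, Vht = np.linalg.svd(H)
    Sw = np.linalg.svd(wm.astype(float), compute_uv=False)
    H_marked = (Uh * (Sh + alpha * Sw)) @ Vht      # modify singular values
    return idct(idct(H_marked, axis=1, norm='ortho'), axis=0, norm='ortho')
```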

Journal ArticleDOI
TL;DR: The main contribution is to theoretically prove that the basis matrices of (k,n)-OVCS can be used in (k,n)-XVCS, which uses the XOR operation for decoding, and to enhance the contrast.
Abstract: A (k,n) visual cryptographic scheme (VCS) encodes a secret image into n shadow images (printed on transparencies) distributed among n participants. When any k participants superimpose their transparencies on an overhead projector (OR operation), the secret image can be visually revealed by the human visual system without computation. However, the monotone property of the OR operation degrades the visual quality of the reconstructed image for OR-based VCS (OVCS). Accordingly, XOR-based VCS (XVCS), which uses the XOR operation for decoding, was proposed to enhance the contrast. In this paper, we investigate the relation between OVCS and XVCS. Our main contribution is to theoretically prove that the basis matrices of (k,n)-OVCS can be used in (k,n)-XVCS. Meanwhile, the contrast is enhanced 2^(k-1) times.
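The decoding difference is easy to state in code: physically stacking transparencies computes OR, while XVCS assumes a device that can XOR the shares, removing the monotone darkening that costs OVCS its contrast.

```python
import numpy as np

def or_decode(shares):
    """OVCS decoding: superimposing transparencies computes OR, so black
    subpixels (1s) only accumulate, which degrades contrast."""
    out = np.zeros_like(shares[0])
    for s in shares:
        out |= s
    return out

def xor_decode(shares):
    """XVCS decoding: XOR recombination (done by a device, not by stacking)
    is non-monotone, which is what buys the contrast improvement."""
    out = np.zeros_like(shares[0])
    for s in shares:
        out ^= s
    return out

# shares: list of k binary (0/1 integer or boolean) arrays from participants
```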

Journal ArticleDOI
Alexander Tanchenko
TL;DR: A novel objective full-reference measure of image quality (VPSNR), a modified PSNR measure, is proposed; for images compressed by block-based compression algorithms (like JPEG), the proposed measure in the pixel domain matches well with MOS.
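Since only the TL;DR is available here, this sketch shows the general shape of a visually weighted PSNR rather than the published VPSNR definition; the per-pixel weight map is a placeholder assumption.

```python
import numpy as np

def weighted_psnr(ref, dist, weight, peak=255.0):
    """PSNR with per-pixel visual weighting of squared errors (sketch).

    weight : (H, W) nonnegative map, e.g. emphasizing smooth regions where
             JPEG blocking is most visible (an assumption, not VPSNR's map)
    """
    err = (ref.astype(float) - dist.astype(float)) ** 2
    wmse = (weight * err).sum() / weight.sum()
    return 10 * np.log10(peak**2 / wmse)
```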

Journal ArticleDOI
TL;DR: A novel method which combines the three mechanisms of HVS, the Proportional-Integral-Derivative (PID) algorithm, is proposed to detect and track dim and small targets in infrared images and videos.

Journal ArticleDOI
TL;DR: It is demonstrated that both exogenously and endogenously cued attention facilitate the processing of visual target information, but not of visual hand information, suggesting the existence of a dedicated visuomotor binding mechanism that links the hand representation in visual and motor systems.

Journal ArticleDOI
TL;DR: An advanced foveal imaging model is proposed to generate the perceived representation of video by integrating visual attention into the foveation mechanism, and a novel approach to predict video fixations is proposed by mimicking the essential functionality of eye movement.
Abstract: Contrast sensitivity of the human visual system to visual stimuli can be significantly affected by several mechanisms, e.g., vision foveation and attention. Existing studies on foveation-based video quality assessment only take into account the static foveation mechanism. This paper first proposes an advanced foveal imaging model to generate the perceived representation of video by integrating visual attention into the foveation mechanism. For accurately simulating the dynamic foveation mechanism, a novel approach to predict video fixations is proposed by mimicking the essential functionality of eye movement. Consequently, an advanced contrast sensitivity function, derived from the attention-driven foveation mechanism, is modeled and then integrated into a wavelet-based distortion visibility measure to build a full-reference attention-driven foveated video quality (AFViQ) metric. AFViQ adequately exploits perceptual visual mechanisms in video quality assessment. Extensive evaluation results with respect to several publicly available eye-tracking and video quality databases demonstrate promising performance of the proposed video attention model, fixation prediction approach, and quality metric.
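Foveated metrics of this kind typically build on an eccentricity-dependent contrast sensitivity cutoff; with dynamic foveation, the fixation point feeding the eccentricity computation moves with the predicted gaze. A sketch using the widely used Geisler-Perry foveation constants, which may differ from this paper's exact CSF:

```python
import numpy as np

def foveal_cutoff(ecc_deg, alpha=0.106, e2=2.3, ct0=1.0/64):
    """Cutoff spatial frequency (cycles/deg) vs. retinal eccentricity.

    Derived from the Geisler-Perry model CT(f, e) = ct0 * exp(alpha * f *
    (e + e2) / e2); setting CT = 1 and solving for f gives the frequency
    beyond which distortions at eccentricity `ecc_deg` are invisible.
    Constants are the commonly cited ones, not necessarily the paper's.
    """
    return e2 * np.log(1.0 / ct0) / (alpha * (np.asarray(ecc_deg) + e2))
```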

Patent
05 Aug 2014
TL;DR: In this article, a computing system for realizing visual content of an image collection executes feature detection algorithms and semantic reasoning techniques on the images in the collection to elicit a number of different types of visual features of the images.
Abstract: A computing system for realizing visual content of an image collection executes feature detection algorithms and semantic reasoning techniques on the images in the collection to elicit a number of different types of visual features of the images. The computing system indexes the visual features and provides technologies for multi-dimensional content-based clustering, searching, and iterative exploration of the image collection using the visual features and/or the visual feature indices.

Proceedings ArticleDOI
01 Nov 2014
TL;DR: The proposed technique provides a fused image with better edges and information content from the human visual system (HVS) point of view and is found to be superior to the Daubechies complex wavelet transform (DCxWT).
Abstract: Fusion of various images aids the rejuvenation of complementary attributes of the images. Similarly, medical image fusion constructs a composite image comprehending significant traits from multimodal source images. The current work exhibits medical image fusion using a Laplacian Pyramid (LP) employing the DCT. The LP decomposes the source medical images into different low-pass-filtered images, resembling a pyramidal structure. As the pyramidal level of decomposition increases, the quality of the fused image also increases. The proposed technique provides a fused image with better edges and information content from the human visual system (HVS) point of view. In both qualitative and quantitative analysis, the proposed technique is found to be superior to the Daubechies complex wavelet transform (DCxWT).
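A sketch of pyramid construction and fusion under simplifying assumptions: per-band max-absolute selection is used in place of the paper's DCT-based processing, and bilinear resampling stands in for the usual EXPAND filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(img, levels=4):
    """Build a Laplacian pyramid by repeated blur / downsample / difference."""
    pyr, cur = [], img.astype(float)
    for _ in range(levels - 1):
        down = gaussian_filter(cur, 1.0)[::2, ::2]
        up = zoom(down, 2, order=1)[:cur.shape[0], :cur.shape[1]]
        pyr.append(cur - up)   # band-pass detail at this level
        cur = down
    pyr.append(cur)            # low-pass residual at the coarsest level
    return pyr

def fuse(img_a, img_b, levels=4):
    """Fuse two registered source images band by band (sketch)."""
    pa, pb = laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pa, pb)]
    out = fused[-1]            # reconstruct from coarsest to finest
    for band in reversed(fused[:-1]):
        out = zoom(out, 2, order=1)[:band.shape[0], :band.shape[1]] + band
    return out
```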

Journal ArticleDOI
TL;DR: A psychophysical study designed to obtain local contrast detection thresholds for a database of natural images to provide researchers with a large ground-truth dataset in order to further investigate the properties of the human visual system using natural masks.
Abstract: Studies of visual masking have provided a wide range of important insights into the processes involved in visual coding. However, very few of these studies have employed natural scenes as masks. Little is known about how the particular features found in natural scenes affect visual detection thresholds and how the results obtained using unnatural masks relate to the results obtained using natural masks. To address this issue, this paper describes a psychophysical study designed to obtain local contrast detection thresholds for a database of natural images. Via a three-alternative forced-choice experiment, we measured thresholds for detecting 3.7 cycles/° vertically oriented log-Gabor noise targets placed within an 85 × 85-pixel patch (1.9° patch) drawn from 30 natural images from the CSIQ image database (Larson & Chandler, Journal of Electronic Imaging, 2010). Thus, for each image, we obtained a masking map in which each entry in the map denotes the root-mean-squared contrast threshold for detecting the log-Gabor noise target at the corresponding spatial location in the image. From qualitative observations, we found that detection thresholds were affected by several patch properties such as visual complexity, fineness of textures, sharpness, and overall luminance. Our quantitative analysis shows that except for the sharpness measure (correlation coefficient of 0.7), the other tested low-level mask features showed a weak correlation (correlation coefficients less than or equal to 0.52) with the detection thresholds. Furthermore, we evaluated the performance of a computational contrast gain control model that performed fairly well, with an average correlation coefficient of 0.79, in predicting the local contrast detection thresholds. We also describe specific choices of parameters for the gain control model. The objective of this database is to provide researchers with a large ground-truth dataset in order to further investigate the properties of the human visual system using natural masks.
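Two small helpers make the quantities concrete: the RMS contrast in which the thresholds are expressed, and a toy gain-control threshold predictor whose exponents and constants are assumptions, not the fitted model parameters reported in the paper.

```python
import numpy as np

def rms_contrast(patch):
    """RMS contrast of a luminance patch (the unit of the reported thresholds)."""
    lum = patch.astype(float)
    return lum.std() / lum.mean()

def gain_control_threshold(mask_energy, p=2.4, b=0.03, k=1.0, criterion=1.0):
    """Toy contrast-gain-control predictor (illustrative constants).

    Assume the target's response C**p / (b + k * E_mask) must reach
    `criterion` at threshold; solving for C predicts how the detection
    threshold rises with the mask's local energy E_mask.
    """
    e = np.asarray(mask_energy, dtype=float)
    return (criterion * (b + k * e)) ** (1.0 / p)
```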

Proceedings ArticleDOI
01 Apr 2014
TL;DR: This paper proposes and evaluates a video search framework based on using visual information to enrich the classic text-based search for video retrieval, and attempts to overcome the so-called semantic gap problem by automatically mapping query text to semantic concepts.
Abstract: Currently, popular search engines retrieve documents on the basis of text information. However, integrating visual information with text-based search for video and image retrieval is still a hot research topic. In this paper, we propose and evaluate a video search framework based on using visual information to enrich the classic text-based search for video retrieval. The framework extends conventional text-based search by fusing together text and visual scores, obtained from video subtitles (or automatic speech recognition) and visual concept detectors, respectively. We attempt to overcome the so-called semantic gap problem by automatically mapping query text to semantic concepts. With the proposed framework, we endeavor to show experimentally, on a set of real-world scenarios, that visual cues can effectively contribute to the quality improvement of video retrieval. Experimental results show that mapping text-based queries to visual concepts improves the performance of the search system. Moreover, when appropriately selecting the relevant visual concepts for a query, a very significant improvement of the system's performance is achieved.
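The fusion step can be sketched as a late linear combination of the text score with the detector scores of the concepts the query maps to; the linear form, the weight w, and the per-concept weights are assumptions about one reasonable instantiation, not the paper's evaluated strategy.

```python
def fused_score(text_score, concept_scores, concept_weights, w=0.7):
    """Rank a video by fusing text retrieval and visual concept evidence.

    text_score      : score from subtitle/ASR text retrieval
    concept_scores  : detector outputs for the concepts mapped from the query
    concept_weights : relevance weight of each mapped concept
    """
    visual = sum(cw * cs for cw, cs in zip(concept_weights, concept_scores))
    return w * text_score + (1 - w) * visual
```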

Journal ArticleDOI
TL;DR: A new framework for quantifying 3D visual information, 3D visual activity (3DVA), is proposed and applied to the problem of predicting visual fatigue experienced when viewing 3D displays; the 3DVA utilizes fits of the empirical distributions of wavelet coefficients to a parametric generalized Gaussian probability distribution model and a set of 3D perceptual weights.
Abstract: One of the most challenging ongoing issues in the field of 3D visual research is how to perceptually quantify object and surface visualizations that are displayed within a virtual 3D space between a human eye and a 3D display. To seek an effective method of quantification, it is necessary to measure various elements related to the perception of 3D objects at different depths. We propose a new framework for quantifying 3D visual information that we call 3D visual activity (3DVA), which utilizes natural scene statistics measured over 3D visual coordinates. We account for important aspects of 3D perception by carrying out a 3D coordinate transform reflecting the nonuniform sampling resolution of the eye and the process of stereoscopic fusion. The 3DVA utilizes fits of the empirical distributions of wavelet coefficients to a parametric generalized Gaussian probability distribution model, together with a set of 3D perceptual weights. We conducted a series of simulations that demonstrate the effectiveness of the 3DVA for quantifying the statistical dynamics of visual 3D space with respect to disparity, motion, texture, and color. A successful example application is also provided, whereby 3DVA is applied to the problem of predicting visual fatigue experienced when viewing 3D displays.
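The generalized Gaussian fit at the heart of 3DVA can be sketched with the standard moment-matching estimator (Sharifi-Leon-Garcia style); whether the paper uses this particular estimator is an assumption.

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def fit_ggd(coeffs):
    """Fit a zero-mean generalized Gaussian to wavelet coefficients.

    Matches the ratio rho = var / mean(|x|)**2 against its closed form
    Gamma(1/b) * Gamma(3/b) / Gamma(2/b)**2 to recover the shape b, then
    recovers the scale alpha from the mean absolute value. Assumes the
    data is leptokurtic-to-Gaussian so a root exists in the bracket.
    """
    x = np.asarray(coeffs, dtype=float)
    rho = x.var() / np.abs(x).mean() ** 2
    g = lambda b: gamma(1/b) * gamma(3/b) / gamma(2/b) ** 2 - rho
    beta = brentq(g, 0.05, 10.0)                      # shape parameter
    alpha = np.abs(x).mean() * gamma(1/beta) / gamma(2/beta)  # scale
    return beta, alpha
```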

Patent
13 Jun 2014
TL;DR: In this paper, the methods, systems, and techniques to enhance computer vision application processing are disclosed, which may reduce power consumption for computer vision applications and improve processing efficiency for Computer vision applications.
Abstract: Methods, systems, and techniques to enhance computer vision application processing are disclosed. In particular, the methods, systems, and techniques may reduce power consumption for computer vision applications and improve processing efficiency for computer vision applications.

Journal ArticleDOI
16 Sep 2014
TL;DR: An analytical framework is presented that combines main perception and human visual requirements with analytical tools and principles used in related disciplines such as optics, computer graphics, computational imaging, and signal processing and defines a notion of perceivable light fields to account for the human visual system physiological requirements.
Abstract: Recently, there has been a substantial increase in efforts to develop 3-D visualization technologies that can provide the viewers with a realistic 3-D visual experience. Various terms such as “reality communication” have been used to categorize these efforts. In order to provide the viewers with a complete and realistic visual sensation, the display or visualization system and the displayed content need to match the physiological 3-D information sensing capabilities of the human visual system which can be quite complex. These may include spatial and temporal resolutions, depth perception, dynamic range, spectral contents, nonlinear effects, and vergence accommodation effects. In this paper, first we present an overview of some of the 3-D display research efforts which have been extensively pursued in Asia, Europe, and North America among other areas. Based on the limitations and comfort-based requirements of the human visual system when viewing a nonnatural visual input from 3-D displays, we present an analytical framework that combines main perception and human visual requirements with analytical tools and principles used in related disciplines such as optics, computer graphics, computational imaging, and signal processing. Building on the widely used notion of light fields, we define a notion of perceivable light fields to account for the human visual system physiological requirements, and propagate it back to the display device to determine the display device specifications. This helps us clarify the fundamental and practical requirements of the 3-D display devices for reality viewing communication. In view of the proposed analytical framework, we overview various methods that can be applied to overcome the extensive information needed to be displayed in order to meet the requirements imposed by the human visual system.

Journal ArticleDOI
TL;DR: This work characterized the orientation tuning properties of the perceptual process supporting probe discrimination; tuning was substantially reshaped by semantic manipulation, demonstrating that low-level feature detectors operate under partial control from higher level modules.
Abstract: In the early stages of image analysis, visual cortex represents scenes as spatially organized maps of locally defined features (e.g., edge orientation). As image reconstruction unfolds and features are assembled into larger constructs, cortex attempts to recover semantic content for object recognition. It is conceivable that higher level representations may feed back onto early processes and retune their properties to align with the semantic structure projected by the scene; however, there is no clear evidence to either support or discard the applicability of this notion to the human visual system. Obtaining such evidence is challenging because low and higher level processes must be probed simultaneously within the same experimental paradigm. We developed a methodology that targets both levels of analysis by embedding low-level probes within natural scenes. Human observers were required to discriminate probe orientation while semantic interpretation of the scene was selectively disrupted via stimulus inversion or reversed playback. We characterized the orientation tuning properties of the perceptual process supporting probe discrimination; tuning was substantially reshaped by semantic manipulation, demonstrating that low-level feature detectors operate under partial control from higher level modules. The manner in which such control was exerted may be interpreted as a top-down predictive strategy whereby global semantic content guides and refines local image reconstruction. We exploit the novel information gained from data to develop mechanistic accounts of unexplained phenomena such as the classic face inversion effect.