
Showing papers on "Human visual system model published in 2015"


Journal ArticleDOI
TL;DR: Overall, the probabilistic atlases quantify the variability of topographic representations in human cortex and provide a useful reference for comparing data across studies that can be transformed into these standard spaces.
Abstract: The human visual system contains an array of topographically organized regions. Identifying these regions in individual subjects is a powerful approach to group-level statistical analysis, but this is not always feasible. We addressed this limitation by generating probabilistic maps of visual topographic areas in 2 standardized spaces suitable for use with adult human brains. Using standard fMRI paradigms, we identified 25 topographic maps in a large population of individual subjects (N = 53) and transformed them into either a surface- or volume-based standardized space. Here, we provide a quantitative characterization of the inter-subject variability within and across visual regions, including the likelihood that a given point would be classified as a part of any region (full probability map) and the most probable region for any given point (maximum probability map). By evaluating the topographic organization across the whole of visual cortex, we provide new information about the organization of individual visual field maps and large-scale biases in visual field coverage. Finally, we validate each atlas for use with independent subjects. Overall, the probabilistic atlases quantify the variability of topographic representations in human cortex and provide a useful reference for comparing data across studies that can be transformed into these standard spaces.
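As a rough illustration of how such atlases can be assembled, the sketch below computes a full probability map and a maximum probability map from per-subject binary region masks; the array layout and the handling of unlabeled voxels are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def probability_maps(masks):
    """Compute full and maximum probability maps from per-subject ROI masks.

    masks: array of shape (n_subjects, n_regions, n_voxels), binary;
           masks[s, r, v] == 1 if voxel v was labeled region r in subject s.
    Returns:
      full_prob: (n_regions, n_voxels) likelihood of each voxel belonging
                 to each region (fraction of subjects so labeled)
      max_prob:  (n_voxels,) index of the most probable region per voxel,
                 or -1 where no subject labeled the voxel at all
    """
    full_prob = masks.mean(axis=0)
    best = full_prob.argmax(axis=0)
    max_prob = np.where(full_prob.max(axis=0) > 0, best, -1)
    return full_prob, max_prob

# Toy example: 3 subjects, 2 regions, 5 voxels
masks = (np.random.rand(3, 2, 5) > 0.5).astype(float)
fp, mp = probability_maps(masks)
```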

569 citations


Journal ArticleDOI
TL;DR: A new no-reference (NR) image quality assessment (IQA) metric is proposed using the recently revealed free-energy-based brain theory and classical human visual system (HVS)-inspired features to predict an image that the HVS perceives from a distorted image based on the free energy theory.
Abstract: In this paper, we propose a new no-reference (NR) image quality assessment (IQA) metric using the recently revealed free-energy-based brain theory and classical human visual system (HVS)-inspired features. The features used can be divided into three groups. The first involves the features inspired by the free energy principle and the structural degradation model. Furthermore, the free energy theory also reveals that the HVS always tries to infer the meaningful part from the visual stimuli. In terms of this finding, we first predict an image that the HVS perceives from a distorted image based on the free energy theory; the second group of features is then composed of HVS-inspired features (such as structural information and gradient magnitude) computed using the distorted and predicted images. The third group of features quantifies the possible losses of “naturalness” in the distorted image by fitting the generalized Gaussian distribution to mean-subtracted contrast-normalized coefficients. After feature extraction, our algorithm utilizes a support vector machine based regression module to derive the overall quality score. Experiments on the LIVE, TID2008, CSIQ, IVC, and Toyama databases confirm the effectiveness of the proposed NR IQA metric compared to the state-of-the-art.
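The "naturalness" features in the third group follow a standard natural-scene-statistics recipe. A minimal sketch of computing mean-subtracted contrast-normalized (MSCN) coefficients and fitting a generalized Gaussian distribution by moment matching might look like the following; the window size and the grid-search range are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma

def mscn(image, sigma=7/6):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    mu = gaussian_filter(image, sigma)                 # local mean
    var = gaussian_filter(image**2, sigma) - mu**2     # local variance
    return (image - mu) / (np.sqrt(np.abs(var)) + 1.0)

def fit_ggd(x):
    """Moment-matching estimate of GGD shape (alpha) and scale (sigma)."""
    alphas = np.arange(0.2, 10.0, 0.001)
    # Theoretical ratio E[|x|]^2 / E[x^2] for a zero-mean GGD with shape alpha
    ratios = gamma(2 / alphas)**2 / (gamma(1 / alphas) * gamma(3 / alphas))
    r = np.mean(np.abs(x))**2 / np.mean(x**2)
    alpha = alphas[np.argmin((ratios - r)**2)]
    sigma = np.sqrt(np.mean(x**2))
    return alpha, sigma

img = np.random.rand(64, 64)   # stand-in for a distorted grayscale image
alpha, sigma = fit_ggd(mscn(img).ravel())
```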

548 citations


Journal ArticleDOI
TL;DR: By the proposed approach, a deep architecture could be designed to learn the high-level features for scene recognition in an unsupervised fashion, and experiments on standard data sets show that the method outperforms the state-of-the-art used for scene recognition.
Abstract: Scene recognition is an important problem in the field of computer vision, because it helps to narrow the gap between computers and human beings on scene understanding. Semantic modeling is a popular technique used to fill the semantic gap in scene recognition. However, most of the semantic modeling approaches learn shallow, one-layer representations for scene recognition, while ignoring the structural information related between images, often resulting in poor performance. Modeled after our own human visual system, as it is intended to inherit humanlike judgment, a manifold regularized deep architecture is proposed for scene recognition. The proposed deep architecture exploits the structural information of the data, making for a mapping between the visible layer and the hidden layer. By the proposed approach, a deep architecture could be designed to learn the high-level features for scene recognition in an unsupervised fashion. Experiments on standard data sets show that our method outperforms the state-of-the-art used for scene recognition.

203 citations


Journal ArticleDOI
TL;DR: In this paper, the first large-scale human eye-tracking datasets for video are released at vision.imar.ro/eyetracking (497,107 frames, each viewed by 19 subjects); they are unique in terms of their scale and computer vision relevance, their dynamic video stimuli, and their task control as well as free-viewing conditions.
Abstract: Systems based on bag-of-words models from image features collected at maxima of sparse interest point operators have been used successfully for both computer vision object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in ‘saccade and fixate’ regimes, the methodology and emphasis in the human and the computer vision communities remain sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large scale dynamic computer vision annotated datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge, these are the first large human eye tracking datasets to be collected and made publicly available for video, at vision.imar.ro/eyetracking (497,107 frames, each viewed by 19 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic video stimuli, and (c) task control as well as free-viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data in order to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point image sampling strategies and human fixations, as well as their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic system leveraging some of the advanced computer vision practice, can lead to state-of-the-art results.

198 citations


Book
01 Jun 2015
TL;DR: It is argued that rapid object categorizations in natural scenes can be done without focused attention and are most likely based on coarse and unconscious visual representations activated with the first available (magnocellular) visual information.
Abstract: Visual categorization appears both effortless and virtually instantaneous. The study by Thorpe et al. (1996) was the first to estimate the processing time necessary to perform fast visual categorization of animals in briefly flashed (20 ms) natural photographs. They observed a large differential EEG activity between target and distractor correct trials that developed from 150 ms after stimulus onset, a value that was later shown to be even shorter in monkeys! With such strong processing time constraints, it was difficult to escape the conclusion that rapid visual categorization relied on massively parallel, essentially feed-forward processing of visual information. Since 1996, we have conducted a large number of studies to determine the characteristics and limits of fast visual categorization. The present chapter reviews some of the main results obtained. I will argue that rapid object categorizations in natural scenes can be done without focused attention and are most likely based on coarse and unconscious visual representations activated with the first available (magnocellular) visual information. Fast visual processing proved efficient for the categorization of large superordinate object or scene categories, but shows its limits when more detailed basic representations are required. Representations of basic objects (dogs, cars) or scenes (mountain or sea landscapes) need additional processing time to be activated, a finding that is at odds with the widely accepted idea that such basic representations are at the entry level of the system. Interestingly, focused attention is still not required to perform such, more time-consuming, basic categorizations. Finally, we show that object and context processing can interact very early in an ascending wave of visual information processing. We discuss how such data could result from our experience with a highly structured and predictable surrounding world that shaped neuronal visual selectivity.

166 citations


Journal ArticleDOI
TL;DR: A visual-attention-aware model to mimic the HVS for salient-object detection and proposes a method for extracting directional patches, as humans are sensitive to orientation features, and as directional patches are reliable cues.
Abstract: The human visual system (HVS) can reliably perceive salient objects in an image, but it remains a challenge to computationally model the process of detecting salient objects without prior knowledge of the image contents. This paper proposes a visual-attention-aware model to mimic the HVS for salient-object detection. The informative and directional patches can be seen as visual stimuli, and used as neuronal cues for humans to interpret and detect salient objects. In order to simulate this process, two typical patches are extracted individually and in parallel from the intensity channel and the discriminant color channel, respectively, as the primitives. In our algorithm, an improved wavelet-based salient-patch detector is used to extract the visually informative patches. In addition, as humans are sensitive to orientation features, and as directional patches are reliable cues, we also propose a method for extracting directional patches. These two different types of patches are then combined to form the most important patches, which are called preferential patches and are considered as the visual stimuli applied to the HVS for salient-object detection. Compared with the state-of-the-art methods for salient-object detection, experimental results using publicly available datasets show that the proposed algorithm is reliable and effective.

147 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrated that the proposed image watermarking scheme, developed in the wavelet domain, not only possesses strong robustness against image manipulation attacks but is also comparable to other schemes in terms of visual quality.

134 citations


Journal ArticleDOI
TL;DR: Experimental results show that the random forest regression model trained on the proposed DOG features is highly consistent with the HVS and is also robust when tested on the available databases.
Abstract: Objective image quality assessment (IQA) plays an important role in the development of multimedia applications. The predictions of an IQA metric should be consistent with human perception. The release of the newest IQA database (TID2013) challenges most of the widely used quality metrics (e.g., the peak signal-to-noise ratio and the structural similarity index). We propose a new methodology to build the metric model using a regression approach. The new IQA score is set to be a nonlinear combination of features extracted from several difference of Gaussian (DOG) frequency bands, which mimics the human visual system (HVS). Experimental results show that the random forest regression model trained on the proposed DOG features is highly consistent with the HVS and is also robust when tested on the available databases.
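A minimal sketch of this general recipe follows: extract simple statistics from difference-of-Gaussian bands and regress quality with a random forest. The band scales, the per-band statistics, and the stand-in training data are placeholders, not the paper's configuration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestRegressor

def dog_features(img, sigmas=(1, 2, 4, 8)):
    """Build difference-of-Gaussian bands and summarize each with crude stats."""
    feats = []
    prev = gaussian_filter(img, 0.5)
    for s in sigmas:
        cur = gaussian_filter(img, s)
        band = prev - cur                      # one DoG frequency band
        feats += [band.mean(), band.std()]     # illustrative per-band statistics
        prev = cur
    return np.array(feats)

# Hypothetical training set: images paired with subjective scores (MOS)
X = np.stack([dog_features(np.random.rand(32, 32)) for _ in range(100)])
y = np.random.rand(100)                        # stand-in MOS values
model = RandomForestRegressor(n_estimators=100).fit(X, y)
score = model.predict(dog_features(np.random.rand(32, 32))[None])
```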

129 citations


Journal ArticleDOI
TL;DR: In this paper, the authors benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model, and compared their results to those of humans with backward masking.
Abstract: Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have been shown to be able to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields, and a hierarchy of layers which progressively extract more and more abstracted features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the magnitude of the viewpoint variations to demonstrate that shallow nets can outperform deep nets and humans when variations are weak. When facing larger variations, however, more layers were needed to match human performance and error distributions, and to have representations that are consistent with human behavior. A very deep net with 18 layers even outperformed humans at the highest variation level, using the most human-like representations.

124 citations


Patent
16 Dec 2015
TL;DR: An exemplary augmented reality vision barcode scanning system for use with a human visual system includes at least one scanning contact lens, wireless enabled companion eyewear, and a remote barcode decoder as discussed by the authors.
Abstract: An exemplary augmented reality vision barcode scanning system for use with a human visual system includes at least one scanning contact lens, wireless enabled companion eyewear, and a remote barcode decoder. A related, exemplary method includes scanning and displaying barcodes with the augmented reality vision barcode scanning system in conjunction with a human visual system.

106 citations


Proceedings ArticleDOI
TL;DR: This work proposes a full reference video QoE measure, named SSIMplus, that provides real-time prediction of the perceptual quality of a video based on human visual system behaviors, video content characteristics, display device properties, and viewing conditions.
Abstract: Today's viewers consume video content from a variety of connected devices, including smart phones, tablets, notebooks, TVs, and PCs. This imposes significant challenges for managing video traffic efficiently to ensure an acceptable quality-of-experience (QoE) for the end users as the perceptual quality of video content strongly depends on the properties of the display device and the viewing conditions. State-of-the-art full-reference objective video quality assessment algorithms do not take into account the combined impact of display device properties, viewing conditions, and video resolution while performing video quality assessment. We performed a subjective study in order to understand the impact of aforementioned factors on perceptual video QoE. We also propose a full reference video QoE measure, named SSIMplus, that provides real-time prediction of the perceptual quality of a video based on human visual system behaviors, video content characteristics (such as spatial and temporal complexity, and video resolution), display device properties (such as screen size, resolution, and brightness), and viewing conditions (such as viewing distance and angle). Experimental results have shown that the proposed algorithm outperforms state-of-the-art video quality measures in terms of accuracy and speed.

Journal ArticleDOI
TL;DR: Experimental results have demonstrated that the proposed steganographic scheme can achieve statistical security without degrading the image quality or the embedding capacity.
Abstract: Most state-of-the-art binary image steganographic techniques only consider the flipping distortion according to the human visual system, which leaves them insecure when attacked by steganalyzers. In this paper, a binary image steganographic scheme that aims to minimize the embedding distortion on the texture is presented. We first extract the complement, rotation, and mirroring-invariant local texture patterns (crmiLTPs) from the binary image. The weighted sum of crmiLTP changes when flipping one pixel is then employed to measure the flipping distortion corresponding to that pixel. By testing on both simple binary images and the constructed image data set, we show that the proposed measurement can well describe the distortions on both visual quality and statistics. Based on the proposed measurement, a practical steganographic scheme is developed. The steganographic scheme generates the cover vector by dividing the scrambled image into superpixels. Thereafter, the syndrome-trellis code is employed to minimize the designed embedding distortion. Experimental results have demonstrated that the proposed steganographic scheme can achieve statistical security without degrading the image quality or the embedding capacity.
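To make the flipping-distortion idea concrete, here is a heavily simplified stand-in: it encodes each 3 × 3 neighborhood of a binary image as a pattern id, weights rare patterns more heavily, and scores a pixel flip by the weighted pattern changes it causes. The actual crmiLTP construction (complement, rotation, and mirroring invariance) and its weighting scheme are more involved than this toy.

```python
import numpy as np

def patterns(img):
    """Encode each 3x3 neighborhood of a binary image as a 9-bit pattern id."""
    h, w = img.shape
    p = np.zeros((h - 2, w - 2), dtype=int)
    for k, (dy, dx) in enumerate([(y, x) for y in range(3) for x in range(3)]):
        p |= img[dy:dy + h - 2, dx:dx + w - 2].astype(int) << k
    return p

def flip_distortion(img, y, x):
    """Score flipping pixel (y, x) by the weights of the patterns it destroys
    and creates; rare patterns get larger weights (a crude stand-in for the
    crmiLTP weighting)."""
    before = patterns(img)
    hist = np.bincount(before.ravel(), minlength=512).astype(float)
    weights = 1.0 / (hist + 1.0)
    flipped = img.copy()
    flipped[y, x] ^= 1
    after = patterns(flipped)
    changed = before != after
    return (weights[before[changed]] + weights[after[changed]]).sum()

img = (np.random.rand(16, 16) > 0.5).astype(np.uint8)
d = flip_distortion(img, 8, 8)
```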

Journal ArticleDOI
TL;DR: Extensive evaluations on three commonly used datasets, including the test with the dataset-dependent optimal parameters as well as the intra-dataset cross validation, show that the physiologically inspired DOCC model can produce quite competitive results in comparison to the state-of-the-art approaches, but with a relatively simple implementation and without requiring fine-tuning of the method for each different dataset.
Abstract: The double-opponent (DO) color-sensitive cells in the primary visual cortex (V1) of the human visual system (HVS) have long been recognized as the physiological basis of color constancy. In this work we propose a new color constancy model by imitating the functional properties of the HVS from the single-opponent (SO) cells in the retina to the DO cells in V1 and the possible neurons in the higher visual cortexes. The idea behind the proposed double-opponency based color constancy (DOCC) model originates from the substantial observation that the color distribution of the responses of DO cells to color-biased images coincides well with the vector denoting the light source color. The illuminant color is then easily estimated by pooling the responses of DO cells in separate channels in LMS space with a pooling mechanism of sum or max. Extensive evaluations on three commonly used datasets, including the test with the dataset-dependent optimal parameters, as well as the intra- and inter-dataset cross validation, show that our physiologically inspired DOCC model can produce quite competitive results in comparison to the state-of-the-art approaches, but with a relatively simple implementation and without requiring fine-tuning of the method for each different dataset.
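Stripped of the DO-cell filtering stage, the pooling step can be sketched as below. Working in RGB rather than LMS and pooling raw smoothed channels are simplifications for illustration; with these, 'max' degenerates to a smoothed max-RGB estimate and 'sum' to gray-world.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_illuminant(img, pooling="max", sigma=2.0):
    """Pool per-channel responses to estimate the light-source color direction.

    img: float array (H, W, 3). The smoothed channel here is only a stand-in
    for the DO-cell response maps used in the actual model.
    """
    est = np.empty(3)
    for c in range(3):
        resp = gaussian_filter(img[:, :, c], sigma)
        est[c] = resp.max() if pooling == "max" else resp.sum()
    return est / np.linalg.norm(est)   # unit vector = estimated illuminant color

img = np.random.rand(64, 64, 3)
white_est = estimate_illuminant(img, pooling="max")
corrected = img / (np.sqrt(3) * white_est)   # von Kries-style correction
```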

Journal ArticleDOI
01 Jun 2015 - Optik
TL;DR: A survey of existing algorithms for no-reference image quality assessment is presented, which covers the types of noise and distortion addressed, the techniques and parameters used by these algorithms, the databases on which the algorithms are validated, and benchmarking of their performance against each other and against the human visual system.

Journal ArticleDOI
TL;DR: Recent evidence is reviewed suggesting that patterns of response in high-level visual areas may be better explained by response to image properties that are characteristic of different object categories.
Abstract: Neuroimaging research over the past 20 years has begun to reveal a picture of how the human visual system is organized. A key distinction that has arisen from these studies is the difference in the organization of low-level and high-level visual regions. Low-level regions contain topographic maps that are tightly linked to properties of the image. In contrast, high-level visual areas are thought to be arranged in modules that are tightly linked to categorical or semantic information in the image. To date, an unresolved question has been how the strong functional selectivity for object categories in high-level visual regions might arise from the image-based representations found in low-level visual regions. Here, we review recent evidence suggesting that patterns of response in high-level visual areas may be better explained by response to image properties that are characteristic of different object categories.

Journal ArticleDOI
TL;DR: Experimental results show that the DO cells the authors modeled can flexibly capture both the structured chromatic and achromatic boundaries of salient objects in complex scenes when the cone inputs to DO cells are unbalanced.
Abstract: Brightness and color are two basic visual features integrated by the human visual system (HVS) to gain a better understanding of color natural scenes. Aiming to combine these two cues to maximize the reliability of boundary detection in natural scenes, we propose a new framework based on the color-opponent mechanisms of a certain type of color-sensitive double-opponent (DO) cells in the primary visual cortex (V1) of the HVS. This type of DO cell has an oriented receptive field with both a chromatically and a spatially opponent structure. The proposed framework is a feedforward hierarchical model, which has a direct counterpart in the color-opponent mechanisms involved along the pathway from the retina to V1. In addition, we employ the spatial sparseness constraint (SSC) of neural responses to further suppress the unwanted edges of texture elements. Experimental results show that the DO cells we modeled can flexibly capture both the structured chromatic and achromatic boundaries of salient objects in complex scenes when the cone inputs to DO cells are unbalanced. Meanwhile, the SSC operator further improves the performance by suppressing redundant texture edges. With competitive contour detection accuracy, the proposed model has the additional advantage of quite simple implementation with low computational cost.
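A heavily simplified sketch of the opponent-channel idea, without the oriented DO receptive fields or the spatial sparseness constraint: build luminance, red-green, and blue-yellow channels and take the maximum gradient magnitude across them. The channel definitions and the max-over-channels combination are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def opponent_boundaries(img, sigma=1.5):
    """Toy color-opponent boundary map for a float RGB image (H, W, 3)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    channels = {
        "lum": (r + g + b) / 3.0,      # achromatic (luminance) channel
        "rg": r - g,                   # red-green opponent channel
        "by": b - (r + g) / 2.0,       # blue-yellow opponent channel
    }
    edge = np.zeros(img.shape[:2])
    for ch in channels.values():
        s = gaussian_filter(ch, sigma)
        gx, gy = sobel(s, axis=1), sobel(s, axis=0)
        edge = np.maximum(edge, np.hypot(gx, gy))   # max over channels
    return edge / (edge.max() + 1e-8)

edges = opponent_boundaries(np.random.rand(64, 64, 3))
```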

Journal ArticleDOI
TL;DR: A novel independent feature similarity (IFS) index is proposed for full-reference IQA that can effectively predict the quality of an image with color distortion and has relatively low computational complexity and high correlation with subjective quality evaluation.

Journal ArticleDOI
TL;DR: A no-reference objective blur metric based on an edge model (EMBM) is presented to address the image blur assessment problem, advocating the use of only the salient edge pixels to simulate blur assessment in the human visual system (HVS).

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the orientation selectivity-based structure descriptor is robust to disturbance, and can effectively represent the structure degradation caused by different types of distortion.
Abstract: The human visual system is highly adaptive in extracting structure information for scene perception, and structural characteristics are widely used in perception-oriented image processing. However, the existing structure descriptors mainly describe the luminance contrast of a local region, but cannot effectively represent the spatial correlation of structure. In this paper, we introduce a novel structure descriptor according to the orientation selectivity mechanism in the primary visual cortex. Research in cognitive neuroscience indicates that the arrangement of excitatory and inhibitory cortical cells gives rise to orientation selectivity in a local receptive field, within which the primary visual cortex performs visual information extraction for scene understanding. Inspired by the orientation selectivity mechanism, we compute the correlations among pixels in a local region based on the similarities of their preferred orientations. By imitating the arrangement of the excitatory/inhibitory cells, the correlations between a central pixel and its local neighbors are binarized, and the spatial correlation is represented with a set of binary values, which is named the orientation selectivity-based pattern. Then, taking both the gradient magnitude and the orientation selectivity-based pattern into account, a rotation-invariant structure descriptor is introduced. The proposed structure descriptor is applied in texture classification and reduced-reference image quality assessment, as two different application domains to verify its generality and robustness. Experimental results demonstrate that the orientation selectivity-based structure descriptor is robust to disturbance, and can effectively represent the structure degradation caused by different types of distortion.
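A toy version of the binarized pattern could look like the following: each pixel's gradient orientation is compared with those of its eight neighbors, and similarity (an "excitatory" relation) is encoded as one bit. The similarity threshold and the use of Sobel gradients are illustrative assumptions, not the paper's definition.

```python
import numpy as np
from scipy.ndimage import sobel

def osp(img, thresh=np.pi / 6):
    """Simplified orientation selectivity-based pattern for a grayscale image:
    one bit per neighbor, set when its gradient orientation is similar to
    the central pixel's."""
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    theta = np.arctan2(gy, gx)
    h, w = img.shape
    code = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = theta[1:h - 1, 1:w - 1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = theta[dy:dy + h - 2, dx:dx + w - 2]
        diff = np.abs(np.angle(np.exp(1j * (center - nb))))  # wrapped difference
        code |= (diff < thresh).astype(np.uint8) << bit
    return code

pattern = osp(np.random.rand(32, 32))
```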

Journal ArticleDOI
TL;DR: In this article, a spatial filtering method is proposed to project the data into a low-dimensional space in which the trial-to-trial spectral covariance is maximized, and the resulting technique recovers physiologically plausible components (i.e., the recovered topographies match the lead fields of the underlying sources).

Posted Content
TL;DR: This work introduces a scalable knowledge base construction system that is capable of building a KB with half a billion variables and millions of parameters in a few hours, and achieves competitive results compared to purpose-built models on standard recognition and retrieval tasks, while exhibiting greater flexibility in answering richer visual queries.
Abstract: The complexity of the visual world creates significant challenges for comprehensive visual understanding. In spite of recent successes in visual recognition, today's vision systems would still struggle to deal with visual queries that require deeper reasoning. We propose a knowledge base (KB) framework to handle an assortment of visual queries, without the need to train new classifiers for new tasks. Building such a large-scale multimodal KB presents a major challenge of scalability. We cast a large-scale MRF into a KB representation, incorporating visual, textual and structured data, as well as their diverse relations. We introduce a scalable knowledge base construction system that is capable of building a KB with half a billion variables and millions of parameters in a few hours. Our system achieves competitive results compared to purpose-built models on standard recognition and retrieval tasks, while exhibiting greater flexibility in answering richer visual queries.

Journal ArticleDOI
TL;DR: Observations of a selective improvement of processing speed at the lower positions of the computer screen after video game training, and of retest effects, suggest that there are limited possibilities to improve basic aspects of visual attention (TVA) with practice.

Journal ArticleDOI
TL;DR: This paper presents an extension of the previous 2D most apparent distortion algorithm to a 3D version (3D-MAD) to evaluate 3D image quality, and demonstrates that this algorithm significantly improves upon many other state-of-the-art 2D/3D IQA algorithms.
Abstract: Algorithms for a stereoscopic image quality assessment (IQA) aim to estimate the qualities of 3D images in a manner that agrees with human judgments. The modern stereoscopic IQA algorithms often apply 2D IQA algorithms on stereoscopic views, disparity maps, and/or cyclopean images, to yield an overall quality estimate based on the properties of the human visual system. This paper presents an extension of our previous 2D most apparent distortion (MAD) algorithm to a 3D version (3D-MAD) to evaluate 3D image quality. The 3D-MAD operates via two main stages, which estimate perceived quality degradation due to 1) distortion of the monocular views and 2) distortion of the cyclopean view. In the first stage, the conventional MAD algorithm is applied on the two monocular views, and then the combined binocular quality is estimated via a weighted sum of the two estimates, where the weights are determined based on a block-based contrast measure. In the second stage, intermediate maps corresponding to the lightness distance and the pixel-based contrast are generated based on a multipathway contrast gain-control model. Then, the cyclopean view quality is estimated by measuring the statistical-difference-based features obtained from the reference stereopair and the distorted stereopair, respectively. Finally, the estimates obtained from the two stages are combined to yield an overall quality score of the stereoscopic image. Tests on various 3D image quality databases demonstrate that our algorithm significantly improves upon many other state-of-the-art 2D/3D IQA algorithms.
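The first-stage combination can be sketched as a contrast-weighted average of the two monocular quality estimates; the block size and the RMS contrast measure below are placeholders rather than the paper's exact definitions.

```python
import numpy as np

def block_contrast(img, block=16):
    """Mean local RMS contrast, used to weight a monocular quality score."""
    h, w = img.shape
    crop = img[:h // block * block, :w // block * block]
    blocks = crop.reshape(h // block, block, w // block, block)
    return blocks.std(axis=(1, 3)).mean()

def binocular_quality(q_left, q_right, left_img, right_img):
    """Contrast-weighted combination of the two monocular quality estimates."""
    wl, wr = block_contrast(left_img), block_contrast(right_img)
    return (wl * q_left + wr * q_right) / (wl + wr + 1e-8)

q = binocular_quality(0.8, 0.6, np.random.rand(64, 64), np.random.rand(64, 64))
```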

Journal ArticleDOI
TL;DR: A novel watermarking algorithm is proposed to embed the color image watermark in the direct current (DC) coefficients and the alternating current (AC) coefficients of the color host image by utilizing the two-level DCT.
Abstract: With the widespread use of color images in many areas, colorful logos or marks have increasingly been used as watermarks to protect copyright in recent years. Since a color image watermark carries more bits of information, it is challenging to design a robust color watermarking scheme. By utilizing a two-level DCT, a novel watermarking algorithm is proposed to embed the color image watermark in the direct current (DC) coefficients and the alternating current (AC) coefficients of the color host image. Firstly, the host image is divided into 8 × 8 non-overlapping blocks, and these blocks are transformed by one-level DCT. Secondly, the upper-left 4 × 4 coefficients of each block are further transformed by two-level DCT, and the transformed coefficients are ordered in a zigzag arrangement. Thirdly, according to the human visual system (HVS), the digital watermarks are embedded into the DC coefficient and the first seven AC coefficients of these blocks, respectively. Experimental results show that the proposed watermarking algorithm is robust to many common image processing attacks and geometric attacks, and its performance outperforms the other color watermarking methods considered in this paper.
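A minimal sketch of the two-level DCT path follows, using simple quantization of the second-level DC coefficient as a stand-in for the paper's actual DC/AC embedding rule; the quantization step q is an illustrative parameter.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit(block8, bit, q=24.0):
    """Embed one watermark bit in an 8x8 block via a two-level DCT.

    Quantization-index embedding in the second-level DC coefficient is used
    here only as a simplified stand-in for the paper's embedding rule."""
    c1 = dctn(block8, norm="ortho")                  # one-level DCT
    c2 = dctn(c1[:4, :4], norm="ortho")              # two-level DCT of upper-left 4x4
    c2[0, 0] = q * (np.floor(c2[0, 0] / q) + (0.75 if bit else 0.25))
    c1[:4, :4] = idctn(c2, norm="ortho")
    return idctn(c1, norm="ortho")

def extract_bit(block8, q=24.0):
    c2 = dctn(dctn(block8, norm="ortho")[:4, :4], norm="ortho")
    return int((c2[0, 0] % q) > q / 2)

block = np.random.rand(8, 8) * 255
marked = embed_bit(block, 1)
assert extract_bit(marked) == 1
```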

01 Jan 2015
TL;DR: The present paper attempts to illustrate how biological information can be used to constrain connectionist models and argues that although there is good evidence for certain coactivation related synaptic modification schemes, other learning mechanisms, including back-propagation, are not currently supported by experimental data.
Abstract: Many researchers interested in connectionist models accept that such models are "neurally inspired" but do not worry too much about whether their models are biologically realistic. While such a position may be perfectly justifiable, the present paper attempts to illustrate how biological information can be used to constrain connectionist models. Two particular areas are discussed. The first section deals with visual information processing in the primate and human visual system. It is argued that the speed with which visual information is processed imposes major constraints on the architecture and operation of the visual system. In particular, it seems that a great deal of processing must depend on a single bottom-up pass. The second section deals with biological aspects of learning algorithms. It is argued that although there is good evidence for certain coactivation-related synaptic modification schemes, other learning mechanisms, including back-propagation, are not currently supported by experimental data.

Journal ArticleDOI
TL;DR: The attentional processes that are active during search and their neural basis are discussed, including the gradual emergence of spatially specific and temporally sustained biases for representations of task-relevant visual objects in cortical maps.
Abstract: In visual search, observers try to find known target objects among distractors in visual scenes where the location of the targets is uncertain. This review article discusses the attentional processes that are active during search and their neural basis. Four successive phases of visual search are described. During the initial preparatory phase, a representation of the current search goal is activated. Once visual input has arrived, information about the presence of target-matching features is accumulated in parallel across the visual field (guidance). This information is then used to allocate spatial attention to particular objects (selection), before representations of selected objects are activated in visual working memory (recognition). These four phases of attentional control in visual search are characterized both at the cognitive level and at the neural implementation level. It will become clear that search is a continuous process that unfolds in real time. Selective attention in visual search is described as the gradual emergence of spatially specific and temporally sustained biases for representations of task-relevant visual objects in cortical maps.

Proceedings ArticleDOI
TL;DR: This work presents a novel interactive method for performing the visual JND test on compressed image/video using a binary forced choice, which is often adopted to discriminate differences in perception in a psychophysical experiment.
Abstract: The visual Just-Noticeable-Difference (JND) metric is characterized by the minimum detectable difference between two visual stimuli. Conducting the subjective JND test is a labor-intensive task. In this work, we present a novel interactive method for performing the visual JND test on compressed image/video. JND has been used to enhance perceptual visual quality in the context of image/video compression. Given a set of coding parameters, a JND test is designed to determine the distinguishable quality level against a reference image/video, which is called the anchor. The JND metric can be used to save coding bitrates by exploiting the special characteristics of the human visual system. The proposed JND test is conducted using a binary forced choice, which is often adopted to discriminate differences in perception in a psychophysical experiment. The assessors are asked to compare coded image/video pairs and determine whether they are of the same quality or not. A bisection procedure is designed to find the JND locations so as to reduce the required number of comparisons over a wide range of bitrates. We demonstrate the efficiency of the proposed JND test and report experimental results on the image and video JND tests.
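The bisection procedure can be sketched as follows. Here same_quality stands for one binary forced-choice comparison by an assessor, and the toy oracle in the usage line is only for demonstration; neither name comes from the paper.

```python
def find_jnd(anchor_level, same_quality, lo, hi, tol=1):
    """Bisection search for the JND point on a quality/bitrate scale.

    same_quality(a, b) -> True if the assessor judges levels a and b
    indistinguishable (one binary forced choice). Returns the lowest level
    still judged equal to the anchor, to within tol."""
    while hi - lo > tol:
        mid = (lo + hi) // 2
        if same_quality(anchor_level, mid):
            hi = mid      # still indistinguishable: search lower levels
        else:
            lo = mid      # visibly different: the JND lies above mid
    return hi

# Toy oracle: levels within 10 units of the anchor look the same
jnd = find_jnd(100, lambda a, b: abs(a - b) < 10, lo=0, hi=100)
```

Each iteration halves the search interval, so the number of subjective comparisons grows only logarithmically with the number of quality levels, which is the point of the bisection design.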

Journal ArticleDOI
TL;DR: It is found that the proposed watermarking scheme is fast enough to carry out these operations in real time, and that the Fuzzy-BPN is a successful candidate for implementing a novel gray-scale image watermarking scheme meeting real-time requirements.

Journal ArticleDOI
TL;DR: An overview of recent advances in 3D techniques is provided by relating them with human factors, primarily focusing on subjective and objective measurement methods, to ensure that human-friendly 3D content and displays will benefit from recent technical advances.
Abstract: Three-dimensional (3D) display systems are widely used nowadays, and their psychophysiological effects on human health have been investigated in detail. However, due to recent advances in 3D display technology, such as (super) multiview display or holography, there is a clear and pressing need to develop a new measurement method for determining optimal viewing parameters. Depending on the display system in question, virtual objects with depth information may present different properties to the human visual system and thus are perceived differently. The methods to measure the factors that affect human health in 3D displays need to be thoroughly reviewed in order to further investigate these characteristics and determine optimal viewing parameters. In this paper, we review various measurement methods that have been proposed to examine the effects of 3D stimuli on the human visual system. We provide an overview of recent advances in 3D techniques by relating them with human factors, primarily focusing on subjective and objective measurement methods, to ensure that human-friendly 3D content and displays will benefit from recent technical advances.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: The proposed framework first localizes the moving-sounding objects through multimodal analysis and generates an audio attention map, in which a greater value denotes a higher possibility of a position being the sound source, and then calculates the spatial and temporal attention maps using only the visual modality.
Abstract: In this paper, we propose to predict human fixations by incorporating both audio and visual cues. Traditional visual attention models generally make the utmost of a stimulus's visual features, while discarding all audio information. But in the real world, we human beings not only direct our gaze according to visual saliency but also may be attracted by salient audio. Psychological experiments show that audio may have some influence on visual attention, and subjects tend to be attracted to the sound sources. Therefore, we propose to fuse both audio and visual information to predict fixations. In our framework, we first localize the moving-sounding objects through multimodal analysis and generate an audio attention map, in which a greater value denotes a higher possibility of a position being the sound source. Then we calculate the spatial and temporal attention maps using only the visual modality. Finally, the audio, spatial, and temporal attention maps are fused, generating our final audio-visual saliency map. We gather a set of videos and collect eye-tracking data under audio-visual test conditions. Experimental results show that we can achieve better performance when considering both audio and visual cues.
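The final fusion step might be sketched as a normalized weighted combination of the three maps; the weights and the linear fusion rule are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def fuse_saliency(audio_map, spatial_map, temporal_map, w=(0.3, 0.4, 0.3)):
    """Fuse audio, spatial, and temporal attention maps into one saliency map
    via a weighted linear combination of min-max normalized maps."""
    def norm(m):
        m = m - m.min()
        return m / (m.max() + 1e-8)
    maps = [norm(m) for m in (audio_map, spatial_map, temporal_map)]
    fused = sum(wi * mi for wi, mi in zip(w, maps))
    return norm(fused)

H, W = 36, 64
saliency = fuse_saliency(np.random.rand(H, W),
                         np.random.rand(H, W),
                         np.random.rand(H, W))
```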