
Showing papers on "Human visual system model published in 2018"


Proceedings ArticleDOI
25 Apr 2018
TL;DR: This work proposes to extend an object recognition system with an attention based few-shot classification weight generator, and to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors.
Abstract: The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that, during test time, can efficiently learn novel categories from only a few training examples while not forgetting the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, also leads to feature representations that generalize better on "unseen" categories. We extensively evaluate our approach on Mini-ImageNet, where we improve the prior state-of-the-art on few-shot recognition (i.e., we achieve 56.20% and 73.00% on the 1-shot and 5-shot settings respectively) while not sacrificing any accuracy on the base categories, a characteristic that most prior approaches lack. Finally, we apply our approach to the recently introduced few-shot benchmark of Bharath and Girshick [4], where we also achieve state-of-the-art results.

1,082 citations
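To make the cosine-similarity classifier described above concrete, here is a minimal PyTorch sketch (not the authors' code); the feature dimension, the number of base categories, and the learnable scale tau are illustrative assumptions. In the paper's setting, the few-shot weight generator would supply additional weight vectors for novel categories in the same space.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, tau=10.0):
        super().__init__()
        # One classification weight vector per (base) category.
        self.weights = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Learnable scale applied to the cosine scores.
        self.tau = nn.Parameter(torch.tensor(tau))

    def forward(self, features):
        # L2-normalize features and weights so their dot product is the cosine similarity.
        f = F.normalize(features, dim=1)
        w = F.normalize(self.weights, dim=1)
        return self.tau * f @ w.t()

# Usage: scores for a batch of 4 feature vectors over 64 base categories.
clf = CosineClassifier(feat_dim=128, num_classes=64)
logits = clf(torch.randn(4, 128))   # shape (4, 64)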


01 Jan 2018
TL;DR: It is argued that understanding cognitive processes will need to consider the (inter)actions in the natural environment and that in cognitive tasks some independent components systematically relate to sensory processing as well as to action execution.
Abstract: In this presentation, we discuss embodied cognition in the human brain from the perspectives of spatial cognition, sensorimotor processing, face processing, and mobile EEG recordings. The argument is based upon experimental evidence gathered from five separate studies. First, we focus on spatial representations and demonstrate that, given time pressure, information on the spatial orientation of houses, independent of a participant's own location, is best retrieved when it directly relates to potential actions, providing evidence that even spatial representations code information in a manner directly related to action. Next, we discuss the concept of representations as such. Using the example of face processing in the human visual system, we argue that the concept of representations should be confined to cases where neuronal activity contains explicit information on the variable of interest and, in turn, that this variable explains all of the explainable variance, i.e., reaches the noise limit. Next, to push towards an investigation of cognition under natural conditions, we present a benchmark test of mobile and research-grade EEG systems. Specifically, we demonstrate that the variance over systems contributes a significant part of the total variance of recorded event-related potentials. As a next step, using Independent Component Analysis of EEG data, we demonstrate that in cognitive tasks some independent components systematically relate to sensory processing as well as to action execution. This supports the common coding theory and, thus, a mechanistic part of the embodied cognition framework. Finally, we demonstrate a real-world application investigating face processing in the form of the N170 event-related potential during natural visual exploration in a fully mobile setup. This technique allows investigating the physiological basis of cognitive processes under real-world conditions. In this presentation we argue that understanding cognitive processes will need to consider the (inter)actions in the natural environment.

555 citations


Journal ArticleDOI
TL;DR: A deep neural network-based approach to image quality assessment (IQA) that allows for joint learning of local quality and local weights in a unified framework and shows a high ability to generalize between different databases, indicating high robustness of the learned features.
Abstract: We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., the relative importance of local quality to the global quality estimate, in a unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CSIQ, and TID2013 databases as well as the LIVE In the Wild Image Quality Challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features.

479 citations
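The joint learning of local quality and local weights described above can be illustrated with a small PyTorch sketch; the patch-feature dimension, the two linear heads, and the softplus weighting are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class WeightedPatchIQA(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.quality_head = nn.Linear(feat_dim, 1)   # local quality per patch
        self.weight_head = nn.Linear(feat_dim, 1)    # local importance per patch

    def forward(self, patch_features):
        # patch_features: (num_patches, feat_dim) coming from the conv backbone.
        q = self.quality_head(patch_features).squeeze(-1)
        # Softplus keeps the weights positive; the epsilon avoids division by zero.
        w = nn.functional.softplus(self.weight_head(patch_features)).squeeze(-1) + 1e-6
        # Global score = weight-normalized average of the local qualities.
        return (w * q).sum() / w.sum()

score = WeightedPatchIQA()(torch.randn(32, 512))  # global quality estimate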


Posted Content
TL;DR: The robustness of humans and current convolutional deep neural networks on object recognition under twelve different types of image degradations is compared and it is shown that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on.
Abstract: We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well-known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error patterns between humans and DNNs as the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they display extremely poor generalisation abilities when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness on uniform white noise and vice versa. Thus, changes in the noise distribution between training and testing constitute a crucial challenge to deep learning vision systems that can be systematically addressed in a lifelong machine learning approach. Our new dataset, consisting of 83K carefully measured human psychophysical trials, provides a useful reference for lifelong robustness against image degradations set by the human visual system.

351 citations
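Two of the distortion types mentioned above, salt-and-pepper noise and additive uniform noise, can be sketched in a few lines of NumPy; the noise levels are illustrative. Training on one and testing on the other is exactly the generalization gap the paper reports.

import numpy as np

def salt_and_pepper(img, p=0.1, seed=0):
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < p / 2] = 0.0       # pepper pixels
    out[mask > 1 - p / 2] = 1.0   # salt pixels
    return out

def uniform_noise(img, width=0.2, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-width, width, size=img.shape)
    return np.clip(img + noise, 0.0, 1.0)

img = np.full((8, 8, 3), 0.5)          # toy image in [0, 1]
distorted_a = salt_and_pepper(img)      # training on this ...
distorted_b = uniform_noise(img)        # ... does not imply robustness to this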


Book ChapterDOI
14 Apr 2018
TL;DR: In this paper, a two-player turn-based stochastic game is formulated to generate adversarial examples, where the first player's objective is to minimize the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random.
Abstract: Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples necessitate some knowledge (architecture, parameters, etc) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player’s objective is to minimise the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random. We show that, theoretically, the two-player game can converge to the optimal strategy, and that the optimal strategy represents a globally minimal adversarial image. For Lipschitz networks, we also identify conditions that provide safety guarantees that no adversarial examples exist. Using Monte Carlo tree search we gradually explore the game state space to search for adversarial examples. Our experiments show that, despite the black-box setting, manipulations guided by a perception-based saliency distribution are competitive with state-of-the-art methods that rely on white-box saliency matrices or sophisticated optimization procedures. Finally, we show how our method can be used to evaluate robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.

213 citations
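The feature-to-saliency step described above can be sketched with OpenCV; using the SIFT keypoint response as the importance score is an illustrative assumption about how the saliency distribution is formed, not the authors' exact procedure.

import cv2
import numpy as np

def keypoint_saliency(gray_image):
    # gray_image: 8-bit grayscale image.
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray_image, None)
    if not keypoints:
        return [], np.array([])
    # The keypoint response serves as an (unnormalized) importance score.
    responses = np.array([kp.response for kp in keypoints], dtype=np.float64)
    probs = responses / responses.sum()
    coords = [tuple(map(int, kp.pt)) for kp in keypoints]
    return coords, probs

# Pixels near high-probability keypoints would then be preferred by the
# game-tree search when proposing candidate manipulations.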


Proceedings Article
01 Jan 2018
TL;DR: It is found that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
Abstract: Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.

202 citations


Journal ArticleDOI
TL;DR: The Haar wavelet-based perceptual similarity index (HaarPSI), as discussed by the authors, is proposed to assess local similarities between two images, as well as the relative importance of image areas.
Abstract: In most practical situations, the compression or transmission of images and videos creates distortions that will eventually be perceived by a human observer. Vice versa, image and video restoration techniques, such as inpainting or denoising, aim to enhance the quality of experience of human viewers. Correctly assessing the similarity between an image and an undistorted reference image as subjectively experienced by a human viewer can thus lead to significant improvements in any transmission, compression, or restoration system. This paper introduces the Haar wavelet-based perceptual similarity index (HaarPSI), a novel and computationally inexpensive similarity measure for full reference image quality assessment. The HaarPSI utilizes the coefficients obtained from a Haar wavelet decomposition to assess local similarities between two images, as well as the relative importance of image areas. The consistency of the HaarPSI with the human quality of experience was validated on four large benchmark databases containing thousands of differently distorted images. On these databases, the HaarPSI achieves higher correlations with human opinion scores than state-of-the-art full reference similarity measures like the structural similarity index (SSIM), the feature similarity index (FSIM), and the visual saliency-based index (VSI). Along with the simple computational structure and the short execution time, these experimental results suggest a high applicability of the HaarPSI in real world tasks.

193 citations
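A heavily simplified, single-scale sketch of the HaarPSI idea follows; the Haar kernel, the stabilizing constant C, and the max-based weighting are illustrative stand-ins for the published multi-scale definition.

import numpy as np
from scipy.signal import convolve2d

HAAR_HP = np.array([[1.0, 1.0], [-1.0, -1.0]]) / 4.0  # Haar-type high-pass filter
C = 30.0  # stabilizing constant (illustrative)

def haar_similarity(ref, dist):
    # ref, dist: 2D luminance arrays of the same size.
    similarities, weights = [], []
    for kernel in (HAAR_HP, HAAR_HP.T):  # two orientations
        a = np.abs(convolve2d(ref, kernel, mode="same"))
        b = np.abs(convolve2d(dist, kernel, mode="same"))
        # SSIM-like ratio of the filter magnitudes.
        similarities.append((2 * a * b + C) / (a ** 2 + b ** 2 + C))
        weights.append(np.maximum(a, b))
    s = np.mean(similarities, axis=0)
    w = np.mean(weights, axis=0)
    # Weighted pooling: areas with strong Haar responses count more.
    return float((s * w).sum() / w.sum())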


Reference EntryDOI
Frank Tong
23 Mar 2018

192 citations


Journal ArticleDOI
TL;DR: The proposed DSCLSTM model can significantly boost the saliency detection performance by incorporating both global spatial interconnections and scene context modulation, which may inspire further studies of these mechanisms in computational saliency models.
Abstract: Traditional saliency models usually adopt hand-crafted image features and human-designed mechanisms to calculate local or global contrast. In this paper, we propose a novel computational saliency model, i.e., the deep spatial contextual long-term recurrent convolutional network (DSCLRCN), to predict where people look in natural scenes. DSCLRCN first automatically learns saliency-related local features at each image location in parallel. Then, in contrast with most other deep-network-based saliency models, which infer saliency in local contexts, DSCLRCN mimics the cortical lateral inhibition mechanisms of the human visual system to incorporate global contexts when assessing the saliency of each image location, by leveraging the deep spatial long short-term memory (DSLSTM) model. Moreover, we also integrate scene context modulation into DSLSTM for saliency inference, leading to a novel deep spatial contextual LSTM (DSCLSTM) model. The whole network can be trained end-to-end and runs efficiently at test time. Experimental results on two benchmark datasets show that DSCLRCN achieves state-of-the-art performance on saliency detection. Furthermore, the proposed DSCLSTM model can significantly boost the saliency detection performance by incorporating both global spatial interconnections and scene context modulation, which may inspire further studies of these mechanisms in computational saliency models.

179 citations


Proceedings ArticleDOI
03 Dec 2018
TL;DR: In this article, the authors compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations.
Abstract: We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well-known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error patterns between humans and DNNs as the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they display extremely poor generalisation abilities when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness on uniform white noise and vice versa. Thus, changes in the noise distribution between training and testing constitute a crucial challenge to deep learning vision systems that can be systematically addressed in a lifelong machine learning approach. Our new dataset, consisting of 83K carefully measured human psychophysical trials, provides a useful reference for lifelong robustness against image degradations set by the human visual system.

169 citations


Journal ArticleDOI
TL;DR: A survey of various image compression techniques, their limitations, and compression rates is provided, highlighting current research in medical image compression.

Proceedings ArticleDOI
06 Jun 2018
TL;DR: In this paper, a large-scale dataset labeled with pairwise human preferences is proposed, and a deep learning model is trained using pairwise learning to predict the preference of one distorted image over another, enabling prediction of perceptual image error like human observers.
Abstract: The ability to estimate the perceptual error between images is an important problem in computer vision with many applications. Although it has been studied extensively, no method currently exists that can robustly predict visual differences like humans. Some previous approaches used hand-coded models, but they fail to model the complexity of the human visual system. Others used machine learning to train models on human-labeled datasets, but creating large, high-quality datasets is difficult because people are unable to assign consistent error labels to distorted images. In this paper, we present a new learning-based method that is the first to predict perceptual image error like human observers. Since it is much easier for people to compare two given images and identify the one more similar to a reference than to assign quality scores to each, we propose a new, large-scale dataset labeled with the probability that humans will prefer one image over another. We then train a deep-learning model using a novel, pairwise-learning framework to predict the preference of one distorted image over the other. Our key observation is that our trained network can then be used separately with only one distorted image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error labels. The perceptual error estimated by our new metric, PieAPP, is well-correlated with human opinion. Furthermore, it significantly outperforms existing algorithms, beating the state-of-the-art by almost 3× on our test set in terms of binary error rate, while also generalizing to new kinds of distortions, unlike previous learning-based methods.
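A minimal sketch of the pairwise-learning framework described above follows; the tiny placeholder backbone and the Bradley-Terry-style preference probability are illustrative assumptions, not the PieAPP architecture. At test time the same network is queried once with a single distorted image and the reference, yielding the perceptual error directly.

import torch
import torch.nn as nn

class ErrorNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Placeholder backbone: flattens the (distorted, reference) pair.
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(6 * 64 * 64, 64),
                                      nn.ReLU(), nn.Linear(64, 1))

    def forward(self, distorted, reference):
        # Predict a scalar perceptual error for (distorted, reference).
        return self.backbone(torch.cat([distorted, reference], dim=1)).squeeze(-1)

net = ErrorNet()
ref, dist_a, dist_b = (torch.rand(8, 3, 64, 64) for _ in range(3))
err_a, err_b = net(dist_a, ref), net(dist_b, ref)
# Probability that a human prefers A over B (Bradley-Terry style);
# p_human stands in for the dataset's preference labels.
p_ab = torch.sigmoid(err_b - err_a)
p_human = torch.rand(8)
loss = nn.functional.binary_cross_entropy(p_ab, p_human)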

Journal ArticleDOI
TL;DR: The feature fusion algorithm is applied to the dictionary training procedure to finalize the robust model, which outperforms the other state-of-the-art algorithms.
Abstract: In recent years, the analysis of natural images has made great progress, and intrinsic image component analysis can solve many computer vision problems, such as image shadow detection and removal. This paper presents a novel model that integrates feature fusion and multiple dictionary learning. Traditional models can hardly preserve removal accuracy while keeping time consumption low. Inspired by compressive sensing theory, the traditional single-dictionary scenario is extended to multiple dictionaries. The human visual system is more sensitive to the high-frequency part of an image, and the high-frequency part expresses most of the semantic information of the image. At the same time, the high-frequency characteristics of the high- and low-resolution images are adopted in the dictionary training, which can effectively recover the high-frequency information lost in the high-resolution image. This paper presents the integration of a compressive sensing model with feature extraction to construct a two-stage methodology. The feature fusion algorithm is therefore applied to the dictionary training procedure to finalize the robust model. Simulation results prove the effectiveness of the model, which outperforms the other state-of-the-art algorithms.

Journal ArticleDOI
TL;DR: The proposed method had an advantage over the compared methods (HMAX, Sparse Coding, and Natural Input Memory with Bayesian Likelihood Estimation, NIMBLE) and was comparable to the Deep Convolutional Network.

Journal ArticleDOI
TL;DR: Experimental simulation results obtained from two large SCI databases have shown that the proposed GFM model not only yields a higher consistency with human perception in the assessment of SCIs but also requires a lower computational complexity, compared with classical and state-of-the-art IQA models.
Abstract: In this paper, an accurate and efficient full-reference image quality assessment (IQA) model using extracted Gabor features, called the Gabor feature-based model (GFM), is proposed for the objective evaluation of screen content images (SCIs). It is well known that Gabor filters are highly consistent with the response of the human visual system (HVS), and that the HVS is highly sensitive to edge information. Based on these facts, the imaginary part of the Gabor filter, which has odd symmetry and yields edge detection, is applied to the luminance of the reference and distorted SCIs to extract their Gabor features, respectively. The local similarities of the extracted Gabor features and two chrominance components, recorded in the LMN color space, are then measured independently. Finally, the Gabor-feature pooling strategy is employed to combine these measurements and generate the final evaluation score. Experimental simulation results obtained from two large SCI databases have shown that the proposed GFM model not only yields a higher consistency with human perception in the assessment of SCIs but also requires a lower computational complexity, compared with classical and state-of-the-art IQA models. (The source code for the proposed GFM will be available at http://smartviplab.org/pubilcations/GFM.html.)
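The odd-symmetric Gabor filtering described above can be sketched as follows; the kernel size, wavelength, isotropic envelope, orientations, and max-over-orientations pooling are illustrative assumptions.

import numpy as np
from scipy.signal import convolve2d

def odd_gabor_kernel(size=11, wavelength=4.0, theta=0.0, sigma=2.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    # Odd (sine) carrier -> antisymmetric kernel that responds to edges.
    return envelope * np.sin(2 * np.pi * x_t / wavelength)

def gabor_edge_map(luminance, thetas=(0.0, np.pi / 2)):
    # Magnitude of the odd-Gabor response, pooled over orientations.
    responses = [np.abs(convolve2d(luminance, odd_gabor_kernel(theta=t), mode="same"))
                 for t in thetas]
    return np.maximum.reduce(responses)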

Journal ArticleDOI
TL;DR: This paper separates structures into global and local structures, which correspond to basic and detailed perceptions of humans, respectively, and systematically combines the measurements of variations in the above-stated two types of structures to yield the final quality estimation of screen content images.
Abstract: With the rapid development and popularity of computers, computer-generated signals have drastically invaded our daily lives. The screen content image is a typical example: in addition to natural scene content, which has been deeply explored, it also includes graphic and textual components, and it has thus posed novel challenges to current research on compression, transmission, display, quality assessment, and more. In this paper, we focus our attention on evaluating the quality of screen content images based on the analysis of structural variation, which is caused by compression, transmission, and so on. We classify structures into global and local structures, which correspond to basic and detailed human perception, respectively. The characteristics of graphic and textual images, e.g., limited color variations, and the human visual system are taken into consideration. Based on these concerns, we systematically combine the measurements of variations in the above two types of structures to yield the final quality estimate of a screen content image. Thorough experiments are conducted on three screen content image quality databases, in which the images are corrupted during capturing, compression, transmission, etc. Results demonstrate the superiority of our proposed quality model over state-of-the-art relevant methods.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed NRLT can achieve better performance in predicting the visual quality of SCIs than relevant existing methods, even including some full reference visual quality assessment methods.
Abstract: In this paper, we propose a novel no-reference quality assessment method incorporating statistical luminance and texture features (NRLT) for screen content images (SCIs), with both local and global feature representation. The proposed method is inspired by the perceptual property of the human visual system (HVS) that it is sensitive to luminance change and texture information for image perception. In the proposed method, we first calculate the luminance map through local normalization, which is further used to extract statistical luminance features in the global scope. Second, inspired by existing studies in neuroscience showing that high-order derivatives can capture image texture, we adopt four filters with different directions to compute gradient maps from the luminance map. These gradient maps are then used to extract second-order derivatives by the local binary pattern. We further extract the texture feature as the histogram of high-order derivatives in the global scope. Finally, support vector regression is applied to train the mapping function from quality-aware features to subjective ratings. Experimental results on the public large-scale SCI database show that the proposed NRLT achieves better performance in predicting the visual quality of SCIs than relevant existing methods, including even some full-reference visual quality assessment methods.
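The local luminance normalization step described above is of the divisive kind common in NR-IQA; a small NumPy sketch follows, with the window size and stabilizing constant as illustrative assumptions.

import numpy as np
from scipy.ndimage import uniform_filter

def local_normalize(luminance, win=7, c=1.0):
    # Local mean and standard deviation over a win x win neighborhood.
    mu = uniform_filter(luminance, size=win)
    var = np.maximum(uniform_filter(luminance ** 2, size=win) - mu ** 2, 0.0)
    sigma = np.sqrt(var)
    # Divisive normalization of the luminance map.
    return (luminance - mu) / (sigma + c)

# Global statistical features (e.g., mean and variance of the normalized map)
# would then be computed and fed to the regressor.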

Proceedings ArticleDOI
Da Pan, Ping Shi, Ming Hou, Zefeng Ying, Sizhe Fu, Yuan Zhang
18 Jun 2018
TL;DR: A simple and efficient blind image quality assessment model is proposed, based on a novel framework consisting of a fully convolutional neural network (FCNN) and a pooling network.
Abstract: A key problem in blind image quality assessment (BIQA) is how to effectively model the properties of the human visual system in a data-driven manner. In this paper, we propose a simple and efficient BIQA model based on a novel framework that consists of a fully convolutional neural network (FCNN) and a pooling network. In principle, the FCNN is capable of predicting a pixel-by-pixel similarity quality map from only a distorted image, by using the intermediate similarity maps derived from conventional full-reference image quality assessment methods. The predicted pixel-by-pixel quality maps have good consistency with the distortion correlations between the reference and distorted images. Finally, a deep pooling network regresses the quality map into a score. Experiments have demonstrated that our predictions outperform many state-of-the-art BIQA methods.

Posted Content
TL;DR: In this article, an attention-based few-shot classification weight generator was proposed to unify the recognition of both novel and base categories, which leads to feature representations that generalize better on unseen categories.
Abstract: The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that, during test time, can efficiently learn novel categories from only a few training examples while not forgetting the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, also leads to feature representations that generalize better on "unseen" categories. We extensively evaluate our approach on Mini-ImageNet, where we improve the prior state-of-the-art on few-shot recognition (i.e., we achieve 56.20% and 73.00% on the 1-shot and 5-shot settings respectively) while not sacrificing any accuracy on the base categories, a characteristic that most prior approaches lack. Finally, we apply our approach to the recently introduced few-shot benchmark of Bharath and Girshick [4], where we also achieve state-of-the-art results. The code and models of our paper will be published on: this https URL

Posted Content
TL;DR: A novel CNN architecture named ISGAN is proposed to conceal a secret gray image into a color cover image on the sender side and exactly extract the secret image on the receiver side, achieving state-of-the-art performance on the LFW, PASCAL VOC2012, and ImageNet datasets.
Abstract: Nowadays, there are plenty of works introducing convolutional neural networks (CNNs) to steganalysis and exceeding conventional steganalysis algorithms. These works have shown the potential of deep learning in the information hiding domain. There are also several works based on deep learning to do image steganography, but these works still have problems in capacity, invisibility, and security. In this paper, we propose a novel CNN architecture named ISGAN to conceal a secret gray image into a color cover image on the sender side and exactly extract the secret image on the receiver side. There are three contributions in our work: (i) we improve invisibility by hiding the secret image only in the Y channel of the cover image; (ii) we introduce generative adversarial networks to strengthen security by minimizing the divergence between the empirical probability distributions of stego images and natural images; (iii) in order to accord better with the human visual system, we construct a mixed loss function that is more appropriate for steganography, generating more realistic stego images and revealing better secret images. Experimental results show that ISGAN can achieve state-of-the-art performance on the LFW, PASCAL VOC2012, and ImageNet datasets.
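The Y-channel hiding idea described above can be sketched conceptually as follows; the tiny placeholder encoder stands in for the ISGAN generator, and the adversarial training and extraction network are omitted.

import torch
import torch.nn as nn

encoder = nn.Sequential(           # placeholder for the ISGAN encoder
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)

def hide(cover_ycbcr, secret_gray):
    # cover_ycbcr: (N, 3, H, W) with channels Y, Cb, Cr in [0, 1]
    # secret_gray: (N, 1, H, W) gray-scale secret image in [0, 1]
    y = cover_ycbcr[:, :1]
    stego_y = encoder(torch.cat([y, secret_gray], dim=1))
    # Hiding only in Y leaves Cb/Cr untouched, which helps invisibility.
    return torch.cat([stego_y, cover_ycbcr[:, 1:]], dim=1)

stego = hide(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))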

Proceedings ArticleDOI
18 Jun 2018
TL;DR: Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competitive image dehazing algorithms performing on par with more complex fog removal methods, and can overcome some of the main challenges associated with this problem.
Abstract: Image dehazing deals with the removal of undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement and other related tasks. While these two problems are apparently unrelated, the goal of this work is to show that they can be connected by a simple linear relationship. Specifically, most Retinex-based algorithms have the characteristic feature of always increasing image brightness, which turns them into ideal candidates for effective image dehazing by directly applying Retinex to a hazy image whose intensities have been inverted. In this paper, we give theoretical proof that Retinex on inverted intensities is a solution to the image dehazing problem. Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competitive image dehazing algorithms performing on par with more complex fog removal methods, and can overcome some of the main challenges associated with this problem.
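The paper's central relation, dehazing by applying Retinex to the inverted hazy image and inverting the result, can be sketched with a simple single-scale Retinex; the Gaussian scale and the min-max rescaling are illustrative choices standing in for the Retinex implementations the paper evaluates.

import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=40.0, eps=1e-6):
    # Estimate illumination with a large Gaussian blur and remove it in the log domain.
    illumination = gaussian_filter(img, sigma=sigma)
    r = np.log(img + eps) - np.log(illumination + eps)
    # Rescale the log-domain output back to [0, 1].
    return (r - r.min()) / (r.max() - r.min() + eps)

def dehaze(hazy):
    # hazy: single-channel hazy image in [0, 1] (one channel for simplicity).
    return 1.0 - single_scale_retinex(1.0 - hazy)

clear = dehaze(np.random.rand(64, 64))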

Posted Content
TL;DR: A new learning-based method that is the first to predict perceptual image error like human observers, and significantly outperforms existing algorithms, beating the state-of-the-art by almost 3× on the authors' test set in terms of binary error rate, while also generalizing to new kinds of distortions, unlike previous learning- based methods.
Abstract: The ability to estimate the perceptual error between images is an important problem in computer vision with many applications. Although it has been studied extensively, no method currently exists that can robustly predict visual differences like humans. Some previous approaches used hand-coded models, but they fail to model the complexity of the human visual system. Others used machine learning to train models on human-labeled datasets, but creating large, high-quality datasets is difficult because people are unable to assign consistent error labels to distorted images. In this paper, we present a new learning-based method that is the first to predict perceptual image error like human observers. Since it is much easier for people to compare two given images and identify the one more similar to a reference than to assign quality scores to each, we propose a new, large-scale dataset labeled with the probability that humans will prefer one image over another. We then train a deep-learning model using a novel, pairwise-learning framework to predict the preference of one distorted image over the other. Our key observation is that our trained network can then be used separately with only one distorted image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error labels. The perceptual error estimated by our new metric, PieAPP, is well-correlated with human opinion. Furthermore, it significantly outperforms existing algorithms, beating the state-of-the-art by almost 3x on our test set in terms of binary error rate, while also generalizing to new kinds of distortions, unlike previous learning-based methods.

Posted Content
22 Feb 2018
TL;DR: It is found that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
Abstract: Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we create the first adversarial examples designed to fool humans, by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by modifying models to more closely match the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.

Journal ArticleDOI
TL;DR: Quality assessment using both quantitative evaluations and user studies suggests that the presented algorithm produces tone-mapped images that are visually pleasant and preserve details of the original image better than the existing methods.
Abstract: High-dynamic-range (HDR) images require tone mapping to be displayed properly on lower dynamic range devices. In this paper, a tone-mapping algorithm that uses a histogram of luminance to construct a lookup table (LUT) for tone mapping is presented. Characteristics of the human visual system (HVS) are used to give more importance to visually distinguishable intensities while constructing the histogram bins. The method begins with constructing a histogram of the luminance channel, using bins that are perceived to be uniformly spaced by the HVS. Next, a refinement step is used, which removes the pixels from the bins that are indistinguishable by the HVS. Finally, the available display levels are distributed among the bins proportionate to the pixel counts, thus giving due consideration to the visual contribution of each bin in the image. Quality assessment using both quantitative evaluations and user studies suggests that the presented algorithm produces tone-mapped images that are visually pleasant and preserve details of the original image better than the existing methods. Finally, implementation details of the algorithm on GPU for parallel processing are presented, which could achieve a significant gain in speed over CPU-based implementation.
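A simplified NumPy sketch of the histogram-based LUT follows; uniform log-luminance bins stand in for the paper's perceptually uniform bins, and the HVS-based bin refinement step is omitted.

import numpy as np

def histogram_tonemap(luminance, n_bins=256, display_levels=256):
    log_l = np.log(luminance + 1e-6)
    counts, edges = np.histogram(log_l, bins=n_bins)
    # Distribute the available display levels proportionally to the bin counts;
    # the cumulative allotment is the tone-mapping lookup table.
    lut = np.cumsum(counts) / counts.sum() * (display_levels - 1)
    bin_idx = np.clip(np.digitize(log_l, edges[1:-1]), 0, n_bins - 1)
    return lut[bin_idx].astype(np.uint8)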

Journal ArticleDOI
TL;DR: Experimental results have shown that the proposed IQA model for the SCIs produces high consistency with human perception of the SCI quality and outperforms the state-of-the-art quality models.
Abstract: In this paper, a novel image quality assessment (IQA) model for screen content images (SCIs) is proposed by using multi-scale difference of Gaussian (MDOG). Motivated by the observation that the human visual system (HVS) is sensitive to edges while image details can be better explored at different scales, the proposed model exploits MDOG to effectively characterize the edge information of the reference and distorted SCIs at two different scales, respectively. Then, the degree of edge similarity is measured in terms of the smaller-scale edge map. Finally, the edge strength computed from the larger-scale edge map is used as the weighting factor to generate the final SCI quality score. Experimental results have shown that the proposed IQA model for SCIs produces high consistency with human perception of SCI quality and outperforms the state-of-the-art quality models.
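A minimal sketch of the MDOG-based similarity described above follows; the Gaussian scales, the SSIM-like similarity ratio, and the stabilizing constant are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_edge_map(img, sigma_small, sigma_large):
    # Difference-of-Gaussian magnitude as an edge map.
    return np.abs(gaussian_filter(img, sigma_small) - gaussian_filter(img, sigma_large))

def mdog_score(ref, dist, c=1e-3):
    # Smaller-scale edge maps drive the similarity ...
    e_ref_s, e_dist_s = dog_edge_map(ref, 1.0, 2.0), dog_edge_map(dist, 1.0, 2.0)
    sim = (2 * e_ref_s * e_dist_s + c) / (e_ref_s ** 2 + e_dist_s ** 2 + c)
    # ... while larger-scale edge strength weights the pooling.
    weight = np.maximum(dog_edge_map(ref, 2.0, 4.0), dog_edge_map(dist, 2.0, 4.0))
    return float((sim * weight).sum() / (weight.sum() + c))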

Journal ArticleDOI
TL;DR: The results show for the first time the laminar organization of internally generated signals during visual working memory in the human visual system and provide new insights into how bottom-up and top-down signals in visual cortex are deployed.

Journal ArticleDOI
TL;DR: Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art blind quality assessment methods and is comparable with the popular FR methods on two recently published tone-mapped image databases.
Abstract: Currently, many tone mapping operators (TMOs) have been provided to compress high dynamic range images to low dynamic range (LDR) images for visualization on common displays. Since quality degradation is inevitably induced by compression, how to evaluate the obtained LDR images is a challenging problem. Until now, only a few full-reference (FR) image quality assessment metrics have been proposed. However, they depend heavily on a reference image and neglect human visual system characteristics, hindering practical applications. In this paper, we propose an effective blind quality assessment method for tone-mapped images without access to a reference image. Inspired by the fact that the performance of existing TMOs largely depends on the brightness, chromatic, and structural properties of a scene, we evaluate perceptual quality from the perspective of color information processing in the brain. Specifically, motivated by physiological and psychological evidence, we simulate the responses of single-opponent (SO) and double-opponent (DO) cells, which play an important role in the processing of color information. To represent textural information, we extract three features from the gray-level co-occurrence matrix (GLCM) calculated from the SO responses. Meanwhile, both the GLCM and the local binary pattern descriptor are employed to extract texture and structure from the responses of DO cells. All these extracted features and associated subjective ratings are learned to reveal the connection between the feature space and human opinion scores. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art blind quality assessment methods and is comparable with popular FR methods on two recently published tone-mapped image databases.
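The GLCM texture-feature step described above can be sketched with scikit-image; which three GLCM statistics the paper uses is not stated here, so energy, contrast, and homogeneity are illustrative choices computed from a quantized response map.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(response_map, levels=16):
    # Quantize the (single-opponent) response map, assumed in [0, 1], to `levels` gray levels.
    q = np.clip((response_map * (levels - 1)).astype(np.uint8), 0, levels - 1)
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)
    # Three illustrative GLCM statistics.
    return [float(graycoprops(glcm, prop)[0, 0])
            for prop in ("energy", "contrast", "homogeneity")]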

Journal ArticleDOI
TL;DR: Experimental results demonstrate that in addition to achieving a great imperceptibility, large capacity and sufficient security, the proposed scheme obtains a satisfactory level of robustness against image processing and geometrical attacks, simultaneously.
Abstract: Imperceptibility, capacity, robustness, and security are four basic requirements of any watermarking technique. A new blind image watermarking technique based on the redundant discrete wavelet transform (RDWT) and singular value decomposition (SVD) is presented in this paper to satisfy all four watermarking requirements simultaneously. The gray-scale watermark image is directly embedded into the singular values of the RDWT sub-bands after multiplication by a scaling factor. The self-adaptive differential evolution (SADE) algorithm is used to optimize the scaling factor values with the aim of reaching the highest possible robustness while guaranteeing a pre-determined watermarked image quality. By the use of human visual system (HVS) characteristics, an 8-bit digital signature is inserted into the watermarked image to solve the false positive problem, which is a prevalent security problem for most SVD-based watermarking methods. The digital signature is used for a verification test before initialization of the watermark extraction procedure. Also, owing to the redundancy in the RDWT domain, the scheme attains a large capacity. Experimental results demonstrate that in addition to achieving high imperceptibility, large capacity, and sufficient security, the proposed scheme obtains a satisfactory level of robustness against image processing and geometrical attacks simultaneously.
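The SVD-based embedding step described above can be sketched as follows; for brevity an ordinary decimated DWT (via PyWavelets) stands in for the redundant DWT, the scaling factor alpha is fixed rather than optimized by SADE, and the signature step is omitted.

import numpy as np
import pywt

def embed_watermark(cover, watermark, alpha=0.05):
    # cover: 2D gray-scale array with even dimensions; watermark: 2D gray-scale array
    # assumed to have at least as many pixels as there are singular values.
    ll, (lh, hl, hh) = pywt.dwt2(cover, "haar")
    u, s, vt = np.linalg.svd(ll, full_matrices=False)
    # Add the scaled, flattened watermark to the singular values of the sub-band.
    s_marked = s + alpha * watermark.flatten()[: s.size]
    ll_marked = (u * s_marked) @ vt
    return pywt.idwt2((ll_marked, (lh, hl, hh)), "haar")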

Journal ArticleDOI
TL;DR: An efficient streaming method to compute voxel-level material arrangements is presented, achieving both realistic reproduction of measured translucent materials and artistic effects involving multiple fully or partially transparent geometries.
Abstract: We present an efficient and scalable pipeline for fabricating full-colored objects with spatially-varying translucency from practical and accessible input data via multi-material 3D printing. Observing that the costs associated with BSSRDF measurement and processing are high, that the range of 3D-printable BSSRDFs is severely limited, and that the human visual system relies only on simple high-level cues to perceive translucency, we propose a method based on reproducing perceptual translucency cues. The input to our pipeline is an RGBA signal defined on the surface of an object, making our approach accessible and practical for designers. We propose a framework for extending standard color management and profiling to combined color and translucency management using a gamut correspondence strategy we call opaque relative processing. We present an efficient streaming method to compute voxel-level material arrangements, achieving both realistic reproduction of measured translucent materials and artistic effects involving multiple fully or partially transparent geometries.

Journal ArticleDOI
TL;DR: Computational experiments show that embedding a QR code is more effective than other watermarks in terms of information-carrying capacity, robustness, and imperceptibility, and that the proposed scheme is novel and effective.
Abstract: A digital image watermarking technique is proposed to hide relevant information in color digital images. The image is converted from the RGB color space to the YCbCr color space, which enables the algorithm to exploit characteristics of the Human Visual System (HVS) when embedding the watermark. The scheme embeds the watermark information using wavelet transforms and Singular Value Decomposition (SVD), and uses a Quick Response (QR) code as the watermark. The QR code is a robust code from which the embedded information can be extracted even if the retrieved QR code image is distorted. Thus the proposed technique employs a judicious combination of algorithmic ideas: the YCbCr color space, transformation into the wavelet domain, SVD for selecting where to embed, and QR codes for enhanced robustness. The proposed watermarking scheme is robust against various signal processing attacks (e.g., filtering, compression, noise addition) as well as geometric attacks (e.g., rotation, cropping). Computational experiments on a variety of cover images show that embedding a QR code is more effective than other watermarks in terms of information-carrying capacity, robustness, and imperceptibility. The proposed scheme is novel and effective, as it simultaneously provides the advantages of each of the individual elements combined in this approach.