Showing papers on "Human visual system model published in 2016"


Journal ArticleDOI
TL;DR: A new nonreference underwater image quality measure (UIQM) is presented, comprising three underwater image attribute measures, each selected to evaluate one aspect of underwater image degradation and each inspired by properties of the human visual system (HVS).
Abstract: Underwater images suffer from blurring effects, low contrast, and grayed-out colors due to the absorption and scattering effects under the water. Many image enhancement algorithms for improving the visual quality of underwater images have been developed. Unfortunately, no well-accepted objective measure exists that can evaluate the quality of underwater images in a way consistent with human perception. Predominant underwater image processing algorithms use either a subjective evaluation, which is time consuming and biased, or a generic image quality measure, which fails to consider the properties of underwater images. To address this problem, a new nonreference underwater image quality measure (UIQM) is presented in this paper. The UIQM comprises three underwater image attribute measures: the underwater image colorfulness measure (UICM), the underwater image sharpness measure (UISM), and the underwater image contrast measure (UIConM). Each attribute is selected for evaluating one aspect of underwater image degradation, and each presented attribute measure is inspired by the properties of human visual systems (HVSs). The experimental results demonstrate that the measures effectively evaluate underwater image quality in accordance with human perception. These measures are also used on the AirAsia 8501 wreckage images to show their importance in practical applications.
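The final UIQM score is a weighted combination of the three attribute measures. Below is a minimal Python sketch of that combination, with a simplified colorfulness measure as a stand-in for UICM; the weights are those commonly reported for UIQM, and all details should be treated as illustrative rather than as the authors' exact formulation.

```python
import numpy as np

def simple_colorfulness(img_rgb):
    """Simplified colorfulness stand-in based on the RG and YB opponent
    channels; the paper's UICM instead uses asymmetric alpha-trimmed
    statistics of these channels."""
    r, g, b = [img_rgb[..., i].astype(float) for i in range(3)]
    rg, yb = r - g, 0.5 * (r + g) - b
    return np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())

def uiqm_style_score(uicm, uism, uiconm, weights=(0.0282, 0.2953, 3.5753)):
    """Linear combination of the three attribute measures into one score.
    The weights are those commonly reported for UIQM; treat them as
    illustrative defaults rather than a definitive re-implementation."""
    c1, c2, c3 = weights
    return c1 * uicm + c2 * uism + c3 * uiconm
```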

671 citations


Journal ArticleDOI
TL;DR: It was shown that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams and provided an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.
Abstract: The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

600 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work presents a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects and applies recognition algorithms on the authors' predicted representation to anticipate objects and actions.
Abstract: Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently learning this knowledge is through readily available unlabeled video. We present a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects. The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future. Visual representations are a promising prediction target because they encode images at a higher semantic level than pixels yet are automatic to compute. We then apply recognition algorithms on our predicted representation to anticipate objects and actions. We experimentally validate this idea on two datasets, anticipating actions one second in the future and objects five seconds in the future.

434 citations


Journal ArticleDOI
TL;DR: In this article, a deep neural network-based approach to image quality assessment (IQA) is presented, which is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression.
Abstract: We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in a unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CSIQ, and TID2013 databases as well as the LIVE In the Wild Image Quality Challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features.
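The joint learning of local quality and local weights amounts to a weighted-average pooling of per-patch predictions. A minimal PyTorch-style sketch of that idea follows; the tiny feature extractor here is a placeholder and not the authors' ten-convolutional-layer architecture.

```python
import torch
import torch.nn as nn

class PatchIQA(nn.Module):
    """Sketch of joint local-quality / local-weight regression (simplified,
    not the authors' exact 10-conv / 5-pool network)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.quality = nn.Linear(64, 1)                               # local quality q_i
        self.weight = nn.Sequential(nn.Linear(64, 1), nn.Softplus())  # local weight w_i > 0

    def forward(self, patches):                 # patches: (N, 3, 32, 32)
        f = self.features(patches)
        q = self.quality(f).squeeze(1)
        w = self.weight(f).squeeze(1) + 1e-6
        return (w * q).sum() / w.sum()          # weighted-average global score
```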

365 citations


Journal ArticleDOI
TL;DR: A measure is proposed to evaluate the reliability of the depth map and to reduce the influence of a poor depth map on saliency detection; two saliency maps are then integrated into a final saliency map through a weighted-sum method according to their importance.
Abstract: Stereoscopic perception is an important part of the human visual system that allows the brain to perceive depth. However, depth information has not been well explored in existing saliency detection models. In this letter, a novel saliency detection method for stereoscopic images is proposed. First, we propose a measure to evaluate the reliability of the depth map, and use it to reduce the influence of a poor depth map on saliency detection. Then, the input image is represented as a graph, and the depth information is introduced into graph construction. After that, a new definition of compactness using color and depth cues is put forward to compute the compactness saliency map. To compensate for the detection errors of compactness saliency when the salient regions have appearances similar to the background, a foreground saliency map is calculated based on a depth-refined foreground seeds' selection (DRSS) mechanism and multiple-cue contrast. Finally, these two saliency maps are integrated into a final saliency map through a weighted-sum method according to their importance. Experiments on two publicly available stereo data sets demonstrate that the proposed method performs better than ten other state-of-the-art approaches.

240 citations


Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed blind image blur evaluation algorithm can produce blur scores highly consistent with subjective evaluations and outperforms the state-of-the-art image blur metrics and several general-purpose no-reference quality metrics.
Abstract: Blur is a key determinant in the perception of image quality. Generally, blur causes spread of edges, which leads to shape changes in images. Discrete orthogonal moments have been widely studied as effective shape descriptors. Intuitively, blur can be represented using discrete moments since noticeable blur affects the magnitudes of moments of an image. With this consideration, this paper presents a blind image blur evaluation algorithm based on discrete Tchebichef moments. The gradient of a blurred image is first computed to account for the shape, which is more effective for blur representation. Then the gradient image is divided into equal-size blocks and the Tchebichef moments are calculated to characterize image shape. The energy of a block is computed as the sum of squared non-DC moment values. Finally, the proposed image blur score is defined as the variance-normalized moment energy, which is computed with the guidance of a visual saliency model to adapt to the characteristics of the human visual system. The performance of the proposed method is evaluated on four public image quality databases. The experimental results demonstrate that our method can produce blur scores highly consistent with subjective evaluations. It also outperforms the state-of-the-art image blur metrics and several general-purpose no-reference quality metrics.
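A rough sketch of the block-wise moment-energy pipeline is given below, using a 2-D DCT as a stand-in for Tchebichef moments and a simple mean/variance normalization; the saliency-guided weighting and the paper's exact normalization are not reproduced.

```python
import numpy as np
from scipy.fft import dctn

def block_moment_energy_blur_score(gray, block=8):
    """Blur-score sketch: gradient magnitude -> 8x8 blocks -> orthogonal-moment
    energy (DCT here as a stand-in for Tchebichef moments) -> one plausible
    reading of 'variance-normalized moment energy'. The paper additionally
    weights blocks with a visual-saliency map."""
    gy, gx = np.gradient(gray.astype(float))
    grad = np.hypot(gx, gy)
    h, w = grad.shape
    energies = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            m = dctn(grad[i:i + block, j:j + block], norm='ortho')
            m[0, 0] = 0.0                       # drop the DC moment
            energies.append((m ** 2).sum())
    energies = np.array(energies)
    return energies.mean() / (energies.var() + 1e-12)
```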

239 citations


Journal ArticleDOI
TL;DR: This work shows by combining a novel method (minimal images) and simulations that the human recognition system uses features and learning processes, which are critical for recognition, but are not used by current models.
Abstract: Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recently, neural network models of visual object recognition, including biological and deep network models, have shown remarkable progress and have begun to rival human performance in some challenging tasks. These models are trained on image examples and learn to extract features and representations and to use them for categorization. It remains unclear, however, whether the representations and learning processes discovered by current models are similar to those used by the human visual system. Here we show, by introducing and using minimal recognizable images, that the human visual system uses features and processes that are not used by current models and that are critical for recognition. We found by psychophysical studies that at the level of minimal recognizable images a minute change in the image can have a drastic effect on recognition, thus identifying features that are critical for the task. Simulations then showed that current models cannot explain this sensitivity to precise feature configurations and, more generally, do not learn to recognize minimal images at a human level. The role of the features shown here is revealed uniquely at the minimal level, where the contribution of each feature is essential. A full understanding of the learning and use of such features will extend our understanding of visual recognition and its cortical mechanisms and will enhance the capacity of computational models to learn from visual experience and to deal with recognition and detailed image interpretation.

161 citations


Journal ArticleDOI
TL;DR: This work benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking to demonstrate that shallow nets can outperform deep nets and humans when variations are weak.
Abstract: Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have been shown to be able to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields and a hierarchy of layers that progressively extract more and more abstracted features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the magnitude of the viewpoint variations to demonstrate that shallow nets can outperform deep nets and humans when variations are weak. When facing larger variations, however, more layers were needed to match human performance and error distributions, and to have representations that are consistent with human behavior. A very deep net with 18 layers even outperformed humans at the highest variation level, using the most human-like representations.

161 citations


Journal ArticleDOI
TL;DR: This study presents a robust block-based image watermarking scheme based on the singular value decomposition (SVD) and human visual system in the discrete wavelet transform (DWT) domain that outperformed several previous schemes in terms of imperceptibility and robustness.
Abstract: Digital watermarking has been suggested as a way to achieve digital protection. The aim of digital watermarking is to insert the secret data into the image without significantly affecting the visual quality. This study presents a robust block-based image watermarking scheme based on the singular value decomposition (SVD) and the human visual system in the discrete wavelet transform (DWT) domain. The proposed method is a block-based scheme that utilises the entropy and edge entropy as HVS characteristics for the selection of significant blocks to embed the watermark, which is a binary watermark logo. The blocks with the lowest entropy and edge entropy values are selected as the best regions to insert the watermark. After the first level of DWT decomposition, the SVD is performed on the low-low sub-band to modify several elements in its U matrix according to predefined conditions. The experimental results of the proposed scheme showed high imperceptibility and high robustness against all image processing attacks and several geometrical attacks using examples of standard and real images. Furthermore, the proposed scheme outperformed several previous schemes in terms of imperceptibility and robustness. The security issue is improved by encrypting a portion of the important information using the Advanced Encryption Standard with a 192-bit key (AES-192).
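A hedged sketch of the embedding path is shown below using PyWavelets and NumPy: entropy-based block scoring, one-level DWT, SVD of the LL sub-band, and a simple re-ordering of two U-matrix entries to encode a bit. The specific entries modified and the embedding strength are placeholders, not the paper's predefined conditions.

```python
import numpy as np
import pywt

def block_entropy(block, bins=256):
    """Shannon entropy of a pixel block, used to rank candidate embedding blocks
    (the paper combines entropy with edge entropy)."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 255), density=True)
    hist = hist[hist > 0]
    return -(hist * np.log2(hist)).sum()

def embed_bit_in_block(block, bit, strength=0.012):
    """Embed one watermark bit: 1-level Haar DWT, SVD of the LL band, then nudge
    two U-matrix entries so that their magnitude order encodes the bit. This is
    a simplified placeholder for the paper's U-matrix conditions."""
    LL, (LH, HL, HH) = pywt.dwt2(block.astype(float), 'haar')
    U, S, Vt = np.linalg.svd(LL, full_matrices=False)
    u1, u2 = U[1, 0], U[2, 0]
    avg = 0.5 * (abs(u1) + abs(u2))
    if bit == 1:      # enforce |U[1,0]| > |U[2,0]|
        U[1, 0], U[2, 0] = np.sign(u1) * (avg + strength), np.sign(u2) * (avg - strength)
    else:             # enforce |U[1,0]| < |U[2,0]|
        U[1, 0], U[2, 0] = np.sign(u1) * (avg - strength), np.sign(u2) * (avg + strength)
    LL_marked = U @ np.diag(S) @ Vt
    return pywt.idwt2((LL_marked, (LH, HL, HH)), 'haar')
```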

160 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed index provides comparable or better quality predictions than the most recent and competing state-of-the-art IQA metrics in the literature, and that it is reliable and has low complexity.
Abstract: Applications of perceptual image quality assessment (IQA) in image and video processing, such as image acquisition, image compression, image restoration, and multimedia communication, have led to the development of many IQA metrics. In this paper, a reliable full reference IQA model is proposed that utilizes gradient similarity (GS), chromaticity similarity (CS), and deviation pooling (DP). By considering the shortcomings of the commonly used GS to model the human visual system (HVS), a new GS is proposed through a fusion technique that is more likely to follow the HVS. We propose an efficient and effective formulation to calculate the joint similarity map of two chromatic channels for the purpose of measuring color changes. In comparison with a commonly used formulation in the literature, the proposed CS map is shown to be more efficient and provide comparable or better quality predictions. Motivated by a recent work that utilizes the standard DP, a general formulation of the DP is presented in this paper and used to compute a final score from the proposed GS and CS maps. This proposed formulation of DP benefits from the Minkowski pooling and a proposed power pooling as well. The experimental results on six data sets of natural images, a synthetic data set, and a digitally retouched data set show that the proposed index provides comparable or better quality predictions than the most recent and competing state-of-the-art IQA metrics in the literature, and that it is reliable and has low complexity. The MATLAB source code of the proposed metric is available at https://dl.dropboxusercontent.com/u/74505502/MDSI.m.
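A rough Python sketch of the gradient-similarity / chromaticity-similarity / deviation-pooling recipe follows; the color transform, constants, fusion weights, and pooling are illustrative approximations rather than the released MDSI implementation linked above.

```python
import numpy as np
from scipy.ndimage import prewitt

def mdsi_style_score(ref_rgb, dst_rgb, c1=140.0, c2=55.0):
    """Gradient similarity on luminance + joint chromaticity similarity of two
    opponent channels, pooled by mean absolute deviation (lower = better).
    Constants, channel weights and fusion are illustrative placeholders."""
    def lhm(img):   # simple luminance + two opponent chromatic channels
        r, g, b = [img[..., i].astype(float) for i in range(3)]
        return (0.299 * r + 0.587 * g + 0.114 * b,
                0.30 * r + 0.04 * g - 0.35 * b,
                0.34 * r - 0.60 * g + 0.17 * b)
    L1, H1, M1 = lhm(ref_rgb)
    L2, H2, M2 = lhm(dst_rgb)
    g1 = np.hypot(prewitt(L1, axis=0), prewitt(L1, axis=1))
    g2 = np.hypot(prewitt(L2, axis=0), prewitt(L2, axis=1))
    gs = (2.0 * g1 * g2 + c1) / (g1 ** 2 + g2 ** 2 + c1)        # gradient similarity map
    cs = (2.0 * (H1 * H2 + M1 * M2) + c2) / (                   # joint chroma similarity map
          H1 ** 2 + H2 ** 2 + M1 ** 2 + M2 ** 2 + c2)
    gcs = 0.6 * gs + 0.4 * cs                                   # illustrative fusion weights
    return float(np.mean(np.abs(gcs - gcs.mean())))             # deviation (MAD-style) pooling
```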

120 citations


Journal ArticleDOI
TL;DR: An improved difference of Gabor (IDoGb) filter is proposed as an extension of DoG that is sensitive to orientations and can better suppress complex background edges, thus achieving a lower false alarm rate.
Abstract: Infrared (IR) small target detection with a high detection rate, low false alarm rate, and multiscale detection ability is a challenging task since raw IR images usually have low contrast and complex backgrounds. In recent years, robust human visual system (HVS) properties have been introduced into the IR small target detection field. However, existing algorithms based on HVS, such as difference of Gaussians (DoG) filters, are sensitive not only to real small targets but also to background edges, which results in a high false alarm rate. In this letter, the difference of Gabor (DoGb) filter and an improved version (IDoGb) are proposed; IDoGb is an extension of DoG that is sensitive to orientations and can better suppress complex background edges, thus achieving a lower false alarm rate. In addition, multiscale detection can also be achieved. Experimental results show that the IDoGb filter produces fewer false alarms at the same detection rate, while consuming only about 0.1 s for a single frame.
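For context, a minimal difference-of-Gaussians detector of the kind this letter improves upon is sketched below; the oriented Gabor pairs of DoGb/IDoGb are not reproduced, and the threshold rule is illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(image, sigma_in=1.0, sigma_out=3.0):
    """Difference-of-Gaussians band-pass response: small bright targets produce
    a strong positive response while smooth background is suppressed. The
    letter's IDoGb variant replaces the isotropic Gaussians with oriented Gabor
    pairs to also reject background edges."""
    img = image.astype(float)
    return gaussian_filter(img, sigma_in) - gaussian_filter(img, sigma_out)

def detect_candidates(image, k=4.0):
    """Threshold the DoG map at mean + k*std to obtain candidate target pixels."""
    r = dog_response(image)
    return r > r.mean() + k * r.std()
```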

Journal ArticleDOI
TL;DR: A novel image steganography algorithm that combines the strengths of edge detection and XOR coding, to conceal a secret message either in the spatial domain or an Integer Wavelet Transform (IWT) based transform domain of the cover image is presented.
Abstract: A method for hiding data in the spatial or IWT domains of images is proposed. A new edge detection method is designed to estimate the same edge intensities for both images. An XOR operation is used to embed the message and to improve imperceptibility. The proposed method is robust against textural feature steganalysis. In this paper, we present a novel image steganography algorithm that combines the strengths of edge detection and XOR coding, to conceal a secret message either in the spatial domain or an Integer Wavelet Transform (IWT) based transform domain of the cover image. Edge detection enables the identification of sharp edges in the cover image, in which embedding causes less degradation to the image quality than embedding in a pre-specified set of pixels that does not differentiate between sharp and smooth areas. This is motivated by the fact that the human visual system (HVS) is less sensitive to changes in sharp contrast areas than in uniform areas of the image. The edge detection method presented here is capable of estimating the exact edge intensities for both the cover and stego images (before and after embedding the message), which is essential when extracting the message. The XOR coding, on the other hand, is a simple, yet effective, process that helps in reducing differences between the cover and stego images. In order to embed three secret message bits, the algorithm requires four bits of the cover image, but due to the coding mechanism, no more than two of the four bits will be changed when producing the stego image. The proposed method utilizes the sharpest regions of the image first and then gradually moves to the less sharp regions. Experimental results demonstrate that the proposed method has achieved better imperceptibility results than other popular steganography methods. Furthermore, when applying a textural feature steganalytic algorithm to differentiate between cover and stego images produced using various embedding rates, the proposed method maintained a good level of security compared to other steganography methods.
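The stated coding property (three message bits carried by four cover bits while changing at most two of them) can be realised by an adjacent-pair XOR construction like the one sketched below; this is a plausible reconstruction consistent with the abstract, not necessarily the paper's exact scheme.

```python
import numpy as np

def embed_3in4(cover_bits, msg_bits):
    """Embed 3 message bits into 4 cover bits via adjacent-pair XORs:
    m1 = x1^x2, m2 = x2^x3, m3 = x3^x4. Exactly two 4-bit patterns (a pattern
    and its complement) realise any 3-bit syndrome, so one of them is always
    within Hamming distance 2 of the cover bits."""
    x = np.array(cover_bits, dtype=int)
    m = np.array(msg_bits, dtype=int)
    # Build one solution by fixing the first bit and chaining the XOR constraints.
    cand = np.array([0, m[0], m[0] ^ m[1], m[0] ^ m[1] ^ m[2]])
    other = 1 - cand                                   # the complement also satisfies the XORs
    best = cand if (cand != x).sum() <= (other != x).sum() else other
    return best                                        # differs from the cover bits in <= 2 positions

def extract_3from4(stego_bits):
    """Recover the 3 message bits from 4 stego bits."""
    s = np.array(stego_bits, dtype=int)
    return [s[0] ^ s[1], s[1] ^ s[2], s[2] ^ s[3]]
```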

Posted Content
TL;DR: Inspired by the human visual system, low-level motion-based grouping cues can be used to learn an effective visual representation that significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.
Abstract: This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.

Journal ArticleDOI
TL;DR: Multivariate decoding of magneto-encephalography data is used to characterize the neural underpinnings of attentional selection in natural scenes with high temporal precision and shows that brain activity quickly tracks the presence of objects in scenes, but crucially only for those objects that were immediately relevant for the participant.
Abstract: The human visual system can only represent a small subset of the many objects present in cluttered scenes at any given time, such that objects compete for representation. Despite these processing limitations, the detection of object categories in cluttered natural scenes is remarkably rapid. How does the brain efficiently select goal-relevant objects from cluttered scenes? In the present study, we used multivariate decoding of magneto-encephalography (MEG) data to track the neural representation of within-scene objects as a function of top-down attentional set. Participants detected categorical targets (cars or people) in natural scenes. The presence of these categories within a scene was decoded from MEG sensor patterns by training linear classifiers on differentiating cars and people in isolation and testing these classifiers on scenes containing one of the two categories. The presence of a specific category in a scene could be reliably decoded from MEG response patterns as early as 160 ms, despite substantial scene clutter and variation in the visual appearance of each category. Strikingly, we find that these early categorical representations fully depend on the match between visual input and top-down attentional set: only objects that matched the current attentional set were processed to the category level within the first 200 ms after scene onset. A sensor-space searchlight analysis revealed that this early attention bias was localized to lateral occipitotemporal cortex, reflecting top-down modulation of visual processing. These results show that attention quickly resolves competition between objects in cluttered natural scenes, allowing for the rapid neural representation of goal-relevant objects. SIGNIFICANCE STATEMENT: Efficient attentional selection is crucial in many everyday situations. For example, when driving a car, we need to quickly detect obstacles, such as pedestrians crossing the street, while ignoring irrelevant objects. How can humans efficiently perform such tasks, given the multitude of objects contained in real-world scenes? Here we used multivariate decoding of magnetoencephalography data to characterize the neural underpinnings of attentional selection in natural scenes with high temporal precision. We show that brain activity quickly tracks the presence of objects in scenes, but crucially only for those objects that were immediately relevant for the participant. These results provide evidence for fast and efficient attentional selection that mediates the rapid detection of goal-relevant objects in real-world environments.
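A minimal scikit-learn sketch of the cross-decoding logic (train on isolated objects, test on scenes, one classifier per time point) is shown below; the array shapes, classifier choice, and preprocessing are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def cross_decode_over_time(X_iso, y_iso, X_scene, y_scene):
    """X_iso, X_scene: (n_trials, n_sensors, n_times) MEG data; y_*: category
    labels (e.g. car vs. person). Train on isolated-object trials, test on
    scene trials, one linear classifier per time point."""
    n_times = X_iso.shape[2]
    acc = np.zeros(n_times)
    for t in range(n_times):
        clf = make_pipeline(StandardScaler(), LinearSVC(dual=False))
        clf.fit(X_iso[:, :, t], y_iso)
        acc[t] = clf.score(X_scene[:, :, t], y_scene)
    return acc   # accuracy above chance marks when category information emerges
```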

Journal ArticleDOI
TL;DR: An effective infrared and visible image fusion scheme in the nonsubsampled contourlet transform (NSCT) domain, in which the NSCT is first employed to decompose each of the source images into a series of high-frequency subbands and one low-frequency subband.

Journal ArticleDOI
Sung-Ho Bae, Munchurl Kim
TL;DR: Extensive experimental results show that both SC-QI and SC-DM can very well characterize the HVS's properties of visual quality perception for local image characteristics and various distortion types, which is a distinctive merit of these methods compared with other IQA methods.
Abstract: Computational models for image quality assessment (IQA) have been developed by exploring effective features that are consistent with the characteristics of a human visual system (HVS) for visual quality perception. In this paper, we first reveal that many existing features used in computational IQA methods can hardly characterize visual quality perception for local image characteristics and various distortion types. To solve this problem, we propose a new IQA method, called the structural contrast-quality index (SC-QI), by adopting a structural contrast index (SCI), which can well characterize local and global visual quality perceptions for various image characteristics with structural-distortion types. In addition to SCI, we devise some other perceptually important features for our SC-QI that can effectively reflect the characteristics of HVS for contrast sensitivity and chrominance component variation. Furthermore, we develop a modified SC-QI, called structural contrast distortion metric (SC-DM), which inherits desirable mathematical properties of valid distance metricability and quasi-convexity. So, it can effectively be used as a distance metric for image quality optimization problems. Extensive experimental results show that both SC-QI and SC-DM can very well characterize the HVS’s properties of visual quality perception for local image characteristics and various distortion types, which is a distinctive merit of our methods compared with other IQA methods. As a result, both SC-QI and SC-DM have better performances with a strong consilience of global and local visual quality perception as well as with much lower computation complexity, compared with the state-of-the-art IQA methods. The MATLAB source codes of the proposed SC-QI and SC-DM are publicly available online at https://sites.google.com/site/sunghobaecv/iqa .

Journal ArticleDOI
TL;DR: Experimental results on five public databases demonstrate that the proposed RR IQA method has performance consistent with the human perception under a small amount of reference data (only 9 values).

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed method is superior in detection rate, false alarm rate, and processing time compared with the contrast algorithms, and it is an efficient method for IR small target detection in a complex background.
Abstract: Robust and efficient detection of an infrared (IR) small target is very important in the IR search and track system. Based on the contrast mechanism of the human visual system, an IR small target detection method with a high detection rate, low false alarm rate, and short processing time is proposed in this letter. This method consists of two stages. At the first stage, with the top-hat filter and an adaptive threshold operation based on the constant false alarm rate applied to the original image, the suspicious target regions are obtained. In this way, the computing time of the following steps is greatly reduced; meanwhile, the desired and predictable detection probability with the constant false alarm probability is maintained. At the second stage, we first define a new efficient local contrast measure between the target and the background, and the local self-similarity of an image is introduced to calculate the local saliency map. With the combination of the local self-similarity and local contrast, an efficient saliency map is obtained, which can not only increase the signal-to-clutter ratio but also suppress residual clutter simultaneously. Then, a simple threshold operation on the saliency map is used to get the true targets. Experimental results indicate that the proposed method is superior in detection rate, false alarm rate, and processing time compared with the contrast algorithms, and it is an efficient method for IR small target detection in a complex background.
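A sketch of the two-stage idea in Python follows: white top-hat prefiltering with a global adaptive threshold for candidate regions, then a simple local-contrast map on the result. The mean + k*std threshold stands in for the paper's constant-false-alarm-rate rule, and the self-similarity term is omitted.

```python
import numpy as np
from scipy.ndimage import white_tophat, maximum_filter, uniform_filter

def candidate_regions(ir_img, size=9, k=3.0):
    """Stage 1: top-hat filtering plus an adaptive threshold (illustrative
    stand-in for the CFAR-based threshold in the letter)."""
    th = white_tophat(ir_img.astype(float), size=size)
    return th, th > th.mean() + k * th.std()

def local_contrast(ir_img, inner=3, outer=9):
    """Stage 2: simple local contrast = centre maximum / surrounding mean,
    loosely following HVS contrast-based measures (the letter additionally
    combines this with local self-similarity)."""
    img = ir_img.astype(float)
    centre = maximum_filter(img, size=inner)
    surround = uniform_filter(img, size=outer) + 1e-6
    return centre / surround
```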

Journal ArticleDOI
22 Jun 2016
TL;DR: This work proposes an algorithm that only shades visible features of the image while cost‐effectively interpolating the remaining features without affecting perceived quality, and introduces a sampling scheme that incorporates multiple aspects of the human visual system: acuity, eye motion, contrast, and brightness adaptation.
Abstract: With ever-increasing display resolution for wide field-of-view displays---such as head-mounted displays or 8k projectors---shading has become the major computational cost in rasterization. To reduce computational effort, we propose an algorithm that only shades visible features of the image while cost-effectively interpolating the remaining features without affecting perceived quality. In contrast to previous approaches we do not only simulate acuity falloff but also introduce a sampling scheme that incorporates multiple aspects of the human visual system: acuity, eye motion, contrast (stemming from geometry, material or lighting properties), and brightness adaptation. Our sampling scheme is incorporated into a deferred shading pipeline to shade the image's perceptually relevant fragments while a pull-push algorithm interpolates the radiance for the rest of the image. Our approach does not impose any restrictions on the performed shading. We conduct a number of psycho-visual experiments to validate scene- and task-independence of our approach. The number of fragments that need to be shaded is reduced by 50% to 80%. Our algorithm scales favorably with increasing resolution and field-of-view, rendering it well-suited for head-mounted displays and wide-field-of-view projection.
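A toy sketch of the kind of eccentricity-dependent sampling such a pipeline starts from is shown below: the probability of shading a fragment falls off with angular distance from the gaze point. The acuity model, constants, and the omitted eye-motion, contrast, and brightness-adaptation terms are all assumptions for illustration.

```python
import numpy as np

def sampling_mask(height, width, gaze, px_per_degree=40.0,
                  full_acuity_deg=2.0, falloff=0.3, rng=None):
    """Per-fragment shading probability: 1 inside a small foveal region around
    the gaze point, then decaying exponentially with eccentricity (a crude
    acuity falloff). Returns a boolean mask of fragments to shade; the rest
    would be filled by pull-push interpolation."""
    if rng is None:
        rng = np.random.default_rng(0)
    ys, xs = np.mgrid[0:height, 0:width]
    ecc_deg = np.hypot(ys - gaze[0], xs - gaze[1]) / px_per_degree
    prob = np.where(ecc_deg <= full_acuity_deg, 1.0,
                    np.exp(-falloff * (ecc_deg - full_acuity_deg)))
    return rng.random((height, width)) < prob
```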

Journal ArticleDOI
TL;DR: An effective method to evaluate the quality of stereoscopic images that are afflicted by symmetric distortions is proposed and a new 3D saliency map is developed, which not only greatly reduces the computational complexity by avoiding calculation of the depth information, but also assigns appropriate weights to the image contents.

Journal ArticleDOI
Shaoze Wang, Kai Jin, Haitong Lu, Chuming Cheng, Juan Ye, Dahong Qian
TL;DR: The experimental results revealed that the generic overall quality classification achieved a sensitivity of 87.45% at a specificity of 91.66%, with an area under the ROC curve of 0.9452, indicating the value of applying the algorithm, which is based on the human vision system, to assess the image quality of non-mydriatic photography, especially for low-cost ophthalmological telemedicine applications.
Abstract: Telemedicine and the medical “big data” era in ophthalmology highlight the use of non-mydriatic ocular fundus photography, which has given rise to indispensable applications of portable fundus cameras. However, in the case of portable fundus photography, non-mydriatic image quality is more vulnerable to distortions, such as uneven illumination, color distortion, blur, and low contrast. Such distortions are called generic quality distortions. This paper proposes an algorithm capable of selecting images of fair generic quality that would be especially useful to assist inexperienced individuals in collecting meaningful and interpretable data with consistency. The algorithm is based on three characteristics of the human visual system—multi-channel sensation, just noticeable blur, and the contrast sensitivity function to detect illumination and color distortion, blur, and low contrast distortion, respectively. A total of 536 retinal images, 280 from proprietary databases and 256 from public databases, were graded independently by one senior and two junior ophthalmologists, such that three partial measures of quality and generic overall quality were classified into two categories. Binary classification was implemented by the support vector machine and the decision tree, and receiver operating characteristic (ROC) curves were obtained and plotted to analyze the performance of the proposed algorithm. The experimental results revealed that the generic overall quality classification achieved a sensitivity of 87.45% at a specificity of 91.66%, with an area under the ROC curve of 0.9452, indicating the value of applying the algorithm, which is based on the human vision system, to assess the image quality of non-mydriatic photography, especially for low-cost ophthalmological telemedicine applications.

Proceedings ArticleDOI
12 Sep 2016
TL;DR: CueSee is presented, an augmented reality application on a head-mounted display (HMD) that facilitates product search by recognizing the product automatically and using visual cues to direct the user's attention to the product.
Abstract: Visual search is a major challenge for low vision people. Conventional vision enhancements like magnification help low vision people see more details, but cannot indicate the location of a target in a visual search task. In this paper, we explore visual cues---a new approach to facilitate visual search tasks for low vision people. We focus on product search and present CueSee, an augmented reality application on a head-mounted display (HMD) that facilitates product search by recognizing the product automatically and using visual cues to direct the user's attention to the product. We designed five visual cues that users can combine to suit their visual condition. We evaluated the visual cues with 12 low vision participants and found that participants preferred using our cues to conventional enhancements for product search. We also found that CueSee outperformed participants' best-corrected vision in both time and accuracy.

Journal ArticleDOI
TL;DR: Because the ssVEP technique can be readily accommodated to studying the viewing of complex scenes with multiple elements, it enables researchers to advance theoretical models of socioemotional perception, based on complex, quasinaturalistic viewing situations.
Abstract: Like many other primates, humans place a high premium on social information transmission and processing. One important aspect of this information concerns the emotional state of other individuals, conveyed by distinct visual cues such as facial expressions, overt actions, or by cues extracted from the situational context. A rich body of theoretical and empirical work has demonstrated that these socioemotional cues are processed by the human visual system in a prioritized fashion, in the service of optimizing social behavior. Furthermore, socioemotional perception is highly dependent on situational contexts and previous experience. Here, we review current issues in this area of research and discuss the utility of the steady-state visual evoked potential (ssVEP) technique for addressing key empirical questions. Methodological advantages and caveats are discussed with particular regard to quantifying time-varying competition among multiple perceptual objects, trial-by-trial analysis of visual cortical activation, functional connectivity, and the control of low-level stimulus features. Studies on facial expression and emotional scene processing are summarized, with an emphasis on viewing faces and other social cues in emotional contexts, or when competing with each other. Further, because the ssVEP technique can be readily accommodated to studying the viewing of complex scenes with multiple elements, it enables researchers to advance theoretical models of socioemotional perception, based on complex, quasinaturalistic viewing situations.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This paper proposes to use human fixation data to train a top-down saliency model that predicts relevant image locations when searching for specific objects and shows that the learned model can successfully prune bounding box proposals without rejecting the ground truth object locations.
Abstract: One of the central tasks for a household robot is searching for specific objects. It does not only require localizing the target object but also identifying promising search locations in the scene if the target is not immediately visible. As computation time and hardware resources are usually limited in robotics, it is desirable to avoid expensive visual processing steps that are exhaustively applied over the entire image. The human visual system can quickly select those image locations that have to be processed in detail for a given task. This allows us to cope with huge amounts of information and to efficiently deploy the limited capacities of our visual system. In this paper, we therefore propose to use human fixation data to train a top-down saliency model that predicts relevant image locations when searching for specific objects. We show that the learned model can successfully prune bounding box proposals without rejecting the ground truth object locations. In this aspect, the proposed model outperforms a model that is trained only on the ground truth segmentations of the target object instead of fixation data.

Journal ArticleDOI
TL;DR: A deep C-S inference network is constructed, and experimental results show that, in accordance with different inputs, the network can learn distinct basic features for saliency modeling in its code layer; in a comprehensive evaluation on several benchmark data sets, the proposed method outperforms existing state-of-the-art algorithms.
Abstract: Research on visual perception indicates that the human visual system is sensitive to center–surround (C–S) contrast in the bottom–up saliency-driven attention process. Different from the traditional contrast computation of feature difference, models based on reconstruction have emerged to estimate saliency by starting from original images themselves instead of seeking for certain ad hoc features. However, in the existing reconstruction-based methods, the reconstruction parameters of each area are calculated independently without taking their global correlation into account. In this paper, inspired by the powerful feature learning and data reconstruction ability of deep autoencoders, we construct a deep C–S inference network and train it with the data sampled randomly from the entire image to obtain a unified reconstruction pattern for the current image. In this way, global competition in sampling and learning processes can be integrated into the nonlocal reconstruction and saliency estimation of each pixel, which can achieve better detection results than the models with separate consideration on local and global rarity. Moreover, by learning from the current scene, the proposed model can achieve the feature extraction and interaction simultaneously in an adaptive way, which can form a better generalization ability to handle more types of stimuli. Experimental results show that in accordance with different inputs, the network can learn distinct basic features for saliency modeling in its code layer. Furthermore, in a comprehensive evaluation on several benchmark data sets, the proposed method can outperform the existing state-of-the-art algorithms.
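A minimal PyTorch sketch of the core reconstruction-based idea follows: train an autoencoder on patches sampled from the current image and take per-patch reconstruction error as the saliency estimate. The architecture, patch size, and training details are simplified stand-ins for the paper's deep C-S inference network.

```python
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    """Tiny autoencoder over flattened image patches; a unified reconstruction
    pattern is learned from patches sampled over the whole image."""
    def __init__(self, patch_dim=8 * 8 * 3, code_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(patch_dim, 64), nn.ReLU(),
                                 nn.Linear(64, code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                 nn.Linear(64, patch_dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def saliency_from_reconstruction(model, patches):
    """patches: (N, patch_dim) patches from the current image. After training
    on randomly sampled patches, rare (salient) patches are reconstructed
    poorly, so per-patch MSE serves as the saliency score."""
    with torch.no_grad():
        recon = model(patches)
    return ((recon - patches) ** 2).mean(dim=1)
```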

Proceedings Article
09 Jul 2016
TL;DR: One of the earliest efforts is made to bridge saliency detection to WOD via self-paced curriculum learning, which can guide the learning procedure to gradually achieve faithful knowledge of multi-class objects from easy to hard.
Abstract: Weakly-supervised object detection (WOD) is a challenging problem in computer vision. The key problem is to simultaneously infer the exact object locations in the training images and train the object detectors, given only the training images with weak image-level labels. Intuitively, by simulating the selective attention mechanism of the human visual system, saliency detection can select attractive objects in scenes and thus is a potential way to provide useful priors for WOD. However, the way to adopt saliency detection in WOD is not trivial, since the detected saliency region might be highly ambiguous in complex cases. To this end, this paper first comprehensively analyzes the challenges in applying saliency detection to WOD. Then, we make one of the earliest efforts to bridge saliency detection to WOD via self-paced curriculum learning, which can guide the learning procedure to gradually achieve faithful knowledge of multi-class objects from easy to hard. The experimental results demonstrate that the proposed approach can successfully bridge saliency detection and WOD tasks and achieve state-of-the-art object detection results under weak supervision.

Journal ArticleDOI
Jie Li, Lian Zou, Jia Yan, Dexiang Deng, Tao Qu, Guihui Xie
TL;DR: The convolutional neural network is introduced into no-reference image quality assessment and combined with the Prewitt magnitude of segmented images; the image quality score is obtained as the mean of the products of the image patch scores and weights derived from the segmentation result.
Abstract: No-reference image quality assessment is of great importance to numerous image processing applications, and various methods have been widely studied with promising results. These methods exploit handcrafted features in the transform or spatial domain that are discriminative for image degradations. However, abundant a priori knowledge is required to extract these handcrafted features. The convolutional neural network (CNN) has recently been introduced into no-reference image quality assessment, integrating feature learning and regression into one optimization process. Therefore, the network structure generates an effective model for estimating image quality. However, the image quality score obtained by the CNN is based on the mean of all of the image patch scores without considering characteristics of the human visual system, such as its sensitivity to the edges and contours of images. In this paper, we combine the CNN and the Prewitt magnitude of segmented images and obtain the image quality score as the mean of the products of the image patch scores and weights derived from the segmentation result. Experimental results on various image distortion types demonstrate that the proposed algorithm achieves good performance.
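A sketch of the pooling step is given below: per-patch CNN scores are averaged with weights proportional to the Prewitt gradient magnitude inside each patch. The CNN itself and the segmentation step are assumed to be given; using the raw grayscale image instead of the segmented image is a simplification.

```python
import numpy as np
from scipy.ndimage import prewitt

def prewitt_weighted_score(gray, patch_scores, patch_coords, patch=32):
    """Combine per-patch quality scores with weights proportional to the mean
    Prewitt gradient magnitude of each patch (a stand-in for weights computed
    on the segmented image).

    patch_scores: per-patch CNN quality scores (assumed given).
    patch_coords: (row, col) top-left corners of the same patches.
    """
    mag = np.hypot(prewitt(gray.astype(float), axis=0),
                   prewitt(gray.astype(float), axis=1))
    weights = np.array([mag[r:r + patch, c:c + patch].mean()
                        for r, c in patch_coords]) + 1e-6
    return float(np.dot(weights, patch_scores) / weights.sum())
```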

Journal ArticleDOI
TL;DR: The proposed JND model can outperform the conventional JND guided compression schemes by providing better visual quality at the same coding bits and outperforms the state-of-the-art schemes in terms of the distortion masking ability.
Abstract: We propose a novel just noticeable difference (JND) model for a screen content image (SCI). The distinct properties of the SCI result in different behaviors of the human visual system when viewing the textual content, which motivate us to employ a local parametric edge model with an adaptive representation of the edge profile in JND modeling. In particular, we decompose each edge profile into its luminance, contrast, and structure, and then evaluate the visibility threshold in different ways. The edge luminance adaptation, contrast masking, and structural distortion sensitivity are studied in subjective experiments, and the final JND model is established based on the edge profile reconstruction with tolerable variations. Extensive experiments are conducted to verify the proposed JND model, which confirm that it is accurate in predicting the JND profile, and outperforms the state-of-the-art schemes in terms of the distortion masking ability. Furthermore, we explore the applicability of the proposed JND model in the scenario of perceptually lossless SCI compression, and experimental results show that the proposed scheme can outperform the conventional JND guided compression schemes by providing better visual quality at the same coding bits.

Book ChapterDOI
17 Oct 2016
TL;DR: This work solves the IQA problem using the principles behind the working of the HVS and proposes a novel algorithm that calculates saliency values for every image pixel at multiple scales to capture global and local image information.
Abstract: Retinal image quality assessment (IQA) algorithms use different hand-crafted features without considering the important role of the human visual system (HVS). We solve the IQA problem using the principles behind the working of the HVS. Unsupervised information from local saliency maps and supervised information from trained convolutional neural networks (CNNs) are combined to make a final decision on image quality. A novel algorithm is proposed that calculates saliency values for every image pixel at multiple scales to capture global and local image information. This extracts generalized image information in an unsupervised manner, while CNNs provide a principled approach to feature learning without the need to define hand-crafted features. The individual classification decisions are fused by weighting them according to their confidence scores. Experimental results on real datasets demonstrate the superior performance of our proposed algorithm over competing methods.
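A minimal sketch of the confidence-weighted fusion of the two classifiers (saliency-based and CNN-based) is shown below; the per-classifier probabilities, confidence scores, and decision threshold are assumed inputs, and the exact fusion rule of the chapter may differ.

```python
import numpy as np

def fuse_decisions(p_saliency, conf_saliency, p_cnn, conf_cnn, threshold=0.5):
    """Fuse two quality classifiers by weighting their 'good quality'
    probabilities with their confidence scores; returns the fused decision
    and the fused probability."""
    w = np.array([conf_saliency, conf_cnn], dtype=float)
    p = np.array([p_saliency, p_cnn], dtype=float)
    fused = float((w * p).sum() / w.sum())
    return fused >= threshold, fused
```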