
Showing papers on "Human visual system model" published in 2020


Book
14 Feb 2020
TL;DR: This book covers 3D vision in stages: the human visual system and stereoscopic image formation, low-level image processing and stereo matching algorithms, space reconstruction and multiview integration, and the projective geometry, tensor calculus, image warping and programming techniques needed to build practical 3D vision systems.
Abstract: Preface. Acknowledgements. Notation and Abbreviations.
Part I. 1 Introduction. 1.1 Stereo-pair Images and Depth Perception. 1.2 3D Vision Systems. 1.3 3D Vision Applications. 1.4 Contents Overview: The 3D Vision Task in Stages.
2 Brief History of Research on Vision. 2.1 Abstract. 2.2 Retrospective of Vision Research. 2.3 Closure.
Part II. 3 2D and 3D Vision Formation. 3.1 Abstract. 3.2 Human Visual System. 3.3 Geometry and Acquisition of a Single Image. 3.4 Stereoscopic Acquisition Systems. 3.5 Stereo Matching Constraints. 3.6 Calibration of Cameras. 3.7 Practical Examples. 3.8 Appendix: Derivation of the Pin-hole Camera Transformation. 3.9 Closure.
4 Low-level Image Processing for Image Matching. 4.1 Abstract. 4.2 Basic Concepts. 4.3 Discrete Averaging. 4.4 Discrete Differentiation. 4.5 Edge Detection. 4.6 Structural Tensor. 4.7 Corner Detection. 4.8 Practical Examples. 4.9 Closure.
5 Scale-space Vision. 5.1 Abstract. 5.2 Basic Concepts. 5.3 Constructing a Scale-space. 5.4 Multi-resolution Pyramids. 5.5 Practical Examples. 5.6 Closure.
6 Image Matching Algorithms. 6.1 Abstract. 6.2 Basic Concepts. 6.3 Match Measures. 6.4 Computational Aspects of Matching. 6.5 Diversity of Stereo Matching Methods. 6.6 Area-based Matching. 6.7 Area-based Elastic Matching. 6.8 Feature-based Image Matching. 6.9 Gradient-based Matching. 6.10 Method of Dynamic Programming. 6.11 Graph Cut Approach. 6.12 Optical Flow. 6.13 Practical Examples. 6.14 Closure.
7 Space Reconstruction and Multiview Integration. 7.1 Abstract. 7.2 General 3D Reconstruction. 7.3 Multiview Integration. 7.4 Closure.
8 Case Examples. 8.1 Abstract. 8.2 3D System for Vision-Impaired Persons. 8.3 Face and Body Modelling. 8.4 Clinical and Veterinary Applications. 8.5 Movie Restoration. 8.6 Closure.
Part III. 9 Basics of the Projective Geometry. 9.1 Abstract. 9.2 Homogeneous Coordinates. 9.3 Point, Line and the Rule of Duality. 9.4 Point and Line at Infinity. 9.5 Basics on Conics. 9.6 Group of Projective Transformations. 9.7 Projective Invariants. 9.8 Closure.
10 Basics of Tensor Calculus for Image Processing. 10.1 Abstract. 10.2 Basic Concepts. 10.3 Change of a Base. 10.4 Laws of Tensor Transformations. 10.5 The Metric Tensor. 10.6 Simple Tensor Algebra. 10.7 Closure.
11 Distortions and Noise in Images. 11.1 Abstract. 11.2 Types and Models of Noise. 11.3 Generating Noisy Test Images. 11.4 Generating Random Numbers with Normal Distributions. 11.5 Closure.
12 Image Warping Procedures. 12.1 Abstract. 12.2 Architecture of the Warping System. 12.3 Coordinate Transformation Module. 12.4 Interpolation of Pixel Values. 12.5 The Warp Engine. 12.6 Software Model of the Warping Schemes. 12.7 Warp Examples. 12.8 Finding the Linear Transformation from Point Correspondences. 12.9 Closure.
13 Programming Techniques for Image Processing and Computer Vision. 13.1 Abstract. 13.2 Useful Techniques and Methodology. 13.3 Design Patterns. 13.4 Object Lifetime and Memory Management. 13.5 Image Processing Platforms. 13.6 Closure.
14 Image Processing Library.
References. Index.

365 citations


Journal ArticleDOI
TL;DR: This survey provides a general overview of classical algorithms and recent progress in the field of perceptual image quality assessment and describes the performance of state-of-the-art quality measures for visual signals.
Abstract: Perceptual quality assessment plays a vital role in visual communication systems owing to the existence of quality degradations introduced in various stages of visual signal acquisition, compression, transmission and display. Quality assessment for visual signals can be performed subjectively and objectively, and objective quality assessment is usually preferred owing to its high efficiency and easy deployment. A large number of subjective and objective visual quality assessment studies have been conducted during recent years. In this survey, we give an up-to-date and comprehensive review of these studies. Specifically, the frequently used subjective image quality assessment databases are first reviewed, as they serve as the validation set for the objective measures. Second, the objective image quality assessment measures are classified and reviewed according to the applications and the methodologies utilized in the quality measures. Third, the performances of the state-of-the-art quality measures for visual signals are compared with an introduction of the evaluation protocols. This survey provides a general overview of classical algorithms and recent progress in the field of perceptual image quality assessment.

281 citations


Book ChapterDOI
23 Aug 2020
TL;DR: Wang et al. propose self-supervised video representation learning from a new perspective, video pace prediction: given a video played at natural pace, training clips are randomly sampled at different paces and a neural network is asked to identify the pace of each clip.
Abstract: This paper addresses the problem of self-supervised video representation learning from a new perspective – by video pace prediction. It stems from the observation that human visual system is sensitive to video pace, e.g., slow motion, a widely used technique in film making. Specifically, given a video played in natural pace, we randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip. The assumption here is that the network can only succeed in such a pace reasoning task when it understands the underlying video content and learns representative spatio-temporal features. In addition, we further introduce contrastive learning to push the model towards discriminating different paces by maximizing the agreement on similar video content. To validate the effectiveness of the proposed method, we conduct extensive experiments on action recognition and video retrieval tasks with several alternative network architectures. Experimental evaluations show that our approach achieves state-of-the-art performance for self-supervised video representation learning across different network architectures and different benchmarks. The code and pre-trained models are available at https://github.com/laura-wang/video-pace.
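
As an illustration of the pace-prediction pretext task described above, here is a minimal sketch in PyTorch; the clip sampler, the tiny 3D-CNN and all names are illustrative assumptions, not the authors' released code (which is linked above):

```python
import torch
import torch.nn as nn

def sample_clip(frames, pace, clip_len=16):
    """Sample a clip from `frames` (T, C, H, W) with a given pace (frame stride)."""
    max_start = frames.shape[0] - pace * clip_len
    start = torch.randint(0, max(max_start, 1), (1,)).item()
    idx = torch.arange(start, start + pace * clip_len, pace)
    return frames[idx]                              # (clip_len, C, H, W)

class PaceClassifier(nn.Module):
    """Tiny 3D-CNN that predicts which pace a clip was sampled with."""
    def __init__(self, num_paces=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.head = nn.Linear(16, num_paces)

    def forward(self, clip):                        # clip: (B, C, T, H, W)
        return self.head(self.backbone(clip).flatten(1))

# Self-supervised training step: the label is the sampled pace, so no annotation is needed.
frames = torch.rand(128, 3, 64, 64)                 # a toy "video"
paces = [1, 2, 4, 8]
label = torch.randint(0, len(paces), (1,))
clip = sample_clip(frames, paces[label.item()]).permute(1, 0, 2, 3).unsqueeze(0)
model = PaceClassifier(num_paces=len(paces))
loss = nn.CrossEntropyLoss()(model(clip), label)
loss.backward()
```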

153 citations


Journal ArticleDOI
TL;DR: A novel IQA-orientated CNN method is developed for blind IQA (BIQA), which can efficiently represent the quality degradation and the Cascaded CNN with HDC (named as CaHDC) is introduced, demonstrating the superiority of CaH DC compared with existing BIQA methods.
Abstract: The deep convolutional neural network (CNN) has achieved great success in image recognition. Many image quality assessment (IQA) methods directly use recognition-oriented CNNs for quality prediction. However, the properties of the IQA task differ from those of the image recognition task. Image recognition should be sensitive to visual content and robust to distortion, while IQA should be sensitive to both distortion and visual content. In this paper, an IQA-oriented CNN method is developed for blind IQA (BIQA), which can efficiently represent quality degradation. CNNs are large-data driven, while the sizes of existing IQA databases are too small for CNN optimization. Thus, a large IQA dataset is first established, which includes more than one million distorted images (each image is assigned a quality score as a substitute for the Mean Opinion Score (MOS), abbreviated as pseudo-MOS). Next, inspired by the hierarchical perception mechanism (from local structure to global semantics) in the human visual system, a novel IQA-oriented CNN method is designed, in which hierarchical degradation is considered. Finally, by jointly optimizing the multilevel feature extraction, hierarchical degradation concatenation (HDC) and quality prediction in an end-to-end framework, the Cascaded CNN with HDC (named CaHDC) is introduced. Experiments on benchmark IQA databases demonstrate the superiority of CaHDC compared with existing BIQA methods. Meanwhile, CaHDC (with about 0.73M parameters) is lightweight compared to other CNN-based BIQA models and can be easily realized in microprocessing systems. The dataset and source code of the proposed method are available at https://web.xidian.edu.cn/wjj/paper.html.
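
The hierarchical-degradation idea (pooling features from several network depths and concatenating them before quality regression) can be sketched generically as below; this toy PyTorch model is an illustration of the principle, not the released CaHDC architecture:

```python
import torch
import torch.nn as nn

class TinyHierarchicalIQA(nn.Module):
    """Concatenate pooled features from multiple CNN stages and regress a quality score."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage3 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.regressor = nn.Linear(16 + 32 + 64, 1)

    def forward(self, x):
        f1 = self.stage1(x)                 # local structure
        f2 = self.stage2(f1)                # mid-level patterns
        f3 = self.stage3(f2)                # global semantics
        h = torch.cat([self.pool(f).flatten(1) for f in (f1, f2, f3)], dim=1)
        return self.regressor(h).squeeze(1)

score = TinyHierarchicalIQA()(torch.rand(2, 3, 224, 224))   # predicted pseudo-MOS for 2 images
```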

113 citations


Journal ArticleDOI
TL;DR: Simulations conducted on different types of medical images show that the proposed scheme achieves superior transparency and robustness against signal and compression attacks compared with related hybrid optimized algorithms.

94 citations


Journal ArticleDOI
TL;DR: The proposed watermarking method, based on 4 × 4 image blocks using the redundant wavelet transform with singular value decomposition and human visual system (HVS) characteristics expressed by entropy values, provides high robustness, especially under image processing, JPEG2000 and JPEG XR attacks.
Abstract: With the rapid growth of internet technology, image watermarking has become a popular copyright protection method for digital images. In this paper, we propose a watermarking method based on 4 × 4 image blocks using the redundant wavelet transform with singular value decomposition, considering human visual system (HVS) characteristics expressed by entropy values. The blocks with lower HVS entropies are selected for embedding the watermark. The watermark is embedded by examining the U_{2,1} and U_{3,1} components of the orthogonal matrix obtained from singular value decomposition of the redundant wavelet transformed image block, where an optimal threshold value based on the trade-off between robustness and imperceptibility is used. In order to provide additional security, a binary watermark is scrambled by the Arnold transform before the watermark is embedded into the host image. The proposed scheme is tested under various image processing, compression and geometrical attacks. The test results are compared to other watermarking schemes that use SVD techniques. The experimental results demonstrate that our method can achieve higher imperceptibility and robustness under different types of attacks compared to existing schemes. Our method provides high robustness especially under image processing attacks, JPEG2000 and JPEG XR attacks. It has been observed that the proposed method achieves better performance than recent existing watermarking schemes.
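
The U-component embedding rule can be illustrated on a single 4 × 4 block as follows; the threshold T, the omission of the redundant wavelet step, and the exact modification rule are simplifying assumptions rather than the paper's full scheme:

```python
import numpy as np

def sign_fixed_svd(block):
    """SVD with the sign of the first singular vector fixed, so comparisons of
    U entries are stable across embedding and extraction."""
    U, s, Vt = np.linalg.svd(block)
    if U[0, 0] < 0:
        U[:, 0] *= -1
        Vt[0, :] *= -1
    return U, s, Vt

def embed_bit(block, bit, T=0.04):
    """Embed one bit in a 4x4 block by adjusting U[1,0] and U[2,0] of its SVD.

    The ordering of the two entries encodes the bit and their gap is pushed to T,
    the threshold that trades robustness against imperceptibility.
    """
    U, s, Vt = sign_fixed_svd(block)
    avg = (U[1, 0] + U[2, 0]) / 2.0
    if bit == 1:
        U[1, 0], U[2, 0] = avg + T / 2, avg - T / 2
    else:
        U[1, 0], U[2, 0] = avg - T / 2, avg + T / 2
    return U @ np.diag(s) @ Vt

def extract_bit(block):
    U, _, _ = sign_fixed_svd(block)
    return 1 if U[1, 0] > U[2, 0] else 0

block = 255 * np.random.rand(4, 4)
print(extract_bit(embed_bit(block, 1)), extract_bit(embed_bit(block, 0)))  # expected: 1 0
```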

76 citations


Journal ArticleDOI
TL;DR: A novel Bayesian fusion model is established for infrared and visible images that can generate better fused images with highlighted targets and rich texture details, which may potentially improve the reliability of automatic target detection and recognition systems.

71 citations


Journal ArticleDOI
TL;DR: A mixed-dataset training strategy for training a single VQA model on multiple datasets is explored, and the superior performance of the unified model in comparison with state-of-the-art models is demonstrated.
Abstract: Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically, content dependency and temporal-memory effects of human visual system. To face the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly includes three stages: relative quality assessor, nonlinear mapping, and dataset-specific perceptual scale alignment, to jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed datasets training strategy and prove the superior performance of the unified model in comparison with the state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at this https URL.
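
The nonlinear-mapping stage maps relative quality onto a perceptual scale with a monotonic function; a common choice in quality assessment, used here purely as an illustration (not necessarily the paper's exact function), is a fitted 4-parameter logistic:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic4(x, a, b, c, d):
    """Monotonic 4-parameter logistic mapping from model output to a perceptual scale."""
    return (a - b) / (1.0 + np.exp(-(x - c) / d)) + b

# Toy data: relative quality scores and corresponding subjective MOS values.
rel_q = np.linspace(0, 1, 50)
mos = 4.0 / (1.0 + np.exp(-(rel_q - 0.5) / 0.1)) + 1.0 + 0.05 * np.random.randn(50)

params, _ = curve_fit(logistic4, rel_q, mos,
                      p0=[mos.max(), mos.min(), rel_q.mean(), 0.1], maxfev=10000)
mapped = logistic4(rel_q, *params)        # predictions aligned to the subjective scale
```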

70 citations


Journal ArticleDOI
TL;DR: A sliding-window search strategy predicts PW-JND from the outputs of a perceptually lossy/lossless predictor, and experimental results show the superiority of the proposed PW-JND model over conventional JND models.
Abstract: Picture Wise Just Noticeable Difference (PW-JND), which accounts for the minimum difference of a picture that human visual system can perceive, can be widely used in perception-oriented image and video processing. However, the conventional Just Noticeable Difference (JND) models calculate the JND threshold for each pixel or sub-band separately, which may not reflect the total masking effect of a picture accurately. In this paper, we propose a deep learning based PW-JND prediction model for image compression. Firstly, we formulate the task of predicting PW-JND as a multi-class classification problem, and propose a framework to transform the multi-class classification problem to a binary classification problem solved by just one binary classifier. Secondly, we construct a deep learning based binary classifier named perceptually lossy/lossless predictor which can predict whether an image is perceptually lossy to another or not. Finally, we propose a sliding window based search strategy to predict PW-JND based on the prediction results of the perceptually lossy/lossless predictor. Experimental results show that the mean accuracy of the perceptually lossy/lossless predictor reaches 92%, and the absolute prediction error of the proposed PW-JND model is 0.79 dB on average, which show the superiority of the proposed PW-JND model to the conventional JND models.
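
The sliding-window search can be sketched as below, assuming a hypothetical `compress(image, level)` codec interface and the binary `is_perceptually_lossless(ref, dist)` predictor described above:

```python
def predict_pw_jnd(reference, compress, is_perceptually_lossless,
                   levels=range(0, 52), window=5):
    """Sliding-window search for the picture-wise JND level.

    `compress(image, level)` codes the image at a given level (e.g. a QP) and
    `is_perceptually_lossless(ref, dist)` is the binary predictor; both are
    assumed interfaces, not the authors' exact API. The JND is taken as the
    last level whose window of predictions is dominated by "lossless"
    decisions, which smooths out isolated predictor errors.
    """
    decisions = [is_perceptually_lossless(reference, compress(reference, q))
                 for q in levels]
    jnd = levels[0]
    for i in range(len(decisions) - window + 1):
        votes = decisions[i:i + window]
        if sum(votes) >= (window + 1) // 2:     # majority still perceptually lossless
            jnd = levels[i + window - 1]
        else:
            break
    return jnd
```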

68 citations


Proceedings Article
Baifeng Shi, Dinghuai Zhang, Qi Dai, Jingdong Wang, Zhanxing Zhu, Yadong Mu
12 Jul 2020
TL;DR: This work proposes a light-weight model-agnostic method, namely Informative Dropout (InfoDrop), to improve interpretability and reduce texture bias in CNNs, and adopts a Dropout-like algorithm to decorrelate the model output from the local texture.
Abstract: Convolutional Neural Networks (CNNs) are known to rely more on local texture rather than global shape when making decisions. Recent work also indicates a close relationship between CNN’s texture-bias and its robustness against distribution shift, adversarial perturbation, random corruption, etc. In this work, we attempt at improving various kinds of robustness universally by alleviating CNN’s texture bias. With inspiration from the human visual system, we propose a light-weight model-agnostic method, namely Informative Dropout (InfoDrop), to improve interpretability and reduce texture bias. Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture. Through extensive experiments, we observe enhanced robustness under various scenarios (domain generalization, few-shot classification, image corruption, and adversarial perturbation). To the best of our knowledge, this work is one of the earliest attempts to improve different kinds of robustness in a unified model, shedding new light on the relationship between shape-bias and robustness, also on new approaches to trustworthy machine learning algorithms. Code is available on GitHub.
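
A rough, simplified sketch of the idea (not the released InfoDrop code): estimate how "surprising" each local patch is from its similarity to neighbouring patches, then drop activations more aggressively where self-information is low, i.e. in repetitive texture regions:

```python
import torch
import torch.nn.functional as F

def info_drop(feat, image, patch=3, radius=2, temperature=0.1, drop_rate=0.5):
    """Dropout-like masking that keeps high self-information (shape-like) regions.

    `feat`  : (B, C, H, W) activations to be masked.
    `image` : (B, 1, H, W) grayscale input used to estimate local self-information.
    A patch's likelihood is approximated from its distances to shifted copies of
    the image within `radius`; -log(likelihood) is the self-information. This is
    a simplified illustration of the principle, not the paper's exact estimator.
    """
    B, _, H, W = image.shape
    pad = radius
    padded = F.pad(image, (pad, pad, pad, pad), mode='reflect')
    sims = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            shifted = padded[:, :, pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            dist2 = F.avg_pool2d((image - shifted) ** 2, patch, stride=1,
                                 padding=patch // 2)          # patch-wise distance
            sims.append(torch.exp(-dist2 / temperature))
    likelihood = torch.stack(sims, dim=0).mean(dim=0)          # (B, 1, H, W)
    self_info = -torch.log(likelihood + 1e-8)
    # Keep-probability grows with self-information; sample a Bernoulli mask.
    keep_prob = (1 - drop_rate) + drop_rate * (self_info / (self_info.max() + 1e-8))
    mask = torch.bernoulli(keep_prob.expand_as(feat))
    return feat * mask

feats = torch.rand(1, 8, 32, 32)
gray = torch.rand(1, 1, 32, 32)
masked = info_drop(feats, gray)
```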

61 citations


Posted Content
TL;DR: This paper addresses the problem of self-supervised video representation learning from a new perspective, video pace prediction, and introduces contrastive learning to push the model towards discriminating different paces by maximizing the agreement on similar video content.
Abstract: This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction. It stems from the observation that human visual system is sensitive to video pace, e.g., slow motion, a widely used technique in film making. Specifically, given a video played in natural pace, we randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip. The assumption here is that the network can only succeed in such a pace reasoning task when it understands the underlying video content and learns representative spatio-temporal features. In addition, we further introduce contrastive learning to push the model towards discriminating different paces by maximizing the agreement on similar video content. To validate the effectiveness of the proposed method, we conduct extensive experiments on action recognition and video retrieval tasks with several alternative network architectures. Experimental evaluations show that our approach achieves state-of-the-art performance for self-supervised video representation learning across different network architectures and different benchmarks. The code and pre-trained models are available at this https URL.

Proceedings ArticleDOI
12 Oct 2020
TL;DR: Experimental results show that the proposed model can predict subjective video quality more accurately than the publicly available video quality models representing the state-of-the-art.
Abstract: Due to the wide range of different natural temporal and spatial distortions appearing in user generated video content, blind assessment of natural video quality is a challenging research problem. In this study, we combine the hand-crafted statistical temporal features used in a state-of-the-art video quality model and spatial features obtained from convolutional neural network trained for image quality assessment via transfer learning. Experimental results on two recently published natural video quality databases show that the proposed model can predict subjective video quality more accurately than the publicly available video quality models representing the state-of-the-art. The proposed model is also competitive in terms of computational complexity.

Journal ArticleDOI
Jun Wang, Wenbo Wan, Xiao Xiao Li, Jian De Sun, Hua Xiang Zhang
TL;DR: A novel color image watermarking scheme in the discrete cosine transform (DCT) domain based on JND is proposed, which takes both orientation diversity and color complexity features into account; experimental results show that the proposed scheme is reliable and effective.
Abstract: The Just Noticeable Distortion (JND) can reliably measure the perceptual strength in image watermarking, but it remains a challenge to computationally model the process of embedding a watermark without prior knowledge of the image contents. This paper proposes a novel color image watermarking scheme in the discrete cosine transform (DCT) domain based on JND, which takes both orientation diversity and color complexity features into account. First, two indicators are introduced that take into account differences in texture type and the orientation diversity of the Human Visual System (HVS) in the proposed JND contrast masking (CM) processing. In addition, a novel color complexity weight from the Cb channel is used to guarantee the robustness of the scheme. Then, a novel JND model combining the proposed contrast masking and color complexity is applied to a quantization watermarking scheme. Compared with the state-of-the-art methods for color image watermarking, experimental results using publicly available images show that our proposed scheme is reliable and effective.
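
The "quantization watermarking" step can be illustrated with a standard quantization-index-modulation rule in which a JND value sets the quantization step; the chosen DCT coefficient and plain QIM are illustrative assumptions, and the JND model itself (contrast masking plus color complexity) is treated as a given number here:

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit_qim(block, bit, jnd):
    """Embed one bit into the (2,1) DCT coefficient of an 8x8 block via QIM.

    The quantization step is set by the block's JND value, so stronger masking
    allows a larger (more robust) modification.
    """
    coefs = dctn(block, norm='ortho')
    step = 2.0 * jnd
    q = np.floor(coefs[2, 1] / step)
    coefs[2, 1] = (q + (0.75 if bit else 0.25)) * step   # two interleaved lattices
    return idctn(coefs, norm='ortho')

def extract_bit_qim(block, jnd):
    coefs = dctn(block, norm='ortho')
    step = 2.0 * jnd
    frac = coefs[2, 1] / step - np.floor(coefs[2, 1] / step)
    return 1 if frac > 0.5 else 0

blk = np.random.rand(8, 8) * 255
print(extract_bit_qim(embed_bit_qim(blk, 1, jnd=4.0), 4.0))   # expected: 1
```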

Journal ArticleDOI
TL;DR: A new high-capacity image steganography method based on deep learning is proposed, in which the Discrete Cosine Transform is used to transform the secret image and the transformed image is then encrypted by Elliptic Curve Cryptography to improve the anti-detection property of the obtained image.
Abstract: Image steganography is a technology that hides sensitive information into an image. The traditional image steganography method tends to securely embed secret information in the host image so that the payload capacity is almost ignored and the steganographic image quality needs to be improved for the Human Visual System (HVS). Therefore, in this work, we propose a new high capacity image steganography method based on deep learning. The Discrete Cosine Transform (DCT) is used to transform the secret image, and then the transformed image is encrypted by Elliptic Curve Cryptography (ECC) to improve the anti-detection property of the obtained image. To improve steganographic capacity, the SegNet Deep Neural Network with a set of Hiding and Extraction networks enables steganography and extraction of full-size images. The experimental results show that the method can effectively allocate each pixel in the image so that the relative capacity of steganography reaches 1. Besides, the image obtained using this steganography method has higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values, reaching 40 dB and 0.96, respectively.

Proceedings ArticleDOI
12 Oct 2020
TL;DR: This work proposes a novel no-reference VQA framework named Recurrent-In-Recurrent Network (RIRNet), which integrates concepts from motion perception in the human visual system (HVS), manifested in a designed network structure composed of low- and high-level processing.
Abstract: Video quality assessment (VQA), which is capable of automatically predicting the perceptual quality of source videos especially when reference information is not available, has become a major concern for video service providers due to the growing demand for video quality of experience (QoE) by end users. While significant advances have been achieved from the recent deep learning techniques, they often lead to misleading results in VQA tasks given their limitations on describing 3D spatio-temporal regularities using only fixed temporal frequency. Partially inspired by psychophysical and vision science studies revealing the speed tuning property of neurons in visual cortex when performing motion perception (i.e., sensitive to different temporal frequencies), we propose a novel no-reference (NR) VQA framework named Recurrent-In-Recurrent Network (RIRNet) to incorporate this characteristic to prompt an accurate representation of motion perception in VQA task. By fusing motion information derived from different temporal frequencies in a more efficient way, the resulting temporal modeling scheme is formulated to quantify the temporal motion effect via a hierarchical distortion description. It is found that the proposed framework is in closer agreement with quality perception of the distorted videos since it integrates concepts from motion perception in human visual system (HVS), which is manifested in the designed network structure composed of low- and high- level processing. A holistic validation of our methods on four challenging video quality databases demonstrates the superior performances over the state-of-the-art methods.

Journal ArticleDOI
TL;DR: Experimental results on three publicly available SR image quality databases demonstrate the effectiveness and generalization ability of the proposed DeepSRQ method compared with state-of-the-art image quality assessment algorithms.

Journal ArticleDOI
TL;DR: It is found that CNNs encode category information independently from shape, peaking at the final fully connected layer in all tested CNN architectures, much like the human visual system.
Abstract: Deep Convolutional Neural Networks (CNNs) are gaining traction as the benchmark model of visual object recognition, with performance now surpassing humans. While CNNs can accurately assign one image to potentially thousands of categories, network performance could be the result of layers that are tuned to represent the visual shape of objects, rather than object category, since both are often confounded in natural images. Using two stimulus sets that explicitly dissociate shape from category, we correlate these two types of information with each layer of multiple CNNs. We also compare CNN output with fMRI activation along the human visual ventral stream by correlating artificial with neural representations. We find that CNNs encode category information independently from shape, peaking at the final fully connected layer in all tested CNN architectures. Comparing CNNs with fMRI brain data, early visual cortex (V1) and early layers of CNNs encode shape information. Anterior ventral temporal cortex encodes category information, which correlates best with the final layer of CNNs. The interaction between shape and category that is found along the human visual ventral pathway is echoed in multiple deep networks. Our results suggest CNNs represent category information independently from shape, much like the human visual system.

Journal ArticleDOI
TL;DR: This paper proposes an interpolation-based RDH (IRDH) scheme that improves Lee and Huang’s scheme and Malik et al.’s scheme by combining their embedding techniques with the optimal pixel adjustment process (OPAP).
Abstract: Reversible data hiding (RDH) within images is the process of hiding data in cover images without degradation. Its challenge is to hide a large payload while taking into account the human visual system so that the distortion of the stego-image is negligible. It is highly desirable for images with special requirements, such as those in the medical and military fields, where the original images must be regenerated with no loss after extracting the data. In this paper, we propose an interpolation-based RDH (IRDH) scheme that improves Lee and Huang’s scheme and Malik et al.’s scheme by combining their embedding techniques along with the optimal pixel adjustment process (OPAP) in a way that increases the embedding capacity and the visual quality of both schemes. In the presented scheme, we start by stretching the size of the original image using the existing enhanced neighbor mean interpolation (ENMI) technique, and then the data is embedded into the interpolated pixels using our novel embedding method that depends on the intensity of the pixels and the maximized difference values. The scheme presents all steps, covering generation of the interpolated image, data embedding, data extraction and image recovery, allowing it to be compared fairly with others in testing. The experimental results demonstrate that the embedding capacity achieved by our hiding technique is more than 537 Kb for all the test images. Also, the experiments show that our proposed scheme has the highest embedding capacity among five current schemes, namely Jung and Yoo’s scheme, Lee and Huang’s scheme, Chang et al.’s scheme, Zhang et al.’s scheme and Malik et al.’s scheme, with attractive image quality and security.
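
The general interpolate-then-embed pattern of IRDH schemes can be sketched as follows: the cover is enlarged with a neighbour-mean interpolation and bits are hidden only in the interpolated pixels, so the original pixels survive untouched and the cover is recovered exactly; the bit-allocation rule below is deliberately simple and is not the paper's exact embedding method:

```python
import numpy as np

def interpolate_2x(img):
    """Enlarge an image roughly 2x, filling new pixels with neighbour means (ENMI-like)."""
    h, w = img.shape
    out = np.zeros((2 * h - 1, 2 * w - 1), dtype=np.int32)
    out[::2, ::2] = img                                   # original pixels kept exactly
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) // 2      # horizontal means
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) // 2      # vertical means
    out[1::2, 1::2] = (img[:-1, :-1] + img[1:, 1:]) // 2  # diagonal means
    return out

def embed(cover, bits, bits_per_pixel=2):
    """Hide bits in interpolated pixels only; extraction would recompute the
    interpolation from the restored cover and subtract it."""
    stego = interpolate_2x(cover).copy()
    idx = 0
    for i in range(stego.shape[0]):
        for j in range(stego.shape[1]):
            if i % 2 == 0 and j % 2 == 0:        # original pixel, skip
                continue
            if idx + bits_per_pixel > len(bits):
                return stego, idx
            value = int(''.join(map(str, bits[idx:idx + bits_per_pixel])), 2)
            stego[i, j] += value                 # add the payload to the interpolated pixel
            idx += bits_per_pixel
    return stego, idx

cover = np.random.randint(0, 256, (8, 8))
stego, n_embedded = embed(cover, [1, 0, 1, 1, 0, 0, 1, 1])
restored = stego[::2, ::2]                        # exact recovery of the cover
assert np.array_equal(restored, cover)
```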

Journal ArticleDOI
TL;DR: The approach taken aims at exploiting the merits of the wavelet transform: sparsity, multi-resolution structure, and similarity with the human visual system, to adapt an unsupervised dictionary learning algorithm for creating a dictionary devoted to noise reduction.
Abstract: Image denoising plays an important role in image processing, which aims to separate clean images from noisy images. A number of methods have been presented to deal with this practical problem over the past several years. The best currently available wavelet-based denoising methods take advantage of the merits of the wavelet transform. Most of these methods, however, still have difficulties in defining the threshold parameter which can limit their capability. In this paper, we propose a novel wavelet denoising approach based on unsupervised learning model. The approach taken aims at exploiting the merits of the wavelet transform: sparsity, multi-resolution structure, and similarity with the human visual system, to adapt an unsupervised dictionary learning algorithm for creating a dictionary devoted to noise reduction. Using the K-Singular Value Decomposition (K-SVD) algorithm, we obtain an adaptive dictionary by learning over the wavelet decomposition of the noisy image. Experimental results on benchmark test images show that our proposed method achieves very competitive denoising performance and outperforms state-of-the-art denoising methods, especially in the peak signal to noise ratio (PSNR), the structural similarity (SSIM) index, and visual effects with different noise levels.
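
A compact sketch of the wavelet-plus-dictionary-learning pipeline, substituting scikit-learn's MiniBatchDictionaryLearning for K-SVD (so this is an approximation of the approach, not the authors' implementation):

```python
import numpy as np
import pywt
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

def denoise_subband(subband, patch_size=6, n_atoms=64):
    """Learn a dictionary on patches of one wavelet subband and re-synthesise it
    from sparse codes, which suppresses incoherent noise."""
    patches = extract_patches_2d(subband, (patch_size, patch_size))
    flat = patches.reshape(len(patches), -1)
    mean = flat.mean(axis=1, keepdims=True)
    dl = MiniBatchDictionaryLearning(n_components=n_atoms,
                                     transform_algorithm='omp',
                                     transform_n_nonzero_coefs=3,
                                     random_state=0)
    codes = dl.fit(flat - mean).transform(flat - mean)
    denoised = (codes @ dl.components_ + mean).reshape(patches.shape)
    return reconstruct_from_patches_2d(denoised, subband.shape)

noisy = np.random.rand(64, 64) + 0.1 * np.random.randn(64, 64)
cA, (cH, cV, cD) = pywt.wavedec2(noisy, 'db4', level=1)   # keep approximation, clean details
denoised = pywt.waverec2([cA, (denoise_subband(cH),
                               denoise_subband(cV),
                               denoise_subband(cD))], 'db4')
```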

Journal ArticleDOI
19 Oct 2020-Entropy
TL;DR: Most of the influential advances in image-based SOD from both conventional as well as deep learning-based categories have been reviewed in detail and relevant saliency modeling trends with key issues, core techniques, and the scope for future research work have been discussed.
Abstract: Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the field of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly categorized into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both conventional as well as deep learning-based categories have been reviewed in detail. Relevant saliency modeling trends with key issues, core techniques, and the scope for future research work have been discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases for some large-scale public datasets. Different metrics considered for assessment of the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards end.

Journal ArticleDOI
TL;DR: A novel framework for both multiplicative noise suppression and robust contrast enhancement is proposed; its effectiveness is demonstrated on a wide range of clinical ultrasound scans, and it generates superior performance compared with other well-established methods.
Abstract: Speckle noise reduction algorithms are extensively used in the field of ultrasound image analysis with the aim of improving image quality and diagnostic accuracy. However, significant speckle filtering induces blurring, and this requires the enhancement of features and fine details. We propose a novel framework for both multiplicative noise suppression and robust contrast enhancement and demonstrate its effectiveness using a wide range of clinical ultrasound scans. Our approach to noise suppression uses a novel algorithm based on a convolutional neural network that is first trained on synthetically modeled ultrasound images and then applied on real ultrasound videos. The feature improvement stage uses an improved contrast-limited adaptive histogram equalization (CLAHE) method for enhancing texture features, contrast, resolvable details, and image structures to which the human visual system is sensitive in ultrasound video frames. The proposed CLAHE algorithm also considers an automatic system for evaluating the grid size using entropy, and three different target distribution functions (uniform, Rayleigh, and exponential), and interpolation techniques (B-spline, cubic, and Lanczos-3). An extensive comparative study has been performed to find the most suitable distribution and interpolation techniques and also the optimal clip limit for ultrasound video feature enhancement after speckle suppression. Subjective assessments by four radiologists and experimental validation using three quality metrics clearly indicate that the proposed framework generates superior performance compared with other well-established methods. The processing pipeline reduces speckle effectively while preserving essential information and enhancing the overall visual quality and therefore could find immediate applications in real-time ultrasound video segmentation and classification algorithms.
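
A simplified stand-in for the entropy-driven CLAHE described above, using OpenCV; the candidate grid sizes and the "pick the most entropic output" criterion are illustrative assumptions:

```python
import cv2
import numpy as np

def entropy(gray):
    """Shannon entropy of an 8-bit image, in bits."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def auto_clahe(gray, clip_limit=2.0):
    """Apply CLAHE with the tile grid whose output has the highest entropy."""
    best, best_h = None, -1.0
    for g in (4, 8, 16):
        clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(g, g))
        out = clahe.apply(gray)
        h = entropy(out)
        if h > best_h:
            best, best_h = out, h
    return best

frame = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-in for a despeckled frame
enhanced = auto_clahe(frame)
```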

Journal ArticleDOI
TL;DR: The results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons’ receptive field sizes and sampling density that change with eccentricity.
Abstract: Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.

Journal ArticleDOI
TL;DR: It is shown that when CNNs are trained end-to-end they learn to classify images based on whatever feature is predictive of a category within the dataset, which raises doubts over the assumption that simply learning end-to-end in standard CNNs leads to the emergence of representations similar to those of the human visual system.

Journal ArticleDOI
TL;DR: This work uses visual crowding as a well-controlled, specific probe to test global shape computations in feedforward CNNs (ffCNNs) and provides evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons.

Journal ArticleDOI
TL;DR: Experimental results on three public benchmark databases demonstrate that the performance of the proposed perceptual quality measure by spatial continuity (PQSC) is superior to the current blind image quality assessment methods, even better than some full reference image quality Assessment counterparts.
Abstract: In this paper, we propose an effective blind quality assessment method for screen content images (SCIs), called perceptual quality measure by spatial continuity (PQSC). With the center-surround mechanism in the human visual system (HVS), the proposed method extracts the statistical features on chromatic and textural variations in SCIs to measure the visual distortion. First, by considering the chromatic continuity between spatially adjacent pixels, photo-metric invariant chromatic descriptors are extracted as zero-order and first-order features. Second, motivated by the perceptual mechanism that the HVS is sensitive to image texture variation, we employ local ternary pattern operator to effectively depict the spatial continuity of texture. With these extracted chromatic and textural features, we further adopt histogram to compute the statistical chromatic and textural features. Support vector regression (SVR) is used to train the quality prediction model from visual features to human ratings. Experimental results on three public benchmark databases demonstrate that the performance of our method is superior to the current blind image quality assessment methods, even better than some full reference image quality assessment counterparts.
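
A toy sketch of the texture part of such a pipeline: simplified local ternary pattern histograms fed to a support vector regressor; the feature set and parameters are illustrative, not the exact PQSC features:

```python
import numpy as np
from sklearn.svm import SVR

def ltp_histograms(gray, t=5):
    """Simplified local ternary pattern: compare each pixel with its 8 neighbours
    using threshold t, split the ternary code into 'upper' and 'lower' binary
    patterns, and return their normalised histograms as a feature vector."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    for k, (dy, dx) in enumerate(offsets):
        n = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        upper += ((n - c) > t).astype(np.int32) << k     # +1 states
        lower += ((c - n) > t).astype(np.int32) << k     # -1 states
    h_up = np.bincount(upper.ravel(), minlength=256) / upper.size
    h_lo = np.bincount(lower.ravel(), minlength=256) / lower.size
    return np.concatenate([h_up, h_lo])                  # 512-D texture feature

# Train an SVR from features to human ratings (toy data in place of a real SCI database).
images = [np.random.randint(0, 256, (64, 64)) for _ in range(20)]
mos = np.random.uniform(1, 5, 20)
X = np.stack([ltp_histograms(im) for im in images])
model = SVR(kernel='rbf', C=10.0).fit(X, mos)
predicted_quality = model.predict(X[:1])
```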

Journal ArticleDOI
TL;DR: A robust image watermarking algorithm based on a generative adversarial network with two modules, a generator and an adversary, is proposed; it achieves better visual performance and is more robust against noise interference than state-of-the-art schemes.
Abstract: Digital watermarking embeds information bits into digital covers such as images and videos to prove the creator's ownership of the work. In this paper, we propose a robust image watermarking algorithm based on a generative adversarial network. The model includes two modules, a generator and an adversary. The generator is mainly used to generate images embedded with the watermark and to decode images damaged by noise to recover the watermark. The adversary is used to discriminate whether an image is embedded with a watermark and to damage the image with noise. Based on the model Hidden (hiding data with deep networks), we add a high-pass filter in front of the discriminator, making the watermark tend to be embedded in the mid-frequency region of the image. Since the human visual system pays more attention to the central area of the image, we give a higher weight to the image center region and a lower weight to the edge region when calculating the loss between the cover and the embedded image. The watermarked image obtained by this scheme has better visual performance. Experimental results show that the proposed architecture is more robust against noise interference compared with state-of-the-art schemes.
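
The centre-weighted loss idea can be sketched in a few lines of PyTorch; the particular radial fall-off is an illustrative assumption:

```python
import torch

def center_weighted_mse(cover, stego, edge_weight=0.2):
    """MSE between cover and watermarked images, weighted higher at the centre.

    The weight falls off linearly with normalised distance from the centre,
    reflecting that the human visual system attends more to the central region.
    """
    _, _, h, w = cover.shape
    ys = torch.linspace(-1.0, 1.0, h).view(h, 1).expand(h, w)
    xs = torch.linspace(-1.0, 1.0, w).view(1, w).expand(h, w)
    dist = torch.sqrt(xs ** 2 + ys ** 2) / (2.0 ** 0.5)        # 0 at centre, 1 at corners
    weight = 1.0 - (1.0 - edge_weight) * dist                  # 1 at centre, edge_weight at corners
    return ((cover - stego) ** 2 * weight).mean()

loss = center_weighted_mse(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```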

Posted Content
TL;DR: A landmark-guided attention branch is proposed to find and discard corrupted features from occluded regions so that they are not used for recognition; together with a facial region branch, this results in more diverse and discriminative features, enabling the expression recognition system to recover even when the face is partially occluded.
Abstract: Recognizing the expressions of partially occluded faces is a challenging computer vision problem. Previous expression recognition methods, either overlooked this issue or resolved it using extreme assumptions. Motivated by the fact that the human visual system is adept at ignoring the occlusion and focus on non-occluded facial areas, we propose a landmark-guided attention branch to find and discard corrupted features from occluded regions so that they are not used for recognition. An attention map is first generated to indicate if a specific facial part is occluded and guide our model to attend to non-occluded regions. To further improve robustness, we propose a facial region branch to partition the feature maps into non-overlapping facial blocks and task each block to predict the expression independently. This results in more diverse and discriminative features, enabling the expression recognition system to recover even though the face is partially occluded. Depending on the synergistic effects of the two branches, our occlusion-adaptive deep network significantly outperforms state-of-the-art methods on two challenging in-the-wild benchmark datasets and three real-world occluded expression datasets.

Proceedings ArticleDOI
01 Jan 2020
TL;DR: The HVS is explored as a whole, considering not just eye globe movement but also the eyelid, extraocular muscles, cells, and surrounding nerves, to enhance authentication stability; OcuLock, an HVS-based system for reliable and unobservable VR HMD authentication, is presented.
Abstract: The increasing popularity of virtual reality (VR) in a wide spectrum of applications has generated sensitive personal data such as medical records and credit card information. While protecting such data from unauthorized access is critical, directly applying traditional authentication methods (e.g., PIN) through new VR input modalities such as remote controllers and head navigation would cause security issues. The authentication action can be purposefully observed by attackers to infer the authentication input. Unlike any other mobile devices, VR presents immersive experience via a head-mounted display (HMD) that fully covers users’ eye area without public exposure. Leveraging this feature, we explore human visual system (HVS) as a novel biometric authentication tailored for VR platforms. While previous works used eye globe movement (gaze) to authenticate smartphones or PCs, they suffer from a high error rate and low stability since eye gaze is highly dependent on cognitive states. In this paper, we explore the HVS as a whole to consider not just the eye globe movement but also the eyelid, extraocular muscles, cells, and surrounding nerves in the HVS. Exploring HVS biostructure and unique HVS features triggered by immersive VR content can enhance authentication stability. To this end, we present OcuLock, an HVS-based system for reliable and unobservable VR HMD authentication. OcuLock is empowered by an electrooculography (EOG) based HVS sensing framework and a record-comparison driven authentication scheme. Experiments through 70 subjects show that OcuLock is resistant against common types of attacks such as impersonation attack and statistical attack with Equal Error Rates as low as 3.55% and 4.97% respectively. More importantly, OcuLock maintains a stable performance over a 2month period and is preferred by users when compared to other potential approaches.
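
The Equal Error Rate reported above is a standard verification metric; it can be computed from genuine and impostor comparison scores as follows (the toy score distributions stand in for real EOG record comparisons):

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: the operating point where the false accept rate equals the false reject rate."""
    labels = np.concatenate([np.ones_like(genuine_scores), np.zeros_like(impostor_scores)])
    scores = np.concatenate([genuine_scores, impostor_scores])
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.argmin(np.abs(fpr - fnr))          # threshold where FAR is closest to FRR
    return (fpr[idx] + fnr[idx]) / 2

# Toy similarity scores between an EOG probe and the enrolled template.
genuine = np.random.normal(0.8, 0.1, 500)       # same-user comparisons
impostor = np.random.normal(0.5, 0.1, 500)      # different-user comparisons
print(f"EER: {100 * equal_error_rate(genuine, impostor):.2f}%")
```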

Journal ArticleDOI
TL;DR: This work shows that more 'flexible' network architectures, with more layers and a higher degree of nonlinearity, may actually be worse at reproducing visual illusions, suggesting caution when using CNNs to study human vision.

Journal ArticleDOI
TL;DR: This work proposes a general neuroprosthesis framework composed of several task-oriented and visual encoding modules and designs a tool - Neurolight - that allows these models to be interfaced with intracortical microelectrodes in order to create electrical stimulation patterns that can evoke useful perceptions.
Abstract: Visual neuroprostheses, which provide electrical stimulation at several sites of the human visual system, constitute a potential tool for vision restoration for the blind. Scientific and technological progress in the fields of neural engineering and artificial vision comes with new theories and tools that, along with the dawn of modern artificial intelligence, constitute a promising framework for the further development of neurotechnology. In the framework of the development of a Cortical Visual Neuroprosthesis for the blind (CORTIVIS), we are now facing the challenge of developing computationally powerful tools and flexible approaches that will allow us to provide some degree of functional vision to individuals who are profoundly blind. In this work, we propose a general neuroprosthesis framework composed of several task-oriented and visual encoding modules. We address the development and implementation of computational models of the firing rates of retinal ganglion cells and design a tool - Neurolight - that allows these models to be interfaced with intracortical microelectrodes in order to create electrical stimulation patterns that can evoke useful perceptions. In addition, the developed framework allows the deployment of a diverse array of state-of-the-art deep-learning techniques for task-oriented and general image pre-processing, such as semantic segmentation and object detection, in our system's pipeline. To the best of our knowledge, this constitutes the first deep-learning-based system designed to directly interface with the visual brain through an intracortical microelectrode array. We implement the complete pipeline, from obtaining a video stream to developing and deploying task-oriented deep-learning models and predictive models of retinal ganglion cells' encoding of visual inputs, under the control of a neurostimulation device able to send electrical pulse trains to a microelectrode array implanted in the visual cortex.