Journal ArticleDOI

Memorability-based image compression

01 Jul 2019-Iet Image Processing (The Institution of Engineering and Technology)-Vol. 13, Iss: 9, pp 1490-1501
TL;DR: The comparative analysis shows that the memorability-based compression outperforms the state-of-the-art compression techniques.
Abstract: This study is concerned with achieving image compression using the concept of memorability. The authors use the memorability of an image as a perceptual measure during image coding. The proposed approach introduces a region-of-interest-based, memorability-preserving image compression algorithm accomplished via two sub-processes: memorability prediction and image compression. The memorability of images is predicted using convolutional neural network and restricted Boltzmann machine features. Based on these features, a memorability score is calculated for each patch in an image, and these scores are used to generate a memorability map. The memorability map values then drive optimised image compression. To validate the results, an eye-tracking experiment with human participants is performed. The comparative analysis shows that the memorability-based compression outperforms state-of-the-art compression techniques.
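The two-stage pipeline above (predict a per-patch memorability map, then let it steer the coder) can be sketched as follows. The patch size, quality range, and linear score-to-quality mapping are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def memorability_guided_quality(mem_map, patch=8, q_min=20, q_max=95):
    """Map per-pixel memorability scores (0..1) to a per-patch quality grid.

    Patches with higher predicted memorability receive a higher quality
    factor, so memorable regions survive compression better.
    """
    h, w = mem_map.shape
    qualities = np.empty((h // patch, w // patch), dtype=int)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            score = mem_map[i:i + patch, j:j + patch].mean()
            qualities[i // patch, j // patch] = int(round(q_min + score * (q_max - q_min)))
    return qualities

# Toy memorability map: left half unmemorable, right half memorable.
mem = np.zeros((16, 16))
mem[:, 8:] = 1.0
q = memorability_guided_quality(mem)
print(q)  # left-column patches get quality 20, right-column patches 95
```

In the paper the map comes from the CNN/RBM predictor; here it is a hand-made toy array, and the quality grid would be fed to a region-adaptive encoder.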
Citations
Journal ArticleDOI
TL;DR: This review paper attempts to systematically summarize environment perception technology and discuss the new challenges it currently faces; it summarizes the advantages, disadvantages and applicable occasions of several commonly used sensing methods to provide a clear selection guide.
Abstract: Environmental perception technology is the guarantee of the safety of driverless vehicles. There is already a large body of research and reviews on environmental perception, aiming to realize unmanned driving while ensuring the safety of human life. However, the technology faces new challenges in the new era. This review paper attempts to systematically summarize environment perception technology and discuss the new challenges currently faced. To this end, we first summarize the advantages, disadvantages and applicable occasions of several commonly used sensing methods to provide a clear selection guide. The new challenges faced by environmental perception technology are then discussed from three aspects: technology, external environment and applications. Finally, the article points out future development trends and directions of effort for environmental perception technology.

62 citations

Posted Content
TL;DR: In this paper, an efficient video summarization framework is proposed that gives a gist of the entire video in a few key-frames or video skims; frame selection relies on the cognitive judgments of human beings.
Abstract: This paper proposes an efficient video summarization framework that gives a gist of the entire video in a few key-frames or video skims. Existing video summarization frameworks are based on algorithms that use computer vision low-level feature extraction or high-level domain-level extraction. However, despite being the ultimate users of the summarized video, humans remain the most neglected aspect. The proposed paper therefore considers the human's role in summarization and introduces a human visual attention-based summarization technique. To understand human attention behaviour, we designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology. The EEG and eye-tracking data obtained from the experimentation are processed simultaneously and used to segment frames containing useful information from a considerable video volume. Thus, the frame segmentation primarily relies on the cognitive judgments of human beings. Using our approach, a video's volume is reduced by 96.5% while maintaining high precision and high recall. The comparison with state-of-the-art techniques demonstrates that the proposed approach yields ceiling-level performance with reduced computational cost in summarizing the videos.
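A minimal sketch of attention-driven frame selection, assuming per-frame attention scores have already been fused from the EEG and eye-tracking streams; the fusion step itself and the default keep ratio (mirroring the 96.5% reduction) are assumptions for illustration:

```python
import numpy as np

def select_keyframes(attention, keep_ratio=0.035):
    """Keep the top fraction of frames by fused attention score.

    `attention` is a 1-D array of per-frame scores; the default
    keep_ratio of 3.5% mirrors the 96.5% volume reduction reported.
    Returns the kept frame indices in temporal order.
    """
    n_keep = max(1, int(round(len(attention) * keep_ratio)))
    idx = np.argsort(attention)[::-1][:n_keep]  # highest scores first
    return np.sort(idx)

# Toy example: 8 frames, keep the top quarter.
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.95, 0.3, 0.4])
print(select_keyframes(scores, keep_ratio=0.25))  # → [1 5]
```

A real system would compute the scores from fixation density and EEG engagement per frame; thresholding or shot-level grouping could replace the simple top-k rule used here.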

1 citation

Proceedings ArticleDOI
01 Dec 2019
TL;DR: The model is trained on the Dog vs Cat classification task with gradually decreasing blurriness, and the results show that network performance improves significantly under this iterative training.
Abstract: It is claimed that convolutional neural networks are inspired by the human visual system. From the literature on the development of the human visual system, we know that a newborn child initially has blurred vision due to rapid eye movements. This rapid eye movement is termed nystagmus. This paper is concerned with a novel approach to quantifying nystagmus and implementing an artificial system that can mimic the visual learning of a newborn child or a person with nystagmus. To quantify nystagmus, we recorded 10 seconds of eye-movement video from 3 subjects over 10 trials. We estimate the eye-movement frequency by tracking the eye pupil through image processing, which is then used to create a database. To simulate a suitable learning environment, we trained our model on gradually decreasing blurriness on the Dog vs Cat dataset for a classification task. The novelty of the paper is in the type of training, which is inspired by the human visual learning system. The results show that the performance of the network improves significantly when trained iteratively with decreasing levels of blur.
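The blur curriculum can be sketched as below; the Gaussian blur, the starting sigma, and the linear decay schedule are illustrative assumptions rather than the paper's exact protocol:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur implemented with numpy only."""
    if sigma <= 0:
        return img
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)

def curriculum_schedule(epochs, sigma_start=4.0):
    """Blur level per epoch (epochs >= 2): starts heavily blurred and
    ends sharp, mimicking infant visual development."""
    return [sigma_start * (1 - e / (epochs - 1)) for e in range(epochs)]

# Toy curriculum: train on progressively sharper inputs.
img = np.zeros((9, 9))
img[4, 4] = 1.0
for sigma in curriculum_schedule(3, sigma_start=2.0):  # sigmas 2.0, 1.0, 0.0
    blurred = gaussian_blur(img, sigma)
    # train_one_epoch(model, blurred)  # hypothetical training step
```

The key design choice is that each epoch (or stage) sees a sharper version of the same data, so early training captures coarse structure before fine detail is introduced.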

1 citation


Cites background from "Memorability-based image compressio..."

  • ...There are several studies using eye-movement data for reading comprehension [2], memorability prediction [3], and understanding recognition performance [4]....


Posted Content
TL;DR: In this article, the congruence of information gathering strategies between humans and deep neural networks has been examined in a character recognition task, where the authors use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions.
Abstract: Human observers engage in selective information uptake when classifying visual patterns. The same is true of deep neural networks, which currently constitute the best performing artificial vision systems. Our goal is to examine the congruence, or lack thereof, in the information-gathering strategies of the two systems. We have operationalized our investigation as a character recognition task. We have used eye-tracking to assay the spatial distribution of information hotspots for humans via fixation maps and an activation mapping technique for obtaining analogous distributions for deep networks through visualization maps. Qualitative comparison between visualization maps and fixation maps reveals an interesting correlate of congruence. The deep learning model considered similar regions in character, which humans have fixated in the case of correctly classified characters. On the other hand, when the focused regions are different for humans and deep nets, the characters are typically misclassified by the latter. Hence, we propose to use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions. We find that such supervision improves the model's performance significantly and does not require any additional parameters. This approach has the potential to find applications in diverse domains such as medical analysis and surveillance in which explainability helps to determine system fidelity.
References
Journal ArticleDOI
TL;DR: In this article, a structural similarity index for image quality assessment is proposed, based on the degradation of structural information, and validated against both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
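A simplified, single-window version of the SSIM index can be written as follows; the reference implementation instead averages local scores over a sliding 11x11 Gaussian window, so treat this as a sketch of the formula rather than the published algorithm:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two same-sized float images.

    Combines luminance (means), contrast (variances) and structure
    (covariance) terms; c1, c2 stabilise near-zero denominators.
    """
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(32, 32))
print(ssim_global(img, img))               # identical images score 1.0
print(ssim_global(img, 255.0 - img) < 1.0)  # an inverted image scores lower
```

Unlike pixelwise error measures, the covariance term makes the score sensitive to structural distortion rather than uniform shifts in intensity.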

40,609 citations

Journal ArticleDOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
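A heavily simplified center-surround operator in the spirit of this model; the original combines Gaussian pyramids over colour, intensity and orientation channels, whereas the box filters and scales here are illustrative assumptions:

```python
import numpy as np

def center_surround_saliency(intensity, center=1, surround=4):
    """Crude center-surround response: difference between a fine and a
    coarse box-filtered version of an intensity channel, normalised to 1."""
    def box_blur(img, r):
        if r <= 0:
            return img
        k = np.ones(2 * r + 1) / (2 * r + 1)
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, img)
        return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, out)

    sal = np.abs(box_blur(intensity, center) - box_blur(intensity, surround))
    return sal / sal.max() if sal.max() > 0 else sal

# A single conspicuous spot should dominate the saliency map.
img = np.zeros((21, 21))
img[10, 10] = 1.0
sal = center_surround_saliency(img)
print(sal[10, 10])  # the spot location attains the maximal saliency, 1.0
```

In the full system, such maps from multiple channels and scales are normalised and summed into one topographical saliency map, which a winner-take-all network then scans in order of decreasing saliency.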

10,525 citations

Proceedings ArticleDOI
27 Jun 2004
TL;DR: The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum likelihood; the incremental and batch versions have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.
Abstract: Current computational approaches to learning visual object categories require thousands of training images, are slow, cannot learn in an incremental manner and cannot incorporate prior information into the learning process. In addition, no algorithm presented in the literature has been tested on more than a handful of object categories. We present a method for learning object categories from just a few training images. It is quick and it uses prior information in a principled way. We test it on a dataset composed of images of objects belonging to 101 widely varied categories. Our proposed method is based on making use of prior information, assembled from (unrelated) object categories which were previously learnt. A generative probabilistic model is used, which represents the shape and appearance of a constellation of features belonging to the object. The parameters of the model are learnt incrementally in a Bayesian manner. Our incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum likelihood. The incremental and batch versions have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible. Both Bayesian methods outperform maximum likelihood on small training sets.

2,924 citations

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This paper collects eye tracking data of 15 viewers on 1003 images and uses this database as training and testing examples to learn a model of saliency based on low, middle and high-level image features.
Abstract: For many applications in graphics, design, and human-computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. To address this problem, we collected eye tracking data of 15 viewers on 1003 images and use this database as training and testing examples to learn a model of saliency based on low-, middle- and high-level image features. This large database of eye tracking data is publicly available with this paper.

2,093 citations

Journal ArticleDOI
TL;DR: Experimental data are presented that clearly demonstrate the scope of application of peak signal-to-noise ratio (PSNR) as a video quality metric and it is shown that as long as the video content and the codec type are not changed, PSNR is a valid quality measure.
Abstract: Experimental data are presented that clearly demonstrate the scope of application of peak signal-to-noise ratio (PSNR) as a video quality metric. It is shown that as long as the video content and the codec type are not changed, PSNR is a valid quality measure. However, when the content is changed, correlation between subjective quality and PSNR is highly reduced. Hence PSNR cannot be a reliable method for assessing the video quality across different video contents.
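PSNR itself is straightforward to compute; a minimal sketch:

```python
import numpy as np

def psnr(ref, dist, data_range=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a
    distorted image; identical images give infinity."""
    mse = np.mean((ref.astype(float) - dist.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(data_range ** 2 / mse)

ref = np.full((8, 8), 128.0)
noisy = ref + 4.0          # uniform error of 4 -> MSE = 16
print(psnr(ref, noisy))    # 10*log10(255^2/16) ≈ 36.09 dB
```

Because the formula depends only on pixelwise mean squared error, it says nothing about *what* was distorted, which is exactly why its correlation with subjective quality collapses across different video contents.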

1,899 citations