scispace - formally typeset
Author

P. Le Callet

Bio: P. Le Callet is an academic researcher from the University of Nantes. The author has contributed to research in the topics of human visual system models and video quality, has an h-index of 21, and has co-authored 47 publications receiving 2,398 citations. Previous affiliations of P. Le Callet include École polytechnique de l'université de Nantes and the Institut de Recherche en Communications et Cybernétique de Nantes.

Papers
Journal ArticleDOI
TL;DR: This paper presents a coherent computational approach to modeling bottom-up visual attention, based mainly on the current understanding of HVS behavior, which includes contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions.
Abstract: Visual attention is a mechanism which filters out redundant visual information and detects the most relevant parts of our visual field. Automatic determination of the most visually relevant areas would be useful in many applications such as image and video coding, watermarking, video browsing, and quality assessment. Many research groups are currently investigating computational modeling of the visual attention system. The first published computational models were based on some basic and well-understood human visual system (HVS) properties. These models feature a single perceptual layer that simulates only one aspect of the visual system. More recent models integrate complex features of the HVS and simulate hierarchical perceptual representation of the visual input. The bottom-up mechanism is the most common feature found in modern models. This mechanism refers to involuntary attention (i.e., salient spatial visual features that effortlessly or involuntarily attract our attention). This paper presents a coherent computational approach to the modeling of bottom-up visual attention, based mainly on the current understanding of HVS behavior. Contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions are some of the features implemented in this model. The performance of this algorithm is assessed using natural images and experimental measurements from an eye-tracking system. Two well-known metrics (correlation coefficient and Kullback-Leibler divergence) are used to validate this model, and a further metric is also defined. The results from this model are finally compared to those from a reference bottom-up model.

675 citations
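The validation described above compares a model's saliency map against eye-tracking data using a correlation coefficient and the Kullback-Leibler divergence. A minimal sketch of those two metrics follows; the normalization choices and the epsilon guards are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def correlation_coefficient(saliency, fixation):
    """Pearson correlation between a predicted saliency map and a
    human fixation-density map (both 2-D arrays of equal shape)."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    f = (fixation - fixation.mean()) / (fixation.std() + 1e-12)
    return float((s * f).mean())

def kl_divergence(saliency, fixation, eps=1e-12):
    """KL divergence D(fixation || saliency) after normalizing both
    maps into probability distributions over pixels."""
    p = fixation / (fixation.sum() + eps)
    q = saliency / (saliency.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

A perfect prediction yields a correlation near 1 and a KL divergence near 0, which is how such metrics rank competing attention models.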

Journal ArticleDOI
TL;DR: This paper considers the DIBR-based synthesized view evaluation problem, and provides hints for a new objective measure for 3DTV quality assessment.
Abstract: 3DTV technology has brought out new challenges such as the question of synthesized view evaluation. Synthesized views are generated through a depth image-based rendering (DIBR) process. This process induces new types of artifacts whose impact on visual quality has to be identified considering various contexts of use. While visual quality assessment has been the subject of many studies in the last 20 years, there are still some unanswered questions regarding new technological improvements. DIBR is bringing new challenges mainly because it deals with geometric distortions. This paper considers the DIBR-based synthesized view evaluation problem. Different experiments have been carried out. They question the protocols of subjective assessment and the reliability of the objective quality metrics in the context of 3DTV, in these specific conditions (DIBR-based synthesized views), and they consist of assessing seven different view synthesis algorithms through subjective and objective measurements. Results show that usual metrics are not sufficient for assessing 3-D synthesized views, since they do not correctly render human judgment. Synthesized views contain specific artifacts located around the disoccluded areas, but usual metrics seem to be unable to express the degree of annoyance perceived in the whole image. This study provides hints for a new objective measure. Two approaches are proposed: the first one is based on the analysis of the shifts of the contours of the synthesized view; the second one is based on the computation of a mean SSIM score of the disoccluded areas.

218 citations
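The second approach above, a mean SSIM score restricted to the disoccluded areas, might be sketched as follows. The blockwise single-window SSIM and the 8x8 block size are simplifying assumptions; standard SSIM uses an 11x11 Gaussian-weighted sliding window, and the paper's exact masking of disocclusions is not reproduced here.

```python
import numpy as np

def ssim_block(x, y, data_range=255.0):
    """Simplified single-window SSIM between two same-size blocks."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def disoccluded_ssim(reference, synthesized, disocclusion_mask, block=8):
    """Mean SSIM over blocks that intersect the disoccluded areas,
    given a binary mask marking those areas."""
    h, w = reference.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            if disocclusion_mask[i:i + block, j:j + block].any():
                scores.append(ssim_block(reference[i:i + block, j:j + block],
                                         synthesized[i:i + block, j:j + block]))
    return float(np.mean(scores)) if scores else 1.0
```

Restricting the average to disoccluded blocks is what lets the score reflect the localized DIBR artifacts that whole-image metrics dilute.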

Proceedings ArticleDOI
12 Nov 2007
TL;DR: Based on the rationale that an artefact is likely more annoying in a salient region than in other areas, the results show that applying visual attention to image quality assessment is not trivial, even with the ground truth.
Abstract: The aim of objective image quality assessment is to find an automatic algorithm that evaluates the quality of pictures or video as a human observer would. To reach this goal, researchers try to simulate the Human Visual System (HVS). Visual attention is a main feature of the HVS, but few studies have been done on using it in image quality assessment. In this work, we investigate the use of visual attention information in the final pooling step of objective quality metrics. The rationale of this choice is that an artefact is likely more annoying in a salient region than in other areas. To shed light on this point, a quality assessment campaign was conducted during which eye movements were recorded. The results show that applying visual attention to image quality assessment is not trivial, even with the ground truth.

188 citations
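The pooling idea investigated above, weighting local distortions by saliency so that artefacts in salient regions count more, can be sketched in a few lines. The linear weighting is an illustrative assumption, and the paper's own finding is that such a direct application is not trivial in practice.

```python
import numpy as np

def saliency_weighted_pooling(distortion_map, saliency_map, eps=1e-12):
    """Pool a per-pixel distortion map into one score, weighting each
    pixel by its normalized visual saliency."""
    w = saliency_map / (saliency_map.sum() + eps)
    return float((w * distortion_map).sum())
```

With a uniform saliency map this reduces to a plain spatial mean; concentrating saliency on heavily distorted regions raises the pooled score, which is the intended behaviour.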

Journal ArticleDOI
TL;DR: This paper designs a perceptual full-reference video quality assessment metric focused on the temporal evolution of spatial distortions, validated with a dataset built from video sequences of various contents.
Abstract: Temporal distortions such as flickering, jerkiness, and mosquito noise play a fundamental part in video quality assessment. A temporal distortion is commonly defined as the temporal evolution, or fluctuation, of the spatial distortion on a particular area which corresponds to the image of a specific object in the scene. Perception of spatial distortions over time can be largely modified by their temporal changes, such as increases or decreases in the distortions, or periodic changes in the distortions. In this paper, we have designed a perceptual full-reference video quality assessment metric by focusing on the temporal evolution of the spatial distortions. As the perception of temporal distortions is closely linked to visual attention mechanisms, we have chosen to first evaluate the temporal distortion at the eye fixation level. In this short-term temporal pooling, the video sequence is divided into spatio-temporal segments in which the spatio-temporal distortions are evaluated, resulting in spatio-temporal distortion maps. Afterwards, the global quality score of the whole video sequence is obtained by long-term temporal pooling, in which the spatio-temporal maps are spatially and temporally pooled. Consistent improvement over existing objective video quality assessment methods is observed. Our validation has been realized with a dataset built from video sequences of various contents.

170 citations
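The two-stage pooling described above (fixation-level short-term pooling of per-frame distortion maps, then long-term pooling into a single score) can be sketched as below. The fixed segment length and the Minkowski exponent are illustrative assumptions standing in for the paper's fixation-based segmentation and trained pooling.

```python
import numpy as np

def short_term_pooling(distortion_maps, segment_len=10):
    """Average per-frame spatial distortion maps inside fixed-length
    temporal segments (a stand-in for eye-fixation-level pooling)."""
    t = len(distortion_maps)
    return [np.mean(distortion_maps[i:i + segment_len], axis=0)
            for i in range(0, t, segment_len)]

def long_term_pooling(segment_maps, p=2.0):
    """Collapse segment maps into one global score: spatial mean per
    segment, then a Minkowski (L-p) mean over the segment scores."""
    seg_scores = np.array([m.mean() for m in segment_maps])
    return float(np.mean(seg_scores ** p) ** (1.0 / p))
```

An exponent p > 1 makes short bursts of strong distortion dominate the global score, matching the observation that viewers penalize transient but severe artifacts.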

Proceedings ArticleDOI
24 Nov 2003
TL;DR: A new method to evaluate the quality of distorted images, based on a comparison between structural information extracted from the distorted image and from the original image; its results are highly correlated with human judgments (mean opinion score).
Abstract: This paper presents a new method to evaluate the quality of distorted images. This method is based on a comparison between the structural information extracted from the distorted image and from the original image. The interest of our method is that it uses reduced references containing perceptual structural information. First, a quick overview of image quality evaluation methods is given. Then the implementation of our human visual system (HVS) model is detailed. Finally, results are given for quality evaluation of JPEG and JPEG2000 coded images. They show that our method provides results which are highly correlated with human judgments (mean opinion score). This method has been implemented in an application available on the Internet.

142 citations
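The reduced-reference idea above (storing a compact structural description of the original and comparing it with the same description extracted from the distorted image) might look like the following sketch. The gradient-orientation histogram is a hypothetical stand-in for the paper's HVS-based structural features, which are not reproduced here.

```python
import numpy as np

def structural_signature(image, bins=16):
    """Compact reduced reference: a normalized histogram of gradient
    orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi),
                           weights=mag)
    return hist / (hist.sum() + 1e-12)

def rr_quality(ref_signature, distorted_image, bins=16):
    """Similarity in [0, 1] (1 = identical structure) between the stored
    reference signature and the distorted image's signature."""
    d = structural_signature(distorted_image, bins)
    return float(1.0 - 0.5 * np.abs(ref_signature - d).sum())
```

Only the short signature, not the full reference image, needs to travel with the content, which is what distinguishes reduced-reference from full-reference metrics.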


Cited by
Journal ArticleDOI
TL;DR: A set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, are proposed to describe a salient object locally, regionally, and globally.
Abstract: In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task where we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object from sequential images by introducing the dynamic salient features. We collected a large image database containing tens of thousands of carefully labeled images by multiple users and a video segment database, and conducted a set of experiments over them to demonstrate the effectiveness of the proposed approach.

2,319 citations
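The multiscale contrast feature named above can be approximated by the following sketch: contrast is computed against immediate neighbours at each level of a simple 2x-downsampled pyramid and accumulated at full resolution. The pyramid construction, neighbourhood size, and pixel-repetition upsampling are illustrative assumptions rather than the paper's Gaussian-pyramid formulation.

```python
import numpy as np

def multiscale_contrast(image, n_scales=3, radius=1):
    """Accumulate squared differences to neighbouring pixels over a
    crude 2x-downsampled pyramid (coarser levels capture larger
    structures), upsampled back to full resolution."""
    h, w = image.shape
    feature = np.zeros((h, w))
    level = image.astype(float)
    for s in range(n_scales):
        contrast = np.zeros_like(level)
        for dy in (-radius, 0, radius):
            for dx in (-radius, 0, radius):
                if dy == 0 and dx == 0:
                    continue
                shifted = np.roll(np.roll(level, dy, axis=0), dx, axis=1)
                contrast += (level - shifted) ** 2
        rep = 2 ** s
        # upsample by pixel repetition back to (h, w) and accumulate
        feature += np.kron(contrast, np.ones((rep, rep)))[:h, :w]
        level = level[::2, ::2]  # next (coarser) pyramid level
    return feature
```

Pixels that differ from their surroundings across several scales score highly, which is the intuition behind using contrast as a local saliency cue.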

Journal ArticleDOI
TL;DR: A taxonomy of nearly 65 models of attention provides a critical comparison of approaches, their capabilities, and shortcomings, and addresses several challenging issues with models, including biological plausibility of the computations, correlation with eye movement datasets, bottom-up and top-down dissociation, and constructing meaningful performance measures.
Abstract: Modeling visual attention (particularly stimulus-driven, saliency-based attention) has been a very active research area over the past 25 years. Many different models of attention are now available which, aside from lending theoretical contributions to other fields, have demonstrated successful applications in computer vision, mobile robotics, and cognitive systems. Here we review, from a computational perspective, the basic concepts of attention implemented in these models. We present a taxonomy of nearly 65 models, which provides a critical comparison of approaches, their capabilities, and shortcomings. In particular, 13 criteria derived from behavioral and computational studies are formulated for qualitative comparison of attention models. Furthermore, we address several challenging issues with models, including biological plausibility of the computations, correlation with eye movement datasets, bottom-up and top-down dissociation, and constructing meaningful performance measures. Finally, we highlight current research trends in attention modeling and provide insights for future work.

1,817 citations

Journal ArticleDOI
TL;DR: A new type of saliency is proposed—context-aware saliency—which aims at detecting the image regions that represent the scene, and a detection algorithm is presented which is based on four principles observed in the psychological literature.
Abstract: We propose a new type of saliency—context-aware saliency—which aims at detecting the image regions that represent the scene. This definition differs from previous definitions whose goal is to either identify fixation points or detect the dominant object. In accordance with our saliency definition, we present a detection algorithm which is based on four principles observed in the psychological literature. The benefits of the proposed approach are evaluated in two applications where the context of the dominant objects is just as essential as the objects themselves. In image retargeting, we demonstrate that using our saliency prevents distortions in the important regions. In summarization, we show that our saliency helps to produce compact, appealing, and informative summaries.

1,708 citations

Journal ArticleDOI
TL;DR: A quality assessment method, most apparent distortion (MAD), explicitly models two separate strategies: local luminance and contrast masking estimate detection-based perceived distortion in high-quality images, while changes in the local statistics of spatial-frequency components estimate appearance-based perceived distortion in low-quality images.
Abstract: The mainstream approach to image quality assessment has centered around accurately modeling the single most relevant strategy employed by the human visual system (HVS) when judging image quality (e.g., detecting visible differences, and extracting image structure/information). In this work, we suggest that a single strategy may not be sufficient; rather, we advocate that the HVS uses multiple strategies to determine image quality. For images containing near-threshold distortions, the image is most apparent, and thus the HVS attempts to look past the image and look for the distortions (a detection-based strategy). For images containing clearly visible distortions, the distortions are most apparent, and thus the HVS attempts to look past the distortion and look for the image's subject matter (an appearance-based strategy). Here, we present a quality assessment method [most apparent distortion (MAD)], which attempts to explicitly model these two separate strategies. Local luminance and contrast masking are used to estimate detection-based perceived distortion in high-quality images, whereas changes in the local statistics of spatial-frequency components are used to estimate appearance-based perceived distortion in low-quality images. We show that a combination of these two measures can perform well in predicting subjective ratings of image quality.

1,651 citations
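The combination of the two strategies above can be sketched as an adaptively weighted geometric mean, where the weight shifts toward the appearance-based term as the detection-based distortion grows. The logistic-style weight and the constants below are illustrative assumptions in the spirit of the published model, not its trained parameters.

```python
def mad_combine(d_detect, d_appear, beta1=0.467, beta2=0.130):
    """Weighted geometric mean of detection-based and appearance-based
    distortion scores; alpha decreases (shifting weight to the
    appearance term) as the visible distortion d_detect grows."""
    alpha = 1.0 / (1.0 + beta1 * d_detect ** beta2)
    return d_detect ** alpha * d_appear ** (1.0 - alpha)
```

When both scores agree the combination returns that common value, and a pristine image (zero detectable distortion) maps to zero overall distortion.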

Journal ArticleDOI
TL;DR: Models designed specifically for salient object detection are found to generally work better than models from closely related areas, which suggests a precise definition and an appropriate treatment of this problem that distinguishes it from other problems.
Abstract: We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

1,372 citations