
Showing papers by "Mohamed-Chaker Larabi published in 2022"


Journal ArticleDOI
18 Jul 2022
TL;DR: A patch-based scheme dedicated to 360-degree IQA is proposed, comprising latitude-based patch selection and extraction to account for the importance of the equatorial region, data normalization, a CNN-based architecture, and a weighted average pooling of predicted local qualities.
Abstract: Since the introduction of 360-degree images, a significant number of deep learning based image quality assessment (IQA) models have been introduced. Most of them are based on multichannel architectures where several convolutional neural networks (CNNs) are used together. Despite the competitive results, these models come with a higher cost in terms of complexity. To significantly reduce the complexity and ease the training of the CNN model, this paper proposes a patch-based scheme dedicated to 360-degree IQA. Our framework includes latitude-based patch selection and extraction to account for the importance of the equatorial region, data normalization, a CNN-based architecture, and a weighted average pooling of predicted local qualities. We evaluate the proposed model on two widely used databases and show its superiority over state-of-the-art models, even multichannel ones. Furthermore, the cross-database assessment reveals good generalization ability, demonstrating the robustness of the proposed model.
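The latitude-aware pooling described above can be sketched as follows. This is a minimal illustration, assuming a cosine weighting of patches by latitude (a common choice for equirectangular content; the paper's exact weighting may differ):

```python
import numpy as np

def latitude_weight(lat_deg):
    """Cosine weighting: patches near the equator (lat = 0) get the
    highest weight, reflecting the importance of the equatorial region.
    Illustrative assumption, not necessarily the paper's exact scheme."""
    return np.cos(np.deg2rad(lat_deg))

def pooled_quality(patch_scores, patch_lats_deg):
    """Weighted average pooling of per-patch quality predictions."""
    w = latitude_weight(np.asarray(patch_lats_deg, dtype=float))
    s = np.asarray(patch_scores, dtype=float)
    return float(np.sum(w * s) / np.sum(w))
```

For example, a patch at the equator contributes twice the weight of a patch at 60 degrees latitude, so `pooled_quality([4.0, 3.0], [0.0, 60.0])` is pulled toward 4.0.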

3 citations


Journal ArticleDOI
01 Nov 2022
TL;DR: Cross-database evaluations demonstrate that the nature and variety of the content impact the generalization ability of the models, and it is shown that conclusions coming from other image processing communities may not hold for IQA.
Abstract: In this paper, we conduct an extensive study on the use of pre-trained convolutional neural networks (CNNs) for omnidirectional image quality assessment (IQA). To cope with the lack of available IQA databases, transfer learning from seven pre-trained CNN models is investigated over retraining on standard 2D databases. In addition, we explore the influence of various image representations and training strategies on the model’s performance. A comparison of the use of projected versus radial content, and multichannel CNN versus patch-wise training is also covered. The experimental results on two publicly available databases are used to draw conclusions about which strategy best fits the visual quality prediction and at which computational cost. The analysis shows that retraining CNN models on 2D IQA databases improves the prediction accuracy. The latter and the required computational time are found to be significantly affected by the training strategy. Cross-database evaluations demonstrate that the nature and variety of the content impact the generalization ability of the models. Finally, we show that conclusions coming from other image processing communities may not hold for IQA. The provided discussion shall provide insights and recommendations when using pre-trained CNNs for omnidirectional IQA.

1 citation


Proceedings ArticleDOI
16 Oct 2022
TL;DR: Transfer learning from two pre-trained versions of vision transformers (ViTs) and two ConvNets (ResNet-50 and EfficientNet-B3) is investigated for 360-degree image quality assessment.
Abstract: Currently, there are debates on the accuracy of vision transformers (ViTs) compared to ConvNets for image processing tasks. Image quality assessment (IQA), and particularly 360-degree IQA, lacks insights regarding their performance and robustness compared to the widely used ConvNets. This paper investigates transfer learning from two pre-trained versions of ViTs and two ConvNets (ResNet-50 and EfficientNet-B3) for 360-degree image quality assessment, with a focus on (i) prediction accuracy and generalization ability and (ii) their adaptation to the specific characteristics of 360-degree images. Furthermore, the influence of adaptive patch sampling compared to simply using equirectangular content is analyzed with each architecture. Experimental findings on publicly available datasets (OIQA, CVIQ and MVAQD) show the superiority of ResNet-50 over ViTs and EfficientNet-B3 while requiring less computational time. Also, the base version of ViTs outperforms the larger one. Finally, except for CVIQ, both ViTs and ConvNets benefit from the adaptive sampling strategy, highlighting the interest of taking 360-degree characteristics into account.

Proceedings ArticleDOI
16 Oct 2022
TL;DR: An empirical study first investigates the effect of input normalization on model performance and then determines which existing normalization method best fits IQA, through statistical comparison with three basic scaling methods.
Abstract: Prior to training convolutional neural networks (CNNs) for image quality assessment (IQA), input normalization is sometimes recommended and sometimes not, according to the literature. Although input normalization is known to improve model training and helps in learning important features, it may result in the loss of information such as contrast, color, and luminance. To better explore this issue, we conduct an empirical study to first investigate the effect of normalization on model performance and then determine which normalization method best fits IQA among existing ones. The performances of the selected methods are statistically compared with three basic scaling methods. The effect of applying normalization is found to be statistically significant on three IQA databases. The experimental results demonstrate the performance improvement both over whole databases and per individual degradation.
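Common input scaling choices of the kind the study compares can be sketched as below. These three are illustrative examples of widely used methods, not necessarily the exact set evaluated in the paper:

```python
import numpy as np

def zscore(x):
    """Standardize to zero mean, unit variance (per image)."""
    return (x - x.mean()) / (x.std() + 1e-8)

def minmax(x):
    """Rescale to [0, 1]; absolute luminance information is lost."""
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

def mean_center(x):
    """Subtract the mean only; relative contrast is preserved."""
    return x - x.mean()
```

The trade-off mentioned in the abstract is visible here: `minmax` maps every image to the same range, discarding its original contrast and luminance, while `mean_center` keeps the contrast but not a common scale.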


Journal ArticleDOI
TL;DR: A group of 25 participants provided their gaze information wearing Tobii Pro Glasses 2 at a museum; the corresponding video stream is clipped into 20 videos corresponding to 20 museum exhibits and compensated for users' unwanted head movements.
Abstract: Egocentric vision data captures the first-person perspective of a visual stimulus and helps study gaze behavior in more natural contexts. In this work, we propose a new dataset collected in a free-viewing style with an end-to-end data processing pipeline. A group of 25 participants provided their gaze information wearing Tobii Pro Glasses 2 at a museum. The gaze stream is post-processed to handle missing or incoherent information. The corresponding video stream is clipped into 20 videos corresponding to 20 museum exhibits and compensated for users' unwanted head movements. Based on the velocity of directional shifts of the eye, the I-VT algorithm classifies the eye movements into either fixations or saccades. Representative scanpaths are built by generalizing multiple viewers' gazing styles for all exhibits. The dataset therefore contains both the individual gazing styles of many viewers and the generic trend they follow towards a museum exhibit. The application of our dataset is demonstrated by characterizing the inherent gaze dynamics using the state trajectory estimator based on ancestor sampling (STEAS) model to solve gaze data classification and retrieval problems. This dataset can also be used for addressing problems like segmentation and summarization using both conventional machine learning and deep learning approaches.
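The I-VT step mentioned above is simple enough to sketch: each gaze sample is labeled by comparing its angular velocity to a threshold. The 30 deg/s default below is a common choice in the eye-tracking literature, not necessarily the paper's setting:

```python
import numpy as np

def ivt_classify(velocities_deg_s, threshold=30.0):
    """Velocity-Threshold Identification (I-VT): samples whose angular
    velocity exceeds the threshold are saccades, the rest fixations."""
    v = np.asarray(velocities_deg_s, dtype=float)
    return np.where(v > threshold, "saccade", "fixation")
```

In practice, consecutive samples with the same label are then merged into fixation and saccade events, and very short fixations are typically discarded.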

Journal ArticleDOI
TL;DR: A patch-based training scheme is proposed; to account for the non-uniform quality distribution across a scene, a weighted pooling of patches' scores is applied.
Abstract: 360-degree image quality assessment using deep neural networks is usually designed with a multichannel paradigm exploiting possible viewports. This is mainly due to the high resolution of such images and the unavailability of ground-truth labels (subjective quality scores) for individual viewports. The multichannel model is hence trained to predict the score of the whole 360-degree image. However, this comes with a high complexity cost, as multiple neural networks run in parallel. In this paper, a patch-based training is proposed instead. To account for the non-uniform quality distribution across a scene, a weighted pooling of patches' scores is applied. The latter relies on natural scene statistics in addition to perceptual properties related to immersive environments.
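One way an NSS-informed pooling of this kind could look is sketched below, using local RMS contrast as a hypothetical statistics-based weight (the paper combines natural scene statistics with perceptual properties of immersive viewing; the exact weighting differs):

```python
import numpy as np

def contrast_weights(patches):
    """Hypothetical NSS-style weighting: weight each patch by its local
    RMS contrast (standard deviation), so textured patches, where
    distortions are typically more visible, count more.
    patches: array of shape (n, h, w)."""
    p = np.asarray(patches, dtype=float)
    w = p.std(axis=(1, 2)) + 1e-8
    return w / w.sum()

def pooled_score(scores, patches):
    """Weighted pooling of per-patch quality scores."""
    w = contrast_weights(patches)
    return float(np.sum(w * np.asarray(scores, dtype=float)))
```

With this weighting, a flat (zero-contrast) patch contributes almost nothing to the pooled score, while a textured patch dominates it.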

Proceedings ArticleDOI
16 Oct 2022
TL;DR: A two-pathway CMRNet (TP-CMRNet) with effective feature integration of spatial and temporal domains at multiple scales is proposed for video saliency prediction.
Abstract: Existing dynamic saliency prediction models face challenges like inefficient spatio-temporal feature integration, ineffective multi-scale feature extraction, and lacking domain adaptation because of huge pre-trained backbone networks. In this paper, we propose a two-pathway architecture with effective feature integration of spatial and temporal domains at multiple scales for video saliency prediction. Frame and optical flow pathways extract features from video frames and optical flow maps, respectively, using a series of cross-concatenated multi-scale residual (CMR) blocks. We name this network two-pathway CMRNet (TP-CMRNet). Every CMR block is followed by a feature fusion and attention module for merging features from the two pathways and guiding the network to weigh salient regions, respectively. A bi-directional LSTM module learns the task by looking at previous and next video frames. We build a simple decoder for feature reconstruction into the final attention map. TP-CMRNet is comprehensively evaluated using three benchmark datasets: DHF1K, Hollywood-2, and UCF Sports. We observe that our model performs on par with other deep dynamic models. In particular, we outperform all the other models with fewer model parameters and lower inference time.
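The cross-concatenation and fusion-with-attention ideas above can be illustrated in a toy form. This is a bare sketch on numpy feature maps, assuming channel-wise concatenation and a sigmoid gate; the actual CMR blocks additionally apply multi-scale residual convolutions:

```python
import numpy as np

def cross_concat(frame_feat, flow_feat):
    """Cross-concatenation: each pathway receives its own features
    stacked with the other pathway's, so spatial and temporal cues
    are exchanged at every block. Features are (channels, h, w)."""
    to_frame = np.concatenate([frame_feat, flow_feat], axis=0)
    to_flow = np.concatenate([flow_feat, frame_feat], axis=0)
    return to_frame, to_flow

def fuse_with_attention(frame_feat, flow_feat):
    """Toy fusion-and-attention step: sum the two pathways, then apply
    a sigmoid gate so strongly responding (salient) locations are
    emphasized. Illustrative only."""
    merged = frame_feat + flow_feat
    gate = 1.0 / (1.0 + np.exp(-merged))  # sigmoid attention in (0, 1)
    return gate * merged
```

In the real network the concatenated tensors feed convolutional layers that reduce the channel count back down, and the learned attention replaces this fixed sigmoid gate.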