Author

Weixia Zhang

Other affiliations: Wuhan University
Bio: Weixia Zhang is an academic researcher from Shanghai Jiao Tong University. The author has contributed to research in topics: Image quality & Convolutional neural network. The author has an h-index of 7 and has co-authored 19 publications receiving 235 citations. Previous affiliations of Weixia Zhang include Wuhan University.

Papers
Journal ArticleDOI
TL;DR: A deep bilinear model for blind image quality assessment is proposed that works for both synthetically and authentically distorted images and achieves state-of-the-art performance on both synthetic and authentic IQA databases.
Abstract: We propose a deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images. Our model consists of two streams of deep convolutional neural networks (CNNs), specializing in the two distortion scenarios separately. For synthetic distortions, we first pre-train a CNN to classify the distortion type and level of an input image, whose ground-truth label is readily available at a large scale. For authentic distortions, we make use of a CNN pre-trained for image classification (VGG-16). The two feature sets are bilinearly pooled into one representation for final quality prediction. We fine-tune the whole network on the target databases using a variant of stochastic gradient descent. Extensive experimental results show that the proposed model achieves state-of-the-art performance on both synthetic and authentic IQA databases. Furthermore, we verify the generalizability of our method on the large-scale Waterloo Exploration Database, and demonstrate its competitiveness using the group maximum differentiation competition methodology.
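
As a rough illustration of the bilinear pooling step described above, here is a minimal sketch in PyTorch; the function name, shapes, and normalization details are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_s, feat_a):
    # feat_s: (N, C1, H, W) features from the synthetic-distortion stream
    # feat_a: (N, C2, H, W) features from the authentic-distortion stream
    # (e.g., VGG-16), assumed here to share the same spatial size as feat_s
    N, C1, H, W = feat_s.shape
    C2 = feat_a.shape[1]
    s = feat_s.view(N, C1, H * W)
    a = feat_a.view(N, C2, H * W)
    # outer product of the two feature sets, averaged over spatial locations
    x = torch.bmm(s, a.transpose(1, 2)).view(N, C1 * C2) / (H * W)
    # signed square root and L2 normalization, common post-processing for bilinear features
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-8)
    return F.normalize(x, dim=1)
```

The pooled vector would then feed a fully connected layer that regresses the final quality score.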

390 citations

Journal ArticleDOI
TL;DR: In this paper, a unified blind image quality assessment (BIQA) model is developed and an approach to training it for both synthetic and realistic distortions is proposed to confront the cross-distortion-scenario challenge.
Abstract: Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shift between images simulated in the laboratory and images captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an approach to training it for both synthetic and realistic distortions. We first sample pairs of images from individual IQA databases and compute a probability that the first image of each pair is of higher quality. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in both the laboratory and the wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.
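
The pairwise supervision and fidelity loss described above can be sketched as follows, assuming the network predicts a quality mean and an uncertainty for each image (function and variable names are illustrative, not the authors' code):

```python
import math
import torch

def pairwise_probability(mu1, sigma1, mu2, sigma2):
    # Thurstone-style probability that image 1 is of higher quality than
    # image 2, given per-image quality means and uncertainties from the network
    z = (mu1 - mu2) / torch.sqrt(sigma1 ** 2 + sigma2 ** 2 + 1e-8)
    return 0.5 * (1.0 + torch.erf(z / math.sqrt(2.0)))

def fidelity_loss(p, p_hat, eps=1e-8):
    # fidelity loss between the ground-truth probability p and the prediction
    # p_hat; it reaches its minimum of 0 when p_hat matches p exactly
    return 1.0 - torch.sqrt(p * p_hat + eps) - torch.sqrt((1.0 - p) * (1.0 - p_hat) + eps)
```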

89 citations

Journal ArticleDOI
TL;DR: A unified BIQA model is developed along with an approach to training it for both synthetic and realistic distortions, and the universality of the proposed training strategy is demonstrated by using it to improve existing BIQA models.
Abstract: Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shifts between images simulated in the laboratory and images captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an effective approach to training it for both synthetic and realistic distortions. We first sample pairs of images from the same IQA databases and compute a probability that one image of each pair is of higher quality as the supervisory signal. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in both the laboratory and the wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.
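
The hinge constraint on uncertainty mentioned above might look like the following sketch; the margin and the exact direction of the constraint are assumptions, since the precise formulation is given in the paper:

```python
import torch

def uncertainty_hinge(sigma1, sigma2, margin=1.0):
    # hinge penalty that keeps the summed pair variance from collapsing below
    # a margin; an assumed stand-in for the paper's uncertainty regularizer
    return torch.relu(margin - (sigma1 ** 2 + sigma2 ** 2)).mean()
```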

70 citations

Proceedings ArticleDOI
01 Oct 2020
TL;DR: A BIQA model and an approach to training it on multiple IQA databases (of different distortion scenarios) simultaneously are developed, demonstrating that the model optimized by the proposed training strategy is effective in blindly assessing image quality in both the laboratory and the wild, outperforming previous BIQA methods by a large margin.
Abstract: Computational models for blind image quality assessment (BIQA) are typically trained in well-controlled laboratory environments with limited generalizability to realistically distorted images. Similarly, BIQA models optimized for images captured in the wild cannot adequately handle synthetically distorted images. To face the cross-distortion-scenario challenge, we develop a BIQA model and an approach to training it on multiple IQA databases (of different distortion scenarios) simultaneously. A key step in our approach is to create and combine image pairs within individual databases as the training set, which effectively bypasses the issue of perceptual scale realignment. We compute a continuous quality annotation for each pair from the corresponding human opinions, indicating the probability of one image having better perceptual quality. We train a deep neural network for BIQA over the training set of massive image pairs by minimizing the fidelity loss. Experiments on six IQA databases demonstrate that the model optimized by the proposed training strategy is effective in blindly assessing image quality in both the laboratory and the wild, outperforming previous BIQA methods by a large margin.
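
The continuous quality annotation for each intra-database pair could be computed as in this sketch, assuming each image carries a mean opinion score (MOS) and its standard deviation; the Gaussian model and variable names are assumptions for illustration:

```python
from scipy.stats import norm

def pair_annotation(mos1, std1, mos2, std2):
    # probability that image 1 has better perceptual quality than image 2,
    # under a Gaussian model of the human opinion scores
    return norm.cdf((mos1 - mos2) / ((std1 ** 2 + std2 ** 2) ** 0.5 + 1e-8))
```

Because both images of a pair come from the same database, their opinion scores share one scale, which is why pairing within databases sidesteps perceptual scale realignment.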

45 citations

Journal ArticleDOI
TL;DR: A cross-modal grounding module is designed, which is composed of two complementary attention mechanisms, to equip the agent with a better ability to track the correspondence between the textual and visual modalities and further exploit the advantages of both these two learning schemes via adversarial learning.
Abstract: The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments according to a given language instruction. The challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the meaningful parts of the language instruction that correspond to the dynamically varying visual environment; second, during training, the agent usually imitates expert demonstrations, i.e., the shortest path to the target location specified by the associated language instruction. Due to the discrepancy in action selection between training and inference, an agent trained solely by imitation learning does not perform well. Existing VLN approaches address this issue by sampling the next action from the predicted probability distribution during training. This allows the agent to explore diverse routes in the environments, yielding higher success rates. Nevertheless, without being presented with the golden shortest navigation paths during training, the agent may arrive at the target location through an unexpectedly long route. To overcome these challenges, we design a cross-modal grounding module, composed of two complementary attention mechanisms, to equip the agent with a better ability to track the correspondence between the textual and visual modalities. We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference. We further exploit the advantages of both learning schemes via adversarial learning. Extensive experimental results on the Room-to-Room (R2R) benchmark dataset demonstrate that the proposed learning scheme is generalizable and complementary to prior arts. Our method performs well against state-of-the-art approaches in terms of effectiveness and efficiency.
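
A generic stand-in for the two complementary attention mechanisms is scaled dot-product attention applied in both directions between the textual and visual features; the sketch below is an assumption about the general shape of such a module, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def attend(query, context):
    # scaled dot-product attention of query tokens over context tokens
    scores = query @ context.transpose(-2, -1) / (query.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ context

def cross_modal_grounding(text_feats, visual_feats):
    # text-grounded visual features and vision-grounded textual features,
    # assuming both modalities are projected to a common feature dimension
    grounded_visual = attend(text_feats, visual_feats)
    grounded_text = attend(visual_feats, text_feats)
    return grounded_visual, grounded_text
```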

34 citations


Cited by
Journal ArticleDOI
TL;DR: This work presents a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality-scored images, and proposes a novel deep learning model (KonCept512) that shows excellent generalization beyond the test set.
Abstract: Deep learning methods for image quality assessment (IQA) are limited by the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and for annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality-scored images. It is the first in-the-wild database aiming for ecological validity, with regard to the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel deep learning model (KonCept512), which shows excellent generalization beyond the test set (0.921 SROCC) to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture and is trained at a higher resolution than previous models (512×384). Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.
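
The SROCC figures quoted above are Spearman rank-order correlation coefficients between predicted and subjective scores; for reference, they can be computed as in this small sketch (the arrays are placeholders):

```python
import numpy as np
from scipy.stats import spearmanr

# placeholder arrays standing in for model predictions and mean opinion scores
predicted = np.array([62.1, 48.3, 75.0, 33.9])
mos = np.array([60.0, 50.2, 78.1, 30.5])

srocc, _ = spearmanr(predicted, mos)  # rank-order correlation in [-1, 1]
```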

299 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes a self-adaptive hyper network architecture to blindly assess image quality in the wild, which not only outperforms state-of-the-art methods on challenging authentic image databases but also achieves competitive performance on synthetic image databases, though it is not explicitly designed for the synthetic task.
Abstract: Blind image quality assessment (BIQA) for authentically distorted images has always been a challenging problem, since images captured in the wild include varied content and diverse types of distortions. The vast majority of prior BIQA methods focus on predicting synthetic image quality, but fail when applied to real-world distorted images. To deal with this challenge, we propose a self-adaptive hyper network architecture to blindly assess image quality in the wild. We separate the IQA procedure into three stages: content understanding, perception-rule learning, and quality prediction. After extracting image semantics, a perception rule is established adaptively by a hyper network and then adopted by a quality prediction network. In our model, image quality can thus be estimated in a self-adaptive manner, which generalizes well to diverse images captured in the wild. Experimental results verify that our approach not only outperforms state-of-the-art methods on challenging authentic image databases but also achieves competitive performance on synthetic image databases, though it is not explicitly designed for the synthetic task.
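
A minimal sketch of the self-adaptive idea: a hyper network maps the image's semantic features to the parameters of the quality-prediction rule, so each image effectively gets its own predictor. Dimensions and module names here are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class SelfAdaptivePredictor(nn.Module):
    def __init__(self, sem_dim=512, feat_dim=128):
        super().__init__()
        # hyper network: generates per-image weights and bias of the quality rule
        self.weight_gen = nn.Linear(sem_dim, feat_dim)
        self.bias_gen = nn.Linear(sem_dim, 1)

    def forward(self, semantics, quality_feats):
        # semantics: (N, sem_dim) content features; quality_feats: (N, feat_dim)
        w = self.weight_gen(semantics)
        b = self.bias_gen(semantics)
        # per-image linear perception rule applied to the quality features
        return (quality_feats * w).sum(dim=-1, keepdim=True) + b
```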

246 citations

01 Jan 2016
Handbook of Image and Video Processing, a comprehensive reference text on image and video processing (no abstract available for this entry).

189 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes a no-reference IQA metric based on deep meta-learning, which learns the meta-knowledge shared by humans when evaluating the quality of images with various distortions and can then be adapted to unknown distortions easily.
Abstract: Recently, increasing interest has been drawn to exploiting deep convolutional neural networks (DCNNs) for no-reference image quality assessment (NR-IQA). Despite the notable success achieved, there is a broad consensus that training DCNNs heavily relies on massive annotated data. Unfortunately, IQA is a typical small-sample problem. Therefore, most existing DCNN-based IQA metrics operate on pre-trained networks. However, these pre-trained networks are not designed for the IQA task, leading to generalization problems when evaluating different types of distortions. With this motivation, this paper presents a no-reference IQA metric based on deep meta-learning. The underlying idea is to learn the meta-knowledge shared by humans when evaluating the quality of images with various distortions, which can then be adapted to unknown distortions easily. Specifically, we first collect a number of NR-IQA tasks for different distortions. Then meta-learning is adopted to learn the prior knowledge shared by diversified distortions. Finally, the quality prior model is fine-tuned on a target NR-IQA task to quickly obtain the quality model. Extensive experiments demonstrate that the proposed metric outperforms the state of the art by a large margin. Furthermore, the meta-model learned from synthetic distortions can also be easily generalized to authentic distortions, which is highly desirable in real-world applications of IQA metrics.
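
The meta-learning procedure described above is in the spirit of MAML-style bi-level optimization; the compressed sketch below assumes that task structure (support/query splits per distortion type) and is not the paper's exact algorithm:

```python
import torch
from torch.func import functional_call

def meta_step(model, loss_fn, support, query, inner_lr=0.01):
    # inner loop: adapt the shared prior on a support set of one distortion type
    x_s, y_s = support
    params = dict(model.named_parameters())
    inner_loss = loss_fn(functional_call(model, params, (x_s,)), y_s)
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
    # outer objective: evaluate the adapted weights on a query set of the same
    # task; summing this over tasks and backpropagating updates the meta-prior
    x_q, y_q = query
    return loss_fn(functional_call(model, adapted, (x_q,)), y_q)
```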

158 citations