Journal Article

End-to-End Blind Image Quality Assessment Using Deep Neural Networks

01 Mar 2018-IEEE Transactions on Image Processing (IEEE)-Vol. 27, Iss: 3, pp 1202-1213
TL;DR: This work demonstrates the strong competitiveness of MEON against state-of-the-art BIQA models using the group maximum differentiation competition methodology and empirically demonstrates that GDN is effective at reducing model parameters/layers while achieving similar quality prediction performance.
Abstract: We propose a multi-task end-to-end optimized deep neural network (MEON) for blind image quality assessment (BIQA). MEON consists of two sub-networks—a distortion identification network and a quality prediction network—sharing the early layers. Unlike traditional methods used for training multi-task networks, our training process is performed in two steps. In the first step, we train a distortion type identification sub-network, for which large-scale training samples are readily available. In the second step, starting from the pre-trained early layers and the outputs of the first sub-network, we train a quality prediction sub-network using a variant of the stochastic gradient descent method. Different from most deep neural networks, we choose biologically inspired generalized divisive normalization (GDN) instead of rectified linear unit as the activation function. We empirically demonstrate that GDN is effective at reducing model parameters/layers while achieving similar quality prediction performance. With modest model complexity, the proposed MEON index achieves state-of-the-art performance on four publicly available benchmarks. Moreover, we demonstrate the strong competitiveness of MEON against state-of-the-art BIQA models using the group maximum differentiation competition methodology.
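The abstract names GDN as the activation function but does not state its form. The following is a minimal sketch, assuming the commonly cited formulation y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2), applied across channels at a single spatial location. The function name `gdn` and all parameter values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def gdn(x, beta, gamma):
    """Divisively normalize channel responses at one spatial location.

    x:     list of S channel responses
    beta:  list of S positive biases (illustrative values, not from the paper)
    gamma: S x S list of non-negative weights (illustrative values)
    """
    out = []
    for i, xi in enumerate(x):
        # Weighted energy of all channels, pooled for output channel i.
        pooled = sum(gamma[i][j] * xj * xj for j, xj in enumerate(x))
        out.append(xi / math.sqrt(beta[i] + pooled))
    return out


x = [3.0, 4.0]
beta = [1.0, 1.0]

# With gamma = 0 and beta = 1 the denominator is 1, so the input passes through.
gamma_zero = [[0.0, 0.0], [0.0, 0.0]]
identity_like = gdn(x, beta, gamma_zero)

# With gamma = 1 everywhere, each channel is divided by sqrt(1 + 9 + 16).
gamma_full = [[1.0, 1.0], [1.0, 1.0]]
normalized = gdn(x, beta, gamma_full)
```

Unlike a rectified linear unit, which acts on each response independently, each output here depends on the joint energy of all channels at that location.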
Citations
Journal Article
TL;DR: A deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images and achieves state-of-the-art performance on both synthetic and authentic IQA databases is proposed.
Abstract: We propose a deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images. Our model consists of two streams of deep convolutional neural networks (CNNs), each specializing in one of the two distortion scenarios. For synthetic distortions, we first pre-train a CNN to classify the distortion type and the level of an input image, whose ground truth label is readily available at a large scale. For authentic distortions, we make use of a pre-trained CNN (VGG-16) for the image classification task. The two feature sets are bilinearly pooled into one representation for a final quality prediction. We fine-tune the whole network on the target databases using a variant of stochastic gradient descent. The extensive experimental results show that the proposed model achieves state-of-the-art performance on both synthetic and authentic IQA databases. Furthermore, we verify the generalizability of our method on the large-scale Waterloo Exploration Database, and demonstrate its competitiveness using the group maximum differentiation competition methodology.
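The bilinear pooling step mentioned in the abstract can be illustrated with a small sketch. It assumes that both streams produce feature maps over the same set of spatial locations and that pooling sums the outer product of the two feature vectors over those locations; any normalization applied afterwards is omitted. The function name `bilinear_pool` and the toy feature values are hypothetical.

```python
def bilinear_pool(feat_a, feat_b):
    """Sum the outer product of two feature vectors over all spatial locations.

    feat_a: list of N locations, each a list of Ca channel values
    feat_b: list of N locations, each a list of Cb channel values
    Returns a flat list of length Ca * Cb.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("both streams must cover the same spatial locations")
    ca, cb = len(feat_a[0]), len(feat_b[0])
    pooled = [[0.0] * cb for _ in range(ca)]
    for a, b in zip(feat_a, feat_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i][j] += a[i] * b[j]
    # Flatten row by row into one representation vector.
    return [value for row in pooled for value in row]


# Two locations; stream A has 2 channels, stream B has 3 channels.
feat_a = [[1.0, 2.0], [0.0, 1.0]]
feat_b = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]
representation = bilinear_pool(feat_a, feat_b)
```

The resulting vector has one entry per pair of channels, so it captures pairwise interactions between the two streams rather than a simple concatenation.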

390 citations


Cites background or methods or result from "End-to-End Blind Image Quality Asse..."

  • ...Until recently, there has been limited effort towards end-to-end optimized BIQA using deep convolutional neural networks (CNN) [10], [11], primarily due to the lack of sufficient ground truth labels such as the mean opinion scores (MOS) for training....

    [...]

  • ...gMAD competition results between DB-CNN and MEON [11]....

    [...]

  • ...For synthetic distortions, inspired by previous works [11], [18], [20], we construct a large-scale pre-training set based on the Waterloo Exploration Database [19] and PASCAL VOC 2012 [21], where the images are synthesized with nine distortion types and...

    [...]

  • ...Other methods [11], [18] take advantage of the known synthetic degradation processes (e....

    [...]

  • ...Specifically, DB-CNN fails to disprove MEON [11] in (a), which reveal its weakness in...

    [...]

Journal Article
TL;DR: This work presents a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality scored images, and proposes a novel, deep learning model (KonCept512), to show an excellent generalization beyond the test set.
Abstract: Deep learning methods for image quality assessment (IQA) are limited due to the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality scored images. It is the first in-the-wild database aiming for ecological validity, concerning the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel, deep learning model (KonCept512), to show an excellent generalization beyond the test set (0.921 SROCC), to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution than previous models ($512\times 384$). Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.

299 citations


Additional excerpts

  • ...[41] proposed the MEON...

    [...]

Journal Article
TL;DR: This survey provides a general overview of classical algorithms and recent progresses in the field of perceptual image quality assessment and describes the performances of the state-of-the-art quality measures for visual signals.
Abstract: Perceptual quality assessment plays a vital role in the visual communication systems owing to the existence of quality degradations introduced in various stages of visual signal acquisition, compression, transmission and display. Quality assessment for visual signals can be performed subjectively and objectively, and objective quality assessment is usually preferred owing to its high efficiency and easy deployment. A large number of subjective and objective visual quality assessment studies have been conducted during recent years. In this survey, we give an up-to-date and comprehensive review of these studies. Specifically, the frequently used subjective image quality assessment databases are first reviewed, as they serve as the validation set for the objective measures. Second, the objective image quality assessment measures are classified and reviewed according to the applications and the methodologies utilized in the quality measures. Third, the performances of the state-of-the-art quality measures for visual signals are compared with an introduction of the evaluation protocols. This survey provides a general overview of classical algorithms and recent progresses in the field of perceptual image quality assessment.

281 citations

Proceedings Article
14 Jun 2020
TL;DR: This work proposes a self-adaptive hyper network architecture to blindly assess image quality in the wild, which not only outperforms the state-of-the-art methods on challenging authentic image databases but also achieves competitive performance on synthetic image databases, though it is not explicitly designed for the synthetic task.
Abstract: Blind image quality assessment (BIQA) for authentically distorted images has always been a challenging problem, since images captured in the wild include various contents and diverse types of distortions. The vast majority of prior BIQA methods focus on how to predict synthetic image quality, but fail when applied to real-world distorted images. To deal with the challenge, we propose a self-adaptive hyper network architecture to blindly assess image quality in the wild. We separate the IQA procedure into three stages including content understanding, perception rule learning and quality predicting. After extracting image semantics, the perception rule is established adaptively by a hyper network, and then adopted by a quality prediction network. In our model, image quality can be estimated in a self-adaptive manner, and the model thus generalizes well on diverse images captured in the wild. Experimental results verify that our approach not only outperforms the state-of-the-art methods on challenging authentic image databases but also achieves competitive performance on synthetic image databases, though it is not explicitly designed for the synthetic task.

246 citations


Cites background from "End-to-End Blind Image Quality Asse..."

  • ...[27] proposed a deeper network to learn distortion type and image quality simultaneously....

    [...]

Journal Article
TL;DR: A CNN-based NR-IQA framework that can effectively solve the challenge of applying a deep CNN to no-reference image quality assessment and a way to visualize perceptual error maps to analyze what was learned by the deep CNN model is proposed.
Abstract: Image recognition based on convolutional neural networks (CNNs) has recently been shown to deliver the state-of-the-art performance in various areas of computer vision and image processing. Nevertheless, applying a deep CNN to no-reference image quality assessment (NR-IQA) remains a challenging task due to critical obstacles, i.e., the lack of a training database. In this paper, we propose a CNN-based NR-IQA framework that can effectively solve this problem. The proposed method—deep image quality assessor (DIQA)—separates the training of NR-IQA into two stages: 1) an objective distortion part and 2) a human visual system-related part. In the first stage, the CNN learns to predict the objective error map, and then the model learns to predict subjective score in the second stage. To complement the inaccuracy of the objective error map prediction on the homogeneous region, we also propose a reliability map. Two simple handcrafted features were additionally employed to further enhance the accuracy. In addition, we propose a way to visualize perceptual error maps to analyze what was learned by the deep CNN model. In the experiments, the DIQA yielded the state-of-the-art accuracy on the various databases.

225 citations

References
Proceedings Article
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
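The idea of learning a residual function with reference to the layer input can be sketched as y = relu(F(x) + x), where the stacked layers learn F and the input x is added back through a shortcut. The example below is a toy illustration with two small fully connected layers standing in for F; the function names and weight values are hypothetical, and real residual networks use convolutional layers.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, t) for t in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 * relu(w1 * x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    # The shortcut adds the unchanged input to the learned residual.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]

# With all-zero weights the residual F(x) is zero, so the block is an identity
# mapping for non-negative inputs.
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block(x, zeros, zeros)

# Non-zero weights only need to model the deviation from the identity.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, -0.5]]
shifted_out = residual_block(x, w1, w2)
```

The zero-weight case shows why this reformulation can ease optimization: a block that should pass its input through unchanged only has to drive its weights toward zero.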

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
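The adaptive moment estimates described in the abstract can be sketched for a single scalar parameter as follows. The update uses exponential moving averages of the gradient and the squared gradient with bias correction, and the default values shown (step size 0.001, decay rates 0.9 and 0.999, epsilon 1e-8) are the commonly quoted defaults. The helper `minimize_quadratic` and the objective f(theta) = theta^2 are hypothetical and serve only as a demonstration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(theta0, steps, lr):
    """Run Adam on f(theta) = theta**2, whose gradient is 2 * theta."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta


after_one_step = minimize_quadratic(1.0, steps=1, lr=0.001)
after_many_steps = minimize_quadratic(1.0, steps=200, lr=0.1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one step size regardless of the gradient's scale, which illustrates the invariance to gradient rescaling mentioned above.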

111,197 citations


"End-to-End Blind Image Quality Asse..." refers background or methods in this paper

  • ...mization steps adopt the Adam optimization algorithm [51] with a mini-batch of 40....

    [...]

  • ...Other parameters in Adam are set by default [51]....

    [...]

Proceedings Article
03 Dec 2012
TL;DR: A deep convolutional neural network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art classification performance on ImageNet.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations


"End-to-End Blind Image Quality Asse..." refers background in this paper

  • ...[17] significantly increased the depth of DNN by stacking ten convolutional and two fully connected layers, whose architecture was inspired by the VGG16 network [10] for image classification....

    [...]

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations