
Showing papers on "Standard test image" published in 2020


Journal ArticleDOI
TL;DR: This work presents a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality-scored images, and proposes a novel deep learning model (KonCept512) that shows excellent generalization beyond the test set.
Abstract: Deep learning methods for image quality assessment (IQA) are limited by the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality-scored images. It is the first in-the-wild database aiming for ecological validity, concerning the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel deep learning model (KonCept512), which shows excellent generalization beyond the test set (0.921 SROCC) to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution (512 × 384) than previous models. Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.
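The accuracy figures quoted above are Spearman rank-order correlation coefficients (SROCC) between predicted and subjective quality scores. A minimal sketch of how such a value is computed, using made-up placeholder arrays rather than KonIQ-10k data:

```python
# Sketch: SROCC between predicted and ground-truth quality scores
# (placeholder arrays; in practice these come from a model and a subjective study).
import numpy as np
from scipy.stats import spearmanr

mos = np.array([3.1, 4.2, 2.5, 3.8, 1.9])    # mean opinion scores (ground truth)
pred = np.array([3.0, 4.5, 2.2, 3.6, 2.4])   # model-predicted quality scores

srocc, _ = spearmanr(pred, mos)              # rank-order correlation in [-1, 1]
print(f"SROCC = {srocc:.3f}")
```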

299 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes an efficient pixel adaptive and feature attentive design for handling large blur variations across different spatial locations and proposes an effective content-aware global-local filtering module that significantly improves performance by considering not only global dependencies but also by dynamically exploiting neighboring pixel information.
Abstract: This paper tackles the problem of motion deblurring of dynamic scenes. Although end-to-end fully convolutional designs have recently advanced the state-of-the-art in non-uniform motion deblurring, their performance-complexity trade-off is still sub-optimal. Existing approaches achieve a large receptive field by increasing the number of generic convolution layers and the kernel size, but this comes at the expense of increased model size and slower inference. In this work, we propose an efficient pixel-adaptive and feature-attentive design for handling large blur variations across different spatial locations and process each test image adaptively. We also propose an effective content-aware global-local filtering module that significantly improves performance by considering not only global dependencies but also by dynamically exploiting neighboring pixel information. We use a patch-hierarchical attentive architecture composed of the above module that implicitly discovers the spatial variations in the blur present in the input image and, in turn, performs local and global modulation of intermediate features. Extensive qualitative and quantitative comparisons with prior art on deblurring benchmarks demonstrate that our design offers significant improvements over the state-of-the-art in accuracy as well as speed.
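The listing does not spell out the authors' module, but the idea of modulating intermediate features with both per-pixel and global context can be illustrated with a generic sketch (not the paper's architecture; the module name and layer sizes are assumptions for illustration):

```python
# Illustrative sketch only: per-pixel gating plus a global channel gate,
# a generic form of "local and global modulation of intermediate features".
import torch
import torch.nn as nn

class GlobalLocalModulation(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.local_attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid())                      # per-pixel gating map
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid())                      # channel-wise global gate

    def forward(self, x):
        return x * self.local_attn(x) * self.global_gate(x)

feat = torch.randn(1, 64, 64, 64)              # dummy feature map
out = GlobalLocalModulation(64)(feat)          # same shape, spatially modulated
```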

201 citations


Journal ArticleDOI
TL;DR: A pure deep learning method for segmenting concrete cracks in images and shows that the SDDNet segments cracks effectively unless the features are too faint, which is 46 times faster than in a recent work.
Abstract: This article reports the development of a pure deep learning method for segmenting concrete cracks in images. The objectives are to achieve real-time performance while effectively negating a wide range of complex backgrounds and crack-like features. To achieve these goals, an original convolutional neural network is proposed. The model consists of standard convolutions, densely connected separable convolution modules, a modified atrous spatial pyramid pooling module, and a decoder module. The semantic damage detection network (SDDNet) is trained on a manually created crack dataset, and the trained network records a mean intersection-over-union of 0.846 on the test set. Each test image is analyzed, and representative segmentation results are presented. The results show that the SDDNet segments cracks effectively unless the features are too faint. The proposed model is also compared with the most recent models, and it returns better evaluation metrics even though its number of parameters is 88 times smaller than in the compared models. In addition, the model processes images of 1025 × 512 pixels in real time (36 FPS), which is 46 times faster than a recent work.
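The headline number is a mean intersection-over-union (mIoU). A minimal sketch of how mIoU is computed for binary segmentation masks, with random placeholder masks:

```python
# Sketch: mean intersection-over-union for binary segmentation masks.
import numpy as np

def mean_iou(pred, target, num_classes=2):
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 2, (512, 512))     # placeholder prediction mask
target = np.random.randint(0, 2, (512, 512))   # placeholder ground-truth mask
print(f"mIoU = {mean_iou(pred, target):.3f}")
```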

169 citations


Journal ArticleDOI
TL;DR: A novel “Noisy-As-Clean” (NAC) strategy for training self-supervised denoising networks, in which the corrupted test image is directly taken as the “clean” target, while the inputs are synthetic images consisting of this corrupted image and a second yet similar corruption.
Abstract: Supervised deep networks have achieved promising performance on image denoising by learning image priors and noise statistics on plenty of pairs of noisy and clean images. Unsupervised denoising networks are trained with only noisy images. However, for an unseen corrupted image, both supervised and unsupervised networks ignore either its particular image prior, the noise statistics, or both. That is, the networks learned from external images inherently suffer from a domain gap problem: the image priors and noise statistics are very different between the training and test images. This problem becomes clearer when dealing with signal-dependent realistic noise. To circumvent this problem, in this work, we propose a novel “Noisy-As-Clean” (NAC) strategy for training self-supervised denoising networks. Specifically, the corrupted test image is directly taken as the “clean” target, while the inputs are synthetic images consisting of this corrupted image and a second yet similar corruption. A simple but useful observation on our NAC is: as long as the noise is weak, it is feasible to learn a self-supervised network only with the corrupted image, approximating the optimal parameters of a supervised network learned with pairs of noisy and clean images. Experiments on synthetic and realistic noise removal demonstrate that the DnCNN and ResNet networks trained with our self-supervised NAC strategy achieve comparable or better performance than the original ones and previous supervised/unsupervised/self-supervised networks. The code is publicly available at https://github.com/csjunxu/Noisy-As-Clean .
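The NAC training pair construction described above is simple to sketch: the observed noisy image acts as the target, and the input adds a second, similar synthetic corruption (noise level and type here are placeholders):

```python
# Sketch of the Noisy-As-Clean pair construction for a single test image.
import numpy as np

rng = np.random.default_rng(0)
clean = rng.random((128, 128))                        # unknown in practice
observed = clean + rng.normal(0, 0.05, clean.shape)   # the (weakly) noisy test image

# NAC pair: input = observed + extra similar noise, target = observed.
nac_input = observed + rng.normal(0, 0.05, observed.shape)
nac_target = observed
# A denoiser trained on (nac_input, nac_target) approximates one trained on
# true (noisy, clean) pairs when the noise is weak, per the paper's observation.
```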

109 citations


Journal ArticleDOI
28 Feb 2020-Entropy
TL;DR: This paper proposes a novel system that is computationally less expensive and provides a higher level of security than existing chaos-based encryption schemes, based on a shuffling process with a fractal key along with a three-dimensional Lorenz chaotic map.
Abstract: Chaos-based encryption schemes have attracted many researchers around the world in the digital image security domain. Digital images can be secured using existing chaotic maps, multiple chaotic maps, and several other hybrid dynamic systems that enhance the non-linearity of digital images. The combined property of confusion and diffusion was introduced by Claude Shannon and can be employed for digital image security. In this paper, we propose a novel system that is computationally less expensive and provides a higher level of security. The system is based on a shuffling process with a fractal key along with a three-dimensional Lorenz chaotic map. The shuffling process adds the confusion property, and the pixels of the standard image are shuffled. The three-dimensional Lorenz chaotic map is used for a diffusion process which distorts all pixels of the image. In the statistical security test, the mean square error (MSE) was greater than the average value of 10,000 for all standard images. The value of the peak signal-to-noise ratio (PSNR) was 7.69 dB for the test image. Moreover, the calculated correlation coefficient values for each direction of the encrypted images were less than zero, with a number of pixel change rate (NPCR) higher than 99%. During the security test, the entropy values were more than 7.9 for each grey channel, which is almost equal to the ideal value of 8 for an 8-bit system. Numerous security tests and low computational complexity tests validate the security, robustness, and real-time implementation of the presented scheme.
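Two of the security statistics reported above, Shannon entropy (ideal value 8 for an 8-bit image) and NPCR, can be computed as in this minimal sketch with random placeholder cipher images:

```python
# Sketch: entropy of an 8-bit cipher image and NPCR between two cipher images.
import numpy as np

def entropy_8bit(img):
    hist = np.bincount(img.ravel(), minlength=256) / img.size
    nz = hist[hist > 0]
    return float(-(nz * np.log2(nz)).sum())

def npcr(c1, c2):
    return float((c1 != c2).mean() * 100.0)    # percentage of differing pixels

cipher1 = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
cipher2 = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
print(f"entropy = {entropy_8bit(cipher1):.4f}, NPCR = {npcr(cipher1, cipher2):.2f}%")
```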

89 citations


Proceedings ArticleDOI
06 Jul 2020
TL;DR: A novel three-branch convolution neural network, namely RRDNet (short for Robust Retinex Decomposition Network), is proposed to decompose the input image into three components, illumination, reflectance and noise.
Abstract: Underexposed images often suffer from serious quality degradation such as poor visibility and latent noise in the dark. Most previous methods for underexposed image restoration ignore the noise and amplify it when stretching contrast. We predict the noise explicitly to achieve the goal of denoising while restoring the underexposed image. Specifically, a novel three-branch convolutional neural network, namely RRDNet (short for Robust Retinex Decomposition Network), is proposed to decompose the input image into three components: illumination, reflectance and noise. As an image-specific network, RRDNet doesn’t need any prior image examples or prior training. Instead, the weights of RRDNet are updated by a zero-shot scheme that iteratively minimizes a specially designed loss function. Such a loss function is devised to evaluate the current decomposition of the test image and guide noise estimation. Experiments demonstrate that RRDNet can achieve robust correction with overall naturalness and pleasing visual quality. To make the results reproducible, the source code has been made publicly available at https://aaaaangel.github.io/RRDNet-Homepage.

88 citations


Book ChapterDOI
18 May 2020
TL;DR: A novel unsupervised domain adaptation (UDA) method, named Domain Adaptive Relational Reasoning (DARR), to generalize 3D multi-organ segmentation models to medical data collected from different scanners and/or protocols (domains).
Abstract: In this paper, we present a novel unsupervised domain adaptation (UDA) method, named Domain Adaptive Relational Reasoning (DARR), to generalize 3D multi-organ segmentation models to medical data collected from different scanners and/or protocols (domains). Our method is inspired by the fact that the spatial relationship between internal structures in medical images is relatively fixed, e.g., a spleen is always located at the tail of a pancreas, which serves as a latent variable to transfer the knowledge shared across multiple domains. We formulate the spatial relationship by solving a jigsaw puzzle task, i.e., recovering a CT scan from its shuffled patches, and jointly train it with the organ segmentation task. To guarantee the transferability of the learned spatial relationship to multiple domains, we additionally introduce two schemes: 1) employing a super-resolution network, also jointly trained with the segmentation model, to standardize medical images from different domains to a certain spatial resolution; 2) adapting the spatial relationship for a test image by test-time jigsaw puzzle training. Experimental results show that our method improves the performance by 29.60% DSC on target datasets on average, without using any data from the target domain during training.
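The jigsaw-puzzle self-supervision signal mentioned above can be sketched as follows: split a 2D slice into a grid of patches, shuffle them with a known permutation, and treat that permutation as the recovery target. This is an illustration under assumed grid and image sizes, not the authors' exact setup:

```python
# Sketch: build a jigsaw puzzle (shuffled patches + permutation label) from a slice.
import numpy as np

def make_jigsaw(slice2d, grid=3, rng=np.random.default_rng(0)):
    h, w = slice2d.shape
    ph, pw = h // grid, w // grid
    patches = [slice2d[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(grid) for j in range(grid)]
    perm = rng.permutation(grid * grid)               # label to be recovered
    shuffled = [patches[k] for k in perm]
    rows = [np.hstack(shuffled[r*grid:(r+1)*grid]) for r in range(grid)]
    return np.vstack(rows), perm

ct_slice = np.random.rand(96, 96)                     # placeholder CT slice
puzzle, perm = make_jigsaw(ct_slice)
```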

67 citations


Journal ArticleDOI
TL;DR: Extensive experimental results show that the proposed CNN based scheme outperforms some state-of-the-art methods not only in image splicing detection and localization performance, but also in robustness against JPEG compression.
Abstract: In this paper, a novel image splicing detection and localization scheme is proposed based on a local feature descriptor learned by a deep convolutional neural network (CNN). A two-branch CNN, which serves as an expressive local descriptor, is presented and applied to automatically learn hierarchical representations from the input RGB color or grayscale test images. The first layer of the proposed CNN model is used to suppress the effects of image content and extract diverse and expressive residual features, and is deliberately designed for image splicing detection applications. Specifically, the kernels of the first convolutional layer are initialized with an optimized combination of the 30 linear high-pass filters used in calculating residual maps in the spatial rich model (SRM), and are fine-tuned through a constrained learning strategy to retain the high-pass filtering properties of the learned kernels. Both the contrastive loss and cross-entropy loss are utilized to jointly improve the generalization ability of the proposed CNN model. With the block-wise dense features of a test image extracted by the pre-trained CNN-based local descriptor, an effective feature fusion strategy, known as block pooling, is adopted to obtain the final discriminative features for image splicing detection with an SVM. Based on the pre-trained CNN model, an image splicing localization scheme is further developed by incorporating a fully connected conditional random field (CRF). Extensive experimental results on several public datasets show that the proposed CNN-based scheme outperforms some state-of-the-art methods not only in image splicing detection and localization performance, but also in robustness against JPEG compression.
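As an indicative sketch of the SRM-style initialization described above, the following seeds a first convolutional layer with one well-known SRM high-pass filter (the 5 × 5 "KV" kernel). The paper uses an optimized combination of all 30 SRM filters, so this single-kernel example is illustrative only:

```python
# Sketch: initialize a first conv layer with an SRM-style 5x5 high-pass kernel.
import torch
import torch.nn as nn

kv = torch.tensor([[-1.,  2.,  -2.,  2., -1.],
                   [ 2., -6.,   8., -6.,  2.],
                   [-2.,  8., -12.,  8., -2.],
                   [ 2., -6.,   8., -6.,  2.],
                   [-1.,  2.,  -2.,  2., -1.]]) / 12.0

conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2, bias=False)
with torch.no_grad():
    conv1.weight.copy_(kv.view(1, 1, 5, 5))       # residual (high-pass) initialization

residual = conv1(torch.randn(1, 1, 64, 64))       # noise-residual feature map
```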

54 citations


Journal ArticleDOI
TL;DR: The authors' results showed varying performances in repeatability of MR radiomic features for GBM tumors due to test-retest and image registration, which have implications for appropriate usage in diagnostic and predictive models.
Abstract: Purpose: To assess the repeatability of radiomic features in magnetic resonance (MR) imaging of glioblastoma (GBM) tumors with respect to test-retest, different image registration approaches and inhomogeneity bias field correction. Methods: We analyzed MR images of 17 GBM patients including T1- and T2-weighted images (performed within the same imaging unit on two consecutive days). For image segmentation, we used a comprehensive segmentation approach including entire tumor, active area of tumor, necrotic regions in T1-weighted images, and edema regions in T2-weighted images (test studies only; registration to retest studies is discussed next). Analysis included N3, N4 as well as no bias correction performed on raw MR images. We evaluated 20 image registration approaches, generated by cross-combination of four transformation and five cost function methods. In total, 714 images (17 patients × 2 images × ((4 transformations × 5 cost functions) + 1 test image)) and 2856 segmentations (714 images × 4 segmentations) were prepared for feature extraction. Various radiomic features were extracted, including the use of preprocessing filters, specifically wavelet (WAV) and Laplacian of Gaussian (LOG), as well as discretization into fixed bin width and fixed bin count (16, 32, 64, 128, and 256), Exponential, Gradient, Logarithm, Square and Square Root scales. Intraclass correlation coefficients (ICC) were calculated to assess the repeatability of MRI radiomic features (high repeatability defined as ICC ≥ 95%). Results: In our ICC results, we observed high repeatability (ICC ≥ 95%) with respect to image preprocessing, different image registration algorithms, and test-retest analysis, for example: RLNU and GLNU from GLRLM, GLNU and DNU from GLDM, Coarseness and Busyness from NGTDM, GLNU and ZP from GLSZM, and Energy and RMS from first order. The highest fraction (percent) of repeatable features was observed, among registration techniques, for the Full Affine transformation with 12 degrees of freedom using the Mutual Information cost function (mean 32.4%), and among image processing methods, for the Laplacian of Gaussian (LOG) with Sigma (2.5-4.5 mm) (mean 78.9%). The trends were relatively consistent for N4, N3, or no bias correction. Conclusion: Our results showed varying performances in repeatability of MR radiomic features for GBM tumors due to test-retest and image registration. The findings have implications for appropriate usage in diagnostic and predictive models.
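The repeatability statistic used above is the intraclass correlation coefficient. A minimal sketch of a two-way random-effects, single-measurement ICC(2,1) (Shrout and Fleiss) for one feature measured on n subjects in test and retest sessions; the data are random placeholders, and the paper's exact ICC variant is not stated in this listing:

```python
# Sketch: ICC(2,1) for one feature measured on n subjects under k sessions.
import numpy as np

def icc_2_1(Y):                        # Y: (n subjects, k measurements)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()    # between sessions
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

feature = np.random.rand(17, 2)        # e.g., one radiomic feature, test & retest
print(f"ICC(2,1) = {icc_2_1(feature):.3f}")
```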

49 citations


Journal ArticleDOI
TL;DR: The proposed sub-pixel convolutional generative adversarial network (GAN), which learns compressed sensing reconstruction of images, is superior to some state-of-the-art deep learning based and iterative optimization based algorithms in terms of both time complexity and reconstruction quality.

44 citations


Posted Content
Seobin Park, Jinsu Yoo, Donghyeon Cho, Jiwon Kim, Tae Hyun Kim
TL;DR: In this paper, a meta-learning based method is proposed to exploit additional information given by the input image at test time and improve the performance of single-image super-resolution (SISR).
Abstract: Conventional supervised super-resolution (SR) approaches are trained with massive external SR datasets but fail to exploit desirable properties of the given test image. On the other hand, self-supervised SR approaches utilize the internal information within a test image but suffer from computational complexity in run-time. In this work, we observe the opportunity for further improvement of the performance of SISR without changing the architecture of conventional SR networks by practically exploiting additional information given from the input image. In the training stage, we train the network via meta-learning; thus, the network can quickly adapt to any input image at test time. Then, in the test stage, parameters of this meta-learned network are rapidly fine-tuned with only a few iterations by only using the given low-resolution image. The adaptation at the test time takes full advantage of patch-recurrence property observed in natural images. Our method effectively handles unknown SR kernels and can be applied to any existing model. We demonstrate that the proposed model-agnostic approach consistently improves the performance of conventional SR networks on various benchmark SR datasets.

Journal ArticleDOI
TL;DR: The proposed clustering and ranking stages lead to using only 11% of the whole database when classifying test images, which means reduced computational complexity and enhanced classification results are achieved compared to recent existing systems.
Abstract: Recently, deep learning techniques have demonstrated efficiency in building better-performing machine learning models, which are required in the field of offline Arabic handwriting recognition. Our ancient civilizations presented valuable handwritten manuscripts that need to be documented digitally. Compared with Latin character recognition, isolated Arabic character recognition is much more challenging due to the similarity between characters and the variability of writing styles. This paper proposes a multi-stage cascading system to serve the field of offline Arabic handwriting recognition. The approach starts with applying the Hierarchical Agglomerative Clustering (HAC) technique to split the database into partially inter-related clusters. The inter-relations between the constructed clusters support representing the database as a big search tree model and help to attain a reduced complexity in matching each test image with a cluster. Cluster members are then ranked based on our newly proposed ranking algorithm. This ranking algorithm starts with computing the Pyramid Histogram of Oriented Gradients (PHoG), and is followed by measuring divergence with the Kullback-Leibler method. Eventually, the classification process is applied only to the highly ranked matching classes. A comparative study is made to assess the effect of six different deep Convolutional Neural Networks (DCNNs) on the final recognition rates of the proposed system. Experiments are done using the IFN/ENIT Arabic database. The proposed clustering and ranking stages lead to using only 11% of the whole database in classifying test images. Accordingly, reduced computational complexity and enhanced classification results are achieved compared to recent existing systems.
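The ranking stage pairs a PHoG descriptor with Kullback-Leibler divergence. A simplified sketch of that idea (a PHOG-like pyramid of gradient-orientation histograms compared with KL divergence), not the authors' exact feature pipeline:

```python
# Sketch: PHOG-like descriptor + KL-divergence ranking of candidate images.
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

def phog_like(img, levels=2, bins=8):
    gy, gx = np.gradient(img.astype(float))
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # orientations in [0, pi)
    feats = []
    for lv in range(levels + 1):
        cells = 2 ** lv
        h, w = img.shape
        for i in range(cells):
            for j in range(cells):
                block = ang[i*h//cells:(i+1)*h//cells, j*w//cells:(j+1)*w//cells]
                hist, _ = np.histogram(block, bins=bins, range=(0, np.pi))
                feats.append(hist)
    f = np.concatenate(feats).astype(float) + 1e-8  # avoid zero bins
    return f / f.sum()

test = np.random.rand(64, 64)                       # placeholder character images
candidates = [np.random.rand(64, 64) for _ in range(5)]
ranked = sorted(candidates, key=lambda c: entropy(phog_like(test), phog_like(c)))
```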

Book ChapterDOI
09 Jan 2020
TL;DR: The proposed model-agnostic approach consistently improves the performance of conventional SR networks on various benchmark SR datasets and effectively handles unknown SR kernels and can be applied to any existing model.
Abstract: Conventional supervised super-resolution (SR) approaches are trained with massive external SR datasets but fail to exploit desirable properties of the given test image. On the other hand, self-supervised SR approaches utilize the internal information within a test image but suffer from computational complexity in run-time. In this work, we observe the opportunity for further improvement of the performance of single-image super-resolution (SISR) without changing the architecture of conventional SR networks by practically exploiting additional information given from the input image. In the training stage, we train the network via meta-learning; thus, the network can quickly adapt to any input image at test time. Then, in the test stage, parameters of this meta-learned network are rapidly fine-tuned with only a few iterations by only using the given low-resolution image. The adaptation at the test time takes full advantage of patch-recurrence property observed in natural images. Our method effectively handles unknown SR kernels and can be applied to any existing model. We demonstrate that the proposed model-agnostic approach consistently improves the performance of conventional SR networks on various benchmark SR datasets.
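The test-time adaptation step described above can be sketched as a few fine-tuning iterations on the given low-resolution image, exploiting patch recurrence by further downscaling it to form training pairs. Here `sr_net` is a placeholder for any existing SR model that upscales by `scale`; the loop is illustrative, not the authors' exact procedure:

```python
# Sketch: brief test-time fine-tuning of a (meta-learned) SR network using only
# the low-resolution test image.
import torch
import torch.nn.functional as F

def adapt_at_test_time(sr_net, lr_image, scale=2, steps=5, lr=1e-4):
    opt = torch.optim.Adam(sr_net.parameters(), lr=lr)
    for _ in range(steps):
        lr_down = F.interpolate(lr_image, scale_factor=1.0 / scale,
                                mode='bicubic', align_corners=False)  # "child" LR
        loss = F.l1_loss(sr_net(lr_down), lr_image)   # reconstruct the LR input
        opt.zero_grad(); loss.backward(); opt.step()
    return sr_net(lr_image)                           # final SR of the test image
```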

Journal ArticleDOI
TL;DR: An objective quality model for MEF of dynamic scenes is developed that significantly outperforms the state of the art; the proposed model also shows promise for parameter tuning of MEF methods.
Abstract: A common approach to high dynamic range (HDR) imaging is to capture multiple images of different exposures followed by multi-exposure image fusion (MEF) in either radiance or intensity domain. A predominant problem of this approach is the introduction of the ghosting artifacts in dynamic scenes with camera and object motion. While many MEF methods (often referred to as deghosting algorithms) have been proposed for reduced ghosting artifacts and improved visual quality, little work has been dedicated to perceptual evaluation of their deghosting results. Here we first construct a database that contains 20 multi-exposure sequences of dynamic scenes and their corresponding fused images by nine MEF algorithms. We then carry out a subjective experiment to evaluate fused image quality, and find that none of existing objective quality models for MEF provides accurate quality predictions. Motivated by this, we develop an objective quality model for MEF of dynamic scenes. Specifically, we divide the test image into static and dynamic regions, measure structural similarity between the image and the corresponding sequence in the two regions separately, and combine quality measurements of the two regions into an overall quality score. Experimental results show that the proposed method significantly outperforms the state-of-the-art. In addition, we demonstrate the promise of the proposed model in parameter tuning of MEF methods. The subjective database and the MATLAB code of the proposed model are made publicly available at https://github.com/h4nwei/MEF-SSIMd .
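The region-wise idea above can be illustrated with a per-pixel SSIM map averaged separately over static and dynamic regions before combining. This sketch compares against a single reference image with made-up weights; the paper's model measures structural similarity against the exposure sequence, so this is only indicative:

```python
# Sketch: region-wise SSIM scores combined into an overall quality estimate.
import numpy as np
from skimage.metrics import structural_similarity

fused = np.random.rand(128, 128)                    # placeholder fused image
reference = np.random.rand(128, 128)                # placeholder reference
dynamic_mask = np.zeros((128, 128), dtype=bool)
dynamic_mask[32:96, 32:96] = True                   # placeholder motion region

_, ssim_map = structural_similarity(reference, fused, data_range=1.0, full=True)
q_static = ssim_map[~dynamic_mask].mean()
q_dynamic = ssim_map[dynamic_mask].mean()
overall = 0.5 * q_static + 0.5 * q_dynamic          # illustrative combination
```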

Journal ArticleDOI
TL;DR: A novel universal face photo-sketch style transfer method that does not need any image from the source domain for training and flexibly leverages a convolutional neural network representation with hand-crafted features in an optimal way is presented.
Abstract: Face photo-sketch style transfer aims to convert a representation of a face from the photo (or sketch) domain to the sketch (respectively, photo) domain while preserving the character of the subject. It has wide-ranging applications in law enforcement, forensic investigation and digital entertainment. However, conventional face photo-sketch synthesis methods usually require training images from both the source domain and the target domain, and are limited in that they cannot be applied to universal conditions where collecting training images in the source domain that match the style of the test image is unpractical. This problem entails two major challenges: 1) designing an effective and robust domain translation model for the universal situation in which images of the source domain needed for training are unavailable, and 2) preserving the facial character while performing a transfer to the style of an entire image collection in the target domain. To this end, we present a novel universal face photo-sketch style transfer method that does not need any image from the source domain for training. The regression relationship between an input test image and the entire training image collection in the target domain is inferred via a deep domain translation framework, in which a domain-wise adaption term and a local consistency adaption term are developed. To improve the robustness of the style transfer process, we propose a multiview domain translation method that flexibly leverages a convolutional neural network representation with hand-crafted features in an optimal way. Qualitative and quantitative comparisons are provided for universal unconstrained conditions of unavailable training images from the source domain, demonstrating the effectiveness and superiority of our method for universal face photo-sketch style transfer.

Journal ArticleDOI
03 Apr 2020
TL;DR: A simple yet effective regression model, denoted by RestoreNet, which learns a class agnostic transformation on the image feature to move the image closer to the class center in the feature space is proposed.
Abstract: One-shot image classification aims to train image classifiers over the dataset with only one image per category. It is challenging for modern deep neural networks that typically require hundreds or thousands of images per class. In this paper, we adopt metric learning for this problem, which has been applied for few- and many-shot image classification by comparing the distance between the test image and the center of each class in the feature space. However, for one-shot learning, the existing metric learning approaches would suffer poor performance because the single training image may not be representative of the class. For example, if the image is far away from the class center in the feature space, the metric-learning based algorithms are unlikely to make correct predictions for the test images because the decision boundary is shifted by this noisy image. To address this issue, we propose a simple yet effective regression model, denoted by RestoreNet, which learns a class agnostic transformation on the image feature to move the image closer to the class center in the feature space. Experiments demonstrate that RestoreNet obtains superior performance over the state-of-the-art methods on a broad range of datasets. Moreover, RestoreNet can be easily combined with other methods to achieve further improvement.
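The metric-learning baseline described above classifies a test image by its distance to class centers in feature space; RestoreNet's contribution is a learned transformation that moves features toward those centers. A toy sketch of the nearest-center decision rule (centers and features are made up):

```python
# Sketch: nearest-class-center classification in a 2-D feature space.
import numpy as np

centers = {"cat": np.array([1.0, 0.2]),
           "dog": np.array([-0.8, 0.5])}            # toy class centers

def classify(feature, centers):
    return min(centers, key=lambda c: np.linalg.norm(feature - centers[c]))

test_feature = np.array([0.7, 0.1])
print(classify(test_feature, centers))              # -> "cat"
```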

Journal ArticleDOI
TL;DR: Empirical results show that SESRC & LDF achieves the highest recognition rates, outperforming many algorithms including some state-of-the-art ones, such as PLR, MDFR and OPR.

Journal ArticleDOI
TL;DR: A forgery detection technique is proposed which exploits the artifacts originated due to manipulations performed on JPEG encoded images which showcases better detection rates compared with the state-of-the-art methods.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method can detect various types of defects on the homogeneously textured surface, including miscellaneous defects, even for tiny defects and under the low contrast condition, and the precision, recall, and F-measure of the proposed algorithm are better than the state-of-the-art algorithms.
Abstract: Automatic vision-based defect detection on steel surfaces is a challenging task due to miscellaneous patterns of defects, low contrast between the defect and the background, and so on. Image-decomposition-based methods can analyze the structure and texture to inspect the defective objects. Currently, the state of the art of image-decomposition-based defect detection methods is the one guided by a given fixed template. However, a fixed template cannot be suitable for all situations. In this article, a new self-reference template-guided image decomposition algorithm for strip steel surface defect detection is developed. Combined with the statistical characteristics of a large number of defect-free images, a specific template can be built for each test defect image. Then, a total variation (TV)-based image decomposition algorithm guided by the self-reference template is developed to decompose the test image into a structural component and a textural component. Moreover, the decomposition is optimized by developing a new index, gradient similarity, to measure the similarity between the self-reference template and the decomposed textural component. Experimental results show that the proposed method can detect various types of defects on homogeneously textured surfaces, including miscellaneous and even tiny defects and under low-contrast conditions, and the precision, recall, and F-measure of the proposed algorithm are better than those of state-of-the-art algorithms.
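For intuition about the structure/texture split mentioned above, here is a plain TV-based decomposition without the paper's self-reference template guidance: the TV-denoised image serves as the structural component and the residual as the textural component, where defects tend to stand out. The weight value is an assumption:

```python
# Sketch: plain TV structure/texture decomposition of a test image.
import numpy as np
from skimage.restoration import denoise_tv_chambolle

test_image = np.random.rand(256, 256)               # placeholder steel-strip image
structure = denoise_tv_chambolle(test_image, weight=0.1)
texture = test_image - structure                    # residual textural component
```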

Posted Content
TL;DR: The thorough experiments verify that DFNet is able to capture and mine the underlying relations of images and discover the common foreground objects and achieve much more efficient performance in the inference phase.
Abstract: In this paper, we introduce a novel network, called discriminative feature network (DFNet), to address the unsupervised video object segmentation task. To capture the inherent correlation among video frames, we learn discriminative features (D-features) from the input images that reveal feature distribution from a global perspective. The D-features are then used to establish correspondence with all features of test image under conditional random field (CRF) formulation, which is leveraged to enforce consistency between pixels. The experiments verify that DFNet outperforms state-of-the-art methods by a large margin with a mean IoU score of 83.4% and ranks first on the DAVIS-2016 leaderboard while using much fewer parameters and achieving much more efficient performance in the inference phase. We further evaluate DFNet on the FBMS dataset and the video saliency dataset ViSal, reaching a new state-of-the-art. To further demonstrate the generalizability of our framework, DFNet is also applied to the image object co-segmentation task. We perform experiments on a challenging dataset PASCAL-VOC and observe the superiority of DFNet. The thorough experiments verify that DFNet is able to capture and mine the underlying relations of images and discover the common foreground objects.

Journal ArticleDOI
TL;DR: Effectiveness of the proposed method is evaluated on different car datasets stressing various imaging conditions and the obtained results show that the method achieves significant improvements compared to published methods.
Abstract: For intelligent traffic monitoring systems and related applications, detecting vehicles on roads is a vital step. However, robust and efficient vehicle detection is still a challenging problem due to variations in the appearance of vehicles and the complicated background of roads. In this paper, we propose a simple and effective vehicle detection method based on local vehicle texture and appearance histograms fed into clustering forests. The interdependency of the locations of vehicle parts is incorporated within a clustering-forest framework. Local binary pattern (LBP)-like descriptors are utilized for texture feature extraction. By utilizing the LBP descriptors, the local structures of vehicles, such as edges, contours and flat regions, can be effectively depicted. The aligned set of histograms, generated in concurrence with the spatial LBPs for randomly sampled local regions, is used to measure the dissimilarity between regions of all training images. Evaluating the fit between histograms is built into the clustering forests. That is, clustering discriminative codebooks of latent features is used to search between different LBP features of the random regions utilizing the chi-square dissimilarity measure. Besides, saliency maps built from the learnt latent features are adopted to determine the vehicle locations in the test image. The effectiveness of the proposed method is evaluated on different car datasets stressing various imaging conditions, and the obtained results show that the method achieves significant improvements compared to published methods.
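The two building blocks named above, LBP histograms of local regions and the chi-square dissimilarity between them, can be sketched as follows (illustrative of the feature and measure, not of the clustering forest itself):

```python
# Sketch: uniform LBP histograms for image patches and chi-square dissimilarity.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(patch, P=8, R=1):
    lbp = local_binary_pattern(patch, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2))
    return hist.astype(float) / max(hist.sum(), 1)

def chi_square(h1, h2, eps=1e-10):
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

a = lbp_hist((np.random.rand(32, 32) * 255).astype(np.uint8))
b = lbp_hist((np.random.rand(32, 32) * 255).astype(np.uint8))
print(f"chi-square distance = {chi_square(a, b):.4f}")
```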

Book ChapterDOI
01 Jan 2020
TL;DR: Experimental results indicated that the PFCM supported hybrid model is far more superior in segmenting medical images than traditional image segmentation methods.
Abstract: Segmentation is considered one of the challenging and important processes in the field of digital image processing, and there are numerous applications, such as medical image analysis and satellite data processing, where digital image processing can be beneficial. Various algorithms have been developed by many researchers to analyze different medical images like MRI and X-rays. Nuclear image analysis and interpretation is also a promising research topic in medical image analysis. For example, positron emission tomography (PET) images can be used to accurately localize disease to help doctors in providing the right treatment and saving valuable time. In recent years, there has been significant advancement in the biomedical imaging domain; with the increasing accessibility of computational power as well as automated systems, medical image analysis has become one of the most interesting research areas. Microscopic image analysis is also valuable in the domain of medical imaging as well as medicine. For example, the determination, identification, and counting of different cells is an important and almost unavoidable step that assists in diagnosing some precise diseases. Many computer vision and digital image analysis tasks require a basic segmentation phase to find different objects or separate the test image into distinct segments, which can be treated as homogeneous depending upon a given property, such as color and texture. Region growing and fuzzy C-Means are two efficient and popular segmentation techniques. In this chapter, we have proposed a hybrid scheme for image segmentation using fuzzy C-Means clustering, the region growing method, and thresholding. The fuzzy C-Means (FCM) clustering is used as a preprocessing step. It helps to process the image more accurately in further stages. To eliminate the noise sensitivity of fuzzy C-Means (FCM), we have used PFCM, i.e., the penalized FCM clustering algorithm. In this work, to find the appropriate region, we have used the region growing segmentation technique coupled with thresholding. In the proposed work, a similarity feature depending on pixel intensity is used. The threshold value can be calculated using different techniques, such as the iterative approach, Otsu's technique, local thresholding, and manual selection, to determine the optimal threshold. The results are obtained by applying the proposed method on different images from publicly available benchmark datasets of human brain MRI images. Experimental results indicate that the PFCM-supported hybrid model is far superior in segmenting medical images to traditional image segmentation methods.
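Of the thresholding options listed above, Otsu's technique is the most common; a minimal sketch of using it to obtain an initial binary segmentation (placeholder image, not MRI data):

```python
# Sketch: Otsu thresholding to produce an initial foreground/background mask.
import numpy as np
from skimage.filters import threshold_otsu

image = np.random.rand(128, 128)        # placeholder grayscale slice
t = threshold_otsu(image)               # data-driven global threshold
mask = image > t                        # initial binary segmentation
```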

Journal ArticleDOI
TL;DR: The results indicate that DGANet-ISE outperforms the other 14 methods in the remote sensing image SR, and the cross-database test results demonstrate that the method exhibits satisfactory generalization performance in adapting to new data.
Abstract: Super-resolution (SR) is able to improve the spatial resolution of remote sensing images, which is critical for many practical applications such as fine urban monitoring. In this paper, a new single-image SR method, deep gradient-aware network with image-specific enhancement (DGANet-ISE) was proposed to improve the spatial resolution of remote sensing images. First, DGANet was proposed to model the complex relationship between low- and high-resolution images. A new gradient-aware loss was designed in the training phase to preserve more gradient details in super-resolved remote sensing images. Then, the ISE approach was proposed in the testing phase to further improve the SR performance. By using the specific features of each test image, ISE can further boost the generalization capability and adaptability of our method on inexperienced datasets. Finally, three datasets were used to verify the effectiveness of our method. The results indicate that DGANet-ISE outperforms the other 14 methods in the remote sensing image SR, and the cross-database test results demonstrate that our method exhibits satisfactory generalization performance in adapting to new data.
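A gradient-aware loss of the kind mentioned above typically penalizes differences between the image gradients of the super-resolved and reference images alongside a pixel loss. This is an illustrative form with an assumed weighting; the paper's exact loss may differ:

```python
# Sketch: pixel loss plus a gradient-difference term for SR training.
import torch
import torch.nn.functional as F

def gradient_aware_loss(sr, hr, alpha=0.5):
    def grads(x):
        dx = x[..., :, 1:] - x[..., :, :-1]      # horizontal differences
        dy = x[..., 1:, :] - x[..., :-1, :]      # vertical differences
        return dx, dy
    sr_dx, sr_dy = grads(sr)
    hr_dx, hr_dy = grads(hr)
    pixel = F.l1_loss(sr, hr)
    grad = F.l1_loss(sr_dx, hr_dx) + F.l1_loss(sr_dy, hr_dy)
    return pixel + alpha * grad

loss = gradient_aware_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```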

Journal ArticleDOI
TL;DR: A hybrid deep learning-Gaussian process (DL-GP) network is proposed to segment a scene image into lane and background regions, which combines a compact convolutional encoder–decoder net and a powerful nonparametric hierarchical GP classifier.
Abstract: Pedestrian lane detection is an important task in many assistive and autonomous navigation systems. This article presents a new approach for pedestrian lane detection in unstructured environments, where the pedestrian lanes can have arbitrary surfaces with no painted markers. In this approach, a hybrid deep learning-Gaussian process (DL-GP) network is proposed to segment a scene image into lane and background regions. The network combines a compact convolutional encoder–decoder net and a powerful nonparametric hierarchical GP classifier. The resulting network with a smaller number of trainable parameters helps mitigate the overfitting problem while maintaining the modeling power. In addition to the segmentation output for each test image, the network also generates a map of uncertainty—a measure that is negatively correlated with the confidence level with which we can trust the segmentation. This measure is important for pedestrian lane-detection applications, since its prediction affects the safety of its users. We also introduce a new data set of 5000 images for training and evaluating the pedestrian lane-detection algorithms. This data set is expected to facilitate research in pedestrian lane detection, especially the application of DL in this area. Evaluated on this data set, the proposed network shows significant performance improvements compared with several existing methods.

Journal ArticleDOI
TL;DR: A new layer for CNNs, called a ‘push–pull’ layer, that increases their robustness to several types of corruption of the input images; it computes its response as the combination of two half-wave rectified convolutions with kernels of different size and opposite polarity.
Abstract: Convolutional neural networks (CNNs) lack robustness to test image corruptions that are not seen during training. In this paper, we propose a new layer for CNNs that increases their robustness to several types of corruptions of the input images. We call it a ‘push–pull’ layer and compute its response as the combination of two half-wave rectified convolutions, with kernels of different size and opposite polarity. Its implementation is based on a biologically motivated model of certain neurons in the visual system that exhibit response suppression, known as push–pull inhibition. We validate our method by replacing the first convolutional layer of the LeNet, ResNet and DenseNet architectures with our push–pull layer. We train the networks on original training images from the MNIST and CIFAR data sets and test them on images with several corruptions, of different types and severities, that are unseen by the training process. We experiment with various configurations of the ResNet and DenseNet models on a benchmark test set with typical image corruptions constructed on the CIFAR test images. We demonstrate that our push–pull layer contributes to a considerable improvement in robustness of classification of corrupted images, while maintaining state-of-the-art performance on the original image classification task. We released the code and trained models at the url http://github.com/nicstrisc/Push-Pull-CNN-layer .
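A simplified sketch of the push–pull response described above: a half-wave rectified "push" convolution minus a down-weighted, half-wave rectified "pull" convolution whose kernel has opposite polarity and larger support. Kernel sizes and the inhibition weight here are assumptions; see the authors' released code for the actual layer:

```python
# Simplified sketch of a push-pull layer (not the authors' exact implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PushPull2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, alpha=1.0):
        super().__init__()
        self.push = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.alpha = alpha                         # strength of the pull inhibition

    def forward(self, x):
        w_push = self.push.weight
        k = w_push.shape[-1]
        push = F.relu(F.conv2d(x, w_push, padding=k // 2))
        # Pull kernel: negated push kernel enlarged to a (2k-1) x (2k-1) support.
        w_pull = -F.interpolate(w_push, size=(2 * k - 1, 2 * k - 1),
                                mode='bilinear', align_corners=False)
        pull = F.relu(F.conv2d(x, w_pull, padding=(2 * k - 1) // 2))
        return push - self.alpha * pull

out = PushPull2d(3, 16)(torch.randn(1, 3, 32, 32))   # same spatial size as input
```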

Posted Content
TL;DR: This work improves interpretability by automatically enhancing prototypes with extra information about visual characteristics considered important by the model, and quantifies the influence of color hue, shape, texture, contrast and saturation in a prototype.
Abstract: Image recognition with prototypes is considered an interpretable alternative for black box deep learning models. Classification depends on the extent to which a test image "looks like" a prototype. However, perceptual similarity for humans can be different from the similarity learned by the classification model. Hence, only visualising prototypes can be insufficient for a user to understand what a prototype exactly represents, and why the model considers a prototype and an image to be similar. We address this ambiguity and argue that prototypes should be explained. We improve interpretability by automatically enhancing visual prototypes with textual quantitative information about visual characteristics deemed important by the classification model. Specifically, our method clarifies the meaning of a prototype by quantifying the influence of colour hue, shape, texture, contrast and saturation and can generate both global and local explanations. Because of the generality of our approach, it can improve the interpretability of any similarity-based method for prototypical image recognition. In our experiments, we apply our method to the existing Prototypical Part Network (ProtoPNet). Our analysis confirms that the global explanations are generalisable, and often correspond to the visually perceptible properties of a prototype. Our explanations are especially relevant for prototypes which might have been interpreted incorrectly otherwise. By explaining such 'misleading' prototypes, we improve the interpretability and simulatability of a prototype-based classification model. We also use our method to check whether visually similar prototypes have similar explanations, and are able to discover redundancy. Code is available at this https URL .

Journal ArticleDOI
TL;DR: A model referred as weight-KNN is presented which firstly introduces the CNN feature to address the problem that traditional models only work well with well-designed manual feature representations and incorporates a multi-label linear discriminant approach to compute the weighting which improves the accuracy in the subsequent procedures of distance calculation.
Abstract: Automatic image annotation has become a hot research area because of its efficiency in shrinking the semantic gap between images and their semantic meanings. We present a model, referred to as weight-KNN, which firstly introduces the CNN feature to address the problem that traditional models only work well with well-designed manual feature representations. Additionally, in order to employ the simplicity and generality of the KNN-based model for annotation, the proposed model incorporates a multi-label linear discriminant approach to compute the weighting, which improves the accuracy in the subsequent procedures of distance calculation. Moreover, we take advantage of the KNN-based model to acquire the test image’s k-nearest neighbors in each label category and obtain the prediction for the image according to the contribution of its neighbors. Finally, experiments are performed on three typical image datasets, Corel 5K, ESP Game and IAPR TC-12, which verify the effectiveness of the proposed model.

Journal ArticleDOI
TL;DR: Extensive simulations conducted on two commonly-used test datasets have clearly demonstrated that the proposed PCR framework is able to constantly boost the performance of any image demosaicing method, in terms of objective and subjective performance evaluations.
Abstract: In this paper, a progressive collaborative representation (PCR) framework is proposed that is able to incorporate any existing color image demosaicing method for further boosting its demosaicing performance. Our PCR consists of two phases: (i) offline training and (ii) online refinement. In phase (i), multiple training-and-refining stages will be performed. In each stage, a new dictionary will be established through the learning of a large number of feature-patch pairs, extracted from the demosaicked images of the current stage and their corresponding original full-color images. After training, a projection matrix will be generated and exploited to refine the current demosaicked image. The updated image with improved image quality will be used as the input for the next training-and-refining stage and performed the same processing likewise. At the end of phase (i), all the projection matrices generated as above-mentioned will be exploited in phase (ii) to conduct online demosaicked image refinement of the test image. Extensive simulations conducted on two commonly-used test datasets (i.e., IMAX and Kodak) for evaluating the demosaicing algorithms have clearly demonstrated that our proposed PCR framework is able to constantly boost the performance of any image demosaicing method we experimented, in terms of objective and subjective performance evaluations.
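The offline training described above learns a projection matrix from feature-patch pairs; a collaborative-representation style projection of this kind is commonly obtained in closed form by ridge regression. A small illustrative sketch under that assumption (synthetic data, placeholder dimensions, not the authors' exact formulation):

```python
# Sketch: closed-form (ridge) projection matrix mapping demosaicked patches X
# toward ground-truth patches Y, then applied to refine test-image patches.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((25, 10000))                          # columns: demosaicked patches
Y = X + 0.05 * rng.standard_normal(X.shape)          # corresponding ground truth
lam = 0.01                                           # regularization weight

P = Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(X.shape[0]))   # projection matrix
test_patches = rng.random((25, 100))
refined = P @ test_patches                           # online refinement step
```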

Posted Content
TL;DR: This paper proposes a novel self domain adapted network (SDA-Net) that can rapidly adapt itself to a single test subject at the testing stage, without using extra data or training a UDA model.
Abstract: Domain shift is a major problem for deploying deep networks in clinical practice. Network performance drops significantly with (target) images obtained differently than its (source) training data. Due to a lack of target label data, most work has focused on unsupervised domain adaptation (UDA). Current UDA methods need both source and target data to train models which perform image translation (harmonization) or learn domain-invariant features. However, training a model for each target domain is time consuming and computationally expensive, even infeasible when target domain data are scarce or source data are unavailable due to data privacy. In this paper, we propose a novel self domain adapted network (SDA-Net) that can rapidly adapt itself to a single test subject at the testing stage, without using extra data or training a UDA model. The SDA-Net consists of three parts: adaptors, task model, and auto-encoders. The latter two are pre-trained offline on labeled source images. The task model performs tasks like synthesis, segmentation, or classification, which may suffer from the domain shift problem. At the testing stage, the adaptors are trained to transform the input test image and features to reduce the domain shift as measured by the auto-encoders, and thus perform domain adaptation. We validated our method on retinal layer segmentation from different OCT scanners and T1 to T2 synthesis with T1 from different MRI scanners and with different imaging parameters. Results show that our SDA-Net, with a single test subject and a short amount of time for self adaptation at the testing stage, can achieve significant improvements.

Journal ArticleDOI
16 Nov 2020
TL;DR: This work focuses on early detection and gradation of Knee Osteoarthritis utilizing Hu's invariant moments to understand the geometric transformation of the cartilage region in Knee X-ray images.
Abstract: Extracting significant information from images that are geometrically distorted or transformed is a mainstream procedure in image processing. It becomes difficult to retrieve the relevant region when the images are distorted by some geometric deformation. Hu's moments are helpful in extracting information from such distorted images due to their unique invariance property. This work focuses on the early detection and gradation of Knee Osteoarthritis, utilizing Hu's invariant moments to understand the geometric transformation of the cartilage region in Knee X-ray images. The seven invariant moments are computed for rotated versions of the test image. The demonstrated results are found to be competitive and promising, and are validated by orthopedic surgeons and rheumatologists.
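Hu's seven moments are invariant to rotation, scale and translation, which is why they survive the geometric deformations mentioned above. A minimal sketch with a toy region in place of a cartilage ROI:

```python
# Sketch: Hu's seven invariant moments for an original and a rotated region;
# the two printed vectors are near-identical up to interpolation error.
import numpy as np
from skimage.measure import moments_central, moments_normalized, moments_hu
from skimage.transform import rotate

roi = np.zeros((64, 64))
roi[20:44, 16:48] = 1.0                              # toy cartilage-like region
for img in (roi, rotate(roi, angle=30)):
    nu = moments_normalized(moments_central(img))
    print(np.round(moments_hu(nu), 4))
```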