
Showing papers on "Standard test image" published in 2021


Book ChapterDOI
27 Sep 2021
TL;DR: In this paper, a network based on self-attention between neighboring patches and without any convolution operations was proposed to achieve better segmentation performance than a traditional CNN model for medical image segmentation.
Abstract: Like other applications in computer vision, medical image segmentation has been most successfully addressed using deep learning models that rely on the convolution operation as their main building block. Convolutions enjoy important properties such as sparse interactions, weight sharing, and translation equivariance. These properties give convolutional neural networks (CNNs) a strong and useful inductive bias for vision tasks. However, the convolution operation also has important shortcomings: it performs a fixed operation on every test image regardless of the content, and it cannot efficiently model long-range interactions. In this work we show that a network based on self-attention between neighboring patches and without any convolution operations can achieve better results. Given a 3D image block, our network divides it into \(n^3\) 3D patches, where \(n=3 \text { or } 5\), and computes a 1D embedding for each patch. The network predicts the segmentation map for the center patch of the block based on the self-attention between these patch embeddings. We show that the proposed model can achieve higher segmentation accuracies than a state-of-the-art CNN. For scenarios with very few labeled images, we propose methods for pre-training the network on large corpora of unlabeled images. Our experiments show that with pre-training, the advantage of our proposed network over CNNs can be significant when the labeled training data is small.
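A rough sketch of the patch-embedding idea described in this abstract is given below in PyTorch: a 3D block is split into \(n^3\) patches, each patch is mapped to a 1D embedding, and a transformer encoder applies self-attention between the patch embeddings before a head predicts per-voxel labels for the center patch. The block size, embedding dimension, depth, and class count are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class PatchAttentionSegmenter(nn.Module):
    """Toy sketch: embed the n^3 3D patches of a block and use self-attention
    to predict the segmentation of the center patch (illustrative sizes)."""
    def __init__(self, n=3, patch=8, embed_dim=128, num_classes=2, heads=4):
        super().__init__()
        self.n, self.patch = n, patch
        self.embed = nn.Linear(patch ** 3, embed_dim)            # 1D embedding per patch
        self.pos = nn.Parameter(torch.zeros(n ** 3, embed_dim))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(embed_dim, num_classes * patch ** 3)

    def forward(self, block):                  # block: (B, n*patch, n*patch, n*patch)
        B, n, p = block.shape[0], self.n, self.patch
        # split the block into n^3 non-overlapping patches and flatten each one
        patches = block.reshape(B, n, p, n, p, n, p)
        patches = patches.permute(0, 1, 3, 5, 2, 4, 6).reshape(B, n ** 3, p ** 3)
        tokens = self.encoder(self.embed(patches) + self.pos)   # self-attention between patches
        center = tokens[:, (n ** 3) // 2]                        # embedding of the center patch
        return self.head(center).reshape(B, -1, p, p, p)         # per-voxel class logits

x = torch.randn(2, 24, 24, 24)                 # a 3D image block (n=3, patch=8)
print(PatchAttentionSegmenter()(x).shape)      # torch.Size([2, 2, 8, 8, 8])
```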

69 citations


Journal ArticleDOI
TL;DR: In this article, a concatenation of two sub-networks, a relatively shallow image normalization network and a deep CNN segmentation network, is proposed for medical image segmentation.

66 citations


Posted Content
TL;DR: A method for estimating neural scene representations of objects given only a single image, based on a generative process that first maps a latent code to a voxelized shape and then renders it to an image, with the object appearance being controlled by a second latent code.
Abstract: We present a method for estimating neural scene representations of objects given only a single image. The core of our method is the estimation of a geometric scaffold for the object and its use as a guide for the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape, and then renders it to an image, with the object appearance being controlled by a second latent code. During inference, we optimize both the latent codes and the networks to fit a test image of a new object. The explicit disentanglement of shape and appearance allows our model to be fine-tuned given a single image. We can then render new views in a geometrically consistent manner that faithfully represent the input object. Additionally, our method is able to generalize to images outside of the training domain (more realistic renderings and even real photographs). Finally, the inferred geometric scaffold is itself an accurate estimate of the object's 3D shape. We demonstrate in several experiments the effectiveness of our approach on both synthetic and real images.

53 citations


Journal ArticleDOI
TL;DR: In this article, a negative-positive prototypical part network (NP-ProtoPNet) is proposed to imitate human reasoning for image recognition while comparing the parts of a test image with the corresponding parts of the images from known classes.
Abstract: Interpretation of the reasoning process behind a prediction made by a deep learning model is always desired. However, when the predictions of a deep learning model directly impact the lives of people, interpretation becomes a necessity. In this paper, we introduce a deep learning model: the negative-positive prototypical part network (NP-ProtoPNet). This model attempts to imitate human reasoning for image recognition by comparing the parts of a test image with the corresponding parts of images from known classes. We demonstrate our model on a dataset of chest X-ray images of COVID-19 patients, pneumonia patients and normal people. The accuracy and precision that our model achieves are on par with the best-performing non-interpretable deep learning models.

41 citations


Proceedings ArticleDOI
Zhixiang Chi, Yang Wang, Yuanhao Yu, Jin Tang
01 Jun 2021
TL;DR: Chi et al. as mentioned in this paper proposed a self-supervised meta-auxiliary learning approach to improve the performance of deblurring by integrating both external and internal learning, which is able to exploit the internal information of the test image at test time via the auxiliary task to enhance deblurring performance.
Abstract: In this paper, we tackle the problem of dynamic scene deblurring. Most existing deep end-to-end learning approaches adopt the same generic model for all unseen test images. These solutions are sub-optimal, as they fail to utilize the internal information within a specific image. On the other hand, a self-supervised approach, SelfDeblur, enables internal training within a test image from scratch, but it does not fully take advantage of large external datasets. In this work, we propose a novel self-supervised meta-auxiliary learning to improve the performance of deblurring by integrating both external and internal learning. Concretely, we build a self-supervised auxiliary reconstruction task that shares a portion of the network with the primary deblurring task. The two tasks are jointly trained on an external dataset. Furthermore, we propose a meta-auxiliary training scheme to further optimize the pretrained model as a base learner, which is applicable for fast adaptation at test time. During training, the performance of both tasks is coupled. Therefore, we are able to exploit the internal information at test time via the auxiliary task to enhance the performance of deblurring. Extensive experimental results across evaluation datasets demonstrate the effectiveness of test-time adaptation of the proposed method.
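The general test-time adaptation loop the abstract describes can be sketched as follows: a shared encoder feeds both a primary deblurring head and a self-supervised auxiliary reconstruction head, and at test time only the auxiliary loss is used to fine-tune the shared parameters before the primary prediction is made. The architecture, loss, and step count below are illustrative stand-ins, not the paper's meta-auxiliary training scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared encoder with two heads: primary deblurring and auxiliary self-reconstruction.
# Shapes and layer counts are illustrative, not the paper's actual architecture.
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
deblur_head = nn.Conv2d(32, 3, 3, padding=1)   # primary task (trained with sharp targets)
recon_head = nn.Conv2d(32, 3, 3, padding=1)    # auxiliary task (self-supervised)

def adapt_and_deblur(blurry, steps=5, lr=1e-4):
    """Fine-tune the shared encoder on one test image via the auxiliary
    reconstruction loss, then run the primary deblurring head."""
    params = list(encoder.parameters()) + list(recon_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = recon_head(encoder(blurry))
        loss = F.l1_loss(recon, blurry)        # no ground truth needed at test time
        loss.backward()
        opt.step()
    with torch.no_grad():
        return deblur_head(encoder(blurry))    # primary prediction after adaptation

test_image = torch.rand(1, 3, 64, 64)          # a single blurry test image
print(adapt_and_deblur(test_image).shape)      # torch.Size([1, 3, 64, 64])
```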

30 citations


Proceedings ArticleDOI
01 Jan 2021
TL;DR: DualSR as discussed by the authors proposes a dual-path architecture that learns an image-specific low-to-high resolution mapping using only patches of the input test image, where a downsampler learns the degradation process using a generative adversarial network, and an up-sampler learns to super-resolve that specific image.
Abstract: Advanced methods for single image super-resolution (SISR) based on deep learning have demonstrated remarkable reconstruction performance on downscaled images. However, for real-world low-resolution images (e.g. images captured straight from the camera) they often generate blurry images and exhibit unpleasant artifacts. The main reason is the training data, which does not reflect the real-world super-resolution problem: these methods train the network using images downsampled with an ideal (usually bicubic) kernel, whereas for real-world images the degradation process is more complex and can vary from image to image. This paper proposes a new dual-path architecture (DualSR) that learns an image-specific low-to-high resolution mapping using only patches of the input test image. For every image, a downsampler learns the degradation process using a generative adversarial network, and an upsampler learns to super-resolve that specific image. In the DualSR architecture, the upsampler and downsampler are trained simultaneously and improve each other using cycle consistency losses. For better visual quality and elimination of undesired artifacts, the upsampler is constrained by a masked interpolation loss. On standard benchmarks with unknown degradation kernels, DualSR outperforms recent blind and non-blind super-resolution methods in terms of SSIM and generates images with higher perceptual quality. On real-world LR images it generates visually pleasing and artifact-free results.

29 citations


Posted Content
TL;DR: This paper harnesses the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images and achieves for the first time, to the best of the authors' knowledge, facial texture reconstruction with high-frequency details.
Abstract: A lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In recent works, the texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction is still not capable of modeling facial texture with high-frequency details. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful facial texture prior from a large-scale 3D texture dataset. Then, we revisit the original 3D Morphable Models (3DMMs) fitting, making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image, but under a new perspective. In order to be robust towards initialisation and to expedite the fitting process, we propose a novel self-supervised regression-based approach. We demonstrate excellent results in photorealistic and identity-preserving 3D face reconstructions and achieve, for the first time to the best of our knowledge, facial texture reconstruction with high-frequency details.

27 citations


Journal ArticleDOI
TL;DR: In this article, a self-domain-adapted network (SDA-Net) is proposed, which consists of three parts, all of them neural networks: a task model, which performs the image analysis task such as segmentation; a set of autoencoders; and a set of adaptors, which transform the test image and its features to minimize the domain shift.

24 citations


Proceedings ArticleDOI
17 May 2021
TL;DR: In this paper, the authors combine YOLO (You Only Look Once) algorithm with the VGG16 pre-trained convolutional neural network to propose an improvement for face detection systems.
Abstract: Face detection is not only one of the most studied topics in the computer vision field but also a very important task in many applications, such as security access control systems, video surveillance, human-computer interfaces, and image database management. Various methods have been developed for face detection, such as Viola-Jones, RCNN, and SSD, and many researchers are still trying to improve face detection systems across varied illuminations, poses, and skin colors, as well as for real-time detection. This paper combines the YOLO (You Only Look Once) algorithm with the VGG16 pre-trained convolutional neural network to propose an improvement for face detection systems. Experimental results show that the proposed method detects the test image set with over 95% average precision. Our proposed method also considerably increased face detection speed on real-time live video. The experiments in this work used the Image Processing Toolbox and the Deep Learning Toolbox in MATLAB.

20 citations


Journal ArticleDOI
TL;DR: A convolutional neural network architecture is proposed, consisting of five types of layers: a convolutional layer, an activation layer, a pooling layer, and a fully connected layer, followed by a softmax layer that gives the output probability for every class.
Abstract: Computer-aided diagnosis and design in the medical domain is an exciting field owing to the drastic growth in medical images. Earlier handcrafted feature learning techniques failed to achieve the targeted results in practical settings. In this paper, we adopt a deep learning approach to reduce the semantic gap that exists between the low-level information captured by imaging devices and the high-level information perceived by a human. The proposed work is twofold: first, we propose a convolutional neural network architecture consisting of five types of layers: a convolutional layer, an activation layer, a pooling layer, and a fully connected layer, followed by a softmax layer that gives the output probability for every class. The second contribution of this paper is to address an open problem in medical image analysis: "Uses of a pretrained model with adequate fine-tuning to eliminate the extra effort of making a new CNN architecture from scratch". To address this, we employ a pretrained VGG-16 model (a well-known CNN architecture trained on the ImageNet dataset) to train on the same dataset. Grad-CAM is used for visualizing the model's behavior with respect to a test image. The proposed methods are evaluated on the well-known, publicly available NIH ChestX-ray14 dataset and establish a new benchmark, achieving state-of-the-art results of 83.671% (scratch CNN) and 97.81% (transfer learning), which are much higher than those of other methods. Moreover, we also provide an in-depth comparison with existing works.
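The transfer-learning recipe discussed above (a pretrained VGG-16 with adequate fine-tuning instead of a scratch-built CNN) might look roughly like the Keras sketch below; the input size, added head, 14-class softmax output, and training schedule are assumptions for illustration rather than the paper's exact setup.

```python
import tensorflow as tf

NUM_CLASSES = 14  # assumed: one output per ChestX-ray14 pathology label

# Load VGG-16 pretrained on ImageNet without its classification head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False            # freeze convolutional features for the first stage

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # class probabilities
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()

# Assumed fine-tuning stage: unfreeze the base (or just its last block) and
# retrain with a smaller learning rate, e.g.
#   base.trainable = True
#   model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
#                 loss="categorical_crossentropy", metrics=["accuracy"])
```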

18 citations


Journal ArticleDOI
TL;DR: In this paper, an ensemble of improved convolutional neural networks combined with a test-time regularly spaced shifting technique was proposed for skin lesion classification, which showed a significant improvement on the well-known HAM10000 dataset in terms of accuracy and F-score.
Abstract: Skin lesions are caused by multiple factors, such as allergies, infections, and exposure to the sun. These skin diseases have become a challenge in medical diagnosis due to visual similarities, and image classification is an essential task for achieving an adequate diagnosis of different lesions. Melanoma is one of the best-known types of skin lesions because it accounts for the vast majority of skin cancer deaths. In this work, we propose an ensemble of improved convolutional neural networks combined with a test-time regularly spaced shifting technique for skin lesion classification. The shifting technique builds several versions of the test input image, which are shifted by displacement vectors that lie on a regular lattice in the plane of possible shifts. These shifted versions of the test image are subsequently passed on to each of the classifiers of an ensemble. Finally, all the outputs from the classifiers are combined to yield the final result. Experimental results show a significant improvement on the well-known HAM10000 dataset in terms of accuracy and F-score. In particular, it is demonstrated that our combination of ensembles with test-time regularly spaced shifting yields better performance than either of the two methods applied alone.
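A minimal sketch of the test-time regularly spaced shifting idea, assuming simple wrap-around shifts on a square lattice and averaging as the combination rule (neither is necessarily the paper's exact choice):

```python
import numpy as np

def shifted_versions(image, max_shift=8, step=4):
    """Build copies of the test image shifted by displacement vectors lying
    on a regular lattice in the plane of possible shifts."""
    offsets = range(-max_shift, max_shift + 1, step)
    return [np.roll(np.roll(image, dy, axis=0), dx, axis=1)
            for dy in offsets for dx in offsets]

def ensemble_predict(image, classifiers, max_shift=8, step=4):
    """Average class probabilities over all shifted copies and all ensemble members."""
    versions = shifted_versions(image, max_shift, step)
    probs = [clf(v) for clf in classifiers for v in versions]
    return np.mean(probs, axis=0)            # final combined prediction

# Toy usage with two dummy "classifiers" returning 7-class probability vectors
# (stand-ins for the trained CNNs of the ensemble; HAM10000 has 7 classes).
rng = np.random.default_rng(0)
dummy = [lambda x: rng.dirichlet(np.ones(7)) for _ in range(2)]
image = rng.random((224, 224, 3))
print(ensemble_predict(image, dummy))
```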

Journal ArticleDOI
TL;DR: The potential vulnerability of pre-trained convolutional neural network algorithms to the FGSM attack is explored in terms of two frequently used models, VGG16 and Inception-v3, showing that the correct class probability of any test image can drop for both models as the perturbation increases.
Abstract: The COVID-19 pandemic requires the rapid isolation of infected patients. Thus, high-sensitivity radiology images could be a key technique for diagnosing patients alongside the polymerase chain reaction approach. Deep learning algorithms have been proposed in several studies to detect COVID-19 symptoms due to their success in chest radiography image classification, cost efficiency, the lack of expert radiologists, and the need for faster processing during the pandemic. Most of the promising algorithms proposed in different studies are based on pre-trained deep learning models. Such open-source models and the lack of variation in the radiology image-capturing environment make the diagnosis system vulnerable to adversarial attacks such as the fast gradient sign method (FGSM) attack. This study therefore explored the potential vulnerability of pre-trained convolutional neural network algorithms to the FGSM attack in terms of two frequently used models, VGG16 and Inception-v3. Firstly, we developed two transfer learning models for X-ray and CT image-based COVID-19 classification and analyzed their performance extensively in terms of accuracy, precision, recall, and AUC. Secondly, our study illustrates that misclassification can occur with a very minor perturbation magnitude, such as 0.009 and 0.003 for the FGSM attack in these models for X-ray and CT images, respectively, without any effect on the visual perceptibility of the perturbation. In addition, we demonstrated that a successful FGSM attack can decrease the classification performance to 16.67% and 55.56% for X-ray images, as well as 36% and 40% in the case of CT images, for VGG16 and Inception-v3, respectively, without any human-recognizable perturbation effects in the adversarial images. Finally, we showed that the correct class probability of any test image, which should ideally be 1, can drop for both models with increased perturbation; it can drop to 0.24 and 0.17 for the VGG16 model in the cases of X-ray and CT images, respectively. Thus, despite the need for data sharing and automated diagnosis, practical deployment of such programs requires more robustness.
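For reference, the FGSM perturbation underlying the attack studied here is x_adv = x + eps * sign(∇_x L(x, y)). A minimal PyTorch sketch with a placeholder classifier (not the VGG16 or Inception-v3 models of the study) could look like this:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, eps=0.009):
    """Fast gradient sign method: perturb the input in the direction of the
    sign of the loss gradient, x_adv = x + eps * sign(dL/dx)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()            # keep pixel values in a valid range

# Toy usage with a tiny stand-in classifier (illustrative only).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 2))
x = torch.rand(1, 3, 64, 64)                       # image stand-in, scaled to [0, 1]
y = torch.tensor([1])                              # assumed ground-truth class index
x_adv = fgsm_attack(model, x, y, eps=0.009)        # 0.009: magnitude quoted in the abstract
print((x_adv - x).abs().max())                     # perturbation bounded by eps
```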

Journal ArticleDOI
TL;DR: It is shown that it is possible to segment experimental materials science data using a SegNet-based CNN that was trained only using simple phase field simulations, and the CNN trained on phase field images segmented the experimental test image with 99% accuracy.

Journal ArticleDOI
TL;DR: In this article, a no-reference image quality assessment method is proposed using a set of quality-aware features which globally characterizes the statistics of a given test image, such as extended local fractal dimension distribution feature, extended first digit distribution features using different domains, Bilaplacian features, image moments, and a wide variety of perceptual features.
Abstract: The perceptual quality of digital images is often deteriorated during storage, compression, and transmission. The most reliable way of assessing image quality is to ask people to provide their opinions on a number of test images. However, this is an expensive and time-consuming process which cannot be applied in real-time systems. In this study, a novel no-reference image quality assessment method is proposed. The introduced method uses a set of novel quality-aware features which globally characterizes the statistics of a given test image, such as extended local fractal dimension distribution feature, extended first digit distribution features using different domains, Bilaplacian features, image moments, and a wide variety of perceptual features. Experimental results are demonstrated on five publicly available benchmark image quality assessment databases: CSIQ, MDID, KADID-10k, LIVE In the Wild, and KonIQ-10k.

Journal ArticleDOI
TL;DR: An unsupervised feature extraction approach for BIQA based on the Karhunen-Loève transform (KLT), where a normalization operation is first applied to the test image by calculating its mean subtracted contrast normalized (MSCN) coefficients, and a generalized Gaussian distribution is employed to model the KLT coefficient distribution in different spectral components as quality-relevant features.
Abstract: Blind image quality assessment (BIQA) plays an important role in image services as it is independent of the reference image. The design of perceptually relevant features is the core of BIQA methods, but their performance is still not satisfactory at present. In this work, we propose an unsupervised feature extraction approach for BIQA based on the Karhunen-Loève transform (KLT). Specifically, a normalization operation is first applied to the test image by calculating its mean subtracted contrast normalized (MSCN) coefficients. Then, KLT is employed as a data-driven feature extraction approach to extract image structural features, wherein kernels with different sizes are utilized to perform multi-scale analysis. Finally, a generalized Gaussian distribution (GGD) is employed to model the KLT coefficient distribution in different spectral components as quality-relevant features. Extensive experiments conducted on four widely utilized IQA databases demonstrate that the proposed multi-scale KLT (MsKLT) BIQA metric compares favorably with existing BIQA methods in terms of high accordance with human subjective scores on both common and uncommon distortion types.
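The MSCN normalization step mentioned above can be sketched in a few lines of NumPy/SciPy; the Gaussian window width and stabilizing constant below are common defaults, not necessarily the paper's values. The subsequent steps (patch extraction, KLT via an eigendecomposition of the patch covariance, and GGD fitting per spectral component) would operate on these coefficients.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(image, sigma=7/6, c=1.0):
    """Mean subtracted contrast normalized (MSCN) coefficients:
    (I - mu) / (sigma_local + C), with mu and sigma_local from a Gaussian window."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image ** 2, sigma) - mu ** 2
    sigma_local = np.sqrt(np.maximum(var, 0))
    return (image - mu) / (sigma_local + c)

# Toy usage: normalize a random grayscale "test image" before patch-wise KLT.
rng = np.random.default_rng(0)
test_image = rng.integers(0, 256, size=(128, 128))
mscn = mscn_coefficients(test_image)
print(mscn.mean(), mscn.std())   # roughly zero-mean with unit-scale spread
```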

Proceedings ArticleDOI
19 May 2021
TL;DR: Experimental results show that the proposed framework produces images of superior spatial and spectral resolution compared to the current leading methods, whether model- or DL-based.
Abstract: Hyperspectral (HS) images contain detailed spectral information that has proven crucial in applications like remote sensing, surveillance, and astronomy. However, because of hardware limitations of HS cameras, the captured images have low spatial resolution. To improve them, the low-resolution hyperspectral images are fused with conventional high-resolution RGB images via a technique known as fusion based HS image super-resolution. Currently, the best performance in this task is achieved by deep learning (DL) methods. Such methods, however, cannot guarantee that the input measurements are satisfied in the recovered image, since the learned parameters by the network are applied to every test image. Conversely, model-based algorithms can typically guarantee such measurement consistency. Inspired by these observations, we propose a framework that integrates learning and model based methods. Experimental results show that our method produces images of superior spatial and spectral resolution compared to the current leading methods, whether model- or DL-based.

Posted ContentDOI
18 Feb 2021-bioRxiv
TL;DR: In this article, fixed biological filter banks, in particular banks of Gabor filters, are used to constrain the networks to avoid reliance on shortcuts, making them develop more structured internal representations and more tolerant to noise.
Abstract: Deep Convolutional Neural Networks (DNNs) have achieved superhuman accuracy on standard image classification benchmarks. Their success has reignited significant interest in their use as models of the primate visual system, bolstered by claims of their architectural and representational similarities. However, closer scrutiny of these models suggests that they rely on various forms of shortcut learning to achieve their impressive performance, such as using texture rather than shape information. Such superficial solutions to image recognition have been shown to make DNNs brittle in the face of more challenging tests such as noise-perturbed or out-of-domain images, casting doubt on their similarity to their biological counterparts. In the present work, we demonstrate that adding fixed biological filter banks, in particular banks of Gabor filters, helps to constrain the networks to avoid reliance on shortcuts, making them develop more structured internal representations and become more tolerant to noise. Importantly, they also gained around 20-30% improved accuracy when generalising to our novel out-of-domain test image sets over standard end-to-end trained architectures. We take these findings to suggest that these properties of the primate visual system should be incorporated into DNNs to make them better able to cope with real-world vision and better capture some of the more impressive aspects of human visual perception, such as generalisation.
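A fixed, non-trainable Gabor filter bank used as a network front end, as described above, could be sketched as follows in PyTorch; the kernel size, number of orientations, and filter parameters are illustrative choices rather than the paper's.

```python
import numpy as np
import torch
import torch.nn as nn

def gabor_kernel(size=11, theta=0.0, sigma=3.0, lambd=6.0, gamma=0.5, psi=0.0):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / lambd + psi))

# Fixed bank of 8 orientations used as a frozen first convolutional layer
# (an illustrative stand-in for the biological front end described above).
thetas = np.linspace(0, np.pi, 8, endpoint=False)
bank = np.stack([gabor_kernel(theta=t) for t in thetas])[:, None]   # (8, 1, 11, 11)

gabor_layer = nn.Conv2d(1, 8, kernel_size=11, padding=5, bias=False)
gabor_layer.weight = nn.Parameter(torch.tensor(bank, dtype=torch.float32),
                                  requires_grad=False)               # never trained

x = torch.rand(1, 1, 64, 64)          # grayscale input image
print(gabor_layer(x).shape)           # torch.Size([1, 8, 64, 64])
```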

Journal ArticleDOI
TL;DR: In this paper, the authors compared the performance of support vector machine (SVM), random forest (RF), and deep neural network (DNN) models on a single-and dual-polarized X-band SAR image.
Abstract: It is well known that the polarization characteristics in X-band synthetic aperture radar (SAR) image analysis can provide additional information for marine target classification and detection. Normally, dual- and single-polarized SAR images are acquired by SAR satellites, so we must determine how accurate the marine mapping performance from dual-polarized (pol) images is compared with that from single-pol images for a given machine learning model. The purpose of this study is to compare the performance of single- and dual-pol SAR image classification achieved by support vector machine (SVM), random forest (RF), and deep neural network (DNN) models. The test image is a TerraSAR-X dual-pol image acquired from the 2007 Kerch Strait oil spill event. For this, 824,026 pixels and 1,648,051 pixels were extracted from the image for training and testing, respectively, and sea, ship, oil, and land objects were classified from the image using the three machine learning methods. The mean f1-scores of the SVM, RF, and DNN models resulting from the single-pol image were approximately 0.822, 0.882, and 0.889, respectively, and those from the dual-pol image were about 0.852, 0.908, and 0.898, respectively. The performance improvement achieved by dual-pol data was about 3.6%, 2.9%, and 1% for SVM, RF, and DNN, respectively. The DNN model had the best performance (0.889) in the single-pol test, while the RF model was best (0.908) in the dual-pol test. The performance improvement was approximately 2.1% and not noticeable. Considering that dual-pol images have two-times lower spatial resolution than single-pol images in the azimuth direction, such a small improvement may not be valuable. Therefore, the results show that the performance improvement from X-band dual-pol images may not be remarkable when classifying sea, ship, oil spill, and land surfaces.
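A sketch of the three-model comparison with mean F1 scoring, using scikit-learn on synthetic stand-in features (the real study used per-pixel polarization features from the TerraSAR-X scene):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-pixel SAR features with 4 classes (sea, ship, oil, land).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 2))            # e.g. two backscatter-derived features
y = rng.integers(0, 4, size=4000)
X[np.arange(4000), y % 2] += y            # inject some class structure
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DNN": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    score = f1_score(y_te, model.predict(X_te), average="macro")  # mean F1 over classes
    print(f"{name}: mean F1 = {score:.3f}")
```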

Journal ArticleDOI
TL;DR: In this article, a fast and secure encryption algorithm for medical images based on the 1D logistic map associated with pseudo-random numbers has been proposed, which has been tested for robustness and effectiveness using the standard tests available.
Abstract: A new, fast, and secure encryption algorithm for medical images based on the 1D logistic map associated with pseudo-random numbers is proposed. Initial values and parameters of the logistic map play an important role (as secret keys) in generating key matrices for shuffling and substituting pixels in the image. The proposed algorithm has been designed to give the user control over the level of security required by increasing or decreasing the number of rounds of the encryption process. During the encryption process, two pseudo-random rows and two pseudo-random columns are inserted on each side of the original image to counter chosen- and known-plain-image attacks. The proposed algorithm has been tested for robustness and effectiveness using the standard tests available. Further, differential and noise attacks have also been analyzed. Cryptanalysis of the proposed algorithm has been performed by testing it against most of the frequently used attacks, such as known- and chosen-plain-image attacks. The run time for different images has been recorded to check the efficiency of the proposed algorithm. The tests were performed on 50 grayscale and 50 RGB images. The average entropy and NPCR of encrypted images were approximately 7.99 and 99.6%, respectively, for the selected images. Some medical images, such as human brain, MRI, and lung images, have been selected to demonstrate the output of the proposed algorithm. The proposed algorithm has also been tested on a standard non-medical test image. The obtained results have been compared with existing competing algorithms. The proposed algorithm is apt for practical use.
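The core keystream idea, iterating the 1D logistic map from secret initial values to drive a pixel permutation and an XOR substitution, can be sketched as below. This is a single-round illustrative simplification that omits the paper's inserted pseudo-random rows/columns and multi-round control.

```python
import numpy as np

def logistic_keystream(x0, r, n):
    """Iterate the 1D logistic map x_{k+1} = r * x_k * (1 - x_k) for n steps."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

def encrypt(image, x0=0.3456, r=3.99):
    """One round: key-dependent pixel permutation followed by an XOR substitution."""
    flat = image.flatten()
    ks = logistic_keystream(x0, r, flat.size)     # x0 and r act as the secret key
    perm = np.argsort(ks)                         # shuffling order from the key stream
    mask = (ks * 255).astype(np.uint8)            # substitution mask from the key stream
    return (flat[perm] ^ mask).reshape(image.shape)

def decrypt(cipher, x0=0.3456, r=3.99):
    """Regenerate the key stream, undo the XOR, then invert the permutation."""
    ks = logistic_keystream(x0, r, cipher.size)
    perm = np.argsort(ks)
    flat = cipher.flatten() ^ (ks * 255).astype(np.uint8)
    out = np.empty_like(flat)
    out[perm] = flat
    return out.reshape(cipher.shape)

img = np.random.default_rng(1).integers(0, 256, (64, 64), dtype=np.uint8)
assert np.array_equal(decrypt(encrypt(img)), img)   # round-trip check
```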

Journal ArticleDOI
TL;DR: A novel distance measurement scheme for NNC, called the dissimilarity-based nearest neighbor classifier (DNNC), is proposed and applied to SSFR; it first segments each image into non-overlapping patches of a given size and then generates an ordered image patch set.
Abstract: In single-sample face recognition (SSFR) tasks, the nearest neighbor classifier (NNC) is the most popular method for its simplicity in implementation. However, in complex situations with light, posture, expression, and obscuration, NNC cannot achieve good recognition performance when applying common distance measurements, such as the Euclidean distance. Thus, this paper proposes a novel distance measurement scheme for NNC and applies it to SSFR. The proposed method, called dissimilarity-based nearest neighbor classifier (DNNC), first segments each (training or test) image into non-overlapping patches with a given size and then generates an ordered image patch set. The dissimilarities between the given test image patch set and the training image patch sets are computed and taken as the distance measurement of NNC. The smaller the dissimilarity of image patch sets is, the closer is the distance from the test image to the training image. Therefore, the category of the test image can be determined according to the smallest dissimilarity. Extensive experiments on the AR face database demonstrate the effectiveness of DNNC, especially for the case of obscuration.
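A compact sketch of the DNNC idea: segment each image into an ordered set of non-overlapping patches, accumulate a patch-wise distance between corresponding patches, and assign the label of the least dissimilar training image. The patch size and per-patch Euclidean distance below are illustrative assumptions, not necessarily the paper's dissimilarity measure.

```python
import numpy as np

def to_patches(image, p=16):
    """Segment an image into an ordered set of non-overlapping p x p patches."""
    h, w = image.shape[0] // p * p, image.shape[1] // p * p
    img = image[:h, :w].astype(np.float64)
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def dissimilarity(patches_a, patches_b):
    """Total patch-wise dissimilarity between two ordered patch sets
    (Euclidean distance per corresponding patch; an illustrative choice)."""
    return np.linalg.norm(patches_a - patches_b, axis=1).sum()

def dnnc_classify(test_image, train_images, train_labels, p=16):
    """Assign the label of the training image whose patch set is least dissimilar."""
    test_patches = to_patches(test_image, p)
    scores = [dissimilarity(test_patches, to_patches(t, p)) for t in train_images]
    return train_labels[int(np.argmin(scores))]

# Toy usage: one training face per subject (single-sample setting).
rng = np.random.default_rng(0)
train = [rng.random((64, 64)) for _ in range(3)]
labels = ["subject_0", "subject_1", "subject_2"]
test = train[1] + 0.05 * rng.random((64, 64))     # a slightly perturbed view of subject 1
print(dnnc_classify(test, train, labels))          # expected: subject_1
```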

Book ChapterDOI
27 Sep 2021
TL;DR: In this article, the authors proposed a new pruning criterion that allows a fixed network to learn new data domains sequentially over time, without requiring access to their training data, while simultaneously avoiding catastrophic forgetting and maintaining accurate performance.
Abstract: Despite recent advances in deep learning based medical image computing, clinical implementations in patient-care settings have been limited with lack of sufficiently diverse data during training remaining a pivotal impediment to robust real-life model performance. Continual learning (CL) offers a desirable property of deep neural network models (DNNs), namely the ability to continually learn from new data to accumulate knowledge whilst retaining what has been previously learned. In this work we present a simple and effective CL approach for sequential multi-domain learning (MDL) and showcase its utility in the skin lesion image classification task. Specifically, we propose a new pruning criterion that allows for a fixed network to learn new data domains sequentially over time. Our MDL approach incrementally builds on knowledge gained from previously learned domains, without requiring access to their training data, while simultaneously avoiding catastrophic forgetting and maintaining accurate performance on all domain data learned. Our new pruning criterion detects culprit units associated with wrong classification in each domain and releases these units so they are dedicated for subsequent learning on new domains. To reduce the computational cost associated with retraining the network post pruning, we implement MergePrune, which efficiently merges the pruning and training stages into one step. Furthermore, at inference time, instead of using a test-time oracle, we design a smart gate using Siamese networks to assign a test image to the most appropriate domain and its corresponding learned model. We present extensive experiments on 6 skin lesion image databases, representing different domains with varying levels of data bias and class imbalance, including quantitative comparisons against multiple baselines and state-of-the-art methods, which demonstrate superior performance and efficient computations of our proposed method.

Book ChapterDOI
27 Sep 2021
Abstract: We propose a novel unsupervised out-of-distribution detection method for medical images based on implicit fields image representations. In our approach, an auto-decoder feed-forward neural network learns the distribution of healthy images in the form of a mapping between spatial coordinates and probabilities over a proxy for tissue types. At inference time, the learnt distribution is used to retrieve, from a given test image, a restoration, i.e. an image maximally consistent with the input one but belonging to the healthy distribution. Anomalies are localized using the voxel-wise probability predicted by our model for the restored image. We tested our approach in the task of unsupervised localization of gliomas on brain MR images and compared it to several other VAE-based anomaly detection methods. Results show that the proposed technique substantially outperforms them (average DICE 0.640 vs 0.518 for the best performing VAE-based alternative) while also requiring considerably less computing time.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a robust watermarking algorithm for medical images based on Harris-SURF-DCT, which can extract the watermark from the test image without the original image.
Abstract: To protect patient information in medical images, this article proposes a robust watermarking algorithm for medical images based on Harris-SURF-DCT. First, the corners of the medical image are extracted using the Harris corner detection algorithm, and the extracted corners are then described using the feature point description method of the SURF algorithm to generate a feature descriptor matrix. The feature descriptor matrix is then processed through a perceptual hash algorithm to obtain the feature vector of the medical image, which is a binary feature vector with a size of 32 bits. Secondly, to enhance the security of the watermark information, the logistic map algorithm is used to encrypt the watermark before embedding it. Finally, with the help of cryptography, a trusted third party, and zero-watermarking technology, the algorithm can embed the watermark without modifying the medical image. When extracting the watermark, the algorithm can recover it from the test image without the original image. In addition, the algorithm has strong robustness against conventional attacks and geometric attacks, and it performs especially well under geometric attacks.

Journal ArticleDOI
TL;DR: A method of 2D pose-invariant face recognition that assumes the search database contains only frontal view faces, and performs with acceptable and similar accuracy to conventional methods, while using only frontal faces in the test database.
Abstract: Personal identification systems that use face recognition work well for test images with a frontal view of the face, but often fail when the input face is a pose view. Most face databases come from picture ID sources such as passports or driver's licenses. In such databases, only the frontal view is available. This paper proposes a method of 2D pose-invariant face recognition that assumes the search database contains only frontal view faces. Given a non-frontal view of a test face, the pose-view angle is first calculated by matching the test image against a database of canonical faces with head rotations to find the best matched image. This database of canonical faces is used only to find the head rotation. The database does not contain images of the test face itself, but has a selection of template faces, each with rotation images of -45°, -30°, -15°, 0°, 15°, 30°, and 45°. The landmark features in the best matched rotated canonical face (say, the 15° rotation) and its corresponding frontal face at 0° are used to create a warp transformation that converts the 15°-rotated test face to a frontal face. This warp introduces some distortion artifacts, since some features of the non-frontal input face are not visible due to self-occlusion. The warped image is therefore enhanced by mixing intensities using the left/right facial symmetry assumption. The enhanced synthesized frontal face image is then used to find the best match target in the frontal face database. We test our approach using CMU Multi-PIE database images. Our method performs with acceptable accuracy, similar to conventional methods, while using only frontal faces in the test database.

Journal ArticleDOI
TL;DR: In this article, a block-level perceptual image compression framework is proposed, including a blocklevel just noticeable difference (JND) prediction model and a preprocessing scheme, which is able to achieve 16.75% bit saving as compared to the state-of-the-art method with similar subjective quality.
Abstract: A block-level perceptual image compression framework is proposed in this work, including a block-level just noticeable difference (JND) prediction model and a preprocessing scheme. Specifically, block-level JND values are first deduced by utilizing the OTSU method based on the variation of block-level structural similarity values between two adjacent picture-level JND values in the MCL-JCI dataset. After the JND value for each image block is generated, a convolutional neural network-based prediction model is designed to forecast block-level JND values for a given target image. Then, a preprocessing scheme is devised to modify the discrete cosine transform coefficients during JPEG compression on the basis of the distribution of block-level JND values of the target test image. Finally, the test image is compressed using the maximum JND value across all of its image blocks in light of the initial quality factor setting. The experimental results demonstrate that the proposed block-level perceptual image compression method is able to achieve 16.75% bit saving compared to the state-of-the-art method with similar subjective quality. The project page can be found at https://mic.tongji.edu.cn/43/3f/c9778a148287/page.htm.

Journal ArticleDOI
TL;DR: An approach based on the evaluation of the histogram of a common class of images considered as the target, showing that, at least when the images of the considered datasets are homogeneous enough, it is not necessary to resort to complex-to-implement DL techniques in order to attain effective detection of the COVID-19 disease.
Abstract: The global COVID-19 pandemic has certainly posed one of the more difficult challenges for researchers in the current century. The development of an automatic diagnostic tool able to detect the disease in its early stage could undoubtedly offer a great advantage in the battle against the pandemic. In this regard, most research efforts have been focused on the application of Deep Learning (DL) techniques to chest images, including traditional chest X-rays (CXRs) and Computed Tomography (CT) scans. Although these approaches have demonstrated their effectiveness in detecting the COVID-19 disease, they are of huge computational complexity and require large datasets for training. In addition, there may not be a large number of COVID-19 CXRs and CT scans available to researchers. To this end, in this paper, we propose an approach based on the evaluation of the histogram of a common class of images that is considered as the target. A suitable inter-histogram distance measures how far this target histogram is from the histogram evaluated on a test image: if this distance is greater than a threshold, the test image is labeled as an anomaly, i.e., the scan belongs to a patient affected by the COVID-19 disease. Extensive experimental results and comparisons with some benchmark state-of-the-art methods support the effectiveness of the developed approach and demonstrate that, at least when the images of the considered datasets are homogeneous enough (i.e., only a few outliers are present), it is not necessary to resort to complex-to-implement DL techniques in order to attain effective detection of the COVID-19 disease. Despite the simplicity of the proposed approach, all the considered metrics (i.e., accuracy, precision, recall, and F-measure) attain a value of 1.0 on the selected datasets, a result comparable to the corresponding state-of-the-art DNN approaches, but with remarkable computational simplicity.
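The histogram-distance rule described above can be sketched in a few lines; the chi-square distance and the threshold used here are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def normalized_histogram(image, bins=256):
    """Intensity histogram normalized to sum to one."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def chi_square_distance(h1, h2, eps=1e-12):
    """One common inter-histogram distance (illustrative choice)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def is_anomalous(test_image, target_hist, threshold):
    """Label the scan as an anomaly if its histogram is far from the target."""
    return chi_square_distance(normalized_histogram(test_image), target_hist) > threshold

# Toy usage: the target histogram is averaged over a set of "normal" scan stand-ins.
rng = np.random.default_rng(0)
normal_images = [rng.normal(120, 30, (256, 256)).clip(0, 255) for _ in range(20)]
target = np.mean([normalized_histogram(im) for im in normal_images], axis=0)
test = rng.normal(170, 50, (256, 256)).clip(0, 255)      # a differently distributed scan
print(is_anomalous(test, target, threshold=0.1))
```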

Proceedings ArticleDOI
12 Apr 2021
TL;DR: In this paper, a combinatorial approach is proposed to generate test images by applying a set of combinations of some basic image transformation operations to a seed image, and then design an input parameter model based on the valid transformations and generate a t-way (t=2) test set.
Abstract: Recent advancements in the field of deep learning have enabled its application in Autonomous Driving Systems (ADS). A Deep Neural Network (DNN) model is often used to perform tasks such as pedestrian detection, object detection, and steering control in ADS. Unfortunately, DNN models could exhibit incorrect or unexpected behavior in real-world scenarios. There is a need to rigorously test these models with real-world driving scenarios so that safety-critical bugs can be detected before their deployment in the real world. In this paper, we propose a combinatorial approach to testing DNN models. Our approach generates test images by applying a set of combinations of some basic image transformation operations to a seed image. First, we identify a set of valid transformation operations or simply transformations. Next, we design an input parameter model based on the valid transformations and generate a t-way (t=2) combinatorial test set. Each test represents a combination of transformations, and can be used to produce a test image. We execute the test images on a DNN model and distinguish between consistent and inconsistent behavior using a relation. We conducted an experimental evaluation of our approach on three DNN models that are used in the Udacity challenge. Our results suggest that test images generated by our approach can effectively identify inconsistent behaviors and can significantly increase neuron coverage. To the best of our knowledge, our work is the first effort to use a combinatorial testing approach to generating test images based on image transformations for testing DNNs used in ADS.
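A toy sketch of the combinatorial idea: define an input parameter model of valid transformations, enumerate combinations of two transformation settings (t = 2), and apply each combination to a seed image with Pillow. The transformations and levels are illustrative, and a real pipeline would use a covering-array generator rather than the exhaustive pairing shown here.

```python
from itertools import combinations, product
from PIL import Image, ImageEnhance

# Input parameter model: each parameter is one basic transformation with valid levels
# (the parameters and levels here are illustrative, not the paper's exact model).
PARAMETERS = {
    "brightness": [0.7, 1.0, 1.3],
    "contrast":   [0.8, 1.0, 1.2],
    "rotation":   [-5, 0, 5],          # degrees
}

def apply(image, name, value):
    """Apply one basic transformation at a given level to a copy of the image."""
    if name == "brightness":
        return ImageEnhance.Brightness(image).enhance(value)
    if name == "contrast":
        return ImageEnhance.Contrast(image).enhance(value)
    if name == "rotation":
        return image.rotate(value)
    raise ValueError(name)

def pairwise_tests(seed):
    """Generate one test image per pair of parameter settings (t = 2)."""
    tests = []
    for (p1, p2) in combinations(PARAMETERS, 2):
        for v1, v2 in product(PARAMETERS[p1], PARAMETERS[p2]):
            tests.append(((p1, v1, p2, v2), apply(apply(seed, p1, v1), p2, v2)))
    return tests

seed = Image.new("RGB", (128, 128), "gray")       # stand-in for a driving seed image
tests = pairwise_tests(seed)
print(len(tests), "test images generated")        # 3 parameter pairs x 9 value combos = 27
```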

DOI
24 Sep 2021
TL;DR: In this paper, a new decision making model for face recognition from the original image fused with their true and partial diagonal images by integrating the type-2 fuzzy set based approach to mitigate the factors that pretend the face recognition accuracy.
Abstract: This paper projects a new decision making model for face recognition from the original image fused with their true and partial diagonal images by integrating the type-2 fuzzy set based approach to mitigate the factors that pretend the face recognition accuracy. The G2DFLD based feature vectors corresponding to a test image are given input to neural network based classifier trained with the feature vectors of the fused images to generate the merit weights with respect to different classes (subjects) under consideration. A new scheme has been introduced in the present approach to generate a score by employing a fuzzy type-2 set based treatment. These scores with respect to each of the classes under consideration are rendered from the feature vectors of the test image and those of the diagonally fused training samples. For each class, the score is fused weighted by the corresponding merit weights to generate the concluding score. These class-wise concluding scores are deliberated in recognizing the test face image. Faces from the well-known databases (gallery) with varied pose, illumination and occlusion are used to evaluate the performance of the model. It has been found that our model exhibits more accurate classification performance than existing similar kind of image level fusion method.

Journal ArticleDOI
05 Oct 2021
TL;DR: Zhang et al. as mentioned in this paper designed and developed an integrated object recognition and super-resolution framework by proposing an image super-resolution technique that improves object recognition accuracy. However, in actual object recognition processes, recognition accuracy is often degraded due to resolution mismatches between training and test image data.
Abstract: Object detection and recognition are crucial in the field of computer vision and are an active area of research. However, in actual object recognition processes, recognition accuracy is often degraded due to resolution mismatches between training and test image data. To solve this problem, we designed and developed an integrated object recognition and super-resolution framework by proposing an image super-resolution technique that improves object recognition accuracy. In detail, we collected a number of license plate training images through web-crawling and artificial data generation, and the image super-resolution artificial neural network was trained by defining an objective function to be robust to image flips. To verify the performance of the proposed algorithm, we experimented with the trained image super-resolution and recognition on representative test images and confirmed that the proposed super-resolution technique improves the accuracy of character recognition. For character recognition with the 4× magnification, the proposed method remarkably increased the mean average precision by 49.94% compared to the existing state-of-the-art method.

Journal ArticleDOI
TL;DR: The adversarial haze attack problem is addressed using the dark channel prior (DCP) de-hazing method, and a feature fusion model is proposed to fuse handcrafted features and a pre-trained network model to obtain robust and discriminative features.